# Rental Housing Complex Parking Demand Prediction Competition

#### [Background]

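The number of parking spaces required in an apartment complex is set to the larger of ① the statutory parking count and ② the projected future parking demand, so ② must be estimated accurately.
At present, ② is calculated from the "parking unit rate" (주차원단위) and the gross floor area.
The parking unit rate is surveyed by visiting similar complexes near the planned construction site during peak hours and counting the parked vehicles.
This approach can overestimate or underestimate demand, because manual counting introduces errors and there is a time gap between the field survey and the actual construction.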
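
The rule described above can be sketched in a few lines. This is a minimal illustration, assuming that future demand is the product of the parking unit rate and the gross floor area; the source only says the estimate is based on those two quantities. The function name, units, and numbers are hypothetical.

```python
def required_parking_spaces(legal_spaces, unit_rate, gross_floor_area):
    """Return the larger of the statutory count and the estimated future demand.

    Assumption (not confirmed by the source):
    future demand = unit_rate * gross_floor_area,
    e.g. spaces per square metre multiplied by square metres.
    """
    future_demand = unit_rate * gross_floor_area
    return max(legal_spaces, future_demand)


# Hypothetical numbers, for illustration only.
spaces = required_parking_spaces(legal_spaces=500, unit_rate=0.0125, gross_floor_area=48_000)
print(spaces)
```

Because the result is a maximum, an underestimated unit rate only matters when the future-demand term would otherwise exceed the statutory count.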

#### [Topic]
Predict the appropriate parking demand within a complex when designing rental housing of each type.

#### [Evaluation]
- Metric: MAE (Mean Absolute Error)
- Public score: a random 33% of the full test data (50 complexes)
- Private score: the remaining 67% of the full test data (100 complexes)
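
For reference, MAE is the average of the absolute differences between actual and predicted values, MAE = (1/n) Σ |y_i − ŷ_i|. The sketch below computes it in plain Python; the vehicle counts are made up, and the competition's own scoring code is not shown in the source.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all complexes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registered-vehicle counts for three complexes.
actual = [200, 487, 770]
predicted = [220, 450, 800]
mae = mean_absolute_error(actual, predicted)
print(mae)  # (20 + 37 + 30) / 3 = 29.0
```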

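#### [External Data and Pretrained Models]

- External data is allowed if anyone can obtain it and it carries no legal restrictions, such as public data.
- For pretrained models, the data used for pretraining must be stated.
- External data and its sources must be submitted at the final evaluation.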

### Data Understanding
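##### train.csv - training data
- `단지코드`: complex code (key)
- `총세대수`: total number of households
- `임대건물구분`: rental building category
- `지역`: region
- `공급유형`: supply type
- `전용면적`: exclusive-use area
- `전용면적별세대수`: number of households per exclusive-use area
- `공가수`: number of vacant units
- `자격유형`: eligibility type (the requirements a tenant must meet)
- `임대료보증금`: rental deposit
- `임대료`: rent
- `도보 10분거리 내 지하철역 수(환승노선 수 반영)`: number of subway stations within a 10-minute walk (transfer lines counted)
- `도보 10분거리 내 버스정류장 수`: number of bus stops within a 10-minute walk
- `단지내주차면수`: number of parking spaces in the complex
- `등록차량수`: number of registered vehicles (target)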


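##### age_gender_info.csv - population distribution by age and sex for rental housing in each region
- `지역`: region
- `10대미만(여자)`: under 10, female
- `10대미만(남자)`: under 10, male
- `10대(여자)`: 10s, female
- `10대(남자)`: 10s, male
- `20대(여자)`: 20s, female
- `20대(남자)`: 20s, male
- `30대(여자)`: 30s, female
- `30대(남자)`: 30s, male
- `40대(여자)`: 40s, female
- `40대(남자)`: 40s, male
- `50대(여자)`: 50s, female
- `50대(남자)`: 50s, male
- `60대(여자)`: 60s, female
- `60대(남자)`: 60s, male
- `70대(여자)`: 70s, female
- `70대(남자)`: 70s, male
- `80대(여자)`: 80s, female
- `80대(남자)`: 80s, male
- `90대(여자)`: 90s, female
- `90대(남자)`: 90s, male
- `100대(여자)`: 100s, female
- `100대(남자)`: 100s, male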
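
Both files share the `지역` column, so the regional ratios can be attached to each training row. The sketch below shows one way to do that with a left join. The small frames and their values are made up, and whether the original notebook joins the tables this way is an assumption.

```python
import pandas as pd

# Tiny stand-ins for train.csv and age_gender_info.csv (values are made up).
train_demo = pd.DataFrame({
    "단지코드": ["C0001", "C0001", "C0002"],
    "지역": ["경기도", "경기도", "부산광역시"],
    "총세대수": [900, 900, 400],
})
age_gender_demo = pd.DataFrame({
    "지역": ["경기도", "부산광역시"],
    "30대(남자)": [0.07, 0.05],
})

# A left join keeps every training row and attaches that region's ratios.
merged = train_demo.merge(age_gender_demo, on="지역", how="left")
print(merged)
```

A left join is used so that a region missing from the demographic table yields NaN instead of silently dropping training rows.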

# Transcription

In [1]:
PATH = r'C:\Users\rlaek\Desktop\Kaggle&Dakon\Dakon\주차수요 예측 AI 경진대회'

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from matplotlib import font_manager, rc
font_path = 'C:/Windows/Fonts/NGULIM.TTF'
font = font_manager.FontProperties(fname = font_path).get_name()
rc('font', family=font)

import warnings
warnings.filterwarnings('ignore')

In [3]:
age_gender = pd.read_csv(PATH + '\\age_gender_info.csv')
train = pd.read_csv(PATH + '\\train.csv')
test = pd.read_csv(PATH + '\\test.csv')

In [4]:
train.shape, test.shape, age_gender.shape

((2952, 15), (1022, 14), (16, 23))

In [7]:
train.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 총세대수 | 2952.0 | 886.661247 | 513.540168 | 26.0 | 513.5 | 779.0 | 1106.0 | 2568.0 |
| 전용면적 | 2952.0 | 44.757215 | 31.87428 | 12.62 | 32.1 | 39.93 | 51.5625 | 583.4 |
| 전용면적별세대수 | 2952.0 | 102.747967 | 132.640159 | 1.0 | 14.0 | 60.0 | 144.0 | 1865.0 |
| 공가수 | 2952.0 | 12.92107 | 10.778831 | 0.0 | 4.0 | 11.0 | 20.0 | 55.0 |
| 도보 10분거리 내 지하철역 수(환승노선 수 반영) | 2741.0 | 0.176578 | 0.427408 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| 도보 10분거리 내 버스정류장 수 | 2948.0 | 3.695726 | 2.644665 | 0.0 | 2.0 | 3.0 | 4.0 | 20.0 |
| 단지내주차면수 | 2952.0 | 601.66836 | 396.407072 | 13.0 | 279.25 | 517.0 | 823.0 | 1798.0 |
| 등록차량수 | 2952.0 | 559.768293 | 433.375027 | 13.0 | 220.0 | 487.0 | 770.0 | 2550.0 |


In [11]:
test.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 총세대수 | 1022.0 | 862.080235 | 536.340894 | 75.0 | 488.0 | 745.0 | 1161.0 | 2572.0 |
| 전용면적 | 1022.0 | 43.706311 | 35.890759 | 9.96 | 33.135 | 39.72 | 47.4 | 583.4 |
| 전용면적별세대수 | 1022.0 | 100.414873 | 125.997855 | 1.0 | 14.0 | 60.0 | 140.0 | 1341.0 |
| 공가수 | 1022.0 | 15.544031 | 11.07014 | 0.0 | 6.0 | 15.0 | 23.0 | 45.0 |
| 도보 10분거리 내 지하철역 수(환승노선 수 반영) | 980.0 | 0.136735 | 0.4355 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 도보 10분거리 내 버스정류장 수 | 1022.0 | 4.626223 | 5.414568 | 1.0 | 2.0 | 3.0 | 5.0 | 50.0 |
| 단지내주차면수 | 1022.0 | 548.771037 | 342.636703 | 29.0 | 286.0 | 458.0 | 711.0 | 1696.0 |


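- The train and test sets have broadly similar means and standard deviations. The main exception is `도보 10분거리 내 버스정류장 수`, whose standard deviation is 2.64 in train and 5.41 in test (maximum 20 vs. 50).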
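
The comparison above can be made explicit by placing the statistics side by side instead of reading two separate `describe()` outputs. The helper below is a hypothetical sketch, not part of the original notebook, and the demo frames contain made-up values.

```python
import pandas as pd


def compare_summary(train_df, test_df):
    """Side-by-side mean and std for numeric columns present in both frames."""
    shared = train_df.select_dtypes("number").columns.intersection(test_df.columns)
    summary = pd.DataFrame({
        "train_mean": train_df[shared].mean(),
        "test_mean": test_df[shared].mean(),
        "train_std": train_df[shared].std(),
        "test_std": test_df[shared].std(),
    })
    # A ratio far from 1 flags a column whose spread differs between the sets.
    summary["std_ratio"] = summary["test_std"] / summary["train_std"]
    return summary


# Made-up frames; the target column exists only in the training data.
train_demo = pd.DataFrame({"총세대수": [100, 200, 300], "등록차량수": [50, 80, 120]})
test_demo = pd.DataFrame({"총세대수": [150, 250, 350]})
print(compare_summary(train_demo, test_demo))
```

Columns that appear only in the training data, such as the target, are dropped by the intersection.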

## age_gender_info

In [18]:
ag = age_gender.set_index('지역')
ag

| 지역 | 10대미만(여자) | 10대미만(남자) | 10대(여자) | 10대(남자) | 20대(여자) | 20대(남자) | 30대(여자) | 30대(남자) | 40대(여자) | 40대(남자) | ... | 60대(여자) | 60대(남자) | 70대(여자) | 70대(남자) | 80대(여자) | 80대(남자) | 90대(여자) | 90대(남자) | 100대(여자) | 100대(남자) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 경상북도 | 0.030158 | 0.033195 | 0.056346 | 0.06136 | 0.060096 | 0.067859 | 0.053433 | 0.049572 | 0.08366 | 0.072613 | ... | 0.082684 | 0.063889 | 0.047717 | 0.030172 | 0.029361 | 0.011211 | 0.005578 | 0.001553 | 0.000234 | 1.4e-05 |
| 경상남도 | 0.0274 | 0.026902 | 0.053257 | 0.055568 | 0.06492 | 0.070618 | 0.056414 | 0.05755 | 0.077092 | 0.0676 | ... | 0.087201 | 0.069562 | 0.048357 | 0.033277 | 0.027361 | 0.011295 | 0.00491 | 0.001086 | 0.000179 | 1e-05 |
| 대전광역시 | 0.028197 | 0.029092 | 0.04049 | 0.042793 | 0.060834 | 0.064247 | 0.068654 | 0.066848 | 0.074667 | 0.067925 | ... | 0.088468 | 0.070261 | 0.05101 | 0.037143 | 0.032455 | 0.013751 | 0.006494 | 0.00174 | 0.000298 | 6.6e-05 |
| 경기도 | 0.03803 | 0.039507 | 0.052546 | 0.05399 | 0.058484 | 0.059894 | 0.072331 | 0.068704 | 0.083208 | 0.078355 | ... | 0.074237 | 0.058419 | 0.042422 | 0.032725 | 0.025136 | 0.012354 | 0.00539 | 0.001707 | 0.00029 | 6.7e-05 |
| 전라북도 | 0.028089 | 0.029065 | 0.059685 | 0.06008 | 0.066262 | 0.070322 | 0.052027 | 0.046596 | 0.077005 | 0.066645 | ... | 0.076636 | 0.068042 | 0.051025 | 0.035748 | 0.035049 | 0.012641 | 0.007223 | 0.001898 | 0.000158 | 1.3e-05 |
| 강원도 | 0.028888 | 0.030682 | 0.051287 | 0.052712 | 0.060374 | 0.063157 | 0.059821 | 0.054321 | 0.076201 | 0.068002 | ... | 0.088423 | 0.070014 | 0.047732 | 0.03276 | 0.033515 | 0.013027 | 0.007628 | 0.001677 | 0.000319 | 1.7e-05 |
| 광주광역시 | 0.031994 | 0.034566 | 0.049851 | 0.050254 | 0.065084 | 0.066875 | 0.066888 | 0.064416 | 0.080028 | 0.079183 | ... | 0.07593 | 0.059586 | 0.048552 | 0.031754 | 0.029749 | 0.010341 | 0.006343 | 0.000895 | 0.000353 | 1.3e-05 |
| 충청남도 | 0.031369 | 0.031711 | 0.059077 | 0.062422 | 0.067975 | 0.072622 | 0.065095 | 0.067303 | 0.07886 | 0.073418 | ... | 0.070278 | 0.057692 | 0.042296 | 0.028682 | 0.024514 | 0.0109 | 0.005429 | 0.001549 | 0.000219 | 0.000123 |
| 부산광역시 | 0.022003 | 0.022947 | 0.032681 | 0.035512 | 0.053796 | 0.057233 | 0.047049 | 0.048866 | 0.061952 | 0.060769 | ... | 0.109297 | 0.085294 | 0.078743 | 0.053388 | 0.047908 | 0.020228 | 0.008043 | 0.00224 | 0.000268 | 2.8e-05 |
| 제주특별자치도 | 0.03469 | 0.036695 | 0.060094 | 0.06308 | 0.069135 | 0.069667 | 0.050808 | 0.048026 | 0.080548 | 0.07253 | ... | 0.074248 | 0.055717 | 0.047944 | 0.033054 | 0.026836 | 0.011332 | 0.006832 | 0.000982 | 0.000368 | 8.2e-05 |


In [20]:
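ag.loc['전체 평균'] = ag.mean()  # append an overall-average row ('전체 평균')
# Metropolitan-city average: labels containing '시', with '세종' (Sejong) excluded via ~ because it is not a metropolitan city
ag.loc['광역시 평균'] = ag.loc[(ag.index.str.contains('시')) & (~ag.index.str.contains('세종'))].mean()
ag.loc['도 평균'] = ag.loc[ag.index.str.contains('도')].mean()  # province average ('도 평균')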
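
As an alternative sketch of the same grouping idea, the version below builds the masks with `str.endswith` and computes every average from the original rows before appending any of them. This is a swapped-in variant, not the notebook's code: matching on the `광역시` suffix also excludes any label that merely contains `시`, and the demo values are made up.

```python
import pandas as pd

# Made-up ratios for a handful of regions; only the index labels matter here.
ag_demo = pd.DataFrame(
    {"20대(여자)": [0.06, 0.05, 0.07, 0.04]},
    index=["경기도", "부산광역시", "세종특별자치시", "대전광역시"],
)

is_metro = ag_demo.index.str.endswith("광역시")  # metropolitan cities only
is_province = ag_demo.index.str.endswith("도")   # provinces

# Compute all means from the original rows first, then append them,
# so no summary row can be picked up by a later mask.
summary = pd.DataFrame({
    "전체 평균": ag_demo.mean(),
    "광역시 평균": ag_demo[is_metro].mean(),
    "도 평균": ag_demo[is_province].mean(),
}).T
ag_with_means = pd.concat([ag_demo, summary])
print(ag_with_means)
```

Computing the averages before appending them means the result does not depend on whether a summary label happens to match one of the masks.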