# Data Loader

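- Target : apartment hammer price (winning bid)

- Files
> - **Auction_Master_train.csv** ; basic information on auctioned apartments in Seoul and Busan over the most recent two years, including location, appraised price, auction start/close dates, and the hammer price
> - **Auction_Master_test.csv** ; same as train.csv, but without the hammer price
> - **Auction_submission.csv** ; submission file to be filled in with the predicted hammer prices
> - **Auction_regist.csv** ; property registry information for each apartment
> - **Auction_result.csv** ; auction dates, appraised price, minimum sale price, and auction results
> - **Auction_rent.csv** ; for apartments with tenants: move-in registration/occupancy status, deposit, monthly rent, and related data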

- Features
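> - Auction_key ; unique key of the auctioned apartment
> - Auction_class ; auction type
> > - **강제경매** (compulsory auction) : the creditor files a lawsuit, obtains a judgment, and the auction proceeds on the basis of that enforceable title
> > - **임의경매** (voluntary auction) : the auction proceeds on the basis of a security right recorded in the property register (mortgage, provisional attachment, etc.)
> - Bid_class ; bid type (일반 general / 개별 individual / 일괄 bundled)
> - Claim_price ; amount claimed by the auction applicant
> - Appraisal_company ; appraiser
> - Appraisal_date ; appraisal date
> - Auction_count ; total number of auction rounds
> - Auction_miscarriage_count ; total number of failed rounds (유찰)
> > - A round fails when no winning bid is determined and the round is voided. This mostly happens when the bids fall short of the reserve price.
> - Total_land_gross_area ; total gross land area (㎡)
> - Total_land_real_area ; total actual land area (㎡)
> - Total_land_auction_area ; total land area under auction (㎡)
> - Total_building_area ; total building area (㎡)
> - Total_building_auction_area ; total building area under auction (㎡)
> - Total_appraisal_price ; total appraised price
> - Minimum_sales_price ; minimum sale price
> > - The lowest amount a bidder must offer when bidding
> - First_auction_date ; date of the first auction
> - Final_auction_date ; date of the final auction
> - Final_result ; final result
> - Creditor ; creditor, i.e. the auction applicant
> - addr_do ; address: province or metropolitan city (시도)
> - addr_si ; city/county/district (시군구)
> - addr_dong ; town/township/neighborhood (읍면동)
> - addr_li ; village (리)
> - addr_san ; address: whether the lot is a mountain lot (산번지) (Y)
> - addr_bunji1 ; address: lot number 1
> - addr_bunji2 ; address: lot number 2
> - addr_etc ; address: remaining address details
> - Apartment_usage ; primary use of the building (land)
> - Completion_date ; completion date
> - Preserve_regist_date ; date of preservation registration, i.e. the first registration after a building is newly constructed
> - Total_floor ; total number of floors
> - Current_floor ; floor of the unit
> - Specific ; other remarks
> - Share_auction_YorN ; whether it is a share auction (Y)
> > - Only part of a property, rather than the whole, goes to auction (co-owners hold ownership in shares and only some of those shares are auctioned)
> - road_name ; road-name address: road name
> - road_bunji1 ; road-name address: building number 1
> - road_bunji2 ; road-name address: building number 2
> - Close_date ; case closing date
> - Close_result ; case closing result; reflects the difference between a winning bid (낙찰) and distribution (배당)
> > - An auction proceeds as ① auction held (winning bid) ▷ ② decision approving the sale ▷ ③ payment of the price ▷ ④ distribution and closing. When the highest bidder wins (①) and the sale is approved (②), the bidder pays the winning amount by the payment deadline (③). The court then distributes the proceeds to the creditors in order of priority (④), and the auction is closed.
> - point.y ; latitude
> - point.x ; longitude
> - Hammer_price ; hammer price (winning bid)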

In [450]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', 100)

Raw_train = pd.read_csv("./data/Auction_Master_train.csv")
Raw_test = pd.read_csv("./data/Auction_Master_test.csv")
submission = pd.read_csv("./data/Auction_submission.csv")
regist = pd.read_csv("./data/Auction_regist.csv")
result = pd.read_csv("./data/Auction_result.csv")
rent = pd.read_csv("./data/Auction_rent.csv")

In [451]:
Raw_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1933 entries, 0 to 1932
Data columns (total 41 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Auction_key                  1933 non-null   int64  
 1   Auction_class                1933 non-null   object 
 2   Bid_class                    1933 non-null   object 
 3   Claim_price                  1933 non-null   int64  
 4   Appraisal_company            1933 non-null   object 
 5   Appraisal_date               1933 non-null   object 
 6   Auction_count                1933 non-null   int64  
 7   Auction_miscarriage_count    1933 non-null   int64  
 8   Total_land_gross_area        1933 non-null   float64
 9   Total_land_real_area         1933 non-null   float64
 10  Total_land_auction_area      1933 non-null   float64
 11  Total_building_area          1933 non-null   float64
 12  Total_building_auction_area  1933 non-null   float64
 13  Total_appraisal_pr

In [452]:
Raw_train.head()

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price
0,2687,임의,개별,1766037301,정명감정,2017-07-26 00:00:00,2,1,12592.0,37.35,37.35,181.77,181.77,836000000,668800000,2018-02-13 00:00:00,2018-03-20 00:00:00,낙찰,베리타스자산관리대부,부산,해운대구,우동,,N,1398.0,,해운대엑소디움 5층 101-502호,주상복합,2009-07-14 00:00:00,45,5,,N,해운대해변로,30.0,,2018-06-14 00:00:00,배당,35.162717,129.137048,760000000
1,2577,임의,일반,152946867,희감정,2016-09-12 00:00:00,2,1,42478.1,18.76,18.76,118.38,118.38,1073000000,858400000,2016-12-29 00:00:00,2017-02-02 00:00:00,낙찰,흥국저축은행,부산,해운대구,우동,,N,1407.0,,해운대두산위브더제니스 103동 51층 5103호,아파트,2011-12-16 00:00:00,70,51,,N,마린시티2로,33.0,,2017-03-30 00:00:00,배당,35.156633,129.145068,971889999
2,2197,임의,개별,11326510,혜림감정,2016-11-22 00:00:00,3,2,149683.1,71.0,71.0,49.94,49.94,119000000,76160000,2017-07-28 00:00:00,2017-10-13 00:00:00,낙찰,국민은행,부산,사상구,모라동,,N,552.0,,백양그린 206동 14층 1403호,아파트,1992-07-31 00:00:00,15,14,,N,모라로110번길,88.0,,2017-12-13 00:00:00,배당,35.184601,128.996765,93399999
3,2642,임의,일반,183581724,신라감정,2016-12-13 00:00:00,2,1,24405.0,32.98,32.98,84.91,84.91,288400000,230720000,2017-07-20 00:00:00,2017-11-02 00:00:00,낙찰,고려저축은행,부산,남구,대연동,,N,243.0,23.0,대연청구 109동 11층 1102호,아파트,2001-07-13 00:00:00,20,11,,N,황령대로319번가길,110.0,,2017-12-27 00:00:00,배당,35.15418,129.089081,256899000
4,1958,강제,일반,45887671,나라감정,2016-03-07 00:00:00,2,1,774.0,45.18,45.18,84.96,84.96,170000000,136000000,2016-07-06 00:00:00,2016-08-03 00:00:00,낙찰,Private,부산,사하구,괴정동,,N,399.0,2.0,동조리젠시 7층 703호,아파트,2001-11-27 00:00:00,7,7,,N,오작로,51.0,,2016-10-04 00:00:00,배당,35.09963,128.998874,158660000


In [453]:
# Check whether any examples have a "total actual land area" that differs from the "total land area under auction"
diff_real = (Raw_train['Total_land_real_area'] != Raw_train['Total_land_auction_area']).sum()
print("real != auction : ", diff_real, "/", len(Raw_train))

# Check whether any examples have a "total building area" that differs from the "total building area under auction"
diff_building = (Raw_train['Total_building_area'] != Raw_train['Total_building_auction_area']).sum()
print("building != auction : ", diff_building, "/", len(Raw_train))

real != auction :  77 / 1933
building != auction :  79 / 1933


In [454]:
# Check whether any examples have a "total land area under auction" larger than the "total actual land area"
comp_real = Raw_train['Total_land_real_area'] < Raw_train['Total_land_auction_area']
print("real < auction : ", comp_real.sum(), "/", len(Raw_train))

# Check whether any examples have a "total building area under auction" larger than the "total building area"
comp_building = Raw_train['Total_building_area'] < Raw_train['Total_building_auction_area']
print("building < auction : ", comp_building.sum(), "/", len(Raw_train))

real < auction :  1 / 1933
building < auction :  0 / 1933


In [455]:
# Print the example whose "total land area under auction" is larger than its "total actual land area"
Raw_train[comp_real]

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price
1000,1416,임의,개별,657000000,경인감정,2016-01-26 00:00:00,2,1,2418.0,43.21,43.22,151.03,151.03,330000000,264000000,2016-06-13 00:00:00,2016-07-25 00:00:00,낙찰,국민은행,서울,중랑구,중화동,,N,207.0,14.0,",-27 미영리치타운102 2층 207호",주상복합,2007-12-11 00:00:00,15,2,,N,망우로,223.0,8.0,2016-11-08 00:00:00,배당,37.595017,127.077879,341000000
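
The single row flagged above differs by only 0.01 ㎡ (43.21 vs 43.22), which looks more like rounding than a genuinely larger auction area. A minimal sketch of a comparison that ignores such differences, assuming a tolerance of 0.05 ㎡ is acceptable (the tolerance is an illustrative assumption; the toy values reuse numbers seen in this notebook's outputs):

```python
import numpy as np
import pandas as pd

# Toy frame reusing the notebook's column names.
toy = pd.DataFrame({
    "Total_land_real_area": [43.21, 35.11, 18.76],
    "Total_land_auction_area": [43.22, 23.41, 18.76],
})

TOLERANCE = 0.05  # assumed rounding tolerance in square metres

real = toy["Total_land_real_area"]
auction = toy["Total_land_auction_area"]

# True only where the auction area exceeds the real area by more than the tolerance.
clearly_larger = (auction - real) > TOLERANCE

# True where the two areas agree up to the tolerance.
roughly_equal = np.isclose(real, auction, rtol=0.0, atol=TOLERANCE)

print(clearly_larger.tolist())
print(roughly_equal.tolist())
```

With a tolerance in place, the 0.01 ㎡ row is treated as equal, while a share auction such as 35.11 vs 23.41 still shows up as a real difference.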


In [456]:
Raw_train[Raw_train['Total_land_gross_area'] == 0]

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price
13,2550,임의,일반,135000000,미르감정,2016-07-24 00:00:00,2,1,0.0,0.0,0.0,106.34,106.34,560000000,448000000,2017-01-19 00:00:00,2017-03-30 00:00:00,낙찰,대부F&D,부산,해운대구,중동,,N,1519.0,3.0,", 1522-1,-2, 1523-1,-2,-3,-4,-5, 1524-1,-2,-3,...",아파트,2015-06-17 00:00:00,39,5,,N,좌동순환로433번길,30.0,,2017-07-24 00:00:00,배당,35.161948,129.17911,518800000
32,1826,임의,일반,45000000,태화감정,2015-09-21 00:00:00,3,1,0.0,0.0,0.0,68.2,68.2,67000000,53600000,2016-09-30 00:00:00,2016-12-29 00:00:00,낙찰,Private,부산,연제구,연산동,,Y,20.0,55.0,국토 4층 402호,아파트,1111-11-11 00:00:00,8,4,,N,망미번영로,103.0,,2017-03-08 00:00:00,배당,35.176351,129.111235,55203000
118,1986,임의,일반,70000000,알비감정,2016-04-20 00:00:00,1,0,0.0,0.0,0.0,83.33,83.33,54700000,54700000,2016-07-20 00:00:00,2016-07-20 00:00:00,낙찰,Private,부산,연제구,연산동,,Y,20.0,55.0,국토 1층 108호,아파트,1111-11-11 00:00:00,8,1,,N,망미번영로,103.0,,2016-10-19 00:00:00,배당,35.176351,129.111235,60000000
136,2571,강제,일반,31229620,내외감정,2016-08-31 00:00:00,1,0,0.0,0.0,0.0,43.02,43.02,33150000,33150000,2017-02-13 00:00:00,2017-02-13 00:00:00,낙찰,서울보증보험,부산,남구,용호동,,Y,20.0,1.0,",-8,23-1,67-1,68-2 한라임대 4동 4층 409호",아파트,1988-01-08 00:00:00,5,4,,N,이기대공원로26번길,21.0,7.0,2017-04-25 00:00:00,배당,35.123588,129.11664,34387000
176,2504,임의,일반,381780881,태화감정,2016-03-07 00:00:00,2,1,0.0,0.0,0.0,129.88,129.88,694000000,555200000,2017-07-10 00:00:00,2017-09-11 00:00:00,낙찰,국민은행,부산,해운대구,중동,,N,1818.0,,",1817,1819 해운대힐스테이트위브 T-104동 24층 2402호",아파트,2015-06-17 00:00:00,49,24,,N,좌동순환로433번길,30.0,,2017-12-05 00:00:00,배당,35.161229,129.180266,600000000
227,2489,강제,일반,264300000,국제감정,2016-02-05 00:00:00,2,0,0.0,0.0,0.0,84.62,84.62,160000000,160000000,2016-08-17 00:00:00,2016-12-14 00:00:00,낙찰,Private,부산,수영구,남천동,,N,27.0,3.0,",-16 남천마르빌빌딩 11층 1205호",주상복합,1111-11-11 00:00:00,20,11,,N,수영로,452.0,14.0,2017-02-17 00:00:00,배당,35.145924,129.110288,173300000
272,2201,강제,일반,25533259,평진감정,2016-12-01 00:00:00,2,1,0.0,0.0,0.0,37.55,37.55,26000000,20800000,2017-06-02 00:00:00,2017-07-07 00:00:00,낙찰,우리카드,부산,중구,영주동,,N,73.0,1.0,영주 2동 1층 110호,아파트,1975-08-29 00:00:00,4,1,,N,초량상로7번길,22.0,,2017-08-30 00:00:00,배당,35.113513,129.032852,20800000
564,2520,임의,일반,103200000,내외감정,2016-03-31 00:00:00,2,1,0.0,0.0,0.0,134.32,134.32,770000000,616000000,2016-12-01 00:00:00,2016-12-29 00:00:00,낙찰,장산(새),부산,해운대구,중동,,N,1818.0,,",1817,1819 해운대힐스테이트위브 T-201동 39층 3906호",아파트,2015-06-17 00:00:00,47,39,,N,0,,,2017-02-22 00:00:00,배당,35.161229,129.180266,666666668
568,1987,강제,일반,217020382,제일감정,2016-03-30 00:00:00,1,0,0.0,0.0,0.0,36.36,36.36,42000000,42000000,2016-11-18 00:00:00,2016-11-18 00:00:00,낙찰,케이알앤씨,부산,영도구,영선동4가,,N,1278.0,,영선 3동 A3층 304호,아파트,1970-03-28 00:00:00,5,3,2016.5.2자 및 2016.10.4자 사실조회에 따른 회신서(부산도시공사)에 따...,N,절영로,227.0,,2018-07-17 00:00:00,배당,35.077536,129.046754,42148800
696,869,임의,일반,464400000,리파인감정,2016-10-10 00:00:00,2,1,0.0,0.0,0.0,101.94,101.94,552000000,441600000,2017-02-28 00:00:00,2017-05-16 00:00:00,낙찰,중소기업은행,서울,은평구,진관동,,N,10.0,,상림마을 은평뉴타운 736동 2층 201호,아파트,2008-06-25 00:00:00,12,2,,N,진관4로,48.0,51.0,2017-07-06 00:00:00,배당,37.642422,126.930885,521000000
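
The rows above have 0.0 in all three land-area columns while the building areas are populated, so the zeros may be placeholders for missing values rather than real measurements. A minimal sketch, under that assumption, of converting them to NaN so that later ratios do not silently treat them as true zeros (the `land_area_missing` column is a hypothetical helper, not part of the dataset):

```python
import numpy as np
import pandas as pd

LAND_COLS = ["Total_land_gross_area", "Total_land_real_area", "Total_land_auction_area"]

toy = pd.DataFrame({
    "Total_land_gross_area": [12592.0, 0.0],
    "Total_land_real_area": [37.35, 0.0],
    "Total_land_auction_area": [37.35, 0.0],
    "Total_building_area": [181.77, 106.34],
})

cleaned = toy.copy()
# Assumption: a land area of exactly 0 means "not recorded".
cleaned[LAND_COLS] = cleaned[LAND_COLS].replace(0.0, np.nan)

# Keep a flag so a model can still see that the land area was missing.
cleaned["land_area_missing"] = cleaned[LAND_COLS].isna().all(axis=1)
print(cleaned)
```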


In [457]:
Raw_train[Raw_train['Total_building_area'].isnull()]

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price


In [458]:
# Check whether any examples ended as a failed auction
print(Raw_train['Final_result'].unique())

# Check which values Close_result takes
print(Raw_train['Close_result'].unique())

['낙찰']
['배당' '    ']


In [459]:
# Check whether Close_result contains any NULL values
print(Raw_train['Close_result'].isnull().sum())

0


In [460]:
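# Print the rows whose closing result is not '배당' (distribution).
# Presumably these were won at auction, but for some reason the proceeds have not been distributed yet.
# In other words, the winning bid exists, but the auction is still open because distribution is pending.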
Raw_train[Raw_train['Close_result'] != '배당']

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price
962,14,임의,일반,1138371040,리파인감정,2014-04-14 00:00:00,2,1,15487.3,78.55,78.55,168.42,168.42,1360000000,1088000000,2014-12-09 00:00:00,2016-04-05 00:00:00,낙찰,안양남부(새),서울,강남구,개포동,,N,12.0,2.0,엘지개포자이 101동 5층 502호,아파트,2004-08-05 00:00:00,22,5,,N,개포로109길,69.0,,1111-11-11 00:00:00,,37.496297,127.076623,1207210000
1016,1,강제,개별,900000000,신명감정,2016-03-15 00:00:00,5,4,510.0,35.11,23.41,92.03,61.35,298000000,122061000,2011-06-21 00:00:00,2016-10-11 00:00:00,낙찰,푸른이상호저축,서울,동작구,상도동,,N,1.0,5.0,삼부한강 2층 201호,아파트,2005-07-15 00:00:00,9,2,,Y,0,,,1111-11-11 00:00:00,,37.508474,126.952834,172000000
1026,843,강제,일반,117409245,도시감정,2016-06-10 00:00:00,1,0,30235.4,47.17,47.17,84.99,84.99,667000000,667000000,2017-11-07 00:00:00,2017-11-07 00:00:00,낙찰,Private,서울,마포구,창전동,,N,444.0,,",445-4,448 서강쌍용예가 111동 4층 404호",아파트,2007-11-19 00:00:00,15,4,,N,독막로,145.0,,1111-11-11 00:00:00,,37.548457,126.929413,725550000
1040,264,임의,일반,94459726,지녕감정,2016-11-04 00:00:00,1,0,259.1,20.42,20.42,50.1,50.1,300000000,300000000,2017-09-20 00:00:00,2017-09-20 00:00:00,낙찰,Private,서울,강남구,논현동,,N,102.0,9.0,논현빌라트 5층 502호,아파트,1999-08-30 00:00:00,5,5,,N,선릉로137길,6.0,,1111-11-11 00:00:00,,37.519267,127.039941,330000100
1146,1215,임의,일반,80000000,현산감정,2017-02-09 00:00:00,2,1,1145.0,56.86,56.86,94.62,94.62,286000000,228800000,2017-12-26 00:00:00,2018-02-06 00:00:00,낙찰,Private,서울,구로구,온수동,,N,97.0,2.0,",99-4 두양그린 5층 503호",아파트,2002-05-30 00:00:00,5,5,,N,부일로1길,136.0,9.0,1111-11-11 00:00:00,,37.495211,126.815756,265970000
1407,1738,임의,일반,70000000,화신감정,2017-09-25 00:00:00,2,1,82595.1,57.37,57.37,102.7,102.7,536000000,428800000,2018-01-22 00:00:00,2018-03-05 00:00:00,낙찰,Private,서울,중랑구,묵동,,N,20.0,,신내대림 501동 4층 403호,아파트,1996-01-30 00:00:00,12,4,,N,신내로21길,16.0,,1111-11-11 00:00:00,,37.616051,127.088214,538897000
1438,627,임의,일반,138000000,온누리감정,2017-06-21 00:00:00,2,1,1391.5,52.23,52.23,84.74,84.74,452000000,361600000,2018-01-29 00:00:00,2018-03-05 00:00:00,낙찰,Private,서울,강동구,성내동,,N,405.0,9.0,한솔애리즈 801동 6층 603호,아파트,1111-11-11 00:00:00,7,6,,N,양재대로95길,60.0,,1111-11-11 00:00:00,,37.531094,127.133775,445000000
1627,1202,강제,일반,238616280,성민감정,2016-11-17 00:00:00,2,1,893.0,47.22,47.22,119.92,119.92,605000000,484000000,2017-05-23 00:00:00,2017-06-28 00:00:00,낙찰,Private,서울,양천구,신정동,,N,118.0,19.0,목동그린빌라트 3층 301호,주상복합,1996-01-19 00:00:00,8,3,,N,목동동로12길,38.0,,1111-11-11 00:00:00,,37.522362,126.875448,556280000
1640,1683,강제,일반,64100475,기린감정,2017-04-28 00:00:00,2,1,25472.5,26.07,26.07,84.32,84.32,340000000,272000000,2017-09-11 00:00:00,2017-11-06 00:00:00,낙찰,Private,서울,중랑구,신내동,,N,783.0,,",-1,-2,상봉동480,-1 성원 102동 9층 902호",아파트,1994-12-20 00:00:00,25,9,,N,신내로,51.0,,1111-11-11 00:00:00,,37.604908,127.094707,331299000
1665,1667,임의,일반,181697863,프라임감정,2017-03-28 00:00:00,2,1,531.3,44.27,44.27,84.38,84.38,277000000,221600000,2017-08-28 00:00:00,2017-10-16 00:00:00,낙찰,Private,서울,중랑구,망우동,,N,184.0,16.0,",-18,-20,-21 주영파크뷰 4층 401호",아파트,2004-04-08 00:00:00,7,4,,N,용마산로114길,37.0,,1111-11-11 00:00:00,,37.599278,127.103362,243100000
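
`Close_result` has no nulls, yet `unique()` reported a whitespace-only value, and the rows above pair it with the placeholder `Close_date` of 1111-11-11. A minimal sketch of normalizing such blank strings to a proper missing value so that `isnull()` reports them (toy values; the `"pending"` label is a hypothetical choice, not a value from the data):

```python
import pandas as pd

close_result = pd.Series(["배당", "    ", "배당"], name="Close_result")

# Strip surrounding whitespace; whitespace-only entries become empty strings.
stripped = close_result.str.strip()

# Mark empty strings as missing.
normalized = stripped.mask(stripped == "")
print(normalized.isnull().sum())

# Optionally give the missing category an explicit name before one-hot encoding.
labeled = normalized.fillna("pending")
print(labeled.tolist())
```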


In [461]:
Raw_train[Raw_train['Preserve_regist_date'] == '1111-11-11 00:00:00']

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price
29,1783,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,6,5,532.0,16.65,16.65,98.08,98.08,228745632,117118000,2013-07-04 00:00:00,2016-04-29 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 8층 801호",주상복합,1111-11-11 00:00:00,15,8,"**1차감정:220,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,126630000
32,1826,임의,일반,45000000,태화감정,2015-09-21 00:00:00,3,1,0.0,0.0,0.0,68.2,68.2,67000000,53600000,2016-09-30 00:00:00,2016-12-29 00:00:00,낙찰,Private,부산,연제구,연산동,,Y,20.0,55.0,국토 4층 402호,아파트,1111-11-11 00:00:00,8,4,,N,망미번영로,103.0,,2017-03-08 00:00:00,배당,35.176351,129.111235,55203000
81,1914,강제,일반,180000000,태평양감정,2015-12-29 00:00:00,2,1,4010.0,21.1,21.1,84.99,84.99,288000000,230400000,2016-05-04 00:00:00,2016-06-01 00:00:00,낙찰,Private,부산,연제구,연산동,,N,406.0,10.0,연산동한솔솔파크 101동 11층 1102호,아파트,1111-11-11 00:00:00,21,11,,N,과정로,211.0,,2016-08-03 00:00:00,배당,35.187269,129.103845,277111500
84,1781,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,6,5,532.0,16.65,16.65,98.08,98.08,228745632,117118000,2013-07-04 00:00:00,2016-04-29 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 7층 701호",주상복합,1111-11-11 00:00:00,15,7,"**1차감정:220,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,135330000
118,1986,임의,일반,70000000,알비감정,2016-04-20 00:00:00,1,0,0.0,0.0,0.0,83.33,83.33,54700000,54700000,2016-07-20 00:00:00,2016-07-20 00:00:00,낙찰,Private,부산,연제구,연산동,,Y,20.0,55.0,국토 1층 108호,아파트,1111-11-11 00:00:00,8,1,,N,망미번영로,103.0,,2016-10-19 00:00:00,배당,35.176351,129.111235,60000000
122,1780,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,6,5,532.0,18.22,18.22,107.28,107.28,249565981,127778000,2013-07-04 00:00:00,2016-04-29 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 6층 602호",주상복합,1111-11-11 00:00:00,15,6,"**1차감정:240,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,137800000
224,1789,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,7,5,532.0,16.65,16.65,98.08,98.08,228745632,93694000,2013-07-04 00:00:00,2016-05-27 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 11층 1101호",주상복합,1111-11-11 00:00:00,15,11,"**1차감정: 220,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,135500000
227,2489,강제,일반,264300000,국제감정,2016-02-05 00:00:00,2,0,0.0,0.0,0.0,84.62,84.62,160000000,160000000,2016-08-17 00:00:00,2016-12-14 00:00:00,낙찰,Private,부산,수영구,남천동,,N,27.0,3.0,",-16 남천마르빌빌딩 11층 1205호",주상복합,1111-11-11 00:00:00,20,11,,N,수영로,452.0,14.0,2017-02-17 00:00:00,배당,35.145924,129.110288,173300000
239,1787,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,6,5,532.0,16.65,16.65,98.08,98.08,228745632,117118000,2013-07-04 00:00:00,2016-04-29 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 10층 1001호",주상복합,1111-11-11 00:00:00,15,10,"**1차감정: 220,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,135330000
294,2497,강제,일반,264300000,오상호감정,2016-02-12 00:00:00,1,0,1096.7,12.3,12.3,84.97,84.97,219000000,219000000,2016-08-17 00:00:00,2016-08-17 00:00:00,낙찰,Private,부산,수영구,남천동,,N,27.0,3.0,",-16 남천마르빌빌딩 18층 1802호",주상복합,1111-11-11 00:00:00,20,18,,N,수영로,452.0,14.0,2016-11-18 00:00:00,배당,35.145924,129.110288,220500000
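
`1111-11-11 00:00:00` appears in `Preserve_regist_date` (and in `Close_date` in the earlier output) as a placeholder. It also lies before the earliest date that pandas nanosecond timestamps can represent, so it is safer to turn it into a missing value before parsing. A minimal sketch on toy values:

```python
import pandas as pd

SENTINEL = "1111-11-11 00:00:00"

dates = pd.Series(
    ["2009-07-14 00:00:00", SENTINEL, "2001-11-27 00:00:00"],
    name="Preserve_regist_date",
)

# Replace the placeholder with a missing value, then parse the remaining strings.
parsed = pd.to_datetime(dates.mask(dates == SENTINEL), format="%Y-%m-%d %H:%M:%S")

print(parsed)
print(parsed.isna().tolist())
```

Whether the resulting NaT should then be imputed (for example with the appraisal date, as this notebook does later) or kept as missing is a separate modelling decision.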


In [462]:
# Check which values Apartment_usage takes
print(Raw_train['Apartment_usage'].unique())

['주상복합' '아파트']


In [463]:
#idx = Raw_train[Raw_train['Preserve_regist_date'] == "1111-11-11 00:00:00"].index
#Raw_train.loc[idx, 'Preserve_regist_date'] = Raw_train.loc[idx,'Appraisal_date']

#Raw_train

In [464]:
# feature engineering
from datetime import datetime, timedelta

# Target to log
log_values = np.log(Raw_train['Hammer_price'])
Raw_train['Hammer_price'] = log_values

def feature_engineering(df) :
    # Replace the placeholder registration date (1111-11-11) with the appraisal date
    idx = df[df['Preserve_regist_date'] == "1111-11-11 00:00:00"].index
    df.loc[idx, 'Preserve_regist_date'] = df.loc[idx,'Appraisal_date']
    
    # Drop unused features
    df = df.drop(['Auction_key', 'Appraisal_company', 'Final_result', 'Creditor', 'addr_do',
                           'addr_si', 'addr_dong', 'addr_li', 'addr_san', 'addr_bunji1', 'addr_bunji2',
                           'addr_etc', 'Specific', 'road_name', 'road_bunji1', 'road_bunji2', 'Close_date'], axis = 1)
    
    # Failed-round ratio: failed rounds / total rounds
    df['Miscarriage_rate'] = df['Auction_miscarriage_count'] / df['Auction_count']
    df = df.drop(['Auction_miscarriage_count', 'Auction_count'], axis = 1)
    
    # Floor ratio: floor of the unit / total floors
    df['floor_rate'] = df['Current_floor'] / df['Total_floor']
    df = df.drop(['Current_floor', 'Total_floor'], axis = 1)
    
    # Ratio of the minimum sale price to the appraised price
    df['Minimum_per_appraisal'] = df['Minimum_sales_price'] / df['Total_appraisal_price']
    df = df.drop(['Minimum_sales_price'], axis = 1)
    
    # Ratio of the claimed amount to the appraised price (non-withdrawal rate)
    df['Claim_per_appraisal'] = df['Claim_price'] / df['Total_appraisal_price']
    df = df.drop(['Claim_price', 'Total_appraisal_price'], axis = 1)
    
    # Effective auctioned-area ratio
    df['Auction_real_area_rate'] = (((df['Total_land_real_area'] + df['Total_building_auction_area']) * df['Total_land_auction_area']) 
                                     / (df['Total_building_area'] + df['Total_land_gross_area']))
    df = df.drop(['Total_land_real_area', 'Total_land_gross_area', 'Total_land_auction_area',
                 'Total_building_auction_area', 'Total_building_area'], axis = 1)
    
    # One-hot Encoding
    # NOTE: 'Auction_class' is listed twice, which produces the duplicated Auction_class_* columns
    # seen in the outputs of this notebook; listing it once would remove them.
    df = pd.get_dummies(df, columns=['Bid_class', 'Auction_class', 'Auction_class', 'Apartment_usage', 'Share_auction_YorN', 'Close_result'])
    
    # date processing
    df['Appraisal_date'] = pd.to_datetime(df['Appraisal_date'], format="%Y-%m-%d %H:%M:%S")
    df['First_auction_date'] = pd.to_datetime(df['First_auction_date'], format="%Y-%m-%d %H:%M:%S")
    df['Final_auction_date'] = pd.to_datetime(df['Final_auction_date'], format="%Y-%m-%d %H:%M:%S")
    df['Preserve_regist_date'] = pd.to_datetime(df['Preserve_regist_date'], format="%Y-%m-%d %H:%M:%S")
    
    Preserve_to_Appraisal = df['Appraisal_date'] - df['Preserve_regist_date']
    Appraisal_to_first = df['First_auction_date'] - df['Appraisal_date']
    First_to_final = df['Final_auction_date'] - df['First_auction_date']
    
    # Convert the timedeltas to integers (nanoseconds); only their ratio is used below
    Preserve_to_Appraisal = Preserve_to_Appraisal.astype(np.int64)
    Appraisal_to_first = Appraisal_to_first.astype(np.int64)
    First_to_final = First_to_final.astype(np.int64)
    
    # Share of the building's registered lifetime that was spent in the auction process
    df['Auction_duration_rate'] = (Appraisal_to_first + First_to_final) / (Preserve_to_Appraisal + Appraisal_to_first + First_to_final)
    
    df = df.drop(['Appraisal_date', 'First_auction_date', 'Final_auction_date', 'Preserve_regist_date'], axis = 1)
    
    return df
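
The duration feature above divides the appraisal-to-final-auction span by the registration-to-final-auction span. Because it is a ratio, whole days give the same value as nanoseconds and are easier to inspect. A minimal sketch using `.dt.days` on two toy rows: the first uses the dates of the first training row shown earlier, the second mimics a row whose placeholder registration date was replaced by the appraisal date (variable names are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "Preserve_regist_date": ["2009-07-14", "2012-06-11"],
    "Appraisal_date": ["2017-07-26", "2012-06-11"],
    "First_auction_date": ["2018-02-13", "2013-07-04"],
    "Final_auction_date": ["2018-03-20", "2016-05-27"],
}).apply(pd.to_datetime)

# Days between preservation registration and appraisal.
age_days = (toy["Appraisal_date"] - toy["Preserve_regist_date"]).dt.days

# Days between appraisal and the final auction (both intervals combined).
auction_days = (toy["Final_auction_date"] - toy["Appraisal_date"]).dt.days

duration_rate = auction_days / (age_days + auction_days)
print(duration_rate.round(6).tolist())
```

The second row shows a side effect of the imputation: when the registration date equals the appraisal date, the first interval is zero and the ratio is exactly 1.0, which is the value visible for several test rows.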

In [465]:
Raw_train = feature_engineering(Raw_train)

Raw_train

Unnamed: 0,point.y,point.x,Hammer_price,Miscarriage_rate,floor_rate,Minimum_per_appraisal,Claim_per_appraisal,Auction_real_area_rate,Bid_class_개별,Bid_class_일괄,Bid_class_일반,Auction_class_강제,Auction_class_임의,Auction_class_강제.1,Auction_class_임의.1,Apartment_usage_아파트,Apartment_usage_주상복합,Share_auction_YorN_N,Share_auction_YorN_Y,Close_result_,Close_result_배당,Auction_duration_rate
0,35.162717,129.137048,20.448829,0.500000,0.111111,0.80,2.112485,0.640698,1,0,0,0,1,0,1,0,1,1,0,0,1,0.074740
1,35.156633,129.145068,20.694753,0.500000,0.728571,0.80,0.142541,0.060398,0,0,1,0,1,0,1,1,0,1,0,0,1,0.076267
2,35.184601,128.996765,18.352402,0.666667,0.933333,0.64,0.095181,0.057347,1,0,0,0,1,0,1,1,0,1,0,0,1,0.035307
3,35.154180,129.089081,19.364194,0.500000,0.550000,0.80,0.636552,0.158760,0,0,1,0,1,0,1,1,0,1,0,0,1,0.054399
4,35.099630,128.998874,18.882274,0.500000,1.000000,0.80,0.269927,6.845168,0,0,1,1,0,1,0,1,0,1,0,0,1,0.027783
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1928,37.648811,127.048683,19.813206,0.000000,0.714286,1.00,0.698254,0.076473,0,0,1,1,0,1,0,1,0,1,0,0,1,0.013576
1929,37.663291,127.077063,19.879841,0.500000,1.000000,0.80,0.440529,0.297569,0,0,1,0,1,0,1,1,0,1,0,0,1,0.122492
1930,37.558319,126.981994,20.835701,0.500000,0.593750,0.80,0.276113,0.383392,0,0,1,0,1,0,1,0,1,1,0,0,1,0.203770
1931,37.647061,127.028002,19.814447,0.500000,0.733333,0.80,0.196560,0.316337,0,0,1,1,0,1,0,1,0,1,0,0,1,0.029291


In [466]:
Raw_test

Unnamed: 0,Auction_key,Auction_class,Bid_class,Claim_price,Appraisal_company,Appraisal_date,Auction_count,Auction_miscarriage_count,Total_land_gross_area,Total_land_real_area,Total_land_auction_area,Total_building_area,Total_building_auction_area,Total_appraisal_price,Minimum_sales_price,First_auction_date,Final_auction_date,Final_result,Creditor,addr_do,addr_si,addr_dong,addr_li,addr_san,addr_bunji1,addr_bunji2,addr_etc,Apartment_usage,Preserve_regist_date,Total_floor,Current_floor,Specific,Share_auction_YorN,road_name,road_bunji1,road_bunji2,Close_date,Close_result,point.y,point.x,Hammer_price
0,1778,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,7,6,532.0,18.22,18.22,107.28,107.28,244565981,100174000,2013-07-04 00:00:00,2016-05-27 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 5층 502호",주상복합,1111-11-11 00:00:00,15,5,"**1차감정:235,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,0
1,1779,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,6,5,532.0,16.65,16.65,98.08,98.08,228745632,117118000,2013-07-04 00:00:00,2016-04-29 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 6층 601호",주상복합,1111-11-11 00:00:00,15,6,"**1차감정:220,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,0
2,1784,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,7,6,532.0,18.22,18.22,107.28,107.28,249565981,102222000,2013-07-04 00:00:00,2016-05-27 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 8층 802호",주상복합,1111-11-11 00:00:00,15,8,"**1차감정: 240,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,0
3,1786,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,7,6,532.0,18.22,18.22,107.28,107.28,249565981,102222000,2013-07-04 00:00:00,2016-05-27 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 9층 902호",주상복합,1111-11-11 00:00:00,15,9,"**1차감정: 240,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,0
4,1790,강제,개별,1087000000,대한감정,2012-06-11 00:00:00,7,6,532.0,18.22,18.22,107.28,107.28,249565981,102222000,2013-07-04 00:00:00,2016-05-27 00:00:00,낙찰,Private,부산,동래구,낙민동,,N,236.0,,",237-2 삼주 11층 1102호",주상복합,1111-11-11 00:00:00,15,11,"**1차감정: 240,000,000",N,0,,,2017-06-29 00:00:00,배당,35.201112,129.088687,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
824,1762,강제,개별,4249164200,강림감정,2017-06-12 00:00:00,2,1,3460.9,13.87,13.87,84.91,84.91,420000000,336000000,2017-11-27 00:00:00,2017-12-26 00:00:00,낙찰,서희건설,서울,성북구,하월곡동,,N,229.0,,길음서희스타힐스 23층 2304호,주상복합,2011-12-13 00:00:00,23,23,,N,동소문로,284.0,,2018-02-28 00:00:00,배당,37.605407,127.027309,0
825,1767,강제,일반,320000000,삼일감정,2017-06-27 00:00:00,2,1,13578.9,72.81,72.81,150.66,150.66,580000000,464000000,2017-12-18 00:00:00,2018-01-22 00:00:00,낙찰,신용보증기금,서울,도봉구,방학동,,N,689.0,,", 690-41 방학동부센트레빌 103동 3층 304호",아파트,2005-12-15 00:00:00,14,3,,N,방학로,120.0,,2018-03-29 00:00:00,배당,37.663305,127.039551,0
826,1770,강제,일반,160000000,성북감정,2017-08-01 00:00:00,2,1,57491.8,34.03,34.03,49.77,49.77,293000000,234400000,2017-11-06 00:00:00,2017-12-04 00:00:00,낙찰,Private,서울,중랑구,신내동,,N,650.0,,신내 616동 5층 508호,아파트,1996-05-08 00:00:00,12,5,,N,신내로19길,42.0,,2018-02-07 00:00:00,배당,37.614529,127.091109,0
827,1772,임의,일반,230000000,생림감정,2017-09-28 00:00:00,1,0,27710.2,42.54,42.54,84.84,84.84,492000000,492000000,2018-01-29 00:00:00,2018-01-29 00:00:00,낙찰,중소기업은행,서울,성북구,장위동,,N,317.0,,꿈의숲대명루첸 109동 16층 1601호,아파트,2009-02-19 00:00:00,17,16,,N,월계로36길,27.0,,2018-04-24 00:00:00,배당,37.620359,127.047071,0


In [467]:
#idx = Raw_test[Raw_test['Preserve_regist_date'] == "1111-11-11 00:00:00"].index
#Raw_test = Raw_test.drop(idx)

Raw_test = feature_engineering(Raw_test)

Raw_test

Unnamed: 0,point.y,point.x,Hammer_price,Miscarriage_rate,floor_rate,Minimum_per_appraisal,Claim_per_appraisal,Auction_real_area_rate,Bid_class_개별,Bid_class_일괄,Bid_class_일반,Auction_class_강제,Auction_class_임의,Auction_class_강제.1,Auction_class_임의.1,Apartment_usage_아파트,Apartment_usage_주상복합,Share_auction_YorN_N,Share_auction_YorN_Y,Close_result_,Close_result_배당,Auction_duration_rate
0,35.201112,129.088687,0,0.857143,0.333333,0.409599,4.444608,3.576852,1,0,0,1,0,1,0,0,1,1,0,0,1,1.000000
1,35.201112,129.088687,0,0.833333,0.400000,0.512001,4.752003,3.031765,1,0,0,1,0,1,0,0,1,1,0,0,1,1.000000
2,35.201112,129.088687,0,0.857143,0.533333,0.409599,4.355562,3.576852,1,0,0,1,0,1,0,0,1,1,0,0,1,1.000000
3,35.201112,129.088687,0,0.857143,0.600000,0.409599,4.355562,3.576852,1,0,0,1,0,1,0,0,1,1,0,0,1,1.000000
4,35.201112,129.088687,0,0.857143,0.733333,0.409599,4.355562,3.576852,1,0,0,1,0,1,0,0,1,1,0,0,1,1.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
824,37.605407,127.027309,0,0.500000,1.000000,0.800000,10.117058,0.386394,1,0,0,1,0,1,0,0,1,1,0,0,1,0.089342
825,37.663305,127.039551,0,0.500000,0.214286,0.800000,0.551724,1.185096,0,0,1,1,0,1,0,1,0,1,0,0,1,0.047274
826,37.614529,127.091109,0,0.500000,0.416667,0.800000,0.546075,0.049559,0,0,1,1,0,1,0,1,0,1,0,0,1,0.015863
827,37.620359,127.047071,0,0.000000,0.941176,1.000000,0.467480,0.194954,0,0,1,0,1,0,1,1,0,1,0,0,1,0.037661
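
Train and test are encoded by separate `pd.get_dummies` calls. The column sets match in the outputs above, but that depends on every category appearing in both files. A minimal sketch, on toy data, of aligning the test columns to the training columns with `reindex`:

```python
import pandas as pd

train_raw = pd.DataFrame({"Bid_class": ["개별", "일반", "일괄"], "x": [1.0, 2.0, 3.0]})
test_raw = pd.DataFrame({"Bid_class": ["일반", "일반"], "x": [4.0, 5.0]})

train_enc = pd.get_dummies(train_raw, columns=["Bid_class"])
test_enc = pd.get_dummies(test_raw, columns=["Bid_class"])

# Add categories missing from the test set as all-zero columns, drop any
# categories the training set never saw, and match the training column order.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(test_enc.columns))
print(list(test_aligned.columns))
```

Without the alignment step, a model fitted on the training columns would receive a test matrix with fewer (or differently ordered) columns.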


In [468]:
X_train = Raw_train.drop(['Hammer_price'], axis = 1)
y_train = Raw_train['Hammer_price']

X_test = Raw_test.drop(['Hammer_price'], axis = 1)

In [469]:
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

linear = LinearRegression()
svr = SVR()

linear.fit(X_train, y_train)
svr.fit(X_train, y_train)

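# In-sample predictions on the training data (used for the error check below)
y_pred_linear = linear.predict(X_train)
y_pred_svr = svr.predict(X_train)

# Predictions for the test set (still on the log scale)
predict_linear = linear.predict(X_test)
predict_svr = svr.predict(X_test)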

In [470]:
from sklearn.metrics import mean_squared_error

# Training-set MSE on the log-transformed target (not a held-out estimate)
print("mean_squared_error for LinearRegression : {}".format(mean_squared_error(y_train, y_pred_linear)))
print("mean_squared_error for SupportVectorRegressor : {}".format(mean_squared_error(y_train, y_pred_svr)))

mean_squared_error for LinearRegression : 0.4460220975555672
mean_squared_error for SupportVectorRegressor : 0.6080835167069719
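
Both errors above are measured on the same rows the models were fitted on, so they say little about performance on unseen auctions. A minimal sketch of a 5-fold cross-validated comparison, with `SVR` wrapped in a `StandardScaler` pipeline because kernel methods are sensitive to feature scale (here latitude and longitude sit next to 0-to-1 ratios). The data below is synthetic and the fold count is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data standing in for X_train / y_train.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

models = {
    "LinearRegression": LinearRegression(),
    "SVR (scaled)": make_pipeline(StandardScaler(), SVR()),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for name, model in models.items():
    # scikit-learn returns negated MSE so that higher is always better.
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=cv)
    cv_mse[name] = -scores.mean()
    print(f"{name}: CV MSE = {cv_mse[name]:.4f}")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into the scaling.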


In [471]:
submission

Unnamed: 0,Auction_key,Hammer_price
0,1778,0
1,1779,0
2,1784,0
3,1786,0
4,1790,0
...,...,...
824,1762,0
825,1767,0
826,1770,0
827,1772,0


In [472]:
# Undo the log transform so the predictions are back on the original price scale
submission['Hammer_price'] = np.exp(predict_linear)
submission['Hammer_price'] = submission['Hammer_price'].astype(np.float64)

submission

Unnamed: 0,Auction_key,Hammer_price
0,1778,1.059008e+08
1,1779,1.296743e+08
2,1784,1.085370e+08
3,1786,1.094087e+08
4,1790,1.111731e+08
...,...,...
824,1762,3.426727e+08
825,1767,4.420109e+08
826,1770,4.403112e+08
827,1772,5.629121e+08
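
The assignment above relies on the rows of `Raw_test` and `submission` being in the same order, since `Auction_key` is dropped during feature engineering. A minimal sketch of attaching predictions by key instead, after undoing the log transform with `np.exp` (toy keys and predictions; `test_keys` is a hypothetical copy of `Auction_key` saved before the column is dropped):

```python
import numpy as np
import pandas as pd

# Hypothetical: keys saved from the test frame before Auction_key was dropped.
test_keys = pd.Series([1779, 1778, 1784], name="Auction_key")
log_pred = np.log(np.array([130_000_000.0, 106_000_000.0, 108_500_000.0]))

submission_toy = pd.DataFrame({"Auction_key": [1778, 1779, 1784], "Hammer_price": 0})

# Map each key to its back-transformed prediction.
pred_by_key = pd.Series(np.exp(log_pred), index=test_keys)

# Look predictions up by key so that row order no longer matters.
submission_toy["Hammer_price"] = submission_toy["Auction_key"].map(pred_by_key)

# Fail loudly if any submission key has no prediction.
assert submission_toy["Hammer_price"].notna().all()
print(submission_toy)
```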


In [475]:
submission.to_csv("result.csv", header=True, index=False)