# Home Credit Default Risk

#### https://www.kaggle.com/c/home-credit-default-risk/data

    Many people struggle to get loans due to insufficient or non-existent credit histories. And, unfortunately, this population is often taken advantage of by untrustworthy lenders.

    Home Credit Group

    Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data--including telco and transactional information--to predict their clients' repayment abilities.

    While Home Credit is currently using various statistical and machine learning methods to make these predictions, they're challenging Kagglers to help them unlock the full potential of their data. Doing so will ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower their clients to be successful.

![title](data/home_credit.png)

In [2]:
import pandas as pd

## 1. application_{train|test}

    This is the main table, broken into two files for Train (with TARGET) and Test (without TARGET).
    Static data for all applications. One row represents one loan in our data sample.
    
    
    - SK_ID_CURR : ID of loan in our sample
        Unique loan ID
        
    - TARGET : Target variable (1 - client with payment difficulties: he/she had late payment more than X days on at least one of the first Y installments of the loan in our sample, 0 - all other cases)
        1 : had late payments
        0 : no late payments
        
    - NAME_CONTRACT_TYPE : Identification if loan is cash or revolving
        Loan type (cash or revolving)
    
    - CODE_GENDER : Gender of the client
        Gender
        
    - FLAG_OWN_CAR : Flag if client owns a car
        Owns a car
        
    - FLAG_OWN_REALTY : Flag if client owns a house or flat
        Owns a house or flat
    
    - CNT_CHILDREN : Number of children the client has
        Number of children
    
    - AMT_INCOME_TOTAL : Income of the client
        Income
        
    - AMT_CREDIT : Credit amount of the loan
        Final credit amount of the loan
    
    - AMT_ANNUITY : Loan annuity
        Periodic loan payment (annuity)
    
    - AMT_GOODS_PRICE : For consumer loans it is the price of the goods for which the loan is given
        Price of the goods financed by the loan
    
    - NAME_TYPE_SUITE : Who accompanied the client when he/she applied for the loan
        Person accompanying the client at application
    
    - NAME_INCOME_TYPE : Client's income type (businessman, working, maternity leave, ...)
        Source of income
    
    - NAME_EDUCATION_TYPE : Level of highest education the client achieved
        Education level
    
    - NAME_FAMILY_STATUS : Family status of the client
        Family status
        
    - NAME_HOUSING_TYPE : What is the housing situation of the client (renting, living with parents, ...)
        Housing type
    
    - REGION_POPULATION_RELATIVE : Normalized population of region where client lives (higher number means the client lives in more populated region)
        Relative population of the client's region
    
    - DAYS_BIRTH : Client's age in days at the time of application
        Client's age (in days)
    
    - DAYS_EMPLOYED : How many days before the application the person started current employment
        Length of current employment (in days)
    
    - DAYS_REGISTRATION : How many days before the application did client change his registration
        Days since the registration was changed
    
    - DAYS_ID_PUBLISH : How many days before the application did client change the identity document with which he applied for the loan
        Days since the ID document was changed
    
    - OWN_CAR_AGE : Age of client's car
        Age of the car
        
    - FLAG_MOBIL : Did client provide mobile phone (1=YES, 0=NO)
        Mobile phone number provided
    
    - FLAG_EMP_PHONE : Did client provide work phone (1=YES, 0=NO)
        Work phone number provided
    
    - FLAG_CONT_MOBILE : Was mobile phone reachable (1=YES, 0=NO)
        Mobile phone reachable
    
    - FLAG_PHONE : Did client provide home phone (1=YES, 0=NO)
        Home phone number provided
    
    - FLAG_EMAIL : Did client provide email (1=YES, 0=NO)
        Email provided
    
    - OCCUPATION_TYPE : What kind of occupation does the client have
        Occupation type
        
    - CNT_FAM_MEMBERS : How many family members does client have
        Number of family members
    
    - REGION_RATING_CLIENT : Our rating of the region where client lives (1,2,3)
        Rating of the client's region
    
    - REGION_RATING_CLIENT_W_CITY : Our rating of the region where client lives with taking city into account (1,2,3)
        Rating of the client's region, taking the city into account
    
    - WEEKDAY_APPR_PROCESS_START : On which day of the week did the client apply for the loan
        Weekday of the application
    
    - HOUR_APPR_PROCESS_START : Approximately at what hour did the client apply for the loan
        Hour of the application
    
    - REG_REGION_NOT_LIVE_REGION : Flag if client's permanent address does not match contact address (1=different, 0=same, at region level)
        Permanent vs. contact address mismatch (region level)
    
    - REG_REGION_NOT_WORK_REGION : Flag if client's permanent address does not match work address (1=different, 0=same, at region level)
        Permanent vs. work address mismatch (region level)
    
    - LIVE_REGION_NOT_WORK_REGION : Flag if client's contact address does not match work address (1=different, 0=same, at region level)
        Contact vs. work address mismatch (region level)
    
    - REG_CITY_NOT_LIVE_CITY : Flag if client's permanent address does not match contact address (1=different, 0=same, at city level)
        Permanent vs. contact address mismatch (city level)
    
    - REG_CITY_NOT_WORK_CITY : Flag if client's permanent address does not match work address (1=different, 0=same, at city level)
        Permanent vs. work address mismatch (city level)
    
    - LIVE_CITY_NOT_WORK_CITY : Flag if client's contact address does not match work address (1=different, 0=same, at city level)
        Contact vs. work address mismatch (city level)
    
    - ORGANIZATION_TYPE : Type of organization where client works
        Type of the employer organization
    
    - EXT_SOURCE_1, EXT_SOURCE_2, EXT_SOURCE_3 : Normalized score from external data source
        Normalized external-source scores
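A minimal sketch of converting the DAYS_* columns into years. It assumes these columns are stored as negative day counts relative to the application date. That is how they appear in the Kaggle files, and the bureau output further down uses the same convention for DAYS_CREDIT. The three rows are hypothetical toy values, not taken from the data.

```python
import pandas as pd

# Hypothetical toy rows; real values would come from application_train.csv.
# Assumption: DAYS_* columns are negative offsets (days before the application date).
df = pd.DataFrame({
    "DAYS_BIRTH": [-9125, -18250, -14600],
    "DAYS_EMPLOYED": [-730, -3650, -365],
})

# Flip the sign and divide by 365 to get approximate years.
df["AGE_YEARS"] = -df["DAYS_BIRTH"] / 365
df["YEARS_EMPLOYED"] = -df["DAYS_EMPLOYED"] / 365

print(df)
```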

#### Information about the building where the client lives, and other application fields
    - APARTMENTS_AVG, BASEMENTAREA_AVG, YEARS_BEGINEXPLUATATION_AVG, YEARS_BUILD_AVG, COMMONAREA_AVG, ELEVATORS_AVG, ENTRANCES_AVG, FLOORSMAX_AVG, FLOORSMIN_AVG, LANDAREA_AVG, LIVINGAPARTMENTS_AVG, LIVINGAREA_AVG, NONLIVINGAPARTMENTS_AVG, NONLIVINGAREA_AVG, APARTMENTS_MODE, BASEMENTAREA_MODE, YEARS_BEGINEXPLUATATION_MODE, YEARS_BUILD_MODE, COMMONAREA_MODE, ELEVATORS_MODE, ENTRANCES_MODE, FLOORSMAX_MODE, FLOORSMIN_MODE, LANDAREA_MODE, LIVINGAPARTMENTS_MODE, LIVINGAREA_MODE, NONLIVINGAPARTMENTS_MODE, NONLIVINGAREA_MODE, APARTMENTS_MEDI, BASEMENTAREA_MEDI, YEARS_BEGINEXPLUATATION_MEDI, YEARS_BUILD_MEDI, COMMONAREA_MEDI, ELEVATORS_MEDI, ENTRANCES_MEDI, FLOORSMAX_MEDI, FLOORSMIN_MEDI, LANDAREA_MEDI, LIVINGAPARTMENTS_MEDI, LIVINGAREA_MEDI, NONLIVINGAPARTMENTS_MEDI, NONLIVINGAREA_MEDI, FONDKAPREMONT_MODE, HOUSETYPE_MODE, TOTALAREA_MODE, WALLSMATERIAL_MODE, EMERGENCYSTATE_MODE:
    
    Normalized information about the building where the client lives: average (_AVG suffix), mode (_MODE suffix), and median (_MEDI suffix) of apartment size, common area, living area, building age, number of elevators, number of entrances, building condition, and number of floors.
    
    Information about the client's residence
    
    
    - OBS_30_CNT_SOCIAL_CIRCLE : How many observations of client's social surroundings with observable 30 DPD (days past due) default
    - DEF_30_CNT_SOCIAL_CIRCLE : How many observations of client's social surroundings defaulted on 30 DPD (days past due)
    - OBS_60_CNT_SOCIAL_CIRCLE : How many observations of client's social surroundings with observable 60 DPD (days past due) default
    - DEF_60_CNT_SOCIAL_CIRCLE : How many observations of client's social surroundings defaulted on 60 DPD (days past due)
    Defaults observed in the client's social circle
    
    - DAYS_LAST_PHONE_CHANGE : How many days before application did client change phone
    
    Days since the phone was last changed
    
    
    
    - FLAG_DOCUMENT_2 to FLAG_DOCUMENT_21 : Did client provide document N (one flag per document, N = 2 to 21)
    Whether each document was provided
    
    - AMT_REQ_CREDIT_BUREAU_HOUR : Number of enquiries to Credit Bureau about the client one hour before application
    - AMT_REQ_CREDIT_BUREAU_DAY : Number of enquiries to Credit Bureau about the client one day before application (excluding one hour before application)
    - AMT_REQ_CREDIT_BUREAU_WEEK : Number of enquiries to Credit Bureau about the client one week before application (excluding one day before application)
    - AMT_REQ_CREDIT_BUREAU_MON : Number of enquiries to Credit Bureau about the client one month before application (excluding one week before application)
    - AMT_REQ_CREDIT_BUREAU_QRT : Number of enquiries to Credit Bureau about the client 3 months before application (excluding one month before application)
    - AMT_REQ_CREDIT_BUREAU_YEAR : Number of enquiries to Credit Bureau about the client one year before application (excluding the last 3 months before application)
        Number of recent Credit Bureau enquiries
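Each AMT_REQ_CREDIT_BUREAU_* window excludes the shorter window before it, so the six windows do not overlap. Their row-wise sum is therefore the number of enquiries in the year before application. The sketch below shows this on two hypothetical rows and also counts how many FLAG_DOCUMENT_* flags are set. Only three of the document flags are included, for brevity, and REQ_CREDIT_BUREAU_TOTAL and DOCUMENTS_PROVIDED are hypothetical feature names.

```python
import pandas as pd

# Hypothetical toy rows for two clients; column names follow the dictionary above.
windows = [
    "AMT_REQ_CREDIT_BUREAU_HOUR",
    "AMT_REQ_CREDIT_BUREAU_DAY",
    "AMT_REQ_CREDIT_BUREAU_WEEK",
    "AMT_REQ_CREDIT_BUREAU_MON",
    "AMT_REQ_CREDIT_BUREAU_QRT",
    "AMT_REQ_CREDIT_BUREAU_YEAR",
]
df = pd.DataFrame(
    [[0, 0, 1, 0, 2, 3],
     [0, 0, 0, 0, 0, 1]],
    columns=windows,
)
df["FLAG_DOCUMENT_2"] = [0, 0]
df["FLAG_DOCUMENT_3"] = [1, 1]
df["FLAG_DOCUMENT_6"] = [1, 0]

# The windows do not overlap, so their sum is the total for the past year.
# DataFrame.sum skips NaN by default, which matters on the real data.
df["REQ_CREDIT_BUREAU_TOTAL"] = df[windows].sum(axis=1)

# Count how many FLAG_DOCUMENT_* flags are set for each client.
df["DOCUMENTS_PROVIDED"] = df.filter(like="FLAG_DOCUMENT_").sum(axis=1)

print(df[["REQ_CREDIT_BUREAU_TOTAL", "DOCUMENTS_PROVIDED"]])
```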

In [4]:
path = 'Data/'
app_train = pd.read_csv(path + "application_train.csv")
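Some of the columns described above can be expected to have missing values. OWN_CAR_AGE, for example, is only meaningful for car owners. This sketch shows one quick way to check the share of missing values per column. It runs on a hypothetical toy frame, and on the real data the same expression could be applied to `app_train`.

```python
import pandas as pd

# Hypothetical toy frame standing in for app_train.
df = pd.DataFrame({
    "AMT_CREDIT": [100000.0, 200000.0, 300000.0, 400000.0],
    "OWN_CAR_AGE": [None, None, 5.0, None],
    "APARTMENTS_AVG": [0.1, None, None, 0.2],
})

# Fraction of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)
```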

In [29]:
app_train[['AMT_CREDIT','AMT_ANNUITY', 'AMT_GOODS_PRICE']].head(3)

|   | AMT_CREDIT | AMT_ANNUITY | AMT_GOODS_PRICE |
|---|---|---|---|
| 0 | 406597.5 | 24700.5 | 351000.0 |
| 1 | 1293502.5 | 35698.5 | 1129500.0 |
| 2 | 135000.0 | 6750.0 | 135000.0 |
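One common feature-engineering choice, which the dataset itself does not prescribe, is to relate these three amounts to each other. The sketch below uses the three rows shown above. ANNUITY_TO_CREDIT and CREDIT_TO_GOODS are hypothetical column names.

```python
import pandas as pd

# The three rows shown in the output above.
df = pd.DataFrame({
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0],
    "AMT_ANNUITY": [24700.5, 35698.5, 6750.0],
    "AMT_GOODS_PRICE": [351000.0, 1129500.0, 135000.0],
})

# Periodic payment as a share of the credit amount.
df["ANNUITY_TO_CREDIT"] = df["AMT_ANNUITY"] / df["AMT_CREDIT"]
# Credit amount relative to the price of the goods being financed.
df["CREDIT_TO_GOODS"] = df["AMT_CREDIT"] / df["AMT_GOODS_PRICE"]

print(df.round(4))
```

In row 2, the annuity is 6750 / 135000 = 5% of the credit, and the credit equals the goods price, so the two ratios are 0.05 and 1.0.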


In [27]:
app_train[['NAME_INCOME_TYPE','NAME_EDUCATION_TYPE', 'NAME_FAMILY_STATUS','NAME_HOUSING_TYPE']].head(3)

|   | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE |
|---|---|---|---|---|
| 0 | Working | Secondary / secondary special | Single / not married | House / apartment |
| 1 | State servant | Higher education | Married | House / apartment |
| 2 | Working | Secondary / secondary special | Single / not married | House / apartment |
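Most models need these categorical columns encoded as numbers first. One option is one-hot encoding with `pd.get_dummies`, sketched here on the three rows shown above.

```python
import pandas as pd

# The three rows shown in the output above.
df = pd.DataFrame({
    "NAME_INCOME_TYPE": ["Working", "State servant", "Working"],
    "NAME_EDUCATION_TYPE": ["Secondary / secondary special", "Higher education",
                            "Secondary / secondary special"],
    "NAME_FAMILY_STATUS": ["Single / not married", "Married", "Single / not married"],
    "NAME_HOUSING_TYPE": ["House / apartment"] * 3,
})

# One-hot encode every column; new columns are named <column>_<category>.
encoded = pd.get_dummies(df, columns=list(df.columns))
print(encoded.columns.tolist())
```

These three rows produce 7 indicator columns: 2 income types, 2 education levels, 2 family statuses, and 1 housing type. On the full data, the column count grows with the number of distinct categories.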


In [28]:
app_train[['OBS_30_CNT_SOCIAL_CIRCLE','DEF_30_CNT_SOCIAL_CIRCLE', 'OBS_60_CNT_SOCIAL_CIRCLE','DEF_60_CNT_SOCIAL_CIRCLE']].head(3)

|   | OBS_30_CNT_SOCIAL_CIRCLE | DEF_30_CNT_SOCIAL_CIRCLE | OBS_60_CNT_SOCIAL_CIRCLE | DEF_60_CNT_SOCIAL_CIRCLE |
|---|---|---|---|---|
| 0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 1 | 1.0 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 |
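The DEF_* and OBS_* counts can be combined into a default rate within the client's social circle. This sketch uses the 30 DPD columns of the three rows shown above. DEF_30_RATE_SOCIAL_CIRCLE is a hypothetical name. Rows with no observations are left as NaN rather than treated as a 0% rate.

```python
import pandas as pd

# The three rows shown in the output above (30 DPD columns only).
df = pd.DataFrame({
    "OBS_30_CNT_SOCIAL_CIRCLE": [2.0, 1.0, 0.0],
    "DEF_30_CNT_SOCIAL_CIRCLE": [2.0, 0.0, 0.0],
})

obs = df["OBS_30_CNT_SOCIAL_CIRCLE"]
# Mask zero observations as NaN so the rate is undefined (NaN) instead of
# 0/0 or x/0 when nothing was observed in the social circle.
df["DEF_30_RATE_SOCIAL_CIRCLE"] = df["DEF_30_CNT_SOCIAL_CIRCLE"] / obs.where(obs > 0)

print(df)
```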


## 2. bureau

    All of the client's previous credits provided by other financial institutions that were reported to the Credit Bureau (for clients who have a loan in our sample).
    For every loan in our sample, there are as many rows as the number of credits the client had in the Credit Bureau before the application date.



    - SK_ID_CURR : ID of loan in our sample - one loan in our sample can have 0,1,2 or more related previous credits in credit bureau 
        Unique loan ID
        
    - SK_ID_BUREAU : Recoded ID of previous Credit Bureau credit related to our loan (unique coding for each loan application)
        Unique ID of the Credit Bureau credit record
        
    - CREDIT_ACTIVE : Status of the Credit Bureau (CB) reported credits
        Credit status (e.g. Active, Closed)
    
    - CREDIT_CURRENCY : Recoded currency of the Credit Bureau credit
        Currency
    
    - DAYS_CREDIT : How many days before current application did client apply for Credit Bureau credit
        Days since the bureau credit was applied for
    
    - CREDIT_DAY_OVERDUE : Number of days past due on CB credit at the time of application for related loan in our sample
        Days past due on the bureau credit
    
    
    - DAYS_CREDIT_ENDDATE : Remaining duration of CB credit (in days) at the time of application in Home Credit
        Remaining duration of the bureau credit
    
    - DAYS_ENDDATE_FACT : Days since CB credit ended at the time of application in Home Credit (only for closed credit)
        Days since the bureau credit actually ended
    
    - AMT_CREDIT_MAX_OVERDUE : Maximal amount overdue on the Credit Bureau credit so far (at application date of loan in our sample)
        Maximum overdue amount recorded
     
    - CNT_CREDIT_PROLONG : How many times was the Credit Bureau credit prolonged
        Number of times the credit was prolonged
    
    - AMT_CREDIT_SUM : Current credit amount for the Credit Bureau credit
        Credit amount
    
    - AMT_CREDIT_SUM_DEBT : Current debt on Credit Bureau credit
        Current debt
    
    - AMT_CREDIT_SUM_LIMIT : Current credit limit of credit card reported in Credit Bureau
        Credit card limit
    
    - AMT_CREDIT_SUM_OVERDUE : Current amount overdue on Credit Bureau credit
        Overdue amount
    
    - CREDIT_TYPE : Type of Credit Bureau credit (Car, cash,...)
        Credit type
    
    - DAYS_CREDIT_UPDATE : How many days before loan application did last information about the Credit Bureau credit come
        Days since the last information update
    
    - AMT_ANNUITY : Annuity of the Credit Bureau credit
        Periodic payment (annuity)
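One loan in our sample can have zero, one, or many rows in bureau. Before joining, the bureau table therefore has to be collapsed to one row per SK_ID_CURR. This sketch uses hypothetical toy tables and hypothetical feature names (BUREAU_CREDIT_COUNT, BUREAU_DEBT_SUM).

```python
import pandas as pd

# Hypothetical toy tables; client 300 has no bureau records.
app = pd.DataFrame({"SK_ID_CURR": [100, 200, 300]})
bureau = pd.DataFrame({
    "SK_ID_CURR": [100, 100, 200],
    "SK_ID_BUREAU": [1, 2, 3],
    "AMT_CREDIT_SUM_DEBT": [0.0, 1000.0, 500.0],
})

# Collapse the one-to-many bureau table to one row per SK_ID_CURR.
agg = bureau.groupby("SK_ID_CURR").agg(
    BUREAU_CREDIT_COUNT=("SK_ID_BUREAU", "count"),
    BUREAU_DEBT_SUM=("AMT_CREDIT_SUM_DEBT", "sum"),
).reset_index()

# A left join keeps every application; clients without bureau credits get NaN,
# which is filled with 0 because they have zero previous credits.
app = app.merge(agg, on="SK_ID_CURR", how="left")
cols = ["BUREAU_CREDIT_COUNT", "BUREAU_DEBT_SUM"]
app[cols] = app[cols].fillna(0)
app["BUREAU_CREDIT_COUNT"] = app["BUREAU_CREDIT_COUNT"].astype(int)

print(app)
```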
    

In [32]:
bureau = pd.read_csv(path + "bureau.csv")
bureau.head()

|   | SK_ID_CURR | SK_ID_BUREAU | CREDIT_ACTIVE | CREDIT_CURRENCY | DAYS_CREDIT | CREDIT_DAY_OVERDUE | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT | AMT_CREDIT_MAX_OVERDUE | CNT_CREDIT_PROLONG | AMT_CREDIT_SUM | AMT_CREDIT_SUM_DEBT | AMT_CREDIT_SUM_LIMIT | AMT_CREDIT_SUM_OVERDUE | CREDIT_TYPE | DAYS_CREDIT_UPDATE | AMT_ANNUITY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 215354 | 5714462 | Closed | currency 1 | -497 | 0 | -153.0 | -153.0 | NaN | 0 | 91323.0 | 0.0 | NaN | 0.0 | Consumer credit | -131 | NaN |
| 1 | 215354 | 5714463 | Active | currency 1 | -208 | 0 | 1075.0 | NaN | NaN | 0 | 225000.0 | 171342.0 | NaN | 0.0 | Credit card | -20 | NaN |
| 2 | 215354 | 5714464 | Active | currency 1 | -203 | 0 | 528.0 | NaN | NaN | 0 | 464323.5 | NaN | NaN | 0.0 | Consumer credit | -16 | NaN |
| 3 | 215354 | 5714465 | Active | currency 1 | -203 | 0 | NaN | NaN | NaN | 0 | 90000.0 | NaN | NaN | 0.0 | Credit card | -16 | NaN |
| 4 | 215354 | 5714466 | Active | currency 1 | -629 | 0 | 1197.0 | NaN | 77674.5 | 0 | 2700000.0 | NaN | NaN | 0.0 | Consumer credit | -21 | NaN |


In [33]:
bureau.head(20)

|   | SK_ID_CURR | SK_ID_BUREAU | CREDIT_ACTIVE | CREDIT_CURRENCY | DAYS_CREDIT | CREDIT_DAY_OVERDUE | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT | AMT_CREDIT_MAX_OVERDUE | CNT_CREDIT_PROLONG | AMT_CREDIT_SUM | AMT_CREDIT_SUM_DEBT | AMT_CREDIT_SUM_LIMIT | AMT_CREDIT_SUM_OVERDUE | CREDIT_TYPE | DAYS_CREDIT_UPDATE | AMT_ANNUITY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 215354 | 5714462 | Closed | currency 1 | -497 | 0 | -153.0 | -153.0 | NaN | 0 | 91323.0 | 0.0 | NaN | 0.0 | Consumer credit | -131 | NaN |
| 1 | 215354 | 5714463 | Active | currency 1 | -208 | 0 | 1075.0 | NaN | NaN | 0 | 225000.0 | 171342.0 | NaN | 0.0 | Credit card | -20 | NaN |
| 2 | 215354 | 5714464 | Active | currency 1 | -203 | 0 | 528.0 | NaN | NaN | 0 | 464323.5 | NaN | NaN | 0.0 | Consumer credit | -16 | NaN |
| 3 | 215354 | 5714465 | Active | currency 1 | -203 | 0 | NaN | NaN | NaN | 0 | 90000.0 | NaN | NaN | 0.0 | Credit card | -16 | NaN |
| 4 | 215354 | 5714466 | Active | currency 1 | -629 | 0 | 1197.0 | NaN | 77674.5 | 0 | 2700000.0 | NaN | NaN | 0.0 | Consumer credit | -21 | NaN |
| 5 | 215354 | 5714467 | Active | currency 1 | -273 | 0 | 27460.0 | NaN | 0.0 | 0 | 180000.0 | 71017.38 | 108982.62 | 0.0 | Credit card | -31 | NaN |
| 6 | 215354 | 5714468 | Active | currency 1 | -43 | 0 | 79.0 | NaN | 0.0 | 0 | 42103.8 | 42103.8 | 0.0 | 0.0 | Consumer credit | -22 | NaN |
| 7 | 162297 | 5714469 | Closed | currency 1 | -1896 | 0 | -1684.0 | -1710.0 | 14985.0 | 0 | 76878.45 | 0.0 | 0.0 | 0.0 | Consumer credit | -1710 | NaN |
| 8 | 162297 | 5714470 | Closed | currency 1 | -1146 | 0 | -811.0 | -840.0 | 0.0 | 0 | 103007.7 | 0.0 | 0.0 | 0.0 | Consumer credit | -840 | NaN |
| 9 | 162297 | 5714471 | Active | currency 1 | -1146 | 0 | -484.0 | NaN | 0.0 | 0 | 4500.0 | 0.0 | 0.0 | 0.0 | Credit card | -690 | NaN |
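The ten rows above already show the one-to-many structure: client 215354 has seven previous credits and client 162297 has three. `pd.crosstab` can count them by CREDIT_ACTIVE status. This sketch reuses only those two columns from the rows shown.

```python
import pandas as pd

# SK_ID_CURR and CREDIT_ACTIVE from the ten rows shown above.
rows = pd.DataFrame({
    "SK_ID_CURR": [215354] * 7 + [162297] * 3,
    "CREDIT_ACTIVE": ["Closed"] + ["Active"] * 6 + ["Closed", "Closed", "Active"],
})

# Count previous credits per client, split by status.
status_counts = pd.crosstab(rows["SK_ID_CURR"], rows["CREDIT_ACTIVE"])
print(status_counts)
```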


## 3. bureau_balance

    Monthly balances of previous credits in Credit Bureau.
    This table has one row for each month of history of every previous credit reported to the Credit Bureau, i.e., the table has (# loans in sample * # of related previous credits * # of months where we have some history observable for the previous credits) rows.
    
    
    - SK_ID_BUREAU : Recoded ID of Credit Bureau credit (unique coding for each application) - use this to join to the bureau table
        Unique ID of the Credit Bureau credit
        
    - MONTHS_BALANCE : Month of balance relative to application date (-1 means the freshest balance date)
        Month relative to the application date
    
    - STATUS : Status of Credit Bureau loan during the month (active, closed, DPD0-30, ... [C means closed, X means status unknown, 0 means no DPD, 1 means maximal DPD during the month between 1-30, 2 means DPD 31-60, ..., 5 means DPD 120+ or sold or written off])
        Monthly status of the bureau credit
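bureau_balance has one row per month of history for each bureau credit. It is therefore usually collapsed to one row per SK_ID_BUREAU before being joined to bureau. This sketch uses hypothetical toy rows and hypothetical feature names. It maps C (closed) and X (unknown) to 0 and treats the digits 0 to 5 as an ordinal DPD bucket. That mapping is an assumption, one of several reasonable encodings.

```python
import pandas as pd

# Hypothetical toy bureau_balance rows for two bureau credits.
bb = pd.DataFrame({
    "SK_ID_BUREAU": [1, 1, 1, 2, 2],
    "MONTHS_BALANCE": [-1, -2, -3, -1, -2],
    "STATUS": ["C", "0", "1", "X", "0"],
})

# Assumption: C and X become 0; digits 0-5 are kept as an ordinal DPD bucket.
bb["DPD_BUCKET"] = pd.to_numeric(bb["STATUS"], errors="coerce").fillna(0).astype(int)

# One row per bureau credit: months of history and worst DPD bucket seen.
per_credit = bb.groupby("SK_ID_BUREAU").agg(
    MONTHS_OBSERVED=("MONTHS_BALANCE", "count"),
    WORST_DPD_BUCKET=("DPD_BUCKET", "max"),
)
print(per_credit)
```

The result has one row per SK_ID_BUREAU, so it can be merged into bureau on that key and then aggregated again per SK_ID_CURR.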
       