## New Features

Source: Kaggle discussion
- late payment
- credit utilization
- debt ratio

In [1]:
# Package imports
import pandas as pd
import numpy as np

#data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress all warnings (this filter is not limited to pandas)
import warnings
warnings.filterwarnings('ignore')

plt.style.use('fivethirtyeight')

In [2]:
train = pd.read_csv('data/application_train.csv')
test = pd.read_csv('data/application_test.csv')
install = pd.read_csv('data/installments_payments.csv')
buro = pd.read_csv('data/bureau.csv')
buro_bal = pd.read_csv('data/bureau_balance.csv')

## 1. Late Payment
---
installments_payments.csv
- Repayment history for previously disbursed Home Credit credits related to the loans in our sample.
- There is (a) one row for every payment that was made, plus (b) one row for each missed payment.
- One row is equivalent to one payment of one installment, or one installment corresponding to one payment of one previous Home Credit credit related to the loans in our sample.  

**Columns used here:**  
- DAYS_INSTALMENT: When the installment of previous credit was supposed to be paid (relative to application date of current loan)  
- DAYS_ENTRY_PAYMENT: When the installment of the previous credit was actually paid (relative to application date of current loan)

**New feature: LATE_PAYMENT**  
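`install['DAYS_INSTALMENT'] - install['DAYS_ENTRY_PAYMENT']` is the number of days between the due date and the actual payment date. Positive values mean the installment was paid early; negative values mean it was paid late.  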
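
To make the sign convention concrete, here is a minimal sketch on made-up rows that mirror the two columns above. The frame `toy`, its values, and the helper column `DAYS_LATE` are illustrative assumptions and are not part of the notebook.

```python
import pandas as pd

# Made-up rows; both columns are days relative to the application date.
toy = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "DAYS_INSTALMENT": [-30.0, -60.0, -10.0],     # when the installment was due
    "DAYS_ENTRY_PAYMENT": [-35.0, -50.0, -10.0],  # when it was actually paid
})

# Same formula as the notebook: due date minus payment date.
toy["LATE_PAYMENT"] = toy["DAYS_INSTALMENT"] - toy["DAYS_ENTRY_PAYMENT"]
# row 0: -30 - (-35) =   5  (paid 5 days early)
# row 1: -60 - (-50) = -10  (paid 10 days late)
# row 2: -10 - (-10) =   0  (paid on time)

# Hypothetical helper: lateness as a non-negative number of days.
toy["DAYS_LATE"] = (-toy["LATE_PAYMENT"]).clip(lower=0)

# The per-customer minimum picks out the worst delay.
worst = toy.groupby("SK_ID_CURR")["LATE_PAYMENT"].min()
```

Because late payments are negative under this definition, taking the minimum per customer selects the longest delay.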

In [3]:
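# Optional: restrict to payments made within the year before the application
# install_temp = install[install.DAYS_ENTRY_PAYMENT >= -365]
# Positive = paid early, negative = paid late
install['LATE_PAYMENT'] = install['DAYS_INSTALMENT'] - install['DAYS_ENTRY_PAYMENT']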

In [4]:
install.head()

Unnamed: 0,SK_ID_PREV,SK_ID_CURR,NUM_INSTALMENT_VERSION,NUM_INSTALMENT_NUMBER,DAYS_INSTALMENT,DAYS_ENTRY_PAYMENT,AMT_INSTALMENT,AMT_PAYMENT,LATE_PAYMENT
0,1054186,161674,1.0,6,-1180.0,-1187.0,6948.36,6948.36,7.0
1,1330831,151639,0.0,34,-2156.0,-2156.0,1716.525,1716.525,0.0
2,2085231,193053,2.0,1,-63.0,-63.0,25425.0,25425.0,0.0
3,2452527,199697,1.0,3,-2418.0,-2426.0,24350.13,24350.13,8.0
4,2714724,167756,1.0,2,-1383.0,-1366.0,2165.04,2160.585,-17.0


In [5]:
# Smallest (most negative) value per customer = worst delay
late_payment_feature = install.groupby('SK_ID_CURR')[['LATE_PAYMENT']].min().reset_index()
late_payment_feature.head()

Unnamed: 0,SK_ID_CURR,LATE_PAYMENT
0,100001,-11.0
1,100002,12.0
2,100003,1.0
3,100004,3.0
4,100005,-1.0


## 2. Credit Utilization
---
According to credit scoring papers, credit utilization is a strong indicator of a risky customer.  
Credit utilization is the credit card balance divided by the credit card limit.  
About the data:  
credit_card_balance.csv

- Monthly balance snapshots of previous credit cards that the applicant has with Home Credit.
- This table has one row for each month of history of every previous credit in Home Credit (consumer credit and cash loans) related to loans in our sample, i.e. the table has (# of loans in sample × # of relative previous credit cards × # of months where we have some history observable for the previous credit card) rows.
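
The balance-over-limit ratio can be sketched on made-up data as follows. The frame `toy_cred`, its values, and the choice to treat a zero limit as missing are assumptions for illustration; the notebook itself divides directly without that guard.

```python
import numpy as np
import pandas as pd

# Made-up monthly snapshots using the column names of credit_card_balance.csv.
toy_cred = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 2],
    "MONTHS_BALANCE": [-1, -5, -1, -2],
    "AMT_BALANCE": [500.0, 900.0, 100.0, 0.0],
    "AMT_CREDIT_LIMIT_ACTUAL": [1000, 1000, 0, 2000],
})

# Keep only recent snapshots; .copy() avoids assigning into a slice of toy_cred.
recent = toy_cred[toy_cred["MONTHS_BALANCE"] >= -2].copy()

# Assumption: a zero limit is treated as missing so the ratio is NaN, not inf.
limit = recent["AMT_CREDIT_LIMIT_ACTUAL"].replace(0, np.nan)
recent["CRED_UTIL"] = recent["AMT_BALANCE"] / limit

# Highest utilization per customer within the window (NaN is skipped by max).
toy_util = recent.groupby("SK_ID_CURR")["CRED_UTIL"].max()
```

Customer 1's older snapshot (month -5) falls outside the window, so only the 500 / 1000 ratio counts toward the maximum.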

In [9]:
cred = pd.read_csv('data/credit_card_balance.csv')
cred.head()

Unnamed: 0,SK_ID_PREV,SK_ID_CURR,MONTHS_BALANCE,AMT_BALANCE,AMT_CREDIT_LIMIT_ACTUAL,AMT_DRAWINGS_ATM_CURRENT,AMT_DRAWINGS_CURRENT,AMT_DRAWINGS_OTHER_CURRENT,AMT_DRAWINGS_POS_CURRENT,AMT_INST_MIN_REGULARITY,...,AMT_RECIVABLE,AMT_TOTAL_RECEIVABLE,CNT_DRAWINGS_ATM_CURRENT,CNT_DRAWINGS_CURRENT,CNT_DRAWINGS_OTHER_CURRENT,CNT_DRAWINGS_POS_CURRENT,CNT_INSTALMENT_MATURE_CUM,NAME_CONTRACT_STATUS,SK_DPD,SK_DPD_DEF
0,2562384,378907,-6,56.97,135000,0.0,877.5,0.0,877.5,1700.325,...,0.0,0.0,0.0,1,0.0,1.0,35.0,Active,0,0
1,2582071,363914,-1,63975.555,45000,2250.0,2250.0,0.0,0.0,2250.0,...,64875.555,64875.555,1.0,1,0.0,0.0,69.0,Active,0,0
2,1740877,371185,-7,31815.225,450000,0.0,0.0,0.0,0.0,2250.0,...,31460.085,31460.085,0.0,0,0.0,0.0,30.0,Active,0,0
3,1389973,337855,-4,236572.11,225000,2250.0,2250.0,0.0,0.0,11795.76,...,233048.97,233048.97,1.0,1,0.0,0.0,10.0,Active,0,0
4,1891521,126868,-1,453919.455,450000,0.0,11547.0,0.0,11547.0,22924.89,...,453919.455,453919.455,0.0,1,0.0,1.0,101.0,Active,0,0


In [10]:
month = -2  # keep only snapshots with MONTHS_BALANCE >= -2
cred_temp = cred[cred.MONTHS_BALANCE >= month].copy()  # copy to avoid SettingWithCopyWarning
cred_temp['CRED_UTIL'] = cred_temp['AMT_BALANCE'] / cred_temp['AMT_CREDIT_LIMIT_ACTUAL']
# Highest utilization per customer within the window
cred_util_feature = (
    cred_temp.groupby('SK_ID_CURR')['CRED_UTIL'].max()
    .reset_index()
    .rename(columns={'CRED_UTIL': 'CRED_UTIL_' + str(-month)})
)
cred_util_feature.head()

Unnamed: 0,SK_ID_CURR,CRED_UTIL_2
0,100006,0.0
1,100011,0.0
2,100013,0.0
3,100021,0.0
4,100028,0.165937


## 3. Debt Ratio

In [13]:
# Current debt divided by the credit amount, per bureau credit
buro['DEBT_RATIO'] = buro['AMT_CREDIT_SUM_DEBT'] / buro['AMT_CREDIT_SUM']
# Highest ratio per customer
debt_ratio_feature = buro.groupby('SK_ID_CURR')['DEBT_RATIO'].max().reset_index()
debt_ratio_feature.head()

Unnamed: 0,SK_ID_CURR,DEBT_RATIO
0,100001,0.987405
1,100002,0.54618
2,100003,0.0
3,100004,0.0
4,100005,0.954794
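
One caveat with a ratio feature aggregated by `max()`: if `AMT_CREDIT_SUM` is zero for some row, the division yields `inf`, which then becomes that customer's maximum. The sketch below shows one possible guard on made-up data. The frame `toy_buro` and the decision to map infinite ratios to NaN are assumptions, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Made-up bureau rows using the column names of bureau.csv.
toy_buro = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "AMT_CREDIT_SUM_DEBT": [50.0, 0.0, 10.0],
    "AMT_CREDIT_SUM": [100.0, 200.0, 0.0],
})

# Same formula as the notebook; the last row is 10 / 0, which pandas evaluates to inf.
toy_buro["DEBT_RATIO"] = toy_buro["AMT_CREDIT_SUM_DEBT"] / toy_buro["AMT_CREDIT_SUM"]

# Assumption: treat infinite ratios as missing before aggregating.
toy_buro["DEBT_RATIO"] = toy_buro["DEBT_RATIO"].replace([np.inf, -np.inf], np.nan)

# Highest finite ratio per customer; a group with only NaN stays NaN.
toy_debt = toy_buro.groupby("SK_ID_CURR")["DEBT_RATIO"].max().reset_index()
```

Whether NaN, a cap, or some other value is the right replacement depends on the downstream model, so this is only one option.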
