# Feature Selection

In this notebook, which is short compared to the Feature Engineering one, I'll continue the data preparation process. Here I'll try to "filter down" the bloated dataset that we got at the end of the previous notebook, applying several techniques to select the features most useful for prediction. Feature selection (FS) should not be neglected, as one can benefit from it a lot: it reduces overfitting, shortens training time, makes models simpler, partially lifts the curse of dimensionality, and may even improve accuracy. There are dozens of FS methods, grouped by type (filter, wrapper, embedded), but I'll use simple and effective ones. The reason: I don't want to lose too much data, and I don't want to wait for hours evaluating feature subsets. Check [this page](https://h2o.ai/wiki/feature-selection/) for more info on FS.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import lightgbm as lgb
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.compose import ColumnTransformer
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 150)
pd.set_option('display.float_format', lambda x: '%.4f' % x)
sns.set()
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning)
import gc
from imports import *

In [3]:
train=pd.read_csv('../data/full_train.csv', engine='pyarrow')
train=convert_types(train, print_info=True)
train.shape

Original Memory Usage: 3.37 gb.
New Memory Usage: 1.71 gb.


(307511, 1369)
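
`convert_types` comes from the local `imports` module and is not shown in this notebook. As a rough illustration of how a memory reduction like the one above can be obtained, here is a minimal sketch that downcasts numeric columns to the smallest dtype that holds their values. The function name `downcast_numeric` and the toy frame are hypothetical, and this is not necessarily what `convert_types` does. Note that downcasting floats to `float32` loses some precision.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy of df with numeric columns stored in smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            # float64 -> float32 where the values allow it
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_integer_dtype(out[col]):
            # int64 -> int8/int16/int32, depending on the value range
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


# Toy data standing in for the real training set
demo = pd.DataFrame({
    "flag": np.zeros(1000, dtype="int64"),
    "ratio": np.linspace(0.0, 1.0, 1000),
})
small = downcast_numeric(demo)

before = demo.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
```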

Note: with huge files like this, it may be a good idea to [use a format other than CSV](https://pythonspeed.com/articles/pandas-read-csv-fast/). It'll save you a lot of time.
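
As a small sketch of that idea (not part of the original pipeline), the snippet below writes a frame to Parquet and reads it back. It assumes a Parquet engine such as `pyarrow` is installed and falls back to pickle otherwise. The helper name `save_and_reload` and the file names are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_and_reload(df, directory):
    """Write df in a binary format and read it back; returns (frame, format_name)."""
    try:
        path = os.path.join(directory, "frame.parquet")
        df.to_parquet(path)  # raises ImportError if no Parquet engine is available
        return pd.read_parquet(path), "parquet"
    except ImportError:
        path = os.path.join(directory, "frame.pkl")
        df.to_pickle(path)
        return pd.read_pickle(path), "pickle"


# Toy data; the real file would be the full training set
demo = pd.DataFrame({
    "id": np.arange(5, dtype="int32"),
    "score": np.linspace(0.0, 1.0, 5, dtype="float32"),
})

with tempfile.TemporaryDirectory() as tmp:
    restored, fmt = save_and_reload(demo, tmp)

print(fmt, restored.dtypes.to_dict())
```

Unlike CSV, both formats store dtypes alongside the data, so already-converted types should survive the round trip and need not be converted again after loading.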

## Correlation

One of the simplest methods of feature selection is to remove highly correlated features. The presence of such features can negatively affect a model's ability to learn and generalize, so one feature from each highly correlated pair should be removed. The threshold on absolute correlation is up to you, but it's usually 0.8-0.9.
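
To make the idea concrete, here is a toy sketch on synthetic data. It shows one common way to apply such a threshold, not necessarily the exact steps used in this notebook. The function name `drop_correlated`, the column names, and the 0.9 threshold are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Drop the later column of every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Boolean mask of the strict upper triangle: each pair appears once, diagonal excluded
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it correlates too strongly with any earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


rng = np.random.default_rng(0)
a = rng.normal(size=1000)
toy = pd.DataFrame({
    "a": a,
    "a_scaled": 2 * a + rng.normal(scale=0.01, size=1000),  # near-duplicate of "a"
    "b": rng.normal(size=1000),                              # independent feature
})

reduced, dropped = drop_correlated(toy, threshold=0.9)
print(dropped, list(reduced.columns))
```

Because only the upper triangle is inspected, the first column of each correlated pair is kept and the later one is dropped, so the result depends on column order.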

In [4]:
corr_mat=train.corr().abs()
# keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once;
# the builtin bool replaces np.bool, which is deprecated since NumPy 1.20
upper=corr_mat.where(np.triu(np.ones(corr_mat.shape), k=1).astype(bool))
upper.head()


| Unnamed: 0 | Unnamed: 1 | SK_ID_CURR | TARGET | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | ... |
|---|---|---|---|---|---|---|---|---|
| | NaN | 1.0 | 0.0021 | 0.0011 | 0.0018 | 0.0003 | 0.0004 | ... |
| SK_ID_CURR | NaN | NaN | 0.0021 | 0.0011 | 0.0018 | 0.0003 | 0.0004 | ... |
| TARGET | NaN | NaN | NaN | 0.0192 | 0.004 | 0.0304 | 0.0128 | ... |
| CNT_CHILDREN | NaN | NaN | NaN | NaN | 0.0129 | 0.0021 | 0.0214 | ... |
| AMT_INCOME_TOTAL | NaN | NaN | NaN | NaN | NaN | 0.1569 | 0.1917 | ... |
.0335,0.0473,0.0351,0.0351,0.0332,0.0752,0.0321,0.0724,0.0129,0.0132,0.0132,0.0134,0.1103,0.1107,0.1111,0.0123,0.096,0.0961,0.0964,0.1238,0.1242,0.1245,0.0253,0.025,0.1268,0.127,0.1272,0.0902,0.0903,0.0907,0.0898,0.0899,0.0902,0.0898,0.0899,0.0903,0.0893,0.0894,0.0898,0.1348,0.1352,0.1355,0.012,0.1418,0.1422,0.1426,0.1264,0.1266,0.1269,0.1012,0.1015,0.1016,0.1635,0.1639,0.1644,0.1646,0.098,0.1255,0.1255,0.1257,0.1257,0.126,0.126,0.1249,0.1251,0.1254,0.0818,0.0819,0.082,0.0831,0.1547,0.1552,0.1556,0.1557,0.1599,0.1562,0.1567,0.1571,0.1087,0.0875,0.0876,0.0877,0.0878,0.0866,0.0868,0.0868,0.0866,0.0868,0.0868,0.0869,0.0869,0.0869,0.0866,0.0868,0.0868,0.1219,0.1225,0.0818,0.1179,0.1122,0.0864,0.1304,0.1105,0.1104,0.1106,0.0471,0.1053,0.1057,0.0496,0.049,0.049,0.0491,0.0098,0.0098,0.0092,0.0337,0.0341,0.0342,0.0344,0.0348,0.0348,0.0349,0.0361,0.0365,0.0365,0.0365,0.026,0.0266,0.0278,0.0879,0.0883,0.0888,0.1163,0.1166,0.1172,0.131,0.1312,0.1317,0.0978,0.1285,0.1289,0.1289,0.129,0.1366,0.0749,0.0005,0.0005,0.0005,0.0008,0.0008,0.0008,0.0067,0.0067,0.0058,0.0164,0.0165,0.0172,0.0193,0.0193,0.0195,0.0112,0.0112,0.0114,0.017,0.0171,0.0175,0.0123,0.0122,0.0122,0.0102,0.0102,0.0101,0.0085,0.0086,0.0086,0.0154,0.0155,0.0155,0.0277,0.0278,0.0287,0.0567,0.0569,0.0574,0.144,0.1444,0.1458,0.209,0.2093,0.2104,0.0285,0.0284,0.0285,0.0122,0.0127,0.0127,0.0462,0.0463,0.0462,0.1571,0.1576,0.1587,0.1745,0.175,0.1756,0.1088,0.1093,0.1093


In [6]:
# Cache the correlation matrix on disk so it can be reloaded without recomputing it
upper.to_csv('../data/corr_mat.csv')
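
Note that `to_csv` writes the row labels as the first column of the file by default. The sketch below shows, on a small toy frame rather than the real data, one way the round trip could keep those labels (`index_col=0`) and how such a matrix might then be used to pick features to drop. It assumes `upper` holds the upper triangle of the absolute correlation matrix, as the name and the NaN pattern in the output suggest; `toy`, `upper_toy`, `restored`, and the 0.9 threshold are illustrative names and values, not taken from the notebook.

```python
import io

import numpy as np
import pandas as pd

# Toy data (hypothetical, not the notebook's dataset): "b" is nearly a copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

# Absolute correlations, keeping only the entries strictly above the diagonal.
corr = toy.corr().abs()
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper_toy = corr.where(mask)

# Round trip through CSV; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
upper_toy.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # first column becomes the row labels again

# Columns that correlate above the assumed threshold with any earlier column.
threshold = 0.9
to_drop = [col for col in restored.columns if (restored[col] > threshold).any()]
print(to_drop)
```

Because only the upper triangle is kept, each correlated pair is seen once, so only the later column of the pair is flagged and the earlier one stays in the dataset.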

In [12]:
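# Reload the cached matrix; .iloc[:, 1:] drops the first column, which holds the row labels written by to_csv
upper = pd.read_csv('../data/corr_mat.csv').iloc[:, 1:]
upper.head()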

Unnamed: 0,Unnamed: 1,SK_ID_CURR,TARGET,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,HOUR_APPR_PROCESS_START,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,APARTMENTS_AVG,BASEMENTAREA_AVG,YEARS_BEGINEXPLUATATION_AVG,YEARS_BUILD_AVG,COMMONAREA_AVG,ELEVATORS_AVG,ENTRANCES_AVG,FLOORSMAX_AVG,FLOORSMIN_AVG,LANDAREA_AVG,LIVINGAPARTMENTS_AVG,LIVINGAREA_AVG,NONLIVINGAPARTMENTS_AVG,NONLIVINGAREA_AVG,APARTMENTS_MODE,BASEMENTAREA_MODE,YEARS_BEGINEXPLUATATION_MODE,YEARS_BUILD_MODE,COMMONAREA_MODE,ELEVATORS_MODE,ENTRANCES_MODE,FLOORSMAX_MODE,FLOORSMIN_MODE,LANDAREA_MODE,LIVINGAPARTMENTS_MODE,LIVINGAREA_MODE,NONLIVINGAPARTMENTS_MODE,NONLIVINGAREA_MODE,APARTMENTS_MEDI,BASEMENTAREA_MEDI,YEARS_BEGINEXPLUATATION_MEDI,YEARS_BUILD_MEDI,COMMONAREA_MEDI,ELEVATORS_MEDI,ENTRANCES_MEDI,FLOORSMAX_MEDI,FLOORSMIN_MEDI,LANDAREA_MEDI,LIVINGAPARTMENTS_MEDI,LIVINGAREA_MEDI,NONLIVINGAPARTMENTS_MEDI,NONLIVINGAREA_MEDI,TOTALAREA_MODE,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,FLAG_DOCUMENT_2,FLAG_DOCUMENT_3,FLAG_DOCUMENT_4,FLAG_DOCUMENT_5,FLAG_DOCUMENT_6,FLAG_DOCUMENT_7,FLAG_DOCUMENT_8,FLAG_DOCUMENT_9,FLAG_DOCUMENT_10,FLAG_DOCUMENT_11,FLAG_DOCUMENT_12,FLAG_DOCUMENT_13,FLAG_DOCUMENT_14,FLAG_DOCUMENT_15,FLAG_DOCUMENT_16,FLAG_DOCUMENT_17,FLAG_DOCUMENT_18,FLAG_DOCUMENT_19,FLAG_DOCUMENT_20,FLAG_DOCUMENT_21,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_DAY,AMT_REQ_CREDIT_BUREAU_WEEK,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,NEW_AMT_CREDIT_TO_AMT_INCOME,NEW_AMT_CREDIT_TO_AMT_ANNUITY,NE
W_AMT_CREDIT_TO_AMT_GOODS_PRICE,NEW_AMT_CREDIT_TO_MEAN_AMT_CREDIT_BY_CONTRACT_TYPE,NEW_AMT_CREDIT_TO_MEAN_AMT_CREDIT_BY_HOUSING_TYPE,NEW_AMT_CREDIT_TO_MEAN_AMT_CREDIT_BY_ORGANIZATION_TYPE,NEW_AMT_CREDIT_TO_MEAN_AMT_CREDIT_BY_EDUCATION_TYPE,NEW_AMT_CREDIT_TO_MEAN_AMT_CREDIT_BY_GENDER,NEW_AMT_CREDIT_TO_MEAN_AMT_CREDIT_BY_FAMILY_STATUS,NEW_AMT_CREDIT_TO_MEAN_AMT_INCOME_BY_AGE_GROUP,NEW_AMT_INCOME_BY_AGE_GROUP,NEW_AMT_INCOME_BY_CNT_CHILD,NEW_AMT_INCOME_BY_CNT_FAM_MEMBERS,NEW_AMT_INCOME_BY_AGE,NEW_AMT_INCOME_TO_MEAN_AMT_CREDIT_BY_CONTRACT_TYPE,NEW_AMT_INCOME_TO_MEAN_AMT_CREDIT_BY_HOUSING_TYPE,NEW_AMT_INCOME_TO_MEAN_AMT_CREDIT_BY_ORGANIZATION_TYPE,NEW_AMT_INCOME_TO_MEAN_AMT_CREDIT_BY_EDUCATION_TYPE,NEW_AMT_INCOME_TO_MEAN_AMT_CREDIT_BY_GENDER,NEW_AMT_INCOME_TO_MEAN_AMT_INCOME_BY_AGE_GROUP,NEW_DOC_FLAG_MEAN,NEW_DOC_FLAG_SUM,NEW_CONTACT_FLAG_MEAN,NEW_CONTACT_FLAG_SUM,NEW_ADDRESS_FLAG_MEAN,NEW_ADDRESS_FLAG_SUM,NEW_OWN_CAR_REALTY_COMBINATION,NEW_AGE_TO_MEAN_AGE_BY_FLAG_OWN_REALTY,NEW_AGE_TO_MEAN_AGE_BY_FLAG_OWN_CAR,NEW_AGE_TO_MEAN_AGE_BY_HOUSING_TYPE,NEW_DAYS_EMPLOYED_TO_DAYS_BIRTH,NEW_DAYS_REGISTRATION_TO_DAYS_BIRTH,NEW_OWN_CAR_AGE_TO_DAYS_BIRTH,NEW_OWN_CAR_AGE_TO_DAYS_EMPLOYED,NEW_DAYS_LAST_PHONE_CHANGE_TO_DAYS_BIRTH,NEW_DAYS_LAST_PHONE_CHANGE_TO_DAYS_EMPLOYED,NEW_CNT_CHILD_TO_CNT_FAM_MEMBERS,NEW_EXT_SOURCES_MEAN,NEW_EXT_SOURCES_STD,NEW_DAYS_CHANGE_MEAN,NEW_REGION_RATING_CLIENT_MEAN,NEW_30_CNT_SOCIAL_CIRCLE_MEAN,NEW_60_CNT_SOCIAL_CIRCLE_MEAN,bureau_DAYS_CREDIT_sum,bureau_DAYS_ENDDATE_FACT_sum,bureau_DAYS_CREDIT_min,bureau_DAYS_CREDIT_ENDDATE_min,bureau_DAYS_ENDDATE_FACT_min,bureau_DAYS_ENDDATE_FACT_mean,bureau_DAYS_CREDIT_mean,bureau_DAYS_CREDIT_UPDATE_sum,bureau_DAYS_ENDDATE_FACT_max,bureau_DAYS_CREDIT_UPDATE_min,bureau_DAYS_CREDIT_UPDATE_mean,bureau_DAYS_CREDIT_max,bureau_DAYS_CREDIT_UPDATE_max,bureau_CNT_CREDIT_PROLONG_min,bureau_CREDIT_DAY_OVERDUE_min,bureau_AMT_CREDIT_SUM_OVERDUE_min,bureau_CREDIT_DAY_OVERDUE_mean,bureau_CREDIT_DAY_OVERDUE_max,bureau_CREDIT_DAY_OVERDUE_
sum,bureau_AMT_CREDIT_SUM_OVERDUE_mean,bureau_AMT_CREDIT_SUM_OVERDUE_max,bureau_AMT_CREDIT_SUM_OVERDUE_sum,bureau_CNT_CREDIT_PROLONG_mean,bureau_CNT_CREDIT_PROLONG_max,bureau_CNT_CREDIT_PROLONG_sum,bureau_AMT_CREDIT_SUM_DEBT_min,bureau_AMT_CREDIT_SUM_LIMIT_min,bureau_AMT_ANNUITY_min,bureau_AMT_CREDIT_MAX_OVERDUE_count,bureau_AMT_CREDIT_SUM_LIMIT_mean,bureau_AMT_CREDIT_MAX_OVERDUE_sum,bureau_AMT_CREDIT_SUM_LIMIT_sum,bureau_AMT_CREDIT_SUM_LIMIT_max,bureau_DAYS_ENDDATE_FACT_count,bureau_AMT_CREDIT_SUM_LIMIT_count,bureau_AMT_CREDIT_SUM_DEBT_count,bureau_DAYS_CREDIT_ENDDATE_count,bureau_AMT_ANNUITY_count,bureau_AMT_CREDIT_SUM_count,bureau_DAYS_CREDIT_count,bureau_DAYS_CREDIT_ENDDATE_mean,bureau_DAYS_CREDIT_ENDDATE_sum,bureau_DAYS_CREDIT_ENDDATE_max,bureau_AMT_ANNUITY_mean,bureau_AMT_ANNUITY_max,bureau_AMT_ANNUITY_sum,bureau_AMT_CREDIT_SUM_DEBT_mean,bureau_AMT_CREDIT_SUM_min,bureau_AMT_CREDIT_SUM_mean,bureau_AMT_CREDIT_SUM_DEBT_max,bureau_AMT_CREDIT_SUM_max,bureau_AMT_CREDIT_SUM_DEBT_sum,bureau_AMT_CREDIT_SUM_sum,bureau_AMT_CREDIT_MAX_OVERDUE_min,bureau_AMT_CREDIT_MAX_OVERDUE_mean,bureau_AMT_CREDIT_MAX_OVERDUE_max,bureau_CREDIT_TYPE_Mobile operator loan_count_norm,bureau_CREDIT_TYPE_Mobile operator loan_count,bureau_CREDIT_TYPE_Loan for purchase of shares (margin lending)_count_norm,bureau_CREDIT_TYPE_Loan for purchase of shares (margin lending)_count,bureau_CREDIT_ACTIVE_Bad debt_count_norm,bureau_CREDIT_ACTIVE_Bad debt_count,bureau_CREDIT_TYPE_Interbank credit_count_norm,bureau_CREDIT_TYPE_Interbank credit_count,bureau_CREDIT_TYPE_Real estate loan_count_norm,bureau_CREDIT_TYPE_Real estate loan_count,bureau_CREDIT_CURRENCY_currency 4_count_norm,bureau_CREDIT_CURRENCY_currency 4_count,bureau_CREDIT_CURRENCY_currency 3_count_norm,bureau_CREDIT_CURRENCY_currency 3_count,bureau_CREDIT_TYPE_Loan for the purchase of equipment_count_norm,bureau_CREDIT_TYPE_Loan for the purchase of equipment_count,bureau_CREDIT_TYPE_Cash loan (non-earmarked)_count_norm,bureau_CREDIT_TYPE_Cash 
loan (non-earmarked)_count,bureau_CREDIT_TYPE_Unknown type of loan_count_norm,bureau_CREDIT_TYPE_Unknown type of loan_count,bureau_CREDIT_TYPE_Another type of loan_count_norm,bureau_CREDIT_TYPE_Another type of loan_count,bureau_CREDIT_TYPE_Loan for working capital replenishment_count_norm,bureau_CREDIT_TYPE_Loan for working capital replenishment_count,bureau_CREDIT_CURRENCY_currency 2_count_norm,bureau_CREDIT_CURRENCY_currency 2_count,bureau_CREDIT_ACTIVE_Sold_count_norm,bureau_CREDIT_ACTIVE_Sold_count,bureau_CREDIT_TYPE_Mortgage_count_norm,bureau_CREDIT_TYPE_Mortgage_count,bureau_CREDIT_TYPE_Microloan_count_norm,bureau_CREDIT_TYPE_Microloan_count,bureau_CREDIT_TYPE_Car loan_count_norm,bureau_CREDIT_TYPE_Car loan_count,bureau_CREDIT_TYPE_Loan for business development_count_norm,bureau_CREDIT_TYPE_Loan for business development_count,bureau_CREDIT_TYPE_Credit card_count_norm,bureau_CREDIT_TYPE_Credit card_count,bureau_CREDIT_ACTIVE_Active_count_norm,bureau_CREDIT_ACTIVE_Closed_count_norm,bureau_CREDIT_TYPE_Consumer credit_count_norm,bureau_CREDIT_CURRENCY_currency 
1_count_norm,bureau_CREDIT_ACTIVE_Active_count,bureau_CREDIT_ACTIVE_Closed_count,...,loan_card_SK_DPD_DEF_sum_min,loan_card_SK_DPD_DEF_sum_mean,loan_card_SK_DPD_DEF_sum_max,loan_card_SK_DPD_max_min,loan_card_SK_DPD_max_mean,loan_card_SK_DPD_max_max,loan_card_SK_DPD_mean_sum,loan_card_SK_DPD_sum_min,loan_card_SK_DPD_sum_mean,loan_card_SK_DPD_sum_max,loan_card_SK_DPD_DEF_max_sum,loan_card_SK_DPD_DEF_sum_sum,loan_card_SK_DPD_max_sum,loan_card_SK_DPD_sum_sum,loan_card_CNT_DRAWINGS_CURRENT_mean_min,loan_card_CNT_DRAWINGS_CURRENT_mean_mean,loan_card_CNT_DRAWINGS_CURRENT_mean_max,loan_card_CNT_INSTALMENT_MATURE_CUM_min_min,loan_card_CNT_INSTALMENT_MATURE_CUM_min_mean,loan_card_CNT_INSTALMENT_MATURE_CUM_min_max,loan_card_CNT_DRAWINGS_CURRENT_max_min,loan_card_CNT_DRAWINGS_CURRENT_max_mean,loan_card_CNT_DRAWINGS_CURRENT_max_max,loan_card_CNT_DRAWINGS_ATM_CURRENT_sum_min,loan_card_CNT_DRAWINGS_ATM_CURRENT_sum_mean,loan_card_CNT_DRAWINGS_ATM_CURRENT_sum_max,loan_card_CNT_DRAWINGS_CURRENT_sum_min,loan_card_CNT_DRAWINGS_CURRENT_sum_mean,loan_card_CNT_DRAWINGS_CURRENT_sum_max,loan_card_CNT_DRAWINGS_CURRENT_mean_sum,loan_card_CNT_DRAWINGS_ATM_CURRENT_mean_sum,loan_card_CNT_INSTALMENT_MATURE_CUM_mean_min,loan_card_CNT_INSTALMENT_MATURE_CUM_mean_mean,loan_card_CNT_INSTALMENT_MATURE_CUM_mean_max,loan_card_CNT_INSTALMENT_MATURE_CUM_max_min,loan_card_CNT_INSTALMENT_MATURE_CUM_max_mean,loan_card_CNT_INSTALMENT_MATURE_CUM_max_max,loan_card_AMT_DRAWINGS_ATM_CURRENT_count_min,loan_card_AMT_DRAWINGS_ATM_CURRENT_count_mean,loan_card_AMT_DRAWINGS_ATM_CURRENT_count_max,loan_card_AMT_PAYMENT_CURRENT_count_min,loan_card_AMT_PAYMENT_CURRENT_count_mean,loan_card_AMT_PAYMENT_CURRENT_count_max,loan_card_CNT_INSTALMENT_MATURE_CUM_min_sum,loan_card_AMT_PAYMENT_CURRENT_min_count,loan_card_CNT_DRAWINGS_OTHER_CURRENT_min_count,loan_card_CNT_DRAWINGS_ATM_CURRENT_max_sum,loan_card_CNT_DRAWINGS_CURRENT_max_sum,loan_card_CNT_DRAWINGS_ATM_CURRENT_sum_sum,loan_card_CNT_DRAWINGS_CURRENT_sum_sum,loan_card_CNT_IN
STALMENT_MATURE_CUM_sum_min,loan_card_CNT_INSTALMENT_MATURE_CUM_sum_mean,loan_card_CNT_INSTALMENT_MATURE_CUM_sum_max,loan_card_CNT_INSTALMENT_MATURE_CUM_mean_sum,loan_card_AMT_DRAWINGS_CURRENT_mean_min,loan_card_AMT_DRAWINGS_CURRENT_mean_mean,loan_card_AMT_DRAWINGS_CURRENT_mean_max,loan_card_CNT_INSTALMENT_MATURE_CUM_max_sum,loan_card_AMT_INST_MIN_REGULARITY_mean_min,loan_card_AMT_INST_MIN_REGULARITY_mean_mean,loan_card_AMT_INST_MIN_REGULARITY_mean_max,loan_card_AMT_PAYMENT_TOTAL_CURRENT_mean_min,loan_card_AMT_PAYMENT_TOTAL_CURRENT_mean_mean,loan_card_AMT_PAYMENT_TOTAL_CURRENT_mean_max,loan_card_AMT_DRAWINGS_ATM_CURRENT_count_sum,loan_card_AMT_PAYMENT_CURRENT_count_sum,loan_card_AMT_INST_MIN_REGULARITY_max_min,loan_card_AMT_INST_MIN_REGULARITY_max_mean,loan_card_AMT_INST_MIN_REGULARITY_max_max,loan_card_AMT_RECEIVABLE_PRINCIPAL_mean_min,loan_card_AMT_RECEIVABLE_PRINCIPAL_mean_mean,loan_card_AMT_RECEIVABLE_PRINCIPAL_mean_max,loan_card_AMT_RECIVABLE_mean_min,loan_card_AMT_RECIVABLE_mean_mean,loan_card_AMT_RECIVABLE_mean_max,loan_card_AMT_TOTAL_RECEIVABLE_mean_min,loan_card_AMT_TOTAL_RECEIVABLE_mean_mean,loan_card_AMT_TOTAL_RECEIVABLE_mean_max,loan_card_AMT_BALANCE_mean_min,loan_card_AMT_BALANCE_mean_mean,loan_card_AMT_BALANCE_mean_max,loan_card_AMT_PAYMENT_TOTAL_CURRENT_max_min,loan_card_AMT_PAYMENT_TOTAL_CURRENT_max_mean,loan_card_AMT_PAYMENT_TOTAL_CURRENT_max_max,loan_card_CNT_INSTALMENT_MATURE_CUM_sum_sum,loan_card_AMT_DRAWINGS_CURRENT_max_min,loan_card_AMT_DRAWINGS_CURRENT_max_mean,loan_card_AMT_DRAWINGS_CURRENT_max_max,loan_card_AMT_RECEIVABLE_PRINCIPAL_max_min,loan_card_AMT_RECEIVABLE_PRINCIPAL_max_mean,loan_card_AMT_RECEIVABLE_PRINCIPAL_max_max,loan_card_AMT_DRAWINGS_ATM_CURRENT_sum_min,loan_card_AMT_DRAWINGS_ATM_CURRENT_sum_mean,loan_card_AMT_DRAWINGS_ATM_CURRENT_sum_max,loan_card_AMT_DRAWINGS_CURRENT_sum_min,loan_card_AMT_DRAWINGS_CURRENT_sum_mean,loan_card_AMT_DRAWINGS_CURRENT_sum_max,loan_card_AMT_DRAWINGS_CURRENT_mean_sum,loan_card_AMT_DRAWINGS_ATM_CURRENT
_mean_sum,loan_card_AMT_RECIVABLE_max_min,loan_card_AMT_TOTAL_RECEIVABLE_max_min,loan_card_AMT_RECIVABLE_max_mean,loan_card_AMT_TOTAL_RECEIVABLE_max_mean,loan_card_AMT_RECIVABLE_max_max,loan_card_AMT_TOTAL_RECEIVABLE_max_max,loan_card_AMT_BALANCE_max_min,loan_card_AMT_BALANCE_max_mean,loan_card_AMT_BALANCE_max_max,loan_card_AMT_INST_MIN_REGULARITY_sum_min,loan_card_AMT_INST_MIN_REGULARITY_sum_mean,loan_card_AMT_INST_MIN_REGULARITY_sum_max,loan_card_AMT_INST_MIN_REGULARITY_mean_sum,loan_card_AMT_PAYMENT_TOTAL_CURRENT_sum_min,loan_card_AMT_PAYMENT_TOTAL_CURRENT_sum_mean,loan_card_AMT_PAYMENT_TOTAL_CURRENT_sum_max,loan_card_AMT_PAYMENT_TOTAL_CURRENT_mean_sum,loan_card_AMT_PAYMENT_CURRENT_mean_sum,loan_card_AMT_PAYMENT_CURRENT_sum_min,loan_card_AMT_PAYMENT_CURRENT_sum_mean,loan_card_AMT_PAYMENT_CURRENT_sum_max,loan_card_AMT_INST_MIN_REGULARITY_max_sum,loan_card_AMT_RECEIVABLE_PRINCIPAL_sum_min,loan_card_AMT_RECEIVABLE_PRINCIPAL_sum_mean,loan_card_AMT_RECEIVABLE_PRINCIPAL_sum_max,loan_card_AMT_RECEIVABLE_PRINCIPAL_mean_sum,loan_card_AMT_RECIVABLE_sum_min,loan_card_AMT_RECIVABLE_sum_mean,loan_card_AMT_RECIVABLE_sum_max,loan_card_AMT_TOTAL_RECEIVABLE_sum_min,loan_card_AMT_TOTAL_RECEIVABLE_sum_mean,loan_card_AMT_TOTAL_RECEIVABLE_sum_max,loan_card_AMT_RECIVABLE_mean_sum,loan_card_AMT_TOTAL_RECEIVABLE_mean_sum,loan_card_AMT_BALANCE_mean_sum,loan_card_AMT_BALANCE_sum_min,loan_card_AMT_BALANCE_sum_mean,loan_card_AMT_BALANCE_sum_max,loan_card_AMT_PAYMENT_TOTAL_CURRENT_max_sum,loan_card_AMT_PAYMENT_CURRENT_max_sum,loan_card_AMT_DRAWINGS_ATM_CURRENT_max_sum,loan_card_AMT_DRAWINGS_CURRENT_max_sum,loan_card_AMT_RECEIVABLE_PRINCIPAL_max_sum,loan_card_AMT_DRAWINGS_ATM_CURRENT_sum_sum,loan_card_AMT_DRAWINGS_CURRENT_sum_sum,loan_card_AMT_RECIVABLE_max_sum,loan_card_AMT_TOTAL_RECEIVABLE_max_sum,loan_card_AMT_BALANCE_max_sum,loan_card_AMT_INST_MIN_REGULARITY_sum_sum,loan_card_AMT_PAYMENT_TOTAL_CURRENT_sum_sum,loan_card_AMT_PAYMENT_CURRENT_sum_sum,loan_card_AMT_RECEIVABLE_PRINCIPAL_sum_sum
,loan_card_AMT_RECIVABLE_sum_sum,loan_card_AMT_TOTAL_RECEIVABLE_sum_sum,loan_card_AMT_BALANCE_sum_sum,loan_card_NAME_CONTRACT_STATUS_Active_count_norm_min,loan_card_NAME_CONTRACT_STATUS_Active_count_norm_mean,loan_card_NAME_CONTRACT_STATUS_Active_count_norm_max,loan_card_AMT_INST_MIN_REGULARITY_count_min,loan_card_AMT_INST_MIN_REGULARITY_count_mean,loan_card_AMT_INST_MIN_REGULARITY_count_max,loan_card_NAME_CONTRACT_STATUS_Active_count_min,loan_card_NAME_CONTRACT_STATUS_Active_count_mean,loan_card_NAME_CONTRACT_STATUS_Active_count_max,loan_card_NAME_CONTRACT_STATUS_Active_count_norm_sum,loan_card_MONTHS_BALANCE_count_min,loan_card_MONTHS_BALANCE_count_mean,loan_card_MONTHS_BALANCE_count_max,loan_card_NAME_CONTRACT_STATUS_Approved_count_norm_count,loan_card_AMT_INST_MIN_REGULARITY_count_sum,loan_card_NAME_CONTRACT_STATUS_Active_count_sum,loan_card_MONTHS_BALANCE_count_sum,loan_card_AMT_CREDIT_LIMIT_ACTUAL_min_min,loan_card_AMT_CREDIT_LIMIT_ACTUAL_min_mean,loan_card_AMT_CREDIT_LIMIT_ACTUAL_min_max,loan_card_AMT_CREDIT_LIMIT_ACTUAL_mean_min,loan_card_AMT_CREDIT_LIMIT_ACTUAL_mean_mean,loan_card_AMT_CREDIT_LIMIT_ACTUAL_mean_max,loan_card_AMT_CREDIT_LIMIT_ACTUAL_max_min,loan_card_AMT_CREDIT_LIMIT_ACTUAL_max_mean,loan_card_AMT_CREDIT_LIMIT_ACTUAL_max_max,loan_card_AMT_CREDIT_LIMIT_ACTUAL_min_sum,loan_card_AMT_CREDIT_LIMIT_ACTUAL_sum_min,loan_card_AMT_CREDIT_LIMIT_ACTUAL_sum_mean,loan_card_AMT_CREDIT_LIMIT_ACTUAL_sum_max,loan_card_AMT_CREDIT_LIMIT_ACTUAL_mean_sum,loan_card_AMT_CREDIT_LIMIT_ACTUAL_max_sum,loan_card_AMT_CREDIT_LIMIT_ACTUAL_sum_sum,loan_card_CNT_DRAWINGS_OTHER_CURRENT_min_mean,loan_card_CNT_DRAWINGS_OTHER_CURRENT_min_max,loan_card_CNT_DRAWINGS_OTHER_CURRENT_min_min,loan_card_AMT_DRAWINGS_OTHER_CURRENT_min_mean,loan_card_AMT_DRAWINGS_OTHER_CURRENT_min_max,loan_card_AMT_DRAWINGS_OTHER_CURRENT_min_min,loan_card_CNT_DRAWINGS_POS_CURRENT_min_min,loan_card_CNT_DRAWINGS_POS_CURRENT_min_mean,loan_card_CNT_DRAWINGS_POS_CURRENT_min_max,loan_card_AMT_DRAWINGS_POS_CURRENT_
min_min,loan_card_AMT_DRAWINGS_POS_CURRENT_min_mean,loan_card_AMT_DRAWINGS_POS_CURRENT_min_max,loan_card_CNT_DRAWINGS_ATM_CURRENT_min_min,loan_card_CNT_DRAWINGS_ATM_CURRENT_min_mean,loan_card_CNT_DRAWINGS_ATM_CURRENT_min_max,loan_card_AMT_DRAWINGS_ATM_CURRENT_min_min,loan_card_AMT_DRAWINGS_ATM_CURRENT_min_mean,loan_card_AMT_DRAWINGS_ATM_CURRENT_min_max,loan_card_AMT_PAYMENT_CURRENT_min_min,loan_card_AMT_PAYMENT_CURRENT_min_mean,loan_card_AMT_PAYMENT_CURRENT_min_max,loan_card_CNT_DRAWINGS_OTHER_CURRENT_mean_min,loan_card_CNT_DRAWINGS_OTHER_CURRENT_mean_mean,loan_card_CNT_DRAWINGS_OTHER_CURRENT_mean_max,loan_card_CNT_DRAWINGS_OTHER_CURRENT_max_min,loan_card_CNT_DRAWINGS_OTHER_CURRENT_max_mean,loan_card_CNT_DRAWINGS_OTHER_CURRENT_max_max,loan_card_AMT_DRAWINGS_OTHER_CURRENT_mean_min,loan_card_AMT_DRAWINGS_OTHER_CURRENT_mean_mean,loan_card_AMT_DRAWINGS_OTHER_CURRENT_mean_max,loan_card_AMT_DRAWINGS_OTHER_CURRENT_max_min,loan_card_AMT_DRAWINGS_OTHER_CURRENT_max_mean,loan_card_AMT_DRAWINGS_OTHER_CURRENT_max_max,loan_card_CNT_DRAWINGS_POS_CURRENT_mean_min,loan_card_CNT_DRAWINGS_POS_CURRENT_mean_mean,loan_card_CNT_DRAWINGS_POS_CURRENT_mean_max,loan_card_CNT_DRAWINGS_POS_CURRENT_max_min,loan_card_CNT_DRAWINGS_POS_CURRENT_max_mean,loan_card_CNT_DRAWINGS_POS_CURRENT_max_max,loan_card_AMT_DRAWINGS_POS_CURRENT_mean_min,loan_card_AMT_DRAWINGS_POS_CURRENT_mean_mean,loan_card_AMT_DRAWINGS_POS_CURRENT_mean_max,loan_card_AMT_DRAWINGS_POS_CURRENT_max_min,loan_card_AMT_DRAWINGS_POS_CURRENT_max_mean,loan_card_AMT_DRAWINGS_POS_CURRENT_max_max,loan_card_CNT_DRAWINGS_ATM_CURRENT_mean_min,loan_card_CNT_DRAWINGS_ATM_CURRENT_mean_mean,loan_card_CNT_DRAWINGS_ATM_CURRENT_mean_max,loan_card_CNT_DRAWINGS_ATM_CURRENT_max_min,loan_card_CNT_DRAWINGS_ATM_CURRENT_max_mean,loan_card_CNT_DRAWINGS_ATM_CURRENT_max_max,loan_card_AMT_DRAWINGS_ATM_CURRENT_mean_min,loan_card_AMT_DRAWINGS_ATM_CURRENT_mean_mean,loan_card_AMT_DRAWINGS_ATM_CURRENT_mean_max,loan_card_AMT_PAYMENT_CURRENT_mean_min,loan_card_AMT_PAYME
NT_CURRENT_mean_mean,loan_card_AMT_PAYMENT_CURRENT_mean_max,loan_card_AMT_PAYMENT_CURRENT_max_min,loan_card_AMT_PAYMENT_CURRENT_max_mean,loan_card_AMT_PAYMENT_CURRENT_max_max,loan_card_AMT_DRAWINGS_ATM_CURRENT_max_min,loan_card_AMT_DRAWINGS_ATM_CURRENT_max_mean,loan_card_AMT_DRAWINGS_ATM_CURRENT_max_max
0,,1.0,0.0021,0.0011,0.0018,0.0003,0.0004,0.0002,0.0008,0.0015,0.0001,0.001,0.0004,0.0018,0.0028,0.0013,0.0004,0.0028,0.0028,0.0003,0.0029,0.0011,0.0011,0.0003,0.0003,0.0011,0.0029,0.0019,0.0016,0.0001,0.0001,0.0023,0.0002,0.0016,0.0021,0.0016,0.0059,0.0015,0.0049,0.0029,0.0048,0.0031,0.0015,0.0031,0.0018,0.0026,0.003,0.002,0.0014,0.0019,0.0052,0.0011,0.005,0.0028,0.0044,0.0021,0.0015,0.0036,0.0022,0.0019,0.0019,0.002,0.0016,0.0014,0.0058,0.001,0.0051,0.0026,0.0046,0.0028,0.0017,0.0033,0.0022,0.003,0.0024,0.0023,0.0014,0.0001,0.0014,0.0012,0.0009,0.0007,0.0034,0.0041,0.0011,0.0021,0.0027,0.0018,0.0015,0.0008,0.002,0.001,0.0009,0.0011,0.0026,0.0007,0.0015,0.0005,0.0002,0.0011,0.0003,0.0027,0.0022,0.0021,0.0005,0.001,0.0047,0.0017,0.0,0.0011,0.0,0.0002,0.0003,0.0002,0.0003,0.0003,0.0008,0.0019,0.0006,0.0002,0.0024,0.0018,0.0018,0.0015,0.0018,0.0018,0.0019,0.0017,0.0017,0.0009,0.0009,0.0003,0.0003,0.0011,0.0015,0.0016,0.0021,0.0001,0.0002,0.0014,0.0036,0.0003,0.0013,0.0003,0.0013,0.0017,0.0011,0.0011,0.0013,0.0012,0.0012,0.0017,0.001,0.0001,0.0009,0.0003,0.0036,0.0008,0.0025,0.0017,0.0026,0.0051,0.0042,0.0014,0.0024,0.0022,0.0017,0.0007,0.0008,0.0018,0.0005,0.0004,0.0006,0.0009,0.0011,0.0004,0.0002,0.0019,0.0006,0.0007,0.0021,0.0002,0.0016,0.0023,0.003,0.0032,0.0018,0.0,0.0026,0.0026,0.001,0.0011,0.0017,0.0076,0.0082,0.0058,0.0011,0.0006,0.0014,0.0012,0.0013,0.0006,0.0031,0.0036,0.003,0.0025,0.0013,0.0013,0.0038,0.0027,0.003,0.0013,0.0025,0.0025,0.0017,0.0006,0.0007,0.0009,0.0028,0.0033,0.0011,0.0017,0.0008,0.001,0.0007,0.0,0.0004,0.0014,0.0007,0.0013,0.0001,0.0023,0.0063,0.0054,0.0004,0.0015,0.0,0.0005,0.0007,0.0039,0.0003,0.0003,0.0019,0.0041,0.0012,0.0005,0.0021,0.0008,0.0024,0.0023,...,0.0013,0.0026,0.0028,0.002,0.0023,0.0023,0.0014,0.0011,0.0014,0.0014,0.0017,0.0028,0.0019,0.001,0.0053,0.0052,0.0053,0.0001,0.0008,0.0009,0.0059,0.0058,0.0056,0.0036,0.0039,0.0038,0.0069,0.0069,0.0069,0.0069,0.0041,0.0014,0.0013,0.0013,0.0021,0.0021,0.0021,0.0001,0.0003,0.0004,0.0005
,0.0,0.0001,0.0019,0.0002,0.0004,0.0032,0.0063,0.0032,0.0065,0.001,0.001,0.001,0.001,0.002,0.0018,0.0018,0.0006,0.0032,0.0032,0.0031,0.0033,0.0031,0.0028,0.0023,0.0019,0.0039,0.0036,0.0033,0.0021,0.0021,0.002,0.0021,0.002,0.002,0.0021,0.002,0.002,0.002,0.002,0.002,0.0021,0.0019,0.0017,0.002,0.0022,0.0019,0.0018,0.0036,0.0036,0.0034,0.0021,0.0021,0.002,0.0035,0.0034,0.0032,0.0032,0.002,0.0036,0.0036,0.0036,0.0036,0.0034,0.0034,0.0037,0.0036,0.0034,0.0038,0.0039,0.0038,0.0039,0.003,0.003,0.0029,0.0028,0.0021,0.0028,0.0028,0.0026,0.0023,0.0044,0.0044,0.0044,0.0043,0.0043,0.0043,0.0042,0.0043,0.0043,0.0042,0.0042,0.0042,0.0042,0.0043,0.0043,0.0042,0.0013,0.0013,0.0025,0.0012,0.0031,0.0017,0.0013,0.0031,0.0031,0.0031,0.0013,0.0006,0.0007,0.0022,0.002,0.0021,0.0021,0.0038,0.0033,0.0032,0.0009,0.0014,0.0014,0.0019,0.0024,0.0025,0.0026,0.0015,0.0021,0.0022,0.0023,0.0024,0.0035,0.003,0.004,0.004,0.0038,0.0032,0.0033,0.0031,0.0025,0.0025,0.0023,0.0055,0.0049,0.0049,0.0049,0.0047,0.0031,0.0025,0.0002,0.0002,0.0002,0.0001,0.0001,0.0001,0.0038,0.0039,0.0044,0.0024,0.0025,0.0034,0.0022,0.0022,0.0015,0.0011,0.0011,0.0004,0.0008,0.0008,0.0005,0.0022,0.0023,0.0024,0.0031,0.0034,0.0035,0.0068,0.0068,0.0069,0.0077,0.0077,0.0077,0.0044,0.0045,0.0048,0.0038,0.0038,0.0038,0.0007,0.0007,0.0011,0.0036,0.0036,0.0035,0.0026,0.0025,0.0021,0.0002,0.0,0.0,0.004,0.0039,0.0037,0.0014,0.0014,0.0013,0.0014,0.0014,0.0014,0.0021,0.0021,0.0022
1,,,0.0021,0.0011,0.0018,0.0003,0.0004,0.0002,0.0008,0.0015,0.0001,0.001,0.0004,0.0018,0.0028,0.0013,0.0004,0.0028,0.0028,0.0003,0.0029,0.0011,0.0011,0.0004,0.0003,0.0011,0.0029,0.0019,0.0016,0.0001,0.0001,0.0023,0.0002,0.0016,0.0021,0.0016,0.0059,0.0015,0.0049,0.0029,0.0049,0.0031,0.0015,0.0031,0.0018,0.0026,0.003,0.002,0.0014,0.0019,0.0052,0.0011,0.005,0.0028,0.0044,0.0021,0.0015,0.0036,0.0022,0.0019,0.0019,0.002,0.0016,0.0014,0.0058,0.001,0.0051,0.0026,0.0046,0.0028,0.0017,0.0033,0.0022,0.003,0.0024,0.0023,0.0014,0.0001,0.0014,0.0012,0.0009,0.0007,0.0034,0.0041,0.0011,0.0021,0.0027,0.0018,0.0015,0.0008,0.002,0.001,0.0009,0.0011,0.0026,0.0007,0.0014,0.0005,0.0002,0.0011,0.0003,0.0027,0.0022,0.0021,0.0005,0.001,0.0047,0.0017,0.0,0.0011,0.0,0.0002,0.0003,0.0002,0.0003,0.0003,0.0008,0.0019,0.0006,0.0002,0.0024,0.0018,0.0018,0.0015,0.0018,0.0018,0.0019,0.0017,0.0017,0.0009,0.0009,0.0003,0.0003,0.0011,0.0014,0.0016,0.0021,0.0001,0.0002,0.0014,0.0036,0.0003,0.0013,0.0003,0.0012,0.0017,0.0011,0.0011,0.0013,0.0012,0.0012,0.0017,0.001,0.0001,0.0009,0.0003,0.0036,0.0008,0.0025,0.0017,0.0026,0.0051,0.0042,0.0014,0.0024,0.0022,0.0017,0.0007,0.0008,0.0018,0.0005,0.0004,0.0006,0.0009,0.0011,0.0004,0.0002,0.0019,0.0006,0.0007,0.0021,0.0002,0.0016,0.0023,0.003,0.0032,0.0018,0.0,0.0026,0.0026,0.001,0.0011,0.0017,0.0076,0.0082,0.0058,0.001,0.0006,0.0014,0.0012,0.0013,0.0006,0.0031,0.0036,0.003,0.0025,0.0013,0.0013,0.0038,0.0027,0.003,0.0013,0.0025,0.0025,0.0017,0.0006,0.0007,0.0009,0.0028,0.0033,0.0011,0.0017,0.0008,0.001,0.0008,0.0,0.0004,0.0014,0.0007,0.0013,0.0001,0.0023,0.0063,0.0054,0.0004,0.0015,0.0,0.0005,0.0007,0.0039,0.0003,0.0003,0.0019,0.0041,0.0012,0.0005,0.0021,0.0008,0.0024,0.0023,...,0.0013,0.0026,0.0028,0.002,0.0023,0.0023,0.0014,0.0011,0.0014,0.0014,0.0016,0.0028,0.0019,0.001,0.0053,0.0052,0.0053,0.0001,0.0008,0.0009,0.0059,0.0058,0.0056,0.0036,0.0039,0.0038,0.0069,0.0069,0.0069,0.0069,0.0042,0.0014,0.0013,0.0013,0.0021,0.0021,0.0021,0.0001,0.0003,0.0004,0.0005,0.0
,0.0001,0.0019,0.0002,0.0004,0.0032,0.0063,0.0032,0.0065,0.001,0.001,0.001,0.001,0.002,0.0018,0.0018,0.0006,0.0032,0.0032,0.0031,0.0033,0.0031,0.0028,0.0023,0.0019,0.0039,0.0036,0.0033,0.0021,0.0021,0.002,0.0021,0.002,0.002,0.0021,0.002,0.002,0.002,0.002,0.002,0.0021,0.0019,0.0017,0.002,0.0022,0.0019,0.0018,0.0036,0.0036,0.0034,0.0021,0.0021,0.002,0.0035,0.0034,0.0032,0.0032,0.002,0.0036,0.0036,0.0036,0.0036,0.0034,0.0034,0.0037,0.0036,0.0034,0.0038,0.0039,0.0038,0.0039,0.003,0.003,0.0029,0.0028,0.0021,0.0028,0.0028,0.0026,0.0023,0.0044,0.0044,0.0044,0.0043,0.0043,0.0043,0.0042,0.0043,0.0043,0.0042,0.0042,0.0042,0.0042,0.0043,0.0043,0.0042,0.0013,0.0013,0.0025,0.0012,0.0031,0.0017,0.0013,0.0031,0.0031,0.0031,0.0013,0.0006,0.0007,0.0022,0.002,0.0021,0.0021,0.0038,0.0033,0.0032,0.0009,0.0014,0.0014,0.0019,0.0024,0.0025,0.0026,0.0015,0.0022,0.0022,0.0023,0.0024,0.0035,0.003,0.004,0.004,0.0038,0.0032,0.0033,0.0031,0.0025,0.0025,0.0023,0.0055,0.0049,0.0049,0.0049,0.0047,0.0031,0.0025,0.0002,0.0002,0.0002,0.0001,0.0001,0.0001,0.0038,0.0039,0.0044,0.0024,0.0025,0.0034,0.0022,0.0022,0.0015,0.0011,0.0011,0.0004,0.0008,0.0008,0.0005,0.0022,0.0023,0.0023,0.0031,0.0034,0.0035,0.0068,0.0068,0.0069,0.0077,0.0077,0.0077,0.0044,0.0045,0.0048,0.0038,0.0038,0.0038,0.0007,0.0007,0.0011,0.0036,0.0036,0.0035,0.0026,0.0025,0.0021,0.0002,0.0,0.0,0.004,0.0039,0.0037,0.0014,0.0014,0.0013,0.0014,0.0014,0.0014,0.0021,0.0021,0.0022
2,,,,0.0192,0.004,0.0304,0.0128,0.0396,0.0372,0.0782,0.075,0.042,0.0515,0.0376,0.0005,0.046,0.0285,0.0004,0.0238,0.0018,0.0093,0.0589,0.0609,0.0242,0.0056,0.0069,0.0028,0.0444,0.051,0.0325,0.1553,0.1605,0.1789,0.0295,0.0227,0.0097,0.0221,0.0185,0.0342,0.0192,0.044,0.0336,0.0109,0.025,0.033,0.0032,0.0136,0.0273,0.02,0.009,0.0221,0.0163,0.0321,0.0174,0.0432,0.0327,0.0102,0.0234,0.0307,0.0016,0.0127,0.0292,0.0221,0.01,0.0223,0.0186,0.0339,0.019,0.0438,0.0334,0.0113,0.0246,0.0327,0.0028,0.0133,0.0326,0.0091,0.0322,0.009,0.0313,0.0552,0.0054,0.0443,0.0027,0.0003,0.0286,0.0015,0.008,0.0044,0.0014,0.0042,0.0008,0.0116,0.0095,0.0065,0.0116,0.0034,0.008,0.0014,0.0002,0.0037,0.0009,0.0027,0.0008,0.0125,0.002,0.0199,0.0077,0.0321,0.0694,0.0394,0.0283,0.0312,0.0234,0.0313,0.0307,0.0254,0.0056,0.0125,0.0066,0.0089,0.0041,0.0037,0.0079,0.0024,0.0066,0.0056,0.0172,0.0172,0.0208,0.0208,0.0448,0.0448,0.0129,0.0779,0.0812,0.0714,0.068,0.0092,0.0488,0.0291,0.034,0.0003,0.0212,0.2221,0.0477,0.0658,0.0606,0.0141,0.0131,0.042,0.0489,0.0752,0.0343,0.0559,0.0532,0.0897,0.0414,0.0196,0.0429,0.0689,0.0498,0.0282,0.0002,0.0076,0.0,0.0081,0.0055,0.0063,0.0072,0.0106,0.0133,0.003,0.004,0.0041,0.0002,0.0048,0.0025,0.0082,0.0114,0.0025,0.0094,0.0106,0.0305,0.0034,0.0067,0.004,0.0154,0.0041,0.0041,0.047,0.0537,0.0366,0.0014,0.0011,0.002,0.0006,0.0108,0.02,0.0022,0.0197,0.0071,0.0141,0.0023,0.0024,0.0025,0.0006,0.0006,0.0009,0.0011,0.0046,0.004,0.0006,0.0006,0.0019,0.0027,0.0014,0.0017,0.0028,0.0029,0.0003,0.0045,0.001,0.0011,0.0025,0.0023,0.0018,0.0012,0.0016,0.006,0.0068,0.006,0.0165,0.0121,0.0209,0.0233,0.0444,0.0341,0.0201,0.0208,0.0028,0.0038,0.0347,0.0348,0.0774,0.0794,0.0263,0.006,0.0671,0.0308,...,0.0032,0.0066,0.007,0.0065,0.0061,0.006,0.0036,0.0045,0.0037,0.0036,0.0056,0.0074,0.0067,0.0036,0.0823,0.0825,0.0826,0.0319,0.0312,0.0309,0.1006,0.1009,0.1014,0.0498,0.0495,0.0498,0.0502,0.0504,0.0507,0.0507,0.0535,0.0292,0.0289,0.0288,0.0179,0.0177,0.0176,0.04,0.0397,0.0395,0.0422,0.0418,0.0416,0
.031,0.0395,0.0393,0.0075,0.0225,0.0069,0.0043,0.0429,0.0425,0.0424,0.0426,0.0587,0.0587,0.0584,0.0367,0.0736,0.0737,0.0742,0.0227,0.0227,0.0224,0.0513,0.0518,0.0627,0.0634,0.0639,0.0859,0.0861,0.0865,0.0863,0.0865,0.0869,0.0863,0.0865,0.0869,0.087,0.0872,0.0876,0.028,0.0281,0.0281,0.044,0.052,0.0521,0.0523,0.0664,0.0666,0.0669,0.0378,0.0377,0.038,0.0235,0.0235,0.0236,0.0237,0.0405,0.0676,0.0676,0.0677,0.0677,0.0681,0.0681,0.0683,0.0684,0.0688,0.0026,0.0026,0.0028,0.0026,0.005,0.005,0.005,0.0049,0.0014,0.0052,0.0052,0.0052,0.0119,0.018,0.0181,0.0184,0.0185,0.0178,0.018,0.0182,0.0178,0.018,0.0183,0.0183,0.0184,0.0187,0.0182,0.0184,0.0186,0.0073,0.007,0.0118,0.0145,0.0099,0.0093,0.0152,0.0097,0.0096,0.0093,0.0279,0.0291,0.0308,0.0196,0.0196,0.0195,0.0196,0.0194,0.0199,0.0214,0.0565,0.0561,0.0559,0.0598,0.0598,0.0597,0.0594,0.0613,0.061,0.0608,0.0605,0.0558,0.0585,0.0589,0.0014,0.0012,0.0011,0.009,0.0089,0.0089,0.012,0.0117,0.0117,0.02,0.0459,0.0458,0.0456,0.0455,0.0513,0.0512,0.0029,0.0029,0.0029,0.0027,0.0027,0.0027,0.0178,0.0178,0.0174,0.005,0.005,0.0054,0.0228,0.0228,0.0228,0.013,0.013,0.0129,0.0125,0.0125,0.0125,0.0147,0.0148,0.0147,0.0004,0.0,0.0,0.0097,0.0098,0.0097,0.0038,0.0041,0.004,0.0525,0.0525,0.0526,0.0685,0.0687,0.0689,0.0041,0.004,0.0042,0.0096,0.0093,0.0092,0.1076,0.1077,0.1083,0.0635,0.0632,0.0637,0.0599,0.0599,0.0602,0.0053,0.0053,0.0051,0.0003,0.0005,0.0004,0.0238,0.0238,0.0241
3,,,,,0.0129,0.0021,0.0214,0.0018,0.0256,0.3309,0.0611,0.1834,0.028,0.0085,0.001,0.2407,0.0556,0.0008,0.0299,0.0226,0.8792,0.0254,0.0248,0.0073,0.0133,0.0082,0.0148,0.0201,0.0706,0.07,0.1385,0.018,0.0427,0.0132,0.0085,0.0069,0.0302,0.0001,0.0071,0.0083,0.0097,0.0088,0.0031,0.0086,0.0101,0.0041,0.0,0.0121,0.0085,0.0062,0.0295,0.0004,0.0064,0.0069,0.0096,0.008,0.0022,0.008,0.0095,0.0041,0.0002,0.013,0.0088,0.0064,0.0301,0.0006,0.0067,0.0083,0.0094,0.0082,0.0028,0.008,0.0101,0.0041,0.0001,0.008,0.0156,0.0013,0.0152,0.0019,0.0059,0.0018,0.0568,0.0037,0.0167,0.157,0.0015,0.0517,0.002,0.0028,0.0053,0.0003,0.0039,0.0055,0.0036,0.0107,0.0008,0.004,0.0009,0.001,0.0025,0.0004,0.0004,0.0024,0.0108,0.0078,0.0415,0.016,0.0273,0.0226,0.0061,0.0038,0.0132,0.0023,0.0009,0.0045,0.0112,0.001,0.2744,0.2317,0.0523,0.013,0.0129,0.0023,0.0094,0.0074,0.001,0.0164,0.0164,0.1231,0.1231,0.06,0.06,0.0309,0.3314,0.3202,0.3301,0.01,0.076,0.0493,0.0001,0.0997,0.0057,0.9513,0.0661,0.0042,0.1452,0.0254,0.0143,0.0142,0.0082,0.0047,0.0192,0.0081,0.0135,0.015,0.0261,0.0068,0.0124,0.0084,0.0195,0.0249,0.0221,0.0062,0.0008,0.0017,0.0027,0.0028,0.0024,0.0018,0.0015,0.0011,0.002,0.0037,0.0041,0.0131,0.0025,0.0014,0.0232,0.003,0.0004,0.0031,0.0021,0.0006,0.007,0.0111,0.0018,0.0052,0.0029,0.0029,0.0197,0.0251,0.0233,0.0023,0.0039,0.0019,0.0383,0.0108,0.0353,0.0514,0.0559,0.0461,0.0341,0.0019,0.0019,0.0012,0.0042,0.0042,0.0001,0.0009,0.0014,0.0015,0.0011,0.0011,0.0065,0.0046,0.0041,0.0029,0.002,0.0029,0.0016,0.0001,0.0024,0.0019,0.0018,0.003,0.0032,0.0037,0.0001,0.003,0.0027,0.0022,0.001,0.005,0.0724,0.0924,0.0087,0.0097,0.015,0.0267,0.002,0.0039,0.002,0.0015,0.0132,0.0131,0.0231,0.0024,0.0088,0.0007,...,0.0006,0.0003,0.0004,0.0042,0.0039,0.0039,0.0048,0.0051,0.0048,0.0048,0.0007,0.0004,0.0038,0.0047,0.0412,0.0412,0.0408,0.0336,0.0336,0.0336,0.0475,0.0475,0.0472,0.0165,0.0167,0.0168,0.0249,0.0248,0.0247,0.0246,0.0173,0.0398,0.0404,0.0404,0.04,0.0405,0.0405,0.0298,0.0303,0.0304,0.0294,0.0299,0.03,0.033,0.031
1,0.0313,0.0227,0.008,0.0304,0.0059,0.0419,0.0424,0.0425,0.0424,0.0258,0.0257,0.0253,0.0434,0.013,0.0129,0.0129,0.0157,0.0155,0.0153,0.0318,0.0312,0.0145,0.0144,0.0143,0.0198,0.0197,0.0195,0.0195,0.0195,0.0193,0.0195,0.0195,0.0193,0.0197,0.0196,0.0194,0.0112,0.0109,0.0108,0.0422,0.0173,0.017,0.0167,0.0154,0.0152,0.0151,0.0163,0.0165,0.0167,0.0047,0.0045,0.0043,0.0043,0.0181,0.0152,0.0152,0.015,0.015,0.0149,0.0149,0.0153,0.0152,0.0151,0.0282,0.0286,0.0287,0.0287,0.0107,0.0111,0.0111,0.0112,0.0107,0.0103,0.0106,0.0106,0.0235,0.0199,0.0203,0.0203,0.0205,0.0202,0.0206,0.0207,0.0202,0.0206,0.0207,0.0208,0.0208,0.0207,0.0201,0.0205,0.0206,0.012,0.0118,0.023,0.0142,0.0238,0.0305,0.0204,0.0243,0.0243,0.0241,0.0429,0.0329,0.0323,0.0384,0.0385,0.0385,0.0385,0.0108,0.0107,0.0104,0.0358,0.0364,0.0364,0.0351,0.0356,0.0357,0.0358,0.036,0.0366,0.0366,0.0367,0.0336,0.0318,0.0325,0.0003,0.0002,0.0,0.0011,0.0013,0.0013,0.0025,0.0026,0.0026,0.0235,0.0379,0.0386,0.0387,0.0388,0.0388,0.0446,0.001,0.001,0.001,0.0007,0.0007,0.0007,0.0075,0.0075,0.0074,0.0095,0.0096,0.0101,0.0005,0.0005,0.0005,0.0035,0.0035,0.0035,0.0082,0.0082,0.0078,0.0025,0.0026,0.0025,0.0007,0.001,0.0012,0.0041,0.0041,0.0041,0.007,0.0069,0.0067,0.0487,0.0487,0.0482,0.0614,0.0615,0.0611,0.0375,0.0374,0.0372,0.0395,0.0395,0.0393,0.0153,0.0152,0.0149,0.0043,0.0045,0.0044,0.0027,0.0027,0.0026,0.02,0.0199,0.0198,0.0148,0.0146,0.0146,0.003,0.0032,0.0032
4,,,,,,0.1569,0.1917,0.1596,0.0748,0.0273,0.013,0.0278,0.0085,0.1173,0.0003,0.064,0.0172,0.0083,0.0002,0.0384,0.0163,0.0855,0.0917,0.0365,0.0312,0.0623,0.0581,0.0036,0.0064,0.0083,0.0262,0.0609,0.0302,0.0345,0.0173,0.0057,0.0423,0.0896,0.0451,0.0054,0.0602,0.1399,0.0016,0.1069,0.04,0.0295,0.0746,0.03,0.0128,0.0053,0.0373,0.0756,0.041,0.002,0.0577,0.1318,0.0037,0.093,0.0349,0.025,0.0618,0.0338,0.0164,0.0056,0.042,0.0879,0.0442,0.0048,0.0597,0.1385,0.0019,0.1049,0.0393,0.0281,0.0708,0.042,0.0131,0.0132,0.013,0.0131,0.0186,0.001,0.0168,0.0005,0.0015,0.0459,0.0038,0.0725,0.0184,0.0003,0.0023,0.0025,0.0227,0.0207,0.0108,0.0073,0.0022,0.0031,0.0024,0.0002,0.0006,0.0007,0.0029,0.0024,0.0247,0.0049,0.0117,0.1082,0.0344,0.0214,0.1688,0.1563,0.1498,0.1434,0.1551,0.1562,0.1531,0.9981,0.909,0.8618,0.9793,1.0,0.9999,0.9935,0.9921,0.9953,0.9981,0.0136,0.0136,0.0304,0.0304,0.033,0.033,0.0298,0.0276,0.0163,0.0293,0.0107,0.0188,0.1066,0.0403,0.0212,0.0033,0.0144,0.0348,0.0018,0.0231,0.0897,0.0145,0.0143,0.044,0.0272,0.0282,0.0061,0.0142,0.0028,0.0088,0.0235,0.0107,0.0096,0.0128,0.0168,0.0272,0.0001,0.0013,0.0009,0.0038,0.0041,0.0041,0.0011,0.0003,0.0005,0.0072,0.017,0.0175,0.0155,0.0063,0.0233,0.0229,0.0288,0.0055,0.0391,0.0394,0.0305,0.0281,0.0435,0.0459,0.0322,0.047,0.047,0.0183,0.0347,0.034,0.0466,0.0428,0.0161,0.0683,0.0314,0.0978,0.1013,0.1263,0.1069,0.1306,0.0039,0.0076,0.013,0.0074,0.0074,0.0008,0.0035,0.0009,0.0002,0.0005,0.0005,0.0015,0.0028,0.0,0.0003,0.0059,0.0065,0.002,0.0041,0.0013,0.0013,0.0001,0.0015,0.0045,0.0085,0.0064,0.0094,0.0268,0.0311,0.0025,0.0079,0.0277,0.0479,0.0068,0.0038,0.0618,0.0889,0.0074,0.0202,0.0265,0.0574,0.0165,0.0168,0.0537,0.0273,0.057,0.0304,...,0.0032,0.0038,0.0038,0.0135,0.0134,0.0134,0.0096,0.0098,0.0096,0.0096,0.003,0.0042,0.0126,0.0094,0.0259,0.026,0.0266,0.0505,0.0512,0.0512,0.0487,0.0488,0.049,0.0328,0.033,0.033,0.0681,0.0684,0.0688,0.0689,0.0317,0.0209,0.0211,0.0211,0.0212,0.0213,0.0213,0.0343,0.0347,0.0347,0.0331,0.0335,0.0335,0.0473,0.
0351,0.0351,0.0332,0.0752,0.0321,0.0724,0.0129,0.0132,0.0132,0.0134,0.1103,0.1107,0.1111,0.0123,0.096,0.0961,0.0964,0.1238,0.1242,0.1245,0.0253,0.025,0.1268,0.127,0.1272,0.0902,0.0903,0.0907,0.0898,0.0899,0.0902,0.0898,0.0899,0.0903,0.0893,0.0894,0.0898,0.1348,0.1352,0.1355,0.012,0.1418,0.1422,0.1426,0.1264,0.1266,0.1269,0.1012,0.1015,0.1016,0.1635,0.1639,0.1644,0.1646,0.098,0.1255,0.1255,0.1257,0.1257,0.126,0.126,0.1249,0.1251,0.1254,0.0818,0.0819,0.082,0.0831,0.1547,0.1552,0.1556,0.1557,0.1599,0.1562,0.1567,0.1571,0.1087,0.0875,0.0876,0.0877,0.0878,0.0866,0.0868,0.0868,0.0866,0.0868,0.0868,0.0869,0.0869,0.0869,0.0866,0.0868,0.0868,0.1219,0.1225,0.0818,0.1179,0.1122,0.0864,0.1304,0.1105,0.1104,0.1106,0.0471,0.1053,0.1057,0.0496,0.049,0.049,0.0491,0.0098,0.0098,0.0092,0.0337,0.0341,0.0342,0.0344,0.0348,0.0348,0.0349,0.0361,0.0365,0.0365,0.0365,0.026,0.0266,0.0278,0.0879,0.0883,0.0888,0.1163,0.1166,0.1172,0.131,0.1312,0.1317,0.0978,0.1285,0.1289,0.1289,0.129,0.1366,0.0749,0.0005,0.0005,0.0005,0.0008,0.0008,0.0008,0.0067,0.0067,0.0058,0.0164,0.0165,0.0172,0.0193,0.0193,0.0195,0.0112,0.0112,0.0114,0.017,0.0171,0.0175,0.0123,0.0122,0.0122,0.0102,0.0102,0.0101,0.0085,0.0086,0.0086,0.0154,0.0155,0.0155,0.0277,0.0278,0.0287,0.0567,0.0569,0.0574,0.144,0.1444,0.1458,0.209,0.2093,0.2104,0.0285,0.0284,0.0285,0.0122,0.0127,0.0127,0.0462,0.0463,0.0462,0.1571,0.1576,0.1587,0.1745,0.175,0.1756,0.1088,0.1093,0.1093


I chose a correlation of 0.9 as the threshold, but you can try other values. Spoiler: lots of features are going to be dropped.

In [8]:
to_drop=[column for column in upper.columns if any(upper[column] > 0.9)]
print(f'Correlation: {len(to_drop)} will be removed')

Correlation: 620 will be removed
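
To see what this selection does on something small, here is a minimal sketch on toy data. It assumes that `upper` is the upper triangle of the absolute correlation matrix, which is how the list comprehension above reads it. The toy frame and the names `toy`, `toy_corr`, `toy_upper` and `toy_to_drop` are made up for illustration and are not part of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'a_copy': a * 2 + rng.normal(scale=0.01, size=200),  # almost a linear copy of 'a'
    'b': rng.normal(size=200),                           # independent noise
})

# Absolute pairwise correlations; the sign does not matter for redundancy.
toy_corr = toy.corr().abs()

# Keep only the strict upper triangle so each pair is seen once
# and the diagonal of ones is ignored.
mask = np.triu(np.ones(toy_corr.shape, dtype=bool), k=1)
toy_upper = toy_corr.where(mask)

# A column is dropped if it is correlated above the threshold
# with any column that comes before it.
toy_to_drop = [col for col in toy_upper.columns if (toy_upper[col] > 0.9).any()]
print(toy_to_drop)  # ['a_copy']
```

Because only the upper triangle is inspected, the first column of a correlated pair is kept and the later one is dropped, so the result depends on column order.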


In [9]:
train=train.drop(to_drop, axis=1)
train.shape

(307511, 749)

In [8]:
# gc.enable()
# del corr_mat, upper
# gc.collect()

In the very first step of FS, the dataset lost roughly 45% of its features (620 of 1369). How will it affect modelling results?

In [16]:
# train=train.iloc[:, 1:]

In [17]:
def quick_cv(df):
    X, y=df.drop(['TARGET'], axis=1), df['TARGET']
    cat_cols, num_cols=X.select_dtypes(include=['category', 'object']).columns, X.select_dtypes('number').columns

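    # `sparse` was renamed to `sparse_output` in scikit-learn 1.2; use sparse=False on older versions
    ohe=OneHotEncoder(sparse_output=False, handle_unknown='ignore')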
    col_tr=ColumnTransformer([
        ('cat', ohe, cat_cols),
        ('num', 'passthrough', num_cols)
    ])

    res, importances=custom_lgbm_cv(X, y, col_tr)
    return res, importances


In [18]:
res, importances=quick_cv(train)
res

Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[410]	train's auc: 0.871944	test's auc: 0.785543
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[386]	train's auc: 0.868292	test's auc: 0.789011
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[429]	train's auc: 0.875703	test's auc: 0.7827
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[322]	train's auc: 0.857741	test's auc: 0.788504
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[442]	train's auc: 0.876365	test's auc: 0.787083


Unnamed: 0,Train AUC,Test AUC
0,0.8719,0.7855
1,0.8683,0.789
2,0.8757,0.7827
3,0.8577,0.7885
4,0.8764,0.7871
Avg,0.87,0.7866


Well, there are far fewer features, training is faster, and the results are almost the same as with the full data. Great!

## Missing features

Now I'll remove mostly empty features, i.e., features where 70-90% of the values are NaN. I'll reuse the `miss_table` function from the EDA notebook.

In [19]:
mt=miss_table(train)
mt.head()

There are 683/748 columns with missing values
Distribution by dtypes:
float32    683
Name: Dtype, dtype: int64


Unnamed: 0,Count,Percent,Dtype
previous_RATE_INTEREST_PRIVILEGED_min,302902.0,98.5012,float32
loan_card_AMT_PAYMENT_CURRENT_min_min,246451.0,80.1438,float32
loan_card_AMT_DRAWINGS_ATM_CURRENT_max_min,246371.0,80.1178,float32
loan_card_CNT_DRAWINGS_ATM_CURRENT_mean_min,246371.0,80.1178,float32
loan_card_CNT_DRAWINGS_ATM_CURRENT_min_min,246371.0,80.1178,float32


In [20]:
thresh=80
to_drop=mt.loc[mt['Percent']>thresh].index
print(f'{len(to_drop)} features with {thresh}% of NaNs will be removed')

16 features with 80% of NaNs will be removed
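
`miss_table` itself lives in the EDA notebook, so here is a rough sketch of what such a helper could look like, applied to toy data. `toy_miss_table` is a hypothetical stand-in written for this example, not the actual function; it only mimics the `Count`, `Percent` and `Dtype` columns seen in the output above.

```python
import numpy as np
import pandas as pd

def toy_miss_table(df):
    """Hypothetical stand-in for miss_table: NaN count and percentage per column."""
    count = df.isna().sum()
    table = pd.DataFrame({
        'Count': count,
        'Percent': 100 * count / len(df),
        'Dtype': df.dtypes.astype(str),
    })
    # Keep only columns that actually have gaps, emptiest first.
    return table[table['Count'] > 0].sort_values('Percent', ascending=False)

toy = pd.DataFrame({
    'mostly_empty': [np.nan] * 9 + [1.0],   # 90% NaN
    'half_empty': [np.nan, 2.0] * 5,        # 50% NaN
    'full': np.arange(10, dtype=float),     # no NaN
})

toy_mt = toy_miss_table(toy)
toy_thresh = 80
# Same rule as above: drop columns whose NaN percentage exceeds the threshold.
toy_to_drop = toy_mt.loc[toy_mt['Percent'] > toy_thresh].index.tolist()
print(toy_to_drop)  # ['mostly_empty']
```

The comparison is strict, so a column with exactly 80% of NaNs would be kept.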


A few features got dropped; let's see the modelling results.

In [21]:
train=train.drop(to_drop, axis=1)
train.shape

(307511, 732)

In [22]:
res, importances=quick_cv(train)
res

Training until validation scores don't improve for 100 rounds


KeyboardInterrupt: 

Performance remains roughly the same, even though we have fewer and fewer features. Perfect!

## Low Variance

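Features with zero or very low variance can be dropped too. What counts as "very low" is subjective, and variance depends on scale, so I first min-max scale every numeric feature to [0, 1] to make the variances comparable. Let's try it out.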

In [25]:
from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler()
numeric_data=train.iloc[:, 1:].select_dtypes('number').reset_index(drop=True)
numeric_data.replace([np.inf, -np.inf], np.nan, inplace=True)

In [27]:
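for i in numeric_data.columns:
    # replace NaN with the column mean; assign back instead of a chained inplace fillna
    numeric_data[i]=numeric_data[i].fillna(numeric_data[i].mean())
    # scale to [0, 1] so variances are comparable across features
    numeric_data[i]=scaler.fit_transform(numeric_data[i].values.reshape(-1,1)).ravel()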

train_vars=numeric_data.var().sort_values()

In [30]:
to_drop=train_vars[train_vars<0.00005].index
to_drop.shape

(112,)
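
To get a feel for how small a variance of 0.00005 is after scaling, here is a hedged toy example. It uses plain pandas for the min-max step instead of `MinMaxScaler`, and all names (`toy`, `toy_scaled`, `toy_vars`, `toy_to_drop`) are invented for the illustration.

```python
import numpy as np
import pandas as pd

n = 50_000
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    'spread': rng.uniform(0, 1000, size=n),  # varies a lot
    'rare_flag': np.zeros(n),                # a single 1 among 50,000 zeros
    'constant': np.full(n, 7.0),             # zero variance
})
toy.loc[0, 'rare_flag'] = 1.0

# Min-max scale every column to [0, 1]. A constant column has range 0,
# so replace that range with 1 to avoid dividing by zero.
col_range = (toy.max() - toy.min()).replace(0, 1)
toy_scaled = (toy - toy.min()) / col_range

# Same rule as above: drop everything below the variance threshold.
toy_vars = toy_scaled.var().sort_values()
toy_to_drop = toy_vars[toy_vars < 0.00005].index.tolist()
print(toy_to_drop)  # ['constant', 'rare_flag']
```

A binary flag that is set in 1 row out of 50,000 has a scaled variance of about 0.00002, so the threshold mostly catches constants and extremely rare indicators.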

In [32]:
train=train.drop(to_drop, axis=1)
train.shape

In [18]:
res, importances=quick_cv(train)
res

Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[270]	train's auc: 0.838817	test's auc: 0.777574
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[291]	train's auc: 0.842919	test's auc: 0.778338
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[283]	train's auc: 0.842254	test's auc: 0.772549
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[262]	train's auc: 0.837949	test's auc: 0.77896
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[426]	train's auc: 0.863819	test's auc: 0.780061


Unnamed: 0,Train AUC,Test AUC
0,0.8388,0.7776
1,0.8429,0.7783
2,0.8423,0.7725
3,0.8379,0.779
4,0.8638,0.7801
Avg,0.8452,0.7775


In [20]:
# train.to_csv('../data/train_filtered.csv', chunksize=500)

In [21]:
# https://www.kaggle.com/code/ogrellier/feature-selection-with-null-importances/notebook
# https://www.kdnuggets.com/2019/10/feature-selection-beyond-feature-importance.html
# https://datascience.stackexchange.com/questions/12554/does-xgboost-handle-multicollinearity-by-itself
# null importance??

## Feature importance

The last thing I'll try is selecting the most important features according to LightGBM. Yes, feature importances can be used for FS too.

First, I'll define a function that computes feature importances in two ways: `split` (the default), which counts how many times a feature is used in the model, and `gain`, which sums the gains of the splits that use the feature. You can [refer to LightGBM docs](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.feature_importance) for more information. Note that I won't one-hot encode the features; instead, I'll tell LightGBM which ones are categorical, since it [can handle](https://lightgbm.readthedocs.io/en/latest/Features.html#optimal-split-for-categorical-features) categorical features natively.

In [19]:
def feature_imp_cv(features, target, k=5):
    metric_df=pd.DataFrame(columns=['Train AUC', 'Test AUC'])
    feat_importances_gain,feat_importances_split=[], []
    kfold=StratifiedKFold(k)
    
    for f, (tr, te) in enumerate(kfold.split(features, y=target)):
        X_train, y_train=features.iloc[tr, :], target.iloc[tr]
        X_test, y_test=features.iloc[te, :], target.iloc[te]

        weight=np.count_nonzero(y_train==0)/np.count_nonzero(y_train==1)

        params={'num_boost_round': 10000,
                'objective': 'binary',
                'scale_pos_weight': weight,
                'metric': 'auc',
                'learning_rate': 0.05,
                'reg_alpha': 0.1,
                'reg_lambda': 0.1,
                'subsample': 0.8,
                'n_jobs': -1,
                'random_state': 5,
                'verbose': -1}

        dtrain=lgb.Dataset(X_train, label=y_train)
        dval=lgb.Dataset(X_test, label=y_test)

        model=lgb.train(
                params=params,
                train_set=dtrain,
                valid_sets=[dtrain, dval],
                valid_names=['train', 'test'],
                categorical_feature=list(features.select_dtypes('category').columns),
                callbacks=[lgb.early_stopping(100, verbose=-1), lgb.log_evaluation(-1)])
    
        test_score, train_score=model.best_score['test']['auc'], model.best_score['train']['auc']
        metric_df.loc[f]=[train_score, test_score]
        
        feat_importances_gain.append(model.feature_importance(importance_type='gain'))
        feat_importances_split.append(model.feature_importance(importance_type='split'))

    
    feat_importances_gain=np.array(feat_importances_gain).mean(axis=0)
    feat_importances_split=np.array(feat_importances_split).mean(axis=0)
    feat_importances_df=pd.DataFrame({'feature': list(features.columns),
                                        'importance (gain)': feat_importances_gain,
                                        'importance (split)': feat_importances_split,})
    metric_df.loc['Avg']=[metric_df['Train AUC'].mean(), metric_df['Test AUC'].mean()]
    return metric_df, feat_importances_df

In [20]:
import re
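# LightGBM rejects feature names with special JSON characters, so keep only letters, digits and underscores
train=train.rename(columns=lambda x:re.sub('[^A-Za-z0-9_]+', '', x))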

In [21]:
X, y=train.drop(['TARGET', 'SK_ID_CURR'], axis=1), train['TARGET']

res, imps=feature_imp_cv(X, y)
res

Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[202]	train's auc: 0.833789	test's auc: 0.775633
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[303]	train's auc: 0.85302	test's auc: 0.777657
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[208]	train's auc: 0.83577	test's auc: 0.771201
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[352]	train's auc: 0.861746	test's auc: 0.777664
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[412]	train's auc: 0.870351	test's auc: 0.780385


Unnamed: 0,Train AUC,Test AUC
0,0.8338,0.7756
1,0.853,0.7777
2,0.8358,0.7712
3,0.8617,0.7777
4,0.8704,0.7804
Avg,0.8509,0.7765


In [22]:
imps=imps.sort_values(by='importance (gain)', ascending=False).reset_index(drop=True)
imps

Unnamed: 0,feature,importance (gain),importance (split)
0,EXT_SOURCE_3,324262.4357,262.2000
1,EXT_SOURCE_2,260893.6252,247.4000
2,ORGANIZATION_TYPE,184568.2692,1401.0000
3,EXT_SOURCE_1,96844.1151,301.4000
4,client_installments_AMT_PAYMENT_min_sum,63888.0072,194.0000
...,...,...,...
498,previous_CODE_REJECT_REASON_SCOFR_sum,0.0000,0.0000
499,previous_FLAG_LAST_APPL_PER_CONTRACT_N_sum,0.0000,0.0000
500,previous_NAME_GOODS_CATEGORY_AutoAccessories_sum,0.0000,0.0000
501,previous_NAME_CASH_LOAN_PURPOSE_Urgentneeds_sum,0.0000,0.0000


### Zero Importance Features

Features with zero importance can be dropped without much hesitation. Let's test it out.

In [23]:
feat_zero_imp=imps[(imps['importance (split)']==0) & (imps['importance (gain)']==0)]['feature'].values
feat_zero_imp.shape

(66,)

In [24]:
train=train.drop(feat_zero_imp, axis=1)
train.shape

(307511, 439)

In [None]:
res, importances=quick_cv(train)
res

Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[256]	train's auc: 0.83625	test's auc: 0.778258
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[294]	train's auc: 0.843476	test's auc: 0.778333
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[350]	train's auc: 0.853485	test's auc: 0.772599
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[293]	train's auc: 0.84315	test's auc: 0.779272
Training until validation scores don't improve for 100 rounds


### Low Importance Features

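This part is a bit tricky, as you need to pick the threshold. I've picked a very high one: features are kept, in order of decreasing importance, until their cumulative share of the total gain importance exceeds 0.999. Hence, I expect very little information loss. With lower thresholds, the risk of information loss (and the resulting performance loss) rises.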

In [None]:
# cumulative share of total importance, per importance type (imps is sorted by gain, descending)
aa=imps.iloc[:, 1:].cumsum()/imps.iloc[:, 1:].sum()
aa

In [124]:
drop_index=np.argwhere(aa['importance (gain)'].values>0.999)[0][0]
drop_index

376
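
The cutoff logic is easier to follow on a handful of made-up numbers. The sketch below uses a hypothetical five-feature importance table and a lower threshold of 0.95 so that the effect is visible; none of the names or values come from the real data.

```python
import numpy as np
import pandas as pd

toy_imps = pd.DataFrame({
    'feature': ['f1', 'f2', 'f3', 'f4', 'f5'],
    'importance (gain)': [600.0, 300.0, 90.0, 9.0, 1.0],
})
# Sort by importance, most important first, as in the notebook.
toy_imps = toy_imps.sort_values('importance (gain)', ascending=False).reset_index(drop=True)

# Cumulative share of total gain: 0.6, 0.9, 0.99, 0.999, 1.0
gain = toy_imps['importance (gain)']
cum_share = gain.cumsum() / gain.sum()

# Position of the first feature whose cumulative share exceeds the threshold.
toy_thresh = 0.95
first_over = int(np.argmax(cum_share.values > toy_thresh))

# Following the notebook's convention, everything from that position on is dropped.
toy_dropped = toy_imps['feature'].iloc[first_over:].tolist()
print(first_over, toy_dropped)  # 2 ['f3', 'f4', 'f5']
```

Note that with this indexing the feature that crosses the threshold is dropped as well, so the kept features cover slightly less than the threshold (0.9 here). Slicing from `first_over + 1` would keep it instead.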

In [125]:
low_zero_imp=imps.iloc[drop_index:, :]['feature'].values
low_zero_imp

array(['client_credit_AMT_INST_MIN_REGULARITY_min_min',
       'previous_NAME_GOODS_CATEGORY_Medicine_mean',
       'client_cash_NAME_CONTRACT_STATUS_Approved_mean_max',
       'previous_NAME_CLIENT_TYPE_XNA_mean',
       'previous_NAME_SELLER_INDUSTRY_Construction_sum',
       'previous_PRODUCT_COMBINATION_CardStreet_sum',
       'previous_NAME_SELLER_INDUSTRY_Jewelry_mean', 'FLAG_OWN_REALTY',
       'previous_NAME_GOODS_CATEGORY_Mobile_sum',
       'AMT_REQ_CREDIT_BUREAU_HOUR',
       'previous_NAME_SELLER_INDUSTRY_Clothing_sum',
       'previous_NAME_CASH_LOAN_PURPOSE_Buyingahome_mean',
       'previous_NAME_GOODS_CATEGORY_Furniture_sum',
       'client_credit_AMT_DRAWINGS_OTHER_CURRENT_sum_sum',
       'previous_FLAG_LAST_APPL_PER_CONTRACT_N_mean', 'FLAG_DOCUMENT_6',
       'previous_NAME_CASH_LOAN_PURPOSE_Education_mean',
       'previous_RATE_INTEREST_PRIVILEGED_sum',
       'previous_PRODUCT_COMBINATION_CashStreetlow_sum',
       'previous_NAME_SELLER_INDUSTRY_Furniture_sum', 'F

In [126]:
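# zero-importance features were already removed above, so skip labels that are no longer present
train=train.drop(low_zero_imp, axis=1, errors='ignore')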
train.shape

(307511, 378)

In [127]:
res, importances=quick_cv(train)
res

Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[253]	train's auc: 0.835615	test's auc: 0.777766
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[258]	train's auc: 0.837459	test's auc: 0.778217
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[281]	train's auc: 0.841689	test's auc: 0.773277
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[304]	train's auc: 0.84468	test's auc: 0.778509
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[428]	train's auc: 0.864528	test's auc: 0.779141


Unnamed: 0,Train AUC,Test AUC
0,0.8356,0.7778
1,0.8375,0.7782
2,0.8417,0.7733
3,0.8447,0.7785
4,0.8645,0.7791
Avg,0.8448,0.7774
