# Santander Kaggle Competition

## Summary

We're going to learn data science by participating in a machine learning competition.

### Competition Description

From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers don't stick around. What's more, unhappy customers rarely voice their dissatisfaction before leaving.

Santander Bank is asking Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it's too late.

In this competition, you'll work with hundreds of anonymized features to predict if a customer is satisfied or dissatisfied with their banking experience.

### Get the Data
1. Sign up for Kaggle
2. Download the data https://www.kaggle.com/c/santander-customer-satisfaction/data
3. Open the files and place the 'train' and 'test' csv files in the same folder as this notebook.



### Get our libraries

In [95]:
%matplotlib inline
# inline is an option to make our plots appear inline (instead of pop out)

import matplotlib.pyplot as plt  # our standard plotting library; see 'seaborn' as alternative
import numpy as np  # fast arrays made for scientific computing, needed for sklearn
import scipy  # scientific computing tools, needed for sklearn
import pandas as pd  # great for data manipulation, looking at data
import sklearn  # aka sci-kit learn; python machine learning, built on top of numpy, scipy

In [96]:
# Set some display options so we can see everything (since this dataset isn't too large) 
pd.options.display.max_seq_items = 500
pd.options.display.max_columns = 350
pd.options.display.precision = 3

### Glimpse at our data

We want to load our data into a pandas dataframe. You're probably familiar with one-dimensional data like lists/arrays where it's a row of data. A dataframe is two-dimensional in that there's both rows and columns; this is what an Excel sheet looks like

In [97]:
# Let's load our data
DATA_LOCATION = '/Users/williamliu/Desktop/Santander/'
df_train = pd.read_csv(DATA_LOCATION + 'train.csv')

In [98]:
df_train.head()

Unnamed: 0,ID,var3,var15,imp_ent_var16_ult1,imp_op_var39_comer_ult1,imp_op_var39_comer_ult3,imp_op_var40_comer_ult1,imp_op_var40_comer_ult3,imp_op_var40_efect_ult1,imp_op_var40_efect_ult3,imp_op_var40_ult1,imp_op_var41_comer_ult1,imp_op_var41_comer_ult3,imp_op_var41_efect_ult1,imp_op_var41_efect_ult3,imp_op_var41_ult1,imp_op_var39_efect_ult1,imp_op_var39_efect_ult3,imp_op_var39_ult1,imp_sal_var16_ult1,ind_var1_0,ind_var1,ind_var2_0,ind_var2,ind_var5_0,ind_var5,ind_var6_0,ind_var6,ind_var8_0,ind_var8,ind_var12_0,ind_var12,ind_var13_0,ind_var13_corto_0,ind_var13_corto,ind_var13_largo_0,ind_var13_largo,ind_var13_medio_0,ind_var13_medio,ind_var13,ind_var14_0,ind_var14,ind_var17_0,ind_var17,ind_var18_0,ind_var18,ind_var19,ind_var20_0,ind_var20,ind_var24_0,ind_var24,ind_var25_cte,ind_var26_0,ind_var26_cte,ind_var26,ind_var25_0,ind_var25,ind_var27_0,ind_var28_0,ind_var28,ind_var27,ind_var29_0,ind_var29,ind_var30_0,ind_var30,ind_var31_0,ind_var31,ind_var32_cte,ind_var32_0,ind_var32,ind_var33_0,ind_var33,ind_var34_0,ind_var34,ind_var37_cte,ind_var37_0,ind_var37,ind_var39_0,ind_var40_0,ind_var40,ind_var41_0,ind_var41,ind_var39,ind_var44_0,ind_var44,ind_var46_0,ind_var46,num_var1_0,num_var1,num_var4,num_var5_0,num_var5,num_var6_0,num_var6,num_var8_0,num_var8,num_var12_0,num_var12,num_var13_0,num_var13_corto_0,num_var13_corto,num_var13_largo_0,num_var13_largo,num_var13_medio_0,num_var13_medio,num_var13,num_var14_0,num_var14,num_var17_0,num_var17,num_var18_0,num_var18,num_var20_0,num_var20,num_var24_0,num_var24,num_var26_0,num_var26,num_var25_0,num_var25,num_op_var40_hace2,num_op_var40_hace3,num_op_var40_ult1,num_op_var40_ult3,num_op_var41_hace2,num_op_var41_hace3,num_op_var41_ult1,num_op_var41_ult3,num_op_var39_hace2,num_op_var39_hace3,num_op_var39_ult1,num_op_var39_ult3,num_var27_0,num_var28_0,num_var28,num_var27,num_var29_0,num_var29,num_var30_0,num_var30,num_var31_0,num_var31,num_var32_0,num_var32,num_var33_0,num_var33,num_var34_0,num_var34,num_var35,num_var37_med_ult2,num_var37_0,num_var37,num_var39_0,num_var40_0,num_var40,num_var41_0,num_var41,num_var39,num_var42_0,num_var42,num_var44_0,num_var44,num_var46_0,num_var46,saldo_var1,saldo_var5,saldo_var6,saldo_var8,saldo_var12,saldo_var13_corto,saldo_var13_largo,saldo_var13_medio,saldo_var13,saldo_var14,saldo_var17,...,delta_imp_amort_var34_1y3,delta_imp_aport_var13_1y3,delta_imp_aport_var17_1y3,delta_imp_aport_var33_1y3,delta_imp_compra_var44_1y3,delta_imp_reemb_var13_1y3,delta_imp_reemb_var17_1y3,delta_imp_reemb_var33_1y3,delta_imp_trasp_var17_in_1y3,delta_imp_trasp_var17_out_1y3,delta_imp_trasp_var33_in_1y3,delta_imp_trasp_var33_out_1y3,delta_imp_venta_var44_1y3,delta_num_aport_var13_1y3,delta_num_aport_var17_1y3,delta_num_aport_var33_1y3,delta_num_compra_var44_1y3,delta_num_reemb_var13_1y3,delta_num_reemb_var17_1y3,delta_num_reemb_var33_1y3,delta_num_trasp_var17_in_1y3,delta_num_trasp_var17_out_1y3,delta_num_trasp_var33_in_1y3,delta_num_trasp_var33_out_1y3,delta_num_venta_var44_1y3,imp_amort_var18_hace3,imp_amort_var18_ult1,imp_amort_var34_hace3,imp_amort_var34_ult1,imp_aport_var13_hace3,imp_aport_var13_ult1,imp_aport_var17_hace3,imp_aport_var17_ult1,imp_aport_var33_hace3,imp_aport_var33_ult1,imp_var7_emit_ult1,imp_var7_recib_ult1,imp_compra_var44_hace3,imp_compra_var44_ult1,imp_reemb_var13_hace3,imp_reemb_var13_ult1,imp_reemb_var17_hace3,imp_reemb_var17_ult1,imp_reemb_var33_hace3,imp_reemb_var33_ult1,imp_var43_emit_ult1,imp_trans_var37_ult1,imp_trasp_var17_in_hace3,imp_trasp_var17_in_ult1,imp_trasp_var17_out_hace3,imp_trasp_var17_out_ult1,imp_trasp_var33_in_hace3,imp_trasp_var33_in_ult1,imp_trasp_var33_out_hace3,imp_trasp_var33_out_ult1,imp_venta_var44_hace3,imp_venta_var44_ult1,ind_var7_emit_ult1,ind_var7_recib_ult1,ind_var10_ult1,ind_var10cte_ult1,ind_var9_cte_ult1,ind_var9_ult1,ind_var43_emit_ult1,ind_var43_recib_ult1,var21,num_var2_0_ult1,num_var2_ult1,num_aport_var13_hace3,num_aport_var13_ult1,num_aport_var17_hace3,num_aport_var17_ult1,num_aport_var33_hace3,num_aport_var33_ult1,num_var7_emit_ult1,num_var7_recib_ult1,num_compra_var44_hace3,num_compra_var44_ult1,num_ent_var16_ult1,num_var22_hace2,num_var22_hace3,num_var22_ult1,num_var22_ult3,num_med_var22_ult3,num_med_var45_ult3,num_meses_var5_ult3,num_meses_var8_ult3,num_meses_var12_ult3,num_meses_var13_corto_ult3,num_meses_var13_largo_ult3,num_meses_var13_medio_ult3,num_meses_var17_ult3,num_meses_var29_ult3,num_meses_var33_ult3,num_meses_var39_vig_ult3,num_meses_var44_ult3,num_op_var39_comer_ult1,num_op_var39_comer_ult3,num_op_var40_comer_ult1,num_op_var40_comer_ult3,num_op_var40_efect_ult1,num_op_var40_efect_ult3,num_op_var41_comer_ult1,num_op_var41_comer_ult3,num_op_var41_efect_ult1,num_op_var41_efect_ult3,num_op_var39_efect_ult1,num_op_var39_efect_ult3,num_reemb_var13_hace3,num_reemb_var13_ult1,num_reemb_var17_hace3,num_reemb_var17_ult1,num_reemb_var33_hace3,num_reemb_var33_ult1,num_sal_var16_ult1,num_var43_emit_ult1,num_var43_recib_ult1,num_trasp_var11_ult1,num_trasp_var17_in_hace3,num_trasp_var17_in_ult1,num_trasp_var17_out_hace3,num_trasp_var17_out_ult1,num_trasp_var33_in_hace3,num_trasp_var33_in_ult1,num_trasp_var33_out_hace3,num_trasp_var33_out_ult1,num_venta_var44_hace3,num_venta_var44_ult1,num_var45_hace2,num_var45_hace3,num_var45_ult1,num_var45_ult3,saldo_var2_ult1,saldo_medio_var5_hace2,saldo_medio_var5_hace3,saldo_medio_var5_ult1,saldo_medio_var5_ult3,saldo_medio_var8_hace2,saldo_medio_var8_hace3,saldo_medio_var8_ult1,saldo_medio_var8_ult3,saldo_medio_var12_hace2,saldo_medio_var12_hace3,saldo_medio_var12_ult1,saldo_medio_var12_ult3,saldo_medio_var13_corto_hace2,saldo_medio_var13_corto_hace3,saldo_medio_var13_corto_ult1,saldo_medio_var13_corto_ult3,saldo_medio_var13_largo_hace2,saldo_medio_var13_largo_hace3,saldo_medio_var13_largo_ult1,saldo_medio_var13_largo_ult3,saldo_medio_var13_medio_hace2,saldo_medio_var13_medio_hace3,saldo_medio_var13_medio_ult1,saldo_medio_var13_medio_ult3,saldo_medio_var17_hace2,saldo_medio_var17_hace3,saldo_medio_var17_ult1,saldo_medio_var17_ult3,saldo_medio_var29_hace2,saldo_medio_var29_hace3,saldo_medio_var29_ult1,saldo_medio_var29_ult3,saldo_medio_var33_hace2,saldo_medio_var33_hace3,saldo_medio_var33_ult1,saldo_medio_var33_ult3,saldo_medio_var44_hace2,saldo_medio_var44_hace3,saldo_medio_var44_ult1,saldo_medio_var44_ult3,var38,TARGET
0,1,2,23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,3,0,0,3,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0.0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,39205.17,0
1,3,2,34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,3,0,0,0,0,0,0,0,3,3,3,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,3,0,0,0,0,0,0,0,0,3,0,0,0,3,0,0,3,0,0,3,0,0,0,0,0,0,0.0,0,0,0,300,0,0,300,0,0,...,0,-1,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,3,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,88.89,0.0,0.0,0,0,0,0,0,0,0.0,0.0,300,122.22,300,240.75,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,49278.03,0
2,4,2,23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,0,0,0,0,0,0,0,0,3,0,0,0,3,0,0,3,0,0,3,3,0,0,0,0,0,3.0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.0,0.18,3.0,2.07,0,0,0,0,0,0,0.0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,67333.77,0
3,8,2,37,0,195,195,0,0,0,0,0,195,195,0,0,195,0,0,195,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,0,0,0,0,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,9,0,0,9,9,0,0,0,0,0,0,3,3,0,0,0,0,0,0,0,0,9,6,6,6,3,0,0,3,0,0,3,3,0,0,0,0,0,70.62,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,3,0,15,2,0,0,0,0,0,0,0,0,1,0,9,9,0,0,0,0,9,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,27,3,18,48,0,186.09,0.0,91.56,138.84,0,0,0,0,0,0,0.0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,64007.97,0
4,10,2,39,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,3,0,0,0,0,0,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,3,0,0,0,0,0,0,0,0,3,0,0,0,3,0,0,3,0,0,6,3,0,0,0,0,0,0.0,0,0,135003,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135003,270003,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,6,9,3,0,3,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,6,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.0,0.3,40501.08,13501.47,0,0,0,0,0,0,85501.89,85501.89,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,117310.98,0


In [99]:
np.unique(df_train.dtypes)  # Interesting, we only have numbers (int, float) and no categories (e.g. no text, categoricals)

array([dtype('int64'), dtype('float64')], dtype=object)

In [100]:
df_train.describe()

Unnamed: 0,ID,var3,var15,imp_ent_var16_ult1,imp_op_var39_comer_ult1,imp_op_var39_comer_ult3,imp_op_var40_comer_ult1,imp_op_var40_comer_ult3,imp_op_var40_efect_ult1,imp_op_var40_efect_ult3,imp_op_var40_ult1,imp_op_var41_comer_ult1,imp_op_var41_comer_ult3,imp_op_var41_efect_ult1,imp_op_var41_efect_ult3,imp_op_var41_ult1,imp_op_var39_efect_ult1,imp_op_var39_efect_ult3,imp_op_var39_ult1,imp_sal_var16_ult1,ind_var1_0,ind_var1,ind_var2_0,ind_var2,ind_var5_0,ind_var5,ind_var6_0,ind_var6,ind_var8_0,ind_var8,ind_var12_0,ind_var12,ind_var13_0,ind_var13_corto_0,ind_var13_corto,ind_var13_largo_0,ind_var13_largo,ind_var13_medio_0,ind_var13_medio,ind_var13,ind_var14_0,ind_var14,ind_var17_0,ind_var17,ind_var18_0,ind_var18,ind_var19,ind_var20_0,ind_var20,ind_var24_0,ind_var24,ind_var25_cte,ind_var26_0,ind_var26_cte,ind_var26,ind_var25_0,ind_var25,ind_var27_0,ind_var28_0,ind_var28,ind_var27,ind_var29_0,ind_var29,ind_var30_0,ind_var30,ind_var31_0,ind_var31,ind_var32_cte,ind_var32_0,ind_var32,ind_var33_0,ind_var33,ind_var34_0,ind_var34,ind_var37_cte,ind_var37_0,ind_var37,ind_var39_0,ind_var40_0,ind_var40,ind_var41_0,ind_var41,ind_var39,ind_var44_0,ind_var44,ind_var46_0,ind_var46,num_var1_0,num_var1,num_var4,num_var5_0,num_var5,num_var6_0,num_var6,num_var8_0,num_var8,num_var12_0,num_var12,num_var13_0,num_var13_corto_0,num_var13_corto,num_var13_largo_0,num_var13_largo,num_var13_medio_0,num_var13_medio,num_var13,num_var14_0,num_var14,num_var17_0,num_var17,num_var18_0,num_var18,num_var20_0,num_var20,num_var24_0,num_var24,num_var26_0,num_var26,num_var25_0,num_var25,num_op_var40_hace2,num_op_var40_hace3,num_op_var40_ult1,num_op_var40_ult3,num_op_var41_hace2,num_op_var41_hace3,num_op_var41_ult1,num_op_var41_ult3,num_op_var39_hace2,num_op_var39_hace3,num_op_var39_ult1,num_op_var39_ult3,num_var27_0,num_var28_0,num_var28,num_var27,num_var29_0,num_var29,num_var30_0,num_var30,num_var31_0,num_var31,num_var32_0,num_var32,num_var33_0,num_var33,num_var34_0,num_var34,num_var35,num_var37_med_ult2,num_var37_0,num_var37,num_var39_0,num_var40_0,num_var40,num_var41_0,num_var41,num_var39,num_var42_0,num_var42,num_var44_0,num_var44,num_var46_0,num_var46,saldo_var1,saldo_var5,saldo_var6,saldo_var8,saldo_var12,saldo_var13_corto,saldo_var13_largo,saldo_var13_medio,saldo_var13,saldo_var14,saldo_var17,...,delta_imp_amort_var34_1y3,delta_imp_aport_var13_1y3,delta_imp_aport_var17_1y3,delta_imp_aport_var33_1y3,delta_imp_compra_var44_1y3,delta_imp_reemb_var13_1y3,delta_imp_reemb_var17_1y3,delta_imp_reemb_var33_1y3,delta_imp_trasp_var17_in_1y3,delta_imp_trasp_var17_out_1y3,delta_imp_trasp_var33_in_1y3,delta_imp_trasp_var33_out_1y3,delta_imp_venta_var44_1y3,delta_num_aport_var13_1y3,delta_num_aport_var17_1y3,delta_num_aport_var33_1y3,delta_num_compra_var44_1y3,delta_num_reemb_var13_1y3,delta_num_reemb_var17_1y3,delta_num_reemb_var33_1y3,delta_num_trasp_var17_in_1y3,delta_num_trasp_var17_out_1y3,delta_num_trasp_var33_in_1y3,delta_num_trasp_var33_out_1y3,delta_num_venta_var44_1y3,imp_amort_var18_hace3,imp_amort_var18_ult1,imp_amort_var34_hace3,imp_amort_var34_ult1,imp_aport_var13_hace3,imp_aport_var13_ult1,imp_aport_var17_hace3,imp_aport_var17_ult1,imp_aport_var33_hace3,imp_aport_var33_ult1,imp_var7_emit_ult1,imp_var7_recib_ult1,imp_compra_var44_hace3,imp_compra_var44_ult1,imp_reemb_var13_hace3,imp_reemb_var13_ult1,imp_reemb_var17_hace3,imp_reemb_var17_ult1,imp_reemb_var33_hace3,imp_reemb_var33_ult1,imp_var43_emit_ult1,imp_trans_var37_ult1,imp_trasp_var17_in_hace3,imp_trasp_var17_in_ult1,imp_trasp_var17_out_hace3,imp_trasp_var17_out_ult1,imp_trasp_var33_in_hace3,imp_trasp_var33_in_ult1,imp_trasp_var33_out_hace3,imp_trasp_var33_out_ult1,imp_venta_var44_hace3,imp_venta_var44_ult1,ind_var7_emit_ult1,ind_var7_recib_ult1,ind_var10_ult1,ind_var10cte_ult1,ind_var9_cte_ult1,ind_var9_ult1,ind_var43_emit_ult1,ind_var43_recib_ult1,var21,num_var2_0_ult1,num_var2_ult1,num_aport_var13_hace3,num_aport_var13_ult1,num_aport_var17_hace3,num_aport_var17_ult1,num_aport_var33_hace3,num_aport_var33_ult1,num_var7_emit_ult1,num_var7_recib_ult1,num_compra_var44_hace3,num_compra_var44_ult1,num_ent_var16_ult1,num_var22_hace2,num_var22_hace3,num_var22_ult1,num_var22_ult3,num_med_var22_ult3,num_med_var45_ult3,num_meses_var5_ult3,num_meses_var8_ult3,num_meses_var12_ult3,num_meses_var13_corto_ult3,num_meses_var13_largo_ult3,num_meses_var13_medio_ult3,num_meses_var17_ult3,num_meses_var29_ult3,num_meses_var33_ult3,num_meses_var39_vig_ult3,num_meses_var44_ult3,num_op_var39_comer_ult1,num_op_var39_comer_ult3,num_op_var40_comer_ult1,num_op_var40_comer_ult3,num_op_var40_efect_ult1,num_op_var40_efect_ult3,num_op_var41_comer_ult1,num_op_var41_comer_ult3,num_op_var41_efect_ult1,num_op_var41_efect_ult3,num_op_var39_efect_ult1,num_op_var39_efect_ult3,num_reemb_var13_hace3,num_reemb_var13_ult1,num_reemb_var17_hace3,num_reemb_var17_ult1,num_reemb_var33_hace3,num_reemb_var33_ult1,num_sal_var16_ult1,num_var43_emit_ult1,num_var43_recib_ult1,num_trasp_var11_ult1,num_trasp_var17_in_hace3,num_trasp_var17_in_ult1,num_trasp_var17_out_hace3,num_trasp_var17_out_ult1,num_trasp_var33_in_hace3,num_trasp_var33_in_ult1,num_trasp_var33_out_hace3,num_trasp_var33_out_ult1,num_venta_var44_hace3,num_venta_var44_ult1,num_var45_hace2,num_var45_hace3,num_var45_ult1,num_var45_ult3,saldo_var2_ult1,saldo_medio_var5_hace2,saldo_medio_var5_hace3,saldo_medio_var5_ult1,saldo_medio_var5_ult3,saldo_medio_var8_hace2,saldo_medio_var8_hace3,saldo_medio_var8_ult1,saldo_medio_var8_ult3,saldo_medio_var12_hace2,saldo_medio_var12_hace3,saldo_medio_var12_ult1,saldo_medio_var12_ult3,saldo_medio_var13_corto_hace2,saldo_medio_var13_corto_hace3,saldo_medio_var13_corto_ult1,saldo_medio_var13_corto_ult3,saldo_medio_var13_largo_hace2,saldo_medio_var13_largo_hace3,saldo_medio_var13_largo_ult1,saldo_medio_var13_largo_ult3,saldo_medio_var13_medio_hace2,saldo_medio_var13_medio_hace3,saldo_medio_var13_medio_ult1,saldo_medio_var13_medio_ult3,saldo_medio_var17_hace2,saldo_medio_var17_hace3,saldo_medio_var17_ult1,saldo_medio_var17_ult3,saldo_medio_var29_hace2,saldo_medio_var29_hace3,saldo_medio_var29_ult1,saldo_medio_var29_ult3,saldo_medio_var33_hace2,saldo_medio_var33_hace3,saldo_medio_var33_ult1,saldo_medio_var33_ult3,saldo_medio_var44_hace2,saldo_medio_var44_hace3,saldo_medio_var44_ult1,saldo_medio_var44_ult3,var38,TARGET
count,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76020,76020,76020.0,76020.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76000.0,76020.0,76020.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020,76020,76020,76000.0,76000.0,76020.0,76020.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76020.0,76020,76000.0,76000.0,76000.0,76020,76020,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76000.0,76020.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020,76020,76020,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76000.0,76000.0,76020,76020,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,...,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76000.0,76020,76020.0,76020,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020,76020.0,76020.0,76000.0,76000.0,76000.0,76000.0,76000.0,76020.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76000.0,76000.0,76020.0,76000.0,76020.0,76020.0,76020.0,76020.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76000.0,76000.0,76000.0,76020,76000.0,76000.0,76020.0,76020.0,76020.0,76000.0,76000.0,76020,76000.0,76000.0,76000.0,76020,76000.0,76000.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76000.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0,76020.0
mean,75964.05,-1523.2,33.21,86.21,72.36,119.53,3.56,6.47,0.41,0.57,3.16,68.8,113.06,68.21,113.23,137.24,68.62,113.79,140.4,5.48,0.01,0.00376,0,0,0.96,0.66,0.000105,2.63e-05,0.03,0.03,0.07,0.05,0.05,0.04,0.04,0.01,0.01,2.63e-05,2.63e-05,0.05,0.02,0.0053,0.0018,0.00145,2.63e-05,2.63e-05,0.0042,0.00363,0.0027,0.04,0.04,0.03,0.02,0.03,0.02,0.02,0.02,0,0,0,0,0.000105,2.63e-05,1.0,0.73,0.00428,0.00367,0.00121,0.00108,0.00108,0.00075,0.000631,2.63e-05,2.63e-05,0.07,0.07,0.07,0.88,0.01,0.00372,0.88,0,0.00372,0.00188,0.0017,0,0,0.03,0.01,1.08,2.89,2.0,0.000316,7.89e-05,0.1,0.09,0.21,0.14,0.17,0.13,0.12,0.04,0.04,7.89e-05,7.89e-05,0.16,0.07,0.02,0.01,0.00888,7.89e-05,7.89e-05,0.01,0.00809,0.13,0.11,0.09,0.09,0.09,0.09,0.02,0.00103,0.06,0.08,1.6,0.09,2.86,4.55,1.62,0.09,2.92,4.63,0,0,0,0,0.000316,7.89e-05,3.37,2.38,0.02,0.02,0.00422,0.00422,0.00257,0.00209,7.89e-05,7.89e-05,3.3,0.26,0.42,0.42,2.72,0.03,0.01,2.7,0,0.01,3.2,2.22,0.00568,0.00509,0,0,48.45,1028.47,0.41,141.23,6021.62,4993.75,1493.68,0.51,6487.95,69.1,183.41,...,263000.0,48700000.0,5130000.0,132000.0,9210000.0,5000000.0,2630000.0,132000.0,526000.0,526000.0,658000.0,132000.0,5520000.0,48700000.0,5130000.0,132000.0,9210000.0,5000000.0,2630000.0,132000.0,526000.0,526000.0,658000.0,132000.0,5520000.0,0,0.23,0,0.02,2823.95,619.59,98.79,31.11,2.99,0.05,2.72,127.7,13.96,116.78,0,46.18,0.16,12.57,0,0.02,854.12,1932.95,1.87,2.51,0,1.91,2.79,0.31,0,0.04,3.79,81.43,3.95e-05,0.0027,0.08,0.09,0.1,0.09,0.07,0.13,32.55,0,0,0.08,0.02,0.00154,0.00339,0.00107,0.000316,0.000118,0.01,0.00185,0.00754,0.19,1.3,1.18,0.56,3.04,0.64,4.02,1.98,0.05,0.1,0.1,0.02,5.26e-05,0.00296,0.000105,0.00151,1.59,0.00358,2.19,3.61,0.07,0.14,0.00249,0.00367,2.12,3.46,0.72,1.21,0.72,1.22,0,0.0015,3.95e-05,0.00118,0,3.95e-05,0.00493,0.39,0.81,0.12,0.000118,0.000158,0,0.000158,0.000237,0.000237,0,3.95e-05,0.000158,0.00442,5.39,3.89,4.36,13.65,0,1579.14,891.37,1077.26,1048.86,68.28,9.51,124.62,110.03,3997.02,613.53,5703.01,4401.0,3639.42,556.18,4852.26,3857.85,771.23,162.17,956.95,750.96,0.18,0,0.51,0.34,91.17,36.46,131.03,109.22,0.21,0.00191,0.25,0.19,7.94,1.37,12.22,8.78,31.51,1.86,76.03,56.61,117235.81,0.04
std,43781.95,39033.46,12.96,1614.76,339.32,546.27,93.16,153.74,30.6,36.51,95.27,319.61,512.15,531.9,950.09,697.71,535.47,953.58,712.77,465.39,0.11,0.0612,0,0,0.2,0.47,0.0103,0.00513,0.18,0.17,0.25,0.21,0.22,0.2,0.2,0.1,0.0995,0.00513,0.00513,0.22,0.15,0.0726,0.0424,0.038,0.00513,0.00513,0.0646,0.0601,0.0519,0.2,0.19,0.16,0.16,0.16,0.16,0.15,0.15,0,0,0,0,0.0103,0.00513,0.07,0.44,0.0652,0.0605,0.0348,0.0328,0.0328,0.0274,0.0251,0.00513,0.00513,0.26,0.25,0.25,0.32,0.11,0.0609,0.33,0,0.0609,0.0433,0.0412,0,0,0.32,0.18,0.91,0.66,1.43,0.0308,0.0154,0.53,0.5,0.89,0.64,0.75,0.62,0.6,0.4,0.38,0.0154,0.0154,0.71,0.61,0.22,0.36,0.289,0.0154,0.0154,0.18,0.156,0.61,0.57,0.63,0.63,0.61,0.61,0.9,0.181,1.83,2.56,7.32,1.22,10.89,16.89,7.39,1.23,11.11,17.18,0,0,0,0,0.0308,0.0154,1.34,1.64,0.41,0.33,0.14,0.14,0.101,0.0863,0.0154,0.0154,2.87,1.65,2.24,2.24,1.14,0.32,0.18,1.11,0,0.18,0.94,1.5,0.131,0.123,0,0,10937.47,9852.14,83.1,2515.66,48144.71,32619.13,20016.49,113.6,38416.75,2839.62,22698.13,...,51300000.0,696000000.0,226000000.0,36300000.0,303000000.0,224000000.0,162000000.0,36300000.0,72500000.0,72500000.0,81100000.0,36300000.0,235000000.0,696000000.0,226000000.0,36300000.0,303000000.0,224000000.0,162000000.0,36300000.0,72500000.0,72500000.0,81100000.0,36300000.0,235000000.0,0,57.32,0,4.1,25334.47,11253.0,22120.72,2457.09,226.86,6.02,554.25,6369.0,1151.51,13620.44,0,2859.74,43.62,1093.13,0,4.35,14255.89,25355.72,388.25,508.97,0,357.41,323.81,53.41,0,10.88,811.98,11282.41,0.00628,0.0519,0.27,0.29,0.3,0.28,0.25,0.34,393.83,0,0,0.55,0.29,0.105,0.174,0.0697,0.0344,0.0188,0.23,0.0991,0.315,1.0,3.45,3.26,2.1,6.21,1.84,10.93,1.3,0.33,0.49,0.49,0.21,0.0103,0.0776,0.0136,0.0625,0.72,0.0882,9.13,14.92,2.09,4.29,0.147,0.202,8.8,14.14,3.21,5.16,3.23,5.18,0,0.0671,0.0109,0.0985,0,0.0109,0.155,2.22,3.56,1.17,0.0243,0.0218,0,0.0218,0.0267,0.0308,0,0.0109,0.0267,0.264,14.5,10.42,14.41,33.3,0,12148.45,9888.6,9614.91,8189.95,1733.84,519.39,2205.25,1935.31,37773.14,9292.75,46202.54,35507.18,26359.17,7182.64,31886.62,25572.25,13082.16,4698.87,16006.98,12422.52,34.63,0,113.6,73.38,15392.48,8612.4,14956.53,13082.16,41.82,0.527,52.08,31.88,455.89,113.96,783.21,538.44,2013.13,147.79,4040.34,2852.58,182664.6,0.19
min,1.0,-999999.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,-0.9,-2895.72,0.0,-4942.26,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,-1.0,-1.0,-1.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,-128.37,-8.04,-922.38,-476.07,-287.67,0.0,-3401.34,-1844.52,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,-0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5163.75,0.0
25%,38104.75,2.0,23.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,3.0,0,0.0,3.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,67870.61,0.0
50%,76043.0,2.0,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0,0.0,0.0,0.0,0,0,0.0,0.0,1.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,0.0,0.0,3.0,0,0.0,3.0,3.0,0.0,0.0,0,0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,3.0,0.99,3.0,2.73,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,106409.16,0.0
75%,113748.75,2.0,40.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0,0.0,0.0,0.0,0,0,0.0,0.0,1.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,0.0,0.0,3.0,0,0.0,3.0,3.0,0.0,0.0,0,0,0.0,90.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,3.0,3.0,3.0,12.0,0,90.0,12.22,90.0,83.79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,118756.25,0.0
max,151838.0,238.0,105.0,210000.0,12888.03,21024.81,8237.82,11073.57,6600.0,6600.0,8237.82,12888.03,16566.81,45990.0,131100.0,47598.09,45990.0,131100.0,47598.09,105000.0,1.0,1.0,0,0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0,1.0,1.0,1.0,0,0,6.0,6.0,7.0,15.0,15.0,3.0,3.0,6.0,3.0,111.0,15.0,18.0,6.0,6.0,18.0,18.0,3.0,3.0,18.0,111.0,12.0,36.0,27.0,3.0,3.0,3.0,3.0,9.0,6.0,33.0,33.0,33.0,33.0,117.0,48.0,234.0,351.0,249.0,81.0,468.0,468.0,249.0,81.0,468.0,468.0,0,0,0,0,3.0,3.0,114.0,33.0,36.0,27.0,12.0,12.0,12.0,6.0,3.0,3.0,36.0,105.0,114.0,114.0,33.0,6.0,3.0,33.0,0,3.0,114.0,18.0,6.0,3.0,0,0,3000000.0,619329.15,19531.8,240045.0,3008077.32,450000.0,1500000.0,30000.0,1500000.0,450000.0,6119500.14,...,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,10000000000.0,0,15691.8,0,1096.02,840000.0,450000.0,6083691.87,432457.32,36000.0,1260.0,145384.92,1039260.0,210001.35,3410058.66,0,450000.0,12027.15,182132.97,0,1200.0,1155003.0,2310003.0,96781.44,133730.58,0,69622.29,49581.27,13207.32,0,3000.0,209834.4,2754476.46,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,30000.0,0,0,24.0,30.0,12.0,21.0,12.0,6.0,3.0,24.0,9.0,39.0,60.0,123.0,108.0,96.0,234.0,78.0,267.0,3.0,3.0,3.0,3.0,3.0,2.0,3.0,2.0,3.0,3.0,3.0,438.0,600.0,210.0,582.0,24.0,24.0,438.0,438.0,90.0,156.0,90.0,156.0,0,3.0,3.0,21.0,0,3.0,15.0,180.0,264.0,93.0,6.0,3.0,0,3.0,3.0,6.0,0,3.0,6.0,39.0,342.0,339.0,510.0,801.0,0,812137.26,1542339.36,601428.6,544365.57,231351.99,77586.21,228031.8,177582.0,3000538.14,668335.32,3004185.6,2272859.43,450000.0,304838.7,450000.0,450000.0,840000.0,534000.0,1500000.0,1034482.74,7741.95,0,30000.0,18870.99,4210084.23,2368558.95,3998687.46,3525776.88,10430.01,145.0,13793.67,7331.34,50003.88,20385.72,138831.63,91778.73,438329.22,24650.01,681462.9,397884.3,22034738.76,1.0


So this summary tells us quite a bit about the numerical part of the data. Let's break it down:

* We're training our dataset to __TARGET__, which can be 0 or 1. When we look at this column, we see that the 25%, 50%, and 75% are all 0's. This means that our dataset is imbalanced (with a bit more people being 0's than 1's.
* We take that above idea of imbalanced data and look at the rest of our variables (e.g. __imp_ent_var16_ult1__). Most of the data is 0's, but then we get a max value of 210,000.
* I think we'll need to clean the data. For example, column __var3__ has a min of -999999 and a max of 238. My guess is that -999999 stands for a state instead of there being many unique values from -999999 to 0.
* Yeah, we'll need to do some data cleaning; some numeric variables (e.g. __ind_var27_0__, __ind_var_28_0__) are just 0 for min and max. We can drop these since they're not telling us anything. We can look at the standard deviation and if it's 0, then we can get rid of because there's no change (e.g. min=70 with max=70 tells us no useful info)
* It looks like some columns are duplicates. Without even needing to sort, we can see that some columns are duplicates (e.g. __num_var13_medio_0__, __num_var13_medio__ and __ind_var34_0__, __ind_var_34__) where they have have the exact same min, max, standard deviation, mean. This will cause multicollinearity, a fancy way of saying don't have a variable that can linearly predict another variable (e.g. I'm alive and I'm dead, we don't need both since one will tell us the other).

TLDR, we need to:

* [ ] Drop duplicate columns
* [ ] Drop values that are not telling us anything (i.e. standard deviation of 0)
* [ ] Possibly clean data (e.g. -999,999)
* [ ] Compensate for unbalanced dataset (lot more TARGET of 0's than 1's)


In [101]:
# Data Cleaning - Drop values that are not telling us anything (i.e. standard deviation of 1)
print "Before Data Cleaning, data shape is: ", df_train.shape

for col in df_train:
    if df_train[col].std()==0:
        print "Dropping: ", col
        del df_train[col]

Before Data Cleaning, data shape is:  (76020, 371)
Dropping:  ind_var2_0
Dropping:  ind_var2
Dropping:  ind_var27_0
Dropping:  ind_var28_0
Dropping:  ind_var28
Dropping:  ind_var27
Dropping:  ind_var41
Dropping:  ind_var46_0
Dropping:  ind_var46
Dropping:  num_var27_0
Dropping:  num_var28_0
Dropping:  num_var28
Dropping:  num_var27
Dropping:  num_var41
Dropping:  num_var46_0
Dropping:  num_var46
Dropping:  saldo_var28
Dropping:  saldo_var27
Dropping:  saldo_var41
Dropping:  saldo_var46
Dropping:  imp_amort_var18_hace3
Dropping:  imp_amort_var34_hace3
Dropping:  imp_reemb_var13_hace3
Dropping:  imp_reemb_var33_hace3
Dropping:  imp_trasp_var17_out_hace3
Dropping:  imp_trasp_var33_out_hace3
Dropping:  num_var2_0_ult1
Dropping:  num_var2_ult1
Dropping:  num_reemb_var13_hace3
Dropping:  num_reemb_var33_hace3
Dropping:  num_trasp_var17_out_hace3
Dropping:  num_trasp_var33_out_hace3
Dropping:  saldo_var2_ult1
Dropping:  saldo_medio_var13_medio_hace3


In [102]:
print "After data cleaning, data shape is ", df_train.shape

After data cleaning, data shape is  (76020, 337)


In [110]:
# Data Cleaning - Look at duplicate columns
print df_train['num_var13_medio_0'].unique()
print df_train['num_var13_medio'].unique()
print df_train['num_var13_medio_0'].sum()
print df_train['num_var13_medio'].sum()

[0 3]
[0 3]
6
6


In [None]:
# Data Cleaning - Drop duplicate columns
df_train.T.drop_duplicates().T

In [None]:
print df_train.shape()

In [51]:
from collections import OrderedDict

df_train_unique = []
for col in df_train:
    # for col in df_train:
    #     unique_count = len(df_train[col].unique())
    #     df_train_unique[col] = unique_count


In [44]:
df_train_unique

{'ID': 76020,
 'TARGET': 2,
 'delta_imp_amort_var18_1y3': 2,
 'delta_imp_amort_var34_1y3': 2,
 'delta_imp_aport_var13_1y3': 27,
 'delta_imp_aport_var17_1y3': 7,
 'delta_imp_aport_var33_1y3': 9,
 'delta_imp_compra_var44_1y3': 17,
 'delta_imp_reemb_var13_1y3': 2,
 'delta_imp_reemb_var17_1y3': 3,
 'delta_imp_reemb_var33_1y3': 2,
 'delta_imp_trasp_var17_in_1y3': 3,
 'delta_imp_trasp_var17_out_1y3': 2,
 'delta_imp_trasp_var33_in_1y3': 3,
 'delta_imp_trasp_var33_out_1y3': 2,
 'delta_imp_venta_var44_1y3': 5,
 'delta_num_aport_var13_1y3': 6,
 'delta_num_aport_var17_1y3': 6,
 'delta_num_aport_var33_1y3': 4,
 'delta_num_compra_var44_1y3': 9,
 'delta_num_reemb_var13_1y3': 2,
 'delta_num_reemb_var17_1y3': 3,
 'delta_num_reemb_var33_1y3': 2,
 'delta_num_trasp_var17_in_1y3': 3,
 'delta_num_trasp_var17_out_1y3': 2,
 'delta_num_trasp_var33_in_1y3': 3,
 'delta_num_trasp_var33_out_1y3': 2,
 'delta_num_venta_var44_1y3': 5,
 'imp_amort_var18_hace3': 1,
 'imp_amort_var18_ult1': 3,
 'imp_amort_var34_hace3':

### Data Cleaning


In [7]:
df_train.columns

Index([u'ID', u'var3', u'var15', u'imp_ent_var16_ult1',
       u'imp_op_var39_comer_ult1', u'imp_op_var39_comer_ult3',
       u'imp_op_var40_comer_ult1', u'imp_op_var40_comer_ult3',
       u'imp_op_var40_efect_ult1', u'imp_op_var40_efect_ult3',
       u'imp_op_var40_ult1', u'imp_op_var41_comer_ult1',
       u'imp_op_var41_comer_ult3', u'imp_op_var41_efect_ult1',
       u'imp_op_var41_efect_ult3', u'imp_op_var41_ult1',
       u'imp_op_var39_efect_ult1', u'imp_op_var39_efect_ult3',
       u'imp_op_var39_ult1', u'imp_sal_var16_ult1', u'ind_var1_0', u'ind_var1',
       u'ind_var2_0', u'ind_var2', u'ind_var5_0', u'ind_var5', u'ind_var6_0',
       u'ind_var6', u'ind_var8_0', u'ind_var8', u'ind_var12_0', u'ind_var12',
       u'ind_var13_0', u'ind_var13_corto_0', u'ind_var13_corto',
       u'ind_var13_largo_0', u'ind_var13_largo', u'ind_var13_medio_0',
       u'ind_var13_medio', u'ind_var13', u'ind_var14_0', u'ind_var14',
       u'ind_var17_0', u'ind_var17', u'ind_var18_0', u'ind_var18',
     

In [45]:
#df_train.select_dtypes(include=['int64'])

### Algorithm Approach

With an imbalanced dataset, we should try doing the following:

* Random oversampling (add instances) to the underrepresented TARGET type (1)
* Random undersampling (remove instances) from the overrepresented TARGET type (0)
* change our algorithm to overweight the underrepresented TARGET type (1)
* Definitely use AUC (Area Under the ROC Curve) to get the real performance (or else you'll get misleading results)
* We should think about algorithms like AdaBoost, Decision Trees, or some type of ensemble 

### TODO: Plot the Results

Confusion Matrix will show the positives, negatives, false positives

## Split the Data

I mentioned that we should only glimpse at our data because we don't want to cheat and look at all our data (which would cause our algorithm to overfit). So before we do any in-depth analysis, we want to split our data into train and test with __X__ being the features we might want and __y__ being the target.

In [8]:
X = df_train.drop(['TARGET'], axis=1)  # We want all the data except the Target
y = df_train.TARGET  # We only want the target

In [11]:
X.info()
X.head()  # Let's double check the column TARGET is no longer there

<class 'pandas.core.frame.DataFrame'>
Int64Index: 76020 entries, 0 to 76019
Columns: 370 entries, ID to var38
dtypes: float64(111), int64(259)
memory usage: 215.2 MB


In [None]:
y.head()  # let's double check the column TARGET is the only data there

### Variations to splitting the data and validating

There's a few ways we can split and validate our data. First, we can do a simple __train and test split__ (default .75 to .25); the advantage with this method is that it's quick and easy, but the issue is we don't use all the training data. Another method is to do __K-Fold__ where you split your data K times, then do validation and average up the scores; the advantage with this method is you use more data, but this takes more computing time. Depending on how balanced your dataset, you might also want to consider __StratifiedKFold__.

In [None]:
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train.shape, X_test.shape

In [None]:
from sklearn.cross_validation import KFold
