# Kaggle Competiton | BNP Paribas Cardif Claims Management

>We evaluate which parameters of Random Forest classifier provides the best score, in this case the highest area under the ROC Curve.

>The prediction is performed through Grid Search. Two datasets are generated in each run and two parameters are evaluated each time.

>The ROC Curve area in each case is plotted for each prediction in a heatmap.

>The choosen parameters will be used in run_predict.py for final prediction.

Go to the official page of the [Kaggle Competition.](https://www.kaggle.com/c/bnp-paribas-cardif-claims-management)

### Goal for this Notebook:
* Generate and evaluate predictions through Grid Search
* Compare different values for different parameters of Random Forest classifier

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import Series, DataFrame
import seaborn as sns
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import Imputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.cross_validation import KFold
from sklearn.cross_validation import train_test_split
from sklearn import metrics
from sklearn import grid_search
from scipy import stats
from data_modifier import *

%matplotlib inline

  if 'order' in inspect.getargspec(np.copy)[0]:


* Load data and split train dataset into train and test 80%-20%

In [3]:
train = pd.read_csv("../../../github_data/bnp_paribas_cardif_data/train.csv")

In [5]:
tr_a, te_a = train_test_split(train, train_size = 0.8) 
y_train = tr_a.target
y_test = te_a.target
columns = train.columns
x_train = tr_a[columns[2:]]
x_test = te_a[columns[2:]]

* Choose differnt combinations of Random Forest parameters for Grid Search

In [6]:
param_grid_1 =  {'randomforestclassifier__max_depth': [8,16],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[20,100], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[10], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_2 =  {'randomforestclassifier__max_depth': [16,300],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[5,10], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[10], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_3 =  {'randomforestclassifier__max_depth': [16,75],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[2,5], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[10], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_4 =  {'randomforestclassifier__max_depth': [8,16],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[10], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[1,5], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_5 =  {'randomforestclassifier__max_depth': [16,25],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[10,50], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[1], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_6 =  {'randomforestclassifier__max_depth': [25,40],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[5,10], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[1], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_7 =  {'randomforestclassifier__max_depth': [40,55],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[5,10], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[1], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}
param_grid_8 =  {'randomforestclassifier__max_depth': [40,60],
               'randomforestclassifier__criterion': ['gini'], 
               'randomforestclassifier__n_estimators':[100,500], 
               'randomforestclassifier__max_leaf_nodes':[None], 
               'randomforestclassifier__min_samples_split':[2],
               'randomforestclassifier__min_samples_leaf':[1], 
               'randomforestclassifier__min_weight_fraction_leaf':[0.0],
               'randomforestclassifier__n_jobs':[1]
}

* Grid Search in two datasets with given parameters and ROC Curve as score function

In [7]:
call = PipelineBNP(RandomForestClassifier)
gs = grid_search.GridSearchCV(call, param_grid_8, cv=2, scoring='roc_auc', n_jobs=2, pre_dispatch='n_jobs')
gs = gs.fit(x_train,y_train)

pipeline done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

NullToNaNTrans fit done.
NullToNaNTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
ObjtoCatStrtoIntTrans fit done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

NullToNaNTrans fit done.
NullToNaNTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
(45727, 19)
ObjtoCatStrtoIntTrans transform done.
ObjtoCatStrtoIntTrans fit done.
(45729, 19)
ObjtoCatStrtoIntTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)


NullToNaNTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
NullToNaNTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
(45729, 19)
ObjtoCatStrtoIntTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

(45727, 19)
ObjtoCatStrtoIntTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

NullToNaNTrans fit done.
NullToNaNTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)


ObjtoCatStrtoIntTrans fit done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)


NullToNaNTrans fit done.
NullToNaNTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
ObjtoCatStrtoIntTrans fit done.
(45727, 19)
ObjtoCatStrtoIntTrans transform done.
(45729, 19)
ObjtoCatStrtoIntTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

NullToNaNTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
(45729, 19)
ObjtoCatStrtoIntTrans transform done.
NullToNaNTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
DataSpliterTrans transform done.
(45727, 19)
ObjtoCatStrtoIntTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

NullToNaNTrans fit done.
NullToNaNTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
DataSpliterTrans fit done.
DataSpliterTrans transform done.
ObjtoCatStrtoIntTrans fit done.
(91456, 19)
ObjtoCatStrtoIntTrans transform done.


  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspec(init)
  args, varargs, kw, default = inspect.getargspe

* Check order of parameters and extract them for the heatmap

In [8]:
sco = gs.grid_scores_
sco

[mean: 0.58540, std: 0.00034, params: {'randomforestclassifier__min_weight_fraction_leaf': 0.0, 'randomforestclassifier__n_jobs': 1, 'randomforestclassifier__n_estimators': 2, 'randomforestclassifier__max_leaf_nodes': None, 'randomforestclassifier__criterion': 'gini', 'randomforestclassifier__max_depth': None, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__min_samples_split': 2},
 mean: 0.62269, std: 0.00038, params: {'randomforestclassifier__min_weight_fraction_leaf': 0.0, 'randomforestclassifier__n_jobs': 1, 'randomforestclassifier__n_estimators': 5, 'randomforestclassifier__max_leaf_nodes': None, 'randomforestclassifier__criterion': 'gini', 'randomforestclassifier__max_depth': None, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__min_samples_split': 2}]

In [11]:
meanv = np.zeros(len(sco))
deptv = np.zeros(len(sco),dtype=np.int)
estiv = np.zeros(len(sco),dtype=np.int)
deptk = [i for i in sco[0].parameters.keys()][7]
estik = [i for i in sco[0].parameters.keys()][2]

l = 0
while l < len(sco):
    mean = sco[l].mean_validation_score
    par = sco[l].parameters.values()
    values = [v for v in par]
    meanv[l] = mean
    estiv[l] = values[2]
    deptv[l] = values[7]
    l += 1
print(deptk,estik)

randomforestclassifier__min_samples_split randomforestclassifier__n_estimators


In [14]:
ma = np.matrix([[meanv[0], meanv[1]], [meanv[2], meanv[3]]], dtype=np.float64)
scores = pd.DataFrame(ma,columns=np.unique(estiv),index=np.unique(deptv))
ax = sns.heatmap(scores)
ax.set_title('parameters evaluation roc_auc 7 depth vs n_estimators')
ax.set_title
fig = ax.get_figure()
fig.savefig("par_eval_7_depth40_55_esti5_10.png")
print(ma)

IndexError: index 2 is out of bounds for axis 0 with size 2

In [None]:
y_predict = gs.predict_proba(x_test)

The best combination of parameters will be choosen for runing the final prediction with run_predict.py. The results will be written into results.csv and uploaded.