# Modeling notebook
### Contents
- [1 - Import packages](#1)
- [2 - Import data](#2)
- [3 - Feature engineering](#3)
    - [3.1 - Continuous features](#3.1)
    - [3.2 - Categorical features](#3.2)
    - [3.3 - Specific (temporal) features](#3.3)
    - [3.4 - Global pipeline](#3.4)
    - [3.5 - Testing the global pipeline](#3.5)
- [4 - Modeling](#4)
    - [4.1 - Model selection and hyperparameter search](#4.1)
    - [4.2 - Checking results on the test dataset](#4.2)

# 1 - Import packages <a name="1"></a>

In [159]:
import datetime
import math
import warnings
import matplotlib.pyplot as plt
import missingno as msno
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import metrics
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
pd.options.display.max_columns = 100
warnings.filterwarnings('ignore')

In [68]:
from utils import weighted_log_loss

# 2 - Import data<a name="2"></a>

In [4]:
train = pd.read_csv('data/train.csv', dtype={'location':object,'target':object})
test = pd.read_csv('data/test.csv', dtype={'location':object})
print(f"Shape of train dataset: {train.shape}")
print(f"Shape of test dataset : {test.shape}")

Shape of train dataset: (25000, 24)
Shape of test dataset : (25000, 23)


In [5]:
dept = pd.read_csv('https://www.data.gouv.fr/fr/datasets/r/70cef74f-70b1-495a-8500-c089229c0254', usecols=['code_departement', 'nom_region'])

In [6]:
train = train.merge(dept, how='left', left_on="location", right_on="code_departement")
test = test.merge(dept, how='left', left_on="location", right_on="code_departement")
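
A left merge keeps every row but silently leaves `nom_region` empty when a department code has no match. The sketch below, on hypothetical toy tables (`toy_train`, `toy_dept` and their values are assumptions, not the real data), shows one way to list unmatched codes with the `indicator` option of `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
toy_train = pd.DataFrame({"location": ["52", "78", "2A", "999"]})
toy_dept = pd.DataFrame({
    "code_departement": ["52", "78", "2A"],
    "nom_region": ["Grand Est", "Île-de-France", "Corse"],
})

merged = toy_train.merge(
    toy_dept, how="left", left_on="location", right_on="code_departement",
    indicator=True,   # adds a "_merge" column: "both" or "left_only"
    validate="m:1",   # raises if a department code is duplicated in toy_dept
)
# Codes present in the training data but absent from the department table
unmatched = merged.loc[merged["_merge"] == "left_only", "location"].tolist()
print(unmatched)
```

Running the same check on the real `train` and `test` frames would show whether any `location` value ends up without a region.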

In [7]:
# cleaning - drop columns with 50% or more of nan
perc = 50
min_count =  int(((100-perc)/100)*train.shape[0] + 1)
train = train.dropna(axis=1, thresh=min_count)
test = test[[col for col in train.columns if col != 'target']]

In [8]:
train.shape, test.shape

((25000, 22), (25000, 21))
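
To make the `thresh` logic of the cleaning cell concrete, here is a minimal sketch on a hypothetical four-row frame (the column names and values are assumptions). `thresh` is the minimum number of non-missing values a column needs in order to be kept.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): columns with 0%, 50% and 75% missing values
toy = pd.DataFrame({
    "full": [1, 2, 3, 4],
    "half_missing": [1.0, np.nan, 3.0, np.nan],
    "mostly_missing": [np.nan, np.nan, np.nan, 4.0],
})

perc = 50
# Minimum number of non-NaN values a column needs in order to be kept
min_count = int(((100 - perc) / 100) * toy.shape[0] + 1)
kept = toy.dropna(axis=1, thresh=min_count)
kept_columns = list(kept.columns)
print(min_count, kept_columns)
```

With four rows the threshold is three non-missing values, so a column that is exactly half empty is dropped, which matches the "50% or more" wording of the comment.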

In [151]:
# train test split
X_train, X_test, y_train, y_test = train_test_split(train[[col for col in train.columns if col != 'target']], 
                                                    train['target'], test_size=0.2)

In [153]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((20000, 21), (20000,), (5000, 21), (5000,))

# 3. Feature engineering<a name="3"></a>

Feature engineering is done with pipelines. This simplifies the code and makes it easier to read. It also makes it easy to test several different approaches as part of the hyperparameter search.
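
The reason pipelines combine well with a hyperparameter search is that every nested parameter gets an addressable name of the form `<step>__<sub-step>__<parameter>`. A minimal sketch, using a hypothetical two-level pipeline (`preprocessing` and `toy_model_pipeline` are illustrative names, not objects from this notebook):

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical two-level pipeline: a preprocessing pipeline nested inside a model pipeline
preprocessing = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
toy_model_pipeline = Pipeline(steps=[
    ("preprocessing", preprocessing),
    ("model", LogisticRegression()),
])

# Nested parameters are addressed as <step>__<sub-step>__<parameter>
toy_model_pipeline.set_params(preprocessing__imputer__strategy="mean", model__C=0.1)
params = toy_model_pipeline.get_params()
print(params["preprocessing__imputer__strategy"], params["model__C"])
```

The same double-underscore names are what a parameter grid uses as keys, so preprocessing choices and model settings can be searched together.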

In [9]:
train.head()

Unnamed: 0,id,AP,creation_date_answer,situation,location,gc_id,gc_label,creation_date_global,id_group,id_group_2,favorite_fruit,fruit_situation_id,fruit_situation_label,number_of_fruit,id_group_3,creation_date_request,hobby,id_group_4,green_vegetables,target,code_departement,nom_region
0,a46cfa61ea20a,f,2019-03-13 11:14:42.549,-1,52,70,G,2019-01-17 10:50:57.767,2d7e206d46ea1,36bac09400660,poire,120,jzy,-1,812a43d710ace,2019-03-13 11:14:42.549,football,aa8f4934a31eb,f,0,52,Grand Est
1,c3d0cb8f0c5e2,f,2019-03-21 14:27:32.441,-1,78,10,A,2018-08-20 05:57:51.038,35e96d6848871,80a697d593706,clementine,10,ae,-1,4b59257f24573,2019-03-21 14:27:32.441,football,6ff9ea9ec85fd,f,1,78,Île-de-France
2,05dfbe0ec3a8b,f,2019-03-15 17:49:50.67,-1,70,10,A,2018-12-20 13:45:51.752,ffaf8085e383d,c309176b96268,clementine,200,ag,-1,f1a838f0d194b,2019-03-15 17:49:50.67,football,6a49a0a97b049,f,0,70,Bourgogne-Franche-Comté
3,952e869ee1076,f,2019-01-07 08:19:29.114,-1,84,10,A,2018-07-21 10:28:49.386,5360cf0a40ce3,13c1a3597648b,clementine,10,ae,0,c3196847d1c14,2019-01-07 08:19:29.114,football,d0dcf1ca1bf04,f,1,84,Provence-Alpes-Côte d'Azur
4,5bd0e71b1395b,f,2019-02-03 17:57:22.926,-1,29,20,D,2018-12-07 19:59:26.968,126c3211f23fc,7b68e0a456571,clementine,10,ae,-1,70e18c6fe58cd,2019-02-03 17:57:22.926,football,b4870b1c8eb42,f,1,29,Bretagne


In [131]:
# Selected variables
numerical_features = ["situation", "gc_id", "fruit_situation_id", "number_of_fruit"]
categorical_features = ["AP", "gc_label", "favorite_fruit", "fruit_situation_label", "hobby", "green_vegetables", "nom_region"]
date_features = ["creation_date_global"]
target = ["target"]

In [132]:
# Selects the columns handled by a given pipeline
class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_names):
        self.feature_names = feature_names   
    def fit( self, X, y = None ):
        return self
    def transform(self, X, y=None):
        return X.loc[:, self.feature_names].copy(deep=True)
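
For comparison only: scikit-learn also ships `ColumnTransformer`, which routes columns to sub-pipelines without a custom selector class. This is a swapped-in alternative, not the approach used in this notebook; the toy frame and the step names below are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data reusing two column names from the dataset
toy = pd.DataFrame({
    "number_of_fruit": [1.0, np.nan, 3.0, 2.0],
    "hobby": ["football", "tennis", "football", "tennis"],
    "id": ["a", "b", "c", "d"],  # not listed below, so it is dropped
})

column_transformer = ColumnTransformer(transformers=[
    ("num", Pipeline([("imputer", SimpleImputer(strategy="median")),
                      ("std_scaler", StandardScaler())]), ["number_of_fruit"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["hobby"]),
])
# One scaled numerical column plus one indicator column per hobby value
features = column_transformer.fit_transform(toy)
print(features.shape)
```

The `FeatureSelector` plus `FeatureUnion` design used here achieves the same routing and keeps each sub-pipeline self-contained.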

## 3.1 Continuous features<a name="3.1"></a>

In [23]:
# Pipeline for continuous features: feature selection + missing-value imputation + standard scaling
numerical_pipeline = Pipeline(steps = [ 
    ("num_selector", FeatureSelector(numerical_features)),
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()) 
])

## 3.2 Categorical features<a name="3.2"></a>

In [24]:
# Pipeline for categorical features: feature selection + one-hot encoding
categorical_pipeline = Pipeline(steps = [ 
    ("cat_selector", FeatureSelector(categorical_features)),
    ("ohe", OneHotEncoder(
        handle_unknown="ignore", 
        sparse=False,
        categories='auto')
    ) 
])

## 3.3 Specific (temporal) features<a name="3.3"></a>

In [47]:
# Transformer dedicated to the temporal features
class date_processing_features(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass
    def fit( self, X, y = None ):
        return self
    def transform(self, X, y=None):
        # Work on a copy so the caller's DataFrame is never modified
        X = X.copy()
        for col in date_features:
            # Plain column assignment lets the dtype change to datetime64
            X[col] = pd.to_datetime(X[col])
            X[col+"_month"] = X[col].dt.month
            X[col+"_weekday"] = X[col].dt.weekday
            # Cyclical encoding. Caveat: the cycle length is the maximum seen in X,
            # so the encoding depends on the rows passed to transform
            X[col+"_month_norm"] = 2 * math.pi * X[col+"_month"] / X[col+"_month"].max()
            X[col+"_month_cos"] = np.cos(X[col+"_month_norm"])
            X[col+"_weekday_norm"] = 2 * math.pi * X[col+"_weekday"] / X[col+"_weekday"].max()
            X[col+"_weekday_cos"] = np.cos(X[col+"_weekday_norm"])
        liste_cols = [[col+"_weekday_cos", col+"_month_cos"] for col in date_features]
        flat_list = [item for sublist in liste_cols for item in sublist]
        return X.loc[:, flat_list]


date_processing_features_pipeline = Pipeline(steps = [ 
    ("selector", FeatureSelector(date_features)),
    ("feature_engineering", date_processing_features()),
    ("imputer", SimpleImputer(strategy="median"))
])

In [48]:
liste_cols = [[col+"_weekday_cos", col+"_month_cos"] for col in date_features]
flat_list_date_features = [item for sublist in liste_cols for item in sublist]
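
The transformer above divides by the largest month and weekday present in the data it receives, and keeps only the cosine. A possible variant, sketched here as an assumption rather than as the notebook's method, uses fixed periods (12 months, 7 weekdays) and returns both sine and cosine so that every position in the cycle gets a distinct encoding. The helper name `cyclical_encode` is hypothetical.

```python
import numpy as np
import pandas as pd

def cyclical_encode(values, period):
    """Map integer positions in a cycle of length `period` to (sin, cos) pairs."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

dates = pd.to_datetime(pd.Series(["2018-12-28", "2019-01-07", "2018-04-26"]))

# Months run from 1 to 12: shift to 0..11 and use a fixed period of 12
month_sin, month_cos = cyclical_encode(dates.dt.month - 1, period=12)
# Weekdays run from 0 (Monday) to 6 (Sunday): fixed period of 7
weekday_sin, weekday_cos = cyclical_encode(dates.dt.weekday, period=7)
print(np.round(month_cos, 3), np.round(weekday_cos, 3))
```

With a fixed period the encoding of a given date no longer changes with the other rows in the batch, which keeps training and prediction consistent.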

## 3.4 Global pipeline<a name="3.4"></a>

In [56]:
# Union of the feature engineering pipelines
feature_pipeline = FeatureUnion(
    n_jobs=-1, 
    transformer_list=[ 
        ("numerical_pipeline", numerical_pipeline),
        ("categorical_pipeline", categorical_pipeline),
        ("date_features_pipeline", date_processing_features_pipeline),
    ]
)

## 3.5 Testing the global pipeline<a name="3.5"></a>

In [142]:
def test_feature_pipeline():
    test_df = train.sample(5).copy(deep=True).reset_index()
    display(test_df)
    feature_pipeline.fit(test_df)
    test_results = pd.DataFrame(feature_pipeline.transform(test_df),
            columns = (
                numerical_features 
                + list(feature_pipeline.transformer_list[1][1]["ohe"].get_feature_names(categorical_features))
                + list(flat_list_date_features)
            ))
    display(test_results)
    return test_results
test_results = test_feature_pipeline()

Unnamed: 0,index,id,AP,creation_date_answer,situation,location,gc_id,gc_label,creation_date_global,id_group,id_group_2,favorite_fruit,fruit_situation_id,fruit_situation_label,number_of_fruit,id_group_3,creation_date_request,hobby,id_group_4,green_vegetables,target,code_departement,nom_region
0,18419,684355024dd59,f,2019-01-11 11:08:13.674,-1,32,20,D,2018-12-28 09:52:41.621,fedf42bd1823f,51544d0e8775c,clementine,10,ae,-1,c7aae3098de10,2019-01-11 11:08:13.674,football,9a5c503111a95,f,1,32,Occitanie
1,21258,b70726071d817,f,2019-01-07 06:50:00,-1,31,20,D,2019-01-07 11:31:05.285,82b4ea15ff5aa,649f37ae10034,clementine,80,li,-1,06765cf62d184,2019-01-07 06:50:00,football,77f223998407e,f,0,31,Occitanie
2,4152,00190978bb020,f,2019-01-29 08:28:50.866,-1,46,40,B,2018-04-26 14:00:19.739,0c945f92c5aaf,37a206ebd3be2,poire,10,ae,-1,ab5f56971243b,2019-01-29 08:28:50.866,football,0b86cd9365cbd,f,1,46,Occitanie
3,11419,89e6f78d7738d,f,2019-01-07 10:08:45.637,-1,56,20,D,2017-08-11 00:00:00.000,d2b739561c0ae,9faf97a9255b7,clementine,120,jzy,-1,cdcb1bbdab594,2019-01-07 10:08:45.637,football,71fa6e1f7b590,f,0,56,Bretagne
4,13202,efddd0449f17a,f,2019-02-20 11:52:00,-1,89,20,D,2019-02-11 15:51:21.233,ba24ce7c199b6,f5e555b6ff7b1,clementine,120,jzy,-1,05507335b6296,2019-02-20 11:52:00,football,2eeae2376c26c,f,0,89,Bourgogne-Franche-Comté


Unnamed: 0,situation,gc_id,fruit_situation_id,number_of_fruit,AP_f,gc_label_B,gc_label_D,favorite_fruit_clementine,favorite_fruit_poire,fruit_situation_label_ae,fruit_situation_label_jzy,fruit_situation_label_li,hobby_football,green_vegetables_f,nom_region_Bourgogne-Franche-Comté,nom_region_Bretagne,nom_region_Occitanie,creation_date_global_weekday_cos,creation_date_global_month_cos
0,0.0,-0.5,-1.170345,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0
1,0.0,-0.5,0.24214,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.866025
2,0.0,2.0,-1.170345,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,-1.83697e-16,-0.5
3,0.0,-0.5,1.049275,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,-0.5
4,0.0,-0.5,1.049275,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.5


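#### == Analysis == 
The feature engineering pipeline works as expected: the output contains the scaled numerical features, the one-hot encoded categories and the two cyclical date features.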

# 4. Modeling<a name="4"></a>

## 4.1 Model selection and hyperparameter search<a name="4.1"></a>

In [104]:
# 2 scorers : weighted_log_loss and accuracy
my_scorer_wll = make_scorer(weighted_log_loss, greater_is_better=False, needs_proba=True)
my_scorer_accuracy = make_scorer(accuracy_score, greater_is_better=True)
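
Because scikit-learn always maximizes the score during model selection, a loss wrapped with `greater_is_better=False` is negated, which is why `weighted_log_loss` shows up with negative values in the search logs. The sketch below illustrates the same idea with a plain callable scorer. `log_loss` is used as a stand-in, since the definition of `weighted_log_loss` in `utils` is not shown here, and the toy data is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

def neg_log_loss_scorer(estimator, X, y):
    """Scorer with the (estimator, X, y) signature accepted by scikit-learn.

    log_loss is a stand-in for the project's weighted_log_loss, whose
    definition lives in utils and is not reproduced here.
    """
    proba = estimator.predict_proba(X)
    # Negate the loss: model selection always maximizes the score
    return -log_loss(y, proba, labels=estimator.classes_)

X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)
scores = cross_val_score(LogisticRegression(), X_toy, y_toy, cv=3, scoring=neg_log_loss_scorer)
print(scores)
```

A score closer to zero therefore means a lower loss, in the logs below as well as in this toy example.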

Ideally, the hyperparameter search would run over a wide parameter grid such as this one:

In [122]:
"""
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 500, num = 5)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(3, 15, num = 5)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [10, 20, 50]
# Minimum number of samples required at each leaf node
min_samples_leaf = [5, 10]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# learning rate
learning_rate = [0.1, 0.2, 0.3]
# subsample
subsample = [0.8, 0.9, 1]
"""

In [154]:
# C for logistic regression
C = [0.1, 10]
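# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 200, num = 2)]
# Number of features to consider at every split (required by the param grid; values as shown in the search logs)
max_features = ['auto', 'sqrt']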
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(3, 6, num = 2)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [10, 20]
# Minimum number of samples required at each leaf node
min_samples_leaf = [5, 10]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# learning rate
learning_rate = [0.1, 0.2]
# subsample
subsample = [0.8, 1]

Ideally, a feature selection step would have been added after the feature engineering.
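
As an illustration of where such a step would sit, here is a minimal sketch of recursive feature elimination (`RFE`) placed between preprocessing and the final model. The synthetic data, the `StandardScaler` standing in for the feature engineering and the choice of five features are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_toy, y_toy = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

toy_pipeline = Pipeline(steps=[
    ("std_scaler", StandardScaler()),  # stands in for the feature engineering step
    # Repeatedly drop the least important feature until 5 remain
    ("rfe", RFE(RandomForestClassifier(n_estimators=20, random_state=0), n_features_to_select=5)),
    ("model", LogisticRegression()),
])
toy_pipeline.fit(X_toy, y_toy)
n_selected = int(toy_pipeline.named_steps["rfe"].support_.sum())
print(n_selected)
```

Since the selector is a pipeline step, its settings could be tuned in the same search as the model hyperparameters, at the cost of refitting the selector for every candidate.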

In [162]:
# model_pipeline = Pipeline(steps=[
#     ("feature_pipeline", feature_pipeline),
#     ("rfe", RFE(RandomForestClassifier(), n_features_to_select=0.85)),
#     ("model", LogisticRegression())
# ])
model_pipeline = Pipeline(steps=[
    ("feature_pipeline", feature_pipeline),
    ("model", LogisticRegression())
])
param_grid = [
    {
        "feature_pipeline__numerical_pipeline__imputer__strategy": ["mean", "median"],
        "model": [LogisticRegression()],
        "model__C": C,
    },
    {
        "feature_pipeline__numerical_pipeline__imputer__strategy": ["median"],
        "model": [RandomForestClassifier()],
        "model__max_depth": max_depth,
        "model__max_features": max_features,
        "model__n_estimators": n_estimators,
        "model__min_samples_split": min_samples_split,
        "model__min_samples_leaf": min_samples_leaf,
        "model__bootstrap": bootstrap
    },
    {
    "feature_pipeline__numerical_pipeline__imputer__strategy": ["median"],
    "model": [GradientBoostingClassifier()],
    "model__n_estimators": n_estimators,
    "model__max_depth": max_depth,
    "model__min_samples_split": min_samples_split,
    "model__min_samples_leaf": min_samples_leaf,
    "model__max_features": max_features,
    "model__learning_rate": learning_rate,
    "model__subsample": subsample
    }
]
grid_search = GridSearchCV(
    model_pipeline, 
    param_grid, 
    cv=3,
    scoring={'weighted_log_loss' : my_scorer_wll, 'accuracy' : my_scorer_accuracy},
    refit="weighted_log_loss",
    verbose=3
)

randomized_search = RandomizedSearchCV(
    model_pipeline, 
    param_grid, 
    cv=3,
    scoring={'weighted_log_loss' : my_scorer_wll, 'accuracy' : my_scorer_accuracy},
    refit="weighted_log_loss",
    n_iter = 10,
    verbose=3
)

In [None]:
# RandomizedSearchCV
now = datetime.datetime.now()
randomized_search.fit(X_train, y_train)
print(datetime.datetime.now() - now)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.692) weighted_log_loss: (test=-599.610) total time=   2.5s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.690) weighted_log_loss: (test=-673.316) total time=   2.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=1

#### == Note ==
With more time, we would have:
- chosen a higher number of iterations for the randomized search (RandomizedSearchCV)
- tested more hyperparameters
- added a feature selection step
- run a grid search (GridSearchCV) around the values of the best model selected by the randomized search
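
The last point, a coarse randomized search followed by a grid search around its best candidate, can be sketched as follows. Everything here is illustrative: the synthetic data, the single hyperparameter `C`, the built-in `neg_log_loss` scoring and the half-decade window are assumptions.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X_toy, y_toy = make_classification(n_samples=300, n_features=8, random_state=0)

# Step 1: coarse random exploration of C over several orders of magnitude
coarse = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=10, cv=3, scoring="neg_log_loss", random_state=0,
)
coarse.fit(X_toy, y_toy)
best_C = coarse.best_params_["C"]

# Step 2: fine grid centred on the best coarse value (half a decade on each side)
fine_grid = {"C": list(best_C * np.logspace(-0.5, 0.5, 5))}
fine = GridSearchCV(LogisticRegression(max_iter=1000), fine_grid, cv=3, scoring="neg_log_loss")
fine.fit(X_toy, y_toy)
print(best_C, fine.best_params_["C"])
```

The fine grid contains the coarse optimum itself, so the second stage can only match or improve the cross-validated score on the same folds.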

In [158]:
# GridSearchCV
now = datetime.datetime.now()
grid_search.fit(X_train, y_train)
print(datetime.datetime.now() - now)

Fitting 3 folds for each of 292 candidates, totalling 876 fits
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=0.1; accuracy: (test=0.679) weighted_log_loss: (test=-618.528) total time=   1.6s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=0.1; accuracy: (test=0.676) weighted_log_loss: (test=-683.091) total time=   1.7s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=0.1; accuracy: (test=0.682) weighted_log_loss: (test=-597.398) total time=   1.7s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=10; accuracy: (test=0.679) weighted_log_loss: (test=-613.632) total time=   1.7s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=10; accuracy: (test=0.679) weighted_log_loss: (tes

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.667) weighted_log_loss: (test=-658.749) total time=   1.6s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.661) weighted_log_loss: (test=-699.639) total time=   1.7s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.669) weighted_log_loss: (test=-654.032) total ti

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.662) weighted_log_loss: (test=-702.464) total time=   1.6s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.669) weighted_log_loss: (test=-654.735) total time=   1.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.673) weighted_log_loss: (test=-662.303) total ti

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.677) weighted_log_loss: (test=-610.586) total time=   2.1s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.679) weighted_log_loss: (test=-626.151) total time=   3.4s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.672) weighted_log_loss: (test=-683.259) total ti

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.676) weighted_log_loss: (test=-623.092) total time=   2.9s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.672) weighted_log_loss: (test=-682.359) total time=   3.6s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.677) weighted_log_loss: (test=-612.953) total ti

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.686) weighted_log_loss: (test=-671.981) total time=   3.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.691) weighted_log_loss: (test=-584.747) total time=   3.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.690) weighted_log_loss: (test=-602.251) 

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.690) weighted_log_loss: (test=-585.131) total time=   4.5s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.672) weighted_log_loss: (test=-661.478) total time=   2.0s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.664) weighted_log_loss: (test=-697.771) total

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.672) weighted_log_loss: (test=-655.103) total time=   1.9s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.665) weighted_log_loss: (test=-696.325) total time=   1.7s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.664) weighted_log_loss: (test=-659.240) total ti

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.674) weighted_log_loss: (test=-681.232) total time=   2.1s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.677) weighted_log_loss: (test=-610.599) total time=   3.0s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.677) weighted_log_loss: (test=-623.741) total ti

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.686) weighted_log_loss: (test=-616.327) total time=   2.9s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.676) weighted_log_loss: (test=-623.242) total time=   4.5s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.672) weighted_log_loss: (test=-679.096) total ti

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.691) weighted_log_loss: (test=-598.155) total time=   6.6s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.690) weighted_log_loss: (test=-675.949) total time=   4.7s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.694) weighted_log_loss: (test=-577.055)

KeyboardInterrupt: 

## 4.2 Checking results on the test dataset<a name="4.2"></a>

In [None]:
# Full results
randomized_search.cv_results_

In [None]:
# Best score
randomized_search.best_score_

In [None]:
# Compute predictions on the test dataset
y_test_proba = randomized_search.best_estimator_.predict_proba(X_test)
y_test_pred = randomized_search.best_estimator_.predict(X_test)

In [None]:
weighted_log_loss(y_test, y_test_proba), accuracy_score(y_test, y_test_pred)

In [None]:
# save the model

In [None]:
# save the predictions
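
The two saving cells above are left empty in the notebook. One possible way to fill them, sketched under assumptions, is `joblib` for the fitted estimator and a CSV file for the predictions. The toy model stands in for `best_estimator_`, and the file names and temporary directory are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_toy, y_toy = make_classification(n_samples=100, n_features=4, random_state=0)
toy_model = LogisticRegression().fit(X_toy, y_toy)  # stands in for best_estimator_

out_dir = tempfile.mkdtemp()  # hypothetical output location
model_path = os.path.join(out_dir, "model.joblib")
predictions_path = os.path.join(out_dir, "predictions.csv")

# Persist the fitted estimator, then reload it to check the round trip
joblib.dump(toy_model, model_path)
reloaded = joblib.load(model_path)

# Store the positive-class probability alongside the predicted label
predictions = pd.DataFrame({
    "proba": reloaded.predict_proba(X_toy)[:, 1],
    "prediction": reloaded.predict(X_toy),
})
predictions.to_csv(predictions_path, index=False)
```

Saving the whole fitted pipeline, rather than the final model alone, keeps the feature engineering and the classifier together, provided any custom transformer classes are importable when the file is loaded.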

In [130]:
now = datetime.datetime.now()
grid_search.fit(train, train[target[0]])
print(datetime.datetime.now() - now)

Fitting 3 folds for each of 580 candidates, totalling 1740 fits
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=0.1; accuracy: (test=0.683) weighted_log_loss: (test=-628.556) total time=   2.3s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=0.1; accuracy: (test=0.679) weighted_log_loss: (test=-627.345) total time=   2.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=0.1; accuracy: (test=0.676) weighted_log_loss: (test=-642.106) total time=   2.4s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=10; accuracy: (test=0.683) weighted_log_loss: (test=-626.477) total time=   2.5s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=LogisticRegression(), model__C=10; accuracy: (test=0.679) weighted_log_loss: (te

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.671) weighted_log_loss: (test=-666.350) total time=   2.7s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.674) weighted_log_loss: (test=-666.795) total time=   2.5s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.668) weighted_log_loss: (test=-670.969) total time=   

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.668) weighted_log_loss: (test=-675.813) total time=   2.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.666) weighted_log_loss: (test=-678.776) total time=   2.3s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.670) weighted_log_loss: (test=-668.144) total time=   

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.670) weighted_log_loss: (test=-651.794) total time=   2.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.678) weighted_log_loss: (test=-638.466) total time=   5.8s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.679) weighted_log_loss: (test=-635.449) total time=   

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.680) weighted_log_loss: (test=-642.775) total time=   4.4s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.675) weighted_log_loss: (test=-636.894) total time=   5.1s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.671) weighted_log_loss: (test=-650.314) total time=   

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.686) weighted_log_loss: (test=-611.374) total time=   6.2s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.684) weighted_log_loss: (test=-628.097) total time=   6.1s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.696) weighted_log_loss: (test=-606.475) total 

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.684) weighted_log_loss: (test=-632.060) total time=   5.8s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.670) weighted_log_loss: (test=-663.026) total time=   2.9s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.671) weighted_log_loss: (test=-668.219) total time=

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.672) weighted_log_loss: (test=-664.619) total time=   3.0s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.669) weighted_log_loss: (test=-669.558) total time=   3.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.666) weighted_log_loss: (test=-678.916) total time=   

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.683) weighted_log_loss: (test=-635.857) total time=   3.1s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.671) weighted_log_loss: (test=-648.246) total time=   3.4s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.679) weighted_log_loss: (test=-636.471) total time=   

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.674) weighted_log_loss: (test=-648.606) total time=   3.5s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.680) weighted_log_loss: (test=-637.543) total time=   5.9s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.674) weighted_log_loss: (test=-633.568) total time=   

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.697) weighted_log_loss: (test=-613.014) total time=   9.1s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.692) weighted_log_loss: (test=-607.297) total time=   8.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.687) weighted_log_loss: (test=-629.310) total

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.692) weighted_log_loss: (test=-608.815) total time=   8.6s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.687) weighted_log_loss: (test=-628.403) total time=  10.1s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.697) weighted_log_loss: (test=-609.598) total

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.664) weighted_log_loss: (test=-677.544) total time=   3.7s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.676) weighted_log_loss: (test=-668.875) total time=   2.3s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.668) weighted_log_loss: (test=-668.651) total time=

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.672) weighted_log_loss: (test=-667.909) total time=   2.4s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.670) weighted_log_loss: (test=-669.066) total time=   2.4s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.666) weighted_log_loss: (test=-676.118) total time=

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.676) weighted_log_loss: (test=-632.890) total time=   2.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.673) weighted_log_loss: (test=-650.891) total time=   3.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.680) weighted_log_loss: (test=-638.063) total time=

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.675) weighted_log_loss: (test=-650.473) total time=   2.7s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.680) weighted_log_loss: (test=-636.392) total time=   4.2s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.675) weighted_log_loss: (test=-635.135) total time=

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.696) weighted_log_loss: (test=-608.648) total time=   6.0s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.690) weighted_log_loss: (test=-607.542) total time=   6.1s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.683) weighted_log_loss: (test=-627.016) to

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.689) weighted_log_loss: (test=-608.890) total time=   5.9s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.684) weighted_log_loss: (test=-628.756) total time=   6.0s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=True, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.694) weighted_log_loss: (test=-615.890) t

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200; accuracy: (test=0.666) weighted_log_loss: (test=-674.780) total time=   5.1s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.672) weighted_log_loss: (test=-668.243) total time=   3.2s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.669) weighted_log_loss: (test=-668.061) total 

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.671) weighted_log_loss: (test=-670.003) total time=   2.6s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.670) weighted_log_loss: (test=-664.501) total time=   2.5s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.665) weighted_log_loss: (test=-676.212) total

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.675) weighted_log_loss: (test=-633.349) total time=   4.5s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.672) weighted_log_loss: (test=-647.255) total time=   3.1s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.680) weighted_log_loss: (test=-636.868) total

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100; accuracy: (test=0.671) weighted_log_loss: (test=-647.412) total time=   3.0s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.679) weighted_log_loss: (test=-638.437) total time=   5.2s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.675) weighted_log_loss: (test=-635.961) total

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.696) weighted_log_loss: (test=-613.143) total time=   8.8s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.687) weighted_log_loss: (test=-609.513) total time=   8.7s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.684) weighted_log_loss: (test=-628.4

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.689) weighted_log_loss: (test=-610.580) total time=   9.0s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200; accuracy: (test=0.684) weighted_log_loss: (test=-627.193) total time=   7.9s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=RandomForestClassifier(), model__bootstrap=False, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100; accuracy: (test=0.695) weighted_log_loss: (test=-611.3

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.687) weighted_log_loss: (test=-609.181) total time=  36.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.683) weighted_log_loss: (test=-619.716) total time=  36.5s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsamp

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.693) weighted_log_loss: (test=-607.450) total time=  36.8s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.687) weighted_log_loss: (test=-603.625) total time=  37.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subs

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.684) weighted_log_loss: (test=-624.073) total time=   7.7s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.692) weighted_log_loss: (test=-608.264) total time=   8.3s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsa

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.685) weighted_log_loss: (test=-606.786) total time=   8.1s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.683) weighted_log_loss: (test=-629.554) total time=   9.4s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.697) weighted_log_loss: (test=-624.628) total time=  46.7s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.688) weighted_log_loss: (test=-636.593) total time=  49.7s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__sub

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.685) weighted_log_loss: (test=-658.795) total time= 2.1min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.697) weighted_log_loss: (test=-634.871) total time=  47.7s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__su

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.687) weighted_log_loss: (test=-628.890) total time=  25.2s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.689) weighted_log_loss: (test=-636.753) total time=  25.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsamp

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.697) weighted_log_loss: (test=-623.706) total time=  24.8s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.689) weighted_log_loss: (test=-621.314) total time=  26.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subs

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.651) weighted_log_loss: (test=-1959.802) total time= 4.2min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.648) weighted_log_loss: (test=-2101.323) total time= 6.8min
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, m

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.647) weighted_log_loss: (test=-1607.985) total time= 2.8min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.651) weighted_log_loss: (test=-1614.313) total time= 2.7min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=2

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.652) weighted_log_loss: (test=-1205.193) total time=  57.5s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.656) weighted_log_loss: (test=-1200.245) total time=  56.0s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200,

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.670) weighted_log_loss: (test=-825.651) total time=  27.7s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.663) weighted_log_loss: (test=-999.406) total time=  40.7s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, 

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.687) weighted_log_loss: (test=-615.045) total time=  29.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.683) weighted_log_loss: (test=-622.853) total time=  27.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsamp

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.695) weighted_log_loss: (test=-621.989) total time=  28.1s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.689) weighted_log_loss: (test=-606.363) total time=  27.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subs

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.685) weighted_log_loss: (test=-628.284) total time=   6.5s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.693) weighted_log_loss: (test=-609.775) total time=   7.2s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsa

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.686) weighted_log_loss: (test=-599.682) total time=   6.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.685) weighted_log_loss: (test=-620.371) total time=   6.4s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.689) weighted_log_loss: (test=-709.825) total time=  34.3s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.676) weighted_log_loss: (test=-684.865) total time=  34.0s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__sub

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsample=1; accuracy: (test=0.681) weighted_log_loss: (test=-742.495) total time= 1.6min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.695) weighted_log_loss: (test=-703.508) total time=  35.3s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=100, model__sub

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__subsample=1; accuracy: (test=0.675) weighted_log_loss: (test=-777.604) total time= 1.5min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__subsample=1; accuracy: (test=0.680) weighted_log_loss: (test=-753.335) total time= 1.7min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsa

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsample=1; accuracy: (test=0.695) weighted_log_loss: (test=-689.781) total time=  19.9s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsample=1; accuracy: (test=0.684) weighted_log_loss: (test=-678.677) total time=  19.2s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsamp

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.685) weighted_log_loss: (test=-677.647) total time=  17.0s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__subsample=1; accuracy: (test=0.694) weighted_log_loss: (test=-674.795) total time=  18.8s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__su

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.642) weighted_log_loss: (test=-2858.598) total time= 2.9min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.644) weighted_log_loss: (test=-2815.662) total time= 3.1min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200,

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.647) weighted_log_loss: (test=-2558.814) total time= 2.7min
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.648) weighted_log_loss: (test=-2586.508) total time= 2.6min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=2

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.651) weighted_log_loss: (test=-1303.544) total time=  34.9s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.647) weighted_log_loss: (test=-1801.002) total time=  49.8s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=200, m

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.656) weighted_log_loss: (test=-1076.550) total time=  27.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.658) weighted_log_loss: (test=-1119.930) total time=  28.1s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=mean, model=GradientBoostingClassifier(), model__learning_rate=0.2, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=200, 

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.692) weighted_log_loss: (test=-608.226) total time=  26.7s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.687) weighted_log_loss: (test=-609.158) total time=  26.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__s

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.685) weighted_log_loss: (test=-619.923) total time=  20.6s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=1; accuracy: (test=0.693) weighted_log_loss: (test=-607.450) total time=  27.4s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, mod

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.686) weighted_log_loss: (test=-606.145) total time=   6.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.684) weighted_log_loss: (test=-621.194) total time=   6.4s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, mode

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.689) weighted_log_loss: (test=-610.481) total time=   6.2s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.685) weighted_log_loss: (test=-605.269) total time=   6.3s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=3, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, m

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.683) weighted_log_loss: (test=-684.009) total time= 1.5min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model__subsample=0.8; accuracy: (test=0.696) weighted_log_loss: (test=-635.306) total time=  33.5s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=20, model__n_estimators=100, model_

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.684) weighted_log_loss: (test=-669.694) total time= 1.5min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.686) weighted_log_loss: (test=-659.080) total time= 1.6min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=20, model__n_estimators=100, model

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.695) weighted_log_loss: (test=-631.089) total time=  21.6s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.687) weighted_log_loss: (test=-635.503) total time=  22.8s
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__s

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.686) weighted_log_loss: (test=-632.101) total time=  19.8s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=1; accuracy: (test=0.695) weighted_log_loss: (test=-621.531) total time=  22.1s
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=6, model__max_features=sqrt, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, mod

[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.649) weighted_log_loss: (test=-1939.210) total time= 3.7min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.650) weighted_log_loss: (test=-1957.703) total time= 3.7min
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimator

[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.646) weighted_log_loss: (test=-1599.120) total time= 2.6min
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.651) weighted_log_loss: (test=-1589.452) total time= 2.6min
[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=auto, model__min_samples_leaf=10, model__min_samples_split=10, model__n_estima

[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=100, model__subsample=1; accuracy: (test=0.659) weighted_log_loss: (test=-931.557) total time=  42.4s
[CV 1/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=200, model__subsample=0.8; accuracy: (test=0.653) weighted_log_loss: (test=-1197.073) total time= 1.2min
[CV 2/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, model=GradientBoostingClassifier(), model__learning_rate=0.1, model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, model__min_samples_split=10, model__n_estimators=2

KeyboardInterrupt: 
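
Because the search was interrupted, the per-fold scores only exist in the log above. A minimal sketch, assuming the log line format shown in this output, of how one complete `[CV ...]` line could be parsed back into a record (the helper name `parse_cv_line` is hypothetical and not part of the notebook; truncated lines are skipped rather than guessed):

```python
import re

# One complete line copied from the verbose output above.
line = (
    "[CV 3/3] END feature_pipeline__numerical_pipeline__imputer__strategy=median, "
    "model=GradientBoostingClassifier(), model__learning_rate=0.1, "
    "model__max_depth=None, model__max_features=sqrt, model__min_samples_leaf=5, "
    "model__min_samples_split=10, model__n_estimators=100, model__subsample=1; "
    "accuracy: (test=0.659) weighted_log_loss: (test=-931.557) total time=  42.4s"
)

# Pattern for a complete line: fold, parameter list, then the two test scores.
LINE_RE = re.compile(
    r"\[CV (?P<fold>\d+)/(?P<n_folds>\d+)\] END (?P<params>.*?); "
    r"accuracy: \(test=(?P<accuracy>[-\d.]+)\) "
    r"weighted_log_loss: \(test=(?P<wll>[-\d.]+)\)"
)


def parse_cv_line(text):
    """Return fold, params and scores as a dict, or None if the line is truncated."""
    match = LINE_RE.search(text)
    if match is None:
        return None
    # Parameters are "name=value" pairs separated by ", "; values stay as strings.
    params = dict(item.split("=", 1) for item in match["params"].split(", "))
    return {
        "fold": int(match["fold"]),
        "params": params,
        "accuracy": float(match["accuracy"]),
        "weighted_log_loss": float(match["wll"]),
    }


record = parse_cv_line(line)
print(record["fold"], record["accuracy"], record["weighted_log_loss"])
```

Applied to every line of the log, records like this one could be grouped by parameter set to compare mean scores across folds without re-running the search.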

In [83]:
y_test_proba.shape

(25000, 4)

In [94]:
y_train_pred = grid_search.best_estimator_.predict(train)

In [84]:
y_train_proba = grid_search.best_estimator_.predict_proba(train)

In [85]:
y_train_proba

array([[0.72785104, 0.16318674, 0.10048871, 0.00847351],
       [0.23296672, 0.61030256, 0.15005561, 0.0066751 ],
       [0.64130536, 0.27976506, 0.07512784, 0.00380175],
       ...,
       [0.65369103, 0.23053657, 0.11207564, 0.00369675],
       [0.61503672, 0.28276355, 0.09689031, 0.00530942],
       [0.77655041, 0.15363171, 0.06708654, 0.00273134]])
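
As a quick sanity check on this output, each row of `predict_proba` should sum to 1, and the column with the highest probability gives the predicted class position. The sketch below uses only the first three rows printed above; mapping a column position back to a label would go through the fitted estimator's `classes_` attribute, which is presumably `['0', '1', '2', '3']` here since `target` is read as `object`.

```python
import numpy as np

# First three rows of y_train_proba, copied from the output above.
proba = np.array([
    [0.72785104, 0.16318674, 0.10048871, 0.00847351],
    [0.23296672, 0.61030256, 0.15005561, 0.0066751],
    [0.64130536, 0.27976506, 0.07512784, 0.00380175],
])

# Each row is a probability distribution over the four classes.
row_sums = proba.sum(axis=1)
assert np.allclose(row_sums, 1.0, atol=1e-6)

# Position of the most probable class for each row.
top_column = proba.argmax(axis=1)
print(top_column)  # [0 1 0]
```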

In [96]:
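# Put the true labels next to the predictions (the prediction column is named 0).
# Reusing train.index keeps the two columns aligned row by row.
concat = pd.concat([train['target'], pd.DataFrame(y_train_pred, index=train.index)], axis=1)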

In [100]:
# Training rows where the prediction matches the target
concat[concat.target == concat[0]].shape

(17115, 2)

In [101]:
# Training rows where the prediction differs from the target
concat[concat.target != concat[0]].shape

(7885, 2)
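
The two counts above translate directly into a training-set accuracy. A small worked example using only those numbers (this is accuracy on the data the model was fitted on, so it is not an estimate of generalization):

```python
# Counts copied from the two .shape outputs above.
n_correct = 17115
n_wrong = 7885

n_total = n_correct + n_wrong
train_accuracy = n_correct / n_total
print(f"{n_total} rows, training accuracy = {train_accuracy:.4f}")
# 25000 rows, training accuracy = 0.6846
```

In the notebook itself the same figure could be obtained with `(concat.target == concat[0]).mean()` or with the already imported `accuracy_score(train['target'], y_train_pred)`.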

In [93]:
pd.concat([train['target'], pd.DataFrame(y_train_proba)], axis=1).sample(10)

,target,0,1,2,3
4458,1,0.232217,0.52669,0.229626,0.011467
6751,0,0.64457,0.276,0.075882,0.003548
19817,1,0.194192,0.546493,0.245437,0.013878
9550,1,0.182926,0.680154,0.127221,0.009699
10457,3,0.129642,0.195173,0.61314,0.062045
10304,1,0.221713,0.628317,0.144748,0.005222
18076,1,0.255467,0.464087,0.266129,0.014318
13481,1,0.124946,0.528763,0.319184,0.027106
4376,1,0.248021,0.578446,0.167814,0.005719
13057,0,0.467075,0.321375,0.195706,0.015844
