# Grid search

In this section, we will build a single pipeline that takes the raw data as input, concatenates the training and validation sets, preprocesses them, trains a model, predicts, and finally scores the predictions. It will also use grid search to tune the hyperparameters. Let's get into it, shall we?

In [1]:
import sys
sys.path.append('scripts')

%matplotlib inline

import random as rnd
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import process
import transforms
from multiprocessing import Pool, Queue
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import f1_score
import joblib  # standalone package; newer scikit-learn releases no longer ship sklearn.externals.joblib

sb.set_style('dark')

First, let's load the data. We'll load both the training and the validation datasets.

In [2]:
data_train, target_train = process.load_dataset_target('data/train.csv', 'data/train_target.csv')

In [3]:
data_val, target_val = process.load_dataset_target('data/valid.csv', 'data/valid_target.csv')

In [4]:
data = pd.concat([data_train, data_val])
target = pd.concat([target_train, target_val])

In [5]:
target = target[0]

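Now that we have the data, let's create the pipe. Note that make_pipeline names each step after its lowercased class name, as the pipe.steps output shows.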

In [6]:
pipe = make_pipeline(
    transforms.LowVarianceRemover(),
    PCA(0.95),
    transforms.BoxcoxTransform(),
    StandardScaler(),
    ExtraTreesClassifier(n_jobs=-1, verbose=2)
)
pipe.steps

[('lowvarianceremover', LowVarianceRemover()),
 ('pca',
  PCA(copy=True, iterated_power='auto', n_components=0.95, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)),
 ('boxcoxtransform', BoxcoxTransform()),
 ('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)),
 ('extratreesclassifier',
  ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
             max_depth=None, max_features='auto', max_leaf_nodes=None,
             min_impurity_split=1e-07, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=10, n_jobs=-1, oob_score=False, random_state=None,
             verbose=2, warm_start=False))]
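
The source of the custom steps in `transforms` is not shown in this notebook. As a rough illustration only, here is a minimal sketch of how such a step could be written on top of `BaseEstimator` and `TransformerMixin`, and how `make_pipeline` names it. `VarianceFilter`, its `threshold` parameter, and the toy array are hypothetical; this is not the actual `LowVarianceRemover`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class VarianceFilter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a step such as LowVarianceRemover."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Remember which columns vary more than the threshold.
        X = np.asarray(X, dtype=float)
        self.keep_ = np.var(X, axis=0) > self.threshold
        return self

    def transform(self, X):
        # Keep only the columns selected during fit.
        return np.asarray(X, dtype=float)[:, self.keep_]


# Toy data: the middle column is constant and should be dropped.
X = np.array([[1.0, 5.0, 0.0],
              [2.0, 5.0, 1.0],
              [3.0, 5.0, 0.0],
              [4.0, 5.0, 1.0]])

demo_pipe = make_pipeline(VarianceFilter(), StandardScaler())
step_names = [name for name, _ in demo_pipe.steps]
X_out = demo_pipe.fit_transform(X)

print(step_names)   # step names are the lowercased class names
print(X_out.shape)  # one column fewer than the input
```

The generated step names matter later: they are the prefixes used to address a step's parameters in a parameter grid.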

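And now the model. We'll use a cross-validated grid search over two PCA variance thresholds and three tree depths. Each parameter is addressed as the step name, two underscores, then the parameter name. That gives 6 candidates and, with 3 folds each, 18 fits.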

In [7]:
%%time
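# Keys follow the <step name>__<parameter> convention.
param_grid = dict(
    pca__n_components=[0.9, 0.95],
    extratreesclassifier__max_depth=[1, 2, 3],
)
# error_score=0 assigns a score of 0 to a candidate whose fit raises an error.
model = GridSearchCV(pipe, param_grid, n_jobs=-1, verbose=2, error_score=0)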
model.fit(data, target)

Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=1 ........
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=1 ........
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=1 ........
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=1 .......
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=1 .......
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=1 .......
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=2 ........
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=2 ........
building tree 5 of 10
building tree 6 of 10
building tree 3 of 10
building tree 4 of 10
building tree 7 of 10
building tree 1 of 10
building tree 2 of 10
building tree 8 of 10
building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished


building tree 1 of 10
building tree 4 of 10
building tree 2 of 10
building tree 6 of 10
building tree 7 of 10
building tree 8 of 10
building tree 5 of 10
building tree 3 of 10


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.4s remaining:    0.2s


building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished
[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.8s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.1s finished


building tree 2 of 10
building tree 1 of 10
building tree 6 of 10
building tree 5 of 10
building tree 4 of 10
building tree 3 of 10
building tree 8 of 10
building tree 7 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.5s finished


building tree 4 of 10
building tree 6 of 10
building tree 5 of 10
building tree 3 of 10
building tree 7 of 10
building tree 1 of 10
building tree 8 of 10
building tree 2 of 10
building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished


building tree 1 of 10
building tree 2 of 10
building tree 3 of 10
building tree 5 of 10
building tree 8 of 10
building tree 7 of 10
building tree 6 of 10
building tree 4 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.8s finished


building tree 7 of 10
building tree 2 of 10
building tree 1 of 10
building tree 8 of 10
building tree 4 of 10
building tree 3 of 10
building tree 6 of 10
building tree 5 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.5s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.9s remaining:    0.4s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.3s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=1, total=  57.2s
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=2 ........


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s


building tree 5 of 10
building tree 4 of 10
building tree 1 of 10
building tree 2 of 10
building tree 6 of 10
building tree 7 of 10
building tree 8 of 10
building tree 3 of 10


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished


building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished


building tree 4 of 10
building tree 6 of 10
building tree 2 of 10
building tree 7 of 10
building tree 5 of 10
building tree 8 of 10
building tree 3 of 10
building tree 1 of 10
building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.8s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.0s remaining:    0.4s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.4s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=1, total= 1.1min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s


[CV] pca__n_components=0.95, extratreesclassifier__max_depth=2 .......


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.4s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.7s remaining:    0.7s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    2.0s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=2, total= 1.1min
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=2 .......


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.5s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.2s remaining:    0.5s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.8s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.3s remaining:    0.5s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.8s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=1, total= 1.4min
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=2 .......


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.9s finished


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=1, total= 1.4min
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=3 ........
building tree 1 of 10
building tree 2 of 10
building tree 3 of 10
building tree 5 of 10
building tree 6 of 10
building tree 7 of 10
building tree 4 of 10
building tree 8 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.8s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.8s remaining:    0.3s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.3s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.0s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=1, total= 1.5min
[CV] pca__n_components=0.9, extratreesclassifier__max_depth=3 ........


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.7s remaining:    0.7s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    2.4s finished


building tree 1 of 10
building tree 7 of 10
building tree 3 of 10
building tree 2 of 10
building tree 4 of 10
building tree 5 of 10
building tree 6 of 10
building tree 8 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    1.0s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.6s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=1, total= 1.7min


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.1s finished


[CV] pca__n_components=0.9, extratreesclassifier__max_depth=3 ........
[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=2, total= 1.7min
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=3 .......
building tree 3 of 10
building tree 8 of 10
building tree 6 of 10
building tree 1 of 10
building tree 2 of 10
building tree 7 of 10
building tree 4 of 10
building tree 5 of 10


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.7s remaining:    0.7s


building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    1.6s remaining:    0.7s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    2.2s finished
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.8s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=2, total= 1.1min
[CV] pca__n_components=0.95, extratreesclassifier__max_depth=3 .......
building tree 7 of 10
building tree 5 of 10
building tree 4 of 10
building tree 3 of 10
building tree 1 of 10
building tree 2 of 10
building tree 8 of 10
building tree 6 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.4s finished


building tree 2 of 10
building tree 3 of 10
building tree 8 of 10
building tree 5 of 10
building tree 1 of 10
building tree 6 of 10
building tree 7 of 10
building tree 4 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.4s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.2s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.4s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.7s finished


building tree 3 of 10
building tree 1 of 10
building tree 6 of 10
building tree 7 of 10
building tree 2 of 10
building tree 5 of 10
building tree 4 of 10
building tree 8 of 10
building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.0s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.9s remaining:    0.4s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.3s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.8s remaining:    0.4s


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=3, total= 1.1min


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.2s finished


[CV] pca__n_components=0.95, extratreesclassifier__max_depth=3 .......
[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=2, total= 1.4min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.5s finished


building tree 4 of 10
building tree 2 of 10
building tree 3 of 10
building tree 5 of 10
building tree 6 of 10
building tree 7 of 10
building tree 1 of 10
building tree 8 of 10
building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished


building tree 2 of 10
building tree 3 of 10
building tree 4 of 10
building tree 5 of 10
building tree 6 of 10
building tree 7 of 10
building tree 1 of 10
building tree 8 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.9s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.1s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    1.3s remaining:    0.5s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.9s finished


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=2, total= 1.3min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.0s finished


building tree 4 of 10
building tree 6 of 10
building tree 7 of 10
building tree 5 of 10
building tree 1 of 10
building tree 8 of 10
building tree 3 of 10
building tree 2 of 10


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.8s remaining:    0.3s


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=2, total= 1.7min


[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    1.3s finished


building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.7s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  13 out of  18 | elapsed:  4.1min remaining:  1.6min
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.9s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=3, total= 1.0min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.4s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.2s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.3s finished


building tree 3 of 10
building tree 5 of 10
building tree 8 of 10
building tree 6 of 10
building tree 4 of 10
building tree 2 of 10
building tree 1 of 10
building tree 7 of 10
building tree 10 of 10
building tree 9 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.7s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.7s finished


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=3, total= 1.4min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.2s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.3s finished
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.7s finished


[CV]  pca__n_components=0.9, extratreesclassifier__max_depth=3, total= 1.7min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.7s finished


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=3, total= 1.4min


[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.4s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.6s finished


[CV]  pca__n_components=0.95, extratreesclassifier__max_depth=3, total=  52.0s


[Parallel(n_jobs=-1)]: Done  18 out of  18 | elapsed:  4.8min finished


building tree 1 of 10
building tree 2 of 10
building tree 3 of 10
building tree 4 of 10
building tree 5 of 10
building tree 6 of 10
building tree 7 of 10
building tree 8 of 10
building tree 9 of 10
building tree 10 of 10


[Parallel(n_jobs=-1)]: Done   7 out of  10 | elapsed:    0.5s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    0.7s finished


CPU times: user 56.9 s, sys: 9.06 s, total: 1min 5s
Wall time: 5min 21s
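
The verbose log above is hard to read because the output of the parallel workers is interleaved. A tidier way to compare candidates is the `cv_results_` attribute of the fitted search. The sketch below runs the same kind of grid on synthetic data from `make_classification`; the data and the names `toy_pipe`, `toy_grid` and `toy_search` are assumptions for illustration and are unrelated to the dataset used in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data, only for demonstration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

toy_pipe = make_pipeline(
    PCA(),
    ExtraTreesClassifier(n_estimators=10, random_state=0),
)
toy_grid = {
    'pca__n_components': [0.9, 0.95],
    'extratreesclassifier__max_depth': [1, 2, 3],
}
toy_search = GridSearchCV(toy_pipe, toy_grid, cv=3)
toy_search.fit(X, y)

# One row per candidate, with mean and spread of the fold scores.
results = pd.DataFrame(toy_search.cv_results_)
summary = results[['params', 'mean_test_score',
                   'std_test_score', 'rank_test_score']]
print(summary.sort_values('rank_test_score'))
```

Applied to the search in this notebook, the same table would be built from `model.cv_results_`.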


In [8]:
%%time
macro = f1_score(target, model.predict(data), average='macro')
micro = f1_score(target, model.predict(data), average='micro')
print('Best model score: %f' % model.best_score_)
print('Macro F1 score: %f' % macro)
print('Micro F1 score: %f' % micro)

[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.9s finished
  'precision', 'predicted', average, warn_for)
[Parallel(n_jobs=8)]: Done   7 out of  10 | elapsed:    0.6s remaining:    0.3s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.8s finished


Best model score: 0.782875
Macro F1 score: 0.043914
Micro F1 score: 0.782877
CPU times: user 41.4 s, sys: 10.2 s, total: 51.5 s
Wall time: 42.1 s
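
Two things are worth keeping in mind when reading these numbers. First, the F1 scores are computed on the same data the search was fitted on, so they are not held-out estimates. Second, a large gap between micro and macro F1 is the pattern one would expect when predictions concentrate on the most frequent classes. The notebook does not show the class distribution, so this is only a likely reading, not a confirmed one. The toy labels below are made up to illustrate the effect.

```python
from sklearn.metrics import f1_score

# Made-up labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 8 + [1, 2]
# A model that always predicts the majority class.
y_pred = [0] * 10

# Micro averaging pools all decisions, so frequent classes dominate.
micro = f1_score(y_true, y_pred, average='micro')
# Macro averaging weights every class equally, so missed classes hurt.
# zero_division=0 scores classes that are never predicted as 0 without a warning.
macro = f1_score(y_true, y_pred, average='macro', zero_division=0)

print('Micro F1: %f' % micro)
print('Macro F1: %f' % macro)
```

Here the micro score equals the accuracy (8 of 10 correct), while the macro score averages one decent per-class F1 with two zeros.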


Oh, by the way, what were the best parameters found by grid search?

In [9]:
model.best_params_

{'extratreesclassifier__max_depth': 2, 'pca__n_components': 0.95}

Now let's save the fitted search object, which includes the refitted best pipeline.

In [10]:
joblib.dump(model, 'models/extratrees.pkl')

['models/extratrees.pkl']
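
To reuse a saved model later, `joblib.load` reads it back. The sketch below shows the round trip on a small stand-in classifier written to a temporary directory; the file name `extratrees_demo.pkl` and the synthetic data are assumptions for illustration. Because pickling stores references to classes rather than their code, loading the pipeline saved above would presumably also require the `transforms` module to be importable.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Small stand-in model on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'extratrees_demo.pkl')
    joblib.dump(clf, path)       # write the fitted model to disk
    restored = joblib.load(path)  # read it back into memory

# The restored model should predict exactly like the original.
same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
print(same_predictions)
```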