# Decision Trees and Ensembles Lab

In this lab we will compare the performance of a simple Decision Tree classifier with that of a Bagging classifier. We will do that on a few datasets, starting with the ones that ship with scikit-learn.

## 1. Breast Cancer Dataset
We will start our comparison on the breast cancer dataset.
You can load it directly from scikit-learn using the `load_breast_cancer` function.

### 1.a Simple comparison
1. Load the data and create X and y
- Initialize a Decision Tree Classifier and use `cross_val_score` to evaluate its performance. Use 5-fold cross-validation
- Wrap a Bagging Classifier around the Decision Tree Classifier and use `cross_val_score` to evaluate its performance, again with 5-fold cross-validation
- Which score is better?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

In [10]:
data.keys()

['target_names', 'data', 'target', 'DESCR', 'feature_names']

In [12]:
print(data['DESCR'])

Breast Cancer Wisconsin (Diagnostic) Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)
        
        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, field
        13 is Radius SE, field 23 is Worst Radius.
 

In [13]:
df = pd.DataFrame(data['data'], columns = data['feature_names'])
df.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


In [14]:
df.columns

Index([u'mean radius', u'mean texture', u'mean perimeter', u'mean area',
       u'mean smoothness', u'mean compactness', u'mean concavity',
       u'mean concave points', u'mean symmetry', u'mean fractal dimension',
       u'radius error', u'texture error', u'perimeter error', u'area error',
       u'smoothness error', u'compactness error', u'concavity error',
       u'concave points error', u'symmetry error', u'fractal dimension error',
       u'worst radius', u'worst texture', u'worst perimeter', u'worst area',
       u'worst smoothness', u'worst compactness', u'worst concavity',
       u'worst concave points', u'worst symmetry', u'worst fractal dimension'],
      dtype='object')

In [20]:
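# Note: scikit-learn encodes this target as 0 = malignant, 1 = benign,
# so the column name below is the reverse of what a value of 1 means.
y = pd.DataFrame(data['target'], columns=['malignant'])
y.describe()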

| | malignant |
|---|---|
| count | 569.0 |
| mean | 0.627417 |
| std | 0.483918 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 1.0 |
| 75% | 1.0 |
| max | 1.0 |
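
The mean of about 0.627 above is simply the fraction of rows labelled 1. The following minimal sketch checks what that label stands for. It assumes that `target_names` lists the class names in label order, which is how scikit-learn's copy of this dataset is documented; the names `label_names` and `class_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names[i] is the class name that belongs to label i
label_names = list(data.target_names)

# Number of samples carrying each label (index = label value)
class_counts = np.bincount(data.target)

for label, (name, count) in enumerate(zip(label_names, class_counts)):
    print(label, name, count)
```

If label 1 turns out to be `benign`, then accuracy figures in this lab are unaffected, but any per-class interpretation of a column called `malignant` should be flipped.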


In [23]:
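# cross_val_score lives in sklearn.model_selection since scikit-learn 0.18;
# the old sklearn.cross_validation module has been removed
from sklearn.model_selection import cross_val_score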
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

In [24]:
cross_val_score?

In [38]:
decision_tree = DecisionTreeClassifier()
cv_decision_tree = cross_val_score(decision_tree, df.values, 
                                   y['malignant'].values, cv=5)

In [39]:
cv_decision_tree

array([ 0.90434783,  0.93043478,  0.92035398,  0.94690265,  0.89380531])

In [40]:
print('Decision Tree mean accuracy %s ' % np.mean(cv_decision_tree))

Decision Tree mean accuracy 0.91916891112 


In [54]:
bagging = BaggingClassifier(DecisionTreeClassifier())
cv_bagging_classifier = cross_val_score(bagging,
                                       df.values,
                                       y['malignant'],
                                       cv=5)

In [55]:
cv_bagging_classifier

array([ 0.91304348,  0.93043478,  0.98230088,  0.92920354,  0.96460177])

In [56]:
print('Bagging a Decision Tree mean accuracy %s ' % np.mean(cv_bagging_classifier))

Bagging a Decision Tree mean accuracy 0.943916891112 
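
The two means above come from single runs of unseeded models and differ by roughly 0.025, which is about the same size as the fold-to-fold variation. One hedged way to make the comparison fairer is to score both models on exactly the same folds and look at the per-fold differences. The sketch below assumes a shuffled `StratifiedKFold` and fixed `random_state` values, neither of which is used in the cells above; `folds`, `tree`, `bagged` and `diff` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One fixed, shuffled split so both models see exactly the same folds
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=10, random_state=0)

tree_scores = cross_val_score(tree, X, y, cv=folds)
bagged_scores = cross_val_score(bagged, X, y, cv=folds)

# Per-fold differences are paired because the folds are identical
diff = bagged_scores - tree_scores
print('tree    %.3f +/- %.3f' % (tree_scores.mean(), tree_scores.std()))
print('bagging %.3f +/- %.3f' % (bagged_scores.mean(), bagged_scores.std()))
print('mean paired difference %.3f' % diff.mean())
```

With only five folds this is a rough indication rather than a formal test; repeating the procedure with several seeds gives a better feel for how stable the gap is.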


In [57]:
bagging.fit(df.values, y['malignant'])

BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

In [58]:
bagging.score(df.values, y['malignant'])

0.99824253075571179
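
The score above is measured on the same rows the ensemble was trained on, so it is optimistic compared with the cross-validated figures. Bagging offers a cheap alternative estimate: each tree is trained on a bootstrap sample, and the rows left out of that sample can be used to score it. The sketch below uses the `oob_score=True` option of `BaggingClassifier`; the choice of 100 estimators and the fixed seed are assumptions made here so that every row is very likely to be left out by at least one tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True scores each sample using only the trees whose
# bootstrap sample did not contain it
bagging_oob = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=100,
                                oob_score=True,
                                random_state=0)
bagging_oob.fit(X, y)

train_accuracy = bagging_oob.score(X, y)
oob_accuracy = bagging_oob.oob_score_
print('training accuracy   %.3f' % train_accuracy)
print('out-of-bag accuracy %.3f' % oob_accuracy)
```

The out-of-bag figure is usually much closer to the cross-validated accuracy than the training score is.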

In [59]:
bagging.predict(df.values)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,
       1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1,
       0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1,
       0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1,
       0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
       0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
       1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 0,

### 1.b Scaled pipelines
As you may have noticed, the features are not normalized. Does the score improve with normalization?

Use a scaler from `sklearn.preprocessing` (such as `StandardScaler`) and compare your results with the previous ones.

In [60]:
from sklearn.preprocessing import StandardScaler

In [61]:
standard_scaler = StandardScaler()

In [62]:
scaled_xs = standard_scaler.fit_transform(df.values)
scaled_xs

array([[ 1.09706398, -2.07333501,  1.26993369, ...,  2.29607613,
         2.75062224,  1.93701461],
       [ 1.82982061, -0.35363241,  1.68595471, ...,  1.0870843 ,
        -0.24388967,  0.28118999],
       [ 1.57988811,  0.45618695,  1.56650313, ...,  1.95500035,
         1.152255  ,  0.20139121],
       ..., 
       [ 0.70228425,  2.0455738 ,  0.67267578, ...,  0.41406869,
        -1.10454895, -0.31840916],
       [ 1.83834103,  2.33645719,  1.98252415, ...,  2.28998549,
         1.91908301,  2.21963528],
       [-1.80840125,  1.22179204, -1.81438851, ..., -1.74506282,
        -0.04813821, -0.75120669]])

In [63]:
decision_tree = DecisionTreeClassifier()
cv_decision_tree_scaled = cross_val_score(decision_tree, scaled_xs, 
                                   y['malignant'].values, cv=5)

In [64]:
cv_decision_tree_scaled

array([ 0.91304348,  0.89565217,  0.92035398,  0.94690265,  0.90265487])

In [65]:
np.mean(cv_decision_tree_scaled)

0.91572143131973827

In [66]:
bagging = BaggingClassifier(DecisionTreeClassifier())
cv_bagging_classifier_scaled = cross_val_score(bagging,
                                       scaled_xs,
                                       y['malignant'],
                                       cv=5)

In [67]:
cv_bagging_classifier_scaled

array([ 0.93043478,  0.90434783,  0.96460177,  0.94690265,  0.96460177])

In [68]:
np.mean(cv_bagging_classifier_scaled)

0.94217776067718351
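
The cells above scale the whole dataset once and then cross-validate, so the scaler has already seen the held-out rows. A pipeline avoids that by re-fitting the scaler inside every fold. The sketch below is one possible way to do it with `make_pipeline`; the name `scaled_tree` and the fixed `random_state` are assumptions added here. Since a decision tree splits on thresholds of individual features, standardizing is not expected to change its accuracy much, and the small differences between 1.a and 1.b are more likely due to the unseeded randomness of the models.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# The scaler is re-fit on the training part of every fold, so the
# held-out fold never influences the scaling statistics
scaled_tree = make_pipeline(StandardScaler(),
                            DecisionTreeClassifier(random_state=0))

cv_scaled_tree = cross_val_score(scaled_tree, X, y, cv=5)
print('scaled tree mean accuracy %.3f' % cv_scaled_tree.mean())
```

The same pipeline object can be passed as the base model of a `BaggingClassifier` or to a grid search, which keeps the preprocessing and the model together.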

### 1.c Grid Search

Grid search is a great way to improve the performance of a classifier. Let's explore the parameter space of both models and see if we can improve their performance.

1. Initialize a GridSearchCV with 5-fold cross-validation for the Decision Tree Classifier
- Search over a few values of each parameter to try to improve the score of the classifier
- Use the whole X, y dataset for your test
- Check the `best_score_` once you've trained it. Is it better than before?
- How does the score of the grid-searched DT compare with the score of the Bagging DT?
- Initialize a GridSearchCV with 5-fold cross-validation for the Bagging Decision Tree Classifier
- Repeat the search
    - Note that you'll have to prefix the parameter names of the wrapped tree with the name of the `base_estimator` parameter and a double underscore (e.g. `base_estimator__max_depth`)
    - Note that the Bagging Classifier has additional parameters of its own to search over
    - Note that you may end up with a grid too large to search in a short time
    - Make use of the `n_jobs` parameter to speed up your grid search
- Does the score improve for the grid-searched Bagging Classifier?
- Which score is better? Are the scores significantly different? How can you judge that?
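
The notes on parameter names can be sketched as follows. Parameters of the wrapped tree are addressed through the name of the wrapping parameter plus a double underscore. That wrapping parameter is called `base_estimator` in the scikit-learn version used in this lab and `estimator` in recent releases, so this sketch looks the name up instead of hard-coding it. The grid values, the variable `inner` and the fixed seeds are illustrative assumptions, and the grid is kept deliberately small.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                            random_state=0)

# The wrapped tree is exposed as `estimator` in recent scikit-learn
# releases and as `base_estimator` in older ones
inner = 'estimator' if 'estimator' in bagging.get_params() else 'base_estimator'

# Deliberately tiny grid: 2 * 2 * 2 = 8 candidates
param_grid = {
    inner + '__max_depth': [2, None],   # parameter of the wrapped tree
    'n_estimators': [10, 25],           # parameters of the ensemble itself
    'max_features': [0.5, 1.0],
}

# n_jobs=1 keeps the sketch portable; raise it (e.g. -1 for all cores)
# when the grid grows
search = GridSearchCV(bagging, param_grid=param_grid, cv=5, n_jobs=1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The number of fits is the product of the grid sizes times the number of folds, so adding one more parameter with five values multiplies the run time by five.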

In [71]:
# GridSearchCV lives in sklearn.model_selection since scikit-learn 0.18;
# the old sklearn.grid_search module has been removed
from sklearn.model_selection import GridSearchCV

decision_tree_classifier = DecisionTreeClassifier()
param_grid = {
    'max_depth': [None, 1, 2, 3, 4, 5],
    'min_samples_split': [2, 10, 25, 50, 100]
}

grid_search_dt = GridSearchCV(decision_tree_classifier,
                              param_grid=param_grid,
                              cv=5,
                              verbose=10)

In [72]:
grid_search_dt.fit(df.values, y['malignant'].values)

Fitting 5 folds for each of 30 candidates, totalling 150 fits
[CV] min_samples_split=2, max_depth=None .............................
[CV] .... min_samples_split=2, max_depth=None, score=0.913043 -   0.0s
[CV] min_samples_split=2, max_depth=None .............................
[CV] .... min_samples_split=2, max_depth=None, score=0.913043 -   0.0s
[CV] min_samples_split=2, max_depth=None .............................
[CV] .... min_samples_split=2, max_depth=None, score=0.902655 -   0.0s
[CV] min_samples_split=2, max_depth=None .............................
[CV] .... min_samples_split=2, max_depth=None, score=0.964602 -   0.0s
[CV] min_samples_split=2, max_depth=None .............................
[CV] .... min_samples_split=2, max_depth=None, score=0.911504 -   0.0s
[CV] min_samples_split=10, max_depth=None ............................
[CV] ... min_samples_split=10, max_depth=None, score=0.904348 -   0.0s
[CV] min_samples_split=10, max_depth=None ............................
[CV] ... min_sa

[Parallel(n_jobs=1)]: Done   1 tasks       | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   4 tasks       | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   7 tasks       | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done  12 tasks       | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done  17 tasks       | elapsed:    0.2s


[CV] .. min_samples_split=100, max_depth=None, score=0.895652 -   0.0s
[CV] min_samples_split=100, max_depth=None ...........................
[CV] .. min_samples_split=100, max_depth=None, score=0.920354 -   0.0s
[CV] min_samples_split=100, max_depth=None ...........................
[CV] .. min_samples_split=100, max_depth=None, score=0.929204 -   0.0s
[CV] min_samples_split=100, max_depth=None ...........................
[CV] .. min_samples_split=100, max_depth=None, score=0.920354 -   0.0s
[CV] min_samples_split=2, max_depth=1 ................................
[CV] ....... min_samples_split=2, max_depth=1, score=0.878261 -   0.0s
[CV] min_samples_split=2, max_depth=1 ................................
[CV] ....... min_samples_split=2, max_depth=1, score=0.904348 -   0.0s
[CV] min_samples_split=2, max_depth=1 ................................
[CV] ....... min_samples_split=2, max_depth=1, score=0.920354 -   0.0s
[CV] min_samples_split=2, max_depth=1 ................................
[CV] .

[Parallel(n_jobs=1)]: Done  24 tasks       | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done  31 tasks       | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  40 tasks       | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  49 tasks       | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  60 tasks       | elapsed:    0.4s


[CV] ...... min_samples_split=25, max_depth=2, score=0.938053 -   0.0s
[CV] min_samples_split=25, max_depth=2 ...............................
[CV] ...... min_samples_split=25, max_depth=2, score=0.929204 -   0.0s
[CV] min_samples_split=25, max_depth=2 ...............................
[CV] ...... min_samples_split=25, max_depth=2, score=0.938053 -   0.0s
[CV] min_samples_split=50, max_depth=2 ...............................
[CV] ...... min_samples_split=50, max_depth=2, score=0.913043 -   0.0s
[CV] min_samples_split=50, max_depth=2 ...............................
[CV] ...... min_samples_split=50, max_depth=2, score=0.921739 -   0.0s
[CV] min_samples_split=50, max_depth=2 ...............................
[CV] ...... min_samples_split=50, max_depth=2, score=0.938053 -   0.0s
[CV] min_samples_split=50, max_depth=2 ...............................
[CV] ...... min_samples_split=50, max_depth=2, score=0.929204 -   0.0s
[CV] min_samples_split=50, max_depth=2 ...............................
[CV] .

[Parallel(n_jobs=1)]: Done  71 tasks       | elapsed:    0.5s
[Parallel(n_jobs=1)]: Done  84 tasks       | elapsed:    0.6s
[Parallel(n_jobs=1)]: Done  97 tasks       | elapsed:    0.6s


[CV] ...... min_samples_split=50, max_depth=3, score=0.929204 -   0.0s
[CV] min_samples_split=50, max_depth=3 ...............................
[CV] ...... min_samples_split=50, max_depth=3, score=0.920354 -   0.0s
[CV] min_samples_split=100, max_depth=3 ..............................
[CV] ..... min_samples_split=100, max_depth=3, score=0.913043 -   0.0s
[CV] min_samples_split=100, max_depth=3 ..............................
[CV] ..... min_samples_split=100, max_depth=3, score=0.895652 -   0.0s
[CV] min_samples_split=100, max_depth=3 ..............................
[CV] ..... min_samples_split=100, max_depth=3, score=0.920354 -   0.0s
[CV] min_samples_split=100, max_depth=3 ..............................
[CV] ..... min_samples_split=100, max_depth=3, score=0.929204 -   0.0s
[CV] min_samples_split=100, max_depth=3 ..............................
[CV] ..... min_samples_split=100, max_depth=3, score=0.920354 -   0.0s
[CV] min_samples_split=2, max_depth=4 ................................
[CV] .

[Parallel(n_jobs=1)]: Done 112 tasks       | elapsed:    0.8s
[Parallel(n_jobs=1)]: Done 127 tasks       | elapsed:    0.9s


[CV] ...... min_samples_split=50, max_depth=4, score=0.920354 -   0.0s
[CV] min_samples_split=100, max_depth=4 ..............................
[CV] ..... min_samples_split=100, max_depth=4, score=0.913043 -   0.0s
[CV] min_samples_split=100, max_depth=4 ..............................
[CV] ..... min_samples_split=100, max_depth=4, score=0.895652 -   0.0s
[CV] min_samples_split=100, max_depth=4 ..............................
[CV] ..... min_samples_split=100, max_depth=4, score=0.929204 -   0.0s
[CV] min_samples_split=100, max_depth=4 ..............................
[CV] ..... min_samples_split=100, max_depth=4, score=0.929204 -   0.0s
[CV] min_samples_split=100, max_depth=4 ..............................
[CV] ..... min_samples_split=100, max_depth=4, score=0.920354 -   0.0s
[CV] min_samples_split=2, max_depth=5 ................................
[CV] ....... min_samples_split=2, max_depth=5, score=0.913043 -   0.0s
[CV] min_samples_split=2, max_depth=5 ................................
[CV] .

[Parallel(n_jobs=1)]: Done 144 tasks       | elapsed:    1.0s
[Parallel(n_jobs=1)]: Done 150 out of 150 | elapsed:    1.1s finished


GridSearchCV(cv=5, error_score='raise',
       estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'min_samples_split': [2, 10, 25, 50, 100], 'max_depth': [None, 1, 2, 3, 4, 5]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=10)

In [73]:
grid_search_dt.best_score_

0.92794376098418274

In [74]:
grid_search_dt.best_estimator_

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
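
A single `best_score_` hides how close the other candidates were. As a hedged sketch, the per-candidate means and standard deviations can be tabulated from `cv_results_`, which is assumed to be available here (scikit-learn 0.18 or later; the older `sklearn.grid_search` module exposed `grid_scores_` instead). The grid matches the one used above, while the fixed `random_state` and the names `results` and `top` are additions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={'max_depth': [None, 1, 2, 3, 4, 5],
                                  'min_samples_split': [2, 10, 25, 50, 100]},
                      cv=5)
search.fit(X, y)

# One row per candidate, with the mean and spread of its fold scores
results = pd.DataFrame(search.cv_results_)
columns = ['param_max_depth', 'param_min_samples_split',
           'mean_test_score', 'std_test_score', 'rank_test_score']
top = results.sort_values('rank_test_score')[columns].head()
print(top)
```

If the top few candidates differ by less than their `std_test_score`, the grid search has not clearly separated them, and the improvement over the default tree may be within noise.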

In [82]:
bagging_classifier = BaggingClassifier(grid_search_dt.best_estimator_)

param_grid_bagging = {
    'n_estimators': [4, 10, 25, 50, 1000],
    'max_samples': [0.25, 0.5, 1.0],
    'max_features': [0.25, 0.5, 0.75, 1.0]
}

grid_search_bagging = GridSearchCV(bagging_classifier,
                                   param_grid=param_grid_bagging,
                                   cv=5,
                                   verbose=10,
                                   n_jobs=-1)

In [83]:
grid_search_bagging.fit(df.values, y['malignant'].values)

Fitting 5 folds for each of 60 candidates, totalling 300 fits
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.860870 -   0.0s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.921739 -   0.0s
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.929204 -   0.0s
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV] max_features=0.25, max_samples=0.25, n_estimators=10 ............
[CV] max_features=0.25, max_samples=0.25, n_estimators=10 ............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.946903 -   0.0s
[CV] max_features=0.25, max_samples=0.25, n_estimators=10 ............
[CV]  max_features=0.25, max_samples=0

[Parallel(n_jobs=-1)]: Batch computation too fast (0.0553s.) Setting batch_size=6.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    0.3s


[CV] max_features=0.25, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=4, score=0.947826 -   0.0s
[CV] max_features=0.25, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=10, score=0.964602 -   0.1s
[CV] max_features=0.25, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=4, score=0.929204 -   0.0s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=25, score=0.955752 -   0.2s
[CV] max_features=0.25, max_samples=0.5, n_estimators=4 ..............
[CV] max_features=0.25, max_samples=0.25, n_estimators=50 ............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=4, score=0.964602 -   0.0s
[CV] max_features=0.25, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=4, score=0.955752 -   0.0s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=25, score=0.930435 -  

[Parallel(n_jobs=-1)]: Done  20 tasks      | elapsed:    1.3s


[CV]  max_features=0.25, max_samples=0.5, n_estimators=25, score=0.921739 -   0.2s
[CV]  max_features=0.25, max_samples=0.5, n_estimators=25, score=0.955752 -   0.2s
[CV] max_features=0.25, max_samples=0.5, n_estimators=25 .............
[CV] max_features=0.25, max_samples=0.5, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=25, score=0.955752 -   0.1s
[CV] max_features=0.25, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=25, score=0.955752 -   0.1s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=50, score=0.982301 -   0.3s
[CV] max_features=0.25, max_samples=0.25, n_estimators=50 ............
[CV] max_features=0.25, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=50, score=0.939130 -   0.3s
[CV] max_features=0.25, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=50, score=0.955752 

[Parallel(n_jobs=-1)]: Batch computation too slow (2.12s.) Setting batch_size=3.


[CV]  max_features=0.25, max_samples=0.25, n_estimators=1000, score=0.939130 -   6.5s
[CV] max_features=0.25, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.25, max_samples=0.5, n_estimators=1000, score=0.939130 -   6.1s
[CV] max_features=0.25, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.25, max_samples=1.0, n_estimators=1000, score=0.930435 -   5.5s
[CV] max_features=0.25, max_samples=1.0, n_estimators=1000 ...........
[CV]  max_features=0.25, max_samples=1.0, n_estimators=1000, score=0.964602 -   5.7s
[CV] max_features=0.5, max_samples=0.25, n_estimators=4 ..............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=4, score=0.913043 -   0.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=4 ..............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=4, score=0.939130 -   0.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=4 ..............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=4, score=0.929

[Parallel(n_jobs=-1)]: Done  62 tasks      | elapsed:   10.7s


[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.946903 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.938053 -   0.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.964602 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.930435 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.947826 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.973451 -   0.2s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=1000, score=0.939130

[Parallel(n_jobs=-1)]: Batch computation too slow (15.69s.) Setting batch_size=1.


[CV]  max_features=0.5, max_samples=0.5, n_estimators=10, score=0.930435 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=10, score=0.973451 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=10, score=0.955752 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=10, score=0.955752 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.913043 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............


[Parallel(n_jobs=-1)]: Done  95 tasks      | elapsed:   26.9s


[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.921739 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.973451 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.964602 -   0.2s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.964602 -   0.2s
[CV] max_features=0.5, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=50, score=0.913043 -   0.3s
[CV] max_features=0.5, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=50, score=0.930435 -   0.3s
[CV] max_features=0.5, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=1000, score=0.964602 -   5.

[Parallel(n_jobs=-1)]: Done 118 tasks      | elapsed:   34.2s


[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.982301 -   5.5s
[CV] max_features=0.5, max_samples=1.0, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=4, score=0.904348 -   0.0s
[CV] max_features=0.5, max_samples=1.0, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=4, score=0.930435 -   0.0s
[CV] max_features=0.5, max_samples=1.0, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=4, score=0.938053 -   0.0s
[CV] max_features=0.5, max_samples=1.0, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=4, score=0.955752 -   0.0s
[CV] max_features=0.5, max_samples=1.0, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=4, score=0.938053 -   0.0s
[CV] max_features=0.5, max_samples=1.0, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=10, score=0.913043 -   0.1s
[CV

[Parallel(n_jobs=-1)]: Done 127 tasks      | elapsed:   35.1s


[CV] max_features=0.5, max_samples=1.0, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=10, score=0.946903 -   0.1s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.904348 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.947826 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.964602 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.938053 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.955752 -   0.1s
[CV] max_features=0.5, max_samples=1.0, n_estimators=50 ..............
[CV]  max_f

[Parallel(n_jobs=-1)]: Done 140 tasks      | elapsed:   36.8s


[CV] max_features=0.5, max_samples=1.0, n_estimators=50 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=50, score=0.955752 -   0.3s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=50, score=0.964602 -   0.3s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.964602 -   5.7s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.964602 -   5.8s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.921739 -   5.8s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.930435 -   5.8s
[CV] max_features=0.75, max_samples=0.25, n_estimators=4 .............
[CV

[Parallel(n_jobs=-1)]: Done 151 tasks      | elapsed:   43.4s


[CV]  max_features=0.75, max_samples=0.25, n_estimators=10, score=0.921739 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=10 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=10, score=0.982301 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=10 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=10, score=0.938053 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=10 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=10, score=0.955752 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.913043 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.947826 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.97

[Parallel(n_jobs=-1)]: Done 164 tasks      | elapsed:   45.2s


[CV]  max_features=0.75, max_samples=0.25, n_estimators=50, score=0.982301 -   0.3s
[CV] max_features=0.75, max_samples=0.25, n_estimators=50 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=50, score=0.964602 -   0.3s
[CV] max_features=0.75, max_samples=0.25, n_estimators=50 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=50, score=0.955752 -   0.3s
[CV] max_features=0.75, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.982301 -   6.3s
[CV] max_features=0.75, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.964602 -   6.3s
[CV] max_features=0.75, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.964602 -   6.3s
[CV] max_features=0.75, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.

[Parallel(n_jobs=-1)]: Done 177 tasks      | elapsed:   51.8s


[CV]  max_features=0.75, max_samples=0.5, n_estimators=10, score=0.921739 -   0.1s
[CV] max_features=0.75, max_samples=0.5, n_estimators=10 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=10, score=0.973451 -   0.1s
[CV] max_features=0.75, max_samples=0.5, n_estimators=10 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=10, score=0.929204 -   0.1s
[CV] max_features=0.75, max_samples=0.5, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=10, score=0.955752 -   0.1s
[CV] max_features=0.75, max_samples=0.5, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=25, score=0.930435 -   0.1s
[CV]  max_features=0.75, max_samples=0.5, n_estimators=25, score=0.904348 -   0.2s
[CV] max_features=0.75, max_samples=0.5, n_estimators=25 .............
[CV] max_features=0.75, max_samples=0.5, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=25, score=0.982301 - 

[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:   53.3s


[CV]  max_features=0.75, max_samples=0.5, n_estimators=50, score=0.964602 -   0.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.964602 -   5.8s
[CV] max_features=0.75, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.964602 -   5.8s
[CV] max_features=0.75, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=0.5, n_estimators=1000, score=0.913043 -   5.8s
[CV] max_features=0.75, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=0.5, n_estimators=1000, score=0.947826 -   5.9s
[CV] max_features=0.75, max_samples=1.0, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=4, score=0.904348 -   0.0s
[CV] max_features=0.75, max_samples=1.0, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=4, score=0.9

[Parallel(n_jobs=-1)]: Done 207 tasks      | elapsed:  1.0min


[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.913043 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.973451 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.964602 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=1000, score=0.982301 -   6.1s
[CV] max_features=0.75, max_samples=1.0, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.964602 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=50, score=0.913043 -   0.3s
[CV] max_features=0.75, max_samples=1.0, n_estimators=50 .............
[CV

[Parallel(n_jobs=-1)]: Done 224 tasks      | elapsed:  1.1min


[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.913043 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.930435 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.946903 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.946903 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.955752 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=25, score=0.913043 -   0.2s
[CV] max_features=1.0, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=25, score=0.947826 - 

[Parallel(n_jobs=-1)]: Done 241 tasks      | elapsed:  1.2min


[CV]  max_features=1.0, max_samples=0.25, n_estimators=50, score=0.955752 -   0.3s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=1.0, n_estimators=1000, score=0.964602 -   7.0s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=1.0, n_estimators=1000, score=0.964602 -   6.8s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.895652 -   5.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.947826 -   5.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=4 ...............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=4, score=0.895652 -   0.0s
[CV] max_features=1.0, max_samples=0.5, n_estimators=4 ...............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=4, score=0.91304

[Parallel(n_jobs=-1)]: Done 260 tasks      | elapsed:  1.3min


[CV]  max_features=1.0, max_samples=0.5, n_estimators=25, score=0.955752 -   0.2s
[CV] max_features=1.0, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.921739 -   0.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.947826 -   0.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.973451 -   0.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.955752 -   0.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.964602 -   0.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.964602 -   5.

[Parallel(n_jobs=-1)]: Done 279 tasks      | elapsed:  1.4min


[CV]  max_features=1.0, max_samples=1.0, n_estimators=10, score=0.955752 -   0.1s
[CV] max_features=1.0, max_samples=1.0, n_estimators=10 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=10, score=0.938053 -   0.1s
[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.913043 -   0.2s
[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.930435 -   0.2s
[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.964602 -   0.2s
[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.955752 -   0.2s
[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.964602 -   0.2s


[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:  1.7min finished


GridSearchCV(cv=5, error_score='raise',
       estimator=BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, spl...n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [4, 10, 25, 50, 1000], 'max_samples': [0.25, 0.5, 1.0], 'max_features': [0.25, 0.5, 0.75, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=10)

In [84]:
grid_search_bagging.best_score_

0.95782073813708257

In [85]:
grid_search_bagging.best_estimator_

BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=0.75,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

In [88]:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

combined_df = df.join(y)
combined_df.head(1)

x_train, x_test, y_train, y_test = train_test_split(df, y['malignant'])

In [89]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((426, 30), (143, 30), (426,), (143,))

In [90]:
bagging_classifier = BaggingClassifier(grid_search_dt.best_estimator_)

param_grid_bagging = {
    'n_estimators': [4, 10, 25, 50, 1000],
    'max_samples': [0.25, 0.5, 1.0],
    'max_features': [0.25, 0.5, 0.75, 1.0]
}

grid_search_bagging = GridSearchCV(bagging_classifier,
                                   param_grid=param_grid_bagging,
                                   cv=5,
                                   verbose=10,
                                   n_jobs=-1)

In [91]:
grid_search_bagging.fit(x_train, y_train)

Fitting 5 folds for each of 60 candidates, totalling 300 fits
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.918605 -   0.0s
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV] max_features=0.25, max_samples=0.25, n_estimators=4 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.894118 -   0.1s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.905882 -   0.1s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.905882 -   0.0s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=4, score=0.858824 -   0.1s
[CV] max_features=0.25, max_samples=0.25, n_estimators=10 ............
[CV] max_features=0.25, max_samples=0.25, n_estimators=10 ............
[CV] max_features=0.25, ma

[Parallel(n_jobs=-1)]: Batch computation too fast (0.0667s.) Setting batch_size=4.
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    0.2s


[CV] max_features=0.25, max_samples=0.25, n_estimators=10 ............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=10, score=0.976471 -   0.1s
[CV] max_features=0.25, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.25, max_samples=0.25, n_estimators=10, score=0.929412 -   0.1s
[CV] max_features=0.25, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=25, score=0.858824 -   0.2s
[CV] max_features=0.25, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=25, score=0.976744 -   0.2s
[CV] max_features=0.25, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=25, score=0.964706 -   0.2s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=50, score=0.941176 -   0.5s
[CV] max_features=0.25, max_samples=0.25, n_estimators=50 ............
[CV] max_features=0.25, max_samples=0.25, n_estimators=25 ............

[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    1.3s


[CV]  max_features=0.25, max_samples=0.5, n_estimators=10, score=0.965116 -   0.1s
[CV] max_features=0.25, max_samples=0.5, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=10, score=0.929412 -   0.1s
[CV] max_features=0.25, max_samples=0.5, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=10, score=0.882353 -   0.1s
[CV] max_features=0.25, max_samples=0.5, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=10, score=0.964706 -   0.1s
[CV]  max_features=0.25, max_samples=0.25, n_estimators=50, score=0.917647 -   0.5s
[CV] max_features=0.25, max_samples=0.5, n_estimators=10 .............
[CV] max_features=0.25, max_samples=0.5, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=10, score=0.905882 -   0.1s
[CV] max_features=0.25, max_samples=0.5, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=0.5, n_estimators=25, score=0.929412 -

[Parallel(n_jobs=-1)]: Batch computation too slow (2.62s.) Setting batch_size=2.


[CV] max_features=0.25, max_samples=1.0, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=10, score=0.930233 -   0.0s
[CV] max_features=0.25, max_samples=1.0, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=10, score=0.917647 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=10, score=0.858824 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=10 .............


[Parallel(n_jobs=-1)]: Done  44 tasks      | elapsed:    6.8s


[CV]  max_features=0.25, max_samples=1.0, n_estimators=10, score=0.952941 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=10 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=10, score=0.905882 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=25, score=0.965116 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=25, score=0.929412 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=25, score=0.882353 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=25, score=0.976471 -   0.1s
[CV] max_features=0.25, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=25, score=0.894118 - 

[Parallel(n_jobs=-1)]: Batch computation too slow (2.81s.) Setting batch_size=1.


[CV]  max_features=0.25, max_samples=0.25, n_estimators=1000, score=0.929412 -   5.1s
[CV] max_features=0.25, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.25, max_samples=0.5, n_estimators=1000, score=0.929412 -   4.7s
[CV] max_features=0.25, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.25, max_samples=0.5, n_estimators=1000, score=0.917647 -   4.7s
[CV] max_features=0.25, max_samples=1.0, n_estimators=4 ..............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=4, score=0.953488 -   0.0s
[CV] max_features=0.25, max_samples=1.0, n_estimators=4 ..............
[CV]  max_features=0.25, max_samples=1.0, n_estimators=4, score=0.894118 -   0.0s
[CV] max_features=0.25, max_samples=1.0, n_estimators=1000 ...........
[CV]  max_features=0.25, max_samples=1.0, n_estimators=1000, score=0.965116 -   5.4s
[CV] max_features=0.25, max_samples=1.0, n_estimators=1000 ...........
[CV]  max_features=0.25, max_samples=0.25, n_estimators=1000, score=0

[Parallel(n_jobs=-1)]: Done  68 tasks      | elapsed:   20.0s


[CV]  max_features=0.5, max_samples=0.25, n_estimators=4, score=0.917647 -   0.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.976744 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.882353 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.894118 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.976471 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=10, score=0.905882 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.965116 -  

[Parallel(n_jobs=-1)]: Done  78 tasks      | elapsed:   20.8s


[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.870588 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.964706 -   0.1s
[CV] max_features=0.5, max_samples=0.25, n_estimators=25 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=25, score=0.917647 -   0.2s
[CV] max_features=0.5, max_samples=0.25, n_estimators=50 .............
[CV]  max_features=0.25, max_samples=0.25, n_estimators=1000, score=0.988235 -   5.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=50 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=50, score=0.965116 -   0.3s
[CV] max_features=0.5, max_samples=0.25, n_estimators=50 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=50, score=0.917647 -   0.3s
[CV] max_features=0.5, max_samples=0.25, n_estimators=50 .............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=50, score=0.870588

[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed:   22.1s


[CV]  max_features=0.25, max_samples=1.0, n_estimators=1000, score=0.917647 -   5.4s
[CV] max_features=0.5, max_samples=0.25, n_estimators=4 ..............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=4, score=0.965116 -   0.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.25, max_samples=1.0, n_estimators=1000, score=0.976471 -   5.4s
[CV] max_features=0.5, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.5, max_samples=0.25, n_estimators=1000, score=0.965116 -   5.0s
[CV] max_features=0.5, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.5, max_samples=0.25, n_estimators=1000, score=0.917647 -   5.0s
[CV] max_features=0.5, max_samples=0.5, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=4, score=0.941860 -   0.0s
[CV] max_features=0.5, max_samples=0.5, n_estimators=4 ...............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=4, score=0.882353

[Parallel(n_jobs=-1)]: Done 103 tasks      | elapsed:   27.7s


[CV]  max_features=0.5, max_samples=0.5, n_estimators=10, score=0.964706 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=10, score=0.941176 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.976744 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.929412 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.870588 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.964706 -   0.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=25, score=0.917647 -   0.1s


[Parallel(n_jobs=-1)]: Done 114 tasks      | elapsed:   29.3s


[CV]  max_features=0.5, max_samples=0.5, n_estimators=50, score=0.882353 -   0.3s
[CV] max_features=0.5, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=50, score=0.976471 -   0.3s
[CV] max_features=0.5, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=50, score=0.905882 -   0.3s
[CV] max_features=0.5, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=1000, score=0.976471 -   5.4s
[CV] max_features=0.5, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.25, n_estimators=1000, score=0.905882 -   5.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.953488 -   5.1s
[CV] max_features=0.5, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.917647

[Parallel(n_jobs=-1)]: Done 127 tasks      | elapsed:   35.1s


[CV] max_features=0.5, max_samples=1.0, n_estimators=10 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=10, score=0.952941 -   0.1s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=10, score=0.894118 -   0.1s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.941860 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.941176 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.882353 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=25, score=0.964706 -   0.2s
[CV] max_features=0.5, max_samples=1.0, n_estimators=50 ..............
[CV]  max_f

[Parallel(n_jobs=-1)]: Done 140 tasks      | elapsed:   36.2s


[CV]  max_features=0.5, max_samples=1.0, n_estimators=50, score=0.976471 -   0.3s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=50, score=0.905882 -   0.3s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.976471 -   5.4s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=0.5, n_estimators=1000, score=0.917647 -   5.4s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.953488 -   5.4s
[CV] max_features=0.5, max_samples=1.0, n_estimators=1000 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.941176 -   5.4s
[CV] max_features=0.75, max_samples=0.25, n_estimators=4 .............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=4, score=0.965116 

[Parallel(n_jobs=-1)]: Done 155 tasks      | elapsed:   42.8s


[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.965116 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.917647 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.870588 -   0.1s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.5, max_samples=1.0, n_estimators=1000, score=0.905882 -   5.8s
[CV] max_features=0.75, max_samples=0.25, n_estimators=25 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.964706 -   0.2s
[CV] max_features=0.75, max_samples=0.25, n_estimators=50 ............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=25, score=0.917647 -   0.2s
[CV] max_features=0.75, max_samples=0.25, n_estimators=50 ............

[Parallel(n_jobs=-1)]: Done 170 tasks      | elapsed:   47.8s


[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.953488 -   5.0s
[CV] max_features=0.75, max_samples=0.25, n_estimators=1000 ..........
[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.917647 -   4.9s
[CV] max_features=0.75, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=4, score=0.941860 -   0.0s
[CV] max_features=0.75, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=4, score=0.941176 -   0.0s
[CV] max_features=0.75, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=4, score=0.882353 -   0.0s
[CV] max_features=0.75, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=4, score=0.929412 -   0.0s
[CV] max_features=0.75, max_samples=0.5, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=4, score=0.905882 -

[Parallel(n_jobs=-1)]: Done 187 tasks      | elapsed:   51.0s


[CV]  max_features=0.75, max_samples=0.5, n_estimators=50, score=0.953488 -   0.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.870588 -   5.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=50, score=0.917647 -   0.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=50, score=0.917647 -   0.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=50, score=0.905882 -   0.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=0.5, n_estimators=50, score=0.976471 -   0.3s
[CV] max_features=0.75, max_samples=0.5, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=0.25, n_estimators=1000, score=0.964

[Parallel(n_jobs=-1)]: Done 204 tasks      | elapsed:   57.9s


[CV]  max_features=0.75, max_samples=1.0, n_estimators=10, score=0.894118 -   0.1s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.965116 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.941176 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.929412 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.976471 -   0.1s
[CV] max_features=0.75, max_samples=1.0, n_estimators=25 .............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=25, score=0.905882 -   0.2s
[CV] max_features=0.75, max_samples=1.0, n_estimators=50 .............
[CV]  max_features=0.75, max_samples=0.5, n_estimators=1000, score=0.894118 

[Parallel(n_jobs=-1)]: Done 223 tasks      | elapsed:  1.1min


[CV] max_features=1.0, max_samples=0.25, n_estimators=4 ..............
[CV]  max_features=0.75, max_samples=1.0, n_estimators=1000, score=0.905882 -   6.0s
[CV]  max_features=1.0, max_samples=0.25, n_estimators=4, score=0.882353 -   0.0s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.929412 -   0.1s
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.976744 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.835294 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=10 .............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=10, score=0.941176 -   0.1s
[CV] max_features=1.0, max_samples=0.25, n_estimators=25 .............
[CV]

[Parallel(n_jobs=-1)]: Done 242 tasks      | elapsed:  1.1min


[CV]  max_features=1.0, max_samples=0.25, n_estimators=50, score=0.894118 -   0.3s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=1.0, n_estimators=1000, score=0.976471 -   7.7s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=0.75, max_samples=1.0, n_estimators=1000, score=0.894118 -   9.2s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.917647 -   7.9s
[CV] max_features=1.0, max_samples=0.25, n_estimators=1000 ...........
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.953488 -   8.2s
[CV] max_features=1.0, max_samples=0.5, n_estimators=4 ...............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=4, score=0.941860 -   0.0s
[CV] max_features=1.0, max_samples=0.5, n_estimators=4 ...............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=4, score=0.94117

[Parallel(n_jobs=-1)]: Done 263 tasks      | elapsed:  1.3min


[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.929412 -   0.3s
[CV] max_features=1.0, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.870588 -   0.4s
[CV] max_features=1.0, max_samples=0.5, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.964706 -   0.5s
[CV] max_features=1.0, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=50, score=0.894118 -   0.4s
[CV] max_features=1.0, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.964706 -   6.8s
[CV] max_features=1.0, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=1.0, max_samples=0.25, n_estimators=1000, score=0.905882 -   6.5s
[CV] max_features=1.0, max_samples=0.5, n_estimators=1000 ............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=1000, score=0.953488 -

[Parallel(n_jobs=-1)]: Done 284 tasks      | elapsed:  1.4min


[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.952941 -   0.2s
[CV] max_features=1.0, max_samples=1.0, n_estimators=25 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=25, score=0.894118 -   0.2s
[CV] max_features=1.0, max_samples=1.0, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=0.5, n_estimators=1000, score=0.894118 -   6.0s
[CV] max_features=1.0, max_samples=1.0, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=50, score=0.953488 -   0.4s
[CV] max_features=1.0, max_samples=1.0, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=50, score=0.894118 -   0.3s
[CV] max_features=1.0, max_samples=1.0, n_estimators=50 ..............
[CV]  max_features=1.0, max_samples=1.0, n_estimators=50, score=0.929412 -   0.4s
[CV] max_features=1.0, max_samples=1.0, n_estimators=50 ..............
[CV]  max

[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:  1.6min finished


GridSearchCV(cv=5, error_score='raise',
       estimator=BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, spl...n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [4, 10, 25, 50, 1000], 'max_samples': [0.25, 0.5, 1.0], 'max_features': [0.25, 0.5, 0.75, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=10)

In [92]:
grid_search_bagging.best_score_

0.94366197183098588
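
The log above reports "5 folds for each of 60 candidates, totalling 300 fits". As a quick sanity check, the count can be reproduced from the parameter grid alone. This is only an illustrative sketch using the standard library; the variable names `cv_folds`, `candidates`, `n_candidates`, and `n_fits` are introduced here and are not part of the original notebook.

```python
from itertools import product

# Same grid as in the cell that builds grid_search_bagging.
param_grid_bagging = {
    'n_estimators': [4, 10, 25, 50, 1000],
    'max_samples': [0.25, 0.5, 1.0],
    'max_features': [0.25, 0.5, 0.75, 1.0]
}
cv_folds = 5  # matches cv=5 in the GridSearchCV call

# Every combination of one value per parameter is one candidate.
candidates = list(product(*param_grid_bagging.values()))
n_candidates = len(candidates)    # 5 * 3 * 4
n_fits = n_candidates * cv_folds  # each candidate is fitted once per fold

print(n_candidates, n_fits)  # 60 300
```

Note that `n_estimators=1000` accounts for a fifth of the candidates but most of the run time, as the per-fit timings in the log suggest.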

In [93]:
best_estimator = grid_search_bagging.best_estimator_

In [94]:
best_estimator

BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=0.75,
         max_samples=1.0, n_estimators=25, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

In [95]:
best_estimator.fit(x_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=0.75,
         max_samples=1.0, n_estimators=25, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

In [96]:
best_estimator.score(x_train, y_train)

0.95539906103286387

In [97]:
best_estimator.score(x_test, y_test)

0.95104895104895104

In [98]:
predictions = best_estimator.predict(x_test)

In [99]:
from sklearn.metrics import confusion_matrix, classification_report

In [100]:
confusion_matrix(y_test, predictions)

array([[48,  4],
       [ 3, 88]])

In [102]:
print(classification_report(y_test, predictions))

             precision    recall  f1-score   support

          0       0.94      0.92      0.93        52
          1       0.96      0.97      0.96        91

avg / total       0.95      0.95      0.95       143
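
As a check on the report above, the per-class values can be recomputed by hand from the confusion matrix. This is a minimal sketch that assumes scikit-learn's convention of true classes on the rows and predicted classes on the columns; the variable names (`cm`, `precision`, `recall`, `f1`, `accuracy`) are ours and do not appear elsewhere in the notebook.

```python
import numpy as np

# Confusion matrix copied from the test-set output above.
# Assumed layout: rows = true class, columns = predicted class.
cm = np.array([[48, 4],
               [3, 88]])

correct = np.diag(cm).astype(float)

# Precision: correct predictions of a class / everything predicted as that class.
precision = correct / cm.sum(axis=0)
# Recall: correct predictions of a class / everything that truly is that class.
recall = correct / cm.sum(axis=1)
# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# Accuracy: all correct predictions / all samples.
accuracy = correct.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(round(accuracy, 4))
```

The accuracy computed this way, 136 / 143, is the same quantity that `best_estimator.score(x_test, y_test)` reports.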



In [103]:
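from sklearn.base import clone
# clone() returns a fresh, unfitted copy, so refitting on all the data does not overwrite best_estimator
best_estimator_full_fit = clone(best_estimator)
best_estimator_full_fit.fit(df.values, y['malignant'].values)
# Note: this score is computed on the training data, so it is an optimistic estimate
best_estimator_full_fit.score(df.values, y['malignant'].values)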

0.96836555360281196

In [104]:
predictions = best_estimator_full_fit.predict(df.values)

In [105]:
confusion_matrix(y['malignant'].values, predictions)

array([[204,   8],
       [ 10, 347]])

In [107]:
print(classification_report(y['malignant'].values, predictions))

             precision    recall  f1-score   support

          0       0.95      0.96      0.96       212
          1       0.98      0.97      0.97       357

avg / total       0.97      0.97      0.97       569



## 2. Diabetes and Regression

Scikit Learn has a dataset of diabetic patients obtained from this study:

http://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf

442 diabetes patients were measured on 10 baseline variables: age, sex, body mass index, average blood pressure, and six blood serum measurements.

The target is a quantitative measure of disease progression one year after baseline.

Repeat the comparison from Section 1, this time between a `DecisionTreeRegressor` and a bagged version of the same regressor.

### 2.a Simple comparison
1. Load the data and create X and y
- Initialize a Decision Tree Regressor and use `cross_val_score` to evaluate its performance. Set cross-validation to 5 folds. Which scoring metric will you use?
- Wrap a Bagging Regressor around the Decision Tree Regressor and use `cross_val_score` to evaluate its performance. Set cross-validation to 5 folds.
- Which score is better? Are the scores significantly different? How can you judge that?
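
The steps above can be sketched as follows. This is one possible approach, not a reference solution: it assumes a scikit-learn version that provides `sklearn.model_selection`, picks R² (`scoring='r2'`) as one reasonable regression metric, and fixes `random_state=0` only to make the run repeatable. Comparing the spread of the fold scores with the gap between the two means is a rough way to judge whether the difference is meaningful.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Load the data and create X and y (442 patients, 10 baseline variables).
data = load_diabetes()
X, y = data.data, data.target

# A single tree and a bagged ensemble of the same kind of tree.
tree = DecisionTreeRegressor(random_state=0)
bagged = BaggingRegressor(DecisionTreeRegressor(random_state=0),
                          n_estimators=10, random_state=0)

# 5-fold cross-validation; R^2 is an assumed choice of metric.
tree_scores = cross_val_score(tree, X, y, cv=5, scoring='r2')
bag_scores = cross_val_score(bagged, X, y, cv=5, scoring='r2')

# Report mean and standard deviation across folds for each model.
for name, scores in [('tree', tree_scores), ('bagging', bag_scores)]:
    print('%-8s mean R^2 = %.3f, std = %.3f' % (name, scores.mean(), scores.std()))
```

If the two means differ by much more than the fold-to-fold standard deviations, the improvement is unlikely to be noise; with only five folds this remains an informal check rather than a formal test.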

### 2.b Grid Search

Repeat the grid search as above:

1. Initialize a `GridSearchCV` with 5-fold cross validation for the Decision Tree Regressor
- Search over a few values of the parameters in order to improve the score of the regressor
- Use the whole X, y dataset for the search
- Check the `best_score_` once you've trained it. Is it better than before?
- How does the score of the Grid-searched DT compare with the score of the Bagging DT?
- Initialize a GridSearchCV with 5-fold cross validation for the Bagging Decision Tree Regressor
- Repeat the search
    - Note that you'll have to change the parameter names for the `base_estimator` (prefix them with `base_estimator__`, e.g. `base_estimator__max_depth`)
    - Note that there are also additional parameters to change
    - Note that you may end up with a grid space too large to search in a short time
    - Make use of the `n_jobs` parameter to speed up your grid search
- Does the score improve for the Grid-searched Bagging Regressor?
- Which score is better?
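
A minimal sketch of the bagging grid search described above, under stated assumptions: the grid values are illustrative only, R² is an assumed scoring choice, and `random_state=0` is set just for repeatability. The nested-parameter prefix is `base_estimator` in the scikit-learn version used in this notebook; newer releases renamed it to `estimator`, so the sketch checks which name the installed version reports.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

data = load_diabetes()
X, y = data.data, data.target

bagged = BaggingRegressor(DecisionTreeRegressor(random_state=0), random_state=0)

# The wrapped tree is exposed as `base_estimator` in older scikit-learn
# releases and as `estimator` in newer ones; use whichever is reported.
prefix = 'estimator' if 'estimator' in bagged.get_params() else 'base_estimator'

# Illustrative grid: one tree parameter plus two bagging parameters.
# 3 * 2 * 2 = 12 combinations, each fitted 5 times.
param_grid = {
    prefix + '__max_depth': [2, 3, 5],
    'n_estimators': [10, 25],
    'max_samples': [0.5, 1.0],
}

# n_jobs=-1 spreads the fits over all available cores.
grid = GridSearchCV(bagged, param_grid, cv=5, scoring='r2', n_jobs=-1)
grid.fit(X, y)

print(grid.best_params_)
print('best mean R^2: %.3f' % grid.best_score_)
```

Keeping each list short matters here: the number of fits is the product of the list lengths times the number of folds, so every extra parameter multiplies the search time.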

## Bonus: Project 6 data

Repeat the analysis for the Project 6 Dataset