# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from database.
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y

In [1]:
! pip install --upgrade setuptools
! pip install --upgrade pip
! pip install xgboost

Collecting setuptools
[?25l  Downloading https://files.pythonhosted.org/packages/b0/3a/88b210db68e56854d0bcf4b38e165e03be377e13907746f825790f3df5bf/setuptools-59.6.0-py3-none-any.whl (952kB)
[K    100% |████████████████████████████████| 962kB 7.3MB/s eta 0:00:01    59% |███████████████████             | 563kB 12.3MB/s eta 0:00:01
[?25hInstalling collected packages: setuptools
  Found existing installation: setuptools 38.4.0
    Uninstalling setuptools-38.4.0:
      Successfully uninstalled setuptools-38.4.0
Successfully installed setuptools-59.6.0
Collecting pip
[?25l  Downloading https://files.pythonhosted.org/packages/a4/6d/6463d49a933f547439d6b5b98b46af8742cc03ae83543e4d7688c2420f8b/pip-21.3.1-py3-none-any.whl (1.7MB)
[K    100% |████████████████████████████████| 1.7MB 486kB/s ta 0:00:011   59% |███████████████████             | 1.0MB 5.1MB/s eta 0:00:01    67% |█████████████████████▌          | 1.2MB 3.6MB/s eta 0:00:01    76% |████████████████████████▌       | 1.3MB 4.3MB/s e

In [2]:
import nltk
nltk.download('stopwords')
nltk.download(['punkt', 'wordnet'])
nltk.download('omw-1.4')
nltk.download('averaged_perceptron_tagger')
from sqlalchemy import create_engine
import pandas as pd
import sqlite3
import numpy as np
import re

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
from sklearn.metrics import confusion_matrix,f1_score, classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import  train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.base import BaseEstimator, TransformerMixin

import xgboost as xgb


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [3]:
# load data from database
engine = create_engine(
    'sqlite:///disaster_records.db')

#read table and separate X and Y features
df = pd.read_sql_table('disaster_records', engine)



In [4]:
df.describe()

Unnamed: 0,id,related,request,offer,aid_related,medical_help,medical_products,search_and_rescue,security,military,...,aid_centers,other_infrastructure,weather_related,floods,storm,fire,earthquake,cold,other_weather,direct_report
count,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,...,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0,26215.0
mean,15224.871333,0.76647,0.170666,0.004501,0.414267,0.079496,0.050086,0.027618,0.017967,0.032806,...,0.011787,0.043906,0.278352,0.082205,0.093191,0.010757,0.093649,0.020217,0.052489,0.193591
std,8827.053788,0.423085,0.376224,0.066941,0.492604,0.270517,0.218126,0.163878,0.132833,0.178131,...,0.107929,0.20489,0.448196,0.274682,0.290705,0.10316,0.291345,0.140746,0.223015,0.39512
min,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,7446.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,15663.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,22924.5,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,30265.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


We want to be aware of edge cases like this. blank tweets, or all space tweet could break a model. We will add some try/except nonetype where needed


In [5]:
df.columns

Index(['id', 'message', 'original', 'genre', 'related', 'request', 'offer',
       'aid_related', 'medical_help', 'medical_products', 'search_and_rescue',
       'security', 'military', 'child_alone', 'water', 'food', 'shelter',
       'clothing', 'money', 'missing_people', 'refugees', 'death', 'other_aid',
       'infrastructure_related', 'transport', 'buildings', 'electricity',
       'tools', 'hospitals', 'shops', 'aid_centers', 'other_infrastructure',
       'weather_related', 'floods', 'storm', 'fire', 'earthquake', 'cold',
       'other_weather', 'direct_report'],
      dtype='object')

In [6]:
# child_alone has only 0 values, so we will remove
df=df.drop(['child_alone'],axis=1)

In [7]:
X = df.iloc[:, 1].values
y = df.iloc[:,4:].values

### 2. Write a tokenization function to process your text data

In [8]:
stop_words = stopwords.words("english")
lemmatizer = WordNetLemmatizer()

#Tokenize message

def tokenize(text):
    url_regex = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    detected_urls = re.findall(url_regex, text)
    for url in detected_urls:
        text = text.replace(url, "urlplaceholder")
    
    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()
    
    clean_tokens = []
    for tok in tokens:
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)
    
    return clean_tokens

### 3. Build a machine learning pipeline
This machine pipeline should take in the `message` column as input and output classification results on the other 36 categories in the dataset. You may find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.

In [87]:
pipeline = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))
])

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

### 5. Test your model
Report the f1 score, precision and recall for each output category of the dataset. You can do this by iterating through the columns and calling sklearn's `classification_report` on each.

In [89]:
pipeline.fit(X_train, y_train)
y_pred1 = pipeline.predict(X_test)

In [21]:
labels = list(df.iloc[:,4:].columns)

In [90]:
print(classification_report(y_test,y_pred1, target_names = labels))

                        precision    recall  f1-score   support

               related       0.84      0.91      0.88      5011
               request       0.79      0.42      0.55      1118
                 offer       0.00      0.00      0.00        37
           aid_related       0.74      0.58      0.65      2670
          medical_help       0.59      0.12      0.20       542
      medical_products       0.63      0.09      0.15       317
     search_and_rescue       0.56      0.10      0.17       154
              security       0.40      0.02      0.03       121
              military       0.50      0.06      0.10       218
                 water       0.82      0.25      0.38       376
                  food       0.84      0.54      0.66       700
               shelter       0.81      0.34      0.48       536
              clothing       0.73      0.13      0.22        87
                 money       0.75      0.04      0.08       139
        missing_people       0.33      

  'precision', 'predicted', average, warn_for)


In [27]:
def categoryClassificationReport(labels,y_test,y_pred):
    
    for index, label in enumerate(labels):
        classification = classification_report(y_test[:,index-1], y_pred[:,index-1]);
        print('----------------------------\n')
        print(label,"\n",classification)
    return

In [None]:
categoryClassificationReport(labels,y_test,y_pred1)

### 6. Improve your model
Use grid search to find better parameters. 

In [92]:
pipeline.get_params().keys()

dict_keys(['memory', 'steps', 'vect', 'tfidf', 'clf', 'vect__analyzer', 'vect__binary', 'vect__decode_error', 'vect__dtype', 'vect__encoding', 'vect__input', 'vect__lowercase', 'vect__max_df', 'vect__max_features', 'vect__min_df', 'vect__ngram_range', 'vect__preprocessor', 'vect__stop_words', 'vect__strip_accents', 'vect__token_pattern', 'vect__tokenizer', 'vect__vocabulary', 'tfidf__norm', 'tfidf__smooth_idf', 'tfidf__sublinear_tf', 'tfidf__use_idf', 'clf__estimator__bootstrap', 'clf__estimator__class_weight', 'clf__estimator__criterion', 'clf__estimator__max_depth', 'clf__estimator__max_features', 'clf__estimator__max_leaf_nodes', 'clf__estimator__min_impurity_decrease', 'clf__estimator__min_impurity_split', 'clf__estimator__min_samples_leaf', 'clf__estimator__min_samples_split', 'clf__estimator__min_weight_fraction_leaf', 'clf__estimator__n_estimators', 'clf__estimator__n_jobs', 'clf__estimator__oob_score', 'clf__estimator__random_state', 'clf__estimator__verbose', 'clf__estimator__

In [95]:
parameters = {
    'clf__estimator__max_depth': [2, 5, 10, 15, 20],
    'clf__estimator__min_samples_split': [2, 3, 4, 5, 10],
    'clf__estimator__n_estimators': [5, 50, 100, 250]
}

cv = GridSearchCV(pipeline, param_grid = parameters, n_jobs=-1, scoring = "f1_samples", verbose=2, cv=2)

### 7. Test  model
Show the accuracy, precision, and recall of the tuned model.  

Since this project focuses on code quality, process, and  pipelines, there is no minimum performance metric needed to pass. However, make sure to fine tune your models for accuracy, precision and recall to make your project stand out - especially for your portfolio!

In [96]:
cv.fit(X_train, y_train)
y_pred = cv.predict(X_test)

Fitting 2 folds for each of 100 candidates, totalling 200 fits
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.405544292853673, total=  10.0s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:   14.8s remaining:    0.0s
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.40577745364409207, total=  10.1s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:   29.6s remaining:    0.0s
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.4038993415147968, total=  16.3s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.4036523493082254, total=  16.3s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.40392608958280274, total=  22.8s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.4035897819367515, total=  23.0s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.40390754081450086, total=  43.6s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.40361214814736135, total=  43.4s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.40446954705707383, total=  10.4s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.40884572407990355, total=  10.4s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.40387704937418334, total=  16.1s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.4036007000374608, total=  16.2s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.40397374945761566, total=  22.9s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.40361615590279853, total=  22.8s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.40386371527678505, total=  43.1s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.4035856537871946, total=  42.5s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.4053568683065538, total=  10.2s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.4044119050675747, total=  10.0s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.4039818489497762, total=  16.0s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.40373698262017527, total=  16.0s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.4038802616141975, total=  22.8s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.40365897780248855, total=  22.6s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.4038755824995159, total=  42.3s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.403592419042846, total=  42.3s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.40507417479887986, total=   9.9s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.40489633925766005, total=  10.2s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4040888083354262, total=  16.2s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4035590200873815, total=  16.1s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.40386560343384037, total=  22.6s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.4035921910873348, total=  22.4s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.4038671394161501, total=  42.1s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.40359502545197284, total=  41.8s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.40647836683480676, total=   9.9s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.4090326268122506, total=  10.1s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.403885778974533, total=  16.4s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.40358856388255265, total=  16.3s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4039090437216306, total=  23.3s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4036177023387295, total=  22.9s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.40389099447708837, total=  42.4s
[CV] clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=2, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.403605605953242, total=  42.7s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.41088321714846476, total=  10.5s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.4122413182781078, total=  10.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.4054873064845831, total=  18.0s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.40466351696470837, total=  18.0s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.4058233717400112, total=  26.3s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.4041907585866812, total=  26.1s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.404466746278663, total=  50.5s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.40450004711008336, total=  49.9s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4126692101085155, total=   9.9s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4071403034942494, total=   9.9s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.4058131105542667, total=  17.4s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.40415636380168096, total=  17.0s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.4047946234535204, total=  25.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.40450470291272445, total=  25.4s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.40501072451777886, total=  49.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.40449133644761176, total=  49.6s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.40884890842010246, total=  10.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.4106779871815398, total=   9.9s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.404673602976734, total=  17.1s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.4048877890930321, total=  17.3s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.4051015115096949, total=  25.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.4044401652038129, total=  25.3s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.4048399227790078, total=  49.4s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.4040196960726143, total=  49.4s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.41397188717222383, total=  10.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.4111864647566785, total=  10.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.40530646536947257, total=  17.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.40429503008091333, total=  17.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.4047281458347055, total=  25.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.40461481867419963, total=  25.3s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.4044149774846235, total=  48.7s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.404198639463813, total=  49.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.4158065066417232, total=  10.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.4240802851519084, total=  10.4s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.4052300190549858, total=  17.4s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.4046895415729606, total=  17.7s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4047830638848261, total=  25.9s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.40425774747914434, total=  25.8s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.40458373278564036, total=  49.2s
[CV] clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=5, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.40426134017033805, total=  49.3s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.42105344896763847, total=  10.6s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.41972562328240043, total=  10.2s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.41463910637504237, total=  21.1s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.41159687438039544, total=  20.9s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.41362229888046825, total=  32.3s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.4121335276688117, total=  32.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.4130595295663302, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.41126600346544906, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4238226247129945, total=  10.7s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4197583689384567, total=  10.9s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.41448107611928464, total=  21.1s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.41448471924601343, total=  21.0s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.41291329112259945, total=  32.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.4127129272095006, total=  32.0s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.4143012239325646, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.41107670103621546, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.4235362049383572, total=  10.2s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.42158276711832676, total=  10.3s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.41349281788891834, total=  20.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.4131753719684219, total=  20.7s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.41325367633661375, total=  31.5s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.41158201570704994, total=  31.1s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.41313469022102023, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.41157460695622605, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.4260463970742335, total=  10.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.4370654659918005, total=  10.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4167504121383678, total=  20.5s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4146826716497828, total=  20.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.4141287371442706, total=  31.3s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.4128533900704626, total=  30.9s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.41246507179603553, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.41119255045252573, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.4218773632735073, total=  10.1s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.421582342188103, total=  10.2s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.414843258989039, total=  20.2s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.4099563330456282, total=  20.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4126026156883354, total=  31.2s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4114613067862267, total=  31.4s
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.4128293342664132, total= 1.1min
[CV] clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=10, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.41020495494575115, total= 1.0min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.4314095522124868, total=  10.8s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.43482926395481786, total=  11.1s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.42801067053334685, total=  24.9s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.4249051239736268, total=  24.6s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.42496949524624156, total=  39.8s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.4245132802822625, total=  39.4s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.4244988997753563, total= 1.4min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.42283012220486826, total= 1.4min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4275845021003622, total=  10.8s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4351246631774112, total=  11.4s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.4270180329408607, total=  24.4s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.4251590008415893, total=  24.5s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.4277727012526222, total=  39.0s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.4244102197203409, total=  41.2s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.42463054965836783, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.42206417391740886, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.43879044652775323, total=  10.6s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.4360099702938504, total=  10.9s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.42748393889477715, total=  23.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.42164465180969046, total=  23.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.4245459817320497, total=  37.4s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.4249109880096915, total=  37.4s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.4252244135128617, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.4222578108700353, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.43232641174382136, total=  11.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.4308745156997646, total=  11.1s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4276821128343066, total=  23.9s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4269864834378872, total=  23.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.42379343932503766, total=  37.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.42296477089128237, total=  36.6s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.42444491906333054, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.42254014176602595, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.4305560688944975, total=  11.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.43934748035143467, total=  11.2s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.4254958762701885, total=  23.5s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.42408275906595794, total=  23.5s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4264550502806415, total=  37.4s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.42503848980169734, total=  37.3s
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.42420637377079085, total= 1.3min
[CV] clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=15, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.4212132231152597, total= 1.3min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.4430582943071329, total=  11.8s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=5, score=0.43142983595529705, total=  11.3s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.43667209387697714, total=  28.4s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=50, score=0.4341585934385052, total=  28.4s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.43786730412929253, total=  47.4s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=100, score=0.4356850737870292, total=  47.0s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.4369847791507957, total= 1.7min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=2, clf__estimator__n_estimators=250, score=0.4347595275036426, total= 1.8min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.440893825594184, total=  11.7s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=5, score=0.4404989800137576, total=  11.6s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.4359929877687472, total=  28.1s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=50, score=0.4367514703217092, total=  28.1s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.4368653041125462, total=  46.8s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=100, score=0.4362471471483042, total=  46.3s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.437358369495396, total= 1.7min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=3, clf__estimator__n_estimators=250, score=0.4351070964923998, total= 1.7min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.4414368908015827, total=  11.8s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=5, score=0.44320029160672925, total=  11.2s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.43588416450981526, total=  28.1s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=50, score=0.43561373363176226, total=  27.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.43745321007817395, total=  45.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=100, score=0.43874950561427556, total=  44.8s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.43593195210239344, total= 1.6min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=4, clf__estimator__n_estimators=250, score=0.43531271274786026, total= 1.6min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.44944939938597067, total=  11.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=5, score=0.43875329519466977, total=  11.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.43399414610854015, total=  28.0s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=50, score=0.4401921745344058, total=  28.2s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.4373763190988555, total=  46.1s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=100, score=0.4363588207354172, total=  44.1s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.43803333947784723, total= 1.7min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=5, clf__estimator__n_estimators=250, score=0.4348123152207092, total= 1.6min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.44311605852579555, total=  11.7s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=5, score=0.4412037867634787, total=  11.0s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.440919453549686, total=  25.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=50, score=0.4397673276293268, total=  25.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.43787867783150186, total=  42.9s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=100, score=0.4382434369436825, total=  42.4s
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.43681855026732724, total= 1.5min
[CV] clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  clf__estimator__max_depth=20, clf__estimator__min_samples_split=10, clf__estimator__n_estimators=250, score=0.43544128689552514, total= 1.5min


[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 138.6min finished


In [97]:
cv.best_params_

{'clf__estimator__max_depth': 20,
 'clf__estimator__min_samples_split': 5,
 'clf__estimator__n_estimators': 5}

In [98]:
cv.best_score_

0.4441016193035493

In [99]:
print(classification_report(y_test, y_pred, target_names = labels))

                        precision    recall  f1-score   support

               related       0.77      1.00      0.87      5011
               request       0.90      0.03      0.06      1118
                 offer       0.00      0.00      0.00        37
           aid_related       0.79      0.31      0.44      2670
          medical_help       0.69      0.02      0.03       542
      medical_products       0.75      0.01      0.02       317
     search_and_rescue       1.00      0.01      0.01       154
              security       0.00      0.00      0.00       121
              military       0.00      0.00      0.00       218
                 water       1.00      0.00      0.01       376
                  food       0.43      0.00      0.01       700
               shelter       0.83      0.03      0.05       536
              clothing       0.00      0.00      0.00        87
                 money       0.00      0.00      0.00       139
        missing_people       0.00      

  'precision', 'predicted', average, warn_for)


In [100]:
categoryClassificationReport(labels,y_test,y_pred)

----------------------------

related 
              precision    recall  f1-score   support

          0       0.86      0.97      0.91      5314
          1       0.74      0.33      0.46      1240

avg / total       0.84      0.85      0.83      6554

----------------------------

request 
              precision    recall  f1-score   support

          0       0.61      0.45      0.52      1543
          1       0.84      0.91      0.88      5011

avg / total       0.79      0.80      0.79      6554

----------------------------

offer 
              precision    recall  f1-score   support

          0       0.89      0.98      0.93      5436
          1       0.79      0.42      0.55      1118

avg / total       0.87      0.88      0.87      6554

----------------------------

aid_related 
              precision    recall  f1-score   support

          0       0.99      1.00      1.00      6517
          1       0.00      0.00      0.00        37

avg / total       0.99      0.99

  'precision', 'predicted', average, warn_for)


### 7.B  Compare  model to original

In [102]:
pipeline_HP = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier(min_samples_split= 5,n_estimators = 5, max_depth = 20)))
])

In [103]:
pipeline_HP.fit(X_train, y_train, )
y_pred_HP = pipeline_HP.predict(X_test)

In [104]:
print(f'Original Score',f1_score(y_test, y_pred1, average = 'samples'))
print(f'New Score',f1_score(y_test, y_pred_HP, average = 'samples'))

Original Score 0.479144752159
New Score 0.443990955898


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF

In [105]:
#XGBoost, using mlogloss for multi-label classification
pipeline_xg = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()),
    ('xg', MultiOutputClassifier(xgb.XGBClassifier(eval_metric='mlogloss',use_label_encoder=False)))
])

In [106]:
pipeline_xg.fit(X_train,y_train)

Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...e, use_label_encoder=False,
       validate_parameters=None, verbosity=None),
           n_jobs=1))])

In [107]:
y_pred_xg = pipeline_xg.predict(X_test)

In [108]:
print(f'New Score',f1_score(y_test, y_pred_xg, average = 'samples'))

New Score 0.53645837946


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


## New Feature Extraction

In [10]:
class StartingVerbExtractor(BaseEstimator, TransformerMixin):

    def starting_verb(self, text):
        sentence_list = nltk.sent_tokenize(text)
        for sentence in sentence_list:
            pos_tags = nltk.pos_tag(tokenize(sentence))
            first_word, first_tag = pos_tags[0]
            # return true if the first word is an appropriate verb or RT for retweet
            if first_tag in ['VB', 'VBP'] or first_word == 'RT':
                return True
        return False

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_tagged = pd.Series(X).apply(self.starting_verb)
        return pd.DataFrame(X_tagged)



In [16]:
class StartingPronounExtractor(BaseEstimator, TransformerMixin):

    def starting_pronoun(self, text):
        sentence_list = nltk.sent_tokenize(text)
        for sentence in sentence_list:
            pos_tags = nltk.pos_tag(tokenize(sentence))
            first_word, first_tag = pos_tags[0]
            # return true if the first word is an appropriate verb or RT for retweet
            if first_tag in ['PRP', 'PRP$']:
                return True
        return False

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_tagged = pd.Series(X).apply(self.starting_pronoun)
        return pd.DataFrame(X_tagged)


In [32]:
pipeline_feature = Pipeline([
    ('features', FeatureUnion([

        ('nlp_pipeline', Pipeline([
            ('vect', CountVectorizer(tokenizer=tokenize)),
            ('tfidf', TfidfTransformer())
        ])),

        ('prnoun', StartingPronounExtractor())
    ])),

    ('xg', MultiOutputClassifier(xgb.XGBClassifier(eval_metric='mlogloss',use_label_encoder=False)))
])

In [18]:
pipeline_feature.fit(X_train,y_train)

Pipeline(memory=None,
     steps=[('features', FeatureUnion(n_jobs=1,
       transformer_list=[('nlp_pipeline', Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df...e, use_label_encoder=False,
       validate_parameters=None, verbosity=None),
           n_jobs=1))])

In [19]:
y_pred_feat = pipeline_feature.predict(X_test)

In [22]:
print(classification_report(y_test, y_pred_feat, target_names = labels))

                        precision    recall  f1-score   support

               related       0.85      0.95      0.90      5052
               request       0.78      0.58      0.67      1158
                 offer       0.00      0.00      0.00        33
           aid_related       0.77      0.64      0.70      2765
          medical_help       0.62      0.25      0.36       509
      medical_products       0.64      0.28      0.39       333
     search_and_rescue       0.68      0.21      0.32       198
              security       0.00      0.00      0.00       133
              military       0.60      0.27      0.37       218
                 water       0.76      0.67      0.72       445
                  food       0.81      0.73      0.77       766
               shelter       0.77      0.59      0.67       556
              clothing       0.77      0.51      0.61        99
                 money       0.60      0.23      0.33       150
        missing_people       0.72      

  'precision', 'predicted', average, warn_for)


In [28]:
categoryClassificationReport(labels,y_test,y_pred_feat)

----------------------------

related 
              precision    recall  f1-score   support

          0       0.89      0.96      0.92      5252
          1       0.76      0.51      0.61      1302

avg / total       0.86      0.87      0.86      6554

----------------------------

request 
              precision    recall  f1-score   support

          0       0.71      0.44      0.54      1502
          1       0.85      0.95      0.90      5052

avg / total       0.82      0.83      0.81      6554

----------------------------

offer 
              precision    recall  f1-score   support

          0       0.91      0.96      0.94      5396
          1       0.78      0.58      0.67      1158

avg / total       0.89      0.90      0.89      6554

----------------------------

aid_related 
              precision    recall  f1-score   support

          0       0.99      1.00      1.00      6521
          1       0.00      0.00      0.00        33

avg / total       0.99      0.99

  'precision', 'predicted', average, warn_for)


In [24]:
print(f'New Score',f1_score(y_test, y_pred_feat, average = 'samples'))

New Score 0.53348498611


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


## Who,what,where,why,when

In [29]:
class StartingQuestionExtractor(BaseEstimator, TransformerMixin):

    def starting_w_question(self, text):
        sentence_list = nltk.sent_tokenize(text)
        for sentence in sentence_list:
            pos_tags = nltk.pos_tag(tokenize(sentence))
            first_word, first_tag = pos_tags[0]
            # return true if the first word is an appropriate verb or RT for retweet
            if first_tag in ['WDT', 'WP','WP$','WRB']:
                return True
        return False

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_tagged = pd.Series(X).apply(self.starting_w_question)
        return pd.DataFrame(X_tagged)

In [38]:
pipeline_feature_2 = Pipeline([
    ('features', FeatureUnion([

        ('nlp_pipeline', Pipeline([
            ('vect', CountVectorizer(tokenizer=tokenize)),
            ('tfidf', TfidfTransformer())
        ])),

        ('W_question', StartingQuestionExtractor()),
                ('prnoun', StartingPronounExtractor()),
        ('verb',StartingVerbExtractor())
    ])),

    ('xg', MultiOutputClassifier(xgb.XGBClassifier(eval_metric='mlogloss',use_label_encoder=False)))
])

pipeline_feature_3 = Pipeline([
    ('features', FeatureUnion([

        ('nlp_pipeline', Pipeline([
            ('vect', CountVectorizer(tokenizer=tokenize)),
            ('tfidf', TfidfTransformer())
        ])),

        ('W_question', StartingQuestionExtractor())
    ])),

    ('xg', MultiOutputClassifier(xgb.XGBClassifier(eval_metric='mlogloss',use_label_encoder=False)))
])

In [33]:
pipeline_feature_2.fit(X_train,y_train)

Pipeline(memory=None,
     steps=[('features', FeatureUnion(n_jobs=1,
       transformer_list=[('nlp_pipeline', Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df...e, use_label_encoder=False,
       validate_parameters=None, verbosity=None),
           n_jobs=1))])

In [39]:
pipeline_feature_3.fit(X_train,y_train)

Pipeline(memory=None,
     steps=[('features', FeatureUnion(n_jobs=1,
       transformer_list=[('nlp_pipeline', Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df...e, use_label_encoder=False,
       validate_parameters=None, verbosity=None),
           n_jobs=1))])

In [34]:
y_pred_feat_2 = pipeline_feature_2.predict(X_test)

In [40]:
y_pred_feat_3 = pipeline_feature_3.predict(X_test)

In [35]:
print(classification_report(y_test, y_pred_feat_2, target_names = labels))

                        precision    recall  f1-score   support

               related       0.85      0.94      0.89      5052
               request       0.78      0.57      0.66      1158
                 offer       0.00      0.00      0.00        33
           aid_related       0.77      0.65      0.71      2765
          medical_help       0.62      0.25      0.36       509
      medical_products       0.64      0.28      0.39       333
     search_and_rescue       0.67      0.20      0.31       198
              security       0.00      0.00      0.00       133
              military       0.60      0.27      0.37       218
                 water       0.76      0.67      0.72       445
                  food       0.81      0.73      0.77       766
               shelter       0.77      0.59      0.67       556
              clothing       0.75      0.48      0.59        99
                 money       0.60      0.23      0.33       150
        missing_people       0.73      

  'precision', 'predicted', average, warn_for)


In [36]:
print(f'New Score',f1_score(y_test, y_pred_feat_2, average = 'samples'))

New Score 0.531136982903


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


In [41]:
print(f'New Score',f1_score(y_test, y_pred_feat_3, average = 'samples'))

New Score 0.534218314084


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


## Hyper Parameter Compare

All new features lowered model performance, we will add some hyperparameter tuning.

In [45]:
parameters = {
        'xg__estimator__learning_rate' : [0.1, 0.2],
        'xg__estimator__subsample' : [0.25, 0.5],
        'xg__estimator__max_depth' : [4, 5],
        'xg__estimator__n_estimators': [10, 100]
    }

cv = GridSearchCV(pipeline_feature, param_grid = parameters, n_jobs=-1, scoring = "f1_samples", verbose=2,cv =2)

In [46]:
cv.fit(X_train,y_train)

Fitting 2 folds for each of 16 candidates, totalling 32 fits
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:  1.5min remaining:    0.0s


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25, total= 1.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25, total= 1.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5, total= 1.2min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5, total= 1.2min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25, total= 2.7min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25, total= 2.7min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5, total= 3.6min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=4, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5, total= 3.6min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25, total= 1.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.25, total= 1.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5, total= 1.3min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=10, xg__estimator__subsample=0.5, total= 1.2min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25, total= 3.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.25, total= 3.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5 


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


[CV]  xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5, total= 4.1min
[CV] xg__estimator__learning_rate=0.1, xg__estimator__max_depth=5, xg__estimator__n_estimators=100, xg__estimator__subsample=0.5 


KeyboardInterrupt: 

In [None]:
cv.best_params_

In [None]:
cv.best_score_

In [36]:
#XGBoost_ hyperparam
pipeline_xg_hp = Pipeline([
    ('vect', CountVectorizer(tokenizer=tokenize)),
    ('tfidf', TfidfTransformer()), 
    ('xg', MultiOutputClassifier(xgb.XGBClassifier(learning_rate=0.1, subsample=0.5, max_depth=4, n_estimators=100, eval_metric='mlogloss',use_label_encoder=False)))])

In [38]:
pipeline_xg_hp.fit(X_train,y_train)

Pipeline(memory=None,
     steps=[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip...e, use_label_encoder=False,
       validate_parameters=None, verbosity=None),
           n_jobs=1))])

In [40]:
y_pred_xg_hp = pipeline_xg_hp.predict(X_test)

In [42]:
print(f'Original Score',f1_score(y_test, y_pred1, average = 'samples'))
print(f'Original Tuned Score',f1_score(y_test, y_pred, average = 'samples'))
print(f'Original XG Score',f1_score(y_test, y_pred_xg, average = 'samples'))
print(f'Feature XG Score',f1_score(y_test, y_pred_feat, average = 'samples'))
print(f'Second Feature XG Score',f1_score(y_test, y_pred_feat_2, average = 'samples'))
print(f'Third Feature XG Score',f1_score(y_test, y_pred_feat_3, average = 'samples'))
print(f'New Tuned Score',f1_score(y_test, y_pred_xg_hp, average = 'samples'))

Original Score 0.484645562709
Original XG Score 0.536707119522
New Score 0.546952242945


  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


### 9. Export your model as a pickle file

### 10. Use this notebook to complete `train.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.