# Problem Statement:
He actually did it! Long time friend Ben Seller sold off his dad's old real estate company and invested it ALL into bitcoin! Lucky for him he bought at just the right time and bitcoin skyrocketed! He's a millionaire! Ben has come to me asking for help, again. He has been receiving a massive outpouring of financial advice from his friends and family, and he needs a quick way to decide what is actual financial advice and what is not.

By pulling data from the popular website Reddit, specifically the subreddits r/personalfinance and r/frugal, I will construct a model that will determine if the advice Ben is being given is “sound” financial advice, or just general advice about ways to save money.

# modeling notebook:
in this notebook you will find all the iterations of models and grid search used for the project. The models are exported and the good models are analysed in the model analysis notebook. The models are broken out in sections and the scores briefly interpreted, with the two best models being sent on to the next notebook. The major scoring method I will be relying on is accuracy. This is because there is not a real difference in severity between a type-1 and a type-2 error. The overall accuracy of the model can reliably tell me how well the model is performing, and the effect of the other metrics don't seem able to have more impact on the outcome of the model compared to the problem statement. 

A note about lemmetizing and stemming: Due to the relatively similar languague used between these subreddits, I deemed that it was important for the model to be able to train on the difference between saying something is cheap, versus saying something is cheaper. This type of nuance felt like it could be lost in the lemmetizing or stemming process, and the models were performing well with just the use of vectorization. 

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import joblib
import nltk
from nltk.stem import WordNetLemmatizer
from xgboost import XGBClassifier

The stop words are also being brought over from the EDA, as most of the vectorizing will be happening in pipeleines, having the stop word list as an option to search over was important. 

In [3]:
from sklearn.feature_extraction import text 
my_stop_words = ['just', 'like','know', 've','money','don']
stop_words = text.ENGLISH_STOP_WORDS.union(my_stop_words)

In [5]:
df = pd.read_csv('./data/modeling_data.csv')

In [4]:
df.head()

Unnamed: 0,selftext,subreddit,target
0,eeing conflicting information about this inclu...,personalfinance,1
1,"I’m 32f, married with no kids. describe mysel...",personalfinance,1
2,I already received the last timulus check last...,personalfinance,1
3,The deadline for appealing the appraisal is M...,personalfinance,1
4,"I don't know what information to give, so I ap...",personalfinance,1


Some nulls were introduced in the export and import process, but there were only 6 of them so I just dropped them. 

In [7]:
df.dropna(inplace=True)

In [8]:
df.isnull().sum()

selftext     0
subreddit    0
target       0
dtype: int64

Setting up a train test split to look at the data and establish a baseline model, to make sure all models are performing better than the baseline. 

In [8]:
X = df['selftext']
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 42)

In [53]:
X.shape

(12377,)

In [20]:
y.shape

(12377,)

baseline model

In [8]:
y.value_counts(normalize=True)

1    0.511271
0    0.488729
Name: target, dtype: float64

# Random Forest Classifier with Random Search

In [39]:
rfc = RandomForestClassifier()

In [12]:
pipe = Pipeline([
    ('cvec', CountVectorizer(stop_words='english')),
    ('rfc', RandomForestClassifier())
])

In [14]:
pipe_params = {
    'cvec__max_features': [20000, 15000, 10000],
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.9, .95],
    'cvec__ngram_range': [(1,1), (1,2)],
    'rfc__n_estimators': [500, 750, 1000, 1250],
    'rfc__max_depth': [10,15,20,25],
    'rfc__min_samples_split': [4, 5],
    'rfc__min_samples_leaf': [2, 3],
    'rfc__max_features': ['auto']
}

In [15]:
rs_pipe = RandomizedSearchCV(pipe, pipe_params,cv = 5, verbose=1, n_jobs=-2)

In [16]:
rs_pipe.fit(X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 15 concurrent workers.
[Parallel(n_jobs=-2)]: Done  20 tasks      | elapsed:   21.9s
[Parallel(n_jobs=-2)]: Done  50 out of  50 | elapsed:   55.9s finished


RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec',
                                              CountVectorizer(stop_words='english')),
                                             ('rfc',
                                              RandomForestClassifier())]),
                   n_jobs=-2,
                   param_distributions={'cvec__max_df': [0.9, 0.95],
                                        'cvec__max_features': [20000, 15000,
                                                               10000],
                                        'cvec__min_df': [2, 3, 4],
                                        'cvec__ngram_range': [(1, 1), (1, 2)],
                                        'rfc__max_depth': [10, 15, 20, 25],
                                        'rfc__max_features': ['auto'],
                                        'rfc__min_samples_leaf': [2, 3],
                                        'rfc__min_samples_split': [4, 5],
                    

In [17]:
rs_pipe.best_params_

{'rfc__n_estimators': 750,
 'rfc__min_samples_split': 4,
 'rfc__min_samples_leaf': 2,
 'rfc__max_features': 'auto',
 'rfc__max_depth': 20,
 'cvec__ngram_range': (1, 1),
 'cvec__min_df': 3,
 'cvec__max_features': 15000,
 'cvec__max_df': 0.95}

In [18]:
rs_pipe.best_score_

0.8800897234137375

In [19]:
rs_pipe.score(X_train,y_train), rs_pipe.score(X_test,y_test)

(0.9107950872656755, 0.8859450726978998)

These are pretty good scores, slightly overfit but not too bad. I wanted to rerun the search with an even more robust parameter grid to try and get even better scores.

random forest with a larger param search. 

In [21]:
pipe_params2 = {
    'cvec__max_features': list(range(15000,25000,500)),
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.7,.75,.8,.85,.9, .95],
    'cvec__ngram_range': [(1,1), (1,2),(1,3)],
    'rfc__n_estimators': list(range(500,1400,100)),
    'rfc__max_depth': [10,15,20,25,30,35],
    'rfc__min_samples_split': [1,2,3,4,5,6],
    'rfc__min_samples_leaf': [2, 3,4,5],
    'rfc__max_features': ['auto']
}

In [28]:
rs_pipe2 = RandomizedSearchCV(pipe, pipe_params2,cv = 5, n_iter=1000, verbose=1, n_jobs=-2)

In [29]:
rs_pipe2.fit(X_train,y_train)

Fitting 5 folds for each of 1000 candidates, totalling 5000 fits


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 15 concurrent workers.
[Parallel(n_jobs=-2)]: Done  20 tasks      | elapsed:   19.5s
[Parallel(n_jobs=-2)]: Done 170 tasks      | elapsed:  3.9min
[Parallel(n_jobs=-2)]: Done 420 tasks      | elapsed:  9.0min
[Parallel(n_jobs=-2)]: Done 770 tasks      | elapsed: 16.8min
[Parallel(n_jobs=-2)]: Done 1220 tasks      | elapsed: 25.1min
[Parallel(n_jobs=-2)]: Done 1770 tasks      | elapsed: 35.2min
[Parallel(n_jobs=-2)]: Done 2420 tasks      | elapsed: 46.3min
[Parallel(n_jobs=-2)]: Done 3170 tasks      | elapsed: 60.5min
[Parallel(n_jobs=-2)]: Done 4020 tasks      | elapsed: 75.5min
[Parallel(n_jobs=-2)]: Done 4970 tasks      | elapsed: 95.2min
[Parallel(n_jobs=-2)]: Done 5000 out of 5000 | elapsed: 95.9min finished


RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec',
                                              CountVectorizer(stop_words='english')),
                                             ('rfc',
                                              RandomForestClassifier())]),
                   n_iter=1000, n_jobs=-2,
                   param_distributions={'cvec__max_df': [0.7, 0.75, 0.8, 0.85,
                                                         0.9, 0.95],
                                        'cvec__max_features': [15000, 15500,
                                                               16000, 16500,
                                                               17000, 17500,
                                                               18000, 18500,
                                                               19000, 19500,
                                                               20000, 20500,
                                                      

In [93]:
rs_pipe2.best_params_

{'rfc__n_estimators': 900,
 'rfc__min_samples_split': 5,
 'rfc__min_samples_leaf': 2,
 'rfc__max_features': 'auto',
 'rfc__max_depth': 35,
 'cvec__ngram_range': (1, 1),
 'cvec__min_df': 2,
 'cvec__max_features': 20000,
 'cvec__max_df': 0.85}

In [31]:
rs_pipe2.best_estimator_

Pipeline(steps=[('cvec',
                 CountVectorizer(max_df=0.85, max_features=20000, min_df=2,
                                 stop_words='english')),
                ('rfc',
                 RandomForestClassifier(max_depth=35, min_samples_leaf=2,
                                        min_samples_split=5,
                                        n_estimators=900))])

In [32]:
rs_pipe2.score(X_train,y_train), rs_pipe2.score(X_test,y_test)

(0.9283559577677225, 0.8949919224555735)

These scores are slightly better! They are also very good, but I wanted to run one more, with slightly more parameters in the vectorizer to attempt to bump the scores one last time. 

third times the charm

In [14]:
pipe3 = Pipeline([
    ('cvec', CountVectorizer()),
    ('rfc', RandomForestClassifier())
])

In [15]:
pipe_params3 = {
    'cvec__max_features': list(range(15000,25000,500)),
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.7,.75,.8,.85,.9, .95],
    'cvec__ngram_range': [(1,1), (1,2),(1,3)],
    'cvec__stop_words' : ['english', stop_words,None],
    'rfc__n_estimators': list(range(500,1400,100)),
    'rfc__max_depth': [10,15,20,25,30,35],
    'rfc__min_samples_split': [1,2,3,4,5,6],
    'rfc__min_samples_leaf': [2, 3,4,5],
    'rfc__max_features': ['auto']
}

In [18]:
rs_pipe3 = RandomizedSearchCV(pipe3, pipe_params3,cv = 5, n_iter=500, verbose=1, n_jobs=8)

In [19]:
rs_pipe3.fit(X_train,y_train)

Fitting 5 folds for each of 500 candidates, totalling 2500 fits


[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:  1.0min
[Parallel(n_jobs=8)]: Done 184 tasks      | elapsed:  4.9min
[Parallel(n_jobs=8)]: Done 434 tasks      | elapsed: 12.5min
[Parallel(n_jobs=8)]: Done 784 tasks      | elapsed: 25.8min
[Parallel(n_jobs=8)]: Done 1234 tasks      | elapsed: 41.1min
[Parallel(n_jobs=8)]: Done 1784 tasks      | elapsed: 60.7min
[Parallel(n_jobs=8)]: Done 2434 tasks      | elapsed: 83.7min
[Parallel(n_jobs=8)]: Done 2500 out of 2500 | elapsed: 85.4min finished


RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec', CountVectorizer()),
                                             ('rfc',
                                              RandomForestClassifier())]),
                   n_iter=500, n_jobs=8,
                   param_distributions={'cvec__max_df': [0.7, 0.75, 0.8, 0.85,
                                                         0.9, 0.95],
                                        'cvec__max_features': [15000, 15500,
                                                               16000, 16500,
                                                               17000, 17500,
                                                               18000, 18500,
                                                               19000, 19500,
                                                               20000, 20500,
                                                               21000, 21500,
                                             

In [20]:
rs_pipe3.best_score_

0.8898943071880862

In [21]:
rs_pipe3.score(X_train,y_train), rs_pipe3.score(X_test,y_test)

(0.9286791639732817, 0.8940226171243942)

These are very similar scores to the second pipeline. This model has a lot of promise, as it is scoring fairly well and doesn't seem to have massive problems with being overfit. Exporting this model.

In [22]:
joblib.dump(rs_pipe3,'./models/RandomForestpipe3.pkl')

['./data/RandomForestpipe3.pkl']

# Extra Trees classifier with random search

In [25]:
X = df['selftext']
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 42)

In [26]:
pipe = Pipeline([
    ('cvec', CountVectorizer()),
    ('et', ExtraTreesClassifier())
])

In [27]:
et_params = {
    'cvec__max_features': list(range(15000,25000,500)),
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.7,.75,.8,.85,.9, .95],
    'cvec__ngram_range': [(1,1), (1,2),(1,3)],
    'et__n_estimators': [500, 750, 1000],
    'et__max_depth':[10,15,20,25,30,35],
    'et__min_samples_split': [1,2,3,4,5],
    'et__min_samples_leaf': [1,2,3,4,5],
    'et__bootstrap': [True, False],
    'et__max_features': ['auto'],
    'et__class_weight': ['balanced']
}

In [31]:
et_random = RandomizedSearchCV(pipe, et_params,cv = 5, n_iter=500, verbose=1, n_jobs=-2)

In [32]:
et_random.fit(X_train,y_train)

Fitting 5 folds for each of 500 candidates, totalling 2500 fits


[Parallel(n_jobs=-2)]: Using backend LokyBackend with 15 concurrent workers.
[Parallel(n_jobs=-2)]: Done  20 tasks      | elapsed:   19.1s
[Parallel(n_jobs=-2)]: Done 170 tasks      | elapsed:  3.8min
[Parallel(n_jobs=-2)]: Done 420 tasks      | elapsed:  9.5min
[Parallel(n_jobs=-2)]: Done 770 tasks      | elapsed: 18.1min
[Parallel(n_jobs=-2)]: Done 1220 tasks      | elapsed: 30.0min
[Parallel(n_jobs=-2)]: Done 1770 tasks      | elapsed: 45.0min
[Parallel(n_jobs=-2)]: Done 2420 tasks      | elapsed: 63.0min
[Parallel(n_jobs=-2)]: Done 2500 out of 2500 | elapsed: 65.3min finished


RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec', CountVectorizer()),
                                             ('et', ExtraTreesClassifier())]),
                   n_iter=500, n_jobs=-2,
                   param_distributions={'cvec__max_df': [0.7, 0.75, 0.8, 0.85,
                                                         0.9, 0.95],
                                        'cvec__max_features': [15000, 15500,
                                                               16000, 16500,
                                                               17000, 17500,
                                                               18000, 18500,
                                                               19000, 19500,
                                                               20000, 20500,
                                                               21000, 21500,
                                                               22000, 22500,
                

In [35]:
et_random.best_params_

{'et__n_estimators': 500,
 'et__min_samples_split': 5,
 'et__min_samples_leaf': 1,
 'et__max_features': 'auto',
 'et__max_depth': 35,
 'et__class_weight': 'balanced',
 'et__bootstrap': False,
 'cvec__ngram_range': (1, 1),
 'cvec__min_df': 2,
 'cvec__max_features': 15500,
 'cvec__max_df': 0.75}

In [33]:
et_random.best_score_

0.8828914475516685

In [34]:
et_random.score(X_train,y_train), et_random.score(X_test,y_test)

(0.9465632406808877, 0.8836833602584814)

This model is providing a better training score, but a worse test score than the random forest. This is showing slightly more signs of being overfit, which makes it less desirable as a model than the random forest.

In [39]:
joblib.dump(et_random,'./models/ExtraTrees.pkl')

['./data/ExtraTrees.pkl']

# Logistic Reg with Random Search

no regularlization

In [None]:
X = df['selftext']
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 42)

In [95]:
log_pipe = pipe = Pipeline([
    ('cvec', CountVectorizer()),
    ('log', LogisticRegression(penalty='none', max_iter=1000))
])

In [96]:
log_params = {
    'cvec__max_features': list(range(15000,25000,500)),
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.7,.75,.8,.85,.9, .95],
    'cvec__stop_words' : ['english', stop_words,None],
    'cvec__ngram_range': [(1,1), (1,2),(1,3)],
     'log__C': [1,2,3,5,7,10,15],
    'log__class_weight':['balanced',None]
    
}

In [97]:
log_reg = RandomizedSearchCV(log_pipe,log_params,cv=5,verbose=1,n_iter=200,n_jobs=4)

In [98]:
log_reg.fit(X_train,y_train)

Fitting 5 folds for each of 200 candidates, totalling 1000 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:  1.1min
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed:  4.1min
[Parallel(n_jobs=4)]: Done 442 tasks      | elapsed:  8.8min
[Parallel(n_jobs=4)]: Done 792 tasks      | elapsed: 15.7min
[Parallel(n_jobs=4)]: Done 1000 out of 1000 | elapsed: 19.9min finished


RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec', CountVectorizer()),
                                             ('log',
                                              LogisticRegression(max_iter=1000,
                                                                 penalty='none'))]),
                   n_iter=200, n_jobs=4,
                   param_distributions={'cvec__max_df': [0.7, 0.75, 0.8, 0.85,
                                                         0.9, 0.95],
                                        'cvec__max_features': [15000, 15500,
                                                               16000, 16500,
                                                               17000, 17500,
                                                               18000, 18500,
                                                               19000, 19500,
                                                               20000, 20500,
                                

In [99]:
log_reg.best_score_

0.8981902702727795

In [100]:
log_reg.best_params_

{'log__class_weight': 'balanced',
 'log__C': 7,
 'cvec__stop_words': frozenset({'a',
            'about',
            'above',
            'across',
            'after',
            'afterwards',
            'again',
            'against',
            'all',
            'almost',
            'alone',
            'along',
            'already',
            'also',
            'although',
            'always',
            'am',
            'among',
            'amongst',
            'amoungst',
            'amount',
            'an',
            'and',
            'another',
            'any',
            'anyhow',
            'anyone',
            'anything',
            'anyway',
            'anywhere',
            'are',
            'around',
            'as',
            'at',
            'back',
            'be',
            'became',
            'because',
            'become',
            'becomes',
            'becoming',
            'been',
            'before',
            'bef

In [101]:
log_reg.score(X_train,y_train), log_reg.score(X_test,y_test)

(0.9992458521870287, 0.9056542810985461)

This is a slight improvement on the test data, but a huge overfitting problem with such a high score on the train data. Will try and regularize to bring the overfitting down. 

In [102]:
joblib.dump(log_reg,'./models/LogRegNoRegularlization.pkl')

['./data/LogRegNoRegularlization.pkl']

With regularization

In [82]:
log_pipe_reg = pipe = Pipeline([
    ('cvec', CountVectorizer()),
    ('sc', StandardScaler()),
    ('log', LogisticRegression(solver='liblinear'))
])

In [83]:
log_params2 = {
    'sc__with_mean': [True, False],
    'cvec__max_features': list(range(15000,25000,500)),
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.7,.75,.8,.85,.9, .95],
    'cvec__stop_words' : ['english', stop_words,None],
    'cvec__ngram_range': [(1,1), (1,2),(1,3)],
    'log__C': [.01,.1,.2,.3,.4,.5,.6,.7,.8,.9,1],
    'log__penalty':['l1','l2'],
    'log__class_weight':['balanced',None]
}
    

In [89]:
log_reg_reg = RandomizedSearchCV(log_pipe_reg,log_params2,cv=5,n_iter=200,n_jobs=4,verbose=1)

In [90]:
log_reg_reg.fit(X_train,y_train)

Fitting 5 folds for each of 200 candidates, totalling 1000 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:   51.3s
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed:  2.8min
[Parallel(n_jobs=4)]: Done 442 tasks      | elapsed:  6.9min
[Parallel(n_jobs=4)]: Done 792 tasks      | elapsed: 11.7min
[Parallel(n_jobs=4)]: Done 1000 out of 1000 | elapsed: 15.3min finished


RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec', CountVectorizer()),
                                             ('sc', StandardScaler()),
                                             ('log',
                                              LogisticRegression(solver='liblinear'))]),
                   n_iter=200, n_jobs=4,
                   param_distributions={'cvec__max_df': [0.7, 0.75, 0.8, 0.85,
                                                         0.9, 0.95],
                                        'cvec__max_features': [15000, 15500,
                                                               16000, 16500,
                                                               17000, 17500,
                                                               18000, 18500,
                                                               19000, 19500,
                                                               20000, 20500,
                                     

In [91]:
log_reg_reg.best_score_

0.9041147893339275

In [92]:
log_reg_reg.best_params_

{'sc__with_mean': False,
 'log__penalty': 'l2',
 'log__class_weight': None,
 'log__C': 0.01,
 'cvec__stop_words': None,
 'cvec__ngram_range': (1, 2),
 'cvec__min_df': 3,
 'cvec__max_features': 16000,
 'cvec__max_df': 0.85}

In [93]:
log_reg_reg.score(X_train,y_train), log_reg_reg.score(X_test,y_test)

(0.9991381167851756, 0.9075928917609046)

It doesn't seem like the regularization resulted in much of a reduction of the overfitting from the logistic regression model. It does a better job at predicting the test data, however it is very overfit so the random forest model still seems like the best model choice. 

In [94]:
joblib.dump(log_reg_reg,'./models/LogRegRegularlization.pkl')

['./data/LogRegRegularlization.pkl']

# Boosting with Random Search

In [12]:
X = df['selftext']
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 42)

In [13]:
boost_pipe = Pipeline([
    ('cvec', CountVectorizer()),
    ('boost', XGBClassifier())
])

In [11]:
boost_params = {
    'cvec__max_features': list(range(15000,25000,500)),
    'cvec__min_df': [2, 3, 4],
    'cvec__max_df': [.7,.75,.8,.85,.9, .95],
    'cvec__stop_words' : ['english', stop_words,None],
    'cvec__ngram_range': [(1,1), (1,2),(1,3)],
    'boost__n_estimators':[1,2,3],
    'boost__max_depth': [1,2,3],
    'boost__learning_rate': [.1,.2,.3],
    'boost__min_child_weight':[.1,.2],
    'boost__subsample':[.3,.4,.5]
}

In [14]:
boost = RandomizedSearchCV(boost_pipe,boost_params,cv=5,n_iter=75,n_jobs=4,verbose=1)

In [15]:
boost.fit(X_train,y_train)

Fitting 5 folds for each of 75 candidates, totalling 375 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed:   58.7s
[Parallel(n_jobs=4)]: Done 192 tasks      | elapsed:  3.4min
[Parallel(n_jobs=4)]: Done 375 out of 375 | elapsed:  6.7min finished






RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('cvec', CountVectorizer()),
                                             ('boost',
                                              XGBClassifier(base_score=None,
                                                            booster=None,
                                                            colsample_bylevel=None,
                                                            colsample_bynode=None,
                                                            colsample_bytree=None,
                                                            gamma=None,
                                                            gpu_id=None,
                                                            importance_type='gain',
                                                            interaction_constraints=None,
                                                            learning_rate=None,
                                              

In [16]:
boost.best_score_

0.7972433058511131

In [17]:
boost.best_params_

{'cvec__stop_words': 'english',
 'cvec__ngram_range': (1, 1),
 'cvec__min_df': 3,
 'cvec__max_features': 24500,
 'cvec__max_df': 0.8,
 'boost__subsample': 0.5,
 'boost__n_estimators': 3,
 'boost__min_child_weight': 0.2,
 'boost__max_depth': 3,
 'boost__learning_rate': 0.3}

In [18]:
boost.score(X_train,y_train), boost.score(X_test,y_test)

(0.8120017237664297, 0.812924071082391)

Finally, a model with zero overfitting! unfortunately the bias-variance tradeoff has reared its ugly head and we have lost some accuracy in our model. Still this model scores are pretty strong, and has no overfitting, which makes it seem like a promising model going forward.

In [19]:
joblib.dump(boost,'./models/boost.pkl')

['./data/boost.pkl']

# Testing Bayes Search CV

I hoped to build some stronger models using Bayes search to quickly search over very large numbers of parameters, but did not have the success I wanted. 

## Decision Tree with Bayes Search

In [9]:
X = df['selftext']
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 42)

In [8]:
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer

In [66]:
X.shape

(12377,)

In [67]:
y.shape

(12377,)

In [94]:
dtree_pipe = Pipeline([
    ('cvec', CountVectorizer(stop_words=stop_words,ngram_range=(1,2) )),
    ('dtree', DecisionTreeClassifier(random_state = 42))
])

In [97]:
dtree_params = {
    'cvec__max_features': Integer(15000,30000),
    'cvec__min_df': Integer(2,10),
    'cvec__max_df': Real(.1,1),
    'dtree__criterion': Categorical(['gini', 'entropy']),
    'dtree__splitter': Categorical(['best', 'random']),
    'dtree__min_samples_split': Real(0, .5),
    'dtree__min_samples_leaf': Real(0, .5),
    'dtree__max_features': Categorical(['auto', 'sqrt']),
    'dtree__min_impurity_decrease': Real(0, .2, prior='uniform'),
    'dtree__ccp_alpha': Real(0, .2, prior='uniform')
}

In [100]:
dtree_bs = BayesSearchCV(estimator = dtree_pipe,
                     search_spaces = dtree_params,
                     scoring = 'f1',
                     n_iter = 50,
                     cv = 5,
                     refit = True,
                     optimizer_kwargs = {'base_estimator': 'RF'},
                     random_state=42,
                        verbose=2)

In [101]:
type(dtree_bs)

skopt.searchcv.BayesSearchCV

In [102]:
dtree_bs.fit(X_train,y_train)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.46909356296798244, cvec__max_features=28079, cvec__min_df=5, dtree__ccp_alpha=0.06585257100433049, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.1949351926906874, dtree__min_samples_leaf=0.22708510882967803, dtree__min_samples_split=0.07970907652550675, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.46909356296798244, cvec__max_features=28079, cvec__min_df=5, dtree__ccp_alpha=0.06585257100433049, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.1949351926906874, dtree__min_samples_leaf=0.22708510882967803, dtree__min_samples_split=0.07970907652550675, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.46909356296798244, cvec__max_features=28079, cvec__min_df=5, dtree__ccp_alpha=0.06585257100433049, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.1949351926906874, dtree__min_samples_leaf=0.22708510882967803, dtree__min_samples_split=0.07970907652550675, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.46909356296798244, cvec__max_features=28079, cvec__min_df=5, dtree__ccp_alpha=0.06585257100433049, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.1949351926906874, dtree__min_samples_leaf=0.22708510882967803, dtree__min_samples_split=0.07970907652550675, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.46909356296798244, cvec__max_features=28079, cvec__min_df=5, dtree__ccp_alpha=0.06585257100433049, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.1949351926906874, dtree__min_samples_leaf=0.22708510882967803, dtree__min_samples_split=0.07970907652550675, dtree__splitter=random 
[CV]  cvec__max_df=0.46909356296798244, cvec__max_features=28079, cvec__min_df=5, dtree__ccp_alpha=0.06585257100433049, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.1949351926906874, dtree__min_samples_leaf=0.22708510882967803, dtree__min_samples_split=0.07970907652550675, dtr

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.8536495199979558, cvec__max_features=28088, cvec__min_df=9, dtree__ccp_alpha=0.060682021976835, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.012462588168814294, dtree__min_samples_leaf=0.06915426913928759, dtree__min_samples_split=0.17679369881420906, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.8536495199979558, cvec__max_features=28088, cvec__min_df=9, dtree__ccp_alpha=0.060682021976835, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.012462588168814294, dtree__min_samples_leaf=0.06915426913928759, dtree__min_samples_split=0.17679369881420906, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.8536495199979558, cvec__max_features=28088, cvec__min_df=9, dtree__ccp_alpha=0.060682021976835, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.012462588168814294, dtree__min_samples_leaf=0.06915426913928759, dtree__min_samples_split=0.17679369881420906, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.8536495199979558, cvec__max_features=28088, cvec__min_df=9, dtree__ccp_alpha=0.060682021976835, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.012462588168814294, dtree__min_samples_leaf=0.06915426913928759, dtree__min_samples_split=0.17679369881420906, dtree__splitter=random 
[CV]  cvec__max_df=0.8536495199979558, cvec__max_features=28088, cvec__min_df=9, dtree__ccp_alpha=0.060682021976835, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.012462588168814294, dtree__min_samples_leaf=0.06915426913928759, dtree__min_samples_split=0.17679369881420906, dtree_

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.5003492610710366, cvec__max_features=24783, cvec__min_df=6, dtree__ccp_alpha=0.02697915313833235, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.12733702473573766, dtree__min_samples_leaf=0.11458510368071653, dtree__min_samples_split=0.15107784410270667, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.5003492610710366, cvec__max_features=24783, cvec__min_df=6, dtree__ccp_alpha=0.02697915313833235, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.12733702473573766, dtree__min_samples_leaf=0.11458510368071653, dtree__min_samples_split=0.15107784410270667, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.5003492610710366, cvec__max_features=24783, cvec__min_df=6, dtree__ccp_alpha=0.02697915313833235, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.12733702473573766, dtree__min_samples_leaf=0.11458510368071653, dtree__min_samples_split=0.15107784410270667, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.5003492610710366, cvec__max_features=24783, cvec__min_df=6, dtree__ccp_alpha=0.02697915313833235, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.12733702473573766, dtree__min_samples_leaf=0.11458510368071653, dtree__min_samples_split=0.15107784410270667, dtree__splitter=best 
[CV]  cvec__max_df=0.5003492610710366, cvec__max_features=24783, cvec__min_df=6, dtree__ccp_alpha=0.02697915313833235, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.12733702473573766, dtree__min_samples_leaf=0.11458510368071653, dtree__min_samples_split=0.15107784410270667, dtree__splitter=

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.831156389521627, cvec__max_features=16606, cvec__min_df=8, dtree__ccp_alpha=0.19614354342634202, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0979174669918426, dtree__min_samples_leaf=0.1692107017063735, dtree__min_samples_split=0.20254884254905903, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.831156389521627, cvec__max_features=16606, cvec__min_df=8, dtree__ccp_alpha=0.19614354342634202, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0979174669918426, dtree__min_samples_leaf=0.1692107017063735, dtree__min_samples_split=0.20254884254905903, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.831156389521627, cvec__max_features=16606, cvec__min_df=8, dtree__ccp_alpha=0.19614354342634202, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0979174669918426, dtree__min_samples_leaf=0.1692107017063735, dtree__min_samples_split=0.20254884254905903, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.831156389521627, cvec__max_features=16606, cvec__min_df=8, dtree__ccp_alpha=0.19614354342634202, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0979174669918426, dtree__min_samples_leaf=0.1692107017063735, dtree__min_samples_split=0.20254884254905903, dtree__splitter=random 
[CV]  cvec__max_df=0.831156389521627, cvec__max_features=16606, cvec__min_df=8, dtree__ccp_alpha=0.19614354342634202, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0979174669918426, dtree__min_samples_leaf=0.1692107017063735, dtree__min_samples_split=0.20254884254905903, dtree__splitter=rando

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.8195980974464557, cvec__max_features=26841, cvec__min_df=4, dtree__ccp_alpha=0.10532404742553848, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14340623023196739, dtree__min_samples_leaf=0.21208903623034103, dtree__min_samples_split=0.3251420508738813, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.8195980974464557, cvec__max_features=26841, cvec__min_df=4, dtree__ccp_alpha=0.10532404742553848, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14340623023196739, dtree__min_samples_leaf=0.21208903623034103, dtree__min_samples_split=0.3251420508738813, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.8195980974464557, cvec__max_features=26841, cvec__min_df=4, dtree__ccp_alpha=0.10532404742553848, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14340623023196739, dtree__min_samples_leaf=0.21208903623034103, dtree__min_samples_split=0.3251420508738813, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.8195980974464557, cvec__max_features=26841, cvec__min_df=4, dtree__ccp_alpha=0.10532404742553848, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14340623023196739, dtree__min_samples_leaf=0.21208903623034103, dtree__min_samples_split=0.3251420508738813, dtree__splitter=best 
[CV]  cvec__max_df=0.8195980974464557, cvec__max_features=26841, cvec__min_df=4, dtree__ccp_alpha=0.10532404742553848, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14340623023196739, dtree__min_samples_leaf=0.21208903623034103, dtree__min_samples_split=0.3251420508738813, dtree__spl

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.7606252161870115, cvec__max_features=17291, cvec__min_df=4, dtree__ccp_alpha=0.1602712708539148, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.059931459216810035, dtree__min_samples_leaf=0.1928124894751801, dtree__min_samples_split=0.4025075092363505, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.7606252161870115, cvec__max_features=17291, cvec__min_df=4, dtree__ccp_alpha=0.1602712708539148, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.059931459216810035, dtree__min_samples_leaf=0.1928124894751801, dtree__min_samples_split=0.4025075092363505, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.7606252161870115, cvec__max_features=17291, cvec__min_df=4, dtree__ccp_alpha=0.1602712708539148, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.059931459216810035, dtree__min_samples_leaf=0.1928124894751801, dtree__min_samples_split=0.4025075092363505, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.7606252161870115, cvec__max_features=17291, cvec__min_df=4, dtree__ccp_alpha=0.1602712708539148, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.059931459216810035, dtree__min_samples_leaf=0.1928124894751801, dtree__min_samples_split=0.4025075092363505, dtree__splitter=random 
[CV]  cvec__max_df=0.7606252161870115, cvec__max_features=17291, cvec__min_df=4, dtree__ccp_alpha=0.1602712708539148, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.059931459216810035, dtree__min_samples_leaf=0.1928124894751801, dtree__min_samples_split=0.4025075092363505, dtree__sp

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.6553714226520847, cvec__max_features=16958, cvec__min_df=4, dtree__ccp_alpha=0.07180458638428079, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.11892400701736769, dtree__min_samples_leaf=0.3243887548518629, dtree__min_samples_split=0.21104129297534868, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.6553714226520847, cvec__max_features=16958, cvec__min_df=4, dtree__ccp_alpha=0.07180458638428079, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.11892400701736769, dtree__min_samples_leaf=0.3243887548518629, dtree__min_samples_split=0.21104129297534868, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.6553714226520847, cvec__max_features=16958, cvec__min_df=4, dtree__ccp_alpha=0.07180458638428079, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.11892400701736769, dtree__min_samples_leaf=0.3243887548518629, dtree__min_samples_split=0.21104129297534868, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.6553714226520847, cvec__max_features=16958, cvec__min_df=4, dtree__ccp_alpha=0.07180458638428079, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.11892400701736769, dtree__min_samples_leaf=0.3243887548518629, dtree__min_samples_split=0.21104129297534868, dtree__splitter=random 
[CV]  cvec__max_df=0.6553714226520847, cvec__max_features=16958, cvec__min_df=4, dtree__ccp_alpha=0.07180458638428079, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.11892400701736769, dtree__min_samples_leaf=0.3243887548518629, dtree__min_samples_split=0.21104129297534868, dtree_

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.5890627609212813, cvec__max_features=29732, cvec__min_df=6, dtree__ccp_alpha=0.16753128741210532, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.11415556053577965, dtree__min_samples_leaf=0.012900542526760542, dtree__min_samples_split=0.19286479549233237, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.5890627609212813, cvec__max_features=29732, cvec__min_df=6, dtree__ccp_alpha=0.16753128741210532, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.11415556053577965, dtree__min_samples_leaf=0.012900542526760542, dtree__min_samples_split=0.19286479549233237, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.5890627609212813, cvec__max_features=29732, cvec__min_df=6, dtree__ccp_alpha=0.16753128741210532, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.11415556053577965, dtree__min_samples_leaf=0.012900542526760542, dtree__min_samples_split=0.19286479549233237, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.5890627609212813, cvec__max_features=29732, cvec__min_df=6, dtree__ccp_alpha=0.16753128741210532, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.11415556053577965, dtree__min_samples_leaf=0.012900542526760542, dtree__min_samples_split=0.19286479549233237, dtree__splitter=random 
[CV]  cvec__max_df=0.5890627609212813, cvec__max_features=29732, cvec__min_df=6, dtree__ccp_alpha=0.16753128741210532, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.11415556053577965, dtree__min_samples_leaf=0.012900542526760542, dtree__min_samples_split=0.19286479549233237, dtree__sp

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.9599301876789353, cvec__max_features=26601, cvec__min_df=10, dtree__ccp_alpha=0.174330385692643, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.04228285745483925, dtree__min_samples_leaf=3.052537633674302e-05, dtree__min_samples_split=0.1853427438313522, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.9599301876789353, cvec__max_features=26601, cvec__min_df=10, dtree__ccp_alpha=0.174330385692643, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.04228285745483925, dtree__min_samples_leaf=3.052537633674302e-05, dtree__min_samples_split=0.1853427438313522, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.9599301876789353, cvec__max_features=26601, cvec__min_df=10, dtree__ccp_alpha=0.174330385692643, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.04228285745483925, dtree__min_samples_leaf=3.052537633674302e-05, dtree__min_samples_split=0.1853427438313522, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.9599301876789353, cvec__max_features=26601, cvec__min_df=10, dtree__ccp_alpha=0.174330385692643, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.04228285745483925, dtree__min_samples_leaf=3.052537633674302e-05, dtree__min_samples_split=0.1853427438313522, dtree__splitter=random 
[CV]  cvec__max_df=0.9599301876789353, cvec__max_features=26601, cvec__min_df=10, dtree__ccp_alpha=0.174330385692643, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.04228285745483925, dtree__min_samples_leaf=3.052537633674302e-05, dtree__min_samples_split=0.1853427438313522, dtree__split

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.10326727652926891, cvec__max_features=18817, cvec__min_df=10, dtree__ccp_alpha=0.14825654887360548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.054942969767083495, dtree__min_samples_leaf=0.15641965311731407, dtree__min_samples_split=0.07629729341922216, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.10326727652926891, cvec__max_features=18817, cvec__min_df=10, dtree__ccp_alpha=0.14825654887360548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.054942969767083495, dtree__min_samples_leaf=0.15641965311731407, dtree__min_samples_split=0.07629729341922216, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.10326727652926891, cvec__max_features=18817, cvec__min_df=10, dtree__ccp_alpha=0.14825654887360548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.054942969767083495, dtree__min_samples_leaf=0.15641965311731407, dtree__min_samples_split=0.07629729341922216, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.10326727652926891, cvec__max_features=18817, cvec__min_df=10, dtree__ccp_alpha=0.14825654887360548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.054942969767083495, dtree__min_samples_leaf=0.15641965311731407, dtree__min_samples_split=0.07629729341922216, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.10326727652926891, cvec__max_features=18817, cvec__min_df=10, dtree__ccp_alpha=0.14825654887360548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.054942969767083495, dtree__min_samples_leaf=0.15641965311731407, dtree__min_samples_split=0.07629729341922216, dtree__splitter=random 
[CV]  cvec__max_df=0.10326727652926891, cvec__max_features=18817, cvec__min_df=10, dtree__ccp_alpha=0.14825654887360548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.054942969767083495, dtree__min_samples_leaf=0.15641965311731407, dtree__min_samples_split=0.07629729341922216, dtr

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.655580862360495, cvec__max_features=21041, cvec__min_df=4, dtree__ccp_alpha=0.015479362751686468, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.06080759297526134, dtree__min_samples_leaf=0.4230804085006891, dtree__min_samples_split=0.032165410989458416, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.655580862360495, cvec__max_features=21041, cvec__min_df=4, dtree__ccp_alpha=0.015479362751686468, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.06080759297526134, dtree__min_samples_leaf=0.4230804085006891, dtree__min_samples_split=0.032165410989458416, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.655580862360495, cvec__max_features=21041, cvec__min_df=4, dtree__ccp_alpha=0.015479362751686468, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.06080759297526134, dtree__min_samples_leaf=0.4230804085006891, dtree__min_samples_split=0.032165410989458416, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.655580862360495, cvec__max_features=21041, cvec__min_df=4, dtree__ccp_alpha=0.015479362751686468, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.06080759297526134, dtree__min_samples_leaf=0.4230804085006891, dtree__min_samples_split=0.032165410989458416, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.655580862360495, cvec__max_features=21041, cvec__min_df=4, dtree__ccp_alpha=0.015479362751686468, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.06080759297526134, dtree__min_samples_leaf=0.4230804085006891, dtree__min_samples_split=0.032165410989458416, dtree__splitter=best 
[CV]  cvec__max_df=0.655580862360495, cvec__max_features=21041, cvec__min_df=4, dtree__ccp_alpha=0.015479362751686468, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.06080759297526134, dtree__min_samples_leaf=0.4230804085006891, dtree__min_samples_split=0.032165410989458416, dtree__

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.28480485606273576, cvec__max_features=22837, cvec__min_df=7, dtree__ccp_alpha=0.13860642938561002, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17392904666507125, dtree__min_samples_leaf=0.07435182205469616, dtree__min_samples_split=0.31239298946351446, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.28480485606273576, cvec__max_features=22837, cvec__min_df=7, dtree__ccp_alpha=0.13860642938561002, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17392904666507125, dtree__min_samples_leaf=0.07435182205469616, dtree__min_samples_split=0.31239298946351446, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.28480485606273576, cvec__max_features=22837, cvec__min_df=7, dtree__ccp_alpha=0.13860642938561002, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17392904666507125, dtree__min_samples_leaf=0.07435182205469616, dtree__min_samples_split=0.31239298946351446, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.28480485606273576, cvec__max_features=22837, cvec__min_df=7, dtree__ccp_alpha=0.13860642938561002, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17392904666507125, dtree__min_samples_leaf=0.07435182205469616, dtree__min_samples_split=0.31239298946351446, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.28480485606273576, cvec__max_features=22837, cvec__min_df=7, dtree__ccp_alpha=0.13860642938561002, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17392904666507125, dtree__min_samples_leaf=0.07435182205469616, dtree__min_samples_split=0.31239298946351446, dtree__splitter=best 
[CV]  cvec__max_df=0.28480485606273576, cvec__max_features=22837, cvec__min_df=7, dtree__ccp_alpha=0.13860642938561002, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17392904666507125, dtree__min_samples_leaf=0.07435182205469616, dtree__min_samples_split=0.31239298946351446, dtree__splitt

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.28842524759431665, cvec__max_features=19965, cvec__min_df=4, dtree__ccp_alpha=0.16134386406634407, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.163024532005293, dtree__min_samples_leaf=0.42787129624195247, dtree__min_samples_split=0.03968643280375473, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.28842524759431665, cvec__max_features=19965, cvec__min_df=4, dtree__ccp_alpha=0.16134386406634407, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.163024532005293, dtree__min_samples_leaf=0.42787129624195247, dtree__min_samples_split=0.03968643280375473, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.28842524759431665, cvec__max_features=19965, cvec__min_df=4, dtree__ccp_alpha=0.16134386406634407, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.163024532005293, dtree__min_samples_leaf=0.42787129624195247, dtree__min_samples_split=0.03968643280375473, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.28842524759431665, cvec__max_features=19965, cvec__min_df=4, dtree__ccp_alpha=0.16134386406634407, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.163024532005293, dtree__min_samples_leaf=0.42787129624195247, dtree__min_samples_split=0.03968643280375473, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.28842524759431665, cvec__max_features=19965, cvec__min_df=4, dtree__ccp_alpha=0.16134386406634407, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.163024532005293, dtree__min_samples_leaf=0.42787129624195247, dtree__min_samples_split=0.03968643280375473, dtree__splitter=random 
[CV]  cvec__max_df=0.28842524759431665, cvec__max_features=19965, cvec__min_df=4, dtree__ccp_alpha=0.16134386406634407, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.163024532005293, dtree__min_samples_leaf=0.42787129624195247, dtree__min_samples_split=0.03968643280375473, dtree_

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.707135337149732, cvec__max_features=15005, cvec__min_df=10, dtree__ccp_alpha=0.0934396006915923, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.07005051712040115, dtree__min_samples_leaf=0.018393479480770084, dtree__min_samples_split=0.34093066007315903, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.707135337149732, cvec__max_features=15005, cvec__min_df=10, dtree__ccp_alpha=0.0934396006915923, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.07005051712040115, dtree__min_samples_leaf=0.018393479480770084, dtree__min_samples_split=0.34093066007315903, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.707135337149732, cvec__max_features=15005, cvec__min_df=10, dtree__ccp_alpha=0.0934396006915923, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.07005051712040115, dtree__min_samples_leaf=0.018393479480770084, dtree__min_samples_split=0.34093066007315903, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.707135337149732, cvec__max_features=15005, cvec__min_df=10, dtree__ccp_alpha=0.0934396006915923, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.07005051712040115, dtree__min_samples_leaf=0.018393479480770084, dtree__min_samples_split=0.34093066007315903, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.707135337149732, cvec__max_features=15005, cvec__min_df=10, dtree__ccp_alpha=0.0934396006915923, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.07005051712040115, dtree__min_samples_leaf=0.018393479480770084, dtree__min_samples_split=0.34093066007315903, dtree__splitter=best 
[CV]  cvec__max_df=0.707135337149732, cvec__max_features=15005, cvec__min_df=10, dtree__ccp_alpha=0.0934396006915923, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.07005051712040115, dtree__min_samples_leaf=0.018393479480770084, dtree__min_samples_split=0.34093066007315903, dtree__splitter=

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.2s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.15753176097502491, cvec__max_features=26461, cvec__min_df=5, dtree__ccp_alpha=0.021829053558563464, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.030251749885587327, dtree__min_samples_leaf=0.12957720695644157, dtree__min_samples_split=0.008152642840139614, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.15753176097502491, cvec__max_features=26461, cvec__min_df=5, dtree__ccp_alpha=0.021829053558563464, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.030251749885587327, dtree__min_samples_leaf=0.12957720695644157, dtree__min_samples_split=0.008152642840139614, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.15753176097502491, cvec__max_features=26461, cvec__min_df=5, dtree__ccp_alpha=0.021829053558563464, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.030251749885587327, dtree__min_samples_leaf=0.12957720695644157, dtree__min_samples_split=0.008152642840139614, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.15753176097502491, cvec__max_features=26461, cvec__min_df=5, dtree__ccp_alpha=0.021829053558563464, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.030251749885587327, dtree__min_samples_leaf=0.12957720695644157, dtree__min_samples_split=0.008152642840139614, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.15753176097502491, cvec__max_features=26461, cvec__min_df=5, dtree__ccp_alpha=0.021829053558563464, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.030251749885587327, dtree__min_samples_leaf=0.12957720695644157, dtree__min_samples_split=0.008152642840139614, dtree__splitter=best 
[CV]  cvec__max_df=0.15753176097502491, cvec__max_features=26461, cvec__min_df=5, dtree__ccp_alpha=0.021829053558563464, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.030251749885587327, dtree__min_samples_leaf=0.12957720695644157, dtree__min_samples_split=0.008152642840139

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.25284194653315467, cvec__max_features=28286, cvec__min_df=5, dtree__ccp_alpha=0.03353174415874707, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14987317667702635, dtree__min_samples_leaf=0.23416340775090927, dtree__min_samples_split=0.003890131858226898, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.25284194653315467, cvec__max_features=28286, cvec__min_df=5, dtree__ccp_alpha=0.03353174415874707, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14987317667702635, dtree__min_samples_leaf=0.23416340775090927, dtree__min_samples_split=0.003890131858226898, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.25284194653315467, cvec__max_features=28286, cvec__min_df=5, dtree__ccp_alpha=0.03353174415874707, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14987317667702635, dtree__min_samples_leaf=0.23416340775090927, dtree__min_samples_split=0.003890131858226898, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.25284194653315467, cvec__max_features=28286, cvec__min_df=5, dtree__ccp_alpha=0.03353174415874707, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14987317667702635, dtree__min_samples_leaf=0.23416340775090927, dtree__min_samples_split=0.003890131858226898, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.25284194653315467, cvec__max_features=28286, cvec__min_df=5, dtree__ccp_alpha=0.03353174415874707, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14987317667702635, dtree__min_samples_leaf=0.23416340775090927, dtree__min_samples_split=0.003890131858226898, dtree__splitter=random 
[CV]  cvec__max_df=0.25284194653315467, cvec__max_features=28286, cvec__min_df=5, dtree__ccp_alpha=0.03353174415874707, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.14987317667702635, dtree__min_samples_leaf=0.23416340775090927, dtree__min_samples_split=0.00389013185822689

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.25556362589734183, cvec__max_features=27581, cvec__min_df=2, dtree__ccp_alpha=0.02828871606676882, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.18917544711815917, dtree__min_samples_leaf=0.34599528394495727, dtree__min_samples_split=0.0020042504340614946, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.25556362589734183, cvec__max_features=27581, cvec__min_df=2, dtree__ccp_alpha=0.02828871606676882, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.18917544711815917, dtree__min_samples_leaf=0.34599528394495727, dtree__min_samples_split=0.0020042504340614946, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.25556362589734183, cvec__max_features=27581, cvec__min_df=2, dtree__ccp_alpha=0.02828871606676882, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.18917544711815917, dtree__min_samples_leaf=0.34599528394495727, dtree__min_samples_split=0.0020042504340614946, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.25556362589734183, cvec__max_features=27581, cvec__min_df=2, dtree__ccp_alpha=0.02828871606676882, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.18917544711815917, dtree__min_samples_leaf=0.34599528394495727, dtree__min_samples_split=0.0020042504340614946, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.25556362589734183, cvec__max_features=27581, cvec__min_df=2, dtree__ccp_alpha=0.02828871606676882, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.18917544711815917, dtree__min_samples_leaf=0.34599528394495727, dtree__min_samples_split=0.0020042504340614946, dtree__splitter=random 
[CV]  cvec__max_df=0.25556362589734183, cvec__max_features=27581, cvec__min_df=2, dtree__ccp_alpha=0.02828871606676882, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.18917544711815917, dtree__min_samples_leaf=0.34599528394495727, dtree__min_samples_split=0.002004250434061

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.2116996626797935, cvec__max_features=25458, cvec__min_df=8, dtree__ccp_alpha=0.024596110502303776, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.02239011978846546, dtree__min_samples_leaf=0.39514144332571005, dtree__min_samples_split=0.011081754914672783, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.2116996626797935, cvec__max_features=25458, cvec__min_df=8, dtree__ccp_alpha=0.024596110502303776, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.02239011978846546, dtree__min_samples_leaf=0.39514144332571005, dtree__min_samples_split=0.011081754914672783, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.2116996626797935, cvec__max_features=25458, cvec__min_df=8, dtree__ccp_alpha=0.024596110502303776, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.02239011978846546, dtree__min_samples_leaf=0.39514144332571005, dtree__min_samples_split=0.011081754914672783, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.2116996626797935, cvec__max_features=25458, cvec__min_df=8, dtree__ccp_alpha=0.024596110502303776, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.02239011978846546, dtree__min_samples_leaf=0.39514144332571005, dtree__min_samples_split=0.011081754914672783, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.2116996626797935, cvec__max_features=25458, cvec__min_df=8, dtree__ccp_alpha=0.024596110502303776, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.02239011978846546, dtree__min_samples_leaf=0.39514144332571005, dtree__min_samples_split=0.011081754914672783, dtree__splitter=best 
[CV]  cvec__max_df=0.2116996626797935, cvec__max_features=25458, cvec__min_df=8, dtree__ccp_alpha=0.024596110502303776, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.02239011978846546, dtree__min_samples_leaf=0.39514144332571005, dtree__min_samples_split=0.011081754914672783, d

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.1420789856325086, cvec__max_features=21186, cvec__min_df=10, dtree__ccp_alpha=0.0013798681697514195, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.03613713274659927, dtree__min_samples_leaf=0.01689908436318133, dtree__min_samples_split=0.034765466848627365, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.1420789856325086, cvec__max_features=21186, cvec__min_df=10, dtree__ccp_alpha=0.0013798681697514195, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.03613713274659927, dtree__min_samples_leaf=0.01689908436318133, dtree__min_samples_split=0.034765466848627365, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.1420789856325086, cvec__max_features=21186, cvec__min_df=10, dtree__ccp_alpha=0.0013798681697514195, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.03613713274659927, dtree__min_samples_leaf=0.01689908436318133, dtree__min_samples_split=0.034765466848627365, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.1420789856325086, cvec__max_features=21186, cvec__min_df=10, dtree__ccp_alpha=0.0013798681697514195, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.03613713274659927, dtree__min_samples_leaf=0.01689908436318133, dtree__min_samples_split=0.034765466848627365, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.1420789856325086, cvec__max_features=21186, cvec__min_df=10, dtree__ccp_alpha=0.0013798681697514195, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.03613713274659927, dtree__min_samples_leaf=0.01689908436318133, dtree__min_samples_split=0.034765466848627365, dtree__splitter=best 
[CV]  cvec__max_df=0.1420789856325086, cvec__max_features=21186, cvec__min_df=10, dtree__ccp_alpha=0.0013798681697514195, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.03613713274659927, dtree__min_samples_leaf=0.01689908436318133, dtree__min_samples_split=0.034765466848627365, dtre

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.13975916656195111, cvec__max_features=27248, cvec__min_df=4, dtree__ccp_alpha=0.06053030644255671, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.013806782478740657, dtree__min_samples_leaf=0.43992716389956305, dtree__min_samples_split=0.0025063748583559757, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.13975916656195111, cvec__max_features=27248, cvec__min_df=4, dtree__ccp_alpha=0.06053030644255671, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.013806782478740657, dtree__min_samples_leaf=0.43992716389956305, dtree__min_samples_split=0.0025063748583559757, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.13975916656195111, cvec__max_features=27248, cvec__min_df=4, dtree__ccp_alpha=0.06053030644255671, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.013806782478740657, dtree__min_samples_leaf=0.43992716389956305, dtree__min_samples_split=0.0025063748583559757, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.13975916656195111, cvec__max_features=27248, cvec__min_df=4, dtree__ccp_alpha=0.06053030644255671, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.013806782478740657, dtree__min_samples_leaf=0.43992716389956305, dtree__min_samples_split=0.0025063748583559757, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.13975916656195111, cvec__max_features=27248, cvec__min_df=4, dtree__ccp_alpha=0.06053030644255671, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.013806782478740657, dtree__min_samples_leaf=0.43992716389956305, dtree__min_samples_split=0.0025063748583559757, dtree__splitter=best 
[CV]  cvec__max_df=0.13975916656195111, cvec__max_features=27248, cvec__min_df=4, dtree__ccp_alpha=0.06053030644255671, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.013806782478740657, dtree__min_samples_leaf=0.43992716389956305, dtree__min_samples_split=0.0025063748583559

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.35752758278405306, cvec__max_features=15479, cvec__min_df=10, dtree__ccp_alpha=0.15456726604976442, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.028368204673883707, dtree__min_samples_leaf=0.007001622719937874, dtree__min_samples_split=0.008817842027317281, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.35752758278405306, cvec__max_features=15479, cvec__min_df=10, dtree__ccp_alpha=0.15456726604976442, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.028368204673883707, dtree__min_samples_leaf=0.007001622719937874, dtree__min_samples_split=0.008817842027317281, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.35752758278405306, cvec__max_features=15479, cvec__min_df=10, dtree__ccp_alpha=0.15456726604976442, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.028368204673883707, dtree__min_samples_leaf=0.007001622719937874, dtree__min_samples_split=0.008817842027317281, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.35752758278405306, cvec__max_features=15479, cvec__min_df=10, dtree__ccp_alpha=0.15456726604976442, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.028368204673883707, dtree__min_samples_leaf=0.007001622719937874, dtree__min_samples_split=0.008817842027317281, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.35752758278405306, cvec__max_features=15479, cvec__min_df=10, dtree__ccp_alpha=0.15456726604976442, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.028368204673883707, dtree__min_samples_leaf=0.007001622719937874, dtree__min_samples_split=0.008817842027317281, dtree__splitter=best 
[CV]  cvec__max_df=0.35752758278405306, cvec__max_features=15479, cvec__min_df=10, dtree__ccp_alpha=0.15456726604976442, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.028368204673883707, dtree__min_samples_leaf=0.007001622719937874, dtree__min_samples_split=0.008817842027317281, d

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.1304727850526459, cvec__max_features=26142, cvec__min_df=3, dtree__ccp_alpha=0.18840907855869707, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.01065136080487026, dtree__min_samples_leaf=0.03893948314783708, dtree__min_samples_split=0.011284162814005684, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.1304727850526459, cvec__max_features=26142, cvec__min_df=3, dtree__ccp_alpha=0.18840907855869707, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.01065136080487026, dtree__min_samples_leaf=0.03893948314783708, dtree__min_samples_split=0.011284162814005684, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.1304727850526459, cvec__max_features=26142, cvec__min_df=3, dtree__ccp_alpha=0.18840907855869707, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.01065136080487026, dtree__min_samples_leaf=0.03893948314783708, dtree__min_samples_split=0.011284162814005684, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.1304727850526459, cvec__max_features=26142, cvec__min_df=3, dtree__ccp_alpha=0.18840907855869707, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.01065136080487026, dtree__min_samples_leaf=0.03893948314783708, dtree__min_samples_split=0.011284162814005684, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.1304727850526459, cvec__max_features=26142, cvec__min_df=3, dtree__ccp_alpha=0.18840907855869707, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.01065136080487026, dtree__min_samples_leaf=0.03893948314783708, dtree__min_samples_split=0.011284162814005684, dtree__splitter=best 
[CV]  cvec__max_df=0.1304727850526459, cvec__max_features=26142, cvec__min_df=3, dtree__ccp_alpha=0.18840907855869707, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.01065136080487026, dtree__min_samples_leaf=0.03893948314783708, dtree__min_samples_split=0.011284162814005684, dtree__splitt

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.2375764900799013, cvec__max_features=19014, cvec__min_df=6, dtree__ccp_alpha=0.008791362812822247, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006141853323163683, dtree__min_samples_leaf=0.2598325457018784, dtree__min_samples_split=0.020253496389384288, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.2375764900799013, cvec__max_features=19014, cvec__min_df=6, dtree__ccp_alpha=0.008791362812822247, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006141853323163683, dtree__min_samples_leaf=0.2598325457018784, dtree__min_samples_split=0.020253496389384288, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.2375764900799013, cvec__max_features=19014, cvec__min_df=6, dtree__ccp_alpha=0.008791362812822247, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006141853323163683, dtree__min_samples_leaf=0.2598325457018784, dtree__min_samples_split=0.020253496389384288, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.2375764900799013, cvec__max_features=19014, cvec__min_df=6, dtree__ccp_alpha=0.008791362812822247, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006141853323163683, dtree__min_samples_leaf=0.2598325457018784, dtree__min_samples_split=0.020253496389384288, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.2375764900799013, cvec__max_features=19014, cvec__min_df=6, dtree__ccp_alpha=0.008791362812822247, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006141853323163683, dtree__min_samples_leaf=0.2598325457018784, dtree__min_samples_split=0.020253496389384288, dtree__splitter=best 
[CV]  cvec__max_df=0.2375764900799013, cvec__max_features=19014, cvec__min_df=6, dtree__ccp_alpha=0.008791362812822247, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006141853323163683, dtree__min_samples_leaf=0.2598325457018784, dtree__min_samples_split=0.020253496389384288, dtree__spl

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.19274862688134686, cvec__max_features=25584, cvec__min_df=4, dtree__ccp_alpha=0.02070603759293566, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.18104950603983283, dtree__min_samples_leaf=0.3162780116921235, dtree__min_samples_split=0.004465251657743475, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.19274862688134686, cvec__max_features=25584, cvec__min_df=4, dtree__ccp_alpha=0.02070603759293566, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.18104950603983283, dtree__min_samples_leaf=0.3162780116921235, dtree__min_samples_split=0.004465251657743475, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.19274862688134686, cvec__max_features=25584, cvec__min_df=4, dtree__ccp_alpha=0.02070603759293566, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.18104950603983283, dtree__min_samples_leaf=0.3162780116921235, dtree__min_samples_split=0.004465251657743475, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.19274862688134686, cvec__max_features=25584, cvec__min_df=4, dtree__ccp_alpha=0.02070603759293566, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.18104950603983283, dtree__min_samples_leaf=0.3162780116921235, dtree__min_samples_split=0.004465251657743475, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.19274862688134686, cvec__max_features=25584, cvec__min_df=4, dtree__ccp_alpha=0.02070603759293566, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.18104950603983283, dtree__min_samples_leaf=0.3162780116921235, dtree__min_samples_split=0.004465251657743475, dtree__splitter=random 
[CV]  cvec__max_df=0.19274862688134686, cvec__max_features=25584, cvec__min_df=4, dtree__ccp_alpha=0.02070603759293566, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.18104950603983283, dtree__min_samples_leaf=0.3162780116921235, dtree__min_samples_split=0.004465251657743475, 

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.16858660678187384, cvec__max_features=28114, cvec__min_df=9, dtree__ccp_alpha=0.0021930055000487507, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.014138395429908338, dtree__min_samples_leaf=0.31650326312216814, dtree__min_samples_split=0.0009974456854091043, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.16858660678187384, cvec__max_features=28114, cvec__min_df=9, dtree__ccp_alpha=0.0021930055000487507, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.014138395429908338, dtree__min_samples_leaf=0.31650326312216814, dtree__min_samples_split=0.0009974456854091043, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.16858660678187384, cvec__max_features=28114, cvec__min_df=9, dtree__ccp_alpha=0.0021930055000487507, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.014138395429908338, dtree__min_samples_leaf=0.31650326312216814, dtree__min_samples_split=0.0009974456854091043, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.16858660678187384, cvec__max_features=28114, cvec__min_df=9, dtree__ccp_alpha=0.0021930055000487507, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.014138395429908338, dtree__min_samples_leaf=0.31650326312216814, dtree__min_samples_split=0.0009974456854091043, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.16858660678187384, cvec__max_features=28114, cvec__min_df=9, dtree__ccp_alpha=0.0021930055000487507, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.014138395429908338, dtree__min_samples_leaf=0.31650326312216814, dtree__min_samples_split=0.0009974456854091043, dtree__splitter=random 
[CV]  cvec__max_df=0.16858660678187384, cvec__max_features=28114, cvec__min_df=9, dtree__ccp_alpha=0.0021930055000487507, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.014138395429908338, dtree__min_samples_leaf=0.31650326312216814, dtree__min_samples_split=0.000997445685409

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.15861063328380537, cvec__max_features=25786, cvec__min_df=4, dtree__ccp_alpha=0.0073548321760192005, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17387033825499246, dtree__min_samples_leaf=0.035401403883927214, dtree__min_samples_split=0.0012202925575836023, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.15861063328380537, cvec__max_features=25786, cvec__min_df=4, dtree__ccp_alpha=0.0073548321760192005, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17387033825499246, dtree__min_samples_leaf=0.035401403883927214, dtree__min_samples_split=0.0012202925575836023, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.15861063328380537, cvec__max_features=25786, cvec__min_df=4, dtree__ccp_alpha=0.0073548321760192005, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17387033825499246, dtree__min_samples_leaf=0.035401403883927214, dtree__min_samples_split=0.0012202925575836023, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.15861063328380537, cvec__max_features=25786, cvec__min_df=4, dtree__ccp_alpha=0.0073548321760192005, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17387033825499246, dtree__min_samples_leaf=0.035401403883927214, dtree__min_samples_split=0.0012202925575836023, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.15861063328380537, cvec__max_features=25786, cvec__min_df=4, dtree__ccp_alpha=0.0073548321760192005, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17387033825499246, dtree__min_samples_leaf=0.035401403883927214, dtree__min_samples_split=0.0012202925575836023, dtree__splitter=best 
[CV]  cvec__max_df=0.15861063328380537, cvec__max_features=25786, cvec__min_df=4, dtree__ccp_alpha=0.0073548321760192005, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17387033825499246, dtree__min_samples_leaf=0.035401403883927214, dtree__min_samples_split=0.0012202925575836023

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.12191773649427248, cvec__max_features=15445, cvec__min_df=5, dtree__ccp_alpha=0.003673569805266142, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0020710515449619666, dtree__min_samples_leaf=0.09528659332202341, dtree__min_samples_split=0.33167473144381515, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.12191773649427248, cvec__max_features=15445, cvec__min_df=5, dtree__ccp_alpha=0.003673569805266142, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0020710515449619666, dtree__min_samples_leaf=0.09528659332202341, dtree__min_samples_split=0.33167473144381515, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.12191773649427248, cvec__max_features=15445, cvec__min_df=5, dtree__ccp_alpha=0.003673569805266142, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0020710515449619666, dtree__min_samples_leaf=0.09528659332202341, dtree__min_samples_split=0.33167473144381515, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.12191773649427248, cvec__max_features=15445, cvec__min_df=5, dtree__ccp_alpha=0.003673569805266142, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0020710515449619666, dtree__min_samples_leaf=0.09528659332202341, dtree__min_samples_split=0.33167473144381515, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.12191773649427248, cvec__max_features=15445, cvec__min_df=5, dtree__ccp_alpha=0.003673569805266142, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0020710515449619666, dtree__min_samples_leaf=0.09528659332202341, dtree__min_samples_split=0.33167473144381515, dtree__splitter=best 
[CV]  cvec__max_df=0.12191773649427248, cvec__max_features=15445, cvec__min_df=5, dtree__ccp_alpha=0.003673569805266142, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.0020710515449619666, dtree__min_samples_leaf=0.09528659332202341, dtree__min_samples_split=0.33167473144381515, dtre

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.11802735658918248, cvec__max_features=22073, cvec__min_df=4, dtree__ccp_alpha=0.011264220343959644, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.13733973379664965, dtree__min_samples_leaf=0.2926562300520268, dtree__min_samples_split=0.3359536057855648, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.11802735658918248, cvec__max_features=22073, cvec__min_df=4, dtree__ccp_alpha=0.011264220343959644, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.13733973379664965, dtree__min_samples_leaf=0.2926562300520268, dtree__min_samples_split=0.3359536057855648, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.11802735658918248, cvec__max_features=22073, cvec__min_df=4, dtree__ccp_alpha=0.011264220343959644, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.13733973379664965, dtree__min_samples_leaf=0.2926562300520268, dtree__min_samples_split=0.3359536057855648, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.11802735658918248, cvec__max_features=22073, cvec__min_df=4, dtree__ccp_alpha=0.011264220343959644, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.13733973379664965, dtree__min_samples_leaf=0.2926562300520268, dtree__min_samples_split=0.3359536057855648, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.11802735658918248, cvec__max_features=22073, cvec__min_df=4, dtree__ccp_alpha=0.011264220343959644, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.13733973379664965, dtree__min_samples_leaf=0.2926562300520268, dtree__min_samples_split=0.3359536057855648, dtree__splitter=best 
[CV]  cvec__max_df=0.11802735658918248, cvec__max_features=22073, cvec__min_df=4, dtree__ccp_alpha=0.011264220343959644, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.13733973379664965, dtree__min_samples_leaf=0.2926562300520268, dtree__min_samples_split=0.3359536057855648, dtree__

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.12113374252690973, cvec__max_features=17677, cvec__min_df=9, dtree__ccp_alpha=0.010588055375824482, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.006452630177287967, dtree__min_samples_leaf=0.02196420353435719, dtree__min_samples_split=0.0032229557339537593, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.12113374252690973, cvec__max_features=17677, cvec__min_df=9, dtree__ccp_alpha=0.010588055375824482, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.006452630177287967, dtree__min_samples_leaf=0.02196420353435719, dtree__min_samples_split=0.0032229557339537593, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.12113374252690973, cvec__max_features=17677, cvec__min_df=9, dtree__ccp_alpha=0.010588055375824482, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.006452630177287967, dtree__min_samples_leaf=0.02196420353435719, dtree__min_samples_split=0.0032229557339537593, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.12113374252690973, cvec__max_features=17677, cvec__min_df=9, dtree__ccp_alpha=0.010588055375824482, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.006452630177287967, dtree__min_samples_leaf=0.02196420353435719, dtree__min_samples_split=0.0032229557339537593, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.12113374252690973, cvec__max_features=17677, cvec__min_df=9, dtree__ccp_alpha=0.010588055375824482, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.006452630177287967, dtree__min_samples_leaf=0.02196420353435719, dtree__min_samples_split=0.0032229557339537593, dtree__splitter=best 
[CV]  cvec__max_df=0.12113374252690973, cvec__max_features=17677, cvec__min_df=9, dtree__ccp_alpha=0.010588055375824482, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.006452630177287967, dtree__min_samples_leaf=0.02196420353435719, dtree__min_samples_split=0.0032229557339537593, d

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.11333466068825164, cvec__max_features=15931, cvec__min_df=8, dtree__ccp_alpha=0.06870711064234308, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0014240267727842152, dtree__min_samples_leaf=0.3979415013075369, dtree__min_samples_split=0.3893123592483423, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.11333466068825164, cvec__max_features=15931, cvec__min_df=8, dtree__ccp_alpha=0.06870711064234308, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0014240267727842152, dtree__min_samples_leaf=0.3979415013075369, dtree__min_samples_split=0.3893123592483423, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.11333466068825164, cvec__max_features=15931, cvec__min_df=8, dtree__ccp_alpha=0.06870711064234308, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0014240267727842152, dtree__min_samples_leaf=0.3979415013075369, dtree__min_samples_split=0.3893123592483423, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.11333466068825164, cvec__max_features=15931, cvec__min_df=8, dtree__ccp_alpha=0.06870711064234308, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0014240267727842152, dtree__min_samples_leaf=0.3979415013075369, dtree__min_samples_split=0.3893123592483423, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.11333466068825164, cvec__max_features=15931, cvec__min_df=8, dtree__ccp_alpha=0.06870711064234308, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0014240267727842152, dtree__min_samples_leaf=0.3979415013075369, dtree__min_samples_split=0.3893123592483423, dtree__splitter=random 
[CV]  cvec__max_df=0.11333466068825164, cvec__max_features=15931, cvec__min_df=8, dtree__ccp_alpha=0.06870711064234308, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0014240267727842152, dtree__min_samples_leaf=0.3979415013075369, dtree__min_samples_split=0.3893123592483423, 

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.10845568731020869, cvec__max_features=24060, cvec__min_df=5, dtree__ccp_alpha=0.009133306973438107, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.15624641756810123, dtree__min_samples_leaf=0.0016271517703710385, dtree__min_samples_split=0.003321549560763971, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.10845568731020869, cvec__max_features=24060, cvec__min_df=5, dtree__ccp_alpha=0.009133306973438107, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.15624641756810123, dtree__min_samples_leaf=0.0016271517703710385, dtree__min_samples_split=0.003321549560763971, dtree__splitter=random, total=   2.8s
[CV] cvec__max_df=0.10845568731020869, cvec__max_features=24060, cvec__min_df=5, dtree__ccp_alpha=0.009133306973438107, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.15624641756810123, dtree__min_samples_leaf=0.0016271517703710385, dtree__min_samples_split=0.003321549560763971, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.7s remaining:    0.0s


[CV]  cvec__max_df=0.10845568731020869, cvec__max_features=24060, cvec__min_df=5, dtree__ccp_alpha=0.009133306973438107, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.15624641756810123, dtree__min_samples_leaf=0.0016271517703710385, dtree__min_samples_split=0.003321549560763971, dtree__splitter=random, total=   2.7s
[CV] cvec__max_df=0.10845568731020869, cvec__max_features=24060, cvec__min_df=5, dtree__ccp_alpha=0.009133306973438107, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.15624641756810123, dtree__min_samples_leaf=0.0016271517703710385, dtree__min_samples_split=0.003321549560763971, dtree__splitter=random 
[CV]  cvec__max_df=0.10845568731020869, cvec__max_features=24060, cvec__min_df=5, dtree__ccp_alpha=0.009133306973438107, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.15624641756810123, dtree__min_samples_leaf=0.0016271517703710385, dtree__min_samples_split=0.00332154

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.10199397052198615, cvec__max_features=23490, cvec__min_df=10, dtree__ccp_alpha=0.1346486866134889, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0012464010265163864, dtree__min_samples_leaf=0.38920397910216353, dtree__min_samples_split=0.13706771952464977, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.10199397052198615, cvec__max_features=23490, cvec__min_df=10, dtree__ccp_alpha=0.1346486866134889, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0012464010265163864, dtree__min_samples_leaf=0.38920397910216353, dtree__min_samples_split=0.13706771952464977, dtree__splitter=best, total=   2.9s
[CV] cvec__max_df=0.10199397052198615, cvec__max_features=23490, cvec__min_df=10, dtree__ccp_alpha=0.1346486866134889, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0012464010265163864, dtree__min_samples_leaf=0.38920397910216353, dtree__min_samples_split=0.13706771952464977, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.8s remaining:    0.0s


[CV]  cvec__max_df=0.10199397052198615, cvec__max_features=23490, cvec__min_df=10, dtree__ccp_alpha=0.1346486866134889, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0012464010265163864, dtree__min_samples_leaf=0.38920397910216353, dtree__min_samples_split=0.13706771952464977, dtree__splitter=best, total=   2.9s
[CV] cvec__max_df=0.10199397052198615, cvec__max_features=23490, cvec__min_df=10, dtree__ccp_alpha=0.1346486866134889, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0012464010265163864, dtree__min_samples_leaf=0.38920397910216353, dtree__min_samples_split=0.13706771952464977, dtree__splitter=best 
[CV]  cvec__max_df=0.10199397052198615, cvec__max_features=23490, cvec__min_df=10, dtree__ccp_alpha=0.1346486866134889, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0012464010265163864, dtree__min_samples_leaf=0.38920397910216353, dtree__min_samples_split=0.13706771952464977

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.6564837894086004, cvec__max_features=21849, cvec__min_df=7, dtree__ccp_alpha=0.14782672391314378, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0015500057145970028, dtree__min_samples_leaf=0.32777197829505983, dtree__min_samples_split=0.00022204591527358325, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.6564837894086004, cvec__max_features=21849, cvec__min_df=7, dtree__ccp_alpha=0.14782672391314378, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0015500057145970028, dtree__min_samples_leaf=0.32777197829505983, dtree__min_samples_split=0.00022204591527358325, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.6564837894086004, cvec__max_features=21849, cvec__min_df=7, dtree__ccp_alpha=0.14782672391314378, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0015500057145970028, dtree__min_samples_leaf=0.32777197829505983, dtree__min_samples_split=0.00022204591527358325, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.6564837894086004, cvec__max_features=21849, cvec__min_df=7, dtree__ccp_alpha=0.14782672391314378, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0015500057145970028, dtree__min_samples_leaf=0.32777197829505983, dtree__min_samples_split=0.00022204591527358325, dtree__splitter=best, total=   2.7s
[CV] cvec__max_df=0.6564837894086004, cvec__max_features=21849, cvec__min_df=7, dtree__ccp_alpha=0.14782672391314378, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0015500057145970028, dtree__min_samples_leaf=0.32777197829505983, dtree__min_samples_split=0.00022204591527358325, dtree__splitter=best 
[CV]  cvec__max_df=0.6564837894086004, cvec__max_features=21849, cvec__min_df=7, dtree__ccp_alpha=0.14782672391314378, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.0015500057145970028, dtree__min_samples_leaf=0.32777197829505983, dtree__min_samples_split=0.00022204591527358325, d

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.891026623663071, cvec__max_features=16542, cvec__min_df=7, dtree__ccp_alpha=0.022963847592747548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00839877552082651, dtree__min_samples_leaf=0.27336355383456223, dtree__min_samples_split=0.002162863627075851, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.891026623663071, cvec__max_features=16542, cvec__min_df=7, dtree__ccp_alpha=0.022963847592747548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00839877552082651, dtree__min_samples_leaf=0.27336355383456223, dtree__min_samples_split=0.002162863627075851, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.891026623663071, cvec__max_features=16542, cvec__min_df=7, dtree__ccp_alpha=0.022963847592747548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00839877552082651, dtree__min_samples_leaf=0.27336355383456223, dtree__min_samples_split=0.002162863627075851, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.891026623663071, cvec__max_features=16542, cvec__min_df=7, dtree__ccp_alpha=0.022963847592747548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00839877552082651, dtree__min_samples_leaf=0.27336355383456223, dtree__min_samples_split=0.002162863627075851, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.891026623663071, cvec__max_features=16542, cvec__min_df=7, dtree__ccp_alpha=0.022963847592747548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00839877552082651, dtree__min_samples_leaf=0.27336355383456223, dtree__min_samples_split=0.002162863627075851, dtree__splitter=random 
[CV]  cvec__max_df=0.891026623663071, cvec__max_features=16542, cvec__min_df=7, dtree__ccp_alpha=0.022963847592747548, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00839877552082651, dtree__min_samples_leaf=0.27336355383456223, dtree__min_samples_split=0.002162863627075851, dtree__sp

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.1s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.625986654821623, cvec__max_features=15058, cvec__min_df=10, dtree__ccp_alpha=0.19096557702992675, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0003219702306050732, dtree__min_samples_leaf=0.008170222976130072, dtree__min_samples_split=0.12430709348378846, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.625986654821623, cvec__max_features=15058, cvec__min_df=10, dtree__ccp_alpha=0.19096557702992675, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0003219702306050732, dtree__min_samples_leaf=0.008170222976130072, dtree__min_samples_split=0.12430709348378846, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.625986654821623, cvec__max_features=15058, cvec__min_df=10, dtree__ccp_alpha=0.19096557702992675, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0003219702306050732, dtree__min_samples_leaf=0.008170222976130072, dtree__min_samples_split=0.12430709348378846, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.6s remaining:    0.0s


[CV]  cvec__max_df=0.625986654821623, cvec__max_features=15058, cvec__min_df=10, dtree__ccp_alpha=0.19096557702992675, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0003219702306050732, dtree__min_samples_leaf=0.008170222976130072, dtree__min_samples_split=0.12430709348378846, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.625986654821623, cvec__max_features=15058, cvec__min_df=10, dtree__ccp_alpha=0.19096557702992675, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0003219702306050732, dtree__min_samples_leaf=0.008170222976130072, dtree__min_samples_split=0.12430709348378846, dtree__splitter=best 
[CV]  cvec__max_df=0.625986654821623, cvec__max_features=15058, cvec__min_df=10, dtree__ccp_alpha=0.19096557702992675, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.0003219702306050732, dtree__min_samples_leaf=0.008170222976130072, dtree__min_samples_split=0.12430709348378846

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.1s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.10892264719185651, cvec__max_features=21462, cvec__min_df=10, dtree__ccp_alpha=0.019350393048759965, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00500348487172151, dtree__min_samples_leaf=0.15210338481008415, dtree__min_samples_split=0.4312432316391185, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.10892264719185651, cvec__max_features=21462, cvec__min_df=10, dtree__ccp_alpha=0.019350393048759965, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00500348487172151, dtree__min_samples_leaf=0.15210338481008415, dtree__min_samples_split=0.4312432316391185, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.10892264719185651, cvec__max_features=21462, cvec__min_df=10, dtree__ccp_alpha=0.019350393048759965, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00500348487172151, dtree__min_samples_leaf=0.15210338481008415, dtree__min_samples_split=0.4312432316391185, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.10892264719185651, cvec__max_features=21462, cvec__min_df=10, dtree__ccp_alpha=0.019350393048759965, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00500348487172151, dtree__min_samples_leaf=0.15210338481008415, dtree__min_samples_split=0.4312432316391185, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.10892264719185651, cvec__max_features=21462, cvec__min_df=10, dtree__ccp_alpha=0.019350393048759965, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00500348487172151, dtree__min_samples_leaf=0.15210338481008415, dtree__min_samples_split=0.4312432316391185, dtree__splitter=random 
[CV]  cvec__max_df=0.10892264719185651, cvec__max_features=21462, cvec__min_df=10, dtree__ccp_alpha=0.019350393048759965, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.00500348487172151, dtree__min_samples_leaf=0.15210338481008415, dtree__min_samples_split=0.431243231639118

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.9s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.11321030376066124, cvec__max_features=23008, cvec__min_df=7, dtree__ccp_alpha=0.003588519417206327, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.031139595259511096, dtree__min_samples_leaf=0.40089173308566817, dtree__min_samples_split=0.0015985509623256404, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.11321030376066124, cvec__max_features=23008, cvec__min_df=7, dtree__ccp_alpha=0.003588519417206327, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.031139595259511096, dtree__min_samples_leaf=0.40089173308566817, dtree__min_samples_split=0.0015985509623256404, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.11321030376066124, cvec__max_features=23008, cvec__min_df=7, dtree__ccp_alpha=0.003588519417206327, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.031139595259511096, dtree__min_samples_leaf=0.40089173308566817, dtree__min_samples_split=0.0015985509623256404, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.11321030376066124, cvec__max_features=23008, cvec__min_df=7, dtree__ccp_alpha=0.003588519417206327, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.031139595259511096, dtree__min_samples_leaf=0.40089173308566817, dtree__min_samples_split=0.0015985509623256404, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.11321030376066124, cvec__max_features=23008, cvec__min_df=7, dtree__ccp_alpha=0.003588519417206327, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.031139595259511096, dtree__min_samples_leaf=0.40089173308566817, dtree__min_samples_split=0.0015985509623256404, dtree__splitter=random 
[CV]  cvec__max_df=0.11321030376066124, cvec__max_features=23008, cvec__min_df=7, dtree__ccp_alpha=0.003588519417206327, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.031139595259511096, dtree__min_samples_leaf=0.40089173308566817, dtree__min_samples_split=0.001598550

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.7905065946373928, cvec__max_features=17625, cvec__min_df=10, dtree__ccp_alpha=0.0633759001142101, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004475182032425696, dtree__min_samples_leaf=0.3440470728013023, dtree__min_samples_split=0.3864836494835044, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.7905065946373928, cvec__max_features=17625, cvec__min_df=10, dtree__ccp_alpha=0.0633759001142101, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004475182032425696, dtree__min_samples_leaf=0.3440470728013023, dtree__min_samples_split=0.3864836494835044, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.7905065946373928, cvec__max_features=17625, cvec__min_df=10, dtree__ccp_alpha=0.0633759001142101, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004475182032425696, dtree__min_samples_leaf=0.3440470728013023, dtree__min_samples_split=0.3864836494835044, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.7905065946373928, cvec__max_features=17625, cvec__min_df=10, dtree__ccp_alpha=0.0633759001142101, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004475182032425696, dtree__min_samples_leaf=0.3440470728013023, dtree__min_samples_split=0.3864836494835044, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.7905065946373928, cvec__max_features=17625, cvec__min_df=10, dtree__ccp_alpha=0.0633759001142101, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004475182032425696, dtree__min_samples_leaf=0.3440470728013023, dtree__min_samples_split=0.3864836494835044, dtree__splitter=best 
[CV]  cvec__max_df=0.7905065946373928, cvec__max_features=17625, cvec__min_df=10, dtree__ccp_alpha=0.0633759001142101, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004475182032425696, dtree__min_samples_leaf=0.3440470728013023, dtree__min_samples_split=0.3864836494835044, dtree__splitter=bes

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.10915605491462718, cvec__max_features=15821, cvec__min_df=7, dtree__ccp_alpha=0.1717792103544734, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.023171160228777993, dtree__min_samples_leaf=0.011444310671861692, dtree__min_samples_split=0.0029498392503609647, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.10915605491462718, cvec__max_features=15821, cvec__min_df=7, dtree__ccp_alpha=0.1717792103544734, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.023171160228777993, dtree__min_samples_leaf=0.011444310671861692, dtree__min_samples_split=0.0029498392503609647, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.10915605491462718, cvec__max_features=15821, cvec__min_df=7, dtree__ccp_alpha=0.1717792103544734, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.023171160228777993, dtree__min_samples_leaf=0.011444310671861692, dtree__min_samples_split=0.0029498392503609647, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.4s remaining:    0.0s


[CV]  cvec__max_df=0.10915605491462718, cvec__max_features=15821, cvec__min_df=7, dtree__ccp_alpha=0.1717792103544734, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.023171160228777993, dtree__min_samples_leaf=0.011444310671861692, dtree__min_samples_split=0.0029498392503609647, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.10915605491462718, cvec__max_features=15821, cvec__min_df=7, dtree__ccp_alpha=0.1717792103544734, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.023171160228777993, dtree__min_samples_leaf=0.011444310671861692, dtree__min_samples_split=0.0029498392503609647, dtree__splitter=random 
[CV]  cvec__max_df=0.10915605491462718, cvec__max_features=15821, cvec__min_df=7, dtree__ccp_alpha=0.1717792103544734, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.023171160228777993, dtree__min_samples_leaf=0.011444310671861692, dtree__min_samples_split=0.0029498392503609647, 

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.7s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.9156400372784491, cvec__max_features=18923, cvec__min_df=8, dtree__ccp_alpha=0.0286863287304517, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006783699542625611, dtree__min_samples_leaf=0.00979767347183136, dtree__min_samples_split=0.21278082035860796, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.9156400372784491, cvec__max_features=18923, cvec__min_df=8, dtree__ccp_alpha=0.0286863287304517, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006783699542625611, dtree__min_samples_leaf=0.00979767347183136, dtree__min_samples_split=0.21278082035860796, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.9156400372784491, cvec__max_features=18923, cvec__min_df=8, dtree__ccp_alpha=0.0286863287304517, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006783699542625611, dtree__min_samples_leaf=0.00979767347183136, dtree__min_samples_split=0.21278082035860796, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.4s remaining:    0.0s


[CV]  cvec__max_df=0.9156400372784491, cvec__max_features=18923, cvec__min_df=8, dtree__ccp_alpha=0.0286863287304517, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006783699542625611, dtree__min_samples_leaf=0.00979767347183136, dtree__min_samples_split=0.21278082035860796, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.9156400372784491, cvec__max_features=18923, cvec__min_df=8, dtree__ccp_alpha=0.0286863287304517, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006783699542625611, dtree__min_samples_leaf=0.00979767347183136, dtree__min_samples_split=0.21278082035860796, dtree__splitter=random 
[CV]  cvec__max_df=0.9156400372784491, cvec__max_features=18923, cvec__min_df=8, dtree__ccp_alpha=0.0286863287304517, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.006783699542625611, dtree__min_samples_leaf=0.00979767347183136, dtree__min_samples_split=0.21278082035860796, dtree__split

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.6s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.34099960269328133, cvec__max_features=17751, cvec__min_df=8, dtree__ccp_alpha=0.01065668913728677, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004924582465360185, dtree__min_samples_leaf=0.01599822295664816, dtree__min_samples_split=0.19070379350293853, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.34099960269328133, cvec__max_features=17751, cvec__min_df=8, dtree__ccp_alpha=0.01065668913728677, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004924582465360185, dtree__min_samples_leaf=0.01599822295664816, dtree__min_samples_split=0.19070379350293853, dtree__splitter=best, total=   2.5s
[CV] cvec__max_df=0.34099960269328133, cvec__max_features=17751, cvec__min_df=8, dtree__ccp_alpha=0.01065668913728677, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004924582465360185, dtree__min_samples_leaf=0.01599822295664816, dtree__min_samples_split=0.19070379350293853, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.4s remaining:    0.0s


[CV]  cvec__max_df=0.34099960269328133, cvec__max_features=17751, cvec__min_df=8, dtree__ccp_alpha=0.01065668913728677, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004924582465360185, dtree__min_samples_leaf=0.01599822295664816, dtree__min_samples_split=0.19070379350293853, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.34099960269328133, cvec__max_features=17751, cvec__min_df=8, dtree__ccp_alpha=0.01065668913728677, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004924582465360185, dtree__min_samples_leaf=0.01599822295664816, dtree__min_samples_split=0.19070379350293853, dtree__splitter=best 
[CV]  cvec__max_df=0.34099960269328133, cvec__max_features=17751, cvec__min_df=8, dtree__ccp_alpha=0.01065668913728677, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004924582465360185, dtree__min_samples_leaf=0.01599822295664816, dtree__min_samples_split=0.19070379350293853, dtree__spl

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.7s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.7191601172248167, cvec__max_features=25281, cvec__min_df=10, dtree__ccp_alpha=0.17829241914040805, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.09247730696572504, dtree__min_samples_leaf=0.0039487309714883465, dtree__min_samples_split=0.246199217123802, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.7191601172248167, cvec__max_features=25281, cvec__min_df=10, dtree__ccp_alpha=0.17829241914040805, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.09247730696572504, dtree__min_samples_leaf=0.0039487309714883465, dtree__min_samples_split=0.246199217123802, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.7191601172248167, cvec__max_features=25281, cvec__min_df=10, dtree__ccp_alpha=0.17829241914040805, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.09247730696572504, dtree__min_samples_leaf=0.0039487309714883465, dtree__min_samples_split=0.246199217123802, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.7191601172248167, cvec__max_features=25281, cvec__min_df=10, dtree__ccp_alpha=0.17829241914040805, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.09247730696572504, dtree__min_samples_leaf=0.0039487309714883465, dtree__min_samples_split=0.246199217123802, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.7191601172248167, cvec__max_features=25281, cvec__min_df=10, dtree__ccp_alpha=0.17829241914040805, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.09247730696572504, dtree__min_samples_leaf=0.0039487309714883465, dtree__min_samples_split=0.246199217123802, dtree__splitter=random 
[CV]  cvec__max_df=0.7191601172248167, cvec__max_features=25281, cvec__min_df=10, dtree__ccp_alpha=0.17829241914040805, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.09247730696572504, dtree__min_samples_leaf=0.0039487309714883465, dtree__min_samples_split=0.246199217123802, dtree__sp

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.4459586681069575, cvec__max_features=16728, cvec__min_df=9, dtree__ccp_alpha=0.007571090070761689, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17692713943970204, dtree__min_samples_leaf=0.003946662945656377, dtree__min_samples_split=0.4813195549893457, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.4459586681069575, cvec__max_features=16728, cvec__min_df=9, dtree__ccp_alpha=0.007571090070761689, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17692713943970204, dtree__min_samples_leaf=0.003946662945656377, dtree__min_samples_split=0.4813195549893457, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.4459586681069575, cvec__max_features=16728, cvec__min_df=9, dtree__ccp_alpha=0.007571090070761689, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17692713943970204, dtree__min_samples_leaf=0.003946662945656377, dtree__min_samples_split=0.4813195549893457, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.4459586681069575, cvec__max_features=16728, cvec__min_df=9, dtree__ccp_alpha=0.007571090070761689, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17692713943970204, dtree__min_samples_leaf=0.003946662945656377, dtree__min_samples_split=0.4813195549893457, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.4459586681069575, cvec__max_features=16728, cvec__min_df=9, dtree__ccp_alpha=0.007571090070761689, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17692713943970204, dtree__min_samples_leaf=0.003946662945656377, dtree__min_samples_split=0.4813195549893457, dtree__splitter=random 
[CV]  cvec__max_df=0.4459586681069575, cvec__max_features=16728, cvec__min_df=9, dtree__ccp_alpha=0.007571090070761689, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.17692713943970204, dtree__min_samples_leaf=0.003946662945656377, dtree__min_samples_split=0.4813195549893457, dtree__sp

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.9s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.7841751227044891, cvec__max_features=15506, cvec__min_df=10, dtree__ccp_alpha=0.006380469347720209, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004837329033714566, dtree__min_samples_leaf=0.31946341829637176, dtree__min_samples_split=0.04325035562419877, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.7841751227044891, cvec__max_features=15506, cvec__min_df=10, dtree__ccp_alpha=0.006380469347720209, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004837329033714566, dtree__min_samples_leaf=0.31946341829637176, dtree__min_samples_split=0.04325035562419877, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.7841751227044891, cvec__max_features=15506, cvec__min_df=10, dtree__ccp_alpha=0.006380469347720209, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004837329033714566, dtree__min_samples_leaf=0.31946341829637176, dtree__min_samples_split=0.04325035562419877, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.7841751227044891, cvec__max_features=15506, cvec__min_df=10, dtree__ccp_alpha=0.006380469347720209, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004837329033714566, dtree__min_samples_leaf=0.31946341829637176, dtree__min_samples_split=0.04325035562419877, dtree__splitter=best, total=   2.5s
[CV] cvec__max_df=0.7841751227044891, cvec__max_features=15506, cvec__min_df=10, dtree__ccp_alpha=0.006380469347720209, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004837329033714566, dtree__min_samples_leaf=0.31946341829637176, dtree__min_samples_split=0.04325035562419877, dtree__splitter=best 
[CV]  cvec__max_df=0.7841751227044891, cvec__max_features=15506, cvec__min_df=10, dtree__ccp_alpha=0.006380469347720209, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004837329033714566, dtree__min_samples_leaf=0.31946341829637176, dtree__min_samples_split=0.04325035562419877, dtree__

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.9s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.26467052753443043, cvec__max_features=17863, cvec__min_df=7, dtree__ccp_alpha=0.008652023427494139, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004478306152198598, dtree__min_samples_leaf=0.33588930465940026, dtree__min_samples_split=0.2463547892931771, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.26467052753443043, cvec__max_features=17863, cvec__min_df=7, dtree__ccp_alpha=0.008652023427494139, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004478306152198598, dtree__min_samples_leaf=0.33588930465940026, dtree__min_samples_split=0.2463547892931771, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.26467052753443043, cvec__max_features=17863, cvec__min_df=7, dtree__ccp_alpha=0.008652023427494139, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004478306152198598, dtree__min_samples_leaf=0.33588930465940026, dtree__min_samples_split=0.2463547892931771, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.26467052753443043, cvec__max_features=17863, cvec__min_df=7, dtree__ccp_alpha=0.008652023427494139, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004478306152198598, dtree__min_samples_leaf=0.33588930465940026, dtree__min_samples_split=0.2463547892931771, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.26467052753443043, cvec__max_features=17863, cvec__min_df=7, dtree__ccp_alpha=0.008652023427494139, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004478306152198598, dtree__min_samples_leaf=0.33588930465940026, dtree__min_samples_split=0.2463547892931771, dtree__splitter=random 
[CV]  cvec__max_df=0.26467052753443043, cvec__max_features=17863, cvec__min_df=7, dtree__ccp_alpha=0.008652023427494139, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004478306152198598, dtree__min_samples_leaf=0.33588930465940026, dtree__min_samples_split=0.246354789293177

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.7029705369341789, cvec__max_features=15749, cvec__min_df=7, dtree__ccp_alpha=0.09427788325780785, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.07458323884648933, dtree__min_samples_leaf=0.004713873099565947, dtree__min_samples_split=0.3813495595539205, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.7029705369341789, cvec__max_features=15749, cvec__min_df=7, dtree__ccp_alpha=0.09427788325780785, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.07458323884648933, dtree__min_samples_leaf=0.004713873099565947, dtree__min_samples_split=0.3813495595539205, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.7029705369341789, cvec__max_features=15749, cvec__min_df=7, dtree__ccp_alpha=0.09427788325780785, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.07458323884648933, dtree__min_samples_leaf=0.004713873099565947, dtree__min_samples_split=0.3813495595539205, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.7029705369341789, cvec__max_features=15749, cvec__min_df=7, dtree__ccp_alpha=0.09427788325780785, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.07458323884648933, dtree__min_samples_leaf=0.004713873099565947, dtree__min_samples_split=0.3813495595539205, dtree__splitter=best, total=   2.5s
[CV] cvec__max_df=0.7029705369341789, cvec__max_features=15749, cvec__min_df=7, dtree__ccp_alpha=0.09427788325780785, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.07458323884648933, dtree__min_samples_leaf=0.004713873099565947, dtree__min_samples_split=0.3813495595539205, dtree__splitter=best 
[CV]  cvec__max_df=0.7029705369341789, cvec__max_features=15749, cvec__min_df=7, dtree__ccp_alpha=0.09427788325780785, dtree__criterion=entropy, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.07458323884648933, dtree__min_samples_leaf=0.004713873099565947, dtree__min_samples_split=0.3813495595539205, dtree__

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.1731135969566237, cvec__max_features=28149, cvec__min_df=3, dtree__ccp_alpha=0.009522650597652829, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.004737219793877269, dtree__min_samples_leaf=0.18312565997345934, dtree__min_samples_split=0.2073122510863175, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.1731135969566237, cvec__max_features=28149, cvec__min_df=3, dtree__ccp_alpha=0.009522650597652829, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.004737219793877269, dtree__min_samples_leaf=0.18312565997345934, dtree__min_samples_split=0.2073122510863175, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.1731135969566237, cvec__max_features=28149, cvec__min_df=3, dtree__ccp_alpha=0.009522650597652829, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.004737219793877269, dtree__min_samples_leaf=0.18312565997345934, dtree__min_samples_split=0.2073122510863175, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.1731135969566237, cvec__max_features=28149, cvec__min_df=3, dtree__ccp_alpha=0.009522650597652829, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.004737219793877269, dtree__min_samples_leaf=0.18312565997345934, dtree__min_samples_split=0.2073122510863175, dtree__splitter=best, total=   2.5s
[CV] cvec__max_df=0.1731135969566237, cvec__max_features=28149, cvec__min_df=3, dtree__ccp_alpha=0.009522650597652829, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.004737219793877269, dtree__min_samples_leaf=0.18312565997345934, dtree__min_samples_split=0.2073122510863175, dtree__splitter=best 
[CV]  cvec__max_df=0.1731135969566237, cvec__max_features=28149, cvec__min_df=3, dtree__ccp_alpha=0.009522650597652829, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.004737219793877269, dtree__min_samples_leaf=0.18312565997345934, dtree__min_samples_split=0.2073122510863175, dtree__splitt

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.0s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.5414104876528328, cvec__max_features=20399, cvec__min_df=7, dtree__ccp_alpha=0.008051481830632225, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.004026976811730343, dtree__min_samples_leaf=0.021230635058142825, dtree__min_samples_split=0.45733154810846816, dtree__splitter=best 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.5414104876528328, cvec__max_features=20399, cvec__min_df=7, dtree__ccp_alpha=0.008051481830632225, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.004026976811730343, dtree__min_samples_leaf=0.021230635058142825, dtree__min_samples_split=0.45733154810846816, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.5414104876528328, cvec__max_features=20399, cvec__min_df=7, dtree__ccp_alpha=0.008051481830632225, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.004026976811730343, dtree__min_samples_leaf=0.021230635058142825, dtree__min_samples_split=0.45733154810846816, dtree__splitter=best 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.5414104876528328, cvec__max_features=20399, cvec__min_df=7, dtree__ccp_alpha=0.008051481830632225, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.004026976811730343, dtree__min_samples_leaf=0.021230635058142825, dtree__min_samples_split=0.45733154810846816, dtree__splitter=best, total=   2.6s
[CV] cvec__max_df=0.5414104876528328, cvec__max_features=20399, cvec__min_df=7, dtree__ccp_alpha=0.008051481830632225, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.004026976811730343, dtree__min_samples_leaf=0.021230635058142825, dtree__min_samples_split=0.45733154810846816, dtree__splitter=best 
[CV]  cvec__max_df=0.5414104876528328, cvec__max_features=20399, cvec__min_df=7, dtree__ccp_alpha=0.008051481830632225, dtree__criterion=entropy, dtree__max_features=auto, dtree__min_impurity_decrease=0.004026976811730343, dtree__min_samples_leaf=0.021230635058142825, dtree__min_samples_split=0.45733154810846816

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   13.0s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.6062152436438916, cvec__max_features=22263, cvec__min_df=5, dtree__ccp_alpha=0.010578882547501126, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004429500610162563, dtree__min_samples_leaf=0.21388864616930925, dtree__min_samples_split=0.3069974808905537, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.6062152436438916, cvec__max_features=22263, cvec__min_df=5, dtree__ccp_alpha=0.010578882547501126, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004429500610162563, dtree__min_samples_leaf=0.21388864616930925, dtree__min_samples_split=0.3069974808905537, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.6062152436438916, cvec__max_features=22263, cvec__min_df=5, dtree__ccp_alpha=0.010578882547501126, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004429500610162563, dtree__min_samples_leaf=0.21388864616930925, dtree__min_samples_split=0.3069974808905537, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.4s remaining:    0.0s


[CV]  cvec__max_df=0.6062152436438916, cvec__max_features=22263, cvec__min_df=5, dtree__ccp_alpha=0.010578882547501126, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004429500610162563, dtree__min_samples_leaf=0.21388864616930925, dtree__min_samples_split=0.3069974808905537, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.6062152436438916, cvec__max_features=22263, cvec__min_df=5, dtree__ccp_alpha=0.010578882547501126, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004429500610162563, dtree__min_samples_leaf=0.21388864616930925, dtree__min_samples_split=0.3069974808905537, dtree__splitter=random 
[CV]  cvec__max_df=0.6062152436438916, cvec__max_features=22263, cvec__min_df=5, dtree__ccp_alpha=0.010578882547501126, dtree__criterion=gini, dtree__max_features=sqrt, dtree__min_impurity_decrease=0.004429500610162563, dtree__min_samples_leaf=0.21388864616930925, dtree__min_samples_split=0.3069974808905537, dtree__sp

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] cvec__max_df=0.4638177252146807, cvec__max_features=22423, cvec__min_df=8, dtree__ccp_alpha=0.08454127049863715, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.00460938569627909, dtree__min_samples_leaf=0.014056980005170031, dtree__min_samples_split=0.3938904848466602, dtree__splitter=random 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  cvec__max_df=0.4638177252146807, cvec__max_features=22423, cvec__min_df=8, dtree__ccp_alpha=0.08454127049863715, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.00460938569627909, dtree__min_samples_leaf=0.014056980005170031, dtree__min_samples_split=0.3938904848466602, dtree__splitter=random, total=   2.6s
[CV] cvec__max_df=0.4638177252146807, cvec__max_features=22423, cvec__min_df=8, dtree__ccp_alpha=0.08454127049863715, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.00460938569627909, dtree__min_samples_leaf=0.014056980005170031, dtree__min_samples_split=0.3938904848466602, dtree__splitter=random 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.5s remaining:    0.0s


[CV]  cvec__max_df=0.4638177252146807, cvec__max_features=22423, cvec__min_df=8, dtree__ccp_alpha=0.08454127049863715, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.00460938569627909, dtree__min_samples_leaf=0.014056980005170031, dtree__min_samples_split=0.3938904848466602, dtree__splitter=random, total=   2.5s
[CV] cvec__max_df=0.4638177252146807, cvec__max_features=22423, cvec__min_df=8, dtree__ccp_alpha=0.08454127049863715, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.00460938569627909, dtree__min_samples_leaf=0.014056980005170031, dtree__min_samples_split=0.3938904848466602, dtree__splitter=random 
[CV]  cvec__max_df=0.4638177252146807, cvec__max_features=22423, cvec__min_df=8, dtree__ccp_alpha=0.08454127049863715, dtree__criterion=gini, dtree__max_features=auto, dtree__min_impurity_decrease=0.00460938569627909, dtree__min_samples_leaf=0.014056980005170031, dtree__min_samples_split=0.3938904848466602, dtree__split

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   12.9s finished


BayesSearchCV(cv=5,
              estimator=Pipeline(steps=[('cvec',
                                         CountVectorizer(ngram_range=(1, 2),
                                                         stop_words=frozenset({'a',
                                                                               'about',
                                                                               'above',
                                                                               'across',
                                                                               'after',
                                                                               'afterwards',
                                                                               'again',
                                                                               'against',
                                                                               'all',
                                                            

In [104]:
dtree_bs.best_score_

0.6766466897747829

This model threw a lot of errors while it was fitting, and had a low best score number. I did not end up testing its score values, assuming it was not going to perform as well as the boost or random forest model.

## Checking Random Forest with Bayes Search

In [11]:
rforr_pipe = Pipeline([
    ('cvec', CountVectorizer(stop_words=stop_words )),
    ('rfc', RandomForestClassifier())
])

In [15]:
rforr_params = {
    'cvec__max_features': Integer(15000,30000),
    'cvec__min_df': Integer(2,10),
    'cvec__max_df': Real(.1,1),
    'rfc__n_estimators': Integer(1,20000),
    'rfc__max_depth': Integer(1,50),
    'rfc__min_samples_split': Integer(2,20),
    'rfc__min_samples_leaf': Integer(2,10),
    'rfc__max_features': Categorical(['auto','sqrt','log2']),
    'rfc__ccp_alpha' :Real(0.1,1)
}

In [16]:
rforr_bs = BayesSearchCV(estimator = rforr_pipe,
                     search_spaces = rforr_params,
                     n_iter = 50,
                     cv = 5,
                     refit = True,
                     optimizer_kwargs = {'base_estimator': 'RF'},
                     random_state=42,
                        verbose=1,
                        n_jobs = 6)

In [17]:
rforr_bs.fit(X_train,y_train)

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   21.7s finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   57.1s finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.1min finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.7min finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   10.4s finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   33.0s finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   18.5s finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   36.6s finished
[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.9min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    8.6s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  2.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   22.2s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   42.6s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    1.8s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    6.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.8min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    4.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   53.6s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    6.9s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    6.0s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   36.0s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    5.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   56.6s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   43.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   30.0s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    1.1s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  2.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.5min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   58.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   52.3s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   19.1s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   41.6s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  3.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   35.4s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    5.7s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:    2.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   31.7s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  3.4min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   27.9s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.9min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  2.6min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:  1.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   15.7s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   36.5s finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   5 out of   5 | elapsed:   18.9s finished


BayesSearchCV(cv=5,
              estimator=Pipeline(steps=[('cvec',
                                         CountVectorizer(stop_words=frozenset({'a',
                                                                               'about',
                                                                               'above',
                                                                               'across',
                                                                               'after',
                                                                               'afterwards',
                                                                               'again',
                                                                               'against',
                                                                               'all',
                                                                               'almost',
                                                

In [18]:
rforr_bs.best_score_

0.5113122171945701

In [20]:
rforr_bs.best_params_

OrderedDict([('cvec__max_df', 0.46909356296798244),
             ('cvec__max_features', 28079),
             ('cvec__min_df', 5),
             ('rfc__ccp_alpha', 0.39633656951948726),
             ('rfc__max_depth', 38),
             ('rfc__max_features', 'log2'),
             ('rfc__min_samples_leaf', 2),
             ('rfc__min_samples_split', 10),
             ('rfc__n_estimators', 3694)])

In [19]:
rforr_bs.score(X_train,y_train), rforr_bs.score(X_test,y_test)

(0.5113122171945701, 0.5111470113085622)

This model ended up breaking. It has ended up guessing only the positive class, and failing to beat the baseline model. It was the first model I finished after learning how to export models so I did export it to test the export function. 

In [41]:
joblib.dump(rforr_bs,'./models/RandomForrestBayes.pkl')

['./data/RandomForrestBayes.pkl']

# Conclusion of Models:

The models that I think performed the best on this data are the Random Forest, and the XGBoost model. I will use both of those models in the analysis notebook to pick which model will be the final winner to submit to Mr. Seller. I can also report to Mr. Seller that the models appear to be working reasonably well and to have high hopes for a final model that will sort through his advice with high accuracy. 