# Real or Not?  NLP with Disaster Tweets

This is my first attempt at creating an NLP ML program.

Some resources I used:
 - https://www.kaggle.com/philculliton/nlp-getting-started-tutorial
 - https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle
 - https://www.kaggle.com/gunesevitan/nlp-with-disaster-tweets-eda-cleaning-and-bert

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

pd.set_option('display.max_columns', 500)

In [2]:
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
train_df.head()

|   | id | keyword | location | text | target |
|---|----|---------|----------|------|--------|
| 0 | 1 | NaN | NaN | Our Deeds are the Reason of this #earthquake M... | 1 |
| 1 | 4 | NaN | NaN | Forest fire near La Ronge Sask. Canada | 1 |
| 2 | 5 | NaN | NaN | All residents asked to 'shelter in place' are ... | 1 |
| 3 | 6 | NaN | NaN | 13,000 people receive #wildfires evacuation or... | 1 |
| 4 | 7 | NaN | NaN | Just got sent this photo from Ruby #Alaska as ... | 1 |


# 1. Data Cleanup

In [3]:
print (f'Train has {len(train_df)} records\nTest has {len(test_df)} records\n')

null_train_keyword = train_df['keyword'].isnull().sum() / len(train_df) * 100
null_test_keyword = test_df['keyword'].isnull().sum() / len(test_df) * 100
null_train_location = train_df['location'].isnull().sum() / len(train_df) * 100
null_test_location = test_df['location'].isnull().sum() / len(test_df) * 100

print (f'Keyword Values: Train = {round(null_train_keyword,3)}% Test = {round(null_test_keyword,3)}%')
print (f'Location Values: Train = {round(null_train_location,3)}% Test = {round(null_test_location,3)}%')

Train has 7613 records
Test has 3263 records

Keyword Values: Train = 0.801% Test = 0.797%
Location Values: Train = 33.272% Test = 33.865%


In [4]:
# Let's see if we can use a '#' value for our keyword...
# str.contains('#') is True for tweets that DO contain a hashtag, so .sum() counts those
null_train_keyword_df = train_df[train_df['keyword'].isnull()]['text'].str.contains('#')

print(f'Train contains {len(null_train_keyword_df)} records with no keywords')
print(f'  {null_train_keyword_df.sum()} of which contain a \'#\'')

Train contains 61 records with no keywords
  21 of which contain a '#'


I'm not going to spend much time on the keyword right now.  I might circle back on this.
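One possible way to circle back on this later is to fall back on the first hashtag whenever `keyword` is missing. This is only a sketch on a small made-up frame (not `train_df`), and the column name `keyword_filled` is hypothetical:

```python
import pandas as pd

# Toy frame standing in for train_df (made-up rows, same column names)
toy_df = pd.DataFrame({
    'keyword': [None, 'flood', None],
    'text': ['Forest fire near town #wildfire #help',
             'Water rising fast #flood',
             'No tags in this one'],
})

# First hashtag in each tweet, lower-cased; NaN when the tweet has none
first_tag = toy_df['text'].str.extract(r'#(\w+)', expand=False).str.lower()

# Keep existing keywords and only fill the gaps
toy_df['keyword_filled'] = toy_df['keyword'].fillna(first_tag)
print(toy_df[['keyword', 'keyword_filled']])
```

Tweets with neither a keyword nor a hashtag stay empty, so this would only cover part of the missing records.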


### Convert http links, @names, punctuation, etc...
Examples of the cleanup tasks (based on the preprocessing used for the GloVe Twitter word vectors):
 - http://my-link --> `<url>`
 - @name --> `<user>`
 - 13,000 --> `<number>`
 - #topic --> `<hashtag> topic`
 
Sources:
 - https://nlp.stanford.edu/projects/glove/preprocess-twitter.rb
 - https://www.kaggle.com/amackcrane/python-version-of-glove-twitter-preprocess-script
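To make the mapping above concrete, here is a minimal sketch that applies four of the substitutions to a single string with the standard `re` module. The function name `glove_style_tokens` and the sample tweet are made up for illustration, and the patterns are simplified versions of the ones in the scripts linked above:

```python
import re

def glove_style_tokens(text):
    """Apply a few GloVe-Twitter-style substitutions to one tweet."""
    # URLs first, because they can contain digits and '#'
    text = re.sub(r'https?://\S+|www\.\S+', '<url>', text, flags=re.IGNORECASE)
    # @name --> <user>
    text = re.sub(r'@\w+', '<user>', text)
    # 13,000 --> <number>
    text = re.sub(r'[-+]?[.\d]*\d+[:,.\d]*', '<number>', text)
    # #topic --> <hashtag> topic
    text = re.sub(r'#(\S+)', r'<hashtag> \1', text)
    return text

cleaned = glove_style_tokens('@bob 13,000 people evacuated http://t.co/abc #wildfires')
print(cleaned)
# <user> <number> people evacuated <url> <hashtag> wildfires
```

The order matters: if numbers were replaced before URLs, the digits inside a link would be turned into `<number>` and the URL pattern would no longer match the whole link.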

In [5]:
# Testing area:
eyes = r"[8:=;]"
nose = r"['`\-]?"
df = pd.DataFrame({'test': ['This is ALL CAPS too longggg']})
df['test'].str.replace(r" ([A-Z -_]{2,}) ", r' \1 <allcaps> ', regex=True)  # regex=True is required in pandas >= 2.0

0    This is ALL CAPS <allcaps> too longggg
Name: test, dtype: object

In [6]:
eyes = r"[8:=;]"
nose = r"['`\-]?"
key_words = ['user', 'number', 'hashtag', 'repeat', 'smile', 
             'lolface', 'sadface', 'neutralface', 'heart',
             'elong', 'allcaps', 'url']

all_data = [train_df, test_df]

# Note: pandas >= 2.0 treats the pattern as literal text unless regex=True is passed
for df in all_data:
    # Replace website URLs
    df['text'] = df['text'].str.replace(r'http\S+|www.\S+', '<url>', case=False, regex=True)
    # Replace usernames
    df['text'] = df['text'].str.replace(r'@\S+', ' <user>', regex=True)
    # Remove encodings like &amp; and &gt;
    df['text'] = df['text'].str.replace(r'&\S+;', '', regex=True) # not used in GloVe
    # Replace numbers
    df['text'] = df['text'].str.replace(r"[-+]?[.\d]*[\d]+[:,.\d]*", "<number>", regex=True)
    # Replace hashtags
    df['text'] = df['text'].str.replace('#', '<hashtag> ', regex=False)
    # Replace repeat !! ?? (not words)
    #df['text'] = df['text'].str.replace(r'(?<!\S)((\S+))(?:\s+\2)+(?!\S)', r'\1 <repeat>') # words: my misunderstanding
    df['text'] = df['text'].str.replace(r"([!?.]){2,}", r"\1 <repeat>", regex=True)
    # Replace emoticons
    df['text'] = df['text'].str.replace(r"{}{}[)dD]+|[)dD]+{}{}".format(eyes, nose, nose, eyes), "<smile>", regex=True)
    df['text'] = df['text'].str.replace(r"{}{}p+".format(eyes, nose), "<lolface>", regex=True)
    df['text'] = df['text'].str.replace(r"{}{}\(+|\)+{}{}".format(eyes, nose, nose, eyes), "<sadface>", regex=True)
    df['text'] = df['text'].str.replace(r"{}{}[\/|l*]".format(eyes, nose), "<neutralface>", regex=True)
    df['text'] = df['text'].str.replace("<3", "<heart>", regex=False)
    # Elongated words like wayyyyy too longgg
    df['text'] = df['text'].str.replace(r"\b(\S*?)(.)\2{2,}\b", r"\1\2 <elong>", regex=True)
    # ALL CAPS
    # NOTE: ' -_' inside the brackets is a character range (space through underscore),
    # so digits and most punctuation match here too, not only capital letters
    df['text'] = df['text'].str.replace(r" ([A-Z -_]{2,}) ", r' \1 <allcaps> ', regex=True)
    # Remove *
    df['text'] = df['text'].str.replace("*", "", regex=False)

### Split the TRAIN set into Train and Validate sets

In [7]:
xtrain_full = train_df['text']
xtest_full = test_df['text']
ytrain_full = train_df['target']
xtrain, xvalid, ytrain, yvalid = train_test_split(train_df.text.values, ytrain_full, 
                                                  stratify=ytrain_full, 
                                                  random_state=42, 
                                                  test_size=0.333, shuffle=True)

print(f'Train = {len(xtrain)} records\nValidate = {len(xvalid)} records')

Train = 5077 records
Validate = 2536 records


In [8]:
print(f'Train data:\n {xtrain[0:1000]}')

Train data:
 ['One Direction Is my pick for <url> Fan Army <hashtag> Directioners <url> x<number>'
 'Wire<smile> Reddit Will Now Quarantine Offensive Content - Reddit co-founder and CEO <allcaps> Steve Huffman has unveiled more sp. <repeat> <url>'
 "Next May I'll be free. <repeat>from school from obligations like family. <repeat> Best of all that damn curfew. <repeat>"
 'Dem FLATLINERS <allcaps> who destroy creativity-balance-longevity  TRUTH <allcaps> stand with Lucifer in all his flames of destruction <url>'
 'That usually NEVER <allcaps> happens'
 "Russian nuclear-biological-chemical (NBC) <allcaps> brigade 'emergency response' exercise in Southern MD <allcaps> <url> <url>"
 'IS claims suicide bombing against Saudi police: RIYADH (AFP) - <allcaps> An Islamic State group suicide bomber on Thursd. <repeat> <url>'
 'My dad is panicking as my weight loss means he needs to hurry up with my new clothes fundwhen I reach my goal.  ? <allcaps> <repeat>'
 " <user> All Andre and Gore have to d

### Create functions used to create submission to Kaggle

In [9]:
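def CreateVectorizerSumbission(clf, vectorizer):
    # Refit clf on the full training set, write predictions for the Kaggle test set
    # to my_submission.csv, and return the F1 score on the training data
    # (a training score, not a held-out one).  Needs `from sklearn import metrics`.
    xtrain_full_v = vectorizer.transform(xtrain_full)
    
    clf.fit(xtrain_full_v, ytrain_full)
    train_f1 = metrics.f1_score(ytrain_full, clf.predict(xtrain_full_v))

    xtest_full_v = vectorizer.transform(xtest_full)
    predictions = clf.predict(xtest_full_v)
    output = pd.DataFrame({'id': test_df.id, 'target': predictions})
    output.to_csv('my_submission.csv', index=False)
    
    return train_f1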

# 2. Word Processor
This step converts every tweet into a numeric feature vector that the models below can train on.

In [10]:
# This will display what is contained in the CountVectorizer or TFIDF output at the given index
def DisplayWordVectorValues(vectorizer, vectors, index):
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
    names = vectorizer.get_feature_names_out()
    cx = vectors[index].tocoo()
    # One column per non-zero feature: row 0 is the vocabulary index, row 1 the count/weight
    data = {'Type': ['Index', 'Count']}
    for j, v in zip(cx.col, cx.data):
        data[names[j]] = [j, v]
    return pd.DataFrame(data)

### CountVectorizer

In [11]:
from sklearn import feature_extraction

ctv = feature_extraction.text.CountVectorizer(analyzer='word',token_pattern=r'\w{1,}',
            ngram_range=(1, 3))

ctv.fit(list(xtrain) + list(xvalid))
#train_vectors = ctv.fit_transform(xtrain)
xtrain_ctv = ctv.transform(xtrain)
xvalid_ctv = ctv.transform(xvalid)
xtrain_ctv_full = ctv.transform(xtrain_full)

print('Example of what the data we\'re going to train on looks like:')
ex = 75 #42 75
print(f'--------------------------------------------------\nIn: {xtrain[ex]}\n--------------------------------------------------\nOut:')
print(DisplayWordVectorValues(ctv, xtrain_ctv, ex))


Example of what the data we're going to train on looks like:
--------------------------------------------------
In: <hashtag> Lifestyle Û÷It makes me sickÛª: Baby clothes deemed a Û÷hazardÛª <url> <url>
--------------------------------------------------
Out:
    Type   a   a û  a û hazard   baby  baby clothes  baby clothes deemed  \
0  Index  17  2505        2506  15239         15245                15246   
1  Count   1     1           1      1             1                    1   

   clothes  clothes deemed  clothes deemed a  deemed  deemed a  deemed a û  \
0    28188           28191             28192   33968     33969       33970   
1        1               1                 1       1         1           1   

   hashtag  hashtag lifestyle  hashtag lifestyle û  hazard  hazard ûª  \
0    55025              56900                56901   59484      59541   
1        1                  1                    1       1          1   

   hazard ûª url     it  it makes  it makes me  lifes

### TFIDF - Term Frequency - Inverse Document Frequency

In [12]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfv = TfidfVectorizer(min_df=3,  max_features=None, 
            strip_accents='unicode', analyzer='word',token_pattern=r'\w{1,}',
            ngram_range=(1, 3), use_idf=True, smooth_idf=True, sublinear_tf=True,
            stop_words = 'english')

tfv.fit(list(xtrain) + list(xvalid))
xtrain_tfv =  tfv.transform(xtrain) 
xvalid_tfv = tfv.transform(xvalid)
xtrain_tfv_full = tfv.transform(xtrain_full)


print('Example of what the data we\'re going to train on looks like:')
ex = 42 #42 75
print(f'--------------------------------------------------\nIn: {xtrain[ex]}\n--------------------------------------------------\nOut:')
print(DisplayWordVectorValues(tfv, xtrain_tfv, ex))

Example of what the data we're going to train on looks like:
--------------------------------------------------
In: Lose bus card.
Panic.
Kind bus driver.
Replace bus card.
Find bus card.
Headdesk.
--------------------------------------------------
Out:
    Type      replace        panic         lose         kind       driver  \
0  Index  7240.000000  6440.000000  5101.000000  4695.000000  2389.000000   
1  Count     0.270794     0.217746     0.263774     0.291357     0.248545   

          card          bus  
0  1294.000000  1165.000000  
1     0.611446     0.537728  
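To see where weights like the ones above come from, here is a small worked sketch of the formulas scikit-learn documents for `sublinear_tf`, `smooth_idf` and the default L2 normalization. It uses a made-up, already tokenised corpus and does not reproduce the n-grams, stop words or `min_df` settings of the cell above; `tfidf_row` is a hypothetical helper, not part of scikit-learn:

```python
import math

# Made-up corpus: each document is already a list of tokens
docs = [['bus', 'card', 'bus'], ['bus', 'driver'], ['panic']]

def tfidf_row(doc, docs):
    """TF-IDF weights for one document: sublinear tf, smoothed idf, L2 norm."""
    n = len(docs)
    weights = {}
    for term in set(doc):
        tf = 1 + math.log(doc.count(term))        # sublinear_tf: 1 + ln(count)
        df = sum(term in d for d in docs)         # number of documents containing the term
        idf = math.log((1 + n) / (1 + df)) + 1    # smooth_idf: ln((1+n)/(1+df)) + 1
        weights[term] = tf * idf
    # Scale the row to unit Euclidean length
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

row = tfidf_row(docs[0], docs)
print(row)
```

A term that is repeated in the tweet gets a higher tf, while a term that appears in many tweets gets a lower idf, and the final weight is the product of the two after normalization.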


# 3. Experiment with Solo Algorithms
Here I'm going to try several different models to see what works best against the validation set.

## 3.1 RidgeClassifier
RidgeClassifier converts the labels to {-1, 1} and then fits a linear regression with an L2 regularization parameter (ridge regression).

In [13]:
# This would run 3-fold cross-validation.  Since we already hold out a validation set above, we use the standard fit/predict method instead
# from sklearn import model_selection
# clf = linear_model.RidgeClassifier()
# scores = model_selection.cross_val_score(clf, train_vectors, ytrain, cv=3, scoring="f1")
# scores
#  Result: [0.70359281, 0.71489971, 0.73281361]

In [14]:
from sklearn import linear_model
from sklearn import metrics

clf = linear_model.RidgeClassifier()

clf.fit(xtrain_ctv, ytrain)
print(f'CountVector Result:\t{metrics.f1_score(clf.predict(xvalid_ctv), yvalid)}')  # Result: 0.7362581946545639
clf.fit(xtrain_tfv, ytrain)
print(f'TFIDF Result:\t\t{metrics.f1_score(clf.predict(xvalid_tfv), yvalid)}')  # Result: 0.7541463414634147

CountVector Result:	0.7332998493219488
TFIDF Result:		0.7478091528724441
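All the scores in this section are F1 scores for the positive (disaster) class. The cells call `metrics.f1_score(predictions, yvalid)`, while the documented argument order is `(y_true, y_pred)`. For binary F1 this does not change the number, because swapping the arguments only swaps false positives and false negatives. A minimal sketch with made-up labels (`f1_from_labels` is a hypothetical helper and assumes at least one positive label or prediction):

```python
def f1_from_labels(y_true, y_pred):
    """Binary F1 for class 1, computed from the confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    return 2 * tp / (2 * tp + fp + fn)

# Made-up labels: 3 true positives, 2 false positives, 1 false negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 1]

f1 = f1_from_labels(y_true, y_pred)
f1_swapped = f1_from_labels(y_pred, y_true)
print(f1, f1_swapped)
```

Precision and recall themselves do change when the arguments are swapped, so the documented order is still the safer habit.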


## 3.2 Logistic Regression

In [15]:
from sklearn import linear_model
from sklearn import metrics

clf = linear_model.LogisticRegression(C=1.0, max_iter=500)

clf.fit(xtrain_ctv, ytrain)
print(f'CountVector Result:\t{metrics.f1_score(clf.predict(xvalid_ctv), yvalid)}')  # Result: 0.7486201705970899
clf.fit(xtrain_tfv, ytrain)
print(f'TFIDF Result:\t\t{metrics.f1_score(clf.predict(xvalid_tfv), yvalid)}')  # Result: 0.7580893682588599

CountVector Result:	0.7475149105367793
TFIDF Result:		0.7539480387162506


In [16]:
#print(CreateVectorizerSumbission(clf, tfv))

#0.8346430362815581 / 0.79711 (no http)
#0.8372015361496077 / 0.79619 (with http)

## 3.3 Naive Bayes

In [17]:
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

clf = MultinomialNB()

clf.fit(xtrain_ctv, ytrain)
print(f'CountVector Result:\t{metrics.f1_score(clf.predict(xvalid_ctv), yvalid)}')  # Result: 0.7607807535179302
clf.fit(xtrain_tfv, ytrain)
print(f'TFIDF Result:\t\t{metrics.f1_score(clf.predict(xvalid_tfv), yvalid)}')  # Result: 0.7420382165605096

CountVector Result:	0.7488500459981601
TFIDF Result:		0.7324396782841823


In [18]:
#print(CreateVectorizerSumbission(clf, ctv)) 

#0.9693972179289025 / 0.80110 (no http)
#0.9697156983930779 / 0.80202 (no http, numbers or &;)

## 3.4 SVM
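I tried TruncatedSVD (Singular Value Decomposition) to reduce the number of features, followed by StandardScaler to standardize the data, but it gave no benefit in score.  That code is left commented out below and the SVM is trained directly on the TF-IDF vectors.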

In [19]:
# from sklearn.decomposition import TruncatedSVD
# from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn import metrics

# # Reduce parameters --> no benefit in score
# svd = TruncatedSVD(n_components=120)
# svd.fit(xtrain_tfv)
# xtrain_svd = svd.transform(xtrain_tfv)
# xvalid_svd = svd.transform(xvalid_tfv)

# # Scaled...
# scl = StandardScaler()
# scl.fit(xtrain_svd)
# xtrain_svd_scl = scl.transform(xtrain_svd)
# xvalid_svd_scl = scl.transform(xvalid_svd)

clf = SVC(C=1.0, probability=True) 

clf.fit(xtrain_tfv, ytrain)
print(f'TFIDF Result:\t\t{metrics.f1_score(clf.predict(xvalid_tfv), yvalid)}')  # Result: 0.7424400417101147

TFIDF Result:		0.7348008385744234


## 3.5 XGBoost

In [20]:
import xgboost as xgb
from sklearn import metrics

clf = xgb.XGBClassifier(max_depth=6, n_estimators=500, colsample_bytree=0.8, 
                        subsample=0.8, nthread=10, learning_rate=0.1)

clf.fit(xtrain_ctv, ytrain)
print(f'CountVector Result:\t{metrics.f1_score(clf.predict(xvalid_ctv), yvalid)}')  # Result: 0.7524461839530333
clf.fit(xtrain_tfv, ytrain)
print(f'TFIDF Result:\t\t{metrics.f1_score(clf.predict(xvalid_tfv), yvalid)}')  # Result: 0.735678391959799

CountVector Result:	0.7513434294088911
TFIDF Result:		0.7433102081268582


# 4. GridSearch and Pipeline

In [93]:
from sklearn.metrics import f1_score, make_scorer
f1_scorer = make_scorer(f1_score)

def GetBestScoreString(clf, param_grid):
    output = ""
    best = clf.best_estimator_.get_params()
    for p in sorted(param_grid.keys()):
        output += "%s: %r " % (p, best[p])
    return output

## 4.1 Naive Bayes

In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn import pipeline
from sklearn.model_selection import GridSearchCV

nb_model = MultinomialNB()
pline = pipeline.Pipeline([('nb', nb_model)])

param_grid = {'nb__alpha': [0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1, 3, 6, 10]}
clf = GridSearchCV(estimator=pline, param_grid=param_grid, scoring=f1_scorer,
                     verbose=10, n_jobs=-1, cv=2)  # the iid argument was removed in scikit-learn 0.24

clf.fit(xtrain_ctv, ytrain)
print(f'CountVector Result:\t{clf.best_score_} [Params = {GetBestScoreString(clf, param_grid)}]')  # Result: 0.7607807535179302
clf.fit(xtrain_tfv, ytrain)
print(f'TFIDF Result:\t\t{clf.best_score_} [Params = {GetBestScoreString(clf, param_grid)}]')  # Result: 0.7420382165605096
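
The pipeline above only wraps the classifier, so the vectorizer has already seen every tweet before the cross-validation folds are made. A possible variation, sketched here on made-up tweets, puts the vectorizer inside the pipeline so each fold builds its own vocabulary. The names `texts`, `labels`, `pipe` and `search` are just for this illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up tweets and labels, only to make the sketch runnable
texts = ['forest fire near the town', 'flood warning issued tonight',
         'earthquake shakes the city', 'wildfire evacuation ordered',
         'i love this song', 'great day at the beach',
         'my cat is so funny', 'new phone arrived today']
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier are fitted together on the training part of each fold
pipe = Pipeline([('ctv', CountVectorizer(ngram_range=(1, 2))),
                 ('nb', MultinomialNB())])
grid = {'nb__alpha': [0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid=grid, scoring='f1', cv=2)
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

With this layout the vectorizer settings (for example `ctv__ngram_range`) can also be added to the grid and tuned in the same search.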

# 5. Word Vectors

## 5.1 Global Vectors (GloVe)
Here we're going to use a pre-trained model that Stanford trained on a Twitter dataset.  The model contains a vector representation of each word, trained so that the dot product of two word vectors equals the logarithm of the words' probability of co-occurrence.
Source: https://nlp.stanford.edu/projects/glove/

In [13]:
from tqdm.notebook import tqdm  # This is an awesome library that shows the progress of whatever tqdm() is applied to

with open(r'D:\Datasets\GloVe\glove.twitter.27B.25d.txt', 'r', encoding="utf8") as f:  # raw string so the backslashes are not read as escapes
    embeddings_index = {}
    for line in tqdm(f):
        vals = line.rstrip().split(' ')
        embeddings_index[vals[0]] = [float(x) for x in vals[1:]]
print('Found %s word vectors.' % len(embeddings_index))  #1193514 

1193514it [00:09, 125899.07it/s]

Found 1193514 word vectors.





In [14]:
# Prior to using stanfordnlp, I needed to install pyTorch.
# How to install pyTorch: https://pytorch.org/get-started/locally/#mac-anaconda
#     conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
# Then I could run `pip install stanfordnlp`

# import stanfordnlp
# #stanfordnlp.download("en")  # run this the first time
# nlp = stanfordnlp.Pipeline(processors='tokenize', lang='en')
# doc = nlp(xtrain[0])
# print(*[f'word: {word.text+" "}\tlemma: {word.lemma}' for sent in doc.sentences for word in sent.words], sep='\n')

In [67]:
# import nltk
# nltk.download('punkt')
# nltk.download('stopwords')

from nltk import word_tokenize
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
import re

def new_tokenize(s):
    # This will regroup key_words like '<', 'hashtag', '>' into '<hashtag>'
    words = word_tokenize(s)
    new_words = []
    skip = 0
    for i, w in enumerate(words):
        if skip > 0:
            skip = skip-1
        else:
            if w == '<' and i+2 < len(words) and words[i+1] in key_words and words[i+2] == '>':  # bounds check avoids an IndexError near the end
                new_words.append('<' + words[i+1] + '>')
                skip = 2
            else:
                new_words.append(w)
    return new_words
            

def sent2vec(s):
    words = str(s).lower()
    words = new_tokenize(words)
    words = [w for w in words if w not in stop_words]
    words = [w for w in words if w.isalpha() or re.match(r"<\S+>", w)]
    # Collect the GloVe vector (size=25) of every word that has one; unknown words are skipped
    M = np.array([embeddings_index[w] for w in words if w in embeddings_index])
    
    # No known words at all: return a zero vector of the same size as the GloVe vectors
    # (25 here, not 300, otherwise np.stack fails later on)
    if M.size == 0:
        return np.zeros(25)
    
    # Now we sum up each column to create a vector of size 25
    v = M.sum(axis=0)
    
    # Divide by the Euclidean (L2) length of v so every tweet vector has length 1.
    # This keeps long tweets from getting bigger values just because they have more words.
    return v / np.sqrt((v ** 2).sum())


# Print an example to show what this does...
s = xtrain[1973]
print (s)
print (word_tokenize(s))
print (new_tokenize(s))
print (sent2vec(s))

screams internally
['screams', 'internally']
['screams', 'internally']
[-0.18842637 -0.14155117  0.13611215  0.14572803 -0.01665227 -0.16658199
  0.33013203 -0.39650936 -0.1280619   0.09574502 -0.1280681   0.35729542
 -0.2917416  -0.07044935 -0.07878197  0.1386751  -0.0919395   0.08689471
  0.10740657 -0.3965204   0.15054632  0.02527084  0.25635967 -0.14913656
 -0.17113405]
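
The last line of `sent2vec` divides the summed vector by `np.sqrt((v ** 2).sum())`, which is the vector's Euclidean (L2) length, so the result always has length 1. A tiny sketch with two made-up 3-d vectors standing in for the 25-d GloVe rows:

```python
import numpy as np

# Two made-up 3-d "word vectors" standing in for 25-d GloVe rows
M = np.array([[3.0, 0.0, 0.0],
              [0.0, 4.0, 0.0]])

v = M.sum(axis=0)                   # column-wise sum -> [3., 4., 0.]
length = np.sqrt((v ** 2).sum())    # Euclidean (L2) length -> 5.0
unit = v / length                   # same direction, length 1 -> [0.6, 0.8, 0.]

print(unit)
print(np.linalg.norm(unit))         # np.linalg.norm computes the same length
```

Only the direction of the summed vector is kept, so a tweet with twenty words and a tweet with two words end up on the same scale.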


In [66]:
# This is me trying to figure out what the above is doing... Still don't get it!

# import re

# s = xtest_full[13]
# words = str(s).lower()
# words = new_tokenize(words)
# words = [w for w in words if not w in stop_words]
# print (words)
# words = [w for w in words if w.isalpha() or re.match("<\S+>",w)]
# print (words)
# M = []
# for w in words:
#     try:
#         M.append(embeddings_index[w])
#     except:
#         continue
# M = np.array(M)
# v = M.sum(axis=0)
# print (words)
# print (M)
# print(v)  # must remember v is a sum of the columns
# print(v**2)
# print((v**2).sum())  # this is a sum on the row
# print(np.sqrt((v**2).sum()))
# print(v /np.sqrt((v**2).sum()))
# print (v / np.std(v))

In [69]:
from tqdm.notebook import tqdm

xtrain_glove = [sent2vec(x) for x in tqdm(xtrain)]
xvalid_glove = [sent2vec(x) for x in tqdm(xvalid)]
xtrain_full_glove = [sent2vec(x) for x in tqdm(xtrain_full)]
xtest_full_glove = [sent2vec(x) for x in tqdm(xtest_full)]


In [64]:
for i, v in enumerate(xtest_full_glove):
    if len(v) > 25:
        print(f'Index = {i}')

In [65]:
xtrain_glove_np = np.stack(xtrain_glove)
xvalid_glove_np = np.stack(xvalid_glove)
xtrain_full_glove_np = np.stack(xtrain_full_glove)
xtest_full_glove_np = np.stack(xtest_full_glove)

### 5.1.1 XGBClassifier

In [119]:
import xgboost as xgb
from sklearn import metrics

# clf = xgb.XGBClassifier(max_depth=6, n_estimators=180, colsample_bytree=0.8,   # Result: 0.7417794970986459
#                         subsample=0.8, nthread=10, learning_rate=0.1)

# clf = xgb.XGBClassifier(max_depth=5, n_estimators=150, colsample_bytree=0.5,   # Result: 0.7229254571026722
#                         subsample=0.6, nthread=10, learning_rate=0.6)

clf = xgb.XGBClassifier(max_depth=6, n_estimators=150, colsample_bytree=0.6,   # Result: 0.7554691298006806
                        subsample=0.7, nthread=10, learning_rate=0.1)

clf.fit(xtrain_glove_np, ytrain)
print(f'GloVe Result:\t{metrics.f1_score(clf.predict(xvalid_glove_np), yvalid)}')

GloVe Result:	0.7554691298006806


Using a GridSearch to find Best Parameters.

In [111]:
# xg_model = xgb.XGBClassifier()
# from sklearn import pipeline
# from sklearn.model_selection import GridSearchCV

# pline = pipeline.Pipeline([('xg', xg_model)])

# param_grid = {'xg__max_depth': [4,5,6],
#               'xg__n_estimators': [150,160,170,180],
#               'xg__colsample_bytree': [0.5,0.6,0.7,0.8],
#               'xg__subsample': [0.6,0.7,0.8],
#               'xg__learning_rate': [0.1,0.3,0.5,0.6]}
# clf = GridSearchCV(estimator=pline, param_grid=param_grid, scoring=f1_scorer,
#                      verbose=10, cv=2)

# clf.fit(xtrain_full_glove_np, ytrain_full)
# print(f'CountVector Result:\t{clf.best_score_} [Params = {GetBestScoreString(clf, param_grid)}]')  # Result: 0.7607807535179302

# Output:
#CountVector Result:	0.7249966532003029 [Params = xg__colsample_bytree: 0.5 xg__learning_rate: 0.06 xg__max_depth: 5 xg__n_estimators: 150 xg__subsample: 0.6 ]

# Output 2:
#CountVector Result:	0.7211488547217996 [Params = xg__colsample_bytree: 0.6 xg__learning_rate: 0.1 xg__max_depth: 6 xg__n_estimators: 150 xg__subsample: 0.7 ]

Fitting 2 folds for each of 576 candidates, totalling 1152 fits
[CV] xg__colsample_bytree=0.5, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.5, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6, score=0.713, total=   0.1s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.7, score=0.709, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=150, xg__subsamp

[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.8, score=0.700, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.6, score=0.706, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.6, score=0.692, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.7, score=0.698, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=180, xg__subsamp

[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7, score=0.701, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7, score=0.702, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.8, score=0.698, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.8, score=0.703, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=160, xg__subsamp

[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.8, score=0.700, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6, score=0.667, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6, score=0.698, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.7, score=0.690, total=   0.1s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=170, xg__subsamp

[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7, score=0.678, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7, score=0.706, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.8, score=0.696, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.8, score=0.697, total=

[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.8, score=0.695, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6, score=0.665, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6, score=0.684, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.7, score=0.686, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=160, xg__subsamp

[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.7, score=0.687, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.7, score=0.698, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8, score=0.678, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8, score=0.696, total=   0.2s
[CV] xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=180, xg__subsamp

[CV]  xg__colsample_bytree=0.6, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.8, score=0.682, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6, score=0.712, total=   0.1s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6, score=0.717, total=   0.1s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.7, score=0.708, total=   0.1s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=150, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.7, score=0.709, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.7, score=0.730, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8, score=0.710, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8, score=0.719, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=170, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.8, score=0.719, total=   0.3s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6, score=0.713, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6, score=0.715, total=   0.3s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.7, score=0.703, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.1, xg__max_depth=6, xg__n_estimators=180, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.7, score=0.711, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8, score=0.697, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8, score=0.717, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.6, score=0.696, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=160, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6, score=0.693, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6, score=0.692, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.7, score=0.700, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.7, score=0.702, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=170, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.7, score=0.705, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.8, score=0.693, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.8, score=0.683, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.6, score=0.681, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=150, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.6, score=0.667, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.6, score=0.692, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.7, score=0.677, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.7, score=0.699, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=160, xg__subsamp

[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.7, score=0.684, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.8, score=0.671, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.8, score=0.692, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=180, xg__subsample=0.6, score=0.688, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=4, xg__n_estimators=180, xg__subsamp

[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.6, score=0.679, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7, score=0.680, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.7, score=0.697, total=   0.2s
[CV] xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.7, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=150, xg__subsample=0.8, score=0.691, total=

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.8, score=0.705, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.8, score=0.715, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6, score=0.708, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=170, xg__subsample=0.6, score=0.715, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=4, xg__n_estimators=170, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.6, score=0.732, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7, score=0.699, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.7, score=0.723, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsample=0.8, score=0.713, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.1, xg__max_depth=5, xg__n_estimators=180, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.8, score=0.723, total=   0.1s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6, score=0.692, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.6, score=0.704, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsample=0.7, score=0.695, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=4, xg__n_estimators=160, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.7, score=0.709, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.7, score=0.705, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8, score=0.699, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=170, xg__subsample=0.8, score=0.701, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=5, xg__n_estimators=180, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.3, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.8, score=0.723, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6, score=0.680, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.6, score=0.707, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsample=0.7, score=0.683, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=4, xg__n_estimators=150, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.7, score=0.692, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.7, score=0.694, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8, score=0.701, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=160, xg__subsample=0.8, score=0.700, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=5, xg__n_estimators=170, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.8, score=0.710, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6, score=0.675, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.6, score=0.681, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsample=0.7, score=0.687, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.5, xg__max_depth=6, xg__n_estimators=180, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.7, score=0.691, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.7, score=0.688, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8, score=0.688, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=150, xg__subsample=0.8, score=0.703, total=   0.2s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=5, xg__n_estimators=160, xg__subsamp

[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=160, xg__subsample=0.8, score=0.696, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6, score=0.686, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.6, score=0.673, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.7 
[CV]  xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsample=0.7, score=0.696, total=   0.3s
[CV] xg__colsample_bytree=0.8, xg__learning_rate=0.6, xg__max_depth=6, xg__n_estimators=170, xg__subsamp

[Parallel(n_jobs=1)]: Done 1152 out of 1152 | elapsed:  3.7min finished


CountVector Result:	0.7211488547217996 [Params = xg__colsample_bytree: 0.6 xg__learning_rate: 0.1 xg__max_depth: 6 xg__n_estimators: 150 xg__subsample: 0.7 ]
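
As a sanity check on the `Done 1152 out of 1152` line, here is a minimal sketch that counts the fits implied by the grid. The values are inferred from the `[CV]` log lines in this section (the original `param_grid` cell is not shown here), and the fold count of 2 is an assumption based on each combination appearing twice in the log.

```python
from itertools import product

# Grid values inferred from the [CV] log lines above (an assumption; the
# original param_grid definition is not part of this section).
param_grid = {
    "xg__colsample_bytree": [0.5, 0.6, 0.7, 0.8],
    "xg__learning_rate": [0.1, 0.3, 0.5, 0.6],
    "xg__max_depth": [4, 5, 6],
    "xg__n_estimators": [150, 160, 170, 180],
    "xg__subsample": [0.6, 0.7, 0.8],
}
n_folds = 2  # assumed: every combination is logged twice

# Every combination of the five parameters is fitted once per fold.
combinations = list(product(*param_grid.values()))
n_fits = len(combinations) * n_folds

print(f"{len(combinations)} combinations x {n_folds} folds = {n_fits} fits")
```

If these assumptions hold, the product matches the total reported by the log, which suggests the grid above is the full search space.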


In [114]:
# clf.fit(xtrain_full_glove_np, ytrain_full)
# # F1 on the training data itself, not a held-out score; f1_score expects (y_true, y_pred)
# train_f1 = metrics.f1_score(ytrain_full, clf.predict(xtrain_full_glove_np))

# print (train_f1)

# predictions = clf.predict(xtest_full_glove_np)
# output = pd.DataFrame({'id': test_df.id, 'target': predictions})
# output.to_csv('my_submission.csv', index=False)

# Results so far (training F1 / submission score):
#0.9844639286263651 / 0.79007 (using our first guess)
#0.9835359286044008 / 0.74931 (with GridSearch values)
#0.9261006289308175 / 0.79497 (new Gridsearch)

0.9261006289308175
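
The three score pairs recorded in the comments above can be compared directly. This is a small sketch that treats the first number as training F1 and the second as the score after submitting; the second reading is an assumption, and the run labels are copied from the comments.

```python
# (training F1, submission score) pairs copied from the comments above.
# Treating the second number as the submission score is an assumption.
runs = {
    "first guess": (0.9844639286263651, 0.79007),
    "GridSearch values": (0.9835359286044008, 0.74931),
    "new GridSearch": (0.9261006289308175, 0.79497),
}

# A large gap between the two numbers hints at overfitting to the training set.
gaps = {name: train - submitted for name, (train, submitted) in runs.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.3f}")

smallest = min(gaps, key=gaps.get)
print(f"Smallest gap: {smallest}")
```

Under that reading, the run with the lowest training F1 is also the one with the smallest gap and the best second score.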


# 6. Deep Learning

In [None]:
# See separate notebook

## 6.1 GPT-2 Application

Sources:
 1. https://openai.com/blog/gpt-2-1-5b-release/
 2. https://github.com/openai/gpt-2-output-dataset