<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Sprint Challenge
## *Data Science Unit 4 Sprint 1*

After a week of Natural Language Processing, you've learned some cool new stuff: how to process text, how to turn text into vectors, and how to model topics from documents. Apply your newly acquired skills to one of the most famous NLP datasets out there: [Yelp](https://www.yelp.com/dataset/challenge). As part of the job selection process, some of my friends have been asked to create an analysis of this dataset, so I want to give you a head start.  

The real dataset is massive (almost 8 gigs uncompressed). I've sampled the data down to something more manageable for the Sprint Challenge. You can analyze the full dataset as a stretch goal or after the sprint challenge. As you work on the challenge, I suggest adding notes about your findings and things you want to analyze in the future.

## Challenge Objectives
*Successfully complete all of these objectives to earn a 2. There are more details on each objective further down in the notebook.*
* <a href="#p1">Part 1</a>: Write a function to tokenize the Yelp reviews
* <a href="#p2">Part 2</a>: Create a vector representation of those tokens
* <a href="#p3">Part 3</a>: Use your tokens in a classification model that predicts the Yelp rating
* <a href="#p4">Part 4</a>: Estimate & Interpret a topic model of the Yelp reviews

In [2]:
import pandas as pd

df = pd.read_json('./data/review_sample.json', lines=True)

In [3]:
df.head()

Unnamed: 0,business_id,cool,date,funny,review_id,stars,text,useful,user_id
0,nDuEqIyRc8YKS1q1fX0CZg,1,2015-03-31 16:50:30,0,eZs2tpEJtXPwawvHnHZIgQ,1,"BEWARE!!! FAKE, FAKE, FAKE....We also own a sm...",10,n1LM36qNg4rqGXIcvVXv8w
1,eMYeEapscbKNqUDCx705hg,0,2015-12-16 05:31:03,0,DoQDWJsNbU0KL1O29l_Xug,4,Came here for lunch Togo. Service was quick. S...,0,5CgjjDAic2-FAvCtiHpytA
2,6Q7-wkCPc1KF75jZLOTcMw,1,2010-06-20 19:14:48,1,DDOdGU7zh56yQHmUnL1idQ,3,I've been to Vegas dozens of times and had nev...,2,BdV-cf3LScmb8kZ7iiBcMA
3,k3zrItO4l9hwfLRwHBDc9w,3,2010-07-13 00:33:45,4,LfTMUWnfGFMOfOIyJcwLVA,1,We went here on a night where they closed off ...,5,cZZnBqh4gAEy4CdNvJailQ
4,6hpfRwGlOzbNv7k5eP9rsQ,1,2018-06-30 02:30:01,0,zJSUdI7bJ8PNJAg4lnl_Gg,4,"3.5 to 4 stars\n\nNot bad for the price, $12.9...",5,n9QO4ClYAS7h9fpQwa5bhA


## Part 1: Tokenize Function
<a id="#p1"></a>

Complete the function `tokenize`. Your function should
- accept one document at a time
- return a list of tokens

You are free to use any method you have learned this week.
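
One way to meet both requirements without spaCy is a small regex-based tokenizer. This is only a minimal sketch: `simple_tokenize` and the tiny `STOP_WORDS` set are hypothetical names made up for illustration, not part of the challenge starter code or of any library.

```python
import re

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "all", "an", "and", "be", "is", "it",
              "that", "the", "they", "to", "you"}

def simple_tokenize(doc):
    """Accept one document (a string) and return a list of tokens."""
    # Lowercase, then pull out runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", doc.lower())
    # Drop anything in the illustrative stop-word list.
    return [word for word in words if word not in STOP_WORDS]

tokens = simple_tokenize("Today is gonna be the day that they throw it all back to you")
print(tokens)  # ['today', 'gonna', 'day', 'throw', 'back']
```

Because it returns a plain list, a function shaped like this can be passed as the `tokenizer` argument of a scikit-learn vectorizer.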

In [4]:
import re
import spacy
from spacy.tokenizer import Tokenizer
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import matplotlib.pyplot as plt

In [5]:
nlp = spacy.load("en_core_web_lg")

In [6]:
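def tokenize(doc):
    '''Lemmatize a document and drop stop words and punctuation.

    Note: this returns one space-joined string, not the list of tokens
    that Part 1 asks for.
    '''
    tokenizer = Tokenizer(nlp.vocab)
    # Keep the lemma of every token that is neither a stop word nor punctuation.
    tokens_list = [token.lemma_ for token in tokenizer(doc)
                   if not token.is_stop and not token.is_punct]
    # Join the lemmas into one string, with a space after each lemma.
    return ''.join(token + ' ' for token in tokens_list)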

In [7]:
tokenize("Today is gonna be the day that they throw it all back to you")

'Today gonna day throw '

In [9]:
df['tokens'] = df['text'].apply(tokenize)

In [10]:
df['tokens'].head()

0    BEWARE!!! FAKE, FAKE, FAKE....We small busines...
1    Came lunch Togo. Service quick. Staff friendly...
2    I've Vegas dozen time step foot Circus Circus....
3    go night close street party... well was, actua...
4    3.5 4 star \n\n bad price, $12.99 lunch, senio...
Name: tokens, dtype: object

## Part 2: Vector Representation
<a id="#p2"></a>
1. Create a vector representation of the reviews
2. Write a fake review and query for the 10 most similar reviews, then print the text of those reviews. Do you notice any patterns?
    - Given the size of the dataset, it will probably be best to use a `NearestNeighbors` model for this. 
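
The idea behind a nearest-neighbor query can be sketched with the standard library alone. The example below is a stand-in, not the scikit-learn approach itself: it builds raw word-count vectors with `collections.Counter` and ranks documents by cosine similarity, and the helper names (`bag_of_words`, `cosine_similarity`, `most_similar`) and the three toy reviews are made up for illustration.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Turn a document into a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, docs, k=2):
    """Return the indices of the k documents most similar to the query."""
    query_vec = bag_of_words(query)
    scored = [(cosine_similarity(query_vec, bag_of_words(doc)), i)
              for i, doc in enumerate(docs)]
    # Highest similarity first; ties broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]

# Toy corpus, invented for this example.
reviews = [
    "the food was great and the service was friendly",
    "terrible food and rude service",
    "the bank tellers were nice",
]
nearest = most_similar("the food was terrible", reviews, k=2)
print(nearest)  # [0, 1]
```

Note that the positive review ranks first for a negative query because it shares the frequent words "the" and "was", which is the kind of pattern worth looking for in the real results.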

In [11]:
counter = CountVectorizer(max_df = .97,
                          min_df = 3,
                          stop_words='english',
                          ngram_range= (1,2),
                          tokenizer=tokenize)

In [12]:
dtm = counter.fit_transform(df['text'])

  'stop_words.' % sorted(inconsistent))


In [13]:
dtm = pd.DataFrame(dtm.todense(), columns=counter.get_feature_names())

In [14]:
dtm.head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,!,"""",#,$,&,...,方,日,日 本,是,本,本 人,的,！,！.1,，
0,0,0,0,0,8,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,1,1,1,4,0,0,0,2,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,8,4,4,4,4,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


In [15]:
from sklearn.neighbors import NearestNeighbors

In [16]:
nn = NearestNeighbors(n_neighbors=10, algorithm='kd_tree')
nn.fit(dtm)

NearestNeighbors(algorithm='kd_tree', leaf_size=30, metric='minkowski',
         metric_params=None, n_jobs=None, n_neighbors=10, p=2, radius=1.0)

In [17]:
fake_review = ['I did not like this restaurant. It smelled and the atmosphere was rotten.']
fake_review_counter = counter.transform(fake_review)
arrays = nn.kneighbors(fake_review_counter.todense())

In [18]:
for i in arrays[1][0].tolist():
    print(df['text'][i][:100])

The turn around date is great.
My kinda of place quite and very peaceful atmosphere. I intend to return
Not huge, but an interesting collection.
Low rent yet marked up selection of shoooz.
Never ate ramen before but I would eat here again. The buttered corn was awesome.
Very good and everything tastes fresh. There are many options so you can always switch it up.
This restaurant is too overly priced and it isn't great.  Its A OK.
The tellers here are so nice they make you feel at home and the banking is easier than ever
The girl told me I look like Lauren Conrad so I was happy.
Food and service are ok. There are much better places to go to for lunch.


## Part 3: Classification
<a id="#p3"></a>
Your goal in this section will be to predict `stars` from the review dataset. 

1. Create a pipeline object with a sklearn `CountVectorizer` or `TfidfVectorizer` and any sklearn classifier. Use that pipeline to estimate a model to predict `stars`. Use the pipeline to predict a star rating for your fake review from Part 2. 
2. Tune the entire pipeline with a GridSearch
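
An exhaustive grid search fits one model per combination of parameter values per cross-validation fold, so its cost grows multiplicatively. The sketch below uses only the standard library to enumerate a grid. The keys follow scikit-learn's `<step name>__<parameter>` convention and the values mirror the grid searched later in this notebook; the variable names (`param_grid`, `cv_folds`, `candidates`) are illustrative choices.

```python
from itertools import product

# "<step name>__<parameter>" mapped to the candidate values to try.
param_grid = {
    "counter__max_df": (0.75, 1.0),
    "counter__min_df": (0, 10),
    "counter__ngram_range": [(1, 2), (1, 1)],
    "rf__max_depth": (5, 10, 15, 20),
    "rf__min_samples_split": (2, 10),
    "rf__min_samples_leaf": (1, 50),
}
cv_folds = 5

# Every candidate is one combination of values, one per parameter.
keys = list(param_grid)
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[key] for key in keys))]

n_candidates = len(candidates)
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 128 640
```

With 128 candidates and 5 folds this comes to 640 fits, the same count the grid-search log reports, so trimming even one parameter's value list halves the run time.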

In [19]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

In [20]:
X = df['text']
y = df['stars']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.2, random_state=42) 



In [21]:
pipeline = Pipeline([('counter', counter),
                     ('rf', RandomForestRegressor())])

In [None]:
parameters = {
    'counter__max_df': (0.75, 1.0),
    'counter__min_df': (0, 10),
    'counter__ngram_range': [(1,2), (1,1)],
    'rf__max_depth':(5,10,15,20),
    'rf__min_samples_split': (2, 10),
    'rf__min_samples_leaf': (1,50)
}

grid_search = GridSearchCV(pipeline,parameters, cv=5, n_jobs=1, verbose=5)
grid_search.fit(X_train, y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 5 folds for each of 128 candidates, totalling 640 fits
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.17003100825803774, total=   6.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   10.3s remaining:    0.0s
[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21723483584786285, total=   5.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   19.3s remaining:    0.0s
[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.20122884086137413, total=   5.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   29.0s remaining:    0.0s
[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21928694525356132, total=   5.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   38.3s remaining:    0.0s
[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.18571316881230227, total=   5.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.20131141138985278, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.21636434322491915, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.20098958493787356, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.23312182849096108, total=   4.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1985739999099034, total=   3.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.16608401794276706, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.22083914226465018, total=   4.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.16847793269845845, total=   3.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.21896613571312604, total=   4.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13147958530474424, total=   3.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.13903783323141206, total=   3.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.19746148171884947, total=   3.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1925965317544155, total=   3.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.21316311546524983, total=   3.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1602230389033985, total=   3.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.18473623084021798, total=   6.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21685583791178042, total=   7.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.1965307008052205, total=   6.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.27389192876306245, total=   6.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.1690224947203579, total=   6.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.16060816008243428, total=   6.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.22863674346921736, total=   5.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.2355138973247426, total=   4.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.22111924176131173, total=   4.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1971778802436439, total=   4.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1614173457474981, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1890779087966332, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.2099790531858512, total=   3.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.20353907891098777, total=   3.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1330381269992701, total=   4.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1744981935938028, total=   3.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1952755540994957, total=   4.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.20101517643148892, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.20987126350730112, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.15324589252480258, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.24830806931034824, total=   8.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.22032120768013663, total=   5.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.25896925182465413, total=   5.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21673425660206735, total=   5.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.14369599826414847, total=   6.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.23182244583766043, total=   6.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.22083942957028158, total=   4.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.23539528188774692, total=   5.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.20397324914517845, total=   5.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.19788976904270006, total=   5.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1499317455423409, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.2173222725842502, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.17245999637208564, total=   7.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.21511003755677052, total=   4.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.14694156813787995, total=   4.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1475809216107674, total=   3.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.18738874278577455, total=   3.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.18529204082088513, total=   4.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.205706475102979, total=   4.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1671254489318631, total=   4.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21864839656863477, total=   8.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.2060811479426654, total=   6.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.23240857229148637, total=   5.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21682996498959262, total=   7.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.16077184623477525, total=   7.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.15646311272710323, total=   4.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.22712677469433673, total=   5.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.24409715169664092, total=   6.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.21946666800282433, total=   6.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.14826853171826837, total=   5.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1693541110810376, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.2160367491717424, total=   4.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.15724835045371, total=   3.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1888768749273858, total=   4.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.14357803480053744, total=   4.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.16215120066887698, total=   5.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.22500714825675483, total=   3.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1866314493235527, total=   3.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.21161442668626695, total=   3.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 2), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1581147854460535, total=   4.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.08445425765450254, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.13083572674507848, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.16120513896767208, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.16700075682194382, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.11547159643209581, total=   2.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.0876013348370469, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.14827439655638197, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.17696882539872671, total=   2.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.16272815815405695, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.11354864116350427, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.10897069988308106, total=   2.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.15182219678060394, total=   3.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.14367732259367316, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.11763282220143911, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1360342119941157, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.10013861545193425, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.13175022175724782, total=   3.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.14209354285242937, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.13211158058176298, total=   2.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1054195748958997, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.08011300646753407, total=   2.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.14086208533894873, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.13217653789108663, total=   2.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.1414132979225945, total=   5.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.06471505422241353, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.11124766753036996, total=   3.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.14601910731890633, total=   2.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1854820390420491, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.13594894080048148, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.10856454027486839, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.10672636105055754, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13836127290769917, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.14251408687534706, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13395200417762287, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.09041223915853502, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.0946849168212941, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.13705278008568267, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.14984641609397475, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.14026788316683636, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.09788022975273425, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=-1.3000667626883455e-05, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.1271046868941802, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.20052072794694953, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.027411291588461232, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.07116211930568173, total=   2.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.09493933179755265, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1603058714574107, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.20264259054408507, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1178804874054522, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.08257194109878019, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.0933179708282097, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13750101068023934, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13590937851028717, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13856550338237317, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.09686796826484723, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.09909994943895105, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1361556087941883, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.15275556834001303, total=   2.1s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.14765036247215935, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=15, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.10882651141485455, total=   1.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=-0.05634159871121014, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.03586319808383809, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.07530630488244483, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.04852832112085925, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=2, score=-0.010217622704520759, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.037246962609449374, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1260668592034201, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.15247648307112782, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.0848721937811483, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.046079226208270874, total=   1.9s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.08855935284719885, total=   1.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 


[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.15592762336358013, total=   1.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1414391564056774, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.13642440449537052, total=   1.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.0988780751992987, total=   1.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.10280628803160619, total=   2.0s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.13496685536810127, total=   1.7s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.13010344243343608, total=   1.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.14601969302921203, total=   1.8s
[CV] counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=0, counter__ngram_range=(1, 1), rf__max_depth=20, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.12204239574287667, total=   1.8s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.17864172811572454, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.2165928850047194, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.1842894775689179, total=   2.7s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.18422848043601214, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.18197357940426884, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1724212414095424, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.24974232327501097, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.2245571695975389, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.22784877878292376, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.17467805553025081, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.17164526570522343, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.22193337392930987, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1689560174051079, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.19203817712162496, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.1728096063750607, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.14374252023901857, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.20939585025837534, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.18773101014343951, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1948031496634006, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=5, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.1552632110881267, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.1743041248529951, total=   3.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.2861598112036051, total=   3.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.21662342599684503, total=   3.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.26219106790117397, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.17689108248494004, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.1584625112874396, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.19942781406908305, total=   3.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.23425486971232456, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.24388176386526517, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=1, rf__min_samples_split=10, score=0.19475693281019169, total=   3.2s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.18053160661532716, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.20075923992893063, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.22445324853145798, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.23529580368706526, total=   2.6s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=2, score=0.17897652747909165, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.153639525916764, total=   3.7s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.21704984097396518, total=   2.5s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.18764313564308954, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.20546745790507392, total=   2.4s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=10, rf__min_samples_leaf=50, rf__min_samples_split=10, score=0.16358928781716076, total=   2.3s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.15759613744557688, total=   4.3s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.24715591949327764, total=   4.1s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.18799546378799614, total=   4.3s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.2327731849019723, total=   4.0s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2 




[CV]  counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=2, score=0.13569502311788373, total=   5.9s
[CV] counter__max_df=0.75, counter__min_df=10, counter__ngram_range=(1, 2), rf__max_depth=15, rf__min_samples_leaf=1, rf__min_samples_split=10 




In [None]:
grid_search.predict(fake_review)

## Part 4: Topic Modeling

Let's find out what those Yelp reviews are saying! :D

1. Estimate an LDA topic model of the review text
    - Keep the `iterations` parameter at or below 5 to reduce run time
    - The `workers` parameter should be set to the number of physical cores on your machine minus one.
2. Create 1-2 visualizations of the results
    - You can use the 3 most important words of each topic in relevant visualizations. Refer to yesterday's notebook to see how to extract them.
3. In markdown, write 1-2 paragraphs of analysis on the results of your topic model

__*Note*__: You can pass a DataFrame column of tokenized reviews (one list of tokens per review) to gensim. You do not have to use a generator.
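
One way to pick a value for `workers` is sketched below. It relies on the standard library's `os.cpu_count()`, which reports logical cores rather than physical ones, so on a hyperthreaded machine the result may overshoot. `suggested_workers` is a hypothetical helper for illustration, not part of gensim.

```python
import os

def suggested_workers(reserve=1):
    """Return a worker count that leaves `reserve` cores free (never below 1)."""
    logical_cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, logical_cores - reserve)

print(suggested_workers())
```

Leaving one core free for the main process is an assumption here; adjust `reserve` to suit your machine.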

In [None]:
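from gensim.models import LdaMulticore
from gensim.corpora import Dictionary
from gensim.utils import simple_preprocess

Learn the vocabulary of the Yelp data. `Dictionary` expects each document as a list of tokens, so tokenize the raw text first:

In [None]:
# simple_preprocess is a stand-in tokenizer; swap in your Part 1 tokenizer if you prefer
tokens = df['text'].apply(simple_preprocess)
id2word = Dictionary(tokens)

Create a bag-of-words representation of the entire corpus:

In [None]:
corpus = [id2word.doc2bow(doc) for doc in tokens]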
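
To make the bag-of-words idea concrete, here is a minimal plain-Python sketch of what such a representation looks like: each document becomes a list of `(token_id, count)` pairs. The helpers `build_vocab` and `to_bow` are hypothetical stand-ins for illustration; gensim's own id assignment may differ.

```python
from collections import Counter

def build_vocab(token_docs):
    """Assign each distinct token an integer id in first-seen order."""
    vocab = {}
    for doc in token_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for tokens found in vocab."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())

# Two tiny, made-up tokenized reviews
docs = [["great", "food", "great", "service"], ["slow", "service"]]
vocab = build_vocab(docs)
bows = [to_bow(doc, vocab) for doc in docs]
print(bows)
```

Note that word order is discarded: only which tokens occur, and how often, survives.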

Your LDA model should be ready for estimation: 

In [None]:
lda = LdaMulticore(corpus=corpus,
                   id2word=id2word,
                   iterations=5,
                   workers=1,
                   num_topics=10  # You can change this parameter
                  )
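
Once a model is fitted, short topic labels can be built from each topic's top words. The sketch below assumes input shaped like the result of `lda.show_topics(num_topics=-1, num_words=3, formatted=False)`, that is, a list of `(topic_id, [(word, weight), ...])` pairs. `topic_labels` is a hypothetical helper and the sample values are made up.

```python
def topic_labels(topics, topn=3):
    """Map each topic id to a label made of its `topn` highest-weight words."""
    labels = {}
    for topic_id, word_weights in topics:
        ranked = sorted(word_weights, key=lambda pair: pair[1], reverse=True)
        labels[topic_id] = " ".join(word for word, _ in ranked[:topn])
    return labels

# Made-up data in the same shape as show_topics(..., formatted=False)
sample_topics = [
    (0, [("food", 0.031), ("place", 0.024), ("good", 0.022)]),
    (1, [("service", 0.015), ("time", 0.028), ("room", 0.012)]),
]
labels = topic_labels(sample_topics)
print(labels)
```

These labels can then serve as axis ticks or legend entries in your visualizations.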

In [None]:
import pyLDAvis.gensim  # newer pyLDAvis releases rename this module to pyLDAvis.gensim_models

pyLDAvis.enable_notebook()
pyLDAvis.gensim.prepare(lda, corpus, id2word)

Create 1-2 visualizations of the results
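
One possible starting point for a visualization is to count how many reviews are dominated by each topic. The sketch below assumes per-document distributions shaped like `[lda.get_document_topics(bow) for bow in corpus]`, where each entry is a list of `(topic_id, probability)` pairs. `dominant_topic_counts` is a hypothetical helper and the sample values are made up.

```python
from collections import Counter

def dominant_topic_counts(doc_topics):
    """Count documents by their highest-probability topic."""
    counts = Counter()
    for topic_probs in doc_topics:
        if topic_probs:  # skip documents with no assigned topics
            top_id, _ = max(topic_probs, key=lambda pair: pair[1])
            counts[top_id] += 1
    return dict(sorted(counts.items()))

# Made-up distributions for four documents
sample_doc_topics = [
    [(0, 0.7), (3, 0.3)],
    [(3, 0.9), (1, 0.1)],
    [(0, 0.55), (1, 0.45)],
    [],
]
counts = dominant_topic_counts(sample_doc_topics)
print(counts)
```

Wrapping the result in a pandas Series, for example `pd.Series(counts).plot.bar()`, is one way to turn it into a bar chart, assuming matplotlib is installed.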

## Stretch Goals

Complete one or more of these to push your score towards a three: 
* Incorporate named entity recognition into your analysis
* Compare vectorization methods in the classification section
* Analyze more (or all) of the Yelp dataset. This one is very hard.
* Use a generator object on the reviews file. This would help you analyze the whole dataset.
* Incorporate any of the other Yelp dataset entities in your analysis (business, users, etc.)
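
For the generator stretch goal, a minimal sketch is shown below. It assumes the reviews file is in JSON Lines format (one JSON object per line), which matches the `lines=True` argument used to load the sample. `stream_reviews` is a hypothetical helper, and the demo writes a tiny temporary file rather than touching the real data.

```python
import json
import os
import tempfile

def stream_reviews(path, field="text"):
    """Yield one review's `field` at a time from a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)[field]

# Demo on a made-up two-line file; with the sprint data you might pass
# './data/review_sample.json' instead.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write('{"stars": 5, "text": "Great tacos"}\n')
    tmp.write('{"stars": 1, "text": "Cold fries"}\n')
demo_texts = list(stream_reviews(tmp.name))
os.remove(tmp.name)
print(demo_texts)
```

Because the generator reads one line at a time, memory use stays roughly constant regardless of file size, which is what makes the full dataset tractable.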