# Project 3: Web APIs & Classification

General Assembly DSI 19 Project 3 Adrian Teng 

# Executive Summary

This project is split into three parts:
- API & EDA (01_api_eda)
- Processing & Modeling (02_processing_modeling)
- Conclusion (03_conclusion)

In the second notebook, 02_processing_modeling, the chosen subreddits are pre-processed. Regular expressions (regex) are used to tokenize each post and title. Next, stemmed and lemmatized versions of the tokens are created. GridSearchCV is then run across all classification models to rule out non-viable options.

Three classification models, Multinomial Naive Bayes, K-Nearest Neighbors and Logistic Regression, are assessed on the pre-processed data. Each is tested with two vectorizers: CountVectorizer and TfidfVectorizer.

# Content

- Pre-processing
- Tokenization
- Stemming & Lemmatizing
- Train-Test Split
- Model Selection
- GridSearch CV
- CountVectorizer
- TFIDFVectorizer


# Notebook 2: Processing & Modeling

In [1]:

# library imports
import requests
import time
import pandas as pd
import numpy as np
import ast
import regex as re
from tqdm import tqdm
import collections


# preprocessing imports
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

In [2]:
# random var state
r = 42

In [3]:
#opening the datasets from notebook 01

df = pd.read_csv('../datasets/tifucofns.csv')
tf_df = pd.read_csv('../datasets/tf_df.csv')
cfns_df = pd.read_csv('../datasets/cfns_df.csv')

In [4]:
df.head()

Unnamed: 0,title,authors,posts,upvotes,tifu,content
0,TIFU by accidentally hitting on a customer,t2_3q0ykzzz,Not a long story but as a new years resolution...,4869,1,TIFU by accidentally hitting on a customer Not...
1,TIFU by making someone think I was going to mu...,t2_azh6y,This just happened and I can't believe I didn'...,22880,1,TIFU by making someone think I was going to mu...
2,TIFU by having my husband examine me,t2_8gfpn13y,This happened last night.\n\nI am seven months...,9895,1,TIFU by having my husband examine me This happ...
3,TIFU by taking a new dose of ADHD medication a...,t2_9rkodgxe,"TIFU. Yes, today. Just now. I am still trying ...",387,1,TIFU by taking a new dose of ADHD medication a...
4,Tifu by sitting too close to the fire to escap...,t2_5rfxegiq,TL/DR down the bottom. \n\nNote: Maybe not sui...,15661,1,Tifu by sitting too close to the fire to escap...


In [5]:
df.shape

(1482, 6)

## Pre-Processing

### Tokenizing titles and posts

In [6]:
rtr = RegexpTokenizer(r"[\w/\']+") # regex to include words, slash characters for urls, apostrophes
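
To see what this pattern keeps, the tokenizer can be approximated with the standard library's `re.findall`. The sketch below runs the same character class on a made-up sentence; the sample text is an assumption and is not taken from the dataset.

```python
import re

# Same character class as the RegexpTokenizer pattern:
# word characters, forward slashes and apostrophes.
TOKEN_PATTERN = r"[\w/']+"

sample = "She's at http://example.com/page, ok!"   # hypothetical text
tokens = re.findall(TOKEN_PATTERN, sample.lower())
print(tokens)
```

URLs break into fragments such as `http` and `//example`, which is why the later cleaning step looks for tokens that contain `http` or start with `//`.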

In [7]:
# replacing html-formatted character values and newlines in the content column
df['content'] = (
    df['content']
    .str.replace('&amp;', '&', regex=False)      # escaped ampersand
    .str.replace('#x200B;', ' ', regex=False)    # zero-width space entity
    .str.replace('nbsp;', ' ', regex=False)      # non-breaking space entity
    .str.replace('\n', ' ', regex=False)         # newlines
    .str.strip()
)
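
The replacements above target HTML entities left in the Reddit text. As an alternative sketch, the standard library's `html.unescape` can decode entities generically. The helper name `clean_text` and the sample string are assumptions for illustration, and unescaping twice is a guess at handling double-escaped entities such as `&amp;#x200B;`.

```python
import html

def clean_text(raw):
    """Decode HTML entities and collapse invisible or extra whitespace."""
    # Unescape twice so "&amp;#x200B;" becomes "&#x200B;" and then U+200B.
    text = html.unescape(html.unescape(raw))
    # Replace zero-width space, non-breaking space and newlines with a space.
    for ch in ("\u200b", "\xa0", "\n"):
        text = text.replace(ch, " ")
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())

print(clean_text("A &amp; B&amp;#x200B;\n\nC&nbsp;D"))
```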


In [8]:
len(df.content)

1482

In [9]:
content_tokens = []  # create an empty token list for the column content

for i in range(len(df.content)):
    loop_tokens = rtr.tokenize(df.content.iloc[i].lower())     #using iloc to skip removed rows
    for j, token in enumerate(loop_tokens):
        if re.match(r"\d+\w*", token):        # tokens that start with a digit
            loop_tokens[j] = ''
        if re.match(r"//\w*", token):         # url fragments that start with //
            loop_tokens[j] = ''
        if ('tifu' in token) or ('confessions' in token) or ('http' in token):  # removing the subreddit names and http
            loop_tokens[j] = ''
    content_tokens.append(loop_tokens)        #adding tokenized string to the token list
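
The loop above blanks unwanted tokens in place, which leaves empty strings in each list. A variant that drops them instead could be sketched as follows; `keep_token` and the sample tokens are hypothetical names and data used only for illustration.

```python
import re

DROP_SUBSTRINGS = ("tifu", "confessions", "http")   # subreddit names and links

def keep_token(token):
    """Return True if a token should stay in the cleaned list."""
    if re.match(r"\d", token):        # starts with a digit
        return False
    if token.startswith("//"):        # url fragment
        return False
    return not any(s in token for s in DROP_SUBSTRINGS)

sample_tokens = ["tifu", "by", "2nd", "//x", "https", "coffee"]   # hypothetical
kept = [t for t in sample_tokens if keep_token(t)]
print(kept)
```

Dropping the tokens avoids the leading and doubled spaces that appear once the lists are joined back into strings.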

In [10]:
len(content_tokens)

1482

In [11]:
len(content_tokens[0])

211

In [12]:
content_tokens[0][:20]   # first 20 tokens of the first post

['',
 'by',
 'accidentally',
 'hitting',
 'on',
 'a',
 'customer',
 'not',
 'a',
 'long',
 'story',
 'but',
 'as',
 'a',
 'new',
 'years',
 'resolution',
 'i',
 'told',
 'myself']


In [13]:
#format the tokenized for vectorizer

content_tokens = [" ".join(i) for i in content_tokens]
content_tokens[:5]

[" by accidentally hitting on a customer not a long story but as a new years resolution i told myself i would make an effort to compliment people because it always makes me feel good when i receive one so this woman comes in and she's really nice and friendly and as i'm making her coffee we are having a chat and she has this huge belt that is absolutely covered in those sparkly jewel thingies so i say i really like your belt it's really shiny she smiles and says thanks so i go i wish i could pull it off not thinking that it could mean something else other than i wish it looked good on me she blushes and looks down and when i finish her coffee she looks at me and smiles and says thanks so my assistant manager comes over and she's like did you just tell her you wanted to take her belt off my face goes red as i realise what i said i explained it to my assistant manager who just laughed and told me to think about what i say before i say it tl dr complimented a customer on her belt and told

## Stem tokens

In [14]:
# Instantiate object of class PorterStemmer.
p_stemmer = PorterStemmer()

In [15]:
# Stem tokens.
stem_spam = [p_stemmer.stem(i) for i in content_tokens]
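
At this point each element of `content_tokens` is one joined string per post, so `p_stemmer.stem` receives a whole post as a single token and can only strip a suffix from the very end of it. A per-word variant, sketched below under the assumption that splitting on whitespace recovers the tokens, might look like this; `stem_post` is a hypothetical helper name and the sample text is made up.

```python
from nltk.stem.porter import PorterStemmer

p_stemmer = PorterStemmer()

def stem_post(post):
    """Stem each whitespace-separated word of a post and re-join them."""
    return " ".join(p_stemmer.stem(word) for word in post.split())

post = "she was hitting customers"     # hypothetical example text
whole = p_stemmer.stem(post)           # treats the whole post as one token
per_word = stem_post(post)             # stems every word separately
print(whole)
print(per_word)
```

The same consideration would apply to any token-level transform, including the lemmatizer used in the next section.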

In [16]:
# Compare tokens to stemmed version.

list(zip(content_tokens, stem_spam))

[(" by accidentally hitting on a customer not a long story but as a new years resolution i told myself i would make an effort to compliment people because it always makes me feel good when i receive one so this woman comes in and she's really nice and friendly and as i'm making her coffee we are having a chat and she has this huge belt that is absolutely covered in those sparkly jewel thingies so i say i really like your belt it's really shiny she smiles and says thanks so i go i wish i could pull it off not thinking that it could mean something else other than i wish it looked good on me she blushes and looks down and when i finish her coffee she looks at me and smiles and says thanks so my assistant manager comes over and she's like did you just tell her you wanted to take her belt off my face goes red as i realise what i said i explained it to my assistant manager who just laughed and told me to think about what i say before i say it tl dr complimented a customer on her belt and tol

In [17]:
# Print only those stemmed tokens that are different.

for i in range(len(content_tokens)):
    if content_tokens[i] != stem_spam[i]:
        print((content_tokens[i], stem_spam[i]))

(" by making someone think i was going to murder and bury her this just happened and i can't believe i didn't think about the situation for reference i am a pretty big guy and don't exactly look like a teddy bear i posted a folding wagon online and someone said she wanted it to both of our surprise we work at the same hospital so drop off / pickup would be easy our hospital is huge so she agreed to meet me at my car in a parking garage the wagon was in my car and we greeted one another and i opened my trunk to get the wagon out she immediately stepped back when i opened the trunk then i saw it i will take a step back to mention that i spent last weekend at my parents house helping them tear down a shed and a bunch of trees i had all the tools at my house so i brought them over to help the tools include a shovel pickaxe sledgehammer recirpocating saw axe hatchet hacksaw large blue tarp work gloves and rope they were all covered in mud it was literally a murder and disposal kit that all 

In [18]:
# combined stemmed to list

posts_stem = []

for post in stem_spam:
    posts_stem.append(''.join(post))


In [19]:
posts_stem

[" by accidentally hitting on a customer not a long story but as a new years resolution i told myself i would make an effort to compliment people because it always makes me feel good when i receive one so this woman comes in and she's really nice and friendly and as i'm making her coffee we are having a chat and she has this huge belt that is absolutely covered in those sparkly jewel thingies so i say i really like your belt it's really shiny she smiles and says thanks so i go i wish i could pull it off not thinking that it could mean something else other than i wish it looked good on me she blushes and looks down and when i finish her coffee she looks at me and smiles and says thanks so my assistant manager comes over and she's like did you just tell her you wanted to take her belt off my face goes red as i realise what i said i explained it to my assistant manager who just laughed and told me to think about what i say before i say it tl dr complimented a customer on her belt and told

## Lemmatized tokens

In [20]:
# Lemmatize tokens.

lemmatizer = WordNetLemmatizer()

In [21]:
# Compare tokens to lemmatized version.

tokens_lem = [lemmatizer.lemmatize(i) for i in content_tokens]

In [22]:

# Print only those lemmatized tokens that are different.
for i in range(len(content_tokens)):
    if content_tokens[i] != tokens_lem[i]:
        print((content_tokens[i], tokens_lem[i]))

In [23]:
#combine lemmatized to list
posts_lem = []

for post in tokens_lem:
    posts_lem.append(''.join(post))

In [24]:
posts_lem[:4]

[" by accidentally hitting on a customer not a long story but as a new years resolution i told myself i would make an effort to compliment people because it always makes me feel good when i receive one so this woman comes in and she's really nice and friendly and as i'm making her coffee we are having a chat and she has this huge belt that is absolutely covered in those sparkly jewel thingies so i say i really like your belt it's really shiny she smiles and says thanks so i go i wish i could pull it off not thinking that it could mean something else other than i wish it looked good on me she blushes and looks down and when i finish her coffee she looks at me and smiles and says thanks so my assistant manager comes over and she's like did you just tell her you wanted to take her belt off my face goes red as i realise what i said i explained it to my assistant manager who just laughed and told me to think about what i say before i say it tl dr complimented a customer on her belt and told

In [25]:
#adding stemmed and lemmatized in a dataframe
df_pre = pd.DataFrame(data=[stem_spam, tokens_lem], index=['post_stem', 'post_lem'])

In [26]:
#transpose
df_pre = df_pre.T

In [27]:
df_pre.head()

Unnamed: 0,post_stem,post_lem
0,by accidentally hitting on a customer not a l...,by accidentally hitting on a customer not a l...
1,by making someone think i was going to murder...,by making someone think i was going to murder...
2,by having my husband examine me this happened...,by having my husband examine me this happened...
3,by taking a new dose of adhd medication and t...,by taking a new dose of adhd medication and t...
4,by sitting too close to the fire to escape he...,by sitting too close to the fire to escape he...


In [28]:
#adding the target value to dataframe
df_pre['tifu'] = df['tifu']

In [29]:
df_pre.to_csv('../datasets/df_pre.csv', index = False) 

In [30]:
df_pre = pd.read_csv('../datasets/df_pre.csv') #reimport data

In [31]:
df_pre.shape

(1482, 3)

In [32]:
df_pre.head(10)

Unnamed: 0,post_stem,post_lem,tifu
0,by accidentally hitting on a customer not a l...,by accidentally hitting on a customer not a l...,1
1,by making someone think i was going to murder...,by making someone think i was going to murder...,1
2,by having my husband examine me this happened...,by having my husband examine me this happened...,1
3,by taking a new dose of adhd medication and t...,by taking a new dose of adhd medication and t...,1
4,by sitting too close to the fire to escape he...,by sitting too close to the fire to escape he...,1
5,by assuming someone was a child so this was q...,by assuming someone was a child so this was q...,1
6,by sending fake homework to school for month...,by sending fake homework to school for month...,1
7,by scratching jalapeño pepper oil all over my...,by scratching jalapeño pepper oil all over my...,1
8,by swimming causing month marital separation...,by swimming causing month marital separation...,1
9,by drinking way too much passing out and this...,by drinking way too much passing out and this...,1


In [33]:
#check for any nan values again for new data frame
df_pre.isnull().sum()

post_stem    1
post_lem     1
tifu         0
dtype: int64

In [34]:
#dropping nan values in new dataframe
df_pre.dropna(inplace = True)

## Train-Test Split

A single train-test split is performed on the data, so the stemmed and lemmatized columns share the same rows and target values. This allows a direct comparison of modeling accuracy scores between the stemmed and lemmatized vectors.

In [35]:
X = df_pre[['post_stem', 'post_lem']]
y = df_pre['tifu']

In [36]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=r, stratify=y)
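
With the default `test_size` of 0.25, the 1481 remaining rows split into 1110 training and 371 test rows, which matches the lengths shown in this section. The effect of `stratify=y` can be sketched on a toy target; the 60/40 array below is a made-up assumption, not the project data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical target with a 60/40 class balance.
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, random_state=42, stratify=y_toy
)

# Stratification keeps the class proportions the same in both parts.
print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```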

In [37]:
len(X_train)

1110

In [38]:
len(X_test)

371

In [39]:
len(y_train)

1110

In [40]:
len(y_test)

371

In [41]:
y_train = pd.DataFrame(y_train, columns=['tifu'])
y_test = pd.DataFrame(y_test, columns=['tifu'])

In [42]:
X_train.shape

(1110, 2)

In [43]:
X_test.shape

(371, 2)

In [44]:
y_train.shape

(1110, 1)

In [45]:
y_test.shape

(371, 1)

In [46]:
X_train.to_csv('../datasets/X_train.csv', index = True)
X_test.to_csv('../datasets/X_test.csv', index = True)
y_train.to_csv('../datasets/y_train.csv', index = True)
y_test.to_csv('../datasets/y_test.csv', index = True)

## Model Selection

### Baseline accuracy

In [47]:
y.value_counts(normalize = True)

0    0.598244
1    0.401756
Name: tifu, dtype: float64
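
The larger class proportion above (about 0.598) is the accuracy a model would reach by always predicting the majority class, so it serves as the baseline to beat. A minimal sketch of that calculation, using a made-up label list and a hypothetical helper name:

```python
from collections import Counter

def baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

toy_labels = [0] * 6 + [1] * 4          # hypothetical 60/40 split
print(baseline_accuracy(toy_labels))
```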

### GridSearch CV

GridSearchCV lets us search over multiple hyperparameters for each model. A model is fitted for every combination of the candidate hyperparameters, and the highest-scoring combination is kept.
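
As a minimal sketch of the idea, the example below runs a grid search over one vectorizer setting on a tiny made-up corpus. The documents and labels are assumptions for illustration only, and the step names simply mirror the ones used in this notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = TIFU-style, 0 = confession-style.
docs = [
    "today i spilled coffee on my laptop",
    "i locked my keys in the car today",
    "today i sent an email to the wrong person",
    "i dropped my phone in the sink",
    "today i missed my flight",
    "i burned dinner today",
    "i confess i never liked my job",
    "i secretly ate the last cake",
    "i confess i lied about my age",
    "i secretly read a private diary",
    "i confess i skipped the wedding",
    "i secretly kept the lost wallet",
]
labels = [1] * 6 + [0] * 6

pipe = Pipeline([
    ("cv", CountVectorizer(stop_words="english")),
    ("multi_nb", MultinomialNB()),
])
param_grid = {"cv__ngram_range": [(1, 1), (1, 2)]}   # two candidates

grid = GridSearchCV(pipe, param_grid, cv=3)          # 3-fold cross-validation
grid.fit(docs, labels)

print(grid.best_params_)
print(len(grid.cv_results_["params"]))               # number of combinations tried
```

Each key in the grid uses the `step__parameter` naming, so `cv__ngram_range` refers to the `ngram_range` argument of the step called `cv`.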

#### CountVectorizer

In [48]:
steps_cv = [ #list of pipeline steps for each model
    [('cv',CountVectorizer()),('multi_nb',MultinomialNB())],
    [('cv',CountVectorizer()),('scaler',StandardScaler(with_mean=False)),('knn',KNeighborsClassifier())], 
    [('cv',CountVectorizer()),('scaler',StandardScaler(with_mean=False)),('logreg',LogisticRegression())]]

In [49]:
pipe_titles = ['multi_nb','knn','logreg']

In [50]:
pipe_params_cv = [
    {"cv__stop_words":['english'], "cv__ngram_range":[(1,1),(1,2)]},
    {"cv__stop_words":['english'], "cv__ngram_range":[(1,1),(1,2)]},
    {"cv__stop_words":['english'], "cv__ngram_range":[(1,1),(1,2)]}
]

In [51]:
# instantiate results DataFrame

grid_results = pd.DataFrame(columns=['step','best_params','train_accuracy','test_accuracy','tn','fp','fn','tp'])

In [52]:
X_train_pre = X_train['post_lem']
X_test_pre = X_test['post_lem']

In [53]:
for i in tqdm(range(len(steps_cv))):           # loop over the model pipelines with a progress bar
    pipe = Pipeline(steps=steps_cv[i])         # configure pipeline for each model
    grid = GridSearchCV(pipe, pipe_params_cv[i], cv=3) # 3-fold grid search over the model's params

    model_results = {}

    grid.fit(X_train_pre, y_train['tifu'])     # pass a 1-D target to the estimators

    print('Step: ',pipe_titles[i])
    model_results['step'] = pipe_titles[i] + '_cv'

    print('Best Params: ', grid.best_params_)
    model_results['best_params'] = grid.best_params_

    train_accuracy = grid.score(X_train_pre, y_train['tifu'])
    print(train_accuracy, '\n')
    model_results['train_accuracy'] = train_accuracy

    test_accuracy = grid.score(X_test_pre, y_test['tifu'])
    print(test_accuracy, '\n')
    model_results['test_accuracy'] = test_accuracy

    # Display the confusion matrix results showing true/false positive/negative
    tn, fp, fn, tp = confusion_matrix(y_test['tifu'], grid.predict(X_test_pre)).ravel()
    print("True Negatives: %s" % tn)
    model_results['tn'] = tn

    print("False Positives: %s" % fp)
    model_results['fp'] = fp

    print("False Negatives: %s" % fn)
    model_results['fn'] = fn

    print("True Positives: %s" % tp, '\n')
    model_results['tp'] = tp

    # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
    grid_results = pd.concat([grid_results, pd.DataFrame([model_results])], ignore_index=True)

Step:  multi_nb
Best Params:  {'cv__ngram_range': (1, 1), 'cv__stop_words': 'english'}
0.9711711711711711 

0.8652291105121294 

True Negatives: 194
False Positives: 28
False Negatives: 22
True Positives: 127 

Step:  knn
Best Params:  {'cv__ngram_range': (1, 1), 'cv__stop_words': 'english'}
0.5981981981981982 

0.5983827493261455 

True Negatives: 222
False Positives: 0
False Negatives: 149
True Positives: 0 

Step:  logreg
Best Params:  {'cv__ngram_range': (1, 1), 'cv__stop_words': 'english'}
1.0 

0.8301886792452831 

True Negatives: 210
False Positives: 12
False Negatives: 51
True Positives: 98 

In [54]:
grid_results_cv = grid_results

In [55]:
grid_results.sort_values('test_accuracy', ascending = False)

Unnamed: 0,step,best_params,train_accuracy,test_accuracy,tn,fp,fn,tp
0,multi_nb_cv,"{'cv__ngram_range': (1, 1), 'cv__stop_words': ...",0.971171,0.865229,194,28,22,127
2,logreg_cv,"{'cv__ngram_range': (1, 1), 'cv__stop_words': ...",1.0,0.830189,210,12,51,98
1,knn_cv,"{'cv__ngram_range': (1, 1), 'cv__stop_words': ...",0.598198,0.598383,222,0,149,0


#### TFIDFVectorizer

In [56]:
steps_tf = [ #list of pipeline steps for each model
    [('tf',TfidfVectorizer()),('multi_nb',MultinomialNB())],
    [('tf',TfidfVectorizer()),('scaler',StandardScaler(with_mean=False)),('knn',KNeighborsClassifier())], 
    [('tf',TfidfVectorizer()),('scaler',StandardScaler(with_mean=False)),('logreg',LogisticRegression())]]

In [57]:
pipe_params_tf = [
    {"tf__stop_words":['english'], "tf__ngram_range":[(1,1),(1,2)]},
    {"tf__stop_words":['english'], "tf__ngram_range":[(1,1),(1,2)]},
    {"tf__stop_words":['english'], "tf__ngram_range":[(1,1),(1,2)]}
]

In [58]:
for i in tqdm(range(len(steps_tf))):           # loop over the model pipelines with a progress bar
    pipe = Pipeline(steps=steps_tf[i])         # configure pipeline for each model
    grid = GridSearchCV(pipe, pipe_params_tf[i], cv=3) # 3-fold grid search over the model's params

    model_results = {}

    grid.fit(X_train_pre, y_train['tifu'])     # pass a 1-D target to the estimators

    print('Step: ',pipe_titles[i])
    model_results['step'] = pipe_titles[i] + '_tf'

    print('Best Params: ', grid.best_params_)
    model_results['best_params'] = grid.best_params_

    train_accuracy = grid.score(X_train_pre, y_train['tifu'])
    print(train_accuracy, '\n')
    model_results['train_accuracy'] = train_accuracy

    test_accuracy = grid.score(X_test_pre, y_test['tifu'])
    print(test_accuracy, '\n')
    model_results['test_accuracy'] = test_accuracy

    # Display the confusion matrix results showing true/false positive/negative
    tn, fp, fn, tp = confusion_matrix(y_test['tifu'], grid.predict(X_test_pre)).ravel()
    print("True Negatives: %s" % tn)
    model_results['tn'] = tn

    print("False Positives: %s" % fp)
    model_results['fp'] = fp

    print("False Negatives: %s" % fn)
    model_results['fn'] = fn

    print("True Positives: %s" % tp, '\n')
    model_results['tp'] = tp

    # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
    grid_results = pd.concat([grid_results, pd.DataFrame([model_results])], ignore_index=True)

Step:  multi_nb
Best Params:  {'tf__ngram_range': (1, 2), 'tf__stop_words': 'english'}
0.9819819819819819 

0.8194070080862533 

True Negatives: 219
False Positives: 3
False Negatives: 64
True Positives: 85 

Step:  knn
Best Params:  {'tf__ngram_range': (1, 1), 'tf__stop_words': 'english'}
0.5981981981981982 

0.5983827493261455 

True Negatives: 222
False Positives: 0
False Negatives: 149
True Positives: 0 

Step:  logreg
Best Params:  {'tf__ngram_range': (1, 1), 'tf__stop_words': 'english'}
1.0 

0.8867924528301887 

True Negatives: 207
False Positives: 15
False Negatives: 27
True Positives: 122 

In [59]:
grid_results_tf = grid_results

In [60]:
grid_results_tf.sort_values('test_accuracy', ascending = False).head(20)

Unnamed: 0,step,best_params,train_accuracy,test_accuracy,tn,fp,fn,tp
5,logreg_tf,"{'tf__ngram_range': (1, 1), 'tf__stop_words': ...",1.0,0.886792,207,15,27,122
0,multi_nb_cv,"{'cv__ngram_range': (1, 1), 'cv__stop_words': ...",0.971171,0.865229,194,28,22,127
2,logreg_cv,"{'cv__ngram_range': (1, 1), 'cv__stop_words': ...",1.0,0.830189,210,12,51,98
3,multi_nb_tf,"{'tf__ngram_range': (1, 2), 'tf__stop_words': ...",0.981982,0.819407,219,3,64,85
1,knn_cv,"{'cv__ngram_range': (1, 1), 'cv__stop_words': ...",0.598198,0.598383,222,0,149,0
4,knn_tf,"{'tf__ngram_range': (1, 1), 'tf__stop_words': ...",0.598198,0.598383,222,0,149,0
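
The accuracy column can be recomputed from the confusion-matrix counts in the table, and the same counts also give precision and recall. The sketch below does this for two rows of the table; `summarize` is a hypothetical helper name.

```python
def summarize(tn, fp, fn, tp):
    """Derive accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Counts copied from the logreg_tf and knn_tf rows of the results table.
logreg_tf = summarize(tn=207, fp=15, fn=27, tp=122)
knn_tf = summarize(tn=222, fp=0, fn=149, tp=0)

print(logreg_tf)
print(knn_tf)
```

For `logreg_tf` this gives a precision of about 0.89 and a recall of about 0.82, while `knn_tf` has a recall of 0 because it never predicts the positive class.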


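Comparing test accuracy, TF-IDF Logistic Regression did best (0.887), followed by CountVectorizer Multinomial Naive Bayes (0.865). KNN scored 0.598 with both vectorizers, which is the same as always predicting the majority class, since it predicted no positives. Hence, the following two models are selected.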

##### Model Selections:
- 1. Lemmatized TFIDF Logistic Regression (tf__ngram_range:(1,1))
- 2. Lemmatized CountVectorizer Multinomial Naive Bayes (cv__ngram_range:(1,1)) (requirement)
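
For reference, the first selected configuration could be assembled as a standalone pipeline roughly as follows. This is a sketch on a made-up corpus, not the fitted model from this notebook; the documents, labels and the name `final_pipe` are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy corpus: 1 = TIFU-style, 0 = confession-style.
train_docs = [
    "today i spilled coffee on my laptop",
    "i locked my keys in the car today",
    "today i missed my flight",
    "i confess i never liked my job",
    "i secretly ate the last cake",
    "i confess i lied about my age",
]
train_labels = [1, 1, 1, 0, 0, 0]

final_pipe = Pipeline([
    ("tf", TfidfVectorizer(stop_words="english", ngram_range=(1, 1))),
    ("scaler", StandardScaler(with_mean=False)),   # with_mean=False keeps the matrix sparse
    ("logreg", LogisticRegression()),
])
final_pipe.fit(train_docs, train_labels)

new_post = ["today i spilled coffee on my keyboard"]   # hypothetical unseen post
preds = final_pipe.predict(new_post)
probs = final_pipe.predict_proba(new_post)
print(preds, probs)
```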