# Vectorizer Tuning

In [2]:
import pandas as pd
df = pd.read_csv("reviews.csv")

df.head()

| Unnamed: 0 | target | reviews |
|---|---|---|
| 0 | neg | plot : two teen couples go to a church party ,... |
| 1 | neg | the happy bastard's quick movie review \ndamn ... |
| 2 | neg | it is movies like these that make a jaded movi... |
| 3 | neg | " quest for camelot " is warner bros . ' firs... |
| 4 | neg | synopsis : a mentally unstable man undergoing ... |


The dataset is made up of positive and negative movie reviews.

## Preprocessing

👇 Remove punctuation and lowercase the text.

In [3]:
import nltk
import string 

In [4]:
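def remove_punct(x):
    # Lowercase once up front instead of on every loop iteration
    x = x.lower()
    # Drop each ASCII punctuation character in turn
    for p in string.punctuation:
        x = x.replace(p, '')
    return x

# Pass the function directly; wrapping it in a lambda adds nothing
df["clean_reviews"] = df["reviews"].apply(remove_punct)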

df.head()

| Unnamed: 0 | target | reviews | clean_reviews |
|---|---|---|---|
| 0 | neg | plot : two teen couples go to a church party ,... | plot two teen couples go to a church party d... |
| 1 | neg | the happy bastard's quick movie review \ndamn ... | the happy bastards quick movie review \ndamn t... |
| 2 | neg | it is movies like these that make a jaded movi... | it is movies like these that make a jaded movi... |
| 3 | neg | " quest for camelot " is warner bros . ' firs... | quest for camelot is warner bros first fe... |
| 4 | neg | synopsis : a mentally unstable man undergoing ... | synopsis a mentally unstable man undergoing p... |
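
The same cleaning step can also be written as a single pass with `str.translate`, which avoids looping over every punctuation character. This is only a sketch of an alternative: the helper name `remove_punct_fast` and the `sample` sentence are made up for illustration and are not part of the exercise data.

```python
import string

# Translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punct_fast(text):
    # Hypothetical helper: lowercase once, then strip punctuation in one pass
    return text.lower().translate(PUNCT_TABLE)

# Made-up sentence, used only to show the effect on one string
sample = "The Happy Critic's QUICK review: great plot, weak ending!"
cleaned = remove_punct_fast(sample)
print(cleaned)
```

Note that the apostrophe is deleted rather than replaced by a space, so `critic's` becomes `critics`, just as `bastard's` becomes `bastards` in the output above.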


## Tuning

👇 Tune a vectorizer of your choice (or try both!) and a MultinomialNB model simultaneously.

In [5]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

In [6]:
y = df.target
X = df["clean_reviews"]

# Create the pipeline: vectorize the text, then classify it
pipe = make_pipeline(
    TfidfVectorizer(),
    MultinomialNB(),
)
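
`make_pipeline` names each step after its lowercased class name, which is what the `<step>__<parameter>` keys of a parameter grid have to match. A minimal sketch for listing those names; it rebuilds the same two-step pipeline under the hypothetical name `demo_pipe` so that it runs on its own:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Stand-in for the pipeline above (demo_pipe is an illustrative name)
demo_pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())

# Step names generated by make_pipeline
step_names = list(demo_pipe.named_steps)
print(step_names)

# Every tunable parameter, already prefixed with its step name
tunable = [name for name in demo_pipe.get_params() if "__" in name]
print(tunable)
```

Any name printed in the second list can be used as a key in the grid passed to `GridSearchCV`.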

In [11]:
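# Grid keys follow make_pipeline's naming: <lowercased class name>__<parameter>
parameters = {
    'tfidfvectorizer__ngram_range': ((1, 1), (2, 2)),  # unigrams only vs. bigrams only
    'multinomialnb__alpha': (0.1, 1),  # additive smoothing strength
}

# 2 x 2 parameter combinations, each scored with 5-fold cross-validation
grid_search = GridSearchCV(pipe, parameters, n_jobs=-1,
                           verbose=1, scoring="accuracy",
                           refit=True, cv=5)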

grid_search.fit(X, y)

grid_search.best_params_
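
Beyond `best_params_`, a fitted `GridSearchCV` also exposes `best_score_` and `cv_results_`, which help compare every combination rather than only the winner. The sketch below runs the same kind of search on a tiny made-up corpus (the `docs` and `labels` lists are invented for illustration, and `cv=2` is used only because the toy data is so small), so the scores it prints say nothing about the real reviews.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy corpus: 5 positive and 5 negative snippets
docs = [
    "a wonderful film with a great cast",
    "great plot and wonderful acting",
    "i loved this great movie",
    "wonderful story and great ending",
    "a great and moving film",
    "a terrible film with a weak cast",
    "weak plot and terrible acting",
    "i hated this terrible movie",
    "terrible story and weak ending",
    "a weak and boring film",
]
labels = ["pos"] * 5 + ["neg"] * 5

toy_pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
toy_grid = {
    "tfidfvectorizer__ngram_range": [(1, 1), (2, 2)],
    "multinomialnb__alpha": [0.1, 1],
}

# cv=2 keeps at least two documents of each class in every fold
toy_search = GridSearchCV(toy_pipe, toy_grid, scoring="accuracy", cv=2)
toy_search.fit(docs, labels)

print("best params:", toy_search.best_params_)
print("best mean CV accuracy:", toy_search.best_score_)

# Mean cross-validated accuracy for every combination that was tried
results = toy_search.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    print(params, round(float(score), 3))
```

On the real data the same attributes are available on `grid_search` once `fit` has finished; because `refit=True`, `grid_search.best_estimator_` is the pipeline retrained on all of `X` with the best parameters.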

⚠️ Please push the exercise once you are done 🙃

## 🏁 