
# Natural Language Processing

## Importing the libraries

In [5]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## Importing the dataset

In [11]:
dataset = pd.read_csv("Restaurant_Reviews.tsv", delimiter='\t', quoting=3)  # quoting=3 is csv.QUOTE_NONE: double quotes are read as ordinary characters
dataset

Unnamed: 0,Review,Liked
0,Wow... Loved this place.,1
1,Crust is not good.,0
2,Not tasty and the texture was just nasty.,0
3,Stopped by during the late May bank holiday of...,1
4,The selection on the menu was great and so wer...,1
...,...,...
995,I think food should have flavor and texture an...,0
996,Appetite instantly gone.,0
997,Overall I was not impressed and would not go b...,0
998,"The whole experience was underwhelming, and I ...",0
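
The `quoting = 3` argument corresponds to `csv.QUOTE_NONE` in Python's standard `csv` module, whose constants pandas accepts for this parameter. As a minimal sketch, assuming a made-up tab-separated line that is not taken from the dataset, the difference between the default behaviour and `QUOTE_NONE` looks like this:

```python
import csv
import io

# Hypothetical tab-separated line whose review text is wrapped in double quotes
sample = '"Great food, really"\t1\n'

# Default quoting (QUOTE_MINIMAL): the quotes mark the field boundaries and are dropped
default_row = next(csv.reader(io.StringIO(sample), delimiter='\t'))

# QUOTE_NONE (the integer 3): the quotes are kept as literal characters of the field
literal_row = next(csv.reader(io.StringIO(sample), delimiter='\t', quoting=csv.QUOTE_NONE))

print(csv.QUOTE_NONE)   # 3
print(default_row)      # ['Great food, really', '1']
print(literal_row)      # ['"Great food, really"', '1']
```

With free-text reviews, a stray or unbalanced quote could otherwise be misread as the start of a quoted field, which is presumably why the option is set here.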


## Cleaning the texts

In [62]:
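import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# nltk.download('stopwords')  # uncomment on first run if the stopword list is not installed

# Build the stemmer and the stopword set once, outside the loop
ps = PorterStemmer()
all_stopwords = stopwords.words("english")
all_stopwords.remove('not')  # keep 'not': it carries the negation in reviews such as "not good"
stopword_set = set(all_stopwords)

corpus = []
for i in range(len(dataset)):
    review = re.sub('[^a-zA-Z]', ' ', dataset["Review"][i])  # replace every non-letter with a space
    review = review.lower().split()
    review = [ps.stem(word) for word in review if word not in stopword_set]
    corpus.append(' '.join(review))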
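
To see why `'not'` is taken out of the stopword list, the cleaning steps can be traced on one review from the dataset. This is only a sketch: `toy_stopwords` is a small made-up list standing in for NLTK's much longer English list, the helper `clean` is hypothetical, and stemming is left out so the example runs without NLTK.

```python
import re

# Small illustrative stopword list (an assumption; NLTK's English list is much longer)
toy_stopwords = {"is", "the", "was", "this", "and", "not"}

def clean(text, stop_words):
    """Keep letters only, lowercase, split on whitespace, drop stopwords."""
    letters_only = re.sub('[^a-zA-Z]', ' ', text)
    words = letters_only.lower().split()
    return ' '.join(w for w in words if w not in stop_words)

with_not_removed = clean("Crust is not good.", toy_stopwords)
with_not_kept = clean("Crust is not good.", toy_stopwords - {"not"})

print(with_not_removed)  # crust good
print(with_not_kept)     # crust not good
```

If `not` were filtered out, the negative review would be reduced to the same tokens as a positive one, so the classifier would lose the negation signal.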

## Creating the Bag of Words model

In [82]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
inputs = cv.fit_transform(corpus).toarray()
target = dataset.iloc[:,1].values
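
The matrix produced by `CountVectorizer` has one row per review and one column per vocabulary word, with each cell holding a word count. The sketch below reproduces that idea in plain Python on a made-up three-review corpus. It is not scikit-learn's implementation, and it skips the `max_features` cut-off, which in the real class keeps only the most frequent terms.

```python
from collections import Counter

# Three already-cleaned toy reviews (hypothetical, in the style of the corpus above)
toy_corpus = ["wow love place", "crust not good", "not tasti textur nasti"]

# Vocabulary: every distinct token, sorted alphabetically
vocabulary = sorted({word for doc in toy_corpus for word in doc.split()})

# One row per review, one column per vocabulary word, each cell a word count
matrix = []
for doc in toy_corpus:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
# ['crust', 'good', 'love', 'nasti', 'not', 'place', 'tasti', 'textur', 'wow']
print(matrix[1])
# [1, 1, 0, 0, 1, 0, 0, 0, 0]
```

Word order within a review is discarded, which is what "bag" refers to: only the counts remain.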

## Splitting the dataset into the Training set and Test set

In [88]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(inputs, target, random_state = 0)

## Training the Naive Bayes model on the Training set

In [90]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)

## Predicting the Test set results

In [102]:
y_pred = classifier.predict(x_test)
pred_df = pd.DataFrame({"prediction":y_pred, "actual":y_test, "diff":y_test - y_pred})
pred_df

Unnamed: 0,prediction,actual,diff
0,1,0,-1
1,1,0,-1
2,1,0,-1
3,0,0,0
4,0,0,0
...,...,...,...
245,0,1,1
246,0,0,0
247,0,0,0
248,1,1,0
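
Because `diff` is `actual - prediction`, a value of `-1` marks a false positive, `1` a false negative, and `0` a correct prediction. As a small sketch, the tally below uses only the nine rows visible in the truncated output above, so it is not a summary of the full test set; the names `labels` and `tally` are illustrative.

```python
from collections import Counter

# The nine rows visible in the truncated output above (indices 0-4 and 245-248)
prediction = [1, 1, 1, 0, 0, 0, 0, 0, 1]
actual     = [0, 0, 0, 0, 0, 1, 0, 0, 1]

# diff = actual - prediction, as in pred_df
diff = [a - p for a, p in zip(actual, prediction)]

# Map each diff value to its meaning and count the occurrences
labels = {-1: "false positive", 0: "correct", 1: "false negative"}
tally = Counter(labels[d] for d in diff)

print(diff)         # [-1, -1, -1, 0, 0, 1, 0, 0, 0]
print(dict(tally))  # {'false positive': 3, 'correct': 5, 'false negative': 1}
```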


## Making the Confusion Matrix

In [105]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)  # signature is (y_true, y_pred): rows = actual, columns = predicted
cm

array([[ 67,  50],
       [ 20, 113]])
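
The four counts in the matrix are 67 reviews predicted 0 and actually 0, 113 predicted 1 and actually 1, 50 predicted 1 but actually 0, and 20 predicted 0 but actually 1. A short worked calculation, with the counts hard-coded from that output and the variable names chosen for illustration, gives the usual summary metrics for the positive class:

```python
# Counts read from the confusion matrix above, named by meaning
tn, fp, fn, tp = 67, 50, 20, 113

total = tn + fp + fn + tp
accuracy = (tp + tn) / total    # share of all test reviews classified correctly
precision = tp / (tp + fp)      # of the reviews predicted positive, how many were positive
recall = tp / (tp + fn)         # of the truly positive reviews, how many were found

print(total)                 # 250
print(round(accuracy, 3))    # 0.72
print(round(precision, 3))   # 0.693
print(round(recall, 3))      # 0.85
```

The gap between recall and precision reflects the model's tendency to over-predict the positive class: it produces 50 false positives against 20 false negatives.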

## Predicting Single Review

In [143]:
reviews = "I loved this place. It was amazing."

In [144]:
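# Clean the new review with the same steps used for the training corpus
review = re.sub('[^a-zA-Z]', ' ', reviews)
review = review.lower()
review = review.split()
ps = PorterStemmer()
all_stopwords = stopwords.words("english")
all_stopwords.remove('not')
stopword_set = set(all_stopwords)
review = [ps.stem(word) for word in review if word not in stopword_set]
review = [' '.join(review)]
# Use transform, not fit_transform, so the review is encoded with the vocabulary learned from the corpus
review = cv.transform(review).toarray()
single_pred = classifier.predict(review)  # separate name so the test-set y_pred is not overwritten
single_pred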

array([0])
