# **<font color='orange'><b>Natural Language Processing (NLP)</b></font> - <font color='white'>Natural language processing (NLP) is a subfield of computer science and Artificial Intelligence (AI) that enables computers to understand and communicate with human language.</font>**


<img src="https://media.geeksforgeeks.org/wp-content/uploads/20240524132821/nlp-working.webp" alt="My image" width="900" height="560"/>


# **<font color='Orange'><b>Importing libraries -</b></font>**

In [63]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# **<font color='orange'><b>Importing dataset-</b></font>**


In [64]:
data = pd.read_csv('Restaurant_Reviews.tsv',delimiter = '\t',quoting = 3)

In [65]:
data

Unnamed: 0,Review,Liked
0,Wow... Loved this place.,1
1,Crust is not good.,0
2,Not tasty and the texture was just nasty.,0
3,Stopped by during the late May bank holiday of...,1
4,The selection on the menu was great and so wer...,1
...,...,...
995,I think food should have flavor and texture an...,0
996,Appetite instantly gone.,0
997,Overall I was not impressed and would not go b...,0
998,"The whole experience was underwhelming, and I ...",0


# **<font color='orange'><b>Cleaning of the dataset -</b></font>**


In [66]:
# This is for cleaning the dataset which make better understanding for machine learning model to make predictions.

import re
import nltk # Natural Language Toolkit
nltk.download( 'stopwords' )
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
corpus = []
for i in range(0,1000):
  review = re.sub('^a-zA-Z]',' ',data['Review'][i])
  review = review.lower()
  review = review.split()
  ps = PorterStemmer()
  all_stopwords = stopwords.words('english')
  all_stopwords.remove('not')
  review  =  [ps.stem(word) for word in review if not word in set(all_stopwords)]
  review = ' '.join(review)
  corpus.append(review)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# **<font color='orange'><b>Here you can see the result -</b></font>**


In [67]:
print(corpus)

['wow... love place.', 'crust not good.', 'not tasti textur nasty.', 'stop late may bank holiday rick steve recommend love it.', 'select menu great prices.', 'get angri want damn pho.', 'honeslti tast fresh.)', 'potato like rubber could tell made ahead time kept warmer.', 'fri great too.', 'great touch.', 'servic prompt.', 'would not go back.', 'cashier care ever say still end wayyy overpriced.', 'tri cape cod ravoli, chicken, cranberry...mmmm!', 'disgust pretti sure human hair.', 'shock sign indic cash only.', 'highli recommended.', 'waitress littl slow service.', 'place not worth time, let alon vegas.', 'not like all.', 'burritto blah!', 'food, amazing.', 'servic also cute.', 'could care less... interior beautiful.', 'performed.', "that' right....th red velvet cake.....ohhh stuff good.", '- never brought salad ask for.', 'hole wall great mexican street tacos, friendli staff.', 'took hour get food 4 tabl restaur food luke warm, sever run around like total overwhelmed.', 'worst salmon 

# **<font color='orange'><b>Creating the Bag of Words Model -</b></font>**


In [68]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
X = cv.fit_transform(corpus).toarray()
y = data.iloc[:,-1].values

In [69]:
len(X[0])

1500

# **<font color='orange'><b>Spliting the dataset on the training set or test set -</b></font>**


In [70]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.20,random_state = 0)

# **<font color='orange'><b>Training teh Naive Bayes model on the Training set-</b></font>**


In [71]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train,y_train)

# **<font color='orange'><b>Predicting the test set result-</b></font>**


In [72]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[1 0]
 [1 0]
 [1 0]
 [0 0]
 [0 0]
 [1 0]
 [1 1]
 [0 0]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [1 0]
 [0 0]
 [1 1]
 [0 0]
 [0 1]
 [1 1]
 [1 0]
 [1 0]
 [0 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 0]
 [1 0]
 [0 0]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [0 0]
 [0 0]
 [0 0]
 [1 0]
 [1 0]
 [0 0]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [0 0]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [1 0]
 [0 0]
 [1 0]
 [1 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 0]
 [1 1]
 [0 1]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [1 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [1 0]
 [1 1]
 [1 0]
 [1 1]
 [1 1]
 [1 0]
 [0 1]
 [1 1]
 [1 1]
 [1 0]
 [0 1]
 [1 0]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [0 1]
 [1 1]
 [0 0]
 [1 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [1 1]
 [1 0]
 [0 0]
 [0 0]
 [1 1]
 [1 0]
 [0 0]
 [1 1]
 [1 0]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [0 1]
 [1 1]
 [1 1]

# **<font color='orange'><b>Making the confusion matrix -</b></font>**


In [73]:
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test,y_pred)
print(cm)
accuracy_score(y_test,y_pred)

[[53 44]
 [11 92]]


0.725

**As in the result mentioned Here 53 means 53 correct prediction on the negative review and 92 correct prediction on the positive reviews**