# Sentiment Analysis of Restaurant Reviews

### This script performs sentiment analysis on a collection of restaurant reviews using machine learning techniques. Sentiment analysis aims to determine the sentiment expressed in text, specifically whether a review conveys a positive or negative sentiment about a restaurant experience.

### Steps:
1. Data Loading and Exploration: The script loads a dataset of restaurant reviews from a TSV file. It checks for missing values and previews the dataset's structure.

2. Data Preprocessing: The review text is preprocessed to make it consistent and to remove irrelevant information. Each review is converted to lowercase and split into words, and each word is stemmed to its root form. Common stopwords are removed so that meaningful content remains; negations such as "not" are deliberately kept out of the stopword list because they carry sentiment.

3. Feature Extraction: To enable machine learning models to process text data, the preprocessed reviews are transformed into numerical features. This process, known as count vectorization, represents each review by how often each word or two-word phrase from the vocabulary occurs in it. Unlike TF-IDF, it does not weight terms by their importance across the dataset.

4. Model Selection and Evaluation: The script considers multiple machine learning models for sentiment analysis: Multinomial Naive Bayes, Random Forest, Gradient Boosting, XGBoost, and Support Vector Classifier (SVC). Each model is trained on 80% of the data and evaluated on the remaining 20% to measure its predictive ability.

5. Best Model Identification: Comparing the test accuracy reported for each model identifies the best performer. In the run shown below, Gradient Boosting reaches the highest test accuracy (0.80).

6. Evaluation Metrics and Insights: The script provides evaluation metrics such as accuracy, which reflects the proportion of correctly predicted sentiments. Additionally, it generates a confusion matrix and a classification report to offer insights into model performance for positive and negative sentiments.

7. Conclusion: Sentiment analysis of restaurant reviews holds practical value for understanding customer opinions, improving restaurant services, and making informed business decisions. This script demonstrates the process of preprocessing text, training machine learning models, and evaluating their performance in sentiment analysis.

### Sentiment analysis plays a crucial role in extracting valuable insights from unstructured text data, contributing to enhanced customer experiences and data-driven decision-making in the restaurant industry.

Import necessary libraries

In [1]:
import numpy as np
import pandas as pd
from nltk.stem import PorterStemmer
from string import punctuation
from spacy.lang.en import STOP_WORDS
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

Load the dataset


In [2]:
review=pd.read_csv("https://raw.githubusercontent.com/Timmapuram-Karthik/Edunet-IBM-Restaurant-Sentiment/main/Restaurant_Reviews.tsv",sep="\t")

Display the loaded dataset

In [3]:
review

Unnamed: 0,Review,Liked
0,Wow... Loved this place.,1
1,Crust is not good.,0
2,Not tasty and the texture was just nasty.,0
3,Stopped by during the late May bank holiday of...,1
4,The selection on the menu was great and so wer...,1
...,...,...
995,I think food should have flavor and texture an...,0
996,Appetite instantly gone.,0
997,Overall I was not impressed and would not go b...,0
998,"The whole experience was underwhelming, and I ...",0


Check for any missing values in the dataset

In [4]:
review.isnull().sum()

Review    0
Liked     0
dtype: int64
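
Alongside the missing-value check, it can help to look at how the two labels are distributed, since accuracy is easier to interpret when the classes are roughly balanced. The following is a minimal sketch on a made-up four-row frame; the toy rows and the names `toy`, `missing`, and `balance` are assumptions for illustration, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real reviews
toy = pd.DataFrame({
    "Review": ["Loved it", "Crust is not good", "Great menu", "Never again"],
    "Liked": [1, 0, 1, 0],
})

missing = toy.isnull().sum()                          # missing values per column
balance = toy["Liked"].value_counts(normalize=True)   # share of each label

print(missing)
print(balance)
```

On the notebook's data, the equivalent call would be `review["Liked"].value_counts(normalize=True)`.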

Display the first few rows of the dataset

In [5]:
review.head()

Unnamed: 0,Review,Liked
0,Wow... Loved this place.,1
1,Crust is not good.,0
2,Not tasty and the texture was just nasty.,0
3,Stopped by during the late May bank holiday of...,1
4,The selection on the menu was great and so wer...,1


Display the last few rows of the dataset

In [6]:
review.tail()

Unnamed: 0,Review,Liked
995,I think food should have flavor and texture an...,0
996,Appetite instantly gone.,0
997,Overall I was not impressed and would not go b...,0
998,"The whole experience was underwhelming, and I ...",0
999,"Then, as if I hadn't wasted enough of my life ...",0


Preprocess the data

In [7]:
corpus=[]
stopwords=list(STOP_WORDS)
# Negations carry sentiment, so take them out of the stopword list (they stay in the text)
stopwords_to_remove=["n‘t","n't","n’t","not"]
stopwords=[word for word in stopwords if word not in stopwords_to_remove]

In [8]:
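ps = PorterStemmer()            # create the stemmer once rather than once per review
stopword_set = set(stopwords)   # build the lookup set once
for text in review['Review']:
    words = text.lower().split()  # lowercase, then split on whitespace
    # Drop stopwords (negations were kept above) and stem what remains
    words = [ps.stem(word) for word in words if word not in stopword_set]
    corpus.append(' '.join(words))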
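
Because the loop above splits on whitespace only, tokens such as "good." keep their trailing punctuation and may slip past the stopword check or stem differently. The sketch below shows one possible variant that strips punctuation first. The function name `clean_review`, the tiny stop list, and the identity default for `stem` are illustrative assumptions; in this notebook you would pass `PorterStemmer().stem` and the spaCy-derived `stopwords` list instead.

```python
from string import punctuation

def clean_review(text, stopwords, stem=lambda word: word):
    """Lowercase, strip punctuation, drop stopwords, then stem each word."""
    # Replace every punctuation character with a space so "good." becomes "good".
    # Caveat: this also splits contractions, e.g. "wasn't" becomes "wasn" and "t".
    table = str.maketrans(punctuation, " " * len(punctuation))
    words = text.lower().translate(table).split()
    kept = [stem(word) for word in words if word not in stopwords]
    return " ".join(kept)

# Hypothetical miniature stop list; "not" is deliberately absent
toy_stopwords = {"is", "the", "this", "was"}

print(clean_review("Crust is not good.", toy_stopwords))        # crust not good
print(clean_review("Wow... Loved this place.", toy_stopwords))  # wow loved place
```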

Create a CountVectorizer and transform the text data into a numerical format

In [9]:
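# Keep the 1500 most frequent unigrams and bigrams as features.
# Note: stop_words='english' applies scikit-learn's built-in stop list, which
# includes "not", so the negations kept during preprocessing are dropped here.
cv=CountVectorizer(max_features=1500, ngram_range=(1, 2), stop_words='english')
x=cv.fit_transform(corpus).toarray()  # dense document-term count matrix
y=review['Liked'].values              # labels: 1 = liked, 0 = not liked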
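
To see what the vectorizer produces, and how the `stop_words='english'` setting interacts with the negations preserved during preprocessing, here is a small sketch on a made-up two-review corpus. The corpus and the variable names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_corpus = ["crust not good", "love place"]  # hypothetical preprocessed reviews

# Same n-gram setting as the notebook, with and without the built-in stop list
with_stop = CountVectorizer(ngram_range=(1, 2), stop_words="english")
without_stop = CountVectorizer(ngram_range=(1, 2))

counts_with = with_stop.fit_transform(toy_corpus).toarray()
counts_without = without_stop.fit_transform(toy_corpus).toarray()

print(sorted(with_stop.vocabulary_))     # "not" is filtered out
print(sorted(without_stop.vocabulary_))  # "not" and "not good" survive
print(counts_without.shape)              # (number of reviews, vocabulary size)
```

If negations should reach the models, one option is to omit `stop_words='english'`, since stopwords are already removed in the preprocessing loop. Doing so would change the scores reported in the rest of the notebook.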

Split the data into training and testing sets

In [10]:
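# Hold out 20% for testing; no random_state is set, so the split (and the scores below) vary between runs
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2)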
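
For results that can be reproduced exactly, the split can be seeded and stratified by label. The sketch below uses made-up toy arrays; the names `x_toy`, `y_toy`, and `split`, and the seed value 42, are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy features and balanced labels (10 samples)
x_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

def split(seed):
    # random_state fixes the shuffle; stratify keeps the label ratio in both parts
    return train_test_split(x_toy, y_toy, test_size=0.2,
                            random_state=seed, stratify=y_toy)

x_tr, x_te, y_tr, y_te = split(42)
x_tr2, x_te2, y_tr2, y_te2 = split(42)

print(np.array_equal(x_te, x_te2))  # True: same seed, same split
print(sorted(y_te.tolist()))        # one sample from each class
```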

Multinomial Naive Bayes model for sentiment analysis

In [11]:
nb_model=MultinomialNB()  # separate name so the imported class is not shadowed
nb_model.fit(x_train,y_train)
train_score=nb_model.score(x_train,y_train)
test_score=nb_model.score(x_test,y_test)
print(f"Train Accuracy = {train_score}\nTest Accuracy = {test_score}")

Train Accuracy = 0.91875
Test Accuracy = 0.77


In [12]:
y_pred=nb_model.predict(x_test)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[77 26]
 [20 77]]


In [13]:
class_rep = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_rep)

Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.75      0.77       103
           1       0.75      0.79      0.77        97

    accuracy                           0.77       200
   macro avg       0.77      0.77      0.77       200
weighted avg       0.77      0.77      0.77       200
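
The figures in the classification report follow directly from the confusion matrix, where rows are true classes and columns are predicted classes. As a worked check, the sketch below recomputes them from the Multinomial Naive Bayes matrix printed above; the helper name `per_class_metrics` is an assumption for illustration.

```python
# Confusion matrix reported above: rows = true class, columns = predicted class
conf = [[77, 26],
        [20, 77]]

def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1) for class index `cls`."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column total
    actual = sum(matrix[cls])                    # row total
    precision = true_pos / predicted
    recall = true_pos / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

total = sum(sum(row) for row in conf)
accuracy = sum(conf[i][i] for i in range(len(conf))) / total

for cls in (0, 1):
    p, r, f = per_class_metrics(conf, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

For class 0 this gives precision 77/97 ≈ 0.79 and recall 77/103 ≈ 0.75, matching the report.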



Random Forest model for sentiment analysis

In [14]:
rf_model=RandomForestClassifier(n_estimators=100,max_depth=4)  # avoid shadowing the imported class
rf_model.fit(x_train,y_train)
train_score=rf_model.score(x_train,y_train)
test_score=rf_model.score(x_test,y_test)
print(f"Train Accuracy = {train_score}\nTest Accuracy = {test_score}")

Train Accuracy = 0.7925
Test Accuracy = 0.725


In [15]:
y_pred=rf_model.predict(x_test)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[94  9]
 [46 51]]


In [16]:
class_rep = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_rep)

Classification Report:
              precision    recall  f1-score   support

           0       0.67      0.91      0.77       103
           1       0.85      0.53      0.65        97

    accuracy                           0.73       200
   macro avg       0.76      0.72      0.71       200
weighted avg       0.76      0.72      0.71       200
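
The fit, score, and confusion-matrix steps are repeated for every model in this notebook. One possible way to bundle them is a small helper like the one sketched below. The name `evaluate` and the toy data are assumptions for illustration, and the toy example reuses the same rows for training and testing only to keep it short.

```python
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB

def evaluate(model, x_train, y_train, x_test, y_test):
    """Fit `model`, then return train accuracy, test accuracy, and confusion matrix."""
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    return (
        accuracy_score(y_train, model.predict(x_train)),
        accuracy_score(y_test, y_pred),
        confusion_matrix(y_test, y_pred),
    )

# Hypothetical toy word counts: column 0 = a "positive" word, column 1 = a "negative" word
x_toy = [[2, 0], [3, 0], [0, 2], [0, 3]]
y_toy = [1, 1, 0, 0]

train_acc, test_acc, matrix = evaluate(MultinomialNB(), x_toy, y_toy, x_toy, y_toy)
print(train_acc, test_acc)
print(matrix)
```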



Gradient Boosting model for sentiment analysis

In [17]:
# Use a separate variable name so the imported class is not shadowed
gb_model=GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_model.fit(x_train,y_train)
train_score=gb_model.score(x_train,y_train)
test_score=gb_model.score(x_test,y_test)
print(f"Train Accuracy = {train_score}\nTest Accuracy = {test_score}")

Train Accuracy = 0.81875
Test Accuracy = 0.8


In [18]:
y_pred=gb_model.predict(x_test)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[99  4]
 [36 61]]


In [19]:
class_rep = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_rep)

Classification Report:
              precision    recall  f1-score   support

           0       0.73      0.96      0.83       103
           1       0.94      0.63      0.75        97

    accuracy                           0.80       200
   macro avg       0.84      0.80      0.79       200
weighted avg       0.83      0.80      0.79       200



XGBoost model for sentiment analysis

In [20]:
xgb_classifier = XGBClassifier(n_estimators=100,learning_rate=0.1,max_depth=3,random_state=42)
xgb_classifier.fit(x_train,y_train)
train_score=xgb_classifier.score(x_train,y_train)
test_score=xgb_classifier.score(x_test,y_test)
print(f"Train Accuracy = {train_score}\nTest Accuracy = {test_score}")

Train Accuracy = 0.74875
Test Accuracy = 0.745


In [21]:
y_pred=xgb_classifier.predict(x_test)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[100   3]
 [ 48  49]]


In [22]:
class_rep = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_rep)

Classification Report:
              precision    recall  f1-score   support

           0       0.68      0.97      0.80       103
           1       0.94      0.51      0.66        97

    accuracy                           0.74       200
   macro avg       0.81      0.74      0.73       200
weighted avg       0.80      0.74      0.73       200



Support Vector Machine model for sentiment analysis

In [23]:
svc_model = SVC(kernel='linear')  # separate name so the imported class is not shadowed
svc_model.fit(x_train,y_train)
train_score=svc_model.score(x_train,y_train)
test_score=svc_model.score(x_test,y_test)
print(f"Train Accuracy = {train_score}\nTest Accuracy = {test_score}")

Train Accuracy = 0.97375
Test Accuracy = 0.78


In [24]:
y_pred=svc_model.predict(x_test)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[84 19]
 [25 72]]


In [25]:
class_rep = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_rep)

Classification Report:
              precision    recall  f1-score   support

           0       0.77      0.82      0.79       103
           1       0.79      0.74      0.77        97

    accuracy                           0.78       200
   macro avg       0.78      0.78      0.78       200
weighted avg       0.78      0.78      0.78       200
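
To identify the best model, the test accuracies printed above can be collected and ranked. The sketch below simply copies those reported values into a dictionary; the variable names are assumptions for illustration. Because the train/test split is not seeded, the ranking may differ between runs, and gaps of a few points on 200 test reviews should be read with caution.

```python
# Test accuracies copied from the outputs reported above
test_accuracy = {
    "Multinomial Naive Bayes": 0.77,
    "Random Forest": 0.725,
    "Gradient Boosting": 0.8,
    "XGBoost": 0.745,
    "SVC (linear kernel)": 0.78,
}

# Rank models from highest to lowest test accuracy
ranking = sorted(test_accuracy.items(), key=lambda item: item[1], reverse=True)
best_name, best_score = ranking[0]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
print(f"Best on this split: {best_name} ({best_score:.3f})")
```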

