# Restaurant Review Sentiment Analysis and Feedback
This project focuses on building a machine learning model to analyze and classify customer reviews of restaurants as either positive or negative. The goal is to understand customer sentiments and provide actionable insights based on the textual feedback.

## Project Overview
Restaurants receive numerous reviews daily, and analyzing them by hand to gauge customer satisfaction is slow. This project automates the process with Natural Language Processing (NLP) and machine learning: reviews are cleaned, vectorized with TF-IDF, and classified by a random forest tuned with grid search, which reaches 72.5% accuracy on a 200-review test set. A VADER-based helper then flags which aspects (food, service, ambiance, price) a negative review mentions.

## 1. Importing Libraries
Importing the libraries needed for data processing, machine learning, and sentiment analysis.

In [12]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV

# Downloading NLTK data
nltk.download('stopwords')
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer





[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\kumar\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\kumar\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


## 2. Loading and Preprocessing Data
Loading the dataset and displaying the first few rows to understand the structure.

In [13]:
# Loading the dataset
# Backslashes in the Windows path are escaped; a raw string (r'...') would work equally well
data = pd.read_csv('C:\\Users\\kumar\\Downloads\\Restaurant_Reviews.tsv', sep='\t', quoting=3)

# Display the first few rows of the dataset
data.head()

| | Review | Liked |
|---|---|---|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |


## 3. Text Cleaning Function
Defining a function to clean and preprocess the text data, including removing HTML tags, non-letter characters, and stopwords, as well as applying stemming.

In [14]:
# Function to clean text data
def clean_text(review):
    # Remove HTML tags
    review = re.sub('<.*?>', ' ', review)  
    # Keep only letters
    review = re.sub('[^a-zA-Z]', ' ', review)  
    # Convert to lowercase
    review = review.lower()  
    # Split into words
    review = review.split()  
    # Initialize PorterStemmer for stemming
    ps = PorterStemmer()
    # Build the stopword set once instead of once per word
    stop_words = set(stopwords.words('english'))
    # Remove stopwords and apply stemming
    review = [ps.stem(word) for word in review if word not in stop_words]
    # Join words back into a single string
    review = ' '.join(review)  
    return review

# Cleaning all reviews in the dataset
corpus = [clean_text(review) for review in data['Review']]
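
One side effect of the stopword step is worth illustrating: NLTK's English stopword list includes negation words such as "not", so a review like "Crust is not good." loses its negation before it reaches the model. The sketch below uses only the standard library and a tiny hand-picked stopword set (an assumption standing in for NLTK's full list). `basic_clean`, `STOP_WORDS`, `NEGATIONS`, and `keep_negations` are hypothetical names, and stemming is omitted for brevity.

```python
import re

# Tiny illustrative stopword list (assumption; the notebook uses NLTK's full English list)
STOP_WORDS = {"is", "the", "was", "and", "not", "this"}
# Negation words one might choose to keep for sentiment tasks
NEGATIONS = {"not", "no", "nor"}


def basic_clean(review, keep_negations=False):
    """Strip HTML tags and non-letters, lowercase, and drop stopwords (no stemming)."""
    review = re.sub(r"<.*?>", " ", review)      # remove HTML tags
    review = re.sub(r"[^a-zA-Z]", " ", review)  # keep only letters
    stop_words = STOP_WORDS - NEGATIONS if keep_negations else STOP_WORDS
    return " ".join(w for w in review.lower().split() if w not in stop_words)


print(basic_clean("Crust is <b>not</b> good."))                       # crust good
print(basic_clean("Crust is <b>not</b> good.", keep_negations=True))  # crust not good
```

Whether keeping negations improves this particular model is untested here; the sketch only shows what information the default stopword list discards.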

## 4. TF-IDF Vectorization
Converting the cleaned text into numerical features using TF-IDF Vectorization.

In [15]:
# Vectorizing the text data using TF-IDF
tfidf = TfidfVectorizer(max_features=1500)
x = tfidf.fit_transform(corpus).toarray()

# Defining the dependent variable
y = data['Liked'].values
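
To make the weighting concrete, here is a minimal pure-Python sketch of the formula that `TfidfVectorizer` applies with its default settings: raw term counts, a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row. `tfidf_vectors` is a hypothetical helper written for illustration. It splits on whitespace rather than using scikit-learn's tokenizer and ignores `max_features`, so it is not a drop-in replacement.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, rows) with smoothed-IDF, L2-normalized TF-IDF weights."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in row])
    return vocab, rows


# Toy corpus of already-cleaned (stemmed) reviews
vocab, rows = tfidf_vectors(["food good", "food bad", "servic good"])
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

In the second toy document, "bad" appears in fewer documents than "food", so it receives the larger weight; that is the effect TF-IDF is meant to produce.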

## 5. Splitting the Dataset
Splitting the dataset into training and testing sets with an 80-20 split.

In [16]:
# Splitting the dataset into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=0)

## 6. Model Building and Hyperparameter Tuning
Building the RandomForestClassifier model and using GridSearchCV to tune hyperparameters.



In [17]:
# Initializing the RandomForestClassifier
classifier = RandomForestClassifier()

# Defining the hyperparameter grid for tuning
param_grid = {
    'n_estimators': [50, 100, 200],
    # 'auto' equals 'sqrt' for classifiers; it was deprecated in scikit-learn 1.1 and removed in 1.3
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [10, 50, 100, None],
    'criterion': ['gini', 'entropy']
}

# Using GridSearchCV to find the best parameters
grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)
grid_search.fit(x_train, y_train)
best_classifier = grid_search.best_estimator_

Fitting 5 folds for each of 72 candidates, totalling 360 fits
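
The candidate count in the log follows directly from the grid: every combination of the four parameter lists is tried, and each combination is fitted once per fold. A small standard-library sketch of that arithmetic (the variable names `candidates` and `n_folds` are illustrative, not part of scikit-learn):

```python
from itertools import product

# Same grid as above, reproduced so the snippet is self-contained
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [10, 50, 100, None],
    'criterion': ['gini', 'entropy'],
}

# Cartesian product of all parameter values: one dict per candidate
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_folds = 5  # matches cv=5 in the GridSearchCV call

print(len(candidates))            # 3 * 3 * 4 * 2 = 72
print(len(candidates) * n_folds)  # 72 * 5 = 360
print(candidates[0])
```

Because the cost grows multiplicatively with each added parameter value, trimming one list (for example, dropping a redundant `max_features` option) shortens the search noticeably.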


## 7. Model Evaluation
Evaluating the model's performance using the confusion matrix, classification report, and accuracy score.



In [18]:
# Predicting the test set results
y_pred = best_classifier.predict(x_test)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# Classification Report
cr = classification_report(y_test, y_pred)
print("Classification Report:")
print(cr)

# Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Confusion Matrix:
[[85 12]
 [43 60]]
Classification Report:
              precision    recall  f1-score   support

           0       0.66      0.88      0.76        97
           1       0.83      0.58      0.69       103

    accuracy                           0.73       200
   macro avg       0.75      0.73      0.72       200
weighted avg       0.75      0.72      0.72       200

Accuracy: 72.50%
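
The report's figures can be re-derived from the confusion matrix by hand, which also makes the model's main weakness visible: 43 of the 103 positive reviews are misclassified as negative. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels) and hard-codes the matrix printed above; the variable names are illustrative.

```python
# Confusion matrix copied from the output above: rows = true label, columns = predicted label
cm = [[85, 12],
      [43, 60]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_pos = tp / (tp + fp)  # of the reviews predicted positive, how many were positive
recall_pos = tp / (tp + fn)     # of the truly positive reviews, how many were found
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

print(f"accuracy={accuracy:.3f}")
print(f"precision(1)={precision_pos:.2f} recall(1)={recall_pos:.2f} f1(1)={f1_pos:.2f}")
```

The low recall on the positive class (0.58) means the model leans toward predicting "Negative", which matters for the improvement suggestions later, since those are only generated for reviews classified as negative.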


## 8. Sentiment Prediction Function
Creating a function to predict whether a review is positive or negative based on the trained model.

In [19]:
# Function to predict the sentiment of a review
def predict_review(review):
    # Clean the review text
    cleaned_review = clean_text(review)
    # Transform the review text into TF-IDF vector
    review_vector = tfidf.transform([cleaned_review]).toarray()
    # Predict sentiment using the best classifier
    prediction = best_classifier.predict(review_vector)
    # predict returns an array; compare its single element rather than the array itself
    return "Positive" if prediction[0] == 1 else "Negative"

# Example predictions
print(predict_review("The food was excellent and the service was great!"))
print(predict_review("The food was horrible and the service was terrible."))

Positive
Negative


## 9. Suggesting Areas of Improvement
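Defining a function to suggest improvements based on specific aspects like food, service, ambiance, and price. Two limitations apply. First, the aspect keywords are matched against the stemmed text returned by `clean_text`, so a keyword whose stem differs from the keyword itself never matches (for example, "service" is stemmed to "servic"). This is why the example reports only "Improve food" even though the review also criticizes the service. Second, the VADER score is computed once for the whole review rather than per aspect, so every aspect mentioned in a net-negative review is flagged.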

In [20]:
# Function to suggest areas of improvement based on the review
def suggest_improvements(review):
    sia = SentimentIntensityAnalyzer()
    aspects = {
        'food': ['food', 'meal', 'dish', 'taste'],
        'service': ['service', 'staff', 'waiter', 'waitress'],
        'ambiance': ['ambiance', 'atmosphere', 'environment'],
        'price': ['price', 'cost', 'value']
    }
    suggestions = []
    cleaned_review = clean_text(review)
    for aspect, keywords in aspects.items():
        if any(word in cleaned_review for word in keywords):
            sentiment_score = sia.polarity_scores(review)
            if sentiment_score['neg'] > sentiment_score['pos']:
                suggestions.append(f"Improve {aspect}")
    return suggestions if suggestions else ["No specific improvements needed"]

# Example suggestion
print(suggest_improvements("The food was horrible and the service was terrible."))

['Improve food']
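
The example output lists only the food aspect, although the review also mentions service. One possible way to make the aspect lookup independent of stemming is to match keywords against the lowercased words of the original review instead of the cleaned text. The sketch below shows only that matching step, without the VADER scoring. `detect_aspects` and `ASPECTS` are hypothetical names, and this is an illustration of the idea rather than the notebook's method.

```python
import re

# Same aspect keywords as in suggest_improvements
ASPECTS = {
    'food': ['food', 'meal', 'dish', 'taste'],
    'service': ['service', 'staff', 'waiter', 'waitress'],
    'ambiance': ['ambiance', 'atmosphere', 'environment'],
    'price': ['price', 'cost', 'value'],
}


def detect_aspects(review):
    """Return the aspects whose keywords appear as whole words in the raw review."""
    tokens = set(re.findall(r"[a-z]+", review.lower()))
    return [aspect for aspect, keywords in ASPECTS.items() if tokens & set(keywords)]


print(detect_aspects("The food was horrible and the service was terrible."))
# ['food', 'service']
```

Whole-word matching has its own gap: plurals such as "dishes" or "prices" are missed unless they are added to the keyword lists or both sides are stemmed consistently.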


## 10. Full Review Analysis
Combining sentiment prediction and improvement suggestions into a full analysis function for restaurant reviews.



In [21]:
# Function to perform full review analysis: sentiment prediction and improvement suggestions
def full_review_analysis(review):
    sentiment = predict_review(review)
    suggestions = suggest_improvements(review) if sentiment == "Negative" else ["No improvements needed"]
    return sentiment, suggestions


## Example 1

In [22]:
# Example full analysis
review1 = "The food was horrible and the service was terrible."
sentiment, suggestions = full_review_analysis(review1)
print("Review 1: ", review1)
print(f"Review Sentiment: {sentiment}")
print("Suggestions for Improvement:")
for suggestion in suggestions:
    print(suggestion)

Review 1:  The food was horrible and the service was terrible.
Review Sentiment: Negative
Suggestions for Improvement:
Improve food


## Example 2

In [23]:
review2 = "The food was good and the service was nice."
print("Review 2: ", review2)
sentiment, suggestions = full_review_analysis(review2)
print(f"Review Sentiment: {sentiment}")
print("Suggestions for Improvement:")
for suggestion in suggestions:
    print(suggestion)

Review 2:  The food was good and the service was nice.
Review Sentiment: Positive
Suggestions for Improvement:
No improvements needed
