**Introduction**

This project focuses on performing sentiment analysis on customer reviews from the Swiggy dataset. The goal is to automatically classify customer feedback as positive or negative based on the text of their review. By analyzing sentiments, this project aims to understand customer satisfaction levels and identify areas for improvement in restaurant services.

The dataset contains various attributes such as restaurant details, ratings, food items, and reviews. The textual reviews are preprocessed by converting them to lowercase, removing special characters, and tokenizing the text. The cleaned text is then converted into numerical sequences, padded to a fixed length, and used to train a Recurrent Neural Network (RNN) model with TensorFlow and Keras.
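As a small illustration of the cleaning steps described above, the sketch below applies the same lowercase-and-regex rule used later in the notebook to one made-up review, then splits it on whitespace. The review text and the function name `clean_and_split` are invented for illustration, and whitespace splitting is a simplification of what the Keras `Tokenizer` does internally.

```python
import re

def clean_and_split(review):
    # Lowercase first so the character class below only needs a-z
    text = review.lower()
    # Drop every character that is not a lowercase letter, digit, or whitespace
    text = re.sub(r'[^a-z0-9\s]', '', text)
    # Split on runs of whitespace to get word tokens
    return text.split()

print(clean_and_split("Biryani was AMAZING!!! Delivered in 30 mins :)"))
# ['biryani', 'was', 'amazing', 'delivered', 'in', '30', 'mins']
```

One side effect of this rule is that apostrophes are removed too, so "didn't" becomes "didnt", and any non-ASCII letters are dropped entirely.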

The model learns patterns in the review text to predict sentiment. After five epochs of training, it reached about 72% accuracy on the held-out test set. Because validation accuracy stays constant across all epochs, this figure should be compared against a majority-class baseline before drawing conclusions about how much contextual meaning the RNN captures. Finally, a prediction function allows new customer reviews to be classified on demand.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import os
os.chdir('/content/drive/MyDrive/Projects/NLP/Sentiment Analysis with an Recurrent Neural Networks')

**Importing Libraries**

In [4]:
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Embedding

**Loading Dataset**

In [5]:
data = pd.read_csv('swiggy.csv')
print("Columns in the dataset:")
print(data.columns.tolist())

Columns in the dataset:
['ID', 'Area', 'City', 'Restaurant Price', 'Avg Rating', 'Total Rating', 'Food Item', 'Food Type', 'Delivery Time', 'Review']


**Text Cleaning and Sentiment Labeling**

In [6]:
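# Lowercase the reviews and strip everything except letters, digits, and whitespace
data["Review"] = data["Review"].str.lower()
data["Review"] = data["Review"].replace(r'[^a-z0-9\s]', '', regex=True)

# Label a row positive (1) if its average rating is above 3.5, otherwise negative (0)
data['sentiment'] = data['Avg Rating'].apply(lambda x: 1 if x > 3.5 else 0)
# Drop rows with a missing value in any column
data = data.dropna()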
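The labeling rule above is a strict threshold: a rating of exactly 3.5 counts as negative. The sketch below restates it in plain Python (the function name `label_from_rating` is illustrative) and also shows why computing labels before `dropna()` is harmless here: a missing rating compares as `False`, so it is briefly labeled 0, but `dropna()` then removes that row because its `Avg Rating` is missing.

```python
import math

def label_from_rating(avg_rating, threshold=3.5):
    # 1 (positive) only when the rating is strictly above the threshold
    return 1 if avg_rating > threshold else 0

ratings = [3.4, 3.5, 3.6, 4.2, float('nan')]
labels = [label_from_rating(r) for r in ratings]
print(labels)  # [0, 0, 1, 1, 0]

# NaN > 3.5 is False, so a missing rating gets label 0 here;
# in the notebook, data.dropna() later discards such rows
kept = [(r, l) for r, l in zip(ratings, labels) if not math.isnan(r)]
print(kept)  # [(3.4, 0), (3.5, 0), (3.6, 1), (4.2, 1)]
```

In pandas, the same rule can also be written without `apply` as `(data['Avg Rating'] > 3.5).astype(int)`.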

**Tokenization and Padding**

In [7]:
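max_features = 5000  # keep word indices below 5000 (the 4,999 most frequent words)
max_length = 200     # pad or truncate every review to 200 tokens

# Build the word index from all reviews, then map each review to integer ids
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(data["Review"])
X = pad_sequences(tokenizer.texts_to_sequences(
    data["Review"]), maxlen=max_length)
y = data['sentiment'].values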
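To make the behavior of `Tokenizer(num_words=...)` and `pad_sequences(maxlen=...)` concrete, here is a simplified pure-Python re-implementation of their default behavior (a sketch, not the Keras source). Words are ranked by frequency with index 0 reserved for padding, only indices below `num_words` are kept, words the tokenizer never saw are silently dropped, and sequences are padded and truncated at the front. The helper names `fit_word_index`, `to_sequence`, and `pad_pre` are hypothetical.

```python
from collections import Counter

def fit_word_index(texts):
    # Count how often each whitespace-separated word occurs across all texts
    counts = Counter(word for text in texts for word in text.split())
    # Most frequent first; sorted() is stable, so ties keep first-seen order
    ranked = sorted(counts, key=counts.get, reverse=True)
    # Index 0 is reserved for padding, so real words start at 1
    return {word: i for i, word in enumerate(ranked, start=1)}

def to_sequence(text, word_index, num_words):
    # Keep only known words whose index is below num_words
    return [word_index[w] for w in text.split()
            if w in word_index and word_index[w] < num_words]

def pad_pre(seq, maxlen, value=0):
    # Like pad_sequences defaults: keep the last maxlen items, pad on the left
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

word_index = fit_word_index(["food was great", "food was cold", "great service"])
print(word_index)  # {'food': 1, 'was': 2, 'great': 3, 'cold': 4, 'service': 5}

seq = to_sequence("great food arrived cold", word_index, num_words=4)
print(seq)                             # [3, 1]  ('arrived' is unknown, 'cold' has index 4)
print(pad_pre(seq, 5))                 # [0, 0, 0, 3, 1]
print(pad_pre([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```

Two consequences for the notebook: reviews longer than 200 tokens lose their beginning rather than their end, and because `fit_on_texts` runs before the train/test split, the vocabulary is also built from test reviews (fitting on the training rows only would avoid this mild leakage).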

**Splitting the Data**

In [8]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42, stratify=y_train
)
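The two calls above produce roughly a 72% / 8% / 20% train/validation/test split, since the validation set is 10% of the 80% left after the test set is removed. The sketch below reproduces the size arithmetic, assuming scikit-learn's rounding for a float `test_size` (the test portion is rounded up and the remainder goes to the training side), together with the number of batches Keras runs per epoch. The function names are hypothetical.

```python
import math

def split_sizes(n_rows, test_size=0.2, val_size=0.1):
    # First split: the test share is rounded up, the rest is kept for training
    n_test = math.ceil(test_size * n_rows)
    n_rest = n_rows - n_test
    # Second split carves the validation set out of the remaining rows
    n_val = math.ceil(val_size * n_rest)
    return n_rest - n_val, n_val, n_test

def steps_per_epoch(n_train, batch_size=32):
    # The last, possibly smaller, batch still counts as a step
    return math.ceil(n_train / batch_size)

print(split_sizes(1003))  # (721, 81, 201)
```

Read in reverse, the `180/180` in the training log means the training split holds between 5,729 and 5,760 rows, since `ceil(n / 32) == 180` only in that range.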

**Building RNN Model**

In [9]:
model = Sequential([
    Embedding(input_dim=max_features, output_dim=16, input_length=max_length),
    SimpleRNN(64, activation='tanh', return_sequences=False),
    Dense(1, activation='sigmoid')
])

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
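As a sanity check on model size, the trainable parameter count of this architecture can be worked out by hand from the standard per-layer formulas: an embedding table of `input_dim × output_dim`, a SimpleRNN with an input kernel, a recurrent kernel, and a bias (`units × (input_dim + units + 1)`), and a dense layer with weights plus bias. The totals should match what `model.summary()` reports once the model has been built (for example, after training). The helper names are illustrative.

```python
def embedding_params(input_dim, output_dim):
    # One output_dim-sized vector per vocabulary index
    return input_dim * output_dim

def simple_rnn_params(input_dim, units):
    # Input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

layers = {
    'Embedding': embedding_params(5000, 16),
    'SimpleRNN': simple_rnn_params(16, 64),
    'Dense': dense_params(64, 1),
}
print(layers)                # {'Embedding': 80000, 'SimpleRNN': 5184, 'Dense': 65}
print(sum(layers.values()))  # 85249
```

Most of the parameters sit in the embedding table, so the vocabulary size (`max_features`) dominates the model's size here.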



**Training and Evaluating Model**

In [10]:
history = model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=1
)

score = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {score[1]:.2f}")

Epoch 1/5
180/180 ━━━━━━━━━━━━━━━━━━━━ 10s 40ms/step - accuracy: 0.7223 - loss: 0.6027 - val_accuracy: 0.7156 - val_loss: 0.5989
Epoch 2/5
180/180 ━━━━━━━━━━━━━━━━━━━━ 11s 46ms/step - accuracy: 0.7188 - loss: 0.5968 - val_accuracy: 0.7156 - val_loss: 0.5972
Epoch 3/5
180/180 ━━━━━━━━━━━━━━━━━━━━ 6s 33ms/step - accuracy: 0.7131 - loss: 0.5998 - val_accuracy: 0.7156 - val_loss: 0.5966
Epoch 4/5
180/180 ━━━━━━━━━━━━━━━━━━━━ 8s 44ms/step - accuracy: 0.7170 - loss: 0.5958 - val_accuracy: 0.7156 - val_loss: 0.5971
Epoch 5/5
180/180 ━━━━━━━━━━━━━━━━━━━━ 6s 34ms/step - accuracy: 0.7187 - loss: 0.5940 - val_accuracy: 0.7156 - val_loss: 0.5960
Test accuracy: 0.72
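Validation accuracy stays at exactly 0.7156 in every epoch, which is what a model that always predicts one class would score if 71.56% of the validation labels belonged to that class. A quick check is to compare the test accuracy against a majority-class baseline. The sketch below is a minimal pure-Python version; the function name is hypothetical, the labels are toy values, and in the notebook it would be called with `y_train` and `y_test`.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    # Predict the most common training label for every test example
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)

# Toy labels for illustration only (not the Swiggy data)
print(majority_baseline_accuracy([1, 1, 1, 0], [1, 0, 1, 1]))  # 0.75
```

If this baseline comes out at about 0.72 on the notebook's test split, the RNN's 72% test accuracy is no better than always predicting the majority label.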


**Predicting Sentiment**

In [11]:
def predict_sentiment(review_text):
    text = review_text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)

    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=max_length)

    prediction = model.predict(padded)[0][0]
    return f"{'Positive' if prediction >= 0.5 else 'Negative'} (Probability: {prediction:.2f})"


sample_review = "The food was great."
print(f"Review: {sample_review}")
print(f"Sentiment: {predict_sentiment(sample_review)}")

Review: The food was great.
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 165ms/step
Sentiment: Positive (Probability: 0.71)
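`predict_sentiment` calls `model.predict` once per review. When scoring many reviews, it is usually cheaper to clean, tokenize, and pad them together and make a single `predict` call. The hypothetical helper below handles only the last step: turning the `(n, 1)` array of sigmoid outputs into labels with the same 0.5 threshold and output format as `predict_sentiment`. In the notebook its input would come from something like `model.predict(pad_sequences(tokenizer.texts_to_sequences(cleaned_reviews), maxlen=max_length), verbose=0)`, where `cleaned_reviews` is an assumed list of already-cleaned strings.

```python
def probabilities_to_labels(batch_output, threshold=0.5):
    # batch_output has one row per review and a single sigmoid probability per row
    results = []
    for row in batch_output:
        p = float(row[0])
        label = 'Positive' if p >= threshold else 'Negative'
        results.append(f"{label} (Probability: {p:.2f})")
    return results

print(probabilities_to_labels([[0.71], [0.30], [0.5]]))
# ['Positive (Probability: 0.71)', 'Negative (Probability: 0.30)', 'Positive (Probability: 0.50)']
```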


**Conclusion**

In this project, we developed a Sentiment Analysis model to automatically classify Swiggy customer reviews as positive or negative using a Recurrent Neural Network (RNN) architecture. The primary goal was to analyze customer feedback and understand satisfaction levels based on textual reviews.

The dataset was first cleaned and preprocessed by converting text to lowercase, removing special characters, and tokenizing the reviews. Each review was then converted into a numerical sequence and padded to a fixed length of 200 tokens for input into the RNN model. Sentiment labels were derived from the `Avg Rating` column: rows with an average rating greater than 3.5 were labeled positive and the rest negative.

The RNN model consisted of an Embedding layer that maps each of the 5,000 vocabulary indices to a 16-dimensional vector, a SimpleRNN layer with 64 units, and a single-unit Dense output layer with a sigmoid activation. It was trained for 5 epochs with a batch size of 32, using the Adam optimizer and binary cross-entropy loss.
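To show what the SimpleRNN layer computes, here is a minimal NumPy sketch of its forward pass for one review. Each embedded token updates a hidden state via `h_t = tanh(x_t W + h_prev U + b)`, only the final hidden state is kept (as with `return_sequences=False`), and the dense layer maps it to a probability with a sigmoid. The weights are random placeholders, not trained values, so the printed probability is meaningless; only the shapes and the recurrence matter here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, units, seq_len = 5000, 16, 64, 200

# Random placeholders standing in for trained weights
embeddings = rng.normal(scale=0.05, size=(vocab_size, embed_dim))
W = rng.normal(scale=0.05, size=(embed_dim, units))   # input kernel
U = rng.normal(scale=0.05, size=(units, units))       # recurrent kernel
b = np.zeros(units)
w_out = rng.normal(scale=0.05, size=(units, 1))       # Dense weights
b_out = np.zeros(1)

def forward(token_ids):
    x = embeddings[token_ids]            # (seq_len, embed_dim)
    h = np.zeros(units)                  # initial state is all zeros
    for x_t in x:                        # one step per token, left to right
        h = np.tanh(x_t @ W + h @ U + b)
    logit = h @ w_out + b_out            # only the final state reaches Dense
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

# A front-padded sequence: zeros first, then three made-up token ids
ids = np.array([0] * (seq_len - 3) + [12, 7, 431])
p = forward(ids)
print(p.shape, float(p[0]))
```

Because the notebook's `Embedding` layer is created without `mask_zero=True`, padding index 0 passes through the recurrence like any other token. With front padding, the real words are at least the last steps before the output, which matters for a SimpleRNN whose memory of early steps fades quickly.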

After training, the model achieved the following performance metrics:

Training Accuracy: ~72%

Validation Accuracy: ~71.6%

Test Accuracy: 72%

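These results show that the model reaches about 72% accuracy, but they should be read with caution. Validation accuracy is exactly 0.7156 in all five epochs and the training loss barely moves (from about 0.60 to 0.59), which is consistent with a model that outputs nearly the same prediction for every review. Comparing test accuracy against a majority-class baseline would show how much the RNN actually learned from the review text.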
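One way to read the logged losses: if a fraction `p` of the labels is positive, the best a model can do by outputting the same probability for every review is to output `p`, which gives a binary cross-entropy equal to the label entropy `-(p ln p + (1 - p) ln(1 - p))`. The sketch below evaluates this for `p = 0.7156`, a value assumed here from the constant validation accuracy rather than measured from the data.

```python
import math

def constant_predictor_bce(p):
    # Mean binary cross-entropy when every prediction equals the positive rate p
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

print(round(constant_predictor_bce(0.7156), 3))  # 0.597
```

This is very close to the logged `val_loss` of 0.5960, which, together with the flat validation accuracy, suggests the model's outputs vary little from one review to the next.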

A custom prediction function was also implemented to test new reviews. For example:

Input Review: “The food was great.”
Predicted Sentiment: Positive (Probability: 0.71)

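The model labels this unseen review as positive, which matches its wording. A single example with a probability of 0.71 is limited evidence, however; scoring a set of clearly negative reviews as well would show whether the model actually separates the two classes.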

Overall, the project demonstrates an end-to-end workflow for applying a Recurrent Neural Network to a Natural Language Processing (NLP) task such as sentiment analysis. Future improvements could involve using more advanced architectures such as LSTM or Bidirectional LSTM, integrating pretrained word embeddings (e.g., GloVe or Word2Vec), and performing hyperparameter tuning to improve prediction accuracy and robustness.
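For the pretrained-embedding idea, a common pattern is to build a matrix whose row `i` holds the pretrained vector for the word with tokenizer index `i`, leaving zeros for padding and for words missing from the pretrained vocabulary. The sketch below assumes the vectors have already been loaded into a Python dict (for GloVe, by parsing its text file); the toy 3-dimensional vectors and the function name `build_embedding_matrix` are illustrative only.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, num_words, dim):
    # Row 0 stays zero for padding; rows for words without a vector stay zero too
    matrix = np.zeros((num_words, dim))
    for word, i in word_index.items():
        if i >= num_words:
            continue  # the tokenizer drops these indices anyway
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix

word_index = {'food': 1, 'was': 2, 'great': 3, 'biryani': 4}
vectors = {'food': [0.1, 0.2, 0.3], 'great': [0.5, -0.1, 0.0]}
m = build_embedding_matrix(word_index, vectors, num_words=4, dim=3)
print(m)
```

One way to use such a matrix in Keras is `Embedding(max_features, dim, embeddings_initializer=keras.initializers.Constant(matrix), trainable=False)`, where `dim` must match the pretrained vectors (for example 100 for 100-dimensional GloVe) rather than the current 16. Whether to freeze the layer or fine-tune it is a tuning choice.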