### Real-Time Sentiment Analysis for Customer Feedback Using Neural Networks and Streamlit App


**Goal: develop a system that uses a neural network (NN) model to perform sentiment analysis on customer feedback submitted through a web application.**

#### Dataset Loading and Preprocessing

In [1]:
# Install Hugging Face dataset loader
!pip install datasets --quiet

* Installed the Hugging Face datasets library.
* The library provides loaders for many public datasets, including TweetEval, which is used here for sentiment analysis.

In [3]:
# Preprocessing

import re

def preprocess_text(text):
    # Lowercase the text
    text = text.lower()
    
    # Remove URLs
    text = re.sub(r'http\S+|www\S+', '', text)
    
    # Remove mentions and hashtags (the hashtag word itself is dropped, not just the '#')
    text = re.sub(r'@\w+|#\w+', '', text)
    
    # Remove punctuation, emojis and other non-word characters
    text = re.sub(r'[^\w\s]', '', text)
    
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text


This function normalizes tweet text (lowercasing, then stripping URLs, mentions, hashtags, punctuation, and extra whitespace) before it is passed to the models below.
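As a quick check of what these cleaning steps do, the sketch below repeats the function so that it runs on its own and applies it to two made-up tweets. The sample strings are assumptions for illustration, not rows from TweetEval.

```python
import re

def preprocess_text(text):
    """Same cleaning steps as the cell above, repeated so this sketch runs standalone."""
    text = text.lower()
    text = re.sub(r'http\S+|www\S+', '', text)   # URLs
    text = re.sub(r'@\w+|#\w+', '', text)        # mentions and hashtags
    text = re.sub(r'[^\w\s]', '', text)          # punctuation, emojis, other non-word characters
    return re.sub(r'\s+', ' ', text).strip()     # collapse extra whitespace

# Hypothetical inputs, for illustration only
samples = [
    "@user Loving the new #iPhone!! Check https://t.co/abc123 \U0001F60D",
    "I DON'T like this update...",
]
cleaned = [preprocess_text(s) for s in samples]
for out in cleaned:
    print(repr(out))
# 'loving the new check'
# 'i dont like this update'
```

Two side effects are visible here: the hashtag word (`iphone`) disappears along with the `#`, and `don't` becomes `dont`. Both may remove signal that matters for sentiment, so they are worth keeping in mind when reading the results.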

In [4]:
# Load TweetEval Sentiment Dataset
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("tweet_eval", "sentiment")

# Apply preprocessing to all splits
for split in ['train', 'validation', 'test']:
    dataset[split] = dataset[split].map(lambda x: {"text": preprocess_text(x["text"])})


* Loaded the TweetEval dataset using Hugging Face's datasets library.

* tweet_eval is a benchmark of tweet classification tasks; its sentiment configuration labels each tweet as 0=Negative, 1=Neutral, or 2=Positive.

* Applied preprocessing to train, validation and test splits

In [5]:
# Convert to pandas DataFrames
df_train = dataset["train"].to_pandas()
df_val = dataset["validation"].to_pandas()
df_test = dataset["test"].to_pandas()

* Converted the dataset splits (train, validation, test) to pandas DataFrames for easier manipulation.

In [6]:
# Map numerical labels to text
label_map = {0: "Negative", 1: "Neutral", 2: "Positive"}
df_train["sentiment"] = df_train["label"].map(label_map)
df_val["sentiment"] = df_val["label"].map(label_map)
df_test["sentiment"] = df_test["label"].map(label_map)


* Adds a new sentiment column with human-readable labels using a mapping dictionary.

In [7]:
# View sample data
df_train.head()

|   | text | label | sentiment |
|---|------|-------|-----------|
| 0 | qt in the original draft of the 7th book remus... | 2 | Positive |
| 1 | ben smith smith concussion remains out of the ... | 1 | Neutral |
| 2 | sorry bout the stream last night i crashed out... | 1 | Neutral |
| 3 | chase headleys rbi double in the 8th inning of... | 1 | Neutral |
| 4 | alciato bee will invest 150 million in january... | 2 | Positive |


* Displays the first few rows of the training dataset to verify structure and content.

### 1. LSTM-BASED SENTIMENT CLASSIFIER

In [12]:
# Import Required Libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from sklearn.metrics import classification_report


* Loads the libraries needed for:

  * Text tokenization & padding (Tokenizer, pad_sequences)

  * LSTM model building (Sequential, Embedding, LSTM, Dense, Dropout)

  * Evaluation metrics (classification_report)

In [9]:
# Tokenize and Pad the text
# Parameters
vocab_size = 20000
max_len = 100

# Tokenization
tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>')
tokenizer.fit_on_texts(df_train['text'])

X_train = tokenizer.texts_to_sequences(df_train['text'])
X_val = tokenizer.texts_to_sequences(df_val['text'])
X_test = tokenizer.texts_to_sequences(df_test['text'])

# Padding
X_train = pad_sequences(X_train, maxlen=max_len, padding='post')
X_val = pad_sequences(X_val, maxlen=max_len, padding='post')
X_test = pad_sequences(X_test, maxlen=max_len, padding='post')

y_train = df_train['label']
y_val = df_val['label']
y_test = df_test['label']


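* Tokenization: Converts text into sequences of integers, keeping roughly the 20,000 most frequent words seen in the training split.

* Padding: Ensures all sequences are the same length (100) for model input by appending zeros (padding='post'); longer sequences are truncated.

* `<OOV>` token: Words outside that vocabulary are mapped to a single out-of-vocabulary index instead of being dropped.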
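To make the index mapping concrete, here is a toy re-implementation in plain Python. It is a sketch of the idea, not the Keras code: the helper names `fit_vocab` and `to_padded` are hypothetical, and the `num_words` cutoff is left out. It does follow two Keras conventions: index 0 is reserved for padding and index 1 goes to the OOV token, and `pad_sequences` by default truncates from the front of a sequence.

```python
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # 0 is reserved for padding, 1 for the OOV token

def fit_vocab(texts):
    """Rank words by frequency; the most frequent word gets index 2."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=2)}

def to_padded(texts, vocab, max_len):
    """Map words to indices, keep the last max_len tokens, then post-pad with zeros."""
    rows = []
    for text in texts:
        ids = [vocab.get(word, OOV_ID) for word in text.split()]
        ids = ids[-max_len:]                               # truncate from the front
        rows.append(ids + [PAD_ID] * (max_len - len(ids)))  # padding='post'
    return rows

vocab = fit_vocab(["good phone", "bad phone bad battery"])
padded = to_padded(["good camera", "bad phone bad battery good"], vocab, max_len=4)
print(vocab)   # {'phone': 2, 'bad': 3, 'good': 4, 'battery': 5}
print(padded)  # [[4, 1, 0, 0], [2, 3, 5, 4]]
```

The unseen word `camera` maps to the OOV index 1, the short text is padded with trailing zeros, and the five-word text loses its first token.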

**Checking Class Distribution (for class weights)**

In [10]:
from sklearn.utils import class_weight

# Compute class weights for imbalance handling
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(y_train),
    y=y_train
)
class_weights = dict(enumerate(class_weights))
print("Class Weights:", class_weights)


Class Weights: {0: np.float64(2.14366276610743), 1: np.float64(0.7355004111643206), 2: np.float64(0.8518684520141184)}
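The 'balanced' setting weights each class by `n_samples / (n_classes * n_c)`, where `n_c` is the number of training rows in class `c`. The sketch below applies that formula by hand. The per-class counts are back-computed from the printed weights and the 45,615 training rows, not read from the dataset, so treat them as derived values.

```python
# Class counts implied by the printed weights (back-computed, not read from the dataset)
counts = {0: 7093, 1: 20673, 2: 17849}   # Negative, Neutral, Positive

n_samples = sum(counts.values())
n_classes = len(counts)

# 'balanced' heuristic: rarer classes get proportionally larger weights
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}

for c, w in weights.items():
    print(c, round(w, 4))
# 0 2.1437
# 1 0.7355
# 2 0.8519
```

Negative is the rarest class in the training split, so each Negative example contributes roughly three times as much to the loss as a Neutral one.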


**LSTM Model Building**

In [11]:
model_lstm = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128),
    LSTM(128, return_sequences=True),
    Dropout(0.5),
    LSTM(64),
    Dropout(0.5),
    Dense(3, activation='softmax')
])

model_lstm.build(input_shape=(None, max_len))
model_lstm.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_lstm.summary()



* Embedding Layer: Converts word indices to dense vectors of fixed size (output_dim=128).

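* LSTM Layers: Two stacked Long Short-Term Memory layers for sequential dependencies. The first (128 units, return_sequences=True) passes its full output sequence to the second (64 units), which returns only its final output.

* Dropout: Reduces overfitting by randomly zeroing 50% of the preceding layer's outputs during training.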

* Dense Output Layer: 3 output units for the 3 sentiment classes with softmax activation.

* Loss: sparse_categorical_crossentropy used for multi-class classification with integer labels.

* Optimizer: Adam is used for efficient training.
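The output of `model_lstm.summary()` is not reproduced here, so as a rough guide the parameter counts can be worked out by hand. The sketch below uses the standard formulas for Embedding, LSTM (four gates, each with input weights, recurrent weights, and a bias), and Dense layers. The numbers are computed from those formulas, not copied from the notebook.

```python
vocab_size, embed_dim = 20000, 128

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * ((input_dim + units) * units + units)

embedding = vocab_size * embed_dim     # one 128-dim vector per vocabulary index
lstm_1 = lstm_params(embed_dim, 128)   # first LSTM reads the 128-dim embeddings
lstm_2 = lstm_params(128, 64)          # second LSTM reads the first LSTM's 128-dim outputs
dense = 64 * 3 + 3                     # weights plus biases for the 3 classes
total = embedding + lstm_1 + lstm_2 + dense

print(embedding, lstm_1, lstm_2, dense, total)
# 2560000 131584 49408 195 2741187
```

About 93% of the parameters sit in the embedding table, which is trained from scratch on roughly 45,000 tweets.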



In [12]:
print(X_train.shape,y_train.shape)
print(X_val.shape,y_val.shape)
print(X_test.shape,y_test.shape)

(45615, 100) (45615,)
(2000, 100) (2000,)
(12284, 100) (12284,)


**Model Training**

In [13]:
history_lstm = model_lstm.fit(X_train, y_train,
                              validation_data=(X_val, y_val),
                              epochs=5,
                              batch_size=32,
                              class_weight=class_weights)


Epoch 1/5
1426/1426 ━━━━━━━━━━━━━━━━━━━━ 556s 379ms/step - accuracy: 0.3270 - loss: 1.1007 - val_accuracy: 0.4345 - val_loss: 1.0915
Epoch 2/5
1426/1426 ━━━━━━━━━━━━━━━━━━━━ 575s 403ms/step - accuracy: 0.3388 - loss: 1.1016 - val_accuracy: 0.1560 - val_loss: 1.1044
Epoch 3/5
1426/1426 ━━━━━━━━━━━━━━━━━━━━ 575s 403ms/step - accuracy: 0.3111 - loss: 1.1028 - val_accuracy: 0.4345 - val_loss: 1.0959
Epoch 4/5
1426/1426 ━━━━━━━━━━━━━━━━━━━━ 589s 413ms/step - accuracy: 0.3698 - loss: 1.0991 - val_accuracy: 0.4095 - val_loss: 1.1021
Epoch 5/5
1426/1426 ━━━━━━━━━━━━━━━━━━━━ 652s 457ms/step - accuracy: 0.2628 - loss: 1.1038 - val_accuracy: 0.4345 - val_loss: 1.0962


* Trains for 5 epochs with a batch size of 32, passing the class weights computed above so that errors on the rarer Negative class count more.

* Validation data is used to evaluate model performance after each epoch.
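Two numbers in the training log can be sanity-checked with a little arithmetic. The sketch below recomputes the number of batches per epoch and compares the logged training loss with the cross-entropy of a classifier that guesses uniformly among three classes, which is ln 3. The batch size of 32 for prediction is an assumption based on the Keras default, since the notebook does not set it.

```python
import math

train_rows, test_rows, batch_size = 45615, 12284, 32
steps_per_epoch = math.ceil(train_rows / batch_size)   # batches per training epoch
predict_steps = math.ceil(test_rows / batch_size)      # batches when predicting on the test set

# Cross-entropy of a classifier that assigns probability 1/3 to each of 3 classes
uniform_loss = -math.log(1 / 3)

logged_losses = [1.1007, 1.1016, 1.1028, 1.0991, 1.1038]  # training loss per epoch, from the log
max_gap = max(abs(loss - uniform_loss) for loss in logged_losses)

print(steps_per_epoch, predict_steps)             # 1426 384
print(round(uniform_loss, 4), round(max_gap, 4))  # 1.0986 0.0052
```

The step counts match the progress bars. The loss never moves more than about 0.005 away from ln 3, which suggests the network is outputting near-uniform probabilities and has not learned anything useful from the inputs. The balanced class weights average to 1 over the training set, so the same baseline applies to the weighted loss.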

**Evaluation on Test Data**

In [14]:
y_pred_probs = model_lstm.predict(X_test)
y_pred = np.argmax(y_pred_probs, axis=1)

print(classification_report(y_test, y_pred, target_names=['Negative', 'Neutral', 'Positive'], zero_division=0))


384/384 ━━━━━━━━━━━━━━━━━━━━ 58s 150ms/step
              precision    recall  f1-score   support

    Negative       0.00      0.00      0.00      3972
     Neutral       0.48      1.00      0.65      5937
    Positive       0.00      0.00      0.00      2375

    accuracy                           0.48     12284
   macro avg       0.16      0.33      0.22     12284
weighted avg       0.23      0.48      0.31     12284



* Model prediction on test data.

* np.argmax() gets the predicted class labels.

* classification_report() shows precision, recall, F1-score, and support for each class.

**The LSTM Model is Severely Underperforming.**

* **Accuracy = 48%**, which equals the share of Neutral tweets in the test set (5,937 of 12,284).

* The model **predicts only the "Neutral" class** (label 1) for all inputs.

* **Precision/Recall/F1 for Negative and Positive = 0.00** → they’re not being predicted at all.

* **Macro Avg F1 = 0.22** → very poor overall performance.
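The figures in the report are what a constant "always Neutral" predictor would score. As a check, the sketch below recomputes them from the class supports alone, assuming every test tweet is labeled Neutral. It uses plain Python rather than scikit-learn, and the helper name `prf` is hypothetical.

```python
support = {"Negative": 3972, "Neutral": 5937, "Positive": 2375}
total = sum(support.values())

def prf(label, predicted="Neutral"):
    """Precision, recall and F1 for one class when every sample is predicted as `predicted`."""
    true_pos = support[label] if label == predicted else 0
    n_predicted = total if label == predicted else 0
    precision = true_pos / n_predicted if n_predicted else 0.0   # mirrors zero_division=0
    recall = true_pos / support[label]
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

scores = {label: prf(label) for label in support}
accuracy = support["Neutral"] / total
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
weighted_f1 = sum(scores[label][2] * support[label] for label in support) / total

print(round(accuracy, 2), round(scores["Neutral"][2], 2))  # 0.48 0.65
print(round(macro_f1, 2), round(weighted_f1, 2))           # 0.22 0.31
```

All four values agree with the classification report, so the 48% accuracy reflects the class distribution of the test set rather than anything the model learned.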


### LSTM Model Limitations & Observations

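Although I applied text preprocessing and class weighting to reduce the impact of class imbalance, the LSTM model failed to learn meaningful patterns. The training loss barely moved from about 1.10 across the five epochs, and during evaluation the model predicted Neutral for essentially every test sample, giving zero recall and F1-score for both the Positive and Negative classes.

This may be due to:
- Class imbalance in the TweetEval dataset
- LSTM's limited ability to capture context in short, noisy tweet data
- Shallow architecture or limited training time (epochs)
- Post-padding without masking: sequences are padded to 100 tokens with `padding='post'` and the Embedding layer does not set `mask_zero=True`, so for a short tweet the LSTM processes a long run of padding after the last real word (a possible cause, not verified here)

To address this, I switched to a transformer-based **BERT model (`bert-base-uncased`)** and fine-tuned it on a subset of the same dataset (the first 100 training batches of 16 tweets, for one epoch). BERT captures bidirectional context and, because it is pretrained on large text corpora, tends to handle slang and informal language better than an LSTM trained from scratch. On the test set it reached higher accuracy (65% vs. 48%) and spread its predictions across all three classes.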


### 2. BERT-Based Sentiment Classifier (with Hugging Face)

In [8]:
# Import Libraries and Load Tokenizer

from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')




* Loads the **pretrained BERT tokenizer**.

* bert-base-uncased is the uncased checkpoint: its tokenizer lowercases input before WordPiece tokenization, which matches the lowercased preprocessing applied earlier.



**Encode the Text Data**

In [9]:
# Tokenize the text and truncate/pad to max length
def encode_texts(texts, labels):
    encodings = tokenizer(texts.tolist(), truncation=True, padding=True, max_length=128, return_tensors='tf')
    dataset = tf.data.Dataset.from_tensor_slices((
        dict(encodings),
        labels
    ))
    return dataset

train_dataset = encode_texts(df_train['text'], df_train['label']).batch(16)
val_dataset = encode_texts(df_val['text'], df_val['label']).batch(16)
test_dataset = encode_texts(df_test['text'], df_test['label']).batch(16)



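* Converts texts into token ID sequences with attention masks. padding=True pads to the longest sequence in the split, and truncation=True with max_length=128 caps sequences at 128 tokens.

* Creates TensorFlow datasets for train, val, and test, batched in groups of 16.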
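To show what "token IDs with attention masks" looks like, here is a toy sketch in plain Python. It is not the Hugging Face tokenizer: the helper `pad_batch` is hypothetical and the word-piece IDs in the example are made up. The special-token IDs (`[PAD]`=0, `[CLS]`=101, `[SEP]`=102) are the ones used by the `bert-base-uncased` vocabulary. The real tokenizer also returns `token_type_ids`, which are omitted here.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # special-token IDs in the bert-base-uncased vocabulary

def pad_batch(token_ids, max_length=128):
    """Wrap each sequence in [CLS]/[SEP], truncate, then pad to the longest sequence."""
    wrapped = [[CLS_ID] + ids[:max_length - 2] + [SEP_ID] for ids in token_ids]
    longest = max(len(ids) for ids in wrapped)
    input_ids = [ids + [PAD_ID] * (longest - len(ids)) for ids in wrapped]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in wrapped]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up word-piece IDs for two short texts
batch = pad_batch([[2204, 3042], [6659, 3042, 2009, 2003]])
print(batch["input_ids"])       # [[101, 2204, 3042, 102, 0, 0], [101, 6659, 3042, 2009, 2003, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]]
```

Unlike the LSTM pipeline, the padding positions are explicitly flagged, so the model can exclude them from attention.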

**Load and Compile the BERT Model**

In [10]:
model_bert = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metrics = ['accuracy']

model_bert.compile(optimizer=optimizer, loss=loss, metrics=metrics)
model_bert.summary()

model_bert.fit(train_dataset.take(100), validation_data=val_dataset.take(30), epochs=1)



All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bert (TFBertMainLayer)      multiple                  109482240 
                                                                 
 dropout_37 (Dropout)        multiple                  0         
                                                                 
 classifier (Dense)          multiple                  2307      
                                                                 
Total params: 109,484,547
Trainable params: 109,484,547
Non-trainable params: 0
_________________________________________________________________


<keras.callbacks.History at 0x107f42ce050>

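* Loads BERT with a classification head for 3 labels: Negative, Neutral, and Positive. The head (classifier) is newly initialized, as the warning above notes, so it must be trained before use.
  
* Uses a low learning rate (5e-5), a common choice for fine-tuning BERT.

* from_logits=True is needed because the model outputs raw scores (logits) rather than softmax probabilities.

* Fine-tunes for 1 epoch on the first 100 training batches (1,600 tweets) and validates on the first 30 validation batches (480 tweets), not on the full splits.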
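The relationship between logits, probabilities, and the loss can be sketched in a few lines of plain Python. This is an illustration of the math that `SparseCategoricalCrossentropy(from_logits=True)` is built on, not the TensorFlow implementation, and the logit values are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for an integer label: log-sum-exp of the logits minus the true-class logit."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 0.5, -1.0]   # hypothetical scores for Negative, Neutral, Positive
probs = softmax(logits)
loss = sparse_ce_from_logits(logits, label=0)
pred = max(range(len(logits)), key=lambda i: logits[i])

print([round(p, 3) for p in probs])
print(round(loss, 4), pred)
```

Softmax preserves the ordering of the scores, so taking the argmax of the logits gives the same class as taking the argmax of the probabilities.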

**Predict on Test Set**

In [11]:
logits = model_bert.predict(test_dataset).logits
y_pred = tf.argmax(logits, axis=1).numpy()




* Generates logits for each test sample and converts to class predictions.



**Evaluate on Test Set**

In [12]:
from sklearn.metrics import classification_report
print(classification_report(df_test['label'], y_pred, target_names=['Negative', 'Neutral', 'Positive']))


              precision    recall  f1-score   support

    Negative       0.61      0.79      0.69      3972
     Neutral       0.72      0.58      0.64      5937
    Positive       0.62      0.61      0.62      2375

    accuracy                           0.65     12284
   macro avg       0.65      0.66      0.65     12284
weighted avg       0.66      0.65      0.65     12284



**Observations:**
  
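* Accuracy improved to 65% (vs. 48% with the LSTM), and macro-average F1 rose from 0.22 to 0.65.

* Recall for Negative and Positive rose from 0.00 to 0.79 and 0.61. Neutral recall fell from 1.00 to 0.58, because the model no longer assigns every tweet to Neutral.

* The model is much more balanced in its predictions.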

