# Sentiment-Based Review Logger

This project is an AI-powered sentiment analysis system that classifies user-submitted reviews and logs negative feedback for further analysis. Built with **PyTorch**, **TorchText**, and **spaCy tokenization**, it prompts users for their name and product review, then classifies the input with a **sentiment classification model trained earlier in the notebook**.

## Key Features:
- Interactive User Input – Prompts users for their name and review in a Jupyter Notebook environment.
- Sentiment Classification – Uses a small PyTorch model (an `EmbeddingBag` layer followed by a linear layer) to categorize reviews as Negative, Positive, or Neutral.
- Automated Review Logging – If a review is classified as negative, it is automatically appended to a CSV file (`negative_reviews.csv`) with `Name` and `Review` columns.
- Data Handling with pandas – Uses pandas to load the training data and write the review log, so recurring negative feedback can be analyzed later.
- Customizable File Path – The `filename` argument of `save_negative_review` accepts any path, so the CSV file can be kept in a central directory.

## Use Cases:
- Businesses can track **customer dissatisfaction** trends to improve their products.
- AI researchers can **fine-tune NLP models** using real-world feedback.
- Developers can **extend the project** to include real-time review monitoring in web applications.

## Dependencies
- torch
- torchtext
- spacy ('en_core_web_sm' language model)
- pandas

Author: Tamunowunari-Tasker Anointing

In [1]:
# Importing necessary libraries from PyTorch and TorchText
import torchtext.data.utils as tdu  # For tokenization utilities
import torchtext.vocab as tv        # For vocabulary building
import torch.nn as nn               # For neural network components like Embedding
import torch                        # For tensor operations
import torch.optim as optim         # For optimizers such as Adam
import torch.nn.functional as F     # For functional ops such as softmax (not used in the cells below)

In [2]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('Dataset.csv')  # Replace with your actual filename

# Convert DataFrame to list of tuples format (text, label)
dataset = list(zip(df['Statement'], df['Sentiment']))

print(f"✓ Loaded {len(dataset)} samples from CSV")
print(f"  - Positive: {sum(1 for _, label in dataset if label == 1)}")
print(f"  - Negative: {sum(1 for _, label in dataset if label == 0)}")
print(f"  - Neutral: {sum(1 for _, label in dataset if label == 2)}")

✓ Loaded 324 samples from CSV
  - Positive: 109
  - Negative: 110
  - Neutral: 105
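
The per-class counts above can also be computed in a single pass, which makes it easier to spot unexpected label values. A minimal sketch, assuming the same (text, label) tuple layout and the label coding 0 = Negative, 1 = Positive, 2 = Neutral used by `sentiment_map` later in the notebook; `label_counts` and the sample rows are hypothetical:

```python
from collections import Counter

LABEL_NAMES = {0: "Negative", 1: "Positive", 2: "Neutral"}  # same coding as sentiment_map

def label_counts(samples):
    """Count samples per label and collect any labels outside the expected set."""
    counts = Counter(label for _, label in samples)
    unexpected = sorted(label for label in counts if label not in LABEL_NAMES)
    return counts, unexpected

# Hypothetical rows in the same (text, label) layout as `dataset`
sample_rows = [("Great job", 1), ("This is bad", 0), ("It is okay", 2), ("Awful", 0)]
counts, unexpected = label_counts(sample_rows)
for label, name in LABEL_NAMES.items():
    print(f"{name}: {counts[label]}")
print("Unexpected labels:", unexpected)
```

An empty "unexpected" list suggests every row carries one of the three known labels.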


In [3]:
# Initialize the tokenizer using SpaCy's small English model
tokenizer = tdu.get_tokenizer('spacy', language='en_core_web_sm')  # get_tokenizer() is a torchtext helper

In [4]:
# Function to yield tokens from each sentence in the dataset
def yield_tokens(data_iter):
    """
    Tokenizes each sentence in the dataset.
    
    Args:
        data_iter (iter): An iterator over the dataset.
        
    Yields:
        list: A list of tokens for each sentence.
    """
    for sentence, _ in data_iter:
        # yield hands back one token list at a time and resumes here on the next request
        yield tokenizer(sentence)

# Create an iterator over the dataset.
# iter() is a Python built-in that returns an iterator over a list, tuple, or other iterable.
# Note: data_iter is not used below; the vocabulary is built from `dataset` directly.
data_iter = iter(dataset)
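
One detail worth keeping in mind: an iterator returned by `iter()` can be consumed only once, whereas a list can be traversed repeatedly. The sketch below illustrates this with a stand-in whitespace tokenizer (an assumption made to keep the example free of spaCy); `toy_dataset` and `toy_yield_tokens` are hypothetical names:

```python
toy_dataset = [("This is good", 1), ("Not okay", 0)]

def toy_yield_tokens(data_iter):
    """Yield one token list per (sentence, label) pair."""
    for sentence, _ in data_iter:
        yield sentence.split()  # stand-in for the spaCy tokenizer

toy_iter = iter(toy_dataset)
first_pass = list(toy_yield_tokens(toy_iter))     # consumes the iterator
second_pass = list(toy_yield_tokens(toy_iter))    # nothing left to yield
from_list = list(toy_yield_tokens(toy_dataset))   # a list can be re-read every time

print(first_pass)
print(second_pass)
print(from_list)
```

Passing the list itself to a token generator therefore avoids silently getting an empty result on a second pass.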

In [5]:
# Build vocabulary from the tokenized dataset
vocab = tv.build_vocab_from_iterator(
    yield_tokens(dataset),
    specials=["<unk>"]
)

# Set <unk> as the default token for unknown words
vocab.set_default_index(vocab["<unk>"])
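
Conceptually, the vocabulary is a token-to-index mapping with a fallback index for unseen tokens. The following pure-Python sketch mimics that behaviour under simplifying assumptions: special tokens come first, followed by the remaining tokens in descending frequency, with ties broken alphabetically. It is not torchtext's implementation, and `build_toy_vocab` and `lookup` are hypothetical names:

```python
from collections import Counter

def build_toy_vocab(token_lists, specials=("<unk>",)):
    """Return (itos, stoi): specials first, then tokens by descending frequency."""
    counter = Counter(tok for tokens in token_lists for tok in tokens)
    for special in specials:
        counter.pop(special, None)  # specials get fixed positions at the front
    ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi

def lookup(stoi, tokens, unk="<unk>"):
    """Map tokens to indices, falling back to the <unk> index for unseen tokens."""
    return [stoi.get(tok, stoi[unk]) for tok in tokens]

itos, stoi = build_toy_vocab([["This", "is", "good"], ["This", "is", "bad"]])
print(itos)                                       # ['<unk>', 'This', 'is', 'bad', 'good']
print(lookup(stoi, ["This", "is", "terrible"]))   # [1, 2, 0]
```

The last lookup shows the effect of the default index: "terrible" was never seen, so it maps to index 0, the position of `<unk>`.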

In [6]:
# Display the vocabulary as a list of words (index-to-string mapping)
print(vocab.get_itos()) # get_itos() returns the index-to-token list: position i holds the token with index i

['<unk>', 'This', 'is', 'this', 'I', 'I‚Äôm', 'work', 'Very', 'done', 'Not', 'okay', 'bad', 'It', 'job', 'It‚Äôs', 'does', 'experience', 'quality', 'result', 'the', 'works', 'Could', 'good', '-', 'disappointing', 'well', 'Absolutely', 'Acceptable', 'Average', 'Awful', 'Brilliant', 'Clean', 'Completely', 'Excellent', 'Extremely', 'Fair', 'Great', 'Highly', 'Just', 'Low', 'Major', 'Mediocre', 'Neither', 'Neutral', 'Nicely', 'No', 'Nothing', 'Outstanding', 'Poorly', 'Really', 'Sloppy', 'So', 'Standard', 'Superb', 'Top', 'Unacceptable', 'Well', 'about', 'amazing', 'and', 'annoying', 'badly', 'broken', 'built', 'decent', 'dislike', 'either', 'elegant', 'enough', 'exceeded', 'execution', 'expectations', 'failed', 'fantastic', 'feelings', 'fine', 'frustrating', 'go', 'implementation', 'impressed', 'impressive', 'indifferent', 'letdown', 'love', 'makes', 'my', 'no', 'nor', 'not', 'notch', 'outcome', 'passable', 'perfectly', 'recommended', 'regret', 'remarkable', 'satisfied', 'sense', 'so', 'sp

In [7]:
# Convert sentences to token indices
input_indexes = lambda data: [torch.tensor(vocab(tokenizer(sentence))) for sentence, _ in data]

In [8]:
# Convert dataset sentences to token indices
index = input_indexes(dataset) # index holds one tensor of token indices per sample
print("Token indices:", index)

Token indices: [tensor([  1,   2,  22, 111]), tensor([124, 111]), tensor([ 9, 22]), tensor([  4, 133,   3, 111]), tensor([114,   6]), tensor([ 1,  2, 11]), tensor([ 21, 119, 116, 117]), tensor([ 21, 121, 119, 116, 117]), tensor([ 12, 113,  10]), tensor([127,  10]), tensor([128,   4, 118]), tensor([129,  24]), tensor([114,  13]), tensor([126]), tensor([  1, 122,   2,  25, 137]), tensor([  1, 136, 112, 123, 143, 130,  23, 139, 120]), tensor([  4, 132, 121, 112,   3, 122]), tensor([  4, 118, 120, 113,  10]), tensor([  4, 140, 134, 142, 138, 141, 112,   3]), tensor([  1,   2, 131, 135]), tensor([115]), tensor([  9, 115]), tensor([ 9, 11]), tensor([125]), tensor([36,  6]), tensor([  1,   2, 101]), tensor([14, 75]), tensor([33, 13]), tensor([ 7, 24]), tensor([46, 99]), tensor([ 7, 80]), tensor([ 4, 65,  3]), tensor([28, 16]), tensor([ 4, 83,  3]), tensor([48,  8]), tensor([12, 20]), tensor([ 1,  2, 73]), tensor([ 1, 15, 88,  6]), tensor([ 45, 100,  74]), tensor([56,  8]), tensor([  4,  94, 1

In [9]:
# Extract sentiment labels
labels = torch.tensor([label for _, label in dataset])

In [10]:
# Set embedding dimensions (number of features each word vector will have)
embedding_dim = 3 # Each word will be represented in a 3D vector space

# Number of unique words in the vocabulary
n_embedding = len(vocab)

In [11]:
# Initialize EmbeddingBag layer to get sentence-level embeddings
embedding_bag = nn.EmbeddingBag(num_embeddings=n_embedding, embedding_dim=embedding_dim, mode='mean')
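
With `mode='mean'`, `EmbeddingBag` averages the embedding vectors of the tokens in each bag, producing one fixed-size vector per sentence regardless of its length. A small check of that behaviour, assuming a toy vocabulary of 10 tokens and made-up token indices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible random embedding weights
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='mean')

tokens = torch.tensor([1, 2, 5, 7, 7])  # two sentences, flattened into one tensor
offsets = torch.tensor([0, 3])          # sentence 1 = tokens[0:3], sentence 2 = tokens[3:5]

pooled = bag(tokens, offsets)           # one 3-dimensional vector per sentence

# Same result computed by hand: look up each token, then average per sentence
manual = torch.stack([
    F.embedding(tokens[0:3], bag.weight).mean(dim=0),
    F.embedding(tokens[3:5], bag.weight).mean(dim=0),
])
matches = torch.allclose(pooled, manual, atol=1e-6)
print(pooled.shape)
print(matches)
```

Averaging discards word order, so the classifier built on top of it treats each review as an unordered bag of words.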

In [12]:
# Prepare flattened token indices and offsets for EmbeddingBag
index_flat = torch.cat(index)
offsets = torch.tensor([0] + [len(sample) for sample in index[:-1]]).cumsum(0)
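
`EmbeddingBag` takes all token indices as one flat tensor, plus an `offsets` tensor marking where each sample starts. Each offset is the running sum of the lengths of the preceding samples. A minimal sketch of the same arithmetic in plain Python, using the first three samples from the token-index output above:

```python
from itertools import accumulate

samples = [[1, 2, 22, 111], [124, 111], [9, 22]]  # first three samples printed above

# Flatten all samples into one sequence (the role of torch.cat)
flat = [idx for sample in samples for idx in sample]

# Each offset is the sum of the lengths of all earlier samples (the role of cumsum)
offsets = list(accumulate([0] + [len(s) for s in samples[:-1]]))

# Recover each sample from flat + offsets to confirm the boundaries line up
bounds = offsets + [len(flat)]
recovered = [flat[start:end] for start, end in zip(bounds, bounds[1:])]

print(flat)                  # [1, 2, 22, 111, 124, 111, 9, 22]
print(offsets)               # [0, 4, 6]
print(recovered == samples)  # True
```

The last sample's length is left out of the sum because its end is implied by the end of the flat sequence.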

In [13]:
# Define a simple sentiment classifier model
class SentimentClassifier(nn.Module):
    def __init__(self, embedding_bag, embedding_dim, num_classes):
        super().__init__()
        self.embedding_bag = embedding_bag
        self.fc = nn.Linear(embedding_dim, num_classes)  # Linear layer for classification
        
    def forward(self, text, offsets):
        embedded = self.embedding_bag(text, offsets)  # Get sentence embeddings
        return self.fc(embedded)  # Pass through linear layer

In [14]:
# Instantiate the model
num_classes = 3  # Positive, Negative, Neutral
model = SentimentClassifier(embedding_bag, embedding_dim, num_classes)

In [15]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

In [16]:
# Training loop
num_epochs = 200
for epoch in range(num_epochs):
    model.train()
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model(index_flat, offsets)
    
    # Compute loss
    loss = criterion(outputs, labels)
    
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 20 == 0:
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}")

Epoch [20/200], Loss: 1.0126
Epoch [40/200], Loss: 0.8235
Epoch [60/200], Loss: 0.5743
Epoch [80/200], Loss: 0.3375
Epoch [100/200], Loss: 0.1797
Epoch [120/200], Loss: 0.0985
Epoch [140/200], Loss: 0.0599
Epoch [160/200], Loss: 0.0405
Epoch [180/200], Loss: 0.0297
Epoch [200/200], Loss: 0.0229
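
The loss reported above is measured on the same samples the model is trained on, so it does not show how well the classifier generalizes to new reviews. One common way to check is to hold out part of the data before training and measure accuracy on it afterwards. A minimal sketch, assuming a simple shuffled 80/20 split; `split_dataset`, `accuracy`, the ratio, and the seed are illustrative choices, not part of the original notebook:

```python
import random

def split_dataset(samples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical data in the same (text, label) layout as `dataset`
toy = [(f"sample {i}", i % 3) for i in range(10)]
train, val = split_dataset(toy)
print(len(train), len(val))                   # 8 2
print(accuracy([0, 1, 2, 2], [0, 1, 2, 0]))   # 0.75
```

In the notebook, the vocabulary and the training tensors would then be built from the training part only, with the validation part run through the model after training.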


In [17]:
# Testing the classifier
model.eval()
with torch.no_grad():
    test_sentence = "I hate cats and dogs"
    test_tokens = torch.tensor(vocab(tokenizer(test_sentence)))
    test_offset = torch.tensor([0])
    
    output = model(test_tokens, test_offset)
    predicted_label = torch.argmax(output).item()
    
    sentiment_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
    print(f"Sentence: '{test_sentence}' -> Sentiment: {sentiment_map[predicted_label]}")

Sentence: 'I hate cats and dogs' -> Sentiment: Negative
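
`torch.argmax` returns only the winning class. When a sentence contains words missing from the training vocabulary (which all map to `<unk>`), it can help to look at how confident the model is. The sketch below converts raw class scores into probabilities with a softmax in plain Python; the scores are made-up values, not output from the trained model:

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

sentiment_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
example_logits = [2.0, 0.5, 0.1]  # made-up scores for illustration

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"{sentiment_map[best]} ({probs[best]:.2f})")
```

Inside the notebook, `F.softmax(output, dim=1)` would produce the corresponding probabilities for the model's real output.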


In [18]:
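# Save the trained model and vocabulary; the vocabulary is stored as a plain dict and list
# so that loading the file does not depend on the torchtext Vocab class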
torch.save({
    'model_state_dict': model.state_dict(),
    'vocab_stoi': vocab.get_stoi(),  # Save as plain dict
    'vocab_itos': vocab.get_itos(),  # Save as plain list
    'embedding_dim': embedding_dim,
    'num_classes': num_classes
}, 'sentiment_model.pth')

print("✓ Model saved successfully as 'sentiment_model.pth'")

✓ Model saved successfully as 'sentiment_model.pth'
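
Because the checkpoint stores the vocabulary as a plain dict and list, it can be reloaded without rebuilding the vocabulary from the dataset. The sketch below shows one possible save/load round trip on a toy model. The tiny vocabulary, the temporary file, and the `encode` helper are assumptions for illustration, and the model class mirrors the layer names of the `SentimentClassifier` defined above but constructs its own `EmbeddingBag`:

```python
import os
import tempfile

import torch
import torch.nn as nn

class SentimentClassifier(nn.Module):
    """Same layer names as the notebook's model, but builds its own EmbeddingBag."""
    def __init__(self, vocab_size, embedding_dim, num_classes):
        super().__init__()
        self.embedding_bag = nn.EmbeddingBag(vocab_size, embedding_dim, mode='mean')
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, text, offsets):
        return self.fc(self.embedding_bag(text, offsets))

def encode(tokens, stoi):
    """Map tokens to indices, using <unk> for tokens missing from the vocabulary."""
    return torch.tensor([stoi.get(t, stoi["<unk>"]) for t in tokens], dtype=torch.long)

# Toy stand-ins for the trained objects
toy_itos = ["<unk>", "This", "is", "bad"]
toy_stoi = {tok: i for i, tok in enumerate(toy_itos)}
original = SentimentClassifier(len(toy_itos), embedding_dim=3, num_classes=3)

path = os.path.join(tempfile.mkdtemp(), "toy_sentiment_model.pth")
torch.save({
    'model_state_dict': original.state_dict(),
    'vocab_stoi': toy_stoi,
    'vocab_itos': toy_itos,
    'embedding_dim': 3,
    'num_classes': 3,
}, path)

# Reload: rebuild the architecture from the saved sizes, then restore the weights
checkpoint = torch.load(path)
restored = SentimentClassifier(len(checkpoint['vocab_itos']),
                               checkpoint['embedding_dim'],
                               checkpoint['num_classes'])
restored.load_state_dict(checkpoint['model_state_dict'])
restored.eval()

tokens = encode(["This", "is", "awful"], checkpoint['vocab_stoi'])
with torch.no_grad():
    same = torch.equal(original(tokens, torch.tensor([0])),
                       restored(tokens, torch.tensor([0])))
print(same)
```

The tokenizer is not part of the checkpoint, so a separate script would still need to create the same spaCy tokenizer before encoding new reviews.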


In [18]:
import pandas as pd
import os

def classify_review(review):
    """Classify sentiment of a review using the trained model."""
    model.eval()
    with torch.no_grad():
        # dtype=torch.long keeps the tensor valid for EmbeddingBag even if the review has no tokens
        test_tokens = torch.tensor(vocab(tokenizer(review)), dtype=torch.long)
        test_offset = torch.tensor([0])
        output = model(test_tokens, test_offset)
        predicted_label = torch.argmax(output).item()
    return predicted_label  # Returns 0 (Negative), 1 (Positive), or 2 (Neutral)

In [19]:
def save_negative_review(name, review, filename="negative_reviews.csv"):
    """Save negative reviews to a CSV file."""
    file_exists = os.path.exists(filename)

    # Create DataFrame from new entry
    new_entry = pd.DataFrame([[name, review]], columns=["Name", "Review"])

    # Append to CSV, creating it if necessary
    if file_exists:
        new_entry.to_csv(filename, mode="a", header=False, index=False)
    else:
        new_entry.to_csv(filename, mode="w", header=True, index=False)
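
The same append-or-create logic can be written with the standard library's `csv` module, which avoids building a DataFrame for a single row. A sketch under the assumption that the column layout stays `Name, Review`; `append_review` and the temporary directory are hypothetical, and the directory part of the path shows how the log can be redirected to another location:

```python
import csv
import os
import tempfile

def append_review(name, review, filename="negative_reviews.csv"):
    """Append one row, writing the header only when the file is new or empty."""
    needs_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(["Name", "Review"])
        writer.writerow([name, review])

# Write to a throwaway directory instead of the working directory
log_path = os.path.join(tempfile.mkdtemp(), "negative_reviews.csv")
append_review("Ada", "Poorly done, very disappointing", log_path)
append_review("Bo", "Completely broken", log_path)

with open(log_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

The `csv` writer quotes fields that contain commas, so a review such as the first one above still occupies a single column when read back.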

In [21]:
# --- User Interaction ---
name = input("Enter your name: ")
review = input("Share your review of the product: ")

sentiment_label = classify_review(review)
sentiment_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
print(f"\nYour review was classified as: {sentiment_map[sentiment_label]}")

# Save negative reviews
if sentiment_label == 0:
    save_negative_review(name, review)
    print("Your negative review has been recorded.")
else:
    print("Thank you for your feedback!")

Enter your name:  Anointina
Share your review of the product:  I hate dogs



Your review was classified as: Negative
Your negative review has been recorded.
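
Because `input()` blocks until someone types, the flow above is awkward to test or to reuse outside a notebook. One option is to factor the decision logic into a function that receives the classifier and the logger as arguments. This is a sketch rather than the author's code: `handle_review` and the two stubs are hypothetical, and in the notebook `classify_review` and `save_negative_review` would be passed in instead:

```python
SENTIMENT_MAP = {0: "Negative", 1: "Positive", 2: "Neutral"}

def handle_review(name, review, classify, save):
    """Classify a review, log it through `save` if negative, and return the sentiment name."""
    label = classify(review)
    if label == 0:
        save(name, review)
    return SENTIMENT_MAP[label]

# Stubs so the example runs without the trained model
def stub_classify(review):
    return 0 if "hate" in review.lower() else 1

logged = []

def stub_save(name, review):
    logged.append((name, review))

first = handle_review("Ada", "I hate dogs", stub_classify, stub_save)
second = handle_review("Bo", "Works well", stub_classify, stub_save)
print(first, second)
print(logged)
```

Only the negative review ends up in `logged`, matching the behaviour of the interactive cell.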
