# Sentiment-Based Review Logger

This project is a sentiment analysis system that classifies user-submitted reviews and logs negative feedback for further analysis. Built with **PyTorch**, **TorchText**, and **spaCy**, it prompts users for their name and a product review, then classifies the input with a **sentiment classification model trained in this notebook**.

## Key Features
- Interactive User Input – Prompts users for their name and review in a Jupyter Notebook environment.
- Sentiment Classification – Utilizes a trained deep learning model to categorize reviews as Negative, Positive, or Neutral.
- Automated Review Logging – If a review is classified as negative, it is automatically saved in a structured CSV file (`negative_reviews.csv`).
- Scalable Data Handling – Uses pandas for efficient data storage, allowing easy analysis of recurring negative feedback trends.
- Customizable File Path – The CSV file can be saved in any specified directory for centralized logging.

## Use Cases
- Businesses can track **customer dissatisfaction** trends to improve their products.
- AI researchers can **fine-tune NLP models** using real-world feedback.
- Developers can **extend the project** to include real-time review monitoring in web applications.

## Dependencies
- torch
- torchtext
- spacy ('en_core_web_sm' language model)
- pandas

Author: Tamunowunari-Tasker Anointing

In [1]:
# Importing necessary libraries from PyTorch and TorchText
import torchtext.data.utils as tdu  # For tokenization utilities
import torchtext.vocab as tv        # For vocabulary building
import torch.nn as nn               # For neural network components like Embedding
import torch                        # For tensor operations
import torch.optim as optim         # For optimizers such as Adam
import torch.nn.functional as F     # Functional API (imported but not used in this notebook)

In [2]:
# Sample dataset of simple sentences
dataset = [
    ("I like cats", 1),              # Positive
    ("I hate dogs", 0),              # Negative
    ("I'm impartial to hippos", 2),  # Neutral
]

In [3]:
# Initialize the tokenizer using SpaCy's small English model
tokenizer = tdu.get_tokenizer('spacy', language='en_core_web_sm')  # get_tokenizer() is a torchtext helper; spaCy and en_core_web_sm must be installed

In [4]:
# Function to yield tokens from each sentence in the dataset
def yield_tokens(data_iter):
    """
    Tokenizes each sentence in the dataset.
    
    Args:
        data_iter (iter): An iterator over the dataset.
        
    Yields:
        list: A list of tokens for each sentence.
    """
    for sentence, _ in data_iter:
        # yield hands back one token list at a time and resumes here on the next request
        yield tokenizer(sentence)

# Create an iterator over the dataset (for illustration; the next cell passes the dataset list directly)
data_iter = iter(dataset)  # iter() is a Python built-in that returns an iterator over a list, tuple, or other iterable

In [5]:
# Build vocabulary from the tokenized dataset
vocab = tv.build_vocab_from_iterator(yield_tokens(dataset))

In [6]:
# Display the vocabulary as a list of words (index-to-string mapping)
print(vocab.get_itos())  # get_itos() returns the index-to-string list; here it contains 9 tokens

['I', "'m", 'cats', 'dogs', 'hate', 'hippos', 'impartial', 'like', 'to']
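
The ordering in the output above is consistent with sorting tokens by descending frequency and then lexicographically. The plain-Python sketch below reproduces that ordering and the resulting string-to-index lookup. It assumes the spaCy tokenizer splits "I'm" into "I" and "'m", as the output suggests; `tokenized_sentences` and `build_itos` are hypothetical names used only for illustration, not torchtext APIs.

```python
from collections import Counter

# Tokens as the spaCy tokenizer appears to produce them (assumed from the output above).
tokenized_sentences = [
    ["I", "like", "cats"],
    ["I", "hate", "dogs"],
    ["I", "'m", "impartial", "to", "hippos"],
]

def build_itos(sentences):
    """Return tokens sorted by descending frequency, then lexicographically."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return sorted(counts, key=lambda token: (-counts[token], token))

itos = build_itos(tokenized_sentences)                      # index -> token
stoi = {token: index for index, token in enumerate(itos)}   # token -> index

print(itos)
print([stoi[token] for token in tokenized_sentences[0]])    # [0, 7, 2]
```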


In [7]:
# Convert sentences to token indices
def input_indexes(data):
    """Return one tensor of token indices per sentence."""
    return [torch.tensor(vocab(tokenizer(sentence))) for sentence, _ in data]

In [8]:
# Convert dataset sentences to token indices
index = input_indexes(dataset) # index has three tensors
print("Token indices:", index)

Token indices: [tensor([0, 7, 2]), tensor([0, 4, 3]), tensor([0, 1, 6, 8, 5])]


In [9]:
# Extract sentiment labels
labels = torch.tensor([label for _, label in dataset])

In [10]:
# Set embedding dimensions (number of features each word vector will have)
embedding_dim = 3 # Each word will be represented in a 3D vector space

# Number of unique words in the vocabulary
n_embedding = len(vocab)

In [11]:
# Initialize EmbeddingBag layer to get sentence-level embeddings
embedding_bag = nn.EmbeddingBag(num_embeddings=n_embedding, embedding_dim=embedding_dim, mode='mean')

In [12]:
# Prepare flattened token indices and offsets for EmbeddingBag
index_flat = torch.cat(index)
offsets = torch.tensor([0] + [len(sample) for sample in index[:-1]]).cumsum(0)
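
`nn.EmbeddingBag` receives all sentences as one flat tensor plus the starting position of each sentence. As a rough illustration, here is the same offset calculation in plain Python, using the token indices printed earlier. `split_bags` is a hypothetical helper, not part of PyTorch.

```python
from itertools import accumulate

# Token indices for the three training sentences (from the printed output).
index = [[0, 7, 2], [0, 4, 3], [0, 1, 6, 8, 5]]

# Concatenate every sentence into one flat list.
index_flat = [token for sample in index for token in sample]

# Each offset is the running total of the lengths of the preceding sentences.
offsets = list(accumulate([0] + [len(sample) for sample in index[:-1]]))

def split_bags(flat, starts):
    """Recover the per-sentence bags from the flat list and start offsets."""
    ends = starts[1:] + [len(flat)]
    return [flat[start:end] for start, end in zip(starts, ends)]

print(offsets)                          # [0, 3, 6]
print(split_bags(index_flat, offsets))  # the original three sentences
```

The last sentence needs no explicit end position: its bag runs from its offset to the end of the flat list.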

In [13]:
# Define a simple sentiment classifier model
class SentimentClassifier(nn.Module):
    def __init__(self, embedding_bag, embedding_dim, num_classes):
        super().__init__()
        self.embedding_bag = embedding_bag
        self.fc = nn.Linear(embedding_dim, num_classes)  # Linear layer for classification
        
    def forward(self, text, offsets):
        embedded = self.embedding_bag(text, offsets)  # Get sentence embeddings
        return self.fc(embedded)  # Pass through linear layer
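
To make the forward pass concrete, the sketch below reproduces its two steps by hand: mean-pooling the word vectors of one sentence (what `mode='mean'` does) and applying a linear layer. All weights are made-up values for illustration, not the trained model's parameters, and the dimensions are smaller than in the notebook.

```python
# Hypothetical 2-dimensional embeddings for a 3-word vocabulary.
embeddings = [
    [1.0, 0.0],  # word 0
    [0.0, 1.0],  # word 1
    [1.0, 1.0],  # word 2
]

# Hypothetical linear layer: 2 inputs -> 3 class scores (logits).
weights = [
    [1.0, -1.0],  # class 0
    [0.0, 2.0],   # class 1
    [0.5, 0.5],   # class 2
]
bias = [0.0, 0.1, -0.1]

def mean_pool(token_ids):
    """Average the embedding vectors of all tokens in one sentence."""
    vectors = [embeddings[i] for i in token_ids]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def linear(vector):
    """Compute W @ vector + b, giving one score per class."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

sentence = [0, 2]                       # a two-token sentence
pooled = mean_pool(sentence)            # [1.0, 0.5]
logits = linear(pooled)
predicted = logits.index(max(logits))   # class with the highest score
print(pooled, logits, predicted)
```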

In [14]:
# Instantiate the model
num_classes = 3  # Positive, Negative, Neutral
model = SentimentClassifier(embedding_bag, embedding_dim, num_classes)

In [15]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

In [16]:
# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model(index_flat, offsets)
    
    # Compute loss
    loss = criterion(outputs, labels)
    
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}")

Epoch [10/100], Loss: 1.0028
Epoch [20/100], Loss: 0.8270
Epoch [30/100], Loss: 0.6588
Epoch [40/100], Loss: 0.5029
Epoch [50/100], Loss: 0.3663
Epoch [60/100], Loss: 0.2570
Epoch [70/100], Loss: 0.1776
Epoch [80/100], Loss: 0.1240
Epoch [90/100], Loss: 0.0890
Epoch [100/100], Loss: 0.0663
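
The loss printed above is the cross-entropy between the model's class scores and the true labels, averaged over the batch by default. As a point of reference, a model that gives equal scores to all three classes has a loss of ln(3) ≈ 1.0986, so values well below that indicate the model is fitting the training sentences. A small plain-Python sketch of the per-example calculation, with made-up logits:

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])

uniform_loss = cross_entropy([0.0, 0.0, 0.0], target=1)    # no preference between classes
confident_loss = cross_entropy([0.0, 4.0, 0.0], target=1)  # strong score for the right class

print(f"{uniform_loss:.4f}")   # 1.0986
print(f"{confident_loss:.4f}")
```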


In [17]:
# Testing the classifier
model.eval()
with torch.no_grad():
    test_sentence = "I hate cats hippos dogs"
    test_tokens = torch.tensor(vocab(tokenizer(test_sentence)))
    test_offset = torch.tensor([0])
    
    output = model(test_tokens, test_offset)
    predicted_label = torch.argmax(output).item()
    
    sentiment_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
    print(f"Sentence: '{test_sentence}' -> Sentiment: {sentiment_map[predicted_label]}")

Sentence: 'I hate cats hippos dogs' -> Sentiment: Negative


In [18]:
import pandas as pd
import os

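def classify_review(review):
    """Classify sentiment of a review using the trained model."""
    model.eval()
    with torch.no_grad():
        # Keep only tokens the vocabulary knows; vocab() raises an error on
        # unseen tokens because no default index was set.
        known_tokens = [token for token in tokenizer(review) if token in vocab]
        # If no token is known, the empty bag embeds to zeros and the
        # prediction comes from the linear layer's bias alone.
        test_tokens = torch.tensor(vocab(known_tokens), dtype=torch.long)
        test_offset = torch.tensor([0])
        output = model(test_tokens, test_offset)
        predicted_label = torch.argmax(output).item()
    return predicted_label  # Returns 0 (Negative), 1 (Positive), or 2 (Neutral)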

In [19]:
def save_negative_review(name, review, filename="negative_reviews.csv"):
    """Save negative reviews to a CSV file."""
    file_exists = os.path.exists(filename)

    # Create DataFrame from new entry
    new_entry = pd.DataFrame([[name, review]], columns=["Name", "Review"])

    # Append to CSV, creating it if necessary
    if file_exists:
        new_entry.to_csv(filename, mode="a", header=False, index=False)
    else:
        new_entry.to_csv(filename, mode="w", header=True, index=False)
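
Because `filename` accepts any path, the log can be kept in a central directory. The sketch below shows one way to build such a path with the standard library. `build_log_path` and the `review_logs` directory name are hypothetical, and the directory is created if it does not exist yet.

```python
import os
import tempfile

def build_log_path(directory, filename="negative_reviews.csv"):
    """Create `directory` if needed and return the full path to the log file."""
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)

# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as base_dir:
    log_path = build_log_path(os.path.join(base_dir, "review_logs"))
    directory_created = os.path.isdir(os.path.dirname(log_path))
    print(log_path, directory_created)
```

The returned path can then be passed as the `filename` argument, for example `save_negative_review(name, review, filename=log_path)`.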

In [21]:
# --- User Interaction ---
name = input("Enter your name: ")
review = input("Share your review of the product: ")

sentiment_label = classify_review(review)
sentiment_map = {0: "Negative", 1: "Positive", 2: "Neutral"}
print(f"\nYour review was classified as: {sentiment_map[sentiment_label]}")

# Save negative reviews
if sentiment_label == 0:
    save_negative_review(name, review)
    print("Your negative review has been recorded.")
else:
    print("Thank you for your feedback!")

Enter your name:  Anointina
Share your review of the product:  I hate dogs



Your review was classified as: Negative
Your negative review has been recorded.
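
Once several reviews have been logged, the CSV can be inspected for recurring complaints. Below is a minimal sketch that uses only the standard library (pandas would work equally well). The sample rows are invented for illustration; the `Name` and `Review` columns match the header written by `save_negative_review`, and `count_words` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented sample content standing in for negative_reviews.csv.
sample_csv = io.StringIO(
    "Name,Review\n"
    "Ada,I hate dogs\n"
    "Ben,I hate the packaging\n"
    "Cy,dogs everywhere\n"
)

def count_words(csv_file):
    """Count how often each lowercase word appears in the Review column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts.update(row["Review"].lower().split())
    return counts

word_counts = count_words(sample_csv)
print(word_counts.most_common(3))
```

To run this on the real log, open the file with `open("negative_reviews.csv", newline="")` and pass the file object to `count_words`.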
