# Demo: Sentiment Analysis with a Custom Transformer Encoder from Scratch


# Introduction

In this notebook, we demonstrate how to build, train, and evaluate a **transformer encoder-based sentiment analysis model**, developed entirely **from scratch** using PyTorch.

This model is designed to classify movie reviews as positive or negative. Key features include:

- **Custom multi-head self-attention layers** with support for **causal masking** and **padding masks**, enabling the model to handle variable-length input sequences effectively.
- **Positional encoding** added to input embeddings to retain word order information.
- A **feed-forward network** integrated within each encoder block.
- End-to-end training pipeline with loss tracking and accuracy evaluation.
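
The masking mentioned in the feature list can be sketched as follows for a single attention head. This is a minimal illustration, not the code in `model.py`: the function name `masked_attention`, the mask convention (`True` marks a real token), and the assumption of right-padded sequences are all hypothetical.

```python
import math
import torch

def masked_attention(q, k, v, pad_mask=None, causal=False):
    """Single-head scaled dot-product attention with optional masks.

    q, k, v:  (batch, seq_len, d_k)
    pad_mask: (batch, seq_len) bool tensor, True = real token (assumed convention)
    """
    seq_len, d_k = q.size(1), q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq_len, seq_len)

    if pad_mask is not None:
        # Hide padded key positions from every query.
        scores = scores.masked_fill(~pad_mask[:, None, :], float("-inf"))
    if causal:
        # Hide future positions: query i may only attend to keys j <= i.
        future = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
            diagonal=1,
        )
        scores = scores.masked_fill(future, float("-inf"))

    weights = torch.softmax(scores, dim=-1)  # masked positions get weight 0
    return weights @ v, weights

# Toy batch: the first sequence has two padded positions at the end.
torch.manual_seed(0)
x = torch.randn(2, 5, 8)
pad_mask = torch.tensor([[True, True, True, False, False],
                         [True, True, True, True, True]])
out, weights = masked_attention(x, x, x, pad_mask=pad_mask)
_, causal_weights = masked_attention(x, x, x, causal=True)
```

Filling masked scores with `-inf` before the softmax gives those positions exactly zero weight, so padded tokens never contribute to the output of real tokens.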

By the end of this notebook, you will see how the model learns to predict sentiment and how to run inference on new reviews.
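
The notebook does not say which positional encoding the model uses. As one possibility, here is a sketch of the fixed sinusoidal variant; the function name `sinusoidal_positional_encoding`, the `max_len` value, and the even `embed_size` requirement are assumptions for illustration.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, embed_size):
    """Return a (max_len, embed_size) table of fixed sinusoidal encodings.

    Assumes embed_size is even so sine and cosine columns interleave cleanly.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    # Frequencies 1 / 10000^(2i / embed_size) for each sin/cos column pair.
    div_term = torch.exp(
        torch.arange(0, embed_size, 2, dtype=torch.float32)
        * (-math.log(10000.0) / embed_size)
    )
    pe = torch.zeros(max_len, embed_size)
    pe[:, 0::2] = torch.sin(position * div_term)  # even columns
    pe[:, 1::2] = torch.cos(position * div_term)  # odd columns
    return pe

pe = sinusoidal_positional_encoding(max_len=50, embed_size=128)

# The table is added to the token embeddings, broadcasting over the batch dimension.
embeddings = torch.randn(4, 50, 128)  # (batch, seq_len, embed_size)
encoded = embeddings + pe.unsqueeze(0)
```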



In [None]:
# Setup & Imports

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Import your implemented model and utility functions
from model import Sentiment_Model
from utils import train_transformer_encoder, plot_confusion_matrix, predict_sentiment
from dataloader_generator import train_dl, valid_dl, vocab, tokenizer, SEED

# Set device and fix seed for reproducibility
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(SEED)


# Model and Training Setup

Here we define hyperparameters and initialize the transformer encoder model.

- `num_layers`: number of stacked encoder layers
- `embed_size`: dimension of the token embeddings
- `d_out_n_heads`: combined output dimension of the attention heads (set equal to `embed_size` here)
- `num_heads`: number of attention heads in multi-head attention
- `ffn_hidden_dim`: hidden dimension of the feed-forward network inside each encoder block
- `dropout`: dropout rate for regularization

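We also initialize the Adam optimizer and a binary cross-entropy loss (`nn.BCELoss`), which suits two-class sentiment classification. `nn.BCELoss` expects probabilities in [0, 1], so the model is assumed to end in a sigmoid.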
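
A short sketch of what the binary cross-entropy loss expects as input. The tensors are made-up values for illustration; the point is that `nn.BCELoss` takes probabilities, whereas `nn.BCEWithLogitsLoss` takes raw scores and applies the sigmoid internally.

```python
import torch
import torch.nn as nn

# Illustrative raw model scores (logits) and binary targets.
logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# nn.BCELoss needs probabilities, so the sigmoid is applied first.
probs = torch.sigmoid(logits)
loss_from_probs = nn.BCELoss()(probs, targets)

# nn.BCEWithLogitsLoss fuses the sigmoid and the loss into one, more stable step.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)
```

Both calls compute the same quantity. If a model returned raw logits instead of probabilities, `nn.BCEWithLogitsLoss` would be the matching choice.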


In [None]:
NUM_EPOCHS = 10
num_layers = 2
src_vocab_size = len(vocab)
embed_size = 128
d_out_n_heads = embed_size
num_heads = 4
ffn_hidden_dim = 2 * embed_size
dropout = 0.4

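# Note: `dropout` is defined above but not passed to the constructor here,
# so the model uses whatever default Sentiment_Model defines (if any).
model = Sentiment_Model(num_layers, src_vocab_size, embed_size, d_out_n_heads, num_heads, ffn_hidden_dim).to(device)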
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.BCELoss()


# Training Loop

We train the model for a set number of epochs, recording training and validation accuracy and loss at each epoch.

This allows us to monitor learning progress and diagnose potential overfitting or underfitting.
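
The internals of `train_transformer_encoder` are not shown in this notebook. As a rough sketch of per-epoch loss and accuracy tracking, the hypothetical helper `run_epoch` below trains when given an optimizer and evaluates otherwise. The toy linear model and random data stand in for the real model and dataloaders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loss_fn, dataloader, optimizer=None):
    """One pass over the data; trains if an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, labels in dataloader:
            probs = model(inputs).squeeze(-1)      # assumed: model outputs probabilities
            loss = loss_fn(probs, labels.float())
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += ((probs >= 0.5) == labels.bool()).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen       # mean loss, accuracy

# Toy stand-ins for the real model and dataloaders.
torch.manual_seed(0)
features = torch.randn(64, 10)
labels = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=16)
toy_model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.01)

history = [run_epoch(toy_model, nn.BCELoss(), loader, toy_optimizer) for _ in range(3)]
eval_loss, eval_acc = run_epoch(toy_model, nn.BCELoss(), loader)
```

Weighting each batch loss by its size keeps the epoch average correct when the last batch is smaller than the rest.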


In [None]:
train_acc, train_loss, valid_acc, valid_loss = train_transformer_encoder(
    model, loss_fn, optimizer, train_dl, valid_dl, NUM_EPOCHS
)


# Visualizing Training Progress

Plot accuracy and loss curves for both training and validation datasets.

These curves help check whether the model is converging and generalizing: a widening gap between the training and validation curves suggests overfitting.


In [None]:
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(train_acc, label='Train Accuracy', color='blue')
plt.plot(valid_acc, label='Validation Accuracy', color='red')
plt.title("Accuracy Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(train_loss, label='Train Loss', color='blue')
plt.plot(valid_loss, label='Validation Loss', color='red')
plt.title("Loss Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()


# Confusion Matrix on Validation Data

We plot the confusion matrix to inspect true positives, true negatives, false positives, and false negatives.

This gives insights into the types of classification errors the model makes.
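
To make the four cell types concrete, here is a small pure-Python sketch of a confusion matrix. The function `confusion_matrix` and the labels are hypothetical, and row-wise normalization (per true class) is an assumption about what `normalize=True` means in `plot_confusion_matrix`.

```python
def confusion_matrix(y_true, y_pred, num_classes=2, normalize=False):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[true_label][pred_label] += 1
    if normalize:
        # Divide each row by its total so every row sums to 1 (per-class rates).
        matrix = [
            [count / sum(row) if sum(row) else 0.0 for count in row]
            for row in matrix
        ]
    return matrix

# Made-up labels: 0 = Negative, 1 = Positive.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

counts = confusion_matrix(y_true, y_pred)
rates = confusion_matrix(y_true, y_pred, normalize=True)
print(counts)  # [[2, 1], [1, 4]] -> [[TN, FP], [FN, TP]]
```

With row normalization, the diagonal entries are the per-class recall, which makes classes of different sizes directly comparable.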


In [None]:
plot_confusion_matrix(model, valid_dl, labels=['Negative', 'Positive'], normalize=True, title='Normalized Confusion Matrix')
plt.show()


# Inference on New Movie Reviews

Finally, we demonstrate the model's ability to predict sentiment on new, unseen reviews.

Each review is tokenized, converted to tensor inputs, and passed through the model, which outputs a sentiment classification with a confidence score.
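
The steps above might look roughly like the sketch below. It is not the `predict_sentiment` function from `utils`: the name `predict_sentiment_sketch`, the dict-based vocabulary, the whitespace tokenizer, the 0.5 threshold, and the `ToyModel` class are all assumptions made so the example runs on its own.

```python
import torch
import torch.nn as nn

def predict_sentiment_sketch(model, text, vocab, tokenizer, unk_index=0):
    """Tokenize -> token indices -> tensor -> probability -> label and confidence."""
    model.eval()
    token_ids = [vocab.get(token, unk_index) for token in tokenizer(text)]
    batch = torch.tensor(token_ids, dtype=torch.long).unsqueeze(0)  # (1, seq_len)
    with torch.no_grad():
        prob_positive = model(batch).item()  # assumed: model outputs P(positive)
    label = "Positive" if prob_positive >= 0.5 else "Negative"
    # Report the probability of the predicted class as the confidence.
    confidence = prob_positive if label == "Positive" else 1.0 - prob_positive
    return label, confidence

class ToyModel(nn.Module):
    """Hypothetical stand-in model: mean-pooled embeddings plus a sigmoid head."""
    def __init__(self, vocab_size, embed_size=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.head = nn.Linear(embed_size, 1)

    def forward(self, token_ids):
        pooled = self.embed(token_ids).mean(dim=1)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)

toy_vocab = {"<unk>": 0, "great": 1, "movie": 2, "boring": 3}
toy_tokenizer = lambda text: text.lower().split()

torch.manual_seed(0)
label, confidence = predict_sentiment_sketch(
    ToyModel(len(toy_vocab)), "Great movie overall", toy_vocab, toy_tokenizer
)
```

Wrapping the forward pass in `torch.no_grad()` and calling `model.eval()` disables gradient tracking and dropout, both of which should be off at inference time.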


In [None]:
reviews = [
    "I absolutely loved this movie! The story was compelling, the acting was top-notch, and the soundtrack gave me chills. I’d definitely watch it again.",
    "This was a total waste of time. The plot made no sense, the characters were dull, and the ending was painfully predictable.",
    "The film had some strong performances and great cinematography, but it was dragged down by a slow-paced and confusing storyline.",
    "I am really not sure if I like or hate the movie. It was long and I honestly did not get the whole theme or plot of the movie."
]

for i, review in enumerate(reviews, 1):
    sentiment, score = predict_sentiment(model, review, vocab, tokenizer)
    print(f"Review {i} - Predicted Sentiment: {sentiment} (Confidence: {score:.4f})")


# Conclusion

- We built a transformer encoder from scratch that supports **padding masks** for variable-length sequences and optional **causal masking**.
- The model learns meaningful sentiment representations from movie reviews using **positional encoding** and **multi-head self-attention**.
- The training curves and the confusion matrix show how well the model performs on the validation set and which kinds of errors it makes.
- Sample predictions on unseen text demonstrate practical usage of the model.

Future improvements could involve experimenting with deeper models, larger vocabularies, more training data, or integrating pretrained embeddings.
