# Sentiment Analysis Using Custom and Built-in GRU Models

This notebook demonstrates sentiment analysis on the IMDB movie reviews dataset using three GRU-based models:

1. **Custom GRU** — a manually implemented GRU cell and model from scratch  
2. **Uni-directional GRU** — PyTorch's built-in GRU moving left-to-right  
3. **Bi-directional GRU** — PyTorch's built-in GRU capturing context in both directions  

We compare the training performance and evaluation metrics of these models to understand their strengths and weaknesses.

---

## Why GRUs?

Gated Recurrent Units (GRUs) are a popular RNN variant that captures dependencies in sequence data with fewer parameters than an LSTM of the same size. Their update and reset gates control how much of the previous hidden state is kept at each step, which mitigates the vanishing gradient problem of plain RNNs and makes them well suited to NLP tasks such as sentiment classification.
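
To make the parameter saving concrete, the following sketch counts the weights in one recurrent layer. It assumes the textbook formulation with a single bias vector per gate (PyTorch's built-in layers keep two, so their totals are slightly higher), and the sizes 20 and 64 are illustrative values only.

```python
def recurrent_layer_params(input_size, hidden_size, num_gates):
    """Count weights in one recurrent layer, assuming one bias vector per gate."""
    per_gate = (
        hidden_size * input_size     # input-to-hidden matrix
        + hidden_size * hidden_size  # hidden-to-hidden matrix
        + hidden_size                # bias vector
    )
    return num_gates * per_gate


# A GRU has 3 gated transforms (reset, update, candidate); an LSTM has 4.
gru = recurrent_layer_params(input_size=20, hidden_size=64, num_gates=3)
lstm = recurrent_layer_params(input_size=20, hidden_size=64, num_gates=4)
print(f"GRU: {gru}, LSTM: {lstm}, ratio: {gru / lstm:.2f}")
# GRU: 16320, LSTM: 21760, ratio: 0.75
```

Under these assumptions a GRU layer needs three quarters of the weights of an LSTM layer with the same input and hidden sizes.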


## Dataset and Preprocessing

We use the IMDB dataset of movie reviews, each labeled as positive or negative.

Key preprocessing steps include:

- Tokenizing text into words (excluding punctuation)
- Building a vocabulary based on token frequency
- Mapping tokens and labels to integer indices
- Padding sequences within each batch for uniform input length

The data is then loaded into PyTorch DataLoaders for efficient mini-batch training.
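
The steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `dataloader_generator`: the function names, the regular expression, and the `<pad>`/`<unk>` tokens are all hypothetical choices.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens at indices 0 and 1


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(token_lists, min_freq=1):
    """Assign indices by descending frequency, breaking ties alphabetically."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Map tokens to indices, sending unknown tokens to the <unk> index."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


def pad_batch(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


reviews = ["A great, great movie!", "Terrible plot."]
labels = [1, 0]  # positive -> 1, negative -> 0
token_lists = [tokenize(r) for r in reviews]
vocab = build_vocab(token_lists)
batch = pad_batch([encode(toks, vocab) for toks in token_lists])
print(batch)  # [[3, 2, 2, 4], [6, 5, 0, 0]]
```

Padding per batch rather than to a global maximum keeps short reviews from being dominated by padding tokens.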


In [None]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

from dataloader_generator import get_dataloaders
from models import GRU_Sentiment_Analysis, GRULeftToRight, BidirectionalGRU
from utils import train_each_models, plot_confusion_matrix

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

DATA_PATH = "hf://datasets/scikit-learn/imdb/IMDB Dataset.csv"
train_dl, valid_dl, vocab = get_dataloaders(DATA_PATH)
vocab_size = len(vocab)

print(f"Vocabulary size: {vocab_size}")


## Training and Evaluation Helper

We define a helper function to:

- Initialize models with fixed seeds for reproducibility  
- Train models for a specified number of epochs  
- Plot training and validation accuracy and loss curves  
- Display the confusion matrix on the validation set

This modularity allows easy comparison of different architectures.
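
For reference, the quantities behind a binary confusion matrix can be computed as in the sketch below. It assumes the model outputs a probability per review and that 0.5 is the decision threshold; `binary_confusion_matrix` is a hypothetical helper, not the `plot_confusion_matrix` function imported from `utils`.

```python
def binary_confusion_matrix(probs, labels, threshold=0.5):
    """Return (tn, fp, fn, tp) for predicted probabilities against 0/1 labels."""
    tn = fp = fn = tp = 0
    for prob, label in zip(probs, labels):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of predictions that match the labels."""
    return (tn + tp) / (tn + fp + fn + tp)


probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 1]
cm = binary_confusion_matrix(probs, labels)
print(cm, accuracy(*cm))  # (1, 1, 1, 1) 0.5
```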


In [None]:
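def run_training(model_class, seed, model_name):
    """Train one model, plot its learning curves, and show its confusion matrix."""
    # Seed before constructing the model so weight initialization is repeatable.
    torch.manual_seed(seed)
    print(f"Training {model_name}...")

    # The positional hyperparameters follow each model's constructor in models.py.
    model = model_class(vocab_size, 20, 64, 64).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    # BCELoss expects probabilities in [0, 1], so the model must already apply a sigmoid.
    loss_fn = nn.BCELoss()

    # Per-epoch accuracy and loss histories, plotted below.
    train_acc, train_loss, valid_acc, valid_loss = train_each_models(
        model, loss_fn, optimizer, train_dl, valid_dl, device=device, NUM_EPOCHS=10
    )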

    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(train_acc, label="Train Accuracy")
    plt.plot(valid_acc, label="Validation Accuracy")
    plt.title(f"{model_name} Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.grid(True)

    plt.subplot(1, 2, 2)
    plt.plot(train_loss, label="Train Loss")
    plt.plot(valid_loss, label="Validation Loss")
    plt.title(f"{model_name} Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

    plot_confusion_matrix(model, valid_dl, device=device, title=f"{model_name} Confusion Matrix")


## Custom GRU Model Training

First, we train our manually implemented GRU model.  
This gives insight into the inner workings of GRUs and serves as a baseline.
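
As a rough illustration of what a from-scratch cell involves, the sketch below implements one GRU step using the same gate equations that PyTorch documents for `nn.GRU`. `MinimalGRUCell` is a hypothetical name and is not the notebook's `GRU_Sentiment_Analysis`, whose code lives in `models.py`; the tensor sizes are illustrative.

```python
import torch
import torch.nn as nn


class MinimalGRUCell(nn.Module):
    """One GRU time step: reset gate r, update gate z, candidate state n."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Each linear layer produces the pre-activations for all three gates.
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x, h):
        x_r, x_z, x_n = self.x2h(x).chunk(3, dim=-1)
        h_r, h_z, h_n = self.h2h(h).chunk(3, dim=-1)
        r = torch.sigmoid(x_r + h_r)   # how much of the old state feeds the candidate
        z = torch.sigmoid(x_z + h_z)   # how much of the old state is kept
        n = torch.tanh(x_n + r * h_n)  # candidate hidden state
        return (1 - z) * n + z * h     # blend candidate and old state


x = torch.randn(4, 10, 20)  # (batch, seq_len, embed_dim)
h = torch.zeros(4, 64)      # initial hidden state
cell = MinimalGRUCell(input_size=20, hidden_size=64)
for t in range(x.size(1)):  # unroll the cell over the sequence
    h = cell(x[:, t, :], h)
print(h.shape)  # torch.Size([4, 64])
```

The explicit Python loop over time steps is what makes a hand-written cell easy to inspect, and it is also one reason such an implementation tends to run more slowly than a fused built-in layer.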


In [None]:
run_training(GRU_Sentiment_Analysis, seed=1, model_name="Custom GRU")

## Uni-directional GRU Training

Next, we train a standard left-to-right GRU using PyTorch's built-in implementation.

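Because the built-in layer runs on optimized kernels, this model usually trains faster than the custom GRU. Both implement the same recurrence in principle, so any accuracy gap is more likely to come from initialization or implementation details than from the architecture.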
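
A left-to-right classifier of this kind might look like the following sketch. `TinyGRUClassifier` and its constructor arguments are hypothetical and are not the `GRULeftToRight` class from `models.py`. The sketch ends in a sigmoid because `nn.BCELoss` expects probabilities, and for simplicity it ignores padding when reading the final hidden state.

```python
import torch
import torch.nn as nn


class TinyGRUClassifier(nn.Module):
    """Embedding -> uni-directional GRU -> linear layer -> sigmoid."""

    def __init__(self, vocab_size, embed_dim, hidden_size):
        super().__init__()
        # Index 0 is assumed to be the padding token.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, h_n = self.gru(embedded)            # h_n: (1, batch, hidden_size)
        logits = self.fc(h_n[-1]).squeeze(-1)  # (batch,)
        return torch.sigmoid(logits)           # probability of the positive class


model = TinyGRUClassifier(vocab_size=100, embed_dim=20, hidden_size=64)
token_ids = torch.randint(1, 100, (8, 15))  # 8 reviews of 15 token indices each
targets = torch.randint(0, 2, (8,)).float()
probs = model(token_ids)
loss = nn.BCELoss()(probs, targets)
print(probs.shape)  # torch.Size([8])
```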


In [None]:
run_training(GRULeftToRight, seed=2, model_name="Uni-directional GRU")


## Bi-directional GRU Training

Finally, we train a bidirectional GRU to leverage both past and future context.

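Because every position can draw on the whole review, this often gives the best accuracy of the three on sequence classification tasks like sentiment analysis, at the cost of roughly twice the recurrent parameters.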
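
The difference between the two built-in variants shows up directly in the tensor shapes returned by `nn.GRU`. The sizes below are illustrative, and concatenating the two final hidden states is one common way to build a review-level vector, not necessarily what `BidirectionalGRU` in `models.py` does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 15, 20)  # (batch, seq_len, embed_dim)

uni = nn.GRU(20, 64, batch_first=True)
bi = nn.GRU(20, 64, batch_first=True, bidirectional=True)

out_u, h_u = uni(x)
out_b, h_b = bi(x)
print(out_u.shape, h_u.shape)  # torch.Size([8, 15, 64]) torch.Size([1, 8, 64])
print(out_b.shape, h_b.shape)  # torch.Size([8, 15, 128]) torch.Size([2, 8, 64])

# h_b[0] is the forward pass's final state, h_b[1] the backward pass's.
review_vec = torch.cat([h_b[0], h_b[1]], dim=-1)  # (batch, 2 * hidden_size)

uni_params = sum(p.numel() for p in uni.parameters())
bi_params = sum(p.numel() for p in bi.parameters())
print(uni_params, bi_params)  # 16512 33024
```

Any layer that consumes the bidirectional output therefore needs an input width of `2 * hidden_size`.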


In [None]:
run_training(BidirectionalGRU, seed=3, model_name="Bi-directional GRU")

## Summary and Conclusion

- The **Custom GRU** implementation serves as an educational baseline, showing how GRU cells can be built from scratch. However, it generally trains more slowly and may perform slightly worse.
- The **Uni-directional GRU** uses PyTorch's optimized implementation and reads each review from left to right. It typically trains faster than the custom model and matches or improves on its accuracy.
- The **Bi-directional GRU** combines a left-to-right pass with a right-to-left pass, so it draws on both past and future tokens and typically achieves the highest accuracy of the three.
- Bidirectional RNNs are particularly effective in NLP tasks where understanding both preceding and succeeding words enhances comprehension.
- Implementing and comparing these models provides both practical experience and theoretical insight into sequence modeling.

---

## Next Steps

- Experiment with hyperparameter tuning (embedding sizes, hidden units, learning rates)
- Extend to multi-class sentiment classification or other NLP tasks  
- Explore advanced architectures like attention mechanisms or Transformers for state-of-the-art results
