# Emotion Transformer from Scratch

This notebook builds a Transformer Encoder model **completely from scratch** in PyTorch for multi-label emotion classification on the GoEmotions dataset.

- Custom Multi-Head Attention  
- Custom Positional Encoding  
- Full training and evaluation pipeline  
- No pretrained transformers or Hugging Face models used  
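
The notebook's own attention and positional-encoding code lives in `model.py` and is not shown here. As a rough illustration of what those two pieces usually look like, the sketch below implements fixed sinusoidal positional encodings and an unmasked multi-head self-attention layer. The names `sinusoidal_positional_encoding` and `MultiHeadSelfAttention` are hypothetical, and the actual `EmotionsModel` may differ (for example, by using learned positions or a padding mask).

```python
import math

import torch
import torch.nn as nn


def sinusoidal_positional_encoding(max_len: int, embed_size: int) -> torch.Tensor:
    """Return a (max_len, embed_size) table of sine/cosine position codes.

    Assumes embed_size is even.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, embed_size, 2, dtype=torch.float32)
        * (-math.log(10000.0) / embed_size)
    )  # (embed_size / 2,)
    table = torch.zeros(max_len, embed_size)
    table[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    table[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return table


class MultiHeadSelfAttention(nn.Module):
    """Hypothetical unmasked multi-head self-attention (no padding mask, no dropout)."""

    def __init__(self, embed_size: int, num_heads: int):
        super().__init__()
        if embed_size % num_heads != 0:
            raise ValueError("embed_size must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = embed_size // num_heads
        self.qkv = nn.Linear(embed_size, 3 * embed_size)  # joint Q, K, V projection
        self.out = nn.Linear(embed_size, embed_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, embed_size = x.shape
        qkv = self.qkv(x).view(batch, seq_len, 3, self.num_heads, self.head_dim)
        # Split into Q, K, V, each of shape (batch, heads, seq_len, head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        weights = scores.softmax(dim=-1)
        # Merge the heads back into a single embedding dimension
        context = (weights @ v).transpose(1, 2).reshape(batch, seq_len, embed_size)
        return self.out(context)


x = torch.randn(2, 10, 128)  # (batch, seq_len, embed_size)
x = x + sinusoidal_positional_encoding(10, 128)  # broadcast over the batch
attended = MultiHeadSelfAttention(embed_size=128, num_heads=4)(x)
print(attended.shape)
```

The output keeps the input shape `(batch, seq_len, embed_size)`, which is what lets encoder layers be stacked.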


In [None]:
# Imports and setup

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

from model import EmotionsModel
from utils import train_transformer_encoder, predict_from_text_or_dataset
from dataset import text_processor, train_dl, valid_dl, test_ds, dataset

# Reproducibility
SEED = 25
torch.manual_seed(SEED)

# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


## Model Configuration
Setting up hyperparameters and initializing the model.
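
The values chosen in the next cell are linked: multi-head attention normally splits the embedding evenly across heads, and the feed-forward width is set to four times the embedding size. The short check below spells out that arithmetic. It assumes `d_out_n_heads` is the combined width of all heads, which the notebook does not confirm.

```python
embed_size = 128
num_heads = 4

# Each head attends over an equal slice of the embedding, so the split must be exact.
assert embed_size % num_heads == 0, "embed_size must be divisible by num_heads"

head_dim = embed_size // num_heads   # dimensions handled by each head
ffn_hidden_dim = 4 * embed_size      # 4x expansion in the feed-forward block

print(f"head_dim={head_dim}, ffn_hidden_dim={ffn_hidden_dim}")
```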


In [None]:
# Hyperparameters
num_layers = 2
src_vocab_size = len(text_processor.vocab)
embed_size = 128
d_out_n_heads = embed_size
num_heads = 4
ffn_hidden_dim = 4 * embed_size
dropout = 0.2
learning_rate = 3e-4

# Initialize model and send to device
# (note: `dropout` above is not passed to the constructor here)
model = EmotionsModel(
    num_layers,
    src_vocab_size,
    embed_size,
    d_out_n_heads,
    num_heads,
    ffn_hidden_dim,
).to(device)

# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


## Handling Class Imbalance with Weighted Loss

Compute a per-class `pos_weight` for `BCEWithLogitsLoss` from label frequencies in the training and validation splits, so that rarer emotions receive larger weights.
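
For comparison, the PyTorch documentation describes `pos_weight` as the ratio of negative to positive examples per class. The sketch below computes that ratio from a multi-hot target matrix. It is an alternative shown for reference, not the scheme this notebook uses, and `pos_weight_from_multi_hot` and the toy targets are hypothetical.

```python
import torch
import torch.nn as nn


def pos_weight_from_multi_hot(targets: torch.Tensor) -> torch.Tensor:
    """Return negatives / positives per class.

    targets: (num_samples, num_classes) multi-hot float tensor.
    """
    num_samples = targets.shape[0]
    positives = targets.sum(dim=0)
    negatives = num_samples - positives
    # Clamp so a class with no positive examples does not divide by zero
    return negatives / positives.clamp(min=1.0)


# Toy data: 4 samples, 3 classes (class 0 is frequent, classes 1 and 2 are rare)
targets = torch.tensor([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
pos_weight = pos_weight_from_multi_hot(targets)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
print(pos_weight)
```

With this convention a frequent class gets a weight below 1 and a rare class a weight above 1, so errors on rare positives cost more.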


In [None]:
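NUM_LABELS = 28  # GoEmotions: 27 emotions plus neutral

def compute_class_weights(dataset, num_labels=NUM_LABELS):
    """
    Computes class-wise positive weights for BCEWithLogitsLoss
    based on label frequency: rarer labels receive larger weights.
    """
    label_freq = torch.zeros(num_labels)
    for split in ['train', 'validation']:
        for sample in dataset[split]:
            for label in sample['labels']:
                label_freq[label] += 1
    # Inverse relative frequency: total label count divided by per-class count
    total = label_freq.sum()
    pos_weight = total / (label_freq + 1e-6)  # avoid division by zero
    return pos_weight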

pos_weight = compute_class_weights(dataset).to(device)

# Define loss function with class weights
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)


## Training the Model

Train for 10 epochs with the custom `train_transformer_encoder` utility, which returns per-epoch metric and loss histories for both the training and validation sets.
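
The utility itself is defined in `utils.py` and is not shown in this notebook. As a minimal sketch of what one epoch of such a loop typically involves, the block below trains a stand-in linear model on random multi-label data. The `run_epoch` helper, the toy model, and the toy batches are all hypothetical and are not the notebook's implementation.

```python
import torch
import torch.nn as nn


def run_epoch(model, loss_fn, optimizer, batches, train=True):
    """Run one pass over `batches` and return the mean loss per batch."""
    model.train(train)  # enables dropout etc. only when training
    total_loss = 0.0
    num_batches = 0
    with torch.set_grad_enabled(train):
        for inputs, targets in batches:
            logits = model(inputs)
            loss = loss_fn(logits, targets)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item()
            num_batches += 1
    return total_loss / num_batches


torch.manual_seed(0)
toy_model = nn.Linear(8, 3)  # stand-in for the Transformer encoder
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=3e-4)
toy_loss_fn = nn.BCEWithLogitsLoss()
# Five batches of 4 samples, each with 8 features and 3 binary labels
toy_batches = [
    (torch.randn(4, 8), torch.randint(0, 2, (4, 3)).float()) for _ in range(5)
]

for epoch in range(2):
    train_loss = run_epoch(toy_model, toy_loss_fn, toy_optimizer, toy_batches, train=True)
    valid_loss = run_epoch(toy_model, toy_loss_fn, toy_optimizer, toy_batches, train=False)
    print(f"epoch {epoch + 1}: train loss {train_loss:.4f}, valid loss {valid_loss:.4f}")
```

A real loop would use a separate validation loader and also collect metrics such as macro F1 alongside the loss.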


In [None]:
NUM_EPOCHS = 10

train_metrics_history, train_loss_history, valid_metrics_history, valid_loss_history = train_transformer_encoder(
    model, loss_fn, optimizer, train_dl, valid_dl, NUM_EPOCHS=NUM_EPOCHS
)


## Sample Inference on Example Texts

Test the trained model on several example sentences, keeping only emotions whose predicted confidence reaches a threshold of 0.85.
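
In multi-label classification each label is usually scored independently with a sigmoid and then kept or dropped by a threshold. The sketch below shows how the threshold changes the predicted set. The `labels_above_threshold` helper, the three label names, and the logits are made up for illustration and are not taken from `predict_from_text_or_dataset`.

```python
import math


def labels_above_threshold(logits, label_names, threshold):
    """Apply a sigmoid to each logit and keep labels whose probability meets the threshold."""
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [
        (name, round(p, 3))
        for name, p in zip(label_names, probabilities)
        if p >= threshold
    ]


label_names = ["joy", "sadness", "anger"]  # hypothetical subset of labels
logits = [2.5, 0.4, -1.0]                  # hypothetical raw model outputs

strict = labels_above_threshold(logits, label_names, threshold=0.85)
lenient = labels_above_threshold(logits, label_names, threshold=0.55)
print("threshold 0.85:", strict)
print("threshold 0.55:", lenient)
```

A higher threshold yields fewer, more confident labels. A lower one recovers more labels at the cost of more false positives.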


In [None]:
print("\nSample texts used to test the model after training\n")

sample_texts = [
    "I am so happy and excited about this!",
    "This makes me really angry and sad.",
    "I'm feeling a bit anxious but hopeful.",
    "I'm feeling very sad but also relieved."
]

for i, text in enumerate(sample_texts, 1):
    emotions, confidences = predict_from_text_or_dataset(
        model, text, text_processor, device=device, threshold=0.85
    )
    print(f"Text {i}: {text}\n→ Predicted emotions: {emotions}\n→ Confidences: {confidences}\n")


## Random Predictions from the Test Set

Run inference on 10 random samples from the test set, this time with a lower threshold of 0.55.


In [None]:
predict_from_text_or_dataset(model, test_ds, text_processor, device=device, n=10, threshold=0.55)


## Visualization of Training Progress

Plot the macro F1 score per epoch for the training and validation sets.
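
Macro F1 is the unweighted mean of per-class F1 scores, so rare emotions count as much as common ones. The sketch below computes it by hand on a tiny two-class example. `macro_f1` is a hypothetical helper, and the `f1_macro` values produced in `utils.py` may be computed differently (for instance, with a library routine).

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1.

    y_true, y_pred: lists of equal-length 0/1 rows, one column per class.
    """
    num_classes = len(y_true[0])
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
score = macro_f1(y_true, y_pred)
print(f"macro F1: {score:.4f}")
```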


In [None]:
plt.figure(figsize=(10, 5))
plt.plot([m['f1_macro'] for m in train_metrics_history], label="Train F1 Macro")
plt.plot([m['f1_macro'] for m in valid_metrics_history], label="Valid F1 Macro")
plt.title("Macro F1 Score Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("F1 Macro")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


## Conclusion

This notebook demonstrated building a Transformer-based multi-label emotion classifier **from scratch**:

- Implemented custom Transformer encoder layers without relying on pretrained transformer libraries.  
- Handled multi-label classification on the GoEmotions dataset with a weighted loss to counter class imbalance.  
- Showed inference on both example sentences and random test samples.  
- Plotted macro F1 per epoch for the training and validation sets to track training progress.

Building the model from fundamental blocks, rather than calling a prebuilt library, gives more direct insight into how Transformer encoders work when applied to emotion recognition.

---

*Feel free to extend this notebook by experimenting with different hyperparameters, adding more sophisticated data augmentation, or fine-tuning on other datasets!*  
