# Training Notebook - Narrative Similarity (SemEval4)

This notebook trains the `AspectSupervisedEncoder` model for **narrative similarity** using triplet learning with aspect attention. The model combines a pre-trained encoder (DistilBERT) with specialized projection layers to capture narrative similarity between texts.

## 1. Path Setup

The parent directory (`..`) is added to `sys.path` so that modules from the `src` package can be imported.

In [1]:
import sys
sys.path.insert(0, '..')

## 2. Import Libraries

PyTorch's `DataLoader` and the project's trainer, dataset, and model classes are imported.

In [5]:
import torch
from torch.utils.data import DataLoader
from src.trainer import Trainer
from src.datasets.dataset import (
    AspectTripletDatasetTrain,
    AspectTripletDatasetDev,
    load_aspect_triplets,
    load_eval_triplets,
)
from src.models.encoder import AspectSupervisedEncoder

## 3. Load Data

The preprocessed data is loaded:
- **Training data**: triplets (anchor, positive, negative) with augmentation based on narrative aspects
- **Eval data**: triplets for model validation

The files are in JSONL format and contain examples already processed for triplet learning.
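
For readers unfamiliar with JSONL, the sketch below shows how such a file can be read with the standard library: one JSON object per line. It is not the implementation of `load_aspect_triplets` or `load_eval_triplets`; the helper `read_jsonl` and the field names `anchor`, `positive`, and `negative` are assumptions made for illustration, and the real files may use different keys.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Hypothetical records; the real field names may differ.
sample = [
    {"anchor": "story A", "positive": "story A retold", "negative": "unrelated story"},
    {"anchor": "story B", "positive": "story B retold", "negative": "another story"},
]

# Write the sample to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "triplets.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for record in sample:
            f.write(json.dumps(record) + "\n")
    records = read_jsonl(path)

print(len(records), records[0]["anchor"])
```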

In [None]:
training_data = load_aspect_triplets("./data/processed/train_from_dev_w_aspect_aug_triplets.jsonl")
eval_data = load_eval_triplets("./data/processed/eval_from_dev_triplets.jsonl")

len(training_data), len(eval_data)

(418, 60)

## 4. Create DataLoaders

The datasets and DataLoaders for training and validation are created:
- **AspectTripletDatasetTrain**: dataset for training with triplets (anchor, positive, negative)
- **AspectTripletDatasetDev**: dataset for validation

In [3]:
train_dataset = AspectTripletDatasetTrain(training_data)
train_dataloader = DataLoader(train_dataset, batch_size=12, shuffle=True)

eval_dataset = AspectTripletDatasetDev(eval_data)
eval_dataloader = DataLoader(eval_dataset, batch_size=12, shuffle=False)
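
As a quick sanity check on the batch size, the number of batches per epoch can be worked out by hand. The sketch below assumes that each dataset yields exactly one item per loaded record (418 for training, 60 for evaluation), which the dataset classes may or may not do, and relies on `DataLoader` keeping the final partial batch by default (`drop_last=False`). The helper names are illustrative only.

```python
import math


def num_batches(num_examples, batch_size, drop_last=False):
    """Batches per epoch, mirroring DataLoader's handling of the last partial batch."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


def last_batch_size(num_examples, batch_size):
    """Size of the final batch when the partial batch is kept."""
    remainder = num_examples % batch_size
    return remainder if remainder else batch_size


train_batches = num_batches(418, 12)
eval_batches = num_batches(60, 12)

print(train_batches, last_batch_size(418, 12))
print(eval_batches, last_batch_size(60, 12))
```

Under these assumptions the last training batch is smaller than the rest, which matters for any loss that depends on the in-batch composition.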

## 5. Model Configuration

The **AspectSupervisedEncoder** is initialized with the following parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model_name` | distilbert-base-uncased | Pre-trained transformer model as base encoder |
| `projection_dim` | 256 | Dimension of the projected embedding space |
| `aspect_dim` | 128 | Dimension of aspect representations |
| `num_heads` | 4 | Number of attention heads for multi-head attention |
| `dropout` | 0.3 | Dropout probability for regularization |
| `freeze_encoder` | True | Freeze pre-trained encoder weights |
| `use_lora` | False | LoRA (Low-Rank Adaptation) disabled |

**Note**: The LoRA arguments (`lora_r=8`, `lora_alpha=16`, `lora_dropout=0.1`, `target_modules`) are passed to the constructor but are not active while `use_lora=False`.

In [None]:
# Model
model = AspectSupervisedEncoder(
    model_name="distilbert-base-uncased", # You can change to other models
    projection_dim=256, 
    aspect_dim=128,
    num_heads=4,
    dropout=0.3,
    freeze_encoder=True,
    use_lora=False,
    lora_r=8,  # DO NOT CHANGE THIS VALUE
    lora_alpha=16,  # DO NOT CHANGE THIS VALUE
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"]  # For DistilBERT; other architectures name their attention projections differently
)

# Count params
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Parameters: {trainable:,} trainable / {total:,} total")
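
To give a sense of what the LoRA arguments would mean if `use_lora` were set to `True`, the arithmetic below follows the standard LoRA formulation: the low-rank update is scaled by `lora_alpha / lora_r`, and each adapted linear layer gains `r * (d_in + d_out)` parameters. How `AspectSupervisedEncoder` actually wires LoRA is not shown in this notebook, so this is a sketch under that assumption. The hidden size (768) and layer count (6) are those of `distilbert-base-uncased`; the helper names are illustrative.

```python
def lora_scaling(r, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_extra_params(d_in, d_out, r):
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


HIDDEN = 768           # hidden size of distilbert-base-uncased
NUM_LAYERS = 6         # transformer layers in distilbert-base-uncased
TARGETS_PER_LAYER = 2  # q_lin and v_lin

scaling = lora_scaling(r=8, alpha=16)
per_module = lora_extra_params(HIDDEN, HIDDEN, r=8)
total_adapter_params = per_module * NUM_LAYERS * TARGETS_PER_LAYER

print(f"scaling={scaling}, per module={per_module:,}, total={total_adapter_params:,}")
```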

## 6. Model Training

Training runs for 10 epochs with a learning rate of 1e-4.
A CUDA GPU is used when available; otherwise training falls back to the CPU.

In [None]:
EPOCHS = 10
LR = 1e-4
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {DEVICE}")

# Trainer
trainer = Trainer(
    model=model,
    device=DEVICE,
    lr=LR,
    weight_decay=0.01,
    triplet_margin=0.5,
    cross_margin=0.3,
    gamma=0.3, # related to aspect loss
    beta=1.0 # related to cross loss
)

# Train
trainer.fit(train_dataloader, eval_dataloader, epochs=EPOCHS)
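
The role of `triplet_margin` can be illustrated with a plain-Python version of the standard triplet margin loss, `max(0, d(a, p) - d(a, n) + margin)`. This is a sketch, not the code in `src/trainer.py`: the Euclidean distance is an assumption (the trainer may use cosine distance or another metric), and the toy vectors stand in for model embeddings. PyTorch ships a built-in equivalent for this formulation, `torch.nn.TripletMarginLoss`.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
near = [3.0, 4.0]  # distance 5 from the anchor
far = [6.0, 8.0]   # distance 10 from the anchor

# Positive is closer than the negative by more than the margin: no penalty.
satisfied = triplet_loss(anchor, positive=near, negative=far)

# Roles swapped: the "positive" is farther away, so the loss is positive.
violated = triplet_loss(anchor, positive=far, negative=near)

print(satisfied, violated)
```

A larger margin demands a wider gap between positive and negative distances before a triplet stops contributing to the loss.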