# SC4002 Assignment - Part 3.1: Hyperparameter Tuning (Round 2)

**Team 3: Aaron Chen & Javier Tin**

This notebook performs a **refined, reproducible grid search** to find the optimal hyperparameters for our BiLSTM and BiGRU models.

**Updates in this version:**
1.  **SEED = 42:** All training is now seeded for reproducible results.
2.  **Refined Grid:** Based on the results of the first 216-combination run, the grid now focuses on the most promising parameter ranges (deeper models, lower dropout, and small non-zero weight decay). This reduces the search space to **162 combinations** (2 × 3 × 3 × 3 × 3).

## 1. Imports & Setup
This cell imports all data from our compliant `data_pipeline.py` and sets the random seed.

In [None]:
# === Core PyTorch Imports ===
import torch
import torch.nn as nn
import torch.optim as optim
import time
import itertools
import json
import pandas as pd

# === Plotting Imports ===
import matplotlib.pyplot as plt
import numpy as np
import sys
import os
import random


def set_seed(seed):
    """Seeds every RNG we use and requests deterministic kernels."""
    print(f"--- Setting global random seed to {seed} ---")

    # Needed for deterministic cuBLAS matmul/RNN kernels; must be set before the first CUDA call
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
    # Only affects Python processes started after this point, not the running kernel
    os.environ['PYTHONHASHSEED'] = str(seed)

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Raise an error if an op has no deterministic implementation
    torch.use_deterministic_algorithms(True)

    # Configure cuDNN: fixed algorithms, no autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Seed once, before any model or data-loading code runs
SEED = 42
set_seed(SEED)

# === Check PyTorch and CUDA Versions ===
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")

# Add parent directory to path
sys.path.append(os.path.abspath(".."))

# === Data Pipeline Import ===
try:
    from data_pipeline import (
        train_iterator, 
        valid_iterator, 
        test_iterator, 
        TEXT, 
        LABEL, 
        create_embedding_layer,
        device,
        BATCH_SIZE
    )
    print("\n✓ Successfully imported data pipeline.")
    print(f"  - Using device: {device}")
    print(f"  - Batch Size: {BATCH_SIZE}")
except ImportError as err:
    print("--- ERROR ---")
    print(f"Could not import from 'data_pipeline.py': {err}")
    print("Please make sure 'data_pipeline.py' is in this notebook's directory or its parent directory.")

--- Setting global random seed to 42 ---
PyTorch version: 2.5.1
CUDA available: True
CUDA version: 12.1
Current device: 0
Device name: NVIDIA GeForce RTX 4060 Laptop GPU



✓ Successfully imported data pipeline.
  - Using device: cuda
  - Batch Size: 64
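
A quick way to see what seeding does and does not guarantee is to draw random numbers twice from the same seed. The sketch below is an illustration only: `seed_cpu` is a hypothetical, cut-down stand-in for `set_seed` that covers just the CPU generators (no CUDA or cuDNN settings), so it should run on any machine with PyTorch installed.

```python
import random

import torch


def seed_cpu(seed):
    """Hypothetical helper: seeds only Python's and PyTorch's CPU RNGs."""
    random.seed(seed)
    torch.manual_seed(seed)


seed_cpu(42)
first_draw = torch.rand(3)   # first tensor drawn after seeding
second_draw = torch.rand(3)  # the generator has advanced, so this one differs

seed_cpu(42)
repeat_draw = torch.rand(3)  # same seed, same position in the random stream

same_after_reseed = torch.equal(first_draw, repeat_draw)
same_without_reseed = torch.equal(first_draw, second_draw)
print(f"Identical after re-seeding:   {same_after_reseed}")
print(f"Identical without re-seeding: {same_without_reseed}")
```

A single seed fixes the whole sequence of draws, not each draw individually. With one `set_seed` call at the top of the notebook, the initial weights of a given run therefore depend on how many runs came before it. Re-seeding at the start of each run is one possible way to make every combination reproducible on its own.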


## 2. Define Refined Hyperparameter Grid

Based on the first run, we observed:
- **Hidden Dim:** 256 and 512 performed best, so we drop 128 and add 384 between them.
- **Layers:** 1 layer was almost always worse, so we test 2, 3, and 4.
- **Dropout:** 0.7 was often too high, so we test 0.4, 0.5, and 0.6.
- **Weight Decay:** 0 and 1e-4 were clear losers, so we test 5e-6, 1e-5, and 5e-5, all inside the 1e-6 to 1e-4 range.

In [2]:
# Previous Grid:
# param_grid = {
#     'model_type': ['BiLSTM', 'BiGRU'],
#     'hidden_dim': [128, 256, 512],
#     'n_layers': [1, 2, 3],
#     'dropout': [0.5, 0.6, 0.7],
#     'weight_decay': [0, 1e-4, 1e-5, 1e-6]
# } # Total: 216 combinations

# Refined Grid (Round 2):
param_grid = {
    'model_type': ['BiLSTM', 'BiGRU'],
    'hidden_dim': [256, 384, 512],
    'n_layers': [2, 3, 4],
    'dropout': [0.4, 0.5, 0.6],
    'weight_decay': [5e-6, 1e-5, 5e-5] 
}

# Create all combinations
keys, values = zip(*param_grid.items())
hyperparam_combos = [dict(zip(keys, v)) for v in itertools.product(*values)]

print(f"Total combinations to test: {len(hyperparam_combos)}")
print("\n--- First 3 Combinations ---")
for combo in hyperparam_combos[:3]:
    print(combo)

Total combinations to test: 162

--- First 3 Combinations ---
{'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.4, 'weight_decay': 5e-06}
{'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.4, 'weight_decay': 1e-05}
{'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.4, 'weight_decay': 5e-05}
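
The two grid sizes can be cross-checked without building the full list of combinations, because the count is just the product of the number of values per hyperparameter. A small sketch follows; the names `previous_grid`, `refined_grid`, and `grid_size` are introduced here for illustration, and the values are copied from the cell above.

```python
import math

# Round 1 grid (commented out in the cell above)
previous_grid = {
    'model_type': ['BiLSTM', 'BiGRU'],
    'hidden_dim': [128, 256, 512],
    'n_layers': [1, 2, 3],
    'dropout': [0.5, 0.6, 0.7],
    'weight_decay': [0, 1e-4, 1e-5, 1e-6],
}

# Round 2 grid (the one actually searched in this notebook)
refined_grid = {
    'model_type': ['BiLSTM', 'BiGRU'],
    'hidden_dim': [256, 384, 512],
    'n_layers': [2, 3, 4],
    'dropout': [0.4, 0.5, 0.6],
    'weight_decay': [5e-6, 1e-5, 5e-5],
}


def grid_size(grid):
    """Number of combinations = product of the per-parameter value counts."""
    return math.prod(len(values) for values in grid.values())


print(f"Previous grid: {grid_size(previous_grid)} combinations")  # 2*3*3*3*4 = 216
print(f"Refined grid:  {grid_size(refined_grid)} combinations")   # 2*3*3*3*3 = 162
```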


## 3. Model & Training Definitions

These are the same models and functions from our previous notebook, modified slightly so that the hyperparameters are passed in as arguments.

In [3]:
# === Model Definitions (BiLSTM & BiGRU) ===

class BiRNN_Model(nn.Module):
    def __init__(self, model_type, emb_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()
        
        self.embedding = create_embedding_layer(freeze=False) # Keep this compliant
        
        RNN_CLASS = nn.LSTM if model_type == 'BiLSTM' else nn.GRU
        
        self.rnn = RNN_CLASS(
            input_size=emb_dim, 
            hidden_size=hidden_dim, 
            num_layers=n_layers, 
            bidirectional=bidirectional,
            dropout=dropout if n_layers > 1 else 0,
            batch_first=False
        )
        
        fc_input_dim = hidden_dim * 2 if bidirectional else hidden_dim
        self.fc = nn.Linear(fc_input_dim, output_dim)
        self.dropout_layer = nn.Dropout(dropout)
        
    def forward(self, text, lengths):
        embedded = self.embedding(text)
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, lengths.to('cpu'), enforce_sorted=False)
        
        packed_output, hidden = self.rnn(packed_embedded)
        
        # If LSTM, hidden is a tuple (hidden, cell)
        if isinstance(hidden, tuple):
            hidden = hidden[0] # Just get the hidden state
        
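        # hidden: [n_layers * num_directions, batch, hidden_dim]
        if self.rnn.bidirectional:
            # Last layer's final forward (-2) and backward (-1) states
            last_hidden_state = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        else:
            last_hidden_state = hidden[-1,:,:]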
        dropped_hidden = self.dropout_layer(last_hidden_state)
        prediction = self.fc(dropped_hidden)
        return prediction

# === Training/Evaluation Function Definitions ===

def get_accuracy(preds, y):
    top_pred = preds.argmax(1, keepdim=True)
    correct = top_pred.eq(y.view_as(top_pred)).sum()
    acc = correct.float() / y.shape[0]
    return acc

def train_epoch(model, iterator, optimizer, criterion):
    model.train()
    for batch in iterator:
        optimizer.zero_grad()
        text, lengths = batch.text
        predictions = model(text, lengths)
        loss = criterion(predictions, batch.label)
        loss.backward()
        optimizer.step()

def evaluate_epoch(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    epoch_acc = 0
    with torch.no_grad():
        for batch in iterator:
            text, lengths = batch.text
            predictions = model(text, lengths)
            loss = criterion(predictions, batch.label)
            acc = get_accuracy(predictions, batch.label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
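
The indexing `hidden[-2]` / `hidden[-1]` in `forward` relies on how PyTorch lays out the final hidden state of a stacked bidirectional RNN: one slice per layer and direction, with the last layer's forward and backward states at the end. The sketch below checks this on a toy `nn.GRU` with random input. The dimensions are arbitrary and the input is not packed, so this is a simplified check under those assumptions, not the training code path.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary toy dimensions, chosen only for this check
seq_len, batch_size, emb_dim, hidden_dim, n_layers = 5, 3, 8, 4, 2

rnn = nn.GRU(input_size=emb_dim, hidden_size=hidden_dim,
             num_layers=n_layers, bidirectional=True)
inputs = torch.randn(seq_len, batch_size, emb_dim)  # batch_first=False layout

with torch.no_grad():
    output, hidden = rnn(inputs)

# hidden holds one final state per (layer, direction) pair
print(f"hidden shape: {tuple(hidden.shape)}")  # (n_layers * 2, batch, hidden_dim)

# Same concatenation as in BiRNN_Model.forward
features = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
print(f"features shape: {tuple(features.shape)}")  # (batch, 2 * hidden_dim)

# Cross-check against the per-timestep outputs of the last layer:
# the forward direction finishes at the last timestep,
# the backward direction finishes at the first timestep.
fwd_matches = torch.allclose(hidden[-2], output[-1, :, :hidden_dim])
bwd_matches = torch.allclose(hidden[-1], output[0, :, hidden_dim:])
print(f"forward state matches output[-1]: {fwd_matches}")
print(f"backward state matches output[0]: {bwd_matches}")
```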

## 4. Run Grid Search (Round 2)

This cell trains every combination for 10 epochs and records the best validation accuracy reached across those epochs. Even with the smaller grid, it still takes a while to run!

In [4]:
# === Get Static Parameters ===
INPUT_DIM = len(TEXT.vocab)
OUTPUT_DIM = len(LABEL.vocab)
EMBEDDING_DIM = create_embedding_layer().embedding_dim
N_EPOCHS = 10 # Train each combo for 10 epochs

grid_search_results = []
total_runs = len(hyperparam_combos)

print(f"--- Starting REFINED Grid Search for {total_runs} combinations ---\n(This will take some time.)")

for i, params in enumerate(hyperparam_combos):
    run_num = i + 1
    print(f"\n{'='*20} RUN {run_num}/{total_runs} {'='*20}")
    print(f"Params: {params}")
    
    # 1. Instantiate Model
    model = BiRNN_Model(
        model_type=params['model_type'],
        emb_dim=EMBEDDING_DIM,
        hidden_dim=params['hidden_dim'],
        output_dim=OUTPUT_DIM,
        n_layers=params['n_layers'],
        bidirectional=True,
        dropout=params['dropout']
    ).to(device)
    
    # 2. Instantiate Optimizer and Criterion
    optimizer = optim.Adam(
        model.parameters(), 
        weight_decay=params['weight_decay'] # Add L2
    )
    criterion = nn.CrossEntropyLoss().to(device)
    
    best_valid_acc = -1.0
    start_run_time = time.time()
    
    # 3. Training Loop for this combination
    for epoch in range(N_EPOCHS):
        train_epoch(model, train_iterator, optimizer, criterion)
        valid_loss, valid_acc = evaluate_epoch(model, valid_iterator, criterion)
        
        if valid_acc > best_valid_acc:
            best_valid_acc = valid_acc
    
    end_run_time = time.time()
    run_duration_mins = (end_run_time - start_run_time) / 60
    
    # 4. Store Results
    result = params.copy()
    result['best_valid_acc'] = best_valid_acc * 100 # As percentage
    result['time_mins'] = run_duration_mins
    grid_search_results.append(result)
    
    print(f"Run {run_num} complete. Time: {run_duration_mins:.2f}m. Best Valid Acc: {best_valid_acc*100:.2f}%")

print("\n--- REFINED GRID SEARCH COMPLETE ---")

--- Starting REFINED Grid Search for 162 combinations ---
(This will take some time.)

Params: {'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.4, 'weight_decay': 5e-06}
Run 1 complete. Time: 0.09m. Best Valid Acc: 86.72%

Params: {'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.4, 'weight_decay': 1e-05}
Run 2 complete. Time: 0.08m. Best Valid Acc: 86.46%

Params: {'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.4, 'weight_decay': 5e-05}
Run 3 complete. Time: 0.08m. Best Valid Acc: 85.33%

Params: {'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.5, 'weight_decay': 5e-06}
Run 4 complete. Time: 0.08m. Best Valid Acc: 85.42%

Params: {'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.5, 'weight_decay': 1e-05}
Run 5 complete. Time: 0.08m. Best Valid Acc: 84.72%

Params: {'model_type': 'BiLSTM', 'hidden_dim': 256, 'n_layers': 2, 'dropout': 0.5, 'weight_decay': 5e-05}
Run 6 complet

KeyboardInterrupt: 
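
As the interrupted run above shows, results held only in `grid_search_results` are easy to leave incomplete. One possible safeguard is to write each run's record to disk as soon as it finishes and to skip combinations that are already recorded when the cell is restarted. The sketch below is not part of the notebook: `resumable_grid_search`, `run_one`, `load_results`, and the log file name are hypothetical, and a stand-in function with a dummy score replaces real training.

```python
import itertools
import json
import os
import tempfile


def load_results(results_path):
    """Return previously saved records, or an empty list on a fresh start."""
    if not os.path.exists(results_path):
        return []
    with open(results_path) as f:
        return json.load(f)


def resumable_grid_search(param_grid, run_one, results_path):
    """Run every combination that is not yet recorded in results_path."""
    keys = list(param_grid)
    results = load_results(results_path)
    done = {tuple(rec[k] for k in keys) for rec in results}

    for values in itertools.product(*param_grid.values()):
        if values in done:
            continue  # finished in an earlier session
        record = dict(zip(keys, values))
        record["best_valid_acc"] = run_one(record)
        results.append(record)

        # Write to a temp file first so an interrupt cannot corrupt the log
        tmp_path = results_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f, indent=4)
        os.replace(tmp_path, results_path)
    return results


# Demo with a fake training function (no real model involved)
toy_grid = {"model_type": ["BiLSTM", "BiGRU"], "n_layers": [2, 3]}
calls = []


def fake_run(params, fail_after=None):
    """Stand-in for training: logs the call and returns a dummy score."""
    if fail_after is not None and len(calls) >= fail_after:
        raise KeyboardInterrupt  # simulate stopping the cell by hand
    calls.append(dict(params))
    return 0.0


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "grid_search_log.json")
    try:
        resumable_grid_search(toy_grid, lambda p: fake_run(p, fail_after=2), path)
    except KeyboardInterrupt:
        pass
    saved_before_resume = len(load_results(path))
    final_results = resumable_grid_search(toy_grid, fake_run, path)

print(f"Saved before resume: {saved_before_resume}")
print(f"Total records: {len(final_results)}, training calls: {len(calls)}")
```

In the real loop, `run_one` would wrap steps 1 to 3 of the cell above and return `best_valid_acc`.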

## 5. Analyze Results

Now we can load all the new results into a `pandas.DataFrame` to find the winning model.

In [None]:
results_df = pd.DataFrame(grid_search_results)

results_df = results_df.sort_values(by='best_valid_acc', ascending=False)

print("--- Refined Hyperparameter Tuning Results ---\n(Ranked by Best Validation Accuracy)")
# to_markdown requires the optional `tabulate` package
print(results_df.to_markdown(index=False, floatfmt=".6g"))

# Filter the results by model type
bilstm_df = results_df[results_df["model_type"] == "BiLSTM"]
bigru_df  = results_df[results_df["model_type"] == "BiGRU"]

# Save new JSON files
bilstm_json_path = "grid_search_results_BiLSTM.json"
bigru_json_path  = "grid_search_results_BiGRU.json"

bilstm_df.to_json(bilstm_json_path, orient="records", indent=4)
bigru_df.to_json(bigru_json_path,  orient="records", indent=4)

print(f"\n✓ Saved BiLSTM results to {bilstm_json_path}")
print(f"✓ Saved BiGRU results to {bigru_json_path}")

--- Refined Hyperparameter Tuning Results ---
(Ranked by Best Validation Accuracy)
| model_type   |   hidden_dim |   n_layers |   dropout |   weight_decay |   best_valid_acc |   time_mins |
|:-------------|-------------:|-----------:|----------:|---------------:|-----------------:|------------:|
| BiGRU        |          512 |          4 |       0.6 |          5e-06 |          88.4549 |   0.234842  |
| BiGRU        |          384 |          2 |       0.4 |          1e-05 |          88.2812 |   0.0972018 |
| BiGRU        |          384 |          2 |       0.6 |          5e-06 |          88.2812 |   0.0919387 |
| BiGRU        |          384 |          2 |       0.4 |          5e-06 |          87.7604 |   0.0912761 |
| BiGRU        |          512 |          2 |       0.4 |          5e-06 |          87.7604 |   0.177418  |
| BiGRU        |          256 |          3 |       0.6 |          1e-05 |          87.6736 |   0.13361   |
| BiLSTM       |          512 |          2 |       0.6 |     

### Next Steps

1.  This notebook has created new `grid_search_results_BiLSTM.json` and `grid_search_results_BiGRU.json` files.
2.  Look at the table above and identify the **new best-performing hyperparameters** for both BiLSTM and BiGRU.
3.  Go to your `Advanced_RNN_Models.ipynb` notebook.
4.  **Update the hyperparameters** in cells #11 (for BiLSTM) and #15 (for BiGRU) to match these new winning combinations.
5.  Re-run the `Advanced_RNN_Models.ipynb` notebook to get your final, reproducible results for your report.
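
For step 2, the winning row per model type can also be extracted programmatically from the saved records instead of being read off the table. A minimal sketch, assuming the `orient="records"` layout written above (a list of dicts): `best_per_model` is a hypothetical helper, and `sample_records` holds only a handful of rows copied from the outputs in this notebook, so the winners it reports are not the final ones.

```python
import json


def best_per_model(records, metric="best_valid_acc"):
    """Return {model_type: record with the highest metric value}."""
    best = {}
    for rec in records:
        name = rec["model_type"]
        if name not in best or rec[metric] > best[name][metric]:
            best[name] = rec
    return best


# With the real files, the records would be loaded like this:
#   with open("grid_search_results_BiGRU.json") as f:
#       records = json.load(f)

# Partial sample: a few rows copied from the outputs above, NOT the full results
sample_records = [
    {"model_type": "BiGRU", "hidden_dim": 512, "n_layers": 4,
     "dropout": 0.6, "weight_decay": 5e-06, "best_valid_acc": 88.4549},
    {"model_type": "BiGRU", "hidden_dim": 384, "n_layers": 2,
     "dropout": 0.4, "weight_decay": 1e-05, "best_valid_acc": 88.2812},
    {"model_type": "BiLSTM", "hidden_dim": 256, "n_layers": 2,
     "dropout": 0.4, "weight_decay": 5e-06, "best_valid_acc": 86.72},
    {"model_type": "BiLSTM", "hidden_dim": 256, "n_layers": 2,
     "dropout": 0.4, "weight_decay": 1e-05, "best_valid_acc": 86.46},
]

winners = best_per_model(sample_records)
for model_type, rec in winners.items():
    print(model_type, json.dumps(rec))
```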