# **Kaggle Challenge: Pirate Pain Dataset 🏴‍☠️**

This notebook is a modified version of the Lecture 4 (Timeseries Classification) code, adapted for the Kaggle challenge.

**Local Setup:**
1.  Ensure you have a Conda environment with PyTorch (GPU), `pandas`, `scikit-learn`, `matplotlib`, `seaborn`, `tensorboard`, `jupyterlab`, `ray[tune]`, and `optuna`.
2.  Place the Kaggle CSVs (`pirate_pain_train.csv`, `pirate_pain_train_labels.csv`, `pirate_pain_test.csv`, and `sample_submission.csv`) in a folder named `data/` in the same directory as this notebook.
3.  To run TensorBoard, open a separate terminal, `conda activate` your environment, `cd` to this folder, and run: `tensorboard --logdir=./tensorboard`
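Before running anything else, a quick check that the expected files are in place can save a confusing `FileNotFoundError` later. The sketch below uses a hypothetical helper, `missing_data_files`, that is not part of the original notebook; the file list assumes the three training/test CSVs plus `sample_submission.csv`, which the loading cell references through `SUBMISSION_PATH`.

```python
from pathlib import Path

# Files the notebook expects inside data/ (assumed from the paths used below)
EXPECTED_FILES = [
    "pirate_pain_train.csv",
    "pirate_pain_train_labels.csv",
    "pirate_pain_test.csv",
    "sample_submission.csv",
]

def missing_data_files(data_dir="data"):
    """Hypothetical helper: return the expected file names not found in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]

missing = missing_data_files("data")
if missing:
    print("Missing files in data/:", missing)
else:
    print("All Kaggle CSVs found.")
```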

## ⚙️ **1. Setup & Libraries**

In [66]:
# Set seed for reproducibility
SEED = 123

# Import necessary libraries
import os

# Set environment variables before importing the modules that read them
# (matplotlib reads MPLCONFIGDIR when it is first imported)
os.environ['MPLCONFIGDIR'] = os.getcwd() + '/configs/'

import logging
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import copy
from itertools import product
import time

# Suppress warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=Warning)

# --- PyTorch Imports ---
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import TensorDataset, DataLoader

# --- Sklearn Imports ---
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix

# --- Ray[tune] & Optuna Imports ---
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch
from functools import partial

# --- Setup Directories & Device ---
logs_dir = "tensorboard"
os.makedirs("models", exist_ok=True)
os.makedirs(logs_dir, exist_ok=True)

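# Seed Python, NumPy and PyTorch RNGs (torch.manual_seed also seeds CUDA devices)
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    device = torch.device("cuda")
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.benchmark = True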
    print("\n--- Using GPU (RTX 3070, here we come!) ---")
else:
    device = torch.device("cpu")
    print("\n--- Using CPU ---")

print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")

# Configure plot display settings
sns.set_theme(font_scale=1.4)
sns.set_style('white')
plt.rc('font', size=14)
%matplotlib inline


--- Using GPU (RTX 3070, here we come!) ---
PyTorch version: 2.5.1
Device: cuda


## 🔄 **2. Data Loading & Reshaping**

This is the most critical new step. The data is in a "long" format (one row per timestep), where each `sample_index` is one complete time series. We must:
1.  Define the features we want to use.
2.  Pivot the data to get a 3D tensor of shape `(num_samples, num_timesteps, num_features)`.
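The pivot-and-reshape logic in `reshape_data` can be checked on a tiny synthetic long-format table (2 samples × 3 timesteps × 2 features; the column names `f_a` and `f_b` are made up). `df.pivot(..., values=features)` produces feature-major columns (all timesteps of the first feature, then all timesteps of the second), which is why the flat array is reshaped to `(samples, features, timesteps)` before being transposed.

```python
import pandas as pd

# Toy long-format table: one row per (sample_index, time), like the Kaggle CSVs
rows = [
    {"sample_index": s, "time": t, "f_a": 100 * s + t, "f_b": -(100 * s + t)}
    for s in range(2)
    for t in range(3)
]
toy_long = pd.DataFrame(rows)
toy_cols = ["f_a", "f_b"]  # made-up feature names
n_steps = 3

# Same steps as reshape_data: pivot to wide, reshape feature-major, then transpose
wide = toy_long.pivot(index="sample_index", columns="time", values=toy_cols)
toy_3d = wide.values.reshape(len(wide), len(toy_cols), n_steps)  # (samples, features, time)
toy_3d = toy_3d.transpose(0, 2, 1)                               # (samples, time, features)

print(toy_3d.shape)  # (2, 3, 2)
print(toy_3d[1, 2])  # sample 1, last timestep: f_a = 102, f_b = -102
```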

In [67]:
# --- 1. Define File Paths and Features ---
DATA_DIR = "data"
X_TRAIN_PATH = os.path.join(DATA_DIR, "pirate_pain_train.csv")
Y_TRAIN_PATH = os.path.join(DATA_DIR, "pirate_pain_train_labels.csv")
X_TEST_PATH = os.path.join(DATA_DIR, "pirate_pain_test.csv")
SUBMISSION_PATH = os.path.join(DATA_DIR, "sample_submission.csv")

# Define our time-series features
# We'll ignore static features (n_legs, etc.) for our baseline model
JOINT_FEATURES = [f"joint_{i:02d}" for i in range(31)]
PAIN_FEATURES = [f"pain_survey_{i}" for i in range(1, 5)]
FEATURES = JOINT_FEATURES + PAIN_FEATURES

N_FEATURES = len(FEATURES)
N_TIMESTEPS = 160 # Fixed from our earlier debugging

print(f"Using {N_FEATURES} features: {FEATURES[:3]}... to {FEATURES[-3:]}")

# --- 2. Create the Reshaping Function ---
def reshape_data(df, features_list, n_timesteps):
    """
    Pivots the long-format dataframe into a 3D NumPy array.
    Shape: (n_samples, n_timesteps, n_features)
    """
    df_pivot = df.pivot(index='sample_index', columns='time', values=features_list)
    data_2d = df_pivot.values
    n_samples = data_2d.shape[0]
    data_3d = data_2d.reshape(n_samples, len(features_list), n_timesteps)
    return data_3d.transpose(0, 2, 1)

def create_sliding_windows(X_3d, y=None, window_size=100, stride=20):
    """
    Takes 3D data (n_samples, n_timesteps, n_features)
    and creates overlapping windows.
    
    Returns:
    - new_X: (n_windows, window_size, n_features)
    - new_y (if y is provided): (n_windows,)
    - window_indices: (n_windows,) array tracking which original sample
                      (e.g., 0, 1, 2...) each window came from.
    """
    new_X = []
    new_y = []
    # This new array tracks which original sample each window came from.
    window_indices = [] 
    
    n_samples, n_timesteps, n_features = X_3d.shape
    
    # Iterate over each original sample
    for i in range(n_samples):
        sample = X_3d[i] # Shape (n_timesteps, n_features), e.g. (160, 35)
        
        # Slide a window over this sample
        idx = 0
        while (idx + window_size) <= n_timesteps:
            window = sample[idx : idx + window_size]
            new_X.append(window)
            window_indices.append(i) # Track the original sample index (0, 1, 2...)
            
            if y is not None:
                new_y.append(y[i]) # The label is the same for all windows
                
            idx += stride
            
    if y is not None:
        # Return new X, new y, and the index mapping
        return np.array(new_X), np.array(new_y), np.array(window_indices)
    else:
        # Return new X and the index mapping
        return np.array(new_X), np.array(window_indices)

# --- 3. Load and Reshape Data ---
print("Loading and reshaping training data...")
X_train_long = pd.read_csv(X_TRAIN_PATH)
X_train_full = reshape_data(X_train_long, FEATURES, N_TIMESTEPS)

print("Loading and reshaping test data...")
X_test_long = pd.read_csv(X_TEST_PATH)
X_test = reshape_data(X_test_long, FEATURES, N_TIMESTEPS)

# Load labels
y_train_df = pd.read_csv(Y_TRAIN_PATH)
y_train_full_df = y_train_df.sort_values(by='sample_index')
y_train_labels_str = y_train_full_df['label'].values # Fixed from our debugging

print(f"X_train_full shape: {X_train_full.shape}")
print(f"y_train_labels_str shape: {y_train_labels_str.shape}")
print(f"X_test shape: {X_test.shape}")

del X_train_long, X_test_long, y_train_df

Using 35 features: ['joint_00', 'joint_01', 'joint_02']... to ['pain_survey_2', 'pain_survey_3', 'pain_survey_4']
Loading and reshaping training data...
Loading and reshaping test data...
X_train_full shape: (661, 160, 35)
y_train_labels_str shape: (661,)
X_test shape: (1324, 160, 35)
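`create_sliding_windows` also returns `window_indices`, which records the original sample each window came from. If the model is later run on windowed test data, one way (an assumption here, not the notebook's stated method) to get one prediction per sample is to average the class probabilities of each sample's windows and take the argmax. `aggregate_window_probs` is a hypothetical helper, and it assumes every sample contributes at least one window.

```python
import numpy as np

def aggregate_window_probs(window_probs, window_indices, n_samples):
    """Average the class probabilities of all windows that belong to each sample.

    window_probs:   (n_windows, n_classes) probabilities, e.g. softmax of the logits
    window_indices: (n_windows,) original sample index of each window
    Assumes every sample has at least one window (otherwise its row would be NaN).
    """
    n_classes = window_probs.shape[1]
    sums = np.zeros((n_samples, n_classes))
    counts = np.zeros(n_samples)
    np.add.at(sums, window_indices, window_probs)  # unbuffered: repeated indices accumulate
    np.add.at(counts, window_indices, 1)
    return sums / counts[:, None]

# Toy example: sample 0 has two windows, sample 1 has one
probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.1, 0.1, 0.8]])
idx = np.array([0, 0, 1])
agg = aggregate_window_probs(probs, idx, n_samples=2)
print(agg.argmax(axis=1))  # [0 2]
```

In the notebook, the window indices would come from `create_sliding_windows(X_test_scaled, window_size=..., stride=...)`, which returns `(new_X, window_indices)` when no labels are passed, and `n_samples` would be `X_test.shape[0]`.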


## 🚧 **3. Preprocessing: Split & Scale**

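1.  **Encode Labels:** Convert `no_pain`, `low_pain`, `high_pain` to integers with `LabelEncoder`. Note that `LabelEncoder` sorts the class names alphabetically, so the actual encoding is `high_pain` → `0`, `low_pain` → `1`, `no_pain` → `2`.
2.  **Window the Series:** Cut each 160-step series into overlapping 80-step windows (stride 20) with `create_sliding_windows`; every window inherits the label of the sample it came from.
3.  **Split Data:** Use `StratifiedShuffleSplit` to create a single 80/20 train/validation split of the windows, so both sets have the same class proportions.
4.  **Scale Features:** Use `StandardScaler`. We `fit` it *only* on the training windows and `transform` all sets (train, val, and test).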
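The window count produced by the next cell can be checked by hand. For a series of length `T`, `create_sliding_windows` starts a window of size `W` every `S` steps as long as it still fits, which gives `(T - W) // S + 1` windows (or none if `W > T`). With `T = 160`, `W = 80` and `S = 20` that is 5 windows per sample, so the 661 training samples become 661 × 5 = 3305 windows. `n_windows` is a small illustrative helper, not part of the original notebook.

```python
def n_windows(n_timesteps, window_size, stride):
    """Windows per series produced by a create_sliding_windows-style loop."""
    if window_size > n_timesteps:
        return 0
    return (n_timesteps - window_size) // stride + 1

per_sample = n_windows(160, 80, 20)  # start offsets 0, 20, 40, 60, 80
print(per_sample)        # 5
print(661 * per_sample)  # 3305
```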

In [68]:
# --- 1. Encode Labels ---
LABEL_MAPPING = {'no_pain': 0, 'low_pain': 1, 'high_pain': 2}
le = LabelEncoder()
le.fit(list(LABEL_MAPPING.keys()))
y_train_full = le.transform(y_train_labels_str)
N_CLASSES = len(LABEL_MAPPING)

# LabelEncoder sorts the class names alphabetically, so the encoding it actually
# applies is {'high_pain': 0, 'low_pain': 1, 'no_pain': 2}, not the order written
# in LABEL_MAPPING. Decode predictions with le.inverse_transform(), not LABEL_MAPPING.
ENCODED_MAPPING = {str(c): i for i, c in enumerate(le.classes_)}
print(f"Labels encoded. {N_CLASSES} classes: {ENCODED_MAPPING}")

# --- 2. Sliding-Window Augmentation ---
NEW_WINDOW_SIZE = 80  # Example: 80 timesteps
NEW_STRIDE = 20       # Example: 20 timesteps

print("--- Applying sliding window augmentation ---")
X_train_windowed, y_train_windowed, _ = create_sliding_windows(
    X_train_full,
    y_train_full,
    window_size=NEW_WINDOW_SIZE,
    stride=NEW_STRIDE
)

print(f"Original X shape: {X_train_full.shape}")
print(f"Windowed X shape: {X_train_windowed.shape}")
print(f"Original y shape: {y_train_full.shape}")
print(f"Windowed y shape: {y_train_windowed.shape}")

# Note: X_test is NOT windowed here, so it keeps all 160 timesteps. To predict on
# windows instead, window X_test the same way and use the returned window indices
# to map window-level predictions back to the original test samples.

# --- 3. Split the *Windowed* Data ---
print("\n--- Splitting windowed data ---")
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=SEED)

# Caveat: windows cut from the same original sample overlap (stride < window size),
# so splitting at the window level can put near-duplicate windows of one sample in
# both train and validation, which can inflate the validation score.
for train_idx, val_idx in sss.split(X_train_windowed, y_train_windowed):
    X_train_split = X_train_windowed[train_idx]
    y_train_split = y_train_windowed[train_idx]
    X_val_split = X_train_windowed[val_idx]
    y_val_split = y_train_windowed[val_idx]

print(f"  X_train_split: {X_train_split.shape}")
print(f"  y_train_split: {y_train_split.shape}")
print(f"  X_val_split:   {X_val_split.shape}")
print(f"  y_val_split:   {y_val_split.shape}")

# --- 4. Scale Features (The "No-Cheating" Rule: fit on training windows only) ---
scaler = StandardScaler()
ns, ts, f = X_train_split.shape
X_train_2d = X_train_split.reshape(ns * ts, f)
print(f"Fitting Scaler on X_train_2d shape: {X_train_2d.shape}")
scaler.fit(X_train_2d)

X_train_scaled_2d = scaler.transform(X_train_2d)
X_train_scaled = X_train_scaled_2d.reshape(ns, ts, f)

ns_val, ts_val, f_val = X_val_split.shape
X_val_2d = X_val_split.reshape(ns_val * ts_val, f_val)
X_val_scaled_2d = scaler.transform(X_val_2d)
X_val_scaled = X_val_scaled_2d.reshape(ns_val, ts_val, f_val)

ns_test, ts_test, f_test = X_test.shape
X_test_2d = X_test.reshape(ns_test * ts_test, f_test)
X_test_scaled_2d = scaler.transform(X_test_2d)
X_test_scaled = X_test_scaled_2d.reshape(ns_test, ts_test, f_test)

print("Scaling complete.")
print(f"  X_train_scaled: {X_train_scaled.shape}")
print(f"  X_val_scaled:   {X_val_scaled.shape}")
print(f"  X_test_scaled:  {X_test_scaled.shape}")

del X_train_2d, X_val_2d, X_test_2d, X_train_scaled_2d, X_val_scaled_2d, X_test_scaled_2d

Labels encoded. 3 classes: {'high_pain': 0, 'low_pain': 1, 'no_pain': 2}
--- Applying sliding window augmentation ---
Original X shape: (661, 160, 35)
Windowed X shape: (3305, 80, 35)
Original y shape: (661,)
Windowed y shape: (3305,)

--- Splitting windowed data ---
  X_train_split: (2644, 80, 35)
  y_train_split: (2644,)
  X_val_split:   (661, 80, 35)
  y_val_split:   (661,)
Fitting Scaler on X_train_2d shape: (211520, 35)
Scaling complete.
  X_train_scaled: (2644, 80, 35)
  X_val_scaled:   (661, 80, 35)
  X_test_scaled:  (1324, 160, 35)
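Because the split above is done on windows, a leakage-free alternative is to split the 661 original samples first and window each side afterwards, so every window of a given sample lands on the same side. The sketch below shows one way to do that; `split_then_window` is a hypothetical helper, and it re-implements the windowing loop locally (same start offsets as `create_sliding_windows`) so that it runs on its own.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

def split_then_window(X_3d, y, window_size, stride, test_size=0.2, seed=123):
    """Stratified split of the ORIGINAL samples, then windowing of each side.

    Returns windowed train/val arrays plus the sample indices used on each side.
    """
    def windows(X, labels):
        # Same start offsets as create_sliding_windows: 0, stride, 2*stride, ...
        xs, ys = [], []
        for sample, label in zip(X, labels):
            for start in range(0, X.shape[1] - window_size + 1, stride):
                xs.append(sample[start:start + window_size])
                ys.append(label)
        return np.array(xs), np.array(ys)

    sss = StratifiedShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, val_idx = next(sss.split(X_3d, y))
    X_tr, y_tr = windows(X_3d[train_idx], y[train_idx])
    X_va, y_va = windows(X_3d[val_idx], y[val_idx])
    return X_tr, y_tr, X_va, y_va, train_idx, val_idx

# Usage with synthetic data shaped like X_train_full (sizes are arbitrary)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 160, 3))
y_demo = np.repeat([0, 1, 2], [20, 15, 15])
X_tr, y_tr, X_va, y_va, tr_idx, va_idx = split_then_window(X_demo, y_demo, 80, 20)
print(X_tr.shape, X_va.shape)  # (200, 80, 3) (50, 80, 3)
```

Applied to this notebook, the call would look like `split_then_window(X_train_full, y_train_full, NEW_WINDOW_SIZE, NEW_STRIDE, seed=SEED)`, followed by the same scaling step as above.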


## 🚚 **4. PyTorch DataLoaders**

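This section follows Lecture 4: we wrap our NumPy arrays in `TensorDataset` objects and use `DataLoader` to feed batches efficiently to the GPU. Unlike Lecture 4, the `DataLoader`s themselves are created inside the tuning objective, because the batch size is one of the tuned hyperparameters.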
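As a quick illustration of what the loaders yield (dummy tensors with the windowed training layout; the sizes are arbitrary), each batch is a `(batch, timesteps, features)` float tensor plus a `(batch,)` label tensor. With `drop_last=True`, as used for the training loader inside the objective, an incomplete final batch is discarded.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy data: 10 windows of 80 timesteps x 35 features, 3 classes
toy_features = torch.randn(10, 80, 35)
toy_targets = torch.randint(0, 3, (10,))
toy_ds = TensorDataset(toy_features, toy_targets)

toy_loader = DataLoader(toy_ds, batch_size=4, shuffle=True, drop_last=True)
for xb, yb in toy_loader:
    print(xb.shape, yb.shape)  # torch.Size([4, 80, 35]) torch.Size([4])
print(len(toy_loader))  # 2 (the last 2 windows are dropped)
```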

In [69]:
# --- 1. Convert to Tensors ---
train_features = torch.from_numpy(X_train_scaled).float()
train_targets = torch.from_numpy(y_train_split).long()

val_features = torch.from_numpy(X_val_scaled).float()
val_targets = torch.from_numpy(y_val_split).long()

test_features = torch.from_numpy(X_test_scaled).float()

# --- 2. Create TensorDatasets ---
train_ds = TensorDataset(train_features, train_targets)
val_ds = TensorDataset(val_features, val_targets)
test_ds = TensorDataset(test_features) # Test set has no labels

# --- 3. Define make_loader function (from Lecture 4) ---
BATCH_SIZE = 128 # This will be our default, but Optuna can tune it

def make_loader(ds, batch_size, shuffle, drop_last):
    # Set num_workers=0 for Windows-friendly loading (from our debugging)
    num_workers = 0 
    
    # Create DataLoader with performance optimizations
    return DataLoader(
        ds,
        batch_size=int(batch_size), # Ensure batch_size is an int for the DataLoader
        shuffle=shuffle,
        drop_last=drop_last,
        num_workers=num_workers,
        pin_memory=True,
        pin_memory_device="cuda" if torch.cuda.is_available() else "",
        prefetch_factor=None,
    )

# --- 4. Create DataLoaders ---
# We will create these *inside* the objective function now,
# as the batch size is a hyperparameter we want to tune.
print("DataLoaders will be created inside the tuning loop.")
del X_train_scaled, X_val_scaled, test_features, train_features, val_features

DataLoaders will be created inside the tuning loop.


## 🛠️ **5. Model & Training Engine (From Lecture 4)**

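These are the core components from Lecture 4, slightly modified for Ray Tune.

-   `recurrent_summary`: A torchinfo-style summary that counts RNN/GRU/LSTM parameters correctly.
-   `RecurrentClassifier`: Our flexible model (RNN, LSTM, GRU), which classifies from the last hidden state.
-   `train_one_epoch` / `validate_one_epoch`: One pass over the training / validation data, returning the loss and weighted F1.
-   `objective_function`: Replaces Lecture 4's `fit` during the search. It takes a `config` dict of hyperparameters, builds the loaders and model from it, and reports metrics to `ray.tune` after every epoch, leaving early stopping to the scheduler. (The original `fit` is added back in Section 7.)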
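`RecurrentClassifier.forward` separates the final hidden state `h_n` of a bidirectional model with `hidden.view(self.num_layers, 2, -1, self.hidden_size)` and concatenates the two directions of the top layer. The sketch below checks, on a small random GRU (sizes chosen arbitrarily), which timesteps those two vectors correspond to: the forward direction ends at the last timestep, the backward direction at the first.

```python
import torch
from torch import nn

num_layers, hidden_size, batch, seq_len, n_features = 2, 8, 3, 5, 4
toy_gru = nn.GRU(n_features, hidden_size, num_layers=num_layers,
                 batch_first=True, bidirectional=True)
x = torch.randn(batch, seq_len, n_features)

with torch.no_grad():
    out, h_n = toy_gru(x)  # out: (batch, seq, 2*hidden); h_n: (layers*2, batch, hidden)

# Same reshape as RecurrentClassifier.forward: (layers, directions, batch, hidden)
h = h_n.view(num_layers, 2, batch, hidden_size)
last_fwd, last_bwd = h[-1, 0], h[-1, 1]

# The top layer's forward direction finishes at the LAST timestep...
assert torch.allclose(last_fwd, out[:, -1, :hidden_size])
# ...while its backward direction finishes at the FIRST timestep.
assert torch.allclose(last_bwd, out[:, 0, hidden_size:])

clf_input = torch.cat([last_fwd, last_bwd], dim=1)
print(clf_input.shape)  # torch.Size([3, 16])
```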

In [70]:
def recurrent_summary(model, input_size):
    """
    Custom summary function that emulates torchinfo's output while correctly
    counting parameters for RNN/GRU/LSTM layers.
    """

    output_shapes = {}
    hooks = []

    def get_hook(name):
        def hook(module, input, output):
            if isinstance(output, tuple):
                shape1 = list(output[0].shape)
                shape1[0] = -1  # Replace batch dimension with -1

                if isinstance(output[1], tuple):  # LSTM case: (h_n, c_n)
                    shape2 = list(output[1][0].shape)
                else:  # RNN/GRU case: h_n only
                    shape2 = list(output[1].shape)
                shape2[1] = -1
                output_shapes[name] = f"[{shape1}, {shape2}]"
            else:
                shape = list(output.shape)
                shape[0] = -1
                output_shapes[name] = f"{shape}"
        return hook

    try:
        device_summary = next(model.parameters()).device
    except StopIteration:
        device_summary = torch.device("cpu")

    dummy_input = torch.randn(1, *input_size).to(device_summary)

    for name, module in model.named_children():
        if isinstance(module, (nn.Linear, nn.RNN, nn.GRU, nn.LSTM)):
            hook_handle = module.register_forward_hook(get_hook(name))
            hooks.append(hook_handle)

    model.eval()
    with torch.no_grad():
        try:
            model(dummy_input)
        except Exception as e:
            print(f"Error during dummy forward pass: {e}")
            for h in hooks:
                h.remove()
            return

    for h in hooks:
        h.remove()

    print("-" * 79)
    print(f"{'Layer (type)':<25} {'Output Shape':<28} {'Param #':<18}")
    print("=" * 79)

    total_params = 0
    total_trainable_params = 0

    for name, module in model.named_children():
        if name in output_shapes:
            module_params = sum(p.numel() for p in module.parameters())
            trainable_params = sum(p.numel() for p in module.parameters() if p.requires_grad)

            total_params += module_params
            total_trainable_params += trainable_params

            layer_name = f"{name} ({type(module).__name__})"
            output_shape_str = str(output_shapes[name])
            params_str = f"{trainable_params:,}"

            print(f"{layer_name:<25} {output_shape_str:<28} {params_str:<15}")

    print("=" * 79)
    print(f"Total params: {total_params:,}")
    print(f"Trainable params: {total_trainable_params:,}")
    print(f"Non-trainable params: {total_params - total_trainable_params:,}")
    print("-" * 79)
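`recurrent_summary` counts parameters by summing `p.numel()` over each module. As a cross-check, PyTorch's recurrent layers have a closed-form count: per layer and direction, `gates * (H * I + H * H + 2 * H)`, where `gates` is 1 for `nn.RNN`, 3 for `nn.GRU` and 4 for `nn.LSTM`, `I` is that layer's input size (the feature count for the first layer, `H` or `2H` for deeper layers), and the `2 * H` term covers PyTorch's separate input and hidden biases. `rnn_param_count` is a hypothetical helper written for this check; it assumes the defaults `bias=True` and no `proj_size`.

```python
from torch import nn

def rnn_param_count(input_size, hidden_size, num_layers, gates, bidirectional=False):
    """Closed-form parameter count for nn.RNN (gates=1), nn.GRU (3) or nn.LSTM (4).

    Assumes bias=True and proj_size=0 (the PyTorch defaults).
    """
    dirs = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Deeper layers see the concatenated outputs of both directions
        in_size = input_size if layer == 0 else hidden_size * dirs
        per_direction = gates * (hidden_size * in_size + hidden_size * hidden_size + 2 * hidden_size)
        total += dirs * per_direction
    return total

# Example: a 2-layer unidirectional GRU over the 35 input features, hidden_size=384
toy_gru = nn.GRU(35, 384, num_layers=2, batch_first=True)
print(rnn_param_count(35, 384, 2, gates=3))          # 1372032
print(sum(p.numel() for p in toy_gru.parameters()))  # 1372032
```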

In [71]:
class RecurrentClassifier(nn.Module):
    """
    Generic RNN classifier (RNN, LSTM, GRU) from Lecture 4.
    Uses the last hidden state for classification.
    """
    def __init__(
            self,
            input_size,
            hidden_size,
            num_layers,
            num_classes,
            rnn_type='GRU',        # 'RNN', 'LSTM', or 'GRU'
            bidirectional=False,
            dropout_rate=0.2
            ):
        super().__init__()

        self.rnn_type = rnn_type
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.bidirectional = bidirectional

        rnn_map = {
            'RNN': nn.RNN,
            'LSTM': nn.LSTM,
            'GRU': nn.GRU
        }

        if rnn_type not in rnn_map:
            raise ValueError("rnn_type must be 'RNN', 'LSTM', or 'GRU'")

        rnn_module = rnn_map[rnn_type]

        # Dropout is only applied between layers (if num_layers > 1)
        dropout_val = dropout_rate if num_layers > 1 else 0

        self.rnn = rnn_module(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,       # Input shape: (batch, seq_len, features)
            bidirectional=bidirectional,
            dropout=dropout_val
        )

        if self.bidirectional:
            classifier_input_size = hidden_size * 2 # Concat fwd + bwd
        else:
            classifier_input_size = hidden_size

        self.classifier = nn.Linear(classifier_input_size, num_classes)

    def forward(self, x):
        """
        x shape: (batch_size, seq_length, input_size)
        """
        rnn_out, hidden = self.rnn(x)

        if self.rnn_type == 'LSTM':
            hidden = hidden[0]

        if self.bidirectional:
            hidden = hidden.view(self.num_layers, 2, -1, self.hidden_size)
            hidden_to_classify = torch.cat([hidden[-1, 0, :, :], hidden[-1, 1, :, :]], dim=1)
        else:
            hidden_to_classify = hidden[-1]

        logits = self.classifier(hidden_to_classify)
        return logits

In [72]:
# This cell is MODIFIED to work with Ray Tune.
# The fit function is now part of the 'objective_function'

def train_one_epoch(model, train_loader, criterion, optimizer, scaler, device, l1_lambda=0, l2_lambda=0):
    model.train()
    running_loss = 0.0
    n_seen = 0  # samples actually processed (drop_last=True can skip the final partial batch)
    all_predictions = []
    all_targets = []

    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)

        with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            logits = model(inputs)
            loss = criterion(logits, targets)
            # Only build the (costly) norm terms when the penalty is actually used
            if l1_lambda > 0:
                loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
            if l2_lambda > 0:
                loss = loss + l2_lambda * sum(p.pow(2).sum() for p in model.parameters())

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        running_loss += loss.item() * inputs.size(0)
        n_seen += inputs.size(0)
        predictions = logits.argmax(dim=1)
        all_predictions.append(predictions.cpu().numpy())
        all_targets.append(targets.cpu().numpy())

    epoch_loss = running_loss / n_seen
    epoch_f1 = f1_score(
        np.concatenate(all_targets),
        np.concatenate(all_predictions),
        average='weighted'
    )
    return epoch_loss, epoch_f1

def validate_one_epoch(model, val_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    all_predictions = []
    all_targets = []

    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)

            with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
                logits = model(inputs)
                loss = criterion(logits, targets)

            running_loss += loss.item() * inputs.size(0)
            predictions = logits.argmax(dim=1)
            all_predictions.append(predictions.cpu().numpy())
            all_targets.append(targets.cpu().numpy())

    epoch_loss = running_loss / len(val_loader.dataset)
    epoch_f1 = f1_score(
        np.concatenate(all_targets),
        np.concatenate(all_predictions),
        average='weighted'
    )
    return epoch_loss, epoch_f1

def log_metrics_to_tensorboard(writer, epoch, train_loss, train_f1, val_loss, val_f1, model):
    # This function is not strictly needed for tune, but good to keep
    writer.add_scalar('Loss/Training', train_loss, epoch)
    writer.add_scalar('Loss/Validation', val_loss, epoch)
    writer.add_scalar('F1/Training', train_f1, epoch)
    writer.add_scalar('F1/Validation', val_f1, epoch)


# The 'fit' function is now part of the objective function below
# We create a NEW objective_function for Ray Tune

def objective_function(config, train_ds, val_ds):
    """
    This is the main function that Ray Tune will call for each trial.
    'config' is a dictionary of hyperparameters from Optuna.
    'train_ds' and 'val_ds' are our TensorDatasets.
    """
    
    # --- 1. Create DataLoaders with the tuned batch size ---
    # We create them here because batch_size is a hyperparameter
    train_loader = make_loader(train_ds, batch_size=config["batch_size"], shuffle=True, drop_last=True)
    val_loader = make_loader(val_ds, batch_size=config["batch_size"], shuffle=False, drop_last=False)
    
    # --- 2. Create Model --- 
    model = RecurrentClassifier(
        input_size=N_FEATURES,
        hidden_size=config["hidden_size"],
        num_layers=config["num_layers"],
        num_classes=N_CLASSES,
        dropout_rate=config["dropout_rate"],
        bidirectional=config["bidirectional"],
        rnn_type=config["rnn_type"]
    ).to(device)
    
    if hasattr(torch, "compile"):  # torch.compile exists from PyTorch 2.0 onwards
        model = torch.compile(model)
    
    # --- 3. Create Optimizer, Loss, Scaler ---
    criterion = nn.CrossEntropyLoss()
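    # Note: l2_lambda is applied twice: as AdamW's decoupled weight_decay here and
    # as an explicit L2 penalty in the loss (passed to train_one_epoch below).
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"], weight_decay=config["l2_lambda"])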
    scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))

    # --- 4. The Training Loop (adapted from fit) ---
    # The loop allows up to 200 epochs, but Ray's ASHA scheduler stops trials early:
    # weak trials at its milestones, and every trial at its max_t (default 100).
    EPOCHS = 200
    
    for epoch in range(1, EPOCHS + 1):
        train_loss, train_f1 = train_one_epoch(
            model, train_loader, criterion, optimizer, scaler, device, 0, config["l2_lambda"]
        )

        val_loss, val_f1 = validate_one_epoch(
            model, val_loader, criterion, device
        )
        
        # --- Send Results to Ray Tune --- 
        # This is the most important part.
        # Ray Tune will use this 'val_f1' score to make decisions.
        # tune.report takes the metrics as a single dictionary
        tune.report({
            "train_loss": train_loss,
            "train_f1": train_f1,
            "val_loss": val_loss,
            "val_f1": val_f1
        })
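Both `train_one_epoch` and `validate_one_epoch` score an epoch with `f1_score(..., average='weighted')`: the per-class F1 scores are averaged with weights equal to each class's support (its number of true samples). The toy labels below (not from the dataset) show the arithmetic: per-class F1 scores of 0.75, 0.8 and 2/3 with supports 4, 2 and 2 give (4·0.75 + 2·0.8 + 2·0.667) / 8 ≈ 0.7417, while the unweighted `macro` average is ≈ 0.7389. With imbalanced classes, the weighted score is dominated by the majority class.

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

per_class = f1_score(y_true, y_pred, average=None)       # [0.75, 0.8, 0.667]
weighted = f1_score(y_true, y_pred, average='weighted')  # support-weighted mean (4, 2, 2)
macro = f1_score(y_true, y_pred, average='macro')        # plain mean over classes

print(round(weighted, 4))  # 0.7417
print(round(macro, 4))     # 0.7389
```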


## 🧪 **6. Hyperparameter Search with Ray Tune & Optuna**

This cell replaces our old manual "Experiment 1".

1.  **`search_space`**: We define the hyperparameters Optuna is allowed to search.
2.  **`OptunaSearch`**: We tell Ray Tune to use Optuna's smart search algorithm.
3.  **`ASHAScheduler`**: We tell Ray Tune to automatically stop trials that are not performing well (to save time).
4.  **`tune.run`**: This starts the search. It runs `num_samples=20` trials; with `resources_per_trial={"cpu": 4, "gpu": 0.25}` and `ray.init(num_cpus=16, num_gpus=1)`, up to four trials share the single GPU at the same time.
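The `ASHAScheduler` in the cell below uses `grace_period=20` and `reduction_factor=2` and leaves `max_t` at its default of 100. ASHA compares trials at milestones `grace_period * reduction_factor**k` that stay below `max_t`; at each one, a trial continues only if its `val_f1` is roughly in the top 1/`reduction_factor` of the scores recorded there, and surviving trials stop at `max_t`. The helper below is a hypothetical re-derivation of those milestones for a single bracket, not part of Ray's API.

```python
import math

def asha_milestones(grace_period, reduction_factor, max_t):
    """Epochs at which a single-bracket ASHA compares trials (hypothetical helper)."""
    n_rungs = int(math.log(max_t / grace_period) / math.log(reduction_factor) + 1)
    return [grace_period * reduction_factor ** k for k in range(n_rungs)]

print(asha_milestones(20, 2, 100))  # [20, 40, 80]
```

This is consistent with the `iter` column of the search results (trials ending at 20, 40 or 100 epochs), so no trial reaches the loop's `EPOCHS = 200`.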

In [74]:
# --- 1. Define the Search Space for Optuna ---
search_space = {
    "rnn_type": tune.choice(['GRU', 'LSTM']),
    "lr": tune.loguniform(1e-5, 1e-2),           # Widen the learning rate
    "batch_size": tune.choice([64, 128, 256]),  
    "hidden_size": tune.choice([128, 256, 384]),# Let's try bigger models
    "num_layers": tune.choice([2, 3]),       # Let's try deeper models
    "dropout_rate": tune.uniform(0.1, 0.6),     # Widen the dropout range
    "bidirectional": tune.choice([True, False]),
    "l2_lambda": tune.loguniform(1e-7, 1e-3)      # Widen the L2 range
}

# --- 2. Define the Optimizer (Optuna) and Scheduler (ASHA) ---
optuna_search = OptunaSearch(
    metric="val_f1",
    mode="max"
)

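scheduler = ASHAScheduler(
    metric="val_f1",
    mode="max",
    grace_period=20,    # Min epochs a trial must run before it can be stopped
    reduction_factor=2  # At each milestone, keep roughly the top 1/2 of trials
    # max_t is left at its default (100), so no trial runs past 100 epochs
)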

# --- 3. Initialize Ray (WITH THE BIG HAMMER FIX) ---
if ray.is_initialized():
    ray.shutdown()

# --- FIX 2 (The "Big Hammer"): Set the Environment Variable ---
# This forces Ray to use this short path for *all* its temp files.
ray_logs_path = os.path.abspath("./ray_results")
os.makedirs(ray_logs_path, exist_ok=True)
os.environ["RAY_TEMP_DIR"] = ray_logs_path
# --- END FIX ---

ray.init(
    num_cpus=16, 
    num_gpus=1, 
    ignore_reinit_error=True
)

def short_trial_name(trial):
    """Creates a short, unique name for each trial folder."""
    return f"{trial.trainable_name}_{trial.trial_id}"


# --- 4. Run the Tuner ---
print("Starting hyperparameter search (up to 4 trials in parallel)...")

analysis = tune.run(
    tune.with_parameters(objective_function, train_ds=train_ds, val_ds=val_ds),
    
    resources_per_trial={"cpu": 4, "gpu": 0.25}, 
    
    config=search_space,
    num_samples=20, # Number of different HPO trials to run
    search_alg=optuna_search,
    scheduler=scheduler,
    name="pirate_pain_optuna_search",

    storage_path=ray_logs_path,
    
    trial_dirname_creator=short_trial_name,
    
    log_to_file=True,
    verbose=1 # 0 = quiet, 1 = table, 2 = detailed
)

print("\n--- Search Complete ---")


Current time: 2025-11-08 15:00:39
Running for: 00:11:38.72
Memory: 10.1/13.9 GiB

| Trial name | status | loc | batch_size | bidirectional | dropout_rate | hidden_size | l2_lambda | lr | num_layers | rnn_type | iter | total time (s) | train_loss | train_f1 | val_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| objective_function_f14856cc | TERMINATED | 127.0.0.1:30156 | 64 | True | 0.197461 | 256 | 1.60861e-05 | 6.34824e-05 | 2 | LSTM | 20 | 67.5903 | 0.176599 | 0.949192 | 0.127393 |
| objective_function_f169db90 | TERMINATED | 127.0.0.1:34416 | 256 | True | 0.10323 | 384 | 2.08283e-05 | 0.00031591 | 3 | LSTM | 100 | 216.986 | 0.0415726 | 1.0 | 0.0783661 |
| objective_function_b1d11c86 | TERMINATED | 127.0.0.1:48160 | 64 | True | 0.124886 | 384 | 9.07201e-07 | 0.00104391 | 2 | LSTM | 100 | 385.408 | 0.00531774 | 1.0 | 0.0705994 |
| objective_function_2d4c9246 | TERMINATED | 127.0.0.1:49928 | 64 | False | 0.434178 | 256 | 3.44393e-06 | 4.25182e-05 | 2 | LSTM | 20 | 57.0431 | 0.270809 | 0.890112 | 0.248498 |
| objective_function_c5b176ba | TERMINATED | 127.0.0.1:3192 | 128 | True | 0.119342 | 384 | 0.000797479 | 1.43133e-05 | 2 | LSTM | 20 | 63.8124 | 2.99978 | 0.739021 | 0.514179 |
| objective_function_db57d655 | TERMINATED | 127.0.0.1:35972 | 64 | True | 0.177049 | 384 | 0.000610535 | 0.000562012 | 3 | GRU | 20 | 139.939 | 0.223743 | 0.955876 | 0.140335 |
| objective_function_d4a14296 | TERMINATED | 127.0.0.1:20440 | 128 | True | 0.157407 | 128 | 8.95704e-07 | 0.0020676 | 3 | LSTM | 40 | 64.5558 | 0.0465825 | 0.985502 | 0.101211 |
| objective_function_98c3f61d | TERMINATED | 127.0.0.1:29668 | 64 | False | 0.308104 | 128 | 1.47736e-05 | 2.41254e-05 | 3 | GRU | 20 | 23.2886 | 0.515759 | 0.78475 | 0.458598 |
| objective_function_71bdab5b | TERMINATED | 127.0.0.1:47852 | 256 | False | 0.291669 | 256 | 1.27757e-07 | 9.0598e-05 | 2 | GRU | 20 | 13.44 | 0.392044 | 0.821992 | 0.364737 |
| objective_function_c65380e2 | TERMINATED | 127.0.0.1:20360 | 128 | False | 0.463619 | 128 | 1.89574e-05 | 0.00650139 | 3 | LSTM | 40 | 24.6971 | 0.237155 | 0.957687 | 0.122496 |


(pid=gcs_server) [2025-11-08 14:49:21,423 E 48960 49820] (gcs_server.exe) gcs_server.cc:302: Failed to establish connection to the event+metrics exporter agent. Events and metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
(raylet) [2025-11-08 14:49:26,071 E 49188 23516] (raylet.exe) main.cc:975: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
2025-11-08 15:00:39,428	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to 'c:/Users/Karim Negm/Documents/AN2DL/Challenge 1/ray_results/pirate_pain_optuna_search' in 0.0352s.
2025-11-08 15:00:39,451	INFO tune.py:1041 -- Total run time: 698.77 seconds (698.67 seconds for the tuning loop).



--- Search Complete ---


In [75]:
# --- 5. Get Best Results (FIX 3) ---
# We must *explicitly* tell the analysis object what metric/mode to use
print("Getting best trial from analysis...")
best_trial = analysis.get_best_trial(metric="val_f1", mode="max", scope="all")
if best_trial:
    best_config = best_trial.config
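    # last_result holds val_f1 from this trial's final reported epoch. With scope="all"
    # the trial was chosen by its best epoch, so its peak val_f1 may be higher.
    best_val_f1 = best_trial.last_result["val_f1"]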
    
    print(f"Best validation F1 score: {best_val_f1:.4f}")
    print("Best hyperparameters found:")
    print(best_config)
else:
    print("ERROR: No trials completed successfully. Check the 'ray_results' folder for logs.")
    best_config = None # Handle the case where all trials failed

Getting best trial from analysis...
Best validation F1 score: 0.9895
Best hyperparameters found:
{'rnn_type': 'GRU', 'lr': 0.000945587937050466, 'batch_size': 64, 'hidden_size': 384, 'num_layers': 2, 'dropout_rate': 0.4140005708856067, 'bidirectional': False, 'l2_lambda': 3.205126834524651e-07}


## 🏆 **7. Final Model Configuration**

After the search is complete, this cell picks up the winning configuration directly from the `analysis` results (no manual copying needed), ready for the final training and submission.

In [76]:
# ===================================================================
# --- 🏆 FINAL MODEL CONFIGURATION 🏆 ---
# ===================================================================
# --- 1. Get Best Config from Analysis --- 
# We can get this automatically from the 'analysis' object
FINAL_CONFIG = best_config
FINAL_BEST_VAL_F1 = best_val_f1

# We need to get the 'best_epoch' for the final training.
# We'll re-run the 'fit' function from Lecture 4 (which we must add back)
# on the best config to find the optimal number of epochs.

print("--- 🏆 Final Configuration Set --- ")
print(f"Best Val F1 from search: {FINAL_BEST_VAL_F1:.4f}")
print(FINAL_CONFIG)

--- 🏆 Final Configuration Set --- 
Best Val F1 from search: 0.9895
{'rnn_type': 'GRU', 'lr': 0.000945587937050466, 'batch_size': 64, 'hidden_size': 384, 'num_layers': 2, 'dropout_rate': 0.4140005708856067, 'bidirectional': False, 'l2_lambda': 3.205126834524651e-07}


### **Re-run to find Best Epoch**

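The search did not train every model for 200 epochs: ASHA stopped weak trials at its milestones (20 or 40 epochs in the table above) and capped the rest at 100 epochs, its default `max_t`. The best trial was selected by its highest `val_f1` at *any* reported epoch. Now we re-train the winning configuration from scratch *with early stopping* to find the best epoch count for our final submission training.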
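The early-stopping rule in `fit` keeps the epoch with the best validation metric and stops once `patience` consecutive epochs fail to improve on it (ties do not count as improvements). The hypothetical helper below replays that rule on a list of per-epoch scores; the scores are made up for illustration.

```python
def best_epoch_with_patience(val_scores, patience, mode="max"):
    """Replay fit()'s early-stopping rule on per-epoch scores.

    Returns (best_epoch, stop_epoch), both 1-based.
    """
    best = float("-inf") if mode == "max" else float("inf")
    best_epoch, counter = 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        improved = score > best if mode == "max" else score < best
        if improved:
            best, best_epoch, counter = score, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_scores)

# Illustrative (made-up) validation F1 per epoch
scores = [0.80, 0.85, 0.91, 0.90, 0.91, 0.89, 0.88]
print(best_epoch_with_patience(scores, patience=3))  # (3, 6)
```

Epoch 5 ties the best score of epoch 3 but does not reset the counter, so training stops at epoch 6 and epoch 3's weights are the ones restored.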

In [77]:
# --- We need the original 'fit' function back ---
def fit(model, train_loader, val_loader, epochs, criterion, optimizer, scaler, device,
        l1_lambda=0, l2_lambda=0, patience=0, evaluation_metric="val_f1", mode='max',
        restore_best_weights=True, writer=None, verbose=10, experiment_name=""):
    
    training_history = {
        'train_loss': [], 'val_loss': [],
        'train_f1': [], 'val_f1': []
    }
    
    model_path = f"models/{experiment_name}_best_model.pt"

    if patience > 0:
        patience_counter = 0
        best_metric = float('-inf') if mode == 'max' else float('inf')
        best_epoch = 0

    print(f"--- Starting Training: {experiment_name} ---")
    print(f"Will train for {epochs} epochs with patience={patience} monitoring {evaluation_metric}")

    for epoch in range(1, epochs + 1):
        train_loss, train_f1 = train_one_epoch(
            model, train_loader, criterion, optimizer, scaler, device, l1_lambda, l2_lambda
        )

        val_loss, val_f1 = validate_one_epoch(
            model, val_loader, criterion, device
        )

        training_history['train_loss'].append(train_loss)
        training_history['val_loss'].append(val_loss)
        training_history['train_f1'].append(train_f1)
        training_history['val_f1'].append(val_f1)

        if writer is not None:
            log_metrics_to_tensorboard(
                writer, epoch, train_loss, train_f1, val_loss, val_f1, model
            )

        if verbose > 0 and (epoch % verbose == 0 or epoch == 1):
            print(f"Epoch {epoch:3d}/{epochs} | "
                  f"Train: Loss={train_loss:.4f}, F1={train_f1:.4f} | "
                  f"Val: Loss={val_loss:.4f}, F1={val_f1:.4f}")

        if patience > 0:
            current_metric = training_history[evaluation_metric][-1]
            is_improvement = (current_metric > best_metric) if mode == 'max' else (current_metric < best_metric)

            if is_improvement:
                best_metric = current_metric
                best_epoch = epoch
                torch.save(model.state_dict(), model_path)
                patience_counter = 0
            else:
                patience_counter += 1
                if patience_counter >= patience:
                    print(f"\nEarly stopping triggered after {epoch} epochs.")
                    break

    if restore_best_weights and patience > 0:
        print(f"Restoring best model from epoch {best_epoch} with {evaluation_metric} {best_metric:.4f}")
        model.load_state_dict(torch.load(model_path))

    if patience == 0:
        print("Training complete. Saving final model.")
        torch.save(model.state_dict(), model_path.replace("_best_model.pt", "_final_model.pt"))

    if writer is not None:
        writer.close()
    
    print(f"--- Finished Training: {experiment_name} ---")
    return model, training_history, (best_epoch if patience > 0 else epochs)

# --- 1. Create DataLoaders for the best config ---
best_batch_size = FINAL_CONFIG["batch_size"]
train_loader_final_check = make_loader(train_ds, batch_size=best_batch_size, shuffle=True, drop_last=True)
val_loader_final_check = make_loader(val_ds, batch_size=best_batch_size, shuffle=False, drop_last=False)

# --- 2. Instantiate Fresh Model ---
final_check_model = RecurrentClassifier(
    input_size=N_FEATURES,
    hidden_size=FINAL_CONFIG["hidden_size"],
    num_layers=FINAL_CONFIG["num_layers"],
    num_classes=N_CLASSES,
    dropout_rate=FINAL_CONFIG["dropout_rate"],
    bidirectional=FINAL_CONFIG["bidirectional"],
    rnn_type=FINAL_CONFIG["rnn_type"]
).to(device)

if hasattr(torch, "compile"):  # torch.compile is available from PyTorch 2.0 onward
    final_check_model = torch.compile(final_check_model)

final_check_optimizer = torch.optim.AdamW(final_check_model.parameters(), lr=FINAL_CONFIG["lr"], weight_decay=FINAL_CONFIG["l2_lambda"])
final_check_scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))
final_check_criterion = nn.CrossEntropyLoss()

# --- 3. Run Training with Early Stopping ---
print("--- Finding best epoch number for the winning model ---")
_, _, FINAL_BEST_EPOCH = fit(
    model=final_check_model,
    train_loader=train_loader_final_check,
    val_loader=val_loader_final_check,
    epochs=200, # Max epochs
    criterion=final_check_criterion,
    optimizer=final_check_optimizer,
    scaler=final_check_scaler,
    device=device,
    writer=None, # No need to log this one
    verbose=10,
    experiment_name="final_check",
    patience=30 # Use a reasonable patience
)

print(f"\n--- 🏆 Optimal Epochs Found: {FINAL_BEST_EPOCH} ---")

# --- 4. Set variables for the submission cell ---
# This populates the variables needed for the final submission script
FINAL_MODEL_TYPE = FINAL_CONFIG["rnn_type"]
FINAL_HIDDEN_SIZE = FINAL_CONFIG["hidden_size"]
FINAL_HIDDEN_LAYERS = FINAL_CONFIG["num_layers"]
FINAL_BIDIRECTIONAL = FINAL_CONFIG["bidirectional"]
FINAL_DROPOUT_RATE = FINAL_CONFIG["dropout_rate"]
FINAL_LEARNING_RATE = FINAL_CONFIG["lr"]
FINAL_L2_LAMBDA = FINAL_CONFIG["l2_lambda"]
FINAL_BATCH_SIZE = FINAL_CONFIG["batch_size"]

FINAL_EXPERIMENT_NAME = f"{FINAL_MODEL_TYPE}_H{FINAL_HIDDEN_SIZE}_L{FINAL_HIDDEN_LAYERS}_B{FINAL_BIDIRECTIONAL}_D{FINAL_DROPOUT_RATE}_Optuna_FINAL"

print(f"Submission name will be: submission_{FINAL_EXPERIMENT_NAME}.csv")

--- Finding best epoch number for the winning model ---
--- Starting Training: final_check ---
Will train for 200 epochs with patience=30 monitoring val_f1
Epoch   1/200 | Train: Loss=0.6026, F1=0.7457 | Val: Loss=0.4480, F1=0.7934
Epoch  10/200 | Train: Loss=0.1089, F1=0.9598 | Val: Loss=0.1203, F1=0.9603
Epoch  20/200 | Train: Loss=0.0305, F1=0.9904 | Val: Loss=0.0779, F1=0.9800
Epoch  30/200 | Train: Loss=0.0233, F1=0.9924 | Val: Loss=0.0749, F1=0.9833
Epoch  40/200 | Train: Loss=0.0002, F1=1.0000 | Val: Loss=0.0818, F1=0.9878
Epoch  50/200 | Train: Loss=0.0001, F1=1.0000 | Val: Loss=0.0874, F1=0.9894
Epoch  60/200 | Train: Loss=0.0000, F1=1.0000 | Val: Loss=0.0912, F1=0.9894
Epoch  70/200 | Train: Loss=0.0000, F1=1.0000 | Val: Loss=0.0945, F1=0.9894

Early stopping triggered after 73 epochs.
Restoring best model from epoch 43 with val_f1 0.9894
--- Finished Training: final_check ---

--- 🏆 Optimal Epochs Found: 43 ---
Submission name will be: submission_GRU_H384_L2_BFalse_D0.414
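The early-stopping rule in `fit` keeps the epoch with the best `val_f1`, resets its counter only on a strict improvement, and stops once `patience` consecutive epochs pass without one. The sketch below replays that rule on a plain list of per-epoch scores, so the relation between the best epoch and the stop epoch can be checked without a GPU; `replay_early_stopping` and the toy scores are illustrative and not part of the notebook.

```python
def replay_early_stopping(history, patience, mode="max"):
    """Return (best_epoch, stop_epoch) using the same rule as fit().

    Epochs are 1-indexed; stop_epoch is the last epoch that was actually run.
    """
    best_metric = float("-inf") if mode == "max" else float("inf")
    best_epoch, counter = 0, 0
    for epoch, metric in enumerate(history, start=1):
        # Strict comparison: a tie with the best score is NOT an improvement.
        improved = metric > best_metric if mode == "max" else metric < best_metric
        if improved:
            best_metric, best_epoch, counter = metric, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch
    return best_epoch, len(history)


# Toy history: improves until epoch 4, then only ties or falls below it.
scores = [0.80, 0.90, 0.95, 0.99, 0.99, 0.98, 0.99, 0.97]
print(replay_early_stopping(scores, patience=3))  # (4, 7)
```

Applied to the run above, the rule implies that epochs 44 to 73 never strictly beat epoch 43's `val_f1` (the 0.9894 printed at epochs 50 to 70 is rounded to four decimals), and 73 - 43 = 30 matches `patience=30`.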

## 📬 **8. Create Submission**

Now we follow the final plan:
1.  Create a **new, full training set** (train + val).
2.  Re-scale the data, cut it into sliding windows, and create a `full_train_loader`.
3.  Instantiate a **fresh copy** of our best model (using the `FINAL_` variables).
4.  Train it on the **full dataset** for the `FINAL_BEST_EPOCH` we just found.
5.  Generate window-level predictions on the test set.
6.  Aggregate them into one label per test sample by majority vote.
7.  Save to a unique `submission_...csv` file in the `submissions/` folder.
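The re-scaling step fits one `StandardScaler` on the training array flattened from `(samples, timesteps, features)` to `(samples * timesteps, features)`, so each feature gets a single mean and standard deviation pooled over every sample and timestep, and the test array is transformed with those training statistics. A minimal NumPy/scikit-learn sketch on random data; the helper name `scale_3d` is an assumption for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def scale_3d(train, test):
    """Fit a per-feature StandardScaler on 3D train data and apply it to both arrays."""
    n, t, f = train.shape
    scaler = StandardScaler().fit(train.reshape(n * t, f))  # statistics from train only

    def transform(x):
        # Flatten (samples, timesteps) into rows, scale, then restore the 3D shape.
        return scaler.transform(x.reshape(-1, f)).reshape(x.shape)

    return transform(train), transform(test), scaler


rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(20, 160, 4))
test = rng.normal(loc=5.0, scale=3.0, size=(8, 160, 4))
train_s, test_s, scaler = scale_3d(train, test)
print(train_s.shape, test_s.shape)  # (20, 160, 4) (8, 160, 4)
print(np.allclose(train_s.reshape(-1, 4).mean(axis=0), 0.0))  # True
```

Reusing the fitted scaler on the test array, rather than fitting a second one, keeps test inputs in the same units the model was trained on.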

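`create_sliding_windows` is defined earlier in the notebook and is not shown in this section. The sketch below is a hypothetical stand-in (`sliding_windows` and `majority_vote` are illustrative names) that assumes each sample is cut into fixed-length windows starting every `stride` steps, with every window inheriting its parent sample's label and index. Under the further assumptions of 160 timesteps per sample and a stride of 20, a window size of 80 yields (160 - 80) / 20 + 1 = 5 windows per sample, which is consistent with the shapes printed below (3305 = 661 * 5 training windows, 6620 = 1324 * 5 test windows); the real function may differ in details such as padding or a trailing partial window. The second function maps window predictions back to one label per sample by majority vote, breaking ties toward the smallest class id.

```python
import numpy as np
import pandas as pd


def sliding_windows(X, y=None, window_size=80, stride=20):
    """Hypothetical stand-in for create_sliding_windows.

    X: (samples, timesteps, features). Returns windows of shape
    (num_windows, window_size, features), per-window labels (if y is given),
    and the index of the sample each window came from.
    """
    n, t, _ = X.shape
    starts = range(0, t - window_size + 1, stride)
    # Sample-major order: all windows of sample 0, then sample 1, ...
    windows = [X[i, s:s + window_size] for i in range(n) for s in starts]
    idx = np.repeat(np.arange(n), len(starts))
    Xw = np.stack(windows)
    if y is None:
        return Xw, idx
    return Xw, np.asarray(y)[idx], idx


def majority_vote(window_idx, window_preds):
    """One prediction per sample: most frequent window label, ties -> smallest label."""
    df = pd.DataFrame({"sample": window_idx, "pred": window_preds})
    # Series.mode() returns modes in sorted order; groupby sorts sample ids ascending.
    return df.groupby("sample")["pred"].agg(lambda s: s.mode().iloc[0]).to_numpy()


X = np.zeros((3, 160, 2))
Xw, idx = sliding_windows(X)
print(Xw.shape)  # (15, 80, 2)
preds = np.array([0, 0, 1, 2, 2, 1, 1, 1, 0, 0, 2, 1, 2, 1, 0])
print(majority_vote(idx, preds))  # [0 1 1]
```

Because `groupby` sorts by sample index, the aggregated array lines up with `sorted(X_test_long['sample_index'].unique())` in the cell below, provided the test tensor was built in that same order.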
In [78]:
# --- 1. & 2. Create Full Training Set & Loader (with windows) ---
print("\n--- Preparing full dataset for final training ---")

scaler_final = StandardScaler()
ns, ts, f = X_train_full.shape
X_train_full_2d = X_train_full.reshape(ns * ts, f)

print(f"Fitting FINAL Scaler on X_train_full_2d shape: {X_train_full_2d.shape}")
scaler_final.fit(X_train_full_2d)

X_train_full_scaled_2d = scaler_final.transform(X_train_full_2d)
X_train_full_scaled = X_train_full_scaled_2d.reshape(ns, ts, f)

ns_test, ts_test, f_test = X_test.shape
X_test_2d = X_test.reshape(ns_test * ts_test, f_test)
X_test_final_scaled_2d = scaler_final.transform(X_test_2d)
X_test_final_scaled = X_test_final_scaled_2d.reshape(ns_test, ts_test, f_test)

print("Final scaling complete.")
print("--- Applying sliding windows to final dataset ---")

# --- Apply sliding windows to the final training and test sets ---
X_train_full_windowed, y_train_full_windowed, _ = create_sliding_windows(
    X_train_full_scaled,
    y_train_full,
    window_size=NEW_WINDOW_SIZE,
    stride=NEW_STRIDE
)

X_test_final_windowed, test_window_indices = create_sliding_windows(
    X_test_final_scaled,
    y=None, # No labels for the test set
    window_size=NEW_WINDOW_SIZE,
    stride=NEW_STRIDE
)
print(f"Full train windowed shape: {X_train_full_windowed.shape}")
print(f"Test windowed shape: {X_test_final_windowed.shape}")
print(f"Test window indices shape: {test_window_indices.shape}")


# --- Create Tensors and DataLoaders from WINDOWED data ---
full_train_features = torch.from_numpy(X_train_full_windowed).float()
full_train_targets = torch.from_numpy(y_train_full_windowed).long()
final_test_features = torch.from_numpy(X_test_final_windowed).float()

full_train_ds = TensorDataset(full_train_features, full_train_targets)
final_test_ds = TensorDataset(final_test_features) # No labels

def make_final_loader(ds, batch_size, shuffle, drop_last):
    # Pin host memory only when a CUDA device is in use; with num_workers=0,
    # prefetch_factor must stay None.
    return DataLoader(
        ds, batch_size=int(batch_size), shuffle=shuffle, drop_last=drop_last,
        num_workers=0, pin_memory=(device.type == "cuda"), prefetch_factor=None
    )

full_train_loader = make_final_loader(full_train_ds, batch_size=FINAL_BATCH_SIZE, shuffle=True, drop_last=True)
test_loader = make_final_loader(final_test_ds, batch_size=FINAL_BATCH_SIZE, shuffle=False, drop_last=False)
print("Final DataLoaders created.")

# --- 3. Instantiate Fresh Model (using FINAL_... vars) ---
print(f"\n--- Building FINAL model for submission: {FINAL_EXPERIMENT_NAME} ---")
final_model = RecurrentClassifier(
    input_size=N_FEATURES,
    hidden_size=FINAL_HIDDEN_SIZE,
    num_layers=FINAL_HIDDEN_LAYERS,
    num_classes=N_CLASSES,
    dropout_rate=FINAL_DROPOUT_RATE,
    bidirectional=FINAL_BIDIRECTIONAL,
    rnn_type=FINAL_MODEL_TYPE
).to(device)

if hasattr(torch, "compile"):  # torch.compile is available from PyTorch 2.0 onward
    print("Compiling final model...")
    final_model = torch.compile(final_model)

final_optimizer = torch.optim.AdamW(final_model.parameters(), lr=FINAL_LEARNING_RATE, weight_decay=FINAL_L2_LAMBDA)
final_scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))

# --- 4. Train on Full Dataset ---
print(f"Training final model for {FINAL_BEST_EPOCH} epochs on ALL data...")

final_model.train() 
for epoch in range(1, FINAL_BEST_EPOCH + 1):
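    # Penalties are passed positionally as (l1_lambda, l2_lambda), matching the call in fit().
    # FINAL_L2_LAMBDA alone in this slot would bind to l1_lambda. L2 is already applied via
    # AdamW's weight_decay, so explicit zeros mirror the final_check run (assuming fit()'s
    # l1_lambda/l2_lambda defaults are 0).
    train_loss, train_f1 = train_one_epoch(
        final_model, full_train_loader, final_check_criterion, final_optimizer, final_scaler, device, 0, 0
    )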
    if epoch % 5 == 0 or epoch == 1 or epoch == FINAL_BEST_EPOCH:
        print(f"Final Training Epoch {epoch:3d}/{FINAL_BEST_EPOCH} | Train: Loss={train_loss:.4f}, F1={train_f1:.4f}")

print("Final training complete.")

# --- 5. Generate Predictions ---
print("\n--- Generating predictions on test set (windowed) ---")
final_model.eval()
all_predictions = []

with torch.no_grad():
    for (inputs,) in test_loader: 
        inputs = inputs.to(device)
        with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            logits = final_model(inputs)
            preds = logits.argmax(dim=1)
            all_predictions.append(preds.cpu().numpy())

all_predictions = np.concatenate(all_predictions)
print(f"Generated {len(all_predictions)} predictions (from {len(test_window_indices)} windows).")


# --- 6. Aggregate Predictions (Majority Vote) ---
print("Aggregating window predictions to sample predictions...")

# Use pandas for easy aggregation
df_preds = pd.DataFrame({
    'original_index': test_window_indices, # Map from window to original sample
    'prediction': all_predictions
})

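# Group by the original sample index and take the most common prediction.
# Series.mode() returns every mode in sorted order, so .iloc[0] breaks ties toward the
# smallest class id (the same rule as scipy.stats.mode) and always yields a scalar,
# whereas scipy.stats.mode(x)[0] returns a 1-element array on older SciPy releases.
agg_preds = df_preds.groupby('original_index')['prediction'].agg(lambda x: x.mode().iloc[0]).values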

# 'agg_preds' now has one prediction per original sample (length 1324)
print(f"Aggregated to {len(agg_preds)} final predictions.")

# Inverse transform these aggregated predictions to labels
predicted_labels = le.inverse_transform(agg_preds)


# --- 7. Save Submission File ---
print("Loading test CSV to recover the sample indices...")
X_test_long = pd.read_csv(X_TEST_PATH)
test_sample_indices = sorted(X_test_long['sample_index'].unique())

if len(predicted_labels) != len(test_sample_indices):
    print(f"ERROR: Prediction count mismatch! Predictions: {len(predicted_labels)}, Test Indices: {len(test_sample_indices)}")
else:
    print("Prediction count matches. Creating submission.")
    
    final_submission_df = pd.DataFrame({
        'sample_index': test_sample_indices,
        'label': predicted_labels 
    })
    
    final_submission_df['sample_index'] = final_submission_df['sample_index'].apply(lambda x: f"{x:03d}")

    SUBMISSIONS_DIR = "submissions"
    os.makedirs(SUBMISSIONS_DIR, exist_ok=True)
    
    submission_filename = f"submission_{FINAL_EXPERIMENT_NAME}_windowed_w{NEW_WINDOW_SIZE}_s{NEW_STRIDE}.csv"
    submission_filepath = os.path.join(SUBMISSIONS_DIR, submission_filename)
    
    final_submission_df.to_csv(submission_filepath, index=False)

    print(f"\nSuccessfully saved to {submission_filepath}!")
    print("This file is correctly formatted for Kaggle:")
    print(final_submission_df.head())

del final_model, full_train_loader, test_loader, full_train_features, final_test_features


--- Preparing full dataset for final training ---
Fitting FINAL Scaler on X_train_full_2d shape: (105760, 35)
Final scaling complete.
--- Applying sliding windows to final dataset ---
Full train windowed shape: (3305, 80, 35)
Test windowed shape: (6620, 80, 35)
Test window indices shape: (6620,)
Final DataLoaders created.

--- Building FINAL model for submission: GRU_H384_L2_BFalse_D0.4140005708856067_Optuna_FINAL ---
Compiling final model...
Training final model for 43 epochs on ALL data...
Final Training Epoch   1/43 | Train: Loss=0.5865, F1=0.7627
Final Training Epoch   5/43 | Train: Loss=0.2152, F1=0.9265
Final Training Epoch  10/43 | Train: Loss=0.0902, F1=0.9725
Final Training Epoch  15/43 | Train: Loss=0.0792, F1=0.9774
Final Training Epoch  20/43 | Train: Loss=0.0426, F1=0.9914
Final Training Epoch  25/43 | Train: Loss=0.0327, F1=0.9932
Final Training Epoch  30/43 | Train: Loss=0.0157, F1=0.9994
Final Training Epoch  35/43 | Train: Loss=0.0321, F1=0.9960
Final Training Epoch  