# **Kaggle Challenge: Pirate Pain Dataset 🏴‍☠️ (v2 - Advanced)**

This notebook has been modified to include two new experimental techniques:

1.  **Static Feature Integration:** We now load, one-hot encode, and feed the static features (`n_legs`, `n_hands`, `n_eyes`) into the model.
2.  **Sliding Window Augmentation:** We now use a sliding window over the 160-timestep samples to create more (but shorter) training examples.
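
To make the augmentation factor concrete, here is a minimal sketch of the window arithmetic. The helper `count_windows` is a hypothetical name introduced for illustration; the values 160, 100 and 20 are the sequence length, window size and stride used later in this notebook.

```python
def count_windows(n_timesteps: int, window_size: int, stride: int) -> int:
    """Number of full windows that fit when sliding forward by `stride`."""
    if window_size > n_timesteps:
        return 0
    return (n_timesteps - window_size) // stride + 1

# 160 timesteps, window of 100, stride of 20
windows_per_sample = count_windows(160, 100, 20)
window_starts = list(range(0, 160 - 100 + 1, 20))
print(windows_per_sample, window_starts)  # 4 [0, 20, 40, 60]
```

With 661 training samples this gives 661 × 4 = 2644 windows, which matches the windowed shape printed in Section 3.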

**Local Setup:**
1.  Ensure you have a Conda environment with PyTorch (GPU), `pandas`, `scikit-learn`, `scipy`, `matplotlib`, `seaborn`, `tensorboard`, `jupyterlab`, `ray[tune]`, and `optuna`.
2.  Place the Kaggle CSVs (`pirate_pain_train.csv`, `pirate_pain_train_labels.csv`, `pirate_pain_test.csv`, `sample_submission.csv`) in a folder named `data/` in the same directory as this notebook.
3.  To run TensorBoard, open a separate terminal, `conda activate` your environment, `cd` to this folder, and run: `tensorboard --logdir=./tensorboard`

## ⚙️ **1. Setup & Libraries**

In [14]:
# Set seed for reproducibility
SEED = 123

# Set MPLCONFIGDIR before importing matplotlib, which reads it at import time
import os
os.environ['MPLCONFIGDIR'] = os.getcwd() + '/configs/'

# Import necessary libraries
import logging
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import copy
from itertools import product
import time
from scipy.stats import mode # --- NEW --- for aggregation

# Seed the Python and NumPy random number generators
random.seed(SEED)
np.random.seed(SEED)

# Suppress warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=Warning)

# --- PyTorch Imports ---
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import TensorDataset, DataLoader

# --- Sklearn Imports ---
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder # --- MODIFIED ---
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix

# --- Ray[tune] & Optuna Imports ---
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch
from functools import partial

# --- Setup Directories & Device ---
logs_dir = "tensorboard"
os.makedirs("models", exist_ok=True)
os.makedirs(logs_dir, exist_ok=True)

torch.manual_seed(SEED)  # seed PyTorch's default RNG so CPU runs are seeded too
if torch.cuda.is_available():
    device = torch.device("cuda")
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.benchmark = True
    print("\n--- Using GPU (RTX 3070, here we come!) ---")
else:
    device = torch.device("cpu")
    print("\n--- Using CPU ---")

print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")

# Configure plot display settings
sns.set_theme(font_scale=1.4)
sns.set_style('white')
plt.rc('font', size=14)
# %matplotlib inline # Uncomment if running in Jupyter


--- Using GPU (RTX 3070, here we come!) ---
PyTorch version: 2.5.1
Device: cuda


## 🔄 **2. Data Loading & Reshaping (MODIFIED)**
 
This is now a three-step process:
1.  Define all features (time-series and static).
2.  Create a function to pivot the time-series data into a 3D tensor: `(num_samples, num_timesteps, num_features)`.
3.  Create a function to load and one-hot encode the static data into a 2D tensor: `(num_samples, num_static_features)`.
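
The pivot step yields a wide 2D table whose columns are grouped by feature, with the timesteps running inside each group. The toy sketch below (made-up sizes and values, NumPy only) illustrates why that table is reshaped to `(samples, features, timesteps)` first and only then transposed to `(samples, timesteps, features)`.

```python
import numpy as np

n_samples, n_features, n_timesteps = 2, 3, 4

# Feature-major wide table: columns are [f0_t0, ..., f0_t3, f1_t0, ..., f2_t3].
# Each value encodes its origin as 100*sample + 10*feature + timestep.
wide = np.empty((n_samples, n_features * n_timesteps))
for s in range(n_samples):
    for f in range(n_features):
        for t in range(n_timesteps):
            wide[s, f * n_timesteps + t] = 100 * s + 10 * f + t

# Step 1: split the flat column axis into (feature, time).
by_feature = wide.reshape(n_samples, n_features, n_timesteps)
# Step 2: swap the last two axes so each timestep holds one row of features.
cube = by_feature.transpose(0, 2, 1)

print(cube.shape)   # (2, 4, 3)
print(cube[1, 2])   # [102. 112. 122.] -> sample 1, timestep 2, features 0..2
```

Reshaping the wide table directly to `(n_samples, n_timesteps, n_features)` would run without error but would mix timesteps and features, so the two-step form is the safer pattern.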

In [15]:
# --- 1. Define File Paths and Features ---
DATA_DIR = "data"
X_TRAIN_PATH = os.path.join(DATA_DIR, "pirate_pain_train.csv")
Y_TRAIN_PATH = os.path.join(DATA_DIR, "pirate_pain_train_labels.csv")
X_TEST_PATH = os.path.join(DATA_DIR, "pirate_pain_test.csv")
SUBMISSION_PATH = os.path.join(DATA_DIR, "sample_submission.csv")

# Define our features
JOINT_FEATURES = [f"joint_{i:02d}" for i in range(31)]
PAIN_FEATURES = [f"pain_survey_{i}" for i in range(1, 5)]
STATIC_FEATURES = ["n_legs", "n_hands", "n_eyes"] # --- NEW ---

TS_FEATURES = JOINT_FEATURES + PAIN_FEATURES # Time-Series Features
N_TS_FEATURES = len(TS_FEATURES)
N_TIMESTEPS = 160 # Fixed from our earlier debugging

print(f"Using {N_TS_FEATURES} time-series features: {TS_FEATURES[:3]}... to {TS_FEATURES[-3:]}")
print(f"Using {len(STATIC_FEATURES)} static features: {STATIC_FEATURES}")

# --- 2. Create the Reshaping Functions ---

def reshape_timeseries_data(df, features_list, n_timesteps):
    """
    Pivots the long-format dataframe into a 3D NumPy array.
    Shape: (n_samples, n_timesteps, n_features)
    """
    df_pivot = df.pivot(index='sample_index', columns='time', values=features_list)
    data_2d = df_pivot.values
    n_samples = data_2d.shape[0]
    data_3d = data_2d.reshape(n_samples, len(features_list), n_timesteps)
    return data_3d.transpose(0, 2, 1)

def load_and_encode_static_data(csv_path, static_features, ohe_encoder=None, fit_encoder=False):
    """
    Loads static features, takes the first row for each sample,
    and One-Hot Encodes them.
    """
    df = pd.read_csv(csv_path)
    # Get just one row per sample_index
    df_static = df.drop_duplicates(subset='sample_index', keep='first').set_index('sample_index')
    df_static = df_static[static_features]
    
    # Handle string-based categorical data
    df_static = df_static.astype(str)

    if fit_encoder:
        print("Fitting new OneHotEncoder for static features...")
        ohe_encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
        ohe_encoder.fit(df_static)
    
    print("Transforming static features with OneHotEncoder...")
    static_data_encoded = ohe_encoder.transform(df_static)
    
    if fit_encoder:
        return static_data_encoded, ohe_encoder
    else:
        return static_data_encoded

# --- 3. Load and Reshape Data ---
print("Loading and reshaping time-series training data...")
X_train_long = pd.read_csv(X_TRAIN_PATH)
X_train_full_ts = reshape_timeseries_data(X_train_long, TS_FEATURES, N_TIMESTEPS)

print("Loading and reshaping time-series test data...")
X_test_long = pd.read_csv(X_TEST_PATH)
X_test_ts = reshape_timeseries_data(X_test_long, TS_FEATURES, N_TIMESTEPS)

# --- NEW: Load and Encode Static Data ---
# Fit the encoder on the training data
X_train_full_static, static_ohe_encoder = load_and_encode_static_data(
    X_TRAIN_PATH, 
    STATIC_FEATURES, 
    fit_encoder=True
)

# Use the *same* encoder to transform the test data
X_test_static = load_and_encode_static_data(
    X_TEST_PATH, 
    STATIC_FEATURES, 
    ohe_encoder=static_ohe_encoder, 
    fit_encoder=False
)

N_STATIC_FEATURES = X_train_full_static.shape[1]

# Load labels
y_train_df = pd.read_csv(Y_TRAIN_PATH)
y_train_full_df = y_train_df.sort_values(by='sample_index')
y_train_labels_str = y_train_full_df['label'].values

print(f"\nX_train_full_ts shape: {X_train_full_ts.shape}")
print(f"X_train_full_static shape: {X_train_full_static.shape}")
print(f"y_train_labels_str shape: {y_train_labels_str.shape}")
print(f"X_test_ts shape: {X_test_ts.shape}")
print(f"X_test_static shape: {X_test_static.shape}")
print(f"Total static features after OHE: {N_STATIC_FEATURES}")

del X_train_long, X_test_long, y_train_df

Using 35 time-series features: ['joint_00', 'joint_01', 'joint_02']... to ['pain_survey_2', 'pain_survey_3', 'pain_survey_4']
Using 3 static features: ['n_legs', 'n_hands', 'n_eyes']
Loading and reshaping time-series training data...
Loading and reshaping time-series test data...
Fitting new OneHotEncoder for static features...
Transforming static features with OneHotEncoder...
Transforming static features with OneHotEncoder...

X_train_full_ts shape: (661, 160, 35)
X_train_full_static shape: (661, 6)
y_train_labels_str shape: (661,)
X_test_ts shape: (1324, 160, 35)
X_test_static shape: (1324, 6)
Total static features after OHE: 6


## 🚧 **3. Preprocessing: Window, Split & Scale (MODIFIED)**
 
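1.  **Encode Labels:** Convert `no_pain`, `low_pain`, `high_pain` to integer ids with `LabelEncoder`. Because `LabelEncoder` sorts class names alphabetically, the ids are `high_pain` = `0`, `low_pain` = `1`, `no_pain` = `2`.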
2.  **Create Sliding Windows:** Use a new function to "chop" the 160-timestep data into smaller, overlapping windows. This augments our data.
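3.  **Split Data:** Use `StratifiedShuffleSplit` on the new *windowed* data. Because overlapping windows from the same original sample can land in both splits, validation scores from this split are likely optimistic.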
4.  **Scale Features:** Use *two* `StandardScaler`s: one for time-series, one for static. Fit *only* on the training split.
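
One way to keep overlapping windows of the same sample out of both splits is to split by original sample rather than by window. The sketch below is an alternative to the split used in this notebook, not what the notebook does: `split_windows_by_sample` is a hypothetical helper, the sizes are toy values, and it is not stratified by label. It would rely on the window-to-sample index map that `create_sliding_windows` returns and the notebook currently discards.

```python
import numpy as np

def split_windows_by_sample(window_to_sample_idx, val_fraction=0.2, seed=123):
    """Return (train_idx, val_idx) window indices that share no source sample."""
    window_to_sample_idx = np.asarray(window_to_sample_idx)
    sample_ids = np.unique(window_to_sample_idx)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(sample_ids)
    n_val = max(1, int(round(len(sample_ids) * val_fraction)))
    val_samples = shuffled[:n_val]
    # A window goes to validation if and only if its source sample was picked.
    is_val = np.isin(window_to_sample_idx, val_samples)
    return np.flatnonzero(~is_val), np.flatnonzero(is_val)

# Toy example: 10 samples with 4 windows each.
window_to_sample = np.repeat(np.arange(10), 4)
train_idx, val_idx = split_windows_by_sample(window_to_sample)
shared = set(window_to_sample[train_idx]) & set(window_to_sample[val_idx])
print(len(train_idx), len(val_idx), shared)  # 32 8 set()
```

scikit-learn's `GroupShuffleSplit` and `StratifiedGroupKFold` provide the same grouping idea, the latter with label stratification.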

In [16]:
# --- 1. Encode Labels ---
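LABEL_MAPPING = {'no_pain': 0, 'low_pain': 1, 'high_pain': 2}
le = LabelEncoder()
le.fit(list(LABEL_MAPPING.keys()))
# NOTE: LabelEncoder sorts classes alphabetically, so the ids actually produced are
# high_pain=0, low_pain=1, no_pain=2, not the ids listed in LABEL_MAPPING.
# Decode predictions with le.inverse_transform rather than with LABEL_MAPPING.
y_train_full = le.transform(y_train_labels_str)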
N_CLASSES = len(LABEL_MAPPING)

print(f"Labels encoded. {N_CLASSES} classes: {LABEL_MAPPING}")

# --- 2. Create Sliding Windows ---
# --- NEW --- Define Window Parameters ---
WINDOW_SIZE = 100 # Experiment with this (e.g., 80, 100, 120)
STRIDE = 20       # Experiment with this (e.g., 10, 20)
# ---

def create_sliding_windows(X_3d_ts, X_2d_static, y, window_size, stride):
    """
    Takes 3D time-series, 2D static data, and 1D labels
    and creates overlapping windows.
    Returns:
    - new_X_ts (3D): (n_windows, window_size, n_ts_features)
    - new_X_static (2D): (n_windows, n_static_features)
    - new_y (1D): (n_windows,)
    - window_to_sample_idx (1D): (n_windows,) mapping to original sample
    """
    new_X_ts = []
    new_X_static = []
    new_y = []
    window_to_sample_idx = []
    
    n_samples, n_timesteps, n_features = X_3d_ts.shape
    
    for i in range(n_samples):
        sample_ts = X_3d_ts[i]
        sample_static = X_2d_static[i]
        label = y[i]
        
        idx = 0
        while (idx + window_size) <= n_timesteps:
            window = sample_ts[idx : idx + window_size]
            new_X_ts.append(window)
            new_X_static.append(sample_static) # Static features are repeated
            new_y.append(label)
            window_to_sample_idx.append(i) # Track original sample
            idx += stride
            
    return (
        np.array(new_X_ts), 
        np.array(new_X_static), 
        np.array(new_y), 
        np.array(window_to_sample_idx)
    )

print(f"\nCreating sliding windows (W={WINDOW_SIZE}, S={STRIDE})...")
(
    X_ts_windowed, 
    X_static_windowed, 
    y_windowed, 
    _ # We don't need the index map for training
) = create_sliding_windows(
    X_train_full_ts, 
    X_train_full_static, 
    y_train_full, 
    WINDOW_SIZE, 
    STRIDE
)

# Our new sequence length is the window size
N_TIMESTEPS_WINDOWED = WINDOW_SIZE

print(f"Data augmented with sliding windows:")
print(f"  Original TS shape: {X_train_full_ts.shape}")
print(f"  Windowed TS shape: {X_ts_windowed.shape}")
print(f"  Windowed Static shape: {X_static_windowed.shape}")
print(f"  Windowed y shape: {y_windowed.shape}")


# --- 3. Create Validation Split (on windowed data) ---
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=SEED)

# We split the new, larger windowed arrays
for train_idx, val_idx in sss.split(X_ts_windowed, y_windowed):
    X_ts_train_split = X_ts_windowed[train_idx]
    X_static_train_split = X_static_windowed[train_idx]
    y_train_split = y_windowed[train_idx]
    
    X_ts_val_split = X_ts_windowed[val_idx]
    X_static_val_split = X_static_windowed[val_idx]
    y_val_split = y_windowed[val_idx]

print(f"\nData split into Train and Validation sets:")
print(f"  X_ts_train_split:       {X_ts_train_split.shape}")
print(f"  X_static_train_split: {X_static_train_split.shape}")
print(f"  y_train_split:          {y_train_split.shape}")
print(f"  X_ts_val_split:         {X_ts_val_split.shape}")
print(f"  X_static_val_split:   {X_static_val_split.shape}")
print(f"  y_val_split:          {y_val_split.shape}")

# --- 4. Scale Features (The "No-Cheating" Rule) ---

# --- Scaler 1: Time-Series ---
scaler_ts = StandardScaler()
ns, ts, f = X_ts_train_split.shape
X_ts_train_2d = X_ts_train_split.reshape(ns * ts, f)
print(f"\nFitting Time-Series Scaler on X_ts_train_2d shape: {X_ts_train_2d.shape}")
scaler_ts.fit(X_ts_train_2d)

# Transform TS Train
X_ts_train_scaled_2d = scaler_ts.transform(X_ts_train_2d)
X_ts_train_scaled = X_ts_train_scaled_2d.reshape(ns, ts, f)

# Transform TS Val
ns_val, ts_val, f_val = X_ts_val_split.shape
X_ts_val_2d = X_ts_val_split.reshape(ns_val * ts_val, f_val)
X_ts_val_scaled_2d = scaler_ts.transform(X_ts_val_2d)
X_ts_val_scaled = X_ts_val_scaled_2d.reshape(ns_val, ts_val, f_val)

# --- Scaler 2: Static ---
# Note: Scaling OHE features is fine; it centers each column at 0 and rescales it to unit variance
scaler_static = StandardScaler()
print(f"Fitting Static Scaler on X_static_train_split shape: {X_static_train_split.shape}")
scaler_static.fit(X_static_train_split)

# Transform Static Train
X_static_train_scaled = scaler_static.transform(X_static_train_split)
# Transform Static Val
X_static_val_scaled = scaler_static.transform(X_static_val_split)

print("\nScaling complete.")
print(f"  X_ts_train_scaled:       {X_ts_train_scaled.shape}")
print(f"  X_static_train_scaled: {X_static_train_scaled.shape}")
print(f"  X_ts_val_scaled:         {X_ts_val_scaled.shape}")
print(f"  X_static_val_scaled:   {X_static_val_scaled.shape}")

del X_ts_train_2d, X_ts_val_2d, X_ts_train_scaled_2d, X_ts_val_scaled_2d

Labels encoded. 3 classes: {'no_pain': 0, 'low_pain': 1, 'high_pain': 2}

Creating sliding windows (W=100, S=20)...
Data augmented with sliding windows:
  Original TS shape: (661, 160, 35)
  Windowed TS shape: (2644, 100, 35)
  Windowed Static shape: (2644, 6)
  Windowed y shape: (2644,)

Data split into Train and Validation sets:
  X_ts_train_split:       (2115, 100, 35)
  X_static_train_split: (2115, 6)
  y_train_split:          (2115,)
  X_ts_val_split:         (529, 100, 35)
  X_static_val_split:   (529, 6)
  y_val_split:          (529,)

Fitting Time-Series Scaler on X_ts_train_2d shape: (211500, 35)
Fitting Static Scaler on X_static_train_split shape: (2115, 6)

Scaling complete.
  X_ts_train_scaled:       (2115, 100, 35)
  X_static_train_scaled: (2115, 6)
  X_ts_val_scaled:         (529, 100, 35)
  X_static_val_scaled:   (529, 6)
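
The reshape, fit, transform, reshape-back pattern used for the time-series scaler can be checked on toy data. The sketch below is a NumPy stand-in for `StandardScaler` with made-up array sizes: statistics come from the training windows only and are then applied to both splits.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(8, 10, 3))  # (windows, timesteps, features)
X_val = rng.normal(loc=5.0, scale=2.0, size=(4, 10, 3))

# Fit: flatten windows and timesteps so each row is one timestep of features.
flat_train = X_train.reshape(-1, X_train.shape[-1])
mean = flat_train.mean(axis=0)
std = flat_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Transform: broadcasting over the last axis, so no reshape is needed here.
X_train_scaled = (X_train - mean) / std
X_val_scaled = (X_val - mean) / std

print(np.allclose(X_train_scaled.reshape(-1, 3).mean(axis=0), 0.0))  # True
print(np.allclose(X_train_scaled.reshape(-1, 3).std(axis=0), 1.0))   # True
```

The validation split is scaled with the training statistics, so its per-feature mean and standard deviation will generally be close to, but not exactly, 0 and 1.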


## 🚚 **4. PyTorch DataLoaders (MODIFIED)**
 
We now create a `TensorDataset` that holds **three** items:
1.  Time-series features
2.  Static features
3.  Targets

In [17]:
# --- 1. Convert to Tensors ---
# Train
train_ts_features = torch.from_numpy(X_ts_train_scaled).float()
train_static_features = torch.from_numpy(X_static_train_scaled).float()
train_targets = torch.from_numpy(y_train_split).long()

# Validation
val_ts_features = torch.from_numpy(X_ts_val_scaled).float()
val_static_features = torch.from_numpy(X_static_val_scaled).float()
val_targets = torch.from_numpy(y_val_split).long()

# Test (we'll process this in the submission cell)

# --- 2. Create TensorDatasets ---
train_ds = TensorDataset(train_ts_features, train_static_features, train_targets)
val_ds = TensorDataset(val_ts_features, val_static_features, val_targets)
# test_ds will be created in the final submission cell

print(f"TensorDatasets created.")
print(f"Example train_ds[0] shapes:")
print(f"  TS features:  {train_ds[0][0].shape}")
print(f"  Static features: {train_ds[0][1].shape}")
print(f"  Target:         {train_ds[0][2].shape}")

# --- 3. Define make_loader function (from Lecture 4) ---
BATCH_SIZE = 128 # This will be our default, but Optuna can tune it

def make_loader(ds, batch_size, shuffle, drop_last):
    num_workers = 0 
    
    return DataLoader(
        ds,
        batch_size=int(batch_size),
        shuffle=shuffle,
        drop_last=drop_last,
        num_workers=num_workers,
        pin_memory=True,
        pin_memory_device="cuda" if torch.cuda.is_available() else "",
        prefetch_factor=None,
    )

# --- 4. Create DataLoaders ---
# We will create these *inside* the objective function now,
# as the batch size is a hyperparameter we want to tune.
print("\nDataLoaders will be created inside the tuning loop.")
del X_ts_train_scaled, X_static_train_scaled, val_ts_features, val_static_features
del train_ts_features, train_static_features

TensorDatasets created.
Example train_ds[0] shapes:
  TS features:  torch.Size([100, 35])
  Static features: torch.Size([6])
  Target:         torch.Size([])

DataLoaders will be created inside the tuning loop.


## 🛠️ **5. Model & Training Engine (MODIFIED)**
 
-   `RecurrentClassifier`: **Modified** to accept `static_input_size` and a second input `x_static`. It concatenates the final hidden state of the last RNN layer (both directions when bidirectional) with the static features before classifying.
-   `train_one_epoch` / `validate_one_epoch`: **Modified** to handle 3-part batches `(ts_inputs, static_inputs, targets)`.
-   `objective_function`: **Modified** to pass `static_input_size` to the model and handle the new batch structure.
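
The shape bookkeeping behind the concatenation is easy to get wrong in the bidirectional case, so here is a small standalone sketch of it. The layer sizes are toy values chosen for illustration, and the variable names are not the ones used in `RecurrentClassifier`.

```python
import torch
from torch import nn

batch, seq_len, n_ts, n_static, hidden, layers = 4, 100, 35, 6, 16, 2

gru = nn.GRU(n_ts, hidden, num_layers=layers, batch_first=True, bidirectional=True)
head = nn.Linear(2 * hidden + n_static, 3)

x_ts = torch.randn(batch, seq_len, n_ts)
x_static = torch.randn(batch, n_static)

_, h_n = gru(x_ts)                                   # (layers * 2, batch, hidden)
h_n = h_n.view(layers, 2, batch, hidden)             # separate layer and direction
last = torch.cat([h_n[-1, 0], h_n[-1, 1]], dim=1)    # (batch, 2 * hidden)
combined = torch.cat([last, x_static], dim=1)        # (batch, 2 * hidden + n_static)
logits = head(combined)                              # (batch, 3)
print(last.shape, combined.shape, logits.shape)
```

The classifier input width is therefore `2 * hidden_size + static_input_size` for a bidirectional model and `hidden_size + static_input_size` otherwise.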

In [18]:
def recurrent_summary(model, ts_input_size, static_input_size):
    """
    Custom summary function (MODIFIED) for 2-input model.
    """
    output_shapes = {}
    hooks = []

    def get_hook(name):
        def hook(module, input, output):
            if isinstance(output, tuple):
                shape1 = list(output[0].shape)
                shape1[0] = -1
                if isinstance(output[1], tuple):
                    shape2 = list(output[1][0].shape)
                else:
                    shape2 = list(output[1].shape)
                shape2[1] = -1
                output_shapes[name] = f"[{shape1}, {shape2}]"
            else:
                shape = list(output.shape)
                shape[0] = -1
                output_shapes[name] = f"{shape}"
        return hook

    try:
        device_summary = next(model.parameters()).device
    except StopIteration:
        device_summary = torch.device("cpu")

    # --- MODIFIED: Create two dummy inputs ---
    dummy_input_ts = torch.randn(1, *ts_input_size).to(device_summary)
    dummy_input_static = torch.randn(1, static_input_size).to(device_summary)

    for name, module in model.named_children():
        if isinstance(module, (nn.Linear, nn.RNN, nn.GRU, nn.LSTM)):
            hook_handle = module.register_forward_hook(get_hook(name))
            hooks.append(hook_handle)

    model.eval()
    with torch.no_grad():
        try:
            # --- MODIFIED: Pass both inputs ---
            model(dummy_input_ts, dummy_input_static)
        except Exception as e:
            print(f"Error during dummy forward pass: {e}")
            for h in hooks:
                h.remove()
            return

    for h in hooks:
        h.remove()

    print("-" * 79)
    print(f"{'Layer (type)':<25} {'Output Shape':<28} {'Param #':<18}")
    print("=" * 79)

    total_params = 0
    total_trainable_params = 0

    for name, module in model.named_children():
        if name in output_shapes:
            module_params = sum(p.numel() for p in module.parameters())
            trainable_params = sum(p.numel() for p in module.parameters() if p.requires_grad)

            total_params += module_params
            total_trainable_params += trainable_params

            layer_name = f"{name} ({type(module).__name__})"
            output_shape_str = str(output_shapes[name])
            params_str = f"{trainable_params:,}"

            print(f"{layer_name:<25} {output_shape_str:<28} {params_str:<15}")

    print("=" * 79)
    print(f"Total params: {total_params:,}")
    print(f"Trainable params: {total_trainable_params:,}")
    print(f"Non-trainable params: {total_params - total_trainable_params:,}")
    print("-" * 79)

In [19]:
# --- MODIFIED --- RecurrentClassifier ---
class RecurrentClassifier(nn.Module):
    """
    Generic RNN classifier (RNN, LSTM, GRU) from Lecture 4.
    MODIFIED to accept static features.
    """
    def __init__(
            self,
            input_size,         # N_TS_FEATURES
            static_input_size,  # N_STATIC_FEATURES
            hidden_size,
            num_layers,
            num_classes,
            rnn_type='GRU',
            bidirectional=False,
            dropout_rate=0.2
            ):
        super().__init__()

        self.rnn_type = rnn_type
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.bidirectional = bidirectional

        rnn_map = {
            'RNN': nn.RNN,
            'LSTM': nn.LSTM,
            'GRU': nn.GRU
        }
        if rnn_type not in rnn_map:
            raise ValueError("rnn_type must be 'RNN', 'LSTM', or 'GRU'")
        
        rnn_module = rnn_map[rnn_type]
        dropout_val = dropout_rate if num_layers > 1 else 0

        self.rnn = rnn_module(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=bidirectional,
            dropout=dropout_val
        )

        if self.bidirectional:
            rnn_output_size = hidden_size * 2
        else:
            rnn_output_size = hidden_size
        
        # --- NEW: Classifier input is RNN output + static features ---
        classifier_input_size = rnn_output_size + static_input_size

        self.classifier = nn.Linear(classifier_input_size, num_classes)

    def forward(self, x_ts, x_static): # --- MODIFIED: two inputs ---
        """
        x_ts shape: (batch_size, seq_length, input_size)
        x_static shape: (batch_size, static_input_size)
        """
        rnn_out, hidden = self.rnn(x_ts)

        if self.rnn_type == 'LSTM':
            hidden = hidden[0]

        if self.bidirectional:
            hidden = hidden.view(self.num_layers, 2, -1, self.hidden_size)
            hidden_to_classify = torch.cat([hidden[-1, 0, :, :], hidden[-1, 1, :, :]], dim=1)
        else:
            hidden_to_classify = hidden[-1]
        
        # --- NEW: Concatenate RNN output with static features ---
        combined_features = torch.cat([hidden_to_classify, x_static], dim=1)

        logits = self.classifier(combined_features)
        return logits

In [20]:
# --- MODIFIED --- Training & Validation Loops ---

def train_one_epoch(model, train_loader, criterion, optimizer, scaler, device, l1_lambda=0, l2_lambda=0):
    model.train()
    running_loss = 0.0
    all_predictions = []
    all_targets = []

    # --- MODIFIED: Unpack 3-part batch ---
    for batch_idx, (ts_inputs, static_inputs, targets) in enumerate(train_loader):
        # --- MODIFIED: Move all parts to device ---
        ts_inputs, static_inputs, targets = ts_inputs.to(device), static_inputs.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)

        with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            # --- MODIFIED: Pass both inputs to model ---
            logits = model(ts_inputs, static_inputs)
            loss = criterion(logits, targets)
            
            l1_norm = sum(p.abs().sum() for p in model.parameters())
            l2_norm = sum(p.pow(2).sum() for p in model.parameters())
            loss = loss + l1_lambda * l1_norm + l2_lambda * l2_norm

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        running_loss += loss.item() * ts_inputs.size(0)
        predictions = logits.argmax(dim=1)
        all_predictions.append(predictions.cpu().numpy())
        all_targets.append(targets.cpu().numpy())

    epoch_loss = running_loss / len(train_loader.dataset)
    epoch_f1 = f1_score(
        np.concatenate(all_targets),
        np.concatenate(all_predictions),
        average='weighted'
    )
    return epoch_loss, epoch_f1

def validate_one_epoch(model, val_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    all_predictions = []
    all_targets = []

    with torch.no_grad():
        # --- MODIFIED: Unpack 3-part batch ---
        for (ts_inputs, static_inputs, targets) in val_loader:
            # --- MODIFIED: Move all parts to device ---
            ts_inputs, static_inputs, targets = ts_inputs.to(device), static_inputs.to(device), targets.to(device)

            with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
                # --- MODIFIED: Pass both inputs to model ---
                logits = model(ts_inputs, static_inputs)
                loss = criterion(logits, targets)

            running_loss += loss.item() * ts_inputs.size(0)
            predictions = logits.argmax(dim=1)
            all_predictions.append(predictions.cpu().numpy())
            all_targets.append(targets.cpu().numpy())

    epoch_loss = running_loss / len(val_loader.dataset)
    epoch_f1 = f1_score(
        np.concatenate(all_targets),
        np.concatenate(all_predictions),
        average='weighted'
    )
    return epoch_loss, epoch_f1

def log_metrics_to_tensorboard(writer, epoch, train_loss, train_f1, val_loss, val_f1, model):
    writer.add_scalar('Loss/Training', train_loss, epoch)
    writer.add_scalar('Loss/Validation', val_loss, epoch)
    writer.add_scalar('F1/Training', train_f1, epoch)
    writer.add_scalar('F1/Validation', val_f1, epoch)


# --- MODIFIED --- Objective Function ---
def objective_function(config, train_ds, val_ds):
    """
    This is the main function that Ray Tune will call for each trial.
    """
    
    # --- 1. Create DataLoaders with the tuned batch size ---
    train_loader = make_loader(train_ds, batch_size=config["batch_size"], shuffle=True, drop_last=True)
    val_loader = make_loader(val_ds, batch_size=config["batch_size"], shuffle=False, drop_last=False)
    
    # --- 2. Create Model --- 
    model = RecurrentClassifier(
        input_size=N_TS_FEATURES,          # Time-series features
        static_input_size=N_STATIC_FEATURES, # --- NEW --- Static features
        hidden_size=config["hidden_size"],
        num_layers=config["num_layers"],
        num_classes=N_CLASSES,
        dropout_rate=config["dropout_rate"],
        bidirectional=config["bidirectional"],
        rnn_type=config["rnn_type"]
    ).to(device)
    
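    # Compare the major version as an integer; comparing the first character breaks for "10.x"
    if int(torch.__version__.split(".")[0]) >= 2:
        model = torch.compile(model)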
    
    # --- 3. Create Optimizer, Loss, Scaler ---
    criterion = nn.CrossEntropyLoss()
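    # NOTE: l2_lambda is applied twice: here as AdamW weight decay and again as an explicit
    # L2 penalty inside train_one_epoch. Drop one of the two if that is not intended.
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"], weight_decay=config["l2_lambda"])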
    scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))

    # --- 4. The Training Loop ---
    EPOCHS = 200 
    
    for epoch in range(1, EPOCHS + 1):
        # --- MODIFIED: train/validate functions are new versions ---
        train_loss, train_f1 = train_one_epoch(
            model, train_loader, criterion, optimizer, scaler, device, 0, config["l2_lambda"]
        )

        val_loss, val_f1 = validate_one_epoch(
            model, val_loader, criterion, device
        )
        
        # --- Send Results to Ray Tune --- 
        tune.report({
            "train_loss": train_loss,
            "train_f1": train_f1,
            "val_loss": val_loss,
            "val_f1": val_f1
        })

## 🧪 **6. Hyperparameter Search with Ray Tune & Optuna**
 
The search setup is unchanged apart from passing the new three-tensor datasets. It automatically uses the new model and data pipeline.
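
To see where the ASHA scheduler can stop a trial, the sketch below computes the halving milestones for the settings used in this section (`grace_period=20`, `reduction_factor=2`). It is a simplified model of the schedule, not Ray Tune's own code, and it assumes `max_t` is left at 100, which I believe is Ray Tune's default; the notebook does not set it. `asha_rungs` is a hypothetical helper.

```python
def asha_rungs(grace_period: int, reduction_factor: int, max_t: int) -> list[int]:
    """Iterations at which trials are compared and the weaker ones stopped."""
    rungs = []
    milestone = grace_period
    while milestone <= max_t:
        rungs.append(milestone)
        milestone *= reduction_factor
    return rungs

rungs = asha_rungs(grace_period=20, reduction_factor=2, max_t=100)
print(rungs)  # [20, 40, 80]
```

Under these assumptions a trial ends at iteration 20, 40 or 80 if it is pruned, or at 100 if it survives every rung. That is consistent with the `iter` column in the results table, which shows only 20, 80 and 100.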

In [21]:
# --- 1. Define the Search Space for Optuna ---
search_space = {
    "rnn_type": tune.choice(['GRU', 'LSTM']),
    "lr": tune.loguniform(1e-5, 1e-2),
    "batch_size": tune.choice([64, 128]),  
    "hidden_size": tune.choice([64, 128, 256, 384]),
    "num_layers": tune.choice([2, 3]),
    "dropout_rate": tune.uniform(0.1, 0.6),
    "bidirectional": tune.choice([True, False]),
    "l2_lambda": tune.loguniform(1e-7, 1e-3)
}

# --- 2. Define the Optimizer (Optuna) and Scheduler (ASHA) ---
optuna_search = OptunaSearch(
    metric="val_f1",
    mode="max"
)

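# NOTE: max_t is not set here. With Ray Tune's default (100, to my knowledge) trials stop
# after 100 iterations even though the objective function loops for up to 200 epochs.
scheduler = ASHAScheduler(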
    metric="val_f1",
    mode="max",
    grace_period=20,  # Min epochs a trial must run
    reduction_factor=2
)

# --- 3. Initialize Ray ---
if ray.is_initialized():
    ray.shutdown()

ray_logs_path = os.path.abspath("./ray_results")
os.makedirs(ray_logs_path, exist_ok=True)
os.environ["RAY_TEMP_DIR"] = ray_logs_path
os.environ["RAY_RAYLET_START_WAIT_TIME_S"] = "120"
ray.init(
    num_cpus=16, 
    num_gpus=1, 
    ignore_reinit_error=True
)

def short_trial_name(trial):
    return f"{trial.trainable_name}_{trial.trial_id}"


# --- 4. Run the Tuner ---
print("Starting hyperparameter search (up to 4 concurrent trials)...")

# --- MODIFIED: Pass the new 3-tensor datasets ---
analysis = tune.run(
    tune.with_parameters(objective_function, train_ds=train_ds, val_ds=val_ds),
    
    resources_per_trial={
        "cpu": 4, 
        "gpu": 0.25
    },
    
    config=search_space,
    num_samples=20, # Number of different HPO trials to run
    search_alg=optuna_search,
    scheduler=scheduler,
    name="pirate_pain_optuna_search_v2",

    storage_path=ray_logs_path,
    trial_dirname_creator=short_trial_name,
    log_to_file=True,
    verbose=1
)

print("\n--- Search Complete ---")

Current time: 2025-11-08 14:16:34
Running for: 00:14:14.31
Memory: 10.1/13.9 GiB

| Trial name | status | loc | batch_size | bidirectional | dropout_rate | hidden_size | l2_lambda | lr | num_layers | rnn_type | iter | total time (s) | train_loss | train_f1 | val_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| objective_function_3c8f5a6b | TERMINATED | 127.0.0.1:39648 | 128 | True | 0.449863 | 384 | 1.55699e-06 | 0.0071723 | 2 | LSTM | 80 | 208.026 | 0.105711 | 0.998047 | 0.448443 |
| objective_function_32ae99ef | TERMINATED | 127.0.0.1:43288 | 128 | True | 0.317327 | 128 | 2.82998e-05 | 1.4419e-05 | 2 | LSTM | 20 | 15.5797 | 0.717849 | 0.706303 | 0.662027 |
| objective_function_814b89bf | TERMINATED | 127.0.0.1:49432 | 128 | True | 0.354524 | 384 | 9.29181e-06 | 0.00338544 | 2 | LSTM | 100 | 332.602 | 0.101406 | 1.0 | 0.105049 |
| objective_function_2726046a | TERMINATED | 127.0.0.1:24964 | 64 | False | 0.300067 | 128 | 4.84556e-06 | 0.00434132 | 2 | LSTM | 100 | 151.939 | 0.0309734 | 1.0 | 0.0449008 |
| objective_function_be825108 | TERMINATED | 127.0.0.1:20568 | 64 | False | 0.372461 | 256 | 4.60463e-06 | 0.00165158 | 2 | LSTM | 100 | 314.232 | 0.0137545 | 1.0 | 0.0696831 |
| objective_function_86beef32 | TERMINATED | 127.0.0.1:45312 | 64 | True | 0.260787 | 384 | 1.05134e-07 | 0.00137486 | 2 | LSTM | 100 | 542.429 | 0.00125076 | 1.0 | 0.0767007 |
| objective_function_e7047a9b | TERMINATED | 127.0.0.1:37228 | 128 | True | 0.20574 | 256 | 1.16829e-06 | 0.00352773 | 2 | LSTM | 100 | 298.956 | 0.00886131 | 1.0 | 0.0354344 |
| objective_function_e8a19773 | TERMINATED | 127.0.0.1:44724 | 128 | True | 0.125128 | 128 | 1.03541e-07 | 6.88449e-05 | 2 | GRU | 20 | 26.1692 | 0.397567 | 0.81841 | 0.419326 |
| objective_function_bc2a2d81 | TERMINATED | 127.0.0.1:35040 | 128 | False | 0.505653 | 64 | 9.09173e-06 | 0.00151572 | 2 | LSTM | 100 | 73.1486 | 0.00556539 | 1.0 | 0.0211121 |
| objective_function_27337408 | TERMINATED | 127.0.0.1:34416 | 128 | True | 0.594296 | 256 | 1.34443e-06 | 0.0015202 | 3 | GRU | 100 | 357.348 | 0.00488792 | 1.0 | 0.0534718 |


(pid=gcs_server) [2025-11-08 14:02:39,750 E 12724 2408] (gcs_server.exe) gcs_server.cc:302: Failed to establish connection to the event+metrics exporter agent. Events and metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
(raylet) [2025-11-08 14:02:43,615 E 45472 22436] (raylet.exe) main.cc:975: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
2025-11-08 14:16:34,733	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to 'c:/Users/Karim Negm/Documents/AN2DL/Challenge 1/ray_results/pirate_pain_optuna_search_v2' in 0.0428s.
2025-11-08 14:16:34,755	INFO tune.py:1041 -- Total run time: 854.36 seconds (854.26 seconds for the tuning loop).



--- Search Complete ---


In [26]:
# --- 5. Get Best Results ---
print("Getting best trial from analysis...")
best_trial = analysis.get_best_trial(metric="val_f1", mode="max", scope="all")
if best_trial:
    best_config = best_trial.config
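    # NOTE: scope="all" selects the trial by its best val_f1 at any epoch, while
    # last_result holds the final epoch's value, so the two can differ.
    best_val_f1 = best_trial.last_result["val_f1"]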
    
    print(f"Best validation F1 score: {best_val_f1:.4f}")
    print("Best hyperparameters found:")
    print(best_config)
else:
    print("ERROR: No trials completed successfully. Check the 'ray_results' folder for logs.")
    # --- FAKE CONFIG FOR TESTING ---
    # best_config = {
    #     "rnn_type": 'GRU', "lr": 1e-3, "batch_size": 128, "hidden_size": 128,
    #     "num_layers": 2, "dropout_rate": 0.2, "bidirectional": False, "l2_lambda": 1e-5
    # }
    # best_val_f1 = 0.8 # Dummy value
    # print("USING FAKE CONFIG FOR DEMONSTRATION")
    # --- END FAKE CONFIG ---
    best_config = None
    best_val_f1 = float("nan")  # keep the name defined so the next cell does not raise NameError

Getting best trial from analysis...
Best validation F1 score: 0.9981
Best hyperparameters found:
{'rnn_type': 'GRU', 'lr': 0.0011459748962260494, 'batch_size': 128, 'hidden_size': 64, 'num_layers': 3, 'dropout_rate': 0.5783440469483345, 'bidirectional': True, 'l2_lambda': 9.963258042875386e-05}


## 🏆 **7. Final Model Configuration**
 
This cell is unchanged.

In [27]:
# ===================================================================
# --- 🏆 FINAL MODEL CONFIGURATION 🏆 ---
# ===================================================================
# --- 1. Get Best Config from Analysis ---
FINAL_CONFIG = best_config
FINAL_BEST_VAL_F1 = best_val_f1

print("--- 🏆 Final Configuration Set --- ")
print(f"Best Val F1 from search: {FINAL_BEST_VAL_F1:.4f}")
print(FINAL_CONFIG)

--- 🏆 Final Configuration Set --- 
Best Val F1 from search: 0.9981
{'rnn_type': 'GRU', 'lr': 0.0011459748962260494, 'batch_size': 128, 'hidden_size': 64, 'num_layers': 3, 'dropout_rate': 0.5783440469483345, 'bidirectional': True, 'l2_lambda': 9.963258042875386e-05}


### **Re-run to find Best Epoch (MODIFIED)**
 
The `fit` function is now the **MODIFIED** version that handles the new data and model structure.
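
The patience logic inside `fit` can be studied in isolation. The following is a minimal sketch, not the notebook's implementation: `find_best_epoch` is a hypothetical helper that replays a recorded metric history and applies the same improvement and patience rules, assuming a strict improvement is required to reset the counter (as in `fit`).

```python
def find_best_epoch(metric_per_epoch, patience, mode="max"):
    """Replay a 1-indexed metric history with early stopping.

    Hypothetical helper for illustration only. Returns a tuple
    (best_epoch, best_metric, stopped_epoch).
    """
    if patience <= 0:
        raise ValueError("patience must be positive for early stopping")

    best_metric = float("-inf") if mode == "max" else float("inf")
    best_epoch = 0
    patience_counter = 0
    stopped_epoch = 0

    for epoch, value in enumerate(metric_per_epoch, start=1):
        stopped_epoch = epoch
        # Only a strict improvement counts; an equal value does not reset the counter
        improved = value > best_metric if mode == "max" else value < best_metric
        if improved:
            best_metric, best_epoch, patience_counter = value, epoch, 0
        else:
            patience_counter += 1
            if patience_counter >= patience:
                break  # stop after `patience` epochs without improvement

    return best_epoch, best_metric, stopped_epoch


history = [0.70, 0.80, 0.85, 0.84, 0.85, 0.83]  # made-up validation F1 values
print(find_best_epoch(history, patience=3))  # (3, 0.85, 6)
```

Under these rules training stops exactly `patience` epochs after the best epoch, which is why a run with `patience=30` whose best epoch is 54 stops at epoch 84.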

In [28]:
# --- We need the original 'fit' function back (MODIFIED) ---
def fit(model, train_loader, val_loader, epochs, criterion, optimizer, scaler, device,
        l1_lambda=0, l2_lambda=0, patience=0, evaluation_metric="val_f1", mode='max',
        restore_best_weights=True, writer=None, verbose=10, experiment_name=""):
    
    training_history = {
        'train_loss': [], 'val_loss': [],
        'train_f1': [], 'val_f1': []
    }
    
    os.makedirs("models", exist_ok=True)  # torch.save does not create missing directories
    model_path = f"models/{experiment_name}_best_model.pt"

    if patience > 0:
        patience_counter = 0
        best_metric = float('-inf') if mode == 'max' else float('inf')
        best_epoch = 0

    print(f"--- Starting Training: {experiment_name} ---")
    print(f"Will train for {epochs} epochs with patience={patience} monitoring {evaluation_metric}")

    for epoch in range(1, epochs + 1):
        # --- MODIFIED: Use new 3-part-batch train/val functions ---
        train_loss, train_f1 = train_one_epoch(
            model, train_loader, criterion, optimizer, scaler, device, l1_lambda, l2_lambda
        )

        val_loss, val_f1 = validate_one_epoch(
            model, val_loader, criterion, device
        )
        # --- End modifications ---

        training_history['train_loss'].append(train_loss)
        training_history['val_loss'].append(val_loss)
        training_history['train_f1'].append(train_f1)
        training_history['val_f1'].append(val_f1)

        if writer is not None:
            log_metrics_to_tensorboard(
                writer, epoch, train_loss, train_f1, val_loss, val_f1, model
            )

        if verbose > 0 and (epoch % verbose == 0 or epoch == 1):
            print(f"Epoch {epoch:3d}/{epochs} | "
                  f"Train: Loss={train_loss:.4f}, F1={train_f1:.4f} | "
                  f"Val: Loss={val_loss:.4f}, F1={val_f1:.4f}")

        if patience > 0:
            current_metric = training_history[evaluation_metric][-1]
            is_improvement = (current_metric > best_metric) if mode == 'max' else (current_metric < best_metric)

            if is_improvement:
                best_metric = current_metric
                best_epoch = epoch
                torch.save(model.state_dict(), model_path)
                patience_counter = 0
            else:
                patience_counter += 1
                if patience_counter >= patience:
                    print(f"\nEarly stopping triggered after {epoch} epochs.")
                    break

    if restore_best_weights and patience > 0:
        print(f"Restoring best model from epoch {best_epoch} with {evaluation_metric} {best_metric:.4f}")
        model.load_state_dict(torch.load(model_path))

    if patience == 0:
        print("Training complete. Saving final model.")
        torch.save(model.state_dict(), model_path.replace("_best_model.pt", "_final_model.pt"))

    if writer is not None:
        writer.close()
    
    print(f"--- Finished Training: {experiment_name} ---")
    return model, training_history, best_epoch if 'best_epoch' in locals() else epochs

# --- 1. Create DataLoaders for the best config ---
best_batch_size = FINAL_CONFIG["batch_size"]
# --- MODIFIED: Use the 3-tensor datasets ---
train_loader_final_check = make_loader(train_ds, batch_size=best_batch_size, shuffle=True, drop_last=True)
val_loader_final_check = make_loader(val_ds, batch_size=best_batch_size, shuffle=False, drop_last=False)

# --- 2. Instantiate Fresh Model ---
final_check_model = RecurrentClassifier(
    input_size=N_TS_FEATURES,
    static_input_size=N_STATIC_FEATURES, # --- NEW ---
    hidden_size=FINAL_CONFIG["hidden_size"],
    num_layers=FINAL_CONFIG["num_layers"],
    num_classes=N_CLASSES,
    dropout_rate=FINAL_CONFIG["dropout_rate"],
    bidirectional=FINAL_CONFIG["bidirectional"],
    rnn_type=FINAL_CONFIG["rnn_type"]
).to(device)

# Feature-detect torch.compile instead of comparing the first character of the version string
if hasattr(torch, "compile"):
    final_check_model = torch.compile(final_check_model)

final_check_optimizer = torch.optim.AdamW(final_check_model.parameters(), lr=FINAL_CONFIG["lr"], weight_decay=FINAL_CONFIG["l2_lambda"])
final_check_scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))
final_check_criterion = nn.CrossEntropyLoss()

# --- 3. Run Training with Early Stopping ---
print("--- Finding best epoch number for the winning model ---\n")
_, _, FINAL_BEST_EPOCH = fit(
    model=final_check_model,
    train_loader=train_loader_final_check,
    val_loader=val_loader_final_check,
    epochs=200, # Max epochs
    criterion=final_check_criterion,
    optimizer=final_check_optimizer,
    scaler=final_check_scaler,
    device=device,
    writer=None,
    verbose=10,
    experiment_name="final_check_v2",
    patience=30
)

print(f"\n--- 🏆 Optimal Epochs Found: {FINAL_BEST_EPOCH} ---")

# --- 4. Set variables for the submission cell ---
FINAL_MODEL_TYPE = FINAL_CONFIG["rnn_type"]
FINAL_HIDDEN_SIZE = FINAL_CONFIG["hidden_size"]
FINAL_HIDDEN_LAYERS = FINAL_CONFIG["num_layers"]
FINAL_BIDIRECTIONAL = FINAL_CONFIG["bidirectional"]
FINAL_DROPOUT_RATE = FINAL_CONFIG["dropout_rate"]
FINAL_LEARNING_RATE = FINAL_CONFIG["lr"]
FINAL_L2_LAMBDA = FINAL_CONFIG["l2_lambda"]
FINAL_BATCH_SIZE = FINAL_CONFIG["batch_size"]

FINAL_EXPERIMENT_NAME = (
    f"{FINAL_MODEL_TYPE}_H{FINAL_HIDDEN_SIZE}_L{FINAL_HIDDEN_LAYERS}_B{FINAL_BIDIRECTIONAL}"
    f"_D{FINAL_DROPOUT_RATE:.4f}_Static_Window_w{WINDOW_SIZE}_s{STRIDE}_Optuna_FINAL"
)

print(f"Submission name will be: submission_{FINAL_EXPERIMENT_NAME}.csv")

--- Finding best epoch number for the winning model ---

--- Starting Training: final_check_v2 ---
Will train for 200 epochs with patience=30 monitoring val_f1
Epoch   1/200 | Train: Loss=0.7254, F1=0.7007 | Val: Loss=0.5959, F1=0.7319
Epoch  10/200 | Train: Loss=0.1256, F1=0.9553 | Val: Loss=0.2008, F1=0.9441
Epoch  20/200 | Train: Loss=0.0430, F1=0.9843 | Val: Loss=0.0872, F1=0.9697
Epoch  30/200 | Train: Loss=0.0105, F1=0.9971 | Val: Loss=0.0352, F1=0.9848
Epoch  40/200 | Train: Loss=0.0243, F1=0.9893 | Val: Loss=0.0558, F1=0.9848
Epoch  50/200 | Train: Loss=0.0066, F1=0.9976 | Val: Loss=0.0245, F1=0.9904
Epoch  60/200 | Train: Loss=0.0076, F1=0.9966 | Val: Loss=0.0307, F1=0.9924
Epoch  70/200 | Train: Loss=0.0004, F1=1.0000 | Val: Loss=0.0220, F1=0.9924
Epoch  80/200 | Train: Loss=0.0003, F1=1.0000 | Val: Loss=0.0284, F1=0.9943

Early stopping triggered after 84 epochs.
Restoring best model from epoch 54 with val_f1 0.9962
--- Finished Training: final_check_v2 ---

--- 🏆 Optimal

## 📬 **8. Create Submission (HEAVILY MODIFIED)**
 
This cell is new. It implements the full submission pipeline:
1.  Refit both scalers (time-series and static) on the *full* training set, then apply them to the training and test data.
2.  Apply **sliding windows** to the *full* training set and the *full* test set.
3.  Train a new model on the *full windowed* training set.
4.  Generate predictions on the *windowed test set*.
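5.  **Aggregate** the per-window predictions into a *single* prediction per sample using a **majority vote**. With a window of 100 and a stride of 20, each 160-timestep sample yields 4 windows, so sample `000`, for example, receives 4 votes.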
6.  Save the final aggregated submission file.
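
Steps 2 and 5 can be sketched on toy data as follows. This is an illustrative sketch under stated assumptions, not the notebook's `create_sliding_windows`: the helpers `sliding_windows_sketch` and `majority_vote` are hypothetical names, the arrays are random, and ties in the vote are assumed to resolve toward the smallest class index.

```python
import numpy as np


def sliding_windows_sketch(X_ts, X_static, window_size, stride):
    """Cut each sample into overlapping windows (hypothetical helper).

    X_ts:     (n_samples, n_timesteps, n_features)
    X_static: (n_samples, n_static_features)
    Returns windowed TS data, repeated static rows, and for each
    window the index of the sample it came from.
    """
    n_samples, n_timesteps, _ = X_ts.shape
    starts = range(0, n_timesteps - window_size + 1, stride)
    windows, statics, owner = [], [], []
    for sample_idx in range(n_samples):
        for start in starts:
            windows.append(X_ts[sample_idx, start:start + window_size])
            statics.append(X_static[sample_idx])  # static features repeat per window
            owner.append(sample_idx)
    return np.stack(windows), np.stack(statics), np.asarray(owner)


def majority_vote(window_preds, window_to_sample_idx, n_classes):
    """Aggregate per-window class predictions into one prediction per sample."""
    n_samples = int(window_to_sample_idx.max()) + 1
    votes = np.zeros((n_samples, n_classes), dtype=np.int64)
    # Count one vote per window in the (sample, predicted class) cell
    np.add.at(votes, (window_to_sample_idx, window_preds), 1)
    return votes.argmax(axis=1)  # argmax picks the smallest class index on ties


rng = np.random.default_rng(0)
X_ts = rng.normal(size=(3, 160, 2))   # 3 toy samples, 160 timesteps, 2 features
X_static = rng.normal(size=(3, 6))

Xw, Sw, owner = sliding_windows_sketch(X_ts, X_static, window_size=100, stride=20)
print(Xw.shape, Sw.shape, owner.shape)  # (12, 100, 2) (12, 6) (12,)

window_preds = np.array([0, 0, 1, 0,  2, 2, 2, 1,  1, 0, 1, 0])  # made-up predictions
sample_preds = majority_vote(window_preds, owner, n_classes=3)
print(sample_preds)  # [0 2 0]
```

The window-to-sample index array is what makes aggregation possible: without it, the windowed predictions cannot be mapped back to the original test samples.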

In [29]:
# --- 1. & 2. Create Full Training Set & Loader (with windows) ---
print("\n--- Preparing full dataset for final training ---")

# --- Scaler 1: Time-Series ---
scaler_final_ts = StandardScaler()
ns, ts, f = X_train_full_ts.shape
X_train_full_ts_2d = X_train_full_ts.reshape(ns * ts, f)
print(f"Fitting FINAL TS Scaler on X_train_full_ts_2d shape: {X_train_full_ts_2d.shape}")
scaler_final_ts.fit(X_train_full_ts_2d)

# Scale final TS Train
X_train_full_ts_scaled_2d = scaler_final_ts.transform(X_train_full_ts_2d)
X_train_full_ts_scaled = X_train_full_ts_scaled_2d.reshape(ns, ts, f)

# Scale final TS Test
ns_test, ts_test, f_test = X_test_ts.shape
X_test_ts_2d = X_test_ts.reshape(ns_test * ts_test, f_test)
X_test_ts_scaled_2d = scaler_final_ts.transform(X_test_ts_2d)
X_test_ts_scaled = X_test_ts_scaled_2d.reshape(ns_test, ts_test, f_test)

# --- Scaler 2: Static ---
scaler_final_static = StandardScaler()
print(f"Fitting FINAL Static Scaler on X_train_full_static shape: {X_train_full_static.shape}")
scaler_final_static.fit(X_train_full_static)

# Scale final Static Train
X_train_full_static_scaled = scaler_final_static.transform(X_train_full_static)
# Scale final Static Test
X_test_static_scaled = scaler_final_static.transform(X_test_static)

print("Final scaling complete.")
print("--- Applying sliding windows to final dataset ---")

# --- Apply windowing to the final training set ---
(
    X_train_full_windowed, 
    X_static_full_windowed, 
    y_train_full_windowed, 
    _
) = create_sliding_windows(
    X_train_full_ts_scaled,
    X_train_full_static_scaled,
    y_train_full,
    window_size=WINDOW_SIZE,
    stride=STRIDE
)

# --- Apply windowing to the final test set ---
(
    X_test_final_windowed, 
    X_static_test_windowed, 
    _, 
    test_window_to_sample_idx # CRITICAL: We need this to map preds back
) = create_sliding_windows(
    X_test_ts_scaled,
    X_test_static_scaled,
    y=np.zeros(len(X_test_ts_scaled)), # Dummy 'y'
    window_size=WINDOW_SIZE,
    stride=STRIDE
)

print(f"Full train windowed TS shape: {X_train_full_windowed.shape}")
print(f"Full train windowed Static shape: {X_static_full_windowed.shape}")
print(f"Test windowed TS shape: {X_test_final_windowed.shape}")
print(f"Test windowed Static shape: {X_static_test_windowed.shape}")
print(f"Test window-to-sample map shape: {test_window_to_sample_idx.shape}")

# --- Create Tensors and DataLoaders from WINDOWED data ---
full_train_features_ts = torch.from_numpy(X_train_full_windowed).float()
full_train_features_static = torch.from_numpy(X_static_full_windowed).float()
full_train_targets = torch.from_numpy(y_train_full_windowed).long()

final_test_features_ts = torch.from_numpy(X_test_final_windowed).float()
final_test_features_static = torch.from_numpy(X_static_test_windowed).float()

# 3-tensor dataset for training
full_train_ds = TensorDataset(full_train_features_ts, full_train_features_static, full_train_targets)
# 2-tensor dataset for test (no labels)
final_test_ds = TensorDataset(final_test_features_ts, final_test_features_static)

# --- NEW: Define the loader function here ---
def make_final_loader(ds, batch_size, shuffle, drop_last):
    return DataLoader(
        ds, batch_size=int(batch_size), shuffle=shuffle, drop_last=drop_last,
        num_workers=0, pin_memory=torch.cuda.is_available(),  # pinning only helps when a CUDA device is present
        pin_memory_device="cuda" if torch.cuda.is_available() else "", prefetch_factor=None
    )

full_train_loader = make_final_loader(full_train_ds, batch_size=FINAL_BATCH_SIZE, shuffle=True, drop_last=True)
test_loader = make_final_loader(final_test_ds, batch_size=FINAL_BATCH_SIZE, shuffle=False, drop_last=False)
print("Final DataLoaders created.")

# --- 3. Instantiate Fresh Model ---
print(f"\n--- Building FINAL model for submission: {FINAL_EXPERIMENT_NAME} ---")
final_model = RecurrentClassifier(
    input_size=N_TS_FEATURES,
    static_input_size=N_STATIC_FEATURES,
    hidden_size=FINAL_HIDDEN_SIZE,
    num_layers=FINAL_HIDDEN_LAYERS,
    num_classes=N_CLASSES,
    dropout_rate=FINAL_DROPOUT_RATE,
    bidirectional=FINAL_BIDIRECTIONAL,
    rnn_type=FINAL_MODEL_TYPE
).to(device)

# Feature-detect torch.compile instead of comparing the first character of the version string
if hasattr(torch, "compile"):
    print("Compiling final model...")
    final_model = torch.compile(final_model)

final_optimizer = torch.optim.AdamW(final_model.parameters(), lr=FINAL_LEARNING_RATE, weight_decay=FINAL_L2_LAMBDA)
final_scaler = torch.amp.GradScaler(enabled=(device.type == 'cuda'))

# --- 4. Train on Full Dataset ---
print(f"Training final model for {FINAL_BEST_EPOCH} epochs on ALL data...")

final_model.train() 
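for epoch in range(1, FINAL_BEST_EPOCH + 1):
    # NOTE: FINAL_L2_LAMBDA is passed here as l2_lambda, whereas the epoch-finding fit() call
    # above used the default l2_lambda=0; AdamW already applies the same value as weight_decay.
    train_loss, train_f1 = train_one_epoch(
        final_model, full_train_loader, final_check_criterion, final_optimizer, final_scaler, device, 0, FINAL_L2_LAMBDA
    )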
    if epoch % 5 == 0 or epoch == 1 or epoch == FINAL_BEST_EPOCH:
        print(f"Final Training Epoch {epoch:3d}/{FINAL_BEST_EPOCH} | Train: Loss={train_loss:.4f}, F1={train_f1:.4f}")

print("Final training complete.")

# --- 5. Generate Predictions (on windows) ---
print("\n--- Generating predictions on test set (windowed) ---")
final_model.eval()
all_predictions = []

with torch.no_grad():
    # --- MODIFIED: Unpack 2-part batch ---
    for (ts_inputs, static_inputs) in test_loader: 
        ts_inputs, static_inputs = ts_inputs.to(device), static_inputs.to(device)
        with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            # --- MODIFIED: Pass both inputs ---
            logits = final_model(ts_inputs, static_inputs)
            preds = logits.argmax(dim=1)
            all_predictions.append(preds.cpu().numpy())

all_predictions = np.concatenate(all_predictions)
print(f"Generated {len(all_predictions)} predictions (from {len(test_window_to_sample_idx)} windows).")


# --- 6. NEW: Aggregate Predictions (Majority Vote) ---
print("Aggregating window predictions to sample predictions...")

df_preds = pd.DataFrame({
    'original_index': test_window_to_sample_idx,
    'prediction': all_predictions
})

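# Group by the original sample index (0 to 1323) and take the most common prediction.
# Series.mode() returns the modes in sorted order, so .iloc[0] always yields a scalar
# and breaks ties toward the smallest class index, independent of the SciPy version.
agg_preds = df_preds.groupby('original_index')['prediction'].agg(lambda x: x.mode().iloc[0]).values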

print(f"Aggregated to {len(agg_preds)} final predictions.")

# Inverse transform these aggregated predictions to labels
predicted_labels = le.inverse_transform(agg_preds)

# --- 7. Save Submission File ---
print("Loading test CSV to recover the sample indices for the submission...")
X_test_long = pd.read_csv(X_TEST_PATH)
test_sample_indices = sorted(X_test_long['sample_index'].unique())

if len(predicted_labels) != len(test_sample_indices):
    print(f"ERROR: Prediction count mismatch! Predictions: {len(predicted_labels)}, Test Indices: {len(test_sample_indices)}")
else:
    print("Prediction count matches. Creating submission.")
    
    final_submission_df = pd.DataFrame({
        'sample_index': test_sample_indices,
        'label': predicted_labels 
    })
    
    final_submission_df['sample_index'] = final_submission_df['sample_index'].apply(lambda x: f"{x:03d}")

    SUBMISSIONS_DIR = "submissions"
    os.makedirs(SUBMISSIONS_DIR, exist_ok=True)
    
    submission_filename = f"submission_{FINAL_EXPERIMENT_NAME}.csv"
    submission_filepath = os.path.join(SUBMISSIONS_DIR, submission_filename)
    
    final_submission_df.to_csv(submission_filepath, index=False)

    print(f"\nSuccessfully saved to {submission_filepath}!")
    print("This file is correctly formatted for Kaggle:")
    print(final_submission_df.head())

del final_model, full_train_loader, test_loader
del full_train_features_ts, full_train_features_static, final_test_features_ts, final_test_features_static


--- Preparing full dataset for final training ---
Fitting FINAL TS Scaler on X_train_full_ts_2d shape: (105760, 35)
Fitting FINAL Static Scaler on X_train_full_static shape: (661, 6)
Final scaling complete.
--- Applying sliding windows to final dataset ---
Full train windowed TS shape: (2644, 100, 35)
Full train windowed Static shape: (2644, 6)
Test windowed TS shape: (5296, 100, 35)
Test windowed Static shape: (5296, 6)
Test window-to-sample map shape: (5296,)
Final DataLoaders created.

--- Building FINAL model for submission: GRU_H64_L3_BTrue_D0.5783_Static_Window_w100_s20_Optuna_FINAL ---
Compiling final model...
Training final model for 54 epochs on ALL data...
Final Training Epoch   1/54 | Train: Loss=0.7753, F1=0.7137
Final Training Epoch   5/54 | Train: Loss=0.2956, F1=0.9096
Final Training Epoch  10/54 | Train: Loss=0.1522, F1=0.9675
Final Training Epoch  15/54 | Train: Loss=0.1381, F1=0.9731
Final Training Epoch  20/54 | Train: Loss=0.1032, F1=0.9843
Final Training Epoch  25