# üè¥‚Äç‚ò†Ô∏è Pirate Pain Classification Challenge

> ‚öì *"Even pirates feel pain ‚Äî let's teach the model to feel it too."*

---

## üìö Table of Contents
0. [README](#readme)  
1. [Setup & Configuration](#setup)  
2. [Data Loading](#data-loading)  
3. [Import Libraries](#import-libraries)  
4. [Data Preprocessing](#data-preprocessing)  
5. [Sequence Building](#sequence-building)  
6. [DataLoaders](#dataloaders)  
7. [Network Hyperparameters](#hyperparameters)
8. [Model Architecture](#model-architecture)  
9. [Training Functions](#training-functions)  
10. [Model Training](#model-training)  
11. [Evaluation & Metrics](#evaluation)  
12. [Model Loading & Final Testing](#model-loading)  
13. [Competition Submission](#submission)

---

### ‚öôÔ∏è Quick Configuration Map

> üß≠ *"If ye seek to tweak the code, here be where to look!"*

- üß∫ **Batch Size:** ‚Üí [DataLoaders](#dataloaders)  
- ‚öóÔ∏è **Hyperparameters:** ‚Üí [Network Hyperparameters](#hyperparameters)  
- ü™û **Window Size & Stride:** ‚Üí [Sequence Building](#sequence-building)  
- ‚öôÔ∏è **Model Type:** ‚Üí [Setup & Configuration](#setup)  

---



---

### üí∞ Treasure Storage ‚Äî Models & Submissions
> üè¥‚Äç‚ò†Ô∏è *"A wise pirate always knows where his treasure be buried ‚Äî guard yer models and submissions well!"*

- üíæ **Model & Submission Save/Load Path:** ‚Üí [Setup & Configuration](#setup)  
  - üóÇÔ∏è Models be saved in a **`models/`** folder with the name:
    **`experiment_name_dd-mm-HH-MM.pt`** (day-month-hour-minute).
  - üìú Submissions be saved in a **`submissions/`** folder with the filename format:  
    **`experiment_name_dd-mm-HH-MM.csv`** .
  - üî° All related model parameters are saved in **`models/`** folder with the  name **`experiment_name_dd-mm-HH-MM_config.json`** .

  
  *‚ùóThe experiment name is set as **`RnnType_Bi_dd-mm-HH-MM`** or **`RnnType_dd-mm-HH-MM`** depending on if it is bidirectional or not*
---






<a id="readme"></a>
## 0. Info



This section lists all the main parameters that can be modified to control data loading, model behavior, and training.

---

### üìÅ File Paths
| Variable | Description | Default Value |
|-----------|--------------|----------------|
| `TRAIN_DATA_PATH` | Training features | `'pirate_pain_train.csv'` |
| `TRAIN_LABELS_PATH` | Training labels | `'pirate_pain_train_labels.csv'` |
| `TEST_DATA_PATH` | Test set for inference | `'pirate_pain_test.csv'` *(optional)* |
| `MODEL_SAVE_PATH` | Output model file | `'pirate_model.pt'` |
| `RESULTS_FILE` | CSV for predictions | `'results_<date-time>.csv'` |

---

### üß† Model & Architecture
| Parameter | Description | Typical Values |
|------------|--------------|----------------|
| `model_type` | Choose model class | `'RNN'`, `'LSTM'`, `'GRU'`, `'ANN'` |
| `input_size` | Number of features per time step | *auto-detected from data* |
| `hidden_size` | Hidden layer size | `64`, `128`, `256` |
| `num_layers` | Number of RNN layers | `1-4` |
| `dropout` | Dropout probability | `0.2‚Äì0.5` |
| `num_classes` | Output classes (pain levels) | *from label set* |

---

### üèãÔ∏è Training Hyperparameters
| Parameter | Description | Default / Range |
|------------|--------------|-----------------|
| `batch_size` | Samples per batch | `512/2^n` |
| `learning_rate` | Optimizer learning rate | `1e-3` |
| `num_epochs` | Training iterations | `500` |
| `optimizer` | Optimization algorithm | `'AdamW'` |
| `criterion` | Loss function | `CrossEntropyLoss()` |
| `seed` | Random seed for reproducibility | `42` |

---

### üì§ Inference
| Parameter | Description |
|------------|--------------|
| `LOAD_MODEL_PATH` | Path to pretrained `.pt` model (optional) |
| `save_results` | Whether to write output CSV | `True` |

---

> üí° *Tip:* Adjust hyperparameters in the ‚ÄúConfiguration‚Äù or ‚ÄúTraining Setup‚Äù cell before running the notebook.


<a id="setup"></a>
## 1. Setup & Configuration

*Optional: Connect to Google Drive (for Colab users)*

In [136]:
from google.colab import drive
drive.mount("/gdrive")
current_dir = "/gdrive/MyDrive/pirate_dataset"
%cd $current_dir

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).
/gdrive/MyDrive/pirate_dataset


*Set Model Type*

In [137]:
RNN_TYPE = 'LSTM'            # 'RNN', 'LSTM', or 'GRU'
BIDIRECTIONAL = True        # True / False

*Set Model Save Name*

In [138]:
from datetime import datetime

# Get current date and time for submission filename
current_datetime = datetime.now().strftime("%d-%m-%H-%M")

if BIDIRECTIONAL:
    EXPERIMENT_NAME = f"{RNN_TYPE}_bi_{current_datetime}"
else:
    EXPERIMENT_NAME = f"{RNN_TYPE}_{current_datetime}"

SUBMISSION_FILENAME = f"{EXPERIMENT_NAME}.csv"


# Directory configuration
logs_dir = "tensorboard"
models_dir = "models"

# Model save/load paths
MODEL_SAVE_PATH = f"{models_dir}/{EXPERIMENT_NAME}_model.pt"
MODEL_LOAD_PATH = f"{models_dir}/{EXPERIMENT_NAME}_model.pt"




print(f"Experiment name: {EXPERIMENT_NAME}")
print(f"Submission filename: {SUBMISSION_FILENAME}")
print(f"Model save path: {MODEL_SAVE_PATH}")
print(f"Model load path: {MODEL_LOAD_PATH}")

Experiment name: LSTM_bi_12-11-14-45
Submission filename: LSTM_bi_12-11-14-45.csv
Model save path: models/LSTM_bi_12-11-14-45_model.pt
Model load path: models/LSTM_bi_12-11-14-45_model.pt


<a id="data-loading"></a>
## 2. Data Loading

Load training and test datasets from CSV files.

In [139]:
import pandas as pd

X_train = pd.read_csv('pirate_pain_train.csv')
y_train = pd.read_csv('pirate_pain_train_labels.csv')

<a id="import-libraries"></a>
## 3. Import Libraries

Set random seeds for reproducibility and import all necessary packages.

In [140]:
# Set seed for reproducibility
SEED = 1122
# Import necessary libraries
import os

# Set environment variables before importing modules
os.environ['PYTHONHASHSEED'] = str(SEED)
os.environ['MPLCONFIGDIR'] = os.getcwd() + '/configs/'

# Suppress warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=Warning)

# Import necessary modules
import logging
import random
import numpy as np

# Set seeds for random number generators in NumPy and Python
np.random.seed(SEED)
random.seed(SEED)

# Import PyTorch
import torch
torch.manual_seed(SEED)
from torch import nn
from torchsummary import summary
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import TensorDataset, DataLoader
from collections import Counter
from sklearn.model_selection import ParameterGrid






!pkill -f tensorboard
%load_ext tensorboard
!mkdir -p {models_dir}

if torch.cuda.is_available():
    device = torch.device("cuda")
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.benchmark = True
else:
    device = torch.device("cpu")

print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")

# Import other libraries
import copy
import shutil
from datetime import datetime
from itertools import product
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix,classification_report
from sklearn.preprocessing import StandardScaler
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
from itertools import product

# Configure plot display settings
sns.set(font_scale=1.4)
sns.set_style('white')
plt.rc('font', size=14)
%matplotlib inline

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
PyTorch version: 2.8.0+cu126
Device: cuda


<a id="data-preprocessing"></a>
## 4. Data Preprocessing

Explore data, split into train/val/test sets, normalize features, and encode labels.

### 4.1 Data Exploration

In [141]:
# Print the shape of the dataset
print(f"Dataset shape: {X_train.shape}")

# Display the first few rows of the dataset
X_train.head(10)

Dataset shape: (105760, 40)


Unnamed: 0,sample_index,time,pain_survey_1,pain_survey_2,pain_survey_3,pain_survey_4,n_legs,n_hands,n_eyes,joint_00,...,joint_21,joint_22,joint_23,joint_24,joint_25,joint_26,joint_27,joint_28,joint_29,joint_30
0,0,0,2,0,2,1,two,two,two,1.094705,...,3.499558e-06,1.945042e-06,3.999558e-06,1.153299e-05,4e-06,0.017592,0.013508,0.026798,0.027815,0.5
1,0,1,2,2,2,2,two,two,two,1.135183,...,3.976952e-07,6.765107e-07,6.019627e-06,4.643774e-08,0.0,0.013352,0.0,0.013377,0.013716,0.5
2,0,2,2,0,2,2,two,two,two,1.080745,...,1.53382e-07,1.698525e-07,1.446051e-06,2.424536e-06,3e-06,0.016225,0.00811,0.024097,0.023105,0.5
3,0,3,2,2,2,2,two,two,two,0.938017,...,1.006865e-05,5.511079e-07,1.847597e-06,5.432416e-08,0.0,0.011832,0.00745,0.028613,0.024648,0.5
4,0,4,2,2,2,2,two,two,two,1.090185,...,4.437266e-06,1.735459e-07,1.552722e-06,5.825366e-08,7e-06,0.00536,0.002532,0.033026,0.025328,0.5
5,0,5,2,0,2,1,two,two,two,1.146031,...,1.073167e-06,1.753837e-07,2.95734e-07,6.217311e-08,7e-06,0.00615,0.006444,0.033101,0.023767,0.5
6,0,6,2,1,2,1,two,two,two,1.02587,...,1.0748e-06,1.772156e-07,1.976558e-06,1.576086e-06,5e-06,0.006495,0.006421,0.031804,0.019056,0.5
7,0,7,2,2,2,2,two,two,two,1.038597,...,8.829074e-07,1.790415e-07,2.210562e-06,1.485741e-06,0.0,0.015998,0.005397,0.035552,0.015732,0.5
8,0,8,2,2,0,1,two,two,two,0.984251,...,1.621055e-06,1.165161e-06,3.030164e-07,5.416678e-07,0.0,0.020539,0.008517,0.008635,0.015257,0.5
9,0,9,0,2,2,2,two,two,two,1.054999,...,1.609114e-06,3.959558e-06,2.017157e-06,1.154349e-06,7e-06,0.007682,0.021383,0.034006,0.028966,0.5


### 4.2 Categorical Encoding

In [142]:
# Merge features and labels
data = X_train.merge(y_train, on='sample_index')

# Create a mapping dictionary to convert categorical labels to numerical values
map_dict_legs = { 'two': 2, 'one+peg_leg': 1}
map_dict_hands = { 'two': 2, 'one+hook_hand': 1}
map_dict_eyes = { 'two': 2, 'one+eye_patch': 1}
data['n_legs'] = data['n_legs'].map(map_dict_legs)
data['n_hands'] = data['n_hands'].map(map_dict_hands)
data['n_eyes'] = data['n_eyes'].map(map_dict_eyes)


### 4.3 Stratified Train/Val/Test Split

In [143]:
import pandas as pd
import random
from sklearn.model_selection import train_test_split

# df has columns: ['sample_index', 'label']
N_VAL_USERS = 120
N_TEST_USERS = 120

# --- Step 1: Compute each user's dominant label (or label distribution)
user_labels = (
    data.groupby('sample_index')['label']
    .agg(lambda x: x.value_counts().index[0])  # dominant label per user
    .reset_index()
)

train_users, temp_users = train_test_split(
    user_labels['sample_index'],
    test_size=(N_VAL_USERS + N_TEST_USERS) / len(user_labels),
    stratify=user_labels['label'],
    random_state=SEED
)

# Split temp into val/test (also stratified)
temp_labels = user_labels[user_labels['sample_index'].isin(temp_users)]
if N_TEST_USERS != 0:
  val_users, test_users = train_test_split(
      temp_labels['sample_index'],
      test_size=N_TEST_USERS / (N_VAL_USERS + N_TEST_USERS),
      stratify=temp_labels['label'],
      random_state=SEED
  )
else:
  val_users = temp_users
  test_users = []

# --- Step 3: Filter your main df
df_train = data[data['sample_index'].isin(train_users)]
df_val = data[data['sample_index'].isin(val_users)]
df_test = data[data['sample_index'].isin(test_users)]

# --- Step 4: Check label proportions
print("Label proportions:")
print("Train:\n", df_train['label'].value_counts(normalize=True))
print("Val:\n", df_val['label'].value_counts(normalize=True))
print("Test:\n", df_test['label'].value_counts(normalize=True))

Label proportions:
Train:
 label
no_pain      0.771971
low_pain     0.142518
high_pain    0.085511
Name: proportion, dtype: float64
Val:
 label
no_pain      0.775000
low_pain     0.141667
high_pain    0.083333
Name: proportion, dtype: float64
Test:
 label
no_pain      0.775000
low_pain     0.141667
high_pain    0.083333
Name: proportion, dtype: float64


In [144]:
df_train.shape, df_val.shape, df_test.shape

((67360, 41), (19200, 41), (19200, 41))

In [145]:
# Print the total number of pirates for each dataset
print(f"Total pirates in training set: {df_train['sample_index'].nunique()}")
print(f"Total pirates in validation set: {df_val['sample_index'].nunique()}")
print(f"Total pirates in test set: {df_test['sample_index'].nunique()}")

Total pirates in training set: 421
Total pirates in validation set: 120
Total pirates in test set: 120


### 4.4 Feature Normalization (min-max)

In [146]:
# Define the columns to be normalised

scale_columns = [
    col for col in data.columns
    if (col.startswith('joint_') or col.startswith('pain_survey')) and not col.startswith('joint_30')
]



# Calculate the minimum and maximum values from the training data only
mins_train = df_train[scale_columns].min()
maxs_train = df_train[scale_columns].max()

#mins_val = df_val[scale_columns].min()
#maxs_val = df_val[scale_columns].max()
#
#mins_test = df_test[scale_columns].min()
#maxs_test = df_test[scale_columns].max()

####
#CHANGED ALL THE REGULARIZATION TO USE MIN AND MAX VALUES FROM THE TRAINING DATA FOR GENERALIZATION
###

# Apply normalisation to the specified columns in all datasets
for column in scale_columns:
    denom = maxs_train[column] - mins_train[column]
    if np.isclose(denom, 0.0):
        df_train[column] = 0.0
        df_val[column] = 0.0
        df_test[column] = 0.0
        continue

    # Normalise the training set
    df_train[column] = (df_train[column] - mins_train[column]) / denom

    # Normalise the validation set
    df_val[column] = (df_val[column] - mins_train[column]) / denom

    # Normalise the test set
    df_test[column] = (df_test[column] - mins_train[column]) / denom





In [147]:
df_train.head(9)

Unnamed: 0,sample_index,time,pain_survey_1,pain_survey_2,pain_survey_3,pain_survey_4,n_legs,n_hands,n_eyes,joint_00,...,joint_22,joint_23,joint_24,joint_25,joint_26,joint_27,joint_28,joint_29,joint_30,label
0,0,0,1.0,0.0,1.0,0.5,2,2,2,0.777507,...,1.374706e-06,1.5e-05,0.0003162813,4e-06,0.014214,0.011376,0.018978,0.024117,0.5,no_pain
1,0,1,1.0,1.0,1.0,1.0,2,2,2,0.806256,...,4.026521e-07,2.2e-05,9.828599e-07,0.0,0.010748,0.0,0.009473,0.011892,0.5,no_pain
2,0,2,1.0,0.0,1.0,1.0,2,2,2,0.767592,...,1.440847e-08,5e-06,6.626013e-05,3e-06,0.013097,0.00683,0.017065,0.020033,0.5,no_pain
3,0,3,1.0,1.0,1.0,1.0,2,2,2,0.66622,...,3.06558e-07,7e-06,1.199337e-06,0.0,0.009505,0.006274,0.020264,0.021371,0.5,no_pain
4,0,4,1.0,1.0,1.0,1.0,2,2,2,0.774297,...,1.723863e-08,6e-06,1.307199e-06,7e-06,0.004216,0.002132,0.023389,0.021961,0.5,no_pain
5,0,5,1.0,0.0,1.0,0.5,2,2,2,0.813961,...,1.864695e-08,1e-06,1.414785e-06,8e-06,0.004861,0.005427,0.023442,0.020607,0.5,no_pain
6,0,6,1.0,0.5,1.0,0.5,2,2,2,0.728617,...,2.005071e-08,7e-06,4.297072e-05,5e-06,0.005143,0.005407,0.022523,0.016522,0.5,no_pain
7,0,7,1.0,1.0,1.0,1.0,2,2,2,0.737657,...,2.144985e-08,8e-06,4.04908e-05,0.0,0.012911,0.004546,0.025178,0.01364,0.5,no_pain
8,0,8,1.0,1.0,0.0,0.5,2,2,2,0.699058,...,7.770963e-07,1e-06,1.457661e-05,0.0,0.016622,0.007172,0.006115,0.013229,0.5,no_pain


In [148]:
# @title  Delete Some Columns Experimental

#del_columns = [
#    col for col in data.columns
#    if not (col.startswith('pain_survey') or col.startswith('sample_index') or col.startswith('label') or col.startswith('time') or
#            col.endswith('00') or col.endswith('01') or col.endswith('02') or col.endswith('03') or col.endswith('04') or col.endswith('05')
#            or col.endswith('06') or col.endswith('07') or col.endswith('08') or col.endswith('09') or col.endswith('10') or col.endswith('11')
#            or col.endswith('12') or col.endswith('25') or col.endswith('26') or col.endswith('27') or col.endswith('28') or col.endswith('29'))
#]
#
#for column in del_columns:
#
#    # Normalise the training set
#    df_train[column] =  0.0
#    df_val[column] =  0.0
#    df_test[column] =  0.0
#


In [149]:
df_train.head()

Unnamed: 0,sample_index,time,pain_survey_1,pain_survey_2,pain_survey_3,pain_survey_4,n_legs,n_hands,n_eyes,joint_00,...,joint_22,joint_23,joint_24,joint_25,joint_26,joint_27,joint_28,joint_29,joint_30,label
0,0,0,1.0,0.0,1.0,0.5,2,2,2,0.777507,...,1.374706e-06,1.5e-05,0.0003162813,4e-06,0.014214,0.011376,0.018978,0.024117,0.5,no_pain
1,0,1,1.0,1.0,1.0,1.0,2,2,2,0.806256,...,4.026521e-07,2.2e-05,9.828599e-07,0.0,0.010748,0.0,0.009473,0.011892,0.5,no_pain
2,0,2,1.0,0.0,1.0,1.0,2,2,2,0.767592,...,1.440847e-08,5e-06,6.626013e-05,3e-06,0.013097,0.00683,0.017065,0.020033,0.5,no_pain
3,0,3,1.0,1.0,1.0,1.0,2,2,2,0.66622,...,3.06558e-07,7e-06,1.199337e-06,0.0,0.009505,0.006274,0.020264,0.021371,0.5,no_pain
4,0,4,1.0,1.0,1.0,1.0,2,2,2,0.774297,...,1.723863e-08,6e-06,1.307199e-06,7e-06,0.004216,0.002132,0.023389,0.021961,0.5,no_pain


### 4.5 Label Distribution Analysis

In [150]:
# Initialise a dictionary to count occurrences of each activity in the training set
training_labels = {
    'no_pain': 0,
    'low_pain': 0,
    'high_pain': 0
}

# Count occurrences of each activity for unique IDs in the training set
for id in df_train['sample_index'].unique():
    label = df_train[df_train['sample_index'] == id]['label'].values[0]
    training_labels[label] += 1


# Print the distribution of training labels
print('Training labels:', training_labels)

# Initialise a dictionary to count occurrences of each activity in the training set
val_labels = {
    'no_pain': 0,
    'low_pain': 0,
    'high_pain': 0
}

# Count occurrences of each activity for unique IDs in the training set
for id in df_val['sample_index'].unique():
    label = df_val[df_val['sample_index'] == id]['label'].values[0]
    val_labels[label] += 1

# Print the distribution of validation labels
print('Validation labels:', val_labels)

# Initialise a dictionary to count occurrences of each activity in the test set
test_labels = {
    'no_pain': 0,
    'low_pain': 0,
    'high_pain': 0
}

# Count occurrences of each activity for unique IDs in the test set
for id in df_test['sample_index'].unique():
    label = df_test[df_test['sample_index'] == id]['label'].values[0]
    test_labels[label] += 1

# Print the distribution of test labels
print('Test labels:', test_labels)

Training labels: {'no_pain': 325, 'low_pain': 60, 'high_pain': 36}
Validation labels: {'no_pain': 93, 'low_pain': 17, 'high_pain': 10}
Test labels: {'no_pain': 93, 'low_pain': 17, 'high_pain': 10}


In [151]:
# Define a training mapping of label names to integer labels
label_mapping = {
    'no_pain': 0,
    'low_pain': 1,
    'high_pain': 2
}

# Map label names to integers in the training set
df_train['label'] = df_train['label'].map(label_mapping)

# Map label names to integers in the validation set
df_val['label'] = df_val['label'].map(label_mapping)

# Map label names to integers in the test set
df_test['label'] = df_test['label'].map(label_mapping)


In [152]:
print(df_train.head(3))

   sample_index  time  pain_survey_1  pain_survey_2  pain_survey_3  \
0             0     0            1.0            0.0            1.0   
1             0     1            1.0            1.0            1.0   
2             0     2            1.0            0.0            1.0   

   pain_survey_4  n_legs  n_hands  n_eyes  joint_00  ...      joint_22  \
0            0.5       2        2       2  0.777507  ...  1.374706e-06   
1            1.0       2        2       2  0.806256  ...  4.026521e-07   
2            1.0       2        2       2  0.767592  ...  1.440847e-08   

   joint_23      joint_24  joint_25  joint_26  joint_27  joint_28  joint_29  \
0  0.000015  3.162813e-04  0.000004  0.014214  0.011376  0.018978  0.024117   
1  0.000022  9.828599e-07  0.000000  0.010748  0.000000  0.009473  0.011892   
2  0.000005  6.626013e-05  0.000003  0.013097  0.006830  0.017065  0.020033   

   joint_30  label  
0       0.5      0  
1       0.5      0  
2       0.5      0  

[3 rows x 41 columns

<a id="sequence-building"></a>
## 5. Sequence Building

Convert variable-length time-series into fixed-size windows for RNN input.

In [153]:

# Define window and stride boolean variable -> if True, during training we will visit more time the same pirate with overlapping windows
# if False, each pirate will be visited only once during training
one_pirate_window = True

In [154]:
if one_pirate_window:
    # Define the window size
    WINDOW_SIZE = 30 # before: 80

    # Stride size
    STRIDE = 10
else:
    # Define the window size -> select an higher window size in order to get more pirates
    WINDOW_SIZE = 160

    # Stride size
    STRIDE = 160

### 5.1 Window & Stride Configuration

### 5.2 Build Sequences Function

In [155]:
def build_sequences(df, window=200, stride=200):
    assert window % stride == 0

    dataset = []
    labels = []
    ids = []  # <--- NEW: to store pirate/sample IDs

    for id in df['sample_index'].unique():
        columns = [col for col in df.columns if col not in ['sample_index', 'label', 'time']]
        temp = df[df['sample_index'] == id][columns].values
        label = df[df['sample_index'] == id]['label'].values[0]

        remainder = len(temp) % window
        padding_len = (window - remainder) % window
        if padding_len:
            padding = np.zeros((padding_len, len(columns)), dtype='float32')
            temp = np.concatenate((temp, padding))

        idx = 0
        while idx + window <= len(temp):
            dataset.append(temp[idx:idx + window])
            labels.append(label)
            ids.append(id)  # <--- NEW: add same ID for each window
            idx += stride

    dataset = np.array(dataset)
    labels = np.array(labels)
    ids = np.array(ids)  # <--- convert to numpy

    return dataset, labels, ids  # <--- UPDATED return


### 5.3 Generate Sequences for Train/Val/Test

In [156]:
# Generate sequences and labels for the training set
X_train, y_train, ids_train = build_sequences(df_train, WINDOW_SIZE, STRIDE)

# Generate sequences and labels for the validation set
X_val, y_val, ids_val = build_sequences(df_val, WINDOW_SIZE, STRIDE)

# Generate sequences and labels for the test set
X_test, y_test, ids_test = build_sequences(df_test, WINDOW_SIZE, STRIDE)

# Print the shapes of the generated datasets and their labels
print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
print(f"X_val shape: {X_val.shape}, y_val shape: {y_val.shape}")
print(f"X_test shape: {X_test.shape}, y_test shape: {y_test.shape}")

X_train shape: (6736, 30, 38), y_train shape: (6736,)
X_val shape: (1920, 30, 38), y_val shape: (1920,)
X_test shape: (1920, 30, 38), y_test shape: (1920,)


### 5.4 Data Type Conversion & Cleaning

In [157]:
# Convert dataset into float32 for PyTorch compatibility
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
# y_train = y_train.astype('int64')
# y_val = y_val.astype('int64')
# y_test = y_test.astype('int64')

In [158]:
# Define the input shape based on the training data
input_shape = X_train.shape[1:]

# Define the number of classes based on the categorical labels
num_classes = len(np.unique(y_train))
print(f"Number of Classes: {num_classes}")

Number of Classes: 3


In [159]:

# Discard nan values from the dataset
if np.isnan(X_train).any() or np.isnan(X_val).any() or np.isnan(X_test).any():
    X_train = np.nan_to_num(X_train)
    X_val = np.nan_to_num(X_val)
    X_test = np.nan_to_num(X_test)


In [160]:
# Convert numpy arrays to PyTorch datasets (pairs features with labels)
train_ds = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
val_ds   = TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_val))
test_ds  = TensorDataset(torch.from_numpy(X_test), torch.from_numpy(y_test))

<a id="dataloaders"></a>
## 6. DataLoaders

Create PyTorch DataLoaders for efficient batching and parallel loading.

In [161]:
# Define the batch size, which is the number of samples in each batch
BATCH_SIZE = 1024 # we can change it depending on the GPU RAM available (by default 512)

In [162]:
def make_loader(ds, batch_size, shuffle, drop_last):
    # Determine optimal number of worker processes for data loading
    cpu_cores = os.cpu_count() or 2
    num_workers = max(2, min(4, cpu_cores))

    # Create DataLoader with performance optimizations
    return DataLoader(
        ds,
        batch_size=batch_size,
        shuffle=shuffle,
        drop_last=drop_last,
        num_workers=num_workers,
        pin_memory=True,  # Faster GPU transfer
        pin_memory_device="cuda" if torch.cuda.is_available() else "",
        prefetch_factor=4,  # Load 4 batches ahead
    )

In [163]:
# Create data loaders with different settings for each phase
train_loader = make_loader(train_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=False)
val_loader   = make_loader(val_ds, batch_size=BATCH_SIZE, shuffle=False, drop_last=False)
test_loader  = make_loader(test_ds, batch_size=BATCH_SIZE, shuffle=False, drop_last=False)

In [164]:
# Get one batch from the training data loader
for xb, yb in train_loader:
    print("Features batch shape:", xb.shape)
    print("Labels batch shape:", yb.shape)
    break # Stop after getting one batch

Features batch shape: torch.Size([1024, 30, 38])
Labels batch shape: torch.Size([1024])


In [165]:
def recurrent_summary(model, input_size):
    """
    Custom summary function that emulates torchinfo's output while correctly
    counting parameters for RNN/GRU/LSTM layers.

    This function is designed for models whose direct children are
    nn.Linear, nn.RNN, nn.GRU, or nn.LSTM layers.

    Args:
        model (nn.Module): The model to analyze.
        input_size (tuple): Shape of the input tensor (e.g., (seq_len, features)).
    """

    # Dictionary to store output shapes captured by forward hooks
    output_shapes = {}
    # List to track hook handles for later removal
    hooks = []

    def get_hook(name):
        """Factory function to create a forward hook for a specific module."""
        def hook(module, input, output):
            # Handle RNN layer outputs (returns a tuple)
            if isinstance(output, tuple):
                # output[0]: all hidden states with shape (batch, seq_len, hidden*directions)
                shape1 = list(output[0].shape)
                shape1[0] = -1  # Replace batch dimension with -1

                # output[1]: final hidden state h_n (or tuple (h_n, c_n) for LSTM)
                if isinstance(output[1], tuple):  # LSTM case: (h_n, c_n)
                    shape2 = list(output[1][0].shape)  # Extract h_n only
                else:  # RNN/GRU case: h_n only
                    shape2 = list(output[1].shape)

                # Replace batch dimension (middle position) with -1
                shape2[1] = -1

                output_shapes[name] = f"[{shape1}, {shape2}]"

            # Handle standard layer outputs (e.g., Linear)
            else:
                shape = list(output.shape)
                shape[0] = -1  # Replace batch dimension with -1
                output_shapes[name] = f"{shape}"
        return hook

    # 1. Determine the device where model parameters reside
    try:
        device = next(model.parameters()).device
    except StopIteration:
        device = torch.device("cpu")  # Fallback for models without parameters

    # 2. Create a dummy input tensor with batch_size=1
    dummy_input = torch.randn(1, *input_size).to(device)

    # 3. Register forward hooks on target layers
    # Iterate through direct children of the model (e.g., self.rnn, self.classifier)
    for name, module in model.named_children():
        if isinstance(module, (nn.Linear, nn.RNN, nn.GRU, nn.LSTM)):
            # Register the hook and store its handle for cleanup
            hook_handle = module.register_forward_hook(get_hook(name))
            hooks.append(hook_handle)

    # 4. Execute a dummy forward pass in evaluation mode
    model.eval()
    with torch.no_grad():
        try:
            model(dummy_input)
        except Exception as e:
            print(f"Error during dummy forward pass: {e}")
            # Clean up hooks even if an error occurs
            for h in hooks:
                h.remove()
            return

    # 5. Remove all registered hooks
    for h in hooks:
        h.remove()

    # --- 6. Print the summary table ---

    print("-" * 79)
    # Column headers
    print(f"{'Layer (type)':<25} {'Output Shape':<28} {'Param #':<18}")
    print("=" * 79)

    total_params = 0
    total_trainable_params = 0

    # Iterate through modules again to collect and display parameter information
    for name, module in model.named_children():
        if name in output_shapes:
            # Count total and trainable parameters for this module
            module_params = sum(p.numel() for p in module.parameters())
            trainable_params = sum(p.numel() for p in module.parameters() if p.requires_grad)

            total_params += module_params
            total_trainable_params += trainable_params

            # Format strings for display
            layer_name = f"{name} ({type(module).__name__})"
            output_shape_str = str(output_shapes[name])
            params_str = f"{trainable_params:,}"

            print(f"{layer_name:<25} {output_shape_str:<28} {params_str:<15}")

    print("=" * 79)
    print(f"Total params: {total_params:,}")
    print(f"Trainable params: {total_trainable_params:,}")
    print(f"Non-trainable params: {total_params - total_trainable_params:,}")
    print("-" * 79)

<a id="hyperparameters"></a>
## 7. Network Hyperparameters

Configure training settings, architecture parameters, and regularization.

In [166]:
 # Training configuration
LEARNING_RATE = 1e-3
EPOCHS = 500
PATIENCE = 40

# Architecture
HIDDEN_LAYERS = 2        # Hidden layers
HIDDEN_SIZE = [32,16,32,16]   # Neurons per layer -> prev hidden size = 128

# Regularisation
DROPOUT_RATE = 0.5     # Dropout probability

# For now disable weight decay
L1_LAMBDA = 0.0001       # L1 penalty
L2_LAMBDA = 0.001         # L2 penalty

# Set up loss function and optimizer
weights = torch.tensor([0.8, 1.0, 1.2]).to(device)



# TO WEIGHT MORE THE "MORE DIFFICULT" CASES AND THE LESS FREQUENT LABELS:
class FocalLoss(nn.Module):
    def __init__(self, alpha=None, gamma=2.0):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.ce = nn.CrossEntropyLoss(weight=alpha, reduction='none')

    def forward(self, inputs, targets):
        ce_loss = self.ce(inputs, targets)
        pt = torch.exp(-ce_loss)
        focal_loss = ((1 - pt) ** self.gamma) * ce_loss
        return focal_loss.mean()

alpha = None  # None if we don't want to alterate the weights of each label losses (FocalLoss already do it)
criterion = FocalLoss(alpha=alpha, gamma=1.3)  # gamma = 0 it's like Crossentropy(), gamma < 1 it's like in between Crossentropy and FocalLoss,
                                               # gamma = 1 it's a good compromise, gamma = 1.5 or gamma = 2 to weight so much the less present labels


#criterion = nn.CrossEntropyLoss(weight=weights)

In [167]:
# Initialize best model tracking variables
best_model = None
best_performance = float('-inf')

<a id="model-architecture"></a>
## 8. Model Architecture

Custom RNN/LSTM/GRU classifier with configurable bidirectionality and dropout.

### 7.1 Recurrent Classifier Class

In [168]:
class RecurrentClassifier(nn.Module):
    """
    Generic RNN classifier (RNN, LSTM, GRU).
    Uses the last hidden state for classification.
    """
    def __init__(
            self,
            input_size,
            hidden_size,
            num_layers,
            num_classes,
            rnn_type=  'LSTM',        # 'RNN', 'LSTM', or 'GRU'
            bidirectional=False,
            dropout_rate=0.2
            ):
        super().__init__()

        self.rnn_type = rnn_type
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.bidirectional = bidirectional

        # Map string name to PyTorch RNN class
        rnn_map = {
            'RNN': nn.RNN,
            'LSTM': nn.LSTM,
            'GRU': nn.GRU
        }

        if rnn_type not in rnn_map:
            raise ValueError("rnn_type must be 'RNN', 'LSTM', or 'GRU'")

        rnn_module = rnn_map[rnn_type]

        # Dropout is only applied between layers (if num_layers > 1)
        dropout_val = dropout_rate if num_layers > 1 else 0 # dropout between RNN layers, applied for regularization

        # Create the recurrent layer
        self.rnn = rnn_module(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,       # Input shape: (batch, seq_len, features)
            bidirectional=bidirectional, # We are defining a bidirectional RNN since we want to extract also the future contextual information for making better predictions
            dropout=dropout_val
        )

        # Calculate input size for the final classifier
        if self.bidirectional:
            classifier_input_size = hidden_size * 2 # Concat fwd + bwd
        else:
            classifier_input_size = hidden_size

        # Final classification layer
        self.classifier = nn.Linear(classifier_input_size, num_classes) # output layer for classifying

    def forward(self, x):
        """
        x shape: (batch_size, seq_length, input_size)
        """

        # rnn_out shape: (batch_size, seq_len, hidden_size * num_directions)
        rnn_out, hidden = self.rnn(x) # feeds the input sequence into the RNN layer
        # rnn_out -> contains the hidden state output for every timestep

        # LSTM returns (h_n, c_n), we only need h_n
        if self.rnn_type == 'LSTM':
            hidden = hidden[0]  # final hidden state of the last timestep

        # hidden shape: (num_layers * num_directions, batch_size, hidden_size)

        if self.bidirectional:
            # For bidirectional, hidden states are interleaved:
            # [layer_0_fwd, layer_0_bwd, layer_1_fwd, layer_1_bwd, ...]
            # We want the last layer's forward and backward states
            fwd_hidden = hidden[-2, :, :]  # Last layer, forward direction
            bwd_hidden = hidden[-1, :, :]  # Last layer, backward direction
            hidden_to_classify = torch.cat([fwd_hidden, bwd_hidden], dim=1)
        else:
            hidden_to_classify = hidden[-1]

        # Get logits
        logits = self.classifier(hidden_to_classify)
        return logits

In [169]:
class FlexibleRecurrentClassifier(nn.Module):
    def __init__(self, input_size, hidden_sizes, num_classes,
                 rnn_type='LSTM', bidirectional=False, dropout_rate=0.2,
                 use_batch_norm=False):
        super().__init__()
        assert isinstance(hidden_sizes, (list, tuple)) and len(hidden_sizes) >= 1

        self.rnn_type = rnn_type
        self.bidirectional = bidirectional
        self.num_layers = len(hidden_sizes)
        self.use_batch_norm = use_batch_norm

        rnn_map = {'RNN': nn.RNN, 'LSTM': nn.LSTM, 'GRU': nn.GRU}
        if rnn_type not in rnn_map:
            raise ValueError("rnn_type must be 'RNN', 'LSTM', or 'GRU'")

        rnn_module = rnn_map[rnn_type]
        self.rnns = nn.ModuleList()
        self.batch_norms = nn.ModuleList() if use_batch_norm else None

        input_dim = input_size
        for hidden_dim in hidden_sizes:
            self.rnns.append(
                rnn_module(
                    input_size=input_dim,
                    hidden_size=hidden_dim,
                    num_layers=1,
                    batch_first=True,
                    bidirectional=bidirectional,
                    dropout=0.0
                )
            )
            output_dim = hidden_dim * (2 if bidirectional else 1)

            if use_batch_norm:
                self.batch_norms.append(nn.BatchNorm1d(output_dim))

            input_dim = output_dim

        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(input_dim, num_classes)

    def forward(self, x):
        """
        Args:
            x: (batch_size, seq_len, input_size)
        Returns:
            logits: (batch_size, num_classes)
        """
        out = x

        for i, rnn in enumerate(self.rnns):
            out, hidden = rnn(out)

            if self.use_batch_norm:
                # (batch, seq, features) -> (batch, features, seq)
                out = out.transpose(1, 2)
                out = self.batch_norms[i](out)
                out = out.transpose(1, 2)

            out = self.dropout(out)

        # Use final timestep output (more common than hidden state)
        final_output = out[:, -1, :]  # (batch, hidden_dim * num_directions)

        logits = self.classifier(final_output)
        return logits

<a id="training-functions"></a>
## 9. Training Functions

Helper functions for training, validation, logging, and early stopping.

In [170]:
def train_one_epoch(model, train_loader, criterion, optimizer, scaler, device, l1_lambda=0, l2_lambda=0):
    """
    Perform one complete training epoch through the entire training dataset.

    Args:
        model (nn.Module): The neural network model to train
        train_loader (DataLoader): PyTorch DataLoader containing training data batches
        criterion (nn.Module): Loss function (e.g., CrossEntropyLoss, MSELoss)
        optimizer (torch.optim): Optimization algorithm (e.g., Adam, SGD)
        scaler (GradScaler): PyTorch's gradient scaler for mixed precision training
        device (torch.device): Computing device ('cuda' for GPU, 'cpu' for CPU)
        l1_lambda (float): Lambda for L1 regularization
        l2_lambda (float): Lambda for L2 regularization

    Returns:
        tuple: (average_loss, f1 score) - Training loss and f1 score for this epoch
    """
    model.train()  # Set model to training mode

    running_loss = 0.0
    all_predictions = []
    all_targets = []

    # Iterate through training batches
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        # Move data to device (GPU/CPU)
        inputs, targets = inputs.to(device), targets.to(device)

        # Clear gradients from previous step
        optimizer.zero_grad(set_to_none=True)

        # Forward pass with mixed precision (if CUDA available)
        with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
            logits = model(inputs)
            loss = criterion(logits, targets)

            # Add L1 and L2 regularization
            l1_norm = sum(p.abs().sum() for p in model.parameters())
            l2_norm = sum(p.pow(2).sum() for p in model.parameters())
            loss = loss + l1_lambda * l1_norm + l2_lambda * l2_norm


        # Backward pass with gradient scaling
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        # Accumulate metrics
        running_loss += loss.item() * inputs.size(0)
        predictions = logits.argmax(dim=1)
        all_predictions.append(predictions.cpu().numpy())
        all_targets.append(targets.cpu().numpy())

    # Calculate epoch metrics
    epoch_loss = running_loss / len(train_loader.dataset)
    epoch_f1 = f1_score(
        np.concatenate(all_targets),
        np.concatenate(all_predictions),
        average='weighted'
    )

    return epoch_loss, epoch_f1

### 9.1 Train One Epoch Function

In [171]:
def validate_one_epoch(model, val_loader, criterion, device):
    """
    Perform one complete validation epoch through the entire validation dataset.

    Args:
        model (nn.Module): The neural network model to evaluate (must be in eval mode)
        val_loader (DataLoader): PyTorch DataLoader containing validation data batches
        criterion (nn.Module): Loss function used to calculate validation loss
        device (torch.device): Computing device ('cuda' for GPU, 'cpu' for CPU)

    Returns:
        tuple: (average_loss, accuracy) - Validation loss and accuracy for this epoch

    Note:
        This function automatically sets the model to evaluation mode and disables
        gradient computation for efficiency during validation.
    """
    model.eval()  # Set model to evaluation mode

    running_loss = 0.0
    all_predictions = []
    all_targets = []

    # Disable gradient computation for validation
    with torch.no_grad():
        for inputs, targets in val_loader:
            # Move data to device
            inputs, targets = inputs.to(device), targets.to(device)

            # Forward pass with mixed precision (if CUDA available)
            with torch.amp.autocast(device_type=device.type, enabled=(device.type == 'cuda')):
                logits = model(inputs)
                loss = criterion(logits, targets)

            # Accumulate metrics
            running_loss += loss.item() * inputs.size(0)
            predictions = logits.argmax(dim=1)
            all_predictions.append(predictions.cpu().numpy())
            all_targets.append(targets.cpu().numpy())

    # Calculate epoch metrics
    epoch_loss = running_loss / len(val_loader.dataset)
    epoch_accuracy = f1_score(
        np.concatenate(all_targets),
        np.concatenate(all_predictions),
        average='weighted'
    )

    return epoch_loss, epoch_accuracy

### 9.2 Validate One Epoch Function

In [172]:
def log_metrics_to_tensorboard(writer, epoch, train_loss, train_f1, val_loss, val_f1, model):
    """
    Log training metrics and model parameters to TensorBoard for visualization.

    Args:
        writer (SummaryWriter): TensorBoard SummaryWriter object for logging
        epoch (int): Current epoch number (used as x-axis in TensorBoard plots)
        train_loss (float): Training loss for this epoch
        train_f1 (float): Training f1 score for this epoch
        val_loss (float): Validation loss for this epoch
        val_f1 (float): Validation f1 score for this epoch
        model (nn.Module): The neural network model (for logging weights/gradients)

    Note:
        This function logs scalar metrics (loss/f1 score) and histograms of model
        parameters and gradients, which helps monitor training progress and detect
        issues like vanishing/exploding gradients.
    """
    # Log scalar metrics
    writer.add_scalar('Loss/Training', train_loss, epoch)
    writer.add_scalar('Loss/Validation', val_loss, epoch)
    writer.add_scalar('F1/Training', train_f1, epoch)
    writer.add_scalar('F1/Validation', val_f1, epoch)

    # Log model parameters and gradients
    for name, param in model.named_parameters():
        if param.requires_grad:
            # Check if the tensor is not empty before adding a histogram
            if param.numel() > 0:
                writer.add_histogram(f'{name}/weights', param.data, epoch)
            if param.grad is not None:
                # Check if the gradient tensor is not empty before adding a histogram
                if param.grad.numel() > 0:
                    if param.grad is not None and torch.isfinite(param.grad).all():
                        writer.add_histogram(f'{name}/gradients', param.grad.data, epoch)

### 9.3 Fit  Function

In [173]:
def fit(model, train_loader, val_loader, epochs, criterion, optimizer, scaler, device,
        l1_lambda=0, l2_lambda=0, patience=0, evaluation_metric="val_f1", mode='max',
        restore_best_weights=True, writer=None, verbose=10, experiment_name="",save_model=True):
    """
    Train the neural network model on the training data and validate on the validation data.

    Args:
        model (nn.Module): The neural network model to train
        train_loader (DataLoader): PyTorch DataLoader containing training data batches
        val_loader (DataLoader): PyTorch DataLoader containing validation data batches
        epochs (int): Number of training epochs
        criterion (nn.Module): Loss function (e.g., CrossEntropyLoss, MSELoss)
        optimizer (torch.optim): Optimization algorithm (e.g., Adam, SGD)
        scaler (GradScaler): PyTorch's gradient scaler for mixed precision training
        device (torch.device): Computing device ('cuda' for GPU, 'cpu' for CPU)
        l1_lambda (float): L1 regularization coefficient (default: 0)
        l2_lambda (float): L2 regularization coefficient (default: 0)
        patience (int): Number of epochs to wait for improvement before early stopping (default: 0)
        evaluation_metric (str): Metric to monitor for early stopping (default: "val_f1")
        mode (str): 'max' for maximizing the metric, 'min' for minimizing (default: 'max')
        restore_best_weights (bool): Whether to restore model weights from best epoch (default: True)
        writer (SummaryWriter, optional): TensorBoard SummaryWriter object for logging (default: None)
        verbose (int, optional): Frequency of printing training progress (default: 10)
        experiment_name (str, optional): Experiment name for saving models (default: "")

    Returns:
        tuple: (model, training_history) - Trained model and metrics history
    """

    # Initialize metrics tracking
    training_history = {
        'train_loss': [], 'val_loss': [],
        'train_f1': [], 'val_f1': []
    }

    # Configure early stopping if patience is set
    if patience > 0:
        patience_counter = 0
        best_metric = float('-inf') if mode == 'max' else float('inf')
        best_epoch = 0

    print(f"Training {epochs} epochs...")

    # Main training loop: iterate through epochs
    for epoch in range(1, epochs + 1):

        # Forward pass through training data, compute gradients, update weights
        train_loss, train_f1 = train_one_epoch(
            model, train_loader, criterion, optimizer, scaler, device, l1_lambda, l2_lambda
        )

        # Evaluate model on validation data without updating weights
        val_loss, val_f1 = validate_one_epoch(
            model, val_loader, criterion, device
        )

        # Store metrics for plotting and analysis
        training_history['train_loss'].append(train_loss)
        training_history['val_loss'].append(val_loss)
        training_history['train_f1'].append(train_f1)
        training_history['val_f1'].append(val_f1)

        # Write metrics to TensorBoard for visualization
        if writer is not None:
            log_metrics_to_tensorboard(
                writer, epoch, train_loss, train_f1, val_loss, val_f1, model
            )

        # Print progress every N epochs or on first epoch
        if verbose > 0:
            if epoch % verbose == 0 or epoch == 1:
                print(f"Epoch {epoch:3d}/{epochs} | "
                    f"Train: Loss={train_loss:.4f}, F1 Score={train_f1:.4f} | "
                    f"Val: Loss={val_loss:.4f}, F1 Score={val_f1:.4f}")

        # Early stopping logic: monitor metric and save best model
        if patience > 0:
            current_metric = training_history[evaluation_metric][-1]
            is_improvement = (current_metric > best_metric) if mode == 'max' else (current_metric < best_metric)

            if is_improvement :
                best_metric = current_metric
                best_val_f1 = val_f1
                best_epoch = epoch
                if save_model:
                  torch.save(model.state_dict(), f"{models_dir}/{experiment_name}_model.pt")
                patience_counter = 0
            else:
                patience_counter += 1
                if patience_counter >= patience:
                    print(f"Early stopping triggered after {epoch} epochs.")
                    break
    if restore_best_weights and patience > 0:
      print(f"Best model restored from epoch {best_epoch} with {evaluation_metric} {best_metric:.4f} and with val_f1 {best_val_f1}")
    # Restore best model weights if early stopping was used
    if restore_best_weights and patience > 0 and save_model:
        model.load_state_dict(torch.load(f"{models_dir}/{experiment_name}_model.pt"))
        print(f"Best model restored from epoch {best_epoch} with {evaluation_metric} {best_metric:.4f} and with val_f1 {best_val_f1}")

    # Save final model if no early stopping
    if patience == 0 and save_model:
        torch.save(model.state_dict(), f"{models_dir}/{experiment_name}_model.pt")
    if not save_model:
        print("Model saving turned off.")

    # Close TensorBoard writer
    if writer is not None:
        writer.close()

    return model, training_history

### 9.4 Training Loop

In [174]:
#@ title Set Parameter Grid
from sklearn.model_selection import ParameterGrid
import copy

GRID_EPOCHS = 500
GRID_PATIENCE = 40

#===========================================
###########  SET GRID HERE #################
#===========================================

base_params = {
    'RNN_TYPE': ['LSTM'], # Possible values: LSTM, GRU, RNN
    'BIDIRECTIONAL': [True], # Possible values: True/False
    'HIDDEN_SIZE': [[64,32,32,32],[64,32,64,32],[64,32,32],[32,32,16,8]],
    'HIDDEN_LAYERS': [2],
    'LEARNING_RATE': [1e-3],
    'DROPOUT_RATE': [0.2],
    'BATCH_SIZE': [512],
    'L1_LAMBDA': [0.0001],
    'L2_LAMBDA': [0.001],
    'WINDOW': [6],
    'STRIDE': [2],
    'CRITERION': ['FOCAL'], # You can also use ['CROSS', 'FOCAL']
}

# Conditional options
weights_options = [[1.0, 1.0, 1.0]]
gammas_options = [1.3]

# List to hold full parameter grids
combined_param_grids = []

# Loop through each criterion in base_params['CRITERION']
for criterion in base_params['CRITERION']:
    config = copy.deepcopy(base_params)
    config['CRITERION'] = [criterion]

    # Add conditionally depending on the criterion
    if criterion == 'CROSS':
        config['WEIGHTS'] = weights_options
        if 'GAMMA' in config: del config['GAMMA']
    elif criterion == 'FOCAL':
        config['GAMMA'] = gammas_options
        if 'WEIGHTS' in config: del config['WEIGHTS']

    combined_param_grids.append(config)

# Create full parameter grid
grid = list(ParameterGrid(combined_param_grids))

print(f"Generated {len(grid)} combinations.")
results = []


Generated 4 combinations.


In [175]:
import pandas as pd

df_grid = pd.DataFrame(grid)
print(df_grid.to_string(index=False))

 BATCH_SIZE  BIDIRECTIONAL CRITERION  DROPOUT_RATE  GAMMA  HIDDEN_LAYERS      HIDDEN_SIZE  L1_LAMBDA  L2_LAMBDA  LEARNING_RATE RNN_TYPE  STRIDE  WINDOW
        512           True     FOCAL           0.2    1.3              2 [64, 32, 32, 32]     0.0001      0.001          0.001     LSTM       2       6
        512           True     FOCAL           0.2    1.3              2 [64, 32, 64, 32]     0.0001      0.001          0.001     LSTM       2       6
        512           True     FOCAL           0.2    1.3              2     [64, 32, 32]     0.0001      0.001          0.001     LSTM       2       6
        512           True     FOCAL           0.2    1.3              2  [32, 32, 16, 8]     0.0001      0.001          0.001     LSTM       2       6


In [None]:
import os, math, copy, time

# --- ensure output directories exist ---
os.makedirs(str(logs_dir), exist_ok=True)

best_val_f1 = float('-inf')
best_params = None
best_training_history = None
best_state_dict = None
best_model_path = None
best_run_idx = None
best_epoch_in_run = None
results = []


for idx, params in enumerate(grid, 1):
    start_time = time.perf_counter()
    print(f"\nConfiguration {idx}/{len(grid)}: {params}")
    #Set up Criterion


    if params['CRITERION'] == 'CROSS':
      weights = torch.tensor(params['WEIGHTS']).to(device)
      criterion = nn.CrossEntropyLoss(weight=weights)
    else:
      criterion = FocalLoss(alpha=None, gamma=params['GAMMA'])
    #Build Sequence for the grid step
    WINDOW_GRID = params['WINDOW']
    STRIDE_GRID = params['STRIDE']
    # Generate sequences and labels for the training set
    X_train, y_train, _ = build_sequences(df_train, WINDOW_GRID, STRIDE_GRID) # Unpack all three return values
    # Generate sequences and labels for the validation set
    X_val, y_val, _ = build_sequences(df_val, WINDOW_GRID, STRIDE_GRID) # Unpack all three return values
    X_train = X_train.astype('float32')
    X_val = X_val.astype('float32')
    # Define the input shape based on the training data
    input_shape = X_train.shape[1:]
    # Define the number of classes based on the categorical labels
    num_classes = len(np.unique(y_train))
    # Discard nan values from the dataset
    if np.isnan(X_train).any() or np.isnan(X_val).any() or np.isnan(X_test).any():
        X_train = np.nan_to_num(X_train)
        X_val = np.nan_to_num(X_val)
    train_ds = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
    val_ds   = TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_val))
    train_loader = make_loader(train_ds, batch_size=params['BATCH_SIZE'], shuffle=True, drop_last=False)
    val_loader   = make_loader(val_ds, batch_size=params['BATCH_SIZE'], shuffle=False, drop_last=False)
    print(f"Training set size: {len(train_ds)}")
    print(f"Validation set size: {len(val_ds)}")


    # Build model
    rnn_model = FlexibleRecurrentClassifier(
        input_size=input_shape[-1],
        hidden_sizes=params['HIDDEN_SIZE'], # Changed from hidden_size to hidden_sizes as per FlexibleRecurrentClassifier init
        num_classes=num_classes,
        dropout_rate=params['DROPOUT_RATE'],
        bidirectional=params['BIDIRECTIONAL'], # Changed from BIDIRECTIONAL to params['BIDIRECTIONAL']
        rnn_type=params['RNN_TYPE'] # Changed from RNN_TYPE to params['RNN_TYPE']
    ).to(device)

    # Display architecture summary
    try:
        recurrent_summary(rnn_model, input_size=input_shape)
    except Exception as e:
        print(f"[warn] recurrent_summary failed: {e}")

    # Set up TensorBoard writer
    experiment_id = f"{EXPERIMENT_NAME}_{idx}"
    writer = SummaryWriter(f"./{logs_dir}/{experiment_id}")

    # Add model graph only once to save time/disk
    if idx == 1:
        try:
            x = torch.randn(1, input_shape[0], input_shape[1]).to(device)
            x_for_graph = x if getattr(rnn_model, "batch_first", True) else x.permute(1, 0, 2)
            writer.add_graph(rnn_model, x_for_graph)
        except Exception as e:
            print(f"[warn] Skipping add_graph: {e}")

    # Optimizer and AMP scaler
    optimizer = torch.optim.AdamW(
        rnn_model.parameters(),
        lr=params['LEARNING_RATE'],
        weight_decay=params['L2_LAMBDA']
    )
    scaler = torch.cuda.amp.GradScaler(enabled=(device.type == 'cuda'))

    # --- train model ---
    try:
        rnn_model, training_history = fit(
            model=rnn_model,
            train_loader=train_loader,
            val_loader=val_loader,
            l1_lambda=params['L1_LAMBDA'],
            l2_lambda=0.0, # Always set to zero because we are applying L2 Regularization in Optimizer
            epochs=GRID_EPOCHS,
            criterion=criterion,
            optimizer=optimizer,
            scaler=scaler,
            device=device,
            writer=None, # No Tensorboard saving
            verbose=2,
            experiment_name=experiment_id,
            patience=GRID_PATIENCE,
            evaluation_metric="val_loss",
            mode= 'min',
            save_model=False # No model saving for grid search runs
        )

        # Extract metrics
        val_f1_series = [float(v) for v in training_history.get('val_f1', [])
                         if isinstance(v, (int, float)) and math.isfinite(v)]
        val_loss_series = [float(v) for v in training_history.get('val_loss', [])
                           if isinstance(v, (int, float)) and math.isfinite(v)]

        if val_f1_series and val_loss_series:
            run_best_f1 = max(val_f1_series)
            run_best_epoch = val_f1_series.index(run_best_f1) + 1
            run_best_val_loss = val_loss_series[run_best_epoch - 1]
            elapsed = time.perf_counter() - start_time

            print(f"[Run {idx}] Best val_f1 = {run_best_f1:.4f} (epoch {run_best_epoch})")

            # Save metrics for summary table
            results.append({
                'Run': idx,
                'Best_Epoch': run_best_epoch,
                'Best_Val_F1': run_best_f1,
                'Best_Val_Loss': run_best_val_loss,
                'Elapsed_s': elapsed,
                **params
            })

            # Track best model
            if run_best_f1 > best_val_f1:
                best_val_f1 = run_best_f1
                best_params = params
                best_training_history = training_history
                best_state_dict = copy.deepcopy(rnn_model.state_dict())
                best_run_idx = idx
                best_epoch_in_run = run_best_epoch
                #best_model_path = f"./{logs_dir}/{EXPERIMENT_NAME}_best.pt"
                #torch.save(best_state_dict, best_model_path)

        else:
            print("[warn] No valid val_f1 or val_loss values recorded for this run.")

    except Exception as e:
        print(f"[error] Training failed for configuration {idx}: {e}")

    finally:
        try:
            writer.close()
        except Exception:
            pass
        del optimizer, scaler, rnn_model
        if device.type == 'cuda':
            torch.cuda.empty_cache()

    print(f"Configuration {idx} completed in {time.perf_counter() - start_time:.1f}s")

# --- summary ---
print("\n" + "="*50)
print("                GRID SEARCH COMPLETE")
print("="*50)
if best_params is not None:
    print(f"Best run: #{best_run_idx} (epoch {best_epoch_in_run})")
    print(f"Best Validation F1 Score: {best_val_f1:.4f}")
    print(f"Best Parameters: {best_params}")
    if best_model_path:
        print(f"Best model saved to: {best_model_path}")
else:
    print("No successful runs (val_f1 was empty or invalid).")
print("="*50)

# --- results table ---
if results:
    df_results = pd.DataFrame(results)
    df_results = df_results.sort_values(by='Best_Val_F1', ascending=False).reset_index(drop=True)

    print("\nGrid Search Results Summary:")
    print(df_results.to_string(index=False))

    # Optionally save results to CSV
    results_path = f"./{logs_dir}/{EXPERIMENT_NAME}_grid_results.csv"
    df_results.to_csv(results_path, index=False)
    print(f"\nResults saved to: {results_path}")
else:
    print("No results to display.")


Configuration 1/4: {'BATCH_SIZE': 512, 'BIDIRECTIONAL': True, 'CRITERION': 'FOCAL', 'DROPOUT_RATE': 0.2, 'GAMMA': 1.3, 'HIDDEN_LAYERS': 2, 'HIDDEN_SIZE': [64, 32, 32, 32], 'L1_LAMBDA': 0.0001, 'L2_LAMBDA': 0.001, 'LEARNING_RATE': 0.001, 'RNN_TYPE': 'LSTM', 'STRIDE': 2, 'WINDOW': 6}
Training set size: 33259
Validation set size: 9480
-------------------------------------------------------------------------------
Layer (type)              Output Shape                 Param #           
classifier (Linear)       [-1, 3]                      195            
Total params: 195
Trainable params: 195
Non-trainable params: 0
-------------------------------------------------------------------------------
Training 160 epochs...
Epoch   1/160 | Train: Loss=1.2465, F1 Score=0.6671 | Val: Loss=0.3814, F1 Score=0.6768
Epoch   2/160 | Train: Loss=0.6879, F1 Score=0.6726 | Val: Loss=0.3814, F1 Score=0.6768
Epoch   4/160 | Train: Loss=0.4330, F1 Score=0.6726 | Val: Loss=0.3812, F1 Score=0.6768
Epoch   6

In [None]:
results = pd.read_csv(results_path)
print(results.to_string(index=False))