# LSTM Hyperparameter Tuning

This notebook tunes the hyperparameters of an LSTM classifier with Optuna, maximizing the weighted F1 score on the validation set.

**Task**: Multiclass fault classification (18 classes)

**Modes**:
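- Quick mode (`QUICK_MODE=True`): 1% of simulation runs (at least 5 per fault class), 5 trials, up to 10 epochs per trial; for validating the pipeline
- Tuning mode (`QUICK_MODE=False`): 50% of simulation runs, 50 trials, up to 50 epochs per trial; for the actual optimization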

**Data Handling**:
- Windows are created within simulation runs only (no cross-run windows)
- Subsampling is done by simulation runs, not individual rows

**Outputs**:
- Best hyperparameters: `outputs/hyperparams/lstm_best.json` (`lstm_best_quick.json` in quick mode)
- Optuna study: `outputs/optuna_studies/lstm_study.pkl` (`lstm_study_quick.pkl` in quick mode)

## Configuration

In [1]:
import os
import sys
import time
import json
import pickle
from pathlib import Path

start_time = time.time()
print("="*60)
print("LSTM Hyperparameter Tuning")
print("="*60)
print(f"Started at: {time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(start_time))}")

QUICK_MODE = os.getenv('QUICK_MODE', 'False').lower() in ('true', '1', 'yes')

if QUICK_MODE:
    RUN_FRACTION = 0.01  # Sample 1% of simulation runs
    MIN_RUNS_PER_CLASS = 5  # But ensure at least 5 runs per fault class
    N_TRIALS = 5
    MAX_EPOCHS = 10
    PATIENCE = 3
    print("🚀 QUICK MODE (1% runs, min 5/class, 5 trials, max 10 epochs)")
else:
    RUN_FRACTION = 0.50
    MIN_RUNS_PER_CLASS = 5
    N_TRIALS = 50
    MAX_EPOCHS = 50
    PATIENCE = 5
    print("🔬 TUNING MODE (50% runs, 50 trials, max 50 epochs)")

DATA_DIR = Path('../data')
OUTPUT_DIR = Path('../outputs')
HYPERPARAM_DIR = OUTPUT_DIR / 'hyperparams'
STUDY_DIR = OUTPUT_DIR / 'optuna_studies'
PROGRESS_FILE = OUTPUT_DIR / 'lstm_progress.log'

HYPERPARAM_DIR.mkdir(parents=True, exist_ok=True)
STUDY_DIR.mkdir(parents=True, exist_ok=True)

RANDOM_SEED = 42

print(f"Run fraction: {RUN_FRACTION*100}%")
print(f"Trials: {N_TRIALS}, Max epochs: {MAX_EPOCHS}, Patience: {PATIENCE}")
print("="*60)

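def log_progress(message):
    """Print a message and append it to the progress log file."""
    print(message, flush=True)
    # UTF-8 so that non-ASCII status symbols are written consistently
    with open(PROGRESS_FILE, 'a', encoding='utf-8') as f:
        f.write(f"{message}\n")
        f.flush()

# Start each run with an empty progress log
PROGRESS_FILE.write_text("", encoding='utf-8')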

LSTM Hyperparameter Tuning
Started at: 2026-01-03 22:54:11
🚀 QUICK MODE (1% runs, min 5/class, 5 trials, max 10 epochs)
Run fraction: 1.0%
Trials: 5, Max epochs: 10, Patience: 3


0

## Imports

In [2]:
log_progress("\n[Step 1/6] Loading libraries...")
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import f1_score
import optuna
from optuna.pruners import MedianPruner
import warnings
warnings.filterwarnings('ignore')

torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
log_progress(f"✓ Using device: {device}")


[Step 1/6] Loading libraries...


✓ Using device: cuda


## Data Loading and Preprocessing

**Important**: We sample by simulation runs (not individual rows) to preserve temporal structure.

In [3]:
log_progress("\n[Step 2/6] Loading and preparing data...")

train = pd.read_csv(DATA_DIR / 'multiclass_train.csv')
val = pd.read_csv(DATA_DIR / 'multiclass_val.csv')

log_progress(f"✓ Full data - Train: {train.shape}, Val: {val.shape}")

# Sample by simulation runs (not individual rows) to preserve temporal structure
def sample_by_runs(df, fraction, seed, min_runs=5):
    """Sample complete simulation runs, preserving temporal structure within each run.
    
    Args:
        df: DataFrame with faultNumber, simulationRun columns
        fraction: Fraction of runs to sample per class
        seed: Random seed
        min_runs: Minimum number of runs per class (ensures small fractions still work)
    """
    runs = df.groupby(['faultNumber', 'simulationRun']).size().reset_index()[['faultNumber', 'simulationRun']]
    
    def sample_class(x):
        n_total = len(x)
        n_sample = max(min_runs, int(n_total * fraction))
        n_sample = min(n_sample, n_total)
        return x.sample(n=n_sample, random_state=seed)
    
    sampled_runs = runs.groupby('faultNumber', group_keys=False).apply(sample_class)
    df_sampled = df.merge(sampled_runs, on=['faultNumber', 'simulationRun'])
    
    return df_sampled.sort_values(['faultNumber', 'simulationRun', 'sample']).reset_index(drop=True)

train_sampled = sample_by_runs(train, RUN_FRACTION, RANDOM_SEED, MIN_RUNS_PER_CLASS)
val_sampled = sample_by_runs(val, RUN_FRACTION, RANDOM_SEED, MIN_RUNS_PER_CLASS)

n_train_runs = train_sampled.groupby(['faultNumber', 'simulationRun']).ngroups
n_val_runs = val_sampled.groupby(['faultNumber', 'simulationRun']).ngroups

log_progress(f"✓ Sampled - Train: {train_sampled.shape} ({n_train_runs} runs), Val: {val_sampled.shape} ({n_val_runs} runs)")

# Get feature columns
features = [col for col in train.columns if 'xmeas' in col or 'xmv' in col]
num_features = len(features)

# Fit scaler on training data
scaler = StandardScaler()
scaler.fit(train_sampled[features])

# Encode labels
label_encoder = LabelEncoder()
label_encoder.fit(train_sampled['faultNumber'])
num_classes = len(label_encoder.classes_)

log_progress(f"✓ Features: {num_features}, Classes: {num_classes}")


[Step 2/6] Loading and preparing data...


✓ Full data - Train: (864000, 57), Val: (432000, 57)


✓ Sampled - Train: (43200, 57) (90 runs), Val: (43200, 57) (90 runs)


✓ Features: 52, Classes: 18
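
The log above shows 90 sampled runs for both splits even though quick mode asks for only 1% of runs. This comes from the `min_runs` floor in `sample_by_runs`. The sketch below replays that arithmetic. It assumes every run has 480 rows (43,200 sampled rows / 90 runs) and that runs are spread evenly over the 18 classes; both assumptions are inferred from the logged shapes rather than stated in the notebook, and `runs_sampled_per_class` is a hypothetical helper name.

```python
def runs_sampled_per_class(n_total, fraction, min_runs):
    """Mirror the per-class sample-size rule used in sample_by_runs."""
    n_sample = max(min_runs, int(n_total * fraction))
    return min(n_sample, n_total)

ROWS_PER_RUN = 480   # assumption: 43,200 sampled rows / 90 runs
NUM_CLASSES = 18

for split, n_rows in [("train", 864_000), ("val", 432_000)]:
    # Assumes classes are balanced, so runs divide evenly across classes
    runs_per_class = n_rows // ROWS_PER_RUN // NUM_CLASSES
    picked = runs_sampled_per_class(runs_per_class, fraction=0.01, min_runs=5)
    total_runs = picked * NUM_CLASSES
    print(f"{split}: {runs_per_class} runs/class -> {picked} sampled/class, "
          f"{total_runs} runs, {total_runs * ROWS_PER_RUN} rows")
```

Under these assumptions the floor of 5 runs per class decides the sample size in quick mode for both splits, so the effective fraction is 5% of training runs and 10% of validation runs rather than 1%.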


## LSTM Model Definition

In [4]:
log_progress("\n[Step 3/6] Defining LSTM model...")

class SimulationRunDataset(Dataset):
    """
    Dataset that creates windows WITHIN simulation runs only.
    No windows cross simulation run boundaries.
    """
    def __init__(self, df, features, label_col, scaler, label_encoder, sequence_length=10):
        self.seq_len = sequence_length
        self.windows = []
        self.labels = []
        
        # Process each simulation run separately
        for (fault, run), group in df.groupby(['faultNumber', 'simulationRun']):
            # Sort by sample number to ensure temporal order
            group = group.sort_values('sample')
            
            # Scale features
            X = scaler.transform(group[features].values)
            y = label_encoder.transform(group[label_col].values)
            
            # Create windows within this run
            for i in range(len(X) - sequence_length + 1):
                self.windows.append(X[i:i+sequence_length])
                self.labels.append(y[i+sequence_length-1])
        
        self.windows = np.array(self.windows, dtype=np.float32)
        self.labels = np.array(self.labels, dtype=np.int64)
    
    def __len__(self):
        return len(self.windows)
    
    def __getitem__(self, idx):
        return torch.from_numpy(self.windows[idx]), torch.tensor(self.labels[idx])

class LSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes, dropout=0.0):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, 
                           batch_first=True, dropout=dropout if num_layers > 1 else 0)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        out = self.dropout(lstm_out[:, -1, :])
        return self.fc(out)

log_progress("✓ LSTM model defined")


[Step 3/6] Defining LSTM model...


✓ LSTM model defined
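
To make the "no cross-run windows" rule concrete, here is a minimal pure-Python sketch of the same sliding-window idea on toy data. The function name `windows_within_runs` and the toy runs are hypothetical; the real `SimulationRunDataset` also scales features and encodes labels.

```python
def windows_within_runs(runs, seq_len):
    """Slide a fixed-length window over each run separately.

    `runs` maps a run id to its time-ordered samples. Because the inner loop
    restarts for every run, no window can span two runs.
    """
    windows = []
    for run_id, samples in runs.items():
        for i in range(len(samples) - seq_len + 1):
            windows.append((run_id, samples[i:i + seq_len]))
    return windows

# Toy data (hypothetical): two runs with 5 and 4 time steps
toy_runs = {"run_a": [0, 1, 2, 3, 4], "run_b": [10, 11, 12, 13]}
toy_windows = windows_within_runs(toy_runs, seq_len=3)

print(len(toy_windows))   # (5 - 3 + 1) + (4 - 3 + 1) = 5
print(toy_windows[2])     # ('run_a', [2, 3, 4]), the last window of run_a
print(toy_windows[3])     # ('run_b', [10, 11, 12]), the first window of run_b
```

Each run of length `n` contributes `n - seq_len + 1` windows, so longer sequence lengths produce slightly fewer training examples per run.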


## Optuna Hyperparameter Optimization

In [5]:
log_progress("\n[Step 4/6] Setting up optimization...")

def objective(trial):
    """
    Optuna objective for LSTM hyperparameter optimization.
    
    Hyperparameters:
    - sequence_length: Length of input sequences
    - hidden_size: Number of LSTM hidden units
    - num_layers: Number of LSTM layers
    - dropout: Dropout rate
    - learning_rate: Learning rate for optimizer
    - batch_size: Training batch size
    """
    sequence_length = trial.suggest_int('sequence_length', 5, 20)
    hidden_size = trial.suggest_categorical('hidden_size', [32, 64, 128, 256])
    num_layers = trial.suggest_int('num_layers', 1, 3)
    dropout = trial.suggest_float('dropout', 0.0, 0.5)
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128])
    
    # Create datasets with proper windowing
    train_dataset = SimulationRunDataset(
        train_sampled, features, 'faultNumber', scaler, label_encoder, sequence_length
    )
    val_dataset = SimulationRunDataset(
        val_sampled, features, 'faultNumber', scaler, label_encoder, sequence_length
    )
    
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size)
    
    model = LSTMClassifier(num_features, hidden_size, num_layers, num_classes, dropout).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    
    # Early stopping variables
    best_val_loss = float('inf')
    patience_counter = 0
    best_model_state = None
    
    # Training with early stopping
    for epoch in range(MAX_EPOCHS):
        # Training phase
        model.train()
        for X_batch, y_batch in train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()
        
        # Validation phase for early stopping
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                X_batch, y_batch = X_batch.to(device), y_batch.to(device)
                outputs = model(X_batch)
                val_loss += criterion(outputs, y_batch).item()
        val_loss /= len(val_loader)
        
        # Early stopping check
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
            best_model_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
        else:
            patience_counter += 1
            if patience_counter >= PATIENCE:
                break
    
    # Restore best model
    if best_model_state is not None:
        model.load_state_dict({k: v.to(device) for k, v in best_model_state.items()})
    
    # Final validation for F1 score
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for X_batch, y_batch in val_loader:
            X_batch = X_batch.to(device)
            outputs = model(X_batch)
            preds = outputs.argmax(dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_labels.extend(y_batch.numpy())
    
    return f1_score(all_labels, all_preds, average='weighted')

log_progress("✓ Objective function defined")


[Step 4/6] Setting up optimization...


✓ Objective function defined
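
The early-stopping rule inside `objective` can be studied in isolation. The sketch below applies the same rule (reset the counter on a strict improvement, stop once the counter reaches the patience) to a list of per-epoch validation losses. The helper name `early_stopping_trace` and the loss values are hypothetical.

```python
def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    counter = 0
    stop_epoch = len(val_losses) - 1  # default: every epoch was run
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Strict improvement: remember this epoch and reset the counter
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                stop_epoch = epoch
                break
    return best_epoch, stop_epoch

# Hypothetical loss curve: improves until epoch 2, then plateaus
losses = [0.90, 0.70, 0.60, 0.65, 0.61, 0.62, 0.50]
print(early_stopping_trace(losses, patience=3))  # (2, 5)
```

With a patience of 3 the loop stops at epoch 5 and never sees the better loss at epoch 6, which is the usual trade-off between tuning time and the risk of stopping too early. Note also that the notebook selects the checkpoint by validation loss but scores the trial by weighted F1, so the restored weights are not necessarily the ones with the highest F1.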


In [6]:
log_progress(f"\n{'='*60}")
log_progress(f"[Step 5/6] Starting optimization")
log_progress(f"{'='*60}\n")

optuna_start = time.time()
optuna.logging.set_verbosity(optuna.logging.WARNING)

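# Note: MedianPruner only acts on intermediate values passed to trial.report();
# objective() reports none, so no trial is pruned in this study.
study = optuna.create_study(
    direction='maximize',
    pruner=MedianPruner(n_startup_trials=5, n_warmup_steps=5),
    study_name='lstm_multiclass'
)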

log_progress(f"Running {N_TRIALS} trials...")

for trial_num in range(N_TRIALS):
    study.optimize(objective, n_trials=1, show_progress_bar=False)
    trial = study.trials[-1]
    log_progress(f"Trial {trial_num + 1}/{N_TRIALS}: F1={trial.value:.4f} (best={study.best_value:.4f})")

optuna_time = time.time() - optuna_start

log_progress(f"\n{'='*60}")
log_progress("✓ Optimization complete!")
log_progress(f"Total time: {optuna_time:.2f}s")




[Step 5/6] Starting optimization





Running 5 trials...


Trial 1/5: F1=0.8829 (best=0.8829)


Trial 2/5: F1=0.8766 (best=0.8829)


Trial 3/5: F1=0.8665 (best=0.8829)


Trial 4/5: F1=0.8868 (best=0.8868)


Trial 5/5: F1=0.9063 (best=0.9063)





✓ Optimization complete!


Total time: 172.85s


## Save Results

In [7]:
end_time = time.time()
total_runtime = end_time - start_time

log_progress("\n[Step 6/6] Saving results...")

results = {
    'model': 'LSTM',
    'task': 'multiclass',
    'best_params': study.best_params,
    'best_f1_weighted': float(study.best_value),
    'num_trials': N_TRIALS,
    'run_fraction': RUN_FRACTION,
    'quick_mode': QUICK_MODE,
    'max_epochs': MAX_EPOCHS,
    'early_stopping_patience': PATIENCE,
    'optimization_time_seconds': optuna_time,
    'random_seed': RANDOM_SEED,
    'timing': {
        'start_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(start_time)),
        'end_time': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(end_time)),
        'total_runtime_seconds': float(total_runtime),
        'total_runtime_formatted': f"{int(total_runtime // 60)}m {int(total_runtime % 60)}s"
    }
}

mode_suffix = "_quick" if QUICK_MODE else ""
json_path = HYPERPARAM_DIR / f'lstm_best{mode_suffix}.json'
study_path = STUDY_DIR / f'lstm_study{mode_suffix}.pkl'

with open(json_path, 'w') as f:
    json.dump(results, f, indent=2)
log_progress(f"✓ Saved to {json_path}")

with open(study_path, 'wb') as f:
    pickle.dump(study, f)

log_progress(f"\n{'='*60}")
log_progress("✓ LSTM Hyperparameter Tuning Complete!")
log_progress(f"Runtime: {results['timing']['total_runtime_formatted']}")
log_progress(f"Best F1: {study.best_value:.4f}")
log_progress(f"{'='*60}")


[Step 6/6] Saving results...


✓ Saved to ../outputs/hyperparams/lstm_best_quick.json





✓ LSTM Hyperparameter Tuning Complete!


Runtime: 3m 0s


Best F1: 0.9063
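
A downstream training notebook would typically read the saved JSON back. The sketch below shows one way to do that using the `best_params` and `best_f1_weighted` keys written above. The helper name `load_best_params` is hypothetical, and the demo writes a stand-in file with placeholder values rather than the results of this run.

```python
import json
import tempfile
from pathlib import Path

def load_best_params(json_path):
    """Read a saved results file and return (best_params, best_f1_weighted)."""
    with open(json_path) as f:
        results = json.load(f)
    return results["best_params"], results["best_f1_weighted"]

# Demo with a stand-in file; the values are placeholders, not results from this run
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "lstm_best_quick.json"
    demo_path.write_text(json.dumps({
        "best_params": {"sequence_length": 10, "hidden_size": 64},
        "best_f1_weighted": 0.5,
    }))
    params, f1 = load_best_params(demo_path)

print(params["hidden_size"], f1)
```

For the real file, pass `../outputs/hyperparams/lstm_best_quick.json` (or `lstm_best.json` after a full tuning run). The pickled study can be restored with `pickle.load`, provided Optuna is importable in the loading environment.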


