# Pirate Pain Level Classification

## Overview

This notebook implements a deep learning pipeline to classify pain levels in pirates based on time-series sensor data. The classification task involves predicting one of three pain levels:

- **no_pain**: No pain detected
- **low_pain**: Low level of pain
- **high_pain**: High level of pain

## Dataset Description

The dataset consists of multivariate time-series data with the following features:

- **Pain survey responses**: 4 self-reported pain indicators (pain_survey_1 to pain_survey_4)
- **Body characteristics**: Number of legs, hands, and eyes
- **Joint angles**: 31 joint measurements (joint_00 to joint_30) captured over time

Each sample has 160 time steps with measurements recorded at regular intervals.

## 1. Environment Setup

Import necessary libraries and configure the environment for reproducibility.

In [None]:
# Core libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from collections import defaultdict
warnings.filterwarnings('ignore')

# Deep Learning framework
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

# Machine learning utilities
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay, f1_score
from sklearn.utils import class_weight
from imblearn.over_sampling import SMOTE

# Parallel processing
from itertools import product
from joblib import Parallel, delayed
from tqdm import tqdm

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")
print("✅ Environment setup complete!")

## 2. Data Loading and Exploration

Load the training data and examine its structure.

In [None]:
# Load training data and labels
train_features = pd.read_csv('pirate_pain_train.csv')
train_labels = pd.read_csv('pirate_pain_train_labels.csv')

print("📊 Training Data Overview")
print("=" * 60)
print(f"Features shape: {train_features.shape}")
print(f"Labels shape: {train_labels.shape}")
print(f"Number of unique samples: {train_features['sample_index'].nunique()}")
print(f"Time steps per sample: {train_features.groupby('sample_index').size().iloc[0]}")
print(f"Number of features: {train_features.shape[1] - 2}")  # Excluding sample_index and time

### 2.1 Preview Data

In [None]:
# Display sample of features
print("\n📋 First few rows of training features:")
display(train_features.head(10))

# Display sample of labels
print("\n🏷️ First few labels:")
display(train_labels.head(10))

### 2.2 Class Distribution Analysis

Examine the distribution of pain levels in the training set to identify any class imbalance.

In [None]:
# Analyze label distribution
label_distribution = train_labels['label'].value_counts()

# Visualize distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar plot
colors_bar = {'no_pain': 'green', 'low_pain': 'orange', 'high_pain': 'red'}
bar_colors = [colors_bar.get(label, 'gray') for label in label_distribution.index]
axes[0].bar(label_distribution.index, label_distribution.values, color=bar_colors)
axes[0].set_xlabel('Pain Level', fontsize=12)
axes[0].set_ylabel('Count', fontsize=12)
axes[0].set_title('Distribution of Pain Levels', fontsize=14, fontweight='bold')
axes[0].grid(axis='y', alpha=0.3)

# Pie chart
axes[1].pie(label_distribution.values, labels=label_distribution.index, 
            autopct='%1.1f%%', startangle=90, colors=bar_colors)
axes[1].set_title('Pain Level Proportions', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("\n📈 Class Statistics:")
print(label_distribution)
print(f"\nClass imbalance ratio: {label_distribution.min() / label_distribution.max():.2%}")
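
The imbalance ratio printed by this cell can also be turned into per-class weights. The following is a minimal sketch, assuming made-up class counts (not the real dataset's) and the "balanced" heuristic `n_samples / (n_classes * count)`; the names `counts` and `balanced_weights` are illustrative only.

```python
import numpy as np

# Hypothetical label counts for illustration only; not taken from the dataset
counts = {"no_pain": 500, "low_pain": 100, "high_pain": 50}

values = np.array(list(counts.values()), dtype=float)

# Same min/max ratio that the notebook prints
imbalance_ratio = values.min() / values.max()

# "Balanced" heuristic: rarer classes receive proportionally larger weights
balanced_weights = values.sum() / (len(values) * values)
class_weights = dict(zip(counts, balanced_weights))

print(f"Imbalance ratio: {imbalance_ratio:.2%}")
print(class_weights)
```

With these counts the rarest class gets a weight ten times larger than the most common one, which mirrors the 10% imbalance ratio.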

### 2.3 Feature Analysis

Identify and categorize the different types of features in the dataset.

In [None]:
# Categorize features
pain_survey_features = [col for col in train_features.columns if 'pain_survey' in col]
body_characteristic_features = ['n_legs', 'n_hands', 'n_eyes']
joint_angle_features = [col for col in train_features.columns if 'joint_' in col]

print("🔍 Feature Categories:")
print("=" * 60)
print(f"\n📋 Pain Survey Features ({len(pain_survey_features)}):")
print(f"   {', '.join(pain_survey_features)}")
print(f"\n🧍 Body Characteristics ({len(body_characteristic_features)}):")
print(f"   {', '.join(body_characteristic_features)}")
print(f"\n🦴 Joint Angle Measurements ({len(joint_angle_features)}):")
print(f"   Joint features: joint_00 to joint_{len(joint_angle_features)-1:02d}")
print(f"\n✅ Total features: {len(pain_survey_features) + len(body_characteristic_features) + len(joint_angle_features)}")

### 2.4 Time Series Visualization

Visualize the temporal patterns in the data for a sample pirate.

In [None]:
# Select a sample to visualize
sample_id = 0
sample_time_series = train_features[train_features['sample_index'] == sample_id].sort_values('time')

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle(f'Time Series Data for Sample {sample_id}', fontsize=16, fontweight='bold')

# Plot pain survey readings
for feature in pain_survey_features:
    axes[0, 0].plot(sample_time_series['time'], sample_time_series[feature], 
                    marker='o', label=feature, alpha=0.7)
axes[0, 0].set_xlabel('Time')
axes[0, 0].set_ylabel('Value')
axes[0, 0].set_title('Pain Survey Readings Over Time')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot body characteristics
for feature in body_characteristic_features:
    if feature in sample_time_series.columns:
        axes[0, 1].plot(sample_time_series['time'], sample_time_series[feature], 
                        marker='s', label=feature, alpha=0.7)
axes[0, 1].set_xlabel('Time')
axes[0, 1].set_ylabel('Value')
axes[0, 1].set_title('Body Characteristics Over Time')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Plot first 5 joint angles
for feature in joint_angle_features[:5]:
    axes[1, 0].plot(sample_time_series['time'], sample_time_series[feature], 
                    label=feature, alpha=0.7)
axes[1, 0].set_xlabel('Time')
axes[1, 0].set_ylabel('Angle (degrees)')
axes[1, 0].set_title('Joint Angles (First 5)')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot last 5 joint angles
for feature in joint_angle_features[-5:]:
    axes[1, 1].plot(sample_time_series['time'], sample_time_series[feature], 
                    label=feature, alpha=0.7)
axes[1, 1].set_xlabel('Time')
axes[1, 1].set_ylabel('Angle (degrees)')
axes[1, 1].set_title('Joint Angles (Last 5)')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Data Preprocessing

### 3.1 Feature Preprocessing

Define preprocessing functions to handle categorical variables and prepare features for modeling.

In [None]:
# Define categorical mapping
CATEGORICAL_MAPPING = {'zero': 0, 'one': 1, 'two': 2, 'three': 3}
CATEGORICAL_FEATURES = ['n_legs', 'n_hands', 'n_eyes']

def preprocess_features(dataframe):
    """
    Preprocess features for model input.
    
    Args:
        dataframe: DataFrame containing time-series features
        
    Returns:
        numpy array of preprocessed features
    """
    # Exclude identifier columns
    excluded_columns = ['sample_index', 'time']
    feature_columns = [col for col in dataframe.columns if col not in excluded_columns]
    
    # Copy relevant features
    processed_data = dataframe[feature_columns].copy()
    
    # Convert categorical features to numeric
    for feature in CATEGORICAL_FEATURES:
        if feature in processed_data.columns:
            # Map categorical values to numeric
            mapped_values = processed_data[feature].map(CATEGORICAL_MAPPING)
            
            # Fill missing values with mode or 0
            if mapped_values.isna().all():
                fill_value = 0
            else:
                mode_series = mapped_values.mode(dropna=True)
                fill_value = mode_series.iloc[0] if not mode_series.empty else 0
            
            processed_data[feature] = mapped_values.fillna(fill_value)
    
    # Fill any remaining NaN values
    processed_data = processed_data.fillna(0)
    
    return processed_data.values

print("✅ Preprocessing functions defined!")
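
To make the mapping and mode-fill behaviour concrete, here is a plain-Python sketch of what `preprocess_features` does to one categorical column. The column values are invented, and `Counter` stands in for the pandas `mode` call, so treat it as an illustration rather than the notebook's implementation.

```python
from collections import Counter

# Same mapping as CATEGORICAL_MAPPING; the column values below are made up
mapping = {"zero": 0, "one": 1, "two": 2, "three": 3}
raw_column = ["two", "two", "one", "four", None, "two"]

# Values outside the mapping (here "four" and None) become missing
mapped = [mapping.get(value) for value in raw_column]
known = [v for v in mapped if v is not None]

# Fill with the most frequent known value, or 0 when nothing could be mapped
fill_value = Counter(known).most_common(1)[0][0] if known else 0
filled = [fill_value if v is None else v for v in mapped]

print(filled)  # [2, 2, 1, 2, 2, 2]
```

Because the notebook calls `preprocess_features` once per sample, the fill value there is the mode within that sample, not across the whole dataset.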

### 3.2 Create Sequences

Convert the data into sequences suitable for LSTM/RNN models.

In [None]:
# Build sequences from time-series data
print("🔄 Creating sequences...")

sequences = []
sequence_labels = []

unique_sample_indices = train_features['sample_index'].unique()

for sample_idx in unique_sample_indices:
    # Extract all time steps for this sample
    sample_data = train_features[train_features['sample_index'] == sample_idx].copy()
    sample_data = sample_data.sort_values('time')
    
    # Preprocess features
    features = preprocess_features(sample_data)
    sequences.append(features)
    
    # Get corresponding label
    label = train_labels[train_labels['sample_index'] == sample_idx]['label'].iloc[0]
    sequence_labels.append(label)

print(f"✅ Created {len(sequences)} sequences")
print(f"   Sequence length range: {min(s.shape[0] for s in sequences)} to {max(s.shape[0] for s in sequences)} time steps")
print(f"   Features per time step: {sequences[0].shape[1]}")

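### 3.3 Feature Scaling

Scale features using StandardScaler to normalize the data and improve model convergence. The scaler is fit on all samples in the training file, including those later held out for validation in Section 3.6, so validation scores may be slightly optimistic.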

In [None]:
# Scale features for better model performance
print("⚖️ Scaling features...")

feature_scaler = StandardScaler()
# Fit scaler on all time steps combined
feature_scaler.fit(np.vstack(sequences))
# Transform each sequence
scaled_sequences = [feature_scaler.transform(seq) for seq in sequences]

print("✅ Feature scaling complete!")
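
One way to keep validation samples out of the scaling statistics is to choose the split first and compute the mean and standard deviation from training time steps only. The sketch below uses NumPy on random stand-in data; `toy_sequences`, `train_ids`, and `val_ids` are hypothetical names, and the manual standardization is an assumed equivalent of what `StandardScaler` computes (population standard deviation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the per-sample arrays: 10 sequences, 160 steps, 38 features
toy_sequences = [rng.normal(loc=5.0, scale=2.0, size=(160, 38)) for _ in range(10)]
train_ids, val_ids = range(0, 8), range(8, 10)  # assumed split, chosen before scaling

# Statistics come from training time steps only
train_stack = np.vstack([toy_sequences[i] for i in train_ids])
mean = train_stack.mean(axis=0)
std = train_stack.std(axis=0)  # population std (ddof=0)
std[std == 0] = 1.0            # guard against constant features

# The same training statistics are applied to both subsets
scaled_train = [(toy_sequences[i] - mean) / std for i in train_ids]
scaled_val = [(toy_sequences[i] - mean) / std for i in val_ids]
```

The validation sequences are transformed with the training statistics, so their per-feature mean will be close to, but not exactly, zero.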

### 3.4 Pad Sequences

Pad sequences to uniform length for batch processing.

In [None]:
# Pad sequences to uniform length
max_sequence_length = max([s.shape[0] for s in scaled_sequences])
num_features = scaled_sequences[0].shape[1]

print(f"📏 Padding sequences to maximum length: {max_sequence_length}")

# Create padded array
padded_sequences = np.zeros((len(scaled_sequences), max_sequence_length, num_features))
for i, sequence in enumerate(scaled_sequences):
    padded_sequences[i, :sequence.shape[0], :] = sequence

print(f"✅ Padded sequences shape: {padded_sequences.shape}")
print(f"   Format: (num_samples, time_steps, num_features)")
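
The padding scheme can be checked on a small example. This sketch, using invented sequences with two features, pads with trailing zeros in the same way and then recovers each original length by treating a time step as padding only when every feature is exactly zero, which is the rule a `Masking(mask_value=0.0)` layer applies.

```python
import numpy as np

# Hypothetical variable-length sequences with 2 features each
toy = [np.ones((3, 2)), np.full((5, 2), 2.0), np.full((4, 2), -1.0)]

max_len = max(seq.shape[0] for seq in toy)
padded = np.zeros((len(toy), max_len, toy[0].shape[1]), dtype=np.float32)
for i, seq in enumerate(toy):
    padded[i, :seq.shape[0], :] = seq  # post-padding: zeros stay at the end

# A time step counts as padding only when every feature is exactly 0
step_mask = np.any(padded != 0.0, axis=-1)
lengths = step_mask.sum(axis=1)

print(padded.shape, lengths)  # (3, 5, 2) [3 5 4]
```

A genuine time step whose scaled features all happen to equal zero would also be treated as padding, although that is unlikely with standardized real-valued data.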

### 3.5 Encode Labels

Convert text labels to numeric format.

In [None]:
# Encode labels to integers
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(sequence_labels)

print("🏷️ Label Encoding Mapping:")
for idx, label_name in enumerate(label_encoder.classes_):
    print(f"   {label_name} → {idx}")

# Show class distribution
class_distribution = pd.Series(encoded_labels).value_counts().sort_index()
print("\nClass Distribution:")
for class_idx, count in class_distribution.items():
    print(f"   Class {class_idx} ({label_encoder.classes_[class_idx]}): {count} samples")
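
`LabelEncoder` assigns indices in sorted (alphabetical) order of the class names, not in order of pain severity. The NumPy sketch below reproduces that ordering with `np.unique` on a few invented labels.

```python
import numpy as np

# Made-up labels for illustration
labels = np.array(["no_pain", "high_pain", "low_pain", "no_pain"])

# Sorted unique classes plus the index of each label within them
classes, encoded = np.unique(labels, return_inverse=True)

print(classes)  # ['high_pain' 'low_pain' 'no_pain']
print(encoded)  # [2 0 1 2]
```

So `high_pain` maps to 0 and `no_pain` to 2, which is worth remembering when reading the confusion matrix or per-class scores by index.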

### 3.6 Train-Validation Split

Split data into training and validation sets with stratification to maintain class proportions.

In [None]:
# Create stratified train-validation split
print("✂️ Splitting data into train and validation sets...")

train_sequences, val_sequences, train_labels_encoded, val_labels_encoded = train_test_split(
    padded_sequences, encoded_labels,
    test_size=0.2,
    random_state=42,
    stratify=encoded_labels
)

print(f"✅ Training set: {train_sequences.shape[0]} samples")
print(f"✅ Validation set: {val_sequences.shape[0]} samples")

# Store dimensions for model building
sequence_length = train_sequences.shape[1]
feature_count = train_sequences.shape[2]
num_classes = len(label_encoder.classes_)

# Show distribution in splits
train_dist = pd.Series(train_labels_encoded).value_counts().sort_index()
val_dist = pd.Series(val_labels_encoded).value_counts().sort_index()

print("\n📊 Class Distribution After Split:")
for i, label_name in enumerate(label_encoder.classes_):
    print(f"   {label_name}: Train={train_dist.get(i, 0)}, Val={val_dist.get(i, 0)}")

# Store base training data for resampling experiments
base_train_sequences = train_sequences.astype(np.float32)
base_train_labels = train_labels_encoded.copy()

print("\n✅ Base training data stored for balancing strategies")

## 4. Model Architecture and Training Components

### 4.1 Custom F1-Score Metric

Implement a custom F1-score metric for Keras to monitor the macro-averaged F1-score during training.

In [None]:
class F1Score(keras.metrics.Metric):
    """
    Custom F1-Score metric for multi-class classification.
    Computes macro-averaged F1-score across all classes.
    """
    def __init__(self, name='f1_score', num_classes=3, **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        
        # Initialize state variables for each class
        self.true_positives = self.add_weight(
            name='tp', shape=(num_classes,), initializer='zeros'
        )
        self.false_positives = self.add_weight(
            name='fp', shape=(num_classes,), initializer='zeros'
        )
        self.false_negatives = self.add_weight(
            name='fn', shape=(num_classes,), initializer='zeros'
        )
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        """Update metric state with new predictions."""
        # Convert predictions to class indices
        y_true = tf.cast(tf.reshape(y_true, [-1]), tf.int32)
        y_pred_classes = tf.argmax(y_pred, axis=-1, output_type=tf.int32)
        
        # Create one-hot encodings
        y_true_one_hot = tf.cast(tf.one_hot(y_true, depth=self.num_classes), tf.float32)
        y_pred_one_hot = tf.cast(tf.one_hot(y_pred_classes, depth=self.num_classes), tf.float32)
        
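        # Per-sample TP, FP, FN indicators, shape (batch, num_classes)
        tp = y_true_one_hot * y_pred_one_hot
        fp = (1.0 - y_true_one_hot) * y_pred_one_hot
        fn = y_true_one_hot * (1.0 - y_pred_one_hot)
        
        # Apply sample weights once per indicator, if provided
        # (weighting both one-hot tensors would square the weight in TP and distort FP/FN)
        if sample_weight is not None:
            sample_weight = tf.cast(tf.reshape(sample_weight, [-1, 1]), tf.float32)
            tp, fp, fn = tp * sample_weight, fp * sample_weight, fn * sample_weight
        
        # Sum over the batch to get per-class counts
        tp = tf.reduce_sum(tp, axis=0)
        fp = tf.reduce_sum(fp, axis=0)
        fn = tf.reduce_sum(fn, axis=0)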
        
        # Update state
        self.true_positives.assign_add(tp)
        self.false_positives.assign_add(fp)
        self.false_negatives.assign_add(fn)
    
    def result(self):
        """Compute macro-averaged F1-score."""
        precision = self.true_positives / (
            self.true_positives + self.false_positives + tf.keras.backend.epsilon()
        )
        recall = self.true_positives / (
            self.true_positives + self.false_negatives + tf.keras.backend.epsilon()
        )
        f1 = 2 * (precision * recall) / (precision + recall + tf.keras.backend.epsilon())
        return tf.reduce_mean(f1)
    
    def reset_state(self):
        """Reset metric state."""
        self.true_positives.assign(tf.zeros((self.num_classes,)))
        self.false_positives.assign(tf.zeros((self.num_classes,)))
        self.false_negatives.assign(tf.zeros((self.num_classes,)))

print("✅ F1-Score metric class defined!")
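
The same per-class counting can be traced by hand in NumPy. This is a small worked example on invented labels, mirroring the one-hot arithmetic of the metric (without sample weights) rather than calling Keras.

```python
import numpy as np

# Invented ground truth and predictions for 6 samples, 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
num_classes = 3

true_oh = np.eye(num_classes)[y_true]
pred_oh = np.eye(num_classes)[y_pred]

# Per-class counts, summed over samples
tp = (true_oh * pred_oh).sum(axis=0)        # [1, 2, 1]
fp = ((1 - true_oh) * pred_oh).sum(axis=0)  # [1, 1, 0]
fn = (true_oh * (1 - pred_oh)).sum(axis=0)  # [1, 0, 1]

eps = 1e-7
precision = tp / (tp + fp + eps)
recall = tp / (tp + fn + eps)
f1 = 2 * precision * recall / (precision + recall + eps)
macro_f1 = f1.mean()

print(f1.round(4), round(macro_f1, 4))
```

The per-class F1 values are approximately 0.5, 0.8, and 0.667, giving a macro average near 0.656. Because the Keras metric accumulates counts across batches before computing F1, its epoch-level value corresponds to this whole-set calculation, not to an average of per-batch F1 scores.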

### 4.2 Model Builders

Define functions to build different model architectures: LSTM, GRU, and CNN-LSTM. The builders return uncompiled models; compilation happens in the training code.

In [None]:
def build_lstm_model(sequence_length, num_features, num_classes, 
                     units=(128, 64), dropout=0.3):
    """
    Build a stacked LSTM model for time-series classification.
    
    Args:
        sequence_length: Length of input sequences
        num_features: Number of features per time step
        num_classes: Number of output classes
        units: Tuple of LSTM layer sizes
        dropout: Dropout rate
        
    Returns:
        Uncompiled Keras model (compile it before training)
    """
    model = keras.Sequential([
        layers.Masking(mask_value=0.0, input_shape=(sequence_length, num_features)),
        layers.LSTM(units[0], return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(units[1]),
        layers.Dropout(dropout),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

def build_gru_model(sequence_length, num_features, num_classes, 
                    units=(128, 64), dropout=0.3):
    """
    Build a stacked GRU model for time-series classification.
    
    Args:
        sequence_length: Length of input sequences
        num_features: Number of features per time step
        num_classes: Number of output classes
        units: Tuple of GRU layer sizes
        dropout: Dropout rate
        
    Returns:
        Uncompiled Keras model (compile it before training)
    """
    model = keras.Sequential([
        layers.Masking(mask_value=0.0, input_shape=(sequence_length, num_features)),
        layers.GRU(units[0], return_sequences=True),
        layers.Dropout(dropout),
        layers.GRU(units[1]),
        layers.Dropout(dropout),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

def build_cnn_lstm_model(sequence_length, num_features, num_classes,
                         filters=64, kernel_size=3, lstm_units=64, dropout=0.3):
    """
    Build a hybrid CNN-LSTM model for time-series classification.
    
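    CNN layers extract local patterns; LSTM layers capture temporal dependencies.
    Note: Conv1D does not propagate the Masking layer's mask, so any padded
    time steps reach the LSTM unmasked.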
    
    Args:
        sequence_length: Length of input sequences
        num_features: Number of features per time step
        num_classes: Number of output classes
        filters: Number of CNN filters
        kernel_size: Size of CNN kernel
        lstm_units: Number of LSTM units
        dropout: Dropout rate
        
    Returns:
        Uncompiled Keras model (compile it before training)
    """
    model = keras.Sequential([
        layers.Masking(mask_value=0.0, input_shape=(sequence_length, num_features)),
        layers.Conv1D(filters, kernel_size, activation='relu', padding='same'),
        layers.MaxPooling1D(2),
        layers.LSTM(lstm_units),
        layers.Dropout(dropout),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

def build_model_by_type(model_type, **kwargs):
    """
    Factory function to build models by type.
    
    Args:
        model_type: One of 'LSTM', 'GRU', or 'CNN_LSTM'
        **kwargs: Arguments to pass to the specific builder
        
    Returns:
        Keras model
    """
    if model_type == "LSTM":
        return build_lstm_model(**kwargs)
    elif model_type == "GRU":
        return build_gru_model(**kwargs)
    elif model_type == "CNN_LSTM":
        return build_cnn_lstm_model(**kwargs)
    else:
        raise ValueError(f"Unknown model_type: {model_type}")

print("✅ Model builder functions defined!")
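
To get a feel for the size of the default stacked LSTM, its trainable parameters can be counted by hand. The sketch below assumes 38 input features (4 survey + 3 body + 31 joint columns, as listed in the overview) and the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; `lstm_params` and `dense_params` are helper names introduced here for illustration.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Weights in a standard LSTM layer: 4 gates, each with input
    weights, recurrent weights, and a bias vector."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights in a fully connected layer: kernel plus bias."""
    return input_dim * units + units


num_features = 38  # assumed from the feature counts in the overview

total = (
    lstm_params(num_features, 128)  # first LSTM layer
    + lstm_params(128, 64)          # second LSTM layer
    + dense_params(64, 64)          # hidden Dense layer
    + dense_params(64, 3)           # softmax output over 3 classes
)
print(total)  # 139267
```

Masking and Dropout layers contribute no trainable weights, so under these assumptions the LSTM builder with `units=(128, 64)` yields roughly 139k parameters.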

### 4.3 Training Utilities

Define callbacks and data augmentation functions.

In [None]:
def create_training_callbacks():
    """
    Create Keras callbacks for training.
    
    Returns:
        List of callbacks
    """
    return [
        keras.callbacks.EarlyStopping(
            monitor='val_f1_score',
            patience=10,
            mode='max',
            restore_best_weights=True,
            verbose=0
        ),
        keras.callbacks.ReduceLROnPlateau(
            monitor='val_f1_score',
            factor=0.5,
            patience=5,
            mode='max',
            min_lr=1e-6,
            verbose=0
        )
    ]

def apply_smote_resampling(X_train, y_train, smote_params):
    """
    Apply SMOTE (Synthetic Minority Over-sampling Technique) to balance classes.
    
    Args:
        X_train: Training sequences (3D array)
        y_train: Training labels
        smote_params: SMOTE configuration parameters
        
    Returns:
        Resampled sequences and labels
    """
    smote = SMOTE(**smote_params)
    # Flatten sequences for SMOTE
    X_flat = X_train.reshape(X_train.shape[0], -1)
    # Apply SMOTE
    X_resampled_flat, y_resampled = smote.fit_resample(X_flat, y_train)
    # Reshape back to 3D using the input's own (time_steps, features) shape
    X_resampled = X_resampled_flat.reshape(-1, X_train.shape[1], X_train.shape[2])
    return X_resampled.astype(np.float32), y_resampled

print("✅ Training utilities defined!")
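
The flatten, interpolate, and reshape round trip in `apply_smote_resampling` can be illustrated without imbalanced-learn. The sketch below builds a single synthetic sequence from random stand-in data; it is a simplification that always interpolates toward the single nearest neighbour, whereas SMOTE picks at random among the `k_neighbors` nearest minority samples.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical minority-class sequences: 4 samples, 6 time steps, 3 features
minority = rng.normal(size=(4, 6, 3)).astype(np.float32)

# Flatten each sequence to one row, since SMOTE operates on 2D input
flat = minority.reshape(minority.shape[0], -1)  # shape (4, 18)

# Pick one sample and its nearest neighbour in the flattened space
anchor = flat[0]
distances = np.linalg.norm(flat[1:] - anchor, axis=1)
neighbour = flat[1:][np.argmin(distances)]

# SMOTE-style interpolation: a random point on the segment between the two
lam = rng.uniform(0.0, 1.0)
synthetic_flat = anchor + lam * (neighbour - anchor)

# Restore the (time_steps, features) layout
synthetic = synthetic_flat.reshape(minority.shape[1:])
print(synthetic.shape)  # (6, 3)
```

Each value in the synthetic sequence lies between the corresponding values of the two source sequences, so the interpolation is applied independently at every time step and feature.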

## 5. Model Training and Evaluation

### 5.1 Ensemble Grid Search

Train multiple model architectures and configurations to find the best performing approach.

In [None]:
print("🚀 Starting ensemble grid search...")

# Define parameter grids
smote_parameter_grid = [
    {'k_neighbors': 5, 'sampling_strategy': 'auto', 'random_state': 42},
]

model_parameter_grid = [
    {'model_type': 'LSTM', 'units': (128, 64), 'dropout': 0.3, 'learning_rate': 7e-4},
    {'model_type': 'GRU', 'units': (128, 64), 'dropout': 0.3, 'learning_rate': 7e-4},
    {'model_type': 'CNN_LSTM', 'filters': 64, 'kernel_size': 3, 
     'lstm_units': 64, 'dropout': 0.3, 'learning_rate': 1e-3},
]

training_config = {'epochs': 50, 'batch_size': 32}

def train_single_model(smote_params, model_params):
    """
    Train a single model configuration.
    
    Args:
        smote_params: SMOTE configuration
        model_params: Model hyperparameters
        
    Returns:
        Dictionary containing training results
    """
    # Copy parameters to avoid modification
    smote_config = smote_params.copy()
    model_config = model_params.copy()
    
    # Extract model-specific parameters
    model_type = model_config.pop("model_type")
    learning_rate = model_config.pop("learning_rate", 1e-3)
    dropout = model_config.pop("dropout", 0.3)
    
    # Apply SMOTE resampling
    X_resampled, y_resampled = apply_smote_resampling(
        base_train_sequences, base_train_labels, smote_config
    )
    
    # Build model
    build_args = {
        'sequence_length': sequence_length,
        'num_features': feature_count,
        'num_classes': num_classes
    }
    
    model = build_model_by_type(
        model_type,
        **build_args,
        dropout=dropout,
        **model_config
    )
    
    # Compile model
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy', F1Score(num_classes=num_classes)]
    )
    
    # Train model
    history = model.fit(
        X_resampled, y_resampled,
        validation_data=(val_sequences, val_labels_encoded),
        epochs=training_config['epochs'],
        batch_size=training_config['batch_size'],
        callbacks=create_training_callbacks(),
        verbose=0
    )
    
    # Evaluate on validation set
    val_predictions_probs = model.predict(val_sequences, verbose=0)
    val_predictions = np.argmax(val_predictions_probs, axis=1)
    f1_macro = f1_score(val_labels_encoded, val_predictions, average='macro')
    accuracy = np.mean(val_labels_encoded == val_predictions)
    
    # Extract best epoch info
    history_dict = history.history
    val_f1_history = history_dict.get('val_f1_score')
    best_epoch = int(np.argmax(val_f1_history)) if val_f1_history else None
    best_val_f1 = float(max(val_f1_history)) if val_f1_history else None
    
    # Prepare result
    full_model_config = {**model_config}
    full_model_config['model_type'] = model_type
    full_model_config['dropout'] = dropout
    full_model_config['learning_rate'] = learning_rate
    
    result = {
        'model_type': model_type,
        'model_config': full_model_config,
        'smote_params': smote_config,
        'val_f1_macro': f1_macro,
        'val_accuracy': accuracy,
        'val_pred_probs': val_predictions_probs.astype(np.float32),
        'epochs_trained': len(history_dict.get('loss', [])),
        'best_epoch': best_epoch,
        'val_f1_keras_best': best_val_f1
    }
    
    # Clean up
    tf.keras.backend.clear_session()
    del model
    
    return result

# Run parallel grid search
grid_search_results = Parallel(n_jobs=-1)(
    delayed(train_single_model)(smote_p, model_p)
    for smote_p, model_p in tqdm(
        product(smote_parameter_grid, model_parameter_grid),
        total=len(smote_parameter_grid) * len(model_parameter_grid),
        desc="Training models"
    )
)

print("✅ Grid search complete!")

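### 5.2 Ensemble Predictions

Combine predictions from all models using weighted soft voting: the class probabilities are averaged with weights equal to each model's validation macro F1-score. Because the weights and the evaluation use the same validation set, the ensemble score may be somewhat optimistic.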

In [None]:
print("🧠 Creating ensemble predictions...")

# Collect predictions and weights
all_predictions = []
prediction_weights = []

for result in grid_search_results:
    pred_probs = result['val_pred_probs']
    all_predictions.append(pred_probs)
    # Use F1-score as weight (with small epsilon to avoid zero weights)
    prediction_weights.append(max(result['val_f1_macro'], 1e-6))

# Weighted average of predictions
ensemble_predictions_probs = np.average(all_predictions, axis=0, weights=prediction_weights)
ensemble_predictions = np.argmax(ensemble_predictions_probs, axis=1)

# Evaluate ensemble
ensemble_f1 = f1_score(val_labels_encoded, ensemble_predictions, average='macro')
ensemble_accuracy = np.mean(ensemble_predictions == val_labels_encoded)

print(f"\n🎯 Ensemble Performance:")
print(f"   F1-score (macro): {ensemble_f1:.4f}")
print(f"   Accuracy: {ensemble_accuracy:.4f}")
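
A small worked example shows how the weighted average behaves. The probabilities and weights below are invented; only the `np.average` call matches the cell above.

```python
import numpy as np

# Hypothetical class probabilities from two models for two samples (3 classes)
probs_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
probs_b = np.array([[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
weights = [0.9, 0.6]  # stand-ins for each model's validation macro F1

# Weighted mean over the model axis; weights are normalized by their sum
ensemble = np.average([probs_a, probs_b], axis=0, weights=weights)
predictions = ensemble.argmax(axis=1)

print(ensemble.round(2))
print(predictions)  # [1 2]
```

Here model A alone would predict classes 0 and 1, yet the ensemble follows model B on both samples: B has the lower weight but is confident enough to outweigh A. Soft voting therefore depends on how confident each model is, not only on its weight.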

### 5.3 Results Summary

Display results from all models and identify the best performing one.

In [None]:
# Create results summary
results_summary = pd.DataFrame([
    {
        'model_type': r['model_type'],
        'val_f1_macro': r['val_f1_macro'],
        'val_accuracy': r['val_accuracy'],
    }
    for r in grid_search_results
])

print("\n📊 Individual Model Results:")
display(results_summary.sort_values('val_f1_macro', ascending=False))

# Identify best model
best_model_result = max(grid_search_results, key=lambda r: r['val_f1_macro'])
best_config = best_model_result['model_config'].copy()
best_smote_config = best_model_result['smote_params'].copy()

print("\n🏆 Best Single Model:")
print(f"   Type: {best_model_result['model_type']}")
print(f"   Macro F1: {best_model_result['val_f1_macro']:.4f}")
print(f"   Accuracy: {best_model_result['val_accuracy']:.4f}")
print(f"   Hyperparameters: {best_config}")
print(f"   SMOTE params: {best_smote_config}")

### 5.4 Retrain Best Model

Retrain the best model configuration for up to 80 epochs (early stopping may end training sooner) to produce the final predictions.

In [None]:
print("🎯 Retraining best model configuration...")

# Training parameters
final_training_epochs = 80
final_model_config = best_config.copy()
final_smote_config = best_smote_config.copy()

# Extract parameters
final_model_type = final_model_config.pop('model_type')
final_learning_rate = final_model_config.pop('learning_rate', 1e-3)
final_dropout = final_model_config.pop('dropout', 0.3)

print(f"Configuration: {final_model_type} with learning_rate={final_learning_rate}, dropout={final_dropout}")

# Apply SMOTE
X_resampled_final, y_resampled_final = apply_smote_resampling(
    base_train_sequences,
    base_train_labels,
    final_smote_config
)

# Build final model
final_model = build_model_by_type(
    final_model_type,
    sequence_length=sequence_length,
    num_features=feature_count,
    num_classes=num_classes,
    dropout=final_dropout,
    **final_model_config
)

# Compile final model
final_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=final_learning_rate),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy', F1Score(num_classes=num_classes)]
)

# Train final model
final_history = final_model.fit(
    X_resampled_final,
    y_resampled_final,
    validation_data=(val_sequences, val_labels_encoded),
    epochs=final_training_epochs,
    batch_size=training_config['batch_size'],
    callbacks=create_training_callbacks(),
    verbose=0
)

# Evaluate final model
final_val_predictions = np.argmax(final_model.predict(val_sequences, verbose=0), axis=1)
final_f1_macro = f1_score(val_labels_encoded, final_val_predictions, average='macro')
final_f1_weighted = f1_score(val_labels_encoded, final_val_predictions, average='weighted')
final_accuracy = np.mean(final_val_predictions == val_labels_encoded)

print(f"\n✅ Final Model Performance:")
print(f"   Macro F1: {final_f1_macro:.4f}")
print(f"   Weighted F1: {final_f1_weighted:.4f}")
print(f"   Accuracy: {final_accuracy:.4f}")

### 5.5 Training History Visualization

Visualize the training process to understand model learning.

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 3, figsize=(20, 5))

# Loss curve
axes[0].plot(final_history.history['loss'], label='Training Loss', linewidth=2)
axes[0].plot(final_history.history['val_loss'], label='Validation Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss', fontsize=12)
axes[0].set_title('Model Loss Over Time', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3)

# Accuracy curve
axes[1].plot(final_history.history['accuracy'], label='Training Accuracy', linewidth=2)
axes[1].plot(final_history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_title('Model Accuracy Over Time', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=10)
axes[1].grid(True, alpha=0.3)

# F1-score curve
axes[2].plot(final_history.history['f1_score'], label='Training F1', linewidth=2)
axes[2].plot(final_history.history['val_f1_score'], label='Validation F1', linewidth=2)
axes[2].set_xlabel('Epoch', fontsize=12)
axes[2].set_ylabel('F1 Score', fontsize=12)
axes[2].set_title('F1 Score Over Time', fontsize=14, fontweight='bold')
axes[2].legend(fontsize=10)
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

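# Values below come from the last epoch that ran; with restore_best_weights=True
# the model itself may hold the weights from the best validation-F1 epoch instead.
print("\n📊 Last-Epoch Training Metrics:")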
print("=" * 60)
print(f"Training Loss: {final_history.history['loss'][-1]:.4f}")
print(f"Training Accuracy: {final_history.history['accuracy'][-1]:.4f}")
print(f"Validation Loss: {final_history.history['val_loss'][-1]:.4f}")
print(f"Validation Accuracy: {final_history.history['val_accuracy'][-1]:.4f}")
print(f"Validation F1: {final_history.history['val_f1_score'][-1]:.4f}")

## 6. Model Evaluation

### 6.1 Detailed Classification Report

Evaluate model performance with detailed metrics for each class.

In [None]:
print("🔮 Evaluating final model on validation set...\n")

# Generate predictions
val_prediction_probs = final_model.predict(val_sequences)
val_predictions = np.argmax(val_prediction_probs, axis=1)

# Calculate F1-scores
f1_macro_final = f1_score(val_labels_encoded, val_predictions, average='macro')
f1_weighted_final = f1_score(val_labels_encoded, val_predictions, average='weighted')
f1_per_class = f1_score(val_labels_encoded, val_predictions, average=None)

print("🎯 F1-SCORE RESULTS (PRIMARY METRIC):")
print("=" * 60)
print(f"Macro F1-Score:    {f1_macro_final:.4f} ⭐")
print(f"Weighted F1-Score: {f1_weighted_final:.4f}")
print("\nF1-Score per Class:")
for i, label_name in enumerate(label_encoder.classes_):
    print(f"  {label_name:15s}: {f1_per_class[i]:.4f}")
print("=" * 60)

# Classification report
print("\n📋 Detailed Classification Report:")
print("=" * 70)
print(classification_report(val_labels_encoded, val_predictions, 
                          target_names=label_encoder.classes_))

# Overall accuracy
accuracy_final = np.mean(val_predictions == val_labels_encoded)
print(f"\n✨ Overall Validation Accuracy: {accuracy_final:.4f} ({accuracy_final*100:.2f}%)")

### 6.2 Confusion Matrix

Visualize the confusion matrix to understand misclassification patterns.

In [None]:
# Generate confusion matrix
confusion_mat = confusion_matrix(val_labels_encoded, val_predictions)

# Plot confusion matrix
fig, ax = plt.subplots(figsize=(10, 8))
display_cm = ConfusionMatrixDisplay(confusion_matrix=confusion_mat, 
                                     display_labels=label_encoder.classes_)
display_cm.plot(ax=ax, cmap='Blues', values_format='d')
ax.set_title('Confusion Matrix - Validation Set', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

# Detailed analysis
print("\n🔍 Confusion Matrix Analysis:")
print("=" * 60)
for i, label_name in enumerate(label_encoder.classes_):
    total_samples = confusion_mat[i].sum()
    correct_predictions = confusion_mat[i, i]
    accuracy_per_class = (correct_predictions / total_samples * 100) if total_samples > 0 else 0
    print(f"{label_name}:")
    print(f"  Correct: {correct_predictions}/{total_samples} ({accuracy_per_class:.1f}%)")
    print(f"  Misclassified: {total_samples - correct_predictions}")
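
The per-class percentages printed above are the diagonal of a row-normalized confusion matrix, which is the same as per-class recall. The sketch below shows that relationship on made-up counts; `toy_cm` and the other `toy_*` names are hypothetical. In scikit-learn the same normalization is available through `confusion_matrix(..., normalize='true')`.

```python
import numpy as np

# Made-up counts: rows are true classes, columns are predicted classes.
toy_cm = np.array([
    [50,  5,  0],
    [ 8, 10,  2],
    [ 1,  3,  6],
])

row_totals = toy_cm.sum(axis=1, keepdims=True)

# Divide each row by its total; rows with no samples stay at zero,
# mirroring the `total_samples > 0` guard used in the analysis loop.
toy_cm_normalized = np.divide(
    toy_cm,
    row_totals,
    out=np.zeros(toy_cm.shape, dtype=float),
    where=row_totals > 0,
)

# The diagonal now holds the fraction of each true class predicted correctly.
toy_recall = np.diag(toy_cm_normalized)
print(np.round(toy_cm_normalized, 3))
print("Per-class recall:", np.round(toy_recall, 3))
```

Reading the matrix row by row in this form makes it easier to compare classes of very different sizes, since each row sums to 1.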

## 7. Generate Test Predictions

### 7.1 Load and Preprocess Test Data

Load the test dataset and prepare it for prediction.

In [None]:
# Load test data
print("📂 Loading test data...")
test_features = pd.read_csv('pirate_pain_test.csv')

print(f"Test data shape: {test_features.shape}")
print(f"Number of test samples: {test_features['sample_index'].nunique()}")
print("\n📊 First few rows:")
display(test_features.head())

### 7.2 Create Test Sequences

Process test data into sequences matching the training format.

In [None]:
print("\n🔄 Preparing test sequences...")

test_sequences = []
test_sample_ids = []

unique_test_sample_indices = test_features['sample_index'].unique()

for sample_idx in unique_test_sample_indices:
    # Extract all time steps for this sample
    sample_data = test_features[test_features['sample_index'] == sample_idx].copy()
    sample_data = sample_data.sort_values('time')
    
    # Preprocess and scale features using the training scaler
    features = preprocess_features(sample_data)
    features_scaled = feature_scaler.transform(features)
    test_sequences.append(features_scaled.astype(np.float32))
    test_sample_ids.append(sample_idx)

print(f"✅ Created {len(test_sequences)} test sequences")

# Pad sequences to match training sequence length
test_sequences_padded = np.zeros((len(test_sequences), sequence_length, feature_count), 
                                 dtype=np.float32)
for i, sequence in enumerate(test_sequences):
    seq_len = sequence.shape[0]
    if seq_len >= sequence_length:
        # Truncate if longer
        test_sequences_padded[i] = sequence[:sequence_length]
    else:
        # Pad if shorter
        test_sequences_padded[i, :seq_len, :] = sequence

print(f"✅ Test tensor shape: {test_sequences_padded.shape}")
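
The truncate-or-pad logic above can also be written as a small reusable helper. This is a sketch rather than the notebook's own code: `pad_or_truncate` and the `toy_*` arrays are hypothetical names, and the example uses tiny made-up sequences instead of the real 160-step data.

```python
import numpy as np

def pad_or_truncate(sequence, target_length):
    """Return a (target_length, n_features) float32 array.

    Longer sequences keep their first `target_length` steps; shorter ones
    are zero-padded at the end (post-padding).
    """
    sequence = np.asarray(sequence, dtype=np.float32)
    padded = np.zeros((target_length, sequence.shape[1]), dtype=np.float32)
    steps = min(sequence.shape[0], target_length)
    padded[:steps] = sequence[:steps]
    return padded

# Two made-up sequences with 2 features: one too short, one too long.
toy_short = np.ones((3, 2))
toy_long = np.arange(12).reshape(6, 2)

toy_batch = np.stack([pad_or_truncate(s, 4) for s in (toy_short, toy_long)])
print(toy_batch.shape)
print(toy_batch[0])  # last row is zero padding
print(toy_batch[1])  # only the first 4 of 6 steps are kept
```

One point to keep in mind: since padding is applied after `StandardScaler`, a padded value of 0 is numerically identical to a feature sitting at its training mean. Whether that matters depends on whether the model masks padded steps, which this section does not show.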

### 7.3 Generate Predictions

Use the trained model to predict pain levels for test samples.

In [None]:
print("\n🔮 Generating predictions...")

# Generate predictions
test_prediction_probs = final_model.predict(test_sequences_padded)
test_predictions = np.argmax(test_prediction_probs, axis=1)
test_labels = label_encoder.inverse_transform(test_predictions)

print(f"✅ Generated {len(test_labels)} predictions")

# Show prediction distribution
prediction_distribution = pd.Series(test_labels).value_counts()
print("\n📊 Prediction Distribution:")
print(prediction_distribution)

# Visualize prediction distribution
plt.figure(figsize=(10, 6))
colors_map = {'no_pain': 'green', 'low_pain': 'orange', 'high_pain': 'red'}
bar_colors = [colors_map.get(label, 'gray') for label in prediction_distribution.index]
plt.bar(prediction_distribution.index, prediction_distribution.values, color=bar_colors)
plt.xlabel('Pain Level', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.title('Test Set Predictions Distribution', fontsize=14, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
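
The decoding step above (`argmax` followed by `inverse_transform`) depends on the column order of the model's output matching `label_encoder.classes_`. The sketch below illustrates that mapping with made-up probabilities. It assumes the notebook's encoder was fitted on the three label strings listed in the overview; `toy_encoder` and the other `toy_*` names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# LabelEncoder stores classes in sorted order, so index 0 is 'high_pain',
# not 'no_pain', regardless of the order the labels were passed in.
toy_encoder = LabelEncoder().fit(['no_pain', 'low_pain', 'high_pain'])
print(list(toy_encoder.classes_))

# Made-up softmax outputs for three samples (each row sums to 1).
toy_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

toy_indices = np.argmax(toy_probs, axis=1)            # most likely class index
toy_labels = toy_encoder.inverse_transform(toy_indices)
toy_confidence = toy_probs.max(axis=1)                # probability of that class

for label, confidence in zip(toy_labels, toy_confidence):
    print(f"{label:10s} (p = {confidence:.2f})")
```

Keeping the maximum probability alongside the label is optional, but it can help when inspecting test predictions that the model is unsure about.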

### 7.4 Create Submission File

Format predictions for submission.

In [None]:
print("📝 Creating submission file...\n")

# Format sample indices as zero-padded 3-digit strings
formatted_sample_ids = [f"{int(idx):03d}" for idx in test_sample_ids]

# Create submission dataframe
submission = pd.DataFrame({
    'sample_index': formatted_sample_ids,
    'label': test_labels
})

# Save to CSV
submission_path = 'submission.csv'
submission.to_csv(submission_path, index=False)

print(f"✅ Submission file created: {submission_path}")
print("   Format: Comma-separated (CSV)")
print(f"   Total predictions: {len(submission)}")
print("\n📋 First 15 predictions:")
print("=" * 50)
display(submission.head(15))

print("\n✨ Submission ready for upload!")
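
Because `sample_index` is written as a zero-padded string, reading the file back with pandas' default type inference converts it to integers and silently drops the leading zeros. The sketch below demonstrates this with a made-up three-row frame and an in-memory buffer instead of the real `submission.csv`. All names in it (`toy_submission`, `buffer`, `as_inferred`, `as_text`) are hypothetical.

```python
import io
import pandas as pd

# Made-up miniature submission with zero-padded identifiers.
toy_submission = pd.DataFrame({
    'sample_index': ['000', '001', '012'],
    'label': ['no_pain', 'low_pain', 'high_pain'],
})

buffer = io.StringIO()
toy_submission.to_csv(buffer, index=False)

# Default parsing infers integers and drops the leading zeros.
buffer.seek(0)
as_inferred = pd.read_csv(buffer)

# Reading the column as str keeps the 3-digit identifiers intact.
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={'sample_index': str})

print(as_inferred['sample_index'].tolist())
print(as_text['sample_index'].tolist())
```

When checking the saved file before upload, passing `dtype={'sample_index': str}` to `pd.read_csv` is one way to confirm the identifiers on disk still have the expected format.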

## 8. Summary

### Pipeline Overview

This notebook implements a complete machine learning pipeline for pirate pain level classification:

1. **Data Loading & Exploration**: Load and analyze time-series sensor data
2. **Feature Engineering**: Preprocess categorical variables and normalize numerical features
3. **Data Preparation**: Create sequences, apply scaling, and split into train/validation sets
4. **Class Balancing**: Use SMOTE to handle class imbalance
5. **Model Development**: Train multiple architectures (LSTM, GRU, CNN-LSTM)
6. **Ensemble Learning**: Combine predictions from multiple models
7. **Model Evaluation**: Assess performance using F1-score and confusion matrix
8. **Prediction**: Generate predictions for test data

### Key Results

Validation scores for the final model are reported in Section 6. The final configuration was:

- **Primary Metric**: Macro F1-Score
- **Model Architecture**: Ensemble of LSTM, GRU, and CNN-LSTM models
- **Data Augmentation**: SMOTE for class balance
- **Feature Scaling**: StandardScaler normalization

### Files Generated

- `submission.csv`: Test predictions in required format