# Silksong Gesture Recognition - CNN/LSTM Training

**Training for Hollow Knight: Silksong Voice-Controlled Watch Interface**

This notebook trains a CNN/LSTM deep learning model for real-time gesture recognition.

## Setup Requirements:
1. ✅ Enable GPU: Runtime > Change runtime type > GPU (T4 recommended)
2. ✅ Upload your data to Google Drive under `My Drive/silksong_data/`
3. ✅ Each session folder should contain:
   - `sensor_data.csv` (accelerometer + gyroscope data)
   - `[session]_labels.csv` (gesture labels with timestamps)
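
The loader in section 3 reads `sensor_data.csv` in a long layout (one row per sensor reading, tagged by a `sensor` column) and reshapes it to one row per timestamp. The sketch below uses invented rows to show that layout and the reshape. The column names come from `load_session_data()`. The values, and the assumption that unused columns are left blank in each row, are illustrative guesses about your files.

```python
import io
import pandas as pd

# Invented example rows. Column names follow what load_session_data() selects;
# blank cells for the other sensors' columns are an assumption about the real files.
SENSOR_CSV = """timestamp,sensor,accel_x,accel_y,accel_z,gyro_x,gyro_y,gyro_z,rot_w,rot_x,rot_y,rot_z
0.00,linear_acceleration,0.1,0.0,9.8,,,,,,,
0.00,gyroscope,,,,0.01,0.02,0.00,,,,
0.02,rotation_vector,,,,,,,1.0,0.0,0.0,0.0
0.02,linear_acceleration,0.2,0.1,9.7,,,,,,,
"""
LABELS_CSV = """timestamp,duration,gesture
0.00,0.3,jump
"""

raw = pd.read_csv(io.StringIO(SENSOR_CSV))
labels = pd.read_csv(io.StringIO(LABELS_CSV))

SENSOR_COLUMNS = {
    "linear_acceleration": ["accel_x", "accel_y", "accel_z"],
    "gyroscope": ["gyro_x", "gyro_y", "gyro_z"],
    "rotation_vector": ["rot_w", "rot_x", "rot_y", "rot_z"],
}
feature_cols = [c for cols in SENSOR_COLUMNS.values() for c in cols]

# Long format (one row per sensor reading) -> wide format (one row per timestamp)
wide = pd.DataFrame({"timestamp": sorted(raw["timestamp"].unique())})
for sensor_name, cols in SENSOR_COLUMNS.items():
    part = raw.loc[raw["sensor"] == sensor_name, ["timestamp"] + cols]
    wide = wide.merge(part, on="timestamp", how="left")

# Carry each sensor's last value forward; anything before its first reading becomes 0
wide[feature_cols] = wide[feature_cols].ffill().fillna(0)
print(wide)
```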

## Expected Training Time:
- **With GPU (T4):** 20-40 minutes
- **Without GPU (CPU):** 2-4 hours (not recommended)

---

## 1. Mount Google Drive & Install Dependencies

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

print("\n✅ Google Drive mounted!")
print("Your data should be in: /content/drive/MyDrive/silksong_data/")

In [None]:
# Check GPU availability
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("\nGPU Available:", tf.config.list_physical_devices('GPU'))

if tf.config.list_physical_devices('GPU'):
    print("\n✅ GPU is enabled! Training will be fast.")
else:
    print("\n⚠️  No GPU detected. Training will be slow.")
    print("   Enable GPU: Runtime > Change runtime type > GPU")

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.utils import shuffle
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import os
from pathlib import Path

print("✅ All imports successful!")

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

## 2. Configure Data Paths

**Update this cell with your session folder names!**

In [None]:
# Data directory in Google Drive
DATA_DIR = '/content/drive/MyDrive/silksong_data'

# List your session folders here
SESSION_FOLDERS = [
    '20251017_125600_session',
    '20251017_135458_session',
    '20251017_141539_session',
    '20251017_143217_session',
    '20251017_143627_session',
]

# Model configuration
# 🔧 REDUCED WINDOW SIZE to capture short gesture labels (0.3s duration)
WINDOW_SIZE = 25  # 0.5 seconds at 50Hz (was 50 = 1.0s)
STRIDE = 12       # 0.24 s step between windows, i.e. 13 samples (0.26 s) of overlap (was 25 = 0.5s)

# Why reduce the window size?
# - Your voice labels last only 0.3s (the time it takes to say the word)
# - A 0.3s label covers only 15 samples at 50Hz
# - With 1.0s (50-sample) windows, 15 samples can never be the majority label
# - With 0.5s (25-sample) windows, 15 samples can be the majority (15 > 12.5),
#   provided the rest of the window is also labeled (e.g. 'noise'), because
#   create_windows() skips any window that contains unlabeled samples
# - Expected result: 50-100 jump windows instead of only 7

# Expected features: accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z,
#                    rot_w, rot_x, rot_y, rot_z = 10 features
# (timestamp and sensor columns are excluded)
NUM_FEATURES = 10  # Will be verified during data loading

# Gesture classes
GESTURES = ['jump', 'punch', 'turn', 'walk', 'noise']
NUM_CLASSES = len(GESTURES)

print(f"Configured {len(SESSION_FOLDERS)} sessions for training")
print(f"Gestures: {GESTURES}")
print(f"Window: {WINDOW_SIZE} samples ({WINDOW_SIZE/50.0:.2f}s) with {STRIDE} sample stride ({STRIDE/50.0:.2f}s)")
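
To check the window-size reasoning in the config cell, the sketch below counts the windows in which a single 15-sample (0.3 s) label is a strict majority, meaning it covers more than half the window. It assumes every other sample in those windows carries some valid label (for example `noise`), because `create_windows()` drops any window containing unlabeled samples. `majority_windows` is a hypothetical helper, not part of the pipeline.

```python
def majority_windows(label_start, label_len, window_size, stride, num_samples):
    """Count sliding windows in which one label segment covers more than half
    of the window. A strict majority is sufficient for create_windows()'s
    bincount().argmax() vote to pick that label."""
    count = 0
    for start in range(0, num_samples - window_size + 1, stride):
        end = start + window_size
        # Number of samples shared by the window and the label segment
        overlap = max(0, min(end, label_start + label_len) - max(start, label_start))
        if overlap > window_size / 2:
            count += 1
    return count


# A 0.3 s label at 50 Hz is 15 samples; place it at sample 100 of a 500-sample session.
print("1.0 s windows:", majority_windows(100, 15, 50, 25, 500))
print("0.5 s windows:", majority_windows(100, 15, 25, 12, 500))
```

With a 12-sample stride, every 15-sample label ends up as the majority of one or two 25-sample windows wherever it starts, while 50-sample windows can never have it as the majority.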

## 3. Load and Prepare Training Data

In [None]:
def load_session_data(session_folder):
    """Load sensor data and labels for one session"""
    session_path = os.path.join(DATA_DIR, session_folder)

    # Load sensor data
    sensor_file = os.path.join(session_path, 'sensor_data.csv')
    sensor_data_raw = pd.read_csv(sensor_file, skipinitialspace=True)

    # Clean column names (strip whitespace)
    sensor_data_raw.columns = sensor_data_raw.columns.str.strip()

    # 🔧 FIX: Process sensor data to handle separate rows per sensor
    # Sensor data has separate rows for each sensor type (linear_acceleration, gyroscope, rotation_vector)
    # We need to merge them into one row per timestamp with all sensor values
    
    # Separate by sensor type
    accel_data = sensor_data_raw[sensor_data_raw['sensor'] == 'linear_acceleration'][['timestamp', 'accel_x', 'accel_y', 'accel_z']].copy()
    gyro_data = sensor_data_raw[sensor_data_raw['sensor'] == 'gyroscope'][['timestamp', 'gyro_x', 'gyro_y', 'gyro_z']].copy()
    rot_data = sensor_data_raw[sensor_data_raw['sensor'] == 'rotation_vector'][['timestamp', 'rot_w', 'rot_x', 'rot_y', 'rot_z']].copy()
    
    # Get all unique timestamps
    all_timestamps = pd.DataFrame({'timestamp': sorted(sensor_data_raw['timestamp'].unique())})
    
    # Merge all sensors on timestamp
    sensor_data = all_timestamps.copy()
    sensor_data = sensor_data.merge(accel_data, on='timestamp', how='left')
    sensor_data = sensor_data.merge(gyro_data, on='timestamp', how='left')
    sensor_data = sensor_data.merge(rot_data, on='timestamp', how='left')
    
    # Forward-fill to propagate sensor values (sensors update at different rates)
    feature_cols = ['accel_x', 'accel_y', 'accel_z', 'gyro_x', 'gyro_y', 'gyro_z', 'rot_w', 'rot_x', 'rot_y', 'rot_z']
    sensor_data[feature_cols] = sensor_data[feature_cols].ffill()
    
    # Fill any remaining NaN (at the beginning) with 0
    sensor_data[feature_cols] = sensor_data[feature_cols].fillna(0)

    # Load labels
    labels_file = os.path.join(session_path, f'{session_folder}_labels.csv')
    labels_data = pd.read_csv(labels_file)

    return sensor_data, labels_data


def create_label_vector(sensor_data, labels_data):
    """Create per-sample labels from segment labels"""
    num_samples = len(sensor_data)
    label_vector = np.full(num_samples, -1, dtype=int)

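    # Assumes label timestamps are seconds from the start of the session (row 0)
    # and that the merged sensor data has exactly one row per 1/50 s (50Hz)
    sample_rate = 50.0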

    for _, row in labels_data.iterrows():
        start_time = row['timestamp']
        duration = row['duration']
        gesture = row['gesture']

        if gesture not in GESTURES:
            continue

        gesture_idx = GESTURES.index(gesture)

        # Convert time to sample indices
        start_idx = int(start_time * sample_rate)
        end_idx = int((start_time + duration) * sample_rate)

        # Clip to valid range
        start_idx = max(0, min(start_idx, num_samples))
        end_idx = max(0, min(end_idx, num_samples))

        label_vector[start_idx:end_idx] = gesture_idx

    return label_vector


def create_windows(sensor_data, labels, window_size, stride):
    """Create sliding windows from continuous data"""
    X = []
    y = []

    num_samples = len(sensor_data)

    for i in range(0, num_samples - window_size + 1, stride):  # +1 so the last full window is included
        window = sensor_data[i:i+window_size]
        window_labels = labels[i:i+window_size]

        # Skip if window contains unlabeled data
        if np.any(window_labels == -1):
            continue

        # Use majority vote for window label
        label = np.bincount(window_labels).argmax()

        X.append(window)
        y.append(label)

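    if not X:
        # No valid windows: return empty arrays of the right rank so np.concatenate still works
        return np.empty((0, window_size, sensor_data.shape[1]), dtype=np.float32), np.empty(0, dtype=int)
    return np.array(X), np.array(y)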


print("✅ Helper functions defined")
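
`create_label_vector()` turns label times into row indices by multiplying by 50, which is only correct if the merged sensor frame has exactly one row per 1/50 s. Because `load_session_data()` keeps one row per unique timestamp across three sensors, the actual row rate may differ. A minimal alternative, assuming the sensor `timestamp` column and the label `timestamp` share the same units (seconds) and origin, looks rows up by time with `np.searchsorted`. `label_vector_by_time` is a hypothetical name.

```python
import numpy as np


def label_vector_by_time(sample_times, seg_starts, seg_durations, seg_class_ids):
    """Give each sensor row the class of the label segment covering its timestamp.

    Assumes sample_times is sorted ascending and uses the same units and origin
    as the segment start times. Rows outside every segment stay -1 (unlabeled).
    """
    sample_times = np.asarray(sample_times, dtype=float)
    labels = np.full(len(sample_times), -1, dtype=int)
    for start, duration, class_id in zip(seg_starts, seg_durations, seg_class_ids):
        lo = np.searchsorted(sample_times, start, side="left")             # first row >= start
        hi = np.searchsorted(sample_times, start + duration, side="left")  # first row >= end
        labels[lo:hi] = class_id
    return labels


# Rows arriving at 100 Hz instead of the assumed 50 Hz
sample_times = np.arange(100) / 100.0
demo = label_vector_by_time(sample_times, seg_starts=[0.2], seg_durations=[0.3], seg_class_ids=[0])
print(np.flatnonzero(demo == 0))  # rows 20..49
```

In this demo the 0.2 to 0.5 s segment covers rows 20 to 49; the index-based version would label rows 10 to 24, which at 100 Hz is only 0.10 to 0.25 s. With the notebook's data it could be called as `label_vector_by_time(sensor_data['timestamp'].values, known['timestamp'], known['duration'], known['gesture'].map(GESTURES.index))`, where `known = labels_data[labels_data['gesture'].isin(GESTURES)]`.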

In [None]:
# Load and process all sessions
all_X = []
all_y = []

for session_folder in SESSION_FOLDERS:
    print(f"\nProcessing {session_folder}...")

    try:
        sensor_data, labels_data = load_session_data(session_folder)
        print(f"  Sensor samples: {len(sensor_data)}")
        print(f"  Label segments: {len(labels_data)}")

        # Extract features (exclude non-numeric columns: timestamp, sensor)
        feature_cols = [col for col in sensor_data.columns
                       if col not in ['timestamp', 'sensor']]

        # Convert to float32 explicitly to avoid dtype issues
        features = sensor_data[feature_cols].astype(np.float32).values

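        # Verify features are numeric and match the configured feature count
        print(f"  Feature columns ({len(feature_cols)}): {feature_cols}")
        print(f"  Feature shape: {features.shape}")
        print(f"  Feature dtype: {features.dtype}")
        if len(feature_cols) != NUM_FEATURES:
            raise ValueError(f"Expected {NUM_FEATURES} features, found {len(feature_cols)}")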

        # Create per-sample labels
        label_vector = create_label_vector(sensor_data, labels_data)

        # Create sliding windows
        X, y = create_windows(features, label_vector, WINDOW_SIZE, STRIDE)
        print(f"  Generated {len(X)} windows")

        all_X.append(X)
        all_y.append(y)

    except Exception as e:
        print(f"  ❌ Error: {e}")
        import traceback
        traceback.print_exc()
        continue

# Combine all sessions
if all_X:
    X_combined = np.concatenate(all_X, axis=0)
    y_combined = np.concatenate(all_y, axis=0)

    print(f"\n✅ Total training windows: {len(X_combined)}")
    print(f"   Input shape: {X_combined.shape}")
    print(f"   Labels shape: {y_combined.shape}")
    print(f"   X dtype: {X_combined.dtype}")
    print(f"   y dtype: {y_combined.dtype}")

    # Show class distribution
    print("\n   Class distribution:")
    for i, gesture in enumerate(GESTURES):
        count = np.sum(y_combined == i)
        percentage = count / len(y_combined) * 100
        print(f"     {gesture}: {count} ({percentage:.1f}%)")
else:
    print("\n❌ No data loaded! Check your data paths.")
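
With `STRIDE = 12` and `WINDOW_SIZE = 25`, neighboring windows share 13 samples, so the random split in the next section can put near-identical windows in both the training and test sets. That tends to make test accuracy look better than real-time accuracy. One option is to hold out whole sessions instead. The sketch below, with hypothetical helper names and synthetic stand-ins for `all_X` / `all_y`, builds one group id per window and splits on it.

```python
import numpy as np


def session_groups(per_session_y):
    """One group id per window: the index of the session it came from."""
    return np.concatenate([np.full(len(y), i) for i, y in enumerate(per_session_y)])


def hold_out_sessions(X, y, groups, test_sessions):
    """Put every window from the listed sessions in the test split, the rest in training."""
    test_mask = np.isin(groups, test_sessions)
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]


# Synthetic stand-ins for all_X / all_y: three sessions with 4, 5 and 6 windows
rng = np.random.default_rng(0)
sizes = (4, 5, 6)
all_X_demo = [rng.normal(size=(n, 25, 10)).astype(np.float32) for n in sizes]
all_y_demo = [rng.integers(0, 5, size=n) for n in sizes]

X_all = np.concatenate(all_X_demo)
y_all = np.concatenate(all_y_demo)
groups = session_groups(all_y_demo)

X_tr, X_te, y_tr, y_te = hold_out_sessions(X_all, y_all, groups, test_sessions=[2])
print(X_tr.shape, X_te.shape)  # (9, 25, 10) (6, 25, 10)
```

In the notebook, `groups` would be built from `all_y` right after concatenation and carried through any shuffling. With only five sessions, a held-out session may not contain every gesture, so check its class counts before trusting per-class scores.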

## 4. Split Train/Validation/Test Sets

In [None]:
# Shuffle data
X_combined, y_combined = shuffle(X_combined, y_combined, random_state=42)

# Split: 70% train, 15% validation, 15% test
X_temp, X_test, y_temp, y_test = train_test_split(
    X_combined, y_combined, test_size=0.15, random_state=42, stratify=y_combined
)

X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.15 / 0.85, random_state=42, stratify=y_temp  # 15% of the full dataset
)

print(f"Training set:   {len(X_train)} samples ({len(X_train)/len(X_combined)*100:.1f}%)")
print(f"Validation set: {len(X_val)} samples ({len(X_val)/len(X_combined)*100:.1f}%)")
print(f"Test set:       {len(X_test)} samples ({len(X_test)/len(X_combined)*100:.1f}%)")

# ============================================================================
# 🔧 CHECK FOR NaN/INF IN DATA
# ============================================================================
print("\n" + "="*60)
print("DATA QUALITY CHECK")
print("="*60)

nan_count = np.isnan(X_train).sum()
inf_count = np.isinf(X_train).sum()

print(f"\nNaN values in training data: {nan_count}")
print(f"Inf values in training data: {inf_count}")

if nan_count > 0 or inf_count > 0:
    print("⚠️  WARNING: Invalid values detected!")
    print("   Replacing NaN with 0 and ±inf with ±1e6...")
    X_train = np.nan_to_num(X_train, nan=0.0, posinf=1e6, neginf=-1e6)
    X_val = np.nan_to_num(X_val, nan=0.0, posinf=1e6, neginf=-1e6)
    X_test = np.nan_to_num(X_test, nan=0.0, posinf=1e6, neginf=-1e6)
    print("✅ Data cleaned!")

# Check data range
print(f"\nData range:")
print(f"  Min: {X_train.min():.4f}")
print(f"  Max: {X_train.max():.4f}")
print(f"  Mean: {X_train.mean():.4f}")
print(f"  Std: {X_train.std():.4f}")

# Check class distribution
print("\n" + "="*60)
print("CLASS DISTRIBUTION")
print("="*60)

for split_name, split_labels in [("Training", y_train), ("Validation", y_val), ("Test", y_test)]:
    print(f"\n{split_name} set:")
    for i, gesture in enumerate(GESTURES):
        count = np.sum(split_labels == i)
        pct = count / len(split_labels) * 100
        print(f"  {gesture:8s}: {count:4d} ({pct:5.1f}%)")

# ============================================================================
# 🔧 SMART CLASS WEIGHT STRATEGY
# ============================================================================
# Use softened class weights to handle imbalance without numerical instability
# Softening prevents extreme weights that can cause NaN loss

from sklearn.utils.class_weight import compute_class_weight

print("\n" + "="*60)
print("CLASS WEIGHT STRATEGY")
print("="*60)

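# Calculate imbalance ratio. Every gesture must appear in the training set: otherwise
# the ratio divides by zero and dict(enumerate(...)) below maps weights to the wrong classes.
class_counts = [np.sum(y_train == i) for i in range(NUM_CLASSES)]
missing = [GESTURES[i] for i, c in enumerate(class_counts) if c == 0]
if missing:
    raise ValueError(f"No training windows for {missing}; check their labels in the session files.")
max_class_count = max(class_counts)
min_class_count = min(class_counts)
imbalance_ratio = max_class_count / min_class_count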

print(f"\nClass imbalance ratio: {imbalance_ratio:.1f}x")
for i, gesture in enumerate(GESTURES):
    count = class_counts[i]
    pct = count / len(y_train) * 100
    print(f"  {gesture:8s}: {count:4d} samples ({pct:5.1f}%)")

# Compute balanced class weights
class_weights_array = compute_class_weight(
    'balanced',
    classes=np.unique(y_train),
    y=y_train
)

# Apply softening: Use square root to reduce extreme weights
# This prevents numerical instability while still helping minority classes
if imbalance_ratio > 10:
    print("\n🔧 Applying softening (sqrt) to prevent extreme weights...")
    class_weights_array = np.sqrt(class_weights_array)
    class_weights = dict(enumerate(class_weights_array))
    
    print("\nSoftened class weights:")
    for i, gesture in enumerate(GESTURES):
        print(f"  {gesture:8s}: {class_weights[i]:.3f}")
    
    max_weight = max(class_weights.values())
    min_weight = min(class_weights.values())
    weight_ratio = max_weight / min_weight
    print(f"\nWeight ratio after softening: {weight_ratio:.2f}x (was {imbalance_ratio:.1f}x)")
    print("✅ Softening reduces the risk of numerical instability while still up-weighting minority classes")
else:
    print("\n✅ Imbalance is moderate, using standard balanced weights")
    class_weights = dict(enumerate(class_weights_array))
    
    print("\nClass weights:")
    for i, gesture in enumerate(GESTURES):
        print(f"  {gesture:8s}: {class_weights[i]:.3f}")

print("\n💡 Rough targets (these depend heavily on your data):")
print("   - All gestures: 75-85% accuracy (balanced learning)")
print("   - Overall: 85-92% accuracy")
print("   - Stable training with softened weights")
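
The cell above prints the overall data range but does not rescale the inputs, and accelerometer, gyroscope and rotation-vector channels can sit on quite different scales. A common option, which this notebook does not currently use, is per-channel standardization using statistics from the training set only. This is a minimal sketch on synthetic data; if you adopt it, the real-time listener would need to apply the same `mean` and `std`.

```python
import numpy as np


def fit_channel_stats(X_train, eps=1e-8):
    """Per-channel mean and std over every window and time step of the training set."""
    mean = X_train.mean(axis=(0, 1), keepdims=True)                  # shape (1, 1, num_features)
    std = np.maximum(X_train.std(axis=(0, 1), keepdims=True), eps)   # avoid dividing by zero
    return mean, std


def standardize(X, mean, std):
    """Apply training-set statistics to any split (train, val, test or live data)."""
    return ((X - mean) / std).astype(np.float32)


# Synthetic stand-in for X_train: 200 windows x 25 time steps x 10 channels
rng = np.random.default_rng(42)
X_train_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 25, 10)).astype(np.float32)

mean, std = fit_channel_stats(X_train_demo)
Xn = standardize(X_train_demo, mean, std)
print(Xn.mean(axis=(0, 1)).round(3))
print(Xn.std(axis=(0, 1)).round(3))
```

In the notebook this would be `mean, std = fit_channel_stats(X_train)`, followed by standardizing `X_train`, `X_val` and `X_test` with those same two arrays so no validation or test statistics leak into training.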


## 5. Build CNN/LSTM Model

In [None]:
def create_cnn_lstm_model(input_shape, num_classes):
    """Create CNN/LSTM architecture for gesture recognition"""

    model = keras.Sequential([
        # Input layer
        layers.Input(shape=input_shape),

        # CNN layers for feature extraction
        # Note: Adjusted for smaller window size (25 samples instead of 50)
        layers.Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling1D(pool_size=2),

        layers.Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Removed second pooling to preserve temporal resolution with smaller input

        # LSTM layers for temporal modeling
        layers.LSTM(64, return_sequences=True),
        layers.Dropout(0.3),

        layers.LSTM(32),
        layers.Dropout(0.3),

        # Dense layers for classification
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),

        layers.Dense(num_classes, activation='softmax')
    ])

    return model


# Create model
input_shape = (WINDOW_SIZE, NUM_FEATURES)
model = create_cnn_lstm_model(input_shape, NUM_CLASSES)

# Compile model with GRADIENT CLIPPING to prevent NaN
optimizer = keras.optimizers.Adam(
    learning_rate=0.001,
    clipnorm=1.0  # Clip gradients to prevent explosion
)

model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
print(f"\n💡 Model input shape: ({WINDOW_SIZE}, {NUM_FEATURES}) = {WINDOW_SIZE/50:.2f}s windows")
print(f"   Optimizer: Adam with gradient clipping (clipnorm=1.0)")
print("   This helps prevent NaN loss caused by exploding gradients")
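
As a cross-check on `model.summary()`, the sketch below traces the tensor shape through each layer and counts parameters with the standard formulas for Keras `Conv1D`, `BatchNormalization`, `LSTM` and `Dense` (Dropout layers add none). It assumes the layer settings in `create_cnn_lstm_model()` with `WINDOW_SIZE = 25`, `NUM_FEATURES = 10` and 5 classes.

```python
def conv1d_params(in_channels, filters, kernel_size):
    return (kernel_size * in_channels + 1) * filters       # kernel weights + one bias per filter


def batchnorm_params(channels):
    return 4 * channels  # gamma, beta (trainable) + moving mean, moving variance (non-trainable)


def lstm_params(input_dim, units):
    return 4 * (units * (input_dim + units) + units)        # 4 gates: kernel, recurrent kernel, bias


def dense_params(in_dim, units):
    return (in_dim + 1) * units


window, features, classes = 25, 10, 5
steps_after_pool = window // 2  # MaxPooling1D(pool_size=2): 'valid' padding, stride 2 -> 12

layers_trace = [
    ("Conv1D(64)", (window, 64), conv1d_params(features, 64, 3)),  # padding='same' keeps 25 steps
    ("BatchNorm", (window, 64), batchnorm_params(64)),
    ("MaxPool1D(2)", (steps_after_pool, 64), 0),
    ("Conv1D(128)", (steps_after_pool, 128), conv1d_params(64, 128, 3)),
    ("BatchNorm", (steps_after_pool, 128), batchnorm_params(128)),
    ("LSTM(64, seq)", (steps_after_pool, 64), lstm_params(128, 64)),
    ("LSTM(32)", (32,), lstm_params(64, 32)),
    ("Dense(64)", (64,), dense_params(32, 64)),
    ("Dense(5)", (classes,), dense_params(64, classes)),
]

total = 0
for name, shape, n_params in layers_trace:
    total += n_params
    print(f"{name:14s} -> {shape!s:10s} params={n_params}")
print("total params:", total)
```

For these settings the trace gives 91,717 parameters, 384 of which are the non-trainable BatchNormalization moving statistics. If `model.summary()` reports a different total, the layer configuration has drifted from what is written above.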

## 6. Train Model

In [None]:
# Training callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=10,
        restore_best_weights=True,
        verbose=1
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        min_lr=1e-6,
        verbose=1
    ),
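    # Note: EarlyStopping restores the lowest-val_loss weights into `model`, while this
    # checkpoint keeps the highest-val_accuracy epoch, so the two can come from different epochs.
    keras.callbacks.ModelCheckpoint(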
        'best_model.h5',
        monitor='val_accuracy',
        save_best_only=True,
        verbose=1
    )
]

print("✅ Callbacks configured")

In [None]:
# Train model
print("🚀 Starting training...\n")

# Choose your strategy:
# OPTION 1: Use the class weights from the class weight cell (sqrt-softened when
#           imbalance > 10x; recommended for severe imbalance such as 400x)
# OPTION 2: Pass class_weight=None to train without weights

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=callbacks,
    class_weight=class_weights,  # From the class weight cell (softened only if imbalance > 10x)
    verbose=1
)

print("\n✅ Training complete!")

## 🔍 URGENT: Diagnose Training Issues

**If you see training accuracy bouncing around 10-20% and validation accuracy dropping after epoch 3:**

This suggests the class weights may be too extreme or there's a data issue. Run the diagnostic below!

In [None]:
# 🚨 CRITICAL DIAGNOSTIC: What went wrong?

print("="*70)
print("TRAINING ISSUE DIAGNOSIS")
print("="*70)

# Check what the model is actually predicting
y_train_pred = model.predict(X_train[:500], verbose=0)  # Check first 500 samples
y_train_pred_classes = np.argmax(y_train_pred, axis=1)

print("\n1️⃣ MODEL PREDICTION DISTRIBUTION (on training data):")
print("-" * 70)
for i, gesture in enumerate(GESTURES):
    count = np.sum(y_train_pred_classes == i)
    pct = count / len(y_train_pred_classes) * 100
    print(f"  {gesture:8s}: {count:4d} predictions ({pct:5.1f}%)")

# Check if model is stuck predicting one class
unique_preds = len(np.unique(y_train_pred_classes))
print(f"\n  ⚠️  Model is predicting {unique_preds} out of {NUM_CLASSES} classes")

if unique_preds == 1:
    print(f"  🚨 PROBLEM: Model is ONLY predicting '{GESTURES[y_train_pred_classes[0]]}'!")
    print("  This means the model collapsed to always predict one class.")

# Check class weight values
print("\n2️⃣ CLASS WEIGHTS USED:")
print("-" * 70)
for i, gesture in enumerate(GESTURES):
    print(f"  {gesture:8s}: {class_weights[i]:.3f}")

max_weight = max(class_weights.values())
min_weight = min(class_weights.values())
weight_ratio = max_weight / min_weight

print(f"\n  Weight ratio (max/min): {weight_ratio:.2f}x")

if weight_ratio > 10:
    print("  ⚠️  EXTREME weight ratio! This can destabilize training.")

# Check actual class distribution again
print("\n3️⃣ ACTUAL CLASS DISTRIBUTION (training data):")
print("-" * 70)
for i, gesture in enumerate(GESTURES):
    count = np.sum(y_train == i)
    pct = count / len(y_train) * 100
    print(f"  {gesture:8s}: {count:4d} samples ({pct:5.1f}%)")

# Check for extreme imbalance
class_counts = [np.sum(y_train == i) for i in range(NUM_CLASSES)]
max_class_count = max(class_counts)
min_class_count = min(class_counts)
imbalance_ratio = max_class_count / min_class_count

print(f"\n  Imbalance ratio (max/min): {imbalance_ratio:.2f}x")

if imbalance_ratio > 30:
    print("  🚨 SEVERE IMBALANCE! The rarest class has less than 1/30 as many samples as the largest.")
    print("  Recommendation: Collect more data for rare classes.")
elif imbalance_ratio > 10:
    print("  ⚠️  SIGNIFICANT IMBALANCE. Class weights may need adjustment.")

print("\n" + "="*70)
print("RECOMMENDATIONS")
print("="*70)

if weight_ratio > 10:
    print("\n✅ FIX #1: Use Softer Class Weights")
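    print("   The class weight cell already applies this when imbalance > 10x;")
    print("   if the weight ratio is still above 10x, also try FIX #2.")
    print("   The softened calculation is:")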
    print("   ```")
    print("   # Softer class weights (less extreme)")
    print("   class_weights_array = compute_class_weight(")
    print("       'balanced', classes=np.unique(y_train), y=y_train")
    print("   )")
    print("   # Apply square root to soften weights")
    print("   class_weights_array = np.sqrt(class_weights_array)")
    print("   class_weights = dict(enumerate(class_weights_array))")
    print("   ```")

if imbalance_ratio > 20:
    print("\n✅ FIX #2: Try Training Without Class Weights First")
    print("   The imbalance might not be as bad as it looks.")
    print("   Comment out the class_weight parameter:")
    print("   ```")
    print("   history = model.fit(")
    print("       X_train, y_train,")
    print("       validation_data=(X_val, y_val),")
    print("       # class_weight=class_weights,  # Try without this")
    print("       epochs=100,")
    print("       ...")
    print("   ```")

print("\n✅ FIX #3: Check Your Data Quality")
print("   Run the evaluation cells below to see the confusion matrix.")
print("   EarlyStopping restores the best (lowest val_loss) epoch, so if that early epoch")
print("   scores well on the test set, the model may still be usable.")

print("\n" + "="*70)
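
The weight ratio in step 2 and the imbalance ratio in step 3 are directly related. scikit-learn's `'balanced'` rule gives class c the weight n_samples / (n_classes * count_c), so the ratio of the largest to smallest weight equals the ratio of the largest to smallest class count, and taking the square root turns a ratio r into √r. The counts below are hypothetical, chosen to give the 400x imbalance mentioned in the training cell.

```python
import numpy as np


def balanced_weights(counts):
    """scikit-learn's documented 'balanced' rule: n_samples / (n_classes * count_c)."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)


# Hypothetical training counts for jump, punch, turn, walk, noise (4000 / 10 = 400x imbalance)
counts = [10, 40, 50, 100, 4000]
w = balanced_weights(counts)
w_soft = np.sqrt(w)  # the softening used in the class weight cell

print("balanced weights:", w.round(3))
print("softened weights:", w_soft.round(3))
print(f"balanced ratio: {w.max() / w.min():.1f}x")
print(f"softened ratio: {w_soft.max() / w_soft.min():.1f}x")
```

So a weight ratio above 10x after softening means the underlying class imbalance is above 100x.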

## 7. Evaluate Model

In [None]:
# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
ax1.plot(history.history['accuracy'], label='Train')
ax1.plot(history.history['val_accuracy'], label='Validation')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Accuracy')
ax1.set_title('Model Accuracy')
ax1.legend()
ax1.grid(True)

# Loss
ax2.plot(history.history['loss'], label='Train')
ax2.plot(history.history['val_loss'], label='Validation')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Loss')
ax2.set_title('Model Loss')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()

In [None]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"\n📊 Test Accuracy: {test_accuracy*100:.2f}%")
print(f"   Test Loss: {test_loss:.4f}")

# Predictions
y_pred = model.predict(X_test, verbose=0)
y_pred_classes = np.argmax(y_pred, axis=1)

# Classification report
print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(y_test, y_pred_classes, labels=list(range(NUM_CLASSES)),
                            target_names=GESTURES, zero_division=0))

In [None]:
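# Confusion matrix (fixed label order so rows and columns always match GESTURES)
cm = confusion_matrix(y_test, y_pred_classes, labels=list(range(NUM_CLASSES)))
row_sums = cm.sum(axis=1, keepdims=True)
# Row-normalize; a gesture absent from the test set gets a row of zeros instead of NaN
cm_normalized = np.divide(cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums > 0)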

plt.figure(figsize=(10, 8))
sns.heatmap(cm_normalized, annot=True, fmt='.2f', cmap='Blues',
            xticklabels=GESTURES, yticklabels=GESTURES)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix (Normalized)')
plt.tight_layout()
plt.show()
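
With imbalanced classes, overall test accuracy can look high while a rare gesture is mostly missed. The diagonal of the row-normalized matrix above is each gesture's recall, and its mean is the balanced accuracy. The sketch uses a hypothetical 3-class matrix; for classes that appear in the test labels, `sklearn.metrics.balanced_accuracy_score(y_test, y_pred_classes)` computes the same mean.

```python
import numpy as np


def per_class_recall(cm):
    """Diagonal of the row-normalized confusion matrix; rows with no true samples give 0."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1)
    return np.divide(np.diag(cm), row_sums, out=np.zeros(len(cm)), where=row_sums > 0)


# Hypothetical counts: rows = true class, columns = predicted class
cm_demo = np.array([
    [90, 5, 5],
    [8, 2, 0],
    [1, 0, 9],
])

recall = per_class_recall(cm_demo)
overall_accuracy = np.trace(cm_demo) / cm_demo.sum()
balanced_accuracy = recall.mean()

print("recall per class:", recall.round(2))
print(f"overall accuracy:  {overall_accuracy:.3f}")
print(f"balanced accuracy: {balanced_accuracy:.3f}")
```

Here overall accuracy is about 84% even though the second class is recalled only 20% of the time; the balanced accuracy of about 67% exposes that gap.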

## 8. Save Trained Model

In [None]:
# Save to Google Drive
model_save_path = '/content/drive/MyDrive/silksong_data/cnn_lstm_gesture.h5'
model.save(model_save_path)

print(f"✅ Model saved to: {model_save_path}")
print("\nDownload this file to your local project and place it in the 'models/' directory")
print("Then run: python src/udp_listener_v3.py")

## ✅ Training Complete!

### Next Steps:

1. **Download the trained model:**
   - Right-click on the file in Google Drive: `silksong_data/cnn_lstm_gesture.h5`
   - Download to your local machine

2. **Place model in your project:**
   ```bash
   # Move to your project's models directory
   mv ~/Downloads/cnn_lstm_gesture.h5 /path/to/project/models/
   ```

3. **Test real-time recognition:**
   ```bash
   cd src
   python udp_listener_v3.py
   ```

4. **Expected performance (rough targets; measure on your own hardware and data):**
   - Latency: 10-30ms per prediction
   - Accuracy: 90-98%
   - Much faster than Phase IV SVM model!
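
This notebook does not show how `udp_listener_v3.py` assembles its inputs. For predictions to match training conditions, the live input presumably needs the same window length, stride, feature order and any preprocessing applied here. The class below is a hypothetical sketch of a rolling buffer that emits a `(1, 25, 10)` batch every 12 new samples; `model.predict` appears only in a comment.

```python
from collections import deque

import numpy as np


class StreamingWindower:
    """Hypothetical helper: buffers per-sample feature vectors and returns a
    model-ready batch of shape (1, window_size, num_features) every `stride` samples."""

    def __init__(self, window_size=25, stride=12, num_features=10):
        self.window_size = window_size
        self.stride = stride
        self.num_features = num_features
        self.buffer = deque(maxlen=window_size)  # oldest samples drop off automatically
        self.since_last = 0

    def push(self, sample):
        sample = np.asarray(sample, dtype=np.float32)
        if sample.shape != (self.num_features,):
            raise ValueError(f"expected {self.num_features} features, got shape {sample.shape}")
        self.buffer.append(sample)
        self.since_last += 1
        if len(self.buffer) == self.window_size and self.since_last >= self.stride:
            self.since_last = 0
            return np.stack(list(self.buffer))[np.newaxis, ...]
        return None


# Usage with dummy samples whose values equal their sample index
windower = StreamingWindower()
emitted = []
for t in range(60):
    batch = windower.push(np.full(10, t))
    if batch is not None:
        emitted.append(batch)  # e.g. probs = model.predict(batch, verbose=0)
```

Emitting after samples 25, 37, 49, ... mirrors `create_windows()`, whose windows start at 0, 12, 24, ... and therefore end at the same points.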

---

**Questions or issues?** Check the documentation in `docs/Phase_V/README.md`