# Phase 5: Bi-Directional LSTM Model Training

## Why LSTM for Mudra Recognition?

**Problem:** A mudra is a temporal sequence: the hand shape changes over time.

**Why not a CNN alone?** A frame-level CNN looks at one frame at a time (spatial features only) and ignores temporal context.

**Why LSTM?**
1. **Temporal Modeling**: Processes sequence of frames, understands hand movement
2. **Long-range Dependencies**: Captures mudra transitions and sustained poses
3. **Bidirectional**: Processes frames forward AND backward for better context
4. **Suitable for Review**: Simple, interpretable, well-established for sequence classification
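
To make the "forward AND backward" point concrete, here is a minimal sketch that uses a toy scalar recurrence in place of a real LSTM cell. The function names and the weights `w` and `u` are hypothetical and chosen only for illustration. The sketch shows how a bidirectional layer pairs, at every frame, a state summarizing the past with a state summarizing the future.

```python
import math

def run_recurrence(frames, w=0.5, u=0.5):
    """Toy scalar recurrence h_t = tanh(w * x_t + u * h_{t-1}); not a real LSTM."""
    h, states = 0.0, []
    for x in frames:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def run_bidirectional(frames):
    """Pair each frame's forward state (past context) with its backward state (future context)."""
    forward = run_recurrence(frames)
    # Run over the reversed sequence, then flip back so index t lines up with frame t.
    backward = run_recurrence(frames[::-1])[::-1]
    return list(zip(forward, backward))

frames = [0.0, 0.0, 1.0, 0.0, 0.0]  # a single "event" at frame 2
states = run_bidirectional(frames)
for t, (fwd, bwd) in enumerate(states):
    print(f"frame {t}: forward={fwd:.3f} backward={bwd:.3f}")
```

At frame 0 the forward state is still zero because nothing has happened yet, while the backward state is already nonzero because it has seen the later event. A unidirectional model would only have the forward half.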

## Model Architecture

- **Input:** (batch, 25 frames, 63 keypoint features)
- **Bi-LSTM:** 64 units, forward + backward processing
- **Dense layers:** 32 → output
- **Output:** Softmax over 2 classes (Pataka, Tripataka)
- **Loss:** Categorical cross-entropy
- **Optimizer:** Adam
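
The output and loss choices above can be illustrated with a small worked example. This is a sketch in plain Python, not the code Keras runs internally. The logits are hypothetical values, and the class order (Pataka, Tripataka) is assumed. For a single sample, categorical cross-entropy is `-sum(y_i * log(p_i))` over the one-hot label `y` and the softmax probabilities `p`.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

logits = [2.0, 0.0]  # hypothetical raw scores for (Pataka, Tripataka)
probs = softmax(logits)
loss_if_pataka = categorical_cross_entropy([1, 0], probs)
loss_if_tripataka = categorical_cross_entropy([0, 1], probs)
print(probs, loss_if_pataka, loss_if_tripataka)
```

With these logits the model favors the first class, so the loss is small (about 0.127) when the true label is the first class and large (about 2.127) when it is the second.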

## What We'll Do

1. Build the Bi-LSTM model
2. Load prepared training data
3. Train with early stopping
4. Plot training curves
5. Evaluate on validation set
6. Save the model

In [None]:
# Setup and imports
import sys
import json
import warnings
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

# Set base paths (defined once so the src/ path and data paths stay in sync)
BASE_DIR = Path('/Users/vidanadheera/Documents/SEM - 6/CV/Mudra_recognition_new')
DATA_DIR = BASE_DIR / 'data' / 'prepared'
MODELS_DIR = BASE_DIR / 'models'

# Make the project's src/ directory importable before importing from it
sys.path.insert(0, str(BASE_DIR / 'src'))
from model import build_lstm_model, train_model, evaluate_model, plot_training_history, save_model

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

MODELS_DIR.mkdir(parents=True, exist_ok=True)

print("=" * 60)
print("LSTM MODEL TRAINING")
print("=" * 60 + "\n")

## Section 1: Load Prepared Training Data

Load the windowed temporal sequences and labels from Phase 4.
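
One detail worth spelling out is why the index-to-label mapping is rebuilt with `int(k)` after loading. JSON object keys are always strings, so integer keys written to `metadata.json` come back as `"0"`, `"1"`, and so on. The sketch below uses a hypothetical metadata dictionary that mirrors the fields read in this notebook. The specific index assigned to each class is an assumption.

```python
import json

# Hypothetical metadata, mirroring the fields read in this notebook.
metadata = {
    "label_to_idx": {"Pataka": 0, "Tripataka": 1},
    "idx_to_label": {0: "Pataka", 1: "Tripataka"},
    "num_classes": 2,
    "window_size": 25,
    "num_features": 63,
}

# Round-trip through JSON, as happens when the file is written and read back.
loaded = json.loads(json.dumps(metadata))
print(list(loaded["idx_to_label"].keys()))  # the keys are now strings

# Rebuild integer keys so a predicted class index can be looked up directly.
idx_to_label = {int(k): v for k, v in loaded["idx_to_label"].items()}
predicted_index = 1
print(idx_to_label[predicted_index])
```

Without the conversion, looking up an integer class index from `argmax` would raise a `KeyError`.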

In [None]:
# Load data
X_train = np.load(str(DATA_DIR / 'X_train.npy'))
X_val = np.load(str(DATA_DIR / 'X_val.npy'))
y_train = np.load(str(DATA_DIR / 'y_train.npy'))
y_val = np.load(str(DATA_DIR / 'y_val.npy'))

# Load metadata
with open(str(DATA_DIR / 'metadata.json'), 'r') as f:
    metadata = json.load(f)

label_to_idx = metadata['label_to_idx']
idx_to_label = {int(k): v for k, v in metadata['idx_to_label'].items()}
num_classes = metadata['num_classes']
window_size = metadata['window_size']
num_features = metadata['num_features']

print("Training data loaded:")
print(f"  X_train: {X_train.shape} - (samples, time_steps, features)")
print(f"  X_val: {X_val.shape}")
print(f"  y_train: {y_train.shape} - (samples, classes)")
print(f"  y_val: {y_val.shape}")
print("\nMetadata:")
print(f"  Classes: {list(label_to_idx.keys())}")
print(f"  Window size: {window_size} frames")
print(f"  Features per frame: {num_features}")
print(f"  Model input shape: ({window_size}, {num_features})\n")

## Section 2: Build Bi-Directional LSTM Model

Build the neural network architecture for temporal mudra classification.
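
For a sense of model size, the trainable parameter count implied by the architecture can be worked out by hand. The internals of `build_lstm_model` are not shown in this notebook, so the sketch below rests on assumptions: 64 units per direction, forward and backward outputs concatenated, only the final state passed on, one bias per unit, and no layers beyond those listed in the architecture section.

```python
def lstm_params(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weight matrix plus one bias per unit."""
    return units * input_dim + units

num_features, lstm_units, dense_units, num_classes = 63, 64, 32, 2

bilstm = 2 * lstm_params(lstm_units, num_features)  # forward and backward copies
hidden = dense_params(dense_units, 2 * lstm_units)  # concatenated Bi-LSTM output feeds the dense layer
output = dense_params(num_classes, dense_units)
total = bilstm + hidden + output
print(bilstm, hidden, output, total)
```

Under these assumptions the Bi-LSTM accounts for 65,536 of the 69,730 parameters. The count does not depend on the 25-frame window length, because the same weights are reused at every time step.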

In [None]:
# Build model
input_shape = (window_size, num_features)
model = build_lstm_model(
    input_shape=input_shape,
    num_classes=num_classes,
    lstm_units=64
)

## Section 3: Train the Model

Train the LSTM on temporal windows with early stopping to prevent overfitting.
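
The early-stopping rule can be sketched independently of any framework. How `train_model` configures it is not shown here, so the `patience` and `min_delta` values and the validation losses below are hypothetical. The idea is to stop once validation loss has failed to improve for a fixed number of consecutive epochs and to remember which epoch was best.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), using 0-based epoch indices.

    Training stops once validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

val_losses = [0.69, 0.52, 0.41, 0.38, 0.40, 0.39, 0.42, 0.45]  # hypothetical values
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

Here training would stop at epoch 6, three epochs after the best validation loss at epoch 3. In practice the weights from the best epoch are usually the ones worth keeping, which is why `epochs=50` acts as an upper bound rather than a target.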

In [None]:
# Train model
print("\nStarting training...\n")
history = train_model(
    model,
    X_train, y_train,
    X_val, y_val,
    epochs=50,
    batch_size=16
)

print("\n✓ Training complete")

## Section 4: Plot Training History

Visualize the loss and accuracy curves to check convergence and spot overfitting.

In [None]:
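# Plot training history
fig = plot_training_history(history)
outputs_dir = BASE_DIR / 'outputs'
outputs_dir.mkdir(parents=True, exist_ok=True)  # savefig fails if the directory is missing
plt.savefig(str(outputs_dir / 'training_history.png'), dpi=100, bbox_inches='tight')
plt.show()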

## Section 5: Evaluate Model

Test the model on validation data and report metrics.
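
What `evaluate_model` computes internally is not shown in this notebook. As a rough illustration of the kind of metrics involved, the sketch below derives accuracy and a confusion matrix from one-hot labels and softmax outputs. The `evaluate` helper and all sample values are hypothetical. Only the `accuracy` key matches what the summary cell reads from `metrics`.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def evaluate(y_true_one_hot, y_prob, idx_to_label):
    """Compute accuracy and a confusion matrix (rows: true class, columns: predicted)."""
    n = len(idx_to_label)
    confusion = [[0] * n for _ in range(n)]
    for truth, probs in zip(y_true_one_hot, y_prob):
        confusion[argmax(truth)][argmax(probs)] += 1
    correct = sum(confusion[i][i] for i in range(n))
    return {"accuracy": correct / len(y_true_one_hot), "confusion": confusion}

idx_to_label = {0: "Pataka", 1: "Tripataka"}
y_true = [[1, 0], [1, 0], [0, 1], [0, 1]]                  # hypothetical one-hot labels
y_prob = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]  # hypothetical softmax outputs
metrics = evaluate(y_true, y_prob, idx_to_label)
print(f"accuracy: {metrics['accuracy']:.4f}")
for i, row in enumerate(metrics["confusion"]):
    print(idx_to_label[i], row)
```

With only two classes, the confusion matrix is worth reading alongside accuracy: it shows whether errors are concentrated in one direction, such as Pataka windows being mistaken for Tripataka.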

In [None]:
# Evaluate on validation set
metrics = evaluate_model(model, X_val, y_val, idx_to_label)

## Section 6: Save Trained Model

Persist the model for use in inference.

In [None]:
# Save model
model_path = MODELS_DIR / 'lstm_mudra_model.h5'
save_model(model, str(model_path))

print("=" * 60)
print("TRAINING SUMMARY")
print("=" * 60)
print("Model: Bi-Directional LSTM")
print(f"  - Input shape: {input_shape}")
print("  - LSTM units: 64 (bidirectional)")
print("  - Dense units: 32")
print(f"  - Output classes: {num_classes} {list(idx_to_label.values())}")
print("\nTraining:")
print(f"  - Training samples: {len(X_train)}")
print(f"  - Validation samples: {len(X_val)}")
print(f"  - Validation accuracy: {metrics['accuracy']:.4f}")
print(f"\nModel saved to: {model_path}\n")

## Summary: Phase 5 Complete ✓

**What we've accomplished:**
1. ✓ Built Bi-Directional LSTM model
2. ✓ Trained on properly windowed temporal sequences
3. ✓ Used categorical cross-entropy + Adam optimizer
4. ✓ Applied early stopping to prevent overfitting
5. ✓ Plotted training loss and accuracy curves
6. ✓ Evaluated on validation set
7. ✓ Saved trained model

**Key improvements over the earlier attempt:**
- ✓ Proper window-label alignment (fixed misalignment bug)
- ✓ Synthetic data generation ensures known ground truth
- ✓ Single LSTM layer suitable for review (not over-engineered)
- ✓ Clear, interpretable model architecture
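
The window-label alignment point can be illustrated with a short sketch. The actual Phase 4 windowing code is not shown in this notebook, so the `make_windows` helper, the stride of 5, and the rule of dropping windows that span two labels are all assumptions. The key idea is that each window's label is taken from the same slice as its frames, so the two cannot drift apart.

```python
def make_windows(frames, frame_labels, window_size=25, stride=5):
    """Slice frames into fixed-length windows, keeping only windows with one consistent label."""
    assert len(frames) == len(frame_labels), "need exactly one label per frame"
    windows, labels = [], []
    for start in range(0, len(frames) - window_size + 1, stride):
        span = frame_labels[start:start + window_size]
        if len(set(span)) == 1:  # skip windows that straddle a mudra transition
            windows.append(frames[start:start + window_size])
            labels.append(span[0])  # label comes from the same slice as the frames
    return windows, labels

# 40 frames of "Pataka" followed by 40 of "Tripataka"; each frame is a stand-in integer.
frames = list(range(80))
frame_labels = ["Pataka"] * 40 + ["Tripataka"] * 40
windows, labels = make_windows(frames, frame_labels)
print(len(windows), labels)
```

In this example, windows starting at frames 20 through 35 are dropped because they mix both labels, leaving four clean windows per class.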

**Next steps (Phase 6):**
- Load trained model
- Extract keypoints from test video
- Create temporal windows
- Make end-to-end predictions
- Generate JSON output with mudra sequences
- Visualize results with landmarks