# Phase 4: Temporal Windowing of Keypoint Sequences

## Why Temporal Windows?

**Batched LSTM training needs every sample to have the same number of time steps**, but videos vary in length.

**Solution:** Create overlapping temporal windows from the keypoint sequences
- **Window size**: 25 frames (1 second @ 25 FPS) - captures mudra shape over time
- **Step size**: 5 frames - consecutive windows overlap by 20 frames, which yields more training samples per video
- **Label alignment is critical** - every window must carry the label of the video it was cut from; misaligned windows cause wrong predictions
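
The windowing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation in the project's `windowing` module; the function name `sliding_windows` and the synthetic 100-frame sequence are assumptions made for the example.

```python
import numpy as np


def sliding_windows(sequence, window_size=25, step_size=5):
    """Hypothetical helper: cut one (frames, 21, 3) sequence into overlapping windows."""
    num_frames = len(sequence)
    if num_frames < window_size:
        # Too short for even one full window: return an empty batch.
        return np.empty((0, window_size) + sequence.shape[1:], dtype=sequence.dtype)
    # Window i starts at frame i * step_size; only full windows are kept.
    starts = range(0, num_frames - window_size + 1, step_size)
    return np.stack([sequence[s:s + window_size] for s in starts])


# Synthetic stand-in for one video: 100 frames, 21 landmarks, 3 coordinates.
demo_sequence = np.zeros((100, 21, 3), dtype=np.float32)
demo_windows = sliding_windows(demo_sequence)
print(demo_windows.shape)  # (16, 25, 21, 3), i.e. (100 - 25) // 5 + 1 windows
```

Trailing frames that do not fill a complete window are dropped in this sketch rather than padded.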

## What We'll Do

1. Load extracted keypoints from training videos
2. Create overlapping temporal windows (25 frames each)
3. Create corresponding labels for each window
4. Reshape for LSTM input (batch_size, time_steps, features)
5. Verify window integrity and label alignment
6. Split into train/validation sets

In [None]:
# Setup and imports
import sys
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Set base paths
BASE_DIR = Path('/Users/vidanadheera/Documents/SEM - 6/CV/Mudra_recognition_new')
KEYPOINTS_DIR = BASE_DIR / 'keypoints'

# Make the project's src/ directory importable before importing from it
sys.path.insert(0, str(BASE_DIR / 'src'))
from windowing import create_temporal_windows, create_labeled_windows, reshape_for_lstm, verify_window_integrity

print("=" * 60)
print("TEMPORAL WINDOWING AND DATA PREPARATION")
print("=" * 60 + "\n")
print(f"Loading keypoints from: {KEYPOINTS_DIR}\n")

## Section 1: Load Keypoint Data

Load pre-extracted keypoints from Phase 3.

In [None]:
# Load keypoints
pataka_keypoints = np.load(str(KEYPOINTS_DIR / 'pataka_keypoints.npy'))
tripataka_keypoints = np.load(str(KEYPOINTS_DIR / 'tripataka_keypoints.npy'))

print("Loaded keypoints:")
print(f"  Pataka: {pataka_keypoints.shape}")
print(f"  Tripataka: {tripataka_keypoints.shape}")
print()

# Verify shapes: each array should be (num_frames, 21 landmarks, 3 coordinates)
for name, keypoints in [('Pataka', pataka_keypoints), ('Tripataka', tripataka_keypoints)]:
    assert keypoints.ndim == 3, f"{name}: expected a 3-D array (frames, landmarks, coordinates)"
    assert keypoints.shape[1] == 21, f"{name}: expected 21 landmarks"
    assert keypoints.shape[2] == 3, f"{name}: expected 3 coordinates (x,y,z)"

print("✓ Keypoint shapes verified\n")

## Section 2: Create Temporal Windows with Labels

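This is the critical step for avoiding label misalignment: windows are built per class from `sequences_dict`, so every window should come from a single video and carry that video's label.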
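
To make the alignment idea concrete, here is a small sketch of one way to build labeled windows so that no window can span two classes. It is not the code behind `create_labeled_windows`; the helper name `make_labeled_windows`, its return values, and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def make_labeled_windows(sequences_by_class, window_size=25, step_size=5):
    """Hypothetical helper: window each sequence on its own and tag every window."""
    label_to_idx = {name: i for i, name in enumerate(sorted(sequences_by_class))}
    windows, labels, metadata = [], [], []
    for name, sequences in sequences_by_class.items():
        for seq_id, seq in enumerate(sequences):
            # Windows are cut inside a single sequence, so they never mix classes.
            for start in range(0, len(seq) - window_size + 1, step_size):
                windows.append(seq[start:start + window_size])
                labels.append(label_to_idx[name])
                metadata.append({"label": name, "sequence": seq_id, "start": start})
    if not windows:
        raise ValueError("No sequence is long enough for a single window")
    return np.stack(windows), np.array(labels), label_to_idx, metadata


# Synthetic data: every Pataka frame is filled with 0.0, every Tripataka frame with 1.0,
# so a window's contents reveal which class it was really cut from.
demo_sequences = {
    "Pataka": [np.full((60, 21, 3), 0.0)],
    "Tripataka": [np.full((40, 21, 3), 1.0)],
}
demo_windows, demo_labels, demo_map, demo_meta = make_labeled_windows(demo_sequences)

# Alignment check: the values inside each window must match its label index.
aligned = all(np.all(w == l) for w, l in zip(demo_windows, demo_labels))
print(demo_windows.shape, demo_map, aligned)
```

Filling each class with a distinct constant is only a testing trick; with real keypoints, the per-window metadata (label, sequence, start frame) is what allows a window to be traced back to its source.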

In [None]:
# Window parameters
FPS = 25          # frame rate assumed for the source videos
WINDOW_SIZE = 25  # frames (1 second @ 25 fps)
STEP_SIZE = 5     # frames


def expected_window_count(num_frames):
    """Number of full windows that fit in a sequence (0 if it is too short)."""
    return max(0, (num_frames - WINDOW_SIZE) // STEP_SIZE + 1)


print("Window Parameters:")
print(f"  Window size: {WINDOW_SIZE} frames ({WINDOW_SIZE / FPS:.2f} seconds)")
print(f"  Step size: {STEP_SIZE} frames")
print(f"  Expected window count (Pataka): {expected_window_count(len(pataka_keypoints))}")
print(f"  Expected window count (Tripataka): {expected_window_count(len(tripataka_keypoints))}\n")

# Create labeled windows
sequences_dict = {
    'Pataka': [pataka_keypoints],
    'Tripataka': [tripataka_keypoints]
}

windows, labels, label_to_idx, idx_to_label, window_metadata = create_labeled_windows(
    sequences_dict,
    window_size=WINDOW_SIZE,
    step_size=STEP_SIZE
)

print(f"Labeled windows created:")
print(f"  Total windows: {len(windows)}")
print(f"  Windows shape: {windows.shape}")
print(f"  Labels shape: {labels.shape}")
print(f"  Label mapping: {label_to_idx}\n")

# Verify window integrity
verify_window_integrity(windows, labels, window_metadata)

## Section 3: Reshape for LSTM Input

LSTM expects input shape: (batch_size, time_steps, features)

We need to flatten the landmark and coordinate axes into a single feature dimension, so each frame's 21 landmarks × 3 coordinates become one 63-value feature vector.
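
The flattening step can be illustrated with a plain NumPy reshape. This sketch assumes a row-major reshape, which may or may not be what `reshape_for_lstm` does internally; the array names are made up for the example.

```python
import numpy as np

# Two synthetic windows of shape (25 frames, 21 landmarks, 3 coordinates).
windows_demo = np.arange(2 * 25 * 21 * 3, dtype=np.float32).reshape(2, 25, 21, 3)

# Keep the sample and time axes, merge landmarks and coordinates into one axis.
X_demo = windows_demo.reshape(windows_demo.shape[0], windows_demo.shape[1], -1)
print(X_demo.shape)  # (2, 25, 63)

# With a row-major reshape, feature index = landmark * 3 + coordinate.
landmark, coord = 5, 2
same_value = X_demo[0, 0, landmark * 3 + coord] == windows_demo[0, 0, landmark, coord]
print(same_value)  # True
```

Under this layout the x, y, z values of each landmark stay adjacent in the feature vector.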

In [None]:
# Reshape for LSTM
X = reshape_for_lstm(windows)

print(f"LSTM Input prepared:")
print(f"  Input shape: {X.shape}")
print(f"  Batch size (samples): {X.shape[0]}")
print(f"  Time steps (frames per window): {X.shape[1]}")
print(f"  Features (landmarks × coordinates): {X.shape[2]}")
print(f"  Calculation: 21 landmarks × 3 coordinates = 63 features\n")

# Convert labels to one-hot encoding
num_classes = len(label_to_idx)
y = to_categorical(labels, num_classes=num_classes)

print(f"Labels one-hot encoded:")
print(f"  Shape: {y.shape}")
print(f"  Classes: {num_classes}")
print(f"  Class names: {idx_to_label}\n")
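
For readers unfamiliar with one-hot encoding, the sketch below shows the same idea in plain NumPy on a toy label vector. It only illustrates the encoding and does not replace `to_categorical`; the variable names are assumptions.

```python
import numpy as np

labels_demo = np.array([0, 1, 1, 0])  # integer class indices
num_classes_demo = 2

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes_demo, dtype=np.float32)[labels_demo]
print(one_hot)

# argmax along the class axis recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered)  # [0 1 1 0]
```

The argmax round trip is also how predicted class probabilities are usually mapped back to class names through `idx_to_label`.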

## Section 4: Train-Validation Split

Split the windows 80/20 into training and validation sets, stratified by class so both sets keep the same class proportions.
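
One caveat worth keeping in mind: with a 5-frame step and 25-frame windows, neighbouring windows share up to 20 frames, so a purely random split can place near-identical windows in both sets and make validation accuracy look better than it really is. A possible alternative, sketched here under stated assumptions, is to split each sequence's windows chronologically and drop the few windows that straddle the boundary. The helper `chronological_split` is hypothetical and is not part of this notebook's pipeline.

```python
import math

import numpy as np


def chronological_split(num_windows, window_size=25, step_size=5, val_fraction=0.2):
    """Hypothetical helper: split one sequence's window indices by time.

    Windows i and j share frames when |i - j| * step_size < window_size, so
    ceil(window_size / step_size) - 1 windows are dropped between the two sets.
    """
    gap = math.ceil(window_size / step_size) - 1
    if num_windows < gap + 2:
        raise ValueError("Sequence too short to split without overlapping frames")
    num_val = max(1, int(round(num_windows * val_fraction)))
    val_start = num_windows - num_val
    train_idx = np.arange(0, max(0, val_start - gap))
    val_idx = np.arange(val_start, num_windows)
    return train_idx, val_idx


train_idx, val_idx = chronological_split(num_windows=50)

# Frame ranges: window i covers frames [i * 5, i * 5 + 25).
last_train_end = train_idx[-1] * 5 + 25
first_val_start = val_idx[0] * 5
print(len(train_idx), len(val_idx), last_train_end, first_val_start)  # 36 10 200 200
```

This trades a handful of samples for a validation set that shares no frames with the training set; it would be applied per sequence (and therefore per class) before concatenating the results.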

In [None]:
# Split into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=labels  # Preserve class proportions in both splits
)

print("Train-Validation Split:")
print(f"  Training samples: {len(X_train)}")
print(f"  Validation samples: {len(X_val)}")
print(f"  Total: {len(X_train) + len(X_val)}")
print(f"  Split ratio: {100 * len(X_train) / len(X):.0f}% train / {100 * len(X_val) / len(X):.0f}% val\n")

# Class distribution
train_dist = np.sum(y_train, axis=0)
val_dist = np.sum(y_val, axis=0)

print("Class distribution:")
print(f"  Training:")
for i, name in idx_to_label.items():
    print(f"    {name}: {int(train_dist[i])} ({100*train_dist[i]/len(X_train):.1f}%)")
print(f"  Validation:")
for i, name in idx_to_label.items():
    print(f"    {name}: {int(val_dist[i])} ({100*val_dist[i]/len(X_val):.1f}%)")
print()

## Section 5: Save Prepared Data

Save the prepared data for use in the training notebook.
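
As a rough sketch of how the training notebook might read these files back, the example below writes a tiny synthetic array and metadata file to a temporary directory and reloads them. The file names mirror the ones used in this section, but the array contents and the temporary directory are assumptions for the demo.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)

    # Write a small synthetic array and a metadata file in the same layout.
    np.save(demo_dir / "X_train.npy", np.zeros((4, 25, 63), dtype=np.float32))
    with open(demo_dir / "metadata.json", "w") as f:
        json.dump(
            {
                "window_size": 25,
                "num_features": 63,
                "idx_to_label": {"0": "Pataka", "1": "Tripataka"},
            },
            f,
        )

    # Reload, as the training notebook would.
    X_loaded = np.load(demo_dir / "X_train.npy")
    with open(demo_dir / "metadata.json") as f:
        meta = json.load(f)

# JSON object keys are always strings, so convert them back to integers.
idx_to_label_loaded = {int(k): v for k, v in meta["idx_to_label"].items()}

# Sanity check: the array's trailing dimensions should match the metadata.
shapes_match = X_loaded.shape[1:] == (meta["window_size"], meta["num_features"])
print(X_loaded.shape, idx_to_label_loaded, shapes_match)
```

Checking the loaded shape against the metadata catches the case where the arrays and `metadata.json` come from different runs.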

In [None]:
# Save data for training
data_dir = BASE_DIR / 'data' / 'prepared'
data_dir.mkdir(parents=True, exist_ok=True)

np.save(str(data_dir / 'X_train.npy'), X_train)
np.save(str(data_dir / 'X_val.npy'), X_val)
np.save(str(data_dir / 'y_train.npy'), y_train)
np.save(str(data_dir / 'y_val.npy'), y_val)

# Also save metadata
import json
metadata = {
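    # Cast to built-in int in case the indices are NumPy integers, which json cannot serialize
    'label_to_idx': {k: int(v) for k, v in label_to_idx.items()},
    'idx_to_label': {str(k): v for k, v in idx_to_label.items()},
    'window_size': WINDOW_SIZE,
    'step_size': STEP_SIZE,
    'num_features': int(X.shape[2]),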
    'num_classes': num_classes
}

with open(str(data_dir / 'metadata.json'), 'w') as f:
    json.dump(metadata, f, indent=2)

print(f"Data saved to {data_dir}")
print(f"  - X_train.npy: {X_train.shape}")
print(f"  - X_val.npy: {X_val.shape}")
print(f"  - y_train.npy: {y_train.shape}")
print(f"  - y_val.npy: {y_val.shape}")
print(f"  - metadata.json\n")

## Summary: Phase 4 Complete ✓

**What we've accomplished:**
1. ✓ Loaded extracted keypoint sequences
2. ✓ Created overlapping temporal windows (25 frames, step 5)
3. ✓ Aligned every window with its label (critical for correctness)
4. ✓ Reshaped for LSTM input: (batch, time_steps=25, features=63)
5. ✓ Split into stratified train/validation sets
6. ✓ Saved prepared data

**Key insight:**
- Proper windowing and label alignment are critical
- Poor alignment causes the model to learn incorrect mudra-label mappings
- This was likely the cause of wrong predictions in the previous attempt

**Data ready for training:**
- X_train/X_val: (num_samples, 25, 63) - temporal windows of hand keypoints
- y_train/y_val: (num_samples, 2) - one-hot encoded labels

**Next steps (Phase 5):**
- Build and train a bidirectional LSTM
- Monitor training/validation loss and accuracy
- Save the trained model