# ASL Sign Language Detection — Training & Testing Guide

This notebook walks you through the full pipeline:
1. **Download** the ASL Alphabet dataset from Kaggle
2. **Extract** hand landmarks using MediaPipe
3. **Train** a Random Forest classifier
4. **Evaluate** model performance
5. **Predict** on a single image

### Running this notebook
You can run this notebook **locally** or use **Docker** for the heavy-lifting steps (extract/train).
Docker and local share the same `data/` and `models/` folders via volume mounts, so you can mix both.

> **Note:** Real-time webcam recognition must be run locally (not in a notebook or Docker).
> After training, use `python main.py realtime` from your terminal.

## 0. Setup

Make sure you have installed the project dependencies:
```bash
pip install -r requirements.txt
```

**If you used Docker** for download/extract/train, the results are already in your local `data/` and `models/` folders (Docker mounts them via volumes). You can skip straight to the evaluation steps.

In [None]:
import os
import sys

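# Ensure the project root is on the path. If main.py is already in the working
# directory (e.g. this cell was re-run), stay put instead of climbing again.
cwd = os.getcwd()
if os.path.exists(os.path.join(cwd, 'main.py')):
    PROJECT_ROOT = cwd
else:
    PROJECT_ROOT = os.path.abspath(os.path.join(cwd, '..'))
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)
os.chdir(PROJECT_ROOT)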

print(f'Project root: {PROJECT_ROOT}')

## 1. Download the Dataset

The dataset is the [ASL Alphabet](https://www.kaggle.com/datasets/grassknoted/asl-alphabet) from Kaggle (~1GB).

**Prerequisites (one of):**
- Place `kaggle.json` at `~/.kaggle/kaggle.json` (download from https://www.kaggle.com/settings → Create New Token)
- Or set the environment variable: `export KAGGLE_API_TOKEN=your_token_here`
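
Before starting a ~1GB download, it can help to confirm that one of the two credential sources above is actually in place. The helper below is a minimal sketch: `kaggle_credentials_available` is a hypothetical name, not part of this project or of the Kaggle client, and it only checks that the file or variable exists, not that the token is valid.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None, home=None):
    """Return True if either credential source described above is present.

    Hypothetical helper: `env` and `home` are injectable for testing only.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    has_token_file = (home / ".kaggle" / "kaggle.json").is_file()
    has_env_token = bool(env.get("KAGGLE_API_TOKEN"))
    return has_token_file or has_env_token


if not kaggle_credentials_available():
    print("No Kaggle credentials found; the download step will likely fail.")
```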

**Already have the data?** If you downloaded via Docker (`docker compose run --rm download`), the data is already in `data/` — skip this step.

In [None]:
# Check if dataset already exists
DATA_DIR = 'data/asl_alphabet_train/asl_alphabet_train'

if os.path.exists(DATA_DIR):
    classes = sorted([d for d in os.listdir(DATA_DIR) if os.path.isdir(os.path.join(DATA_DIR, d))])
    print(f'Dataset already downloaded! Found {len(classes)} classes: {classes}')
else:
    print('Dataset not found. Run the cell below to download it.')

In [None]:
# Download the dataset (only run if not already present)
if not os.path.exists(DATA_DIR):
    !python main.py download
else:
    print('Skipping download — dataset already exists.')

### Explore the Dataset

Let's look at a few sample images from different classes.

In [None]:
import cv2
import matplotlib.pyplot as plt
import random

classes = sorted([d for d in os.listdir(DATA_DIR) if os.path.isdir(os.path.join(DATA_DIR, d))])

# Show one random sample from 8 random classes
sample_classes = random.sample(classes, min(8, len(classes)))

fig, axes = plt.subplots(2, 4, figsize=(16, 8))
for ax, cls in zip(axes.flat, sample_classes):
    cls_dir = os.path.join(DATA_DIR, cls)
    imgs = [f for f in os.listdir(cls_dir) if f.lower().endswith(('.jpg', '.png'))]
    img_path = os.path.join(cls_dir, random.choice(imgs))
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    ax.imshow(img)
    ax.set_title(f'Class: {cls}', fontsize=14, fontweight='bold')
    ax.axis('off')

plt.suptitle('Sample Images from ASL Alphabet Dataset', fontsize=16)
plt.tight_layout()
plt.show()

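# Count images per class (image files only, matching the filter used above)
print(f'\nTotal classes: {len(classes)}')
for cls in classes:
    files = os.listdir(os.path.join(DATA_DIR, cls))
    count = sum(f.lower().endswith(('.jpg', '.png')) for f in files)
    print(f'  {cls}: {count} images')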

## 2. Extract Features

This step uses MediaPipe to detect hand landmarks in each image, then computes:
- **63 base features**: 21 hand landmarks × 3 coordinates (x, y, z)
- **54 engineered features**: distances, angles, z-depth comparisons, etc.

The result is saved to `data/landmarks.pkl`.
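
To make the feature layout concrete, here is a small sketch of how 21 landmarks become 63 base features, plus one family of distance features. It is an illustration under assumptions, not the code in `src/extract_features.py`: the function names are hypothetical, the hand is synthetic, and the thumb-to-fingertip distances are just one plausible example of an engineered feature. The fingertip indices (4, 8, 12, 16, 20) follow the MediaPipe landmark ordering.

```python
import math

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand
THUMB_TIP = 4
OTHER_FINGERTIPS = (8, 12, 16, 20)  # index, middle, ring, pinky tips


def flatten_landmarks(landmarks):
    """Turn 21 (x, y, z) tuples into the 63 base features."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]


def thumb_to_fingertip_distances(landmarks):
    """Hypothetical engineered features: thumb tip to each other fingertip."""
    return [math.dist(landmarks[THUMB_TIP], landmarks[i]) for i in OTHER_FINGERTIPS]


# Synthetic hand, used only to show the shapes involved
hand = [(i * 0.01, i * 0.02, 0.0) for i in range(NUM_LANDMARKS)]
base = flatten_landmarks(hand)
extra = thumb_to_fingertip_distances(hand)
print(f"{len(base)} base features, {len(extra)} distance features")
```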

⏱ **Time estimate:**
- Full dataset (~3000 images/class): ~15-20 minutes
- Quick test (200 images/class): ~2-3 minutes

In [None]:
from src.extract_features import extract_features_from_dataset

FEATURES_PATH = 'data/landmarks.pkl'

# Set to None for full dataset, or a number like 200 for a quick test
SAMPLE_PER_CLASS = None  # Change to 200 for a quick run

if os.path.exists(FEATURES_PATH):
    print(f'Features already extracted at {FEATURES_PATH}.')
    print('Delete the file and re-run this cell to re-extract.')
else:
    features, labels = extract_features_from_dataset(
        DATA_DIR, FEATURES_PATH, sample_per_class=SAMPLE_PER_CLASS
    )
    print(f'\nDone! Extracted {len(features)} samples with {features.shape[1]} features each.')

## 3. Train the Model

We train a **Random Forest** classifier (100 trees) on the extracted features.
The model is saved to `models/asl_classifier.pkl`.
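
If you are curious what such a training step looks like, the sketch below fits a 100-tree Random Forest on synthetic data and round-trips it through pickle. It is not the contents of `src/train.py`: the data is random, and the split parameters and the `{'classifier': ...}` dictionary layout are assumptions taken from the evaluation cell later in this notebook.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted landmarks: 3 classes, 117 features each
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 117)) * 3
features = np.vstack([c + rng.normal(size=(60, 117)) for c in centers])
labels = np.repeat(np.array(['A', 'B', 'C']), 60)

# Stratified 80/20 split so every class appears in both parts
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f'Held-out accuracy on synthetic data: {accuracy:.2f}')

# Assumed storage layout: a dict with the model under the 'classifier' key
payload = pickle.dumps({'classifier': clf})
restored = pickle.loads(payload)['classifier']
```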

In [None]:
from src.train import train_classifier

MODEL_PATH = 'models/asl_classifier.pkl'

clf, accuracy = train_classifier(FEATURES_PATH, MODEL_PATH)
print(f'\n✅ Model trained with {accuracy * 100:.2f}% accuracy')
print(f'   Saved to: {MODEL_PATH}')

## 4. Evaluate Model Performance

Let's visualize the confusion matrix and feature importances.
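
Both the confusion matrix and the per-class scores in this section come from the same counts of (true, predicted) label pairs. The sketch below computes a per-class F1 score by hand on a toy example, to show what scikit-learn's report is summarizing. `per_class_f1` is a hypothetical helper written for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Compute F1 = 2*TP / (2*TP + FP + FN) for every class."""
    pair_counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = pair_counts[(cls, cls)]
        fp = sum(n for (t, p), n in pair_counts.items() if p == cls and t != cls)
        fn = sum(n for (t, p), n in pair_counts.items() if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels: one 'A' sample is misclassified as 'B'
toy_true = ['A', 'A', 'A', 'B', 'B', 'C']
toy_pred = ['A', 'A', 'B', 'B', 'B', 'C']
toy_scores = per_class_f1(toy_true, toy_pred)
print(toy_scores)
```

The single mistake lowers the score of both classes involved: `A` loses recall and `B` loses precision.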

In [None]:
import pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report

# Load features and model
with open(FEATURES_PATH, 'rb') as f:
    data = pickle.load(f)
features, labels, classes = data['features'], data['labels'], data['classes']

with open(MODEL_PATH, 'rb') as f:
    model_data = pickle.load(f)
clf = model_data['classifier']

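# Recreate the held-out split. This assumes src/train.py used the same
# test_size, random_state, and stratification; if it did not, X_test may
# overlap the training data and the scores below will be optimistic.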
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)
y_pred = clf.predict(X_test)

In [None]:
# Confusion Matrix
fig, ax = plt.subplots(figsize=(18, 15))
cm = confusion_matrix(y_test, y_pred, labels=classes)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes)
disp.plot(ax=ax, cmap='Blues', xticks_rotation=45)
ax.set_title('Confusion Matrix', fontsize=16)
plt.tight_layout()
plt.show()

In [None]:
# Per-class F1 score, sorted from worst to best
report = classification_report(y_test, y_pred, output_dict=True)

class_f1 = {cls: report[cls]['f1-score'] for cls in classes if cls in report}
sorted_acc = sorted(class_f1.items(), key=lambda x: x[1])

fig, ax = plt.subplots(figsize=(12, 8))
names, scores = zip(*sorted_acc)
colors = ['#e74c3c' if s < 0.9 else '#f39c12' if s < 0.95 else '#2ecc71' for s in scores]
ax.barh(names, scores, color=colors)
ax.set_xlim(0.5, 1.0)
ax.set_xlabel('F1 Score', fontsize=12)
ax.set_title('Per-Class F1 Score (red < 90%, yellow < 95%, green ≥ 95%)', fontsize=14)
ax.axvline(x=0.95, color='gray', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()

In [None]:
# Top 20 most important features
importances = clf.feature_importances_

# Name the features
feature_names = []
landmarks = ['Wrist','Thumb_CMC','Thumb_MCP','Thumb_IP','Thumb_Tip',
              'Index_MCP','Index_PIP','Index_DIP','Index_Tip',
              'Middle_MCP','Middle_PIP','Middle_DIP','Middle_Tip',
              'Ring_MCP','Ring_PIP','Ring_DIP','Ring_Tip',
              'Pinky_MCP','Pinky_PIP','Pinky_DIP','Pinky_Tip']
for lm in landmarks:
    for coord in ['x', 'y', 'z']:
        feature_names.append(f'{lm}_{coord}')

# Engineered feature names (simplified)
eng_names = (
    [f'thumb_to_{f}_dist' for f in ['index','middle','ring','pinky']] +
    [f'adj_dist_{i}' for i in range(4)] +
    [f'z_depth_{f}' for f in ['index','middle','ring','pinky']] +
    ['z_thumb_palm'] +
    [f'curl_{f}' for f in ['thumb','index','middle','ring','pinky']] +
    [f'spread_{i}' for i in range(4)] +
    ['thumb_idx_x','thumb_idx_y','thumb_idx_z'] +
    [f'tip_wrist_{f}' for f in ['thumb','index','middle','ring','pinky']] +
    ['cross_z'] +
    [f'thumb_pip_{f}' for f in ['index','middle','ring','pinky']] +
    [f'thumb_dip_{f}' for f in ['index','middle','ring','pinky']] +
    [f'y_drape_{f}' for f in ['index','middle','ring','pinky']] +
    ['thumb_mid_x','thumb_mid_y','thumb_mid_z'] +
    ['thumb_ring_x','thumb_ring_y','thumb_ring_z'] +
    ['drape_score','palm_plane_dist'] +
    [f'dip_curl_{f}' for f in ['index','middle','ring','pinky']]
)
feature_names.extend(eng_names)

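# Trim or pad so there is exactly one name per importance value.
# Note: the simplified list above has 55 names, one more than the 54 engineered
# features described in section 2, so treat engineered-feature labels as approximate.
feature_names = feature_names[:len(importances)]
feature_names += [f'feature_{i}' for i in range(len(feature_names), len(importances))]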

top_n = 20
indices = np.argsort(importances)[-top_n:]

fig, ax = plt.subplots(figsize=(10, 8))
ax.barh([feature_names[i] for i in indices], importances[indices], color='steelblue')
ax.set_xlabel('Feature Importance', fontsize=12)
ax.set_title(f'Top {top_n} Most Important Features', fontsize=14)
plt.tight_layout()
plt.show()

## 5. Predict on a Single Image

Test the trained model on any image of a hand sign.
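
A single predicted label hides how close the runner-up was, which matters for signs that look alike. The sketch below ranks the top candidates from a probability vector. `top_k_predictions` is a hypothetical helper and the numbers are made up.

```python
def top_k_predictions(class_labels, probabilities, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(
        zip(class_labels, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


# Made-up probabilities for four classes
example_labels = ['A', 'B', 'C', 'D']
example_probs = [0.05, 0.70, 0.20, 0.05]

for label, prob in top_k_predictions(example_labels, example_probs, k=2):
    print(f'{label}: {prob * 100:.1f}%')
```

With a fitted scikit-learn classifier, the inputs would be `clf.classes_` and one row of `clf.predict_proba(...)`.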

In [None]:
from src.extract_features import HandLandmarkExtractor

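# Pick a random image from the dataset. It may have been part of the training
# split, so treat this as a sanity check rather than an unbiased test.
test_class = random.choice(classes)
test_dir = os.path.join(DATA_DIR, test_class)
test_img_name = random.choice(
    [f for f in os.listdir(test_dir) if f.lower().endswith(('.jpg', '.png'))]
)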
test_img_path = os.path.join(test_dir, test_img_name)

print(f'Testing with: {test_img_path}')
print(f'True label: {test_class}')

# Load and predict (use a separate name so the full `features` matrix
# loaded in section 4 is not overwritten)
image = cv2.imread(test_img_path)
extractor = HandLandmarkExtractor(use_engineered_features=True)
sample_features = extractor.extract_landmarks(image)
extractor.close()

if sample_features is not None:
    sample_features = sample_features.reshape(1, -1)
    prediction = clf.predict(sample_features)[0]
    probs = clf.predict_proba(sample_features)[0]
    confidence = np.max(probs)

    # Display
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    color = 'green' if prediction == test_class else 'red'
    ax.set_title(f'Predicted: {prediction} ({confidence*100:.1f}%)\nTrue: {test_class}',
                 fontsize=14, color=color, fontweight='bold')
    ax.axis('off')
    plt.show()
else:
    print('No hand detected in this image. Try another one.')

## 6. Next Steps

Now that you have a trained model, you can:

### Real-time Webcam Recognition (run locally)
```bash
source venv/bin/activate
python main.py realtime         # Letter-by-letter mode
python main.py realtime --spell  # Spell mode (words)
```

### Docker alternative for training
If you prefer Docker, the `data/` and `models/` folders are shared via volume mounts:
```bash
docker compose run --rm extract-quick  # Extract features
docker compose run --rm train          # Train the model
```
The trained model ends up in the same `models/` folder, so you can run `realtime` locally right after.

### Improve accuracy
- Re-extract with the full dataset: keep `SAMPLE_PER_CLASS = None` above, delete `data/landmarks.pkl`, and re-run the extraction cell
- Try different classifiers in `src/train.py` (SVM, XGBoost, etc.)
- See `PLAN.md` for more ideas
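
As a starting point for comparing classifiers, the sketch below scores a Random Forest against an RBF-kernel SVM with 5-fold cross-validation. It runs on synthetic data and does not touch `src/train.py`. The candidate models and their settings are illustrative assumptions, and XGBoost is left out because it is a separate install.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the landmark features: 3 classes, 20 features
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 20)) * 3
X = np.vstack([c + rng.normal(size=(40, 20)) for c in centers])
y = np.repeat(np.array(['A', 'B', 'C']), 40)

candidates = {
    'random_forest': RandomForestClassifier(n_estimators=100, random_state=42),
    # SVMs are sensitive to feature scale, so standardize first
    'svm_rbf': make_pipeline(StandardScaler(), SVC(kernel='rbf')),
}

# Mean accuracy over 5 stratified folds for each candidate
results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {score:.3f}')
```

To try this on the real data, replace `X` and `y` with the `features` and `labels` arrays loaded from `data/landmarks.pkl` in section 4.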