# Arabic Word Training (KArSL + MediaPipe + BiLSTM)

## GPU-Optimized Word-Level Arabic Sign Language Recognition

This notebook builds a **word-level** Arabic sign language recognition model using KArSL data filtered by `shared_word_vocabulary.csv`.

### Pipeline:

1. **GPU Detection & Configuration**: memory growth, mixed precision, device verification
2. **Config & Imports**: all paths and hyper-parameters in one place
3. **Load Shared Vocabulary**: filter to matched bilingual word set
4. **KArSL Data Loading**: supports pre-extracted keypoints (.npy/.csv) or raw video (.mp4)
5. **Data Exploration**: class distribution, sample visualization
6. **Preprocessing & Splits**: StandardScaler, stratified 60/20/20 split, class weights
7. **Build & Train BiLSTM**: GPU-accelerated with tf.data pipeline
8. **Evaluation**: top-1/top-5 accuracy, confusion matrix, classification report, per-category breakdown

### Supported Input Formats:

- Pre-extracted MediaPipe keypoints (`.npy` or `.csv` files per sample)
- Raw video files (`.mp4`) with on-the-fly MediaPipe extraction
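
A minimal sketch of how the two keypoint formats might be read into one array layout. The helper name `load_keypoints` is hypothetical, and the CSV branch assumes a single header row followed by numeric columns, which this notebook does not confirm for KArSL exports.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_keypoints(path):
    """Load one sample as a 2-D float32 array of shape (frames, features)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == '.npy':
        arr = np.load(path)
    elif suffix == '.csv':
        # Assumption: one header row, then one row of numbers per frame.
        arr = np.loadtxt(path, delimiter=',', skiprows=1, ndmin=2)
    else:
        raise ValueError(f'Unsupported keypoint format: {suffix}')
    return np.asarray(arr, dtype=np.float32)


# Round-trip a synthetic 5-frame sample through both formats.
frames = np.arange(5 * 63, dtype=np.float32).reshape(5, 63)
with tempfile.TemporaryDirectory() as tmp:
    npy_path = Path(tmp) / 'sample.npy'
    csv_path = Path(tmp) / 'sample.csv'
    np.save(npy_path, frames)
    header = ','.join(f'f{i}' for i in range(63))
    np.savetxt(csv_path, frames, delimiter=',', header=header, comments='')
    from_npy = load_keypoints(npy_path)
    from_csv = load_keypoints(csv_path)
```

Both branches return the same `(frames, 63)` layout, so downstream padding and sampling do not need to know which format a sample came from.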

### Output Artifacts:

- `arsl_word_sequences.npz`: extracted landmark sequences
- `arsl_word_lstm_model_best.h5`: best checkpoint (val_accuracy)
- `arsl_word_lstm_model_final.h5`: final model after early stopping
- `arsl_word_classes.csv`: class index ↔ word_id mapping
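
As an illustration of how `arsl_word_classes.csv` could be consumed downstream, the sketch below maps a predicted class index back to a `word_id`. The column names (`model_class_index`, `word_id`) are the ones the training cell writes. The rows are made-up placeholders, and `load_class_map` is a hypothetical helper.

```python
import csv
import io


def load_class_map(handle):
    """Return {model_class_index: word_id} from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return {int(row['model_class_index']): int(row['word_id']) for row in reader}


# Placeholder content standing in for arsl_word_classes.csv.
fake_csv = io.StringIO(
    'model_class_index,word_id\n'
    '0,12\n'
    '1,47\n'
    '2,103\n'
)
class_map = load_class_map(fake_csv)

predicted_index = 1                       # e.g. argmax of a softmax output
predicted_word_id = class_map[predicted_index]
```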

### Key: Same Architecture as ASL Word Notebook

Both the English and Arabic word models share identical BiLSTM architecture and the same `shared_word_vocabulary.csv`, enabling bilingual translation in the final combined notebook.


In [1]:
# ============================================
# Section 1: Import Required Libraries
# ============================================
import os
import time
import warnings
from pathlib import Path

import cv2
import mediapipe as mp_lib
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Input, LSTM, Bidirectional, Dense, Dropout, BatchNormalization
)
from tensorflow.keras.callbacks import (
    ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
)
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import mixed_precision
from tqdm import tqdm

warnings.filterwarnings('ignore', category=UserWarning)

print('=' * 60)
print('✅ All libraries imported successfully!')
print(f'📦 TensorFlow : {tf.__version__}')
print(f'📦 NumPy      : {np.__version__}')
print(f'📦 Pandas     : {pd.__version__}')
print(f'📦 OpenCV     : {cv2.__version__}')
print('=' * 60)


✅ All libraries imported successfully!
📦 TensorFlow : 2.10.0
📦 NumPy      : 1.23.5
📦 Pandas     : 2.0.3
📦 OpenCV     : 4.11.0


## Section 2: GPU Detection and Configuration

Automatic GPU detection, memory growth, and optional mixed precision.  
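**If you get 'Out of Memory' errors** → reduce the batch size to 16 or 8. On GPU the training cell hard-codes `BATCH_SIZE_TRAIN = 64`, so lower that value as well; changing `BATCH_SIZE` alone only affects the CPU fallback.  
**If training shows NaN loss** → set `ENABLE_MIXED_PRECISION = False`.  
**Monitor GPU** → run `nvidia-smi -l 1` in a separate terminal.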


In [2]:
# ============================================
# Section 2: GPU DETECTION & CONFIGURATION
# ============================================
print('=' * 60)
print('üîç GPU DETECTION & CONFIGURATION')
print('=' * 60)
print(f'\nTensorFlow version: {tf.__version__}')
print(f'Built with CUDA  : {tf.test.is_built_with_cuda()}')

# List all physical devices
physical_devices = tf.config.list_physical_devices()
print(f'\nAll Physical Devices: {physical_devices}')

# GPU detection
gpus = tf.config.list_physical_devices('GPU')
print(f'\nüéÆ GPU Devices Found: {len(gpus)}')

USE_GPU = False
DEVICE = '/CPU:0'

if gpus:
    print('\n✅ GPU IS AVAILABLE!')
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print(f'   ✅ Memory growth enabled for {len(gpus)} GPU(s)')

        tf.config.set_visible_devices(gpus[0], 'GPU')
        USE_GPU = True
        DEVICE = '/GPU:0'
        print(f'   ✅ Using GPU: {gpus[0].name}')

        try:
            details = tf.config.experimental.get_device_details(gpus[0])
            if 'device_name' in details:
                print(f'   üìä Device Name       : {details["device_name"]}')
            if 'compute_capability' in details:
                print(f'   üìä Compute Capability: {details["compute_capability"]}')
        except Exception:
            pass

    except RuntimeError as e:
        print(f'   ⚠️  GPU config error: {e}')
else:
    print('\n⚠️  No GPU detected; training on CPU (will be slower)')

# Mixed precision: safer to keep off for LSTM by default
ENABLE_MIXED_PRECISION = False

if USE_GPU and ENABLE_MIXED_PRECISION:
    try:
        policy = mixed_precision.Policy('mixed_float16')
        mixed_precision.set_global_policy(policy)
        print(f'\n⚡ Mixed precision enabled: {policy.name}')
    except Exception as e:
        print(f'\n⚠️  Mixed precision not enabled: {e}')
else:
    mixed_precision.set_global_policy('float32')
    print(f'\n📐 Using float32 precision (stable for LSTM)')

# GPU verification test
if USE_GPU:
    print('\nüß™ GPU Verification Test...')
    try:
        with tf.device('/GPU:0'):
            a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
            b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
            c = tf.matmul(a, b)
        print(f'   ✅ GPU computation successful: {c.device}')
    except Exception as e:
        print(f'   ❌ GPU test failed: {e}')
        USE_GPU = False
        DEVICE = '/CPU:0'

print(f'\n✅ Configuration complete. Using device: {DEVICE}')
print('=' * 60)


🔍 GPU DETECTION & CONFIGURATION

TensorFlow version: 2.10.0
Built with CUDA  : True

All Physical Devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

🎮 GPU Devices Found: 1

✅ GPU IS AVAILABLE!
   ✅ Memory growth enabled for 1 GPU(s)
   ✅ Using GPU: /physical_device:GPU:0
   📊 Device Name       : NVIDIA GeForce MX150
   📊 Compute Capability: (6, 1)

📐 Using float32 precision (stable for LSTM)

🧪 GPU Verification Test...
   ✅ GPU computation successful: /job:localhost/replica:0/task:0/device:GPU:0

✅ Configuration complete. Using device: /GPU:0


In [3]:
# ============================================
# Section 3: Configuration & Paths
# ============================================
# ⚠️ UPDATE THIS SINGLE PATH for your machine:
PROJECT_ROOT = Path(r'E:/Term 9/Grad')

# Derived paths
SLR_MAIN     = PROJECT_ROOT / 'Main/Sign-Language-Recognition-System-main/SLR Main'
WORDS_ROOT   = SLR_MAIN / 'Words'
SHARED_CSV   = WORDS_ROOT / 'Shared/shared_word_vocabulary.csv'

# ⚠️ SET THIS to your extracted KArSL folder:
KARSL_ROOT   = WORDS_ROOT / 'Datasets/KArSL_502'
OUTPUT_DIR   = WORDS_ROOT / 'ArSL Word (Arabic)'

# ===== HYPER-PARAMETERS =====
SEQUENCE_LENGTH = 30       # frames per sample
NUM_FEATURES    = 63       # 21 landmarks × 3 coords
BATCH_SIZE      = 32       # base batch size (CPU fallback)
EPOCHS          = 100      # max epochs
LEARNING_RATE   = 1e-3     # initial LR
LSTM_UNITS_1    = 128      # BiLSTM layer 1
LSTM_UNITS_2    = 64       # LSTM layer 2
DENSE_UNITS     = 128      # Dense layer before output
DROPOUT_RATE    = 0.3
TEST_SIZE       = 0.4      # val+test fraction → 60/20/20

# If True, load pre-extracted .npy/.csv keypoints (faster)
# If False, extract from raw .mp4 videos using MediaPipe
USE_PREEXTRACTED_KEYPOINTS = True

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Verify paths
for name, path in [('Shared CSV', SHARED_CSV), ('KArSL root', KARSL_ROOT)]:
    status = '✅' if path.exists() else '❌ NOT FOUND'
    print(f'{status} {name}: {path}')

print(f'\n📁 Output dir : {OUTPUT_DIR}')
print(f'\n⚙️  Sequence length : {SEQUENCE_LENGTH}')
print(f'⚙️  Features/frame  : {NUM_FEATURES}')
print(f'⚙️  Batch size      : {BATCH_SIZE} (auto-scales to 64 on GPU)')
print(f'⚙️  Max epochs      : {EPOCHS}')
print(f'⚙️  Pre-extracted   : {USE_PREEXTRACTED_KEYPOINTS}')


✅ Shared CSV: E:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\SLR Main\Words\Shared\shared_word_vocabulary.csv
❌ NOT FOUND KArSL root: E:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\SLR Main\Words\Datasets\KArSL_502

📁 Output dir : E:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\SLR Main\Words\ArSL Word (Arabic)

⚙️  Sequence length : 30
⚙️  Features/frame  : 63
⚙️  Batch size      : 32 (auto-scales to 64 on GPU)
⚙️  Max epochs      : 100
⚙️  Pre-extracted   : True
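
The 60/20/20 split follows from applying `TEST_SIZE = 0.4` first and then halving the held-out portion. A quick arithmetic check, using a made-up sample count (`n_samples` and `SECOND_SPLIT` are names introduced here for illustration):

```python
TEST_SIZE = 0.4        # fraction held out for validation + test
SECOND_SPLIT = 0.5     # share of the held-out part that becomes the test set

n_samples = 1000       # hypothetical dataset size, for illustration only
n_temp = round(n_samples * TEST_SIZE)
n_train = n_samples - n_temp
n_test = round(n_temp * SECOND_SPLIT)
n_val = n_temp - n_test

fractions = (n_train / n_samples, n_val / n_samples, n_test / n_samples)
print(n_train, n_val, n_test, fractions)
```

With real data the counts can be off by a sample or two because of rounding and stratification, so the printed percentages may not be exactly 60/20/20.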


In [4]:
# ============================================
# Section 4: Load Shared Vocabulary
# ============================================
print('=' * 60)
print('üìö LOADING SHARED VOCABULARY')
print('=' * 60)

vocab_df = pd.read_csv(SHARED_CSV)
vocab_df = vocab_df.dropna(subset=['karsl_class'])
vocab_df['karsl_class'] = vocab_df['karsl_class'].astype(int)

karsl_to_wordid  = dict(zip(vocab_df['karsl_class'], vocab_df['word_id'].astype(int)))
id_to_english    = dict(zip(vocab_df['word_id'].astype(int), vocab_df['english']))
id_to_arabic     = dict(zip(vocab_df['word_id'].astype(int), vocab_df['arabic']))
target_karsl_classes = sorted(karsl_to_wordid.keys())

print(f'\nüìñ Matched vocabulary : {len(target_karsl_classes)} Arabic words')
print(f'   Categories         : {vocab_df["category"].nunique()} ‚Äî {", ".join(vocab_df["category"].unique())}')
print(f'   Sample classes     : {target_karsl_classes[:10]}...')


📚 LOADING SHARED VOCABULARY

📖 Matched vocabulary : 157 Arabic words
   Categories         : 9: verb, object, adjective, family, health, direction, job, social, religion
   Sample classes     : [71, 83, 88, 90, 92, 104, 113, 114, 115, 116]...
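
The filtering step keeps only vocabulary rows that have a KArSL class and builds lookup tables from them. Below is a dependency-free sketch of the same idea, using a subset of the column names from this notebook and invented rows. The notebook itself does this with pandas.

```python
import csv
import io

# Invented rows; only the column names come from shared_word_vocabulary.csv.
raw = io.StringIO(
    'word_id,english,karsl_class\n'
    '1,eat,71\n'
    '2,chair,\n'
    '3,doctor,83.0\n'
)

karsl_to_wordid = {}
id_to_english = {}
for row in csv.DictReader(raw):
    if not row['karsl_class'].strip():
        continue  # no KArSL match: the pandas version drops these via dropna
    word_id = int(row['word_id'])
    # float() first tolerates values such as '83.0' from a float-typed column.
    karsl_to_wordid[int(float(row['karsl_class']))] = word_id
    id_to_english[word_id] = row['english']

target_karsl_classes = sorted(karsl_to_wordid)
print(target_karsl_classes, karsl_to_wordid)
```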


In [5]:
# ============================================
# Section 5: Helper Functions
# ============================================

def pad_or_sample(sequence, target_len=SEQUENCE_LENGTH, target_features=NUM_FEATURES):
    """Pad (short) or uniformly sample (long) a sequence to fixed shape."""
    arr = np.array(sequence, dtype=np.float32)
    if arr.ndim != 2:
        return None

    # Adjust feature dimension
    if arr.shape[1] > target_features:
        arr = arr[:, :target_features]
    elif arr.shape[1] < target_features:
        pad_feat = np.zeros((arr.shape[0], target_features - arr.shape[1]), dtype=np.float32)
        arr = np.concatenate([arr, pad_feat], axis=1)

    # Adjust time dimension
    if arr.shape[0] >= target_len:
        idx = np.linspace(0, arr.shape[0] - 1, target_len, dtype=int)
        arr = arr[idx]
    else:
        pad_time = np.zeros((target_len - arr.shape[0], target_features), dtype=np.float32)
        arr = np.concatenate([arr, pad_time], axis=0)

    return arr  # shape: (SEQUENCE_LENGTH, NUM_FEATURES)


# MediaPipe for video extraction (only used if USE_PREEXTRACTED_KEYPOINTS=False)
mp_hands = mp_lib.solutions.hands

def extract_from_video(video_path):
    """Extract MediaPipe hand landmarks from a video file."""
    hands = mp_hands.Hands(
        static_image_mode=False,
        max_num_hands=1,
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5
    )
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        hands.close()
        return None

    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        result = hands.process(rgb)

        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0]
            vec = np.array([[p.x, p.y, p.z] for p in lm.landmark]).flatten()
        else:
            vec = np.zeros(NUM_FEATURES, dtype=np.float32)
        frames.append(vec)

    cap.release()
    hands.close()

    if len(frames) == 0:
        return None
    return pad_or_sample(np.array(frames, dtype=np.float32))

print('✅ Helper functions defined')


✅ Helper functions defined
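
To make the time-axis handling concrete, the sketch below applies the same two rules as `pad_or_sample` to synthetic inputs: long sequences are subsampled at evenly spaced indices, and short ones are zero-padded at the end. The array contents are arbitrary test data.

```python
import numpy as np

SEQUENCE_LENGTH = 30
NUM_FEATURES = 63

# Long clip: 60 frames reduced to 30 evenly spaced frames.
long_clip = np.random.rand(60, NUM_FEATURES).astype(np.float32)
idx = np.linspace(0, len(long_clip) - 1, SEQUENCE_LENGTH, dtype=int)
sampled = long_clip[idx]

# Short clip: 12 real frames followed by 18 all-zero padding frames.
short_clip = np.ones((12, NUM_FEATURES), dtype=np.float32)
padding = np.zeros((SEQUENCE_LENGTH - len(short_clip), NUM_FEATURES), dtype=np.float32)
padded = np.concatenate([short_clip, padding], axis=0)

print(sampled.shape, padded.shape)
print(idx[0], idx[-1])
```

Padding frames are all zeros, so the blank-frame filter in the loading cell counts them as missing detections. Very short clips can therefore be discarded even when every recorded frame contains a hand.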


In [6]:
# ============================================
# Section 6: Build Dataset (or Load Cached)
# ============================================
print('=' * 60)
print('üì¶ BUILDING ARABIC WORD DATASET')
print('=' * 60)

NPZ_PATH = OUTPUT_DIR / 'arsl_word_sequences.npz'

if NPZ_PATH.exists():
    print(f'\nüíæ Cached data found: {NPZ_PATH}')
    data = np.load(NPZ_PATH)
    X, y = data['X'], data['y']
    print(f'   X shape : {X.shape}')
    print(f'   y shape : {y.shape}')
    print(f'   Classes : {len(np.unique(y))}')
    print('   ‚úÖ Loaded from cache ‚Äî skipping extraction')
else:
    if not KARSL_ROOT.exists():
        print(f'\n❌ KArSL dataset NOT FOUND at: {KARSL_ROOT}')
        print('   Please download KArSL-502 from Kaggle:')
        print('   https://www.kaggle.com/datasets/yousefelkilany/karsl-502')
        print(f'   Extract it to: {KARSL_ROOT}')
        raise FileNotFoundError(f'KArSL dataset not found: {KARSL_ROOT}')

    print(f'\n⏳ Loading KArSL data from: {KARSL_ROOT}')
    start_time = time.time()

    X_list, y_list = [], []
    found_classes, empty_classes = 0, 0

    for karsl_class in tqdm(target_karsl_classes, desc='Loading KArSL classes'):
        word_id = int(karsl_to_wordid[karsl_class])

        # Try common folder naming conventions
        candidates = [
            KARSL_ROOT / str(karsl_class),
            KARSL_ROOT / f'{karsl_class:03d}',
            KARSL_ROOT / f'{karsl_class:04d}',
            KARSL_ROOT / f'class_{karsl_class}',
        ]
        class_dir = next((p for p in candidates if p.exists()), None)
        if class_dir is None:
            empty_classes += 1
            continue

        found_classes += 1

        # Collect all data files
        if USE_PREEXTRACTED_KEYPOINTS:
            files = list(class_dir.rglob('*.npy')) + list(class_dir.rglob('*.csv'))
        else:
            files = list(class_dir.rglob('*.mp4'))

        if not files:
            # Fallback: try all types
            files = list(class_dir.rglob('*.npy')) + list(class_dir.rglob('*.csv')) + list(class_dir.rglob('*.mp4'))

        for fp in files:
            seq = None
            try:
                if fp.suffix.lower() == '.npy':
                    arr = np.load(fp)
                    seq = pad_or_sample(arr)
                elif fp.suffix.lower() == '.csv':
                    arr = pd.read_csv(fp).values
                    seq = pad_or_sample(arr)
                elif fp.suffix.lower() == '.mp4':
                    seq = extract_from_video(fp)
            except Exception:
                continue

            if seq is None:
                continue

            # Skip blank sequences (<20% hand detection)
            blank_ratio = np.sum(np.all(seq == 0, axis=1)) / len(seq)
            if blank_ratio > 0.8:
                continue

            X_list.append(seq)
            y_list.append(word_id)

    elapsed = time.time() - start_time

    X = np.array(X_list, dtype=np.float32)
    y = np.array(y_list, dtype=np.int32)

    print(f'\n✅ Dataset built in {elapsed:.1f}s ({elapsed/60:.1f} min)')
    print(f'   X shape       : {X.shape}')
    print(f'   y shape       : {y.shape}')
    print(f'   Classes found : {found_classes} / {len(target_karsl_classes)}')
    print(f'   Empty classes  : {empty_classes}')

    np.savez_compressed(NPZ_PATH, X=X, y=y)
    print(f'\nüíæ Saved: {NPZ_PATH}')


📦 BUILDING ARABIC WORD DATASET

❌ KArSL dataset NOT FOUND at: E:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\SLR Main\Words\Datasets\KArSL_502
   Please download KArSL-502 from Kaggle:
   https://www.kaggle.com/datasets/yousefelkilany/karsl-502
   Extract it to: E:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\SLR Main\Words\Datasets\KArSL_502


FileNotFoundError: KArSL dataset not found: E:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\SLR Main\Words\Datasets\KArSL_502
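
The blank-sequence filter in the loading loop counts frames whose features are all zero and discards a sample when more than 80% of its frames are blank. A small sketch on synthetic sequences (the helpers `blank_ratio` and `make_sequence` are introduced here for illustration):

```python
import numpy as np


def blank_ratio(seq):
    """Fraction of frames (rows) in which every feature is exactly zero."""
    return float(np.sum(np.all(seq == 0, axis=1)) / len(seq))


def make_sequence(n_blank, n_frames=30, n_features=63):
    """Synthetic sequence: non-zero frames first, then n_blank zero frames."""
    seq = np.ones((n_frames, n_features), dtype=np.float32)
    seq[n_frames - n_blank:] = 0.0
    return seq


mostly_blank = make_sequence(n_blank=25)   # 25/30 blank, above the 0.8 cut-off
borderline = make_sequence(n_blank=24)     # 24/30 = 0.8, not strictly above it

# Same comparison as the loading loop: a sample is dropped only if ratio > 0.8.
keep_mostly_blank = not blank_ratio(mostly_blank) > 0.8
keep_borderline = not blank_ratio(borderline) > 0.8
print(keep_mostly_blank, keep_borderline)
```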

In [None]:
# ============================================
# Section 7: Data Exploration
# ============================================
print('=' * 60)
print('📊 DATA EXPLORATION')
print('=' * 60)

unique_ids, counts = np.unique(y, return_counts=True)
word_names_en = [id_to_english.get(int(uid), str(uid)) for uid in unique_ids]
word_names_ar = [id_to_arabic.get(int(uid), str(uid)) for uid in unique_ids]

# Sort by count descending
sort_idx = np.argsort(counts)[::-1]
sorted_names  = [f'{word_names_en[i]} / {word_names_ar[i]}' for i in sort_idx]
sorted_counts = counts[sort_idx]

# =============================================
# PLOT 1: Class distribution
# =============================================
fig, ax = plt.subplots(figsize=(24, 7))
ax.bar(range(len(sorted_names)), sorted_counts, color='darkgreen', edgecolor='black', linewidth=0.3)
ax.set_xticks(range(len(sorted_names)))
ax.set_xticklabels(sorted_names, rotation=90, fontsize=5)
ax.set_xlabel('Word (English / Arabic)', fontsize=12)
ax.set_ylabel('Number of Samples', fontsize=12)
ax.set_title(f'Arabic Word Dataset Distribution: {len(unique_ids)} classes, {len(y)} total samples', fontsize=14)
ax.axhline(y=np.mean(sorted_counts), color='red', linestyle='--', alpha=0.7, label=f'Mean: {np.mean(sorted_counts):.1f}')
ax.axhline(y=np.median(sorted_counts), color='orange', linestyle=':', alpha=0.7, label=f'Median: {np.median(sorted_counts):.1f}')
ax.legend(fontsize=11)
plt.tight_layout()
plt.show()

print(f'\nüìä Dataset Summary:')
print(f'   Total samples    : {len(y)}')
print(f'   Total classes    : {len(unique_ids)}')
print(f'   Min samples/class: {counts.min()} ({word_names_en[counts.argmin()]})')
print(f'   Max samples/class: {counts.max()} ({word_names_en[counts.argmax()]})')
print(f'   Mean             : {counts.mean():.1f}')
print(f'   Std              : {counts.std():.1f}')
print(f'   Median           : {np.median(counts):.1f}')

low_sample = [(word_names_en[i], counts[i]) for i in range(len(counts)) if counts[i] < 5]
if low_sample:
    print(f'\n⚠️  Classes with <5 samples ({len(low_sample)}):')
    for name, cnt in low_sample:
        print(f'   {name}: {cnt}')

# =============================================
# PLOT 2: Class frequency histogram + Zero-frame quality + Feature distribution
# =============================================
fig, axes = plt.subplots(1, 3, figsize=(22, 5))

# 2a: Histogram of samples-per-class
axes[0].hist(sorted_counts, bins=20, color='darkgreen', edgecolor='black', alpha=0.85)
axes[0].set_xlabel('Samples per Class', fontsize=11)
axes[0].set_ylabel('Number of Classes', fontsize=11)
axes[0].set_title('How Many Classes Have N Samples?', fontsize=13)
axes[0].axvline(x=np.mean(sorted_counts), color='red', linestyle='--', label=f'Mean: {np.mean(sorted_counts):.1f}')
axes[0].axvline(x=np.median(sorted_counts), color='orange', linestyle=':', label=f'Median: {np.median(sorted_counts):.1f}')
axes[0].legend(fontsize=9)

# 2b: Zero-frame quality analysis
zero_ratios = []
for i in range(len(X)):
    frame_sums = np.sum(np.abs(X[i]), axis=1)
    zero_frames = np.sum(frame_sums == 0)
    zero_ratios.append(zero_frames / SEQUENCE_LENGTH * 100)
zero_ratios = np.array(zero_ratios)

axes[1].hist(zero_ratios, bins=30, color='coral', edgecolor='darkred', alpha=0.85)
axes[1].set_xlabel('% Zero Frames per Sample', fontsize=11)
axes[1].set_ylabel('Number of Samples', fontsize=11)
axes[1].set_title(f'Sequence Quality (mean: {np.mean(zero_ratios):.1f}% zero frames)', fontsize=13)
axes[1].axvline(x=np.mean(zero_ratios), color='red', linestyle='--', alpha=0.7)

# 2c: Feature value distribution (boxplot of x-coordinates per landmark)
subsample = X[:min(200, len(X))].reshape(-1, NUM_FEATURES)
landmark_x = [subsample[:, i*3] for i in range(21)]
bp = axes[2].boxplot(landmark_x, whis=1.5, showfliers=False, patch_artist=True,
                     boxprops=dict(facecolor='#C8E6C9', edgecolor='darkgreen'))
axes[2].set_xlabel('Landmark Index', fontsize=11)
axes[2].set_ylabel('X-Coordinate Value', fontsize=11)
axes[2].set_title('Landmark X-Coordinate Distribution (first 200 samples)', fontsize=13)
axes[2].set_xticklabels([f'{i}' for i in range(21)], fontsize=7)

plt.tight_layout()
plt.show()

print(f'\nüîç Quality Stats:')
print(f'   Samples with >50% zero frames: {np.sum(zero_ratios > 50)} ({np.sum(zero_ratios > 50)/len(zero_ratios)*100:.1f}%)')
print(f'   Samples with 0% zero frames  : {np.sum(zero_ratios == 0)} ({np.sum(zero_ratios == 0)/len(zero_ratios)*100:.1f}%)')

# =============================================
# PLOT 3: 2D Hand Landmark Trajectories
# =============================================
sample_idx = 0
fig, axes = plt.subplots(1, 3, figsize=(20, 5))
sample = X[sample_idx]
colors = np.arange(SEQUENCE_LENGTH)

# Wrist trajectory (landmark 0)
sc0 = axes[0].scatter(sample[:, 0], sample[:, 1], c=colors, cmap='viridis', s=25, zorder=2)
axes[0].plot(sample[:, 0], sample[:, 1], 'k-', alpha=0.2, linewidth=0.5, zorder=1)
axes[0].set_xlabel('X', fontsize=11)
axes[0].set_ylabel('Y', fontsize=11)
axes[0].set_title('Wrist (L0) Trajectory', fontsize=13)
axes[0].invert_yaxis()
plt.colorbar(sc0, ax=axes[0], label='Frame')

# Index fingertip trajectory (landmark 8: index 24,25)
sc1 = axes[1].scatter(sample[:, 24], sample[:, 25], c=colors, cmap='plasma', s=25, zorder=2)
axes[1].plot(sample[:, 24], sample[:, 25], 'k-', alpha=0.2, linewidth=0.5, zorder=1)
axes[1].set_xlabel('X', fontsize=11)
axes[1].set_ylabel('Y', fontsize=11)
axes[1].set_title('Index Fingertip (L8) Trajectory', fontsize=13)
axes[1].invert_yaxis()
plt.colorbar(sc1, ax=axes[1], label='Frame')

# Middle fingertip trajectory (landmark 12: index 36,37)
sc2 = axes[2].scatter(sample[:, 36], sample[:, 37], c=colors, cmap='coolwarm', s=25, zorder=2)
axes[2].plot(sample[:, 36], sample[:, 37], 'k-', alpha=0.2, linewidth=0.5, zorder=1)
axes[2].set_xlabel('X', fontsize=11)
axes[2].set_ylabel('Y', fontsize=11)
axes[2].set_title('Middle Fingertip (L12) Trajectory', fontsize=13)
axes[2].invert_yaxis()
plt.colorbar(sc2, ax=axes[2], label='Frame')

word_en = id_to_english.get(int(y[sample_idx]), '?')
word_ar = id_to_arabic.get(int(y[sample_idx]), '?')
plt.suptitle(f'Hand Landmark Trajectories: "{word_en}" / "{word_ar}" (color = time)', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

# =============================================
# PLOT 4: Landmark Heatmap of sample sequence
# =============================================
plt.figure(figsize=(14, 5))
plt.imshow(X[sample_idx].T, aspect='auto', cmap='viridis', interpolation='nearest')
plt.colorbar(label='Coordinate value')
plt.xlabel('Frame (time step)', fontsize=12)
plt.ylabel('Feature index (landmark × coord)', fontsize=12)
plt.title(f'Landmark Heatmap: "{word_en}" / "{word_ar}", shape {X[sample_idx].shape}', fontsize=14)

# Add landmark boundary lines
for lm in range(1, 21):
    plt.axhline(y=lm*3 - 0.5, color='white', linewidth=0.3, alpha=0.5)
plt.tight_layout()
plt.show()
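
The trajectory plots rely on the flattened layout of each frame: 21 landmarks × 3 coordinates, stored as x, y, z per landmark. A small helper (hypothetical, not part of the notebook) makes the column arithmetic explicit:

```python
def landmark_columns(landmark_index, coords_per_landmark=3):
    """Return the (x, y, z) column indices of one landmark in a 63-wide frame."""
    if not 0 <= landmark_index < 21:
        raise ValueError('MediaPipe Hands provides landmarks 0..20')
    start = landmark_index * coords_per_landmark
    return tuple(range(start, start + coords_per_landmark))


wrist = landmark_columns(0)          # columns used for the wrist trajectory
index_tip = landmark_columns(8)      # index fingertip
middle_tip = landmark_columns(12)    # middle fingertip
print(wrist, index_tip, middle_tip)
```

The first two entries of each tuple are the x and y columns plotted above, for example columns 24 and 25 for the index fingertip.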


In [None]:
# ============================================
# Section 8: Preprocessing & Splits
# ============================================
print('=' * 60)
print('üîß PREPROCESSING & TRAIN/VAL/TEST SPLIT')
print('=' * 60)

# Reload from cache
data = np.load(NPZ_PATH)
X, y = data['X'], data['y']

# Encode labels
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
num_classes = len(encoder.classes_)
y_onehot = to_categorical(y_encoded, num_classes=num_classes)

# Stratified split: 60/20/20
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y_onehot, test_size=TEST_SIZE, random_state=42, stratify=y_encoded
)
temp_labels = np.argmax(y_temp, axis=1)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=temp_labels
)

# StandardScaler normalization: fit on the training split only, so that
# validation/test statistics do not leak into preprocessing
scaler = StandardScaler()
scaler.fit(X_train.reshape(-1, NUM_FEATURES))

def scale_split(split):
    """Apply the fitted scaler frame by frame and restore the 3-D shape."""
    flat = scaler.transform(split.reshape(-1, NUM_FEATURES))
    return flat.reshape(split.shape).astype(np.float32)

X_train, X_val, X_test = scale_split(X_train), scale_split(X_val), scale_split(X_test)
print('   ✅ StandardScaler fitted on the training split (mean≈0, std≈1 per feature)')

# Class weights for imbalanced data
train_labels = np.argmax(y_train, axis=1)
cw = compute_class_weight('balanced', classes=np.arange(num_classes), y=train_labels)
class_weights = dict(enumerate(cw))

print(f'\nüìä Split Summary:')
print(f'   Classes      : {num_classes}')
print(f'   Train        : {X_train.shape[0]} ({X_train.shape[0]/len(X)*100:.0f}%)')
print(f'   Validation   : {X_val.shape[0]} ({X_val.shape[0]/len(X)*100:.0f}%)')
print(f'   Test         : {X_test.shape[0]} ({X_test.shape[0]/len(X)*100:.0f}%)')
print(f'   Input shape  : {X_train.shape[1:]}')
print(f'   Class weights: balanced ‚Äî max: {max(cw):.2f}')

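
For reference, scikit-learn's `'balanced'` mode weights each class by `n_samples / (n_classes * count)`, so rare classes contribute more to the loss. A NumPy-only sketch of that formula on made-up labels:

```python
import numpy as np

labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])  # made-up, imbalanced
num_classes = 3

# Assumes every class occurs at least once; a zero count would divide by zero.
counts = np.bincount(labels, minlength=num_classes)
weights = len(labels) / (num_classes * counts)
class_weights = dict(enumerate(weights))
print(class_weights)
```

With these weights each class contributes the same total weight, since `counts * weights` is constant across classes.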

In [None]:
# ============================================
# Section 9: Build & Train BiLSTM (GPU-Optimized)
# ============================================
print('=' * 60)
print('üöÄ BUILDING & TRAINING BiLSTM MODEL')
print('=' * 60)

# Clear previous session for clean GPU memory state
tf.keras.backend.clear_session()

# Adaptive batch size: GPU uses larger batches for throughput
BATCH_SIZE_TRAIN = 64 if USE_GPU else BATCH_SIZE
print(f'   üì¶ Batch size (auto): {BATCH_SIZE_TRAIN} ({"GPU" if USE_GPU else "CPU"})')

# --- tf.data pipeline ---
AUTOTUNE = tf.data.AUTOTUNE

train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_ds = train_ds.shuffle(buffer_size=min(len(X_train), 10000), seed=42, reshuffle_each_iteration=True)
train_ds = train_ds.batch(BATCH_SIZE_TRAIN).prefetch(AUTOTUNE)

val_ds = tf.data.Dataset.from_tensor_slices((X_val, y_val))
val_ds = val_ds.batch(BATCH_SIZE_TRAIN).prefetch(AUTOTUNE)

print(f'   ✅ tf.data pipelines created (shuffle + batch + prefetch)')

# --- Build model (identical architecture to ASL Word notebook) ---
# ⚡ IMPORTANT: recurrent_dropout is intentionally NOT used.
# With recurrent_dropout=0 (and the other LSTM defaults), TensorFlow can use
# NVIDIA cuDNN LSTM kernels, which are typically much faster on GPU.
# The Dropout layers between blocks regularize layer outputs instead;
# they do not drop recurrent connections.

with tf.device(DEVICE):
    model = Sequential([
        Input(shape=(SEQUENCE_LENGTH, NUM_FEATURES), name='landmark_sequence'),

        # BiLSTM block 1: reads forward + backward (cuDNN-accelerated)
        Bidirectional(LSTM(LSTM_UNITS_1, return_sequences=True), name='bilstm_1'),
        BatchNormalization(name='bn_1'),
        Dropout(DROPOUT_RATE, name='drop_1'),

        # LSTM block 2: outputs final hidden state (cuDNN-accelerated)
        LSTM(LSTM_UNITS_2, name='lstm_2'),
        BatchNormalization(name='bn_2'),
        Dropout(DROPOUT_RATE, name='drop_2'),

        # Dense classifier (with L2 regularization + He initialization)
        Dense(DENSE_UNITS, activation='relu',
              kernel_initializer='he_normal',
              kernel_regularizer=tf.keras.regularizers.l2(1e-4),
              name='dense_1'),
        Dropout(DROPOUT_RATE - 0.1, name='drop_3'),
        Dense(num_classes, activation='softmax', dtype='float32', name='output')
    ], name='ArSL_Word_BiLSTM')

# Use legacy Adam for better GPU + mixed precision compatibility
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=LEARNING_RATE)
model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print('\nüìê Model Architecture:')
model.summary()
print(f'\nüñ•Ô∏è  Model will train on: {DEVICE}')

# --- Callbacks ---
callbacks = [
    ModelCheckpoint(
        str(OUTPUT_DIR / 'arsl_word_lstm_model_best.h5'),
        monitor='val_accuracy', save_best_only=True, mode='max', verbose=1
    ),
    EarlyStopping(
        monitor='val_loss', patience=15, restore_best_weights=True, verbose=1
    ),
    ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=5, min_lr=1e-7, verbose=1
    )
]

# --- Train (GPU-accelerated) ---
print(f'\nüöÄ Starting training...')
print(f'   Device       : {DEVICE}')
print(f'   Batch size   : {BATCH_SIZE_TRAIN}')
print(f'   Max epochs   : {EPOCHS}')
print(f'   LR           : {LEARNING_RATE}')
print(f'   cuDNN LSTM   : ‚úÖ enabled (no recurrent_dropout)')
print(f'   Class weights: ‚úÖ enabled (balanced)')
start_time = time.time()

with tf.device(DEVICE):
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=EPOCHS,
        callbacks=callbacks,
        class_weight=class_weights,
        verbose=1
    )

training_time = time.time() - start_time
print(f'\n✅ Training complete in {training_time:.1f}s ({training_time/60:.1f} min)')
print(f'   Best val_accuracy: {max(history.history["val_accuracy"]):.4f}')
print(f'   Final LR         : {model.optimizer.learning_rate.numpy():.2e}')

# Save
model.save(str(OUTPUT_DIR / 'arsl_word_lstm_model_final.h5'))
class_df = pd.DataFrame({
    'model_class_index': range(num_classes),
    'word_id': encoder.classes_.tolist()
})
class_df.to_csv(OUTPUT_DIR / 'arsl_word_classes.csv', index=False)
print(f'\nüíæ Final model : {OUTPUT_DIR / "arsl_word_lstm_model_final.h5"}')
print(f'üíæ Best model  : {OUTPUT_DIR / "arsl_word_lstm_model_best.h5"}')
print(f'üíæ Class map   : {OUTPUT_DIR / "arsl_word_classes.csv"}')

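
As a sanity check on `model.summary()`, the trainable parameter count of a standard LSTM layer is `4 * (units * (units + input_dim) + units)`, doubled for a bidirectional wrapper. The sketch below applies that formula to the recurrent layers configured in this notebook. It does not cover the BatchNormalization or Dense layers, and `lstm_params` is a helper introduced here for illustration.

```python
def lstm_params(input_dim, units):
    """Weights of one LSTM layer: 4 gates, each with input, recurrent and bias terms."""
    return 4 * (units * (units + input_dim) + units)


NUM_FEATURES = 63
LSTM_UNITS_1 = 128
LSTM_UNITS_2 = 64

# Forward and backward LSTMs each see the 63 input features.
bilstm_1 = 2 * lstm_params(NUM_FEATURES, LSTM_UNITS_1)
# The second LSTM receives the concatenated 2 * 128 = 256-wide BiLSTM output.
lstm_2 = lstm_params(2 * LSTM_UNITS_1, LSTM_UNITS_2)
print(bilstm_1, lstm_2)
```

If the figures reported by `model.summary()` for `bilstm_1` and `lstm_2` differ from these, the input width or unit counts are probably not what the configuration cell suggests.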

In [None]:
# ============================================
# Section 10: Evaluation & Visualization Dashboard
# ============================================
print('=' * 60)
print('📈 MODEL EVALUATION & VISUALIZATION DASHBOARD')
print('=' * 60)

# Load best checkpoint
best_model = tf.keras.models.load_model(str(OUTPUT_DIR / 'arsl_word_lstm_model_best.h5'))

# Predict using optimized pipeline
eval_batch = 64 if USE_GPU else 32
eval_ds = tf.data.Dataset.from_tensor_slices((X_test,)).batch(eval_batch).prefetch(tf.data.AUTOTUNE)

with tf.device(DEVICE):
    proba = best_model.predict(eval_ds, verbose=0)

y_pred = np.argmax(proba, axis=1)
y_true = np.argmax(y_test, axis=1)

# Top-1 accuracy
top1_acc = (y_pred == y_true).mean()

# Top-5 accuracy
top5_correct = 0
for i in range(len(y_true)):
    top5 = np.argsort(proba[i])[-5:]
    if y_true[i] in top5:
        top5_correct += 1
top5_acc = top5_correct / len(y_true)

print(f'\nüéØ Test Results:')
print(f'   Top-1 Accuracy : {top1_acc:.4f} ({top1_acc*100:.2f}%)')
print(f'   Top-5 Accuracy : {top5_acc:.4f} ({top5_acc*100:.2f}%)')
print(f'   Test samples   : {len(y_true)}')
print(f'   Classes         : {num_classes}')

# =============================================
# PLOT 1: Training Dashboard (4 panels)
# =============================================
fig, axes = plt.subplots(2, 2, figsize=(18, 12))

# 1a: Accuracy curves
axes[0, 0].plot(history.history['accuracy'], label='Train', linewidth=2, color='#2E7D32')
axes[0, 0].plot(history.history['val_accuracy'], label='Validation', linewidth=2, color='#FF9800')
axes[0, 0].fill_between(range(len(history.history['accuracy'])),
                         history.history['accuracy'], history.history['val_accuracy'],
                         alpha=0.1, color='red')
axes[0, 0].set_title('Accuracy over Epochs', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].set_ylim([0, 1.05])
best_epoch = np.argmax(history.history['val_accuracy'])
axes[0, 0].axvline(x=best_epoch, color='blue', linestyle=':', alpha=0.5, label=f'Best: epoch {best_epoch}')
# Build the legend after axvline so the best-epoch marker is listed too
axes[0, 0].legend(fontsize=11)

# 1b: Loss curves
axes[0, 1].plot(history.history['loss'], label='Train', linewidth=2, color='#2E7D32')
axes[0, 1].plot(history.history['val_loss'], label='Validation', linewidth=2, color='#FF9800')
axes[0, 1].fill_between(range(len(history.history['loss'])),
                         history.history['loss'], history.history['val_loss'],
                         alpha=0.1, color='red')
axes[0, 1].set_title('Loss over Epochs', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Loss')
axes[0, 1].legend(fontsize=11)
axes[0, 1].grid(True, alpha=0.3)

# 1c: Learning Rate schedule
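# Keras logs the learning rate under 'lr' (older versions) or 'learning_rate' (newer versions)
lr_values = history.history.get('lr', history.history.get('learning_rate'))
if lr_values is None:
    # No LR-logging callback was active: fall back to the constant configured rate
    lr_values = [LEARNING_RATE] * len(history.history['loss'])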
axes[1, 0].plot(lr_values, linewidth=2, color='#4CAF50', marker='o', markersize=3)
axes[1, 0].set_title('Learning Rate Schedule', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Learning Rate')
axes[1, 0].set_yscale('log')
axes[1, 0].grid(True, alpha=0.3)

# 1d: Overfitting gap (train_acc - val_acc)
train_acc = np.array(history.history['accuracy'])
val_acc = np.array(history.history['val_accuracy'])
gap = train_acc - val_acc
axes[1, 1].bar(range(len(gap)), gap, color=['green' if g < 0.05 else 'orange' if g < 0.15 else 'red' for g in gap],
               edgecolor='black', linewidth=0.3, alpha=0.8)
axes[1, 1].axhline(y=0.05, color='green', linestyle='--', alpha=0.5, label='Healthy gap (5%)')
axes[1, 1].axhline(y=0.15, color='red', linestyle='--', alpha=0.5, label='Overfitting threshold (15%)')
axes[1, 1].set_title('Overfitting Monitor (Train - Val Accuracy)', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Accuracy Gap')
axes[1, 1].legend(fontsize=9)
axes[1, 1].grid(True, alpha=0.3)

plt.suptitle(f'Arabic Word BiLSTM Training Dashboard | Top-1: {top1_acc*100:.1f}%, Top-5: {top5_acc*100:.1f}%',
             fontsize=16, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()

# =============================================
# PLOT 2: Prediction Confidence Distribution
# =============================================
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

correct_mask = y_pred == y_true
correct_conf = np.max(proba[correct_mask], axis=1)
wrong_conf   = np.max(proba[~correct_mask], axis=1) if np.sum(~correct_mask) > 0 else np.array([])

axes[0].hist(correct_conf, bins=30, alpha=0.7, color='#4CAF50', edgecolor='darkgreen', label=f'Correct ({len(correct_conf)})')
if len(wrong_conf) > 0:
    axes[0].hist(wrong_conf, bins=30, alpha=0.7, color='#F44336', edgecolor='darkred', label=f'Wrong ({len(wrong_conf)})')
axes[0].set_xlabel('Prediction Confidence', fontsize=12)
axes[0].set_ylabel('Count', fontsize=12)
axes[0].set_title('Confidence Distribution: Correct vs Wrong', fontsize=13)
axes[0].legend(fontsize=11)
axes[0].axvline(x=0.5, color='gray', linestyle='--', alpha=0.5)

# Confidence margin (top-1 vs top-2)
top1_conf = np.max(proba, axis=1)
sorted_proba = np.sort(proba, axis=1)[:, ::-1]
top2_conf = sorted_proba[:, 1] if proba.shape[1] > 1 else np.zeros(len(proba))
margin = top1_conf - top2_conf

axes[1].hist(margin, bins=30, color='#9C27B0', edgecolor='purple', alpha=0.8)
axes[1].set_xlabel('Confidence Margin (Top-1 - Top-2)', fontsize=12)
axes[1].set_ylabel('Count', fontsize=12)
axes[1].set_title(f'Decision Margin | Mean: {np.mean(margin):.3f}', fontsize=13)
axes[1].axvline(x=np.mean(margin), color='red', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

# =============================================
# PLOT 3: Classification Report with bilingual labels
# =============================================
word_labels = []
for cls_idx in range(num_classes):
    wid = int(encoder.classes_[cls_idx])
    en = id_to_english.get(wid, str(wid))
    ar = id_to_arabic.get(wid, '')
    word_labels.append(f'{en}/{ar}')

# Short labels for bar charts (English only)
word_labels_short = []
for cls_idx in range(num_classes):
    wid = int(encoder.classes_[cls_idx])
    word_labels_short.append(id_to_english.get(wid, str(wid)))

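print('\n📋 Classification Report:')
# Pass explicit labels so the report still works if a class is absent from the test split
all_labels = np.arange(num_classes)
report = classification_report(y_true, y_pred, labels=all_labels, target_names=word_labels,
                               zero_division=0, output_dict=True)
print(classification_report(y_true, y_pred, labels=all_labels, target_names=word_labels, zero_division=0))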

# =============================================
# PLOT 4: Per-Class F1 Score Bar Chart
# =============================================
class_f1 = {k: v['f1-score'] for k, v in report.items() if k in word_labels}
sorted_f1 = sorted(class_f1.items(), key=lambda x: x[1], reverse=True)
f1_names = [x[0] for x in sorted_f1]
f1_vals  = [x[1] for x in sorted_f1]

fig, ax = plt.subplots(figsize=(24, 6))
colors_f1 = ['#4CAF50' if v >= 0.7 else '#FF9800' if v >= 0.4 else '#F44336' for v in f1_vals]
ax.bar(range(len(f1_names)), f1_vals, color=colors_f1, edgecolor='black', linewidth=0.3)
ax.set_xticks(range(len(f1_names)))
ax.set_xticklabels(f1_names, rotation=90, fontsize=5)
ax.set_xlabel('Word', fontsize=12)
ax.set_ylabel('F1 Score', fontsize=12)
ax.set_title(f'Per-Class F1 Score (green ≥0.7, orange ≥0.4, red <0.4) | Mean: {np.mean(f1_vals):.3f}', fontsize=14)
ax.axhline(y=np.mean(f1_vals), color='blue', linestyle='--', alpha=0.5, label=f'Mean F1: {np.mean(f1_vals):.3f}')
ax.legend(fontsize=11)
ax.set_ylim([0, 1.05])
plt.tight_layout()
plt.show()

# =============================================
# PLOT 5: Confusion Matrix (enhanced)
# =============================================
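# Explicit labels keep the matrix at num_classes x num_classes even if a class is missing from the test split
cm = confusion_matrix(y_true, y_pred, labels=np.arange(num_classes))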

fig, ax = plt.subplots(figsize=(20, 18))
if num_classes <= 50:
    sns.heatmap(cm, annot=True, fmt='d', cmap='Greens',
                xticklabels=word_labels, yticklabels=word_labels, ax=ax,
                linewidths=0.5, linecolor='lightgray')
else:
    sns.heatmap(cm, annot=False, cmap='Greens',
                xticklabels=word_labels, yticklabels=word_labels, ax=ax)
ax.set_title(f'Arabic Confusion Matrix | {num_classes} classes (Top-1: {top1_acc*100:.1f}%)', fontsize=15)
ax.set_xlabel('Predicted', fontsize=13)
ax.set_ylabel('True', fontsize=13)
plt.xticks(rotation=90, fontsize=5)
plt.yticks(fontsize=5)
plt.tight_layout()
plt.show()

# =============================================
# PLOT 6: Top-10 Most Confused Pairs
# =============================================
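# Zero the diagonal on a copy so `cm` itself keeps the correct-prediction counts
cm_off_diag = cm.copy()
np.fill_diagonal(cm_off_diag, 0)
confused_pairs = []
for i in range(num_classes):
    for j in range(num_classes):
        if cm_off_diag[i, j] > 0:
            confused_pairs.append((word_labels_short[i], word_labels_short[j], int(cm_off_diag[i, j])))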
confused_pairs.sort(key=lambda x: x[2], reverse=True)
top_confused = confused_pairs[:10]

if top_confused:
    fig, ax = plt.subplots(figsize=(14, 6))
    pair_labels = [f'{p[0]} → {p[1]}' for p in top_confused]
    pair_counts = [p[2] for p in top_confused]
    bars = ax.barh(range(len(pair_labels)), pair_counts, color='#E91E63', edgecolor='darkred', alpha=0.85)
    ax.set_yticks(range(len(pair_labels)))
    ax.set_yticklabels(pair_labels, fontsize=10)
    ax.set_xlabel('Misclassification Count', fontsize=12)
    ax.set_title('Top-10 Most Confused Pairs (True → Predicted)', fontsize=14, fontweight='bold')
    ax.invert_yaxis()
    for bar, count in zip(bars, pair_counts):
        ax.text(bar.get_width() + 0.3, bar.get_y() + bar.get_height()/2,
                str(count), va='center', fontsize=10, fontweight='bold')
    plt.tight_layout()
    plt.show()

# =============================================
# PLOT 7: Per-Category Accuracy (bar chart)
# =============================================
cat_map = dict(zip(vocab_df['word_id'].astype(int), vocab_df['category']))
category_correct, category_total = {}, {}
for i in range(len(y_true)):
    wid = int(encoder.classes_[y_true[i]])
    cat = cat_map.get(wid, 'unknown')
    category_total[cat] = category_total.get(cat, 0) + 1
    if y_pred[i] == y_true[i]:
        category_correct[cat] = category_correct.get(cat, 0) + 1

cat_names = sorted(category_total.keys())
cat_accs  = [category_correct.get(c, 0) / category_total[c] for c in cat_names]
cat_sizes = [category_total[c] for c in cat_names]

fig, ax1 = plt.subplots(figsize=(14, 6))

x_pos = range(len(cat_names))
bars = ax1.bar(x_pos, [a * 100 for a in cat_accs], color='#2E7D32', edgecolor='black', alpha=0.85, label='Accuracy %')
ax1.set_xticks(x_pos)
ax1.set_xticklabels(cat_names, rotation=45, ha='right', fontsize=11)
ax1.set_ylabel('Accuracy (%)', fontsize=12)
ax1.set_ylim([0, 105])

for bar, acc, size in zip(bars, cat_accs, cat_sizes):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
             f'{acc*100:.1f}%\n(n={size})', ha='center', va='bottom', fontsize=9, fontweight='bold')

ax1.set_title('Per-Category Accuracy with Sample Counts', fontsize=14, fontweight='bold')
ax1.axhline(y=top1_acc*100, color='red', linestyle='--', alpha=0.5, label=f'Overall: {top1_acc*100:.1f}%')
ax1.legend(fontsize=10)
plt.tight_layout()
plt.show()

# =============================================
# PLOT 8: Best & Worst Performing Classes
# =============================================
per_class_acc = {}
for i in range(num_classes):
    mask = y_true == i
    if mask.sum() > 0:
        per_class_acc[word_labels_short[i]] = (y_pred[mask] == i).mean()

sorted_acc = sorted(per_class_acc.items(), key=lambda x: x[1])
n_show = min(10, len(sorted_acc))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

# Worst
worst = sorted_acc[:n_show]
ax1.barh(range(len(worst)), [w[1]*100 for w in worst], color='#F44336', edgecolor='darkred', alpha=0.85)
ax1.set_yticks(range(len(worst)))
ax1.set_yticklabels([w[0] for w in worst], fontsize=10)
ax1.set_xlabel('Accuracy (%)', fontsize=12)
ax1.set_title(f'Bottom {n_show} Performing Classes', fontsize=14, fontweight='bold', color='#F44336')
ax1.set_xlim([0, 105])
for i, w in enumerate(worst):
    ax1.text(w[1]*100 + 1, i, f'{w[1]*100:.1f}%', va='center', fontsize=10)

# Best
best = sorted_acc[-n_show:][::-1]
ax2.barh(range(len(best)), [b[1]*100 for b in best], color='#4CAF50', edgecolor='darkgreen', alpha=0.85)
ax2.set_yticks(range(len(best)))
ax2.set_yticklabels([b[0] for b in best], fontsize=10)
ax2.set_xlabel('Accuracy (%)', fontsize=12)
ax2.set_title(f'Top {n_show} Performing Classes', fontsize=14, fontweight='bold', color='#4CAF50')
ax2.set_xlim([0, 105])
for i, b in enumerate(best):
    ax2.text(b[1]*100 + 1, i, f'{b[1]*100:.1f}%', va='center', fontsize=10)

plt.suptitle('Best vs Worst Performing Classes', fontsize=15, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()

# =============================================
# PLOT 9: Precision vs Recall Scatter
# =============================================
precisions = [report[w]['precision'] for w in word_labels if w in report]
recalls = [report[w]['recall'] for w in word_labels if w in report]
f1s = [report[w]['f1-score'] for w in word_labels if w in report]
labels_in_report = [w for w in word_labels if w in report]

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(recalls, precisions, c=f1s, cmap='RdYlGn', s=50, edgecolors='black', linewidth=0.5, alpha=0.8)
plt.colorbar(scatter, label='F1 Score', ax=ax)
ax.set_xlabel('Recall', fontsize=13)
ax.set_ylabel('Precision', fontsize=13)
ax.set_title('Precision vs Recall per Class (color = F1)', fontsize=14, fontweight='bold')
ax.set_xlim([-0.05, 1.05])
ax.set_ylim([-0.05, 1.05])
ax.plot([0, 1], [0, 1], 'k--', alpha=0.2)
ax.grid(True, alpha=0.3)

# Annotate worst classes
for i, lbl in enumerate(labels_in_report):
    if f1s[i] < 0.3:
        ax.annotate(lbl, (recalls[i], precisions[i]), fontsize=6, alpha=0.8,
                    xytext=(5, 5), textcoords='offset points')
plt.tight_layout()
plt.show()

print('\n' + '=' * 60)
print('✅ Evaluation & Visualization Dashboard complete!')
print('=' * 60)
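
The top-5 accuracy in the evaluation cell is computed with a per-sample Python loop. For larger test sets, the same metric can be computed in one vectorized step. The sketch below is a possible alternative, not the notebook's own code: `top_k_accuracy` and the toy arrays are hypothetical names introduced for illustration, and the function expects integer class indices (such as `y_true` after `np.argmax`).

```python
import numpy as np

def top_k_accuracy(proba, y_true, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    proba = np.asarray(proba)
    y_true = np.asarray(y_true)
    k = min(k, proba.shape[1])  # guard against k larger than the number of classes
    # argpartition moves the k largest scores into the last k columns (in no particular order)
    top_k = np.argpartition(proba, -k, axis=1)[:, -k:]
    return float((top_k == y_true[:, None]).any(axis=1).mean())

# Toy example (hypothetical data): 3 samples, 4 classes
toy_proba = np.array([
    [0.10, 0.60, 0.20, 0.10],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 2 is ranked third
    [0.05, 0.15, 0.30, 0.50],  # true class 0 is ranked last
])
toy_true = np.array([1, 2, 0])

top1 = top_k_accuracy(toy_proba, toy_true, k=1)
top3 = top_k_accuracy(toy_proba, toy_true, k=3)
print(f'top-1: {top1:.3f}, top-3: {top3:.3f}')
```

With the notebook's variables, the call would be `top_k_accuracy(proba, y_true, k=5)`. When scores are tied at the cut-off, the result may differ slightly from the `np.argsort` loop, because the two approaches can break ties differently.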


## Tips & Troubleshooting

| Issue                      | Solution                                                     |
| -------------------------- | ------------------------------------------------------------ |
| **OOM (Out of Memory)**    | Reduce `BATCH_SIZE` to 64 or 32                              |
| **No GPU detected**        | Install `tensorflow[and-cuda]` or check the CUDA/cuDNN setup |
| **Slow training**          | Ensure the GPU is being used (check the Cell 2 output)       |
| **Low accuracy**           | Increase epochs, add more data, or tune the LSTM units       |
| **Mixed precision errors** | Remove the mixed precision block in Cell 2                   |

### Monitor GPU:

```powershell
nvidia-smi -l 1
```
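
To log GPU usage from inside the notebook rather than from a separate terminal, a small helper can call `nvidia-smi` once and return its output. This is only a sketch: `query_gpu_memory` is a hypothetical helper name, and it assumes `nvidia-smi` is on the `PATH`, returning `None` when it is not.

```python
import shutil
import subprocess

def query_gpu_memory():
    """Return one 'name, memory.used, memory.total' line per GPU, or None if unavailable."""
    if shutil.which('nvidia-smi') is None:
        return None  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            ['nvidia-smi', '--query-gpu=name,memory.used,memory.total',
             '--format=csv,noheader'],
            capture_output=True, text=True, check=False, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

usage = query_gpu_memory()
print(usage if usage is not None else 'nvidia-smi not available')
```

Calling the helper between training epochs gives a rough view of memory growth without leaving the notebook.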

### Key Differences from Letter Training:

- Letters use an **MLP** (flat keypoints per image)
- Words use a **BiLSTM** (sequences of keypoints over time)
- Words need a fixed number of frames (`SEQUENCE_LENGTH`) per sample
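
The difference in input shape can be illustrated with a short sketch. The values below are assumptions chosen for illustration, not the notebook's configuration: `SEQUENCE_LENGTH = 30`, `NUM_FEATURES = 126` (two hands, 21 landmarks, 3 coordinates), and the hypothetical helper `fix_length`, which zero-pads short clips and evenly subsamples long ones. The notebook's own loader may use a different strategy.

```python
import numpy as np

SEQUENCE_LENGTH = 30  # assumed value for illustration
NUM_FEATURES = 126    # assumed: 2 hands x 21 landmarks x 3 coordinates

def fix_length(frames, target_len=SEQUENCE_LENGTH):
    """Return a (target_len, features) array from a variable-length clip."""
    frames = np.asarray(frames, dtype=np.float32)
    n = frames.shape[0]
    if n >= target_len:
        # Long clip: keep evenly spaced frames across the whole clip
        idx = np.linspace(0, n - 1, target_len).round().astype(int)
        return frames[idx]
    # Short clip: append all-zero frames at the end
    pad = np.zeros((target_len - n, frames.shape[1]), dtype=np.float32)
    return np.concatenate([frames, pad], axis=0)

# Letter model input: one image gives one flat keypoint vector
letter_sample = np.zeros(NUM_FEATURES, dtype=np.float32)

# Word model input: clips of different lengths become fixed-length sequences
short_clip = np.ones((12, NUM_FEATURES), dtype=np.float32)
long_clip = np.ones((75, NUM_FEATURES), dtype=np.float32)
word_batch = np.stack([fix_length(short_clip), fix_length(long_clip)])

print(letter_sample.shape)  # (126,)
print(word_batch.shape)     # (2, 30, 126)
```

The MLP consumes a single vector per sample, while the BiLSTM consumes a `(frames, features)` matrix per sample, which is why every clip must be brought to the same number of frames before batching.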
