# Final MediaPipe Notebook for Arabic Sign Language Letter Recognition

This comprehensive notebook combines the best features from all previous notebooks for training and deploying a MediaPipe-based Arabic sign language recognition system.

## Key Features:

- **GPU Optimization**: Automatic GPU detection and configuration
- **Mixed Precision Training**: Faster training on supported GPUs
- **Data Extraction**: MediaPipe keypoint extraction from your Arabic dataset
- **Model Training**: GPU-optimized MLP model with early stopping and learning rate scheduling
- **Real-Time Inference**: Webcam-based live prediction with Arabic letter recognition
- **Memory Management**: Efficient data pipelines and memory-conscious batch sizing


In [1]:
# ============================================
# Section 1: Import Required Libraries
# ============================================
import os
import time
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras import mixed_precision
import mediapipe as mp

print('=' * 60)
print('✅ All libraries imported successfully!')
print('=' * 60)


An error occurred: module 'importlib.metadata' has no attribute 'packages_distributions'

✅ All libraries imported successfully!


## Section 2: GPU Detection and Configuration

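This section detects your GPU, configures TensorFlow to use it with memory growth enabled, and turns on mixed precision training whenever a GPU is found. TensorFlow warns that mixed precision may run slowly on GPUs with compute capability below 7.0.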
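The recorded output for this section shows TensorFlow warning that the detected GPU (compute capability 6.1) may run slowly under `mixed_float16`. A minimal sketch of how that decision could be made explicit is shown here; `choose_precision_policy` is a hypothetical helper, not part of the notebook, and the 7.0 threshold is taken from the warning text.

```python
def choose_precision_policy(compute_capability):
    """Pick a Keras dtype policy name from a GPU's (major, minor) compute capability.

    Hypothetical helper: returns 'mixed_float16' only when the GPU meets the
    7.0 threshold named in TensorFlow's warning, otherwise plain 'float32'.
    """
    if compute_capability is None:
        return 'float32'  # unknown GPU or CPU-only: stay in full precision
    major, minor = compute_capability
    return 'mixed_float16' if (major, minor) >= (7, 0) else 'float32'


# The MX150 reported in this notebook's output has compute capability 6.1
print(choose_precision_policy((6, 1)))
# A GPU with compute capability 7.5 would qualify for mixed precision
print(choose_precision_policy((7, 5)))
```

In TensorFlow, the capability tuple can usually be read with `tf.config.experimental.get_device_details(gpus[0]).get('compute_capability')`, and the returned name passed to `mixed_precision.set_global_policy`. Treat that wiring as an assumption to verify against your TensorFlow version.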


In [2]:
# ============================================
# Section 2: GPU DETECTION & CONFIGURATION
# ============================================
print('=' * 60)
print('🔍 GPU DETECTION & CONFIGURATION')
print('=' * 60)
print(f'\nTensorFlow version: {tf.__version__}')

gpus = tf.config.list_physical_devices('GPU')
print(f'Found GPUs: {gpus}')

USE_GPU = False
DEVICE = '/CPU:0'

if gpus:
    try:
        for g in gpus:
            tf.config.experimental.set_memory_growth(g, True)
        tf.config.set_visible_devices(gpus[0], 'GPU')
        USE_GPU = True
        DEVICE = '/GPU:0'
        print(f'✅ GPU configured: {gpus[0]}')
    except RuntimeError as e:
        print(f'⚠️  GPU config error: {e}')

# Mixed precision (optional and beneficial on modern GPUs)
try:
    if USE_GPU:
        policy = mixed_precision.Policy('mixed_float16')
        mixed_precision.set_global_policy(policy)
        print(f'⚡ Mixed precision enabled: {policy.name}')
except Exception as e:
    print(f'⚠️  Mixed precision not enabled: {e}')

print(f'\n✅ Configuration complete. Using device: {DEVICE}')
print('=' * 60)


🔍 GPU DETECTION & CONFIGURATION

TensorFlow version: 2.10.0
Found GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
✅ GPU configured: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
Your GPU may run slowly with dtype policy mixed_float16 because it does not have compute capability of at least 7.0. Your GPU:
  NVIDIA GeForce MX150, compute capability 6.1
See https://developer.nvidia.com/cuda-gpus for a list of GPUs and their compute capabilities.
⚡ Mixed precision enabled: mixed_float16

✅ Configuration complete. Using device: /GPU:0


## Section 3: Batch Size and Memory Tips

**Important Memory Management:**

- If you get 'Out of Memory' errors, reduce `BATCH_SIZE` (try 128 or 64)
- Or disable mixed precision by setting the policy to 'float32'
- Monitor GPU usage with `nvidia-smi -l 1` in a separate PowerShell terminal
- Close other GPU-intensive applications (browser, video software, etc.) during training
- MLP models are memory-efficient - batch size 256 typically requires ~1.5-2.5 GB


In [None]:
# ============================================
# Section 4: EXTRACT KEYPOINTS FROM IMAGES
# ============================================
import os
import cv2
import mediapipe as mp
import pandas as pd
import numpy as np
from tqdm import tqdm  # Progress bar

# ⚠️ 1. UPDATE THIS PATH TO YOUR IMAGE FOLDER
# It should be the folder containing subfolders like 'Ain', 'Alif', 'Baa'...
DATASET_DIR = r"M:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\Sign-Language-Recognition-System-main\Sign_to_Sentence Project Main\Datasets\Dataset (ArASL)\ArASL Database\ArASL_Database"

# 2. Output File Name
CSV_PATH = 'my_custom_arabic_keypoints.csv'

# Setup MediaPipe
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=1, min_detection_confidence=0.5)

print(f"🚀 Starting Extraction from: {DATASET_DIR}")

data = []
labels = []
classes = sorted(os.listdir(DATASET_DIR)) # Get folder names (labels)

if not classes:
    print("❌ ERROR: No folders found! Check your path.")
else:
    print(f"📂 Found {len(classes)} classes: {classes[:5]}...")

    # Loop through every folder (Class)
    for folder_name in tqdm(classes, desc="Processing Classes"):
        folder_path = os.path.join(DATASET_DIR, folder_name)
        
        if os.path.isdir(folder_path):
            # Loop through every image in the folder
            for img_name in os.listdir(folder_path):
                img_path = os.path.join(folder_path, img_name)
                
                # Check file type
                if not img_name.lower().endswith(('.jpg', '.jpeg', '.png')):
                    continue

                try:
                    # 1. Read image from raw bytes
                    # (cv2.imread can fail on Windows paths with special characters)
                    with open(img_path, "rb") as stream:
                        raw = np.frombuffer(stream.read(), dtype=np.uint8)
                    # IMREAD_COLOR always yields 3-channel BGR, so the
                    # BGR->RGB conversion below also works for grayscale files
                    img = cv2.imdecode(raw, cv2.IMREAD_COLOR)
                    
                    if img is None:
                        continue
                    
                    # 2. Convert to RGB
                    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                    
                    # 3. Extract Landmarks
                    results = hands.process(img_rgb)
                    
                    if results.multi_hand_landmarks:
                        for hand_landmarks in results.multi_hand_landmarks:
                            row = []
                            # Extract 21 points (x, y, z)
                            for landmark in hand_landmarks.landmark:
                                row.extend([landmark.x, landmark.y, landmark.z])
                            
                            data.append(row)
                            labels.append(folder_name)
                            
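                except Exception as e:
                    # Skip unreadable images, but report them instead of failing silently
                    tqdm.write(f"Skipped {img_path}: {e}")
                    continue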

    # --- SAVE TO CSV ---
    if len(data) > 0:
        # Create column names
        columns = []
        for i in range(21):
            columns.extend([f'x{i}', f'y{i}', f'z{i}'])
        
        df = pd.DataFrame(data, columns=columns)
        df['label'] = labels # Add label column
        
        # Save it
        df.to_csv(CSV_PATH, index=False)
        print(f"\n✅ SUCCESS! Extracted {len(df)} samples.")
        print(f"💾 Saved to: {CSV_PATH}")
        print("👉 You can now run Section 5 to train.")
    else:
        print("\n❌ FAILED: No hands were detected in any images.")


In [None]:
import cv2
import mediapipe as mp
import os
import numpy as np

# --- CONFIGURATION ---
# Paste your path here again
DATASET_DIR = r"M:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\Sign-Language-Recognition-System-main\Sign_to_Sentence Project Main\Datasets\Dataset (ArASL)\ArASL Database\ArASL_Database"

# We will pick the first letter folder we find (e.g., 'ain')
target_class = os.listdir(DATASET_DIR)[0]
class_path = os.path.join(DATASET_DIR, target_class)
image_name = os.listdir(class_path)[0]
img_path = os.path.join(class_path, image_name)

print(f"🧐 Debugging Image: {img_path}")

# --- SETUP MEDIAPIPE ---
mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

def try_detect(confidence, static_mode):
    print(f"\nTesting: Confidence={confidence}, StaticMode={static_mode}...")
    
    with mp_hands.Hands(
        static_image_mode=static_mode,
        max_num_hands=1,
        min_detection_confidence=confidence
    ) as hands:
        
        # 1. Read Image
        # Use binary read to handle Windows path issues
        with open(img_path, "rb") as f:
            raw = np.frombuffer(f.read(), dtype=np.uint8)
        # IMREAD_COLOR guarantees a 3-channel BGR image for the conversion below
        img = cv2.imdecode(raw, cv2.IMREAD_COLOR)
        
        if img is None:
            print("❌ Error: Could not read image file.")
            return False

        # 2. Convert to RGB
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        # 3. Process
        results = hands.process(img_rgb)
        
        if results.multi_hand_landmarks:
            print("✅ SUCCESS! Hand Detected.")
            # Draw landmarks to prove it
            for hand_landmarks in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            
            # Show the success
            cv2.imshow(f"Success! Conf={confidence}", img)
            cv2.waitKey(0)
            cv2.destroyAllWindows()
            return True
        else:
            print("❌ FAILED. No hand seen.")
            return False

# --- RUN TESTS ---
# Test 1: Default Settings
if try_detect(confidence=0.5, static_mode=True):
    print(">> Recommendation: Your images are fine. Maybe the loop logic was wrong.")
    
# Test 2: Lower Confidence (For bad lighting)
elif try_detect(confidence=0.3, static_mode=True):
    print(">> Recommendation: Change 'min_detection_confidence' to 0.3 in your script.")

# Test 3: Video Mode (Sometimes works better for blurry pics)
elif try_detect(confidence=0.5, static_mode=False):
    print(">> Recommendation: Change 'static_image_mode' to False in your script.")

else:
    print("\n🚨 CRITICAL FAILURE: MediaPipe cannot see the hand in this image at all.")
    print("   1. Are the images blank?")
    print("   2. Is the hand cut off (no wrist)?")
    print("   3. Please open the image manually to check it.")


🧐 Debugging Image: M:\Term 9\Grad\Main\Sign-Language-Recognition-System-main\Sign-Language-Recognition-System-main\Sign_to_Sentence Project Main\Datasets\Dataset (ArASL)\ArASL Database\ArASL_Database\ain\AIN (1).JPG

Testing: Confidence=0.5, StaticMode=True...
❌ FAILED. No hand seen.

Testing: Confidence=0.3, StaticMode=True...
❌ FAILED. No hand seen.

Testing: Confidence=0.5, StaticMode=False...
❌ FAILED. No hand seen.

🚨 CRITICAL FAILURE: MediaPipe cannot see the hand in this image at all.
   1. Are the images blank?
   2. Is the hand cut off (no wrist)?
   3. Please open the image manually to check it.


In [None]:
# ==========================================
# REPLACEMENT FOR SECTION 4: LOAD CSV DIRECTLY
# ==========================================
import pandas as pd
import os

# 1. Point to the file you uploaded
CSV_PATH = r'FINAL_CLEAN_DATASET.csv'

if os.path.exists(CSV_PATH):
    print(f"✅ Found Ready-Made CSV: {CSV_PATH}")
    
    # Load it
    df = pd.read_csv(CSV_PATH)
    
    # 2. Fix the column names to match what the training code expects
    # (The file has 'letter', but our code wants 'label')
    if 'letter' in df.columns:
        df = df.rename(columns={'letter': 'label'})
        print("   -> Renamed column 'letter' to 'label'")
        
    print(f"🎉 SUCCESS! Loaded {len(df)} training samples.")
    print("   You can now skip to Section 5/6 to train the model!")
    
else:
    print(f"❌ Error: Could not find '{CSV_PATH}'")
    print("   Make sure you dragged and dropped the CSV file into the notebook folder.")


✅ Found Ready-Made CSV: FINAL_CLEAN_DATASET.csv
🎉 SUCCESS! Loaded 8037 training samples.
   You can now skip to Section 5/6 to train the model!


## Section 5: Preprocess and Split Data

Load the extracted keypoints, prepare features and labels, and split into training, validation, and test sets.
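The cell below calls `train_test_split` twice, each time with `test_size=0.2`, so the validation set is 20% of the remaining 80% rather than 20% of the whole dataset. The arithmetic can be sketched in plain Python as follows; `two_stage_split_sizes` is a hypothetical helper, and it assumes scikit-learn rounds the held-out portion up, which agrees with the sample counts recorded in this notebook.

```python
import math


def two_stage_split_sizes(n_samples, test_size=0.2, val_size=0.2):
    """Sizes produced by two successive hold-out splits.

    Assumption: the held-out part of each split is rounded up and the
    remainder goes to the larger part, as scikit-learn does.
    """
    n_test = math.ceil(n_samples * test_size)
    n_remaining = n_samples - n_test
    n_val = math.ceil(n_remaining * val_size)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test


total = 8037  # sample count reported when the CSV was loaded
n_train, n_val, n_test = two_stage_split_sizes(total)
for name, n in [('train', n_train), ('val', n_val), ('test', n_test)]:
    print(f'{name}: {n} samples ({n / total:.0%})')
```

The effective proportions are therefore about 64% train, 16% validation, and 20% test. To get an exact 60/20/20 split, the second call would need `test_size=0.25`.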


In [None]:
# ============================================
# Section 5: PREPROCESS AND SPLIT DATA
# ============================================

print('=' * 60)
print('🔄 DATA PREPROCESSING AND SPLITTING')
print('=' * 60)

if df.empty:
    print('\n❌ ERROR: No dataset loaded. Please run the extraction cell first.')
else:
    # --- CORRECTED SLICING FOR YOUR CSV ---
    # Your label is in the FIRST column (index 0)
    # Your data (x,y,z) starts from the SECOND column (index 1 to end)
    
    # 1. Extract Features (X): Skip the first column (label)
    X = df.iloc[:, 1:].astype('float32').values
    
    # 2. Extract Labels (y): Take ONLY the first column
    y = df.iloc[:, 0].values
    
    print(f'\n📊 Dataset Statistics:')
    print(f'   Total samples: {len(df)}')
    print(f'   Features per sample: {X.shape[1]} (Should be 63)')
    print(f'   Unique classes: {len(np.unique(y))}')
    
    # Encode labels
    encoder = LabelEncoder()
    y_encoded = encoder.fit_transform(y)
    num_classes = len(encoder.classes_)
    
    print(f'\n🔤 Arabic Letters in Dataset:')
    for i, letter in enumerate(encoder.classes_):
        count = np.sum(y_encoded == i)
        print(f'   {letter}: {count} samples')
    
    # Stratified two-stage split: 20% test, then 20% of the remainder for
    # validation, giving roughly 64% train, 16% validation, 20% test
    print(f'\n✂️  Splitting data (64% train, 16% val, 20% test)...')
    
    X_train_full, X_test, y_train_full, y_test = train_test_split(
        X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
    )
    
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_full, y_train_full, test_size=0.2, random_state=42, stratify=y_train_full
    )
    
    # Convert labels to one-hot encoding
    y_train = to_categorical(y_train, num_classes=num_classes)
    y_val = to_categorical(y_val, num_classes=num_classes)
    y_test = to_categorical(y_test, num_classes=num_classes)
    
    # Ensure all features are float32 for training
    X_train = X_train.astype('float32')
    X_val = X_val.astype('float32')
    X_test = X_test.astype('float32')
    
    print(f'\n📈 Data Split Summary:')
    print(f'   Training samples: {len(X_train)}')
    print(f'   Validation samples: {len(X_val)}')
    print(f'   Test samples: {len(X_test)}')
    print(f'   Total: {len(X_train) + len(X_val) + len(X_test)}')
    print('=' * 60)


🔄 DATA PREPROCESSING AND SPLITTING

📊 Dataset Statistics:
   Total samples: 8037
   Features per sample: 63 (Should be 63)
   Unique classes: 34

🔤 Arabic Letters in Dataset:
   Ain: 209 samples
   Al: 269 samples
   Alef: 258 samples
   Beh: 274 samples
   Dad: 256 samples
   Dal: 219 samples
   Feh: 236 samples
   Ghain: 216 samples
   Hah: 188 samples
   Heh: 203 samples
   Jeem: 192 samples
   Kaf: 252 samples
   Khah: 196 samples
   Laa: 240 samples
   Lam: 253 samples
   Meem: 240 samples
   Noon: 214 samples
   Qaf: 193 samples
   Reh: 206 samples
   Sad: 261 samples
   Seen: 262 samples
   Sheen: 274 samples
   Tah: 196 samples
   Teh: 278 samples
   Teh_Marbuta: 230 samples
   Theh: 271 samples
   Waw: 201 samples
   Yeh: 261 samples
   Zah: 210 samples
   Zain: 191 samples
   del: 300 samples
   nothing: 300 samples
   space: 300 samples
   thal: 188 samples

✂️  Splitting data (60% train, 20% val, 20% test)...

📈 Data Split Summary:
   Training samples: 5143


## Section 6: Efficient tf.data Pipeline Helper

Create an optimized data pipeline function for training and evaluation.
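To make the shuffle-then-batch behavior concrete, here is a plain-Python sketch of what such a pipeline yields per epoch. It does not use `tf.data`; `iter_batches` is a hypothetical stand-in, and the numbers (5143 training samples, batch size 256) are the ones this notebook reports for the GPU configuration.

```python
import random


def iter_batches(n_samples, batch_size, shuffle=True, seed=None):
    """Yield lists of sample indices, mimicking shuffle -> batch.

    Illustrative only: a full shuffle followed by fixed-size slicing,
    where the last batch may be smaller than batch_size.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


batches = list(iter_batches(5143, 256, seed=0))
print(f'batches per epoch: {len(batches)}')
print(f'size of last batch: {len(batches[-1])}')
```

Because the shuffle buffer in the helper below is at least as large as the training set, the real pipeline should also produce a full shuffle each epoch, with one short final batch.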


In [6]:
# ============================================
# Section 6: EFFICIENT TF.DATA PIPELINE
# ============================================

AUTOTUNE = tf.data.AUTOTUNE

def make_dataset(features, labels, batch_size, training=True):
    """
    Create an efficient tf.data pipeline for training or evaluation.
    
    Args:
        features: Input features (numpy array)
        labels: Target labels (numpy array)
        batch_size: Batch size for training
        training: Whether to shuffle data (True for training, False for validation/test)
    
    Returns:
        tf.data.Dataset: Optimized dataset pipeline
    """
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    
    if training:
        # Shuffle with a buffer covering the whole dataset (capped at 10,000)
        # so every epoch sees a fresh, well-mixed sample order
        buffer = min(len(features), 10000)
        ds = ds.shuffle(buffer_size=buffer, reshuffle_each_iteration=True)
    
    # Batch and prefetch for efficient GPU feeding
    ds = ds.batch(batch_size).prefetch(AUTOTUNE)
    
    return ds

print('✅ tf.data pipeline helper function created')


✅ tf.data pipeline helper function created


## Section 7: Build and Train MLP Model (GPU-Optimized)

Build, compile, and train a multi-layer perceptron model optimized for GPU training with all callbacks and advanced features.
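As a sanity check on the architecture used below (63 inputs, hidden layers of 512, 256 and 64 units, 34 outputs, with batch normalization after the first two hidden layers), the parameter counts can be worked out by hand. This is a small sketch with hypothetical helper names; the first two Dense counts and both BatchNormalization counts agree with the recorded model summary.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma, beta, moving mean and moving variance per feature."""
    return 4 * n_features


layer_sizes = [63, 512, 256, 64, 34]
dense_counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
bn_counts = [batchnorm_params(512), batchnorm_params(256)]
total_params = sum(dense_counts) + sum(bn_counts)

print('Dense layers:', dense_counts)
print('BatchNorm layers:', bn_counts)
print('Total parameters:', total_params)
print(f'Approximate float32 size: {total_params * 4 / 1024:.0f} KiB')
```

The weights themselves are small, so most GPU memory used during training is likely to come from framework overhead and batch buffers rather than from the model.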


In [7]:
# ============================================
# Section 7: BUILD AND TRAIN MLP MODEL
# ============================================

print('=' * 60)
print('🔨 BUILDING AND TRAINING MLP MODEL')
print('=' * 60)

if df.empty:
    print('❌ ERROR: No data available. Run preprocessing cell first.')
else:
    # Clear session to free GPU memory
    tf.keras.backend.clear_session()
    
    print(f'\n📋 Model Configuration:')
    print(f'   Input shape: {X_train.shape[1]} features')
    print(f'   Output classes: {num_classes} Arabic letters')
    print(f'   Device: {DEVICE}')
    
    # Build model with GPU optimization
    with tf.device(DEVICE):
        model = Sequential([
            # First hidden layer: 512 neurons
            Dense(
                512,
                activation='relu',
                kernel_initializer='he_normal',
                kernel_regularizer=tf.keras.regularizers.l2(1e-4),
                input_shape=(X_train.shape[1],),
                name='dense_512'
            ),
            BatchNormalization(name='bn_1'),
            Dropout(0.2, name='dropout_1'),
            
            # Second hidden layer: 256 neurons
            Dense(
                256,
                activation='relu',
                kernel_initializer='he_normal',
                kernel_regularizer=tf.keras.regularizers.l2(1e-4),
                name='dense_256'
            ),
            BatchNormalization(name='bn_2'),
            Dropout(0.2, name='dropout_2'),
            
            # Hidden layer: 64 neurons
            Dense(
                64,
                activation='relu',
                kernel_initializer='he_normal',
                name='dense_64'
            ),
            Dropout(0.2, name='dropout_3'),
            
            # Output layer: softmax for multi-class classification
            Dense(num_classes, activation='softmax', dtype='float32', name='output')
        ])
        
        # Use legacy Adam optimizer (works better with mixed precision)
        optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=0.0005)
        
        # Compile model
        model.compile(
            optimizer=optimizer,
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
    
    # Display model summary
    print('\n📊 Model Summary:')
    model.summary()
    
    # Prepare data pipelines
    if USE_GPU:
        BATCH_SIZE = 256
    else:
        BATCH_SIZE = 64
    
    train_ds = make_dataset(X_train, y_train, BATCH_SIZE, training=True)
    val_ds = make_dataset(X_val, y_val, BATCH_SIZE, training=False)
    
    # Define callbacks
    callbacks = [
        ModelCheckpoint(
            'arsl_mediapipe_mlp_model_best.h5',
            monitor='val_accuracy',
            save_best_only=True,
            mode='max',
            verbose=1
        ),
        EarlyStopping(
            monitor='val_loss',
            patience=15,
            restore_best_weights=True,
            verbose=1
        ),
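        ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            # NOTE: this equals the EarlyStopping patience, so training usually
            # stops before the learning rate is reduced; use a smaller value
            # (e.g. 5) if the reduction should take effect
            patience=15,
            min_lr=1e-7,
            verbose=1
        )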
    ]
    
    # Print training configuration
    print(f'\n⚙️  Training Configuration:')
    print(f'   Batch size: {BATCH_SIZE}')
    print(f'   Epochs: up to 100 (early stopping may end training sooner)')
    print(f'   Optimizer: Adam (learning rate: 0.0005)')
    print(f'   Loss: Categorical Crossentropy')
    print(f'   Metrics: Accuracy')
    print(f'   Callbacks: ModelCheckpoint, EarlyStopping, ReduceLROnPlateau')
    
    # Train model
    print('\n🚀 Starting training...')
    print('=' * 60)
    
    start_time = time.time()
    
    with tf.device(DEVICE):
        history = model.fit(
            train_ds,
            validation_data=val_ds,
            epochs=100,
            callbacks=callbacks,
            verbose=1
        )
    
    training_time = time.time() - start_time
    
    # Save final model
    model.save('arsl_mediapipe_mlp_model_final.h5')
    
    print('\n' + '=' * 60)
    print('✅ TRAINING COMPLETE!')
    print('=' * 60)
    print(f'⏱️  Training time: {training_time:.2f} seconds ({training_time/60:.2f} minutes)')
    print(f'📁 Best model saved: arsl_mediapipe_mlp_model_best.h5')
    print(f'📁 Final model saved: arsl_mediapipe_mlp_model_final.h5')
    
    # Display final metrics
    if hasattr(history, 'history'):
        final_train_acc = history.history['accuracy'][-1]
        final_val_acc = history.history['val_accuracy'][-1]
        final_train_loss = history.history['loss'][-1]
        final_val_loss = history.history['val_loss'][-1]
        
        print(f'\n📊 Final Training Metrics:')
        print(f'   Training Accuracy: {final_train_acc*100:.2f}%')
        print(f'   Validation Accuracy: {final_val_acc*100:.2f}%')
        print(f'   Training Loss: {final_train_loss:.4f}')
        print(f'   Validation Loss: {final_val_loss:.4f}')
    print('=' * 60)


🔨 BUILDING AND TRAINING MLP MODEL

📋 Model Configuration:
   Input shape: 63 features
   Output classes: 34 Arabic letters
   Device: /GPU:0

📊 Model Summary:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_512 (Dense)           (None, 512)               32768     
                                                                 
 bn_1 (BatchNormalization)   (None, 512)               2048      
                                                                 
 dropout_1 (Dropout)         (None, 512)               0         
                                                                 
 dense_256 (Dense)           (None, 256)               131328    
                                                                 
 bn_2 (BatchNormalization)   (None, 256)               1024      
                                                                 
 dropout_2 (Dropout)

## Section 8: Evaluate Model on Test Data

Evaluate the best trained model on the held-out test set and report performance metrics.
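A single accuracy figure can hide classes that are recognized poorly, which matters here because the 34 classes range from 188 to 300 samples. A minimal per-class breakdown can be sketched as below; `per_class_accuracy` is a hypothetical helper and the labels are toy values, not results from this notebook.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted samples for each true label."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Hypothetical toy labels for illustration only
y_true_toy = ['Ain', 'Ain', 'Beh', 'Beh', 'Beh', 'space']
y_pred_toy = ['Ain', 'Beh', 'Beh', 'Beh', 'Beh', 'space']

for label, acc in per_class_accuracy(y_true_toy, y_pred_toy).items():
    print(f'{label}: {acc:.0%}')
```

In this notebook, the two label lists could presumably be obtained by taking the argmax of `y_test` and of the model's predictions on `X_test`, then mapping the indices back through `encoder.inverse_transform`.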


In [8]:
# ============================================
# Section 8: EVALUATE MODEL ON TEST DATA
# ============================================

print('=' * 60)
print('🧪 MODEL EVALUATION ON TEST DATA')
print('=' * 60)

if df.empty:
    print('❌ ERROR: No data available. Run preprocessing cell first.')
else:
    # Load the best model
    print('\n📦 Loading best model...')
    model_best = tf.keras.models.load_model('arsl_mediapipe_mlp_model_best.h5')
    
    # Create test dataset pipeline
    eval_batch_size = 256 if USE_GPU else 128
    test_ds = make_dataset(X_test, y_test, eval_batch_size, training=False)
    
    print(f'\n🔍 Evaluating on {len(X_test)} test samples...')
    print(f'   Batch size: {eval_batch_size}')
    print(f'   Device: {DEVICE}')
    
    # Evaluate
    start_time = time.time()
    with tf.device(DEVICE):
        test_loss, test_accuracy = model_best.evaluate(test_ds, verbose=1)
    eval_time = time.time() - start_time
    
    print('\n' + '=' * 60)
    print('✅ EVALUATION COMPLETE!')
    print('=' * 60)
    print(f'⏱️  Evaluation time: {eval_time:.4f} seconds')
    print(f'\n📊 Test Performance Metrics:')
    print(f'   Test Loss: {test_loss:.4f}')
    print(f'   Test Accuracy: {test_accuracy*100:.2f}%')
    
    # Performance interpretation
    if test_accuracy >= 0.95:
        print(f'\n   🌟 Excellent! Model is highly accurate.')
    elif test_accuracy >= 0.90:
        print(f'\n   ⭐ Very good performance!')
    elif test_accuracy >= 0.80:
        print(f'\n   👍 Good performance. Consider more training data or fine-tuning.')
    else:
        print(f'\n   ⚠️  May need improvement. Check data quality or increase epochs.')
    print('=' * 60)


🧪 MODEL EVALUATION ON TEST DATA

📦 Loading best model...

🔍 Evaluating on 1608 test samples...
   Batch size: 256
   Device: /GPU:0

✅ EVALUATION COMPLETE!
⏱️  Evaluation time: 0.3225 seconds

📊 Test Performance Metrics:
   Test Loss: 0.2402
   Test Accuracy: 97.45%

   🌟 Excellent! Model is highly accurate.


## Section 9: Real-Time Inference (Webcam) with Arabic Letters

Deploy the trained model for real-time Arabic letter recognition using your webcam. Press 'q' to quit or 'c' to clear the sentence.

### Instructions:

1. Make sure your webcam is connected and working
2. Position your hand clearly in front of the camera
3. The system will display the predicted letter in the top-left corner
4. Special gestures:
   - **SPACE**: Add a space between words
   - **DELETE**: Remove the last letter
   - **NOTHING**: Ignore the prediction

The predicted sentence will be displayed at the bottom of the video feed.
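The inference cell decides when to accept a sign by keeping a sliding window of recent predictions and then applying the control labels to the sentence. Those two steps can be sketched in isolation as follows. The class and function names are hypothetical; the window size (10), threshold (7), and the `space`, `del`, and `nothing` labels are taken from the cell below. The hold-time and confidence checks are left out for brevity.

```python
from collections import deque


class PredictionStabilizer:
    """Report a label as stable once it dominates a full sliding window."""

    def __init__(self, window_size=10, threshold=7):
        self.window_size = window_size
        self.threshold = threshold
        self.buffer = deque(maxlen=window_size)

    def update(self, label):
        """Record one prediction; return True if `label` is now stable."""
        self.buffer.append(label)
        return (len(self.buffer) == self.window_size
                and self.buffer.count(label) >= self.threshold)


def apply_committed_label(sentence, label):
    """Return the sentence after committing one stable label."""
    if label == 'space':
        # Avoid leading or doubled spaces
        return sentence + ' ' if sentence and sentence[-1] != ' ' else sentence
    if label == 'del':
        return sentence[:-1]  # removes the last character
    if label == 'nothing':
        return sentence
    return sentence + label


stabilizer = PredictionStabilizer()
frames = ['Alef'] * 6 + ['Beh'] * 3 + ['Alef'] * 3
stable_flags = [stabilizer.update(label) for label in frames]
print('first stable frame index:', stable_flags.index(True))

sentence = ''
for label in ['Alef', 'space', 'space', 'Beh', 'del', 'nothing']:
    sentence = apply_committed_label(sentence, label)
print(repr(sentence))
```

Note that, as in the cell below, `del` removes a single character, so it trims only part of a multi-character label name such as `Beh`.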


In [None]:
# ============================================
# Section 9: REAL-TIME INFERENCE (WEBCAM)
# ============================================
# Commit-once-then-wait strategy (same as English notebooks)
# Control labels match CSV: 'space', 'del', 'nothing' (lowercase)
from collections import deque  # used for the stabilization buffer below

print('=' * 60)
print('🎥 REAL-TIME ARABIC LETTER RECOGNITION')
print('=' * 60)

if df.empty:
    print('❌ ERROR: No data available. Run preprocessing cell first.')
else:
    # Load encoder
    print('\n📦 Loading model and encoder...')
    encoder = LabelEncoder()
    df_labels = pd.read_csv(CSV_PATH)
    # Handle both possible column names ('label' or 'letter')
    label_col = 'label' if 'label' in df_labels.columns else 'letter'
    encoder.fit(df_labels[label_col])
    print(f'   Encoder classes ({len(encoder.classes_)}): {list(encoder.classes_[:5])}...')
    
    # Load trained model
    tf.keras.mixed_precision.set_global_policy('float32')
    mlp_model = tf.keras.models.load_model('arsl_mediapipe_mlp_model_best.h5')
    print('✅ Model and encoder loaded!')
    
    # Initialize MediaPipe
    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils
    hands = mp_hands.Hands(
        min_detection_confidence=0.7,
        min_tracking_confidence=0.7
    )
    
    # Stabilization settings
    STABILIZATION_WINDOW_SIZE = 10
    STABILIZATION_THRESHOLD = 7
    MIN_CONFIDENCE = 0.70
    HOLD_TIME_REQUIRED = 0.8
    DISPLAY_WIDTH = 1280
    DISPLAY_HEIGHT = 720
    
    # Open webcam
    print('\n🎥 Initializing webcam...')
    cap = cv2.VideoCapture(0)
    
    if not cap.isOpened():
        print('❌ ERROR: Could not open webcam!')
    else:
        print('✅ Webcam opened successfully!')
        print('\n📝 Instructions:')
        print('   - Position your hand in front of the camera')
        print('   - Hold a sign steady until it commits')
        print('   - Change sign or remove hand for next letter')
        print('   - Press "q" to quit, "c" to clear')
        print('\n' + '=' * 60)
        print('🔴 Recording... Press "q" to stop')
        print('=' * 60 + '\n')
        
        # State variables
        predicted_sentence = ''
        stabilization_buffer = deque(maxlen=STABILIZATION_WINDOW_SIZE)
        
        # Commit-once-then-wait state
        committed_label = None
        current_sign_label = None
        current_sign_start = None
        waiting_for_change = False
        
        window_name = 'Arabic Sign Language Recognition (MediaPipe + MLP)'
        cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(window_name, DISPLAY_WIDTH, DISPLAY_HEIGHT)
        
        try:
            while cap.isOpened():
                ret, frame = cap.read()
                if not ret:
                    break
                
                # Process UNFLIPPED frame with MediaPipe (matches training data)
                frame = cv2.resize(frame, (DISPLAY_WIDTH, DISPLAY_HEIGHT))
                h, w, c = frame.shape
                rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                rgb_frame.flags.writeable = False
                results = hands.process(rgb_frame)
                rgb_frame.flags.writeable = True
                
                display_status = ''
                status_color = (200, 200, 200)
                
                if results.multi_hand_landmarks:
                    for hand_landmarks, handedness in zip(
                        results.multi_hand_landmarks,
                        results.multi_handedness
                    ):
                        mp_drawing.draw_landmarks(
                            frame, hand_landmarks, mp_hands.HAND_CONNECTIONS
                        )
                        
                        # Extract landmarks: NO mirroring (matches training)
                        landmarks = np.array([
                            [lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark
                        ])
                        input_data = landmarks.flatten().reshape(1, -1).astype('float32')
                        
                        try:
                            prediction = mlp_model.predict(input_data, verbose=0)
                        except Exception as e:
                            display_status = f'Prediction error: {e}'
                            status_color = (0, 0, 255)
                            break
                        
                        pred_class = np.argmax(prediction)
                        pred_confidence = float(np.max(prediction))
                        pred_label = encoder.inverse_transform([pred_class])[0]
                        
                        # Skip low confidence
                        if pred_confidence < MIN_CONFIDENCE:
                            display_status = f'{pred_label} ({pred_confidence:.0%}) Low conf'
                            status_color = (0, 100, 255)
                            break
                        
                        # Stability buffer
                        stabilization_buffer.append(pred_label)
                        buffer_count = stabilization_buffer.count(pred_label)
                        is_stable = (buffer_count >= STABILIZATION_THRESHOLD and 
                                     len(stabilization_buffer) == STABILIZATION_WINDOW_SIZE)
                        
                        if not is_stable:
                            # Clamp to 100%: the count can exceed the threshold while the window is still filling
                            progress = min(buffer_count / STABILIZATION_THRESHOLD, 1.0) * 100
                            display_status = f'{pred_label} ({pred_confidence:.0%}) Stabilizing {progress:.0f}%'
                            status_color = (0, 255, 255)
                            break
                        
                        now = time.time()
                        
                        # Check if waiting after a commit
                        if waiting_for_change:
                            if pred_label == committed_label:
                                display_status = f'{pred_label} ({pred_confidence:.0%}) Committed - change sign'
                                status_color = (255, 200, 0)
                                break
                            else:
                                waiting_for_change = False
                                committed_label = None
                                current_sign_label = pred_label
                                current_sign_start = now
                        
                        # Track hold time
                        if pred_label != current_sign_label:
                            current_sign_label = pred_label
                            current_sign_start = now
                        
                        hold_duration = now - current_sign_start if current_sign_start else 0
                        
                        if hold_duration < HOLD_TIME_REQUIRED:
                            hold_pct = hold_duration / HOLD_TIME_REQUIRED * 100
                            display_status = f'{pred_label} ({pred_confidence:.0%}) Hold: {hold_pct:.0f}%'
                            status_color = (0, 255, 255)
                            break
                        
                        # COMMIT: control labels match the CSV: 'space', 'del', 'nothing'
                        if pred_label == 'space':
                            if predicted_sentence and predicted_sentence[-1] != ' ':
                                predicted_sentence += ' '
                        elif pred_label == 'del':
                            if predicted_sentence:
                                predicted_sentence = predicted_sentence[:-1]
                        elif pred_label != 'nothing':
                            predicted_sentence += pred_label
                        
                        committed_label = pred_label
                        waiting_for_change = True
                        current_sign_label = None
                        current_sign_start = None
                        stabilization_buffer.clear()
                        
                        display_status = f'{pred_label} ({pred_confidence:.0%}) COMMITTED!'
                        status_color = (0, 255, 0)
                else:
                    # No hand → full reset
                    committed_label = None
                    waiting_for_change = False
                    current_sign_label = None
                    current_sign_start = None
                    stabilization_buffer.clear()
                    display_status = 'No hand detected'
                    status_color = (150, 150, 150)
                
                # Flip for selfie-view display
                frame = cv2.flip(frame, 1)
                
                # Draw info panel at top-left
                cv2.rectangle(frame, (10, 10), (500, 100), (0, 0, 0), -1)
                cv2.putText(frame, 'Arabic Letter Recognition', (20, 35),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 1)
                cv2.putText(frame, display_status, (20, 75),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.8, status_color, 2)
                
                # Draw sentence panel at bottom
                cv2.rectangle(frame, (0, h - 80), (w, h), (0, 0, 0), -1)
                cv2.putText(frame, 'Predicted:', (10, h - 55),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.7, (200, 200, 200), 1)
                cv2.putText(frame, predicted_sentence[-50:], (10, h - 20),
                           cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 255), 2)
                
                cv2.imshow(window_name, frame)
                
                key = cv2.waitKey(1) & 0xFF
                if key == ord('q'):
                    break
                elif key == ord('c'):
                    predicted_sentence = ''
                    committed_label = None
                    waiting_for_change = False
                    stabilization_buffer.clear()
                    print('Sentence cleared')
        
        except KeyboardInterrupt:
            print('\n⚠️ Interrupted by user')
        except Exception as e:
            print(f'❌ Error: {e}')
        finally:
            cap.release()
            cv2.destroyAllWindows()
            
            print('\n' + '=' * 60)
            print('Session ended')
            if predicted_sentence:
                print(f'Final sentence: {predicted_sentence}')
            print('=' * 60)
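
The stabilize, hold, and commit flow in the loop above can be exercised without a webcam. What follows is a minimal sketch, not the notebook's implementation: `SignCommitter` and its parameter names are hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic. It assumes low-confidence predictions have already been filtered out before `update` is called.

```python
from collections import deque


class SignCommitter:
    """Hypothetical stand-in for the loop's stabilize -> hold -> commit logic."""

    def __init__(self, window=5, threshold=4, hold_time=1.0):
        self.buffer = deque(maxlen=window)  # recent confident labels
        self.threshold = threshold          # votes needed inside a full window
        self.hold_time = hold_time          # seconds a stable sign must be held
        self.current = None                 # label currently being held
        self.start = None                   # time the current hold began
        self.committed = None               # last commit; blocks repeats until the sign changes
        self.sentence = ''

    def reset(self):
        """Mirror the 'no hand detected' branch: forget all per-sign state."""
        self.buffer.clear()
        self.current = self.start = self.committed = None

    def update(self, label, now):
        """Feed one confident prediction; return True if it was committed."""
        self.buffer.append(label)
        stable = (len(self.buffer) == self.buffer.maxlen
                  and self.buffer.count(label) >= self.threshold)
        if not stable:
            return False
        if self.committed is not None:
            if label == self.committed:
                return False  # same sign still shown after a commit
            self.committed = None
        if label != self.current:
            self.current, self.start = label, now
        if now - self.start < self.hold_time:
            return False
        # Commit: apply control labels, otherwise append the letter.
        if label == 'space':
            if self.sentence and not self.sentence.endswith(' '):
                self.sentence += ' '
        elif label == 'del':
            self.sentence = self.sentence[:-1]
        elif label != 'nothing':
            self.sentence += label
        self.committed = label
        self.current = self.start = None
        self.buffer.clear()
        return True
```

Feeding the same label repeatedly commits it once. It is committed again only after a different stable sign has been shown, or after `reset()` is called, which corresponds to the hand leaving the frame.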


## Summary and Next Steps

Congratulations! You have successfully created a complete Arabic sign language recognition system using MediaPipe and neural networks.

### What You've Accomplished:

✅ Extracted MediaPipe hand keypoints from your Arabic dataset  
✅ Preprocessed and split data into training, validation, and test sets  
✅ Built and trained a GPU-optimized MLP model  
✅ Evaluated model performance on test data  
✅ Deployed real-time inference using webcam

### Generated Files:

- `arsl_mediapipe_keypoints_final.csv` - Extracted keypoints dataset
- `arsl_mediapipe_mlp_model_best.h5` - Best trained model (highest validation accuracy)
- `arsl_mediapipe_mlp_model_final.h5` - Final model after training

### Troubleshooting Tips:

**If extraction is slow:**

- Reduce dataset size or use a subset for testing
- A GPU will not significantly speed up image reading; it mainly accelerates model training

**If training is slow or runs out of memory:**

- Reduce `BATCH_SIZE` to 128 or 64
- Close other applications
- Check GPU memory with `nvidia-smi -l 1`
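
To log GPU memory from inside the notebook rather than from a separate terminal, one option is to call `nvidia-smi` with its CSV query flags and parse the result. The helper names below are hypothetical, and the sketch assumes an NVIDIA driver with `nvidia-smi` on the `PATH`; it returns `None` when the tool is missing.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows (MiB) from nvidia-smi's headerless CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part) for part in line.split(','))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...] per GPU, or None without nvidia-smi."""
    if shutil.which('nvidia-smi') is None:
        return None
    out = subprocess.run(
        ['nvidia-smi', '--query-gpu=memory.used,memory.total',
         '--format=csv,noheader,nounits'],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` before and after building the model gives a rough idea of how much headroom is left for a larger `BATCH_SIZE`.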

**If real-time inference is slow:**

- Ensure the GPU is being used (check the `DEVICE` variable)
- Reduce preprocessing in the inference loop if needed
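
Before changing the loop, it helps to measure where the time goes. The helper below is a hypothetical, framework-agnostic timing sketch: it accepts any callable, so it can wrap the model call, the MediaPipe call, or a preprocessing step.

```python
import time


def mean_latency_ms(fn, arg, repeats=50, warmup=5):
    """Average wall-clock time of fn(arg) in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn(arg)  # warm-up runs are excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    return (time.perf_counter() - start) / repeats * 1000.0


# Possible use with the notebook's objects (not executed here):
#   mean_latency_ms(lambda x: mlp_model.predict(x, verbose=0), input_data)
#   mean_latency_ms(lambda x: mlp_model(x, training=False), input_data)
```

The Keras documentation suggests calling the model directly for small inputs that fit in one batch, so the second variant may be faster for single-frame inference. Note that it returns a tensor rather than a NumPy array.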

**If accuracy is low:**

- Check data quality (images should be clear, well-lit)
- Ensure hands are visible in most training images
- Increase training data
- Train for more epochs (remove early stopping or increase patience)
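
A single overall accuracy figure can hide a few letters that are confused with each other. As a rough diagnostic, per-class accuracy can be computed from the true and predicted labels of the test set. The function name and the label strings in the usage note are illustrative assumptions, not taken from the dataset.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}
```

For example, `per_class_accuracy(['alef', 'alef', 'ba'], ['alef', 'ba', 'ba'])` reports 0.5 for `'alef'` and 1.0 for `'ba'`. Classes with low scores are candidates for more or cleaner training images.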

### Future Improvements:

- Use transfer learning with pre-trained image models (ResNet, MobileNet) applied to the raw frames rather than to keypoints
- Implement sequence modeling (LSTM, Transformers) for better temporal understanding
- Add hand gesture smoothing for more stable predictions
- Include two-hand detection for two-handed signs
- Build a larger, more diverse dataset
- Deploy as a web application or mobile app
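
One possible form of the gesture smoothing mentioned above is an exponential moving average over the keypoint vectors of consecutive frames, applied before classification. This is only a sketch under that assumption; `ema_smooth` and `alpha` are hypothetical names, and each frame is taken to be a flat list of coordinates.

```python
def ema_smooth(frames, alpha=0.5):
    """Exponentially smooth a sequence of equal-length keypoint vectors.

    alpha close to 1 follows new frames quickly; close to 0 smooths heavily.
    """
    smoothed = []
    prev = None
    for frame in frames:
        if prev is None:
            prev = list(frame)  # the first frame passes through unchanged
        else:
            prev = [alpha * x + (1 - alpha) * p for x, p in zip(frame, prev)]
        smoothed.append(prev)
    return smoothed
```

Heavier smoothing reduces jitter but adds lag, so `alpha` would need tuning together with the hold time used in the inference loop.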

**Good luck with your Arabic sign language recognition project!**
