# ArSL Word — Live Webcam Testing

This notebook lets you **test your trained Arabic sign language word model in real-time** using your webcam.

### How it works:

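1. **Continuous capture**: MediaPipe extracts hand landmarks from every frame (1 or 2 hands)
2. **Sliding window**: the last 30 frames (`SEQUENCE_LENGTH`) are buffered into a sequence
3. **Prediction**: once the buffer is full, the sequence is fed to the BiLSTM model every 0.5 s (`PREDICTION_INTERVAL`); prediction is skipped if fewer than 30% of the buffered frames contain a hand
4. **Sentence building**: a word is confirmed and appended to the sentence once it is predicted above `CONFIDENCE_THRESHOLD` in `STABILITY_WINDOW` successive predictions and at least `COOLDOWN_TIME` seconds have passed since the previous confirmed word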
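The confirmation rule in step 4 can be sketched as a small, standalone state machine. `WordConfirmer` is a hypothetical helper, not part of the notebook. Its defaults copy `CONFIDENCE_THRESHOLD`, `STABILITY_WINDOW` and `COOLDOWN_TIME` from Cell 2, and it assumes that a low-confidence prediction breaks the streak.

```python
from collections import deque


class WordConfirmer:
    """Hypothetical sketch of the stability + cooldown confirmation rule."""

    def __init__(self, threshold=0.35, stability=3, cooldown=2.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.history = deque(maxlen=stability)   # last accepted predictions
        self.last_confirmed = float('-inf')      # no cooldown before the first word

    def update(self, word, conf, now):
        """Feed one prediction (word, confidence, timestamp); return the confirmed word or None."""
        if conf < self.threshold:
            # Assumption: a low-confidence prediction resets the streak
            self.history.clear()
            return None
        self.history.append(word)
        stable = (len(self.history) == self.history.maxlen
                  and len(set(self.history)) == 1)
        if stable and now - self.last_confirmed >= self.cooldown:
            self.last_confirmed = now
            self.history.clear()
            return word
        return None


confirmer = WordConfirmer()
for step, (word, conf) in enumerate([('hello', 0.9), ('hello', 0.8), ('hello', 0.7)]):
    print(confirmer.update(word, conf, now=step * 0.5))  # prints None, None, then hello
```

Because predictions arrive every `PREDICTION_INTERVAL` seconds, the default window of 3 means a word needs about 1 s of agreeing predictions after the first one.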

### Two-Hand Support:

- **Auto-detects** the model's expected input shape (63 or 126 features)
- If the model expects **63 features** (1 hand), only the first hand MediaPipe detects is used
- If the model expects **126 features** (2 hands), both hands are captured and their landmarks concatenated as [left hand | right hand], with a missing hand zero-padded
- Many Arabic sign language word signs require two hands for proper recognition
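The 126-feature layout is plain concatenation, so the index arithmetic is easy to check. The helpers below are hypothetical; the notebook itself only builds these vectors, in `extract_landmarks()`. Unlike the notebook, which treats any feature count other than 126 as one hand, `hands_for_feature_count()` rejects unexpected values.

```python
import numpy as np

LANDMARKS_PER_HAND = 21 * 3  # 21 landmarks x (x, y, z) = 63


def hands_for_feature_count(num_features):
    """Return 1 or 2 hands for a 63- or 126-feature model input."""
    if num_features not in (LANDMARKS_PER_HAND, 2 * LANDMARKS_PER_HAND):
        raise ValueError(f'unexpected feature count: {num_features}')
    return num_features // LANDMARKS_PER_HAND


def split_hands(frame_vec):
    """Split one 126-feature frame into (left, right) arrays of shape (21, 3)."""
    frame_vec = np.asarray(frame_vec, dtype=np.float32)
    left = frame_vec[:LANDMARKS_PER_HAND].reshape(21, 3)
    right = frame_vec[LANDMARKS_PER_HAND:].reshape(21, 3)
    return left, right


left, right = split_hands(np.arange(126))
print(left[0], right[0])  # [0. 1. 2.] [63. 64. 65.]
```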

### Controls:

| Key         | Action                  |
| ----------- | ----------------------- |
| `q`         | Quit                    |
| `r`         | Reset sentence          |
| `SPACE`     | Add space between words |
| `BACKSPACE` | Delete last word        |
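`cv2.waitKey(1) & 0xFF` returns 255 when no key was pressed. The BACKSPACE code also differs between systems: 8 is the usual value on Windows, while macOS commonly reports 127. The hypothetical dispatch helper below accepts both. If BACKSPACE still does nothing, print `key` inside the loop to see what your platform and OpenCV GUI backend actually return.

```python
def key_to_action(key):
    """Map a masked cv2.waitKey() code to a control action, or None."""
    if key == ord('q'):
        return 'quit'
    if key == ord('r'):
        return 'reset'
    if key == 32:            # SPACE
        return 'space'
    if key in (8, 127):      # BACKSPACE: 8 on Windows, often 127 on macOS
        return 'delete'
    return None              # includes 255, i.e. "no key pressed"
```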

### Requirements:

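- Trained model: `arsl_word_lstm_model_best.h5`
- Class mapping: `arsl_word_classes.csv`
- Shared vocabulary: `Shared/shared_word_vocabulary.csv` (supplies the English and Arabic word names)
- A connected webcam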
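A quick pre-flight check can report missing files up front, instead of the notebook failing halfway through Cell 3. `check_required_files()` is a hypothetical helper. The paths in the commented call are the ones defined in Cell 2.

```python
from pathlib import Path


def check_required_files(paths):
    """Raise FileNotFoundError naming every path that is not an existing file."""
    missing = [str(p) for p in map(Path, paths) if not p.is_file()]
    if missing:
        raise FileNotFoundError('Missing required file(s):\n  ' + '\n  '.join(missing))


# In the notebook, after Cell 2 has run:
# check_required_files([MODEL_PATH, CLASSES_CSV, SHARED_CSV])
```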


In [None]:
# ===============================
# CELL 1: IMPORTS & SETUP
# ===============================

import cv2
import json
import time
import numpy as np
import pandas as pd
import mediapipe as mp
import tensorflow as tf
from pathlib import Path
from collections import deque

print(f'TensorFlow: {tf.__version__}')
print(f'OpenCV: {cv2.__version__}')
print(f'MediaPipe: {mp.__version__}')

# Check GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    print(f'GPU detected: {gpus[0].name}')
else:
    print('No GPU — running on CPU')


In [None]:
# ===============================
# CELL 2: CONFIGURATION
# ===============================

PROJECT_ROOT = Path(r'E:/Term 9/Grad')  # adjust to your local project location
SLR_MAIN = PROJECT_ROOT / 'Main/Sign-Language-Recognition-System-main/SLR Main'
WORDS_ROOT = SLR_MAIN / 'Words'
OUTPUT_DIR = WORDS_ROOT / 'ArSL Word (Arabic)'
SHARED_CSV = WORDS_ROOT / 'Shared/shared_word_vocabulary.csv'

# Model files
MODEL_PATH = OUTPUT_DIR / 'arsl_word_lstm_model_best.h5'
CLASSES_CSV = OUTPUT_DIR / 'arsl_word_classes.csv'

# Sequence parameters (must match training)
SEQUENCE_LENGTH = 30    # frames per sequence

# Hand detection mode: auto-detected from model input shape
# - 63 features = 1 hand (21 landmarks x 3)
# - 126 features = 2 hands (2 x 21 landmarks x 3)
# Set to None for auto-detection, or override manually:
NUM_FEATURES = None  # will be set after model loads

# Live inference settings
CONFIDENCE_THRESHOLD = 0.35     # minimum confidence to accept a prediction
PREDICTION_INTERVAL = 0.5       # seconds between predictions
STABILITY_WINDOW = 3            # consecutive same predictions needed to confirm
COOLDOWN_TIME = 2.0             # minimum seconds between two confirmed words

# Camera
CAMERA_INDEX = 0
CAMERA_WIDTH = 1280
CAMERA_HEIGHT = 720

print(f'Model  : {MODEL_PATH}')
print(f'Classes: {CLASSES_CSV}')
print(f'Sequence: {SEQUENCE_LENGTH} frames')
print(f'Confidence threshold: {CONFIDENCE_THRESHOLD}')
print(f'Stability window: {STABILITY_WINDOW} predictions')


In [None]:
# ===============================
# CELL 3: LOAD MODEL & VOCABULARY
# ===============================

# --- Custom layer needed for model loading ---
class TemporalAttention(tf.keras.layers.Layer):
    """Temporal attention layer (must match training definition)."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='att_weight', shape=(input_shape[-1], 1),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='att_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)

    def call(self, x):
        e = tf.nn.tanh(tf.matmul(x, self.W) + self.b)
        a = tf.nn.softmax(e, axis=1)
        output = tf.reduce_sum(x * a, axis=1)
        return output

# Load model
print('Loading model...')
model = tf.keras.models.load_model(
    str(MODEL_PATH),
    custom_objects={'TemporalAttention': TemporalAttention}
)
print(f'Model loaded: {model.name} — {model.count_params():,} parameters')

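# Auto-detect feature count from the model input shape: (None, SEQUENCE_LENGTH, NUM_FEATURES)
model_input_shape = model.input_shape
if model_input_shape[1] not in (None, SEQUENCE_LENGTH):
    raise ValueError(f'Model expects {model_input_shape[1]} frames, but SEQUENCE_LENGTH = {SEQUENCE_LENGTH}')
detected_features = model_input_shape[-1]
if NUM_FEATURES is None:
    NUM_FEATURES = detected_features
elif NUM_FEATURES != detected_features:
    # A manual override must agree with the model, otherwise prediction would fail
    raise ValueError(f'NUM_FEATURES = {NUM_FEATURES} does not match the model input ({detected_features})')
if NUM_FEATURES not in (63, 126):
    raise ValueError(f'Unsupported feature count {NUM_FEATURES}: expected 63 (1 hand) or 126 (2 hands)')
NUM_HANDS = 2 if NUM_FEATURES == 126 else 1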
LANDMARKS_PER_HAND = 21 * 3  # 63

print(f'Model expects {NUM_FEATURES} features -> {NUM_HANDS} hand(s) mode')

# Load class mapping
class_df = pd.read_csv(CLASSES_CSV)
vocab_df = pd.read_csv(SHARED_CSV)
vocab_df = vocab_df.dropna(subset=['karsl_class'])

id_to_english = dict(zip(vocab_df['word_id'].astype(int), vocab_df['english']))
id_to_arabic = dict(zip(vocab_df['word_id'].astype(int), vocab_df['arabic']))
id_to_category = dict(zip(vocab_df['word_id'].astype(int), vocab_df['category']))

# Build model_index -> word name mapping (both English and Arabic)
index_to_english = {}
index_to_arabic = {}
for _, row in class_df.iterrows():
    idx = int(row['model_class_index'])
    wid = int(row['word_id'])
    index_to_english[idx] = id_to_english.get(wid, f'word_{wid}')
    index_to_arabic[idx] = id_to_arabic.get(wid, f'word_{wid}')

num_classes = len(index_to_english)
print(f'{num_classes} word classes loaded')
print('\nSample words:')
for i in list(index_to_english.keys())[:10]:
    print(f'   {i}: {index_to_english[i]} / {index_to_arabic[i]}')
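
For reference, `TemporalAttention` scores each time step with a shared weight vector, normalizes the scores with a softmax over the time axis, and returns the attention-weighted sum of the time steps. This turns a `(batch, time, features)` tensor into `(batch, features)`. The NumPy re-implementation below is only for checking shapes and the math. The weights are random, and the feature width of 256 is an arbitrary assumption, not the trained model's.

```python
import numpy as np


def temporal_attention_np(x, W, b):
    """NumPy version of TemporalAttention.call().

    x: (batch, T, F), W: (F, 1), b: (T, 1)
    Returns (output, weights) with shapes (batch, F) and (batch, T, 1).
    """
    e = np.tanh(x @ W + b)                          # per-step scores, (batch, T, 1)
    exp_e = np.exp(e)                               # tanh keeps e in [-1, 1], so no overflow
    a = exp_e / exp_e.sum(axis=1, keepdims=True)    # softmax over the time axis
    output = (x * a).sum(axis=1)                    # attention-weighted sum over time
    return output, a


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 30, 256)).astype(np.float32)  # 2 sequences, 30 frames, 256 features
W = rng.normal(size=(256, 1)).astype(np.float32)
b = np.zeros((30, 1), dtype=np.float32)
output, weights = temporal_attention_np(x, W, b)
print(output.shape, weights.shape)  # (2, 256) (2, 30, 1)
```

With `W` set to zero, every frame gets weight `1/30` and the layer reduces to a plain average over time. This is a handy sanity check when comparing against the Keras layer.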


In [None]:
# ===============================
# CELL 4: MEDIAPIPE HAND DETECTOR
# ===============================
# Supports both 1-hand and 2-hand detection based on model requirements

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=NUM_HANDS,       # dynamically set based on model
    min_detection_confidence=0.6,
    min_tracking_confidence=0.6
)

def extract_landmarks(frame):
    """Extract hand landmarks from a single frame.

    - 1-hand mode (63 features): returns landmarks for the first detected hand.
    - 2-hand mode (126 features): returns concatenated landmarks for both hands.
      If only one hand is detected, the other hand's landmarks are zero-padded.
      Hands are ordered: Left hand first, Right hand second (consistent ordering).

    Returns: (feature_vector, list_of_hand_landmarks_for_drawing)
    """
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)

    draw_landmarks = []

    if NUM_HANDS == 1:
        # Single-hand mode (63 features)
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0]
            vec = np.array([[p.x, p.y, p.z] for p in lm.landmark], dtype=np.float32).flatten()
            draw_landmarks = [lm]
            return vec, draw_landmarks
        return np.zeros(NUM_FEATURES, dtype=np.float32), draw_landmarks

    else:
        # Two-hand mode (126 features)
        left_vec = np.zeros(LANDMARKS_PER_HAND, dtype=np.float32)
        right_vec = np.zeros(LANDMARKS_PER_HAND, dtype=np.float32)

        if results.multi_hand_landmarks and results.multi_handedness:
            for hand_lm, handedness in zip(results.multi_hand_landmarks, results.multi_handedness):
                draw_landmarks.append(hand_lm)
                label = handedness.classification[0].label  # 'Left' or 'Right'
                vec = np.array([[p.x, p.y, p.z] for p in hand_lm.landmark], dtype=np.float32).flatten()

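                # MediaPipe assigns handedness assuming a mirrored (selfie) image.
                # run_live_test() flips every frame with cv2.flip(frame, 1), so
                # 'Left' here is the signer's own left hand. This slot order must
                # match the one used when the training landmarks were extracted.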
                if label == 'Left':
                    left_vec = vec
                else:
                    right_vec = vec

        # Concatenate: [left_hand(63) | right_hand(63)] = 126 features
        combined = np.concatenate([left_vec, right_vec])
        return combined, draw_landmarks

print(f'MediaPipe hand detector ready ({NUM_HANDS} hand(s) mode)')
print(f'   Features per frame: {NUM_FEATURES}')


In [None]:
# ===============================
# CELL 5: LIVE WEBCAM TESTING
# ===============================
# Run this cell to start the live webcam feed.

def run_live_test():
    """Main live testing loop with sliding window prediction.
    Supports both 1-hand and 2-hand models automatically.
    Displays both English and Arabic translations."""

    cap = cv2.VideoCapture(CAMERA_INDEX)
    if not cap.isOpened():
        print('Cannot open camera!')
        return

    # Request a resolution; the camera may fall back to the nearest one it supports
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, CAMERA_WIDTH)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, CAMERA_HEIGHT)

    hand_mode_str = f'{NUM_HANDS} hand(s), {NUM_FEATURES} features'
    print(f'Camera opened [{hand_mode_str}]. Press Q to quit, R to reset, SPACE to add space, BACKSPACE to delete.')

    # --- State variables ---
    frame_buffer = deque(maxlen=SEQUENCE_LENGTH)
    prediction_history = deque(maxlen=STABILITY_WINDOW)
    sentence_words_en = []
    sentence_words_ar = []
    current_word_en = ''
    current_word_ar = ''
    current_conf = 0.0
    last_prediction_time = 0.0
    last_confirmed_time = 0.0
    hand_detected = False
    hands_count = 0
    fps_history = deque(maxlen=30)
    top3 = []  # latest top-3 predictions shown in the overlay

    # Colors
    GREEN = (0, 200, 0)
    RED = (0, 0, 200)
    BLUE = (200, 100, 0)
    WHITE = (255, 255, 255)
    BLACK = (0, 0, 0)
    YELLOW = (0, 220, 220)
    ORANGE = (0, 140, 255)

    while True:
        frame_start = time.time()
        ret, frame = cap.read()
        if not ret:
            break

        frame = cv2.flip(frame, 1)
        h, w = frame.shape[:2]

        # --- Extract landmarks ---
        landmarks, hand_lm_list = extract_landmarks(frame)
        hand_detected = len(hand_lm_list) > 0
        hands_count = len(hand_lm_list)
        frame_buffer.append(landmarks)

        # --- Draw hand landmarks (all detected hands) ---
        for hand_lm in hand_lm_list:
            mp_drawing.draw_landmarks(
                frame, hand_lm, mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style()
            )

        # --- Predict when buffer is full ---
        now = time.time()
        if len(frame_buffer) == SEQUENCE_LENGTH and (now - last_prediction_time) >= PREDICTION_INTERVAL:
            last_prediction_time = now

            # Build sequence
            seq = np.array(list(frame_buffer), dtype=np.float32)
            seq = np.expand_dims(seq, axis=0)  # (1, 30, NUM_FEATURES)

            # Check if sequence has enough non-zero frames
            non_zero = np.sum(np.any(seq[0] != 0, axis=1))
            if non_zero >= SEQUENCE_LENGTH * 0.3:  # at least 30% non-zero frames
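                # Calling the model directly is faster than model.predict() for a single small batch
                proba = model(seq, training=False).numpy()[0]
                pred_idx = int(np.argmax(proba))
                pred_conf = float(proba[pred_idx])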
                pred_word_en = index_to_english.get(pred_idx, '?')
                pred_word_ar = index_to_arabic.get(pred_idx, '?')

                # Top-3 for display
                top3_idx = np.argsort(proba)[-3:][::-1]
                top3 = [(index_to_english.get(i, '?'), index_to_arabic.get(i, '?'), proba[i]) for i in top3_idx]

                if pred_conf >= CONFIDENCE_THRESHOLD:
                    current_word_en = pred_word_en
                    current_word_ar = pred_word_ar
                    current_conf = pred_conf
                    prediction_history.append(pred_word_en)

                    # Check stability: same word predicted N times in a row
                    if (len(prediction_history) == STABILITY_WINDOW and
                        len(set(prediction_history)) == 1 and
                        (now - last_confirmed_time) >= COOLDOWN_TIME):
                        # Confirm the word!
                        sentence_words_en.append(current_word_en)
                        sentence_words_ar.append(current_word_ar)
                        last_confirmed_time = now
                        prediction_history.clear()
                        print(f'Confirmed: "{current_word_en}" / "{current_word_ar}" ({current_conf:.1%})')
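                else:
                    # A low-confidence prediction breaks the streak, so confirmation
                    # really requires STABILITY_WINDOW consecutive agreeing predictions
                    current_word_en = ''
                    current_word_ar = ''
                    current_conf = 0.0
                    prediction_history.clear()
            else:
                # Too few frames with a detected hand: skip prediction and reset the streak
                current_word_en = ''
                current_word_ar = ''
                current_conf = 0.0
                prediction_history.clear()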

        # --- Draw UI Overlay ---

        # Top bar: prediction info
        cv2.rectangle(frame, (0, 0), (w, 90), BLACK, -1)
        cv2.rectangle(frame, (0, 0), (w, 90), WHITE, 2)

        if current_word_en:
            color = GREEN if current_conf >= 0.6 else YELLOW if current_conf >= 0.4 else ORANGE
            cv2.putText(frame, f'Word: {current_word_en}', (15, 35),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, color, 2)
            cv2.putText(frame, f'Confidence: {current_conf:.1%}', (15, 65),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)

            # Confidence bar
            bar_x = 450
            bar_w = 200
            bar_h = 20
            cv2.rectangle(frame, (bar_x, 20), (bar_x + bar_w, 20 + bar_h), (50, 50, 50), -1)
            fill_w = int(bar_w * current_conf)
            cv2.rectangle(frame, (bar_x, 20), (bar_x + fill_w, 20 + bar_h), color, -1)
            cv2.rectangle(frame, (bar_x, 20), (bar_x + bar_w, 20 + bar_h), WHITE, 1)

            # Stability progress
            stable_count = sum(1 for p in prediction_history if p == current_word_en)
            cv2.putText(frame, f'Stability: {stable_count}/{STABILITY_WINDOW}',
                        (bar_x, 65), cv2.FONT_HERSHEY_SIMPLEX, 0.6, WHITE, 1)
        else:
            status = 'Show a sign...' if hand_detected else 'No hand detected'
            cv2.putText(frame, status, (15, 45),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (150, 150, 150), 2)

        # Top-3 predictions (right side): English only, since cv2.putText cannot render Arabic
        if current_word_en and top3:  # top3 is always set whenever current_word_en is
            tx = w - 380
            cv2.putText(frame, 'Top 3:', (tx, 25), cv2.FONT_HERSHEY_SIMPLEX, 0.5, WHITE, 1)
            for rank, (tw_en, tw_ar, tc) in enumerate(top3):
                y_pos = 45 + rank * 20
                cv2.putText(frame, f'{rank+1}. {tw_en} ({tc:.1%})', (tx, y_pos),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, WHITE, 1)

        # Bottom bar: sentence (English)
        sentence_en = ' '.join(sentence_words_en) if sentence_words_en else '(sentence will appear here)'
        sentence_ar = ' '.join(sentence_words_ar) if sentence_words_ar else ''
        cv2.rectangle(frame, (0, h - 75), (w, h), BLACK, -1)
        cv2.rectangle(frame, (0, h - 75), (w, h), WHITE, 2)
        cv2.putText(frame, f'EN: {sentence_en}', (15, h - 45),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, WHITE, 2)
        if sentence_ar:
            cv2.putText(frame, f'AR: {sentence_ar}', (15, h - 15),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (100, 255, 100), 2)

        # Buffer indicator (above bottom bar)
        buf_fill = len(frame_buffer) / SEQUENCE_LENGTH
        buf_color = GREEN if buf_fill >= 1.0 else YELLOW
        cv2.putText(frame, f'Buffer: {len(frame_buffer)}/{SEQUENCE_LENGTH}',
                    (15, h - 90), cv2.FONT_HERSHEY_SIMPLEX, 0.5, buf_color, 1)

        # Hand status indicator (shows hand count for two-hand mode)
        if NUM_HANDS == 2:
            if hands_count == 2:
                hand_color = GREEN
                hand_text = 'HANDS: 2/2'
            elif hands_count == 1:
                hand_color = YELLOW
                hand_text = 'HANDS: 1/2'
            else:
                hand_color = RED
                hand_text = 'NO HANDS'
        else:
            hand_color = GREEN if hand_detected else RED
            hand_text = 'HAND OK' if hand_detected else 'NO HAND'

        cv2.circle(frame, (w - 80, h - 95), 8, hand_color, -1)
        cv2.putText(frame, hand_text, (w - 170, h - 90),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, hand_color, 1)

        # FPS counter
        fps = 1.0 / max(time.time() - frame_start, 1e-6)
        fps_history.append(fps)
        avg_fps = sum(fps_history) / len(fps_history)
        cv2.putText(frame, f'FPS: {avg_fps:.0f}', (w - 110, 115),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, WHITE, 1)

        # Mode indicator
        mode_text = f'ArSL | {NUM_HANDS}H / {NUM_FEATURES}F'
        cv2.putText(frame, mode_text, (w - 220, 135),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (180, 180, 180), 1)

        # Cooldown indicator
        cooldown_remaining = max(0, COOLDOWN_TIME - (now - last_confirmed_time))
        if cooldown_remaining > 0:
            cv2.putText(frame, f'Cooldown: {cooldown_remaining:.1f}s',
                        (w // 2 - 80, 115), cv2.FONT_HERSHEY_SIMPLEX, 0.6, ORANGE, 2)

        # --- Show frame ---
        cv2.imshow('ArSL Word Recognition — Live Test', frame)

        # --- Handle keyboard ---
        key = cv2.waitKey(1) & 0xFF
        if key == ord('q'):
            break
        elif key == ord('r'):
            sentence_words_en.clear()
            sentence_words_ar.clear()
            prediction_history.clear()
            current_word_en = ''
            current_word_ar = ''
            print('Sentence reset')
        elif key == 32:  # SPACE
            sentence_words_en.append(' ')
            sentence_words_ar.append(' ')
            print('   [space added]')
        elif key in (8, 127):   # BACKSPACE (8 on Windows, often 127 on macOS)
            if sentence_words_en:
                removed_en = sentence_words_en.pop()
                removed_ar = sentence_words_ar.pop() if sentence_words_ar else ''
                print(f'Removed: "{removed_en}" / "{removed_ar}"')

    # Cleanup
    cap.release()
    cv2.destroyAllWindows()

    final_en = ' '.join(sentence_words_en)
    final_ar = ' '.join(sentence_words_ar)
    print(f'\nFinal sentence (EN): {final_en}')
    print(f'Final sentence (AR): {final_ar}')
    return final_en, final_ar

# --- RUN ---
result = run_live_test()


## Tips

| Issue                      | Solution                                                                                       |
| -------------------------- | ---------------------------------------------------------------------------------------------- |
| **Low FPS**                | Close other apps, reduce `CAMERA_WIDTH`/`CAMERA_HEIGHT`                                        |
| **Wrong predictions**      | Hold the sign steadily for ~2 seconds                                                          |
| **Camera not opening**     | Change `CAMERA_INDEX` to 1 or 2                                                                |
| **Too sensitive**          | Increase `STABILITY_WINDOW` to 4-5                                                             |
| **Signs never confirmed**  | Lower `CONFIDENCE_THRESHOLD` to about 0.25                                                     |
| **Too slow between words** | Decrease `COOLDOWN_TIME` to 1.0                                                                |
| **Only 1 hand tracked**    | The hand count follows the model's input shape: a 63-feature model tracks one hand. Retrain on 126-feature (two-hand) sequences for two-hand support. |
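The "~2 seconds" advice follows from the default settings. The rough estimate below assumes the sign has to fill one full window of `SEQUENCE_LENGTH` frames, and then win `STABILITY_WINDOW` predictions spaced `PREDICTION_INTERVAL` apart, with the first of these arriving as the window fills. `approx_hold_time()` is a hypothetical helper. Use the FPS shown by the on-screen counter, since a slower loop stretches the 30-frame window in time. `COOLDOWN_TIME` only spaces out consecutive words and is not included.

```python
def approx_hold_time(sequence_length=30, fps=30.0, stability_window=3, prediction_interval=0.5):
    """Rough number of seconds a sign must be held before it can be confirmed."""
    window_seconds = sequence_length / fps                         # time to fill the sliding window
    streak_seconds = (stability_window - 1) * prediction_interval  # gap from first to last agreeing prediction
    return window_seconds + streak_seconds


print(approx_hold_time())                    # 2.0
print(approx_hold_time(stability_window=5))  # 3.0
print(approx_hold_time(fps=15.0))            # 3.0
```

This is also why the **Low FPS** and **Too sensitive** fixes trade off against each other: both a slower loop and a larger `STABILITY_WINDOW` lengthen the required hold.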

### How to perform a sign:

1. Face the camera with your hand(s) clearly visible
2. Perform the Arabic sign gesture smoothly
3. Wait until the **Stability** counter in the top bar reaches its maximum (3/3 with the default `STABILITY_WINDOW`)
4. The word will be confirmed and added to the sentence (both English and Arabic)

### Two-Hand Mode Notes:

- If your model was trained with 126 features (2 hands), both hands will be tracked
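- Hands are ordered by MediaPipe's handedness label: Left first, Right second. MediaPipe assumes a mirrored (selfie) image, and each frame is flipped with `cv2.flip(frame, 1)` before detection, so these labels match the signer's own hands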
- If only one hand is visible, the other hand's landmarks are zero-padded
- For best results with two-hand signs, keep both hands in the camera frame
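The slot assignment can be sketched without a camera. `assemble_two_hands()` is a hypothetical helper that takes `(label, landmarks)` pairs like those MediaPipe reports. The `image_is_mirrored` flag is an illustrative assumption covering the case where frames are *not* flipped, in which MediaPipe advises swapping the labels. Like the notebook, if MediaPipe gives both hands the same label, the later hand overwrites the earlier one.

```python
import numpy as np

LANDMARKS_PER_HAND = 21 * 3


def assemble_two_hands(detections, image_is_mirrored=True):
    """Build a [left(63) | right(63)] frame vector from (label, (21, 3) landmarks) pairs."""
    slots = {'Left': np.zeros(LANDMARKS_PER_HAND, np.float32),
             'Right': np.zeros(LANDMARKS_PER_HAND, np.float32)}  # missing hands stay zero
    for label, landmarks in detections:
        if not image_is_mirrored:
            # MediaPipe's labels assume a mirrored image; swap them for raw frames
            label = 'Right' if label == 'Left' else 'Left'
        slots[label] = np.asarray(landmarks, np.float32).reshape(-1)
    return np.concatenate([slots['Left'], slots['Right']])


vec = assemble_two_hands([('Right', np.ones((21, 3)))])
print(vec.shape, vec[:63].sum(), vec[63:].sum())  # (126,) 0.0 63.0
```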

### Arabic Display:

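- The console prints every confirmed word in both **English** and **Arabic**
- The bottom bar also draws an `AR:` line. However, `cv2.putText` uses Hershey fonts, which cover only ASCII, so Arabic characters typically render as `?` marks. For the same reason, the Top-3 list shows English names only
- Proper on-screen Arabic requires a renderer with Unicode font and right-to-left shaping support (for example Pillow combined with an Arabic reshaper and a bidi library, or a GUI framework)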
