# 🔍 Model Performance Evaluation

Comprehensive evaluation of model performance using validation dataset metrics.

## Purpose
- Generate detailed classification metrics for each class
- Visualize prediction patterns through confusion matrix
- Assess model accuracy, precision, and recall across all categories

## What to Look For
✅ **High diagonal values** in confusion matrix → Correct predictions  
✅ **Balanced precision/recall** across classes → Fair performance  
⚠️ **High-count off-diagonal cells** (dark under a sequential colormap such as `Blues`) → Common misclassification patterns  
⚠️ **Low F1-scores** for specific classes → Needs improvement for those signs
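
The checks above can be computed directly from true and predicted labels. The following is a minimal NumPy sketch, not the notebook's own evaluation code. The helper names `confusion_matrix` and `per_class_metrics` and the sample labels are hypothetical and serve only as illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Return (precision, recall, f1) arrays derived from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: true samples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Hypothetical labels for three classes (0, 1, 2); not results from this model.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
precision, recall, f1 = per_class_metrics(cm)
```

Diagonal entries of `cm` count correct predictions. A class with a low `f1` value is one whose row or column carries large off-diagonal counts.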

---

# 📷 Real-time Sign Language Detection

Live webcam implementation for real-time sign language classification.

## Purpose
- Load trained model for inference
- Process live webcam feed for instant predictions
- Provide real-time sign language recognition

## Prerequisites
- Trained model file at `models/signvision_cnn_v2.h5`
- Webcam access, plus OpenCV, TensorFlow, and MediaPipe installed
- Correct class labels, in the same order used during training
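
Label order matters because the model outputs an index, not a name. The training code is not shown here, so the following is only a sketch under one assumption: that the training images were stored in one subfolder per class and loaded with a Keras directory loader, which assigns indices in alphanumeric order of the folder names. The helper `class_names_from_dir` is hypothetical.

```python
import os

def class_names_from_dir(data_dir):
    """Return the class subfolder names in sorted order, one per class index."""
    return sorted(
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )
```

If the training pipeline saved its own label list, prefer that list over this reconstruction.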

## Configuration
- **Input Size**: 224×224 pixels (matches training)
- **Camera**: Default webcam (index 0)
- **Classes**: A-E (update with your actual labels)

---

# 🎯 Prediction Pipeline Setup

Initialize the real-time classification workflow with model and camera.

## Components
- **Model Loader**: Restores trained CNN architecture and weights
- **Label Mapper**: Defines output class to sign language mapping
- **Camera Interface**: Configures video capture device
- **Image Preprocessor**: Sets required input dimensions

## Next Steps
Add a prediction loop to process frames and display classification results in real time.

In [4]:
import cv2
import numpy as np
import tensorflow as tf

# Load trained model (absolute path on the author's machine; adjust for your setup)
MODEL_PATH = "E:/Downloads/sign_language/SignVision/models/signvision_cnn_v2.h5"
model = tf.keras.models.load_model(MODEL_PATH)

# Get class labels
class_names = ['A', 'B', 'C', 'D', 'E']  # replace with your actual labels

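# Initialize webcam (index 0 = default camera) and fail early if it cannot be opened
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open webcam at index 0")

# Model input size as (width, height), the order cv2.resize expects
IMG_SIZE = (224, 224)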




# ✋ Hand Tracking

Initialize MediaPipe for hand detection.

**Settings:**
- Single hand detection (`max_num_hands=1`)
- Minimum detection confidence: 0.6
- Minimum tracking confidence: 0.5

**Features:**
- 21 landmark points per hand
- Real-time processing
- Hand skeleton visualization

In [5]:
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

hands = mp_hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.6,
    min_tracking_confidence=0.5
)


# 🔄 Real-time Recognition Loop

Live hand tracking and sign classification pipeline.

## Process Flow
1. **Capture Frame** → Get webcam feed
2. **Hand Detection** → Find hand landmarks using MediaPipe
3. **Region Cropping** → Extract hand area or use full frame
4. **Preprocessing** → Resize and normalize for model input
5. **Classification** → Predict sign and confidence score
6. **Display Results** → Show prediction on video feed
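
Step 3 depends on turning MediaPipe's normalized landmark coordinates into a valid pixel box. Landmarks can fall slightly outside the frame, and a tight box may cut off fingertips. The following is a sketch of one way to handle both. The helper `landmarks_to_bbox` and its `pad` margin are hypothetical and are not part of the notebook's code.

```python
def landmarks_to_bbox(xs, ys, width, height, pad=0.1):
    """Convert normalized landmark coordinates to a padded, clamped pixel box.

    xs, ys: normalized coordinates, nominally in [0, 1].
    pad: hypothetical margin, as a fraction of the box size.
    Returns (x1, y1, x2, y2), or None if the box is empty.
    """
    xmin, xmax = min(xs) * width, max(xs) * width
    ymin, ymax = min(ys) * height, max(ys) * height
    dx, dy = (xmax - xmin) * pad, (ymax - ymin) * pad
    # Clamp to the frame so slicing never uses out-of-range indices
    x1 = max(0, int(xmin - dx))
    y1 = max(0, int(ymin - dy))
    x2 = min(width, int(xmax + dx))
    y2 = min(height, int(ymax + dy))
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2

# Illustrative call with made-up coordinates for a 640x480 frame
box = landmarks_to_bbox([0.4, 0.6], [0.25, 0.75], width=640, height=480)
```

A `None` result signals that the caller should fall back to the full frame, which matches the "or use full frame" branch of step 3.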

## Key Features
- **Natural Selfie View**: Horizontally flipped display
- **Bounding Box**: Dynamic hand region detection
- **Confidence Scoring**: Shows prediction certainty
- **ESC to Exit**: Press Escape key to close application
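
Per-frame predictions can flicker between classes, so the confidence score can also be used to stabilize the displayed label. The following is a sketch of one possible approach, a majority vote over recent confident frames. It is not what the loop in this notebook does. The class `PredictionSmoother` and its defaults (`window=10`, `min_conf=0.6`) are hypothetical.

```python
from collections import Counter, deque

class PredictionSmoother:
    """Majority vote over the most recent confident predictions."""

    def __init__(self, window=10, min_conf=0.6):
        self.min_conf = min_conf
        self.history = deque(maxlen=window)

    def update(self, label, conf):
        """Record one frame's prediction and return the current stable label."""
        # Low-confidence frames are ignored rather than counted as votes
        if conf >= self.min_conf:
            self.history.append(label)
        if not self.history:
            return None
        return Counter(self.history).most_common(1)[0][0]
```

In a loop, `smoother.update(class_names[class_id], conf)` would be called once per frame, and the returned label drawn instead of the raw prediction.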

## Visual Output
- Real-time hand landmark visualization
- Live prediction overlay (class + confidence)
- Continuous video feed with annotations

In [6]:
while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Flip the frame for a natural selfie view
    frame = cv2.flip(frame, 1)
    h, w, _ = frame.shape
    
    # Detect hands; MediaPipe expects RGB, while OpenCV frames are BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    bbox = None
    
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            
            x_coords = [lm.x for lm in hand_landmarks.landmark]
            y_coords = [lm.y for lm in hand_landmarks.landmark]
            xmin, xmax = int(min(x_coords) * w), int(max(x_coords) * w)
            ymin, ymax = int(min(y_coords) * h), int(max(y_coords) * h)
            bbox = (xmin, ymin, xmax, ymax)
    
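    # If a hand was detected, crop its region (clamped to the frame); otherwise use the full frame
    hand_img = frame
    if bbox:
        x1, y1 = max(bbox[0], 0), max(bbox[1], 0)
        x2, y2 = min(bbox[2], w), min(bbox[3], h)
        # Landmarks can fall outside the frame; skip empty crops so cv2.resize does not fail
        if x2 > x1 and y2 > y1:
            hand_img = frame[y1:y2, x1:x2]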
    
    # Preprocess for model
    img = cv2.resize(hand_img, IMG_SIZE)
    img = np.expand_dims(img / 255.0, axis=0)
    
    # Predict (verbose=0 suppresses the per-frame Keras progress bar)
    pred = model.predict(img, verbose=0)[0]
    class_id = int(np.argmax(pred))
    conf = float(np.max(pred))
    label = f"{class_names[class_id]} ({conf:.2f})"
    
    # Display
    cv2.putText(frame, label, (30, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("SignVision - Real-time Recognition", frame)
    
    if cv2.waitKey(1) & 0xFF == 27:  # ESC to quit
        break

# Release the camera, MediaPipe resources, and display windows
cap.release()
hands.close()
cv2.destroyAllWindows()

