# ASL Fingerspelling Recognition

This notebook gives a brief setup guide and then walks through how the data was collected, preprocessed, and used to train the models.

## Setup (Windows Only)

To run and test the notebook, install the libraries below:

1. For data collection and preprocessing:
(OpenCV, cvzone, NumPy, MediaPipe, Pillow)

In [None]:
%pip install opencv-python cvzone numpy mediapipe Pillow 

2. For ML models and training: (TensorFlow, TensorFlow Hub, Matplotlib, Seaborn, scikit-learn)

Install these in the same environment as the packages above (required for training).

In [None]:
%pip install tensorflow tensorflow-hub matplotlib seaborn scikit-learn

3. For real-time recognition: the same libraries as data collection and preprocessing (OpenCV, cvzone, NumPy, MediaPipe, Pillow), plus pyttsx3 for text-to-speech.

In [None]:
# If step 1 was already run, only pyttsx3 (text-to-speech) is new here
%pip install opencv-python cvzone numpy mediapipe Pillow pyttsx3

In [None]:
# Resource monitoring used by the Tkinter app (GPUtil is optional and reports GPU usage)
%pip install psutil GPUtil
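To confirm the installs, you can check that each package is importable. The sketch below uses only the standard library's importlib.util.find_spec, which locates a module without importing it. The list of import names is an assumption based on the packages installed above. Some import names differ from the pip names, for example cv2 for opencv-python, PIL for Pillow, and sklearn for scikit-learn.

```python
import importlib.util

# Import names for the packages installed above (pip and import names differ for some)
REQUIRED_MODULES = [
    "cv2", "cvzone", "numpy", "mediapipe", "PIL",
    "tensorflow", "tensorflow_hub", "matplotlib", "seaborn", "sklearn",
    "pyttsx3", "psutil", "GPUtil",
]


def check_modules(names):
    """Return {module_name: True/False} depending on whether each module can be located."""
    # find_spec locates a top-level module without importing it
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules(REQUIRED_MODULES).items():
    print(f"{name:15s} {'OK' if found else 'MISSING'}")
```

Any MISSING entry points back to the corresponding %pip cell.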

NOTE:

1. You may encounter protobuf-related errors when running the code.
If so, install or reinstall an older version of protobuf (3.20.x or lower).

2. You may also encounter MediaPipe errors. If so, reinstall a MediaPipe version that is compatible with the protobuf version above.

## Custom Collection Pipeline

This pipeline collects hand images from the webcam and applies basic preprocessing during collection.

Run the cell below after installing the relevant dependencies.

### Note:

IMPORTANT: WebCam required

1. The script below collects 600 images of 500x500 pixels for each letter/class (A to Y; Z is excluded).
2. Before running, set the baseFolder variable to the location where the class folders and images should be saved.
3. To start collecting, press s in the OpenCV (webcam) window. Press p to pause.
4. After the images for a class are collected, an input prompt appears (in VS Code it shows in the search bar/top panel). Press Enter there to start the next class.
5. Repeat steps 3 and 4 for each subsequent class.
6. After the final class is collected, the program exits.
7. The images are saved in per-letter subfolders of baseFolder.

### Preprocessing Steps:

1. Cropping the hand from the full frame
2. Adding hand landmarks to the cropped frame
3. Converting the cropped frame to a binary image (black and white pixels only), with the landmarks overlaid in black
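The cropping step comes down to a little arithmetic. The hand bounding box is grown on each side by paddingFactor (45% of the hand's width and height) and clamped to the frame. Every landmark is then shifted by the crop's top-left corner. The pure-Python sketch below restates that arithmetic from the collection cell that follows. padded_crop_box and to_crop_coords are hypothetical names, and bbox is assumed to be cvzone's (x, y, w, h).

```python
def padded_crop_box(bbox, img_width, img_height, padding_factor=0.45):
    """Grow an (x, y, w, h) hand box by padding_factor of its size and clamp it to the image.

    Returns (x1, y1, x2, y2) so the crop is img[y1:y2, x1:x2].
    """
    x, y, w, h = bbox
    x_pad = int(w * padding_factor)
    y_pad = int(h * padding_factor)
    x1 = max(0, x - x_pad)
    y1 = max(0, y - y_pad)
    x2 = min(x + w + x_pad, img_width)
    y2 = min(y + h + y_pad, img_height)
    return x1, y1, x2, y2


def to_crop_coords(landmarks, x1, y1):
    """Shift full-image landmarks ([x, y, z] lists) into crop coordinates (x, y)."""
    return [(lm[0] - x1, lm[1] - y1) for lm in landmarks]


# Example: a 200x100 hand box in a 640x480 frame
box = padded_crop_box((100, 50, 200, 100), 640, 480)
print(box)  # (10, 5, 390, 195)
print(to_crop_coords([[110, 60, 0]], box[0], box[1]))  # [(100, 55)]
```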

In [None]:
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import os
import mediapipe as mp

#--------------------
baseFolder = "C:/path/to/your/output_folder"  # <-- change this
#--------------------


# Initialize webcam
cap = cv2.VideoCapture(0)

# Initialize hand detector
detector = HandDetector(maxHands=1)

# Constants
imgSize = 500
# List of letters A-Y (adjust as needed)
letters = [chr(i) for i in range(ord('A'), ord('Y') + 1)]
maxImages = 600          # Total images to capture per class
paddingFactor = 0.45     # Padding percentage

mp_hands = mp.solutions.hands

# Function to process and resize images (canvas will match input channels)
def process_and_resize(imgCrop, aspectRatio, imgSize):
    # Black square canvas; 2D inputs stay 2D, colour inputs keep their channels
    imgWhite = np.zeros((imgSize, imgSize) + imgCrop.shape[2:], np.uint8)
    try:
        if aspectRatio > 1:
            # If height > width:
            k = imgSize / imgCrop.shape[0]
            wCal = math.ceil(k * imgCrop.shape[1])
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
        else:
            # If width >= height:
            k = imgSize / imgCrop.shape[1]
            hCal = math.ceil(k * imgCrop.shape[0])
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize
    except Exception as e:
        print(f"Error during image processing: {e}")
        return None
    return imgWhite

# Function to detect skin using YCrCb thresholds (returns a binary mask)
def detect_skin(frame):
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    lower_skin = np.array([0, 133, 77], dtype=np.uint8)
    upper_skin = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower_skin, upper_skin)
    
    # Clean up noise
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    
    # (Optional) clean-up by drawing filled contours
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour_mask = np.zeros_like(mask)
    cv2.drawContours(contour_mask, contours, -1, 255, thickness=cv2.FILLED)
    mask = cv2.bitwise_and(mask, contour_mask)
    
    return mask

# Main loop for each letter
for className in letters:
    print(f"Starting collection for: {className}")
    folder = os.path.join(baseFolder, className)
    os.makedirs(folder, exist_ok=True)

    counter, collecting = 0, False

    while counter < maxImages:
        success, img = cap.read()
        if not success:
            print("Camera access failed.")
            break

        # Detect hand on the full image (only once)
        hands, _ = detector.findHands(img, draw=False)
        if hands:
            # Use the first detected hand
            hand = hands[0]
            bbox = hand['bbox']       # [x, y, w, h]
            lm_list = hand['lmList']    # list of landmarks in full-image coordinates
            x, y, w, h = bbox

            # Calculate padding (based on hand size)
            xPad = int(w * paddingFactor)
            yPad = int(h * paddingFactor)

            # Compute crop boundaries (make sure they’re within image bounds)
            crop_x1 = max(0, x - xPad)
            crop_y1 = max(0, y - yPad)
            crop_x2 = min(x + w + xPad, img.shape[1])
            crop_y2 = min(y + h + yPad, img.shape[0])
            imgCrop = img[crop_y1:crop_y2, crop_x1:crop_x2]

            if imgCrop.size > 0:
                # ----- STEP 1: Draw landmarks on the cropped image using adjusted coordinates -----
                # Create a copy of the crop on which we will draw the landmarks
                imgCrop_landmarked = imgCrop.copy()
                # Adjust each landmark from full-image coordinates to crop coordinates
                for lm in lm_list:
                    adj_x = lm[0] - crop_x1
                    adj_y = lm[1] - crop_y1
                    cv2.circle(imgCrop_landmarked, (adj_x, adj_y), 4, (0, 0, 255), -1)
                # Optionally, also draw the connections between landmarks:
                for connection in mp.solutions.hands.HAND_CONNECTIONS:
                    pt1 = lm_list[connection[0]]
                    pt2 = lm_list[connection[1]]
                    pt1_adjusted = (pt1[0] - crop_x1, pt1[1] - crop_y1)
                    pt2_adjusted = (pt2[0] - crop_x1, pt2[1] - crop_y1)
                    cv2.line(imgCrop_landmarked, pt1_adjusted, pt2_adjusted, (0, 0, 255), 2)

                # ----- STEP 2: Convert the landmarked crop to a binary image -----
                # First, get a binary skin mask from the original crop (without landmarks)
                binaryMask = detect_skin(imgCrop)
                # Create a blank (black) image and fill white where skin is detected
                binary_result = np.zeros_like(imgCrop)
                binary_result[binaryMask > 0] = [255, 255, 255]
                # Now overlay the landmarks (drawn in black) onto the binary image.
                # (Using the same adjusted coordinates from above)
                for lm in lm_list:
                    adj_x = lm[0] - crop_x1
                    adj_y = lm[1] - crop_y1
                    cv2.circle(binary_result, (adj_x, adj_y), 4, (0, 0, 0), -1)
                for connection in mp.solutions.hands.HAND_CONNECTIONS:
                    pt1 = lm_list[connection[0]]
                    pt2 = lm_list[connection[1]]
                    pt1_adjusted = (pt1[0] - crop_x1, pt1[1] - crop_y1)
                    pt2_adjusted = (pt2[0] - crop_x1, pt2[1] - crop_y1)
                    cv2.line(binary_result, pt1_adjusted, pt2_adjusted, (0, 0, 0), 2)

                # ----- STEP 3: Resize for saving/visualization -----
                # Use the crop's own aspect ratio (not the bbox h/w) so the hand is not distorted.
                aspectRatio = (crop_y2 - crop_y1) / (crop_x2 - crop_x1)
                imgWhite = process_and_resize(binary_result, aspectRatio, imgSize)
                if imgWhite is not None:
                    cv2.imshow("Processed Binary Image", imgWhite)
                    if collecting:
                        counter += 1
                        savePath = os.path.join(folder, f"{className.lower()}_{counter}.jpg")
                        cv2.imwrite(savePath, imgWhite)
                        print(f"Saved {counter}/{maxImages} images for {className}")

        # Show the original live feed
        cv2.imshow("Live Feed with Landmarks", img)
        key = cv2.waitKey(1)
        if key == ord('s'):
            collecting = True
        if key == ord('p'):
            collecting = False

    print(f"Completed collection for {className}")
    input("Press Enter for next class.")

cap.release()
cv2.destroyAllWindows()
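process_and_resize letterboxes the crop. It scales the longer side to imgSize and centres the shorter side on a black canvas. The sketch below computes only that geometry, so it can be checked without a webcam. It is an illustration, not a drop-in replacement, and letterbox_geometry is a hypothetical name. It also adds a min(...) clamp that is not in the original. The clamp stops floating-point round-up in math.ceil from pushing the scaled side one pixel past the canvas.

```python
import math


def letterbox_geometry(crop_h, crop_w, img_size):
    """Return (new_w, new_h, x_gap, y_gap) for placing a crop in an img_size square.

    The longer side is scaled to img_size and the shorter side is centred, following
    process_and_resize; min(...) additionally clamps float round-up to the canvas size.
    """
    if crop_h / crop_w > 1:
        # Taller than wide: height fills the canvas, width is centred horizontally
        new_w = min(img_size, math.ceil(img_size / crop_h * crop_w))
        return new_w, img_size, math.ceil((img_size - new_w) / 2), 0
    # Wider than (or as wide as) tall: width fills the canvas, height is centred vertically
    new_h = min(img_size, math.ceil(img_size / crop_w * crop_h))
    return img_size, new_h, 0, math.ceil((img_size - new_h) / 2)
```

For example, take a crop 400 pixels high and 200 pixels wide with imgSize = 500. The crop is resized to 250 pixels wide by 500 pixels high and placed 125 pixels from the left edge.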


## Basic Preprocessing Pipeline for Other Datasets

This pipeline applies the same basic preprocessing to hand images from an existing dataset.

Run the cell below after installing the relevant dependencies.

### Note:

1. Set inputFolder to the dataset folder (one subfolder per class).
2. Set outputFolder to the location where the processed images should be saved.

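### Preprocessing Steps:

1. Adding hand landmarks to the full image. There is no cropping step, because the input images are assumed to be already framed around the hand.
2. Converting the image to a binary image (black and white pixels only), with the landmarks overlaid in black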

In [None]:
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import os
import mediapipe as mp

# ----------------------------
# Folder containing subfolders for each class (e.g., A, B, C, …)
inputFolder = "C:/path/to/your/input_folder"    # <-- change this
# Folder where the processed images will be saved
outputFolder = "C:/path/to/your/output_folder"  # <-- change this
# ----------------------------


# ----------------------------
# Configuration and Constants
# ----------------------------
imgSize = 500  # final output image will be imgSize x imgSize
os.makedirs(outputFolder, exist_ok=True)
# Initialize the hand detector (using CVZone)
detector = HandDetector(maxHands=1)
# For drawing hand connections
mp_hands = mp.solutions.hands

# ----------------------------
# Utility Functions
# ----------------------------
def process_and_resize(imgInput, aspectRatio, imgSize):
    """
    Resize an image to fit inside a square canvas while preserving aspect ratio.
    """
    # Create a blank (black) square canvas; 2D inputs stay 2D, colour inputs keep their channels
    imgWhite = np.zeros((imgSize, imgSize) + imgInput.shape[2:], np.uint8)
    try:
        if aspectRatio > 1:
            # If the image is taller than wide:
            k = imgSize / imgInput.shape[0]
            wCal = math.ceil(k * imgInput.shape[1])
            imgResize = cv2.resize(imgInput, (wCal, imgSize))
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
        else:
            # If the image is wider than tall:
            k = imgSize / imgInput.shape[1]
            hCal = math.ceil(k * imgInput.shape[0])
            imgResize = cv2.resize(imgInput, (imgSize, hCal))
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize
    except Exception as e:
        print(f"Error during image processing: {e}")
        return None
    return imgWhite

def detect_skin(frame):
    """
    Detect skin regions using YCrCb color space thresholds and return a binary mask.
    """
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    lower_skin = np.array([0, 133, 77], dtype=np.uint8)
    upper_skin = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower_skin, upper_skin)
    
    # Clean up noise using morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    
    # (Optional) Draw filled contours to further clean up the mask
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour_mask = np.zeros_like(mask)
    cv2.drawContours(contour_mask, contours, -1, 255, thickness=cv2.FILLED)
    mask = cv2.bitwise_and(mask, contour_mask)
    
    return mask

# ----------------------------
# Main Processing Loop
# ----------------------------
# Iterate over each class folder in the input folder.
for className in os.listdir(inputFolder):
    classPath = os.path.join(inputFolder, className)
    if not os.path.isdir(classPath):
        continue

    print(f"Processing class: {className}")
    # Create corresponding output folder for this class
    outClassFolder = os.path.join(outputFolder, className)
    os.makedirs(outClassFolder, exist_ok=True)

    # Process each image file in the class folder
    for imgFile in os.listdir(classPath):
        imgPath = os.path.join(classPath, imgFile)
        img = cv2.imread(imgPath)
        if img is None:
            print(f"Failed to read image: {imgPath}")
            continue

        # --- Hand Landmark Detection ---
        hands, _ = detector.findHands(img, draw=False)
        if hands:
            # Use the first detected hand
            hand = hands[0]
            lm_list = hand['lmList']

            # --- Step 1: Draw Landmarks on a Copy of the Full Image ---
            # (Since the images are provided already, we work on the full image.)
            img_landmarked = img.copy()
            for lm in lm_list:
                cv2.circle(img_landmarked, (lm[0], lm[1]), 4, (0, 0, 255), -1)
            for connection in mp.solutions.hands.HAND_CONNECTIONS:
                pt1 = lm_list[connection[0]]
                pt2 = lm_list[connection[1]]
                cv2.line(img_landmarked, (pt1[0], pt1[1]), (pt2[0], pt2[1]), (0, 0, 255), 2)

            # --- Step 2: Create a Binary Image Using Skin Detection ---
            binaryMask = detect_skin(img)
            # Create a blank image and fill white where skin is detected
            binary_result = np.zeros_like(img)
            binary_result[binaryMask > 0] = [255, 255, 255]
            # Overlay the landmarks on the binary image (drawn in black)
            for lm in lm_list:
                cv2.circle(binary_result, (lm[0], lm[1]), 4, (0, 0, 0), -1)
            for connection in mp.solutions.hands.HAND_CONNECTIONS:
                pt1 = lm_list[connection[0]]
                pt2 = lm_list[connection[1]]
                cv2.line(binary_result, (pt1[0], pt1[1]), (pt2[0], pt2[1]), (0, 0, 0), 2)

            # --- Step 3: Resize for Consistent Output ---
            aspectRatio = img.shape[0] / img.shape[1]
            imgWhite = process_and_resize(binary_result, aspectRatio, imgSize)
            if imgWhite is not None:
                # Save the processed image to the output folder.
                outPath = os.path.join(outClassFolder, imgFile)
                cv2.imwrite(outPath, imgWhite)
                print(f"Processed and saved: {outPath}")
                # (Optional) Display the processed image.
                cv2.imshow("Processed Image", imgWhite)
                cv2.waitKey(1)
        else:
            print(f"No hand detected in image: {imgPath}")

cv2.destroyAllWindows()
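The cell above skips images with no detected hand, so the processed classes can end up with different sizes. A quick per-class count helps spot imbalance before training. This is a small sketch that uses only os. The extension list mirrors the one used in the augmentation step, and count_images_per_class is a hypothetical helper.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tiff")


def count_images_per_class(dataset_dir):
    """Return {class_name: number_of_image_files} for each subfolder of dataset_dir."""
    counts = {}
    for class_name in sorted(os.listdir(dataset_dir)):
        class_path = os.path.join(dataset_dir, class_name)
        if os.path.isdir(class_path):  # loose files at the top level are ignored
            counts[class_name] = sum(
                1 for f in os.listdir(class_path) if f.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts
```

For example, print(count_images_per_class(outputFolder)) after the cell above has finished.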


## Data Augmentation Pipeline

The preprocessed images are further augmented with random rotations and horizontal flips.

Run the cells below after installing the relevant dependencies.

### Note:

1. Set input_folder to the location of the preprocessed dataset
2. Set output_folder to the location where the augmented dataset should be saved
3. More augmentation options (brightness, contrast, noise) are available. Comment them in or out, or add more, as needed

In [None]:
import os

input_folder = "C:/path/to/your/input_folder"    # <-- change this (preprocessed dataset)
output_folder = "C:/path/to/your/output_folder"  # <-- change this (augmented dataset)
os.makedirs(output_folder, exist_ok=True)

#### Augmentation Functions

In [None]:
import os
import random
from PIL import Image, ImageEnhance, ImageOps
import numpy as np

# Define augmentation functions

# Rotate image by a random angle between -30 to 30 degrees
def random_rotation(image):
    angle = random.uniform(-30, 30)  # Rotate between -30 to 30 degrees
    return image.rotate(angle)

# Flip image horizontally with a 50% chance
def random_flip(image):
    if random.choice([True, False]):
        return ImageOps.mirror(image)
    return image

# Adjust brightness by a random factor between 0.7 and 1.3
def random_brightness(image):
    enhancer = ImageEnhance.Brightness(image)
    factor = random.uniform(0.7, 1.3)  # Brightness factor
    return enhancer.enhance(factor)

# Adjust contrast by a random factor between 0.7 and 1.3
def random_contrast(image):
    enhancer = ImageEnhance.Contrast(image)
    factor = random.uniform(0.7, 1.3)  # Contrast factor
    return enhancer.enhance(factor)

# Add random noise to the image
def add_random_noise(image):
    np_image = np.array(image)
    noise = np.random.normal(0, 25, np_image.shape).astype(np.int16)
    noisy_image = np.clip(np_image + noise, 0, 255).astype(np.uint8)
    return Image.fromarray(noisy_image)

# Augment an image using a combination of random transformations
# Comment out or add more functions as needed
def augment_image(image):
    image = random_rotation(image)
    image = random_flip(image)
    # image = random_brightness(image)
    # image = random_contrast(image)
    # image = add_random_noise(image)
    return image
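All of the functions above draw from Python's global random generator (and NumPy's, for noise), so every run produces a different augmented set. If reproducibility matters, one option is to draw the rotation angle and flip decision from a dedicated seeded generator, as in the sketch below. sample_augmentation_params is hypothetical. It covers only the two transforms that augment_image applies by default.

```python
import random


def sample_augmentation_params(n, seed=0, max_angle=30.0, flip_prob=0.5):
    """Draw n (angle, flip) pairs from a seeded generator so a run can be repeated.

    angle is uniform in [-max_angle, max_angle] degrees (as in random_rotation);
    flip is True with probability flip_prob (as in random_flip).
    """
    rng = random.Random(seed)  # independent of the global random state
    params = []
    for _ in range(n):
        angle = rng.uniform(-max_angle, max_angle)
        flip = rng.random() < flip_prob
        params.append((angle, flip))
    return params
```

Apply each (angle, flip) pair with image.rotate(angle), followed by ImageOps.mirror(image) when flip is True. A simpler alternative is to call random.seed(...) once before the augmentation loop.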


#### Applying Augmentations

In [None]:
# To apply the augmentation to all images in the input folder and save them to the output folder
# Iterate over all subfolders and images
for subdir, _, files in os.walk(input_folder):
    relative_path = os.path.relpath(subdir, input_folder)
    output_subdir = os.path.join(output_folder, relative_path)
    os.makedirs(output_subdir, exist_ok=True)

    for file in files:
        if file.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.tiff')):
            input_path = os.path.join(subdir, file)
            output_path = os.path.join(output_subdir, file)

            try:
                with Image.open(input_path) as img:
                    img = img.convert("L")  # Convert to 8-bit greyscale (binary images stay black and white)
                    augmented_img = augment_image(img)
                    augmented_img.save(output_path)
            except Exception as e:
                print(f"Error processing {input_path}: {e}")

print("Data augmentation completed!")

### Combining Augmented Data With Preprocessed Dataset

This step creates a new folder containing both the preprocessed and the augmented images.

### Note:

1. Enter the location of the preprocessed dataset
2. Enter the location of the augmented dataset
3. Enter the location where the new combined dataset should be saved

In [None]:
dataset1 = "C:/path/to/your/preprocessed_dataset"    # <-- change this
dataset2 = "C:/path/to/your/augmented_dataset"       # <-- change this
output_dataset = "C:/path/to/your/combined_dataset"  # <-- change this

In [None]:
import os
import shutil

# Create the output directory if it doesn't exist
os.makedirs(output_dataset, exist_ok=True)

# Function to merge datasets with renaming
def merge_datasets(source_dir, target_dir, suffix=""):
    for class_name in os.listdir(source_dir):
        source_class_path = os.path.join(source_dir, class_name)
        target_class_path = os.path.join(target_dir, class_name)
        
        if os.path.isdir(source_class_path):
            # Create the class folder in the target if it doesn't exist
            os.makedirs(target_class_path, exist_ok=True)
            
            for file_name in os.listdir(source_class_path):
                source_file_path = os.path.join(source_class_path, file_name)
                # Add the specified suffix to the file name
                base_name, ext = os.path.splitext(file_name)
                file_name = f"{base_name}{suffix}{ext}"
                target_file_path = os.path.join(target_class_path, file_name)
                
                # Copy the file to the target directory
                shutil.copy2(source_file_path, target_file_path)

# Merge the preprocessed dataset, appending "_black" to each file name
merge_datasets(dataset1, output_dataset, suffix="_black")

# Merge the augmented dataset, appending "_white" to each file name
merge_datasets(dataset2, output_dataset, suffix="_white")


print(f"Datasets merged into: {output_dataset}")
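merge_datasets relies on the suffixes to keep files apart. The augmentation step saved each augmented image under the same name as its source, and shutil.copy2 silently overwrites an existing target. The dry-run sketch below lists the file names each call would write and reports any that clash. planned_targets and find_collisions are hypothetical helpers that follow the renaming rule used above.

```python
import os


def planned_targets(source_dir, suffix):
    """List the (class_name, new_file_name) pairs merge_datasets would write for source_dir."""
    plan = []
    for class_name in sorted(os.listdir(source_dir)):
        class_path = os.path.join(source_dir, class_name)
        if not os.path.isdir(class_path):
            continue
        for file_name in sorted(os.listdir(class_path)):
            base, ext = os.path.splitext(file_name)
            plan.append((class_name, f"{base}{suffix}{ext}"))
    return plan


def find_collisions(*plans):
    """Return target (class, file) pairs that appear more than once across all plans."""
    seen, collisions = set(), set()
    for plan in plans:
        for target in plan:
            if target in seen:
                collisions.add(target)
            seen.add(target)
    return collisions
```

For example, find_collisions(planned_targets(dataset1, "_black"), planned_targets(dataset2, "_white")) should be empty before merging.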

## Training using ML Models

In this pipeline, the combined dataset is used to train the ML models.

### Note:

Run the cell below to open a UI for training models.
1. Enter your dataset location
2. Select the model to train
3. Fill in the training parameters or keep the defaults
4. Check or uncheck cross-validation and set the number of folds as required
5. The training status of each epoch is shown in the UI
6. When training finishes, the results are saved in the modelName_RESULTS folder and the models in the TrainedBinary2Model folder
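The cross-validation option is implemented inside TrainerV3.py, which is not shown in this notebook. The sketch below only illustrates what k folds mean. The (already shuffled) samples are split into k nearly equal parts. Each part serves once as the validation set while the rest are used for training. This is a hypothetical helper, not the trainer's code. In practice, scikit-learn's KFold or StratifiedKFold (installed in step 2) would be the usual choice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous, nearly equal folds.

    Samples are assumed to be shuffled beforehand; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```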

### Warning: 

- Training is resource-intensive!
- Ensure you have the right setup before training.
- Training can take more than a day on a CPU (varies by model).
- Training on a GPU is highly recommended if one is available.
- Using the already-trained models in the repository is recommended.

#### Run the Cell Below to Open the Training Panel:

In [None]:
%run TrainerV3.py

## Real-Time Recognition

This section tests real-time recognition using the predictions of a trained model.

### Note:

IMPORTANT: WebCam required

1. Enter the model's location before running.
2. Ensure that the lighting conditions are good.
3. If possible, sign in front of a dark background or near a non-reflective wall.
4. Sign the ASL fingerspelling letters accurately so that they can be classified properly.

In [None]:
from cvzone.ClassificationModule import Classifier

# Path to a trained model (e.g. from the TrainedBinary2Model folder)
classifier = Classifier("C:/path/to/TrainedBinary2Model/MobileNetV2_model.h5")  # <-- change this

#### WebCam access and Processing

In [None]:
import cv2
import mediapipe as mp
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math


def detect_skin(frame):
    # Convert to YCrCb and equalize the luminance channel
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    y_channel = ycrcb[:, :, 0]
    y_eq = cv2.equalizeHist(y_channel)
    ycrcb[:, :, 0] = y_eq

    # Adjusted thresholds might be needed after equalization.
    lower_skin = np.array([0, 133, 77], dtype=np.uint8)
    upper_skin = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower_skin, upper_skin)
    
    # Noise reduction using morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    
    # Optionally, keep only the largest contour (if needed)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour_mask = np.zeros_like(mask)
    if contours:
        cv2.drawContours(contour_mask, contours, -1, 255, thickness=cv2.FILLED)
    mask = cv2.bitwise_and(mask, contour_mask)
    
    return mask


# Initialize camera, detector
cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)


offset = 45
imgSize = 250
labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y"]

# Use Mediapipe’s hand connections to draw lines between landmarks.
mp_hands = mp.solutions.hands
hand_connections = mp_hands.HAND_CONNECTIONS


while True:
    success, img = cap.read()
    if not success:
        break
    imgOutput = img.copy()
    hands, img = detector.findHands(img, draw=False)
    
    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']
        y1, y2 = max(0, y - offset), min(img.shape[0], y + h + offset)
        x1, x2 = max(0, x - offset), min(img.shape[1], x + w + offset)
        imgCrop = img[y1:y2, x1:x2]
        
        if imgCrop.shape[0] > 0 and imgCrop.shape[1] > 0:
            # Draw landmarks on a copy of the cropped image (for visualization)
            imgCrop_landmarked = imgCrop.copy()
            if 'lmList' in hand:
                lm_list = hand['lmList']
                for lm in lm_list:
                    cv2.circle(imgCrop_landmarked, (lm[0] - x1, lm[1] - y1), 4, (0, 0, 255), -1)
                for connection in mp_hands.HAND_CONNECTIONS:
                    if connection[0] < len(lm_list) and connection[1] < len(lm_list):
                        pt1 = (lm_list[connection[0]][0] - x1, lm_list[connection[0]][1] - y1)
                        pt2 = (lm_list[connection[1]][0] - x1, lm_list[connection[1]][1] - y1)
                        cv2.line(imgCrop_landmarked, pt1, pt2, (0, 0, 255), 2)
            
            # Process the cropped image to create a binary image
            binaryMask = detect_skin(imgCrop)
            # Create a black background and set the hand area to white:
            binary_result = np.zeros_like(imgCrop)
            binary_result[binaryMask > 0] = [255, 255, 255]
            
            # Overlay the landmarks (black) on the binary image:
            if 'lmList' in hand:
                for lm in lm_list:
                    cv2.circle(binary_result, (lm[0] - x1, lm[1] - y1), 4, (0, 0, 0), -1)
                for connection in mp_hands.HAND_CONNECTIONS:
                    pt1 = (lm_list[connection[0]][0] - x1, lm_list[connection[0]][1] - y1)
                    pt2 = (lm_list[connection[1]][0] - x1, lm_list[connection[1]][1] - y1)
                    cv2.line(binary_result, pt1, pt2, (0, 0, 0), 2)
            
            # Resize binary_result into an imgSize x imgSize canvas while preserving the aspect ratio.
            # Use the crop's own dimensions (bbox plus offset), as the collection pipeline does,
            # so the hand is not distorted; min(...) guards against float round-up past the canvas.
            cropH, cropW = binary_result.shape[:2]
            aspectRatio = cropH / cropW
            imgWhite = np.zeros((imgSize, imgSize), np.uint8)
            if aspectRatio > 1:
                k = imgSize / cropH
                wCal = min(imgSize, math.ceil(k * cropW))
                imgResize = cv2.resize(binary_result, (wCal, imgSize))
                wGap = math.ceil((imgSize - wCal) / 2)
                imgWhite[:, wGap:wCal + wGap] = cv2.cvtColor(imgResize, cv2.COLOR_BGR2GRAY)
            else:
                k = imgSize / cropW
                hCal = min(imgSize, math.ceil(k * cropH))
                imgResize = cv2.resize(binary_result, (imgSize, hCal))
                hGap = math.ceil((imgSize - hCal) / 2)
                imgWhite[hGap:hCal + hGap, :] = cv2.cvtColor(imgResize, cv2.COLOR_BGR2GRAY)
            
            # You can now pass imgWhite (or its RGB version) to your classifier.
            imgWhiteRGB = cv2.cvtColor(imgWhite, cv2.COLOR_GRAY2BGR)
            prediction, index = classifier.getPrediction(imgWhiteRGB, draw=False)
            
            # Display classification results if the prediction confidence is high enough.
            if prediction[index] > 0.75 and 0 <= index < len(labels):
                cv2.rectangle(imgOutput, (x - offset, y - offset - 50),
                              (x - offset + 90, y - offset - 50 + 50), (255, 0, 255), cv2.FILLED)
                cv2.putText(imgOutput, labels[index], (x, y - 26),
                            cv2.FONT_HERSHEY_COMPLEX, 1.7, (255, 255, 255), 2)
                cv2.rectangle(imgOutput, (x - offset, y - offset),
                              (x + w + offset, y + h + offset), (255, 0, 255), 4)
            
            cv2.imshow("Processed Binary Image", imgWhite)
            cv2.imshow("Hand Landmarks", imgCrop_landmarked)
    
    cv2.imshow("Image", imgOutput)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break


cap.release()
cv2.destroyAllWindows()
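The loop above draws the overlay only when the top prediction clears a 0.75 confidence threshold and its index falls inside labels. Moving that decision into a small function makes the rule easy to test and tune. decide_label is a hypothetical helper. It assumes the (prediction, index) pair returned by classifier.getPrediction, as used in the loop. The labels list must name the classes in the same order as the model's outputs. This cell uses 25 letters, while the app below defines a different, 28-entry list.

```python
def decide_label(probabilities, index, labels, threshold=0.75):
    """Return labels[index] if the prediction is confident and in range, else None.

    probabilities/index follow the (prediction, index) pair from getPrediction.
    """
    # Reject indices the label list (or the probability list) cannot cover
    if not 0 <= index < len(labels) or index >= len(probabilities):
        return None
    # Strictly greater than the threshold, matching the check in the loop above
    return labels[index] if probabilities[index] > threshold else None
```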


### Real-Time recognition UI/Tkinter App

#### Features:

1. A panel showing the main frame with the prediction label, together with the binary frame and the landmark frame, to illustrate how the process works.
2. Word typing: press the space bar to add the predicted letter to the word. To add a space, remove your hand from the frame and press the space bar.
3. More features to be added.
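The word-typing feature can be thought of as a tiny state machine. Each space-bar press appends the current prediction, or a space when no hand (and so no prediction) is present. Presses that arrive too quickly are ignored. The app's own handler lives in ASLRecogAPP.py and is not reproduced here. The WordTyper class below is a hypothetical sketch. Its 0.5 s default borrows the space_cooldown value from the globals in the app code further down.

```python
class WordTyper:
    """Accumulate typed text from space-bar presses, with a cooldown between presses."""

    def __init__(self, cooldown=0.5):
        self.cooldown = cooldown  # minimum seconds between accepted presses
        self.text = ""
        self._last_press = None

    def on_space(self, current_prediction, now):
        """Handle a space-bar press at time `now` (seconds); return True if text changed."""
        if self._last_press is not None and now - self._last_press < self.cooldown:
            return False  # too soon after the previous accepted press
        self._last_press = now
        # Append the predicted letter, or a space when nothing is predicted
        self.text += current_prediction if current_prediction else " "
        return True
```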


### Note:
 
IMPORTANT: WebCam required

1. Change the model name in "ASLRecogAPP.py" before running the app.
2. Ensure that the lighting conditions are good.
3. If possible, sign in front of a dark background or near a non-reflective wall.
4. Sign the ASL fingerspelling letters accurately so that they can be classified properly.

Run the cell below to open the app:

In [None]:
%run ASLRecogAPP.py



In [None]:
import cv2
import mediapipe as mp
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier
import numpy as np
import math
import tkinter as tk
from tkinter import scrolledtext, messagebox
from PIL import Image, ImageTk
import time  # for timing delays
import pyttsx3  # for text-to-speech
import os  # for creating directories and saving files
import psutil  # for resource usage info
import platform
try:
    import GPUtil
except ImportError:
    GPUtil = None

# Determine the resampling method for resizing images
try:
    resample_method = Image.Resampling.LANCZOS
except AttributeError:
    resample_method = Image.ANTIALIAS

# ---------------------------
# Processing Code (unchanged)
# ---------------------------
def detect_skin(frame):
    # Convert to YCrCb and equalize the luminance channel
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    y_channel = ycrcb[:, :, 0]
    y_eq = cv2.equalizeHist(y_channel)
    ycrcb[:, :, 0] = y_eq

    lower_skin = np.array([0, 133, 77], dtype=np.uint8)
    upper_skin = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower_skin, upper_skin)
    
    # Noise reduction using morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    
    # Optionally, keep only the largest contour (if needed)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour_mask = np.zeros_like(mask)
    if contours:
        cv2.drawContours(contour_mask, contours, -1, 255, thickness=cv2.FILLED)
    mask = cv2.bitwise_and(mask, contour_mask)
    
    return mask

# ---------------------------
# Initialization (unchanged)
# ---------------------------
cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)

# Initialize MediaPipe Selfie Segmentation
mp_selfie_segmentation = mp.solutions.selfie_segmentation
segmentation_module = mp_selfie_segmentation.SelfieSegmentation(model_selection=0)

# Define model options (adjust file paths as needed)
model_options = {
    "MobileNetV2 (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/MobileNetV2_model.h5",
    "VGG16 (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/VGG16_model.h5",
    "DenseNet121 (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/DenseNet121_model.h5",
    "VGG19 (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/VGG19_model.h5",
    "Fusion Model (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/FusionModel_model.h5",
    "MobileNet (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/MobileNet_model.h5",
    "CNN Model (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/BasicCNN_model.h5",
    "NASNetMobile (T2)": "C:/Users/User/OneDrive/Documents/SignLanguageApp/TrainedBinaryNewModel/NASNetMobile_model.h5",
}

# Initially load the default model
selected_model = "MobileNetV2 (T2)"
classifier = Classifier(model_options[selected_model])

offset = 45
imgSize = 250
labels = ["A", "B", "C", "D", "E", "F", "G", "H", "HI", "I", "J", "K", "L", 
          "M", "N", "O", "P", "Q", "R", "S", " ", "T", "U", "V", "W", "X", "Y", "Z"]

mp_hands = mp.solutions.hands  # for landmark connections
hand_connections = mp_hands.HAND_CONNECTIONS

# ---------------------------
# Globals for word typing, prediction smoothing and the Space-key cooldown
# ---------------------------
current_word = ""
current_prediction = ""
prediction_buffer = []  # Each element: (timestamp, prediction_array)

last_prediction_update_time = 0
prediction_update_delay = 0.05
last_space_time = 0
space_cooldown = 0.5

# Globals for FPS measurements
last_frame_time = time.time()
fps_values = []
frame_counter = 0
pred_counter = 0
last_fps_report_time = time.time()

# Globals for Auto Type functionality
auto_type_start_time = None
auto_type_last_prediction = None

# ---------------------------
# Tkinter UI Setup
# ---------------------------
root = tk.Tk()
root.title("ASL - Fingerspelling Recognition")
# Main window size (width x height, in pixels)
root.geometry("2100x970")
root.configure(bg="#1e1e1e")

# Configure grid layout for the root window
root.grid_rowconfigure(0, weight=1)
root.grid_columnconfigure(0, weight=0)
root.grid_columnconfigure(1, weight=1)
root.grid_columnconfigure(2, weight=0)

# ---------------------------
# Menubar
# ---------------------------
menubar = tk.Menu(root, bg="#2c2f33", fg="#ffffff")
# File Menu
file_menu = tk.Menu(menubar, tearoff=0, bg="#2c2f33", fg="#ffffff")
file_menu.add_command(label="Exit", command=root.quit)
menubar.add_cascade(label="File", menu=file_menu)
# Help Menu
help_menu = tk.Menu(menubar, tearoff=0, bg="#2c2f33", fg="#ffffff")
help_menu.add_command(label="About", command=lambda: messagebox.showinfo("About", "ASL Fingerspelling Recognition App\nDeveloped with OpenCV, MediaPipe, and Tkinter"))
menubar.add_cascade(label="Help", menu=help_menu)
root.config(menu=menubar)

# ---------------------------
# Loading Splash Screen
# ---------------------------
splash = tk.Toplevel()
splash.overrideredirect(True)
splash.geometry("400x300+600+300")
splash.configure(bg="#1e1e1e")
splash_label = tk.Label(splash, text="Loading, please wait...", font=("Segoe UI", 24), bg="#1e1e1e", fg="#ffffff")
splash_label.pack(expand=True, fill="both")

root.withdraw()  # Hide main UI during loading

def close_splash():
    splash.destroy()
    root.deiconify()

root.after(2000, close_splash)  # Show main UI after 2 seconds

# ---------------------------
# Left Panel: Performance Metrics, Resource Info, and Log
# ---------------------------
fps_frame = tk.Frame(root, width=500, height=900, bg="#2c2f33", bd=2, relief="ridge")
fps_frame.grid(row=0, column=0, sticky="nsew", padx=10, pady=10)
fps_frame.grid_propagate(False)

fps_frame_title = tk.Label(fps_frame, text="Performance Metrics", font=("Segoe UI", 18, "bold"),
                           bg="#2c2f33", fg="#00ffff")
fps_frame_title.pack(pady=10)

# FPS info labels
max_fps_var = tk.StringVar(value="Max FPS: 0")
min_fps_var = tk.StringVar(value="Min FPS: 0")
avg_fps_var = tk.StringVar(value="Avg FPS: 0")
pred_rate_var = tk.StringVar(value="Predictions per frame: 0")
pps_var = tk.StringVar(value="PPS: 0")

model_fps_label = tk.Label(fps_frame, text="Model: " + selected_model, font=("Segoe UI", 14),
                           bg="#2c2f33", fg="#00ff00")
model_fps_label.pack(pady=5)

max_fps_label = tk.Label(fps_frame, textvariable=max_fps_var, font=("Segoe UI", 14),
                         bg="#2c2f33", fg="#ffffff")
max_fps_label.pack(pady=5)

min_fps_label = tk.Label(fps_frame, textvariable=min_fps_var, font=("Segoe UI", 14),
                         bg="#2c2f33", fg="#ffffff")
min_fps_label.pack(pady=5)

avg_fps_label = tk.Label(fps_frame, textvariable=avg_fps_var, font=("Segoe UI", 14),
                         bg="#2c2f33", fg="#ffffff")
avg_fps_label.pack(pady=5)

pred_rate_label = tk.Label(fps_frame, textvariable=pred_rate_var, font=("Segoe UI", 14),
                           bg="#2c2f33", fg="#ffffff")
pred_rate_label.pack(pady=5)

pps_label = tk.Label(fps_frame, textvariable=pps_var, font=("Segoe UI", 14),
                     bg="#2c2f33", fg="#ffffff")
pps_label.pack(pady=5)

# Resource Info Section
resource_frame = tk.Frame(fps_frame, bg="#2c2f33")
resource_frame.pack(pady=20)

resource_title = tk.Label(resource_frame, text="Resource Info", font=("Segoe UI", 16, "bold"),
                          bg="#2c2f33", fg="#ffcc00")
resource_title.pack(pady=5)

cpu_info_var = tk.StringVar(value="CPU: Loading...")
gpu_info_var = tk.StringVar(value="GPU: Loading...")
memory_info_var = tk.StringVar(value="Memory: Loading...")

cpu_info_label = tk.Label(resource_frame, textvariable=cpu_info_var, font=("Segoe UI", 12),
                          bg="#2c2f33", fg="#ffffff")
cpu_info_label.pack(pady=2)

gpu_info_label = tk.Label(resource_frame, textvariable=gpu_info_var, font=("Segoe UI", 12),
                          bg="#2c2f33", fg="#ffffff")
gpu_info_label.pack(pady=2)

memory_info_label = tk.Label(resource_frame, textvariable=memory_info_var, font=("Segoe UI", 12),
                             bg="#2c2f33", fg="#ffffff")
memory_info_label.pack(pady=2)

def update_resource_info():
    # CPU Info
    cpu_model = platform.processor()
    if not cpu_model:
        cpu_model = "Unknown CPU"
    cores = psutil.cpu_count(logical=True)
    cpu_info_var.set(f"CPU: {cpu_model} ({cores} cores)")
    
    # GPU Info
    if GPUtil is not None:
        gpus = GPUtil.getGPUs()
        if gpus:
            gpu_names = ", ".join([gpu.name for gpu in gpus])
            gpu_info_var.set("GPU: " + gpu_names)
        else:
            gpu_info_var.set("GPU: None")
    else:
        gpu_info_var.set("GPU: Not available")
    
    # Memory Info
    mem = psutil.virtual_memory()
    total_gb = mem.total / (1024**3)
    available_gb = mem.available / (1024**3)
    memory_info_var.set(f"Memory: {available_gb:.1f}GB free / {total_gb:.1f}GB")
    
    root.after(1000, update_resource_info)

update_resource_info()

# Log box
log_text = scrolledtext.ScrolledText(fps_frame, width=40, height=10,
                                       bg="#23272a", fg="#ffffff", font=("Segoe UI", 12))
log_text.pack(pady=10, fill="both", expand=True)
log_text.configure(state='disabled')

# ---------------------------
# Center Panel: Main Camera Feed
# ---------------------------
left_frame = tk.Frame(root, width=900, height=900, bg="#141414", bd=2, relief="ridge")
left_frame.grid(row=0, column=1, sticky="nsew", padx=10, pady=10)
left_frame.grid_propagate(False)

main_image_label = tk.Label(left_frame, bg="#141414")
main_image_label.place(relwidth=1, relheight=1)

# ---------------------------
# Right Panel: Controls, Typed Text, and Settings
# ---------------------------
# Define these variables globally so they can be used in the Settings pop-up.
selected_model_var = tk.StringVar(value=selected_model)
auto_type_enabled_var = tk.BooleanVar(value=False)
segmentation_enabled_var = tk.BooleanVar(value=False)

right_frame = tk.Frame(root, width=500, height=970, bg="#1f1f1f", bd=2, relief="ridge")
right_frame.grid(row=0, column=2, sticky="nsew", padx=10, pady=10)
right_frame.grid_propagate(False)
right_frame.grid_columnconfigure(0, weight=1)

# Header (Centered)
header_label = tk.Label(right_frame, text="ASL Fingerspelling Recognition", font=("Segoe UI", 20, "bold"),
                        bg="#1f1f1f", fg="#00ffff", anchor="center")
header_label.grid(row=0, column=0, pady=(15, 10), sticky="ew")

# Top Buttons Frame: Settings and Help
def open_settings():
    settings_window = tk.Toplevel(root)
    settings_window.title("Settings")
    settings_window.geometry("400x300")
    settings_window.configure(bg="#1f1f1f")
    
    tk.Label(settings_window, text="Model Selection:", font=("Segoe UI", 14), bg="#1f1f1f", fg="#ffffff").pack(pady=5)
    model_option_menu_settings = tk.OptionMenu(settings_window, selected_model_var, *model_options.keys(), command=lambda x: change_model())
    model_option_menu_settings.config(font=("Segoe UI", 14), bg="#333333", fg="#ffffff", highlightthickness=0)
    model_option_menu_settings["menu"].config(bg="#333333", fg="#ffffff")
    model_option_menu_settings.pack(pady=5)
    
    segmentation_check_settings = tk.Checkbutton(settings_window, text="Enable Segmentation Filter", variable=segmentation_enabled_var,
                                                   font=("Segoe UI", 14), bg="#1f1f1f", fg="#ffffff",
                                                   activebackground="#1f1f1f", activeforeground="#00ffff", selectcolor="#000000")
    segmentation_check_settings.pack(pady=5)
    
    auto_type_check = tk.Checkbutton(settings_window, text="Enable Auto Type", variable=auto_type_enabled_var,
                                     font=("Segoe UI", 14), bg="#1f1f1f", fg="#ffffff",
                                     activebackground="#1f1f1f", activeforeground="#00ffff", selectcolor="#000000")
    auto_type_check.pack(pady=5)
    
    tk.Button(settings_window, text="Close", command=settings_window.destroy, font=("Segoe UI", 14),
              bg="#333333", fg="#00ffff").pack(pady=20)

def open_help():
    help_text = (
        "• Use a camera with at least 720p resolution\n"
        "• Make sure the scene is well lit and the background is plain\n"
        "• Press the Space Bar to enter the predicted letter (or a space if nothing is predicted)\n"
        "• Press Backspace to delete the last character\n"
        "• Settings button:\n"
        "   • Choose a different model if needed (default is MobileNetV2); changing it is not recommended\n"
        "   • Enable the segmentation filter when in a bright room\n"
        "   • Enable Auto Type to enter letters automatically (hold a sign for 2 seconds per letter)\n"
        "   • Close the Settings window when done"
    )
    messagebox.showinfo("Help", help_text)

# Frame to hold Settings and Help buttons side-by-side
top_buttons_frame = tk.Frame(right_frame, bg="#1f1f1f")
top_buttons_frame.grid(row=1, column=0, pady=10, sticky="ew")
top_buttons_frame.grid_columnconfigure(0, weight=1)
top_buttons_frame.grid_columnconfigure(1, weight=1)

settings_button = tk.Button(top_buttons_frame, text="Settings", font=("Segoe UI", 14),
                             bg="#333333", fg="#00ffff", command=open_settings)
settings_button.grid(row=0, column=0, padx=5, pady=5, sticky="ew")

help_button = tk.Button(top_buttons_frame, text="Help", font=("Segoe UI", 14),
                        bg="#333333", fg="#00ffff", command=open_help)
help_button.grid(row=0, column=1, padx=5, pady=5, sticky="ew")

# Media Display Section (Binary and Landmarks)
FIXED_WIDTH = 280
FIXED_HEIGHT = 280

media_frame = tk.Frame(right_frame, bg="#1f1f1f")
media_frame.grid(row=2, column=0, padx=15, pady=10)
media_frame.grid_columnconfigure(0, weight=1)
media_frame.grid_columnconfigure(1, weight=1)

# Binary Image Display
binary_frame = tk.Frame(media_frame, width=FIXED_WIDTH, height=FIXED_HEIGHT, bg="#333333", bd=1, relief="sunken")
binary_frame.grid(row=0, column=0, padx=15, pady=10)
binary_frame.grid_propagate(False)
binary_title = tk.Label(binary_frame, text="Binary Image", font=("Segoe UI", 12),
                          bg="#333333", fg="#ffffff")
binary_title.pack(side="top", fill="x")
binary_label = tk.Label(binary_frame, bg="#333333")
binary_label.pack(expand=True, fill="both")

# Hand Landmarks Display
landmarks_frame = tk.Frame(media_frame, width=FIXED_WIDTH, height=FIXED_HEIGHT, bg="#333333", bd=1, relief="sunken")
landmarks_frame.grid(row=0, column=1, padx=15, pady=10)
landmarks_frame.grid_propagate(False)
landmarks_title = tk.Label(landmarks_frame, text="Landmarks", font=("Segoe UI", 12),
                            bg="#333333", fg="#ffffff")
landmarks_title.pack(side="top", fill="x")
landmarks_label = tk.Label(landmarks_frame, bg="#333333")
landmarks_label.pack(expand=True, fill="both")

# 3. Prediction Info: shows the current (time-smoothed) predicted letter
prediction_info_frame = tk.Frame(right_frame, bg="#1f1f1f")
prediction_info_frame.grid(row=3, column=0, padx=15, pady=10, sticky="ew")
prediction_info_frame.grid_columnconfigure(0, weight=1)

prediction_label = tk.Label(prediction_info_frame, text="", font=("Segoe UI", 16),
                            bg="#1f1f1f", fg="#ff00ff", anchor="w")
prediction_label.grid(row=0, column=0, padx=5, pady=5, sticky="w")
prediction_var = tk.StringVar(value="")
prediction_label.config(textvariable=prediction_var)

# 4. Buttons (Clear and Pronounce) - Centered
buttons_frame = tk.Frame(right_frame, bg="#1f1f1f")
buttons_frame.grid(row=4, column=0, padx=15, pady=10, sticky="ew")
buttons_frame.grid_columnconfigure(0, weight=1)
buttons_frame.grid_columnconfigure(1, weight=1)

def clear_word():
    global current_word
    current_word = ""
    log_text.configure(state='normal')
    log_text.insert(tk.END, "Clear pressed: Word cleared\n")
    log_text.see(tk.END)
    log_text.configure(state='disabled')
    typed_text.configure(state='normal')
    typed_text.delete("1.0", tk.END)
    typed_text.configure(state='disabled')

clear_button = tk.Button(buttons_frame, text="Clear", font=("Segoe UI", 14),
                           bg="#333333", fg="#00ffff", command=clear_word)
clear_button.grid(row=0, column=0, padx=5, pady=5, sticky="ew")

def pronounce_sentence():
    sentence = current_word.strip()
    if sentence == "":
        log_text.configure(state='normal')
        log_text.insert(tk.END, "No sentence to pronounce.\n")
        log_text.see(tk.END)
        log_text.configure(state='disabled')
        return
    log_text.configure(state='normal')
    log_text.insert(tk.END, f"Pronouncing: {sentence}\n")
    log_text.see(tk.END)
    log_text.configure(state='disabled')
    # Initialise the text-to-speech engine only when there is something to say.
    # Note: runAndWait() blocks, so the camera feed pauses while speaking.
    engine = pyttsx3.init()
    engine.say(sentence)
    engine.runAndWait()

pronounce_button = tk.Button(buttons_frame, text="Pronounce", font=("Segoe UI", 14),
                              bg="#333333", fg="#00ffff", command=pronounce_sentence)
pronounce_button.grid(row=0, column=1, padx=5, pady=5, sticky="ew")

# 5. Instructions Label
instructions_label = tk.Label(
    right_frame,
    text="1. Enter predicted letter: Space Bar  |  2. Delete last character: Backspace",
    font=("Segoe UI", 10),
    bg="#1f1f1f",
    fg="#aaaaaa"
)
instructions_label.grid(row=5, column=0, padx=15, pady=(0, 10), sticky="ew")

# 6. Typed Text Box (this will show the current word)
typed_text = scrolledtext.ScrolledText(right_frame, width=40, height=10,
                                       bg="#23272a", fg="#ffffff", font=("Segoe UI", 12))
typed_text.grid(row=6, column=0, padx=15, pady=10, sticky="ew")
typed_text.configure(state='disabled')

# ---------------------------
# Key Binding Functions: Space enters a letter, Backspace deletes one
# ---------------------------
def on_space_press(event):
    global current_word, current_prediction, last_space_time
    current_time = time.time()
    if current_time - last_space_time < space_cooldown:
        return
    last_space_time = current_time
    if current_prediction != "":
        current_word += current_prediction
        log_text.configure(state='normal')
        log_text.insert(tk.END, f"Entered: {current_prediction}\n")
        log_text.see(tk.END)
        log_text.configure(state='disabled')
    else:
        current_word += " "
        log_text.configure(state='normal')
        log_text.insert(tk.END, "Entered: [space]\n")
        log_text.see(tk.END)
        log_text.configure(state='disabled')
    typed_text.configure(state='normal')
    typed_text.delete("1.0", tk.END)
    typed_text.insert(tk.END, current_word)
    typed_text.configure(state='disabled')

def on_backspace(event):
    global current_word
    if current_word:
        current_word = current_word[:-1]
        log_text.configure(state='normal')
        log_text.insert(tk.END, "Backspace pressed: Removed last character\n")
        log_text.see(tk.END)
        log_text.configure(state='disabled')
        typed_text.configure(state='normal')
        typed_text.delete("1.0", tk.END)
        typed_text.insert(tk.END, current_word)
        typed_text.configure(state='disabled')

root.bind("<space>", on_space_press)
root.bind("<BackSpace>", on_backspace)

# ---------------------------
# Main update_frame() loop: capture, preprocess, classify, display
# ---------------------------
def update_frame():
    global current_prediction, last_frame_time, fps_values, frame_counter, pred_counter, last_fps_report_time
    global auto_type_start_time, auto_type_last_prediction, current_word

    success, img = cap.read()
    if not success:
        root.after(10, update_frame)
        return

    # FPS Measurement (per frame)
    current_time = time.time()
    dt = current_time - last_frame_time
    current_fps = 1.0 / dt if dt > 0 else 0
    last_frame_time = current_time

    fps_values.append(current_fps)
    frame_counter += 1

    imgOutput = img.copy()
    hands, img = detector.findHands(img, draw=False)

    if hands:
        pred_counter += 1

    imgWhite_for_display = None
    imgCrop_landmarked_for_display = None

    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']
        y1, y2 = max(0, y - offset), min(img.shape[0], y + h + offset)
        x1, x2 = max(0, x - offset), min(img.shape[1], x + w + offset)
        imgCrop = img[y1:y2, x1:x2]

        if segmentation_enabled_var.get():
            rgb_crop = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2RGB)
            seg_results = segmentation_module.process(rgb_crop)
            seg_mask = seg_results.segmentation_mask
            threshold_seg = 0.5
            mask_binary_seg = (seg_mask > threshold_seg).astype(np.uint8) * 255
            mask_binary_seg = cv2.cvtColor(mask_binary_seg, cv2.COLOR_GRAY2BGR)
            imgCrop = cv2.bitwise_and(imgCrop, mask_binary_seg)

        if imgCrop.shape[0] > 0 and imgCrop.shape[1] > 0:
            imgCrop_landmarked = imgCrop.copy()
            if 'lmList' in hand:
                lm_list = hand['lmList']
                for lm in lm_list:
                    cv2.circle(imgCrop_landmarked, (lm[0] - x1, lm[1] - y1), 4, (0, 0, 255), -1)
                for connection in mp_hands.HAND_CONNECTIONS:
                    if connection[0] < len(lm_list) and connection[1] < len(lm_list):
                        pt1 = (lm_list[connection[0]][0] - x1, lm_list[connection[0]][1] - y1)
                        pt2 = (lm_list[connection[1]][0] - x1, lm_list[connection[1]][1] - y1)
                        cv2.line(imgCrop_landmarked, pt1, pt2, (0, 0, 255), 2)

            binaryMask = detect_skin(imgCrop)
            binary_result = np.zeros_like(imgCrop)
            binary_result[binaryMask > 0] = [255, 255, 255]

            if 'lmList' in hand:
                for lm in lm_list:
                    cv2.circle(binary_result, (lm[0] - x1, lm[1] - y1), 4, (0, 0, 0), -1)
                for connection in mp_hands.HAND_CONNECTIONS:
                    # Same bounds check as the colour-crop loop above
                    if connection[0] < len(lm_list) and connection[1] < len(lm_list):
                        pt1 = (lm_list[connection[0]][0] - x1, lm_list[connection[0]][1] - y1)
                        pt2 = (lm_list[connection[1]][0] - x1, lm_list[connection[1]][1] - y1)
                        cv2.line(binary_result, pt1, pt2, (0, 0, 0), 2)

            aspectRatio = h / w
            imgWhite = np.zeros((imgSize, imgSize), np.uint8)  # black square canvas
            if aspectRatio > 1:
                k = imgSize / h
                wCal = math.ceil(k * w)
                imgResize = cv2.resize(binary_result, (wCal, imgSize))
                wGap = math.ceil((imgSize - wCal) / 2)
                imgWhite[:, wGap:wCal + wGap] = cv2.cvtColor(imgResize, cv2.COLOR_BGR2GRAY)
            else:
                k = imgSize / w
                hCal = int(k * h)
                imgResize = cv2.resize(binary_result, (imgSize, hCal))
                hGap = (imgSize - hCal) // 2
                imgWhite[hGap:hGap + hCal, :] = cv2.cvtColor(imgResize, cv2.COLOR_BGR2GRAY)

            imgWhiteRGB = cv2.cvtColor(imgWhite, cv2.COLOR_GRAY2BGR)
            prediction, index = classifier.getPrediction(imgWhiteRGB, draw=False)

            prediction_buffer.append((current_time, prediction))
            prediction_buffer[:] = [(t, p) for (t, p) in prediction_buffer if current_time - t <= 1.0]
            if prediction_buffer:
                avg_prediction = np.mean([p for (t, p) in prediction_buffer], axis=0)
                best_index = np.argmax(avg_prediction)
                best_letter = labels[best_index]
                best_prob = avg_prediction[best_index]
                current_prediction = best_letter

                box_x = x - offset
                box_y = y - offset - 50
                box_width = 150
                box_height = 50
                cv2.rectangle(imgOutput, (box_x, box_y), (box_x + box_width, box_y + box_height), (255, 0, 255), cv2.FILLED)
                text = f"{best_letter}: {best_prob:.2f}"
                cv2.putText(imgOutput, text, (box_x + 5, box_y + 35), cv2.FONT_HERSHEY_COMPLEX, 0.8, (255, 255, 255), 2)
                cv2.rectangle(imgOutput, (x - offset, y - offset), (x + w + offset, y + h + offset), (255, 0, 255), 4)
            else:
                current_prediction = ""
            
            prediction_var.set("Prediction: " + (current_prediction if current_prediction != "" else "[None]"))
            
            imgWhite_for_display = imgWhite.copy()
            imgCrop_landmarked_for_display = imgCrop_landmarked.copy()
    else:
        current_prediction = ""
        prediction_var.set("Prediction: [None]")

    # ---------------------------
    # Auto Type Functionality
    # ---------------------------
    if auto_type_enabled_var.get():
        if hands:
            if auto_type_start_time is None or current_prediction != auto_type_last_prediction:
                auto_type_start_time = current_time
                auto_type_last_prediction = current_prediction
            else:
                if current_time - auto_type_start_time >= 2:
                    if current_prediction != "":
                        current_word += current_prediction
                        log_text.configure(state='normal')
                        log_text.insert(tk.END, f"Auto typed: {current_prediction}\n")
                        log_text.see(tk.END)
                        log_text.configure(state='disabled')
                        typed_text.configure(state='normal')
                        typed_text.delete("1.0", tk.END)
                        typed_text.insert(tk.END, current_word)
                        typed_text.configure(state='disabled')
                        auto_type_start_time = current_time  # reset timer to avoid repeated auto typing
        else:
            auto_type_start_time = None
            auto_type_last_prediction = None
    else:
        auto_type_start_time = None
        auto_type_last_prediction = None

    imgOutput_rgb = cv2.cvtColor(imgOutput, cv2.COLOR_BGR2RGB)
    img_pil = Image.fromarray(imgOutput_rgb)
    img_tk = ImageTk.PhotoImage(image=img_pil)
    main_image_label.imgtk = img_tk
    main_image_label.configure(image=img_tk)
    
    if imgWhite_for_display is not None:
        imgWhite_rgb = cv2.cvtColor(imgWhite_for_display, cv2.COLOR_GRAY2RGB)
        imgWhite_pil = Image.fromarray(imgWhite_rgb).resize((FIXED_WIDTH, FIXED_HEIGHT), resample_method)
        imgWhite_tk = ImageTk.PhotoImage(image=imgWhite_pil)
        binary_label.imgtk = imgWhite_tk
        binary_label.configure(image=imgWhite_tk)
    
    if imgCrop_landmarked_for_display is not None:
        imgCrop_rgb = cv2.cvtColor(imgCrop_landmarked_for_display, cv2.COLOR_BGR2RGB)
        imgCrop_pil = Image.fromarray(imgCrop_rgb).resize((FIXED_WIDTH, FIXED_HEIGHT), resample_method)
        imgCrop_tk = ImageTk.PhotoImage(image=imgCrop_pil)
        landmarks_label.imgtk = imgCrop_tk
        landmarks_label.configure(image=imgCrop_tk)
    
    if current_time - last_fps_report_time >= 1.0:
        max_fps = max(fps_values) if fps_values else 0
        min_fps = min(fps_values) if fps_values else 0
        avg_fps = sum(fps_values) / len(fps_values) if fps_values else 0
        pred_rate = (pred_counter / frame_counter) if frame_counter > 0 else 0
        pps = pred_counter

        max_fps_var.set(f"Max FPS: {max_fps:.2f}")
        min_fps_var.set(f"Min FPS: {min_fps:.2f}")
        avg_fps_var.set(f"Avg FPS: {avg_fps:.2f}")
        pred_rate_var.set(f"Predictions per frame: {pred_rate:.2f}")
        pps_var.set(f"PPS: {pps}")

        # Log FPS and prediction stats; also log device info at the very top if file is new/empty.
        log_folder = "AppFPS_Results"
        os.makedirs(log_folder, exist_ok=True)
        log_filename = os.path.join(log_folder, f"{selected_model_var.get()}_results.txt")
        if not os.path.exists(log_filename) or os.stat(log_filename).st_size == 0:
            with open(log_filename, "w") as log_file:
                cpu_model = platform.processor() or "Unknown CPU"
                cores = psutil.cpu_count(logical=True)
                cpu_info_str = f"CPU: {cpu_model} ({cores} cores)"
                if GPUtil is not None:
                    gpus = GPUtil.getGPUs()
                    gpu_info_str = "GPU: " + (", ".join([gpu.name for gpu in gpus]) if gpus else "None")
                else:
                    gpu_info_str = "GPU: Not available"
                mem = psutil.virtual_memory()
                total_gb = mem.total / (1024**3)
                available_gb = mem.available / (1024**3)
                memory_info_str = f"Memory: {available_gb:.1f}GB free / {total_gb:.1f}GB total"
                log_file.write("Device Info:\n")
                log_file.write(cpu_info_str + "\n")
                log_file.write(gpu_info_str + "\n")
                log_file.write(memory_info_str + "\n")
                log_file.write("----------\n")
        with open(log_filename, "a") as log_file:
            log_line = (f"{time.strftime('%Y-%m-%d %H:%M:%S')} - Max FPS: {max_fps:.2f}, "
                        f"Min FPS: {min_fps:.2f}, Avg FPS: {avg_fps:.2f}, "
                        f"Predictions per frame: {pred_rate:.2f}, PPS: {pps}\n")
            log_file.write(log_line)
        
        fps_values = []
        frame_counter = 0
        pred_counter = 0
        last_fps_report_time = current_time

    root.after(10, update_frame)

# ---------------------------
# Status Bar
# ---------------------------
status_var = tk.StringVar(value="Ready")
status_bar = tk.Label(root, textvariable=status_var, font=("Segoe UI", 12), bg="#2c2f33", fg="#ffffff", bd=1, relief="sunken")
status_bar.grid(row=1, column=0, columnspan=3, sticky="ew")

# ---------------------------
# Clean up on exit
# ---------------------------
def on_closing():
    cap.release()
    root.destroy()

root.protocol("WM_DELETE_WINDOW", on_closing)

update_frame()
root.mainloop()


No Labels Found
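The comment in `detect_skin` says the largest contour can optionally be kept, but because every external contour is redrawn, the final `bitwise_and` leaves the mask unchanged. The sketch below shows what largest-region filtering could look like. It is an illustration under assumptions: the function name `keep_largest_component` is hypothetical, it uses 8-connectivity, and it works on a plain nested list with a breadth-first flood fill so that it runs without OpenCV. Inside the app, `cv2.connectedComponentsWithStats` (whose stats include each region's area) would be the faster route.

```python
from collections import deque


def keep_largest_component(mask):
    """Return a copy of a binary mask (list of lists holding 0 or 255)
    that keeps only the largest 8-connected foreground region."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = []  # pixel coordinates of the largest region found so far

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill the region that contains (r, c)
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] != 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) > len(best):
                best = region

    # Paint only the winning region into an empty mask
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 255
    return out
```

If the training images were preprocessed with the same mask logic, any change here should be mirrored in the collection pipeline; otherwise the model would see differently preprocessed inputs at inference time.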
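Before classification, `update_frame()` pads the hand's bounding box by `offset` pixels, clips it to the frame, and pastes the resized crop onto a black `imgSize` x `imgSize` square so that the aspect ratio is preserved. The geometry alone can be sketched with plain arithmetic. The helper names `clamp_crop` and `letterbox_geometry` are hypothetical; the rounding (`ceil` when fitting the height, `int` when fitting the width) mirrors the cell above.

```python
import math


def clamp_crop(x, y, w, h, offset, img_w, img_h):
    """Pad a bounding box by `offset` pixels on every side and clip it to the frame.

    Returns (x1, y1, x2, y2), suitable for slicing img[y1:y2, x1:x2].
    """
    x1, y1 = max(0, x - offset), max(0, y - offset)
    x2, y2 = min(img_w, x + w + offset), min(img_h, y + h + offset)
    return x1, y1, x2, y2


def letterbox_geometry(w, h, size):
    """Size and position of a w x h box fitted into a size x size square.

    Returns (new_w, new_h, x_gap, y_gap), using the same rounding as update_frame().
    """
    if h / w > 1:
        # Taller than wide: fill the full height and centre horizontally.
        new_w = math.ceil(size / h * w)
        return new_w, size, math.ceil((size - new_w) / 2), 0
    # Wider than tall (or square): fill the full width and centre vertically.
    new_h = int(size / w * h)
    return size, new_h, 0, (size - new_h) // 2
```

For a 100 x 200 (w x h) box and `imgSize = 250`, the hand is scaled to 125 x 250 and placed 63 pixels from the left edge. Note that the notebook computes this size from the unpadded box while resizing the padded crop, so the stored image is compressed along its shorter side relative to the camera view; changing that would only make sense if the collection pipeline changed too.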

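Predictions are smoothed over time: each frame's probability vector is stored with its timestamp in `prediction_buffer`, entries older than one second are discarded, and the letter with the highest mean probability becomes `current_prediction`. A minimal stand-alone version of that sliding-window average is sketched below. The class name `PredictionSmoother` is hypothetical, and it uses a `deque` instead of rebuilding the list on every frame, which works because timestamps only increase.

```python
from collections import deque


class PredictionSmoother:
    """Average class-probability vectors over a trailing time window."""

    def __init__(self, labels, window=1.0):
        self.labels = labels
        self.window = window    # seconds of history to keep
        self.buffer = deque()   # (timestamp, probabilities) pairs, oldest first

    def update(self, timestamp, probs):
        """Add one frame's probabilities; return (best_label, mean_probability)."""
        self.buffer.append((timestamp, list(probs)))
        # Drop entries older than the window (same rule as `current_time - t <= 1.0`)
        while self.buffer and timestamp - self.buffer[0][0] > self.window:
            self.buffer.popleft()
        n = len(self.buffer)
        avg = [sum(p[i] for _, p in self.buffer) / n for i in range(len(probs))]
        best = max(range(len(avg)), key=avg.__getitem__)
        return self.labels[best], avg[best]
```

Averaging over a window trades a little latency (up to about one second) for a prediction that no longer flickers between similar handshapes from frame to frame.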

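Auto Type enters the current prediction once it has stayed the same for 2 seconds while a hand is visible, then restarts the timer so that a held sign repeats only every 2 seconds; losing the hand or disabling the option resets the timer. The state machine can be separated from Tkinter as in this sketch. `AutoTyper` and its `step()` method are hypothetical names, and timestamps are passed in explicitly so the behaviour is easy to test.

```python
class AutoTyper:
    """Emit a letter once the same prediction has been held for `dwell` seconds."""

    def __init__(self, dwell=2.0):
        self.dwell = dwell
        self.start = None  # when the current prediction was first seen
        self.last = None   # the prediction being timed

    def reset(self):
        self.start = None
        self.last = None

    def step(self, now, prediction, hand_visible=True):
        """Process one frame; return the letter to type, or None."""
        if not hand_visible:
            self.reset()
            return None
        if self.start is None or prediction != self.last:
            # New or changed prediction: start timing it from this frame
            self.start, self.last = now, prediction
            return None
        if now - self.start >= self.dwell and prediction != "":
            self.start = now  # restart so a held sign repeats only every `dwell` seconds
            return prediction
        return None
```

For example, holding "A" from t = 0 s types it at t = 2 s and again at t = 4 s; switching to "B" at t = 4.5 s types "B" at t = 6.5 s.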
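The Performance Metrics panel is refreshed once per second from per-frame timings: every frame contributes an instantaneous FPS value (1 / frame interval), and the report shows their maximum, minimum and mean, the fraction of frames in which a hand was detected ("Predictions per frame"), and the raw count of such frames in the interval ("PPS"). The sketch below reproduces that bookkeeping without the UI or log file; `FpsWindow` is a hypothetical name.

```python
class FpsWindow:
    """Collect per-frame timing and summarise it once per reporting interval."""

    def __init__(self, start_time, interval=1.0):
        self.interval = interval
        self.last_frame = start_time
        self.last_report = start_time
        self.fps_values = []
        self.frames = 0
        self.frames_with_hand = 0

    def tick(self, now, hand_detected):
        """Record one frame; return a summary dict when the interval has elapsed."""
        dt = now - self.last_frame
        self.fps_values.append(1.0 / dt if dt > 0 else 0.0)
        self.last_frame = now
        self.frames += 1
        if hand_detected:
            self.frames_with_hand += 1
        if now - self.last_report < self.interval:
            return None
        summary = {
            "max_fps": max(self.fps_values),
            "min_fps": min(self.fps_values),
            "avg_fps": sum(self.fps_values) / len(self.fps_values),
            "pred_rate": self.frames_with_hand / self.frames,
            "pps": self.frames_with_hand,
        }
        # Start a fresh reporting window
        self.fps_values, self.frames, self.frames_with_hand = [], 0, 0
        self.last_report = now
        return summary
```

Averaging instantaneous FPS is never lower than frames divided by elapsed time (the mean of reciprocals is at least the reciprocal of the mean): frame intervals of 0.25, 0.25 and 0.5 s give a mean of about 3.33 FPS, while 3 frames in 1.0 s is 3.0 FPS.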
In [None]:
import tensorflow as tf

print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print("GPU being used:", tf.test.gpu_device_name())

tf.debugging.set_log_device_placement(True)  # log the device (CPU/GPU) each op is placed on