# Hand Gesture Recognition Project - DAT540 Introduction to Data Science
*Authors: Haakon Vollheim Webb, Håkon Nodeland, Magnus Kjellesvig Egeland, Ninh Bao Turong, William Vagle*


### Problem Statement
We are tasked with recognizing hand gestures. We want to use machine learning tools to build an effective and reliable model. This model, together with the webcam of the computer running the code, will be used to create a live feed of the model's interpretations of the hand gestures.

The code has the following structure:

0. Defining all Imports
1. Downloading the Dataset
2. Pre-Processing
3. Sanity Check
4. Defining a Model
5. Defining Hyper Parameters
6. Performing Cross Validation
7. Configuring Webcam Access
8. Predicting on a Single Frame
9. Live Feed of Predictions


In [None]:
# import libraries
import os
import zipfile

import cv2
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from keras_tuner import Hyperband

from pathlib import Path

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder

from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical


## 1. Downloading the Dataset
When the code runs for the first time, we check whether the dataset is available locally. If it is not, we download it from Kaggle and unzip it into `./data/leapGestRecog`.

In [None]:
# Define paths
dataset_folder = Path("./data/leapGestRecog").expanduser()  # Adjust folder name as needed
zip_path = Path("./data/archive.zip").expanduser()

if not dataset_folder.parent.exists():
    print(f"Creating the directory: {dataset_folder.parent}")
    dataset_folder.parent.mkdir(parents=True, exist_ok=True)

# Check if dataset folder exists
if not dataset_folder.exists():
    print("Dataset not found locally. Downloading...")
    
    if not zip_path.exists():
        # Run shell command to download the dataset if it doesn't exist
        !curl -L -o {zip_path} https://www.kaggle.com/api/v1/datasets/download/gti-upm/leapgestrecog
    
    # Unzip the downloaded file
    if zip_path.exists():
        print("Download complete. Extracting files...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(dataset_folder)  # Extracts into ./data/leapGestRecog; the archive contains its own leapGestRecog/ folder
        print("Extraction complete.")
    else:
        print("Download failed. Please check your connection or Kaggle API credentials.")
else:
    print("Dataset already exists locally.")


## 2. Pre-Processing
The pre-processing is split into three parts: normalization, data augmentation and data preparation.

### Normalization
We need to ensure that all images have the same size and are in grayscale. We use OpenCV (`cv2`) to read the images in grayscale:

`img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)`

We then resize every image to the desired dimensions, so that all images have the same size and shape:

`img_resized = cv2.resize(img, desired_size)`

The `images` and `labels` lists contain the images (as uint8 arrays with pixel values in [0, 255]) and their labels. These are used for the data augmentation.

In [32]:
# Initialize the list of images and labels
images = []
labels = []

# We only want to work with the first person's data (folder '00')
base_folder = './data/leapGestRecog/leapGestRecog/00/'

# Specify desired image size
desired_size = (128, 128)

if os.path.isdir(base_folder):
    # For each gesture folder
    for gesture_folder in os.listdir(base_folder):
        gesture_path = os.path.join(base_folder, gesture_folder)
        
        # Check if it's a directory
        if os.path.isdir(gesture_path):
            
            # Use gesture folder as label
            label = gesture_folder 
            
            # For each image in the gesture folder
            for filename in os.listdir(gesture_path):
                
                # Check if the file is an image
                if filename.endswith(".jpg") or filename.endswith(".png"):
                    
                    # Read the image in grayscale and resize
                    img_path = os.path.join(gesture_path, filename)
                    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
                    
                    if img is not None:
                        img_resized = cv2.resize(img, desired_size)
                        
                        # Keep the image in uint8 format with pixel values in [0, 255]
                        images.append(img_resized)
                        labels.append(label)

### Data Augmentation
The data we use (subject `00`) contains only 200 images of each hand gesture. We want to increase this by creating augmented copies of each image. With TensorFlow's `ImageDataGenerator` we specify how strongly the images may be augmented: it randomly rotates, shifts, zooms, shears, flips and changes the brightness of each image. The implementation creates `augmentations_per_image` augmented copies of every original image, and these augmented images are also saved locally.

Not yet implemented: a check for whether augmented images already exist (and how many), so that new augmented images are not generated on every run.
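The missing check could look roughly like the sketch below. It assumes the folder layout created in the next cell (`data/augmented_images/<label>/`) and the `.jpeg` files written by `save_format='jpeg'`; the helper names `count_saved_augmentations` and `augmentation_needed` are hypothetical. If the check says no new augmentation is needed, the saved images would still have to be loaded back from disk (for example with `cv2.imread`) instead of being regenerated.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}

def count_saved_augmentations(output_dir, labels):
    """Return {label: number of image files already saved in output_dir/label}."""
    counts = {}
    for label in labels:
        folder = Path(output_dir) / str(label)
        if folder.is_dir():
            counts[label] = sum(1 for p in folder.iterdir()
                                if p.suffix.lower() in IMAGE_SUFFIXES)
        else:
            counts[label] = 0  # folder missing: nothing saved yet
    return counts

def augmentation_needed(output_dir, labels, originals_per_label, augmentations_per_image):
    """True if any label folder holds fewer augmented images than expected."""
    expected = originals_per_label * augmentations_per_image
    counts = count_saved_augmentations(output_dir, labels)
    return any(n < expected for n in counts.values())

# Hypothetical usage in the augmentation cell:
# if augmentation_needed('data/augmented_images', np.unique(labels), 200, augmentations_per_image):
#     ... run the datagen.flow(...) loop ...
```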

In [33]:
# Convert images and labels to numpy arrays
images = np.array(images, dtype=np.uint8)
labels = np.array(labels)

# Reshape images to add channel dimension (grayscale images)
images = images.reshape((-1, desired_size[0], desired_size[1], 1))

# Augmentation settings
datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    brightness_range=[0.8, 1.2],
    horizontal_flip=True,
    shear_range=0.2,
    fill_mode='nearest'
)

# Create a directory to save augmented images
output_dir = 'data/augmented_images'
os.makedirs(output_dir, exist_ok=True)
for label in np.unique(labels):
    os.makedirs(f'{output_dir}/{label}', exist_ok=True)

# Augment and collect augmented images and labels
augmented_images = []
augmented_labels = []

# Number of augmentations per original image
augmentations_per_image = 2

# For each image
for idx in range(len(images)):
    img = images[idx]
    label = labels[idx]
    
    # Reshape to (1, height, width, channels) which is the expected shape for the flow() function
    img = img.reshape((1,) + img.shape)
    i = 0
    for batch in datagen.flow(
            img,
            batch_size=1,
            save_to_dir=f'{output_dir}/{label}',
            save_prefix=f'{idx}',
            save_format='jpeg'):
        
        # Append the augmented image and label to the lists
        augmented_images.append(batch[0])
        augmented_labels.append(label)
        i += 1
        if i >= augmentations_per_image:
            break

### Data Preparation
We combine the augmented images with the original samples. We then make sure the images have an explicit channel dimension, which `Conv2D` requires later on (it was already added before augmentation, so this reshape acts as a safeguard). Finally, we convert the labels to one-hot encoded vectors and split the data into training and test sets.
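`LabelEncoder` followed by `to_categorical` amounts to two simple steps: map each label string to its index in the sorted list of unique labels, then turn each index into a row of an identity matrix. A NumPy-only sketch of the same idea (not the library code itself), using `np.unique(..., return_inverse=True)`:

```python
import numpy as np

def encode_labels(labels):
    """Integer-encode labels (sorted order, like LabelEncoder) and one-hot encode them."""
    classes, y = np.unique(labels, return_inverse=True)  # classes come back sorted
    y_onehot = np.eye(len(classes), dtype=np.float32)[y]  # row i of the identity = class i
    return classes, y, y_onehot

classes, y, y_onehot = encode_labels(["03_fist", "01_palm", "03_fist", "02_l"])
print(classes)      # ['01_palm' '02_l' '03_fist']
print(y)            # [2 0 2 1]
print(y_onehot[0])  # [0. 0. 1.]
```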

In [34]:
# Combine original and augmented data
all_images = np.concatenate((images, np.array(augmented_images, dtype=np.uint8)))
all_labels = np.concatenate((labels, np.array(augmented_labels)))

# Make sure the images have a channel dimension. The images are grayscale, so there is 1 channel.
# X is a 4D array (number of samples, height, width, channels),
# which we can use directly as input to the Conv2D layers defined later.
X = np.array(all_images).reshape(-1, desired_size[0], desired_size[1], 1)

# Convert labels to integers (e.g. '01_palm' -> 0)
le = LabelEncoder()
y = le.fit_transform(all_labels)

# Convert labels to one-hot encoded vectors
y_categorical = to_categorical(y)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_categorical, test_size=0.2, random_state=42)
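Note that the augmentation above runs before `train_test_split`, so an original image and its augmented copies can end up on opposite sides of the split. The test set is then not fully independent of the training set, which can make the test score overly optimistic. One way to avoid this is to split the original images first and augment only the training part. The sketch below shows the idea with NumPy; `flip_augment` is a hypothetical stand-in for `ImageDataGenerator`, and the random permutation replaces `train_test_split` only to keep the example self-contained.

```python
import numpy as np

def flip_augment(img, rng):
    """Hypothetical stand-in for ImageDataGenerator: random horizontal flip."""
    return np.fliplr(img) if rng.random() < 0.5 else img.copy()

def split_then_augment(images, labels, augment, n_aug=2, test_size=0.2, seed=42):
    """Split the ORIGINAL images first, then augment only the training part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_test = int(round(test_size * len(images)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    train_imgs = [images[i] for i in train_idx]
    train_lbls = [labels[i] for i in train_idx]
    for i in train_idx:
        # Augmented copies stay on the same side of the split as their original
        for _ in range(n_aug):
            train_imgs.append(augment(images[i], rng))
            train_lbls.append(labels[i])

    return {
        "X_train": np.stack(train_imgs), "y_train": np.array(train_lbls),
        "X_test": images[test_idx], "y_test": labels[test_idx],
        "train_idx": train_idx, "test_idx": test_idx,
    }

# Usage with dummy data: 10 "images" of size 4x4 and two classes
images = np.arange(10 * 4 * 4).reshape(10, 4, 4)
labels = np.arange(10) % 2
split = split_then_augment(images, labels, flip_augment)
print(split["X_train"].shape, split["X_test"].shape)  # (24, 4, 4) (2, 4, 4)
```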

## 3. Sanity Check - Part 1
To check that our implementation works so far, we display one sample image from each class and the class distribution.

In [None]:
# Display sample images from each class
classes = np.unique(y)
fig, axes = plt.subplots(1, len(classes), figsize=(15, 5))
for i, cls in enumerate(classes):
    idx = y.tolist().index(cls)
    axes[i].imshow(all_images[idx], cmap='gray')
    axes[i].set_title(f"Class: {le.inverse_transform([cls])[0]}")
    axes[i].axis('off')
plt.show()


In [None]:
# Plot the class distribution. The classes are balanced: each class has
# 200 * (1 + augmentations_per_image) samples (the originals plus their augmented copies).
sns.countplot(x=y)
plt.xlabel('Class')
plt.xticks(ticks=classes, labels=le.inverse_transform(classes), rotation=45)
plt.ylabel('Number of Samples')
plt.title('Class Distribution')
plt.show()
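The expected class counts and split sizes can be checked with a few lines of arithmetic. The sketch assumes 10 gesture classes with 200 originals each and `augmentations_per_image = 2`, as in the cells above; `round` is used for the 20% test share, which is exact here.

```python
import math

originals_per_class = 200      # images per gesture for subject 00
augmentations_per_image = 2    # value used in the augmentation cell
n_classes = 10                 # gesture folders 01_palm ... 10_down
test_size = 0.2                # as passed to train_test_split
batch_size = 32                # as used when training and evaluating

per_class = originals_per_class * (1 + augmentations_per_image)
total = per_class * n_classes
n_test = round(total * test_size)
n_train = total - n_test
eval_steps = math.ceil(n_test / batch_size)  # batches needed to evaluate the test set

print(per_class, total, n_train, n_test, eval_steps)  # 600 6000 4800 1200 38
```

With a batch size of 32, the 1,200 test images take 38 batches, which is consistent with the `38/38` progress line recorded in the cross-validation section.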

## 4. Building the model

We use a `Sequential` model, which is a linear stack of layers, to build a convolutional neural network (CNN) with four parts:

The first convolutional block (`Conv2D` + `MaxPooling2D`) extracts basic features from the input image and reduces its spatial size, preparing the data for deeper layers to learn higher-level features.

The second convolutional block refines and combines the features extracted by the first block into a more abstract and detailed representation of the input.

The third block (`Flatten`, `Dense` and `Dropout`) moves the network from feature extraction to learning the relationships needed for classification; the dropout randomly disables units during training to reduce overfitting.

Finally, the output layer (`Dense` with softmax activation) produces the class probabilities used for the final classification.
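To see how the blocks shrink the image, the spatial size can be traced by hand: Keras' `Conv2D` uses `'valid'` padding and stride 1 by default, so a kernel of size k removes k - 1 pixels, and `MaxPooling2D((2, 2))` halves the size, rounding down. A small sketch of this arithmetic (the helper names are our own):

```python
def conv_pool_size(size, kernel, pool=2):
    """Spatial size after a 'valid', stride-1 Conv2D followed by MaxPooling2D(pool)."""
    return (size - kernel + 1) // pool

def flatten_features(input_size, kernel_1, kernel_2, filters_2):
    """Number of values the Flatten layer outputs for the two-block CNN."""
    size = conv_pool_size(input_size, kernel_1)
    size = conv_pool_size(size, kernel_2)
    return size * size * filters_2

def dense_params(n_inputs, units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units

# Hyperparameters recorded from the second tuner run in section 6
features = flatten_features(128, kernel_1=5, kernel_2=5, filters_2=128)
print(features)                     # 107648  (29 * 29 * 128)
print(dense_params(features, 256))  # 27558144
```

With these settings, most of the network's parameters sit in the first `Dense` layer.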


All hyperparameters in the `build_model(hp)` function are tunable, because the function is passed to the tuner for the hyperparameter search.
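The ranges in `build_model(hp)` define a fairly large search space. Assuming `hp.Int` and `hp.Float` include `max_value` (each `max_value` here is reachable in whole steps), the number of combinations can be counted as below. This is one reason to use an adaptive tuner such as Hyperband instead of trying every combination.

```python
from math import prod

def n_values(low, high, step):
    """Count of values low, low + step, ..., up to and including high."""
    return (high - low) // step + 1

search_space = {
    "conv_1_filters": n_values(32, 128, 32),   # 32, 64, 96, 128
    "conv_1_kernel": 2,                        # 3 or 5
    "conv_2_filters": n_values(64, 256, 32),   # 64, 96, ..., 256
    "conv_2_kernel": 2,                        # 3 or 5
    "dense_units": n_values(64, 256, 32),      # 64, 96, ..., 256
    "dropout": n_values(2, 5, 1),              # 0.2, 0.3, 0.4, 0.5 (counted in tenths)
    "learning_rate": 3,                        # 1e-3, 1e-4, 1e-5
}

print(prod(search_space.values()))  # 9408 combinations
```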

In [None]:
def build_model(hp):
    # Create a Sequential model: a linear stack of layers where each layer has
    # exactly one input tensor and one output tensor. The CNN is built on top of it.
    model = Sequential()
    
    
    # First Convolutional Layer with tunable number of filters and kernel size
    model.add(Conv2D(
        filters=hp.Int('conv_1_filters', min_value=32, max_value=128, step=32),
        kernel_size=hp.Choice('conv_1_kernel', values=[3, 5]),
        activation='relu',
        input_shape=(desired_size[0], desired_size[1], 1)
    ))
    model.add(MaxPooling2D((2, 2)))
    
    # Second Convolutional Layer with tunable number of filters and kernel size
    model.add(Conv2D(
        filters=hp.Int('conv_2_filters', min_value=64, max_value=256, step=32),
        kernel_size=hp.Choice('conv_2_kernel', values=[3, 5]),
        activation='relu'
    ))
    model.add(MaxPooling2D((2, 2)))
    
    # Flatten and Dense Layers with tunable dense units and dropout rate
    model.add(Flatten())
    model.add(Dense(units=hp.Int('dense_units', min_value=64, max_value=256, step=32), activation='relu'))
    model.add(Dropout(rate=hp.Float('dropout', min_value=0.2, max_value=0.5, step=0.1)))
    
    # Output Layer
    model.add(Dense(len(classes), activation='softmax'))
    
    # Compile model with tunable learning rate
    model.compile(
        optimizer=Adam(learning_rate=hp.Choice('learning_rate', values=[1e-3, 1e-4, 1e-5])),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    return model

## 5. Defining Hyperparameters
To find the hyperparameters, we use the `Hyperband` tuner from KerasTuner. Its `search()` method searches for the best hyperparameters. The search only needs to be performed once: afterwards we can use the parameters it found to define a single model, and we also use them to build the models during cross-validation.
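To get a feel for what `max_epochs=20` and `factor=3` mean, the sketch below computes the bracket schedule of the original Hyperband algorithm: each bracket starts many configurations with a small epoch budget and keeps roughly the best 1/`factor` of them for the next, longer round. Each printed entry reads "number of configurations @ epochs". KerasTuner's internal bookkeeping may differ in the details (for example, rounding of epochs and the `hyperband_iterations` setting), so treat the numbers as an illustration rather than a trace of what the tuner will run.

```python
import math

def hyperband_brackets(max_epochs, factor):
    """Return, per bracket, a list of (n_configs, epochs) rounds of successive halving."""
    # s_max = largest s with factor**s <= max_epochs (integer loop avoids float log rounding)
    s_max = 0
    while factor ** (s_max + 1) <= max_epochs:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * factor ** s)  # configurations started
        r = max_epochs / factor ** s                        # epochs in the first round
        rounds = [(n // factor ** i, r * factor ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets

for rounds in hyperband_brackets(max_epochs=20, factor=3):
    print(" -> ".join(f"{n} @ {epochs:.1f} epochs" for n, epochs in rounds))
# 9 @ 2.2 epochs -> 3 @ 6.7 epochs -> 1 @ 20.0 epochs
# 5 @ 6.7 epochs -> 1 @ 20.0 epochs
# 3 @ 20.0 epochs
```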

In [None]:
# Define the tuner and its parameters
tuner = Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=20,
    factor=3,
    directory='tuning_dir',
    project_name='hand_sign_tuning'
)

In [None]:
# Run the tuner search.
# ETA: about 2.5 hours, depending on the CPU/GPU, augmentations_per_image, max_epochs and factor.
# Only necessary if you want to run the search again.

tuner.search(X_train, y_train, epochs=20, validation_split=0.1, batch_size=32)

In [None]:
# Get the best model
best_model = tuner.get_best_models(num_models=1)[0]

# Display the best hyperparameters
best_hyperparameters = tuner.get_best_hyperparameters(1)[0]
print("Best Hyperparameters:")
print(f"Conv Layer 1 Filters: {best_hyperparameters.get('conv_1_filters')}")
print(f"Conv Layer 1 Kernel Size: {best_hyperparameters.get('conv_1_kernel')}")
print(f"Conv Layer 2 Filters: {best_hyperparameters.get('conv_2_filters')}")
print(f"Conv Layer 2 Kernel Size: {best_hyperparameters.get('conv_2_kernel')}")
print(f"Dense Units: {best_hyperparameters.get('dense_units')}")
print(f"Dropout Rate: {best_hyperparameters.get('dropout')}")
print(f"Learning Rate: {best_hyperparameters.get('learning_rate')}")


# Evaluate the best model on the test set
test_loss, test_accuracy = best_model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")

In [None]:


# Make predictions on the test set with the best model found by the tuner
y_pred = best_model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)


# Convert one-hot encoded y_test back to labels
y_test_classes = np.argmax(y_test, axis=1)

# Accuracy
accuracy = accuracy_score(y_test_classes, y_pred_classes)
print(f"Test Accuracy: {accuracy:.2f}")

# Classification Report
print("Classification Report:")
print(classification_report(y_test_classes, y_pred_classes, target_names=le.classes_))

cm = confusion_matrix(y_test_classes, y_pred_classes)
plt.figure(figsize=(10,7))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=le.classes_, yticklabels=le.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

# Rows (y-axis) are the true classes, columns (x-axis) the predicted classes.
# The diagonal counts the correct predictions; every off-diagonal cell is a misclassification.
# If all off-diagonal cells are 0, the model made no mistakes on this test set.
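The per-class precision and recall in the classification report can be read off the confusion matrix directly. With true classes as rows and predicted classes as columns (the layout of `sklearn.metrics.confusion_matrix`), a column sum counts everything predicted as a class and a row sum counts everything that truly belongs to it. A NumPy sketch with a made-up 3-class matrix:

```python
import numpy as np

def per_class_metrics(cm):
    """Precision and recall per class from a (true x predicted) confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class, but truly another class
    fn = cm.sum(axis=1) - tp  # truly this class, but predicted as another class
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    return precision, recall

# Made-up example, not results from this notebook
cm = [[5, 1, 0],
      [0, 4, 0],
      [0, 2, 3]]
precision, recall = per_class_metrics(cm)
# precision per class: 5/5, 4/7, 3/3; recall per class: 5/6, 4/4, 3/5
```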



## 6. Cross Validation

To check that the model generalizes to unseen data and is not overfitted, we cross-validate it. Cross-validation also gives a more reliable estimate of the model's performance than a single train/test split. We use the hyperparameters found in step 5.

In [None]:
# Ensure y is in integer format (not one-hot encoded)
# If y is one-hot encoded, convert it back to integer labels
if y.ndim > 1:
    y_labels = np.argmax(y, axis=1)
else:
    y_labels = y
    
# Build a model with fixed hyperparameters found by the tuner in step 5
def build_model_with_parameters():

    # First tuner run - best hyperparameters:
    # Conv Layer 1 Filters: 128
    # Conv Layer 1 Kernel Size: 3
    # Conv Layer 2 Filters: 96
    # Conv Layer 2 Kernel Size: 5
    # Dense Units: 192
    # Dropout Rate: 0.30000000000000004
    # Learning Rate: 1e-05

    # Second tuner run (Webb) - best hyperparameters:
    # Conv Layer 1 Filters: 32
    # Conv Layer 1 Kernel Size: 5
    # Conv Layer 2 Filters: 128
    # Conv Layer 2 Kernel Size: 5
    # Dense Units: 256
    # Dropout Rate: 0.2
    # Learning Rate: 0.0001
    # 38/38 ━━━━━━━━━━━━━━━━━━━━ 2s 38ms/step - accuracy: 0.9333 - loss: 0.3256
    # Test Accuracy: 0.9275

    # Hyperparameters from the second run (used below)
    conv_1_filters = 32
    conv_1_kernel = 5
    conv_2_filters = 128
    conv_2_kernel = 5
    dense_units = 256
    dropout_rate = 0.2
    learning_rate = 0.0001
    
    # Hyperparameters from the first run (alternative)
    # conv_1_filters = 128
    # conv_1_kernel = 3
    # conv_2_filters = 96
    # conv_2_kernel = 5
    # dense_units = 192
    # dropout_rate = 0.3
    # learning_rate = 0.00001
    

    # Build the model
    model = Sequential()
    model.add(Conv2D(filters=conv_1_filters,
                     kernel_size=(conv_1_kernel, conv_1_kernel),
                     activation='relu',
                     input_shape=(desired_size[0], desired_size[1], 1)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=conv_2_filters,
                     kernel_size=(conv_2_kernel, conv_2_kernel),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(units=dense_units, activation='relu'))
    model.add(Dropout(rate=dropout_rate))
    model.add(Dense(len(classes), activation='softmax'))

    # Compile the model
    optimizer = Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


## K-Fold Cross-Validation

K-fold cross-validation is a technique used in machine learning to evaluate the performance of a model. Here's how it works:

1. **Split the Data**: The dataset is divided into `k` equally sized folds (subsets).
2. **Training and Validation**: For each fold:
   - Use `k-1` folds for training the model.
   - Use the remaining 1 fold for validating the model.
3. **Repeat**: This process is repeated `k` times, with each fold used exactly once as the validation data.
4. **Average Performance**: The performance metric (e.g., accuracy, precision) is averaged over the `k` iterations to provide a more robust estimate of the model's performance.

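This method reduces the variance of the performance estimate and ensures that every data point is used for validation exactly once. We use the stratified variant, `StratifiedKFold`, which additionally keeps the class proportions the same in every fold.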
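The splitting step can be sketched in plain NumPy. This is a simplified stand-in for `StratifiedKFold(shuffle=True)`, not its actual implementation: the samples of each class are shuffled and dealt round-robin over the `k` folds, so every fold gets (almost) the same class proportions and every sample is validated exactly once.

```python
import numpy as np

def stratified_kfold_indices(y, k, seed=42):
    """Yield (train_idx, val_idx) for k folds with per-class balanced validation sets."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        rng.shuffle(cls_idx)
        # Deal this class's samples over the folds: 0, 1, ..., k-1, 0, 1, ...
        fold_of[cls_idx] = np.arange(len(cls_idx)) % k
    for fold in range(k):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

# Usage: 10 balanced classes with 600 samples each, as in this notebook
y_demo = np.repeat(np.arange(10), 600)
for fold, (train_idx, val_idx) in enumerate(stratified_kfold_indices(y_demo, k=5), start=1):
    # Each fold: 4800 training and 1200 validation samples, 120 per class
    print(fold, len(train_idx), len(val_idx), np.bincount(y_demo[val_idx]).tolist())
```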

In [None]:
# Define the number of folds
n_splits = 5

# Initialize StratifiedKFold
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

fold_no = 1
accuracy_per_fold = []
loss_per_fold = []

for train_index, val_index in skf.split(X, y_labels):
    print(f'Fold {fold_no} -------------------------------------')

    # Split data
    X_train_cv, X_val_cv = X[train_index], X[val_index]
    y_train_cv, y_val_cv = y[train_index], y[val_index]

    # Ensure labels are one-hot encoded
    y_train_categorical_cv = to_categorical(y_train_cv, num_classes=len(classes))
    y_val_categorical_cv = to_categorical(y_val_cv, num_classes=len(classes))

    # Build a fresh model for each fold
    model = build_model_with_parameters()

    # Train the model
    history_cv = model.fit(X_train_cv, y_train_categorical_cv,
                        epochs=20,
                        batch_size=32,
                        validation_data=(X_val_cv, y_val_categorical_cv),
                        verbose=1)

    # Evaluate the model
    scores = model.evaluate(X_val_cv, y_val_categorical_cv, verbose=0)
    print(f'Score for fold {fold_no}: loss = {scores[0]:.4f}; accuracy = {scores[1]:.4f}')
    accuracy_per_fold.append(scores[1])
    loss_per_fold.append(scores[0])

    fold_no += 1


print('-------------------------------------')
print('Score per fold')
for i in range(len(accuracy_per_fold)):
    print(f'> Fold {i+1} - Loss: {loss_per_fold[i]:.4f} - Accuracy: {accuracy_per_fold[i]:.4f}')
print('-------------------------------------')
print('Average scores for all folds:')
print(f'> Accuracy: {np.mean(accuracy_per_fold):.4f} (+- {np.std(accuracy_per_fold):.4f})')
print(f'> Loss: {np.mean(loss_per_fold):.4f}')
print('-------------------------------------')


# Output from a previous run:
# Score per fold
# > Fold 1 - Loss: 0.0039 - Accuracy: 1.0000
# > Fold 2 - Loss: 0.0066 - Accuracy: 1.0000
# > Fold 3 - Loss: 0.0054 - Accuracy: 1.0000
# > Fold 4 - Loss: 0.0044 - Accuracy: 1.0000
# > Fold 5 - Loss: 0.0051 - Accuracy: 1.0000
# -------------------------------------
# Average scores for all folds:
# > Accuracy: 1.0000 (+- 0.0000)
# > Loss: 0.0051


In [None]:
# Build a model with the saved hyperparameters
# Only run this if you do not already have a model.keras file locally
model = build_model_with_parameters()
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy}")
model.save('model.keras')

## Image Adjustments
This code block defines functions that adjust incoming frames from the live feed so that they match the training data more closely.
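The `adjust_levels` function below implements a "levels" adjustment of the kind found in image editors. Writing $x$ for an input value in $[0, 1]$, $i_\text{low}, i_\text{high}$ for `low_input`, `high_input`, $o_\text{low}, o_\text{high}$ for `low_output`, `high_output` and $\gamma$ for `gamma`, it computes:

$$
\operatorname{out}(x) =
\begin{cases}
o_\text{low} & x < i_\text{low} \\[4pt]
o_\text{low} + \left(\dfrac{x - i_\text{low}}{i_\text{high} - i_\text{low}}\right)^{1/\gamma} \left(o_\text{high} - o_\text{low}\right) & i_\text{low} \le x \le i_\text{high} \\[4pt]
o_\text{high} & x > i_\text{high}
\end{cases}
$$

Because the exponent is $1/\gamma$, a $\gamma$ below 1 darkens the mid-tones. For example, the value-channel setting in `adjust_image` ($i_\text{low} = 0.21$, $i_\text{high} = 0.58$, $\gamma = 0.23$, $o_\text{low} = 0$, $o_\text{high} = 0.85$) maps a mid-gray $x = 0.4$ to $0.85 \cdot (0.19/0.37)^{1/0.23} \approx 0.047$.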

In [None]:
# Define the adjust_levels function
def adjust_levels(channel, low_input, high_input, gamma, low_output, high_output):
    # Ensure values are in [0,1]
    channel = np.clip(channel, 0, 1)

    # Initialize the output channel
    out_channel = np.zeros_like(channel)

    # Avoid division by zero
    denom = high_input - low_input
    denom = denom if denom != 0 else 1e-6

    # Apply levels adjustment
    idx = (channel >= low_input) & (channel <= high_input)
    x = (channel[idx] - low_input) / denom
    x = x ** (1 / gamma)
    out_channel[idx] = low_output + x * (high_output - low_output)

    # For pixels below low_input
    idx = channel < low_input
    out_channel[idx] = low_output

    # For pixels above high_input
    idx = channel > high_input
    out_channel[idx] = high_output

    return out_channel


# Adjusting the image to match the training data. 
def adjust_image(image, desired_size):
    # Split the image into B, G, R channels
    b, g, r = cv2.split(image)

    # Adjust levels for each channel
    # Red channel
    r_adj = adjust_levels(r,
                          low_input=0.078,
                          high_input=0.88,
                          gamma=1.0,
                          low_output=0.0,
                          high_output=1.0)

    # Green channel
    g_adj = adjust_levels(g,
                          low_input=0.090,
                          high_input=0.82,
                          gamma=1.0,
                          low_output=0.0,
                          high_output=1.0)

    # Blue channel
    b_adj = adjust_levels(b,
                          low_input=0.13,
                          high_input=0.92,
                          gamma=1.0,
                          low_output=0.0,
                          high_output=1.0)

    # Merge the adjusted channels back into an image
    image_adj = cv2.merge([b_adj, g_adj, r_adj])

    # Convert image_adj back to uint8
    image_adj_uint8 = np.clip(image_adj * 255.0, 0, 255).astype(np.uint8)

    # Convert the adjusted image to HSV
    image_hsv = cv2.cvtColor(image_adj_uint8, cv2.COLOR_BGR2HSV)

    # Split the HSV channels
    h, s, v = cv2.split(image_hsv)

    # Convert V to float32 and normalize to [0,1]
    v = v.astype(np.float32) / 255.0

    # Adjust levels for the 'value' channel
    v_adj = adjust_levels(v,
                          low_input=0.21,
                          high_input=0.58,
                          gamma=0.23,
                          low_output=0.0,
                          high_output=0.85)

    # Multiply by 255 and convert to uint8
    v_adj_uint8 = np.clip(v_adj * 255.0, 0, 255).astype(np.uint8)

    # Merge the HSV channels back
    hsv_adj = cv2.merge([h, s, v_adj_uint8])

    # Convert back to BGR color space
    image_final = cv2.cvtColor(hsv_adj, cv2.COLOR_HSV2BGR)


    # Slightly darken the image: |0.9 * pixel + 0.3|, saturated to the uint8 range
    frame_darkened = cv2.convertScaleAbs(image_final, alpha=0.9, beta=0.3)

    # Convert to grayscale for prediction
    frame_gray = cv2.cvtColor(frame_darkened, cv2.COLOR_BGR2GRAY)
    # Resize the frame to match the input size
    frame_resized = cv2.resize(frame_gray, desired_size)

    # Return the resized grayscale image used for prediction
    return frame_resized

## Live Demonstration
* Load the trained model
* Run live video capture
* Run prediction on frames taken from the video capture
* Show the predicted gesture and the model's confidence on each frame
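The prediction step in the loop boils down to turning one 128x128 grayscale frame into a batch the model accepts, i.e. shape `(1, 128, 128, 1)` with `float32` values. A compact NumPy sketch of that conversion; the helper name `to_model_input` is hypothetical, and it keeps the 0-255 pixel range because the training images were not rescaled either.

```python
import numpy as np

def to_model_input(gray_frame):
    """Turn an (H, W) uint8 grayscale frame into a (1, H, W, 1) float32 batch."""
    frame = np.asarray(gray_frame, dtype=np.float32)  # keep the 0-255 range used in training
    return frame[np.newaxis, :, :, np.newaxis]         # add batch and channel dimensions

batch = to_model_input(np.zeros((128, 128), dtype=np.uint8))
print(batch.shape, batch.dtype)  # (1, 128, 128, 1) float32
```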

In [60]:
# Load the trained model
model = load_model('model.keras')  # the file saved by the training cell in section 6

# Rebuild the LabelEncoder with the same labels used during training
labels = ['01_palm', '02_l', '03_fist', '04_fist_moved', '05_thumb',
          '06_index', '07_ok', '08_palm_moved', '09_c', '10_down']

le = LabelEncoder()
le.fit(labels)

# Get the input shape of the model
input_shape = model.input_shape  # Should be (None, 128, 128, 1)
_, img_height, img_width, channels = input_shape

print(f"Model's input shape: {input_shape}")

# Specify desired image size
desired_size = (img_width, img_height)

# Start video capture from the default webcam
cap = cv2.VideoCapture(0)

# Check if the webcam is opened correctly
if not cap.isOpened():
    raise RuntimeError("Could not open webcam.")

# Ensure the directory for saving frames exists
os.makedirs("data/frames", exist_ok=True)
frame_count = 0

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame.")
        break

    # Convert frame to float32 and normalize to [0,1]
    image = frame.astype(np.float32) / 255.0
    
    # Copy the unmodified frame for display
    frame_display = image.copy()
    
    # Adjust the image for prediction
    frame_resized = adjust_image(image, desired_size)

    # Prediction
    frame_uint8 = frame_resized.astype(np.uint8)
    frame_expanded = np.expand_dims(frame_uint8, axis=0)  # Add batch dimension
    frame_expanded = np.expand_dims(frame_expanded, axis=-1)  # Add channel dimension

    # Ensure data type is float32
    frame_expanded = frame_expanded.astype(np.float32)

    # Check the shape of the expanded frame
    print(f"frame_expanded shape: {frame_expanded.shape}")  # Should be (1, 128, 128, 1)

    # Save every n frames to debug or analyze
    n = 10
    if frame_count % n == 0:
        frame_filename = os.path.join("data/frames", f'frame_{frame_count}.jpg')
        cv2.imwrite(frame_filename, frame_resized)
        print(f"Saved frame to '{frame_filename}'")
    frame_count += 1

    # Make prediction
    predictions = model.predict(frame_expanded)
    predicted_class_index = np.argmax(predictions, axis=1)[0]
    confidence = np.max(predictions)

    # Decode the predicted class index to get the gesture label
    predicted_label = le.inverse_transform([predicted_class_index])[0]

    # Overlay the prediction on the frame_display
    cv2.putText(frame_display, f'Gesture: {predicted_label} ({confidence*100:.2f}%)',
                (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1,
                (0, 255, 0), 2, cv2.LINE_AA)

    # Display the resulting frame
    cv2.imshow('Hand Gesture Recognition', frame_display)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture and close windows
cap.release()
cv2.destroyAllWindows()


Model's input shape: (None, 128, 128, 1)
frame_expanded shape: (1, 128, 128, 1)
Saved frame to 'data/frames\frame_0.jpg'
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 70ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 31ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
frame_expanded shape: (1, 128, 128, 1)
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
frame_expanded shape: (1, 128,

## Sanity Check - Part 2
* The live demonstration shows 100% confidence, but we know that the model is not guessing correctly.
* We therefore want to sanity check the captured images against the test data.
* We can clearly see differences between the images, which explain why the predictions are incorrect.

![Test Data](./data/leapGestRecog/leapGestRecog/00/01_palm/frame_00_01_0001.png)
![Augmented Test Data](./path/to/image.png)
![Captured Image](./path/to/image.png)
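Besides looking at the images side by side, the difference can be quantified with simple pixel statistics. The sketch below compares mean brightness, contrast (standard deviation) and the overlap of the normalized grayscale histograms of two uint8 images; the helper name `compare_gray_images` and the choice of statistics are our own, not part of the notebook. In the notebook, the two inputs could be a training image loaded with `cv2.imread(..., cv2.IMREAD_GRAYSCALE)` and one of the frames saved to `data/frames`.

```python
import numpy as np

def compare_gray_images(a, b, bins=32):
    """Return simple statistics describing how two uint8 grayscale images differ."""
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    hist_a, _ = np.histogram(a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(b, bins=bins, range=(0, 256))
    hist_a = hist_a / hist_a.sum()  # normalize so images of different sizes compare fairly
    hist_b = hist_b / hist_b.sum()
    return {
        "mean_diff": float(a.mean() - b.mean()),
        "std_diff": float(a.std() - b.std()),
        # 1.0 = identical brightness distributions, 0.0 = no overlap at all
        "hist_overlap": float(np.minimum(hist_a, hist_b).sum()),
    }

# Synthetic example: a uniformly dark and a uniformly bright image
dark = np.full((128, 128), 20, dtype=np.uint8)
bright = np.full((128, 128), 200, dtype=np.uint8)
print(compare_gray_images(dark, bright))
```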

## Sources
* [Handwritten Digit Recognition using CNN with TensorFlow](https://learner-cares.medium.com/handwritten-digit-recognition-using-convolutional-neural-network-cnn-with-tensorflow-2f444e6c4c31)
* ChatGPT
* Co-Pilot