# Data Generation for Hand Gesture Recognition

This notebook is the first step in our gesture recognition pipeline. Its purpose is to collect the raw data needed to train our neural network. It provides a streamlined process for capturing a large number of images for various hand gestures directly from a webcam.

The key outcomes of this notebook are:
1.  A structured directory of images, with each sub-directory corresponding to a specific gesture.
2.  Two CSV files (`gestures_dataset_train.csv` and `gestures_dataset_val.csv`) containing processed hand landmark data, ready for model training.
3.  A default key bindings configuration file (`key_bindings_default.json`) that maps gestures to keyboard actions.

### Table of Contents
1. [Setup and Dependencies](#setup)
2. [Configuration for Image Capture](#config)
3. [Live Image Capture](#capture)
4. [Processing Images into a Dataset](#processing)
5. [Results and Next Steps](#results)

<a id="setup"></a>
## 1. Setup and Dependencies

This section ensures that the environment is correctly configured to run the notebook. The following code cell installs all the necessary Python libraries listed in the `requirements.txt` file.

**How to Use:**
1.  Make sure you have Python and `pip` installed.
2.  Run the next cell to install all dependencies.

In [None]:
%pip install -r requirements.txt

<a id="config"></a>
## 2. Configuration for Image Capture

Before we begin capturing images, we need to define a few parameters. The next cell allows you to customize the data collection process.

### Parameters:
-   `num_screenshots`: The total number of images to capture for the specified gesture. More images, taken with varied hand positions and orientations, generally make the model more robust.
-   `capture_rate`: The minimum delay (in seconds) between consecutive screenshots. The delay gives you time to shift your hand slightly between shots, so consecutive images are not near-duplicates.
-   `gesture_name`: The name of the gesture you are capturing (e.g., "fist", "palm_up", "nice"). This name is used as the class label for the gesture and as the name of the folder where its images are saved, so reuse exactly the same name whenever you add more images for that gesture.
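As a rough guide to how long you will hold a pose: each image is saved at least `capture_rate` seconds after the previous one, so a capture sequence lasts at least `num_screenshots × capture_rate` seconds (250 × 0.1 s = 25 s with the defaults below). In practice it runs a little longer, because a frame is only saved when the loop reads one from the camera. The helper name below is illustrative only.

```python
def estimate_capture_seconds(num_screenshots: int, capture_rate: float) -> float:
    """Lower bound on how long the automated capture runs, in seconds.

    Each image is saved at least `capture_rate` seconds after the previous one,
    so the sequence cannot finish faster than num_screenshots * capture_rate.
    """
    if num_screenshots < 0 or capture_rate < 0:
        raise ValueError("num_screenshots and capture_rate must be non-negative")
    return num_screenshots * capture_rate


# With the defaults used in this notebook (250 images, 0.1 s apart):
print(f"At least {estimate_capture_seconds(250, 0.1):.1f} seconds of posing")
```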

**Instructions:**
1.  Decide on a gesture you want to add (e.g., "thumb_up").
2.  Set the `gesture_name` variable to your chosen name.
3.  Run the cell to apply the configuration.
4.  Repeat this step, followed by the capture in Section 3, for every gesture you want to add to the dataset.

In [None]:
import os

# --- Configuration ---
num_screenshots = 250  # Number of images to capture
capture_rate = 0.1  # Seconds between captures
gesture_name = "thumb_up"  # Name of the gesture (e.g., "fist", "point_right")

# --- File and Folder Setup ---
base_folder = "data/gestures"
folder_location = os.path.join(base_folder, gesture_name)
picture_name = gesture_name

# Create the directory if it doesn't exist
os.makedirs(folder_location, exist_ok=True)

print("Configuration:")
print(f" - Number of screenshots: {num_screenshots}")
print(f" - Capture rate: {capture_rate} seconds")
print(f" - Gesture name: '{gesture_name}'")
print(f" - Saving to: '{folder_location}'")

<a id="capture"></a>
## 3. Live Image Capture

This section contains the code to capture images from your webcam.

### How It Works:
-   The script opens a window displaying your webcam feed.
-   After the initial trigger, the capture runs hands-free.
-   Position your hand to form the gesture you configured in the previous step.
-   Press the **'s'** key to start the automated capture sequence. The window shows a progress counter (`Capturing... n/total`) as the images are taken.
-   Slightly move your hand and change its orientation between captures to create a varied dataset.
-   Press the **'q'** key at any time to stop the capture and close the window.

The captured images will be saved in the `data/gestures/<gesture_name>/` directory.
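Before moving on to processing, it can help to check how many images each gesture folder holds, since a gesture with far fewer images than the others will also be under-represented in the final dataset. A minimal sketch, reusing the same image-extension filter as the processing step in Section 4; the function name is hypothetical and the default path assumes the `data/gestures` folder used above.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def count_images_per_gesture(base_folder: str = "data/gestures") -> dict[str, int]:
    """Return {gesture_name: number_of_images} for each sub-folder of base_folder."""
    counts = {}
    for name in sorted(os.listdir(base_folder)):
        folder = os.path.join(base_folder, name)
        if not os.path.isdir(folder):
            continue  # skip stray files at the top level
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


if os.path.isdir("data/gestures"):
    for gesture, n in count_images_per_gesture().items():
        print(f"{gesture:>15}: {n} images")
```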

In [None]:
import cv2
import time
import os

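cap = cv2.VideoCapture(0)  # 0 = default webcam
if not cap.isOpened():
    print("Could not open the webcam (device 0).")

# Start numbering after the files already in the folder so that a re-run
# adds new images (this assumes no images have been deleted from the folder).
count = len(os.listdir(folder_location))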

screenshots_taken = 0
capturing = False
last_capture_time = 0
window_name = 'Data Generation'

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame")
        break

    frame = cv2.flip(frame, 1)
    display_frame = frame.copy()

    if not capturing:
        cv2.putText(display_frame, "Press 's' to start capturing", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    else:
        cv2.putText(display_frame, f"Capturing... {screenshots_taken}/{num_screenshots}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    
    cv2.putText(display_frame, "Press 'q' to quit", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)

    cv2.imshow(window_name, display_frame)

    key = cv2.waitKey(1) & 0xFF

    if key == ord('q'):
        break
    elif key == ord('s'):
        if not capturing:
            print("Starting capture...")
            capturing = True
            last_capture_time = time.time()

    if capturing:
        current_time = time.time()
        if current_time - last_capture_time >= capture_rate:
            if screenshots_taken < num_screenshots:
                img_name = os.path.join(folder_location, f"{picture_name}_{count:03d}.jpg")
                cv2.imwrite(img_name, frame)
                print(f"Saved {img_name}")
                count += 1
                screenshots_taken += 1
                last_capture_time = current_time
            else:
                print("Capture complete.")
                break

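# Release the camera, then close the window. The extra waitKey call gives
# OpenCV's GUI event loop a chance to actually close the window on some platforms.
cap.release()
cv2.destroyAllWindows()
cv2.waitKey(1)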
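The capture cell starts numbering at `len(os.listdir(folder_location))`. This works as long as images are only ever added, but if an image in the middle of the sequence is deleted (for example, a blurry frame), the next run reuses an index that is still taken and overwrites an existing file. A more defensive sketch, using a hypothetical helper, continues after the highest existing index instead and ignores files that do not follow the `<picture_name>_<number>.<ext>` naming used above:

```python
import os
import re


def next_image_index(folder: str, picture_name: str) -> int:
    """Return one past the highest index among files named '<picture_name>_<digits>.<ext>'.

    Files that do not follow this pattern are ignored, so stray files in the
    folder cannot shift the numbering.
    """
    pattern = re.compile(rf"^{re.escape(picture_name)}_(\d+)\.[A-Za-z0-9]+$")
    indices = [
        int(match.group(1))
        for match in (pattern.match(f) for f in os.listdir(folder))
        if match
    ]
    return max(indices, default=-1) + 1


# Possible replacement for the initial counter in the capture cell:
# count = next_image_index(folder_location, picture_name)
```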

<a id="processing"></a>
## 4. Processing Images into a Dataset

After capturing the raw images, this section processes them to create a structured, machine-learning-ready dataset.

### The Processing Pipeline:

1.  **Load Configuration**: Loads paths and settings from the `app_config.json` file.
2.  **Initialize Tools**: Sets up the `GestureDetector` (powered by MediaPipe) to find hand landmarks and the `DataPreprocessor` for normalizing them.
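3.  **Map Gestures to IDs**: It scans the `data/gestures/` directory and assigns a unique integer ID to each gesture folder in alphabetical order of folder name (e.g., 'fist': 0, 'palm_up': 1). Because the IDs follow the sorted folder names, adding a new gesture can change the IDs of existing ones.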
4.  **Extract Landmarks**: The code iterates through every image in each gesture folder. For each image:
    *   It uses `GestureDetector` to locate the 21 hand landmarks.
    *   If a hand is found, the `DataPreprocessor` normalizes the landmark coordinates by translating them relative to the wrist and then scaling them. This makes the model robust to variations in hand position and size, which would otherwise change the raw coordinates even for an identical gesture.
5.  **Create DataFrame**: The processed data (gesture ID and the list of normalized landmarks) is compiled into a pandas DataFrame.
6.  **Train-Validation Split**: The dataset is split into a training set (90%) and a validation set (10%). Stratified sampling (`stratify=df['GESTURE_ID']`) keeps the proportion of each gesture approximately the same in both sets, so the validation metrics cover every gesture in proportion to its share of the training data. Stratification requires at least two usable samples per gesture.
7.  **Save Datasets**: The final training and validation DataFrames are saved as CSV files, which will be the direct input for our model training notebook.
8.  **Generate Key Bindings**: A default `key_bindings_default.json` file is created. This file acts as a template for mapping the newly created gesture IDs to keyboard actions in the main application.
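To make the normalization in step 4 concrete, here is a minimal sketch of wrist-relative translation followed by scaling. It is **not** the project's `DataPreprocessor`, whose exact scaling rule is not shown in this notebook; it assumes the landmarks arrive as 21 `(x, y, z)` tuples with the wrist at index 0 (MediaPipe's ordering) and scales by the largest absolute coordinate, which is one common choice. Note that these two steps remove position and size but not rotation.

```python
from typing import Sequence


def normalize_landmarks(landmarks: Sequence[Sequence[float]]) -> list[float]:
    """Translate landmarks so the wrist is the origin, scale them into [-1, 1], and flatten.

    Assumes 21 (x, y, z) points with the wrist at index 0, as in MediaPipe Hands.
    """
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")

    wrist = landmarks[0]
    # Translation: coordinates become relative to the wrist, removing hand position.
    relative = [[c - w for c, w in zip(point, wrist)] for point in landmarks]

    # Scaling: divide by the largest absolute coordinate, removing hand size.
    flat = [c for point in relative for c in point]
    scale = max(abs(c) for c in flat)
    if scale == 0:
        raise ValueError("degenerate hand: all landmarks coincide with the wrist")
    return [c / scale for c in flat]
```

A hand that is twice as large and shifted elsewhere in the frame yields (up to floating-point error) the same 63-value vector, which is exactly the invariance the classifier needs.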

In [None]:
import json
import os

import cv2
import pandas as pd
from sklearn.model_selection import train_test_split

from src.gesture_controller.app_config import AppConfig
from src.gesture_controller.data_preprocessor import DataPreprocessor
from src.gesture_controller.gesture_detector import GestureDetector


# --- Configuration ---
config = AppConfig()
data_generation_config = config.get_app_config("DATA_GENERATION_CONFIG")
gestures_folder = data_generation_config.get("GESTURE_FOLDER")
dataset_train_path = data_generation_config.get("DATASET_TRAINING_PATH")
dataset_val_path = data_generation_config.get("DATASET_VAL_PATH")
key_bindings_file = data_generation_config.get("KEY_BINDING_OUTPUT_FILE")

# --- Initialization ---
# We set max_hands to 1 because each image shows a single gesture made with one hand
detector = GestureDetector(max_hands=1, min_detection_confidence=0.5)
data_preprocessor = DataPreprocessor()
dataset = []

# --- Data Processing ---

# Map each gesture subdirectory to an integer ID. Folders are sorted
# alphabetically, so adding a new gesture can shift the IDs of existing ones.
gesture_folders = sorted([f for f in os.listdir(gestures_folder) if os.path.isdir(os.path.join(gestures_folder, f))])
gesture_map = {name: i for i, name in enumerate(gesture_folders)}

print("Processing gestures...")
print(f"Found gestures: {gesture_map}")

# Iterate over each gesture folder
for gesture_name, gesture_id in gesture_map.items():
    folder_path = os.path.join(gestures_folder, gesture_name)
    print(f"\nProcessing folder: {folder_path}")

    # Get a sorted list of images to process them in a consistent order
    image_files = sorted([f for f in os.listdir(folder_path) if f.lower().endswith(('.png', '.jpg', '.jpeg'))])

    # Iterate over each image in the folder
    for image_name in image_files:
        image_path = os.path.join(folder_path, image_name)

        image = cv2.imread(image_path)
        if image is None:
            print(f"Warning: Could not read image {image_path}")
            continue

        # Process the frame to find hands
        detector.process_frame(image)

        # Get hand vectors (we expect only one hand)
        left_hand, right_hand = detector.get_hand_vectors()
        hand_vector = left_hand or right_hand

        if hand_vector:
            # Normalize the landmarks
            normalized_landmarks = data_preprocessor.process(hand_vector)

            if normalized_landmarks is not None:
                # Append the gesture ID and the flattened list of landmarks
                dataset.append([gesture_id, normalized_landmarks.tolist()])
            else:
                print(f"Warning: Could not normalize landmarks in {image_path}")
        else:
            print(f"Warning: No hand detected in {image_path}")

# --- Generate Default Key Bindings Configuration ---
key_bindings_config = {}
for gesture_name, gesture_id in gesture_map.items():
    key_bindings_config[str(gesture_id)] = {
        "gesture": gesture_name,
        "keys": [],
        "behavior": ""
    }

# Save the default key bindings configuration.
# Note: this overwrites any existing file, including manual edits.
with open(key_bindings_file, 'w') as f:
    json.dump(key_bindings_config, f, indent=2)

# --- Create and Save DataFrames ---
# Create a DataFrame from the collected data
df = pd.DataFrame(dataset, columns=["GESTURE_ID", "LANDMARKS"])
if df.empty:
    raise RuntimeError(f"No hands were detected in any image under '{gestures_folder}'.")

# Split the dataset into training and validation sets (90% train, 10% val).
# We stratify by GESTURE_ID so that the distribution of gestures is similar
# in both sets; this requires at least two samples per gesture.
train_df, val_df = train_test_split(
    df,
    test_size=0.1,
    random_state=42,
    stratify=df['GESTURE_ID']
)

# Save the datasets to CSV files
train_df.to_csv(dataset_train_path, index=False)
val_df.to_csv(dataset_val_path, index=False)

print("\nDatasets created successfully.")
print(f"Training set: {len(train_df)} samples, saved to '{dataset_train_path}'")
print(f"Validation set: {len(val_df)} samples, saved to '{dataset_val_path}'")


<a id="results"></a>
## 5. Results and Next Steps

Running this notebook end to end produces the following artifacts (the paths below are the defaults; Section 4 reads the actual output paths from `app_config.json`):

-   **Image Folders**: A collection of gesture images located in `data/gestures/`.
-   **Training Data**: `data/gestures_dataset_train.csv`
-   **Validation Data**: `data/gestures_dataset_val.csv`
-   **Key Binding Configuration**: `config/key_bindings_default.json`
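Note that `DataFrame.to_csv` writes each `LANDMARKS` list as its text representation (e.g. `"[0.0, 0.0, 0.0, ...]"`), so whatever reads the training and validation CSVs has to parse that column back into numbers. A minimal sketch using only the standard library; the training notebook may do this differently, and the function name is hypothetical.

```python
import ast
import csv


def load_landmark_csv(path: str) -> tuple[list[int], list[list[float]]]:
    """Read a GESTURE_ID/LANDMARKS CSV written by DataFrame.to_csv(index=False).

    The LANDMARKS cells hold the string form of a Python list, which
    ast.literal_eval parses safely (it only accepts literals, not arbitrary code).
    """
    labels, features = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            labels.append(int(row["GESTURE_ID"]))
            features.append([float(v) for v in ast.literal_eval(row["LANDMARKS"])])
    return labels, features


# Example usage with the default path:
# labels, features = load_landmark_csv("data/gestures_dataset_train.csv")
```

With pandas, the equivalent is `pd.read_csv(path)["LANDMARKS"].apply(ast.literal_eval)`.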

### Key Bindings Configuration

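The generated `key_bindings_default.json` file only lists the gestures: every `keys` list and `behavior` starts out empty. You need to **manually edit this file** to map the detected gestures to specific keyboard actions. Re-running Section 4 regenerates the file and overwrites any edits, so keep a copy of your bindings.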

For each gesture, you can specify:
-   `keys`: A list of keyboard keys to be triggered. For example: `["ctrl", "c"]` for copy, or `["up"]` for moving up.
-   `behavior`: The action to perform with the keys. This can be one of two values:
    -   `"press"`: Simulates a quick press and release of the keys (e.g., for firing a weapon).
    -   `"hold"`: Simulates pressing and holding the keys down (e.g., for continuous movement). The keys will be released when the gesture is no longer detected.
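The bindings can also be filled in from code rather than by hand. A minimal sketch, assuming the file layout produced in Section 4 (string gesture IDs mapping to `gesture`, `keys`, and `behavior`); the helper name is hypothetical. Because gesture IDs come from the alphabetically sorted folder names, it looks entries up by gesture name rather than by ID.

```python
VALID_BEHAVIORS = {"press", "hold"}


def set_binding(config: dict, gesture_name: str, keys: list[str], behavior: str) -> None:
    """Assign keys and a behavior to the entry whose 'gesture' field matches gesture_name."""
    if behavior not in VALID_BEHAVIORS:
        raise ValueError(f"behavior must be one of {sorted(VALID_BEHAVIORS)}, got {behavior!r}")
    for entry in config.values():
        if entry["gesture"] == gesture_name:
            entry["keys"] = list(keys)
            entry["behavior"] = behavior
            return
    raise KeyError(f"no gesture named {gesture_name!r} in the key bindings")


# Example usage with the default path:
# import json
# with open("config/key_bindings_default.json") as f:
#     bindings = json.load(f)
# set_binding(bindings, "fist", ["ctrl", "c"], "press")
# with open("config/key_bindings_default.json", "w") as f:
#     json.dump(bindings, f, indent=2)
```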

### Next Steps
With the data generated and processed, you are now ready to proceed to the `neural_network_training.ipynb` notebook to train the gesture classification model.