# Data Collection for Gesture Recognition

This Python script uses the [Mediapipe](https://mediapipe.dev/) library and OpenCV to capture video from a webcam, run real-time landmark detection on each frame, and collect landmark data for specific actions. The goal is to build a dataset for training a machine learning model that recognizes these actions from keypoint information.

## Key Components:

1. **Mediapipe Detection Functions:**
   - Define a function `mediapipe_detection` to process an image using the Mediapipe Holistic model and obtain detection results for facial, pose, and hand landmarks.
   - Define a function `draw_landmarks` to visually annotate an image with landmarks and connections for face, pose, and both left and right hands.

2. **Data Extraction Functions:**
   - Define a function `get_keypoints` to extract the 3D coordinates of left and right hand landmarks, pose landmarks, and face landmarks from a single image. The coordinates are flattened into an array for model input.

3. **Data Folder Creation:**
   - Define a function `make_data_folders` to create folders for training data. Each folder corresponds to a specific action and sequence/example within that action.

4. **Data Collection:**
   - Define a function `collect_data` to capture video frames from the device camera, use the Mediapipe Holistic model for landmark detection, and save keypoints for each frame. The collected data is organized into folders based on the action, sequence, and frame number.

5. **Adding More Actions:**
   - To add more actions, update the `ACTIONS` array with the names of additional actions you want to detect.

6. **Main Execution:**
   - Initialize the device camera and set up the Mediapipe model.
   - Iterate through predefined actions, sequences, and frames to collect and save landmark data.
   - Visualize the process by displaying the annotated video feed with landmarks.

The script automates the process of creating a labeled dataset for training a machine learning model to recognize specific human actions based on pose information. The keypoints extracted from the landmark data serve as input features for the model.
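
The flattening step described for `get_keypoints` can be illustrated with a minimal sketch. The real helper lives in `utils.mp_helper` and is not shown here, so the function names below, the concatenation order, and the choice to zero-fill missing components are all assumptions. The landmark counts (33 pose, 468 face, 21 per hand) are the Holistic model's defaults, and a stand-in results object replaces real Mediapipe output so the snippet runs without a camera.

```python
from types import SimpleNamespace

import numpy as np

# Default landmark counts per Holistic component (pose, face, each hand).
NUM_POSE, NUM_FACE, NUM_HAND = 33, 468, 21


def flatten_landmarks(landmark_list, count):
    """Return a flat (count * 3,) array of x, y, z values, or zeros if nothing was detected."""
    if landmark_list is None:
        return np.zeros(count * 3)
    return np.array([[lm.x, lm.y, lm.z] for lm in landmark_list.landmark]).flatten()


def get_keypoints_sketch(results):
    """Hypothetical stand-in for utils.mp_helper.get_keypoints (order is an assumption)."""
    return np.concatenate([
        flatten_landmarks(results.pose_landmarks, NUM_POSE),
        flatten_landmarks(results.face_landmarks, NUM_FACE),
        flatten_landmarks(results.left_hand_landmarks, NUM_HAND),
        flatten_landmarks(results.right_hand_landmarks, NUM_HAND),
    ])


# Fake results object in which only the left hand was detected.
hand = SimpleNamespace(landmark=[SimpleNamespace(x=0.1, y=0.2, z=0.3)] * NUM_HAND)
fake_results = SimpleNamespace(pose_landmarks=None, face_landmarks=None,
                               left_hand_landmarks=hand, right_hand_landmarks=None)
keypoints = get_keypoints_sketch(fake_results)
print(keypoints.shape)  # (1629,)
```

Zero-filling keeps every frame the same length even when a hand or the face leaves the picture, which is what lets the frames be stacked into fixed-size sequences later.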


In [1]:
import mediapipe as mp 
import cv2 
import seaborn 
import matplotlib.pyplot as plt 
import time 
import os 
import numpy as np
from utils.config import DATA_PATH, NUM_EXAMPLES, SEQUENCE_LENGTH, ACTIONS, mp_holistic, mp_drawing
from utils.mp_helper import mediapipe_detection, draw_landmarks, get_keypoints
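
The contents of `utils/config.py` are not shown in this notebook. As a rough sketch of what the imported constants might look like, and of how adding an action changes the amount of data to record, consider the following. Every value here is a placeholder rather than the project's real setting, and the `mp_holistic` and `mp_drawing` handles are left out so the snippet runs without Mediapipe installed.

```python
import os

import numpy as np

# All values below are placeholders, not the project's real settings.
DATA_PATH = os.path.join("data", "keypoints")  # root folder for saved .npy files
ACTIONS = np.array(["hello", "thanks"])        # one label per gesture to record
NUM_EXAMPLES = 30                              # sequences recorded per action
SEQUENCE_LENGTH = 30                           # frames saved per sequence

# Adding a gesture only requires a new label in ACTIONS.
ACTIONS = np.append(ACTIONS, "goodbye")

# Each extra action adds NUM_EXAMPLES * SEQUENCE_LENGTH frames to the recording session.
total_frames = len(ACTIONS) * NUM_EXAMPLES * SEQUENCE_LENGTH
print(total_frames)  # 2700
```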

## Define Helper Functions

In [None]:
def make_data_folders():
    """
    Create folders for training data, one folder for each action and each sequence/example within that action.

    Parameters:
    - None

    Returns:
    - None
    """
    
    # create one folder per action and per example within that action
    for action in ACTIONS:
        for seq in range(NUM_EXAMPLES):
            # exist_ok=True makes reruns safe without silencing other OS errors
            os.makedirs(os.path.join(DATA_PATH, action, str(seq)), exist_ok=True)
    

## Data Collection

In [None]:
def collect_data():
    
    """
    Capture video from the device camera and use the Mediapipe Holistic model to collect landmark data for
    each action. Each action has a fixed number of examples, and each example has a fixed number of frames.
    The keypoints for each frame are saved and organized into the corresponding folders.

    Parameters:
    - None

    Returns:
    - None
    """
    
    make_data_folders()
    
    vc = cv2.VideoCapture(0)  # open the default device camera
    if not vc.isOpened():
        raise RuntimeError('Could not open the device camera')

    try:
        # set mediapipe model
        with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:

            # go through each action
            for action in ACTIONS:

                # create the desired number of examples
                for seq in range(NUM_EXAMPLES):

                    # collect SEQUENCE_LENGTH frames for each example
                    for frame_num in range(SEQUENCE_LENGTH):

                        # get frame from stream; stop if the camera returns nothing
                        ret, frame = vc.read()
                        if not ret:
                            return

                        # make detection
                        image, results = mediapipe_detection(frame, holistic)

                        # draw the detected landmarks on the frame
                        draw_landmarks(image, results)

                        # show which action and example is being recorded
                        cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, seq), (15, 12),
                                    cv2.FONT_HERSHEY_SIMPLEX, .5, (0, 0, 255), 1, cv2.LINE_AA)
                        if frame_num == 0:
                            # announce the start of a new example
                            cv2.putText(image, 'STARTING COLLECTION', (120, 200),
                                        cv2.FONT_HERSHEY_SIMPLEX, .5, (0, 0, 255), 4, cv2.LINE_AA)

                        # show the feed before any pause, otherwise the message never appears
                        cv2.imshow('Live Feed', image)
                        if frame_num == 0:
                            cv2.waitKey(2000)  # how long to wait before starting to collect frames

                        # save keypoints (np.save appends the .npy extension)
                        keypoints = get_keypoints(results)
                        np.save(os.path.join(DATA_PATH, action, str(seq), str(frame_num)), keypoints)

                        # press 'q' to stop; return ends all three loops, not just the innermost
                        if cv2.waitKey(10) & 0xFF == ord('q'):
                            return
    finally:
        # always free the camera and close the window, even on early exit or error
        vc.release()
        cv2.destroyAllWindows()

In [None]:
collect_data()
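
Once collection finishes, the folder layout (`DATA_PATH/action/sequence/frame.npy`) can be read back into arrays for training. The sketch below is one possible way to do that, not part of the original notebook: `load_sequences` is a hypothetical helper, and the demo writes small synthetic frames into a temporary directory with the same layout so it runs without a camera.

```python
import os
import tempfile

import numpy as np


def load_sequences(data_path, actions, num_examples, sequence_length):
    """Stack saved frames into X (examples, frames, features) and integer labels y."""
    sequences, labels = [], []
    for label, action in enumerate(actions):
        for seq in range(num_examples):
            frames = [
                np.load(os.path.join(data_path, action, str(seq), f"{frame_num}.npy"))
                for frame_num in range(sequence_length)
            ]
            sequences.append(frames)
            labels.append(label)
    return np.array(sequences), np.array(labels)


# Demo on synthetic data written in the same layout as collect_data() uses.
demo_actions, demo_examples, demo_length, demo_features = ["wave", "point"], 2, 3, 5
with tempfile.TemporaryDirectory() as root:
    for action in demo_actions:
        for seq in range(demo_examples):
            os.makedirs(os.path.join(root, action, str(seq)), exist_ok=True)
            for frame_num in range(demo_length):
                # np.save appends ".npy", matching the file names load_sequences expects
                np.save(os.path.join(root, action, str(seq), str(frame_num)),
                        np.zeros(demo_features))
    X, y = load_sequences(root, demo_actions, demo_examples, demo_length)

print(X.shape, y.tolist())  # (4, 3, 5) [0, 0, 1, 1]
```

With the real data, the call would presumably be `load_sequences(DATA_PATH, ACTIONS, NUM_EXAMPLES, SEQUENCE_LENGTH)`, giving one label per recorded example.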