# Collecting Hand Gesture Coordinates System

This code collects a dataset of hand landmark coordinates from a webcam, to be used for training a hand gesture recognition model.  
Once the dataset is collected, further steps derive additional information (such as relative coordinates) from it.  
The libraries required to run this code are imported in the first cell; please make sure they are installed.  
Running the code in Jupyter Notebook is recommended. Run each cell in order.  

- Recommended IDE: Jupyter Notebook, Visual Studio Code
- Language: **Python 3.10**

## 1. Import

In [1]:
# Basic libraries 
import cv2
import os
import numpy as np
import pandas as pd
import mediapipe as mp # Important library

## 2. Define Helper Functions

**Supplement: how to use Google MediaPipe (details about MediaPipe attributes)**

Parameters of mp.solutions.hands
- max_num_hands: Maximum number of hands that MediaPipe detects in a single frame
- min_detection_confidence: Minimum confidence value (between 0 and 1) for the hand detection to be considered successful
- min_tracking_confidence: Minimum confidence value (between 0 and 1) for the hand landmarks to be considered tracked successfully

Attributes available on the MediaPipe output (result); in Python they are accessed in lowercase, e.g. result.multi_hand_landmarks
- MULTI_HAND_LANDMARKS: Collection of detected/tracked hands, where each hand is represented as a list of 21 hand landmarks and each landmark is composed of x, y and z
- MULTI_HANDEDNESS: Collection of handedness of the detected/tracked hands (i.e. whether it is a left or right hand). Each hand is composed of label and score
    - label is a string with the value "Left" or "Right"
    
 
**Which types of 3D coordinate systems can we use?**
- original
    - MULTI_HAND_LANDMARKS: x and y are normalized by the image's width and height; z is the landmark depth, with the wrist as the origin
    - MULTI_HAND_WORLD_LANDMARKS: real-world coordinates in meters
- relative: coordinates expressed relative to landmark 0, the wrist
- normalized
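
The options listed above can be illustrated on a small array. This is only a sketch under assumptions: the notebook does not define what "normalize" means, so here it is taken to mean dividing the wrist-relative coordinates by their largest absolute value. `to_relative` and `to_normalized` are hypothetical helper names, and the sample coordinates are made up.

```python
import numpy as np

def to_relative(landmarks):
    """Shift landmarks so that landmark 0 (the wrist) becomes the origin."""
    landmarks = np.asarray(landmarks, dtype=float)
    return landmarks - landmarks[0]

def to_normalized(landmarks):
    """Assumed definition: wrist-relative coordinates scaled into [-1, 1]."""
    relative = to_relative(landmarks)
    scale = np.abs(relative).max()
    # Guard against a degenerate hand where every landmark equals the wrist
    return relative if scale == 0 else relative / scale

# Three made-up landmarks (x, y, z); a real hand has 21
original = np.array([[0.50, 0.50, 0.00],
                     [0.40, 0.60, -0.02],
                     [0.30, 0.70, -0.04]])
relative = to_relative(original)
normalized = to_normalized(original)
```

Relative coordinates remove the dependence on where the hand sits in the frame, while the assumed normalization additionally removes the dependence on how large the hand appears.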

In [2]:
# Setting for Mediapipe
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1,              # Only detect one hand
                       min_detection_confidence=0.7) # Default is 0.5

In [3]:
# Calculate bounding rectangle
def calc_bounding_rect(image, landmarks): 
    image_width, image_height = image.shape[1], image.shape[0]
    landmark_array = np.empty((0, 2), int)

    for _, landmark in enumerate(landmarks.landmark):
        landmark_x = min(int(landmark.x * image_width), image_width - 1)
        landmark_y = min(int(landmark.y * image_height), image_height - 1)

        landmark_point = [np.array((landmark_x, landmark_y))]
        landmark_array = np.append(landmark_array, landmark_point, axis=0)

    x, y, w, h = cv2.boundingRect(landmark_array)
    
    return [x, y, x + w, y + h]

# Draw bounding rectangle
def draw_bounding_rect(use_brect, image, brect):
    if use_brect:
        cv2.rectangle(image, (brect[0], brect[1]), (brect[2], brect[3]),
                     (0, 0, 0), 1)

    return image
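
To make the pixel arithmetic in `calc_bounding_rect` easier to check without a camera, here is a minimal NumPy-only sketch of the same idea. `bounding_rect_from_normalized` is a hypothetical name, the input points are made up, and the `+ 1` reflects the assumption that `cv2.boundingRect` reports width and height as inclusive pixel counts.

```python
import numpy as np

def bounding_rect_from_normalized(points_xy, image_width, image_height):
    """Return [x_min, y_min, x_max + 1, y_max + 1] in pixels."""
    points_xy = np.asarray(points_xy, dtype=float)
    size = np.array([image_width, image_height])
    # Scale to pixels, truncate like int(), and clamp to the last valid pixel
    pixels = np.minimum((points_xy * size).astype(int), size - 1)
    x_min, y_min = pixels.min(axis=0)
    x_max, y_max = pixels.max(axis=0)
    return [int(x_min), int(y_min), int(x_max) + 1, int(y_max) + 1]

# Made-up normalized (x, y) landmarks on a 640x480 frame
rect = bounding_rect_from_normalized(
    [[0.25, 0.50], [0.50, 0.75], [1.0, 1.0]], 640, 480
)
```

The clamp matters for landmarks reported at exactly 1.0, which would otherwise map one pixel past the image border.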

## 3. Start Collecting Hand Dataset

To collect the dataset, run the cell below.

- To capture a frame and record its landmarks, press '1'
- To quit, press 'q'

In [8]:
# Configuration: set the number of images to collect and the directory in which to store the dataset
path = os.getcwd()   # current path
img_cnt = 1          # image counter, starting at 1
max_img = 300        # maximum number of images to collect
landmarks = []       # stores the hand landmarks of the current frame
df = pd.DataFrame()  # accumulates the landmarks of all captured frames

# Initialize the webcam 
video_capture = cv2.VideoCapture(0) # 0 is a default embedded camera

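while True:
    # Read each frame from the webcam; stop if no frame is returned
    ret, frame = video_capture.read()
    if not ret:
        img_cnt -= 1  # keep the final count consistent with the normal exit path
        break
    
    # Mirror the frame horizontally (flip code 1 flips around the vertical axis)
    frame = cv2.flip(frame, 1)
    # Convert the frame from BGR (OpenCV's order) to RGB for MediaPipe
    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)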
    
    # Mediapipe processing 
    result = hands.process(framergb)
    if result.multi_hand_landmarks:
        for handslms, handness in zip(result.multi_hand_landmarks, # Choose the attribute that matches the desired coordinate system
                                      result.multi_handedness): 
            landmarks.clear() # Empty the list before storing this hand's landmarks
            for point in mp_hands.HandLandmark: # 0 ~ 20
                x = handslms.landmark[point].x
                y = handslms.landmark[point].y
                z = handslms.landmark[point].z
                landmarks.append([str(point), handness.classification[0].label, x, y, z])
            
            # Draw landmarks on the frame with bounding rectangle
            brect = calc_bounding_rect(frame, handslms)
            frame = draw_bounding_rect(True, frame, brect)
            mp_drawing.draw_landmarks(frame, handslms, mp_hands.HAND_CONNECTIONS)
    
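    # Poll the keyboard once per frame so that no key press is lost
    key = cv2.waitKey(1) & 0xFF
    
    # To capture the frame, press "1" (only while a hand is detected)
    if key == ord('1') and result.multi_hand_landmarks:
        img_name = path+"\\Dataset\\HAND\\Dataset08\\id08_frame_{}.png".format(img_cnt) # config
        # imwrite returns False when the file cannot be written, e.g. the folder is missing
        if cv2.imwrite(img_name, frame):
            print("{} written!".format(img_name)) 
            img_cnt += 1
            # DataFrame.append was removed in pandas 2.0, so use pd.concat
            df = pd.concat([df, pd.DataFrame(landmarks)], ignore_index=True)
    
    # To quit the application, press "q"; also stop once max_img images are saved
    elif key == ord('q') or img_cnt == (max_img + 1):
        img_cnt -= 1
        break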
        
    # Show the final output
    cv2.imshow("Output", frame)
    
# Release the webcam and destroy all active windows
print("{} images captured!".format(img_cnt)) # Print how many images captured
video_capture.release()
cv2.destroyAllWindows()

0 images captured!


## 4. Preprocess Dataset

Calculate each landmark's offset from the wrist (landmark 0) and add these relative coordinates to the dataset.

In [10]:
# Calculate and create relative landmarks from the first landmark 0, "wrist"
def create_relative_landmark(img_cnt, landmark_df):
    rows = []
    for i in range(img_cnt):
        # The wrist is the first of the 21 rows belonging to image i
        base_lm = landmark_df.iloc[i*21, 2:].values.astype(float)
        for j in range(21):
            target_lm = landmark_df.iloc[i*21+j, 2:].values.astype(float)
            rows.append(target_lm - base_lm)
    # Build the frame once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows)
        
relative_df = create_relative_landmark(img_cnt, df)
result_df = pd.concat([df, relative_df], axis=1)
result_df



Columns 0 to 4 hold the landmark name, the handedness, and the original x, y, z; columns 0.1, 1.1 and 2.1 hold the wrist-relative x, y, z.

| | 0 | 1 | 2 | 3 | 4 | 0.1 | 1.1 | 2.1 |
|---|---|---|---|---|---|---|---|---|
| 0 | HandLandmark.WRIST | Right | 0.649742 | 0.319549 | 4.393590e-07 | 0.0 | 0.0 | 0.0 |
| 1 | HandLandmark.THUMB_CMC | Right | 0.573171 | 0.361034 | -1.980279e-02 | -0.076571 | 0.041485 | -0.019803 |
| 2 | HandLandmark.THUMB_MCP | Right | 0.529059 | 0.449961 | -3.892376e-02 | -0.120683 | 0.130412 | -0.038924 |
| 3 | HandLandmark.THUMB_IP | Right | 0.522001 | 0.541655 | -5.097779e-02 | -0.127741 | 0.222107 | -0.050978 |
| 4 | HandLandmark.THUMB_TIP | Right | 0.499557 | 0.603344 | -6.559844e-02 | -0.150185 | 0.283795 | -0.065599 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6295 | HandLandmark.RING_FINGER_TIP | Left | 0.241911 | 0.694926 | -1.355357e-01 | -0.009612 | 0.369049 | -0.135537 |
| 6296 | HandLandmark.PINKY_MCP | Left | 0.198408 | 0.431762 | -9.598525e-02 | -0.053114 | 0.105885 | -0.095986 |
| 6297 | HandLandmark.PINKY_PIP | Left | 0.206384 | 0.536332 | -1.148420e-01 | -0.045139 | 0.210454 | -0.114843 |
| 6298 | HandLandmark.PINKY_DIP | Left | 0.212087 | 0.600043 | -1.198477e-01 | -0.039436 | 0.274165 | -0.119849 |
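
The row-by-row loop in `create_relative_landmark` can also be expressed as a single vectorized step. The sketch below is one possible alternative, not the notebook's own method. `relative_to_wrist` is a hypothetical name, and it assumes the frame holds exactly 21 rows per captured image with x, y, z in the last three columns. The demo data is random, not real landmarks.

```python
import numpy as np
import pandas as pd

LANDMARKS_PER_HAND = 21

def relative_to_wrist(landmark_df):
    """Subtract each hand's wrist (first row of every 21-row block) from its landmarks."""
    coords = landmark_df.iloc[:, 2:].to_numpy(dtype=float)
    if len(coords) % LANDMARKS_PER_HAND != 0:
        raise ValueError("expected a multiple of 21 rows")
    hands = coords.reshape(-1, LANDMARKS_PER_HAND, 3)
    relative = hands - hands[:, :1, :]  # broadcast the wrist over its own hand
    return pd.DataFrame(relative.reshape(-1, 3))

# Two synthetic hands with random coordinates, laid out like the collected data
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    0: ["lm"] * 42,
    1: ["Right"] * 42,
    2: rng.random(42),
    3: rng.random(42),
    4: rng.random(42),
})
demo_relative = relative_to_wrist(demo)
```

The explicit row-count check is a design choice: it surfaces a capture that stored fewer than 21 landmarks instead of silently misaligning every later hand.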


## 5. Export Dataset

After preprocessing, export the dataset to the chosen directory.

In [11]:
from pathlib import Path  

filepath = path+"\\Dataset\\HAND\\Dataset08\\"        # config: output directory
result_df.to_csv(filepath+"gesture08.csv", mode="w")  # config: dataset file name
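
`Path` is imported in the export cell, but the path itself is built by string concatenation with Windows-style separators. A more portable variant might look like the following sketch. `export_dataset` is a hypothetical helper, the demo frame is made up, and `index=False` is a choice made here to avoid the extra `Unnamed: 0` column when the CSV is read back.

```python
from pathlib import Path
import tempfile

import pandas as pd

def export_dataset(frame, directory, filename):
    """Write frame as CSV under directory, creating the folder if needed."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    frame.to_csv(target, index=False)
    return target

# Demonstration with a throwaway folder and a tiny made-up frame
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({"x": [0.1, 0.2], "y": [0.3, 0.4]})
    written = export_dataset(demo, Path(tmp) / "Dataset" / "HAND", "demo.csv")
    restored = pd.read_csv(written)
```

Creating the folder up front also avoids a failed write when the dataset directory does not exist yet.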