# Distance dataset generation

---


## Motivation


When we detect hands with Google's MediaPipe library, we get hand landmarks like these:

<img src="assets/hand_points.png" width="833">


When the hand performs a gesture, the landmarks move, so their distances from point `0` (the wrist) change.
A good way to characterize a hand gesture is therefore to analyze the distance of each point from the wrist.

This removes some limitations we would face when analyzing the normalized point coordinates, or the whole image with a CNN. Hand rotation is one example:
when the hand rotates while making an "open palm" gesture, the point coordinates change but their distances from the wrist do not.
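
The rotation argument can be checked numerically. The snippet below is a minimal sketch with made-up 2D coordinates (not real MediaPipe output); `distances_from_wrist` and `rotate` are illustrative helpers, not functions from this project.

```python
import math


def distances_from_wrist(points):
    """Return the Euclidean distance of every point from points[0] (the wrist)."""
    wx, wy = points[0]
    return [math.hypot(x - wx, y - wy) for x, y in points[1:]]


def rotate(points, angle_rad):
    """Rotate all points around points[0] (the wrist) by angle_rad."""
    wx, wy = points[0]
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [
        (
            wx + (x - wx) * cos_a - (y - wy) * sin_a,
            wy + (x - wx) * sin_a + (y - wy) * cos_a,
        )
        for x, y in points
    ]


# Made-up coordinates: the wrist first, then three other landmarks.
hand = [(0.5, 0.9), (0.4, 0.6), (0.5, 0.4), (0.7, 0.5)]
rotated_hand = rotate(hand, math.radians(90))

before = distances_from_wrist(hand)
after = distances_from_wrist(rotated_hand)

coordinates_changed = hand[1:] != rotated_hand[1:]
distances_unchanged = all(math.isclose(a, b) for a, b in zip(before, after))
print(coordinates_changed, distances_unchanged)
```

The coordinates differ after the rotation while the distances stay the same up to floating-point error.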


<img src="assets/distance_schema.png" width="640">


Furthermore, this approach gives us more control over the prediction workflow:
we can run model inference only when landmarks are detected.

We also drastically reduce the number of inputs we feed to the model. We move from an RGB matrix of 853x480 pixels (or whatever the resolution is) to 20 scalar distances, one per non-wrist landmark. The dataset generated below also stores one angle per landmark and two hand flags.
This should significantly speed up training and inference, and thus improve performance on our Nvidia Jetson Nano.
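
To put numbers on that reduction, here is the arithmetic for the resolution quoted above, assuming three color channels per pixel and the 20 distance inputs:

```python
WIDTH, HEIGHT, CHANNELS = 853, 480, 3

pixel_inputs = WIDTH * HEIGHT * CHANNELS  # values in one RGB frame
distance_inputs = 20                      # one distance per non-wrist landmark

print(pixel_inputs)                     # 1228320
print(pixel_inputs // distance_inputs)  # 61416
```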


## Introduction


We are going to generate a new dataset based on the lightweight version of the [HaGRID](https://github.com/hukenovs/hagrid) dataset (HAnd Gesture Recognition Image Dataset).

I kept only the data relevant to our use case: the `closed`, `like`, `dislike`, `palm`, `point_up`, `rock`, `victory` and `victory_inverted` gestures.


| class | gesture          |
| ----- | ---------------- |
| 0     | closed           |
| 1     | dislike          |
| 2     | like             |
| 3     | palm             |
| 4     | point_up         |
| 5     | rock             |
| 6     | victory          |
| 7     | victory_inverted |


## Workflow


For each image in the folder:

1. Open the image.
2. Detect the landmarks using Google's [MediaPipe library](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker#configurations_options).
3. Normalize the landmarks.
4. Compute the relative distances and angles from the wrist.
5. Write the distances and angles, together with the corresponding class, to a `.csv` file.
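
Steps 3 and 4 could look roughly like the sketch below. The project's own `compute_distances_angles_from_wrist` helper is not shown in this notebook, so `sketch_distances_angles_from_wrist` is a hypothetical stand-in. In particular, scaling by the largest distance and measuring angles with `atan2` are assumptions, not necessarily what the helper does.

```python
import math
from types import SimpleNamespace


def sketch_distances_angles_from_wrist(landmarks):
    """Hypothetical stand-in for the notebook's helper.

    Expects 21 objects with .x and .y attributes, wrist first.
    Returns 20 scaled distances followed by 20 angles in radians.
    """
    wrist = landmarks[0]
    offsets = [(p.x - wrist.x, p.y - wrist.y) for p in landmarks[1:]]

    distances = [math.hypot(dx, dy) for dx, dy in offsets]
    # Assumption: divide by the largest distance so that hand size
    # (distance to the camera) does not affect the features.
    largest = max(distances) or 1.0
    scaled = [d / largest for d in distances]

    # Assumption: angle of each wrist-to-landmark vector in the image plane.
    angles = [math.atan2(dy, dx) for dx, dy in offsets]
    return scaled + angles


# Fake landmarks standing in for MediaPipe output.
fake_hand = [SimpleNamespace(x=0.5, y=0.9)] + [
    SimpleNamespace(x=0.5 + 0.01 * i, y=0.9 - 0.02 * i) for i in range(1, 21)
]

features = sketch_distances_angles_from_wrist(fake_hand)
print(len(features))  # 40
```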


# Time to code!


In [102]:
pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


## Import libraries


In [103]:
import mediapipe as mp
import os
import time
import logging
from joblib import Parallel, delayed

## Mediapipe `hand_landmarks` detection


In [104]:
# Silence MediaPipe logs by setting the logging level to ERROR
logging.getLogger("mediapipe").setLevel(logging.ERROR)

In [105]:
mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

In [106]:
mp_names = [
    "THUMB_CMC",
    "THUMB_MCP",
    "THUMB_IP",
    "THUMB_TIP",
    "INDEX_FINGER_MCP",
    "INDEX_FINGER_PIP",
    "INDEX_FINGER_DIP",
    "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP",
    "MIDDLE_FINGER_PIP",
    "MIDDLE_FINGER_DIP",
    "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP",
    "RING_FINGER_PIP",
    "RING_FINGER_DIP",
    "RING_FINGER_TIP",
    "PINKY_MCP",
    "PINKY_PIP",
    "PINKY_DIP",
    "PINKY_TIP",
]

In [107]:
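def get_landmarks(image, holistic):
    """Return a one-hot [left, right] flag and the landmarks of the single detected hand.

    Returns (None, None) when no hand or both hands are detected.
    """
    results = holistic.process(image)

    # Images in which both hands are detected are skipped
    if results.left_hand_landmarks and results.right_hand_landmarks:
        return None, None

    if results.left_hand_landmarks:
        return [1, 0], results.left_hand_landmarks

    if results.right_hand_landmarks:
        return [0, 1], results.right_hand_landmarks

    # No hand detected
    return None, None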
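
The branching in `get_landmarks` can be exercised without running MediaPipe. In the sketch below, `StubHolistic` is a hypothetical stand-in that returns a canned result object, and the function body is repeated so the snippet runs on its own.

```python
from types import SimpleNamespace


def get_landmarks(image, holistic):
    # Same logic as the notebook function above.
    results = holistic.process(image)

    if results.left_hand_landmarks and results.right_hand_landmarks:
        return None, None
    if results.left_hand_landmarks:
        return [1, 0], results.left_hand_landmarks
    if results.right_hand_landmarks:
        return [0, 1], results.right_hand_landmarks
    return None, None


class StubHolistic:
    """Hypothetical stub: returns a fixed result instead of running a model."""

    def __init__(self, left=None, right=None):
        self._result = SimpleNamespace(
            left_hand_landmarks=left, right_hand_landmarks=right
        )

    def process(self, image):
        return self._result


left_only = get_landmarks(None, StubHolistic(left="L"))
right_only = get_landmarks(None, StubHolistic(right="R"))
both = get_landmarks(None, StubHolistic(left="L", right="R"))
nothing = get_landmarks(None, StubHolistic())

print(left_only)  # ([1, 0], 'L')
print(both)       # (None, None)
```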

# Main Program


## Tweak parameters


In [108]:
DATASET_FOLDER_PATH = "./datasets/hagrid"
DATASET_FILENAME = "test_dataset.csv"

MIN_DETECTION_CONFIDENCE = 0.3
MP_MODEL_COMPLEXITY = 0

In [109]:
TEST_MODE = False

# Alternative test image:
# TEST_IMAGE_PATH = "./datasets/hagrid/palm/0b0d717b-0205-4dfc-a0ed-0c93b6bd5cd8.jpg"
TEST_IMAGE_PATH = "./datasets/hagrid/palm/00a0d3e0-9fb4-49c5-8a0c-3e0dbe2b76bf.jpg"

In [110]:
from helpers.camera import close_image, get_close_event, open_image, show_image
from helpers.computations import compute_distances_angles_from_wrist
from helpers.mediapipe import draw_landmarks


def test_flow(holistic):
    image = open_image(TEST_IMAGE_PATH)

    # Get landmarks
    hands, landmarks = get_landmarks(image, holistic)
    data = []

    if hands and landmarks:
        image = draw_landmarks(image, landmarks, mp_holistic, mp_drawing)
        data += hands
        data += compute_distances_angles_from_wrist(landmarks.landmark)

    show_image(image)

    while True:
        if get_close_event():
            break

    close_image()


if TEST_MODE:
    with mp_holistic.Holistic(
        min_detection_confidence=MIN_DETECTION_CONFIDENCE,
        model_complexity=MP_MODEL_COMPLEXITY,
        static_image_mode=True,
    ) as holistic:
        test_flow(holistic)

In [111]:
from helpers.csv_utils import create_csv


columns = ["LEFT_HAND", "RIGHT_HAND"]
columns += [name + "_DISTANCE_FROM_WRIST" for name in mp_names]
columns += [name + "_ANGLE_FROM_WRIST" for name in mp_names]
columns.append("label")

print(f"dataset columns : {columns}")

create_csv(DATASET_FILENAME, columns)

dataset columns : ['LEFT_HAND', 'RIGHT_HAND', 'THUMB_CMC_DISTANCE_FROM_WRIST', 'THUMB_MCP_DISTANCE_FROM_WRIST', 'THUMB_IP_DISTANCE_FROM_WRIST', 'THUMB_TIP_DISTANCE_FROM_WRIST', 'INDEX_FINGER_MCP_DISTANCE_FROM_WRIST', 'INDEX_FINGER_PIP_DISTANCE_FROM_WRIST', 'INDEX_FINGER_DIP_DISTANCE_FROM_WRIST', 'INDEX_FINGER_TIP_DISTANCE_FROM_WRIST', 'MIDDLE_FINGER_MCP_DISTANCE_FROM_WRIST', 'MIDDLE_FINGER_PIP_DISTANCE_FROM_WRIST', 'MIDDLE_FINGER_DIP_DISTANCE_FROM_WRIST', 'MIDDLE_FINGER_TIP_DISTANCE_FROM_WRIST', 'RING_FINGER_MCP_DISTANCE_FROM_WRIST', 'RING_FINGER_PIP_DISTANCE_FROM_WRIST', 'RING_FINGER_DIP_DISTANCE_FROM_WRIST', 'RING_FINGER_TIP_DISTANCE_FROM_WRIST', 'PINKY_MCP_DISTANCE_FROM_WRIST', 'PINKY_PIP_DISTANCE_FROM_WRIST', 'PINKY_DIP_DISTANCE_FROM_WRIST', 'PINKY_TIP_DISTANCE_FROM_WRIST', 'THUMB_CMC_ANGLE_FROM_WRIST', 'THUMB_MCP_ANGLE_FROM_WRIST', 'THUMB_IP_ANGLE_FROM_WRIST', 'THUMB_TIP_ANGLE_FROM_WRIST', 'INDEX_FINGER_MCP_ANGLE_FROM_WRIST', 'INDEX_FINGER_PIP_ANGLE_FROM_WRIST', 'INDEX_FINGER_DIP_
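
With the full list of 20 landmark names, the header has 2 + 20 + 20 + 1 = 43 columns. The `create_csv` and `write_csv` helpers live in `helpers/csv_utils.py` and are not shown here. The sketch below is one way they might be written with the standard `csv` module, under the assumption that the first overwrites the file with a header and the second appends one row. The landmark list is shortened and the row values are made up.

```python
import csv
import os
import tempfile


def create_csv(path, columns):
    """Create (or overwrite) the file and write the header row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(columns)


def write_csv(path, row):
    """Append a single data row to the file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)


names = ["THUMB_CMC", "THUMB_MCP"]  # shortened list, for illustration only

columns = ["LEFT_HAND", "RIGHT_HAND"]
columns += [name + "_DISTANCE_FROM_WRIST" for name in names]
columns += [name + "_ANGLE_FROM_WRIST" for name in names]
columns.append("label")

path = os.path.join(tempfile.mkdtemp(), "example.csv")
create_csv(path, columns)
write_csv(path, [1, 0, 0.2, 0.4, 1.1, 1.3, "palm"])  # made-up values

with open(path, newline="") as f:
    rows = list(csv.reader(f))

print(len(rows), len(rows[0]))  # 2 7
```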

In [112]:
from helpers.csv_utils import write_csv
from helpers.computations import compute_distances_angles_from_wrist
from helpers.camera import open_image
from helpers.mediapipe import draw_landmarks


def process_image(image_path: str, label: str, holistics):
    image = open_image(image_path)

    # Get landmarks
    hands, landmarks = get_landmarks(image, holistics)
    data = []

    if hands and landmarks:

        if TEST_MODE:
            image = draw_landmarks(image, landmarks, mp_holistic, mp_drawing)

        data += hands
        data += compute_distances_angles_from_wrist(landmarks.landmark)
        data.append(label)

        write_csv(DATASET_FILENAME, data)

In [113]:
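def process_folder(folder):
    # The folder name is used as the gesture label
    label = os.path.basename(os.path.normpath(folder))
    print(f"Processing folder {label}...")

    with mp_holistic.Holistic(
        min_detection_confidence=MIN_DETECTION_CONFIDENCE,
        model_complexity=MP_MODEL_COMPLEXITY,
        # The images are unrelated stills, not video frames (same setting as in test_flow)
        static_image_mode=True,
    ) as holistic: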
        for image_filename in os.listdir(folder):
            if image_filename.endswith(".jpg"):
                image_path = os.path.join(folder, image_filename)
                process_image(image_path, label, holistic)

    print("=====================================")
    print(f"Folder {label} processed !")

In [114]:
from helpers.misc import get_timestamp


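if not TEST_MODE and __name__ == "__main__":
    start_time = time.time()

    # os.walk also yields DATASET_FOLDER_PATH itself, so the root folder is processed too.
    # Note: all workers append to the same CSV file through write_csv.
    Parallel(n_jobs=-1)(  # Use all available cores
        delayed(process_folder)(folder) for folder, _, _ in os.walk(DATASET_FOLDER_PATH)
    )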

    end_time = time.time()

    print("Done")
    print(f"Dataset generated in {get_timestamp(end_time - start_time)}")

Processing folder like...
Processing folder victory...
Processing folder point_up...
Processing folder closed...
Processing folder victory_inverted...
Processing folder hagrid...
Processing folder dislike...
Processing folder rock...


I0000 00:00:1707068779.215222       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.215951       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.223528       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.224453       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.224701       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.225495       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.229201       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
I0000 00:00:1707068779.261502       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: Created TensorFlow Lite XN

Folder hagrid processed !
Processing folder palm...


I0000 00:00:1707068780.747993       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1 Pro


KeyboardInterrupt: 
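
The output above contains a `Processing folder hagrid...` line because `os.walk` yields the dataset root before its subfolders. A possible way to hand only the class folders to the workers is sketched below. `list_class_folders` is a hypothetical helper, and the temporary directory only stands in for the real dataset.

```python
import os
import tempfile


def list_class_folders(root):
    """Return the immediate subfolders of root, sorted by path."""
    return sorted(entry.path for entry in os.scandir(root) if entry.is_dir())


# Temporary directory standing in for the dataset folder.
root = tempfile.mkdtemp()
for gesture in ("palm", "like", "rock"):
    os.mkdir(os.path.join(root, gesture))

folders = list_class_folders(root)
labels = [os.path.basename(folder) for folder in folders]
print(labels)  # ['like', 'palm', 'rock']
```

The resulting list could replace the `os.walk` generator in the `Parallel` call, so that the root folder is never treated as a gesture class.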

## Dataset limitations

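If MediaPipe predicts the hand landmarks of an image incorrectly, the wrong values end up in the dataset, so it may contain noisy samples. Images in which no hand or both hands are detected are skipped (see `get_landmarks`), so some source images have no row in the dataset.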
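
Because `get_landmarks` skips some images, the classes can end up unbalanced. A quick sanity check is to count the rows per label. The sketch below assumes a CSV with a `label` column like the one generated above, and uses a tiny made-up file so that it runs on its own. `count_rows_per_label` is an illustrative helper, not part of the project.

```python
import csv
import os
import tempfile
from collections import Counter


def count_rows_per_label(csv_path, label_column="label"):
    """Count how many data rows each label has in the CSV file."""
    with open(csv_path, newline="") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))


# Tiny made-up file standing in for the generated dataset.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["LEFT_HAND", "RIGHT_HAND", "label"])
    writer.writerows([[1, 0, "palm"], [0, 1, "palm"], [0, 1, "rock"]])

counts = count_rows_per_label(path)
print(counts["palm"], counts["rock"])  # 2 1
```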
