# 1) Cropping with MediaPipe
We start our dataset preprocessing by extracting and cropping hand gestures from the dataset explored in the previous notebook.

In [1]:
from utils.mediapipe_cropper.cropper import process_images
process_images('shared_artifacts/images/hagrid_30k', 'shared_artifacts/images/hagrid_30k_cropped')

Processing directory: shared_artifacts/images/hagrid_30k\like


100%|█████████████████████████████████████████████████████████████████████████████| 3000/3000 [02:11<00:00, 22.74img/s]


Processing directory: shared_artifacts/images/hagrid_30k\stop


100%|█████████████████████████████████████████████████████████████████████████████| 3000/3000 [02:28<00:00, 20.24img/s]


Processing directory: shared_artifacts/images/hagrid_30k\two_up


100%|█████████████████████████████████████████████████████████████████████████████| 6000/6000 [04:44<00:00, 21.12img/s]
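
The internals of `process_images` are not shown in this notebook. As a rough sketch of what a landmark-based crop could look like, we assume normalized landmark coordinates in [0, 1] (the convention MediaPipe Hands uses) and a fixed pixel margin. The helper `crop_box_from_landmarks` and the sample landmarks are hypothetical and are not taken from `utils.mediapipe_cropper`.

```python
def crop_box_from_landmarks(landmarks, img_w, img_h, margin=20):
    """Return an (x0, y0, x1, y1) pixel box around normalized (x, y) landmarks.

    Hypothetical helper: the margin is added on every side and the box is
    clamped to the image borders.
    """
    xs = [x * img_w for x, _ in landmarks]
    ys = [y * img_h for _, y in landmarks]
    x0 = max(0, int(min(xs)) - margin)
    y0 = max(0, int(min(ys)) - margin)
    x1 = min(img_w, int(max(xs)) + margin)
    y1 = min(img_h, int(max(ys)) + margin)
    return x0, y0, x1, y1


# Made-up landmarks for illustration only (not real detector output).
sample_landmarks = [(0.25, 0.5), (0.5, 0.25), (0.75, 0.75)]
box = crop_box_from_landmarks(sample_landmarks, img_w=800, img_h=400)
print(box)
```

With an image array `img`, the crop would then be `img[y0:y1, x0:x1]`.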


The cropped hand gesture images all have different sizes. We want a single target size that is:
1. Big enough to capture important hand-shape details
2. Small enough to train fast
3. Consistent across the dataset
    
So we looked at the distribution of all our image dimensions and chose a size close to the *median* or *mean*, rounded to a CNN-friendly size (like 64, 96, 128).
- *Motivation:* powers of 2 and divisibility. These sizes are either powers of 2 (64, 128, 256) or divisible by 32 (96, 224). This matters because standard CNN architectures stack several pooling layers that typically halve the image dimensions at each stage, so a side length that stays even from stage to stage shrinks without dropping rows or columns.
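
To make the divisibility argument concrete, here is a small sketch that tracks a side length through repeated halving. It assumes 2×2 pooling with stride 2 and no padding (so odd sizes are floored); `sizes_after_pooling` is a name made up for this illustration.

```python
def sizes_after_pooling(size, n_pools):
    """Track a side length through n_pools 2x2 poolings (stride 2, floor).

    Returns the list of sizes and a flag telling whether every halving
    was exact (no row or column had to be dropped).
    """
    sizes = [size]
    exact = True
    for _ in range(n_pools):
        if sizes[-1] % 2:
            exact = False  # odd size: the last row/column is discarded
        sizes.append(sizes[-1] // 2)
    return sizes, exact


for side in (96, 128, 100):
    print(side, sizes_after_pooling(side, 4))
```

96 and 128 halve cleanly four times, while 100 reaches an odd size (25) after two poolings and loses a row and a column at the next stage.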

In [2]:
import os
import cv2
import numpy as np

In [3]:
input_folder = "shared_artifacts/images/hagrid_30k_cropped"

widths = []
heights = []

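for label in os.listdir(input_folder):
    label_folder = os.path.join(input_folder, label)
    if not os.path.isdir(label_folder):
        continue  # skip stray files next to the label folders

    for f in os.listdir(label_folder):
        img = cv2.imread(os.path.join(label_folder, f))
        if img is None:
            continue  # cv2.imread returns None for unreadable or non-image files
        h, w = img.shape[:2]
        widths.append(w)
        heights.append(h)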

print(f"Mean width: {np.mean(widths):.0f} px")
print(f"Mean height: {np.mean(heights):.0f} px")
print(f"Median width: {np.median(widths)} px")
print(f"Median height: {np.median(heights)} px")

Mean width: 174 px
Mean height: 293 px
Median width: 167.0 px
Median height: 286.0 px


# 2) Resizing and Padding
So the dataset (cropped hand gesture images with a 20 px margin) has:
- Width ~ 170 px
- Height ~ 290 px

Which indicates that:
- The images are not square
- The aspect ratio is roughly 3:5 (170:290 ≈ 0.6)

Since the cropped images are naturally rectangular, if we resize directly to square dimensions like:
- 96×96 -> hands will get squashed
- 128×128 -> same distortion problem

Therefore, we decided to resize while preserving the aspect ratio, then pad to a square with black pixels.
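
The arithmetic behind "resize, then pad" can be sketched as follows. This is only an illustration: `letterbox_geometry` is a hypothetical helper, and centering the image inside the square is our assumption, since the code of `utils.resizer` is not shown here.

```python
def letterbox_geometry(w, h, target):
    """Compute the resized size and the padding needed for a target x target square.

    The longer side is scaled to `target`, the shorter side keeps the
    aspect ratio, and the remaining pixels are split between both sides.
    """
    scale = target / max(w, h)
    new_w = max(1, round(w * scale))
    new_h = max(1, round(h * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "size": (new_w, new_h),
        "left": left,
        "right": pad_x - left,
        "top": top,
        "bottom": pad_y - top,
    }


# Median crop size from the statistics above, target side of 96 px.
geometry = letterbox_geometry(167, 286, 96)
print(geometry)
```

With OpenCV, these values could be passed to `cv2.resize(img, (new_w, new_h))` followed by `cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(0, 0, 0))` to add the black borders.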

In [4]:
from utils.resizer.resizer import process_images

input_path = "shared_artifacts/images/hagrid_30k_cropped"
output_path = "shared_artifacts/images/hagrid_30k_resized" 

TARGET_SIZE = 96

process_images(input_path, TARGET_SIZE, output_path)

Processing directory: shared_artifacts/images/hagrid_30k_cropped\like


100%|████████████████████████████████████████████████████████████████████████████| 2765/2765 [00:04<00:00, 614.14img/s]


Processing directory: shared_artifacts/images/hagrid_30k_cropped\stop


100%|████████████████████████████████████████████████████████████████████████████| 2857/2857 [00:04<00:00, 638.18img/s]


Processing directory: shared_artifacts/images/hagrid_30k_cropped\two_up


100%|████████████████████████████████████████████████████████████████████████████| 5326/5326 [00:08<00:00, 654.36img/s]


# 3) No preprocessing pipeline is perfect - Manual check is still needed!
Our automated filters removed ~97% of problematic images. But the ~3% that are left will be treated by the CNN model as “truth”, so bad samples → bad model. We don't want that, so we perform a quick manual check to remove the last few errors.

## 3.1) Special case
Since the "hagrid" dataset does not include any gestures that fit the "Slide Right" and "Slide Left" functionality, we have to create two new gestures. The plan is:
- Take all "two fingers point up" images.
- Rotate them:
    - 90° clockwise → becomes Slide Right
    - 90° counterclockwise → becomes Slide Left
- But because the images include both left and right hands, rotating alone flips the semantics for one of the two hands.
- So we mirror left-hand images before rotating them to get a clean "Slide Right" dataset, and mirror right-hand images before rotating them to get a clean "Slide Left" dataset.

For example:

In [None]:
import matplotlib.pyplot as plt

path = 'shared_artifacts/images/hagrid_30k_resized/two_up'

files = os.listdir(path)

img = cv2.imread(os.path.join(path, files[0]))
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Rotate 90 degrees clockwise
rotated_cw = cv2.rotate(img_rgb, cv2.ROTATE_90_CLOCKWISE)

# Rotate 90 degrees counter-clockwise
rotated_ccw = cv2.rotate(img_rgb, cv2.ROTATE_90_COUNTERCLOCKWISE)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].imshow(img_rgb)
axes[0].set_title('Original')
axes[0].axis('off')

axes[1].imshow(rotated_cw)
axes[1].set_title('Rotated 90° CW - CORRECT for "Swipe Right"')
axes[1].axis('off')

axes[2].imshow(rotated_ccw)
axes[2].set_title('Rotated 90° CCW - INCORRECT for "Swipe Left"')
axes[2].axis('off')

plt.tight_layout()
plt.show()

How it should be:

In [None]:
img_horizontal = cv2.flip(img_rgb, 1)  # 1 - horizontal flip
rotated_ccw = cv2.rotate(img_horizontal, cv2.ROTATE_90_COUNTERCLOCKWISE)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].imshow(img_rgb)
axes[0].set_title('Original')
axes[0].axis('off')

axes[1].imshow(img_horizontal)
axes[1].set_title('Mirrored')
axes[1].axis('off')

axes[2].imshow(rotated_ccw)
axes[2].set_title('Rotated 90° CCW - CORRECT for "Swipe Left"')
axes[2].axis('off')

plt.tight_layout()
plt.show()
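
The same effect can be checked on a tiny grid without any images. The sketch below uses plain Python lists and hypothetical helpers (`flip_h`, `rotate_cw`, `rotate_ccw`). It shows that mirroring and then rotating counterclockwise gives the mirror image of the clockwise rotation, which plain counterclockwise rotation does not.

```python
def flip_h(grid):
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in grid]


def rotate_cw(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


def rotate_ccw(grid):
    """Rotate the grid 90 degrees counterclockwise."""
    return [list(col) for col in zip(*grid)][::-1]


grid = [["a", "b"],
        ["c", "d"]]

slide_right = rotate_cw(grid)               # rotation only
naive_left = rotate_ccw(grid)               # rotation only, other direction
mirrored_left = rotate_ccw(flip_h(grid))    # mirror first, then rotate

print(slide_right)
print(naive_left)
print(mirrored_left)
print(mirrored_left == flip_h(slide_right))
```

In other words, rotating in the opposite direction yields the first result turned by 180°, not its mirror image, which is why the mirroring step is needed.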

## 3.2) Mirroring and Rotating

In [5]:
from pathlib import Path

input_path = "shared_artifacts/images/hagrid_30k_resized/two_up"
output_path_left = "shared_artifacts/images/hagrid_30k_resized/swipe_left"
output_path_right = "shared_artifacts/images/hagrid_30k_resized/swipe_right"

Path(output_path_left).mkdir(parents=True, exist_ok=True)
Path(output_path_right).mkdir(parents=True, exist_ok=True)


def process_slide_right(img, filename):
    """Create Slide Right gesture."""
    
    if "left" in filename:
        img = cv2.flip(img, 1)  # convert left hand → right hand
    img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
    return img


def process_slide_left(img, filename):
    """Create Slide Left gesture."""
    
    if "right" in filename:
        img = cv2.flip(img, 1) # convert right hand → left hand
    img = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return img
    

print("Generating synthetic slide gestures:")

for file in os.listdir(input_path):
    if not file.lower().endswith((".jpg", ".jpeg", ".png")):
        continue

    img_path = os.path.join(input_path, file)
    img = cv2.imread(img_path)
    if img is None:
        print(f" Could not read image: {file}")
        continue

    # SLIDE RIGHT
    slide_right_img = process_slide_right(img.copy(), file)
    right_output_path = os.path.join(output_path_right, f"right_{file}")
    cv2.imwrite(right_output_path, slide_right_img)

    # SLIDE LEFT
    slide_left_img = process_slide_left(img.copy(), file)
    left_output_path = os.path.join(output_path_left, f"left_{file}")
    cv2.imwrite(left_output_path, slide_left_img)

print("Done slide gestures.")

Generating synthetic slide gestures:
Done slide gestures.


# 4) Train/test split
After cropping and resizing the dataset, we need to split it into "train" and "test".

In [6]:
import shutil
import random

TEST_SET_PERCENTAGE = 0.15

input_path = "shared_artifacts/images/hagrid_30k_resized"
output_path = "shared_artifacts/images/hagrid_30k_test"

labels = [l for l in os.listdir(input_path) if os.path.isdir(os.path.join(input_path, l))]

for label in labels:
    label_input_path = os.path.join(input_path, label)
    label_output_path = os.path.join(output_path, label)

    os.makedirs(label_output_path, exist_ok=True) # make sure the dir exists

    files = [f for f in os.listdir(label_input_path)]
    print(f"Label {label}: {len(files)} images")

    image_count = int(len(files) * TEST_SET_PERCENTAGE)
    print(f" - Moving {image_count} images to test set")

    files_to_move = random.sample(files, image_count)

    moved_count = 0

    for file in files_to_move:
        src_path = os.path.join(label_input_path, file)
        dst_path = os.path.join(label_output_path, file)

        try:
            shutil.move(src_path, dst_path)
            moved_count += 1
        except Exception as e:
            print(f"Error moving {src_path}: {e}")
    
    print(f" - Moved {moved_count}/{image_count} images to the test set")

Label like: 2765 images
 - Moving 414 images to test set
 - Moved 414/414 images to the test set
Label stop: 2857 images
 - Moving 428 images to test set
 - Moved 428/428 images to the test set
Label swipe_left: 3000 images
 - Moving 450 images to test set
 - Moved 450/450 images to the test set
Label swipe_right: 3000 images
 - Moving 450 images to test set
 - Moved 450/450 images to the test set
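
Note that the split above draws a different sample on every run, because no random seed is set. A minimal sketch of a reproducible variant follows. It assumes we sort the file names first (the order returned by `os.listdir` is not guaranteed) and use a fixed seed; `pick_test_files` and the seed value 42 are arbitrary choices for illustration.

```python
import random


def pick_test_files(files, test_fraction, seed=42):
    """Pick a reproducible test subset from a list of file names."""
    rng = random.Random(seed)    # local generator, does not touch global state
    ordered = sorted(files)      # make the result independent of listing order
    return rng.sample(ordered, int(len(ordered) * test_fraction))


# Dummy file names standing in for one label folder.
files = [f"img_{i:04d}.jpg" for i in range(2765)]

first = pick_test_files(files, 0.15)
second = pick_test_files(list(reversed(files)), 0.15)

print(len(first))
print(first == second)
```

The returned list could replace `files_to_move` in the loop above.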


Rename the directories to follow the conventional naming.

In [8]:
train = "shared_artifacts/images/train"
test = "shared_artifacts/images/test"

os.rename("shared_artifacts/images/hagrid_30k_resized", train)
os.rename("shared_artifacts/images/hagrid_30k_test", test)


Let's take a final look at our dataset:

In [9]:
for split_dir in [train, test]:
    for label in os.listdir(split_dir):
        label_folder = os.path.join(split_dir, label)
        num_images = len(os.listdir(label_folder))
        print(f"{split_dir} - Label {label}: {num_images} images")

shared_artifacts/images/train - Label like: 2351 images
shared_artifacts/images/train - Label stop: 2429 images
shared_artifacts/images/train - Label swipe_left: 2550 images
shared_artifacts/images/train - Label swipe_right: 2550 images
shared_artifacts/images/test - Label like: 414 images
shared_artifacts/images/test - Label stop: 428 images
shared_artifacts/images/test - Label swipe_left: 450 images
shared_artifacts/images/test - Label swipe_right: 450 images
