
We start our dataset preprocessing by extracting and cropping hand gestures from the dataset explored in the previous notebook.

In [None]:
from utils.mediapipe_cropper.cropper import process_images
process_images('shared_artifacts/images/hagrid_30k', 'shared_artifacts/images/hagrid_30k_cropped')

Once the cropped hand gesture images are saved, they all have different sizes. We want to resize them to a common size that is:
1. Big enough to capture important hand-shape details
2. Small enough to train fast
3. Consistent across the dataset
    
So we look at the distribution of image dimensions across the dataset, pick a size close to the *median* or *mean*, and then round it to a CNN-friendly size (like 64, 96, 128).
- *Motivation:* powers of 2 and divisibility. Many of the commonly used sizes are powers of 2 (64, 128, 256) or divisible by 32 (96, 224). This matters because standard CNN architectures stack several pooling (or strided) layers that typically halve the image dimensions at each stage, so a size divisible by 32 can be halved five times without rounding.
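
The halving argument can be checked with a few lines of arithmetic. The sketch below is illustrative only: `feature_map_sizes` is a hypothetical helper, and it assumes unpadded 2×2 pooling with stride 2, which floors odd sizes.

```python
def feature_map_sizes(size, stages):
    """Return the side length before and after each 2x2, stride-2 pooling stage."""
    sizes = [size]
    for _ in range(stages):
        size //= 2  # an odd size loses its last row/column here
        sizes.append(size)
    return sizes


# Compare CNN-friendly candidates with a size that is not divisible by 32.
for candidate in (64, 96, 128, 100):
    print(candidate, feature_map_sizes(candidate, 5))
```

With 96 every stage divides evenly (96, 48, 24, 12, 6, 3), while 100 hits an odd size after two stages (25) and starts dropping pixels.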

In [2]:
import os
import cv2
import numpy as np

input_folder = "shared_artifacts/images/hagrid_30k_cropped"

widths = []
heights = []

for label in os.listdir(input_folder):
    label_folder = os.path.join(input_folder, label)

    for f in os.listdir(label_folder):
        img = cv2.imread(os.path.join(label_folder, f))
        if img is None:  # cv2.imread returns None for unreadable or non-image files
            continue
        h, w = img.shape[:2]
        widths.append(w)
        heights.append(h)

print(f"Mean width: {np.mean(widths):.0f} px")
print(f"Mean height: {np.mean(heights):.0f} px")
print(f"Median width: {np.median(widths)} px")
print(f"Median height: {np.median(heights)} px")

Mean width: 86 px
Mean height: 116 px
Median width: 82.0 px
Median height: 108.5 px


#### So the dataset (cropped images of "swipe" hand gestures with a 20 px margin) has:
- Width ~ 82-86 px
- Height ~ 108-116 px

Which indicates that:
- The images are not square
- The aspect ratio is roughly 3:4 (86:116 ≈ 0.74)

Since the cropped images are naturally rectangular, if we resize directly to square dimensions like:
- 96×96 -> hands will get squashed
- 128×128 -> same distortion problem

Therefore, we decided to resize while preserving the aspect ratio, then pad to a square with black pixels.

In [None]:
from utils.resizer.resizer import process_images

input_path = "shared_artifacts/images/hagrid_30k_cropped"
output_path = "shared_artifacts/images/hagrid_30k_resized" 

TARGET_SIZE = 96  # divisible by 32, matching the CNN-friendly sizes discussed above

process_images(input_path, TARGET_SIZE, TARGET_SIZE, output_path)
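
The internals of `utils.resizer` are not shown in this notebook. As a rough illustration of the resize-then-pad idea, here is a minimal sketch; `letterbox_geometry` and `letterbox` are hypothetical names, and centering the image on the black canvas is an assumption, not necessarily what the project's resizer does.

```python
def letterbox_geometry(width, height, target):
    """Compute the resized dimensions and the padding for a target x target canvas.

    Returns (new_w, new_h, left, top, right, bottom).
    """
    scale = target / max(width, height)  # the longer side becomes `target`
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2   # center the image (assumption)
    return new_w, new_h, left, top, pad_x - left, pad_y - top


def letterbox(image, target):
    """Resize an OpenCV image preserving aspect ratio, then pad with black pixels."""
    import cv2  # imported lazily so the geometry helper works without OpenCV

    h, w = image.shape[:2]
    new_w, new_h, left, top, right, bottom = letterbox_geometry(w, h, target)
    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)
    return cv2.copyMakeBorder(
        resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(0, 0, 0)
    )


# A crop of roughly the median size, placed on a 96x96 canvas.
print(letterbox_geometry(82, 108, 96))
```

For an 82×108 crop and a 96 px target, the hand is scaled to 73×96 and the remaining 23 columns are split between the left and right borders, so the hand shape is not squashed.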

After cropping and resizing the dataset, we need to split it into "train" and "test" sets.

In [None]:
import shutil
import random

TEST_SET_PERCENTAGE = 0.15

input_path = "shared_artifacts/images/hagrid_30k_resized"
output_path = "shared_artifacts/images/hagrid_30k_test"

labels = [l for l in os.listdir(input_path) if os.path.isdir(os.path.join(input_path, l))]

for label in labels:
    label_input_path = os.path.join(input_path, label)
    label_output_path = os.path.join(output_path, label)

    os.makedirs(label_output_path, exist_ok=True) # make sure the dir exists

    files = [f for f in os.listdir(label_input_path)]
    print(f"Label {label[10:]}: {len(files)} images")

    image_count = int(len(files) * TEST_SET_PERCENTAGE)
    print(f" - Moving {image_count} images to test set")

    files_to_move = random.sample(files, image_count)

    moved_count = 0

    for file in files_to_move:
        src_path = os.path.join(label_input_path, file)
        dst_path = os.path.join(label_output_path, file)

        try:
            shutil.move(src_path, dst_path)
            moved_count += 1
        except Exception as e:
            print(f"Error moving {src_path}: {e}")
    
    print(f" - Moved {moved_count}/{image_count} images to test set")

Label like: 1235 images
 - Moving 185 images to test set
 - Moved 185/185 images to test set
Label stop: 1359 images
 - Moving 203 images to test set
 - Moved 203/203 images to test set
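
Note that `random.sample` without a seed picks a different test set on every run. If a reproducible split is wanted, one option is a seeded local generator. The sketch below is an assumption about how that could look: `pick_test_files` and the seed value are hypothetical and not part of the original notebook.

```python
import random


def pick_test_files(files, test_fraction, seed=42):
    """Select a reproducible random subset of files for the test set."""
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    count = int(len(files) * test_fraction)
    # Sorting first makes the result independent of os.listdir ordering.
    return rng.sample(sorted(files), count)


# Stand-in file names; in the notebook these would come from os.listdir.
files = [f"img_{i:04d}.jpg" for i in range(1235)]
test_files = pick_test_files(files, 0.15)
print(len(test_files))
```

Calling the function again with the same seed returns the same files, so the train/test split can be recreated later.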


Rename the directories to follow conventional naming.

In [None]:
train = "shared_artifacts/images/train"
test = "shared_artifacts/images/test"

os.rename("shared_artifacts/images/hagrid_30k_resized", train)
os.rename("shared_artifacts/images/hagrid_30k_test", test)


Let's take a final look at our dataset.

In [13]:
for split_dir in [train, test]:  # avoid shadowing the built-in dir()
    for label in os.listdir(split_dir):
        label_folder = os.path.join(split_dir, label)
        num_images = len(os.listdir(label_folder))
        print(f"{split_dir} - Label {label[10:]}: {num_images} images")

shared_artifacts/images/train - Label like: 1050 images
shared_artifacts/images/train - Label stop: 1156 images
shared_artifacts/images/test - Label like: 185 images
shared_artifacts/images/test - Label stop: 203 images
