Project: /mediapipe/_project.yaml
Book: /mediapipe/_book.yaml

<link rel="stylesheet" href="/mediapipe/site.css">

# Hand gesture recognition model customization guide

<table align="left" class="buttons">
  <td>
    <a href="https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/customization/gesture_recognizer.ipynb" target="_blank">
      <img src="https://developers.google.com/static/mediapipe/solutions/customization/colab-logo-32px_1920.png" alt="Colab logo"> Run in Colab
    </a>
  </td>

  <td>
    <a href="https://github.com/googlesamples/mediapipe/blob/main/examples/customization/gesture_recognizer.ipynb" target="_blank">
      <img src="https://developers.google.com/static/mediapipe/solutions/customization/github-logo-32px_1920.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>

In [1]:
#@title License information
# Copyright 2023 The MediaPipe Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
#
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

The MediaPipe Model Maker package is a low-code solution for customizing on-device machine learning (ML) models.

This notebook shows the end-to-end process of customizing a gesture recognizer model for recognizing some common hand gestures in the [HaGRID](https://www.kaggle.com/datasets/innominate817/hagrid-sample-30k-384p) dataset.

## Prerequisites

Install the MediaPipe Model Maker package.

In [2]:
%pip install --upgrade pip
%pip install mediapipe-model-maker

Note: you may need to restart the kernel to use updated packages.


Import the required libraries.

In [3]:
# from google.colab import files
import os
import tensorflow as tf
assert tf.__version__.startswith('2')

from mediapipe_model_maker import gesture_recognizer

import matplotlib.pyplot as plt


## Simple End-to-End Example

This end-to-end example uses Model Maker to customize a model for on-device gesture recognition.

### Get the dataset

Model Maker expects gesture recognition datasets in the following layout: `<dataset_path>/<label_name>/<img_name>.*`. In addition, one of the label names (`label_names`) must be `none`. The `none` label represents any gesture that isn't classified as one of the other gestures.
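The layout rule above can be checked before any training starts. The following is a minimal sketch using only the standard library; the helper name `find_layout_problems` is hypothetical and not part of Model Maker, and it checks only the two properties stated above (one folder per label, and a `none` folder).

```python
import os
import tempfile


def find_layout_problems(dataset_path):
    """Return a list of human-readable problems with the folder layout."""
    problems = []
    # Every sub-directory of dataset_path is treated as one label.
    labels = sorted(
        name for name in os.listdir(dataset_path)
        if os.path.isdir(os.path.join(dataset_path, name))
    )
    if "none" not in labels:
        problems.append("missing required 'none' label folder")
    for label in labels:
        label_dir = os.path.join(dataset_path, label)
        files = [f for f in os.listdir(label_dir)
                 if os.path.isfile(os.path.join(label_dir, f))]
        if not files:
            problems.append(f"label folder '{label}' contains no files")
    return problems


# Build a tiny throwaway dataset to demonstrate the check.
with tempfile.TemporaryDirectory() as root:
    for label in ("none", "rock"):
        os.makedirs(os.path.join(root, label))
        open(os.path.join(root, label, "0.jpg"), "wb").close()
    os.makedirs(os.path.join(root, "paper"))  # left empty on purpose
    demo_problems = find_layout_problems(root)

print(demo_problems)  # ["label folder 'paper' contains no files"]
```

An empty list means the folder structure matches the expected layout; it does not guarantee that a hand will be detected in every image.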

This example uses a sample rock-paper-scissors dataset, which is downloaded from Google Cloud Storage (GCS).

In [4]:
!mkdir -p "./.training-data"
!wget https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/rps_data_sample.zip -O "./.training-data/rps_data_sample.zip"
!unzip -o "./.training-data/rps_data_sample.zip" -d "./.training-data"
dataset_path = "./.training-data/rps_data_sample"

--2025-02-13 11:40:34--  https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/rps_data_sample.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 198.18.3.75
Connecting to storage.googleapis.com (storage.googleapis.com)|198.18.3.75|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12332447 (12M) [application/zip]
Saving to: ‘./.training-data/rps_data_sample.zip’


2025-02-13 11:40:36 (9.25 MB/s) - ‘./.training-data/rps_data_sample.zip’ saved [12332447/12332447]

Archive:  ./.training-data/rps_data_sample.zip
  inflating: ./.training-data/rps_data_sample/paper/77.jpg  
  inflating: ./.training-data/rps_data_sample/paper/837.jpg  
  inflating: ./.training-data/rps_data_sample/paper/176.jpg  
  inflating: ./.training-data/rps_data_sample/paper/406.jpg  
  inflating: ./.training-data/rps_data_sample/paper/771.jpg  
  inflating: ./.training-data/rps_data_sample/paper/89.jpg  
  inflating: ./.training-data/rps_data_sample/paper/76.jpg  
  inflating: ./.

Verify the rock paper scissors dataset by printing the labels. There should be 4 gesture labels, with one of them being the `none` gesture.

In [5]:
print(dataset_path)
labels = []
for i in os.listdir(dataset_path):
  if os.path.isdir(os.path.join(dataset_path, i)):
    labels.append(i)
print(labels)

./.training-data/rps_data_sample
['none', 'paper', 'rock', 'scissors']


To better understand the dataset, plot a couple of example images for each gesture.

In [6]:
NUM_EXAMPLES = 5

for label in labels:
  label_dir = os.path.join(dataset_path, label)
  example_filenames = os.listdir(label_dir)[:NUM_EXAMPLES]
  fig, axs = plt.subplots(1, NUM_EXAMPLES, figsize=(10,2))
  for i in range(NUM_EXAMPLES):
    axs[i].imshow(plt.imread(os.path.join(label_dir, example_filenames[i])))
    axs[i].get_xaxis().set_visible(False)
    axs[i].get_yaxis().set_visible(False)
  fig.suptitle(f'Showing {NUM_EXAMPLES} examples for {label}')

plt.show()



### Run the example
The workflow consists of four steps, each in its own code block.

**Load the dataset**

Load the dataset located at `dataset_path` by using the `Dataset.from_folder` method. While loading, the method runs the pre-packaged hand detection model from MediaPipe Hands to detect the hand landmarks in each image. Images without a detected hand are omitted from the dataset. The resulting dataset contains the extracted hand landmark positions for each image rather than the images themselves.

The `HandDataPreprocessingParams` class contains two configurable options for the data loading process:
* `shuffle`: A boolean controlling whether to shuffle the dataset. Defaults to `True`.
* `min_detection_confidence`: A float between 0 and 1 controlling the confidence threshold for hand detection.

Split the dataset: 80% for training, 10% for validation, and 10% for testing.
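The 80/10/10 split is obtained with two consecutive splits: first 80/20, then the remaining 20% in half. The sketch below reproduces that arithmetic on a plain list. It assumes the split point is `int(size * fraction)`, which is an assumption about the library's rounding rather than something this notebook states; the helper `split` is hypothetical. The size 473 is the number of valid hands reported in the log output of the run recorded in this notebook.

```python
def split(items, fraction):
    """Split a list in two; the first part gets int(len(items) * fraction) items."""
    cut = int(len(items) * fraction)  # assumed rounding rule (truncation)
    return items[:cut], items[cut:]


samples = list(range(473))  # stand-in for the 473 loaded hand samples

train, rest = split(samples, 0.8)      # 80% for training
validation, test = split(rest, 0.5)    # remaining 20% split in half

print(len(train), len(validation), len(test))  # 378 47 48
```

Because of truncation, the three parts are not exactly 80%, 10%, and 10% of the data, but together they always cover every sample exactly once.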

In [7]:
data = gesture_recognizer.Dataset.from_folder(
    dirname=dataset_path,
    hparams=gesture_recognizer.HandDataPreprocessingParams()
)
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)

Using existing files at /tmp/model_maker/gesture_recognizer/palm_detection_full.tflite
Using existing files at /tmp/model_maker/gesture_recognizer/hand_landmark_full.tflite
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/paper/339.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/paper/730.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/scissors/162.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/rock/329.jpg



INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/scissors/522.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/rock/593.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/none/1883.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/scissors/476.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/none/1814.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/none/394.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples-1/examples/customization/.training-data/rps_data_sample/scissors/705.jpg
INFO:tensorflow:Loading image /mnt/i/runtime/mediapipe-samples

Using existing files at /tmp/model_maker/gesture_recognizer/gesture_embedder
INFO:tensorflow:Load valid hands with size: 473, num_label: 4, labels: none,paper,rock,scissors.




**Train the model**

Train the custom gesture recognizer by calling the `create` method and passing in the training data, the validation data, and an options object that bundles the model options and hyperparameters. For more information on model options and hyperparameters, see the [Hyperparameters](#hyperparameters) section below.

In [17]:
!mkdir -p "./.model"
hparams = gesture_recognizer.HParams(
    export_dir="./.model/exported_model2",
    learning_rate=0.0005,
    batch_size=4,
    epochs=10, 
    lr_decay=0.99,
)
model_options = gesture_recognizer.ModelOptions(
    layer_widths=[64, 32]  # Add intermediate layers
)
options = gesture_recognizer.GestureRecognizerOptions(
    hparams=hparams,
    model_options=model_options,
)
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 hand_embedding (InputLayer  [(None, 128)]             0         
 )                                                               
                                                                 
 batch_normalization_15 (Ba  (None, 128)               512       
 tchNormalization)                                               
                                                                 
 re_lu_15 (ReLU)             (None, 128)               0         
                                                                 
 dropout_15 (Dropout)        (None, 128)               0         
                                                                 
 custom_gesture_recognizer_  (None, 64)                8256      
 0 (Dense)                                                       
                                                           

INFO:tensorflow:Training the models...


Resuming from ./.model/exported_model2/epoch_models/model-0001
Epoch 1/10



**Evaluate the model performance**

After training the model, evaluate it on a test dataset and print the loss and accuracy metrics.

In [15]:
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss:{loss}, Test accuracy:{acc}")

Test loss:0.21735136210918427, Test accuracy:0.8541666865348816
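As a rough sanity check, accuracy is the fraction of test samples whose predicted label matches the true label. The sketch below uses made-up labels, not the notebook's real predictions, and assumes a test split of 48 samples (about 10% of the 473 loaded hands). Under that assumption, 41 correct predictions give 41/48, which stored as a 32-bit float prints exactly as the accuracy shown above. The helpers `accuracy` and `to_float32` are hypothetical.

```python
import struct


def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def to_float32(value):
    """Round a Python float to the nearest 32-bit float, as a metric tensor would store it."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical labels: 48 test samples, 41 of them predicted correctly.
expected = ["rock"] * 48
predicted = ["rock"] * 41 + ["paper"] * 7

acc = accuracy(predicted, expected)
print(to_float32(acc))  # 0.8541666865348816
```

With a test split this small, a single misclassified image changes the accuracy by about two percentage points, so small differences between runs should not be over-interpreted.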




**Export to TensorFlow Lite model**

After creating the model, convert and export it to the TensorFlow Lite model format for later use in an on-device application. The export also includes model metadata, which contains the label file.

In [18]:
model.export_model()
# List the exported files. The path must match `export_dir` in `HParams`.
!ls "./.model/exported_model2"

Using existing files at /tmp/model_maker/gesture_recognizer/gesture_embedder.tflite
Using existing files at /tmp/model_maker/gesture_recognizer/palm_detection_full.tflite
Using existing files at /tmp/model_maker/gesture_recognizer/hand_landmark_full.tflite
Using existing files at /tmp/model_maker/gesture_recognizer/canned_gesture_classifier.tflite
INFO:tensorflow:Assets written to: /tmp/tmpnm2fi12k/saved_model/assets


In [None]:
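# In Colab, uncomment this line (and the `google.colab` import above) to download the model.
# files.download('./.model/exported_model2/gesture_recognizer.task')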

## Run the model on-device

To run the exported model on-device through MediaPipe Tasks, refer to the Gesture Recognizer [overview page](https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer).

## Hyperparameters {:#hyperparameters}


You can further customize the model using the `GestureRecognizerOptions` class, which has two optional parameters for `ModelOptions` and `HParams`. Use the `ModelOptions` class to customize parameters related to the model itself, and the `HParams` class to customize other parameters related to training and saving the model.

`ModelOptions` has two customizable parameters that affect accuracy:
* `dropout_rate`: The fraction of the input units to drop. Used in the dropout layer. Defaults to 0.05.
* `layer_widths`: A list of hidden layer widths for the gesture model. Each element in the list creates a new hidden layer with the specified width. The hidden layers are separated with BatchNorm, Dropout, and ReLU. Defaults to an empty list (no hidden layers).
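To get a feel for how `layer_widths` changes the model size, the sketch below counts parameters for the classifier head. It assumes the pattern visible in the model summary earlier in this notebook: a 128-dimensional hand embedding as input, and each dense layer preceded by a batch normalization layer (ReLU and dropout add no parameters). The first two counts match that summary (512 and 8256); the remaining ones extrapolate the same pattern and are assumptions, since the recorded summary is cut off. The function names are hypothetical.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def batch_norm_params(features):
    """Gamma, beta, moving mean, and moving variance per feature."""
    return 4 * features


def head_param_counts(embedding_size, layer_widths, num_classes):
    """List (layer kind, parameter count) for hidden layers plus the output layer."""
    counts = []
    width = embedding_size
    for units in list(layer_widths) + [num_classes]:
        counts.append(("batch_norm", batch_norm_params(width)))
        counts.append(("dense", dense_params(width, units)))
        width = units
    return counts


counts = head_param_counts(embedding_size=128, layer_widths=[64, 32], num_classes=4)
for kind, params in counts:
    print(f"{kind:>10}: {params}")
print("total:", sum(params for _, params in counts))
```

Even with two hidden layers the head stays small, which is why training on landmark embeddings is fast compared with training on raw images.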

`HParams` has the following list of customizable parameters which affect model accuracy:
* `learning_rate`: The learning rate to use for gradient descent training. Defaults to 0.001.
* `batch_size`: Batch size for training. Defaults to 2.
* `epochs`: Number of training iterations over the dataset. Defaults to 10.
* `steps_per_epoch`: An optional integer that indicates the number of training steps per epoch. If not set, the training pipeline calculates the default steps per epoch as the training dataset size divided by batch size.
* `shuffle`: Whether to shuffle the dataset before training. Defaults to False.
* `lr_decay`: Learning rate decay to use for gradient descent training. Defaults to 0.99.
* `gamma`: Gamma parameter for focal loss. Defaults to 2.
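The following sketch illustrates how three of these hyperparameters interact, in plain Python. It rests on assumptions that this guide does not confirm: that the default steps per epoch use integer division, that `lr_decay` is applied once per epoch as an exponential factor, and that the focal loss takes the standard form `-(1 - p) ** gamma * log(p)` for the probability `p` assigned to the true class. All function names are hypothetical.

```python
import math


def steps_per_epoch(num_train_samples, batch_size):
    """Default number of steps: dataset size divided by batch size (assumed floor)."""
    return num_train_samples // batch_size


def decayed_learning_rate(initial_lr, lr_decay, epoch):
    """Learning rate after `epoch` completed epochs (assumed per-epoch exponential decay)."""
    return initial_lr * lr_decay ** epoch


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss for one sample, given the probability of the true class."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


print(steps_per_epoch(378, 4))  # 94

for epoch in (0, 5, 10):
    print(epoch, decayed_learning_rate(0.001, 0.99, epoch))

# A confident correct prediction is down-weighted much more than an uncertain one.
for p in (0.5, 0.9):
    print(p, "cross-entropy:", -math.log(p), "focal:", focal_loss(p))
```

With `gamma` set to 0 the focal loss reduces to ordinary cross-entropy; larger values shift the training signal toward samples the model still gets wrong.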

One additional `HParams` parameter does not affect model accuracy:
* `export_dir`: The location of the model checkpoint files and exported model files.

For example, the following trains a new model with a `dropout_rate` of 0.2 and a `learning_rate` of 0.003.

In [None]:
hparams = gesture_recognizer.HParams(learning_rate=0.003, export_dir="exported_model_2")
model_options = gesture_recognizer.ModelOptions(dropout_rate=0.2)
options = gesture_recognizer.GestureRecognizerOptions(model_options=model_options, hparams=hparams)
model_2 = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

Evaluate the newly trained model.

In [None]:
loss, accuracy = model_2.evaluate(test_data)
print(f"Test loss:{loss}, Test accuracy:{accuracy}")