# Recognition model of fingerspelling signs in Estonian sign language with Mediapipe and Tensorflow
It is recommended to run this notebook in Google Colab, because some of the Python packages used are deprecated and have developed dependency conflicts with Tensorflow. To run it on your own system, a Linux-based OS and a Python version between 3.8 and 3.10 are recommended. The specific versions of Tensorflow and mediapipe-model-maker may need to be adjusted until the dependencies install without conflicts.

The default dataset, along with documentation and an interactive web app, can be found at [this repo](https://github.com/Karl-Kristjan-Puusepp/EstonianFingerspellingSigns).

This notebook will guide you through:
1. Importing and preparing the default dataset
2. Hand landmark recognition
3. Training, evaluating and exporting the model

This notebook is largely based on [this](https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/customization/gesture_recognizer.ipynb#scrollTo=JO1GUwC1_T2x) Google Mediapipe example notebook, which has been adapted for the current use case.

In [None]:
#@title License information
# Copyright 2023 The MediaPipe Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
#
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<a id="1"></a>
## 1. Importing and preparing the default dataset
First we install the necessary libraries (approx. runtime 2 min)

In [None]:
!pip install --upgrade pip
!pip install -q mediapipe-model-maker # On Mac systems, a different distribution of Tensorflow is required to be preinstalled
!pip install Pillow


We then clone the repo containing the Estonian sign language fingerspelling signs into our project folder.

(approx. runtime 2 min)

In [None]:
!git clone https://github.com/Karl-Kristjan-Puusepp/EstonianFingerspellingSigns.git

We can check that the dataset has been imported correctly by listing its labels, which are the names of its subfolders. The image directory is set in the `dataset_path` variable in the cell below; to use a different dataset, change that variable. (NOTE: the dataset must include a 'none' folder. This is a requirement of mediapipe_model_maker.)

The cell also contains the path of the reduced dataset used for hyperparameter tuning; it is commented out by default.


In [None]:
from google.colab import files # Comment out if running locally
import os
import tensorflow as tf
assert tf.__version__.startswith('2')

from mediapipe_model_maker import gesture_recognizer
import matplotlib.pyplot as plt

dataset_path = "EstonianFingerspellingSigns/data/oneHandedGestures" # change if you are using a different dataset

# This dataset is only used for hyperparameter tuning and can be omitted
# reduced_dataset_path = "EstonianFingerspellingSigns/data/oneHandedGesturesCroppedReduced"
print(dataset_path)
labels = []
for i in os.listdir(dataset_path):
  if os.path.isdir(os.path.join(dataset_path, i)):
    labels.append(i)
print(labels)
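
Because the model maker requires a 'none' folder, it can help to check for it before starting the slow preprocessing step. The sketch below counts the image files in each label folder and reports whether 'none' is present. The helper names `count_images_per_label` and `has_none_label` and the list of extensions are assumptions made for illustration; they are not part of the original notebook or of mediapipe_model_maker.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')  # assumed set of image types


def count_images_per_label(dataset_path):
    """Return {label: number of image files} for every subfolder of dataset_path."""
    counts = {}
    for entry in sorted(os.listdir(dataset_path)):
        label_dir = os.path.join(dataset_path, entry)
        if not os.path.isdir(label_dir):
            continue  # ignore loose files next to the label folders
        counts[entry] = sum(
            1 for name in os.listdir(label_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


def has_none_label(counts):
    """True if the 'none' folder required by the model maker exists."""
    return 'none' in counts
```

For example, `counts = count_images_per_label(dataset_path)` followed by `assert has_none_label(counts)` would stop the notebook early if the folder is missing, and printing `counts` shows how balanced the labels are.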

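To save space, the dataset currently contains only completely unique images. This means the number of left- and right-handed gestures is unbalanced. To account for this, we mirror every image and save the mirrored copy next to the original, with `_m` appended to the filename. The same can be done for the reduced dataset by uncommenting the last line of the cell. Run this cell only once per fresh clone: a second run would also mirror the `_m` copies.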

(Runtime: Approx 20 sec)

In [None]:
from PIL import Image
import os

def flip_and_save_images(data_folder):
    for root, dirs, files in os.walk(data_folder):
        for file in files:
            # Check if the file is an image (you can customize the list of valid extensions)
            if file.lower().endswith(('.png', '.jpg', '.jpeg')):
                image_path = os.path.join(root, file)
                original_image = Image.open(image_path)

                # Flip
                flipped_image = original_image.transpose(Image.FLIP_LEFT_RIGHT)

                # Append "_m" to the original filename (before the file extension)
                new_filename = os.path.splitext(file)[0] + "_m" + os.path.splitext(file)[1]

                save_path = os.path.join(root, new_filename)

                # Save the flipped image
                flipped_image.save(save_path)
                #print(f"Flipped image {file}")
        print(f"Label {root} done")

flip_and_save_images(dataset_path)
# flip_and_save_images(reduced_dataset_path)
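
To confirm that the mirroring step covered every image, the filenames can be compared without opening any files. The sketch below lists images that have no `_m` copy beside them. The helper name `find_unmirrored` is hypothetical, and the check assumes that no original filename already ends in `_m`.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')
MIRROR_SUFFIX = '_m'  # suffix appended by flip_and_save_images


def find_unmirrored(data_folder):
    """Return paths of images that have no '<name>_m<ext>' copy in the same folder."""
    missing = []
    for root, _dirs, names in os.walk(data_folder):
        present = set(names)
        for name in sorted(names):
            stem, ext = os.path.splitext(name)
            if ext.lower() not in IMAGE_EXTENSIONS:
                continue  # not an image
            if stem.endswith(MIRROR_SUFFIX):
                continue  # assumed to be a mirrored copy already
            if stem + MIRROR_SUFFIX + ext not in present:
                missing.append(os.path.join(root, name))
    return missing
```

After a successful run of the cell above, `find_unmirrored(dataset_path)` should return an empty list.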



## 2. Hand landmark recognition
In this step we turn the images into normalised data that can be fed into a machine learning model. For this we use Mediapipe hand landmark detection, which takes an image and returns a set of 21 landmarks in 3D space, each corresponding to a keypoint or joint of the hand. The default dataset has been culled so that the gesture_recognizer finds a hand in every image.

The code below goes through every image in the dataset and runs the landmark detection. This takes a while: approx. 1 minute per 900 images, scaling linearly with the number of images. For the default dataset, the expected runtime is around 14 minutes.
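
The runtime figures quoted here follow a simple linear rule, which the sketch below makes explicit. The rate of 900 images per minute is the rough figure from the text, not a measured constant, and the function names are made up for this example. At that rate, a 14 minute run corresponds to about 12,600 images.

```python
IMAGES_PER_MINUTE = 900  # rough throughput quoted in the text


def estimate_preprocessing_minutes(num_images, images_per_minute=IMAGES_PER_MINUTE):
    """Linear runtime estimate for the landmark detection step, in minutes."""
    return num_images / images_per_minute


def images_for_runtime(minutes, images_per_minute=IMAGES_PER_MINUTE):
    """Inverse of the estimate: how many images fit into a given runtime."""
    return round(minutes * images_per_minute)


print(estimate_preprocessing_minutes(4500))  # 5.0
print(images_for_runtime(14))                # 12600
```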





In [None]:
data = gesture_recognizer.Dataset.from_folder(
    dirname=dataset_path,
    hparams=gesture_recognizer.HandDataPreprocessingParams()
)
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)
# Optional: preprocess the reduced dataset used for hyperparameter tuning.
# reduced_data = gesture_recognizer.Dataset.from_folder(
#     dirname=reduced_dataset_path,
#     hparams=gesture_recognizer.HandDataPreprocessingParams()
# )
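
The two `split` calls in the cell above produce roughly an 80/10/10 partition: 80% of the samples go to training, and the remaining 20% is halved into validation and test sets. The sketch below reproduces only the arithmetic on a plain Python list. `split_fraction` is a hypothetical stand-in, not the model maker implementation, and it assumes the cut point is `int(len * fraction)`.

```python
def split_fraction(items, fraction):
    """Split a list into a leading `fraction` part and the remainder."""
    cut = int(len(items) * fraction)
    return items[:cut], items[cut:]


samples = list(range(1000))  # stand-in for 1000 preprocessed images
train, rest = split_fraction(samples, 0.8)
validation, test = split_fraction(rest, 0.5)
print(len(train), len(validation), len(test))  # 800 100 100
```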

## 3. Training, evaluating and exporting a machine learning model

Here we perform a simplified grid search over the training hyperparameters to find models that deliver good accuracy without overfitting the data. The measurements are saved to a pandas dataframe and exported to a CSV file. Depending on the ranges of values searched, the runtime can be anywhere between 1 and 40 h, so this step is commented out and may be skipped in favor of the next cell, where the optimal parameters are already inserted.

In [None]:
'''
import pandas as pd
import itertools

# Define the CSV file path
csv_path = "gridsearch.csv"

# Create a list to store results for DataFrame
results_list = []

# Define the hyperparameter ranges
epoch_values = [15, 20]
batch_size_values = [4, 8]
dropout_rate_values = [0.05]
layer_widths_values = [[]]

# Perform grid search
for epochs, batch_size, dropout_rate, layer_widths in itertools.product(epoch_values, batch_size_values, dropout_rate_values, layer_widths_values):
    hparams = gesture_recognizer.HParams(epochs=epochs, export_dir="exported_model", batch_size=batch_size)
    model_options = gesture_recognizer.ModelOptions(dropout_rate=dropout_rate, layer_widths=layer_widths)
    options = gesture_recognizer.GestureRecognizerOptions(model_options=model_options, hparams=hparams)

    # Create model with current hyperparameters
    model = gesture_recognizer.GestureRecognizer.create(
        train_data=train_data,
        validation_data=validation_data,
        options=options
    )

    loss, acc = model.evaluate(test_data, batch_size=1)
    print(f"Test loss: {loss}, Test accuracy: {acc}")

    # Append results to list for DataFrame
    results_list.append([epochs, batch_size, dropout_rate, layer_widths, loss, acc])

# Create a DataFrame from the results list
columns = ["Epochs", "Batch Size", "Dropout Rate", "Layer Widths", "Loss", "Accuracy"]
results_df = pd.DataFrame(results_list, columns=columns)

# Save DataFrame to CSV
results_df.to_csv(csv_path, index=False)

# Display the DataFrame
print(results_df)
'''
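
The number of training runs in a grid search is the product of the lengths of the value lists, which is why widening the ranges quickly increases the runtime. The sketch below expands a search space into individual configurations without training anything. `build_grid` is a hypothetical helper, and the values mirror the ranges in the cell above.

```python
import itertools


def build_grid(search_space):
    """Expand {name: [values]} into a list of {name: value} configurations."""
    names = list(search_space)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(search_space[n] for n in names))
    ]


search_space = {
    "epochs": [15, 20],
    "batch_size": [4, 8],
    "dropout_rate": [0.05],
    "layer_widths": [[]],
}
grid = build_grid(search_space)
print(len(grid))  # 4 training runs for these ranges
print(grid[0])    # {'epochs': 15, 'batch_size': 4, 'dropout_rate': 0.05, 'layer_widths': []}
```

Adding a second dropout rate and a second layer configuration would double the count twice, to 16 runs.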

Training the model with the optimal parameters. Approx. runtime: 3 min

In [None]:
hparams = gesture_recognizer.HParams(epochs=12, export_dir="exported_model", batch_size=16)
model_options = gesture_recognizer.ModelOptions(dropout_rate=0.2, layer_widths=[])
options = gesture_recognizer.GestureRecognizerOptions(model_options=model_options, hparams=hparams)
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

Evaluating the model

In [None]:
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss: {loss}, Test accuracy: {acc}")

Exporting and downloading the model

In [None]:
model.export_model()

In [16]:
files.download('exported_model/gesture_recognizer.task')
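
Once exported, the `.task` file can be loaded with the Mediapipe Tasks gesture recognizer to classify a single image. The following is a minimal sketch, not part of the original notebook: it assumes the `mediapipe` package is installed, that the first entry of `result.gestures[0]` is the highest-scoring category, and the helper names `top_gesture` and `recognize_file` are made up for this example.

```python
def top_gesture(result):
    """Return (label, score) for the first detected hand, or None if no hand was found."""
    if not result.gestures:
        return None
    best = result.gestures[0][0]  # assumed to be the highest-scoring category
    return best.category_name, best.score


def recognize_file(model_path, image_path):
    """Run an exported .task model on one image file (requires the mediapipe package)."""
    # Imported lazily so that top_gesture can be used without mediapipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.GestureRecognizerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path)
    )
    with vision.GestureRecognizer.create_from_options(options) as recognizer:
        image = mp.Image.create_from_file(image_path)
        return top_gesture(recognizer.recognize(image))
```

A call such as `recognize_file('exported_model/gesture_recognizer.task', 'my_sign.jpg')` would then return the predicted label and its score; `my_sign.jpg` is a placeholder for any image containing a hand.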
