# Building a TensorFlow Lite based computer vision emoji input device with OpenMV

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/notebook.ipynb)


```
# SPDX-FileCopyrightText: Copyright 2022 Arm Limited and/or its affiliates <open-source-office@arm.com>
# SPDX-License-Identifier: MIT
```

<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/00.demo.gif?raw=1" alt="demo" style="width: 500px"/>


## Introduction

Emojis allow us to express emotions in the digital world. They are relatively easy to input on smartphone and tablet devices equipped with touch screen based virtual keyboards, but they are not as easy to input on traditional computing devices that have physical keyboards. To input emojis on these devices, users typically use a keyboard shortcut or mouse to bring up an on-screen emoji selector, and then use a mouse to select the desired emoji from a series of categories.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/01.on-screen-emoji-input-widget-on-macos.png?raw=1" alt="On-screen emoji input widget on macOS" style="display: block; margin-left: auto; margin-right: auto; width: 150px;">
<figcaption style="text-align: center"><i>On-screen emoji input widget on macOS</i></figcaption>
</figure>

This guide will explore using tinyML on an [Arm Cortex-M](https://developer.arm.com/ip-products/processors/cortex-m/) based device to create a <u>**dedicated**</u> input device. This device will take real-time input from a camera and apply a machine learning (ML) image classification model to detect if the image from the camera contains one of a set of known hand gestures (✋, 👎, 👍, 👊). When a hand gesture is detected with **high** certainty, the device will then use the [USB Human Interface Device (HID) protocol](https://en.wikipedia.org/wiki/USB_human_interface_device_class) to “type” the emoji on the PC.

The [TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers) run-time with [Arm CMSIS-NN](https://arm-software.github.io/CMSIS_5/NN/html/index.html) will be used as the on-device ML inferencing framework on the dedicated input device. On-device inferencing will allow us to <u>reduce</u> the latency of the system, as the image data will be processed at the source (instead of being transmitted to a cloud service). The user’s privacy will also be preserved, as no image data will leave the device at inference time.

All technical assets for this guide can be found on [GitHub](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv).

### Microcontrollers and Keyboards

Microcontroller Units (MCUs) are self-contained computing systems embedded in the devices you use every day, including your keyboard! Like all computing systems, they have inputs and outputs.

The MCU inside a USB keyboard reacts to the digital events that occur when one or more of the key switches on the keyboard are pressed or released. The MCU determines which key(s) triggered the event and then translates the event into a USB HID message to send to the PC using the USB standard.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/02.block-diagram-of-usb-keyboard.png?raw=1" alt="Block diagram of USB keyboard" style="display: block; margin-left: auto; margin-right: auto; width: 500px;">
<figcaption style="text-align: center"><i>Block diagram of USB keyboard</i></figcaption>
</figure>

Keyboard specific USB HID messages have a fixed length of 8 bytes. The first byte is composed of the status of the modifier keys (control, shift, alt) and after a padding byte, the remaining bytes indicate which keys are currently pressed.

<figure >
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/03.wireshark-capture-of-usb-hid-device.png?raw=1" alt="Wireshark capture of USB HID device sending CTRL+SHIFT+U key sequence" style="display: block; margin-left: auto; margin-right: auto; width: 500px">
<figcaption style="text-align: center"><i>Wireshark capture of USB HID device sending CTRL+SHIFT+U key sequence</i></figcaption>
</figure>
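
To make the report layout concrete, here is a minimal Python sketch that packs the 8-byte keyboard report for the CTRL+SHIFT+U sequence shown in the capture above. It assumes the left-hand Control and Shift keys and uses the modifier bit masks and key code from the USB HID usage tables for a standard keyboard report. The helper name `keyboard_report` is hypothetical and is not part of the OpenMV firmware.

```python
# Modifier bits (byte 0 of the report), per the USB HID keyboard usage tables.
MOD_LEFT_CTRL = 0x01
MOD_LEFT_SHIFT = 0x02
MOD_LEFT_ALT = 0x04

# HID usage ID for the "u" key (the letters a..z map to 0x04..0x1D).
KEY_U = 0x18


def keyboard_report(modifiers=0, keys=()):
    """Pack an 8-byte report: modifiers, padding byte, then up to 6 key codes."""
    if len(keys) > 6:
        raise ValueError("a keyboard report can carry at most 6 key codes")
    key_bytes = bytes(keys) + bytes(6 - len(keys))  # zero-fill unused key slots
    return bytes([modifiers, 0x00]) + key_bytes


# CTRL+SHIFT+U pressed ...
press = keyboard_report(MOD_LEFT_CTRL | MOD_LEFT_SHIFT, [KEY_U])
# ... followed by an all-zero report to signal that every key was released.
release = keyboard_report()

print(press.hex(" "))
print(release.hex(" "))
```

A report with no modifiers and no key codes (all zeros) tells the host that every key has been released, which is why a key press is normally followed by such a report.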

The emoji ‘keyboard’ will use an image sensor for input (instead of key switches) and then process the image data locally on a more powerful [Arm Cortex-M7](https://developer.arm.com/Processors/Cortex-M7) based microcontroller. All operations, including ML inferencing, are performed on an [STM32H7 MCU](https://www.st.com/en/microcontrollers-microprocessors/stm32h743vi.html), which contains an Arm Cortex-M7 CPU along with a digital interface for the image sensor and USB communications.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/04.block-diagram-of-computer-vision-based-emoji-keyboard.png?raw=1"  alt="Block diagram of computer vision based emoji keyboard" style="display: block; margin-left: auto; margin-right: auto; width: 500px;">
<figcaption style="text-align: center"><i>Block diagram of computer vision based emoji “keyboard”</i></figcaption>
</figure>

Even though the STM32H7 is a constrained computing platform that runs at 480 MHz with 1 MB of on-board RAM, we can still process a grayscale 96x96 pixel image input from the camera at just under 20 frames per second (fps)!


### The OpenMV development platform

[OpenMV](https://openmv.io) is an open source (Micro) Python powered Machine Vision platform. The [OpenMV product line-up](https://openmv.io/collections/cams) consists of several Arm Cortex-M based development boards. Each board is equipped with an on-board camera and MCU.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/05.screenshot-of-openmv-camera-products-page.png?raw=1" alt="Screenshot of OpenMV camera products page" style="display: block; margin-left: auto; margin-right: auto; width: 400px;">
<figcaption style="text-align: center"><i>Screenshot of <a href="https://openmv.io/collections/cams">OpenMV camera products page</a></i></figcaption>
</figure>

The development boards can be used in conjunction with the [OpenMV IDE](https://openmv.io/pages/download) to develop machine vision applications. The [OpenMV run-time](https://github.com/openmv/micropython) is based on [MicroPython](https://micropython.org/), which is an implementation of the Python 3 programming language that runs on several Arm Cortex-M based microcontrollers (MCUs).

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/06.screenshot-of-openmv-ide.png?raw=1" alt="Screenshot of OpenMV IDE" style="display: block; margin-left: auto; margin-right: auto; width: 500px;">
<figcaption style="text-align: center"><i>Screenshot of OpenMV IDE</i></figcaption>
</figure>

For this project, the [OpenMV Cam H7](https://openmv.io/products/openmv-cam-h7) or [OpenMV Cam H7 R2](https://openmv.io/collections/products/products/openmv-cam-h7-r2) board will suit our needs. Both boards are based on the STM32H7 MCU. The updated R2 revision uses an [MT9M114](https://www.onsemi.com/products/sensors/image-sensors/mt9m114) image sensor instead of the [OV7725](https://www.ovt.com/sensor/ov7725/) image sensor that is used in the original version. Both versions will work well for this project, although, as mentioned in the [OpenMV “Production Update” blog from June 2021](https://openmv.io/blogs/news/production-update), the MT9M114 sensor offers improved image quality over the OV7725 sensor.

### What we will need

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/07.openmv-cam-h7-camera-and-microsd-card.jpg?raw=1" alt="OpenMV Cam H7 Camera (left) and microSD card (right)" style="display: block; margin-left: auto; margin-right: auto; width: 300px;">
<figcaption style="text-align: center"><i>OpenMV Cam H7 Camera (left) and microSD card (right)</i></figcaption>
</figure>

* Hardware
  * [OpenMV Cam H7](https://openmv.io/products/openmv-cam-h7) or [OpenMV Cam H7 R2](https://openmv.io/collections/products/products/openmv-cam-h7-r2) board
  * MicroSD card with at least 2 MB of storage space (to store ML model)
  * USB micro cable
* Software
  * [OpenMV IDE](https://openmv.io/pages/download)
* Services
  * [Google Colab](https://colab.research.google.com)
  * [Kaggle Account](https://www.kaggle.com)

## Dataset

Production grade ML models are typically trained on thousands of hours of human labeled data. It would be very time consuming to collect thousands of hours of training data ourselves for this project. However, we can leverage an existing public dataset that contains 10k+ images.

[Kaggle](https://www.kaggle.com) user [Sparsh Gupta (@imsparsh)](https://www.kaggle.com/imsparsh) has previously curated and shared an excellent [Gesture Recognition dataset](https://www.kaggle.com/datasets/imsparsh/gesture-recognition) and made it publicly available on [Kaggle under a permissive CC0 1.0 Universal (CC0 1.0) Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).

The dataset contains ~23k image files of people performing the following hand gestures over a 30 second period:

1. Left hand swipe
2. Right hand swipe
3. Stop
4. Thumbs down
5. Thumbs up

A [Kaggle account](https://www.kaggle.com) is needed to download the dataset via the [*kaggle* CLI tool](https://github.com/Kaggle/kaggle-api). Follow the instructions in [the “Authentication” section of the “How to Use Kaggle” guide](https://www.kaggle.com/docs/api#getting-started-installation-&-authentication) to download your account specific `kaggle.json` file, which contains your Kaggle username and API key, and place it in the correct location for the *kaggle* CLI tool to access.
 
The Kaggle CLI can be installed via `pip`:


In [None]:
%pip install kaggle

Then run the code cell below and upload your `kaggle.json` file:

In [None]:
import os
import shutil

import google.colab as colab

kaggle_json = 'kaggle.json'

print(f"Please upload your '{kaggle_json}' file:")
uploaded = colab.files.upload()

if kaggle_json not in uploaded:
  raise Exception(f"{kaggle_json} file was NOT uploaded!")

dot_kaggle_path = os.path.join(
    os.path.expanduser('~'),
    '.kaggle'
)
kaggle_json_path = os.path.join(dot_kaggle_path, kaggle_json)

print(f'Moving kaggle.json to {kaggle_json_path}')
os.makedirs(dot_kaggle_path, exist_ok=True)
shutil.move(kaggle_json, kaggle_json_path)
os.chmod(kaggle_json_path, 0o600)

Once your Kaggle authentication has been set up, the dataset can be downloaded using:

In [None]:
%%shell

kaggle datasets download --unzip --path dataset_raw imsparsh/gesture-recognition

### Inspect Dataset

Install the `pandas` and `matplotlib` libraries:

In [None]:
%pip install pandas matplotlib

Load `train.csv` with `pandas`:

In [None]:
import os

import pandas as pd

train_csv_file = os.path.join('dataset_raw', 'train.csv')
train_df = pd.read_csv(train_csv_file, sep=';', names=['folder', 'name', 'label'])
train_df = train_df.drop(['name'], axis=1)

train_df.head(-1)

Define function to get image paths for a folder in the dataset:

In [None]:
import os

def image_paths_for_train_folder(folder):
  folder_path = os.path.join('dataset_raw', 'train', folder)
  files = os.listdir(folder_path)

  return [ os.path.join(folder_path, file) for file in files ]

Display some of the images in the dataset:

In [None]:
import random

random.seed(42)

In [None]:
import matplotlib
import matplotlib.pyplot as plt 

fig, ax = plt.subplots(3, 3, figsize=(16, 12))
fig.tight_layout(pad=0)

for i in range(len(ax)):
  for j in range(len(ax[0])):
    df_index = random.randrange(len(train_df))

    folder = train_df['folder'][df_index]

    image_paths = image_paths_for_train_folder(folder)

    image_paths_index = random.randrange(len(image_paths))
    im = matplotlib.image.imread(image_paths[image_paths_index])

    ax[i, j].imshow(im)
    ax[i, j].text(0, -5, f"{image_paths[image_paths_index].split(os.path.sep)[-1]}", fontsize=8)
    ax[i, j].axis('off')


### Adapting the dataset

The image classification model we create will classify images into the following categories:

* 🚫 - No gesture
* ✋ - Hand up
* 👎 - Thumbs Down
* 👍 - Thumbs Up
* 👊 - Fist

The swipe right and swipe left gestures in the Kaggle dataset do not correspond to any of these classes, so any images in these classes will need to be discarded for our model.

Since the images in the Kaggle dataset are taken over a 30 second period, they might contain other gestures at the start or end of the series. For example, some of the people in the dataset started with their hands in a fist position before eventually moving to the labeled gesture (hand up, thumbs up, or thumbs down). Other times the person in the dataset starts off with no hand gesture in frame.


Create function to animate images in a folder:

In [None]:
from matplotlib import animation

# enable HTML5 output for matplotlib animations (needed for Colab)
matplotlib.rc('animation', html='html5')

def animate_train_folder(folder, frame_interval=33):
  fig, ax = plt.subplots(1, 1)
  fig.tight_layout(pad=0)
  ax.axis('off')
  plt.close()

  image_paths = image_paths_for_train_folder(folder)

  ims = []

  for image_path in image_paths:
    im = matplotlib.image.imread(image_path)

    ims.append([
                ax.imshow(im)
    ])

  return animation.ArtistAnimation(
      fig,
      ims, 
      interval=frame_interval,
      blit=True
  )

Animate some folders ...

In [None]:
ani1 = animate_train_folder('WIN_20180925_17_31_48_Pro_Stop_new', frame_interval=1000)

ani1.save("ani1.gif", dpi=300, writer=matplotlib.animation.PillowWriter(fps=1))

ani1

In [None]:
ani2 = animate_train_folder('WIN_20180925_17_34_05_Pro_Thumbs_Down_new', frame_interval=1000)

ani2.save("ani2.gif", dpi=300, writer=matplotlib.animation.PillowWriter(fps=1))

ani2

In [None]:
ani3 = animate_train_folder('WIN_20180925_17_41_54_Pro_Thumbs_Up_new', frame_interval=1000)

ani3.save("ani3.gif", dpi=300, writer=matplotlib.animation.PillowWriter(fps=1))

ani3

We’ve gone ahead and manually re-labeled the images into these classes. The result can be found in CSV format in the [data folder on GitHub](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/tree/main/data), and contains labels for ~14k images.

The repository can be cloned:

In [None]:
%%shell

git clone https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv.git
ln -s ml-image-classification-example-for-openmv/* .

## TensorFlow model

We can now use TensorFlow to create and train the image classification model.

Install `tensorflow`:

In [None]:
%pip install tensorflow==2.8.2

### Loading Images
In order to load images from the dataset using the Keras [tf.keras.utils.image_dataset_from_directory(...)](https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory) API, the image files need to be reorganized into the following directory structure:

```
dataset/
    train/
        0/
        1/
        2/
        3/
        4/
    val/
        …
```

The [CSV files](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/tree/main/data) that contain the relabeled class information can be loaded using the [pandas](https://pandas.pydata.org) library and Python code can be run to set up this folder structure for the images.

In [None]:
import os
import shutil

import pandas as pd

for dataset in ['train', 'val']:
  csv_file = os.path.join('data', f'{dataset}.csv')

  df = pd.read_csv(csv_file)

  for index, row in df.iterrows():
    label = row['label']

    target_dir = os.path.join('dataset', dataset, str(label))
    source_path = os.path.join('dataset_raw', row['path'])
    
    os.makedirs(target_dir, exist_ok=True)
    
    shutil.copy2(source_path, target_dir)

The TensorFlow library can be imported and used to set a random seed.

In [None]:
import tensorflow as tf

print(tf.__version__)

In [None]:
SEED = 42

tf.keras.utils.set_random_seed(SEED)

Once the images are in the correct folder structure, the training images can be loaded as a [TensorFlow Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset):

In [None]:
VALIDATION_SPLIT = 0.20

TRAIN_IMAGE_SIZE = (120, 120)
IMAGE_SIZE = (96, 96)

BATCH_SIZE = 16

train_ds = tf.keras.utils.image_dataset_from_directory(
  'dataset/train',
  validation_split=VALIDATION_SPLIT,
  subset='training',
  seed=SEED,
  image_size=TRAIN_IMAGE_SIZE,
  color_mode='grayscale',
  crop_to_aspect_ratio=True,
  label_mode='categorical',
  batch_size=BATCH_SIZE
)

We will reserve 20% of the items in the train folder for the validation dataset. The images will be resized to 120x120 pixels (while preserving their aspect ratio) and converted from the RGB colorspace to grayscale.

The validation and test datasets will re-size the images to 96x96 pixels instead of 120x120. The training dataset uses higher image dimensions to allow for data augmentation steps that will be discussed shortly.


In [None]:
val_ds = tf.keras.utils.image_dataset_from_directory(
  'dataset/train',
  validation_split=VALIDATION_SPLIT,
  subset='validation',
  seed=SEED,
  image_size=IMAGE_SIZE,
  color_mode='grayscale',
  crop_to_aspect_ratio=True,
  label_mode='categorical',
  batch_size=BATCH_SIZE
)

test_ds = tf.keras.utils.image_dataset_from_directory(
  'dataset/val',
  seed=SEED,
  image_size=IMAGE_SIZE,
  color_mode='grayscale',
  crop_to_aspect_ratio=True,
  label_mode='categorical',
  batch_size=BATCH_SIZE
)

The number of classes in the dataset can be determined using:

In [None]:
image_spec, output_spec = train_ds.element_spec

num_classes = output_spec.shape[1]

print('num_classes =', num_classes)

### Model Structure

[MobileNetV1](https://arxiv.org/abs/1704.04861) is a well-known model architecture used for image classification tasks, including the [TensorFlow Lite for Microcontrollers Person detection example](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/person_detection). We’ll train this model architecture on our dataset, with the same alpha (0.25) and image sizes (96x96x1) used in the [Visual Wake Words Dataset](https://arxiv.org/abs/1906.05721) paper.

A MobileNetV1 model is composed of 28 layers, but a single call to the Keras [tf.keras.applications.mobilenet.MobileNet(...)](https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet/MobileNet) API can be used to easily create a MobileNetV1 model for 5 output classes and the desired alpha and input shape values:


In [None]:
ALPHA = 0.25
DROPOUT = 0.10

mobilenet_025_96 = tf.keras.applications.mobilenet.MobileNet(
    input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 1),
    alpha=ALPHA,
    dropout=DROPOUT,
    weights=None,
    pooling='avg',
    classes=num_classes,
)

mobilenet_025_96.summary()

#### OpenMV Compatibility

The MicroPython based firmware used on the OpenMV Cam H7 does not include support for all of the layer types in the MobileNetV1 model we just created using the Keras API. At the time of writing, the latest firmware version, [v4.3.1](https://github.com/openmv/openmv/releases/tag/v4.3.1), only included support for the following layer types:

* Add
* AveragePool2D
* Conv2D
* DepthwiseConv2D
* FullyConnected
* MaxPool2D
* Mean
* Pad
* Reshape
* Shape
* Softmax
* Sub

This was done to decrease the size of the OpenMV run-time to fit within the STM32 H7’s 2 MB of flash memory. More details can be found in the [libtf.cc file on GitHub](https://github.com/openmv/tensorflow-lib/blob/master/libtf.cc).

We’ll need to adapt the model as follows:

* Drop any [ZeroPadding2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding2D) layers
* Modify the [DepthwiseConv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/DepthwiseConv2D) layers to use *‘same’* padding
* Replace [GlobalAveragePooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling2D) layers with [AveragePooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D) layers
* Replace [Reshape](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape) layers with [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layers

This can be done in only ~30 lines of Python code:




In [None]:
# only support some operations: https://github.com/openmv/tensorflow-lib/blob/master/libtf.cc
def modify_mobilenet_for_openmv(mobilenet_model):
  input_type_spec = mobilenet_model.layers[0].input.type_spec

  input = tf.keras.Input(shape=(input_type_spec.shape[1:]))
  output = input

  for layer in mobilenet_model.layers[1:]:
    if (isinstance(layer, tf.keras.layers.ZeroPadding2D)):
      print("dropping ZeroPadding2D", layer.name)
    elif (isinstance(layer, tf.keras.layers.DepthwiseConv2D)) and layer.padding != 'same':
      print("replacing DepthwiseConv2D", layer.name)
      output = tf.keras.layers.DepthwiseConv2D(
            kernel_size=layer.kernel_size,
            strides=layer.strides,
            padding='same',
            depth_multiplier=layer.depth_multiplier,
            use_bias=layer.use_bias,
            name=layer.name
      )(output)
    elif (isinstance(layer, tf.keras.layers.GlobalAveragePooling2D)):
      print("replacing GlobalAveragePooling2D", layer.name)
      output = tf.keras.layers.AveragePooling2D((3, 3), strides=(2, 2), padding='valid')(output)
    elif (isinstance(layer, tf.keras.layers.Reshape)):
      output = tf.keras.layers.Flatten()(output)
    else:
      output = layer(output)

  return tf.keras.Model(input, output)

In [None]:
openmv_mobilenet_025_96 = modify_mobilenet_for_openmv(mobilenet_025_96)

openmv_mobilenet_025_96.summary()

After this adaptation is done, the model structure will also be identical to the TensorFlow Lite Person Detection example model, apart from the number of outputs the model has (5 vs 2) in the final layer.

#### Data Augmentation

In order to avoid the model overfitting the training dataset, we can introduce a data augmentation step to alter input images during training.

We will use the following built-in Keras layers for this:
* [Random flipping](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip)
* [Random rotation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomRotation)
* [Random zooming](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomZoom)
* [Random contrast adjustments](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomContrast)


In [None]:
data_augmentation = tf.keras.Sequential([
  tf.keras.layers.RandomFlip("horizontal", seed=SEED),
  tf.keras.layers.RandomRotation(0.1, seed=SEED),
  tf.keras.layers.RandomZoom(0.1, seed=SEED),
  tf.keras.layers.RandomContrast(0.2, seed=SEED),
])

#### Combine layers

The adapted MobileNetV1 model and data augmentation ([Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)) layer can then be combined as follows:

In [None]:
model = tf.keras.Sequential([
  data_augmentation,
  tf.keras.layers.Resizing(IMAGE_SIZE[0], IMAGE_SIZE[1], crop_to_aspect_ratio=True),
  openmv_mobilenet_025_96
])

A [Resizing](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Resizing) layer was added to resize the output of the data augmentation layer to match the 96x96 input size required by the MobileNetV1 model.

### Train

We are now ready to train the model. To start, we can define two callbacks used during training:

1. A [LearningRateScheduler](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler) callback to exponentially decrease the learning rate after each epoch.

2. A [ModelCheckpoint](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint) callback to save the weights with the best validation loss.

In [None]:
LEARNING_RATE = 0.01
EPOCHS = 20

callbacks = [
  tf.keras.callbacks.LearningRateScheduler(
      schedule=lambda epoch, lr: lr * tf.math.exp(-0.1)
  ),
  tf.keras.callbacks.ModelCheckpoint(
    filepath='/tmp/checkpoint',
    monitor='val_loss',
    verbose=1,
    save_best_only=True,
    mode='min',
    save_weights_only=True,
  )
]
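
To get a feel for what the scheduler does, the learning rate for each epoch can be worked out in plain Python. This is a minimal sketch that mirrors the lambda above without TensorFlow. It assumes that Keras calls the schedule at the start of every epoch (including the first one) with the optimizer's current learning rate, so the value is multiplied by `exp(-0.1)` once per epoch. The `DECAY` and `learning_rates` names are only used for this illustration.

```python
import math

LEARNING_RATE = 0.01  # same starting value as in the cell above
EPOCHS = 20
DECAY = math.exp(-0.1)  # factor applied by the schedule lambda

lr = LEARNING_RATE
learning_rates = []

for epoch in range(EPOCHS):
    # Equivalent of `schedule(epoch, lr)`: shrink the current rate by DECAY.
    lr = lr * DECAY
    learning_rates.append(lr)

for epoch in (0, 9, 19):
    print(f"epoch {epoch:2d}: learning rate = {learning_rates[epoch]:.6f}")
```

Under these assumptions the learning rate used in the final epoch is `LEARNING_RATE * exp(-2.0)`, roughly an order of magnitude smaller than the starting value.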

 
The model can then be compiled with an [Adam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam) optimizer and [CategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy) loss function. The starting learning rate will be set to 0.01 (`LEARNING_RATE`).

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy']
)

Finally, `model.fit(...)` can be called to train on the dataset for 20 epochs. *Alternatively, to save time, you can skip the next 4 code cells and download a pre-trained model.*

In [None]:
history = model.fit(
    train_ds.cache().prefetch(BATCH_SIZE),
    validation_data=val_ds.cache().prefetch(BATCH_SIZE),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=callbacks
)

Once the model has completed training, the best weights found during training for the validation loss can be restored as follows:

In [None]:
model.load_weights('/tmp/checkpoint')

and then the model can be saved:

In [None]:
model.save('model')

and zipped up:

In [None]:
%%shell

zip -r model.zip model/

If needed, the pre-trained model from GitHub can be [downloaded](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/archive/refs/heads/pretrained.zip) and restored by uncommenting and running:

In [None]:
# tf.keras.utils.get_file(
#     fname='model.zip',
#     origin='https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/archive/refs/heads/pretrained.zip',
#     extract=True,
#     cache_subdir='/content'
# )

# model = tf.keras.models.load_model('ml-image-classification-example-for-openmv-pretrained/model')

# openmv_mobilenet_025_96 = model.layers[-1]

### Evaluate

The test dataset can then be used to evaluate performance of the trained model:

In [None]:
model.evaluate(test_ds, batch_size=BATCH_SIZE)

An accuracy of ~0.65 was obtained, which is not great but is good enough to continue.

### Model Uncertainty

[Robert Monarch’s “Human-in-the-Loop Machine Learning” book](https://www.manning.com/books/human-in-the-loop-machine-learning) is an excellent resource for identifying which samples to prioritize for human labeling in human in the loop machine learning systems. Chapters 3 and 4 introduce concepts of “low activation” and “uncertainty sampling” to identify when an ML model is uncertain about its input data. We will try to leverage these concepts at inference time on our model to understand when it is certain about its outputs.

#### The Softmax function and low activation inputs

The model we have trained uses a Softmax function in the final layer, which is defined as:

\begin{align}
    \sigma({z_i}) = \dfrac{e^{z_i}}{\sum_{j} e^{z_j}}
\end{align}

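Because each exponential is divided by the sum of all the exponentials, adding the same constant to every logit leaves the output unchanged, so the Softmax function loses the absolute scale of the logit inputs.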

For example:

1. `softmax([-2, 1, -1, 0])`
2. `softmax([1, 4, 2, 3])`
3. `softmax([11, 14, 12, 13])`
4. `softmax([101, 104, 102, 103])`

all have an output value of `[0.0320586 , 0.64391426, 0.08714432, 0.23688282]` even though the input values differ in magnitude.

When looking at the inputs to the softmax function from the perspective of the output of a hidden layer in a neural network, the first two example inputs would be considered to have “lower activation” compared to the last two example inputs. Having access to the inputs of the softmax activation layer during inference can give us insights into how activated the model’s hidden layer is and help to decide how certain the model’s output is.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/10-hidden-layer-outputs-feeding-into-softmax-layer.png?raw=1" alt="Hidden Layer Outputs feeding into Softmax layer" style="display: block; margin-left: auto; margin-right: auto; width: 600px;">
<figcaption style="text-align: center"><i>Hidden Layer Outputs feeding into Softmax layer</i></figcaption>
</figure>

For example, we can set the minimum threshold for the maximum entry in the hidden layer output to be 5 and mark all model outputs below this criterion as uncertain.
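
The four example inputs and the threshold of 5 can be checked with a few lines of plain Python. This is a minimal sketch rather than the code that runs on the device: the `softmax` helper and the `MIN_ACTIVATION` name are introduced here only for illustration.

```python
import math


def softmax(logits):
    """Softmax over a list of logits (shifted by the max for numerical stability)."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


MIN_ACTIVATION = 5  # example threshold on the largest hidden layer output

examples = [
    [-2, 1, -1, 0],
    [1, 4, 2, 3],
    [11, 14, 12, 13],
    [101, 104, 102, 103],
]

results = []
for logits in examples:
    probabilities = softmax(logits)
    # The softmax output is identical for all four inputs, so the decision
    # has to be made on the raw logits instead.
    is_certain = max(logits) >= MIN_ACTIVATION
    results.append((probabilities, is_certain))
    rounded = [round(p, 4) for p in probabilities]
    print(logits, rounded, "certain" if is_certain else "uncertain (low activation)")
```

With this threshold the first two inputs are flagged as low activation while the last two are not, even though all four produce the same Softmax output.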


#### Uncertainty sampling 

Uncertainty sampling techniques allow you to detect when the model’s output is near a decision boundary. One technique for doing this is called “margin of confidence sampling”, which uses the difference between the top two most confident predictions. This difference can give us some perspective on the certainty of a model's outputs.
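
As a minimal sketch of the idea, the margin of confidence for a single prediction is the largest score minus the second largest. The `margin_of_confidence` helper and the example score lists below are made up for illustration; the cells later in this guide apply the same subtraction to the hidden layer outputs of the model.

```python
def margin_of_confidence(scores):
    """Difference between the two highest scores in a list of class scores."""
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


# One class clearly wins: large margin, far from a decision boundary.
clear = margin_of_confidence([0.03, 0.64, 0.09, 0.24])

# Two classes are almost tied: small margin, close to a decision boundary.
ambiguous = margin_of_confidence([0.05, 0.46, 0.44, 0.05])

print(f"clear prediction margin:     {clear:.2f}")
print(f"ambiguous prediction margin: {ambiguous:.2f}")
```

A small margin means a slight change in the input could flip the predicted class, so such predictions are good candidates to treat as uncertain.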

#### Getting certainty insights

Next we will see how to get the hidden layer outputs in Keras. This can be done by creating a new model with layer 0’s input tensor as the input and the second-to-last layer’s output tensor as the output:

In [None]:
hidden_layer_openmv_mobilenet_025_96 = tf.keras.Model(
    inputs=[
            openmv_mobilenet_025_96.layers[0].input
    ],
    outputs=[
             openmv_mobilenet_025_96.layers[-2].output        
    ]
)

hidden_layer_openmv_mobilenet_025_96.summary()

We can extend this further and create a model with multiple outputs:

1. The original output
2. The output of the hidden layer
3. The predicted label - based on the index with the maximum value
4. The predicted confidence - based on the maximum softmax output value
5. The maximum value of the hidden layer
6. The margin of confidence

In [None]:
model_output = openmv_mobilenet_025_96.output
hidden_layer_output = openmv_mobilenet_025_96.layers[-2].output

predicted_label_output = tf.argmax(model_output, axis=-1)
predicted_confidence = tf.reduce_max(openmv_mobilenet_025_96.output, axis=-1)

sorted_hidden_layer_output = tf.sort(hidden_layer_output, direction='DESCENDING', axis=-1)
max_hidden_layer_output = sorted_hidden_layer_output[:, 0]
margin_of_confidence_output = tf.math.subtract(sorted_hidden_layer_output[:, 0], sorted_hidden_layer_output[: ,1])

model_with_certainty = tf.keras.Model(
    inputs=[
            openmv_mobilenet_025_96.layers[0].input
    ],
    outputs=[
             model_output,
             hidden_layer_output,
             predicted_label_output,
             predicted_confidence,
             max_hidden_layer_output,
             margin_of_confidence_output
    ]
)

We can load some images and compare the various outputs of the model.

In [None]:
test_csv_file = os.path.join('data', 'val.csv')
test_df = pd.read_csv(test_csv_file)


test_df['path'] = test_df['path'].apply(lambda p: os.path.join('dataset_raw', p))

test_df.head()

In [None]:
test_paths_ds = tf.data.Dataset.from_tensor_slices(test_df['path'])

def decode_img(p):
  img = tf.io.read_file(p)
  img = tf.io.decode_image(img)
  img = tf.image.rgb_to_grayscale(img)
  img = tf.image.resize_with_crop_or_pad(img, IMAGE_SIZE[0], IMAGE_SIZE[1])

  return img

test_images_ds = test_paths_ds.map(decode_img)

p = model_with_certainty.predict(test_images_ds.batch(len(test_df)))


In [None]:
test_df['predicted_label'] = p[2]
test_df['predicted_confidence'] = p[3]
test_df['hidden_layer_max'] = p[4]
test_df['margin_of_confidence'] = p[5]

In [None]:
import IPython
import numpy as np

def display_test_df_row(display_index):
  display(IPython.display.Image(test_df['path'][display_index]))

  display(IPython.display.Markdown(f'''
  | | |
  | ---------------------------------- | ------------------------------------------------ |
  | **Actual label**                   | {test_df['label'][display_index]}                |
  | **Predicted label**                | {test_df['predicted_label'][display_index]}      |
  | **Softmax output**                 | {np.around(p[0][display_index], 3)}                           |
  | **Hidden output**                  | {np.around(p[1][display_index], 6)}                           |
  | **Predicted confidence**           | {test_df['predicted_confidence'][display_index]} |
  | **Maximum output of hidden layer** | {test_df['hidden_layer_max'][display_index]}     |
  | **Margin of confidence**           | {test_df['margin_of_confidence'][display_index]} |
  '''))

In [None]:
lowest_margin_of_confidence_index = test_df.sort_values('margin_of_confidence', ascending=True).index[0]

print('Lowest margin of confidence')
display_test_df_row(lowest_margin_of_confidence_index)

This image has the **lowest** margin of confidence in the test dataset - you can see it is near a decision boundary between classes.

In [None]:
highest_margin_of_confidence_index = test_df.sort_values('margin_of_confidence', ascending=False).index[0]

print('Highest margin of confidence')
display_test_df_row(highest_margin_of_confidence_index)

This image has the **highest** margin of confidence in the test dataset.

In [None]:
lowest_activation_index = test_df.sort_values('hidden_layer_max', ascending=True).index[0]

print('Lowest activation')
display_test_df_row(lowest_activation_index)

This image has the **lowest** maximum hidden layer output in the test dataset and is considered to have “low activation”.

These example images provided a brief highlight of the certainty insights we covered earlier. Later in the guide you will have an opportunity to see how thresholds for each will impact when the inference application detects a hand gesture and “types” an emoji.


### Converting model to TensorFlow Lite format

In order for the (hidden layer output of the) model to be deployed on the OpenMV Cam H7 board it first needs to be converted into [TensorFlow Lite](https://www.tensorflow.org/lite) format. This can be done using the [tf.lite.TFLiteConverter.from_keras_model(...)](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter) API. The [default optimizations](https://www.tensorflow.org/api_docs/python/tf/lite/Optimize) for conversion will be selected to enable quantized 8-bit weights. The validation dataset will be used as a representative data set for the quantization process. The model's input and output types will be set to [tf.int8](https://www.tensorflow.org/api_docs/python/tf/dtypes) data types.
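
To make the 8-bit quantization more concrete, the sketch below shows the affine mapping TensorFlow Lite uses between real values and int8 values, `real = (quantized - zero_point) * scale`. The `SCALE` and `ZERO_POINT` values are made up for illustration; the converter derives the real ones from the representative dataset.

```python
def quantize(x, scale, zero_point):
    """Map a real value to int8 using an affine (scale, zero_point) scheme."""
    q = round(x / scale) + zero_point
    # clamp to the representable int8 range
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate real value."""
    return (q - zero_point) * scale


# hypothetical quantization parameters, for illustration only
SCALE = 0.05
ZERO_POINT = -10

q = quantize(1.0, SCALE, ZERO_POINT)
print(q, dequantize(q, SCALE, ZERO_POINT))

# values outside the representable range saturate at the int8 limits
print(quantize(100.0, SCALE, ZERO_POINT))
```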


In [None]:
import numpy as np

presoftmax_openmv_mobilenet_025_96 = tf.keras.Model(
    inputs=[
            openmv_mobilenet_025_96.layers[0].input
    ],
    outputs=[
             openmv_mobilenet_025_96.layers[-2].output        
    ]
)

def representative_dataset():
  for image, label in val_ds.unbatch():
    yield [ np.array([image]) ]

converter = tf.lite.TFLiteConverter.from_keras_model(presoftmax_openmv_mobilenet_025_96)
converter.optimizations = [ tf.lite.Optimize.DEFAULT ]
converter.representative_dataset = representative_dataset

# https://github.com/openmv/tensorflow-lib/blob/2abbaee8458379c83444fc391cde5e748becfd55/libtf.cc
converter.target_spec.supported_ops = [ tf.lite.OpsSet.TFLITE_BUILTINS_INT8 ]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()

Once the model is converted it can then be saved to a model.tflite file to be transferred to the camera.

In [None]:
with open('model.tflite', 'wb') as output:
  print(len(tflite_quant_model))
  output.write(tflite_quant_model)

The converted model has a size of ~303 kilobytes and can be inspected using the [Netron App](https://netron.app).

Run the code cell below to download the model:

In [None]:
colab.files.download('model.tflite')

## OpenMV Application

This section will outline how to set up the OpenMV Integrated Development Environment (IDE) and develop applications for the OpenMV board. It will also provide an overview of key parts of the inference application. The full source code for the application can be found in the [`openmv` folder on GitHub](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/tree/main/openmv).


### Setting up the development environment

[Download and install the OpenMV IDE](https://openmv.io/pages/download) for your operating system. More information on this can be found on the ["OpenMV Cam Tutorial - Software Setup" page](https://docs.openmv.io/openmvcam/tutorial/software_setup.html).

### Hardware Setup

Insert the microSD card into the back of the OpenMV camera and then plug in the micro USB cable into the bottom of the board. Connect the other end of the USB cable to your computer. Once plugged in the camera will appear as a new removable USB disk drive on your computer.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/14.back-of-openmv-cam-h7-board-with-microsd-card-inserted.jpg?raw=1" alt="Back of OpenMV Cam H7 board with microSD card inserted" style="display: block; margin-left: auto; margin-right: auto; width: 250px;">
<figcaption style="text-align: center"><i>Back of OpenMV Cam H7 board with microSD card inserted</i></figcaption>
</figure>

Start the OpenMV IDE and click on the “Connect” button in the bottom left corner.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/15.openmv-ide-connect-button.png?raw=1" alt="OpenMV IDE - Connect button" style="display: block; margin-left: auto; margin-right: auto;">
<figcaption style="text-align: center"><i>OpenMV IDE - Connect button</i></figcaption>
</figure>

You may be prompted to update the firmware running on the board if it is not the latest version available. After this, the icon will change state to connected and the “Run” button below will be enabled.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/16.openmv-ide-board-connected-state.png?raw=1" alt="OpenMV IDE - Board connected state" style="display: block; margin-left: auto; margin-right: auto;">
<figcaption style="text-align: center"><i>OpenMV IDE - Board connected state</i></figcaption>
</figure>

### Hello World example
To ensure the camera is functioning correctly we can upload the “Hello World” example onto the board. In the OpenMV IDE, select `File -> Examples -> OpenMV -> Basics -> helloworld.py`

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/17.hello_world.py-example-in-the-openmv-ide.png?raw=1" alt="hello_world.py example in the OpenMV IDE" style="display: block; margin-left: auto; margin-right: auto; width: 500px;">
<figcaption style="text-align: center"><i>hello_world.py example in the OpenMV IDE</i></figcaption>
</figure>

Now click on the “Run” icon in the bottom left corner of the OpenMV IDE to run the example. The board will start running the example, and you will see the view from the camera in the top right corner of the OpenMV IDE. To see the output of the `print(...)` statements in the example click on the “Serial Terminal” button on the bottom of the IDE. The execution of the script can be stopped by clicking the “Stop” button in the bottom left corner of the IDE.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/18.openmv-ide-stop-button.png?raw=1" alt="OpenMV IDE - Stop button" style="display: block; margin-left: auto; margin-right: auto;">
<figcaption style="text-align: center"><i>OpenMV IDE - Stop button</i></figcaption>
</figure>

We can now modify the example to match what is needed for the ML model we trained and converted earlier, by changing the following lines from:

```python
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
```

to:

```python
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.QQVGA)
```

These changes will make the camera capture a grayscale image of 160x120 pixels (instead of an RGB image of 320x240 pixels). Press the “Run” button to test the changes on your board.

**Note**: The [OpenMV `sensor` module documentation](https://docs.openmv.io/library/omv.sensor.html) contains information on additional APIs and options you may want to explore outside of this project.

### Using TensorFlow with OpenMV

The [OpenMV `tf` module](https://docs.openmv.io/library/omv.tf.html) enables integration of quantized TensorFlow Lite models into OpenMV applications. It uses the [TensorFlow Lite for Microcontrollers C/C++ library](https://github.com/tensorflow/tflite-micro) but wraps it in its own customized Python API. When using a model that is stored on a microSD card, it must be under 400KB in size, since the model will be loaded from the microSD card into the OpenMV Cam H7’s RAM.

To get started copy the `model.tflite` file from earlier onto the OpenMV board’s removable USB disk interface on your computer. Once transferred it will be stored on the microSD card.

We can then edit the `helloworld.py` example used earlier. First, add a new import line for the `tf` module to the start of the file under the existing imports:

```python
import tf
```

Now that the `tf` module has been imported, the model can be loaded as follows before the main loop of the example:

```python
model = tf.load("model.tflite", load_to_fb=True)
```

The model output for an input image (from the camera) can be calculated using the following in the main loop:

```python
classification_result = model.classify(img)
model_output = classification_result[0].output()
  
print(model_output)
```

Click the “Run” button again to run the changes we’ve made to the example. You can test how the model output varies by posing with the hand gestures the model is trained for. In my testing I found the model behaved best when the camera was on the desk in front of me and slightly tilted up, and the room was well lit.

Since the model we converted used the hidden layer outputs as its outputs, a softmax function must be used to convert the hidden layer output values to softmax output. This can be done by defining a `softmax(...)` function at the start of the file after the imports:

```python
import math

def softmax(values):
    # exponentiate each hidden layer output value once
    exps = [math.exp(v) for v in values]
    total = sum(exps)

    # normalize so that the outputs sum to 1
    return [e / total for e in exps]
```

The main loop can then be updated to call this new function and output softmax output values:

```python
softmax_model_output = softmax(model_output)
  
print(model_output, softmax_model_output)
```
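
One caveat with the `softmax(...)` function above: `math.exp(...)` can overflow when a hidden layer output value is very large. A common workaround, sketched below as an optional variant, subtracts the maximum value before exponentiating, which leaves the result mathematically unchanged. The name `stable_softmax` is our own and is not part of the application source, and the sketch is written for a desktop Python interpreter.

```python
import math


def stable_softmax(values):
    # shifting by the maximum keeps every exponent <= 0, so exp() cannot overflow
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(stable_softmax([1.0, 2.0, 3.0]))

# math.exp(1000.0) on its own raises OverflowError in CPython
print(stable_softmax([1000.0, 0.0, 0.0]))
```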

The predicted class of the input image can be calculated by finding the output index with the highest value. The index can then be used to print the associated emoji using an array which holds the string values for each class.

```python
LABELS = ["🚫", "✋", "👎", "👍", "👊"]
# ...
while True:
    # ...
    classification = model_output.index(max(model_output))
  
    print(model_output, softmax_model_output, LABELS[classification])
```


### Calculating the margin of confidence of the model output

We would like the system to react only when the model’s output has a high degree of certainty. To do this we will calculate the model output’s margin of confidence value by sorting the model’s hidden layer output values and then subtracting the second highest value from the highest value:

```python
sorted_model_output = model_output.copy()
sorted_model_output.sort(reverse=True)
margin_of_confidence = sorted_model_output[0] - sorted_model_output[1]
```
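
As a small worked example, the sketch below applies the same calculation to two made-up sets of hidden layer outputs. Both predict class 3, but only the first has a clear gap between the top two values. The `margin_of_confidence(...)` helper and the sample values are illustrative only.

```python
def margin_of_confidence(model_output):
    # difference between the two largest output values
    ordered = sorted(model_output, reverse=True)
    return ordered[0] - ordered[1]


# hypothetical hidden layer outputs, both predicting class 3
confident = [-2.0, 0.5, -1.0, 7.5, 1.0]
ambiguous = [-2.0, 0.5, -1.0, 3.0, 2.5]

print(margin_of_confidence(confident))  # 6.5
print(margin_of_confidence(ambiguous))  # 0.5
```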

The printout we had earlier can be modified to only print label emojis for model outputs whose activation value and margin of confidence are both above class-specific thresholds:

```python
ACTIVATION_THRESHOLDS = [0, 6, 2, 2, 2] # activation threshold
MOC_THRESHOLDS = [0, 5, 3, 3, 3] # margin of confidence threshold
# ...

while True:
    # ...
    
    above_activation_threshold = (
        sorted_model_output[0] > ACTIVATION_THRESHOLDS[classification]
    )
    above_moc_threshold = margin_of_confidence > MOC_THRESHOLDS[classification]

    if above_activation_threshold and above_moc_threshold:
        print(
            model_output,
            softmax_model_output,
            margin_of_confidence,
            LABELS[classification]
        )
    else:
        print(model_output, softmax_model_output, margin_of_confidence)

```

Run the script and tune the per-class threshold values until gestures are detected reliably without too many false detections.
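
When tuning, it can help to try candidate thresholds against a few recorded model outputs on your PC before running them on the board. The sketch below is one way to do that. It reuses the threshold values from above, while the `is_certain(...)` helper and the sample outputs are our own and are not part of the application source.

```python
ACTIVATION_THRESHOLDS = [0, 6, 2, 2, 2]  # activation threshold
MOC_THRESHOLDS = [0, 5, 3, 3, 3]  # margin of confidence threshold


def is_certain(model_output):
    """Return (classification, certain) for a list of hidden layer outputs."""
    classification = model_output.index(max(model_output))
    ordered = sorted(model_output, reverse=True)
    margin = ordered[0] - ordered[1]

    certain = (
        ordered[0] > ACTIVATION_THRESHOLDS[classification]
        and margin > MOC_THRESHOLDS[classification]
    )
    return classification, certain


# hypothetical recorded outputs: a clear class 3 and a weak class 1
print(is_certain([-2.0, 0.5, -1.0, 7.5, 1.0]))
print(is_certain([0.0, 4.0, 1.0, 0.5, 0.2]))
```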


### RGB LED integration

The OpenMV board has an on-board RGB LED, which we can use to give a visual indication when a gesture is detected. This will be useful when the application is running standalone without the OpenMV IDE.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/19.rgb-led-on-openvm-%20h7-cycling-colors.gif?raw=1" alt="RGB LED on OpenMV H7 cycling colors" style="display: block; margin-left: auto; margin-right: auto; width: 200px;">
<figcaption style="text-align: center"><i>RGB LED on OpenMV H7 cycling colors</i></figcaption>
</figure>

We can map the following colors to the image classification classes:

* ⚪️ White:  🚫 - No gesture
* 🟡 Yellow: ✋ - Hand up
* 🔴 Red:     👎 - Thumbs Down
* 🟢 Green:  👍 - Thumbs Up
* 🔵 Blue:     👊 - Fist

The LEDs can be accessed by using the [LED class](https://docs.openmv.io/library/pyb.LED.html) in the [`pyb` module](https://docs.openmv.io/library/pyb.html). The red LED has an id of 1, while the green and blue LEDs have ids of 2 and 3 respectively:

```python
import pyb

red_led = pyb.LED(1)
green_led = pyb.LED(2)
blue_led = pyb.LED(3)
```

Each LED can be turned on and off by calling [`led.on()`](https://docs.openmv.io/library/pyb.LED.html#pyb.LED.on) and [`led.off()`](https://docs.openmv.io/library/pyb.LED.html#pyb.LED.off). A white color can be mixed by turning on all three LEDs at the same time, while yellow can be mixed by only turning on the red and green LEDs. We can define a function that takes an input string and turns the appropriate LEDs on and off.

```python
def set_rgb_led(color):
   red_led.on() if "r" in color else red_led.off()
   green_led.on() if "g" in color else green_led.off()
   blue_led.on() if "b" in color else blue_led.off()
```

For example, passing in `r` will turn the red LED on, `rg` will turn on both the red and green LEDs, and `rgb` will turn on all three LEDs.

We can create a new array to store the LED string values for each class:

```python
LED_LABELS = ["rgb", "rg", "r", "g", "b"]
```

The if statement in the main loop can then be updated to call the function. If the model is determined to be uncertain, the LEDs will be set to the color of class 0 - white:

```python
if above_activation_threshold and above_moc_threshold:
    # ...
    set_rgb_led(LED_LABELS[classification])
else:
    # ...
    set_rgb_led(LED_LABELS[0])
```


### Exponential smoothing

As you probably observed during testing so far, the outputs from the model are slightly noisy. This is due to the camera sensor being slightly noisy and this noise trickling down to the model’s output.

We can use a [basic exponential smoothing function](https://en.wikipedia.org/wiki/Exponential_smoothing#Basic_(simple)_exponential_smoothing_(Holt_linear)) to smooth the output of the model prior to deciding if an emoji needs to be “typed” on the PC. The formula is as follows:


\begin{align}
    S_t = \alpha \times X_t + (1 - \alpha) \times S_{t - 1}
\end{align}

The alpha value is called the “smoothing factor” and is a number between 0 and 1. It controls how much influence new X<sub>t</sub> values have on the output of S<sub>t</sub>.

This function can be defined in Python using:

```python
def exponential_smooth(x, s_in, alpha):
    s_out = [0] * len(s_in)

    for i in range(len(s_in)):
        s_out[i] = alpha * x[i] + (1 - alpha) * s_in[i]

    return s_out
```
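
To see what a single update does, the sketch below applies one smoothing step with an alpha of 0.20. The starting state and the new input are made-up values: the state begins fully in the no gesture class, and the new input is fully in class 1.

```python
def exponential_smooth(x, s_in, alpha):
    s_out = [0] * len(s_in)

    for i in range(len(s_in)):
        s_out[i] = alpha * x[i] + (1 - alpha) * s_in[i]

    return s_out


# previous smoothed state: "no gesture"; new input: class 1
state = [1, 0, 0, 0, 0]
new_input = [0, 1, 0, 0, 0]

state = exponential_smooth(new_input, state, 0.20)
print(state)
```

After one step only 20% of the weight has moved to class 1, so a single noisy frame cannot flip the smoothed classification.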

If the model output does not have a high degree of certainty, we can override the softmax output to `[1, 0, 0, 0, 0]` to place it in the no gesture category:

```python
if above_activation_threshold and above_moc_threshold:
    # ...
else:
    # ...
    softmax_model_output = [0] * len(softmax_model_output)
    softmax_model_output[0] = 1
```

Then the `softmax_model_output` variable can be exponentially smoothed into a new variable called `smoothed_softmax_model_output`. For this application we will use an alpha value of 0.20.

```python
ALPHA = 0.20

# ...

smoothed_softmax_model_output = [0] * len(LABELS)

# ...

while True:
    # ...

    smoothed_softmax_model_output = exponential_smooth(
        softmax_model_output, smoothed_softmax_model_output, ALPHA
    )
```

A new exponentially smoothed classification class can be calculated using:

```python
smoothed_classification = smoothed_softmax_model_output.index(
    max(smoothed_softmax_model_output)
)
```


### Deciding when to “type” an emoji

We now have an exponentially smoothed classification value, along with a smoothed softmax output that only reflects high certainty model outputs. The exponentially smoothed classification value can now be used to detect if a new emoji needs to be typed as follows:

```python

SMOOTHED_THRESHOLD = 0.80

# ...

last_output = -1

# ...

while True:
    # ...

    if (
        smoothed_softmax_model_output[smoothed_classification] > SMOOTHED_THRESHOLD
        and last_output != smoothed_classification
    ):
        if smoothed_classification != 0:
            print(f"Ready to send {LABELS[smoothed_classification]} emoji")

        last_output = smoothed_classification

```

This code first checks if the smoothed softmax output for the smoothed classification is above a desired threshold (0.80) and that the classification differs from the previous output value, to prevent repeatedly sending the same emoji. If these criteria are met, and the classification value is not 0 (no gesture), we can send the emoji associated with the classification. The last output value is then updated with the new smoothed classification value, for use in the next loop cycle.
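
The alpha value and the threshold together determine how long a gesture must be held before it is typed. The sketch below estimates this for the idealized case where a class's smoothed value starts at 0 and every following frame is a high certainty detection of that class. `frames_to_cross(...)` is a hypothetical helper for use on a PC, and it assumes `0 < alpha <= 1` and `threshold < 1`.

```python
ALPHA = 0.20
SMOOTHED_THRESHOLD = 0.80


def frames_to_cross(alpha, threshold):
    """Count the consecutive frames of a steady gesture needed before its
    smoothed value rises from 0 to above the threshold."""
    smoothed = 0.0
    frames = 0
    while smoothed <= threshold:
        # same update as the smoothing formula, with a constant input of 1
        smoothed = alpha * 1.0 + (1 - alpha) * smoothed
        frames += 1
    return frames


print(frames_to_cross(ALPHA, SMOOTHED_THRESHOLD))  # 8
```

With the values used in this guide, a gesture needs to be held for 8 consecutive frames. A larger alpha responds faster but filters less noise.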


### “Typing” Emojis over USB HID

The firmware running on the OpenMV Cam H7 board does not enable USB HID by default. We can create a boot.py file on the OpenMV camera’s file system to enable USB HID:

```python
import pyb

pyb.usb_mode('VCP+MSC+HID', hid=pyb.hid_keyboard)
```

This code uses the [`pyb` module](https://docs.openmv.io/library/pyb.html#)’s [`pyb.usb_mode(...)`](https://docs.openmv.io/library/pyb.html#pyb.usb_mode) API to enable USB HID in keyboard mode, while still enabling the USB VCP (virtual comm port) interface for serial communications and the USB MSC (mass storage class) interface for access to the OpenMV board’s filesystem from a PC. Code in the boot.py file runs before the OpenMV application code.

A USB HID keyboard message can now be sent by creating an instance of the [`pyb.USB_HID()`](https://docs.openmv.io/library/pyb.USB_HID.html) class and using the [`USB_HID.send(...)`](https://docs.openmv.io/library/pyb.USB_HID.html#pyb.USB_HID.send) API to send an 8 byte array value.
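
For context, the 8 byte value follows the standard USB HID boot keyboard report layout: byte 0 holds the modifier key bits, byte 1 is reserved, and bytes 2 to 7 hold up to six key codes. The sketch below only builds the byte arrays and can be run on a PC. `make_keyboard_report(...)` is a hypothetical helper, not part of the `pyb` module; on the board the resulting value would be passed to `USB_HID.send(...)`.

```python
MOD_LEFT_SHIFT = 0x02  # modifier bit for the left shift key
KEY_A = 0x04           # HID usage ID for the "a" key


def make_keyboard_report(modifiers=0, keycodes=()):
    """Build an 8 byte HID boot keyboard report."""
    if len(keycodes) > 6:
        raise ValueError("a boot keyboard report holds at most 6 key codes")

    report = bytearray(8)
    report[0] = modifiers
    # report[1] is reserved and stays 0
    for i, code in enumerate(keycodes):
        report[2 + i] = code
    return report


# shift + "a" (an upper case "A" on a US keyboard layout)
press = make_keyboard_report(MOD_LEFT_SHIFT, [KEY_A])
# an all-zero report signals that every key has been released
release = make_keyboard_report()

print(bytes(press).hex())
print(bytes(release).hex())
```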

At the time of writing this blog, there was no standard way to type an emoji character on all major operating systems (macOS, Linux, Windows). The next section will go over Operating System (OS) specific items. More information can be found on the [“Unicode Input” Wikipedia page](https://en.wikipedia.org/wiki/Unicode_input).


#### macOS

It is possible to type emojis on macOS computers by enabling “*Unicode Hex Input*” in System Preferences. Go into the “*System Preferences*” application, click on the “*Keyboard*” button and then the “Input Sources” tab.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/20.macos-keyboard-input-source-system-preferences.png?raw=1" alt="macOS - Keyboard Input Source - System Preferences" style="display: block; margin-left: auto; margin-right: auto; width: 300px;">
<figcaption style="text-align: center"><i>macOS - Keyboard Input Source - System Preferences</i></figcaption>
</figure>

Click the + button, then scroll down to the “*Others*” category on the left hand pane, select “*Unicode Hex Input*”, and click the “*Add*” button.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/21.macos-system-preferences-add-unicode-hex-input-input-source.png?raw=1" alt="macOS - System Preferences - Add Unicode Hex Input input source" style="display: block; margin-left: auto; margin-right: auto; width: 300px;">
<figcaption style="text-align: center"><i>macOS - System Preferences - Add “Unicode Hex Input” input source</i></figcaption>
</figure>

A new item will appear on your Mac’s top menu bar for Keyboard inputs, select the newly added “*Unicode Hex Input*” option.

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/22.selecting-unicode-hex-input-in-the-macos-menu-bar.png?raw=1" alt="Selecting Unicode Hex Input in the macOS menu bar" style="display: block; margin-left: auto; margin-right: auto; width: 200px;">
<figcaption style="text-align: center"><i>Selecting “Unicode Hex Input” in the macOS menu bar</i></figcaption>
</figure>

You will now be able to manually type emojis if you know their UTF-16 values. For example, the 👍 emoji has a UTF-16 value of `0xd83ddc4d`. If you hold down the option key and type the sequence d,8,3,d,d,c,4,d, the emoji will appear.


#### Linux

On Linux it is possible to type emojis in applications that support Unicode text input (like LibreOffice) if you know their Unicode code point values. The 👍 emoji has a code point of `0x1f44d`. If you hold down the CTRL and SHIFT keys while typing u, then type the sequence 1,f,4,4,d followed by a space character, the emoji will appear.

#### Windows

In applications like Word it is possible to type emojis by holding down the ALT key, pressing the + key on the number pad, typing the Unicode code point value in decimal on the number pad, and releasing the ALT key. The 👍 emoji has a code point of `128077` (`0x1f44d` in hexadecimal). If you type the sequence +,1,2,8,0,7,7 on the number pad while holding down the ALT key, the emoji will appear.
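
The values quoted in the macOS, Linux, and Windows sections can be reproduced for any emoji with a few lines of Python on your PC. This is a minimal sketch using only built-in functions; it is meant for a desktop Python interpreter rather than for the OpenMV board.

```python
emoji = "👍"

# Unicode code point, in decimal and hexadecimal
code_point = ord(emoji)
print(code_point, hex(code_point))

# UTF-16 (big-endian) encoding as hex digits
utf16_hex = emoji.encode("utf-16-be").hex()
print(utf16_hex)
```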

#### Integrate the UnicodeHexKeyboard class
Manually typing emojis via their Unicode code point or UTF-16 values is not convenient; however, this can be automated using Python on the OpenMV Cam H7 board. We’ve created [a custom `UnicodeHexKeyboard` Python class](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/openmv/unicode_hex_keyboard.py) that handles everything for you.

To use it, download the [`unicode_hex_keyboard.py` file from the `openmv` folder on GitHub](https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/openmv/unicode_hex_keyboard.py) to the disk drive for the OpenMV board. The main application file can then use it as follows:

```python
import unicode_hex_keyboard

# ...

# keyboard instance to use to type emojis
#  - to use with a Linux PC pass in: unicode_hex_keyboard.LINUX
#  - to use with a Mac pass in: unicode_hex_keyboard.MACOS
#  - to use with a Windows PC pass in: unicode_hex_keyboard.WINDOWS
keyboard = unicode_hex_keyboard.UnicodeHexKeyboard(unicode_hex_keyboard.MACOS)

# ...

keyboard.send('👍')
```

The final step is to update the code in the application’s main loop to send the classification specific emoji:

```python
# ...

print(f'Ready to send {LABELS[smoothed_classification]} emoji')

keyboard.send(LABELS[smoothed_classification])

# ...
```


### Recap of Application

We’ve successfully integrated the TensorFlow Lite model into the OpenMV application. The application:

1. Grabs an image frame from the camera
2. Gets the ML model’s output for the captured image frame
3. Filters the ML model’s output for high certainty predictions
4. Uses an exponential smoothing function to smooth the model’s (softmax) outputs
5. Uses the exponentially smoothed model outputs to determine if a new hand gesture is present
6. “Types” the associated emoji on a PC using the USB HID protocol

<figure>
<img src="https://github.com/ArmDeveloperEcosystem/ml-image-classification-example-for-openmv/blob/main/images/23.block-diagram-of-application-processing-pipeline.png?raw=1" alt="Block Diagram of Application processing pipeline" style="display: block; margin-left: auto; margin-right: auto; width: 750px;">
<figcaption style="text-align: center"><i>Block Diagram of Application processing pipeline</i></figcaption>
</figure>

You can now save the `.py` file that you were editing and running manually to the board's USB disk interface as `main.py`. When the board powers on, it will automatically start running the code in the `main.py` file.


## Conclusion

Throughout this project we’ve covered an end-to-end flow of training a custom image classification model and deploying it locally to an Arm Cortex-M7 based OpenMV development board using TensorFlow Lite! TensorFlow was used in a Google Colab notebook to train the model on a re-labeled public dataset from Kaggle. After training, the model was converted into TensorFlow Lite format to run on the OpenMV board using the TensorFlow Lite for Microcontrollers run-time along with accelerated Arm CMSIS-NN kernels.
 
At inference time the model’s outputs were processed using model certainty techniques, and the resulting (softmax) outputs were then fed into an exponential smoothing function to determine when to send keystrokes over USB HID to type emojis on a PC. The dedicated input device we created was able to capture and process grayscale 96x96 image data at just under 20 fps on an Arm Cortex-M7 processor running at 480 MHz. On-device inferencing provided a low latency response and preserved the privacy of the user by keeping all image data at the source and processing it locally.

Build one yourself by purchasing an OpenMV Cam H7 R2 board on [openmv.io](https://openmv.io/collections/products/products/openmv-cam-h7-r2) or from [a distributor](https://openmv.io/collections/products). The project can be extended by fine tuning the model on your own data, or by applying transfer learning techniques and using the model we developed as a base to train other hand gestures. Maybe you can find another public dataset for facial gestures and use it to type 😀 emojis when you smile!

*A big thanks to Sparsh Gupta for sharing the Gesture Recognition dataset on Kaggle under a public domain license and my Arm colleagues Rod Crawford, Prathyusha Venkata, Elham Harirpoush, and Liliya Wu for their help in reviewing the material for this guide!*



## Further Reading

* Papers
  * [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
  * [Visual Wake Words Dataset](https://arxiv.org/abs/1906.05721)
* Books
  * [Human-in-the-Loop Machine Learning by Robert (Munro) Monarch](https://www.manning.com/books/human-in-the-loop-machine-learning)