# MediaPipe Object Detection Learning

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ShawnHymel/google-coral-micro-object-detection/blob/master/notebooks/mediapipe-object-detection-learning.ipynb)

```
Original authors: MediaPipe (Google)
Modified by: Shawn Hymel
Date: December 16, 2023
```

Use transfer learning with Google MediaPipe to build a custom object detection model. Based on the example code from https://developers.google.com/mediapipe/solutions/customization/object_detector.

> **Note:** This script has been verified with TensorFlow v2.15.0.

To use this script, upload your dataset in [Pascal VOC format](http://host.robots.ox.ac.uk/pascal/VOC/) in an archive named *dataset.zip*. You can use a labeling tool like [labelImg](https://github.com/HumanSignal/labelImg) or [Make Sense](https://www.makesense.ai/) to create bounding box annotations in the Pascal VOC format.
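
For orientation, the following is a minimal sketch of how one Pascal VOC annotation can be read with Python's standard library. The XML snippet, the `widget` label, and the `parse_voc` helper are hypothetical and written only for illustration; real files produced by a labeling tool usually carry additional fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, written by hand for illustration only
SAMPLE_XML = """\
<annotation>
  <filename>image.01.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>widget</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>200</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Some tools write coordinates as floats, so convert via float first
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return root.findtext("filename"), boxes

filename, boxes = parse_voc(SAMPLE_XML)
print(filename, boxes)
```

Each `object` element holds one labeled bounding box in pixel coordinates, so an image with several objects simply has several such elements.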


Your data should be in the following format. Note that the directory names "Annotations" and "images" must be exactly as shown (with the capital 'A' and lowercase 'i').

```
dataset.zip
├── Annotations/
│   ├── image.01.xml
│   ├── image.02.xml
│   ├── ...
└── images/
    ├── image.01.jpg
    ├── image.02.jpg
    └── ...
```
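
Before uploading, it can help to confirm that every image has a matching annotation. The sketch below is one way to do that with the standard library. `check_dataset_zip` is a hypothetical helper, and it assumes JPEG images and that `Annotations/` and `images/` sit at the top level of the archive as shown above.

```python
import os
import tempfile
import zipfile

def check_dataset_zip(zip_path):
    """Return (image stems lacking an XML file, XML stems lacking an image)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    image_stems, xml_stems = set(), set()
    for name in names:
        folder, _, base = name.rpartition("/")
        if not base:
            continue  # Skip directory entries such as "images/"
        stem, ext = os.path.splitext(base)
        if folder == "images" and ext.lower() in (".jpg", ".jpeg"):
            image_stems.add(stem)
        elif folder == "Annotations" and ext.lower() == ".xml":
            xml_stems.add(stem)
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)

# Demonstration on a throwaway archive with one deliberately missing annotation
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "dataset.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("images/image.01.jpg", b"")
        zf.writestr("images/image.02.jpg", b"")
        zf.writestr("Annotations/image.01.xml", "<annotation/>")
    missing_xml, missing_img = check_dataset_zip(demo_zip)

print("Images without annotations:", missing_xml)
print("Annotations without images:", missing_img)
```

Two empty lists suggest the archive is paired up correctly; the check says nothing about whether the XML contents themselves are valid.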

Run through all the cells. Adjust the hyperparameters (`hparams`) as needed to achieve the desired accuracy. Ideally, you want your average precision (AP) to be greater than 90% to get a useful object detection model.

In [None]:
#@title License information
# Copyright 2023 The MediaPipe Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
#
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Configuration

In [13]:
# Install MediaPipe and Edge TPU compiler
!python --version
!pip install --upgrade pip
!pip install mediapipe-model-maker
! curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
! echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
! sudo apt-get update
! sudo apt-get install edgetpu-compiler

Python 3.11.11
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1022  100  1022    0     0   7980      0 --:--:-- --:--:-- --:--:--  8047
OK
deb https://packages.cloud.google.com/apt coral-edgetpu-stable main
Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 https://packages.cloud.google.com/apt coral-edgetpu-stable InRelease
Hit:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://

In [14]:
from google.colab import files
import os
import json
import tensorflow as tf

from mediapipe_model_maker import object_detector, quantization

In [15]:
# Check TensorFlow version
print(tf.__version__)
assert tf.__version__.startswith('2')

2.15.1


In [16]:
# Settings
BASE_PATH = "."
DATASET_ZIP_PATH = os.path.join(BASE_PATH, "dataset.zip")
DATASET_PATH = os.path.join(BASE_PATH, "dataset/")
TRAIN_SPLIT = 0.8
EXPORT_PATH = os.path.join(BASE_PATH, "exported_models/")
TFLITE_FLOAT32_NAME = "model.tflite"
TFLITE_INT8_NAME = "model_int8.tflite"
METADATA_PATH = os.path.join(EXPORT_PATH, "metadata.json")
METADATA_H_NAME = "metadata.hpp"
METADATA_H_PATH = os.path.join(EXPORT_PATH, METADATA_H_NAME)

## Create dataset

Load and prepare the dataset for training and validation.

In [17]:
# Unzip dataset
!rm -rf {DATASET_PATH}
!unzip -q {DATASET_ZIP_PATH} -d {DATASET_PATH}

In [18]:
# Load the dataset
data = object_detector.Dataset.from_pascal_voc_folder(DATASET_PATH)

# Split the dataset into separate training and validation sets
train_data, validation_data = data.split(TRAIN_SPLIT)

## Train object detection model

Use transfer learning to retrain a model. If the results fall short, gather more or better data and adjust the hyperparameters (`hparams`). Ideally, aim for a `total_loss` below 0.1 and an average precision (AP) above 0.9.

In [19]:
# Load pre-trained model and specify hyperparameters
spec = object_detector.SupportedModels.MOBILENET_V2_I320
hparams = object_detector.HParams(
    learning_rate=0.3,
    batch_size=8,
    epochs=50,
    export_dir=EXPORT_PATH,
)
options = object_detector.ObjectDetectorOptions(
    supported_model=spec,
    hparams=hparams,
)

In [20]:
# Retrain model
model = object_detector.ObjectDetector.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)

  inputs = self._flatten_to_reference_inputs(inputs)


Downloading https://storage.googleapis.com/tf_model_garden/vision/qat/mobilenetv2_ssd_coco/mobilenetv2_ssd_i320_ckpt.tar.gz to /tmp/model_maker/object_detector/mobilenetv2_i320
Model: "retina_net_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 mobile_net (MobileNet)      {'2': (None, 80, 80, 24   2257984   
                             ),                                  
                              '3': (None, 40, 40, 32             
                             ),                                  
                              '4': (None, 20, 20, 96             
                             ),                                  
                              '5': (None, 10, 10, 32             
                             0),                                 
                              '6': (None, 10, 10, 12             
                             80)}                                
     



Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [21]:
# Evaluate model performance
loss, coco_metrics = model.evaluate(
    validation_data,
    batch_size=4,
)
print(f"Validation loss: {loss}")
print(f"Validation metrics: {coco_metrics}")

creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.70s).
Accumulating evaluation results...
DONE (t=0.07s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.790
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.950
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.790
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.825
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.825
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.825
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
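
The IoU thresholds in the metrics above refer to intersection over union, the overlap ratio between a predicted box and a ground-truth box. A minimal sketch of that calculation follows, assuming boxes are given as `(xmin, ymin, xmax, ymax)` tuples. The `iou` helper is illustrative and is not part of MediaPipe.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes that overlap by half: intersection 50, union 150
half_overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(half_overlap)
```

With a threshold of 0.50, the half-overlapping pair in this example would not count as a match, since its IoU is one third. In pycocotools summaries, a value of -1.000 generally means that no ground-truth objects fell into that size category, so it is not a sign of poor performance.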


## Export model

Save the model in three different formats:

 1. 32-bit floating point TensorFlow Lite (TFLite)
 2. 8-bit integer quantized TFLite
 3. TPU compiled and quantized TFLite

Additionally, save the metadata (anchor box information) in a C++ header file (*metadata.hpp*) so that a resource-constrained device can recalculate the anchor boxes.
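
As background for the 8-bit formats, TFLite integer quantization maps real values to integers using a scale and a zero point, following `real = (q - zero_point) * scale`. The sketch below illustrates that mapping for single values. The `scale` and `zero_point` numbers are hypothetical; in practice the converter derives them per tensor from the representative data.

```python
def quantize(x, scale, zero_point):
    """Map a real value to a signed 8-bit integer (clamped to [-128, 127])."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its 8-bit representation."""
    return (q - zero_point) * scale

# Hypothetical parameters, chosen only to keep the arithmetic easy to follow
scale, zero_point = 0.25, -5

q_small = quantize(0.5, scale, zero_point)    # Representable without clipping
q_large = quantize(100.0, scale, zero_point)  # Out of range, so it saturates
restored = dequantize(q_small, scale, zero_point)
print(q_small, q_large, restored)
```

Values outside the representable range saturate, which is one reason the representative data should resemble what the model will see on the device.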



In [22]:
# Export 32-bit float model
model.export_model()

Exporting a floating point model


  inputs = self._flatten_to_reference_inputs(inputs)


In [23]:
# Perform post-training quantization (8-bit integer) and save quantized model
quantization_config = quantization.QuantizationConfig.for_int8(
    representative_data=validation_data,
)
model.restore_float_ckpt()
model.export_model(
    model_name=TFLITE_INT8_NAME,
    quantization_config=quantization_config,
)

  inputs = self._flatten_to_reference_inputs(inputs)


Using existing files at /tmp/model_maker/object_detector/mobilenetv2_i320
Model: "retina_net_model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 mobile_net_1 (MobileNet)    {'2': (None, 80, 80, 24   2257984   
                             ),                                  
                              '3': (None, 40, 40, 32             
                             ),                                  
                              '4': (None, 20, 20, 96             
                             ),                                  
                              '5': (None, 10, 10, 32             
                             0),                                 
                              '6': (None, 10, 10, 12             
                             80)}                                
                                                                 
 fpn_1 (FPN)                 {'5': (None

  inputs = self._flatten_to_reference_inputs(inputs)


In [24]:
# Compile the model for Edge TPU
!edgetpu_compiler -s -o {EXPORT_PATH} {os.path.join(EXPORT_PATH, TFLITE_INT8_NAME)}

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 3336 ms.

Input model: ./exported_models/model_int8.tflite
Input size: 3.30MiB
Output model: ./exported_models/model_int8_edgetpu.tflite
Output size: 3.97MiB
On-chip memory used for caching model parameters: 3.21MiB
On-chip memory remaining for caching model parameters: 4.39MiB
Off-chip memory used for streaming uncached model parameters: 4.00KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 182
Operation log: ./exported_models/model_int8_edgetpu.log

Operator                       Count      Status

RESHAPE                        8          Mapped to Edge TPU
CONCATENATION                  2          Mapped to Edge TPU
ADD                            13         Mapped to Edge TPU
CONV_2D                        81         Mapped to Edge TPU
QUANTIZE                       10         Mapped to Edge TPU
MUL                            1          Mapped t

In [25]:
# Import model metadata
with open(METADATA_PATH, 'r') as file:
    metadata = json.load(file)

# Parse metadata
custom_metadata = metadata['subgraph_metadata'][0]['custom_metadata'][0]
anchors = custom_metadata['data']['ssd_anchors_options']['fixed_anchors_schema']['anchors']
num_values_per_keypoint = custom_metadata['data']['tensors_decoding_options']['num_values_per_keypoint']
apply_exponential_on_box_size = custom_metadata['data']['tensors_decoding_options']['apply_exponential_on_box_size']
x_scale = custom_metadata['data']['tensors_decoding_options']['x_scale']
y_scale = custom_metadata['data']['tensors_decoding_options']['y_scale']
w_scale = custom_metadata['data']['tensors_decoding_options']['w_scale']
h_scale = custom_metadata['data']['tensors_decoding_options']['h_scale']
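
To show how these scale values are typically used, here is a hedged sketch of decoding one raw box prediction against one anchor. It assumes the convention used by MediaPipe's tensors-to-detections step, with raw values ordered `(y, x, h, w)`; verify that ordering against your own model. `decode_box`, the example anchor, and the scale values are hypothetical stand-ins for what the metadata provides.

```python
import math

def decode_box(raw, anchor, x_scale, y_scale, w_scale, h_scale, apply_exp=True):
    """Decode raw (y, x, h, w) offsets into (xmin, ymin, xmax, ymax)."""
    # Center offsets are scaled down, then expressed in units of anchor size
    y_center = raw[0] / y_scale * anchor["height"] + anchor["y_center"]
    x_center = raw[1] / x_scale * anchor["width"] + anchor["x_center"]

    # Box size is either an exponential or a linear multiple of the anchor size
    if apply_exp:
        height = math.exp(raw[2] / h_scale) * anchor["height"]
        width = math.exp(raw[3] / w_scale) * anchor["width"]
    else:
        height = raw[2] / h_scale * anchor["height"]
        width = raw[3] / w_scale * anchor["width"]

    return (
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )

# Hypothetical anchor and scales; real values come from metadata.json
anchor = {"x_center": 0.5, "y_center": 0.5, "width": 0.2, "height": 0.4}
box = decode_box(
    [0.0, 0.0, 0.0, 0.0], anchor,
    x_scale=10.0, y_scale=10.0, w_scale=5.0, h_scale=5.0,
)
print(box)
```

An all-zero prediction decodes to the anchor box itself, which makes a convenient sanity check when porting the decoder to a microcontroller.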

In [26]:
# Figure out when the resets (sectors) occur, the x/y increases, and width/height of anchors
reset_idxs = []
y_strides = []
x_strides = []
widths_per_section = []
widths = []
heights_per_section = []
heights = []
reset_flag = True
x_stride_flag = True
width_flag = True

# Go through all the anchors
num_anchors = len(anchors)
for i in range(num_anchors):

    # Store the first index
    if i == 0:
        reset_idxs.append(i)

    # Only measure strides for indexes after the first
    else:

        # New section: reset flags
        if anchors[i]['y_center'] < anchors[i - 1]['y_center']:
            reset_idxs.append(i)
            reset_flag = True
            x_stride_flag = True
            width_flag = True

        # Measure Y increase (stride)
        if reset_flag:
            if anchors[i]['y_center'] > anchors[i - 1]['y_center']:
                y_inc = anchors[i]['y_center'] - anchors[i - 1]['y_center']
                y_strides.append(round(y_inc, 5))
                reset_flag = False

        # Measure X increase (stride)
        if x_stride_flag:
            if anchors[i]['x_center'] > anchors[i - 1]['x_center']:
                x_inc = anchors[i]['x_center'] - anchors[i - 1]['x_center']
                x_strides.append(round(x_inc, 5))
                x_stride_flag = False

    # Record widths and heights of the anchor boxes
    if width_flag:
        if i != 0 and anchors[i]['x_center'] > anchors[i - 1]['x_center']:
            widths.append(widths_per_section)
            widths_per_section = []
            heights.append(heights_per_section)
            heights_per_section = []
            width_flag = False
        else:
            width = anchors[i]['width']
            widths_per_section.append(round(width, 5))
            height = anchors[i]['height']
            heights_per_section.append(round(height, 5))

# Calculate the number of sectors
num_sectors = len(reset_idxs)

# Calculate the number of anchors per coordinate
num_anchors_per_coord = len(widths[0])

# Calculate the number of Xs in each Y
num_xs_per_y = []
for sector in range(num_sectors):
    num_xs_per_y.append(int(1.0 / x_strides[sector] * num_anchors_per_coord))

print(f"Number of anchors {num_anchors}")
print(f"Number of sectors: {num_sectors}")
print(f"Number of anchors per coordinate: {num_anchors_per_coord}")
print(f"Reset indexes: {reset_idxs}")
print(f"Number of Xs per Y: {num_xs_per_y}")
print(f"X strides: {x_strides}")
print(f"Y strides: {y_strides}")
print("Widths:")
for wps in widths:
    print(wps)
print("Heights:")
for hps in heights:
    print(hps)

Number of anchors 19125
Number of sectors: 4
Number of anchors per coordinate: 9
Reset indexes: [0, 14400, 18000, 18900]
Number of Xs per Y: [360, 180, 90, 45]
X strides: [0.025, 0.05, 0.1, 0.2]
Y strides: [0.025, 0.05, 0.1, 0.2]
Widths:
[0.05303, 0.075, 0.10607, 0.06682, 0.09449, 0.13364, 0.08418, 0.11905, 0.16837]
[0.10607, 0.15, 0.21213, 0.13364, 0.18899, 0.26727, 0.16837, 0.23811, 0.33674]
[0.21213, 0.3, 0.42426, 0.26727, 0.37798, 0.53454, 0.33674, 0.47622, 0.67348]
[0.42426, 0.6, 0.84853, 0.53454, 0.75595, 1.06908, 0.67348, 0.95244, 1.34695]
Heights:
[0.10607, 0.075, 0.05303, 0.13364, 0.09449, 0.06682, 0.16837, 0.11905, 0.08418]
[0.21213, 0.15, 0.10607, 0.26727, 0.18899, 0.13364, 0.33674, 0.23811, 0.16837]
[0.42426, 0.3, 0.21213, 0.53454, 0.37798, 0.26727, 0.67348, 0.47622, 0.33674]
[0.84853, 0.6, 0.42426, 1.06908, 0.75595, 0.53454, 1.34695, 0.95244, 0.67348]
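
The values printed above are enough to rebuild the full anchor list, which is the calculation a device would perform from the header file. The following sketch shows one way to do it. `generate_anchors` is a hypothetical helper, and the 0.5 cell-center offset is an assumption that should be checked against the `anchors` list loaded from the metadata.

```python
def generate_anchors(x_strides, y_strides, widths, heights, offset=0.5):
    """Rebuild (x_center, y_center, width, height) tuples, sector by sector."""
    anchors = []
    for x_stride, y_stride, ws, hs in zip(x_strides, y_strides, widths, heights):
        cols = round(1.0 / x_stride)
        rows = round(1.0 / y_stride)
        for row in range(rows):          # Y changes slowest within a sector
            for col in range(cols):      # X changes next
                for w, h in zip(ws, hs):  # All shapes share one grid cell
                    anchors.append(
                        ((col + offset) * x_stride, (row + offset) * y_stride, w, h)
                    )
    return anchors

# Values copied from the output above
strides = [0.025, 0.05, 0.1, 0.2]
widths = [
    [0.05303, 0.075, 0.10607, 0.06682, 0.09449, 0.13364, 0.08418, 0.11905, 0.16837],
    [0.10607, 0.15, 0.21213, 0.13364, 0.18899, 0.26727, 0.16837, 0.23811, 0.33674],
    [0.21213, 0.3, 0.42426, 0.26727, 0.37798, 0.53454, 0.33674, 0.47622, 0.67348],
    [0.42426, 0.6, 0.84853, 0.53454, 0.75595, 1.06908, 0.67348, 0.95244, 1.34695],
]
heights = [
    [0.10607, 0.075, 0.05303, 0.13364, 0.09449, 0.06682, 0.16837, 0.11905, 0.08418],
    [0.21213, 0.15, 0.10607, 0.26727, 0.18899, 0.13364, 0.33674, 0.23811, 0.16837],
    [0.42426, 0.3, 0.21213, 0.53454, 0.37798, 0.26727, 0.67348, 0.47622, 0.33674],
    [0.84853, 0.6, 0.42426, 1.06908, 0.75595, 0.53454, 1.34695, 0.95244, 0.67348],
]

rebuilt = generate_anchors(strides, strides, widths, heights)
print(len(rebuilt))
```

The four sectors contribute 40x40, 20x20, 10x10, and 5x5 grid cells with 9 anchors each, which adds up to the 19125 anchors reported above and places the sector boundaries at the printed reset indexes.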


In [27]:
# Generate header file for metadata information
h_str = f"""\
// Filename: {METADATA_H_NAME}

#ifndef METADATA_HPP
#define METADATA_HPP

namespace metadata {{
    constexpr unsigned int num_anchors = {num_anchors};
    constexpr int apply_exp_scaling = {1 if apply_exponential_on_box_size else 0};
    constexpr float x_scale = {x_scale};
    constexpr float y_scale = {y_scale};
    constexpr float w_scale = {w_scale};
    constexpr float h_scale = {h_scale};
    constexpr unsigned int num_sectors = {num_sectors};
    constexpr unsigned int num_anchors_per_coord = {num_anchors_per_coord};
"""

# Print reset indexes
h_str += "    constexpr unsigned int reset_idxs[] = {\r\n"
h_str += "        "
for i in range(num_sectors):
    h_str += f"{reset_idxs[i]}"
    if i < num_sectors - 1:
        h_str += ", "
h_str += "\r\n"
h_str += "    };\r\n"

# Print the number of X values for each Y value
h_str += "    constexpr unsigned int num_xs_per_y[] = {\r\n"
h_str += "        "
for i in range(num_sectors):
    h_str += f"{num_xs_per_y[i]}"
    if i < num_sectors - 1:
        h_str += ", "
h_str += "\r\n"
h_str += "    };\r\n"

# Print the X strides
h_str += "    constexpr float x_strides[] = {\r\n"
h_str += "        "
for i in range(num_sectors):
    h_str += f"{x_strides[i]}"
    if i < num_sectors - 1:
        h_str += ", "
h_str += "\r\n"
h_str += "    };\r\n"

# Print the Y strides
h_str += "    constexpr float y_strides[] = {\r\n"
h_str += "        "
for i in range(num_sectors):
    h_str += f"{y_strides[i]}"
    if i < num_sectors - 1:
        h_str += ", "
h_str += "\r\n"
h_str += "    };\r\n"

# Print the anchor widths for each section
h_str += f"    constexpr float widths[{num_sectors}][{len(widths[0])}] = {{\r\n"
for i in range(num_sectors):
    h_str += "        {"
    for j in range(len(widths[0])):
        h_str += f"{widths[i][j]}"
        if j < len(widths[0]) - 1:
            h_str += ", "
    h_str += "}"
    if i < num_sectors - 1:
        h_str += ","
    h_str += "\r\n"
h_str += "    };\r\n"

# Print the anchor heights for each section
h_str += f"    constexpr float heights[{num_sectors}][{len(heights[0])}] = {{\r\n"
for i in range(num_sectors):
    h_str += "        {"
    for j in range(len(heights[0])):
        h_str += f"{heights[i][j]}"
        if j < len(heights[0]) - 1:
            h_str += ", "
    h_str += "}"
    if i < num_sectors - 1:
        h_str += ","
    h_str += "\r\n"
h_str += "    };\r\n"

# Close header file
h_str += """\
}

#endif // METADATA_HPP
"""

# Write the header file to disk
with open(METADATA_H_PATH, 'w') as file:
    file.write(h_str)

In [28]:
# Zip exported models
zip_name = os.path.normpath(EXPORT_PATH).split(os.sep)[-1] + ".zip"
zip_path = os.path.join(BASE_PATH, zip_name)
!zip -q -r {zip_path} {EXPORT_PATH}/*

In [29]:
# Download exported models
files.download(zip_path)


In [None]:
!zip -q -r exported_models.zip {os.path.join(EXPORT_PATH, "*")}