# TAO Object Detection using NVIDIA TAO YOLOv4 Tiny with STEdgeAI Developer Cloud

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you take a model trained on one task and re-train it for a different task.

This notebook walks through the complete life cycle of an object detection model (training, optimization, and benchmarking) using [NVIDIA TAO Toolkit](https://developer.nvidia.com/tao-toolkit) and [STEdgeAI Developer Cloud](https://stm32ai.st.com/stm32-cube-ai-dc/).


Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

[STEdgeAI Developer Cloud](https://stm32ai-cs.st.com/home) is a free-of-charge online platform and set of services for creating, optimizing, benchmarking, and generating AI models for STM32 microcontrollers. It is based on the [STEdgeAI](https://stm32ai.st.com/stm32-cube-ai/) core technology.


<br>

<img style="float: center;background-color: white; width: 1080" src="../docs/TAO-STM32CubeAI.png" width="1080">

<br>

## Sample prediction of YOLOv4-tiny
<br>

<img style="float: center;background-color: white; width: 1080" src="../docs/sample_prediction.jpg" width="1080">

<br> 

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained model and train a YOLO v4 Tiny model on the COCO2017 Person dataset (a subset of COCO2017)
* Prune the trained YOLO v4 Tiny model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Run inference on the trained model
* Export the pruned and retrained model to a .onnx file for deployment on STM32 targets

At the end of this notebook, you will have generated a trained and optimized `YOLOv4 Tiny` model
which you can evaluate, quantize, benchmark, and deploy via [STEdgeAI Developer Cloud](https://stm32ai.st.com/stm32-cube-ai-dc/) and [stm32ai-modelzoo-services](https://github.com/STMicroelectronics/stm32ai-modelzoo-services).

## Table of Contents

This notebook shows an example use case of YOLO v4 Tiny object detection using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Install the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2) <br>
     2.1 [Check the dataset](#head-2-1)<br>
     2.2 [Generate anchor shapes](#head-2-2)<br>
     2.3 [Generate tfrecords](#head-2-3)<br>
     2.4 [Download pretrained model](#head-2-4)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Evaluate retrained model](#head-8)
9. [Visualize inferences](#head-9)
10. [Model Export](#head-10)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

The following notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` to the path of the user's workspace. Please note that the dataset used by this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the artifacts generated by the TAO experiments will be written to `$LOCAL_PROJECT_DIR/yolo_v4_tiny`. More information on how to set up the dataset and on the supported steps of the TAO workflow is provided in the subsequent cells.

*Note: Please make sure to remove any stray artifacts/files generated by previous experiments from the `$USER_EXPERIMENT_DIR` and `$DATA_DOWNLOAD_DIR` paths defined below. Leftover checkpoint files, etc., may interfere with creating a training graph for a new experiment.*

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/yolo_v4_tiny
%env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/yolo_v4_tiny

# Please define the local project directory that will be mapped to the TAO docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results of the steps
# in this notebook will be stored in $LOCAL_PROJECT_DIR/yolo_v4_tiny
# %env LOCAL_PROJECT_DIR=YOUR_LOCAL_PROJECT_DIR_PATH
%env LOCAL_PROJECT_DIR=/local/home/stm32ai-tao/
os.environ["LOCAL_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "yolo_v4_tiny")

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
%env SPECS_DIR=/workspace/tao-experiments/yolo_v4_tiny/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

In [None]:
# Create local dir
!mkdir -p $LOCAL_DATA_DIR
!mkdir -p $LOCAL_EXPERIMENT_DIR

The cell below maps the project directory on your local host to a workspace directory in the TAO docker instance, so that the data and the results are shared between the host and the docker instance.
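To make the mapping concrete, the helper below translates a host path into the path the TAO container sees, given a list of mounts laid out like the `"Mounts"` entries written below. It is a minimal sketch, not part of the TAO launcher: the name `to_container_path` is hypothetical, it assumes POSIX (Linux) paths, and the second, nested mount in the example is purely illustrative.

```python
import posixpath


def to_container_path(local_path, mounts):
    """Translate a host path into the corresponding path inside the TAO container.

    `mounts` is a list of {"source": host_dir, "destination": container_dir} dicts,
    like the "Mounts" list in ~/.tao_mounts.json (hypothetical helper, POSIX paths).
    Returns None when no mount covers the path.
    """
    local_path = posixpath.normpath(local_path)
    # Try the longest source first so a nested mount wins over its parent mount.
    for mount in sorted(mounts, key=lambda m: len(posixpath.normpath(m["source"])), reverse=True):
        source = posixpath.normpath(mount["source"])
        if local_path == source:
            return mount["destination"]
        if local_path.startswith(source + "/"):
            relative = posixpath.relpath(local_path, source)
            return posixpath.join(mount["destination"], relative)
    return None


example_mounts = [
    {"source": "/local/home/stm32ai-tao/", "destination": "/workspace/tao-experiments"},
]
print(to_container_path("/local/home/stm32ai-tao/data/coco2017_person", example_mounts))
# /workspace/tao-experiments/data/coco2017_person
```

This is why the TAO commands in this notebook use `$DATA_DOWNLOAD_DIR` and `$USER_EXPERIMENT_DIR` (container paths), while the Python cells use `$LOCAL_DATA_DIR` and `$LOCAL_EXPERIMENT_DIR` (host paths).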

In [None]:
# Mapping up the local directories to the TAO docker.
import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the project directory (data and experiment results)
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
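As an optional sanity check, the sketch below reads a mounts file and lists the `source` directories that do not exist on the host, which is a common reason for empty directories inside the container. The helper name `find_missing_mount_sources` is hypothetical and not part of TAO; it only assumes the JSON layout written by the cell above.

```python
import json
import os


def find_missing_mount_sources(mounts_file):
    """Return the "source" paths of a TAO mounts file that are not existing directories."""
    with open(mounts_file) as mfile:
        config = json.load(mfile)
    return [
        mount["source"]
        for mount in config.get("Mounts", [])
        if not os.path.isdir(mount["source"])
    ]


# Example usage in the notebook (an empty list means every source exists):
# print(find_missing_mount_sources(os.path.expanduser("~/.tao_mounts.json")))
```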

## 1. Install the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a Python package distributed as a Python wheel on PyPI. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a Python virtual env, using a Python version within the range listed below. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, set the version of Python to be used in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 7 and <= 10

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver >= 455
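A quick way to confirm that the notebook's interpreter falls in the Python range listed above is sketched below. It only checks Python; the Docker and NVIDIA components still have to be checked with their own tools. The function name and its default bounds are illustrative and simply mirror the list above.

```python
import sys


def python_version_supported(version_info=sys.version_info, lowest=(3, 7), highest=(3, 10)):
    """Return True if (major, minor) lies within the inclusive range [lowest, highest]."""
    major_minor = tuple(version_info[:2])
    return lowest <= major_minor <= highest


print("Python", sys.version.split()[0], "supported:", python_version_supported())
```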

Once you have installed the prerequisites, log in to the docker registry nvcr.io by running the command below:

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.

After setting up your virtual environment with the above requirements, install the TAO pip package.

In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install --upgrade nvidia-pyindex
!pip3 install --upgrade nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

In this notebook, we use the COCO2017 dataset to train a YOLOv4 Tiny model to detect people in images.

Downloading and unzipping the dataset takes a long time and is out of the scope of this notebook. However, to make the process smoother, we provide the following guidelines for data preparation:

Download the COCO2017 dataset from [coco2017 Dataset Download](https://cocodataset.org/#download). In particular, download:
* 2017 Train images \[118K/18GB\]
* 2017 Val images \[5K/1GB\], and
* 2017 Train/Val annotations \[241MB\]

Unzip these files into a single directory called `coco2017`. After unzipping, copy the Python script `./utils/generate_convert_coco_subset_to_kitti.py` to this folder. You should have a structure like the one below:
```bash
coco2017/
... train2017/
...... 00000***.jpg
...... 00000***.jpg
...... 00000***.jpg

... val2017/
...... 00000***.jpg
...... 00000***.jpg
...... 00000***.jpg

... annotations/
...... instances_train2017.json
...... instances_val2017.json

... generate_convert_coco_subset_to_kitti.py
```

Then run the two commands below to keep only the `person` class and convert the annotations to the KITTI format used by this notebook.

    python generate_convert_coco_subset_to_kitti.py --source-image-dir ./val2017 --source-annotation-file ./annotations/instances_val2017.json --out-data-dir ./coco2017_person/val2017 --num-images <num_images_to_keep> --categories-to-keep person

    python generate_convert_coco_subset_to_kitti.py --source-image-dir ./train2017 --source-annotation-file ./annotations/instances_train2017.json --out-data-dir ./coco2017_person/train2017 --num-images <num_images_to_keep> --categories-to-keep person
    
_If you want to keep all the images for the chosen classes, omit the `--num-images` argument._
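The core of the COCO-to-KITTI conversion is a change of box convention: COCO stores `[x_min, y_min, width, height]`, while a KITTI label line stores the 2D box as `left top right bottom` in its 5th to 8th fields. The sketch below illustrates that mapping only; `coco_bbox_to_kitti_line` is a hypothetical helper, not code taken from `generate_convert_coco_subset_to_kitti.py`, whose exact output formatting may differ.

```python
def coco_bbox_to_kitti_line(category, bbox):
    """Build one 15-field KITTI label line from a COCO [x_min, y_min, width, height] box.

    Truncation, occlusion, alpha, 3D dimensions, 3D location and rotation_y are not
    used for 2D detection, so they are written as zeros.
    """
    x_min, y_min, width, height = bbox
    fields = [category, "0.00", "0", "0.00",            # type, truncated, occluded, alpha
              f"{x_min:.2f}", f"{y_min:.2f}",            # left, top
              f"{x_min + width:.2f}", f"{y_min + height:.2f}"]  # right, bottom
    fields += ["0.00"] * 7  # height, width, length, x, y, z, rotation_y
    return " ".join(fields)


print(coco_bbox_to_kitti_line("person", [10, 20, 30, 40]))
# person 0.00 0 0.00 10.00 20.00 40.00 60.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```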

The result will be a folder called `coco2017_person` inside the `coco2017` folder, with the following structure:
```bash
coco2017_person
... val2017
...... annotations
......... instances_val2017.json # filtered coco annotations for the person class only
...... images
......... 0000*****.jpg
......... 0000*****.jpg
...... kitti_annotations
......... 0000*****.txt
......... 0000*****.txt

... train2017
...... annotations
......... instances_train2017.json # filtered coco annotations for the person class only
...... images
......... 0000*****.jpg
......... 0000*****.jpg
...... kitti_annotations
......... 0000*****.txt
......... 0000*****.txt
```

<b>Note: The directory names must match exactly; otherwise, the notebook has to be adapted to use the names that were produced.</b>

Once the directories are created, copy the `coco2017_person` directory into the `$LOCAL_PROJECT_DIR/data/` directory.

### 2.1 Check the dataset <a class="anchor" id="head-2-1"></a>

Once the dataset is prepared and copied, the next cell shows the number of images and labels in the train and validation splits.

In [None]:
# Verify the number of images and labels in each split.
import os

DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "coco2017_person/train2017/images")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "coco2017_person/train2017/kitti_annotations/")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "coco2017_person/val2017/images")))
num_testing_labels = len(os.listdir(os.path.join(DATA_DIR, "coco2017_person/val2017/kitti_annotations/")))
print("Number of images in the train set: {}".format(num_training_images))
print("Number of labels in the train set: {}".format(num_training_labels))
print("Number of images in the val set: {}".format(num_testing_images))
print("Number of labels in the val set: {}".format(num_testing_labels))
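If the image and label counts differ, the sketch below lists which files lack a counterpart, by pairing images and KITTI label files on their basename (e.g. `000000000139.jpg` with `000000000139.txt`). It is a minimal sketch with a hypothetical helper name, assuming that pairing convention and the image extensions listed in `IMAGE_EXTENSIONS`.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unpaired(image_dir, label_dir):
    """Return (images_without_label, labels_without_image) as sorted lists of basenames."""
    images = {
        os.path.splitext(name)[0]
        for name in os.listdir(image_dir)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    }
    labels = {
        os.path.splitext(name)[0]
        for name in os.listdir(label_dir)
        if name.endswith(".txt")
    }
    return sorted(images - labels), sorted(labels - images)


# Example usage in the notebook:
# split_dir = os.path.join(os.environ["LOCAL_DATA_DIR"], "coco2017_person", "train2017")
# print(find_unpaired(os.path.join(split_dir, "images"), os.path.join(split_dir, "kitti_annotations")))
```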

### 2.2 Generate anchor shapes <a class="anchor" id="head-2-2"></a>
Running the following cell generates anchor shapes from the bounding boxes of the training set. Copy the generated anchor shapes into the training spec files. The values already in the spec files were generated from all the COCO2017 images that contain person annotations.

__If you use the same dataset, you do not need to run the following cell or copy its values into the spec files.__

In [None]:
# If you use your own dataset, you will need to run the code below to generate the best anchor shape

!tao model yolo_v4_tiny kmeans -l $DATA_DOWNLOAD_DIR/coco2017_person/train2017/kitti_annotations/ \
                         -i $DATA_DOWNLOAD_DIR/coco2017_person/train2017/images/ \
                         -n 6 \
                         -x 256 \
                         -y 256

# -x and -y are the network input width and height.
# The anchor shapes generated by this command are sorted. YOLOv4 Tiny uses two anchor sets: write the
# first 3 into mid_anchor_shape in the config spec file and the last 3 into big_anchor_shape.
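The `kmeans` command clusters the widths and heights of the training boxes into 6 anchors. For illustration only, a generic YOLO-style k-means with a `1 - IoU` distance can be sketched in plain Python as below. This is not TAO's implementation: it assumes the `(w, h)` pairs are already scaled to the 256 x 256 network input, and the `to_spec_string` output format is an assumption that you should match against the existing `mid_anchor_shape`/`big_anchor_shape` entries in the spec file.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes described only by (width, height), aligned at a common center."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    """Cluster (w, h) pairs with a 1 - IoU distance; return k anchors sorted by area."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # requires len(boxes) >= k
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            nearest = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[nearest].append(box)
        # New anchor = mean (w, h) of its cluster; an empty cluster keeps its old anchor.
        new_anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


def to_spec_string(anchors):
    """Format anchors as a bracketed list of (w, h) tuples (assumed spec-file style)."""
    return "[" + ", ".join(f"({w:.2f}, {h:.2f})" for w, h in anchors) + "]"


boxes = [(10, 10), (11, 12), (20, 30), (22, 28), (50, 60), (52, 58),
         (100, 200), (110, 190), (200, 100), (190, 110), (5, 5), (6, 4)]
anchors = kmeans_anchors(boxes, k=6)
# Sorted by area: the 3 smallest go to mid_anchor_shape, the 3 largest to big_anchor_shape.
mid_anchor_shape, big_anchor_shape = anchors[:3], anchors[3:]
print("mid_anchor_shape:", to_spec_string(mid_anchor_shape))
print("big_anchor_shape:", to_spec_string(big_anchor_shape))
```

The IoU-based distance is the usual choice for anchor clustering because, unlike a Euclidean distance on `(w, h)`, it does not let large boxes dominate the clustering.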

### 2.3 Generate tfrecords <a class="anchor" id="head-2-3"></a>

The default YOLOv4 Tiny data format requires generating TFRecords. To do this, run the following two cells.

__Note: We have observed that the TFRecords format sometimes results in a CUDA error during evaluation. Setting `force_on_cpu` in `nms_config` to `true` can help prevent this problem.__
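If you want to apply this setting programmatically, the sketch below edits the text of a spec file so that its `nms_config` block contains `force_on_cpu: true`. It is a hypothetical helper based on regular expressions, assuming a single `nms_config { ... }` block in protobuf text format; editing the spec file by hand works just as well.

```python
import re


def set_force_on_cpu(spec_text):
    """Return spec_text with `force_on_cpu: true` set in its nms_config block.

    Assumes a single `nms_config { ... }` block in protobuf text format.
    """
    if re.search(r"\bforce_on_cpu\s*:", spec_text):
        # Overwrite an existing value (true or false).
        return re.sub(r"\bforce_on_cpu\s*:\s*\w+", "force_on_cpu: true", spec_text)
    # Otherwise insert the field right after the opening brace of nms_config.
    return re.sub(r"(nms_config\s*\{)", r"\1\n  force_on_cpu: true", spec_text, count=1)


# Example usage (back up the spec file first):
# spec_path = os.path.join(os.environ["LOCAL_SPECS_DIR"], "yolo_v4_tiny_train_person.txt")
# with open(spec_path) as f:
#     text = f.read()
# with open(spec_path, "w") as f:
#     f.write(set_force_on_cpu(text))
```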

In [None]:
!tao model yolo_v4_tiny dataset_convert -d $SPECS_DIR/yolo_v4_tiny_tfrecords_person_train.txt \
                             -o $DATA_DOWNLOAD_DIR/yolo_v4_tiny_person/tfrecords/train \
                             -r $USER_EXPERIMENT_DIR/

In [None]:
!tao model yolo_v4_tiny dataset_convert -d $SPECS_DIR/yolo_v4_tiny_tfrecords_person_val.txt \
                             -o $DATA_DOWNLOAD_DIR/yolo_v4_tiny_person/tfrecords/val \
                             -r $USER_EXPERIMENT_DIR/

### 2.4 Download pre-trained model <a class="anchor" id="head-2-4"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
!ngc registry model list nvidia/tao/pretrained_object_detection:*

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_object_detection:cspdarknet_tiny \
                   --dest $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny

## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate, etc.

In [None]:
# Provide pretrained model path
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $LOCAL_SPECS_DIR/yolo_v4_tiny_train_person.txt

In [None]:
!cat $LOCAL_SPECS_DIR/yolo_v4_tiny_train_person.txt

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training can take from several hours to a day to complete

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao model yolo_v4_tiny train -e $SPECS_DIR/yolo_v4_tiny_train_person.txt \
                   -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                   --gpus 1

In [None]:
print("To resume from a checkpoint, change pretrain_model_path to resume_model_path in the config file.")

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with the highest evaluation accuracy.
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/yolov4_training_log_cspdarknet_tiny.csv
# Set EPOCH to the chosen epoch, zero-padded to 3 digits as in the checkpoint file names.
%set_env EPOCH=080
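Instead of reading the CSV by eye, the best epoch can be picked programmatically. The sketch below assumes the log has columns named `epoch` and `mAP` (check the header printed by the `cat` cell above) and that epochs without validation leave the metric empty. Since it is not certain whether the logged epochs are 0- or 1-based, `epoch_offset` lets you align them with the checkpoint names; the function name is hypothetical.

```python
import csv
import io


def best_epoch(csv_text, metric="mAP", epoch_offset=0):
    """Return (epoch, score) for the row with the highest numeric value in `metric`.

    Rows whose metric is empty or non-numeric (epochs without validation) are skipped.
    `epoch_offset` is added to the logged epoch, e.g. 1 if the log is 0-based while
    checkpoint names are 1-based.
    """
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float((row.get(metric) or "").strip())
        except ValueError:
            continue
        epoch = int(float(row["epoch"])) + epoch_offset
        if best is None or score > best[1]:
            best = (epoch, score)
    if best is None:
        raise ValueError(f"no numeric values found in column {metric!r}")
    return best


# Example usage in the notebook:
# log_path = os.path.join(os.environ["LOCAL_EXPERIMENT_DIR"], "experiment_dir_unpruned",
#                         "yolov4_training_log_cspdarknet_tiny.csv")
# with open(log_path) as f:
#     epoch, score = best_epoch(f.read())
# os.environ["EPOCH"] = f"{epoch:03d}"  # zero-padded, e.g. "080"
```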

## 5. Evaluate trained models <a class="anchor" id="head-5"></a>

In [None]:
!tao model yolo_v4_tiny evaluate -e $SPECS_DIR/yolo_v4_tiny_train_person.txt \
                      -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5

In [None]:
# tao <task> export will fail if .onnx already exists. So we clear the export folder before tao <task> export
# !rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
# Generate .onnx file using tao container
!tao model yolo_v4_tiny export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                               -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_unpruned_epoch_$EPOCH.onnx \
                               -e $SPECS_DIR/yolo_v4_tiny_train_person.txt \
                               --target_opset 15 \
                               --gen_ds_config

## 6. Prune trained models <a class="anchor" id="head-6"></a>
* Specify pre-trained model
* Equalization criterion
* Threshold for pruning
* Output directory to store the model

Usually, you only need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives a smaller model (and thus faster inference) but lower accuracy. The threshold value depends on the dataset and the model. `0.7` in the block below is just a starting point: if the retrained accuracy is good, you can increase this value to get smaller models; otherwise, lower it to get better accuracy.
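To see why a higher threshold yields a smaller model, consider a simplified magnitude-based filter pruning rule: compute each filter's L2 norm, normalize it by the largest norm in the layer, and keep only the filters whose normalized norm is at least `pth`. The sketch below illustrates that idea under these assumptions. It is not TAO's implementation, and it does not model the `-eq geometric_mean` equalization that TAO applies across layers whose outputs are combined.

```python
import math


def l2_norms(filters):
    """L2 norm of each filter, given each filter's weights as a flat list."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]


def filters_kept(filter_norms, pth):
    """Indices of filters whose norm, divided by the layer's max norm, is >= pth."""
    max_norm = max(filter_norms)
    return [i for i, n in enumerate(filter_norms) if n / max_norm >= pth]


norms = [1.0, 0.5, 0.8, 0.9]
print(filters_kept(norms, 0.7))   # [0, 2, 3]
print(filters_kept(norms, 0.85))  # [0, 3]
```

Raising `pth` from 0.7 to 0.85 removes one more filter in this toy layer, which is the same size/accuracy trade-off described above, applied across every prunable layer.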

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao model yolo_v4_tiny prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                   -e $SPECS_DIR/yolo_v4_tiny_train_person.txt \
                   -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/yolov4_cspdarknet_tiny_pruned.hdf5 \
                   -eq geometric_mean \
                   -pth 0.7

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

## 7. Retrain pruned models <a class="anchor" id="head-7"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification
* WARNING: training can take from several hours to a day to complete

In [None]:
# Printing the retrain spec file.
# Here we have updated the spec file to use the newly pruned model as the pretrained weights.
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $LOCAL_SPECS_DIR/yolo_v4_tiny_retrain_person.txt
!cat $LOCAL_SPECS_DIR/yolo_v4_tiny_retrain_person.txt

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!tao model yolo_v4_tiny train --gpus 1 \
                   -e $SPECS_DIR/yolo_v4_tiny_retrain_person.txt \
                   -r $USER_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Listing the newly retrained model.
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with the highest evaluation accuracy.
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/yolov4_training_log_cspdarknet_tiny.csv
# Set EPOCH to the chosen epoch, zero-padded to 3 digits as in the checkpoint file names.
%set_env EPOCH=080

## 8. Evaluate retrained model <a class="anchor" id="head-8"></a>

In [None]:
!tao model yolo_v4_tiny evaluate -e $SPECS_DIR/yolo_v4_tiny_retrain_person.txt \
                      -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5

## 9. Visualize inferences <a class="anchor" id="head-9"></a>
In this section, we run the `inference` tool to generate predictions with the retrained model and visualize the results.

In [None]:
# Copy some test images
!mkdir -p $LOCAL_DATA_DIR/test_samples_person
!cp $LOCAL_DATA_DIR/coco2017_person/val2017/images/00000000* $LOCAL_DATA_DIR/test_samples_person/

In [None]:
# Running inference for detection on n images
!tao model yolo_v4_tiny inference -i $DATA_DOWNLOAD_DIR/test_samples_person \
                       -e $SPECS_DIR/yolo_v4_tiny_retrain_person.txt \
                       -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                       -r $USER_EXPERIMENT_DIR/

The `inference` tool produces two outputs:
1. Annotated images in `$LOCAL_EXPERIMENT_DIR/images_annotated`
2. Frame-by-frame bounding-box labels in KITTI format in `$LOCAL_EXPERIMENT_DIR/labels`
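The label files can also be inspected programmatically. The sketch below parses KITTI lines (class name in the first field, box as `left top right bottom` in fields 5 to 8) and counts confident detections per file. It assumes that, when a 16th field is present, it holds the detection confidence; verify this against a few files in `labels` before relying on the score filter. The helper names are hypothetical.

```python
import os


def parse_kitti_line(line):
    """Parse one KITTI label line into class name, 2D box, and optional confidence.

    Fields 5-8 (0-based indices 4-7) are left, top, right, bottom. A 16th field,
    if present, is assumed to be the detection confidence.
    """
    parts = line.split()
    return {
        "class": parts[0],
        "bbox": tuple(float(v) for v in parts[4:8]),
        "score": float(parts[15]) if len(parts) > 15 else None,
    }


def count_detections(label_dir, min_score=0.5):
    """Per label file, count detections with score >= min_score (or without a score)."""
    counts = {}
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name)) as label_file:
            detections = [parse_kitti_line(line) for line in label_file if line.strip()]
        counts[name] = sum(
            1 for d in detections if d["score"] is None or d["score"] >= min_score
        )
    return counts


# Example usage in the notebook:
# print(count_detections(os.path.join(os.environ["LOCAL_EXPERIMENT_DIR"], "labels")))
```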

In [None]:
# Simple grid visualizer
# !pip3 install "matplotlib>=3.3.3, <4.0"
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from $LOCAL_EXPERIMENT_DIR/<image_dir> in a grid."""
    output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2-D even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80, 30], squeeze=False)
    a = sorted(os.path.join(output_path, image) for image in os.listdir(output_path)
               if os.path.splitext(image)[1].lower() in valid_image_ext)
    # Hide the axes of every cell, including empty ones.
    for ax in axarr.flat:
        ax.axis('off')
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)
    f.tight_layout()

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'images_annotated' # relative path from $LOCAL_EXPERIMENT_DIR.
COLS = 5 # number of columns in the visualizer grid.
IMAGES = 25 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 10. Model Export <a class="anchor" id="head-10"></a>

The code block below exports the pruned and retrained model to an `.onnx` file. Quantization for STM32 targets is done afterwards, outside TAO, with [STEdgeAI Developer Cloud](https://stm32ai.st.com/stm32-cube-ai-dc/) or `stm32ai-modelzoo-services`.

In [None]:
# tao <task> export will fail if .onnx already exists. So we clear the export folder before tao <task> export
# !rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
# Generate .onnx file using tao container
!tao model yolo_v4_tiny export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                               -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_epoch_$EPOCH.onnx \
                               -e $SPECS_DIR/yolo_v4_tiny_retrain_person.txt \
                               --target_opset 15 \
                               --gen_ds_config

## Benchmarking and using the trained and exported model

After running the cell above, the trained model is available in `.onnx` format in `$LOCAL_EXPERIMENT_DIR/export/`. This model can then be used with `stm32ai-modelzoo-services` to:
- quantize it,
- run inference,
- benchmark it, and
- deploy it on STM32NPU.

However, the exported model contains a post-processing node, shown below. This post-processing layer must be removed before the model can be used with [stm32ai-modelzoo-services](https://github.com/STMicroelectronics/stm32ai-modelzoo-services/tree/main) or [STEdgeAI](https://stm32ai.st.com/stm32-cube-ai/).


<br>

<img style="float: center;background-color: white; width: 1080" src="../docs/post_processing_node_yolov4_tiny.png" width="1080">

<br> 


To remove this post-processing layer, use the `./utils/remove_nms.py` script.

After removal of the post-processing node the model has two outputs called `box` and `cls` as below.

<br>

<img style="float: center;background-color: white; width: 1080" src="../docs/removed_nms_head.png" width="1080">

<br> 

**Note**: The shapes shown for `cls` and `box` correspond to an input shape of 256 x 256 and a batch size of 1.
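The number of candidate boxes in these outputs follows from the grid sizes of the detection heads. The sketch below computes it under an assumption, not a statement about the exported graph: YOLOv4 Tiny has two heads (matching the `mid_anchor_shape` and `big_anchor_shape` sets) at strides 16 and 32, with 3 anchors per grid cell. Compare the result with the shapes in the figure above; if they differ, the head layout differs from this assumption.

```python
def num_predictions(height, width, strides=(16, 32), anchors_per_scale=3):
    """Total anchor boxes predicted per image across all detection heads.

    A head at stride s has (height // s) * (width // s) grid cells, each with
    anchors_per_scale anchors.
    """
    return sum((height // s) * (width // s) * anchors_per_scale for s in strides)


print(num_predictions(256, 256))  # 16*16*3 + 8*8*3 = 960
```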


To run the script, install the following Python packages:
- onnx_graphsurgeon
- numpy
- onnx
- onnxruntime

Then set the path of the model from which you want to remove the NMS in the file:
```python
input_model = './export/yolov4_cspdarknet_tiny_epoch_080.onnx' # correct the path
```
and launch the script:
> python remove_nms.py

This produces the model file `./export/yolov4_cspdarknet_tiny_epoch_080_no_nms.onnx`, which can then be used with the [using_yolo_v4_tiny_with_stm32ai_modelzoo.ipynb](./using_yolo_v4_tiny_with_stm32ai_modelzoo.ipynb) notebook.