## YOLO_v4_tiny Object detection using STM32AI ModelZoo

``` python
# /*---------------------------------------------------------------------------------------------
#  * Copyright (c) 2022 STMicroelectronics.
#  * All rights reserved.
#  *
#  * This software is licensed under terms that can be found in the LICENSE file in
#  * the root directory of this software component.
#  * If no LICENSE file comes with this software, it is provided AS-IS.
#  *--------------------------------------------------------------------------------------------*/
```

### <u>NOTE: This notebook is tested with Python version 3.10.x.</u>

## 0. Removing Post-Processing layer of exported model

After training, optimizing and adapting the YOLO_v4_tiny object detection model using the Jupyter notebook [yolo_v4_tiny.ipynb](./yolo_v4_tiny.ipynb), the trained model is available in `.onnx` format in `./export/`. This model can then be used with the `stm32ai-modelzoo-services` to be:
- quantized
- used to run inference
- benchmarked, and
- deployed on STM32NPU.

However, the exported model ends with a post-processing node, as shown below. This post-processing layer has to be removed before the model can be used with [stm32ai-modelzoo-services](https://github.com/STMicroelectronics/stm32ai-modelzoo-services/tree/main) or [STEdgeAI](https://stm32ai.st.com/stm32-cube-ai/).


<br>

<img style="float: center;background-color: white; width: 1080" src="../docs/post_processing_node_yolov4_tiny.png" width="1080">

<br> 


To remove this post-processing layer, use the `./utils/remove_nms.py` script.

After the post-processing node is removed, the model has two outputs called `box` and `cls`, as shown below.

<br>

<img style="float: center;background-color: white; width: 1080" src="../docs/removed_nms_head.png" width="1080">

<br> 

**Note**: The shapes shown for `cls` and `box` correspond to an input shape of `256 x 256` and a batch size of 1.
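
Judging by the script name, the removed node performs non-maximum suppression (NMS). For context, the sketch below illustrates confidence filtering followed by greedy NMS in plain Python. It is not the implementation used by the exported model or by the services: the `(x1, y1, x2, y2)` box format, the function names and the default thresholds are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, confidence_thresh=0.4, nms_thresh=0.5, max_boxes=200):
    """Return indices of the boxes kept after thresholding and greedy NMS."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= confidence_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
        if len(kept) == max_boxes:
            break
    return kept


# Made-up candidates: two overlapping boxes, one separate box, one weak box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 5, 5)]
scores = [0.90, 0.80, 0.70, 0.10]
print(nms(boxes, scores))
```

With this sample data, the second box overlaps the first heavily and is suppressed, and the last box falls below the confidence threshold, so indices 0 and 2 are kept.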


To remove the node, install the following Python packages:
- onnx_graphsurgeon
- numpy
- onnx
- onnxruntime

Then set the path to the model from which you want to remove the NMS node in the `remove_nms.py` file:
```python
input_model = '../export/yolov4_cspdarknet_tiny_epoch_***.onnx' # correct the path
```
and launch the `remove_nms.py` script. The following cell creates a Python virtual environment and performs all of these steps. Running it produces the model file `../export/yolov4_cspdarknet_tiny_epoch_***_no_nms.onnx`, which can then be used in the rest of this notebook.

In [None]:
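import os
import subprocess
import sys

venv_name = "temp"
# Virtual-environment executables live in "Scripts" on Windows and in "bin" elsewhere.
bin_dir = "Scripts" if os.name == "nt" else "bin"

os.chdir("utils")
try:
    subprocess.run([sys.executable, "-m", "venv", venv_name], check=True)
    subprocess.run([os.path.join(venv_name, bin_dir, "pip"), "install", "onnx_graphsurgeon", "onnxruntime", "onnx", "numpy"], check=True)
    subprocess.run([os.path.join(venv_name, bin_dir, "python"), "remove_nms.py"], check=True)
finally:
    os.chdir("..")  # always return to the notebook directory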

The cells below show how this model then can be used with [stm32ai-modelzoo-services](https://github.com/STMicroelectronics/stm32ai-modelzoo-services/tree/main).


The rest of the notebook is arranged as below:

<div style="border-bottom: 3px solid #273B5F">
<h2>Table of contents</h2>
<ul style="list-style-type: none">
  <li><a href="#Setup">1. Setup Instructions</a></li>

<li><a href="#Prep">2. Preparing the Model</a></li>
    <ul style="list-style-type: none">
    <li><a href="#Prediction">2.1 Prediction</a></li>
    <li><a href="#Quantization">2.2 Quantization</a></li>
    <li><a href="#Prediction_2">2.3 Prediction</a></li>
    <li><a href="#Benchmarking">2.4 Benchmarking the Model</a></li>
    </ul>
</ul>
</div>

<div id="Setup">
    <h2>1. Setup Instructions</h2>
</div>

In this notebook, we present how to quantize, run prediction with, and benchmark a [YOLO_v4-tiny](https://docs.nvidia.com/tao/tao-toolkit-archive/tao-30-2202/text/object_detection/yolo_v4_tiny.html) object detection model on an STM32 board using the [STM32 model zoo](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main).

The STM32 model zoo, available on GitHub, contains valuable resources and covers a range of use cases such as image classification, object detection, audio event detection, hand posture, and human activity recognition. It provides numerous services, including training, evaluation, prediction, deployment, quantization and benchmarking, as well as chained services such as chain_tbqeb, chain_tqe, chain_eqe, chain_qb, chain_eqeb, and chain_qd, which are explained in their respective READMEs.

To get started, clone the stm32ai-modelzoo-services repository by running the code below:

In [None]:
!git clone https://github.com/STMicroelectronics/stm32ai-modelzoo-services.git

After running the code above, navigate to the stm32ai-modelzoo-services repository and install the required libraries by running the code below:

In [None]:
os.chdir('stm32ai-modelzoo-services')
!pip install -r requirements.txt

In this notebook, we will use several services of the object detection use case. To do so, navigate to the object detection source directory by running the cell below. The following sections then use the `stm32ai_main.py` script together with a YAML configuration file.

In [None]:
os.chdir('object_detection')

<div id="Prep">
    <h2>2. Preparing the Model</h2>
</div>

<div id="Prediction">
    <h3>2.1 Prediction</h3>
</div>

In this section, we will use an object detection model to run predictions on images. To achieve this, we will use the `prediction_config.yaml` file located in `object_detection/src/config_file_examples` to specify the `operation_mode` and the other configuration parameters such as the `model_path`, `model_type`, `preprocessing` and `postprocessing`.

To start off, set the model path, the model type and the operation mode (`prediction`). The path to the test directory is set in the last step.
``` yaml
general:
   model_path: ../../export/yolov4_cspdarknet_tiny_epoch_***_no_nms.onnx
   model_type: yolo_v4_tiny

operation_mode: prediction
```
Then, make sure to modify preprocessing and postprocessing parameters in the configuration file to correspond to the parameters used during training:

```yaml
preprocessing:
  rescaling: 
    scale: [1, 1, 1]
    offset: [-103.939,-116.779,-123.68]
  resizing:
    aspect_ratio: fit
    interpolation: bilinear
  color_mode: bgr

postprocessing:
  confidence_thresh: 0.4 # to not have boxes with small confidence
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.5
  plot_metrics: True   # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 200
```
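
To make the `rescaling` and `color_mode` settings concrete, the sketch below applies them to a single pixel. It assumes that rescaling computes `scale * value + offset` per channel and that the channels are reordered to BGR before rescaling. The function name is hypothetical and this is not the services' own code.

```python
def preprocess_pixel(rgb, scale=(1, 1, 1), offset=(-103.939, -116.779, -123.68)):
    """Convert one RGB pixel to BGR and apply scale * value + offset per channel."""
    bgr = rgb[::-1]  # color_mode: bgr
    return tuple(s * v + o for v, s, o in zip(bgr, scale, offset))


# An orange-ish RGB pixel with values in the 0..255 range (made-up example).
print(preprocess_pixel((255, 128, 0)))
```

With a scale of 1, the offsets simply subtract a fixed value from each of the B, G and R channels, so the model receives inputs centered roughly around zero rather than raw 0..255 values.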

Finally, make sure to change the path to the directory containing the prediction images.
```yaml
prediction:
  test_files_path: ../../../data/test_samples_person
```

Once this is done, you can run your predictions using the cell below:

In [None]:
%run stm32ai_main.py --config-path src/config_file_examples --config-name prediction_config.yaml

<div id="Quantization">
    <h3>2.2 Quantization</h3>
</div>

In this section, we will quantize the float32 model to an int8 quantized model. Quantization is a technique used to reduce the memory and computation requirements of a model by converting the weights and activations from float32 to int8.
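
As a rough illustration of the idea, the sketch below maps float values to int8 with an affine scheme, `q = round(x / scale) + zero_point`, and back. The helper names are hypothetical, and a real quantizer handles calibration and many details omitted here.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive scale and zero point so that [x_min, x_max] maps onto [q_min, q_max]."""
    # The range must contain 0 so that 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(q_min - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Float -> int8, clamped to the representable range."""
    return max(q_min, min(q_max, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """int8 -> approximate float."""
    return (q - zero_point) * scale


scale, zp = quant_params(-1.0, 3.0)
for x in (-1.0, 0.0, 0.5, 3.0):
    q = quantize(x, scale, zp)
    print(x, q, dequantize(q, scale, zp))
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step for values inside the calibrated range.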

To perform quantization, we will use the sample configuration file `quantization_config.yaml` provided in `object_detection/src/config_file_examples`. The configuration file specifies the `quantization_path` of the dataset, the `preprocessing`, and the quantization parameters, such as the `quantization_input_type` and `quantization_output_type`.

Depending on your needs, you can adapt the following parameters:

``` yaml
general:
   model_path: ../../export/yolov4_cspdarknet_tiny_epoch_***_no_nms.onnx
   model_type: yolo_v4_tiny

operation_mode: quantization

dataset:
  name: COCO_2017_person
  class_names: [ person ]
  quantization_path: ../../../data/test_samples_person # sample images for the quantization (ideally the whole training set, but a few dozen images will suffice)
  quantization_split: 0.99 # to use all the images for the quantization

preprocessing:
  rescaling: 
    scale: [1, 1, 1]
    offset: [-103.939,-116.779,-123.68]
  resizing:
    aspect_ratio: fit
    interpolation: bilinear
  color_mode: bgr

postprocessing:
  confidence_thresh: 0.4 # to not have boxes with small confidence
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.5
  plot_metrics: True   # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 200

quantization:
  quantizer: onnx_quantizer
  target_opset: 17
  granularity: per_channel
  quantization_type: PTQ
  quantization_input_type: float 
  quantization_output_type: float
  export_dir: quantized_models
```
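
The `granularity: per_channel` setting gives each output channel its own scale instead of sharing one scale across the whole tensor. The sketch below compares the two on made-up weights, assuming symmetric int8 quantization (`scale = max|w| / 127`). The data and helper names are illustrative only and do not reflect how the quantizer is implemented.

```python
def symmetric_scale(values, q_max=127):
    """Scale that maps the largest magnitude in `values` to q_max."""
    return max(abs(v) for v in values) / q_max


def roundtrip_error(values, scale):
    """Largest absolute error after quantizing to int8 and dequantizing."""
    return max(abs(v - round(v / scale) * scale) for v in values)


# Two output channels with very different weight ranges (made-up numbers).
weights = [
    [0.010, -0.020, 0.015],  # channel 0: small weights
    [1.500, -2.000, 0.700],  # channel 1: large weights
]

per_tensor = symmetric_scale([w for ch in weights for w in ch])
per_channel = [symmetric_scale(ch) for ch in weights]

for ch, scale in zip(weights, per_channel):
    print("per-tensor error:", roundtrip_error(ch, per_tensor),
          "per-channel error:", roundtrip_error(ch, scale))
```

In this example the channel with small weights loses most of its precision under the shared scale, while its own scale keeps the error small. The channel with the largest weights is unaffected because it determines the shared scale.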

Running the `stm32ai_main.py` script with the `quantization_config.yaml` file generates an int8 quantized ONNX model and saves it under **src/experiments_outputs/YYYY_MM_DD_HH_MM_SS/quantized_models**.

In [None]:
%run stm32ai_main.py --config-path src/config_file_examples --config-name quantization_config.yaml

<div id="Prediction_2">
    <h3>2.3 Prediction</h3>
</div>

After quantizing the model, we can run inference again to check that quantization has not affected the performance of the model. In this section, we will use the quantized model to run prediction on the test images.

Use the same parameters as in the previous prediction, changing only the model path to point to the quantized model.

```yaml
general:
    model_path: src/experiments_outputs/YYYY_MM_DD_HH_MM_SS/quantized_models/yolov4_cspdarknet_tiny_epoch_***_no_nms_quant_qdq_pc.onnx
```
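
Because the experiment directory name is a timestamp, the exact path changes on every run. A small helper such as the hypothetical `latest_quantized_model` below can locate the most recent quantized model. It assumes the `src/experiments_outputs/YYYY_MM_DD_HH_MM_SS/quantized_models` layout shown above, and the function name and the `*.onnx` pattern are assumptions.

```python
from pathlib import Path


def latest_quantized_model(outputs_dir="src/experiments_outputs", pattern="*.onnx"):
    """Return the newest quantized model under outputs_dir, or None if there is none.

    Timestamped directory names (YYYY_MM_DD_HH_MM_SS) sort chronologically
    when sorted as plain strings, so the last match is the most recent run.
    """
    root = Path(outputs_dir)
    if not root.is_dir():
        return None
    candidates = sorted(root.glob(f"*/quantized_models/{pattern}"))
    return candidates[-1] if candidates else None


# Prints None when no experiment output exists yet.
print(latest_quantized_model())
```

The returned path can then be copied into `model_path` in the configuration file.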

In [None]:
%run stm32ai_main.py --config-path src/config_file_examples --config-name prediction_config.yaml

<div id="Benchmarking">
    <h3>2.4 Benchmarking the Model</h3>
</div>

In this section, we use the [STM32Cube.AI Developer Cloud](https://stm32ai-cs.st.com/home) to benchmark the quantized model on the **STM32N6570-DK** board.

If you are behind a proxy, you can uncomment and fill the following proxy settings.

**NOTE**: If the password contains special characters such as `@` or `:`, they need to be percent-encoded (for example, `@` becomes `%40` and `:` becomes `%3A`).
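
A minimal sketch of this encoding with the standard library is shown below. The `proxy_url` helper is hypothetical, and the user, password, host and port values are placeholders.

```python
from urllib.parse import quote


def proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL with the user name and password percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/".
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Placeholder values only; replace them with your own settings.
print(proxy_url("user", "p@ss:word", "ip_address", 8080))
# -> http://user:p%40ss%3Aword@ip_address:8080
```

The resulting string can be assigned to `os.environ['http_proxy']` in the cell below.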

In [None]:
# os.environ['http_proxy'] = "http://user:passwd@ip_address:port"
# os.environ['https_proxy'] = "https://user:passwd@ip_address:port"
# Optionally disable SSL verification
# os.environ['NO_SSL_VERIFY'] = "1"
# os.environ["SSL_VERIFY"]="False"


To access the STM32Cube.AI Developer Cloud, set the environment variables below with your credentials. If you don't have an account yet, go to https://stm32ai-cs.st.com/home and click on sign in to create one.

In [None]:
import getpass

email ='xxx.yyy@st.com'
os.environ['stmai_username'] = email
print('Enter your password')
password = getpass.getpass()
os.environ['stmai_password'] = password
os.environ['NO_SSL_VERIFY'] = "1"

We will use the `benchmarking_config.yaml` file in `object_detection/src/config_file_examples/` to measure the performance of the quantized model on the **STM32N6570-DK**.

```yaml
general:
    model_path: src/experiments_outputs/YYYY_MM_DD_HH_MM_SS/quantized_models/yolov4_cspdarknet_tiny_epoch_***_no_nms_quant_qdq_pc.onnx

operation_mode: benchmarking

tools:
   stedgeai:
      version: 10.0.0
      optimization: balanced
      on_cloud: True
      path_to_stedgeai: path_to_local_installation_dir_of_stedgeai/stedgeai.exe #only needed if benchmarking is done locally.
   path_to_cubeIDE: path_to_installations_dir_of_STM32CubeIDE/stm32cubeide.exe

benchmarking:
   board: STM32N6570-DK
``` 
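
Since benchmarking on the cloud relies on the credentials set earlier, a quick check before launching the benchmark can save a failed run. The sketch below only inspects the `stmai_username` and `stmai_password` environment variables used in this notebook. The helper name is hypothetical.

```python
import os


def missing_credentials(env=None, names=("stmai_username", "stmai_password")):
    """Return the names of the credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Please set:", ", ".join(missing))
```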

In [None]:
%run stm32ai_main.py --config-path src/config_file_examples --config-name benchmarking_config.yaml