<h1><font color="#273B5F">Quantization of Deep Learning Models using ONNX Runtime</font></h1>
    
    


<p>
Quantization converts the original floating-point parameters and intermediate activations of a model into lower-precision integer representations. This reduction in precision can significantly decrease the memory footprint and computational cost of the model, making it more efficient to deploy on an STM32 board using STM32Cube.AI, or on any other resource-constrained device.

ONNX Runtime Quantization is a feature of ONNX Runtime that allows efficient execution of quantized models. It provides tools to quantize models in the ONNX format, including methods for quantizing weights and activations.


**This notebook demonstrates static post-training quantization of deep learning models using ONNX Runtime. It covers quantizing the model with a calibration dataset or with fake data, evaluating the full-precision model and the quantized model, and then using the STM32Cube.AI Developer Cloud to benchmark the models and to generate the model C code to be deployed on your STM32 board.** 
</p>

## License

This software component is licensed by ST under BSD-3-Clause license,
the "License"; 

You may not use this file except in compliance with the
License. 

You may obtain a copy of the License at: https://opensource.org/licenses/BSD-3-Clause

Copyright (c) 2023 STMicroelectronics. All rights reserved

<div style="border-bottom: 3px solid #273B5F">
<h2>Table of contents</h2>
<ul style="list-style-type: none">
  <li><a href="#settings">1. Settings</a>
  <ul style="list-style-type: none">
    <li><a href="#install">1.1 Install and import necessary packages</a></li>
    <li><a href="#select">1.2 Select input model filename and dataset folder</a></li>
  </ul>
</li>
<li><a href="#quantization">2. Quantization</a></li>
      <ul style="list-style-type: none">
    <li><a href="#opset">2.1 Opset conversion</a></li>
    <li><a href="#dataset">2.2 Creating calibration dataset</a></li>
    <li><a href="#quantize">2.3 Quantize the model using QDQ quantization to int8 weights and activations</a></li>
  </ul>
<li><a href="#validation">3. Evaluation </a></li>
<li><a href="#benchmark">4. Benchmark</a></li>
      <ul style="list-style-type: none">
    <li><a href="#proxy">4.1 Proxy setting and connection to the STM32Cube.AI Developer Cloud</a></li>
    <li><a href="#analyze">4.2 Analyze your model memory footprints</a></li>
    <li><a href="#Benchmark">4.3 Benchmark your model on a STM32 target</a></li>
    <li><a href="#generate">4.4 Generate the model optimized C code for STM32</a></li>
         

  </ul>
</ul>
</div>




<div id="settings">
    <h2>1. Settings</h2>
</div>


<div id="install">
    <h3>1.1 Install and import necessary packages </h3>
</div>

In [None]:
import sys
!{sys.executable} -m pip install numpy==1.23.5
!{sys.executable} -m pip install onnxruntime==1.13.1
!{sys.executable} -m pip install onnx==1.12.0
!{sys.executable} -m pip install Pillow==9.4.0
!{sys.executable} -m pip install tensorflow==2.8.3 
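!{sys.executable} -m pip install scikit-learn
# matplotlib is imported later by the evaluation and benchmark cells
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install tqdm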

# for the cloud service
!{sys.executable} -m pip install gitdir




In [None]:
import glob
import os
import random
import shutil

import numpy as np 
import tensorflow as tf
from datetime import datetime
from tqdm import tqdm

import onnx
import onnxruntime
from onnx import version_converter
from onnxruntime import quantization
from onnxruntime.quantization import (CalibrationDataReader, CalibrationMethod,
                                      QuantFormat, QuantType, quantize_static)


<div id="select">
    <h3>1.2 Select input model filename and dataset folder</h3>
</div>


The code section below sets the paths of the model and the dataset used in the rest of the notebook. The model is expected to be in Open Neural Network Exchange (ONNX) format. In this experiment we use the mobilenet_v2_0.35_128 model as an example, together with a modified version of the COCO2014 dataset. For more details, please visit this [link](https://pjreddie.com/projects/coco-mirror/). 

The quantization set is a directory containing one sub-directory per class. For instance:

```bash
 quantization_set/
 ..class_a:person/
 ....a_image_1.jpg
 ....a_image_2.jpg
 ..class_b:not_person/
 ....b_image_1.jpg
 ....b_image_2.jpg

 
```
**For proper quantization, ``quantization_dataset_path`` must point to the quantization set or to the training set, from which the calibration dataset is created later.**

**For fake quantization, ``quantization_dataset_path``  is set to ``None``.**
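
Before creating the calibration set, it can help to confirm that the folder really follows this one-sub-directory-per-class layout. The sketch below is an illustration only: `summarize_quantization_set` is a hypothetical helper (not part of this notebook or of ONNX Runtime), and the extension list roughly mirrors the `.jpg`, `.jpeg` and `.png` files that the calibration code looks for.

```python
import os

# Assumption: same image extensions as the calibration helper later in this notebook.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def summarize_quantization_set(dataset_path):
    """Return {class_name: image_count} for a one-folder-per-class dataset."""
    summary = {}
    for entry in sorted(os.listdir(dataset_path)):
        class_dir = os.path.join(dataset_path, entry)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        summary[entry] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return summary
```

Calling `summarize_quantization_set(quantization_dataset_path)` returns one entry per class; an empty dictionary or a class with zero images suggests that the path or the layout is wrong.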

In [41]:
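# Build the paths with os.path.join: in a plain string literal, "\t" (as in "\training") is read as a tab
input_model = os.path.join("models", "mobilenet_v2_128_0.5.onnx")
quantization_dataset_path = os.path.join("..", "..", "image_classification", "scripts", "training",
                                         "datasets", "person_dataset", "quantization_set")
# quantization_dataset_path = None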



<div id="quantization">
    <h2>2. Quantization</h2>
</div>


<div id="opset">
    <h3>2.1. Opset conversion  </h3>
</div>

The next function changes the opset version of the model, which can be a necessary step to ensure proper quantization. 

Batch normalization folding and other advanced optimizations are available for models with opset 13 and above. To benefit from them and to stay aligned with the installed ONNX and ONNX Runtime versions, we convert the model to opset 15.

To check the compatibility between the ONNX Runtime version and the opset number, see [the official documentation of ONNX Runtime](https://onnxruntime.ai/docs/reference/compatibility.html).



In [None]:
def change_opset(input_model, new_opset):
    """Convert the ONNX model at `input_model` to `new_opset`, overwriting the file."""
    if not input_model.endswith('.onnx'):
        raise ValueError("Error! The model must be in onnx format")
    model = onnx.load(input_model)
    # Check the current opset version of the default ONNX domain
    # (opset_import may also list other domains, so do not rely on index 0)
    current_opset = next(opset.version for opset in model.opset_import
                         if opset.domain in ('', 'ai.onnx'))
    if current_opset == new_opset:
        print(f"The model is already using opset {new_opset}")
        return input_model

    # Convert the model to the requested opset and save it to a temporary file
    converted_model = version_converter.convert_version(model, new_opset)
    temp_model_path = input_model + '.temp'
    onnx.save(converted_model, temp_model_path)

    # Check that the converted model is valid by loading it with ONNX Runtime.
    # Creating the session is the step that can fail, so it belongs in the try block.
    try:
        session = onnxruntime.InferenceSession(temp_model_path)
        session.get_inputs()
    except Exception as e:
        print(f"An error occurred while loading the modified model: {e}")
        os.remove(temp_model_path)
        return None

    # Replace the original model file with the converted model
    os.replace(temp_model_path, input_model)
    print(f"The model has been converted to opset {new_opset} and saved at the same location.")
    return input_model

change_opset(input_model, new_opset=15)



<div id="dataset">
    <h3> 2.2 Creating calibration dataset </h3>
</div>

During ONNX Runtime quantization, the model is run on the calibration data to collect statistics about the dynamic range and characteristics of each input and output. These statistics are then used to determine the main quantization parameters, namely the scale factor and the zero-point (or offset) used to map floating-point values to integers. 
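
As a rough illustration of how these two parameters follow from the calibration statistics, the sketch below derives an int8 scale and zero-point from an observed minimum and maximum, then maps a value to int8 and back. It is a simplified textbook-style affine scheme, not the exact ONNX Runtime implementation; the function names and the example range are assumptions made for this illustration.

```python
QMIN, QMAX = -128, 127  # signed 8-bit integer range


def compute_qparams(rmin, rmax):
    """Derive (scale, zero_point) for asymmetric int8 quantization."""
    # Make sure 0.0 is inside the range so that it is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (QMAX - QMIN)
    if scale == 0.0:
        return 1.0, 0  # degenerate range: every observed value was 0
    zero_point = int(round(QMIN - rmin / scale))
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x, scale, zero_point):
    """Map a float to int8: q = clamp(round(x / scale) + zero_point)."""
    q = int(round(x / scale)) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to float: x ~ (q - zero_point) * scale."""
    return (q - zero_point) * scale


# Hypothetical activation range observed during calibration.
scale, zero_point = compute_qparams(-1.0, 3.0)
q = quantize(0.5, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

For values well inside the calibrated range, the reconstruction error stays within half a quantization step (`scale / 2`), while values outside the range are clipped. This is why the calibration data should cover the range that the model sees at inference time.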

When obtaining representative real data for calibration is difficult or impractical, randomly generated or synthetic input data can be used for the calibration. 

The next three code sections contain:
* The `create_calibration_dataset` function, which creates the calibration set from the original directory by taking a given number of samples from each class, and the `preprocess_image_batch` function, which loads the batch and preprocesses it. 
* The `preprocess_random_images` function, which generates random images for fake quantization.
* The `ImageNetDataReader` class, which inherits from the ONNX Runtime `CalibrationDataReader` and implements the `get_next` method to provide input data dictionaries to the calibration process.
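
The contract behind `get_next` is simple: each call returns one dictionary that maps the model input name to a batch, and `None` signals that the data is exhausted. The stand-in below illustrates that contract in plain Python. `ListDataReader`, the input name `"input"` and the toy samples are assumptions for this sketch, and the class deliberately does not inherit from the ONNX Runtime `CalibrationDataReader`.

```python
class ListDataReader:
    """Toy reader that follows the get_next()/rewind() contract."""

    def __init__(self, input_name, samples):
        self.input_name = input_name
        self.samples = list(samples)
        self._iterator = None

    def get_next(self):
        # Build the iterator lazily so that rewind() can restart it.
        if self._iterator is None:
            self._iterator = iter(
                {self.input_name: sample} for sample in self.samples
            )
        return next(self._iterator, None)  # None once all samples are consumed

    def rewind(self):
        self._iterator = None


def consume(reader):
    """Drain a reader, roughly as a calibration loop would, and return all feeds."""
    feeds = []
    while True:
        feed = reader.get_next()
        if feed is None:
            break
        feeds.append(feed)
    return feeds


reader = ListDataReader("input", [[0.1, 0.2], [0.3, 0.4]])
first_pass = consume(reader)
reader.rewind()
second_pass = consume(reader)
```

After `rewind()`, the second pass yields the same feeds as the first, which is what allows a reader to be reused.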


As explained in <a href="#select">Select input model filename and dataset folder</a>, to perform quantization with fake data, set ``quantization_dataset_path`` to ``None``.


**Note:** the preprocessing of the quantization dataset in the section below is aligned with the preprocessing used to train the model. For other models with a different preprocessing scheme, some arguments need to be changed, such as ``color_mode``, ``interpolation`` and ``norm`` (the normalization).
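
To make the `norm` argument concrete, the sketch below applies the two normalization schemes used in this notebook to a single 8-bit pixel value. The `'tf'` scheme maps [0, 255] to [-1, 1]; the `'torch'` scheme rescales to [0, 1] and then standardizes with a per-channel mean and standard deviation. `normalize_pixel` is a hypothetical helper written for this illustration, and the mean and standard deviation in the example are simply the red-channel constants from the preprocessing code.

```python
def normalize_pixel(value, norm="tf", mean=0.0, std=1.0):
    """Normalize one 8-bit channel value (0..255) with the given scheme."""
    if norm == "tf":
        return value / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    if norm == "torch":
        return (value / 255.0 - mean) / std  # [0, 1], then standardize
    raise ValueError(f"unknown normalization scheme: {norm}")


tf_black = normalize_pixel(0, "tf")
tf_white = normalize_pixel(255, "tf")
torch_white = normalize_pixel(255, "torch", mean=0.485, std=0.224)
```

Calibrating with one scheme and running inference with another would shift the activation ranges, so the quantization parameters would no longer match the data.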

In [None]:
def create_calibration_dataset(dataset_path, samples_per_class = 100):
    # the calibration dataset is created in the same parent directory as the dataset
    calibration_dataset_path = os.path.join(os.path.dirname(dataset_path), 'calibration_' + os.path.basename(dataset_path))
    # List directories
    dir_list = next(os.walk(dataset_path))[1]

    # Create the target directory if it doesn't exist
    if not os.path.exists(calibration_dataset_path):
        os.makedirs(calibration_dataset_path)

    # For each class directory, collect its images
    for dir_i in tqdm(dir_list):
        img_list = glob.glob(os.path.join(dataset_path, dir_i, '*.jpg')) + \
                   glob.glob(os.path.join(dataset_path, dir_i, '*.png')) + \
                   glob.glob(os.path.join(dataset_path, dir_i, '*.jpeg'))

        # Shuffle the data
        random.shuffle(img_list)

        # Copy a subset of images to the target directory
        for j in range(min(samples_per_class, len(img_list))):
            shutil.copy2(img_list[j], calibration_dataset_path)
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(current_time + ' - ' + 'Done creating calibration dataset.')
    return calibration_dataset_path

def preprocess_image_batch(images_folder: str, height: int, width: int,  interpolation = 'bilinear', norm='tf', size_limit=0):
    """
    Loads a batch of images and preprocesses them
    parameter images_folder: path to folder storing images
    parameter height: image height in pixels
    parameter width: image width in pixels
    parameter interpolation: resampling method used when resizing the images
    parameter norm: normalization scheme, 'tf' (scale to [-1, 1]) or 'torch' (mean/std normalization)
    parameter size_limit: number of images to load. Default is 0 which means all images are picked.
    return: array of preprocessed images, each with shape (1, channels, height, width)
    """
    TORCH_MEANS = [0.485, 0.456, 0.406]
    TORCH_STD = [0.224, 0.224, 0.224]

    image_names = os.listdir(images_folder)
    if size_limit > 0 and len(image_names) >= size_limit:
        batch_filenames = [image_names[i] for i in range(size_limit)]
    else:
        batch_filenames = image_names
    unconcatenated_batch_data = []

    for image_name in batch_filenames:
        image_filepath = os.path.join(images_folder, image_name)
        # Keras expects target_size as (height, width)
        img = tf.keras.utils.load_img(image_filepath , grayscale = False, color_mode = 'rgb',
            target_size = (height, width), interpolation=interpolation)
        img_array = np.array([tf.keras.utils.img_to_array(img)])
        if norm.lower() == 'tf':
            img_array = -1 + img_array / 127.5
        elif norm.lower() == 'torch':
            img_array = img_array / 255.0
            img_array = img_array - TORCH_MEANS
            img_array = img_array/ TORCH_STD
        # transpose the data (NHWC to NCHW) to conform to the expected input data representation
        img_array = img_array.transpose((0,3,1,2))
        unconcatenated_batch_data.append(img_array)
    batch_data = np.stack(unconcatenated_batch_data, axis=0)
    return batch_data

In [None]:
def preprocess_random_images(height: int, width: int, channel: int,  size_limit=400):
    """
    Generates a batch of random images for fake quantization
    parameter height: image height in pixels
    parameter width: image width in pixels
    parameter channel: number of image channels
    parameter size_limit: number of images to generate. Default is 400
    return: array of random images, each with shape (1, channel, height, width)
    """
    unconcatenated_batch_data = []
    for i in range(size_limit):
        random_vals = np.random.uniform(0, 1, channel*height*width).astype('float32')
        random_image = random_vals.reshape(1, channel, height, width)
        unconcatenated_batch_data.append(random_image)
    # Stack once, after the loop, instead of rebuilding the batch at every iteration
    batch_data = np.stack(unconcatenated_batch_data, axis=0)
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(current_time + ' - ' + 'random dataset with {} random images'.format(size_limit))
    return batch_data

In [None]:
class ImageNetDataReader(CalibrationDataReader):
    def __init__(self, calibration_image_folder: str, model_path: str):
        # Use inference session to get input shape
        session = onnxruntime.InferenceSession(model_path, None)
        (_, channel, height, width) = session.get_inputs()[0].shape

        # Convert image to input data
        if calibration_image_folder:
            self.nhwc_data_list = preprocess_image_batch(
                calibration_image_folder, height, width, norm='tf', size_limit=0
            )
        else:
            self.nhwc_data_list = preprocess_random_images(
                height, width, channel
            )

        self.input_name = session.get_inputs()[0].name
        self.datasize = len(self.nhwc_data_list)

        self.enum_data = None  # Enumerator for calibration data

    def get_next(self):
        if self.enum_data is None:
            # Create an iterator that generates input dictionaries
            # with input name and corresponding data
            self.enum_data = iter(
                [{self.input_name: nhwc_data} for nhwc_data in self.nhwc_data_list]
            )
        
        return next(self.enum_data, None)  # Return next item from enumerator

    def rewind(self):
        self.enum_data = None  # Reset the enumeration of calibration data

<div id="quantize">
    <h3> 2.3 Quantize the model using QDQ quantization to int8 weights and activations </h3>
</div>

The following section quantizes the float32 ONNX model into an int8 quantized ONNX model. The model is first preprocessed to prepare it for quantization, and is then quantized with the ``quantize_static`` function, which we recommend using with calibration data and with the supported argument settings below.


<table>
<tr>
<th style="text-align: left">Argument</th>
<th style="text-align: left">Description /  CUBE.AI recommendation</th>
</tr>
    
<tr><td style="text-align: left">quant_format</td>
<td style="text-align: left"> <p> QuantFormat.QDQ: <strong>recommended</strong>, it quantizes the model by inserting QuantizeLinear/DequantizeLinear nodes on the tensors. QuantFormat.QOperator: <strong>not recommended</strong>, it quantizes the model with quantized operators directly </p> </td></tr>
<tr><td style="text-align: left">activation_type</td> 
<td style="text-align: left"> <p> QuantType.QInt8: <strong>recommended</strong>, it quantizes the activations to int8. QuantType.QUInt8: <strong>not recommended</strong>, it quantizes the activations to uint8 </p> </td></tr>  
<tr><td style="text-align: left">weight_type</td> 
<td style="text-align: left"> <p> QuantType.QInt8: <strong>recommended</strong>, it quantizes the weights to int8. QuantType.QUInt8: <strong>not recommended</strong>, it quantizes the weights to uint8 </p> </td></tr> 
<tr><td style="text-align: left">per_channel</td>
<td style="text-align: left"> <p>True: <strong>recommended</strong>, the quantization is carried out separately for each channel, based on the characteristics of the data within that channel. False: supported but <strong>not recommended</strong>, the quantization is carried out once for each tensor </p> </td>
</tr>
<tr><td style="text-align: left">ActivationSymmetric</td>
<td style="text-align: left"> <p>False: <strong>recommended</strong>, it maps the activations to the range [-128, +127] (asymmetric). True: supported, it maps the activations to the range [-128, +127] with zero_point=0 </p> </td>
</tr>
<tr>
<td style="text-align: left">WeightSymmetric</td>
<td style="text-align: left"> <p>True: <strong>highly recommended</strong>, it maps the weights to the range [-128, +127] with zero_point=0. False: supported but <strong>not recommended</strong>, it maps the weights to the range [-128, +127] (asymmetric)</p> </td>
</tr>
   
</table>
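
The difference between per-tensor and per-channel quantization in the table can be illustrated with symmetric int8 weights, where the zero-point is 0 and the scale depends only on the largest absolute value. This is a simplified sketch with made-up weights; `symmetric_scale` is a hypothetical helper, and the exact rounding and range handling inside ONNX Runtime may differ.

```python
def symmetric_scale(values, qmax=127):
    """Scale for symmetric int8 quantization (zero_point = 0)."""
    largest = max(abs(v) for v in values)
    return largest / qmax if largest > 0 else 1.0


# Made-up weights for two output channels with very different magnitudes.
weights = [
    [0.02, -0.01, 0.015],  # channel 0: small values
    [1.50, -2.00, 0.75],   # channel 1: large values
]

# Per-tensor: a single scale shared by every channel.
per_tensor_scale = symmetric_scale([w for channel in weights for w in channel])

# Per-channel: one scale per output channel.
per_channel_scales = [symmetric_scale(channel) for channel in weights]

# Quantize the first weight of channel 0 with both scales.
q_per_tensor = round(weights[0][0] / per_tensor_scale)
q_per_channel = round(weights[0][0] / per_channel_scales[0])
```

With the shared scale, the small channel collapses onto a handful of integer levels (the weight 0.02 becomes 1), whereas the per-channel scale uses the full int8 range (the same weight becomes 127). This is the motivation for recommending `per_channel=True`.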



In [None]:
if quantization_dataset_path is not None:
    calibration_dataset_path = create_calibration_dataset(quantization_dataset_path, samples_per_class=100)
else:
    calibration_dataset_path = None
# set the data reader pointing to the representative dataset
print('Prepare the data reader for the representative dataset...')
dr = ImageNetDataReader(calibration_dataset_path, input_model) 
print('the data reader is ready')

# preprocess the model to infer shapes of each tensor
infer_model = os.path.splitext(input_model)
infer_model = infer_model[0] + '_infer' + infer_model[1]
print('Infer for the model: {}...'.format(os.path.basename(input_model)))
quantization.quant_pre_process(input_model_path=input_model, output_model_path=infer_model, skip_optimization=False)

# prepare quantized onnx model filename
quant_model = os.path.splitext(input_model)
if calibration_dataset_path is not None:
    quant_model = quant_model[0] + '_QDQ_quant' + quant_model[1]
else:
    quant_model = quant_model[0] + '_QDQ_fakequant' + quant_model[1]
print('Quantize the model {}, please wait...'.format(os.path.basename(input_model)))

quantize_static(
        infer_model,
        quant_model,
        dr,
        calibrate_method=CalibrationMethod.MinMax, 
        quant_format=QuantFormat.QDQ,
        per_channel=True,
        weight_type=QuantType.QInt8, 
        activation_type = QuantType.QInt8, 
        optimize_model=False,
        reduce_range=True,
        extra_options={'WeightSymmetric': True, 'ActivationSymmetric':False, })

now = datetime.now()
current_time = now.strftime("%H:%M:%S")
print(current_time + ' - ' + '{} model has been created'.format(os.path.basename(quant_model)))
quantized_session = onnxruntime.InferenceSession(quant_model)


<div id="validation">
        <h2> 3. Evaluation </h2>
</div>


The code section below contains functions to evaluate the models on the validation dataset.

**Note:** again, the preprocessing of the evaluation dataset in the section below is aligned with the preprocessing used to train the model. For other models with a different preprocessing scheme, some arguments need to be changed, such as ``color_mode``, ``interpolation`` and ``norm`` (the normalization). 
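
The two metrics reported by the evaluation code are top-1 accuracy and a row-normalized confusion matrix. The plain-Python sketch below shows how both are derived from ground-truth and predicted labels. The label lists are made up, and the notebook itself relies on scikit-learn rather than on these hypothetical helpers.

```python
def top1_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels, num_classes):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so that every row reads as per-class rates."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Made-up labels for a two-class problem (0 = not_person, 1 = person).
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

accuracy = top1_accuracy(y_true, y_pred)
cm = confusion_counts(y_true, y_pred, num_classes=2)
cm_rates = normalize_rows(cm)
```

Row normalization makes classes with different sample counts comparable: each diagonal entry of `cm_rates` is the fraction of that class that was classified correctly.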

In [None]:
from onnx import ModelProto
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

TORCH_MEANS = [0.485,0.456,0.406]
TORCH_STD = [0.224, 0.224, 0.224]


def get_preprocessed_image(image_path , height, width, grayscale, color_mode, interpolation, norm):
    # Keras expects target_size as (height, width)
    img = tf.keras.utils.load_img(image_path, grayscale=grayscale , color_mode = color_mode,
     target_size = (height, width), interpolation=interpolation)
    img_array = np.array([tf.keras.utils.img_to_array(img)])
    if norm.lower() == 'tf':
        img_array = -1 + img_array / 127.5
    elif norm.lower() == 'torch':
        img_array = img_array / 255.0
        img_array = img_array - TORCH_MEANS
        img_array= img_array/ TORCH_STD
    img_array = img_array.transpose((0,3,1,2))
    return img_array

def predict_onnx(sess, data):
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    onx_pred = sess.run([label_name], {input_name: data.astype(np.float32)})[0]
    return onx_pred

def plot_confusion_matrix(cm, class_labels, model_name, val_accuracy = '_'):
    print(f'confusion_matrix : \n{cm}')
    # Normalize each row by its sum so that every row shows per-class rates
    cm_normalized = np.array([row / sum(row) for row in cm])
    
    plt.figure(figsize = (4,4))
    
    disp = ConfusionMatrixDisplay(cm_normalized)
    disp.plot(cmap = "Blues")
    plt.title(f'Model_accuracy : {val_accuracy}', fontsize = 10)
    plt.tight_layout(pad=3)
    plt.ylabel("True labels")
    plt.xlabel("Predicted labels")
    plt.xticks(np.arange(0, len(class_labels)), class_labels)
    plt.yticks(np.arange(0, len(class_labels)), class_labels)
    plt.savefig(f'{model_name}_confusion-matrix.png')
    plt.rcParams.update({'font.size': 14})
    plt.show()

In [None]:
def evaluate_onnx_model(onnx_model_path, val_dir, model_name, interpolation = 'bilinear'):
    onx = ModelProto()
    with open(onnx_model_path, mode = 'rb') as f:
        content = f.read()
        onx.ParseFromString(content)
    sess = onnxruntime.InferenceSession(onnx_model_path)
    (_, _, img_height, img_width) = sess.get_inputs()[0].shape
    gt_labels = []
    prd_labels = np.empty((0))
    class_labels = sorted(os.listdir(val_dir))
    for i in range(len(class_labels)):
        class_label = class_labels[i]
        
        for file in os.listdir(os.path.join(val_dir, class_label)):
            gt_labels.append(i)
            image_path = os.path.join(val_dir,class_label,file)
            # don't forget to adapt the preprocessing schema
            img = get_preprocessed_image(image_path, width = img_width, height = img_height, 
                                          grayscale = False, color_mode = 'rgb',
                                          interpolation = interpolation, norm='tf')
            # predicting the results on the batch
            pred = predict_onnx(sess, img).argmax(axis = 1)
            prd_labels = np.concatenate((prd_labels, pred))

    val_acc = round(accuracy_score(gt_labels, prd_labels), 6)
    print(f'Evaluation Top 1 accuracy : {val_acc}')
    val_cm = confusion_matrix(gt_labels,prd_labels)

    plot_confusion_matrix(val_cm, class_labels, model_name, val_accuracy = val_acc)
    
    return val_acc, val_cm

**Set ``input_model`` to the model to be evaluated and ``val_set`` to the validation dataset.**


Evaluation of the full-precision model: floating-point numbers are used to represent weights, activations, and computations. This evaluation provides a baseline measure of the model's accuracy in its original form, without any quantization applied.

In [None]:
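# os.path.join avoids backslash escapes such as "\t" inside the path string
val_set = os.path.join("..", "..", "image_classification", "scripts", "training",
                       "datasets", "person_dataset", "val_set")
input_model = os.path.join("models", "mobilenet_v2_128_0.5.onnx")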
evaluate_onnx_model(input_model, val_set, model_name='float_model')

Evaluation of the quantized model with the original dataset: the quantized model is evaluated on the same validation dataset to measure the impact of quantization on the model's accuracy.

In [None]:
# File name produced by the quantization cell: <model name>_QDQ_quant.onnx
input_model = os.path.join("models", "mobilenet_v2_128_0.5_QDQ_quant.onnx")
evaluate_onnx_model(input_model, val_set, model_name='int8_model')

Evaluation of the model quantized with fake data: ``mobilenet_v2_128_0.5_QDQ_fakequant.onnx`` has to be created first by repeating the same experiment with ``quantization_dataset_path`` set to ``None``.

In [None]:
# File name produced by the quantization cell: <model name>_QDQ_fakequant.onnx
input_model = os.path.join("models", "mobilenet_v2_128_0.5_QDQ_fakequant.onnx")
evaluate_onnx_model(input_model, val_set, model_name='int8_model_fake_data')



<table>
<tr>
<th style="text-align: left">Model</th>
<th style="text-align: left">Top-1 accuracy</th>

</tr>
<tr>
    
    
<td style="text-align: left">float_model     </td>
<td style="text-align: left"> 0.907203</td>
</tr>
    
<tr>
<td style="text-align: left">quant_model_real_data </td>
<td style="text-align: left">0.9053531</td>
</tr>
    
<tr>
<td style="text-align: left">quant_model_fake_data</td>
<td style="text-align: left">0.725064</td>
</tr>


    
</table>


Based on these results, quantization with real calibration data has a negligible impact on accuracy compared to the float model. Quantization with fake data, however, leads to a notable decrease in accuracy, because the fake calibration data does not represent the actual distribution or characteristics of the data the model encounters during inference.  
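
The size of each drop can be read directly from the table above. The short calculation below uses the three reported accuracy values and expresses the loss in percentage points relative to the float model; the variable and function names are chosen for this illustration only.

```python
# Top-1 accuracies copied from the results table.
float_acc = 0.907203
quant_real_acc = 0.9053531
quant_fake_acc = 0.725064


def drop_in_points(baseline, other):
    """Accuracy loss relative to the baseline, in percentage points."""
    return (baseline - other) * 100.0


real_drop = drop_in_points(float_acc, quant_real_acc)
fake_drop = drop_in_points(float_acc, quant_fake_acc)

print(f"real-data calibration: -{real_drop:.2f} points")
print(f"fake-data calibration: -{fake_drop:.2f} points")
```

The loss is roughly 0.2 percentage points with real calibration data, against about 18 points with fake data.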

<div id="benchmark">
        <h2> 4. Benchmarking the model on the STM32Cube.AI Developer Cloud</h2>
</div>

In this section we use the [STM32Cube.AI Developer Cloud](https://stm32ai-cs.st.com/home) to analyze, optimize, benchmark and deploy a quantized neural network on an **STM32** target.




<div id="proxy">
        <h3> 4.1 Proxy setting and connection to the STM32Cube.AI Developer Cloud</h3>
</div>

If you are behind a proxy, you can uncomment and fill the following proxy settings.

**NOTE**: If the password contains special characters such as `@` or `:`, they need to be URL-encoded (percent-encoded): for example, `@` becomes `%40` and `:` becomes `%3A`.
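
The URL-encoding mentioned in the note can be done with the Python standard library. The sketch below uses `urllib.parse.quote` with `safe=""` so that every reserved character is encoded; the user name, password, host and port are placeholders, not real values.

```python
from urllib.parse import quote

# Placeholder credentials and proxy address, for illustration only.
user = "user"
password = "p@ss:w/rd"
proxy_host = "ip_address"
proxy_port = "port"

# safe="" forces reserved characters such as '@', ':' and '/' to be encoded.
encoded_password = quote(password, safe="")
proxy_url = f"http://{quote(user, safe='')}:{encoded_password}@{proxy_host}:{proxy_port}"
```

The resulting `proxy_url` string is the kind of value that would be assigned to the `http_proxy` and `https_proxy` environment variables in the next cell.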



In [None]:
# import os
# os.environ['http_proxy'] = "http://user:passwd@ip_address:port"
# os.environ['https_proxy'] = "https://user:passwd@ip_address:port"
# Optionally, disable SSL verification
# os.environ['NO_SSL_VERIFY'] = "1"

In [None]:
import os
import shutil
import sys
import getpass

import matplotlib.pyplot as plt

# Get STM32Cube.AI Developer Cloud
!gitdir https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/common/stm32ai_dc

# Reorganize local folders
if os.path.exists('./stm32ai_dc'):
    shutil.rmtree('./stm32ai_dc')
shutil.move('./common/stm32ai_dc', './stm32ai_dc')
shutil.rmtree('./common')

In [None]:
# Make the local stm32ai_dc package (moved to ./stm32ai_dc above) importable
sys.path.append(os.path.abspath('.'))
os.environ['STATS_TYPE'] = 'jupyter_devcloud'

os.makedirs('models', exist_ok=True)
os.makedirs('outputs', exist_ok=True)

from stm32ai_dc import (CliLibraryIde, CliLibrarySerie, CliParameters,
                        CloudBackend, Stm32Ai)
from stm32ai_dc.errors import BenchmarkServerError

Create an account on **myST**, then sign in to [STM32Cube.AI Developer Cloud](https://stm32ai-cs.st.com/home) to access the service. Then set the environment variables below with your credentials: set ``username`` to your email address as a string; a prompt will appear to enter the password.

In [None]:
username ='xxx.yyy@st.com'
os.environ['stmai_username'] = username
print('Enter your password')
password = getpass.getpass()
os.environ['stmai_password'] = password
os.environ['NO_SSL_VERIFY'] = "1"

In [None]:
#Log in STM32Cube.AI Developer Cloud 
try:
    stmai = Stm32Ai(CloudBackend(str(username), str(password)))
    print("Successfully Connected!")
except Exception as e:
    print("ERROR: ", e)

Set the model path you want to conduct the benchmark on. 

In [None]:
#Upload the model on STM32Cube.AI Developer Cloud
model_path = quant_model
stmai.upload_model(model_path)
model_name = os.path.basename(model_path)

<div id="analyze">
        <h3> 4.2 Analyze the model memory footprints</h3>
</div> <br>





<table>
<tr>
<th style="text-align: left">Option</th>
<th style="text-align: left">Description /  CUBE.AI recommendation</th>

</tr>
<tr>
    
    
<td style="text-align: left">model</td>
<td style="text-align: left">model name corresponding to the file name uploaded</td>
</tr>
    
<tr>
<td style="text-align: left">optimization</td>
<td style="text-align: left">optimization setting "balanced", "time" or "ram"</td>
</tr>
    
<tr>
<td style="text-align: left">allocateInputs</td>
<td style="text-align: left"><strong>recommended</strong>, the activations buffer is also used to handle the input buffers. True by default</td>
</tr>
 
<tr>
<td style="text-align: left">allocateOutputs</td>
<td style="text-align: left"><strong>recommended</strong>, the activations buffer is also used to handle the output buffers. True by default</td>
</tr>

<tr>
<td style="text-align: left">relocatable</td>
<td style="text-align: left"><strong>recommended</strong>, to generate a relocatable binary model. '--binary' option can be used to have a separate binary file with only the data of the weight/bias tensors. True by default</td>
</tr>

<tr>
<td style="text-align: left">noOnnxOptimizer</td>
<td style="text-align: left"><strong>not recommended</strong>, disables the ONNX optimizer pass. "False" by default. Applies only to ONNX files and is ignored otherwise</td>
</tr>

<tr>
<td style="text-align: left">noOnnxIoTranspose</td>
<td style="text-align: left"> <strong>recommended only if</strong> the ONNX model already has IO transpose layers that make it expect channel-last data; it avoids adding a specific transpose layer during the import of an ONNX model. "False" by default. Applies only to ONNX files and is ignored otherwise</td>
</tr>
    
</table>


In [None]:
def analyze_footprints(report=None):
    activations_ram = report.ram_size / 1024
    weights_rom = report.rom_size / 1024
    macc = report.macc / 1e6
    print("[INFO] : STM32Cube.AI model memory footprint")
    print("[INFO] : MACCs : {} (M)".format(macc))
    print("[INFO] : Flash Weights  : {0:.1f} (KiB)".format(weights_rom))
    print("[INFO] : RAM Activations : {0:.1f} (KiB)".format(activations_ram))

# Analyze RAM/Flash model memory footprints after optimization by STM32Cube.AI
res_analyse = stmai.analyze(CliParameters(model=model_name, \
                                          optimization='balanced', \
                                          noOnnxIoTranspose=False, \
                                          relocatable=True, \
                                          noOnnxOptimizer=True))

analyze_footprints(report=res_analyse)

<div id="Benchmark">
        <h3> 4.3 Benchmark the model on a STM32 </h3>
</div>

In [None]:
def analyze_inference_time(report=None):
    cycles = report.cycles
    inference_time = report.duration_ms
    fps = 1000.0/inference_time
    print("[INFO] : Number of cycles : {} ".format(cycles))
    print("[INFO] : Inference Time : {0:.1f} (ms)".format(inference_time))
    print("[INFO] : FPS : {0:.1f}".format(fps))
    return fps

try:
    stmai.upload_model(model_path)
    print(f'Model {model_name} is uploaded !')
except Exception as e:
    print("ERROR: ", e)
fps_array = []
board_name = 'STM32H747I-DISCO'
result = stmai.benchmark(CliParameters(model=model_name,
                                       optimization='balanced',
                                       allocateInputs=True,
                                       allocateOutputs=True,
                                       noOnnxIoTranspose=False,
                                       fromModel=model_name),
                         board_name=board_name)
fps = analyze_inference_time(report=result)
fps_array.append(fps)
# Save the result in outputs folder
with open(f'./outputs/{model_name}_{board_name}.txt', 'w') as file_benchmark:
    file_benchmark.write(f'{result}')

<div id="generate">
        <h3> 4.4 Generate the model optimized C code for STM32 </h3>
</div>

Here you generate the specialized network and data C files so that the model is ready to be integrated into the **STM32** application.

In [None]:
board_name='STM32H7'
IDE='gcc'
print(f'{model_name}\ngenerating code for {board_name}')
os.makedirs('code_outputs', exist_ok=True)
# Generate model .c/.h code + Lib/Inc on STM32Cube.AI Developer Cloud
result = stmai.generate(CliParameters(model=model_name, \
                                      output="code_outputs", \
                                      optimization='balanced', \
                                      allocateInputs=True, \
                                      allocateOutputs=True, \
                                      noOnnxIoTranspose=False, \
                                      includeLibraryForSerie=CliLibrarySerie(board_name), \
                                      includeLibraryForIde=CliLibraryIde(IDE), \
                                      fromModel=model_name), \
                                      )

print(os.listdir("./code_outputs"))
# print the first 20 lines of the report (or fewer if the report is shorter)
with open('./code_outputs/network_generate_report.txt', 'r') as f:
    for _, line in zip(range(20), f): print(line, end='')