# ONNX to TF-Lite Model Conversion

### Configure Paths

You can update these paths as necessary for your particular `.onnx` model. Change the value of `ONNX_MODEL_PATH` to the path of your `.onnx` model file.

``` python
ONNX_MODEL_PATH = '<path to your onnx model>'
```

Change the value of `WORKING_DIR` to the path of the directory where you want to save conversion results.

``` python
WORKING_DIR = '<path to your working directory>'
```

__NOTE:__ You can view the contents of the `.onnx` model file by dragging and dropping it onto the webpage: [netron.app](https://netron.app/)

In [None]:
import os
from mltk.utils.path import create_tempdir

# This contains the path to the pre-trained model in ONNX model format
# For this tutorial, we use the one downloaded from above
# Update this path to point to your specific model if necessary
ONNX_MODEL_PATH = os.path.join('../models/onnx/version-RFB-320_without_postprocessing.onnx')
assert os.path.exists(ONNX_MODEL_PATH), f'The provided ONNX_MODEL_PATH does not exist at: {ONNX_MODEL_PATH}'

# This contains the path to our working directory where all
# generated, intermediate files will be stored.
# For this tutorial, we use a results directory next to the models.
# Update as necessary for your setup
WORKING_DIR = os.path.join('../models/results')
# exist_ok=True makes this a no-op if the directory already exists
os.makedirs(WORKING_DIR, exist_ok=True)

# Use the filename for the model's name
MODEL_NAME = os.path.basename(ONNX_MODEL_PATH)[:-len('.onnx')]

print(f'ONNX_MODEL_PATH = {ONNX_MODEL_PATH}')
print(f'MODEL_NAME = {MODEL_NAME}')
print(f'WORKING_DIR = {WORKING_DIR}')

ONNX_MODEL_PATH = models/version-RFB-320_without_postprocessing.onnx
MODEL_NAME = version-RFB-320_without_postprocessing
WORKING_DIR = /tmp/root/mltk/ultra_light_onnx_to_tflite


### Simplify the ONNX model

While optional, this step can help reduce the complexity of the ONNX model
by using the [ONNX Simplifier](https://github.com/daquexian/onnx-simplifier) Python package.

This can help reduce the execution overhead on the embedded device.

__NOTE:__ You can view the contents of the generated `.onnx` model file by dragging and dropping it onto the webpage: [netron.app](https://netron.app/)

In [None]:
import onnxsim
import onnx

simplified_onnx_model, success = onnxsim.simplify(ONNX_MODEL_PATH)
assert success, 'Failed to simplify the ONNX model. You may have to skip this step'
simplified_onnx_model_path =  f'{WORKING_DIR}/{MODEL_NAME}.simplified.onnx'

print(f'Generating {simplified_onnx_model_path} ...')
onnx.save(simplified_onnx_model, simplified_onnx_model_path)
print('done')

Generating /tmp/root/mltk/ultra_light_onnx_to_tflite/version-RFB-320_without_postprocessing.simplified.onnx ...
done


### Convert to OpenVino Intermediate Format

Recall that ONNX models store tensors in the `NCHW` layout while TF-Lite uses the `NHWC` layout.  
Converting from one layout to the other is possible but non-trivial, so additional steps are required.
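
To make the layout difference concrete, here is a minimal pure-Python sketch of how the same element lands at different flat offsets in the two layouts. The helper names (`nchw_offset`, `nhwc_offset`, `nchw_to_nhwc`) are illustrative assumptions and are not part of any of the tools used in this tutorial. Reordering the data is the easy part; the conversion tools also have to adjust every layer whose attributes depend on the axis order, which this sketch does not attempt.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in a contiguous NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset of the same element in a contiguous NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(flat, N, C, H, W):
    """Reorder a flat NCHW buffer into a flat NHWC buffer."""
    out = [None] * len(flat)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = nchw_offset(n, c, h, w, C, H, W)
                    dst = nhwc_offset(n, c, h, w, C, H, W)
                    out[dst] = flat[src]
    return out


# A tiny 1x2x1x3 tensor: channel 0 holds 0..2, channel 1 holds 10..12
nchw = [0, 1, 2, 10, 11, 12]
nhwc = nchw_to_nhwc(nchw, N=1, C=2, H=1, W=3)
print(nhwc)  # channel values are now interleaved per pixel
```

In `NCHW` all values of one channel are contiguous, while in `NHWC` all channels of one pixel are contiguous.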

The first step is converting the `.onnx` model to the [OpenVino](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html) intermediate format.  
This is done using the tools installed by the [openvino_dev](https://pypi.org/project/openvino-dev/) Python package.

In [None]:
import sys
import os

# Import the model optimizer tool from the openvino_dev package
from openvino.tools.mo import main as mo_main
import onnx
from onnx_tf.backend import prepare
from mltk.utils.shell_cmd import run_shell_cmd

# Load the ONNX model
onnx_model = onnx.load(ONNX_MODEL_PATH)
tf_rep = prepare(onnx_model)

# Get the input tensor shape
input_tensor = tf_rep.signatures[tf_rep.inputs[0]]
input_shape = input_tensor.shape
input_shape_str = '[' + ','.join([str(x) for x in input_shape]) + ']'


openvino_out_dir = f'{WORKING_DIR}/openvino'
os.makedirs(openvino_out_dir, exist_ok=True)


print(f'Generating openvino at: {openvino_out_dir}')
cmd = [ 
    sys.executable, mo_main.__file__, 
    '--input_model', simplified_onnx_model_path,
    '--input_shape', input_shape_str,
    '--output_dir', openvino_out_dir,
    '--data_type', 'FP32'

]
retcode, retmsg = run_shell_cmd(cmd,  outfile=sys.stdout)
assert retcode == 0, 'Failed to do conversion' 

Generating openvino at: /tmp/root/mltk/ultra_light_onnx_to_tflite/openvino
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/root/mltk/ultra_light_onnx_to_tflite/openvino/version-RFB-320_without_postprocessing.simplified.xml
[ SUCCESS ] BIN file: /tmp/root/mltk/ultra_light_onnx_to_tflite/openvino/version-RFB-320_without_postprocessing.simplified.bin


### Convert from OpenVino to TF-Lite-Float32 

Next, we use the [openvino2tensorflow](https://github.com/PINTO0309/openvino2tensorflow) Python package to convert from the OpenVino intermediate format to a `.tflite` model file.  
The generated model file has all of its weights and tensors in the __float32__ data type.

__NOTE:__ You can view the contents of the `.tflite` model file by dragging and dropping it onto the webpage: [netron.app](https://netron.app/)

In [None]:
import os 
from mltk.utils.shell_cmd import run_shell_cmd

openvino2tensorflow_out_dir = f'{WORKING_DIR}/openvino2tensorflow'
openvino_xml_name = os.path.basename(simplified_onnx_model_path)[:-len('.onnx')] + '.xml'


if os.name == 'nt':
  openvino2tensorflow_exe_cmd = [sys.executable, os.path.join(os.path.dirname(sys.executable), 'openvino2tensorflow')]
else:
  openvino2tensorflow_exe_cmd = ['openvino2tensorflow']

print(f'Generating openvino2tensorflow model at: {openvino2tensorflow_out_dir} ...')
cmd = openvino2tensorflow_exe_cmd + [ 
    '--model_path', f'{openvino_out_dir}/{openvino_xml_name}',
    '--model_output_path', openvino2tensorflow_out_dir,
    '--output_saved_model',
    '--output_no_quant_float32_tflite'
]

retcode, retmsg = run_shell_cmd(cmd)
assert retcode == 0, retmsg
print('done')

Generating openvino2tensorflow model at: /tmp/root/mltk/ultra_light_onnx_to_tflite/openvino2tensorflow ...
done


### Prepare Representative Dataset

For full integer quantization, you need to calibrate or estimate the range, i.e., (min, max), of all floating-point tensors in the model. Unlike constant tensors such as weights and biases, variable tensors such as the model input, activations (outputs of intermediate layers), and the model output cannot be calibrated unless we run a few inference cycles. As a result, the converter requires a representative dataset to calibrate them.
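
As a rough illustration of why the (min, max) range matters, the sketch below derives affine int8 parameters from an observed range and round-trips a value through them. It follows the common scheme `real = scale * (q - zero_point)`; the function names and the sample values are assumptions made for this example, and this is not the converter's actual calibration code.

```python
def int8_quant_params(rmin, rmax):
    """Derive (scale, zero_point) so that real = scale * (q - zero_point)."""
    # Widen the range to include 0.0 so that zero is represented exactly
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    qmin, qmax = -128, 127
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        # Degenerate case: the tensor was always zero during calibration
        return 1.0, 0
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point):
    """Map a real value to int8, saturating at the int8 limits."""
    q = int(round(x / scale)) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to its approximate real value."""
    return scale * (q - zero_point)


# Hypothetical activation values seen while running calibration samples
observed = [-1.0, 0.25, 0.5, 3.0]
scale, zero_point = int8_quant_params(min(observed), max(observed))

q = quantize(0.5, scale, zero_point)
print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

Values outside the calibrated range saturate at -128 or 127, which is why the representative samples should resemble the data the model will see on the device.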

This dataset can be a small subset (roughly 100-500 samples) of the training or validation data. Refer to the `representative_dataset_generator()` function below. Perform the following steps:

1. Download the dataset from this link: https://drive.google.com/file/d/1Db5Y_vdFzhqie-ibn_PXxbFkysslcE_6/view?usp=sharing (a reduced archive that contains only the test images and labels of the WIDER FACE dataset)
2. Unzip the file into the folder `data` in the root directory of this project
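
If you want to limit calibration to a fixed number of samples, one option is to wrap any sample source with `itertools.islice`. The sketch below is a minimal, self-contained illustration; `take_samples` and `fake_samples` are hypothetical names, and the string placeholders stand in for the preprocessed image arrays a real generator would yield.

```python
from itertools import islice


def take_samples(sample_iter, max_samples):
    """Yield at most `max_samples` items from any iterable of samples."""
    yield from islice(sample_iter, max_samples)


def fake_samples():
    """Stand-in for an image loader: yields one-element lists, one entry per model input."""
    i = 0
    while True:
        yield [f'sample_{i}']
        i += 1


# Keep 200 samples, which is inside the suggested 100-500 range
subset = list(take_samples(fake_samples(), max_samples=200))
print(len(subset))
```

Because `islice` is lazy, no more than `max_samples` items are ever loaded from the underlying source.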

In [None]:
from pathlib import Path
import cv2
import numpy as np

def preprocess_image(img):
    img_resize = cv2.resize(img, (320, 240))
    img_resize = cv2.cvtColor(img_resize, cv2.COLOR_BGR2RGB)
    img_resize = img_resize - 127.0
    img_resize = img_resize / 128.0
    img_resize = np.float32(np.expand_dims(img_resize, axis=0))

    return img_resize


def representative_dataset_generator():
    folder = Path('data/wider_face_add_lm_10_10/JPEGImages')

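    i = 0
    for p in folder.iterdir():
        if p.is_dir():
            continue

        # Cap the calibration set at 1000 images
        if i >= 1000:
            break

        img = cv2.imread(str(p))
        # cv2.imread returns None for files it cannot decode; skip those
        if img is None:
            continue
        i += 1
        yield [preprocess_image(img)]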

### Quantize the TF-Lite Model

The final conversion step is converting the `.tflite` model file which has __float32__ tensors into a `.tflite` model file that has __int8__ tensors.
A model with __int8__ tensors executes much more efficiently on an embedded device and needs roughly a quarter of the memory, since each value occupies 1 byte instead of 4.
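
The factor of 4 follows directly from the element sizes. The short sketch below works through the arithmetic for a single tensor; the `tensor_bytes` helper is a hypothetical name, and the shape is used only as an illustration (it matches the 240x320 RGB resolution used by the preprocessing in this tutorial).

```python
import struct

# Size in bytes of one element of each data type
FLOAT32_BYTES = struct.calcsize('<f')
INT8_BYTES = struct.calcsize('<b')


def tensor_bytes(shape, bytes_per_element):
    """Number of bytes needed to store a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Illustrative tensor shape: batch x height x width x channels
shape = (1, 240, 320, 3)
float32_size = tensor_bytes(shape, FLOAT32_BYTES)
int8_size = tensor_bytes(shape, INT8_BYTES)
print(float32_size, int8_size, float32_size // int8_size)
```

The same ratio applies to weights and to intermediate activations, although the total savings of a real model also depend on any tensors that remain in float32.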

This conversion process is called [Post-Training Quantization](https://www.tensorflow.org/lite/performance/post_training_quantization).  
To do the conversion, we use the [TFLiteConverter](https://www.tensorflow.org/lite/convert) that comes with TensorFlow.

To do the quantization, we need the _representative dataset_ prepared in the previous section.

__NOTE:__ You can view the contents of the quantized `.tflite` model file by dragging and dropping it onto the webpage: [netron.app](https://netron.app/)

In [None]:
import tensorflow as tf 
import numpy as np

tflite_int8_model_path = f'{WORKING_DIR}/{MODEL_NAME}.int8.tflite'

converter = tf.lite.TFLiteConverter.from_saved_model(openvino2tensorflow_out_dir)

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] # We only want to use int8 kernels
converter.inference_input_type = tf.float32 # Can also be tf.int8
converter.inference_output_type = tf.float32  # Can also be tf.int8
converter.representative_dataset = representative_dataset_generator

print(f'Generating {tflite_int8_model_path} ...')
tflite_quant_model = converter.convert()

with open(tflite_int8_model_path, 'wb') as f:
    f.write(tflite_quant_model)

print('done')

Generating /tmp/root/mltk/ultra_light_onnx_to_tflite/version-RFB-320_without_postprocessing.int8.tflite ...
done


## Profile the Quantized Model

Now that we have converted the `.onnx` model to a quantized `.tflite` model, let's profile it to see if it can run on an embedded target.

For this, we use the [Model Profiler](https://siliconlabs.github.io/mltk/docs/guides/model_profiler.html) that comes with the MLTK.

In [9]:
from mltk.core import profile_model

results = profile_model(
    tflite_int8_model_path,
    accelerator='mvp' # Optional: profile using the MVP hardware accelerator
)
print(results)

Profiling model in simulator ...
Profiling Summary
Name: version-RFB-320_without_postprocessing_int8
Accelerator: mvp
Input Shape: 1x240x320x3
Input Data Type: float32
Output Shape: 1x4420x2,1x4420x4
Output Data Type: float32,float32
Flash, Model File Size (bytes): 411.2k
RAM, Runtime Memory Size (bytes): 1.3M
Operation Count: 223.6M
Multiply-Accumulate Count: 100.4M
Layer Count: 92
Unsupported Layer Count: 21
Accelerator Cycle Count: 15.5M

Model Layers
+-------+-------------------+--------+--------+------------+----------------------------------+--------------+-------------------------------------------------------+------------+---------------------------------------------------------------------+
| Index | OpCode            | # Ops  | # MACs | Acc Cycles | Input Shape                      | Output Shape | Options                                               | Supported? | Error Msg                                                           |
+-------+-------------------+--------+---