# Custom Model Compilation and Inference using Onnx Runtime

In this example notebook, we describe how to take a pre-trained classification model and compile it using ***Onnx Runtime*** to generate artifacts that can be deployed on the target using the ***Onnx*** interface.
 
 - Pre-trained model: `resnet18v2` model trained on the ***ImageNet*** dataset using ***Onnx***
 
In particular, we will show how to
- compile the model (during heterogeneous model compilation, supported layers are offloaded to the `TI-DSP` and the artifacts needed for inference are generated)
- use the generated artifacts for inference
- perform input preprocessing and output postprocessing
    
## Onnx Runtime based workflow

The diagram below describes the steps of the Onnx Runtime based workflow.

Note:
 - The user needs to compile models (sub-graph creation and quantization) on a PC to generate model artifacts.
 - The generated artifacts can then be used to run inference on the target.

<img src=docs/images/onnx_work_flow_2.png width="400">
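
The two phases of this workflow differ mainly in which TIDL provider is passed to the Onnx Runtime session. The sketch below illustrates that choice. The helper `select_providers` and its `phase` argument are hypothetical names introduced here for illustration; only the provider names come from the cells later in this notebook.

```python
def select_providers(phase):
    """Return the execution-provider list for a workflow phase.

    `phase` is 'compile' (artifact generation) or 'infer' (running the
    compiled model). Hypothetical helper, not part of any TI library.
    """
    tidl_provider = {
        'compile': 'TIDLCompilationProvider',
        'infer': 'TIDLExecutionProvider',
    }
    if phase not in tidl_provider:
        raise ValueError(f"phase must be 'compile' or 'infer', got {phase!r}")
    # Providers are listed in priority order, so nodes that the TIDL
    # provider does not take are assigned to the CPU provider.
    return [tidl_provider[phase], 'CPUExecutionProvider']


print(select_providers('compile'))  # ['TIDLCompilationProvider', 'CPUExecutionProvider']
print(select_providers('infer'))    # ['TIDLExecutionProvider', 'CPUExecutionProvider']
```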

In [None]:
import os
import tqdm
import cv2
import numpy as np
import onnxruntime as rt
from scripts.utils import imagenet_class_to_name, download_model
import matplotlib.pyplot as plt

## Define utility function to preprocess input images
Below, we define a utility function to preprocess images for `resnet18v2`. This function takes a path as input, loads the image and preprocesses it for generic ***Onnx*** inference. The steps are as follows: 

 1. load image
 2. convert BGR image to RGB
 3. scale image so that the short edge is 256 pixels
 4. center-crop image to 224x224 pixels
 5. apply per-channel pixel scaling and mean subtraction


- Note: If you are using a custom model or a model that was trained using a different framework, please remember to define your own utility function.
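
The arithmetic behind steps 3 to 5 can be checked without loading an image. The sketch below is a dependency-free illustration that assumes the mean of 128 and scale of 0.0078125 (that is, 1/128) used by this notebook's preprocessing function. The helper names `resize_and_crop_geometry` and `normalize_pixel` are hypothetical and exist only for this example.

```python
def resize_and_crop_geometry(height, width, short_edge=256, crop=224):
    """Return ((new_height, new_width), (starty, startx)) for steps 3 and 4."""
    shortest = min(height, width)
    # scale both sides by the same factor so the short edge becomes `short_edge`
    new_height = (height * short_edge) // shortest
    new_width = (width * short_edge) // shortest
    # top-left corner of the centered crop window
    starty = new_height // 2 - crop // 2
    startx = new_width // 2 - crop // 2
    return (new_height, new_width), (starty, startx)


def normalize_pixel(value, mean=128, scale=0.0078125):
    """Step 5 for a single pixel value."""
    return (value - mean) * scale


size, offset = resize_and_crop_geometry(480, 640)
print(size, offset)          # (256, 341) (16, 58)
print(normalize_pixel(0))    # -1.0
print(normalize_pixel(255))  # 0.9921875
```

With these constants, 8-bit pixel values map to roughly the range [-1, 1).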

In [None]:
def preprocess_for_onnx_resent18v2(image_path):
    
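    # read the image using OpenCV
    # (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f'could not read image: {image_path}')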
    
    # convert to RGB
    img = img[:,:,::-1]
    
    # Most of the onnx models are trained using
    # 224x224 images. The general rule of thumb
    # is to scale the input image while preserving
    # the original aspect ratio so that the
    # short edge is 256 pixels, and then
    # center-crop the scaled image to 224x224
    orig_height, orig_width, _ = img.shape
    short_edge = min(img.shape[:2])
    new_height = (orig_height * 256) // short_edge
    new_width = (orig_width * 256) // short_edge
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    startx = new_width//2 - (224//2)
    starty = new_height//2 - (224//2)
    img = img[starty:starty+224,startx:startx+224]
    
    # apply scaling and mean subtraction.
    # if your model is built with an input
    # normalization layer, then you might
    # need to skip this
    img = img.astype('float32')
    for mean, scale, ch in zip([128, 128, 128], [0.0078125, 0.0078125, 0.0078125], range(img.shape[2])):
        img[:,:,ch] = (img[:,:,ch] - mean) * scale
    img = np.expand_dims(img,axis=0)
    img = np.transpose(img, (0, 3, 1, 2))
    
    return img

## Compile the model
In this step, we create an Onnx Runtime session with the `tidl_model_import_onnx` library to generate artifacts that offload the supported portion of the DL model to the TI DSP.
 - `sess` is created with the options below to calibrate the model for 8-bit fixed point inference
   
    * **artifacts_folder** - folder where all the compilation artifacts needed for inference are stored 
    * **tidl_tools_path** - os.getenv('TIDL_TOOLS_PATH'), path to `TIDL` compilation tools 
    * **tensor_bits** - number of bits to be used for quantization (8 or 16)
    * **advanced_options:calibration_frames**  - number of images to be used for calibration
     
    ``` 
    compile_options = {
        'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
        'artifacts_folder' : output_dir,
        'tensor_bits' : 16,
        'accuracy_level' : 0,
        'advanced_options:calibration_frames' : len(calib_images), 
        'advanced_options:calibration_iterations' : 3 # used if accuracy_level = 1
    }
    ``` 
    
- Note: The paths to the `TIDL` compilation tools and the `aarch64` `GCC` compiler are required for model compilation. This notebook accesses both through the predefined environment variables `TIDL_TOOLS_PATH` and `ARM64_GCC_PATH`; the compilation cell below reads `TIDL_TOOLS_PATH` explicitly.
- The compilation cell below sets `accuracy_level = 1`, which takes longer to compile than `accuracy_level = 0` but gives better inference accuracy. A compilation status log for `accuracy_level = 1` is not yet implemented in this notebook and will be added in a future version.
- Please refer to the TIDL user guide for further advanced options.
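
Because compilation depends on `TIDL_TOOLS_PATH` being set, it can help to validate the options before creating the session. The following is a minimal sketch, assuming only the option keys listed above. The function `build_compile_options` and its parameters are hypothetical and are not part of the TIDL tools.

```python
import os


def build_compile_options(output_dir, num_calib_images, tensor_bits=8, accuracy_level=1):
    """Build the compile options dict, failing early on obvious mistakes.

    Hypothetical helper: the keys mirror the options described above.
    """
    tools_path = os.environ.get('TIDL_TOOLS_PATH')
    if not tools_path:
        raise RuntimeError('environment variable TIDL_TOOLS_PATH is not set')
    if tensor_bits not in (8, 16):
        raise ValueError('tensor_bits must be 8 or 16')
    if num_calib_images < 1:
        raise ValueError('at least one calibration image is required')
    return {
        'tidl_tools_path': tools_path,
        'artifacts_folder': output_dir,
        'tensor_bits': tensor_bits,
        'accuracy_level': accuracy_level,
        'advanced_options:calibration_frames': num_calib_images,
        'advanced_options:calibration_iterations': 3,  # used if accuracy_level = 1
    }
```

A call such as `build_compile_options(output_dir, len(calib_images))` would then produce a dict equivalent to the one used in the compilation cell.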

In [None]:
output_dir = 'custom-artifacts/onnx/resnet18v2'
onnx_model_path = '../../models/public/onnx/resnet18_opset9.onnx'
download_model(onnx_model_path)

In [None]:
calib_images = [
'sample-images/elephant.bmp',
'sample-images/bus.bmp',
'sample-images/bicycle.bmp',
'sample-images/zebra.bmp',
]

compile_options = {
    'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder' : output_dir,
    'tensor_bits' : 8,
    'accuracy_level' : 1,
    'advanced_options:calibration_frames' : len(calib_images), 
    'advanced_options:calibration_iterations' : 3 # used if accuracy_level = 1
}

# create the output dir if not present,
# then clear any existing contents
os.makedirs(output_dir, exist_ok=True)
for root, dirs, files in os.walk(output_dir, topdown=False):
    for f in files:
        os.remove(os.path.join(root, f))
    for d in dirs:
        os.rmdir(os.path.join(root, d))

so = rt.SessionOptions()
EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[compile_options, {}], sess_options=so)

input_details = sess.get_inputs()

for num in tqdm.trange(len(calib_images)):
    output = list(sess.run(None, {input_details[0].name : preprocess_for_onnx_resent18v2(calib_images[num])}))[0]


## Use compiled model for inference
Next, using ***Onnx*** with the ***`libtidl_onnxrt_EP`*** inference library, we run the model and collect benchmark data.

In [None]:
EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']

sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
# Run inference several times to get a stable performance measurement
for i in range(5):
    output = list(sess.run(None, {input_details[0].name : preprocess_for_onnx_resent18v2('sample-images/elephant.bmp')}))

for idx, cls in enumerate(output[0].squeeze().argsort()[-5:][::-1]):
    print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
stats = sess.get_TI_benchmark_data()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
plot_TI_performance_data(stats, axis=ax)
plt.show()

tt, st, rb, wb = get_benchmark_output(stats)
print(f'Statistics : \n Inferences Per Second    : {1000.0/tt :7.2f} fps')
print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image         : {rb+ wb : 7.2f} MB')
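
The postprocessing in the cell above does two things: it ranks the class scores to pick the five highest, and it converts the per-image time in milliseconds to inferences per second. The sketch below shows both in plain Python, without numpy. The function names and sample scores are hypothetical and only illustrate the arithmetic. When scores are tied, the ordering may differ from the `argsort`-based expression in the cell.

```python
def top_k_indices(scores, k=5):
    """Indices of the k largest scores, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def inferences_per_second(time_ms):
    """Convert a per-image inference time in milliseconds to fps."""
    return 1000.0 / time_ms


scores = [0.1, 2.5, -0.3, 4.0, 1.2, 3.3, 0.0]  # made-up class scores
print(top_k_indices(scores))       # [3, 5, 1, 4, 0]
print(inferences_per_second(4.0))  # 250.0
```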