# DEBUG: Inference performance comparison                                                                                                             
# ONNX Runtime - Performance: CPU EP vs. Heterogeneous Execution
In this example notebook, we compare the ***ONNX Runtime*** inference performance of a pre-trained classification model running on the CPU Execution Provider (Cortex A) with the same model running in a heterogeneous configuration (Cortex A + TIDL offload).

   - The user can choose the model (see section titled *Choosing a Pre-Compiled Model*)
   - The models used in this example were trained on the ***ImageNet*** dataset, which is widely used for training and benchmarking image classification models.
   - We perform inference on one image.
   
## Choosing a Pre-Compiled Model
We provide a set of precompiled artifacts to use with this notebook that will appear as a drop-down list once the first code cell is executed.

<img src=docs/images/drop_down.PNG width="400">

***Note:*** Users can run this notebook as-is; the only action required is to select a model.

In [None]:
import os
import cv2
import numpy as np
import ipywidgets as widgets
from scripts.utils import get_eval_configs
#grab a set of model configurations locally defined in a script
last_artifacts_id = selected_model_id.value if "selected_model_id" in locals() else None
prebuilt_configs, selected_model_id = get_eval_configs('classification','onnxrt', num_quant_bits = 8, last_artifacts_id = last_artifacts_id)
display(selected_model_id)

In [None]:
print(f'Selected Model: {selected_model_id.label}')
config = prebuilt_configs[selected_model_id.value]
config['session'].set_param('model_id', selected_model_id.value)
config['session'].start()

## Define utility function to preprocess input images

Below, we define a utility function that preprocesses images for the model. It takes an image path as input, loads the image, and preprocesses it as required by the model. The steps are shown for reference (no user action required):

 1. Load the image (OpenCV loads images in BGR channel order)
 2. Convert the BGR image to RGB
 3. Resize the image to the model's input size
 4. Subtract the per-channel mean and multiply by the per-channel scale
 5. If the model expects BGR input, convert the RGB image back to BGR
 6. Reorder the image to NCHW (or NHWC) layout and add a batch dimension


- The input arguments of this utility function are selected automatically by this notebook, based on the model selected in the drop-down.
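Steps 4 and 6 are plain array arithmetic, so they can be sketched with NumPy alone on a tiny synthetic image, without OpenCV or a model. In this sketch, `normalize_and_layout` is a hypothetical helper, and `example_mean`/`example_scale` are assumed ImageNet-style statistics (per-channel mean and 1/std on the 0-255 scale) chosen only for illustration; in this notebook the real values come from `get_preproc_props(config)`. Broadcasting replaces the per-channel loop with a single vectorized expression.

```python
import numpy as np


def normalize_and_layout(img_rgb, mean, scale, layout='NCHW'):
    """Apply (pixel - mean[c]) * scale[c] per channel, then add a batch axis.

    img_rgb: H x W x C uint8 array, already resized to the model input size.
    """
    img = img_rgb.astype(np.float32)
    # Broadcasting applies mean[c] and scale[c] to every pixel of channel c
    img = (img - np.asarray(mean, dtype=np.float32)) * np.asarray(scale, dtype=np.float32)
    if layout == 'NCHW':
        img = np.transpose(img, (2, 0, 1))  # HWC -> CHW
    return np.expand_dims(img, axis=0)      # add batch dimension (N = 1)


# Tiny synthetic 2x2 "image" in which every pixel is (128, 128, 128)
dummy = np.full((2, 2, 3), 128, dtype=np.uint8)

# Assumed ImageNet-style statistics, for illustration only
example_mean = [123.675, 116.28, 103.53]
example_scale = [0.017125, 0.017507, 0.017429]

out = normalize_and_layout(dummy, example_mean, example_scale, layout='NCHW')
print(out.shape)        # (1, 3, 2, 2)
print(out[0, :, 0, 0])  # one normalized value per channel
```

Each output element is `(pixel - mean[c]) * scale[c]` for its channel `c`, the same formula the notebook's loop applies one channel at a time.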

In [None]:
def preprocess(image_path, size, mean, scale, layout, reverse_channels):
    # Step 1 - read image (OpenCV returns BGR channel order, or None on failure)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f'Could not read image: {image_path}')

    # Step 2 - flip from BGR to RGB
    img = img[:, :, ::-1]

    # Step 3 - resize to match the model's input dimensions
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_CUBIC)

    # Step 4 - subtract a per-channel mean and multiply by a per-channel scale
    # to match the model's expected data distribution
    if mean is not None and scale is not None:
        img = img.astype('float32')
        for ch, ch_mean, ch_scale in zip(range(img.shape[2]), mean, scale):
            img[:, :, ch] = (img[:, :, ch] - ch_mean) * ch_scale

    # Step 5 - if needed, flip back to BGR
    if reverse_channels:
        img = img[:, :, ::-1]

    # Step 6 - reorder tensor dimensions as NCHW (batch, channel, height, width) or NHWC
    if layout == 'NCHW':
        img = np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0)
    else:
        img = np.expand_dims(img, axis=0)

    return img

In [None]:
from scripts.utils import get_preproc_props
size, mean, scale, layout, reverse_channels = get_preproc_props(config)    
print(f'Image size: {size}')

## Load and run the model on the CPU Execution Provider

ONNX Runtime executes the model on the Cortex A72 and collects benchmark data.

<div class="alert alert-block alert-warning">
<b>Warning:</b> It is recommended to use the ONNX Runtime APIs in the cells below without any modifications.
</div>

In [None]:
import onnxruntime as rt

onnx_model_path = config['session'].get_param('model_file')
so = rt.SessionOptions()

EP_list = ['CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list, sess_options=so)

input_details = sess.get_inputs()
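`sess.get_inputs()` returns one `NodeArg` per model input, and its `name`, `shape`, and `type` attributes describe what the model expects. A mismatch between those and the preprocessed tensor is a common cause of wrong results, so a quick check before calling `sess.run` can help. The sketch below is hypothetical: `check_input`, `guess_layout`, and the partial `ORT_TO_NUMPY` table are not part of ONNX Runtime or this notebook's scripts. It assumes a 4-D image input and guesses the layout from where a channel count of 1 or 3 appears.

```python
import numpy as np

# Assumed mapping from ONNX Runtime type strings to NumPy dtypes (subset only)
ORT_TO_NUMPY = {
    'tensor(float)': np.float32,
    'tensor(uint8)': np.uint8,
    'tensor(int8)': np.int8,
}


def guess_layout(shape):
    """Guess 'NCHW' or 'NHWC' from a 4-D input shape such as [1, 3, 224, 224]."""
    if len(shape) != 4:
        raise ValueError(f'expected a 4-D input shape, got {shape}')
    c_first, c_last = shape[1], shape[3]
    if c_first in (1, 3) and c_last not in (1, 3):
        return 'NCHW'
    if c_last in (1, 3) and c_first not in (1, 3):
        return 'NHWC'
    return 'unknown'


def check_input(img, shape, ort_type, expected_layout):
    """Return a list of mismatches between a preprocessed tensor and the model input."""
    problems = []
    guessed = guess_layout(shape)
    if guessed not in ('unknown', expected_layout):
        problems.append(f'layout: model looks {guessed}, preprocessing produced {expected_layout}')
    dtype = ORT_TO_NUMPY.get(ort_type)
    if dtype is not None and img.dtype != dtype:
        problems.append(f'dtype: model expects {ort_type}, tensor is {img.dtype}')
    if img.ndim != len(shape):
        problems.append(f'rank: model expects {len(shape)} dims, tensor has {img.ndim}')
        return problems
    for axis, dim in enumerate(shape):
        # Dynamic dimensions are reported as strings or None; only check fixed ints
        if isinstance(dim, int) and dim != img.shape[axis]:
            problems.append(f'dim {axis}: model expects {dim}, tensor has {img.shape[axis]}')
    return problems


example_img = np.zeros((1, 3, 224, 224), dtype=np.float32)
print(check_input(example_img, [1, 3, 224, 224], 'tensor(float)', 'NCHW'))  # []
```

Once `img_in` has been prepared in the next cell (including the optional `np.uint8` conversion), it could be called as `check_input(img_in, input_details[0].shape, input_details[0].type, layout)`; an empty list means no mismatch was detected.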

In [None]:
from scripts.utils import imagenet_class_to_name
import matplotlib.pyplot as plt

img_in = preprocess('sample-images/elephant.bmp', size, mean, scale, layout, reverse_channels)

if not input_details[0].type == 'tensor(float)':
    img_in = np.uint8(img_in)

# Run inference several times to get a stable performance output
for i in range(5):
    output = list(sess.run(None, {input_details[0].name: img_in}))[0]

print(f'\nTop three results:')
for idx, cls in enumerate(output[0].squeeze().argsort()[-3:][::-1]):
    print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
print(f'\nPerformance CPU EP')
stats = sess.get_TI_benchmark_data()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
plot_TI_performance_data(stats, axis=ax)
plt.show()
tt, st, rb, wb = get_benchmark_output(stats)

print(f'Statistics : \n Inferences Per Second    : {1000.0/tt:7.2f} fps')
print(f' Inference Time Per Image : {tt:7.2f} ms  \n DDR BW Per Image         : {rb + wb: 7.2f} MB')
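The cell above runs inference several times to get a stable performance output, and the reported FPS is simply `1000.0 / tt` for a per-image time `tt` in milliseconds. The sketch below shows the same idea with an explicit warm-up phase and host-side wall-clock timing via `time.perf_counter`. `measure_latency` is a hypothetical helper; its numbers are end-to-end Python-side timings and are not expected to match the counters returned by `get_TI_benchmark_data()`.

```python
import statistics
import time


def measure_latency(run_once, warmup=2, iters=10):
    """Time a zero-argument callable; return (mean latency in ms, fps).

    The first `warmup` calls are discarded because they often include one-time
    costs such as memory allocation or lazy initialization.
    """
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples_ms)
    # Same conversion as the notebook: images per second = 1000 / (ms per image)
    fps = 1000.0 / mean_ms if mean_ms > 0 else float('inf')
    return mean_ms, fps


# Stand-in workload; in the notebook this could be
# lambda: sess.run(None, {input_details[0].name: img_in})
mean_ms, fps = measure_latency(lambda: sum(range(100_000)))
print(f'{mean_ms:.3f} ms per call, {fps:.1f} calls per second')
```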

## Load and run the model in heterogeneous mode

In heterogeneous mode, ONNX Runtime runs the model on the Cortex A, offloads supported subgraphs to TIDL using the ***`libtidl_onnxrt_EP`*** inference library, and collects benchmark data.

<div class="alert alert-block alert-warning">
<b>Warning:</b> It is recommended to use the ONNX Runtime APIs in the cells below without any modifications.
</div>

In [None]:
onnx_model_path = config['session'].get_param('model_file')
delegate_options = {}
so = rt.SessionOptions()
delegate_options['artifacts_folder'] = config['session'].get_param('artifacts_folder')

#TIDLExecutionProvider uses libtidl_onnxrt_EP inference library for offloading graphs to TIDL
EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)

input_details = sess.get_inputs()
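In the cell above, `providers` and `provider_options` are parallel lists: the i-th options dict configures the i-th provider, so `delegate_options` goes with `TIDLExecutionProvider` and `{}` with `CPUExecutionProvider`. Provider order is a priority order, and nodes that TIDL does not take are left to the CPU EP. If the installed `onnxruntime` build does not include the TIDL EP, the heterogeneous numbers would not reflect any TIDL offload, so it can be worth checking the requested providers against `onnxruntime.get_available_providers()` first. A minimal sketch, with `select_providers` as a hypothetical helper that keeps the two lists aligned:

```python
def select_providers(requested, options, available):
    """Keep only available providers, keeping their options dicts aligned.

    requested: provider names in priority order,
               e.g. ['TIDLExecutionProvider', 'CPUExecutionProvider']
    options:   one dict per requested provider (same order and length)
    available: names supported by this build, e.g. from
               onnxruntime.get_available_providers()
    """
    if len(requested) != len(options):
        raise ValueError('need exactly one options dict per provider')
    kept = [(p, o) for p, o in zip(requested, options) if p in available]
    if not kept:
        raise RuntimeError(f'none of {requested} are available: {available}')
    providers = [p for p, _ in kept]
    provider_options = [o for _, o in kept]
    return providers, provider_options


providers, provider_options = select_providers(
    ['TIDLExecutionProvider', 'CPUExecutionProvider'],
    [{'artifacts_folder': 'path/to/artifacts'}, {}],
    available=['CPUExecutionProvider'],
)
print(providers)  # ['CPUExecutionProvider']
```

After the session is created, `sess.get_providers()` lists the providers the session actually uses, which is a quick way to confirm that TIDL offload is active before reading the benchmark numbers.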

In [None]:
img_in = preprocess('sample-images/elephant.bmp', size, mean, scale, layout, reverse_channels)

if not input_details[0].type == 'tensor(float)':
    img_in = np.uint8(img_in)

# Run inference several times to get a stable performance output
for i in range(5):
    output = list(sess.run(None, {input_details[0].name: img_in}))[0]

print(f'\nTop three results:')
for idx, cls in enumerate(output[0].squeeze().argsort()[-3:][::-1]):
    print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
print(f'\nPerformance CPU EP + TIDL EP ')
stats = sess.get_TI_benchmark_data()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
plot_TI_performance_data(stats, axis=ax)
plt.show()
tt, st, rb, wb = get_benchmark_output(stats)

print(f'Statistics : \n Inferences Per Second    : {1000.0/tt:7.2f} fps')
print(f' Inference Time Per Image : {tt:7.2f} ms  \n DDR BW Per Image         : {rb + wb: 7.2f} MB')

## Final notes

- With this notebook, users can quickly compare FPS when running their models only on the Cortex A vs. running them in heterogeneous mode.
- If a model's accuracy or output is wrong in heterogeneous mode, a quick sanity check is to run the same model only on the Cortex A.
- Accuracy can be improved by modifying the TIDL compilation options. For additional tips, see "Run and compare a model compiled with different compilation options" in the debug_tips notebook.
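To follow the sanity check suggested above, the CPU-only and heterogeneous outputs can be compared directly. The notebook reuses the name `output` in both runs, so a copy has to be kept first, for example `cpu_output = output.copy()` at the end of the CPU-only cell (a hypothetical variable name). Because the TIDL artifacts here are compiled with 8-bit quantization (`num_quant_bits = 8`), small numeric differences are expected, and agreement on the top-ranked classes is likely more informative than exact equality. `compare_outputs` is a hypothetical helper:

```python
import numpy as np


def compare_outputs(ref, test, k=5):
    """Compare two classifier outputs (e.g. CPU-only vs. heterogeneous).

    Returns (max absolute difference, fraction of shared top-k classes,
    whether the top-1 class matches).
    """
    ref = np.asarray(ref, dtype=np.float32).squeeze()
    test = np.asarray(test, dtype=np.float32).squeeze()
    if ref.shape != test.shape:
        raise ValueError(f'shape mismatch: {ref.shape} vs {test.shape}')
    max_abs_diff = float(np.max(np.abs(ref - test)))
    # Highest-scoring k class indices, best first (same idea as the notebook's argsort)
    ref_topk = np.argsort(ref)[-k:][::-1]
    test_topk = np.argsort(test)[-k:][::-1]
    overlap = len(set(ref_topk.tolist()) & set(test_topk.tolist())) / k
    top1_match = bool(ref_topk[0] == test_topk[0])
    return max_abs_diff, overlap, top1_match


# Synthetic scores standing in for two runs of the same 4-class model
ref_scores = np.array([0.1, 0.7, 0.05, 0.15])
test_scores = np.array([0.12, 0.65, 0.03, 0.2])
print(compare_outputs(ref_scores, test_scores, k=2))
```

In the notebook, `compare_outputs(cpu_output, output)` after the heterogeneous cell would return the maximum absolute difference, the fraction of shared top-5 classes, and whether the top-1 class matches.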