# Custom Model Compilation and Inference using TVM-NEO-DLR
In this example notebook, we describe how to take a pre-trained classification model and compile it with the ***TVM*** compiler to generate artifacts that can be deployed on the target using the ***NEO-AI-DLR*** interface. 
 - Pre-trained model: `resnet18` model trained on ***ImageNet*** dataset. 

In particular, we will show how to
- compile the model (during heterogeneous model compilation, layers that are supported will be offloaded to the `TI-DSP`)
- use the generated artifacts for inference
- perform input preprocessing and output postprocessing.
- enable debug logs
- use deny-layer compilation option to isolate possible problematic layers and create additional model subgraphs
- use the generated subgraphs artifacts for inference
- perform input preprocessing and output postprocessing
     
## Neo-AI-DLR based workflow
The diagram below describes the steps of the TVM/NEO-AI-DLR based workflow. 

Note: 
- The user needs to compile models (sub-graph creation and quantization) on a PC to generate model artifacts.
- The generated artifacts can then be used to run inference on the target.

<img src=docs/images/tvmrt_work_flow_2.png width="400">

In [None]:
import os
import numpy as np
import onnx
import cv2
import shutil 
from tvm import relay
from tvm.relay.backend.contrib import tidl
from dlr import DLRModel
from pathlib import Path
# import functions from local scripts
from scripts.utils import imagenet_class_to_name, download_model
from IPython.display import Markdown as md
from scripts.utils import loggerWriter
from scripts.utils import get_svg_path

## Load the model in its native framework
The `resnet18v2` model used here was trained on the ***ImageNet*** dataset and is saved in `/model-zoo`. 
- Note: An ***ONNX*** model has several input nodes, which include the weights and biases of the compute layers as well as the input to the model. Below, we print the details of the input node that corresponds to the model input. From the printed output, we gather the `name` and the `shape` of the model input.

In [None]:
onnx_model_path = 'models/public/onnx/resnet18_opset9.onnx'
download_model(onnx_model_path)

In [None]:
onnx_model = onnx.load(onnx_model_path)
print(len(onnx_model.graph.input))
onnx_model.graph.input[0]

In [None]:
# we use the output from the cell above to populate these variables
input_name = 'input.1'
input_shape = (1, 3, 224, 224)
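
When a graph lists weights among its inputs, one way to find the real model input is to skip every input whose name also appears among the graph initializers. The sketch below is a minimal illustration of that idea, not part of the original notebook. The helper name `model_inputs`, the stand-in objects, and the weight name `conv1.weight` are assumptions made for illustration; the stand-ins only mimic the attribute layout of an ONNX graph so the example runs without a model file.

```python
from types import SimpleNamespace


def model_inputs(graph):
    """Return (name, shape) for graph inputs that are not initializers."""
    initializer_names = {init.name for init in graph.initializer}
    result = []
    for inp in graph.input:
        if inp.name in initializer_names:
            continue  # weights and biases, not a runtime input
        dims = inp.type.tensor_type.shape.dim
        result.append((inp.name, tuple(d.dim_value for d in dims)))
    return result


def _fake_input(name, shape):
    """Hypothetical stand-in mimicking the layout of an ONNX input entry."""
    dims = [SimpleNamespace(dim_value=v) for v in shape]
    tensor_type = SimpleNamespace(shape=SimpleNamespace(dim=dims))
    return SimpleNamespace(name=name, type=SimpleNamespace(tensor_type=tensor_type))


# Hypothetical graph: one real input and one weight listed as an input
fake_graph = SimpleNamespace(
    input=[
        _fake_input('input.1', (1, 3, 224, 224)),
        _fake_input('conv1.weight', (64, 3, 7, 7)),
    ],
    initializer=[SimpleNamespace(name='conv1.weight')],
)

inputs = model_inputs(fake_graph)
print(inputs)
```

With a loaded model, the same helper would presumably be called as `model_inputs(onnx_model.graph)`.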

## Convert the model to `Relay IR` format

In [None]:
mod, params = relay.frontend.from_onnx(onnx_model, shape={input_name : input_shape})


## Define utility function to preprocess input images

Below, we define a utility function to preprocess images for `resnet18v2`. This function takes a path as input, loads the image and preprocesses it for generic ***ONNX*** inference. The steps are as follows: 

 1. load image
 2. convert BGR image to RGB
 3. scale image so that the short edge is 256 pixels
 4. center-crop image to 224x224 pixels
 5. apply per-channel pixel scaling and mean subtraction
 6. convert the image to NCHW format


- Note: If you are using a custom model or a model that was trained using a different framework, please remember to define your own utility function. For example, if you are using a model trained using ***TensorFlow***, you might need to use a different set of `mean` and `scale` values for *step 5* above and you might need to modify *step 6* to convert the image to `NHWC` format.
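
For *step 5*, the `mean` and `scale` constants used for this model match the widely used ImageNet per-channel statistics, rescaled from the `[0, 1]` range to the `[0, 255]` pixel range. The sketch below shows that conversion so you can derive the equivalent constants for your own model; the helper name `to_pixel_range` is an assumption made for illustration.

```python
# Commonly used ImageNet statistics for pixel values normalized to [0, 1]
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # R, G, B
IMAGENET_STD = (0.229, 0.224, 0.225)   # R, G, B


def to_pixel_range(mean01, std01, max_value=255.0):
    """Convert [0, 1] mean/std into (mean, scale) for [0, max_value] pixels.

    The result is used as: normalized = (pixel - mean) * scale
    """
    mean = [m * max_value for m in mean01]
    scale = [1.0 / (s * max_value) for s in std01]
    return mean, scale


mean, scale = to_pixel_range(IMAGENET_MEAN, IMAGENET_STD)
print([round(m, 3) for m in mean])
print([round(s, 6) for s in scale])
```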


In [None]:
def preprocess(image_path):
    # read the image using openCV
    img = cv2.imread(image_path)
    
    # convert to RGB
    img = img[:,:,::-1]
    
    # Most of the onnx models are trained using
    # 224x224 images. The general rule of thumb
    # is to scale the input image while preserving
    # the original aspect ratio so that the
    # short edge is 256 pixels, and then
    # center-crop the scaled image to 224x224
    orig_height, orig_width, _ = img.shape
    short_edge = min(img.shape[:2])
    new_height = (orig_height * 256) // short_edge
    new_width = (orig_width * 256) // short_edge
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    startx = new_width//2 - (224//2)
    starty = new_height//2 - (224//2)
    img = img[starty:starty+224,startx:startx+224]
    
    # apply scaling and mean subtraction.
    # if your model is built with an input
    # normalization layer, then you might
    # need to skip this
    # Mean and scale depend on training: use the same values
    # that were used to preprocess the training images
    img = img.astype('float32')
    for mean, scale, ch in zip([123.675, 116.28, 103.53], [0.017125, 0.017507, 0.017429], range(img.shape[2])):
        img[:,:,ch] = (img[:,:,ch] - mean) * scale
     
    # convert HWC to NCHW
    img = np.expand_dims(np.transpose(img, (2,0,1)),axis=0)
    
    return img
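
To make the resize and center-crop arithmetic in `preprocess` easier to check, the sketch below reproduces only the geometry, without loading an image. The helper name `resize_and_crop_geometry` and the 480x640 example size are assumptions made for illustration.

```python
def resize_and_crop_geometry(height, width, short_edge=256, crop=224):
    """Return the resized (height, width) and the crop origin (y, x).

    Mirrors the integer arithmetic used in preprocess(): the short edge is
    scaled to `short_edge` pixels, then a `crop` x `crop` window is centered.
    """
    shortest = min(height, width)
    new_height = (height * short_edge) // shortest
    new_width = (width * short_edge) // shortest
    start_y = new_height // 2 - crop // 2
    start_x = new_width // 2 - crop // 2
    return (new_height, new_width), (start_y, start_x)


# Example: a 480 (height) x 640 (width) image
size, origin = resize_and_crop_geometry(480, 640)
print(size, origin)  # (256, 341) (16, 58)
```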

## Compile the model
In this step, we convert the `Relay IR` module into deployable artifacts with layers offloaded to `TIDL`. The deployable artifacts and all intermediate files are stored in the `output_dir` defined below.   

- Note: Since `TIDL` uses quantized models for inference, layer outputs must be calibrated by running dummy inferences and collecting quantization statistics. We do this by feeding 4 images from the validation subset of the ***ImageNet*** dataset with appropriate preprocessing. It is mandatory that inputs are preprocessed according to model requirements. 
     
    The script below calls `TIDLCompiler` with the following arguments: 
    * **platform** = os.environ['SOC'] (for example, 'am68pa') to identify the device 
    * **version** = (7, 3) to identify the version 
    * **tidl_tools_path** = os.getenv('TIDL_TOOLS_PATH'), path to `TIDL` compilation tools 
    * **artifacts_folder** = output_dir, where all intermediate results are stored
    * **tensor_bits** = 8 or 16, the number of bits to be used for quantization 
    * **max_num_subgraphs** = 16, the maximum number of `TIDL` subgraphs to split into 
    * **accuracy_level** = 0, for fastest compilation with acceptable drop in accuracy
    * **c7x_codegen** = 0 
    
     
- Note: The paths to the `TIDL` compilation tools and the `aarch64` `GCC` compiler are required for model compilation. Both can be accessed by this notebook through the predefined environment variables `TIDL_TOOLS_PATH` and `ARM64_GCC_PATH`. The usage of both variables is demonstrated in the cells below. 
     
- Note: This model does not require an `accuracy_level` greater than `0`; it delivers good accuracy with simple quantization and calibration with 4 images. However, some models may require a higher `accuracy_level`, in which case the changes described under "Compilation knobs" below are recommended. 
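
Since compilation depends on the environment variables `SOC`, `TIDL_TOOLS_PATH` and `ARM64_GCC_PATH`, it can help to check them before starting. The sketch below is one possible check, not part of the original workflow. The helper name `missing_env_vars` and the example values are assumptions made for illustration.

```python
import os

# Environment variables read by the compilation cells in this notebook
REQUIRED_VARS = ('SOC', 'TIDL_TOOLS_PATH', 'ARM64_GCC_PATH')


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Demo with a hypothetical environment in which one variable is missing
demo_env = {'SOC': 'am68pa', 'TIDL_TOOLS_PATH': '/hypothetical/tidl_tools'}
missing = missing_env_vars(env=demo_env)
print(missing)  # ['ARM64_GCC_PATH']
```

In the notebook itself, calling `missing_env_vars()` without arguments checks the real environment.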


In [None]:
calib_input_list = []
output_dir = 'custom-artifacts/tvm-dlr/resnet'

#TARGET Build
build_target = 'llvm -device=arm_cpu -mtriple=aarch64-linux-gnu'
cross_cc_args = {'cc' : os.path.join(os.environ['ARM64_GCC_PATH'], 'bin', 'aarch64-none-linux-gnu-gcc')}

#PC Emulation BUILD
#build_target = 'llvm'
#cross_cc_args = {}

# create the output dir if not present
# and clear any existing contents
os.makedirs(output_dir, exist_ok=True)
for root, dirs, files in os.walk(output_dir, topdown=False):
    for f in files:
        os.remove(os.path.join(root, f))
    for d in dirs:
        os.rmdir(os.path.join(root, d))
    
# build the list of preprocessed images that will be used for calibration
# calibration images are used for post training quantization 
# For application-specific models, it is recommended to use representative data from your dataset
calib_images = [
'sample-images/elephant.bmp',
'sample-images/bus.bmp',
'sample-images/bicycle.bmp',
'sample-images/zebra.bmp',
]
for filename in calib_images:
    calib_input_list.append({input_name : preprocess(filename)})

### Compilation knobs (optional - in case of debugging accuracy)
If a model's accuracy at 8 bits is not good enough, users can try compiling the same model at 16 bits with an accuracy level of 1. This reduces performance, but gives users a good accuracy reference.
As a second step, users can try to improve 8-bit accuracy by increasing the number of calibration frames and iterations, to get closer to the results of 16 bits with accuracy level 1.
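
To give an intuition for why 16 bits helps accuracy, the sketch below compares the round-trip error of a generic uniform symmetric quantizer at 8 and 16 bits. This is only an illustration under simplifying assumptions: it is not the quantization scheme used by `TIDL`, and the activation values are made up.

```python
def quantize_symmetric(values, bits):
    """Quantize to signed `bits`-bit levels and map back to floats."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    step = max_abs / qmax               # size of one quantization level
    return [round(v / step) * step for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between the values and their quantized form."""
    restored = quantize_symmetric(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical layer outputs, for illustration only
activations = [-1.7, -0.42, 0.003, 0.25, 0.9, 3.1]
err8 = max_abs_error(activations, 8)
err16 = max_abs_error(activations, 16)
print(f'8-bit max error : {err8:.6f}')
print(f'16-bit max error: {err16:.6f}')
```

The quantization step shrinks by a factor of roughly 256 when going from 8 to 16 bits, which is why the 16-bit error is far smaller here.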

In [None]:
# compilation options - knobs to tweak
num_bits = 8
accuracy = 0

### Layers debug (optional - In case of debugging)
Setting `debug_level` to 1 prints layer information and warnings/errors, which can be useful during debugging. The compilation logs are saved to the path passed to the `loggerWriter` helper function.

Another technique is to use `deny_list` to exclude layers from running on TIDL and create additional subgraphs, in order to isolate issues.
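
Because `deny_list` is passed as a comma-separated string of TVM Relay operator types, it can be convenient to keep the operators in a Python list and join them when needed. The sketch below shows one way to do that. The helper name `make_deny_list` is an assumption made for illustration, and the operators listed are only examples, not a recommendation for this model.

```python
def make_deny_list(op_names):
    """Join Relay operator type names into a comma-separated string.

    Blank entries are dropped and duplicates are removed while keeping
    the original order.
    """
    cleaned = [name.strip() for name in op_names if name.strip()]
    unique = list(dict.fromkeys(cleaned))
    return ",".join(unique)


# Example operator types, for illustration only
deny_list = make_deny_list(['nn.batch_flatten', 'nn.softmax', 'nn.batch_flatten'])
print(deny_list)  # nn.batch_flatten,nn.softmax
```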

In [None]:
# Path.mkdir() returns None, so keep the Path object separately
log_dir = Path("logs")
log_dir.mkdir(parents=True, exist_ok=True)

# stdout and stderr saved to a *.log file.  
with loggerWriter("logs/custon-model-tvm-dlr"):
    # Create the TIDL compiler
    tidl_compiler = tidl.TIDLCompiler(
        os.environ['SOC'],
        (7, 3),
        tidl_tools_path = os.getenv('TIDL_TOOLS_PATH'),
        artifacts_folder = output_dir,
        tensor_bits = num_bits,
        max_num_subgraphs = 16,
        accuracy_level = accuracy,
        advanced_options = { 'calibration_iterations' : 3},
        debug_level = 1,
        c7x_codegen = 0,
        #deny_list = "nn.batch_flatten"  #Comma separated string of operator types as defined by TVM Relay ops. Ex: "nn.batch_flatten"
    )
# partition the graph into TIDL operations and TVM operations
mod, status = tidl_compiler.enable(mod, params, calib_input_list)

# build the relay module into deployables
with tidl.build_config(tidl_compiler=tidl_compiler):
    graph, lib, params = relay.build_module.build(mod, target=build_target, params=params)
tidl.remove_tidl_params(params)

# save the deployables
path_lib = os.path.join(output_dir, 'deploy_lib.so')
path_graph = os.path.join(output_dir, 'deploy_graph.json')
path_params = os.path.join(output_dir, 'deploy_params.params')

lib.export_library(path_lib, **cross_cc_args)
with open(path_graph, "w") as fo:
    fo.write(graph)
with open(path_params, "wb") as fo:
    fo.write(relay.save_param_dict(params))
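
Before moving on to inference, it can be useful to confirm that the three deployables saved above are actually present in the artifacts folder. The sketch below is one possible check. The helper name `missing_artifacts` is an assumption made for illustration, and the demo uses a temporary folder rather than the real `output_dir`.

```python
import tempfile
from pathlib import Path

# File names written by the compilation cell above
EXPECTED_ARTIFACTS = ('deploy_lib.so', 'deploy_graph.json', 'deploy_params.params')


def missing_artifacts(folder, expected=EXPECTED_ARTIFACTS):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


# Demo on a temporary folder that contains only one of the three files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'deploy_graph.json').write_text('{}')
    not_found = missing_artifacts(tmp)

print(not_found)  # ['deploy_lib.so', 'deploy_params.params']
```

In the notebook, `missing_artifacts(output_dir)` would be expected to return an empty list after a successful compilation.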

<div class="alert alert-block alert-info">
<b>Note:</b> 'deny_list' is shown in the cell above only as an example; it can be removed, because "flatten" is a supported layer.
</div>

### Subgraphs visualization (optional - in case of debugging models and subgraphs)
Running the cell below gives links to visualizations of the complete graph and the TIDL subgraphs. Together with the "deny_list" feature explained above, these offer tools for checking and isolating potential issues in NN model layers.

In [None]:
subgraph_link = get_svg_path(output_dir)
for sg in subgraph_link:
    hl_text = os.path.join(*Path(sg).parts[4:])
    sg_rel = os.path.join('../', sg)
    display(md("[{}]({})".format(hl_text,sg_rel)))

## Use compiled model for inference

Next, we use the ***NEO-AI DLR*** interface to run the model and collect benchmark data.

In [None]:
# use deployed artifacts from the compiled model 
model = DLRModel(output_dir, 'cpu')

# run inference
# running inference several times to get a stable performance output
for i in range(5):
    res = model.run({input_name : preprocess('sample-images/elephant.bmp')})

from scripts.utils import imagenet_class_to_name
import matplotlib.pyplot as plt

# get the TOP-5 class IDs by argsort()
# and use utility function to get names
output = res[0].squeeze()
classes = output.argsort()[-5:][::-1]
print([imagenet_class_to_name(x)[0] for x in classes])

# collect benchmark data 
from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
stats = model.get_TI_benchmark_data()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
plot_TI_performance_data(stats, axis=ax)
plt.show()

tt, st, rb, wb = get_benchmark_output(stats)
print(f'Statistics : \n Inferences Per Second   : {1000.0/tt :7.2f} fps')
print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image        : {rb+ wb : 7.2f} MB')
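
The TOP-5 selection above relies on `argsort()`. The sketch below shows the same idea with the standard library only, and additionally converts raw scores into probabilities with a softmax. It assumes the model output holds raw scores (logits), which depends on the model; the score values and helper names are made up for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k=5):
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical scores for a 7-class problem
logits = [0.1, 2.5, -1.0, 4.0, 0.7, 3.2, 1.9]
probs = softmax(logits)
best = top_k(logits, k=5)
print(best)  # [3, 5, 1, 6, 4]
print([round(probs[i], 3) for i in best])
```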

## EVM's console logs (optional - in case of inference failure)

To copy console logs from the EVM to your TI EdgeAI Cloud workspace, go to "Help -> Troubleshooting -> EVM console log" on the TI EdgeAI Cloud landing page.

Alternatively, from the workspace, open and run evm-console-log.ipynb.