# Custom Model Compilation and Inference using TensorFlow Lite Runtime

In this example notebook, we describe how to take a pre-trained classification model and compile it using the ***TF Lite runtime*** to generate artifacts that can be deployed on the target using the ***TF Lite*** interface. 
 
 - Pre-trained model: `mobilenetv1` model trained on the ***ImageNet*** dataset using ***TensorFlow***  
 
In particular, we will show how to
- compile the model (during heterogeneous model compilation, layers that are supported are offloaded to the `TI-DSP` and the artifacts needed for inference are generated)
- enable debug logs
- use the `deny_list` compilation option to isolate potentially problematic layers and create additional model subgraphs
- use the generated subgraph artifacts for inference
- perform input preprocessing and output postprocessing

    
## TensorFlow Lite Runtime based workflow

The diagram below describes the steps of the TensorFlow Lite Runtime based workflow. 

Note:
 - The user needs to compile models (sub-graph creation and quantization) on a PC to generate model artifacts.
 - The generated artifacts can then be used to run inference on the target.

<img src=docs/images/osrt_user_workflow.png width="400">


In [None]:
import os
import tqdm
import cv2
import numpy as np
import tflite_runtime.interpreter as tflite
import shutil 
from pathlib import Path
from IPython.display import Markdown as md
# import functions from local scripts
from scripts.utils import imagenet_class_to_name, download_model
from scripts.utils import loggerWriter
from scripts.utils import get_svg_path

## Set the model to evaluate and images to use for fixed-point calibration
We first set the model file to be used. If it is recognized as a TI model, it is downloaded from our zoo. For custom models, users need to ensure that the file is present on the local filesystem.

A set of calibration images is used to find appropriate quantization parameters so that the floating-point model can run on a fixed-point accelerator. For a custom-trained model, it is best to use representative data from its data set.


In [None]:
output_dir = 'custom-artifacts/tflite/mobilenetv1'
tflite_model_path = 'models/public/tflite/mobilenet_v1_1.0_224.tflite'
download_model(tflite_model_path)

# For highly application-specific models, it is recommended to use representative data from your dataset
# Calibration images are used for post-training quantization
calib_images = [
'sample-images/elephant.bmp',
'sample-images/bus.bmp',
'sample-images/bicycle.bmp',
'sample-images/zebra.bmp',
]

## Define utility function to preprocess input images
Below, we define a utility function to preprocess images for `mobilenetv1`. This function takes a path as input, loads the image and preprocesses it for generic ***TFLite*** inference. The steps are as follows: 

 1. load image
 2. convert BGR image to RGB
 3. scale image so that the short edge is 256 pixels
 4. center-crop image to 224x224 pixels
 5. apply per-channel pixel scaling and mean subtraction


- Note: If you are using a custom model or a model that was trained using a different framework, please remember to define your own utility function.
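
Steps 3 and 4 come down to integer arithmetic on the image dimensions. The sketch below reproduces that arithmetic in isolation, so it can be checked without loading an image. `resize_and_crop_geometry` is a hypothetical helper name used only for illustration; it is not part of the notebook's scripts.

```python
def resize_and_crop_geometry(orig_height, orig_width, short_edge_target=256, crop=224):
    """Return the resized (height, width) and the top-left corner of the center crop."""
    short_edge = min(orig_height, orig_width)
    # Scale both sides by the same factor so the short edge becomes short_edge_target
    new_height = (orig_height * short_edge_target) // short_edge
    new_width = (orig_width * short_edge_target) // short_edge
    # Top-left corner of a crop x crop window centered in the resized image
    starty = new_height // 2 - crop // 2
    startx = new_width // 2 - crop // 2
    return (new_height, new_width), (starty, startx)


# Example: a 480x640 (height x width) landscape image
size, corner = resize_and_crop_geometry(480, 640)
print(size, corner)
```

For a custom model with a different input resolution, the two default arguments are the values that would most likely need to change.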


In [None]:
def preprocess(image_path):
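    # read the image using OpenCV (returns None if the file cannot be read)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError('Could not read image: ' + image_path)
    
    # convert BGR to RGB
    img = img[:,:,::-1]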
    
    # Most of the tflite models are trained using
    # 224x224 images. The general rule of thumb
    # is to scale the input image while preserving
    # the original aspect ratio so that the
    # short edge is 256 pixels, and then
    # center-crop the scaled image to 224x224
    orig_height, orig_width, _ = img.shape
    short_edge = min(img.shape[:2])
    new_height = (orig_height * 256) // short_edge
    new_width = (orig_width * 256) // short_edge
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    startx = new_width//2 - (224//2)
    starty = new_height//2 - (224//2)
    img = img[starty:starty+224,startx:startx+224]
    
    # apply scaling and mean subtraction.
    # if your model is built with an input
    # normalization layer, then you might
    # need to skip this
    img = img.astype('float32')
    # mean and scale depend on training: the same values
    # used for preprocessing during training must be used here
    for mean, scale, ch in zip([128, 128, 128], [0.0078125, 0.0078125, 0.0078125], range(img.shape[2])):
        img[:,:,ch] = (img[:,:,ch] - mean) * scale
    img = np.expand_dims(img,axis=0)
    
    return img
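
The per-channel loop in `preprocess` uses the same mean and scale for every channel, so for this model it is equivalent to a single vectorized expression. The sketch below shows that form on synthetic data, using the mean (128) and scale (0.0078125, that is 1/128) from the cell above. `normalize` is a hypothetical name chosen for illustration.

```python
import numpy as np

def normalize(img, mean=128.0, scale=0.0078125):
    """Map uint8 pixel values to roughly [-1, 1) as float32."""
    return (img.astype('float32') - mean) * scale

# One pixel with three channels covering the extremes and the midpoint
pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)
out = normalize(pixels)
print(out)
```

If the channels of a custom model need different means or scales, passing NumPy arrays of shape `(3,)` for `mean` and `scale` should broadcast over the last axis in the same way.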

## Compile the model
In this step, we create a TFLite runtime interpreter with the `tidl_model_import_tflite` delegate library to generate artifacts that offload the supported portion of the DL model to the TI DSP.
 - `tidl_delegate` is created with the options below to calibrate the model for 8-bit fixed-point inference
   
    * **tidl_tools_path** - `os.getenv('TIDL_TOOLS_PATH')`, the path to the `TIDL` compilation tools 
    * **artifacts_folder** - folder where all the compilation artifacts needed for inference are stored 
    * **tensor_bits** - 8 or 16, the number of bits to be used for quantization 
    * **accuracy_level** - 1 or 0, the desired accuracy with the quantized model
    * **advanced_options:calibration_frames** - number of images to be used for calibration
    * **advanced_options:calibration_iterations** - number of iterations for advanced calibration
    * **debug_level** - 0 -> no debug, 1 -> rt debug prints, >=2 -> increasing levels of debug and trace dump
    * **deny_list** - force-disable the offload of a particular operator to TIDL 
    
- Note: The paths to the `TIDL` compilation tools and the `aarch64` `GCC` compiler are required for model compilation. This notebook accesses them through the predefined environment variables `TIDL_TOOLS_PATH` and `ARM64_GCC_PATH`. The usage of `TIDL_TOOLS_PATH` is demonstrated in the cell below. 
- Please refer to the TIDL user guide and the edgeai-tidl-tools repository documentation for further advanced options.
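
Since compilation depends on both environment variables, it can help to check them up front rather than fail partway through. The following is a minimal sketch under that assumption; `missing_env_vars` is a hypothetical helper, not part of the notebook's scripts or the TIDL tools.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

required = ['TIDL_TOOLS_PATH', 'ARM64_GCC_PATH']
missing = missing_env_vars(required)
if missing:
    print('Set these environment variables before compiling:', ', '.join(missing))
```

The optional `env` argument exists only so the check can be exercised with a plain dictionary instead of the real environment.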

### Layers debug (optional - In case of debugging)
A `debug_level` of 1 prints layer information and warnings/errors that can be useful during debugging. Compilation logs are written to the path given to the `loggerWriter` helper function.

Another technique is to use `deny_list` to exclude layers from running on TIDL and create additional subgraphs, in order to isolate issues.
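
Operators in `deny_list` are identified by their TFLite builtin operator codes. The sketch below maps a few operator names to the codes listed in `tensorflow/lite/builtin_ops.h` and joins them into one value. Passing several codes as a comma-separated string is an assumption about the delegate's option format and should be checked against the TIDL documentation; `make_deny_list` is a hypothetical helper.

```python
# A few TFLite builtin operator codes, as listed in tensorflow/lite/builtin_ops.h
BUILTIN_OP_CODES = {
    'ADD': 0,
    'AVERAGE_POOL_2D': 1,
    'CONV_2D': 3,
    'DEPTHWISE_CONV_2D': 4,
}

def make_deny_list(op_names):
    """Build a comma-separated string of operator codes (the format is an assumption)."""
    return ', '.join(str(BUILTIN_OP_CODES[name]) for name in op_names)

deny_list = make_deny_list(['AVERAGE_POOL_2D'])
print(deny_list)
```

Denying one operator type at a time and comparing the outputs is one way to narrow down which layer causes a mismatch.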

### Compilation knobs (optional - In case of debugging accuracy)
If a model's accuracy at 8 bits is not good enough, users can try compiling the same model at 16 bits with an accuracy level of 1. This reduces performance, but it gives users a good accuracy reference.
As a second step, users can try to improve the 8-bit accuracy by increasing the number of calibration frames and iterations, in order to get closer to the results obtained with 16 bits and an accuracy level of 1.
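
The two steps can be expressed as variations of one base options dictionary. The sketch below uses only option keys named in this notebook. The frame and iteration counts in the second variant are illustrative values, not recommendations, and `with_overrides` is a hypothetical helper.

```python
base_options = {
    'tensor_bits': 8,
    'accuracy_level': 1,
    'advanced_options:calibration_frames': 4,
    'advanced_options:calibration_iterations': 3,
}

def with_overrides(options, overrides):
    """Return a copy of options with selected keys replaced; the input is not modified."""
    merged = dict(options)
    merged.update(overrides)
    return merged

# Step 1: 16-bit reference run to establish an accuracy bar
reference_16bit = with_overrides(base_options, {'tensor_bits': 16})

# Step 2: stay at 8 bits but calibrate with more frames and iterations
# (20 and 10 are illustrative values only)
tuned_8bit = with_overrides(base_options, {
    'advanced_options:calibration_frames': 20,
    'advanced_options:calibration_iterations': 10,
})
print(reference_16bit['tensor_bits'], tuned_8bit['advanced_options:calibration_frames'])
```

When raising `calibration_frames`, the list of calibration images passed to the interpreter needs to contain at least that many images.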

In [None]:
# Path.mkdir() returns None, so keep the Path object separately
log_dir = Path("logs")
log_dir.mkdir(parents=True, exist_ok=True)

# debug level -- use 1 or 2 for increased verbosity in the error messages.
# See log files to view all printed messages
debug_level = 0

# compilation options - knobs to tweak
num_bits = 8
accuracy = 1

# stdout and stderr saved to a *.log file.
with loggerWriter("logs/custom-model-tfl"):

    # model compilation options
    compile_options = {
        'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
        'artifacts_folder' : output_dir,
        'tensor_bits' : num_bits,
        'accuracy_level' : accuracy,
        'debug_level' : debug_level,
        'advanced_options:calibration_frames' : len(calib_images),
        'advanced_options:calibration_iterations' : 3,
        'advanced_options:add_data_convert_ops' : 1,
        #'deny_list' : 1, #For details of TFLite builtin ops please refer: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/builtin_ops.h
    }



<div class="alert alert-block alert-info">
<b>Note:</b> 'deny_list' is shown (commented out) in the cell above only as an example. It can be deleted, because "AveragePool2d" (builtin operator code 1) is a supported layer.
</div>

In [None]:
# create the output dir if not present,
# then clear any previous contents
os.makedirs(output_dir, exist_ok=True)
for root, dirs, files in os.walk(output_dir, topdown=False):
    for f in files:
        os.remove(os.path.join(root, f))
    for d in dirs:
        os.rmdir(os.path.join(root, d))
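
The cell above empties the artifacts directory by walking it bottom-up. An alternative sketch, assuming it is acceptable to delete and recreate the directory itself, uses `shutil.rmtree`. `reset_dir` is a hypothetical helper name.

```python
import os
import shutil
import tempfile

def reset_dir(path):
    """Delete path and everything under it, then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path, exist_ok=True)

# Usage on a throwaway directory rather than the real artifacts folder
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'artifacts')
    os.makedirs(os.path.join(target, 'nested'))
    open(os.path.join(target, 'nested', 'old.bin'), 'w').close()
    reset_dir(target)
    remaining = os.listdir(target)
print(remaining)
```

Note that `ignore_errors=True` also hides real failures such as permission errors, so it may be worth dropping if stale artifacts are a concern.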

In [None]:
# Run the compilation itself
# Load the TIDL delegate for model import/compilation in TFLite
# Also set the compile_options defined above to set parameters for the compilation process
tidl_delegate = [tflite.load_delegate(os.path.join(os.environ['TIDL_TOOLS_PATH'], 'tidl_model_import_tflite.so'), compile_options)]

# Create a new runtime interpreter for your model targeting the TIDL import delegate
# When this call runs, compilation will begin but not complete because it is waiting for calibration data
interpreter = tflite.Interpreter(model_path=tflite_model_path, experimental_delegates=tidl_delegate)

# Typical TFL API calls prior to usage
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess calibration data and pass it to the runtime. 
# Once at least 'calibration_frames' number of images are passed in, calibration can proceed
# Model compilation can complete after all calibration data is received and processed
for num in tqdm.trange(len(calib_images)):
    interpreter.set_tensor(input_details[0]['index'], preprocess(calib_images[num]))
    interpreter.invoke()    

### Subgraphs visualization (optional - In case of debugging models and subgraphs)

TIDL processes 'subgraphs' of supported layers that can run on the accelerator. Several SVG files are provided to visualize the network as a graph.

Running the cell below gives links to visualizations of the complete graph and of the TIDL subgraphs. Together with the "deny_list" feature explained above, these are tools for checking and isolating potential issues in the layers of the neural network model.
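
`get_svg_path` comes from the notebook's local scripts and its implementation is not shown here. As a rough stand-in, assuming the SVG files are written somewhere under the artifacts folder, they could be located with `pathlib`. `find_svgs` is a hypothetical helper, and the directory layout in the usage example is made up for the demonstration.

```python
import tempfile
from pathlib import Path

def find_svgs(artifacts_dir):
    """Return all .svg files under artifacts_dir, sorted by path."""
    return sorted(Path(artifacts_dir).rglob('*.svg'))

# Usage on a throwaway directory standing in for the artifacts folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub').mkdir()
    (Path(tmp) / 'sub' / 'graph.svg').write_text('<svg/>')
    (Path(tmp) / 'notes.txt').write_text('not a graph')
    found = [p.name for p in find_svgs(tmp)]
print(found)
```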

In [None]:
subgraph_link = get_svg_path(output_dir)
for sg in subgraph_link:
    hl_text = os.path.join(*Path(sg).parts[4:])
    sg_rel = os.path.join('../', sg)
    display(md("[{}]({})".format(hl_text,sg_rel))) 

## Use compiled model for inference
Next, using ***TF Lite*** with the ***`libtidl_tfl_delegate`*** delegate library, we run the model and collect benchmark data.

This time we use the ***`libtidl_tfl_delegate`*** instead of the ***`tidl_model_import_tflite`*** delegate, which was used for compiling/importing the model.

The ***`libtidl_tfl_delegate`*** similarly accepts a dictionary of options to be passed into the TIDL runtime, but it need not be as extensive as `compile_options`. This time, all we need is the 'artifacts_folder' option pointing to the directory where we put the TIDL compilation outputs.

In [None]:
import matplotlib.pyplot as plt

# Setup the TIDL TFL delegate for inference, and point to the compiled model artifacts
tidl_delegate = [tflite.load_delegate('libtidl_tfl_delegate.so', {'artifacts_folder': output_dir})]
# Setup the Interpreter for this model and target the tidl delegate
interpreter = tflite.Interpreter(model_path=tflite_model_path, experimental_delegates=tidl_delegate)

# Typical TFL API calls needed to complete setup of the network
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]['index'], preprocess('sample-images/elephant.bmp'))

# Run inference several times to get a stable performance measurement
for i in range(5):
    interpreter.invoke()
    
res = interpreter.get_tensor(output_details[0]['index'])

# Postprocess the output to determine what was recognized in the image. For non-classification model, add your own postprocessing function
for idx, cls in enumerate(res[0].squeeze()[1:].argsort()[-5:][::-1]):
    print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
# Pull TI performance measurements from the runtime
from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
stats = interpreter.get_TI_benchmark_data()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
plot_TI_performance_data(stats, axis=ax)
plt.show()

# Process runtime stats to get total time (tt), processing time(st), ddr read time (rb), and ddr write time (wb) for one model inference
tt, st, rb, wb = get_benchmark_output(stats)
print(f'Statistics : \n Inferences Per Second   : {1000.0/tt :7.2f} fps')
print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image        : {rb+ wb : 7.2f} MB')
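
The one-line postprocessing expression above packs several steps together: drop the first output entry, sort the remaining scores, keep the last five, and reverse them so the best score comes first. The sketch below spells this out on a small synthetic score vector. Treating index 0 as a background class that should be skipped is an assumption based on the `[1:]` slice in the cell above, and `top_k_classes` is a hypothetical helper.

```python
import numpy as np

def top_k_classes(scores, k=5, skip_background=True):
    """Return indices of the k highest scores, best first."""
    scores = np.asarray(scores).squeeze()
    if skip_background:
        # Drop index 0; returned indices are relative to the remaining classes
        scores = scores[1:]
    # argsort is ascending, so take the last k entries and reverse them
    return scores.argsort()[-k:][::-1]

# Synthetic output with a batch dimension, like the interpreter's result
scores = np.array([[0.9, 0.1, 0.5, 0.05, 0.3, 0.2, 0.4]])
top3 = top_k_classes(scores, k=3)
print(top3)
```

Even though index 0 holds the largest value in this example, it never appears in the result, which mirrors how the notebook ignores the first output entry.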


## EVM console logs (optional - in case of inference failure)

To copy console logs from the EVM to your TI EdgeAI Cloud workspace, go to "Help -> Troubleshooting -> EVM console log" on the TI EdgeAI Cloud landing page.

Alternatively, from the workspace, open and run evm-console-log.ipynb.