# Image Classification with TensorFlow

This tutorial demonstrates the steps required to prepare and deploy a trained TensorFlow model for FPGA acceleration using Xilinx MLSuite:  
1. **Quantize the model** - The quantizer generates scaling parameters for quantizing floats to INT8. This is required because FPGAs take advantage of fixed-point precision to achieve more parallelization at lower power.
2. **Subgraph Cutting and Compilation** - The original graph is cut and compiled, and a custom FPGA-accelerated Python layer is inserted to be used for inference.
3. **Classification** - The modified TensorFlow model from the previous step is run on the FPGA to perform inference on an input image.
    
## Prerequisite Files 
1. **Model files** - This notebook requires that the model files are located in  
  `$VAI_ALVEO_ROOT/examples/tensorflow/models`
2. **Image files** - This notebook requires that the ILSVRC2012 image files are downloaded to  
  `$HOME/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min/`
3. **Model-related parameters** - Edit the "USER EDITABLE:" portion of "util.py" according to your model parameters. (The default parameters provided in this file apply to the inception_v1 and resnet_50 examples below.)
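
Before running the notebook, it can help to confirm that the prerequisite directories exist. The following is a minimal sketch, not part of the original notebook: the helper name `missing_paths` is hypothetical, and the `"."` fallbacks used when the environment variables are unset are assumptions.

```python
import os

def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Directories named in the prerequisites above.
# The "." fallbacks are assumptions for when the variables are not set.
vai_root = os.getenv("VAI_ALVEO_ROOT", ".")
home = os.getenv("HOME", ".")
required = [
    os.path.join(vai_root, "examples", "tensorflow", "models"),
    os.path.join(home, "CK-TOOLS", "dataset-imagenet-ilsvrc2012-val-min"),
]

for path in missing_paths(required):
    print("Missing prerequisite: %s" % path)
```

If nothing is printed, both directories were found.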
  
## Setup (Before Running Notebook)
This notebook should be run inside a TensorFlow Docker container.

Download the models to "$VAI_ALVEO_ROOT/examples/tensorflow/models":
```
$ python $VAI_ALVEO_ROOT/examples/tensorflow/getModels.py
```

Download 500 calibration images into "~/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min/":
```
$ conda activate vitis-ai-tensorflow
$ python -m ck pull repo:ck-env
$ python -m ck install package:imagenet-2012-val-min
$ python -m ck install package:imagenet-2012-aux
$ head -n 500 ~/CK-TOOLS/dataset-imagenet-ilsvrc2012-aux/val.txt > ~/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min/val.txt
$ cd $VAI_ALVEO_ROOT/examples/tensorflow
$ python getModels.py
$ source $VAI_ALVEO_ROOT/overlaybins/setup.sh
```
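
The `head -n 500` step above keeps only the first 500 entries of the validation list. For readers who prefer to do this from Python, here is a rough equivalent. It is only a sketch: the helper name `keep_first_lines` is hypothetical, and the demo writes made-up entries to a temporary directory rather than touching the real `val.txt`.

```python
import itertools
import os
import tempfile

def keep_first_lines(src_path, dst_path, count=500):
    """Copy the first `count` lines of src_path to dst_path; return lines written."""
    with open(src_path) as src:
        lines = list(itertools.islice(src, count))
    with open(dst_path, "w") as dst:
        dst.writelines(lines)
    return len(lines)

# Demo on synthetic data (the file contents below are invented for illustration).
workdir = tempfile.mkdtemp()
full_list = os.path.join(workdir, "val_full.txt")
short_list = os.path.join(workdir, "val.txt")
with open(full_list, "w") as f:
    for i in range(1000):
        f.write("image_%04d.JPEG %d\n" % (i, i % 10))

written = keep_first_lines(full_list, short_list, count=500)
print("Wrote %d lines to %s" % (written, short_list))
```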

### Step 1. Import required packages

In [None]:
from __future__ import print_function

import os
import re

from IPython.display import Image as display
from ipywidgets import interact

import tensorflow as tf
import numpy as np
import cv2

from util import top5_accuracy
from vai.dpuv1.rt.xdnn_rt_tf import TFxdnnRT as xdnnRT
from vai.dpuv1.rt.xdnn_util import make_list
from vai.dpuv1.rt.xdnn_io import default_xdnn_arg_parser

In [None]:
# Environment Variables (obtained by running "source overlaybins/setup.sh")
HOME             = os.getenv('HOME','/home/mluser/')
VAI_ALVEO_ROOT   = os.getenv('VAI_ALVEO_ROOT',os.getcwd()+'/../')

if os.path.isdir(os.path.join(VAI_ALVEO_ROOT, 'overlaybins','xdnnv3')):
    XCLBIN = os.path.join(VAI_ALVEO_ROOT, 'overlaybins', 'xdnnv3')
else:
    XCLBIN = os.path.join('/opt/xilinx', 'overlaybins', 'xdnnv3')

if 'VAI_ALVEO_ROOT' in os.environ and os.path.isdir(os.path.join(os.environ['VAI_ALVEO_ROOT'], 'vai/dpuv1')):
      ARCH_JSON = os.path.join(os.environ['VAI_ALVEO_ROOT'], 'vai/dpuv1/tools/compile/bin/arch.json')
elif 'CONDA_PREFIX' in os.environ and os.path.isdir(os.path.join(os.environ['CONDA_PREFIX'], 'arch')):
      ARCH_JSON = os.path.join(os.environ['CONDA_PREFIX'], 'arch/dpuv1/ALVEO/ALVEO.json')
else:
      ARCH_JSON = os.path.join(os.environ['VAI_ROOT'], 'compiler/arch/dpuv1/ALVEO/ALVEO.json')
    
MODELDIR   = VAI_ALVEO_ROOT + "/examples/tensorflow/models/"
IMAGEDIR   = HOME + "/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min/"
IMAGELIST  = HOME + "/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min/val.txt"
LABELSLIST = HOME + "/CK-TOOLS/dataset-imagenet-ilsvrc2012-aux/synset_words.txt"

print("Running w/ HOME: %s" % HOME)
print("Running w/ VAI_ALVEO_ROOT: %s" % VAI_ALVEO_ROOT)
print("Running w/ XCLBIN: %s" % XCLBIN)

### Step 2. Choose a model
Choose a model using the drop-down, or select custom and enter your own.

In [None]:
quantInfo = MODELDIR + 'quantization_fix_info.txt'

@interact(MODEL=['resnet50','inception_v1','custom'])
def selectModel(MODEL):
    global protoBuffer, inputNode, outputNode, inputShape, means, pre_process

    default_protoBuffer = {'resnet50': 'resnet50_baseline.pb', 'inception_v1': 'inception_v1_baseline.pb', 'pedestrian_attribute': 'pedestrian_attributes_recognition_quantizations.pb'}
    default_inputNode   = {'resnet50': 'data', 'inception_v1': 'data', 'pedestrian_attribute': 'data'}
    default_outputNode  = {'resnet50': 'prob', 'inception_v1': 'loss3_loss3', 'pedestrian_attribute': 'pred_upper,pred_lower,pred_gender,pred_hat,pred_bag,pred_handbag,pred_backpack'}
    default_inputShape  = {'resnet50': '224,224', 'inception_v1': '224,224', 'pedestrian_attribute': '224,128'}
    default_means       = {'resnet50': '104,117,124', 'inception_v1': '104,117,124', 'pedestrian_attribute': '104,117,124'}

    if MODEL == "custom":
        protoBuffer = None
        inputNode   = None
        outputNode  = None
        inputShape  = None
        means       = None
        pre_process = None
    else:
        protoBuffer = MODELDIR + default_protoBuffer[MODEL]
        inputNode   = default_inputNode[MODEL]
        outputNode  = default_outputNode[MODEL]
        inputShape  = default_inputShape[MODEL]
        means       = default_means[MODEL]
        pre_process = MODEL

In [None]:
if not protoBuffer:
    @interact(PROTOBUFFER="Provide the path to your protobuffer")
    def selectTFmodel(PROTOBUFFER):
        global protoBuffer
        protoBuffer = PROTOBUFFER

if not quantInfo:
    @interact(QUANTINFO="Provide the path to your quantization file")
    def selectTFmodel(QUANTINFO):
        global quantInfo
        quantInfo = QUANTINFO

if not inputNode:
    @interact(INPUTNODE="Provide the input node(s) (comma separated string with no spaces)")
    def selectTFmodel(INPUTNODE):
        global inputNode
        inputNode = INPUTNODE

if not outputNode:
    @interact(OUTPUTNODE="Provide the output node(s) (comma separated string with no spaces)")
    def selectTFmodel(OUTPUTNODE):
        global outputNode
        outputNode = OUTPUTNODE

if not inputShape:
    @interact(INPUTSHAPE="Provide the input shapes (comma separated string with no spaces)")
    def selectTFmodel(INPUTSHAPE):
        global inputShape
        inputShape = INPUTSHAPE

if not means:
    @interact(MEANS="Provide the means (comma separated string with no spaces)")
    def selectTFmodel(MEANS):
        global means
        means = MEANS

In [None]:
print("Running with protoBuffer:   %s" % protoBuffer)
print("Running with quantInfo:     %s" % quantInfo)
print("Running with inputNode:     %s" % inputNode)
print("Running with outputNode:    %s" % outputNode)
print("Running with inputShape:    %s" % inputShape)
print("Running with means:         %s" % means)
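
The model parameters above are kept as comma-separated strings with no spaces. As an illustration of how such strings can be turned into usable values, here is a small sketch. The helper names `parse_int_list` and `parse_name_list` are hypothetical and are not part of `util.py` or the runtime.

```python
def parse_int_list(text):
    """Convert a string such as '224,224' into a list of ints."""
    return [int(item) for item in text.split(",")]

def parse_name_list(text):
    """Convert a string such as 'a,b' into a list of node names."""
    return [name for name in text.split(",") if name]

# Values copied from the default tables in the model selection cell.
height, width = parse_int_list("224,224")
channel_means = parse_int_list("104,117,124")
output_nodes = parse_name_list("pred_upper,pred_lower,pred_gender")

print(height, width)
print(channel_means)
print(output_nodes)
```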

### Step 3. Run the Quantizer

Inspect the model to gather its input and output node(s) and the shape of each input node. Next, quantize the model using graph and sample data parameters. This quantization process produces two protobuf files containing the quantization information. Finally, extract the quantization information using the compiler. The end result is a text file holding the scaling parameters for quantizing floats to INT8. This is required because FPGAs take advantage of fixed-point precision to achieve accelerated inference.
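
To make the idea of a scaling parameter concrete, the sketch below quantizes a few floats to INT8 and back using a generic symmetric linear scheme. This is an illustration only: it is not necessarily the scheme that `vai_q_tensorflow` applies, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8(x, scale):
    """Map floats to INT8 by dividing by `scale`, rounding, and clipping."""
    q = np.round(x / scale)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale):
    """Map INT8 values back to approximate floats."""
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)

# One scaling parameter for the whole tensor, chosen so the largest
# magnitude maps to 127 (an assumption of this sketch).
scale = float(np.max(np.abs(x))) / 127.0

q = quantize_int8(x, scale)
x_hat = dequantize(q, scale)

print("quantized:", q)
print("max abs error:", float(np.max(np.abs(x - x_hat))))
```

The round-trip error is bounded by roughly half a quantization step, which is why a calibration set is used to pick scales that fit the observed value ranges.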

In [None]:
!vai_q_tensorflow inspect --input_frozen_graph $protoBuffer

In [None]:
!vai_q_tensorflow quantize \
    --input_frozen_graph $protoBuffer \
    --input_nodes        $inputNode \
    --output_nodes       $outputNode \
    --input_shapes       ?,$inputShape,3 \
    --output_dir         $MODELDIR \
    --input_fn           util.input_fn_$pre_process \
    --method             1 \
    --calib_iter         100
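
The `--input_fn` argument points at a calibration function in `util.py`. The sketch below shows the general shape of such a function, assuming the quantizer calls it with the calibration step index and expects a dictionary mapping input node names to NumPy batches. The function name, batch size, and the use of random data are all assumptions made so the example runs without the dataset; a real input function would load and preprocess images from the calibration list.

```python
import numpy as np

BATCH_SIZE = 4        # hypothetical calibration batch size
INPUT_NODE = "data"   # input node name of the default models above
HEIGHT, WIDTH = 224, 224

def input_fn_example(step):
    """Return one calibration batch for calibration step `step`.

    The batch is random data standing in for preprocessed images.
    """
    rng = np.random.RandomState(step)
    batch = rng.uniform(0, 255, size=(BATCH_SIZE, HEIGHT, WIDTH, 3))
    return {INPUT_NODE: batch.astype(np.float32)}

sample = input_fn_example(0)
print(list(sample.keys()), sample[INPUT_NODE].shape)
```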

In [None]:
!vai_c_tensorflow \
    --frozen_pb          $MODELDIR/deploy_model.pb \
    --arch               $ARCH_JSON \
    --output_dir         $MODELDIR \
    --net_name           $quantInfo \
    --quant_info

### Step 4: Run the Partitioner and Compiler

The partitioner takes in the model protoBuffer and the pre-computed quantization parameters, and compiles the portion of the network specified from startnode(s) to finalnode(s) for FPGA acceleration.

If startnode and/or finalnode is not specified (i.e., an empty list), the corresponding end node is inferred from the protoBuffer.
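
The idea of selecting the part of a graph between start and final nodes can be illustrated with a toy example. The sketch below is not the partitioner's algorithm (which also has to consider which operations the FPGA supports); it only finds the nodes that lie on a path from a start node to a final node. The graph, its node names, and the function `nodes_between` are hypothetical.

```python
def nodes_between(edges, startnodes, finalnodes):
    """Return nodes on any path from a start node to a final node (inclusive)."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)

    def reachable(seeds, adjacency):
        # Depth-first traversal collecting every node reachable from `seeds`.
        seen, stack = set(seeds), list(seeds)
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # A node is inside the cut if it is downstream of a start node
    # and upstream of a final node.
    return reachable(startnodes, forward) & reachable(finalnodes, backward)

# Toy linear graph: preprocessing -> network -> postprocessing.
edges = [("decode", "data"), ("data", "conv1"), ("conv1", "pool1"),
         ("pool1", "fc"), ("fc", "prob"), ("prob", "argmax")]

fpga_nodes = nodes_between(edges, ["data"], ["prob"])
print(sorted(fpga_nodes))
```

In this toy graph, `decode` and `argmax` fall outside the cut and would stay on the CPU side.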

In [None]:
def get_args(startnode=inputNode, finalnode=outputNode):
    return {
        ### Some standard partitioner arguments [EDITABLE]
        'startnode':            startnode,
        'finalnode':            finalnode,
        
        ### Some standard compiler arguments [PLEASE DONT TOUCH]
        'dsp':                  96,
        'memory':               9,
        'bytesperpixels':       1,
        'ddr':                  256,
        'data_format':          'NHWC',
        'mixmemorystrategy':    True,
        'noreplication':        True,
        'xdnnv3':               True,
        'usedeephi':            True,
        'quantz':               ''  
    }

In [None]:
## load default arguments
FLAGS, unparsed = default_xdnn_arg_parser().parse_known_args([])

### Partition and compile
rt = xdnnRT(FLAGS,
            networkfile=protoBuffer,
            quant_cfgfile=quantInfo,
            xclbin=XCLBIN,
            device='FPGA',
            placeholdershape="{{'{}':[1,{},{},3]}}".format(inputNode,*[int(x) for x in inputShape.split(',')]),
            **get_args(inputNode, outputNode)
           )
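
The `placeholdershape` argument above is built with a format string whose doubled braces are literal braces. Evaluating the same expression on its own shows the resulting string for the default `resnet50` settings; the variable names here are chosen for illustration.

```python
import ast

input_node = "data"
input_shape = "224,224"

height, width = [int(x) for x in input_shape.split(",")]

# '{{' and '}}' produce literal braces; the three '{}' fields are filled in order.
placeholder_shape = "{{'{}':[1,{},{},3]}}".format(input_node, height, width)
print(placeholder_shape)

# The string is a valid Python dict literal: node name -> NHWC shape.
parsed = ast.literal_eval(placeholder_shape)
print(parsed[input_node])
```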

### Step 5: Inference 

If the model in protoBuffer includes all pre- and post-processing, inference can be done simply by passing the input data as follows:


In [None]:
## Pre-processing function
def preprocess(image):
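    ## Use the selected model's input shape (height,width) rather than a fixed 224x224
    input_height, input_width = [int(x) for x in inputShape.split(',')]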

    ## Image preprocessing using numpy
    img  = cv2.imread(image).astype(np.float32)
    img -= np.array(make_list(means)).reshape(-1,3).astype(np.float32)
    img  = cv2.resize(img, (input_width, input_height))
    
    return img
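
The mean subtraction in the preprocessing function relies on NumPy broadcasting: a `(1, 3)` array of per-channel means is subtracted from every pixel of an `(H, W, 3)` image. The sketch below shows this on a small synthetic image, so it runs without OpenCV or the dataset. The image contents are made up.

```python
import numpy as np

# Synthetic 4x6 image with three channels, every value set to 128.
image = np.full((4, 6, 3), 128.0, dtype=np.float32)

# Per-channel means as used by the default models, shaped (1, 3).
channel_means = np.array([104, 117, 124], dtype=np.float32).reshape(-1, 3)

# (4, 6, 3) - (1, 3) broadcasts the means across all rows and columns.
centered = image - channel_means

print(centered.shape)
print(centered[0, 0])
```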

In [None]:
## Choose image to run, display it for reference
image  = IMAGEDIR + "ILSVRC2012_val_00000003.JPEG"

display(filename=image)

In [None]:
## Accelerated execution

## load the accelerated graph
graph = rt.load_partitioned_graph()

## run the tensorflow graph as usual (additional operations can be added to the graph)
with tf.Session(graph=graph) as sess:
    input_tensor  = graph.get_operation_by_name(inputNode).outputs[0]
    output_tensor = graph.get_operation_by_name(outputNode).outputs[0]
    
    predictions = sess.run(output_tensor, feed_dict={input_tensor: [preprocess(image)]})

In [None]:
labels = np.loadtxt(LABELSLIST, str, delimiter='\t')
top_k = predictions[0].argsort()[:-6:-1]

for l,p in zip(labels[top_k], predictions[0][top_k]):
    print (l," : ",p)
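
The slice `argsort()[:-6:-1]` used above picks the indices of the five largest scores, highest first. A small standalone example with made-up scores shows how it behaves:

```python
import numpy as np

# Made-up scores for seven classes (all values distinct).
scores = np.array([0.02, 0.40, 0.05, 0.25, 0.01, 0.15, 0.12], dtype=np.float32)

# argsort() orders indices from lowest to highest score;
# [:-6:-1] walks backwards from the end and stops after five entries.
top_k = scores.argsort()[:-6:-1]

for index in top_k:
    print(index, scores[index])
```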

### Step 6: Accuracy Test

In [None]:
iter_cnt = 100 
batch_size = 1
label_offset = 0

top5_accuracy(graph, inputNode, outputNode, iter_cnt, batch_size, pre_process, label_offset)
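
For reference, top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. The sketch below computes it with NumPy on made-up data. It is not the implementation of `top5_accuracy` in `util.py`, and the function name `top_k_accuracy` is hypothetical.

```python
import numpy as np

def top_k_accuracy(predictions, labels, k=5):
    """Fraction of rows whose true label is among the k highest scores."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    # Indices of the k largest scores per row (order within the k is irrelevant).
    top_k = np.argsort(predictions, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Two made-up samples over six classes.
predictions = [
    [0.50, 0.30, 0.10, 0.05, 0.03, 0.02],  # true label 1 is in the top 2
    [0.05, 0.10, 0.60, 0.20, 0.03, 0.02],  # true label 0 is not in the top 2
]
labels = [1, 0]

print(top_k_accuracy(predictions, labels, k=2))
print(top_k_accuracy(predictions, labels, k=5))
```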

# Conclusion
This notebook demonstrates how to target Xilinx FPGAs for inference using TensorFlow.  

When the time comes to take your application to production, please look at the examples in $VAI_ALVEO_ROOT/examples/deployment_modes/  
The highest performance is achieved by creating multiprocess pipelines.