# LeNet Inference with Cloud Alveo U200

This tutorial walks through handwritten digit recognition with the classic LeNet network, using an Alveo U200 card to accelerate inference. To try it:
1. In the Windows Paint tool, open a square canvas, fill the background with black, draw a digit (0-9) with a white brush, and save the result as **input.png**.
2. Upload **input.png** to the home directory.
3. Run the code on this page.

Below is an example for the **input.png** file:
![test](./example.png)

The code on this page performs the following jobs:
1. **Quantize the model** - The quantizer generates scaling parameters for quantizing floats to INT8. This is required because FPGAs use fixed-point precision to achieve more parallelism at lower power.
2. **Compile the model** - In this step, the network graph (prototxt) and the weights (caffemodel) are compiled: the compiler optimizes the network and generates FPGA instructions.
3. **Subgraph cutting** - In this step, the original graph is cut, and a custom FPGA-accelerated Python layer is inserted to be used for inference.
4. **Classification** - In this step, the caffemodel and the prototxt from the previous step are run on the FPGA to perform inference on an input image.


The pre-trained neural network model includes the following files:
* **lenet.prototxt**: Caffe PROTOTXT file
* **lenet_iter_10000.caffemodel**: Caffe model parameters file


### Prepare input image and finish pre-processing

In [None]:
# INPUT_IMAGE is the input file name
# SMALL_IMAGE is the 28x28 size image generated from the input image
INPUT_IMAGE = './input.png'
SMALL_IMAGE = './small.png'

In [None]:
# display the input image file
from IPython.display import Image as NBImage
NBImage(INPUT_IMAGE)

In [None]:
# Preprocessing: scale the input image down to 28x28, convert it to grayscale, and display it
from PIL import Image
im = Image.open(INPUT_IMAGE)
im.thumbnail((28, 28))
tt =  im.convert('L')
tt.save(SMALL_IMAGE)
NBImage(SMALL_IMAGE)

### Import required packages

In [None]:
from __future__ import print_function

from decent import CaffeFrontend as xfdnnQuantizer
import subprocess
from xfdnn_subgraph import CaffeCutter as xfdnnCutter

# Environment Variables ("source overlaybins/setup.sh")
import os
VAI_ALVEO_ROOT = os.getenv("VAI_ALVEO_ROOT","../")
MLSUITE_PLATFORM = os.getenv("MLSUITE_PLATFORM","alveo-u200")
print("Running with VAI_ALVEO_ROOT: %s" % VAI_ALVEO_ROOT)
print("Running with MLSUITE_PLATFORM: %s" % MLSUITE_PLATFORM)

from IPython.display import Image as display

### Set model files

In [None]:
prototxt = "./lenet_train_test.prototxt"
caffemodel = "./lenet_iter_10000.caffemodel"

### Run the Quantizer

Here, we quantize the model. The inputs are the model prototxt, the model weights, the number of test iterations, and the number of calibration iterations (this notebook sets only `calib_iter`). The outputs are a quantized prototxt, quantized weights, and quantize_info.txt, all written to the quantize_results/ directory.

The quantizer generates a JSON file holding the scaling parameters used to quantize floats to INT8.
This is required because the FPGA uses fixed-point precision to accelerate inference.
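
To make the idea of scaling parameters concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is an illustration only: the function names are hypothetical, the example weights are made up, and the max-abs scaling rule is an assumption, not necessarily the calibration method the quantizer itself uses.

```python
def int8_scale(values):
    """Return a scale that maps the largest magnitude in `values` to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to integers clamped to the INT8 range [-128, 127]."""
    return [max(-128, min(127, int(round(v / scale)))) for v in values]


def dequantize(quantized, scale):
    """Recover approximate float values from INT8 integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up example values
scale = int8_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)

print(q_weights)
# Largest reconstruction error introduced by the round trip
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Each restored value differs from the original by at most half a scale step, which is the precision traded away for integer arithmetic.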

In [None]:
def Quantize(prototxt,caffemodel,calib_iter=1):
    
    quantizer = xfdnnQuantizer(
        model=prototxt,
        weights=caffemodel,
        calib_iter=calib_iter,
    )
    
    quantizer.quantize()

In [None]:
Quantize(prototxt,caffemodel)

### Run the Compiler

The compiler takes the quantizer outputs from the previous step (prototxt, weights, quantize_info) and writes compiler.json and quantizer.json to the work/ directory.

* A Network Graph (prototxt) and a Weights Blob (caffemodel) are compiled
* The network is optimized
* FPGA Instructions are generated

In [None]:
# Some standard compiler arguments - PLEASE DON'T TOUCH
def Getopts():
    return {
            "bytesperpixels":1,
            "dsp":96,
            "memory":9,
            "ddr":256,
            "cpulayermustgo":True,
            "mixmemorystrategy":True,
            "pipelineconvmaxpool":True,
            "usedeephi":True,
    }

In [None]:
def Compile(prototxt="quantize_results/deploy.prototxt",
            caffemodel="quantize_results/deploy.caffemodel",
            quantize_info="quantize_results/quantize_info.txt",
            name="lenet"):
    # name: network name passed to --net_name.
    # The default "lenet" is an assumed placeholder; change it as needed.
    # subprocess.call returns the compiler's exit code without raising on failure.
    subprocess.call(["vai_c_caffe",
                    "--prototxt", prototxt,
                    "--caffemodel", caffemodel,
                    "--net_name", name,
                    "--output_dir", "work",
                    "--arch", "/opt/vitis_ai/compiler/arch/DPUCADX8G/ALVEO/arch.json",
                    "--options", "{\"quant_cfgfile\":\"%s\", \
                    \"pipelineconvmaxpool\":False, \
                    }" %(quantize_info)])

In [None]:
Compile()
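
`subprocess.call` returns the exit code of the command but does not raise if the command fails, so a failed compile can go unnoticed until a later step. As a sketch, a hypothetical helper `run_checked` (not part of the Vitis AI tooling) can wrap the standard `subprocess.check_output` call so that failures surface immediately. The demonstration commands below invoke the Python interpreter itself rather than the compiler, so the example runs anywhere.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run `cmd` (a list of arguments) and return its combined output.

    Raises subprocess.CalledProcessError if the exit code is non-zero.
    """
    return subprocess.check_output(
        cmd, stderr=subprocess.STDOUT, universal_newlines=True
    )


# A command that succeeds: its output is returned as text.
output = run_checked([sys.executable, "-c", "print('compile step ok')"])
print(output.strip())

# A command that fails: the error carries the exit code.
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print("command failed with exit code %d" % failed_code)
```

The same helper could take the `vai_c_caffe` argument list built in `Compile` in place of the demonstration commands.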

### Run the Subgraph Cutter

The subgraph cutter creates a custom python layer to be accelerated on the FPGA. The inputs are compiler.json, quantizer.json and model weights from the compiler step, as well as the FPGA xclbin. This outputs a cut prototxt file with FPGA references, to be used for inference. 
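
Because the cutter depends on files produced by earlier steps, it can help to confirm they exist before calling it. The sketch below uses a hypothetical helper, `missing_files`; the paths are the ones passed to the cutter in this notebook, and the fallback values for the environment variables mirror the defaults set in the import cell.

```python
import os


def missing_files(paths):
    """Return the subset of `paths` that are not existing regular files."""
    return [path for path in paths if not os.path.isfile(path)]


# Same defaults as the environment-variable lookups earlier in this notebook.
root = os.getenv("VAI_ALVEO_ROOT", "../")
platform = os.getenv("MLSUITE_PLATFORM", "alveo-u200")

required = [
    "work/compiler.json",
    "work/quantizer.json",
    "work/weights.h5",
    os.path.join(root, "overlaybins", platform, "overlay_4.xclbin"),
]

missing = missing_files(required)
if missing:
    print("Missing inputs for the cutter: " + ", ".join(missing))
else:
    print("All cutter inputs are present.")
```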

In [None]:
def Cut(prototxt):
    
    cutter = xfdnnCutter(
        inproto="quantize_results/deploy.prototxt",
        trainproto=prototxt,
        outproto="xfdnn_auto_cut_deploy.prototxt",
        outtrainproto="xfdnn_auto_cut_train_val.prototxt",
        cutAfter="data",
        xclbin=VAI_ALVEO_ROOT+"/overlaybins/"+MLSUITE_PLATFORM+"/overlay_4.xclbin",
        netcfg="work/compiler.json",
        quantizecfg="work/quantizer.json",
        weights="work/weights.h5",
        profile=True
    )
    
    cutter.cut()

In [None]:
Cut(prototxt)
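
Conceptually, the cut can be pictured as splitting an ordered list of layers at the `cutAfter` layer and replacing what follows with a single FPGA-backed layer. The toy sketch below illustrates only that idea. The function `cut_after`, the placeholder layer name `fpga_subgraph`, and the rule that every layer after the cut point moves to the FPGA are all assumptions made for illustration; the real cutter may leave some layers on the CPU. The layer names are those of the standard Caffe LeNet example and may differ from this model's prototxt.

```python
def cut_after(layers, cut_layer, fpga_layer="fpga_subgraph"):
    """Keep layers up to and including `cut_layer`; collapse the rest into one.

    Returns (new_layer_list, layers_moved_to_fpga).
    """
    index = layers.index(cut_layer)  # raises ValueError if the layer is absent
    kept = layers[: index + 1]
    moved = layers[index + 1:]
    if not moved:
        return kept, []
    return kept + [fpga_layer], moved


# Layer names assumed from the standard Caffe LeNet example.
lenet_layers = ["data", "conv1", "pool1", "conv2", "pool2",
                "ip1", "relu1", "ip2", "prob"]

cut_graph, on_fpga = cut_after(lenet_layers, "data")
print(cut_graph)
print(on_fpga)
```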

### Execute inference

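The inputs are the FPGA prototxt file, the caffemodel weights, and a test image. `Classify` returns the network's prediction vector, and the index of its largest entry is the predicted digit, so no separate labels file is needed here.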


In [None]:
def Classify(prototxt,caffemodel,image):

    import numpy as np
    from caffe import Classifier,io
    classifier = Classifier(prototxt,caffemodel)
    predictions = classifier.predict([io.load_image(image,color=False)]).flatten()
    return predictions

In [None]:
result = Classify("xfdnn_auto_cut_deploy.prototxt","quantize_results/deploy.caffemodel",SMALL_IMAGE)
print( 'predicted class:', result.argmax())
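
`argmax` reports only the single most likely digit. To see how the other digits scored, the prediction vector can be ranked. The sketch below uses a hypothetical helper, `top_k`, on made-up scores; it assumes the prediction vector holds one score per digit, indexed 0 to 9, as `result` does above.

```python
def top_k(scores, k=3):
    """Return the k (digit, score) pairs with the highest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for illustration; with real data, pass `result` instead.
example_scores = [0.01, 0.02, 0.05, 0.70, 0.01, 0.03, 0.02, 0.01, 0.14, 0.01]

for digit, score in top_k(example_scores):
    print("digit %d: score %.2f" % (digit, score))
```

A small gap between the first and second entries suggests an ambiguous drawing, for example a 3 that resembles an 8.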