# Part 2: Using the xfDNN Quantizer to Recalibrate Models

## Introduction 

In this part of the lab, we will quantize 32-bit floating-point models to Int16 or Int8 in preparation for deployment. Deploying Int16/Int8 models substantially improves inference performance and lowers latency. While floating-point precision is useful during model training, deploying models at lower precision is more energy efficient and has lower latency.

The xfDNN Quantizer performs a technique of quantization known as recalibration. This technique does not require full retraining of the model, and can be accomplished in a matter of seconds, as you will see below. It also allows you to maintain the accuracy of the high precision model.

Quantization does not alter the original high-precision model. Instead, it calculates the dynamic range of the model and produces scaling parameters, recorded in a JSON file, which the xDNN overlay uses during execution of the network/model. Quantization is an offline process that only needs to be performed once per model. The quantizer produces an optimal target quantization from a given network (prototxt and caffemodel) and a calibration set (unlabeled input images), without requiring hours of retraining or a labeled dataset.
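
To make "calculating the dynamic range" concrete, here is a minimal sketch of one simple calibration strategy: track the largest absolute activation a layer produces over the calibration images. This max-abs rule is a stand-in chosen for illustration; the method the xfDNN Quantizer actually uses is not described in this lab. The function name and the sample values are hypothetical.

```python
def calibrate_threshold(batches):
    """Return the largest absolute value seen across all calibration batches.

    Assumption: max-abs is a simple illustrative rule, not necessarily
    the rule used by the xfDNN Quantizer.
    """
    threshold = 0.0
    for batch in batches:
        for value in batch:
            threshold = max(threshold, abs(value))
    return threshold


# Hypothetical outputs of one layer for two calibration images
layer_outputs = [[0.5, -3.25, 1.0], [2.0, -0.75, 4.5]]
print(calibrate_threshold(layer_outputs))  # 4.5
```

Because only a threshold per layer is recorded, the original weights stay untouched and no labels are needed, which matches the offline, once-per-model process described above.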

In this lab, we will quantize an optimized model generated in Part 1, defined by a Caffe prototxt and caffemodel, to Int16 and Int8. Depending on your earlier notebook, this will be either a GoogLeNet-v1 or ResNet-50 model.

Just like in Part 1, first we will run through an example, then you will get a chance to try the quantizer yourself. 

### 1. Import required packages 

In [1]:
from __future__ import print_function  # __future__ imports must come first
import os, sys

# Bring in Xilinx ML-Suite Quantizer
from xfdnn.tools.quantize.quantize import CaffeFrontend as xfdnnQuantizer

### 1. Create Quantizer Instance and run it

To simplify handling of arguments, a config dictionary is used. Take a look at the dictionary below.

The arguments that need to be passed are:
- `outmodel` - Filename generated by the compiler for the optimized prototxt and caffemodel (the example below passes this base name as `caffemodel`).
- `quantizecfg` - Output JSON filename of quantization scaling parameters. 
- `bitwidths` - Desired precision from quantizer. This is to set the precision for [image data, weight bitwidth, conv output]. All three values need to be set to the same setting. The valid options are `16` for Int16 and `8` for Int8.  
- `in_shape` - Sets the desired input image size of the first layer. Images will be resized to these dimensions, which must match the network data/placeholder layer.
- `transpose` - Images start as H,W,C (H=0,W=1,C=2); transpose swaps them to C,H,W (2,0,1) for typical networks.
- `channel_swap` - Depending on network training and image read, can swap from RGB (R=0,G=1,B=2) to BGR (2,1,0).
- `raw_scale` - Depending on network training, scale pixel values before mean subtraction.
- `img_mean` - Depending on network training, subtract image mean if available.
- `input_scale` - Depending on network training, scale after subtracting mean.
- `calibration_size` - Number of images the quantizer will use to calculate the dynamic range. 
- `calibration_directory` - Directory containing the images used for the calibration process.
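
The image-related arguments above can be read as a small preprocessing pipeline. The sketch below applies them to a nested-list image in pure Python so the effect of each step is visible. It is an illustration under assumptions, not the quantizer's code: the `preprocess` function is hypothetical, resizing to `in_shape` is omitted, the image is assumed to be loaded as H,W,C with RGB values in [0, 1], and the transpose and channel swap are assumed to happen before scaling. The order of `raw_scale`, mean subtraction, and `input_scale` follows the descriptions in the list.

```python
def preprocess(img_hwc, channel_swap=(2, 1, 0), raw_scale=255.0,
               img_mean=(104.007, 116.669, 122.679), input_scale=1.0):
    """Illustrative preprocessing: transpose, channel swap, scale, mean, scale."""
    height, width, channels = len(img_hwc), len(img_hwc[0]), len(img_hwc[0][0])
    # transpose [2,0,1]: (H,W,C) -> (C,H,W)
    chw = [[[img_hwc[y][x][c] for x in range(width)] for y in range(height)]
           for c in range(channels)]
    # channel_swap [2,1,0]: RGB planes -> BGR planes
    chw = [chw[c] for c in channel_swap]
    # raw_scale, then per-channel mean subtraction, then input_scale
    return [[[(v * raw_scale - img_mean[c]) * input_scale for v in row]
             for row in plane]
            for c, plane in enumerate(chw)]


# Hypothetical 1x1 RGB image: R=1.0, G=0.5, B=0.0
pixel = [[[1.0, 0.5, 0.0]]]
out = preprocess(pixel)
print([plane[0][0] for plane in out])
```

For this single pixel the result is approximately `[-104.007, 10.831, 132.321]` in B,G,R order, which shows why `img_mean` must be listed in the same channel order the network was trained with.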

Below is an example with all the parameters filled in. `channel_swap`, `raw_scale`, `img_mean`, and `input_scale` are expert parameters that should be left at the default values shown below.

In [2]:
# Use a config dictionary to pass parameters to the quantizer
config = {}

config["caffemodel"] = "work/optimized_model" # String for naming intermediate prototxt, caffemodel

# Quantizer Arguments
#config["outmodel"] = Defined in Step 1 # String for naming intermediate prototxt, caffemodel
config["quantizecfg"] = "work/quantization_params.json" # Quantizer will generate quantization params
config["bitwidths"] = [16,16,16] # Supported quantization precision
config["in_shape"] = [3,224,224] # Images will be resized to this shape -> Needs to match prototxt
config["transpose"] = [2,0,1] # (H,W,C)->(C,H,W) transpose argument to quantizer
config["channel_swap"] = [2,1,0] # (R,G,B)->(B,G,R) Channel Swap argument to quantizer
config["raw_scale"] = 255.0
config["img_mean"] = [104.007, 116.669, 122.679] # Mean of the training set (From Imagenet)
config["input_scale"] = 1.0
config["calibration_size"] = 8 # Number of calibration images quantizer will use
config["calibration_directory"] = "../xfdnn/tools/quantize/calibration_directory" # Directory of images

quantizer = xfdnnQuantizer(
    deploy_model=config["caffemodel"]+".prototxt",        # Model filename: input file
    weights=config["caffemodel"]+".caffemodel",           # Floating Point weights
    output_json=config["quantizecfg"],                    # Quantization JSON output filename
    bitwidths=config["bitwidths"],                        # Fixed Point precision: 8,8,8 or 16,16,16
    dims=config["in_shape"],                              # Image dimensions [C,H,W]
    transpose=config["transpose"],                        # Transpose argument to caffe transformer
    channel_swap=config["channel_swap"],                  # Channel swap argument to caffe transformer
    raw_scale=config["raw_scale"],                        # Raw scale argument to caffe transformer
    mean_value=config["img_mean"],                        # Image mean per channel to caffe transformer
    input_scale=config["input_scale"],                    # Input scale argument to caffe transformer
    calibration_size=config["calibration_size"],          # Number of calibration images to use
    calibration_directory=config["calibration_directory"] # Directory containing calibration images
)

# Invoke quantizer
try:
    quantizer.quantize()

    import json
    with open(config["quantizecfg"]) as f:  # close the file once it has been read
        data = json.load(f)
    print("**********\nSuccessfully produced quantization JSON file for %d layers.\n"%len(data['network']))
except Exception as e:
    print("Failed to quantize:",e)

Mean : [104.007 116.669 122.679]
Adding ../xfdnn/tools/quantize/calibration_directory/16247716843_b419e8b111_z.jpg to calibration batch.


  warn('`as_grey` has been deprecated in favor of `as_gray`')
  warn("The default mode, 'constant', will be changed to 'reflect' in "
  warn("Anti-aliasing will be enabled by default in skimage 0.15 to "


Adding ../xfdnn/tools/quantize/calibration_directory/3272651417_27976a64b3_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/36085792773_b9a3d115a3_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/4788821373_441cd29c9f_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/5904386289_924b24d75d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/7291910830_86a8ebb15d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/7647574936_ffebfa2bea_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/78947826_fc79a94bf2_z.jpg to calibration batch.
--------------------------------------------------------------------------------
Processing layer 0 of 139
Layer Name:data Type:Input
Inputs: [], Outputs: ['data']
Quantizing layer output...
n:  32768 , len(bin_edges):  1099
Mean : th_layer_out:  150.9929962158203 , sf_layer_out:

n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  12.544038772583008 , sf_layer_out:  0.00038282536614835073
bw_layer_in:  16
th_layer_in:  10.524314880371094
bw_layer_out:  16
th_layer_out:  12.544038772583008
--------------------------------------------------------------------------------
Processing layer 17 of 139
Layer Name:res2b_branch2b_relu Type:ReLU
Inputs: ['res2b_branch2b'], Outputs: ['res2b_branch2b']
n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  9.34158992767334 , sf_layer_out:  0.00028509140072857875
bw_layer_out:  16
th_layer_out:  9.34158992767334
--------------------------------------------------------------------------------
Processing layer 18 of 139
Layer Name:res2b_branch2c Type:Convolution
Inputs: ['res2b_branch2b'], Outputs: ['res2b_branch2c']
Quantizing conv input layer ... res2b_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res2b_branch2c...
Threshold params shape= (256,)
n:  32768 , len(bin_edges):  2536
Mean : th_layer_out: 

n:  32768 , len(bin_edges):  897
Mean : th_layer_out:  7.3644795417785645 , sf_layer_out:  0.00022475293868155658
bw_layer_out:  16
th_layer_out:  7.3644795417785645
--------------------------------------------------------------------------------
Processing layer 35 of 139
Layer Name:res3a_branch2c Type:Convolution
Inputs: ['res3a_branch2b'], Outputs: ['res3a_branch2c']
Quantizing conv input layer ... res3a_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res3a_branch2c...
Threshold params shape= (512,)
n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  17.476903915405273 , sf_layer_out:  0.0005333690577533882
Threshold out shape= ()
n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  17.476903915405273 , sf_layer_out:  0.0005333690577533882
bw_layer_in:  16
th_layer_in:  7.3644795417785645
bw_layer_out:  16
th_layer_out:  17.476903915405273
--------------------------------------------------------------------------------
Processing layer 36 of 139
Layer Name:r

n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  25.55634307861328 , sf_layer_out:  0.0007799414984164946
Threshold out shape= ()
n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  25.55634307861328 , sf_layer_out:  0.0007799414984164946
bw_layer_in:  16
th_layer_in:  11.205114364624023
bw_layer_out:  16
th_layer_out:  25.55634307861328
--------------------------------------------------------------------------------
Processing layer 52 of 139
Layer Name:res3c Type:Eltwise
Inputs: ['res3b_res3b_relu_0_split_1', 'res3c_branch2c'], Outputs: ['res3c']
bw_layer_in:  16
th_layer_in:  25.55634307861328
bw_layer_out:  16
th_layer_out:  25.55634307861328
--------------------------------------------------------------------------------
Processing layer 53 of 139
Layer Name:res3c_relu Type:ReLU
Inputs: ['res3c'], Outputs: ['res3c']
n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  25.085376739501953 , sf_layer_out:  0.0007655683077334499
bw_layer_out:  16
th_layer_out:  25.085376

bw_layer_in:  16
th_layer_in:  7.337042808532715
bw_layer_out:  16
th_layer_out:  13.620804786682129
--------------------------------------------------------------------------------
Processing layer 69 of 139
Layer Name:res4a Type:Eltwise
Inputs: ['res4a_branch1', 'res4a_branch2c'], Outputs: ['res4a']
bw_layer_in:  16
th_layer_in:  19.360361099243164
bw_layer_out:  16
th_layer_out:  19.360361099243164
--------------------------------------------------------------------------------
Processing layer 70 of 139
Layer Name:res4a_relu Type:ReLU
Inputs: ['res4a'], Outputs: ['res4a']
n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  19.6492862701416 , sf_layer_out:  0.000599666929231898
bw_layer_out:  16
th_layer_out:  19.6492862701416
--------------------------------------------------------------------------------
Processing layer 71 of 139
Layer Name:res4a_res4a_relu_0_split Type:Split
Inputs: ['res4a'], Outputs: ['res4a_res4a_relu_0_split_0', 'res4a_res4a_relu_0_split_1']
bw_layer_in:

n:  32768 , len(bin_edges):  635
Mean : th_layer_out:  9.873241424560547 , sf_layer_out:  0.00030131661197425907
Threshold out shape= ()
n:  32768 , len(bin_edges):  635
Mean : th_layer_out:  9.873241424560547 , sf_layer_out:  0.00030131661197425907
bw_layer_in:  16
th_layer_in:  19.759119033813477
bw_layer_out:  16
th_layer_out:  9.873241424560547
--------------------------------------------------------------------------------
Processing layer 89 of 139
Layer Name:res4d_branch2a_relu Type:ReLU
Inputs: ['res4d_branch2a'], Outputs: ['res4d_branch2a']
n:  32768 , len(bin_edges):  635
Mean : th_layer_out:  9.873241424560547 , sf_layer_out:  0.00030131661197425907
bw_layer_out:  16
th_layer_out:  9.873241424560547
--------------------------------------------------------------------------------
Processing layer 90 of 139
Layer Name:res4d_branch2b Type:Convolution
Inputs: ['res4d_branch2a'], Outputs: ['res4d_branch2b']
Quantizing conv input layer ... res4d_branch2b
Threshold in shape= ()
Qua

n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  24.923065185546875 , sf_layer_out:  0.000760614801036008
Threshold out shape= ()
n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  24.923065185546875 , sf_layer_out:  0.000760614801036008
bw_layer_in:  16
th_layer_in:  31.814891815185547
bw_layer_out:  16
th_layer_out:  24.923065185546875
--------------------------------------------------------------------------------
Processing layer 109 of 139
Layer Name:res4f Type:Eltwise
Inputs: ['res4e_res4e_relu_0_split_1', 'res4f_branch2c'], Outputs: ['res4f']
bw_layer_in:  16
th_layer_in:  24.923065185546875
bw_layer_out:  16
th_layer_out:  24.923065185546875
--------------------------------------------------------------------------------
Processing layer 110 of 139
Layer Name:res4f_relu Type:ReLU
Inputs: ['res4f'], Outputs: ['res4f']
n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  24.923065185546875 , sf_layer_out:  0.000760614801036008
bw_layer_out:  16
th_layer_out:  24.92

n:  32768 , len(bin_edges):  898
Mean : th_layer_out:  25.313888549804688 , sf_layer_out:  0.0007725421475815512
Threshold out shape= ()
n:  32768 , len(bin_edges):  898
Mean : th_layer_out:  25.313888549804688 , sf_layer_out:  0.0007725421475815512
bw_layer_in:  16
th_layer_in:  5.642468452453613
bw_layer_out:  16
th_layer_out:  25.313888549804688
--------------------------------------------------------------------------------
Processing layer 126 of 139
Layer Name:res5b Type:Eltwise
Inputs: ['res5a_res5a_relu_0_split_1', 'res5b_branch2c'], Outputs: ['res5b']
bw_layer_in:  16
th_layer_in:  51.18451690673828
bw_layer_out:  16
th_layer_out:  51.18451690673828
--------------------------------------------------------------------------------
Processing layer 127 of 139
Layer Name:res5b_relu Type:ReLU
Inputs: ['res5b'], Outputs: ['res5b']
n:  32768 , len(bin_edges):  897
Mean : th_layer_out:  53.91033935546875 , sf_layer_out:  0.0016452632024740975
bw_layer_out:  16
th_layer_out:  53.910339
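
The argument list earlier states that all three `bitwidths` entries must be set to the same value and that only `8` and `16` are valid. A small check like the following can catch a mistyped setting before the quantizer is invoked again. This is a minimal sketch: `check_bitwidths` is a hypothetical helper, not part of the xfDNN tools.

```python
SUPPORTED_BITWIDTHS = (8, 16)


def check_bitwidths(bitwidths):
    """Validate a [image data, weights, conv output] bitwidth list.

    Returns the common bitwidth, or raises ValueError if the list does
    not follow the rules stated in this lab.
    """
    if len(bitwidths) != 3:
        raise ValueError("bitwidths needs exactly three entries")
    if len(set(bitwidths)) != 1:
        raise ValueError("all three bitwidths must be the same")
    if bitwidths[0] not in SUPPORTED_BITWIDTHS:
        raise ValueError("bitwidth must be 8 or 16")
    return bitwidths[0]


print(check_bitwidths([16, 16, 16]))  # 16
```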

### 2. Try it yourself by changing the quantization precision

Now that you have seen how this works, it's time to get some hands-on experience.  
Change the following from the example above:
1. Precision of quantization, by adjusting `bitwidths`

Below, set `quantizer.bitwidths` to one of the supported precisions: `[8,8,8]` or `[16,16,16]`.

In [3]:
# Since we already have an instance of the quantizer, you can just update these params:

quantizer.bitwidths = [8,8,8]

# Invoke quantizer
try:
    quantizer.quantize()

    import json
    with open(config["quantizecfg"]) as f:  # close the file once it has been read
        data = json.load(f)
    print("**********\nSuccessfully produced quantization JSON file for %d layers.\n"%len(data['network']))
except Exception as e:
    print("Failed to quantize:",e)

Mean : [104.007 116.669 122.679]
Adding ../xfdnn/tools/quantize/calibration_directory/13923040300_b4c8521b4d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/15439525724_97d7cc2c81_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/3272651417_27976a64b3_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/3591612840_33710806df_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/4788821373_441cd29c9f_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/5904386289_924b24d75d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/7291910830_86a8ebb15d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/7647574936_ffebfa2bea_z.jpg to calibration batch.
--------------------------------------------------------------------------------
Processing layer 0 of 139
Layer Name:data Type:Input
Inputs:

Mean : th_layer_out:  6.281652632195867 , sf_layer_out:  0.049461831749573755
bw_layer_out:  8
th_layer_out:  6.281652632195867
--------------------------------------------------------------------------------
Processing layer 16 of 139
Layer Name:res2b_branch2b Type:Convolution
Inputs: ['res2b_branch2a'], Outputs: ['res2b_branch2b']
Quantizing conv input layer ... res2b_branch2b
Threshold in shape= ()
Quantizing conv weights for layer res2b_branch2b...
Threshold params shape= (64,)
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  7.8268757569677065 , sf_layer_out:  0.06162894296824966
Threshold out shape= ()
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  7.8268757569677065 , sf_layer_out:  0.06162894296824966
bw_layer_in:  8
th_layer_in:  6.281652632195867
bw_layer_out:  8
th_layer_out:  7.8268757569677065
--------------------------------------------------------------------------------
Processing layer 17 of 139
Layer Name:res2b_branch2b_relu Type:ReLU
Inputs: ['res2b_branc

Mean : th_layer_out:  5.897329454975469 , sf_layer_out:  0.04643566499980684
bw_layer_out:  8
th_layer_out:  5.897329454975469
--------------------------------------------------------------------------------
Processing layer 33 of 139
Layer Name:res3a_branch2b Type:Convolution
Inputs: ['res3a_branch2a'], Outputs: ['res3a_branch2b']
Quantizing conv input layer ... res3a_branch2b
Threshold in shape= ()
Quantizing conv weights for layer res3a_branch2b...
Threshold params shape= (128,)
n:  128 , len(bin_edges):  897
Mean : th_layer_out:  7.024244919419289 , sf_layer_out:  0.0553090151135377
Threshold out shape= ()
n:  128 , len(bin_edges):  897
Mean : th_layer_out:  7.024244919419289 , sf_layer_out:  0.0553090151135377
bw_layer_in:  8
th_layer_in:  5.897329454975469
bw_layer_out:  8
th_layer_out:  7.024244919419289
--------------------------------------------------------------------------------
Processing layer 34 of 139
Layer Name:res3a_branch2b_relu Type:ReLU
Inputs: ['res3a_branch2b'], 

Mean : th_layer_out:  5.8813841882490925 , sf_layer_out:  0.04631011171849679
bw_layer_out:  8
th_layer_out:  5.8813841882490925
--------------------------------------------------------------------------------
Processing layer 51 of 139
Layer Name:res3c_branch2c Type:Convolution
Inputs: ['res3c_branch2b'], Outputs: ['res3c_branch2c']
Quantizing conv input layer ... res3c_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res3c_branch2c...
Threshold params shape= (512,)
n:  128 , len(bin_edges):  1793
Mean : th_layer_out:  6.919283195797886 , sf_layer_out:  0.05448254484880226
Threshold out shape= ()
n:  128 , len(bin_edges):  1793
Mean : th_layer_out:  6.919283195797886 , sf_layer_out:  0.05448254484880226
bw_layer_in:  8
th_layer_in:  5.8813841882490925
bw_layer_out:  8
th_layer_out:  6.919283195797886
--------------------------------------------------------------------------------
Processing layer 52 of 139
Layer Name:res3c Type:Eltwise
Inputs: ['res3b_res3b_relu_0_spl

Mean : th_layer_out:  7.27341283006999 , sf_layer_out:  0.057270967165905434
Threshold out shape= ()
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  7.27341283006999 , sf_layer_out:  0.057270967165905434
bw_layer_in:  8
th_layer_in:  5.390883855263141
bw_layer_out:  8
th_layer_out:  7.27341283006999
--------------------------------------------------------------------------------
Processing layer 69 of 139
Layer Name:res4a Type:Eltwise
Inputs: ['res4a_branch1', 'res4a_branch2c'], Outputs: ['res4a']
bw_layer_in:  8
th_layer_in:  7.918170494988138
bw_layer_out:  8
th_layer_out:  7.918170494988138
--------------------------------------------------------------------------------
Processing layer 70 of 139
Layer Name:res4a_relu Type:ReLU
Inputs: ['res4a'], Outputs: ['res4a']
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  12.92278621287 , sf_layer_out:  0.10175422214858267
bw_layer_out:  8
th_layer_out:  12.92278621287
--------------------------------------------------------------

Mean : th_layer_out:  8.77127093621985 , sf_layer_out:  0.06906512548204606
bw_layer_out:  8
th_layer_out:  8.77127093621985
--------------------------------------------------------------------------------
Processing layer 87 of 139
Layer Name:res4c_res4c_relu_0_split Type:Split
Inputs: ['res4c'], Outputs: ['res4c_res4c_relu_0_split_0', 'res4c_res4c_relu_0_split_1']
bw_layer_in:  8
th_layer_in:  8.77127093621985
bw_layer_out:  8
th_layer_out:  8.77127093621985
--------------------------------------------------------------------------------
Processing layer 88 of 139
Layer Name:res4d_branch2a Type:Convolution
Inputs: ['res4c_res4c_relu_0_split_0'], Outputs: ['res4d_branch2a']
Quantizing conv input layer ... res4d_branch2a
Threshold in shape= ()
Quantizing conv weights for layer res4d_branch2a...
Threshold params shape= (256,)
n:  128 , len(bin_edges):  635
Mean : th_layer_out:  5.211271542479939 , sf_layer_out:  0.04103363419275543
Threshold out shape= ()
n:  128 , len(bin_edges):  635


Mean : th_layer_out:  5.411138480394044 , sf_layer_out:  0.04260738960940192
bw_layer_in:  8
th_layer_in:  7.846795767263659
bw_layer_out:  8
th_layer_out:  5.411138480394044
--------------------------------------------------------------------------------
Processing layer 105 of 139
Layer Name:res4f_branch2a_relu Type:ReLU
Inputs: ['res4f_branch2a'], Outputs: ['res4f_branch2a']
n:  128 , len(bin_edges):  635
Mean : th_layer_out:  5.220881284599424 , sf_layer_out:  0.04110930145353878
bw_layer_out:  8
th_layer_out:  5.220881284599424
--------------------------------------------------------------------------------
Processing layer 106 of 139
Layer Name:res4f_branch2b Type:Convolution
Inputs: ['res4f_branch2a'], Outputs: ['res4f_branch2b']
Quantizing conv input layer ... res4f_branch2b
Threshold in shape= ()
Quantizing conv weights for layer res4f_branch2b...
Threshold params shape= (256,)
n:  128 , len(bin_edges):  635
Mean : th_layer_out:  5.922187317056987 , sf_layer_out:  0.0466313961

n:  128 , len(bin_edges):  449
Mean : th_layer_out:  4.7102821325617175 , sf_layer_out:  0.03708883568946234
bw_layer_out:  8
th_layer_out:  4.7102821325617175
--------------------------------------------------------------------------------
Processing layer 123 of 139
Layer Name:res5b_branch2b Type:Convolution
Inputs: ['res5b_branch2a'], Outputs: ['res5b_branch2b']
Quantizing conv input layer ... res5b_branch2b
Threshold in shape= ()
Quantizing conv weights for layer res5b_branch2b...
Threshold params shape= (512,)
n:  128 , len(bin_edges):  450
Mean : th_layer_out:  5.886197742215775 , sf_layer_out:  0.04634801371823445
Threshold out shape= ()
n:  128 , len(bin_edges):  450
Mean : th_layer_out:  5.886197742215775 , sf_layer_out:  0.04634801371823445
bw_layer_in:  8
th_layer_in:  4.7102821325617175
bw_layer_out:  8
th_layer_out:  5.886197742215775
--------------------------------------------------------------------------------
Processing layer 124 of 139
Layer Name:res5b_branch2b_relu 
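
The `th_layer_out` (threshold) and `sf_layer_out` (scale factor) values printed in the two logs are consistent with a symmetric mapping in which the threshold is divided by the largest signed integer for the chosen bitwidth, `2**(bitwidth-1) - 1`. The sketch below reproduces two logged pairs under that assumption. The relation is inferred from the printed numbers, not taken from the quantizer's source, and the `quantize` and `dequantize` helpers are hypothetical illustrations of how such a scale factor could be applied.

```python
def scale_factor(threshold, bitwidth):
    """Step size mapping [-threshold, threshold] onto a signed integer range."""
    max_int = 2 ** (bitwidth - 1) - 1  # 32767 for Int16, 127 for Int8
    return threshold / max_int


def quantize(value, threshold, bitwidth):
    """Round a float to the nearest integer step, clipping at the threshold."""
    max_int = 2 ** (bitwidth - 1) - 1
    q = int(round(value / scale_factor(threshold, bitwidth)))
    return max(-max_int, min(max_int, q))


def dequantize(q, threshold, bitwidth):
    """Map an integer step back to an approximate float value."""
    return q * scale_factor(threshold, bitwidth)


# Threshold values copied from the Int16 and Int8 logs above
sf16 = scale_factor(12.544038772583008, 16)  # logged: 0.00038282536614835073
sf8 = scale_factor(6.281652632195867, 8)     # logged: 0.049461831749573755
print(sf16, sf8)

# Int8 keeps far fewer levels, so the same value is stored more coarsely
q = quantize(1.0, 6.281652632195867, 8)
print(q, dequantize(q, 6.281652632195867, 8))
```

The Int8 scale factors in the log are roughly two orders of magnitude larger than the Int16 ones, which reflects the coarser step size available with 8 bits.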

Well done! That concludes Part 2. Now you are ready to put Parts 1 and 2 together and deploy a network/model.

## [Part 3: Putting it all together: Compile, Quantize and Deploy][]

[Part 3: Putting it all together: Compile, Quantize and Deploy]: image_classification_caffe.ipynb