This conversion tool is based on the TensorRT 2.0 Int8 calibration tool, which uses the KL divergence algorithm to find a suitable threshold for quantizing activations from Float32 to Int8 (-128 to 127).
We provide Classification (SqueezeNet v1.1) and Detection (MobileNet v1 SSD 300) demos based on ncnn (a high-performance neural network inference framework optimized for mobile platforms), and the community is ready to support this implementation.
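For illustration only, here is a rough sketch of the KL-divergence threshold search. The 2048-bin histogram and the 128 target bins follow NVIDIA's published calibration description; the function name and the simplified redistribution of clipped bins are ours, so this approximates the procedure rather than reproducing the exact code in this tool:

```python
import numpy as np

def find_kl_threshold(activations, num_bins=2048, num_quant_bins=128):
    """Search a saturation threshold for |activation| by minimizing KL divergence.

    Builds a histogram of absolute activation values, then tries clipping it
    at every candidate bin and keeps the clip point whose Int8-sized
    approximation is closest (in KL divergence) to the original distribution.
    """
    hist, bin_edges = np.histogram(np.abs(activations), bins=num_bins)
    best_kl, best_threshold = float("inf"), bin_edges[-1]

    for i in range(num_quant_bins, num_bins):
        # Reference distribution P: the first i bins, with the clipped tail
        # mass folded into the last kept bin.
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()

        # Candidate distribution Q: the same i bins collapsed into
        # num_quant_bins bins, then expanded back to length i.
        chunks = np.array_split(hist[:i].astype(np.float64), num_quant_bins)
        q = np.concatenate([
            (c > 0) * (c.sum() / max((c > 0).sum(), 1)) for c in chunks
        ])

        # Normalize and compute KL(P || Q) over bins that are non-empty in both.
        p /= p.sum()
        if q.sum() == 0:
            continue
        q /= q.sum()
        mask = (p > 0) & (q > 0)
        kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
        if kl < best_kl:
            best_kl, best_threshold = kl, bin_edges[i]

    return best_threshold  # the Int8 activation scale is then 127 / best_threshold
```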
For details, please read the following PDF:
MXNet quantization implementation:
Quantization module for generating quantized (INT8) models from FP32 models
A Chinese blog post written by my friend (bruce.zhang) that introduces the principles:
The implementation of Int8 quantization based on TensorRT
The purpose of this tool (caffe-int8-convert-tool-dev.py) is to test new features, such as multi-channel quantization that depends on the group number.
This format is already supported in the latest version of ncnn. I will do my best to convert some common network models into classification-dev.
```
python caffe-int8-convert-tool-dev-weight.py -h
usage: caffe-int8-convert-tool-dev-weight.py [-h] [--proto PROTO] [--model MODEL]
                                             [--mean MEAN MEAN MEAN] [--norm NORM]
                                             [--images IMAGES] [--output OUTPUT]
                                             [--group GROUP] [--gpu GPU]

find the pretrained caffemodel int8 quantize scale value

optional arguments:
  -h, --help       show this help message and exit
  --proto PROTO    path to deploy prototxt.
  --model MODEL    path to pretrained caffemodel
  --mean MEAN      value of mean
  --norm NORM      value of normalize(scale value)
  --images IMAGES  path to calibration images
  --output OUTPUT  path to output calibration table file
  --group GROUP    enable the group scale(0:disable,1:enable,default:1)
  --gpu GPU        use gpu to forward(0:disable,1:enable,default:0)
```
```
python caffe-int8-convert-tool-dev-weight.py --proto=test/models/mobilenet_v1.prototxt --model=test/models/mobilenet_v1.caffemodel --mean 103.94 116.78 123.68 --norm=0.017 --images=test/images/ --output=mobilenet_v1.table --group=1 --gpu=1
```
For example, in MobileNet_v1_dev.table:

```
conv1_param_0 0.0 3779.48337933 482.140562772 1696.53814502
conv2_1/dw_param_0 0 72.129143 149.919382 // each group of the convdw layer's weights has its own scale: 0.0 72.129 149.919 ......
......
conv1 49.466518
conv2_1/dw 123.720796 // the scale of the convdw layer's bottom blob is 123.720
......
```
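For reference, a minimal sketch of reading such a table back. The helper name is ours, and the `//` comments and `......` lines shown above are annotations rather than part of the real file:

```python
def load_calibration_table(path):
    """Parse a calibration table into weight scales and bottom-blob scales.

    Lines named '<layer>_param_0' carry the weight scale(s) of that layer
    (one per group for ConvolutionDepthwise); the remaining lines carry the
    scale of the layer's bottom blob.
    """
    weight_scales, blob_scales = {}, {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            name, values = fields[0], [float(v) for v in fields[1:]]
            if name.endswith("_param_0"):
                weight_scales[name[: -len("_param_0")]] = values
            else:
                blob_scales[name] = values[0]
    return weight_scales, blob_scales
```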
Three steps to implement the conv1 layer Int8 convolution (see the numpy sketch after this list):

- Quantize the bottom blob and the weights:

  bottom_blob_int8 = bottom_blob_float32 * data_scale(49.466518)

  weight_int8 = weight_float32 * weight_scale(156.639840)

- Int8 convolution:

  top_blob_int32 = bottom_blob_int8 * weight_int8

- Dequantize the Int32 top blob and add the bias:

  top_blob_float32 = top_blob_int32 / [data_scale(49.466518) * weight_scale(156.639840)] + bias_float32
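A minimal numpy sketch of these three steps, using the example scale values from the table above. The helper names are ours, a 1x1 convolution written as a matrix multiply stands in for the real sliding-window convolution, and quantization saturates to the Int8 range:

```python
import numpy as np

data_scale = 49.466518     # bottom blob scale of conv1, from the table above
weight_scale = 156.639840  # weight scale of conv1, from the table above

def quantize(x, scale):
    # Step 1: Float32 -> Int8, with saturation to the signed 8-bit range.
    return np.clip(np.round(x * scale), -127, 127).astype(np.int8)

def conv1_int8(bottom_blob_fp32, weight_fp32, bias_fp32):
    """bottom_blob_fp32: (in_channels, spatial), weight_fp32: (out_channels, in_channels)."""
    bottom_int8 = quantize(bottom_blob_fp32, data_scale)
    weight_int8 = quantize(weight_fp32, weight_scale)

    # Step 2: Int8 convolution accumulated in Int32 (here a 1x1 convolution
    # over a flattened spatial dimension, i.e. a matrix multiply).
    top_int32 = weight_int8.astype(np.int32) @ bottom_int8.astype(np.int32)

    # Step 3: dequantize the Int32 result and add the Float32 bias.
    return top_int32 / (data_scale * weight_scale) + bias_fp32[:, None]
```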
Type | Detail |
---|---|
Calibration Dataset | ILSVRC2012_img_test 1k |
Test Dataset | ILSVRC2012_img_val 5k |
Framework | ncnn |
Supported Layers | Convolution, ConvolutionDepthwise, ReLU |
The following table shows the Top1 and Top5 accuracy differences between Float32 and Int8 inference.
Models | FP32 Top1 | FP32 Top5 | INT8 Top1 | INT8 Top5 | Diff Top1 | Diff Top5 |
---|---|---|---|---|---|---|
SqueezeNet v1.1 | 57.78% | 79.88% | 57.82% | 79.84% | +0.04% | -0.04% |
MobileNet v1 | 67.26% | 87.92% | 66.74% | 87.43% | -0.52% | -0.49% |
GoogleNet | 68.50% | 88.84% | 68.62% | 88.68% | +0.12% | -0.16% |
ResNet18 | 65.49% | 86.56% | 65.30% | 86.52% | -0.19% | -0.04% |
ResNet50 | 71.80% | 89.90% | 71.76% | 90.06% | -0.04% | +0.16% |
Type | Detail |
---|---|
Test Dataset | VOC2007 |
Unit | mAP (20 classes) |
Models | FP32 | INT8 | Loss |
---|---|---|---|
SqueezeNet SSD | 61.80 | 61.27 | -0.53 |
MobileNet_v1 SSD | 70.49 | 68.92 | -1.57 |
The following table shows the speedup between Float32 and Int8 inference. Note that the Winograd algorithm is enabled for both the Float32 and Int8 inference. The hardware platform is Hisi3519 (Cortex-A17@880MHz).
Unit (ms) | SqueezeNet v1.1 | MobileNet v1 | GoogleNet | ResNet18 | MobileNetv1 SSD | SqueezeNet SSD |
---|---|---|---|---|---|---|
Float32 | 282 | 490 | 1107 | 985 | 970 | 610 |
Int8 | 192 | 369 | 696 | 531 | 605 | 498 |
Ratio | x1.46 | x1.33 | x1.59 | x1.85 | x1.60 | x1.22 |
Runtime Memory (MB)
Models | fp32-wino63 | int8-wino23 | int8-wino43 |
---|---|---|---|
squeezenet_v1_1 | 50 | 30 | 32 |
mobilenet_v1 | 61 | 35 | 35 |
mobilenet_v1_ssd | 90 | 45 | 45 |
squeezenet_v1_ssd | 210 | 70 | 94 |
resnet18 | 335 | 77 | 130 |
googlenet_v1 | 154 | 72 | 89 |
Storage Memory (MB)
Models | fp32 | int8 |
---|---|---|
squeezenet_v1_1 | 4.71 | 1.20 |
mobilenet_v1 | 16.3 | 4.31 |
mobilenet_v1_ssd | 22.0 | 5.60 |
squeezenet_v1_ssd | 21.1 | 5.37 |
resnet18 | 44.6 | 11.2 |
googlenet_v1 | 26.6 | 6.72 |
Thanks to NVIDIA for providing the relative entropy (KL divergence) calibration principle, and to ncnn's author nihui for sharing his neural network inference framework.
Thanks to the help from the following friends:
Optimization Instructor : Fugangping, bruce.zhang
Algorithm : xupengfeixupf, JansonZhu, wangxinwei, lengmm
Python : daquexian
BSD 3-Clause