
Where can I find the generation scripts for the quantized SSD models? #791

Closed
TriLoo opened this issue Jun 3, 2019 · 28 comments

@TriLoo commented Jun 3, 2019

I found that gluoncv/model_zoo/quantized/quantized.py only downloads already-quantized SSD models, such as ssd_512_mobilenet1.0_voc_int8-symbol.json or ssd_300_vgg16_atrous_voc_int8-symbol.json. Now I want to quantize my own SSD model implemented with GluonCV, and I am not sure what to put in the excluded_sym_names parameter. Any advice?
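
For context, my starting point is roughly the following (a minimal sketch assuming the usual GluonCV 0.4 / MXNet 1.5 APIs; the model name and output prefix are only examples):

```python
# Export the Gluon SSD block to the symbol/params pair that the quantization API consumes.
import mxnet as mx
import gluoncv

net = gluoncv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
net.hybridize()

# One forward pass is needed so the symbolic graph is cached before export.
net(mx.nd.zeros((1, 3, 512, 512)))
net.export('ssd_512_mobilenet1.0_voc')  # writes *-symbol.json and *-0000.params
```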

@pengzhao-intel commented Jun 3, 2019

In the next release, we will improve the script to make the quantization flow more convenient :)
Did you already try the quantized models, and how much performance gain did you get from INT8?

@xinyu-intel can help you with the customized model.

@TriLoo (Author) commented Jun 3, 2019

@pengzhao-intel
Thanks for the reply.

I just resolved my problem and obtained the quantized model (quantization mode: entropy, INT8). However, the quantized model takes much more time than the non-quantized model when using MKLDNN_QUANTIZE as the backend.

During quantization, only the conv layers were quantized; all other layers, including flatten, concat, etc., were put into the excluded_sym_names parameter. Now I am working on the latency problem, though I am not sure how to fix it yet. Roughly, for MobileNet SSD 300, the latency of the quantized vs. non-quantized model is 33.4 vs. 6.7 (unit: s, averaged over 100 iterations).
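
The quantization call itself was along these lines (a rough sketch of how I drove mx.contrib.quantization, not the exact script; the checkpoint prefix and the node names in excluded_sym_names are placeholders that depend on your own graph, and calib_iter is assumed to be a DataIter over real calibration images):

```python
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

prefix = 'ssd_300_mobilenet1.0'                      # illustrative prefix
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, 0)

# Everything except the conv layers was excluded; these names are placeholders.
excluded_sym_names = ['flatten0', 'concat0', 'concat1', 'reshape0']

qsym, qarg_params, qaux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),
    excluded_sym_names=excluded_sym_names,
    calib_mode='entropy',
    calib_data=calib_iter,            # DataIter over calibration images (assumed defined)
    num_calib_examples=100,
    quantized_dtype='int8')

mx.model.save_checkpoint(prefix + '_int8', 0, qsym, qarg_params, qaux_params)
```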

@xinyu-intel (Collaborator) commented Jun 4, 2019

@TriLoo Can you please give more details of your SW & HW, like CPU, MXNet version, GluonCV version, etc.? BTW, is the throughput of your model OK?

@TriLoo (Author) commented Jun 4, 2019

@xinyu-intel
Sure.
CPU: Intel Xeon CPU E5-2650 v4 @2.2GHz,
OS: CentOS 7.2.1511
MXNet: 1.5.0, git commit head: 5fc4fc53df74f
GluonCV: 0.4.0.post0 (pip installed)
MKLDNN: v0.19.0, git hash: 41bee20d7eb4a
MKL: 2019.0 Update 5
Python: Anaconda python 3.6.8
Environment Variable: OMP_NUM_THREADS=1, MXNET_SUBGRAPH_BACKEND=MKLDNN

I didn't test the throughput of the quantized model yet. Note that only one thread (OMP_NUM_THREADS=1) is allowed in our applications, and I found that when OMP_NUM_THREADS is unset, the latencies of the quantized and non-quantized models are very similar, i.e. no speedup from quantization.

In fact, my model is very similar to ssd_mobilenet1.0_512 from GluonCV, except that BatchNorm is fused into the convolution layers and a bias is added (verified correct). I can provide my json and params files so you can reproduce the results above.

Thanks for the reply.
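
For reference, the latency numbers above were collected with a simple loop like this (a minimal sketch; the checkpoint prefix and input shape are illustrative):

```python
import time
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('ssd_300_mobilenet1.0_int8', 0)
mod = mx.mod.Module(symbol=sym, data_names=['data'], label_names=None, context=mx.cpu())
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 300, 300))])
mod.set_params(arg_params, aux_params)

batch = mx.io.DataBatch([mx.nd.zeros((1, 3, 300, 300))])
mod.forward(batch, is_train=False)   # warm-up pass
mx.nd.waitall()

start = time.time()
for _ in range(100):
    mod.forward(batch, is_train=False)
    mx.nd.waitall()
print('average latency: %.4f s' % ((time.time() - start) / 100))
```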

@xinyu-intel (Collaborator) commented Jun 4, 2019

@TriLoo Thanks for your info. Special instructions like AVX512-BW or VNNI are needed to speed up quantized models. Can you try a Skylake machine?

@TriLoo (Author) commented Jun 4, 2019

@xinyu-intel
Sorry, the only machine I have is the Xeon E5-2650 v4, whose microarchitecture is Broadwell. I am not sure whether I should install other libraries, like Intel System Studio?

@xinyu-intel (Collaborator) commented Jun 4, 2019

@TriLoo A Broadwell Xeon will not bring a good speedup for quantized models, but the accuracy is still a good reference.

@TriLoo (Author) commented Jun 4, 2019

However, I am not sure why my quantized model can be 5 times slower than the non-quantized model. Could you please give me some advice? I can send my scripts to you. @xinyu-intel

@xinyu-intel (Collaborator) commented Jun 4, 2019

Maybe some INT8 kernels haven't been optimized for AVX2 machines, which makes them slower than the FP32 ones.

@TriLoo (Author) commented Jun 4, 2019

So a 5x larger latency is possible? How can I verify this? Thank you very much.

@TriLoo (Author) commented Jun 4, 2019

What's more, the run time of the ssd_vgg16_reduced example from incubator-mxnet/example/ssd/ is as below:

  • Non-quantized model: 1.061404 img/sec
  • Quantized model: 0.558333 img/sec

@xinyu-intel (Collaborator) commented Jun 4, 2019

If you want to get a better speedup, you need to run the quantized inference on a Xeon Skylake machine.

@TriLoo (Author) commented Jun 4, 2019

Thanks for your advice.

Now I'm working on a new machine, which is equipped with a Xeon Silver 4114 CPU (Skylake microarchitecture). Once I have the results, I will paste them here.

@TriLoo (Author) commented Jun 4, 2019

I just tested the quantized model from incubator-mxnet/example/ssd/; although all CPU cores were busy, the results are shown below:

  • Non-quantized: 0.508 img/sec
  • Quantized: ~0.955 img/sec

@TriLoo (Author) commented Jun 4, 2019

@xinyu-intel

You can close this issue. I have worked through my program and finally got about a 2.3x speedup on the Xeon Silver 4114, though all CPU cores are very busy.

It looks like concat, flatten, or some other layer can raise a segmentation fault, even though MKL-DNN supports INT8 inference for concat and flatten layers, so all concat and flatten layers should be excluded during quantization.

@pengzhao-intel commented Jun 4, 2019

@TriLoo Very cool! It's great to hear you got a nice speedup on SKX.

If there's a segmentation fault, it should be fixed.
Would you mind providing more details for us?

@TriLoo (Author) commented Jun 4, 2019

Sure, what details do you need? I can provide two quantized json files, one of which is correct while the other raises a segmentation fault. In the latter one, I only excluded the symbols ending with _weight, _bias, _data, _workspace, because I thought the quantization API could automatically select the layers that are supported by MKL-DNN INT8 inference.

I have not figured out the cause of the segmentation fault, but after I excluded all flatten, concat, reshape, element_sub, and element_add layers, the problem was fixed.
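
For reference, the working exclusion list was built roughly like this (a hypothetical sketch; the checkpoint prefix is illustrative and the op names should be checked against your own graph):

```python
# Walk the graph JSON and exclude every node whose op type is one of the
# problematic ones, instead of relying on the quantizer to skip them automatically.
import json
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('ssd_model', 0)  # illustrative prefix

skip_ops = {'Flatten', 'Concat', 'Reshape', 'elemwise_add', 'elemwise_sub'}
nodes = json.loads(sym.tojson())['nodes']
excluded_sym_names = [n['name'] for n in nodes if n['op'] in skip_ops]
```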

@xinyu-intel (Collaborator) commented Jun 5, 2019

@TriLoo Try excluding only flatten to see if the segmentation fault still occurs.

@TriLoo (Author) commented Jun 5, 2019

OK, I will try it later; I am working on the mAP of the quantized model now. @xinyu-intel

@TriLoo (Author) commented Jun 6, 2019

I just finished the quantization of MobileNet SSD based on GluonCV and MXNet. The segmentation fault reproduction experiments are listed below:

quantization mode: none or entropy
input dtype: fp32
output dtype: int8
platform: Xeon Silver 4114, CentOS 6.9
MXNet: 5fc4fc53df7
GluonCV: 0.4.0.post0
MKLDNN: v0.19.0
MKL: 2018.0 Update 1 or 2019 Update 5

  • Only flatten layers excluded: raises a segmentation fault
  • Only concat layers excluded: raises a segmentation fault
  • Both flatten and concat layers excluded: no segmentation fault

My final quantization results are:
Speedup: ~2.3x
mAP: decreases by only about 0.12%

One thing:
The DetRecordIter class in incubator-mxnet/example/ssd/dataset/iterator.py can be confused with the gluoncv.data.RecordFileDetection class: the former does not divide the data by the std values (it only subtracts the mean values from the input image), and this can lead to a bad mAP if one uses RecordFileDetection for model training but DetRecordIter to load data when quantizing the model.
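
To illustrate the mismatch (a sketch only; the constants below are the usual ImageNet statistics that GluonCV's transforms use on images scaled to [0, 1], while DetRecordIter actually subtracts per-channel pixel means on the 0-255 scale; the point is simply that one path divides by std and the other does not):

```python
import mxnet as mx

mean = mx.nd.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
std  = mx.nd.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))

img = mx.nd.random.uniform(shape=(3, 300, 300))   # pretend this is an image scaled to [0, 1]

x_training    = (img - mean) / std   # normalization applied via GluonCV transforms at training time
x_calibration = img - mean           # mean subtraction only, as DetRecordIter effectively does
# Feeding x_calibration-style inputs to a model trained on x_training-style inputs hurts mAP.
```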

One thing more:
I found that Intel OpenVINO supports AVX2, so my question is: which is easier and more efficient, adding AVX2 INT8 support to MXNet or using Intel OpenVINO? Some layer types used in GluonCV are not supported by Intel OpenVINO yet, like great_op. Any advice?

@pengzhao-intel @xinyu-intel

@pengzhao-intel commented Jun 6, 2019

@TriLoo Really great results, and thanks for pointing out the potential issues in our flow.
We will fix the bug and continuously improve the usability in GluonCV.

Regarding AVX2: yes, OpenVINO supports it now, see the link below.
https://www.intel.ai/introducing-int8-quantization-for-fast-cpu-inference-using-openvino/#gs.h4qann

If you're using a desktop CPU, you can try running the model with OpenVINO too.

@TriLoo (Author) commented Jun 6, 2019

OK, thanks for your advice, @pengzhao-intel.

I will try to add some custom layers to OpenVINO to run my INT8 MobileNet SSD 300 on the Xeon E5-2650 v4.

@TriLoo closed this Jun 6, 2019

@pengzhao-intel commented Jun 8, 2019

@TriLoo Could you let us know when you get the results from OpenVINO?

@TriLoo (Author) commented Jun 8, 2019

@pengzhao-intel
Sure. I found one drawback of the GluonCV SSD implementation: it re-implements MultiBoxPrior and MultiBoxDetection, so OpenVINO cannot understand some of the resulting layers, even though OpenVINO handles MultiBoxPrior and MultiBoxDetection themselves very well.

I added some custom layers to OpenVINO based on this tutorial: Add custom MXNet layer to OpenVINO. However, I could not find enough docs to work it through with OpenVINO. Now I have changed the GluonCV SSD detection head to use MultiBoxPrior and MultiBoxDetection and just obtained the OpenVINO .xml and .bin files. I have not run the test of these two files yet (I might have results next Monday). I am not sure whether I should fine-tune my model because of the change of implementation. Can you give me some advice on implementing MXNet custom layers within OpenVINO?

I am also studying the C++ source code of MXNet to explore the possibility of implementing INT8 kernels using AVX2. My main purpose is to obtain good inference speed on both Skylake and Broadwell microarchitecture CPUs.

@pengzhao-intel commented Jun 11, 2019

Thanks. GluonCV and the framework are more flexible, while OpenVINO is based on a fixed set of models and ops.
Thus, there are some gaps between them. We're also thinking about combining them, but that is still in the brainstorming stage :) Real experience from users is very important for us.

@TriLoo (Author) commented Jun 11, 2019

@pengzhao-intel
OK, I have successfully converted the GluonCV SSD model to the IR representation (*.bin, *.xml) and ran the converted model with OpenVINO after correctly adding the CPU plugins.

The biggest problem in my case is that some GluonCV models re-implement MultiBoxDetection and MultiBoxPrior. I do not think this is a good idea for third-party inference-acceleration tools like OpenVINO or TVM, because it means developers have to add more custom layers for these non-uniform APIs even though they do the same thing.

My environment is:

  • Platform: Xeon E5-2650 v4
  • Data type: FP32 (the CPU plugin does not support FP16)
  • CPU plugin used: libcpu_extension_avx2.so

Results (not quantized):

  • Normal MXNet model: 65 ± 1 ms
  • Model in OpenVINO: ~58 ms

I believe the mAP will not drop much, because no quantization is used yet; it is still FP32.

@TriLoo (Author) commented Jun 12, 2019

I updated my results.

@pengzhao-intel

@pengzhao-intel commented Jun 12, 2019

Thanks, looks good :) It's slightly better than MXNet.
Did you try quantization with OpenVINO?
