
Setting per-tensor dynamic range using Python causes accuracy drop #1165

Closed
maoxiaoming86 opened this issue Apr 1, 2021 · 6 comments
Labels: Performance, Precision: INT8, triaged (Issue has been triaged by maintainers)

Comments

@maoxiaoming86

Description

When I set the per-tensor dynamic range using Python, the INT8 model's accuracy is very low. The amax values come from pytorch-quantization.
I tried excluding some layers from INT8 inference and found that when I disable INT8 for the add layer's inputs, the accuracy recovers.
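Excluding individual layers from INT8 can be done through the TensorRT Python API by overriding layer precision. A minimal sketch, assuming TRT 7.x with strict types enabled; the helper names and the name-matching rule are mine, not from the issue:

```python
def should_force_fp32(layer_name, patterns=("add",)):
    """Pure helper (hypothetical): pick layers to exclude from INT8 by name."""
    name = layer_name.lower()
    return any(p in name for p in patterns)

def force_fp32_layers(network, config, patterns=("add",)):
    """Override precision for matched layers on a parsed INetworkDefinition.

    Assumes TensorRT 7.x: STRICT_TYPES makes the builder honor the override.
    """
    import tensorrt as trt  # deferred import so the pure helper stays importable

    config.set_flag(trt.BuilderFlag.STRICT_TYPES)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if should_force_fp32(layer.name, patterns):
            # Force both the layer's compute precision and its output types.
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)
```

Note that forcing FP32 on these layers prevents the INT8 fusions around them, which is consistent with the slowdown described below.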

[screenshot: network graph with the affected layers highlighted]

But the new model is a lot slower, and I found that the engine-build log shows some layers rejecting the INT8 implementation.

[screenshot: engine-build log]

How can I solve this problem? Can anyone give me some advice? Thank you.

Environment

TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
cuDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

ttyio (Collaborator) commented Apr 6, 2021

Hello @maoxiaoming86, is this TRT 7.1? The highlighted conv kernel should run in INT8-in / FP32-out precision. Could you try upgrading to 7.2? Thanks!

ttyio added the Precision: INT8, Release: 7.x, Performance, and triaged labels on Apr 6, 2021
maoxiaoming86 (Author) commented:

> Hello @maoxiaoming86, is this TRT 7.1? The highlighted conv kernel should run in INT8-in / FP32-out precision. Could you try upgrading to 7.2? Thanks!

My TRT is 7.1. Do you mean that after upgrading to 7.2, the TRT backend will run the highlighted conv kernel in INT8-in / FP32-out precision automatically?

ttyio (Collaborator) commented Apr 7, 2021

Hello @maoxiaoming86,
If you have

  • Q/DQ before the highlighted conv
  • no Q/DQ after the bottom relu

this highlighted conv should run in INT8-in / FP32-out precision automatically.

maoxiaoming86 (Author) commented:

> Hello @maoxiaoming86,
> If you have
>
>   • Q/DQ before the highlighted conv
>   • no Q/DQ after the bottom relu
>
> this highlighted conv should run in INT8-in / FP32-out precision automatically.

I don't have any Q/DQ. Before building the engine, I remove the Q/DQ nodes from the ONNX model and save the input scale and weight scale for each conv. When building the engine, I first use the saved scales to calculate the input amax and output amax for each conv, then use set_dynamic_range to set the amax. In this setup, how do I break the Conv+Add+Relu fusion?
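The scale-to-amax conversion and the set_dynamic_range calls described above can be sketched as follows. This is a sketch under my assumptions, not the author's actual script: scales are stored per tensor name, and the quantization is symmetric INT8 with scale = amax / 127.

```python
def amax_from_scale(scale, qmax=127.0):
    """Recover amax from a symmetric INT8 scale (assumption: scale = amax / 127)."""
    return float(scale) * qmax

def apply_dynamic_ranges(network, scales_by_tensor):
    """Set per-tensor dynamic ranges on a TRT 7.x INetworkDefinition.

    `scales_by_tensor` maps tensor names to saved scales (hypothetical layout).
    """
    # Network inputs.
    for i in range(network.num_inputs):
        t = network.get_input(i)
        if t.name in scales_by_tensor:
            amax = amax_from_scale(scales_by_tensor[t.name])
            t.set_dynamic_range(-amax, amax)
    # Every layer output tensor.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            t = layer.get_output(j)
            if t.name in scales_by_tensor:
                amax = amax_from_scale(scales_by_tensor[t.name])
                if not t.set_dynamic_range(-amax, amax):
                    print("failed to set dynamic range for", t.name)
```

With this approach the builder is free to choose fusions, so breaking Conv+Add+Relu would require precision overrides rather than range settings.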

ttyio (Collaborator) commented Apr 9, 2021

@maoxiaoming86
By "break Conv+Add+Relu", do you mean that the two highlighted convs are not fused with the following add and relu?
From the graph I see they use different inputs and different filter sizes, so we cannot horizontally merge the convs, and therefore we also cannot fuse the add + relu.

Could you point out the log lines showing how TensorRT currently fuses the highlighted subgraph? Thanks!

ttyio (Collaborator) commented May 21, 2021

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!

ttyio closed this as completed May 21, 2021
2 participants