
Setting per-tensor dynamic range using Python causes accuracy drop #1165

Closed
maoxiaoming86 opened this issue Apr 1, 2021 · 6 comments
Labels: Performance, Precision: INT8, triaged (Issue has been triaged by maintainers)

Comments

@maoxiaoming86

Description

When I set the per-tensor dynamic range using Python, the INT8 model's accuracy is very low. The amax values come from pytorch-quantization.
I tried excluding some layers from INT8 inference and found that when I disable INT8 for the add layer's inputs, the accuracy recovers.
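Excluding individual layers from INT8 can be done through the TensorRT Python API by overriding layer precision. A minimal sketch, assuming TRT 7.x with strict types enabled; the helper names and the name-matching rule are mine, not from the issue:

```python
def should_force_fp32(layer_name, patterns=("add",)):
    """Pure helper (hypothetical): pick layers to exclude from INT8 by name."""
    name = layer_name.lower()
    return any(p in name for p in patterns)

def force_fp32_layers(network, config, patterns=("add",)):
    """Override precision for matched layers on a parsed INetworkDefinition.

    Assumes TensorRT 7.x: STRICT_TYPES makes the builder honor the override.
    """
    import tensorrt as trt  # deferred import so the pure helper stays importable

    config.set_flag(trt.BuilderFlag.STRICT_TYPES)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if should_force_fp32(layer.name, patterns):
            # Force both the layer's compute precision and its output types.
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)
```

Note that forcing FP32 on these layers prevents the INT8 fusions around them, which is consistent with the slowdown described below.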

[screenshot: network graph with the affected layers highlighted]

But the new model is a lot slower, and I found that the engine-build log shows some layers rejecting the INT8 implementation.

[screenshot: engine-build log]

How can I solve this problem? Can anyone give me some advice? Thank you.

Environment

TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
cuDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

ttyio (Collaborator) commented Apr 6, 2021

Hello @maoxiaoming86, is this TRT 7.1? The highlighted conv kernel should run in INT8-in / FP32-out precision. Could you try upgrading to 7.2? Thanks!

ttyio added the Precision: INT8, Release: 7.x, Performance, and triaged labels on Apr 6, 2021
maoxiaoming86 (Author) commented:

> Hello @maoxiaoming86, is this TRT 7.1? The highlighted conv kernel should run in INT8-in / FP32-out precision. Could you try upgrading to 7.2? Thanks!

My TRT is 7.1. Do you mean that after upgrading to 7.2, the TRT backend will run the highlighted conv kernel in INT8-in / FP32-out precision automatically?

ttyio (Collaborator) commented Apr 7, 2021

Hello @maoxiaoming86,
If you have

  • Q/DQ before the highlighted conv
  • no Q/DQ after the bottom relu

this highlighted conv should run in INT8-in / FP32-out precision automatically.

maoxiaoming86 (Author) commented:

> Hello @maoxiaoming86,
> If you have
>
>   • Q/DQ before the highlighted conv
>   • no Q/DQ after the bottom relu
>
> this highlighted conv should run in INT8-in / FP32-out precision automatically.

I don't have any Q/DQ. Before building the engine, I remove the Q/DQ nodes from the ONNX model and save the input scale and weight scale for each conv. When building the engine, I first use the saved scales to calculate the input amax and output amax for each conv, then use set_dynamic_range to set the amax. In this setup, how do I break the Conv+Add+Relu fusion?
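The scale-to-amax conversion and the set_dynamic_range calls described above can be sketched as follows. This is a sketch under my assumptions, not the author's actual script: scales are stored per tensor name, and the quantization is symmetric INT8 with scale = amax / 127.

```python
def amax_from_scale(scale, qmax=127.0):
    """Recover amax from a symmetric INT8 scale (assumption: scale = amax / 127)."""
    return float(scale) * qmax

def apply_dynamic_ranges(network, scales_by_tensor):
    """Set per-tensor dynamic ranges on a TRT 7.x INetworkDefinition.

    `scales_by_tensor` maps tensor names to saved scales (hypothetical layout).
    """
    # Network inputs.
    for i in range(network.num_inputs):
        t = network.get_input(i)
        if t.name in scales_by_tensor:
            amax = amax_from_scale(scales_by_tensor[t.name])
            t.set_dynamic_range(-amax, amax)
    # Every layer output tensor.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            t = layer.get_output(j)
            if t.name in scales_by_tensor:
                amax = amax_from_scale(scales_by_tensor[t.name])
                if not t.set_dynamic_range(-amax, amax):
                    print("failed to set dynamic range for", t.name)
```

With this approach the builder is free to choose fusions, so breaking Conv+Add+Relu would require precision overrides rather than range settings.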

ttyio (Collaborator) commented Apr 9, 2021

@maoxiaoming86
By "break Conv+Add+Relu", do you mean that the two highlighted convs are not fused with the following add and relu?
From the graph I see they use different inputs and different filter sizes, so we cannot horizontally merge the convs, and therefore we also cannot fuse the add + relu.

Could you point out the log lines showing how TensorRT currently fuses the highlighted subgraph? Thanks!

ttyio (Collaborator) commented May 21, 2021

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!

ttyio closed this as completed May 21, 2021
2 participants