
Inconsistent result of INT8 inferencing in TensorRT 8.6 when running inference on RTX 3070 or 3090Ti #3009

@cezbloch

Description


I am running inference on a model that I have built from Python code through the TensorRT API. All layers are forced to run convolution in INT8 precision.
The minimal network that reproduces the problem consists of just one convolutional layer and looks like this:

  • INT8 input tensor
  • DQ layer (gets removed)
  • INT8 convolution - the weights are initialized using a Q/DQ layer; scales are FP32
  • Q layer
  • output is INT8 tensor

I have written a number of unit tests that validate various scenarios, and almost everything works as expected. However, there is an issue with rounding values of 0.5. According to the TensorRT Developer Guide, rounding uses roundTiesToEven.

Now, depending on the kernel size, the accumulated and scaled value of 0.5 gets rounded to 1 or to 0.
The 0.5 here is the sum of the Hadamard product computed by the convolution, divided by the scale passed at Q-layer creation (network.add_quantize).
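The tie-breaking rule itself is easy to check outside TensorRT: both Python's built-in round() and NumPy's np.round() implement round-half-to-even, which is the roundTiesToEven behavior the documentation describes, so 126 / 252.0 = 0.5 should quantize to 0:

```python
import numpy as np

# Round-half-to-even ("banker's rounding"): ties go to the nearest even
# integer, so 0.5 -> 0 and 1.5 -> 2. Both Python's built-in round() and
# NumPy's np.round() implement this rule.
for x in (0.5, 1.5, 2.5):
    print(x, round(x), np.round(x))

# The value in question: 126 accumulated by the convolution, divided by
# the Q-layer scale of 252.0, ties to the even neighbour 0.
print(np.round(126 / 252.0))  # 0.0
```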

Environment

TensorRT Version: 8.6.0

NVIDIA GPU: RTX 3090Ti (desktop), RTX 3070 (laptop), GTX 1050 (laptop)

NVIDIA Driver Version: 525.105.17

CUDA Version: 11.7

CUDNN Version: 8.9

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.8

Tensorflow Version (if applicable): TF 2.11.0

Relevant Files

See here for the TRT builder output for the built engine.

Steps To Reproduce

  • Create an INT8 input tensor and fill it with the value 126 - e.g. NCHW 1x128x8x8, all values 126.
  • Create an identity kernel that simply copies the input tensor value - e.g. RSCK 5x5x128x1. Fill the kernel with zeros and set a single value to 1 (e.g. the middle of the kernel), so that the input value passes through the convolution unchanged.
  • Set the scale of the Q layer to 252.0.

126 divided by 252.0 is 0.5, which under ties-to-even should round to 0, but it is rounded to 0 or 1 depending on which tactic is picked during build (or at least that is what it looks like). The expected value is 0.
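As a sanity check, the steps above can be reproduced in plain NumPy. This is only a reference computation of the expected result, not TensorRT itself; the shapes and the centre-tap position mirror the 5x5x128x1 case:

```python
import numpy as np

# NumPy reference for the repro steps: a 1x128x8x8 INT8 input filled with
# 126, and a 5x5x128 "identity" kernel (all zeros, a single 1 at the centre
# of one channel), so the convolution output at the centre pixel equals 126.
inp = np.full((128, 8, 8), 126, dtype=np.int8)   # C, H, W
ker = np.zeros((5, 5, 128), dtype=np.int8)       # R, S, C
ker[2, 2, 0] = 1                                 # single identity tap

# INT32 accumulation of the Hadamard product at the centre output pixel.
patch = inp[:, 2:7, 2:7].astype(np.int32)        # C x 5x5 input window
acc = int(np.sum(patch.transpose(1, 2, 0) * ker.astype(np.int32)))

# Quantize: divide by the Q-layer scale, round with ties-to-even, clamp
# to the INT8 range.
scale = 252.0
q = int(np.clip(np.round(acc / scale), -128, 127))
print(acc, q)  # 126 0
```

The accumulated value is 126 regardless of kernel size, so the quantized output should be 0 in all three tactic cases above.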

When kernel size is 3x3x128x1 the value of 0 is returned.
Tactic:"sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x32x64_stage6_warpsize2x1x1_g1_tensor16x8x32_t1r3s3"

When kernel size is 5x5x128x1 the value of 1 is returned.
Tactic:"sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x32x64_stage6_warpsize2x1x1_g1_tensor16x8x32_t1r5s5"

When kernel size is 7x7x128x1 the value of 1 is returned.
Tactic:"sm80_xmma_fprop_implicit_gemm_interleaved_indexed_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x64x64_stage6_warpsize2x2x1_g1_tensor16x8x32"

This may seem like a small difference, but we need the values in the output tensor to always exactly match the expected result for a particular input.

  • What is the correct rounding behavior?
  • Why do some tactics return 0 and others 1?
  • How can this behavior be made consistent?

When I run the same inputs and weights (both restricted to INT8 values) in TensorFlow through an nn.conv2d layer in FP32 precision, and then scale and round in code, I get the correct value of 0.

Metadata

Labels: triaged (Issue has been triaged by maintainers)