INT8 wrong results for max batch size > 1 #3103

Hi, I have an issue with an INT8 model quantized through QAT fine-tuning. The fine-tuning itself works correctly, and converting the model to ONNX raises no issues. The problem arises when I build the engine from the ONNX file: the build finishes without errors, but inference only succeeds with a batch size of 1.

If I increase the batch size, only the first sample in the batch gets a correct output, while all the others get wrong predictions.
I am running the identical C++ code for both the quantized model and a non-quantized FP16 model (both are exported from the same training run). The non-quantized FP16 model, however, works perfectly, and so does a model exported with FP32 precision.
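To put a number on the symptom, one option is to feed the same batch to both engines and compare the outputs sample by sample. The sketch below assumes both output buffers have already been copied back to the host as flat `float` arrays with a contiguous `[batch, ...]` layout; the helper name `perSampleMaxAbsDiff` is hypothetical and not part of TensorRT.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For each sample in the batch, returns the maximum absolute difference
// between two flat output buffers laid out as [batch, perSample].
// Assumption: both buffers hold the same tensor, contiguous per sample.
std::vector<float> perSampleMaxAbsDiff(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t batchSize) {
    // Preconditions: same size, and evenly divisible into batchSize samples.
    assert(batchSize > 0);
    assert(a.size() == b.size());
    assert(a.size() % batchSize == 0);

    const std::size_t perSample = a.size() / batchSize;
    std::vector<float> result(batchSize, 0.0f);
    for (std::size_t n = 0; n < batchSize; ++n) {
        for (std::size_t i = 0; i < perSample; ++i) {
            const std::size_t idx = n * perSample + i;  // start of sample n plus i
            result[n] = std::max(result[n], std::fabs(a[idx] - b[idx]));
        }
    }
    return result;
}
```

If only index 0 comes out close to zero while every later index is large, that would suggest the error is tied to the position in the batch rather than to INT8 accuracy in general.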

In the figure below you can see a sample output of the INT8 and FP16 models.
INT8 model prediction, batch_size=2             |  FP16 model prediction, batch_size=2 
:-------------------------:|:-------------------------:
![](https://github.com/NVIDIA/TensorRT/assets/123370831/495d9455-f9c8-4c45-89bd-94c3761cec25)  |  ![](https://github.com/NVIDIA/TensorRT/assets/123370831/2d10f47e-d6f7-4d1e-b811-599fdac1a701)

•	I’m using TensorRT 8.6.1.6, CUDA 11.8 and cuDNN 8.9.0.131.
•	I have checked the INT8 sample here: [https://github.com/NVIDIA/TensorRT/blob/master/samples/sampleINT8/sampleINT8.cpp](https://github.com/NVIDIA/TensorRT/blob/master/samples/sampleINT8/sampleINT8.cpp), but I could not find any mistake in my implementation.
•	The quantized model is exported from Python with a dynamic batch dimension (set to “None” or “-1”).

**Note**: I also tried skipping the `setBindingDimensions` call just before inference (with `executeV2`). In that case, the FP16 model shows the same issue as well.
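Because the FP16 model also breaks when the batch dimension is not set, it may be worth double-checking how the host code turns the dynamic (-1) batch dimension into buffer sizes and per-sample offsets. The sketch below is plain C++ with no TensorRT dependency; `Shape`, `resolveBatch`, `volume` and `sampleOffset` are hypothetical helpers, and a contiguous `[batch, ...]` layout is assumed. The point it illustrates is that sizes and offsets should be computed from the shape after the batch extent has been replaced by the real batch size, never from the exported shape that still contains -1.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a tensor shape: -1 marks a dynamic extent,
// as in an ONNX model exported with a "None"/-1 batch dimension.
using Shape = std::vector<std::int64_t>;

// Replaces a dynamic (-1) leading batch extent with the actual batch size.
Shape resolveBatch(Shape shape, std::int64_t batchSize) {
    if (!shape.empty() && shape[0] == -1) {
        shape[0] = batchSize;
    }
    return shape;
}

// Number of elements in a fully specified shape. Throws instead of
// silently computing a wrong size while any extent is still dynamic.
std::int64_t volume(const Shape& shape) {
    std::int64_t v = 1;
    for (std::int64_t d : shape) {
        if (d < 0) {
            throw std::invalid_argument("shape still contains a dynamic dimension");
        }
        v *= d;
    }
    return v;
}

// Element offset of sample n inside a flat [batch, ...] buffer.
std::int64_t sampleOffset(const Shape& resolved, std::int64_t n) {
    assert(!resolved.empty() && n >= 0 && n < resolved[0]);
    return n * (volume(resolved) / resolved[0]);
}
```

In the real application, the resolved shape would typically be read back from the execution context after the input dimensions have been set, rather than built with `resolveBatch`.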
Have you encountered this issue before? Thank you
