Can the outputs of PyTorch and TensorRT be accurately aligned? #3707

@lzcchl

Description

I am currently working on a project to verify whether the outputs of PyTorch and TensorRT can be accurately aligned.

I have tested two cases, conv and conv+bn. The Netron visualizations of the two models are shown in the following figures.

resnet50-part-conv
resnet50-part-cb

In my tests, with the conv model the results of PyTorch and TensorRT are completely consistent.

With the conv+bn model, however, the results of PyTorch and TensorRT differ by approximately 1e-6.
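To put a gap of about 1e-6 in context, it can help to report the mismatch as absolute error, relative error, and float32 ULPs (units in the last place). The following is a minimal sketch with NumPy; `compare_outputs` and the sample arrays are hypothetical and are not taken from the attached client code.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Summarize the mismatch between two float32 arrays of the same shape."""
    ref = np.asarray(reference, dtype=np.float32)
    cand = np.asarray(candidate, dtype=np.float32)
    # Take the difference in float64 so the comparison itself adds no rounding.
    abs_diff = np.abs(ref.astype(np.float64) - cand.astype(np.float64))
    # Size of one float32 ULP at each reference value.
    ulp = np.spacing(np.abs(ref)).astype(np.float64)
    return {
        "max_abs": float(abs_diff.max()),
        "max_rel": float((abs_diff / np.maximum(np.abs(ref), 1e-12)).max()),
        "max_ulp": float((abs_diff / ulp).max()),
        "bit_exact": bool(np.array_equal(ref, cand)),
    }


# Hypothetical data: the candidate differs from the reference by one ULP at 1.0.
ref = np.array([1.0, 2.0, 0.5], dtype=np.float32)
cand = ref.copy()
cand[0] = np.nextafter(np.float32(1.0), np.float32(2.0))

stats = compare_outputs(ref, cand)
print(stats)
```

In practice `reference` would be the PyTorch output and `candidate` the TensorRT output for the same input, both converted to NumPy arrays.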

I suspect that TensorRT performs some optimization, such as operator fusion. When building the TensorRT engine I added --builderOptimizationLevel=0, assuming that operator fusion might not be performed at this optimization level, but there is still a slight difference between the final results of PyTorch and TensorRT.
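The fusion hypothesis can be illustrated without TensorRT. The sketch below folds BatchNorm parameters into a 1x1 convolution (written as a matrix product) using the standard folding identity, then compares the result with the unfused conv-then-BN path in float32. Whether TensorRT performs a fold of this kind for this model is an assumption that nothing in this issue confirms, and all shapes and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, n_pix = 64, 64, 1024  # hypothetical sizes

# Random float32 input, 1x1 conv weights, and BatchNorm parameters.
x = rng.standard_normal((c_in, n_pix)).astype(np.float32)
w = rng.standard_normal((c_out, c_in)).astype(np.float32)
gamma = rng.uniform(0.5, 1.5, c_out).astype(np.float32)
beta = rng.standard_normal(c_out).astype(np.float32)
mean = rng.standard_normal(c_out).astype(np.float32)
var = rng.uniform(0.5, 1.5, c_out).astype(np.float32)
eps = np.float32(1e-5)

scale = gamma / np.sqrt(var + eps)

# Unfused path: convolution first, then BatchNorm applied to its output.
y_unfused = (w @ x - mean[:, None]) * scale[:, None] + beta[:, None]

# Fused path: BatchNorm folded into the conv weights and a bias term.
w_folded = w * scale[:, None]
b_folded = beta - mean * scale
y_fused = w_folded @ x + b_folded[:, None]

# Both paths are mathematically equal, but float32 rounding happens at
# different points, so the results need not be bit-identical.
max_abs_diff = float(np.abs(y_unfused - y_fused).max())
print(f"max abs diff between unfused and fused: {max_abs_diff:.3e}")
```

Any discrepancy printed here comes purely from the order of float32 operations, which is one plausible source of a small conv+bn mismatch even when every layer runs in FP32.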

I also tried "--builderOptimizationLevel=0 --noTF32 --precisionConstraints=obey --layerPrecisions=:fp32 --layerOutputTypes=:fp32" when building the TRT engine, but the difference still exists.

So, may I ask what the internal reason for the difference in results is, and whether there is any way to make it disappear?

Environment
TensorRT Version: 8.6.1
GPU Type: GTX1060
Nvidia Driver Version: 545.29.06
CUDA Version: 12.1
CUDNN Version: 8.9.7
Operating System + Version: ubuntu20.04
Python Version (if applicable): python3.8
TensorFlow Version (if applicable): NO
PyTorch Version (if applicable): 2.1.2+cu121
Baremetal or Container (if container which image + tag): NO
Triton infer server : docker image r23.12

Relevant Files
My ONNX (exported from PyTorch) and TRT models are here:
test_model.zip

Steps To Reproduce
The following is a screenshot of the source code of torchvision's ResNet. I added an early return to end inference sooner.

2024-03-11 20-08-03

After feeding in one input for inference, you can also see that the results for "resnet50-part-conv" are identical, while the results for "resnet50-part-cb" differ.

This is my code for reference. I have tested torch.backends.cudnn.enabled / benchmark / deterministic; for TRT inference I use Triton.

client-http.zip
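For reference, the PyTorch-side switches mentioned above can be pinned down as in the following configuration sketch. The specific values are an assumption about a reasonable comparison setup, not the settings used in client-http.zip; the two TF32 lines are added here only to mirror --noTF32 on the TensorRT side.

```python
import torch

# cuDNN switches named in the issue (values are illustrative assumptions).
torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = False      # do not auto-tune the conv algorithm
torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels

# Keep PyTorch in plain FP32 so it matches an engine built with --noTF32.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```

These flags make PyTorch's own results reproducible from run to run; they do not by themselves guarantee that another runtime picks kernels with the same rounding behavior.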

Labels: triaged (issue has been triaged by maintainers)