
Onnxruntime and TensorRT inference time #1284

@callbarian

Description


TensorRT runs slower than CUDA in the C# environment.

Model: resnet34, pretrained from PyTorch.
Input: 126x3x96x64, run for 72 loops.

The first conv1 layer and the last fc layer were modified.
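For reference, a minimal PyTorch export sketch of the setup described above. The report only says conv1 and fc were modified, so the exact changes (conv1 kernel/stride, 10 output classes) and the file/tensor names are assumptions for illustration:

```python
import torch
import torchvision

model = torchvision.models.resnet34(pretrained=True)
# Assumed modifications; the report does not give the actual ones.
model.conv1 = torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # assumed class count
model.eval()

dummy = torch.randn(126, 3, 96, 64)  # input shape from the description
torch.onnx.export(model, dummy, "resnet34_modified.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)
```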

Python
On Ubuntu, TensorRT runs faster than onnxruntime-gpu.
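A minimal Python timing sketch for this comparison (the model filename and input name match the export sketch above and are assumptions, as is the exact measurement loop):

```python
import time
import numpy as np
import onnxruntime as ort

x = np.random.rand(126, 3, 96, 64).astype(np.float32)  # shape from the description

for providers in (["TensorrtExecutionProvider", "CUDAExecutionProvider"],
                  ["CUDAExecutionProvider"]):
    sess = ort.InferenceSession("resnet34_modified.onnx", providers=providers)
    sess.run(None, {"input": x})  # warm-up; TensorRT builds its engine on first run
    start = time.perf_counter()
    for _ in range(72):  # 72 loops, as in the description
        sess.run(None, {"input": x})
    print(providers[0], time.perf_counter() - start, "s")
```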

C#
ONNX Runtime was built from source with --use_tensorrt.
TensorRT runs slower than CUDA; it is about 9 times slower.

Environment

TensorRT Version: 7.2.2.3, 7.2.3.4
NVIDIA GPU: RTX 2080 Ti
NVIDIA Driver Version: 461.92
CUDA Version: 11.1, 11.2
CUDNN Version: 8.0, 8.1, 8.2
Operating System: Windows 10 x64
Python Version (if applicable): 3.6
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.7.0
Baremetal or Container (if so, version):

The combinations tested were:
TensorRT 7.2.2.3, cuDNN 8.0, CUDA 11.1
TensorRT 7.2.2.3, cuDNN 8.0, CUDA 11.2
TensorRT 7.2.3.4, cuDNN 8.1, CUDA 11.2
TensorRT 7.2.3.4, cuDNN 8.2, CUDA 11.2

all of which showed slow inference times when running with TensorRT.

Relevant Files

[Nsight Systems trace: TensorRT inference]
This is the Nsight Systems trace of TensorRT inference. I am not sure what causes the delays before the full operation at the end.

[Nsight Systems trace: CUDA inference]
This is the Nsight Systems trace of CUDA inference. Unlike TensorRT, there is very little delay before the full operation at the end.
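If the delay in the TensorRT trace is engine construction, enabling the TensorRT execution provider's engine cache may avoid rebuilding on every session. This is a hedged sketch: the environment variable names below are from the ORT 1.7-era TensorRT EP documentation and should be verified against the build in use:

```python
import os

# Assumption: the pre-inference delay in the TensorRT trace is engine
# construction. Variable names as documented for the ORT 1.7-era TensorRT EP;
# later releases renamed some of them, so check your version's docs.
os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1"
os.environ["ORT_TENSORRT_CACHE_PATH"] = "./trt_cache"  # hypothetical cache dir

import onnxruntime as ort  # import after setting the variables

sess = ort.InferenceSession("resnet34_modified.onnx",
                            providers=["TensorrtExecutionProvider",
                                       "CUDAExecutionProvider"])
```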

Steps To Reproduce
