Slower inference speed of TensorRT 10.0 on GPU Tesla T4 #3896
Comments
You can compare the layer-info time profiles and fusion tactics between the two engines.
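For example, one possible trtexec invocation that dumps both from an existing engine (the output file names here are placeholders, and layer names are only detailed if the engine was built with detailed profiling verbosity):
./trtexec --loadEngine=nafnetcc75_t4_float32_v10.trtmodel --shapes=input:1x1920x1920x3 --device=3 --separateProfileRun --dumpProfile --exportProfile=trt10_profile.json --dumpLayerInfo --exportLayerInfo=trt10_layers.json
Running the same command against nafnetcc75_t4_float32_v86.trtmodel then lets you diff the per-layer times and the fused-layer names between the two engines.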
I can't get a performance profile because trtexec fails with errors when I run it with --dumpProfile, but the same command works fine with the engine built by TensorRT 8.6. Could this be related to the slower inference speed? How can I find out the reason? Thank you very much.
Thanks, I can repro the issue and have filed internal bug 4672320 to track this.
You can try adding
Thank you, that's helpful!
Hi, is there a profiler you can run for Triton Inference Server?
https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-onnx-models/
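For what it's worth, Triton also ships a client-side benchmarking tool, perf_analyzer, which measures throughput and latency of a deployed model. A minimal sketch, assuming a model named nafnet is already loaded in the server's model repository and gRPC is listening on the default port:
perf_analyzer -m nafnet -u localhost:8001 -i grpc --concurrency-range 1:4
For per-layer GPU timing you would still profile the underlying TensorRT engine itself, e.g. with trtexec as shown earlier in this thread.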
No, I haven't used Triton Inference Server before.
We are actively investigating this issue. Meanwhile, you can work around this regression by setting the optimization level in the builder config to 5 or adding
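The flag name is cut off in this capture of the comment, but trtexec exposes the builder optimization level as --builderOptimizationLevel, so a rebuild along these lines (reusing the repro command from this issue) should apply the same workaround:
./trtexec --onnx=color_consistency_nafnet.onnx --saveEngine=nafnetcc75_t4_float32_v10.trtmodel --builderOptimizationLevel=5 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --device=3 --minShapes=input:1x64x64x3 --optShapes=input:1x1024x1024x3 --maxShapes=input:1x1920x1920x3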
Fixed in TRT 10.3, closed. |
Description
I converted NAFNet from ONNX to TensorRT on a Tesla T4 with TensorRT 10.0. However, the inference speed is much slower than that of the engine converted with TensorRT 8.6.
TensorRT 10.0:
[05/24/2024-14:43:21] [I] === Trace details ===
[05/24/2024-14:43:21] [I] Trace averages of 10 runs:
[05/24/2024-14:43:21] [I] Average on 10 runs - GPU latency: 539.803 ms - Host latency: 546.901 ms (enqueue 4.92217 ms)
[05/24/2024-14:43:21] [I]
[05/24/2024-14:43:21] [I] === Performance summary ===
[05/24/2024-14:43:21] [I] Throughput: 1.64966 qps
[05/24/2024-14:43:21] [I] Latency: min = 542.295 ms, max = 550.235 ms, mean = 546.901 ms, median = 546.891 ms, percentile(90%) = 550.032 ms, percentile(95%) = 550.235 ms, percentile(99%) = 550.235 ms
[05/24/2024-14:43:21] [I] Enqueue Time: min = 3.92992 ms, max = 5.48389 ms, mean = 4.92217 ms, median = 5.14417 ms, percentile(90%) = 5.33893 ms, percentile(95%) = 5.48389 ms, percentile(99%) = 5.48389 ms
[05/24/2024-14:43:21] [I] H2D Latency: min = 3.60913 ms, max = 4.47997 ms, mean = 3.70715 ms, median = 3.62408 ms, percentile(90%) = 3.63037 ms, percentile(95%) = 4.47997 ms, percentile(99%) = 4.47997 ms
[05/24/2024-14:43:21] [I] GPU Compute Time: min = 535.282 ms, max = 543.216 ms, mean = 539.803 ms, median = 539.882 ms, percentile(90%) = 543.027 ms, percentile(95%) = 543.216 ms, percentile(99%) = 543.216 ms
[05/24/2024-14:43:21] [I] D2H Latency: min = 3.38086 ms, max = 3.40747 ms, mean = 3.3907 ms, median = 3.38916 ms, percentile(90%) = 3.39551 ms, percentile(95%) = 3.40747 ms, percentile(99%) = 3.40747 ms
[05/24/2024-14:43:21] [I] Total Host Walltime: 6.06185 s
[05/24/2024-14:43:21] [I] Total GPU Compute Time: 5.39803 s
[05/24/2024-14:43:21] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/24/2024-14:43:21] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v100001] # ./trtexec --loadEngine=nafnetcc75_t4_float32_v10.trtmodel --shapes=input:1x1920x1920x3 --device=3
TensorRT 8.6:
[05/24/2024-14:44:43] [I] === Trace details ===
[05/24/2024-14:44:43] [I] Trace averages of 10 runs:
[05/24/2024-14:44:43] [I] Average on 10 runs - GPU latency: 143.531 ms - Host latency: 150.62 ms (enqueue 4.77478 ms)
[05/24/2024-14:44:43] [I] Average on 10 runs - GPU latency: 141.829 ms - Host latency: 148.839 ms (enqueue 5.34015 ms)
[05/24/2024-14:44:43] [I]
[05/24/2024-14:44:43] [I] === Performance summary ===
[05/24/2024-14:44:43] [I] Throughput: 6.59775 qps
[05/24/2024-14:44:43] [I] Latency: min = 147.611 ms, max = 165.985 ms, mean = 149.754 ms, median = 148.669 ms, percentile(90%) = 151.169 ms, percentile(95%) = 151.494 ms, percentile(99%) = 165.985 ms
[05/24/2024-14:44:43] [I] Enqueue Time: min = 2.2744 ms, max = 5.82202 ms, mean = 5.09928 ms, median = 5.2124 ms, percentile(90%) = 5.76062 ms, percentile(95%) = 5.77234 ms, percentile(99%) = 5.82202 ms
[05/24/2024-14:44:43] [I] H2D Latency: min = 3.60007 ms, max = 4.53885 ms, mean = 3.65205 ms, median = 3.61035 ms, percentile(90%) = 3.63367 ms, percentile(95%) = 3.63477 ms, percentile(99%) = 4.53885 ms
[05/24/2024-14:44:43] [I] GPU Compute Time: min = 140.629 ms, max = 158.058 ms, mean = 142.711 ms, median = 141.668 ms, percentile(90%) = 144.174 ms, percentile(95%) = 144.487 ms, percentile(99%) = 158.058 ms
[05/24/2024-14:44:43] [I] D2H Latency: min = 3.38074 ms, max = 3.40759 ms, mean = 3.3908 ms, median = 3.38867 ms, percentile(90%) = 3.40186 ms, percentile(95%) = 3.40405 ms, percentile(99%) = 3.40759 ms
[05/24/2024-14:44:43] [I] Total Host Walltime: 3.48604 s
[05/24/2024-14:44:43] [I] Total GPU Compute Time: 3.28235 s
[05/24/2024-14:44:43] [W] * GPU compute time is unstable, with coefficient of variance = 2.41332%.
[05/24/2024-14:44:43] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[05/24/2024-14:44:43] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/24/2024-14:44:43] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # ./trtexec --loadEngine=nafnetcc75_t4_float32_v86.trtmodel --shapes=input:1x1920x1920x3 --device=3
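Regarding the clock-stability warning in the 8.6 run above: one possible way to follow trtexec's suggestion is to lock the GPU clocks with nvidia-smi before benchmarking (requires root; 1590 MHz is assumed here as the T4's maximum SM clock, check nvidia-smi -q -d CLOCK for your board):
nvidia-smi -i 3 -lgc 1590,1590
./trtexec --loadEngine=nafnetcc75_t4_float32_v86.trtmodel --shapes=input:1x1920x1920x3 --device=3 --useSpinWait
nvidia-smi -i 3 -rgc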
Detailed logs:
trt10.log
trt8.6.log
Environment
TensorRT Version: 10.0
NVIDIA GPU: Tesla T4
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
onnx.zip
trt10.zip
trt86.zip
Steps To Reproduce
./trtexec --onnx=color_consistency_nafnet.onnx --saveEngine=nafnetcc75_t4_float32_v10.trtmodel --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --device=3 --minShapes=input:1x64x64x3 --optShapes=input:1x1024x1024x3 --maxShapes=input:1x1920x1920x3
./trtexec --loadEngine=nafnetcc75_t4_float32_v10.trtmodel --shapes=input:1x1920x1920x3 --device=3
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
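As a hedged illustration of that check with this issue's model (input shape taken from the repro commands above; comparing against a dynamic-shape TensorRT engine may additionally need --trt-min-shapes/--trt-opt-shapes/--trt-max-shapes):
polygraphy run color_consistency_nafnet.onnx --onnxrt --trt --input-shapes input:[1,1920,1920,3]
This runs the ONNX model under both ONNX Runtime and TensorRT and compares the outputs, which separates any accuracy regression from the pure speed regression reported here.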