[Performance]: is the performance of the PyTorch backend on par with the previous engine compilation setup? #8564
Closed
Labels
- General perf: Broad performance issues not specific to a particular component
- Investigating
- Performance: TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.
- Pytorch: Pytorch backend related issues
Description
Proposal to improve performance
I do not fully understand the move to the PyTorch backend, which is now said to be the default. Does this mean engine compilation is going away?
Also, does the PyTorch backend deliver performance on par with the previous engine-building workflow?
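For context, a rough sketch of the two workflows being compared. The model name is a placeholder, and the exact flag names are assumptions that may differ between TensorRT-LLM versions; check `trtllm-serve --help` and `trtllm-build --help` for your install.

```shell
# Sketch only -- flag names and defaults vary by TensorRT-LLM version.

# PyTorch backend (reported default in recent releases): no separate
# engine-compilation step, the HF checkpoint is loaded directly.
trtllm-serve meta-llama/Llama-3.1-8B-Instruct --backend pytorch

# Previous workflow: build a TensorRT engine first, then serve it.
trtllm-build --checkpoint_dir ./converted_ckpt --output_dir ./engine_dir
trtllm-serve ./engine_dir
```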
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
System Information:
- OS:
- Python version:
- CUDA version:
- GPU model(s):
- Driver version:
- TensorRT version:
- PyTorch version:
- TensorRT-LLM version:
Detailed output:
Paste the output of the above commands here
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.