
[Performance]: is the performance of the PyTorch backend on par with the previous engine compilation setup? #8564

@JoJoLev

Description


Proposal to improve performance

I do not think I really understand the move to the PyTorch backend; it is said to be the default now. Does this mean that engine compilation is going away?
Also, does the PyTorch backend deliver performance comparable to the previous engine-building workflow?
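One way to answer the performance question for a specific model and workload is to time both paths directly. Below is a minimal, hedged timing harness in pure Python; the `generate_fn` callables are placeholders that you would wire up to the actual generate calls of each backend (e.g. the PyTorch-backend `LLM` API versus a prebuilt engine runner). The stand-in lambda at the bottom is purely illustrative.

```python
import time
from statistics import mean, stdev

def benchmark(generate_fn, prompts, warmup=2, iters=5):
    """Time a generation callable over a fixed prompt set.

    generate_fn is a placeholder for either backend's generation call;
    replace it with the real TensorRT-LLM invocation you want to measure.
    """
    for _ in range(warmup):
        # Warm-up runs are excluded from timing (JIT, caches, allocator).
        generate_fn(prompts)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        generate_fn(prompts)
        samples.append(time.perf_counter() - start)
    return {"mean_s": mean(samples), "stdev_s": stdev(samples)}

# Stand-in "backend" for illustration only; swap in real generate calls.
result = benchmark(lambda p: [s.upper() for s in p], ["hello", "world"])
print(result)
```

Running the same prompt set with identical sampling parameters through both backends and comparing the resulting means (with stdev as a sanity check on noise) gives an apples-to-apples latency comparison for your environment.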

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s):
  • Driver version:
  • TensorRT version:
  • PyTorch version:
  • TensorRT-LLM version:

Detailed output:

Paste the output of the above commands here

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • General perf<NV>: Broad performance issues not specific to a particular component
  • Investigating
  • Performance: TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.
  • Pytorch<NV>: Pytorch backend related issues
