[Performance]: is the performance of the PyTorch backend on par with the previous engine compilation setup? #8564
Closed
Labels
- General perf: Broad performance issues not specific to a particular component
- Investigating
- Performance: TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.
- Pytorch: Pytorch backend related issues
Description
Proposal to improve performance
I do not fully understand the move to the PyTorch backend, which is now said to be the default. Does this mean engine compilation is going away?
Also, does the PyTorch backend deliver performance on par with the previous engine-building workflow?
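For context, a rough sketch of the two workflows being compared. The model name is a placeholder, and the exact flag names are assumptions that may differ between TensorRT-LLM versions; check `trtllm-serve --help` and `trtllm-build --help` for your install.

```shell
# Sketch only -- flag names and defaults vary by TensorRT-LLM version.

# PyTorch backend (reported default in recent releases): no separate
# engine-compilation step, the HF checkpoint is loaded directly.
trtllm-serve meta-llama/Llama-3.1-8B-Instruct --backend pytorch

# Previous workflow: build a TensorRT engine first, then serve it.
trtllm-build --checkpoint_dir ./converted_ckpt --output_dir ./engine_dir
trtllm-serve ./engine_dir
```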
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
System Information:
- OS:
- Python version:
- CUDA version:
- GPU model(s):
- Driver version:
- TensorRT version:
- PyTorch version:
- TensorRT-LLM version:
Detailed output:
Paste the output of the above commands here
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.