Request for Direct Integration of TensorRT Execution Provider for INT8 Inference Acceleration #1368

Open
Efan-Cui opened this issue Apr 2, 2025 · 0 comments

Efan-Cui commented Apr 2, 2025

I would like to propose an enhancement to the onnxruntime_genai package. Currently, while it is possible to configure execution providers via the extra_options parameter, there is no built-in support for directly integrating TensorRT through the TensorrtExecutionProvider to enable INT8 inference acceleration.
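For reference, this is roughly what the manual route looks like today with plain ONNX Runtime (outside of onnxruntime_genai). It is a minimal sketch using the documented TensorrtExecutionProvider options; the model path, calibration table name, and cache directory are placeholders:

```python
# Sketch of manually enabling TensorRT INT8 with plain onnxruntime.
# "model.onnx", the calibration table name, and the cache path are placeholders.
import onnxruntime as ort

trt_options = {
    "trt_int8_enable": True,                                       # use INT8 kernels
    "trt_int8_calibration_table_name": "calibration.flatbuffers",  # calibration data
    "trt_engine_cache_enable": True,                               # reuse built engines
    "trt_engine_cache_path": "./trt_cache",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",   # fallback for nodes TensorRT cannot handle
        "CPUExecutionProvider",
    ],
)
```

The enhancement requested below would let onnxruntime_genai apply this kind of configuration itself, so users would not have to wire it up by hand.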

My request is to modify onnxruntime_genai so that it can directly integrate TensorRT via the TensorrtExecutionProvider. Specifically, the enhancement should:

Provide a built-in option to configure TensorRT EP parameters (for example, automatically setting options such as "trt_int8_enable": "1") to facilitate INT8 inference acceleration (see the sketch after this list).

Ensure that when this option is enabled, the necessary TensorRT EP configurations are applied without requiring manual intervention.

Update the documentation accordingly so that users can easily leverage TensorRT’s INT8 acceleration for their inference tasks.
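For illustration only, here is a minimal sketch of how the requested option might be exposed through the Python API, assuming the provider-configuration methods on onnxruntime_genai's Config object in recent releases. The "tensorrt" provider name and the handling of the INT8 option are exactly the hypothetical behavior this issue asks for; the model directory is a placeholder:

```python
# Hypothetical usage of the requested feature: "tensorrt" as a provider name and
# automatic handling of trt_int8_enable do not exist today and are the proposal itself.
import onnxruntime_genai as og

config = og.Config("/path/to/model_dir")             # placeholder model directory
config.clear_providers()
config.append_provider("tensorrt")                   # requested: select the TensorRT EP
config.set_provider_option("tensorrt", "trt_int8_enable", "1")  # requested: INT8 switch

model = og.Model(config)
tokenizer = og.Tokenizer(model)
```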

Implementing this feature would simplify the process for users who need high-performance, low-precision inference and would greatly enhance the usability of onnxruntime_genai in production scenarios.

Thank you for your consideration and your ongoing work on this project.
