Request for Direct Integration of TensorRT Execution Provider for INT8 Inference Acceleration #1368

Open
Efan-Cui opened this issue Apr 2, 2025 · 0 comments

Efan-Cui commented Apr 2, 2025

I would like to propose an enhancement to the onnxruntime_genai package. Currently, while it is possible to configure execution providers via the extra_options parameter, there is no built-in support for directly integrating TensorRT through the TensorrtExecutionProvider to enable INT8 inference acceleration.
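For reference, this is roughly what the manual route looks like today with plain ONNX Runtime (outside of onnxruntime_genai). It is a minimal sketch using the documented TensorrtExecutionProvider options; the model path, calibration table name, and cache directory are placeholders:

```python
# Sketch of manually enabling TensorRT INT8 with plain onnxruntime.
# "model.onnx", the calibration table name, and the cache path are placeholders.
import onnxruntime as ort

trt_options = {
    "trt_int8_enable": True,                                       # use INT8 kernels
    "trt_int8_calibration_table_name": "calibration.flatbuffers",  # calibration data
    "trt_engine_cache_enable": True,                               # reuse built engines
    "trt_engine_cache_path": "./trt_cache",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",   # fallback for nodes TensorRT cannot handle
        "CPUExecutionProvider",
    ],
)
```

The enhancement requested below would let onnxruntime_genai apply this kind of configuration itself, so users would not have to wire it up by hand.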

My request is to modify onnxruntime_genai so that it can directly integrate TensorRT via the TensorrtExecutionProvider. Specifically, the enhancement should:

Provide a built-in option to configure TensorRT EP parameters (for example, automatically setting options such as "trt_int8_enable": "1") to facilitate INT8 inference acceleration (see the sketch after this list).

Ensure that when this option is enabled, the necessary TensorRT EP configurations are applied without requiring manual intervention.

Update the documentation accordingly so that users can easily leverage TensorRT’s INT8 acceleration for their inference tasks.
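For illustration only, here is a minimal sketch of how the requested option might be exposed through the Python API, assuming the provider-configuration methods on onnxruntime_genai's Config object in recent releases. The "tensorrt" provider name and the handling of the INT8 option are exactly the hypothetical behavior this issue asks for; the model directory is a placeholder:

```python
# Hypothetical usage of the requested feature: "tensorrt" as a provider name and
# automatic handling of trt_int8_enable do not exist today and are the proposal itself.
import onnxruntime_genai as og

config = og.Config("/path/to/model_dir")             # placeholder model directory
config.clear_providers()
config.append_provider("tensorrt")                   # requested: select the TensorRT EP
config.set_provider_option("tensorrt", "trt_int8_enable", "1")  # requested: INT8 switch

model = og.Model(config)
tokenizer = og.Tokenizer(model)
```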

Implementing this feature would simplify the process for users who need high-performance, low-precision inference and would greatly enhance the usability of onnxruntime_genai in production scenarios.

Thank you for your consideration and your ongoing work on this project.
