I would like to propose an enhancement to the onnxruntime_genai package. Currently, while it is possible to configure execution providers via the extra_options parameter, there is no built-in support for directly integrating TensorRT through the TensorrtExecutionProvider to enable INT8 inference acceleration.
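For context, this is a minimal sketch of how the TensorRT EP and its INT8 options are configured today when using ONNX Runtime directly rather than onnxruntime_genai. The provider option names shown (`trt_int8_enable`, `trt_int8_calibration_table_name`, `trt_engine_cache_enable`) are standard TensorRT EP options; the model path is a placeholder.

```python
import onnxruntime as ort

# Configure the TensorRT EP manually, with CUDA as a fallback for any
# operators TensorRT cannot handle.
providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_int8_enable": "1",               # enable INT8 precision
            # INT8 typically also needs a calibration table, e.g.:
            # "trt_int8_calibration_table_name": "calibration.flatbuffers",
            "trt_engine_cache_enable": "1",       # cache built engines to avoid rebuilds
        },
    ),
    "CUDAExecutionProvider",
]

session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
```

The request below is essentially for onnxruntime_genai to surface this configuration as a first-class option rather than requiring users to wire it up themselves.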
My request is to modify onnxruntime_genai so that it can directly integrate TensorRT via the TensorrtExecutionProvider. Specifically, the enhancement should:

- Provide a built-in option to configure TensorRT EP parameters (for example, automatically setting options like "trt_int8_enable": "1") to facilitate INT8 inference acceleration, as sketched below.
- Ensure that when this option is enabled, the necessary TensorRT EP configurations are applied without requiring manual intervention.
- Update the documentation accordingly so that users can easily leverage TensorRT's INT8 acceleration for their inference tasks.
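A rough sketch of how such a built-in option might look from the user's side. This is a proposal, not existing API: the provider name "tensorrt" and the exact method names are assumptions for illustration.

```python
import onnxruntime_genai as og

# Hypothetical usage of the requested feature: select the TensorRT EP and
# enable INT8 through the genai configuration, instead of hand-editing
# provider options.
config = og.Config("path/to/model_dir")                          # placeholder path
config.clear_providers()
config.append_provider("tensorrt")                               # hypothetical provider name
config.set_provider_option("tensorrt", "trt_int8_enable", "1")   # hypothetical option pass-through

model = og.Model(config)
```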
Implementing this feature would simplify the process for users who need high-performance, low-precision inference and would greatly enhance the usability of onnxruntime_genai in production scenarios.
Thank you for your consideration and your ongoing work on this project.