TensorRT-RTX Execution Provider for ONNXRuntime v1.23.2 (onnxruntime-trt-rtx)

@anskumar01 released this 02 Dec 12:29
458e1bb

onnxruntime-trt-rtx (ORT TRT RTX)

The NVIDIA TensorRT-RTX Execution Provider (EP) is an inference deployment solution designed specifically for NVIDIA RTX GPUs. It is optimized for client-centric use cases.

TensorRT RTX EP provides the following benefits:

  • Small package footprint: Optimized resource usage on end-user systems at just under 200 MB.
  • Faster model compile and load times: Leverages just-in-time compilation to build RTX hardware-optimized engines on end-user devices in seconds.
  • Portability: Seamlessly use cached models across multiple RTX GPUs.

The TensorRT RTX EP leverages NVIDIA’s new deep learning inference engine, TensorRT for RTX, to accelerate ONNX models on RTX GPUs.

New Feature Support in ORT TRT RTX

  • CUDA Graph
  • EPContext
  • GPU I/O Binding
  • Optimized Memory Management
  • Reduced CPU overhead

For details on these features, their performance gains, and how to integrate them, refer to the NV TensorRT RTX EP documentation.
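As an illustration of the GPU I/O Binding feature listed above, the following is a minimal sketch built on ONNX Runtime's generic `io_binding` API. The helper name `run_with_io_binding` is hypothetical, and the use of `"cuda"` as the output device type with this EP is an assumption; check the EP documentation for the device names it actually supports.

```python
def run_with_io_binding(session, cpu_inputs, output_device="cuda"):
    """Run `session` once, letting ONNX Runtime allocate outputs on the GPU.

    session       -- an onnxruntime.InferenceSession (already created)
    cpu_inputs    -- dict mapping input names to NumPy arrays in host memory
    output_device -- device type for outputs; "cuda" is an assumption here
    """
    binding = session.io_binding()

    # Inputs start on the CPU; ONNX Runtime copies them to the device.
    for name, array in cpu_inputs.items():
        binding.bind_cpu_input(name, array)

    # Ask ONNX Runtime to allocate every output on the chosen device,
    # which avoids an implicit device-to-host copy after each run.
    for output in session.get_outputs():
        binding.bind_output(output.name, output_device)

    session.run_with_iobinding(binding)

    # Copy results back to host memory only when they are needed there.
    return binding.copy_outputs_to_cpu()
```

When several models are chained, the outputs can stay on the device between runs; the final copy to the CPU is only needed at the end of the pipeline.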

Getting Started

Run via Python

  1. Install ONNXRuntime with TensorRT-RTX support
pip install onnxruntime-trt-rtx
  2. Set up ONNXRuntime
import onnxruntime as ort
ort.preload_dlls()  # Preload the CUDA Runtime DLL
trt_rtx_provider_options = {<specify_options>}
session = ort.InferenceSession(model_path, providers=[('NvTensorRtRtxExecutionProvider', trt_rtx_provider_options)])
  3. Try out the Python samples

Note: Python versions 3.10 to 3.13 are supported.
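The setup step above can be extended with a CPU fallback for machines where the TensorRT-RTX EP is not available. This is a sketch, not the package's prescribed usage: `build_provider_list` and `create_session` are hypothetical helper names, and the provider options are left empty because their names are not given here.

```python
TRT_RTX_EP = "NvTensorRtRtxExecutionProvider"
CPU_EP = "CPUExecutionProvider"


def build_provider_list(available, trt_rtx_options=None):
    """Return providers in priority order, keeping only those available."""
    providers = []
    if TRT_RTX_EP in available:
        # A (name, options) tuple attaches options to a single provider.
        providers.append((TRT_RTX_EP, dict(trt_rtx_options or {})))
    if CPU_EP in available:
        providers.append(CPU_EP)
    return providers


def create_session(model_path, trt_rtx_options=None):
    """Create an InferenceSession that prefers TRT RTX and falls back to CPU."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    ort.preload_dlls()  # preload the CUDA Runtime DLL, as in the setup step
    providers = build_provider_list(ort.get_available_providers(), trt_rtx_options)
    return ort.InferenceSession(model_path, providers=providers)
```

After creating a session, `session.get_providers()` reports which providers were actually registered, which is a quick way to confirm that the TRT RTX EP was picked up.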

Run via C++

  1. Download and extract ONNXRuntime TRT RTX SDK: onnxruntime-trt-rtx-win-x64-1.23.2.zip

  2. Sample Usage

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "SampleApp");
Ort::SessionOptions session_options;
session_options.AppendExecutionProvider(onnxruntime::kNvTensorRTRTXExecutionProvider, {});
Ort::Session session(env, model_path, session_options);
  3. Try out the C++ samples and follow their build and execution steps

Note: The core ONNXRuntime changes in this package are tested on NVIDIA GPUs only.

Build ONNXRuntime TRT RTX from Source

Pre-requisites

1. Clone the Git Repository

git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime

2. Build ONNXRuntime with TRT RTX EP

#Windows
.\build.bat --config Release --build_dir <build_path> --parallel --use_nv_tensorrt_rtx --tensorrt_rtx_home "path\to\tensorrt-rtx" --cuda_home "path\to\cuda\home" --cmake_generator "Visual Studio 17 2022" --build_shared_lib --skip_tests --build --build_wheel --wheel_name_suffix=trt-rtx --update --use_vcpkg --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=80;86;89;120"

#Linux
./build.sh --config Release --build_dir <build_path> --parallel --use_nv_tensorrt_rtx --tensorrt_rtx_home "path/to/tensorrt-rtx" --cuda_home "path/to/cuda/home" --build_shared_lib --skip_tests --build --build_wheel --wheel_name_suffix=trt-rtx --update --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=80;86;89;120"
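The two build commands above differ only in the script name and in two Windows-only options (`--cmake_generator` and `--use_vcpkg`). The sketch below assembles either command as an argument list; `build_command` is a hypothetical helper, and every flag is taken verbatim from the commands above.

```python
# Flags shared by the Windows and Linux commands shown above.
COMMON_FLAGS = [
    "--parallel",
    "--use_nv_tensorrt_rtx",
    "--build_shared_lib",
    "--skip_tests",
    "--build",
    "--build_wheel",
    "--wheel_name_suffix=trt-rtx",
    "--update",
]
CUDA_ARCHS = "80;86;89;120"


def build_command(build_dir, trt_rtx_home, cuda_home, windows):
    """Return the build command as a list of arguments."""
    cmd = [
        ".\\build.bat" if windows else "./build.sh",
        "--config", "Release",
        "--build_dir", build_dir,
        "--tensorrt_rtx_home", trt_rtx_home,
        "--cuda_home", cuda_home,
        *COMMON_FLAGS,
    ]
    if windows:
        # Only the Windows command sets a generator and enables vcpkg.
        cmd += ["--cmake_generator", "Visual Studio 17 2022", "--use_vcpkg"]
    cmd += ["--cmake_extra_defines", f"CMAKE_CUDA_ARCHITECTURES={CUDA_ARCHS}"]
    return cmd
```

Keeping the command as a list means the semicolons in the architecture list are never interpreted by a shell; the list could be passed to `subprocess.run(..., check=True)` from the repository root.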

3. Access the binaries and wheels from <build_path>/Release/Release

#Python
pip install <build_path>/Release/Release/dist/onnxruntime_trt_rtx*.whl

#C++
#Copy the TRT RTX DLLs and the CUDA Runtime DLL to the working directory
#Run unit tests
<build_path>\Release\Release\onnxruntime_test_all.exe --gtest_filter=*NvExecutionProviderTest.*

More details: Build ORT from source

Contributions

Contributors to this release of the ONNXRuntime TRT RTX EP:
@gaugarg-nv, @keshavv27, @gedoensmax, @anujj, @ishwar-raut1, @umangb-09, @thevishalagarwal, @anskumar01