onnxruntime-trt-rtx (ORT TRT RTX)
The NVIDIA TensorRT-RTX Execution Provider (EP) is an inference deployment solution designed specifically for NVIDIA RTX GPUs. It is optimized for client-centric use cases.
TensorRT RTX EP provides the following benefits:
- Small package footprint: Optimized resource usage on end-user systems at just under 200 MB.
- Faster model compile and load times: Leverages just-in-time compilation to build RTX hardware-optimized engines on end-user devices in seconds.
- Portability: Seamlessly use cached models across multiple RTX GPUs.
The TensorRT RTX EP leverages NVIDIA’s new deep learning inference engine, TensorRT for RTX, to accelerate ONNX models on RTX GPUs.
New Feature Support in ORT TRT RTX
- CUDA Graph
- EPContext
- GPU I/O Binding
- Optimized Memory Management
- Reduced CPU overhead
For details on these features, their performance gains, and how to integrate them, refer to the NV TensorRT RTX EP documentation.
Getting Started
Run via Python
- Install ONNXRuntime with TensorRT-RTX support
pip install onnxruntime-trt-rtx
- Set up ONNXRuntime
import onnxruntime as ort
ort.preload_dlls()  # Preload the CUDA runtime DLLs
trt_rtx_provider_options = {}  # Specify provider options here
session = ort.InferenceSession(model_path, providers=[('NvTensorRtRtxExecutionProvider', trt_rtx_provider_options)])
- Try out the Python samples
Note: Python versions 3.10 to 3.13 are supported.
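If the TRT RTX EP is not available on a given machine (for example, when a different ONNXRuntime wheel is installed), it can help to build the providers list defensively. The following is a minimal sketch, not part of the package: the helper names `choose_providers` and `create_session` are hypothetical, and falling back to `CPUExecutionProvider` is an assumption about what an application would want rather than documented behavior of this release.

```python
from __future__ import annotations

from typing import Any

TRT_RTX_EP = "NvTensorRtRtxExecutionProvider"  # provider name used in the setup snippet
CPU_EP = "CPUExecutionProvider"                # ONNX Runtime's built-in CPU provider


def choose_providers(available: list[str],
                     trt_rtx_options: dict[str, Any] | None = None) -> list:
    """Hypothetical helper: prefer the TRT RTX EP, always keep CPU as a fallback."""
    providers: list = []
    if TRT_RTX_EP in available:
        # Provider-specific options are passed as a (name, options) tuple.
        providers.append((TRT_RTX_EP, dict(trt_rtx_options or {})))
    providers.append(CPU_EP)
    return providers


def create_session(model_path: str, trt_rtx_options: dict[str, Any] | None = None):
    """Hypothetical helper: create a session with the providers chosen above."""
    import onnxruntime as ort  # imported lazily so choose_providers works without it

    ort.preload_dlls()  # preload the CUDA runtime DLLs, as in the setup snippet
    providers = choose_providers(ort.get_available_providers(), trt_rtx_options)
    return ort.InferenceSession(model_path, providers=providers)
```

Keeping the selection logic in a pure function makes it easy to check without a GPU; `create_session` only needs to be called on a machine where the wheel is installed.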
Run via C++
- Download and extract ONNXRuntime TRT RTX SDK: onnxruntime-trt-rtx-win-x64-1.23.2.zip
- Sample Usage
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "SampleApp");
Ort::SessionOptions session_options;
session_options.AppendExecutionProvider(onnxruntime::kNvTensorRTRTXExecutionProvider, {});
Ort::Session session(env, model_path, session_options);
- Try out the C++ samples and follow their build and execution steps.
Note: The core ONNXRuntime changes in this package are tested on NVIDIA GPUs only.
Build ONNXRuntime TRT RTX from Source
Pre-requisites
- Install git, CMake (>=3.18), Python (>=3.10)
- Install the latest NVIDIA driver
- Install CUDA toolkit 12.9
- Install TensorRT RTX 1.2
- For Windows only, install Visual Studio
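Before starting a build, the tool-version prerequisites above can be checked with a short script. This is a rough sketch under stated assumptions: the function names are hypothetical, it assumes `cmake --version` prints a dotted version number, and it covers only git, CMake, and Python. The NVIDIA driver, CUDA toolkit, TensorRT RTX, and Visual Studio still need to be verified separately.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'cmake version 3.27.1' -> (3, 27, 1)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions component by component using tuple ordering."""
    return found >= minimum


def tool_version(command: str) -> tuple[int, ...] | None:
    """Run '<command> --version'; return None if the tool is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, "--version"],
                            capture_output=True, text=True, check=False)
    return parse_version(result.stdout or result.stderr)


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed tool prerequisites appear to be satisfied."""
    cmake = tool_version("cmake")
    return {
        "git": shutil.which("git") is not None,
        "cmake>=3.18": cmake is not None and meets_minimum(cmake, (3, 18)),
        "python>=3.10": sys.version_info[:2] >= (3, 10),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING or too old'}")
```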
1. Clone the Git Repository
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
2. Build ONNXRuntime with TRT RTX EP
#Windows
.\build.bat --config Release --build_dir <build_path> --parallel --use_nv_tensorrt_rtx --tensorrt_rtx_home "path\to\tensorrt-rtx" --cuda_home "path\to\cuda\home" --cmake_generator "Visual Studio 17 2022" --build_shared_lib --skip_tests --build --build_wheel --wheel_name_suffix=trt-rtx --update --use_vcpkg --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=80;86;89;120"
#Linux
./build.sh --config Release --build_dir <build_path> --parallel --use_nv_tensorrt_rtx --tensorrt_rtx_home "path/to/tensorrt-rtx" --cuda_home "path/to/cuda/home" --build_shared_lib --skip_tests --build --build_wheel --wheel_name_suffix=trt-rtx --update --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=80;86;89;120"
3. Access the binaries and wheels from <build_path>/Release/Release
#Python
pip install <build_path>/Release/Release/dist/onnxruntime_trt_rtx*.whl
#C++
#Copy TRT RTX dlls and CUDA Runtime dll to working directory
#Run unit tests
.\build\Release\Release\onnxruntime_test_all.exe --gtest_filter=*NvExecutionProviderTest.*
More details: Build ORT from source
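The Windows and Linux build commands in step 2 share most of their flags; the Windows one adds `--cmake_generator` and `--use_vcpkg`. For scripted builds, the command line could be assembled per platform as in the sketch below. The function `build_command` and its parameters are hypothetical; every flag is copied from the commands above, and nothing here is part of the ONNXRuntime build tooling itself.

```python
from __future__ import annotations

import platform

# GPU architectures taken verbatim from the build commands in step 2.
CUDA_ARCHS = "80;86;89;120"


def build_command(build_dir: str, tensorrt_rtx_home: str, cuda_home: str,
                  system: str | None = None) -> list[str]:
    """Hypothetical helper: return the build script invocation as an argument list."""
    is_windows = (system or platform.system()) == "Windows"
    cmd = [
        ".\\build.bat" if is_windows else "./build.sh",
        "--config", "Release",
        "--build_dir", build_dir,
        "--parallel",
        "--use_nv_tensorrt_rtx",
        "--tensorrt_rtx_home", tensorrt_rtx_home,
        "--cuda_home", cuda_home,
    ]
    if is_windows:
        # Only the Windows command specifies a CMake generator.
        cmd += ["--cmake_generator", "Visual Studio 17 2022"]
    cmd += [
        "--build_shared_lib", "--skip_tests", "--build", "--build_wheel",
        "--wheel_name_suffix=trt-rtx", "--update",
    ]
    if is_windows:
        # Only the Windows command uses vcpkg.
        cmd.append("--use_vcpkg")
    cmd += ["--cmake_extra_defines", f"CMAKE_CUDA_ARCHITECTURES={CUDA_ARCHS}"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, as in the commands above; replace with real locations.
    print(" ".join(build_command("<build_path>", "path/to/tensorrt-rtx", "path/to/cuda/home")))
```

The resulting list is intended to be run from the root of the cloned repository, for example by passing it to `subprocess.run`.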
Contributions
Contributors to this release of the ONNXRuntime TRT RTX EP:
@gaugarg-nv, @keshavv27, @gedoensmax, @anujj, @ishwar-raut1, @umangb-09, @thevishalagarwal, @anskumar01