Wheel packaging, ORT API negotiation, weight streaming, and memory/quantization improvements

- Add Python wheel packaging: meta-package plus per-CUDA variants (cu12, cu13),
  Linux SONAME/symlink handling, and PyPI READMEs
- Negotiate ORT API version with the host so one DLL serves ONNX Runtime 1.24-1.26+
- Add TensorRT-RTX weight streaming budget support; auto-disable CUDA Graphs when
  weight streaming is enabled
- Improve memory handling: auto-fallback from CUDA async mempool to sync arena and
  add arena Shrink() to release unused regions
- Add policy-driven Q/DQ lowering for asymmetric quantization
- Port the EP ABI to C++20 and add Windows-on-Arm cross-compile via vcpkg
- Fix UTF-8 handling for non-ASCII cache paths, capability-discovery and EP-context
  crashes, EPContext external engine resolution, and CUDA 13.x build breaks

Contributors to this release of TensorRT RTX EP ABI:
@keshavv27, @gedoensmax, @anujj, @ishwar-raut1, @umangb-09, @nitthilan, @yen-shi, @praneshgo, @wenbingl