v0.3.0

@umangb-09 released this 09 Jun 16:36

Wheel packaging, ORT API negotiation, weight streaming, and memory/quantization improvements

  • Add Python wheel packaging: meta-package plus per-CUDA variants (cu12, cu13),
    Linux SONAME/symlink handling, and PyPI READMEs
  • Negotiate ORT API version with the host so one DLL serves ONNX Runtime 1.24-1.26+
  • Add TensorRT-RTX weight streaming budget support; auto-disable CUDA Graphs when
    weight streaming is enabled
  • Improve memory handling: auto-fallback from CUDA async mempool to sync arena and
    add arena Shrink() to release unused regions
  • Add policy-driven Q/DQ lowering for asymmetric quantization
  • Port the EP ABI to C++20 and add Windows-on-Arm cross-compile via vcpkg
  • Fix UTF-8 handling for non-ASCII cache paths, crashes in capability discovery and
    EPContext handling, EPContext external engine resolution, and CUDA 13.x build breaks
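
The ORT API negotiation item above can be illustrated with a minimal sketch. It is not the EP's actual code: `HostSupportsFn` and `NegotiateApiVersion` are hypothetical names, and the host is modeled as a plain callback that reports whether it can serve a given API version. The idea shown is to start from the newest version the plugin was built against and step down until the host accepts one, giving up below the oldest version the plugin can tolerate.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the host's "can you serve API version N?" query.
// A real host exposes its own entry point; this callback only models the answer.
using HostSupportsFn = bool (*)(std::uint32_t version);

// Returns the highest version in [oldest_supported, built_against] that the
// host accepts, or std::nullopt if the host accepts none of them.
std::optional<std::uint32_t> NegotiateApiVersion(HostSupportsFn host_supports,
                                                 std::uint32_t built_against,
                                                 std::uint32_t oldest_supported) {
  for (std::uint32_t v = built_against; v >= oldest_supported; --v) {
    if (host_supports(v)) {
      return v;  // newest version both sides understand
    }
    if (v == 0) {
      break;  // guard against unsigned wrap-around when oldest_supported is 0
    }
  }
  return std::nullopt;  // no common version: the plugin should refuse to load
}
```

For example, a plugin built against version 26 that tolerates versions down to 24 would settle on 24 when loaded by a host that only serves up to 24, and on 26 when loaded by a newer host. The version numbers here are placeholders chosen for illustration.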

Contributors to this release of TensorRT RTX EP ABI:
@keshavv27, @gedoensmax, @anujj, @ishwar-raut1, @umangb-09, @nitthilan, @yen-shi, @praneshgo, @wenbingl