Release v0.3.0 · NVIDIA/TensorRT-RTX-EP-ABI

Wheel packaging, ORT API negotiation, weight streaming, and memory/quantization improvements

Add Python wheel packaging: meta-package plus per-CUDA variants (cu12, cu13), Linux SONAME/symlink handling, and PyPI READMEs
Negotiate the ORT API version with the host so that one DLL serves ONNX Runtime 1.24 through 1.26 and later
Add TensorRT-RTX weight streaming budget support; auto-disable CUDA Graphs when weight streaming is enabled
Improve memory handling: auto-fallback from CUDA async mempool to sync arena and add arena Shrink() to release unused regions
Add policy-driven Q/DQ lowering for asymmetric quantization
Port the EP ABI to C++20 and add Windows-on-Arm cross-compilation via vcpkg
Fix UTF-8 handling for non-ASCII cache paths, crashes in capability discovery and EP-context handling, EPContext external engine resolution, and CUDA 13.x build breaks
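The ORT API negotiation item can be pictured as a version handshake. In ONNX Runtime's C API, `OrtApiBase::GetApi(version)` returns `nullptr` for a version the running runtime does not provide, so a plugin can probe downward from the version it was built against until the host accepts one. The sketch below assumes that scheme and that C API version N corresponds to ONNX Runtime 1.N. The notes do not describe the EP's actual algorithm, and `NegotiateApiVersion` and the `host_supports` callback are hypothetical stand-ins for the real `GetApi` probe.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical stand-in for probing the host with OrtApiBase::GetApi(version):
// returns true if the host can serve that C API version.
using HostSupportsFn = std::function<bool(std::uint32_t)>;

// Picks the highest API version that both the plugin and the host support.
// plugin_min: oldest version whose functions the plugin requires.
// plugin_max: version the plugin was compiled against.
std::optional<std::uint32_t> NegotiateApiVersion(const HostSupportsFn& host_supports,
                                                 std::uint32_t plugin_min,
                                                 std::uint32_t plugin_max) {
  for (std::uint32_t v = plugin_max; v >= plugin_min; --v) {
    if (host_supports(v)) return v;
    if (v == 0) break;  // avoid unsigned wraparound when plugin_min == 0
  }
  return std::nullopt;  // host is older than the plugin's minimum
}
```

After settling on a version, the plugin would also need to avoid calling any `OrtApi` function introduced after it, since those entries do not exist in an older host's table.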
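The weight streaming item implies an option-resolution step in which one setting overrides another. A minimal sketch, assuming hypothetical option names (`weight_streaming_budget_bytes`, `enable_cuda_graph`) because the notes do not give the EP's real option keys:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical EP option set; field names are illustrative only.
struct EpOptions {
  std::optional<std::int64_t> weight_streaming_budget_bytes;  // unset = streaming off
  bool enable_cuda_graph = false;
};

// Resolves conflicting options: when a weight streaming budget is set,
// CUDA Graphs are forced off and a warning is recorded for the caller.
EpOptions ResolveOptions(EpOptions opts, std::vector<std::string>& warnings) {
  const bool streaming = opts.weight_streaming_budget_bytes.has_value();
  if (streaming && opts.enable_cuda_graph) {
    opts.enable_cuda_graph = false;
    warnings.push_back("CUDA Graphs disabled because weight streaming is enabled");
  }
  return opts;
}
```

Disabling the conflicting feature with a warning, rather than rejecting the configuration, matches the "auto-disable" wording in the notes; whether the EP also logs a message is not stated.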
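The memory-handling item combines two behaviors: choosing the sync arena when the async mempool cannot be used, and an arena that can hand idle regions back. The toy model below is a sketch under assumptions. The notes do not say what triggers the fallback, so here it is a capability probe that stands in for a real device query (for example `cudaDeviceGetAttribute` with `cudaDevAttrMemoryPoolsSupported`). No device memory is touched, and `ToyArena` and its region policy are hypothetical rather than the EP's allocator.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

enum class AllocatorKind { kCudaAsyncMempool, kSyncArena };

// Hypothetical selection: prefer the async mempool, fall back to the arena.
AllocatorKind SelectAllocator(const std::function<bool()>& async_pool_supported) {
  return async_pool_supported() ? AllocatorKind::kCudaAsyncMempool
                                : AllocatorKind::kSyncArena;
}

// Toy arena: memory is reserved in regions; allocations bump-allocate inside a
// region. Shrink() releases regions that have no live allocations.
class ToyArena {
 public:
  explicit ToyArena(std::size_t region_bytes) : region_bytes_(region_bytes) {}

  // Serves `bytes` from the first region with room, reserving a new region if
  // none fits. Returns the region id, which the caller passes to Free().
  std::size_t Allocate(std::size_t bytes) {
    for (Region& r : regions_) {
      if (r.used + bytes <= r.capacity) {
        r.used += bytes;
        ++r.live;
        return r.id;
      }
    }
    regions_.push_back({next_id_++, std::max(region_bytes_, bytes), bytes, 1});
    return regions_.back().id;
  }

  // Frees one allocation in region `id`; an emptied region is reset but stays
  // reserved until Shrink() is called.
  void Free(std::size_t id) {
    for (Region& r : regions_) {
      if (r.id != id) continue;
      if (r.live > 0 && --r.live == 0) r.used = 0;
      return;
    }
  }

  // Releases every region with no live allocations; returns bytes released.
  std::size_t Shrink() {
    std::size_t released = 0;
    for (const Region& r : regions_) {
      if (r.live == 0) released += r.capacity;
    }
    regions_.erase(std::remove_if(regions_.begin(), regions_.end(),
                                  [](const Region& r) { return r.live == 0; }),
                   regions_.end());
    return released;
  }

  std::size_t ReservedBytes() const {
    std::size_t total = 0;
    for (const Region& r : regions_) total += r.capacity;
    return total;
  }

 private:
  struct Region {
    std::size_t id, capacity, used, live;
  };
  std::size_t region_bytes_;
  std::size_t next_id_ = 0;
  std::vector<Region> regions_;
};
```

Keeping emptied regions reserved until an explicit Shrink() avoids repeatedly returning and re-reserving memory between inference calls, while still letting the caller reclaim it when needed.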
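Asymmetric quantization means a nonzero zero point `z`. Under the ONNX `QuantizeLinear`/`DequantizeLinear` definitions, `q = saturate(round(x / s) + z)` with round-half-to-even, and `x = (q - z) * s`, so a lowering pass has to decide where the `- z` ends up. The sketch implements those two formulas for `uint8` and adds a policy switch. The notes do not list the EP's actual policies, so `AsymPolicy`, `LoweredForm`, and `LowerDequantize` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// ONNX QuantizeLinear for uint8: q = saturate(round(x / s) + z).
// std::nearbyint rounds half to even under the default FE_TONEAREST mode.
std::uint8_t QuantizeU8(float x, float scale, std::uint8_t zero_point) {
  float q = std::nearbyint(x / scale) + static_cast<float>(zero_point);
  return static_cast<std::uint8_t>(std::clamp(q, 0.0f, 255.0f));
}

// ONNX DequantizeLinear: x = (q - z) * s.
float DequantizeU8(std::uint8_t q, float scale, std::uint8_t zero_point) {
  return static_cast<float>(static_cast<int>(q) - static_cast<int>(zero_point)) * scale;
}

// Hypothetical policy for lowering a DQ node that has a nonzero zero point.
enum class AsymPolicy {
  kKeepQdq,              // pass the Q/DQ pair through unchanged
  kExplicitSubtractMul,  // rewrite DQ as Sub(z) followed by Mul(s)
};

enum class LoweredForm { kQdq, kSubMul, kScaleOnly };

LoweredForm LowerDequantize(std::uint8_t zero_point, AsymPolicy policy) {
  if (zero_point == 0) return LoweredForm::kScaleOnly;  // DQ reduces to q * s
  return policy == AsymPolicy::kKeepQdq ? LoweredForm::kQdq : LoweredForm::kSubMul;
}
```

With `s = 0.1` and `z = 128`, the value `1.0` quantizes to `138` and dequantizes back to `1.0`; inputs far outside the representable range saturate to `0` or `255`.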
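The non-ASCII cache path fix touches a common `std::filesystem` pitfall: constructing a `path` from a narrow `std::string` interprets the bytes in the native narrow encoding, which on Windows is the active ANSI code page rather than UTF-8. A minimal sketch of converting explicitly, assuming cache paths arrive as UTF-8 `std::string` values. The notes do not say how the EP implemented its fix, and `PathFromUtf8` and `PathToUtf8` are hypothetical helpers.

```cpp
#include <filesystem>
#include <string>

// Builds a path from UTF-8 bytes. Going through char8_t (C++20) or u8path
// (C++17) makes the library decode the bytes as UTF-8 instead of using the
// native narrow encoding.
std::filesystem::path PathFromUtf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
  return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
  return std::filesystem::u8path(utf8);
#endif
}

// Converts a path back to UTF-8 bytes stored in a std::string.
std::string PathToUtf8(const std::filesystem::path& p) {
  const auto u8 = p.u8string();  // std::u8string in C++20, std::string in C++17
  return std::string(u8.begin(), u8.end());
}
```

On Linux and macOS the narrow encoding is usually already UTF-8, so the helpers are close to no-ops there; the explicit conversion matters mainly on Windows.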
