1.3.0
Summary
NIXL 1.3.0 expands platform reach and backend capabilities. It adds AMD ROCm/HIP support for AMD Instinct GPUs (MI300X, MI325X, MI350X, MI355X), including nixlbench. The core build now targets C++20, which enables modern C++ features and gives future development a stronger foundation; anyone compiling NIXL or its plugins from source now needs a C++20-capable toolchain. NIXL 1.3.0 also broadens the storage ecosystem: a new DDN Infinia backend joins the object-storage family, the obj plugin can now auto-register vendor backends without factory changes, and path-based file registration is now supported across all file-based backends.
Secondary updates focus on performance and reliability. Azure Blob Storage paths get faster through parallel memory queries, the Python binding releases the GIL during transfer-request creation to reduce multithreaded contention, and a pre-allocated telemetry event buffer improves hot-path performance. The telemetry schema is simplified, descriptor-list removal is now batched and no longer leaves empty sections behind, and the benchmark tools (nixlbench and kvbench) receive several correctness fixes.
Major Features
- AMD ROCm/HIP support: Added AMD ROCm/HIP build support for AMD Instinct GPUs (gfx942: MI300X, MI325X; gfx950: MI350X, MI355X), including hardware-info detection plumbing and a follow-up that enables the same support for nixlbench. (#1642, #1647)
- Move to C++20: Switched the core and plugins to the C++20 standard and updated the plugin READMEs accordingly. Building from source now requires a C++20-capable toolchain. (#1571)
- DDN Infinia backend plugin: Added a new NIXL backend plugin for DDN Infinia storage. (#1569)
- Path-based file registration for all FILE_SEG backends: Callers can now declare files by path in nixlBlobDesc::metaInfo using the form <modes>:<path>, with ro/rw access and direct/sync/noatime/create flags; backends open the file in registerMem and close it in deregisterMem. The feature is wired through a shared src/utils/file/file_path_mode helper into POSIX, HF3FS, CUDA_GDS, and GDS_MT. It is strictly additive: unknown tokens fall back to the existing fd-in-devId mode. (#1635)
- Object plugin vendor backend registry: Replaced the #ifdef ladder in obj_backend.cpp with a self-registration pattern in which accelerated/vendor engines register themselves via objAccelEngineRegistrar, so new object-storage engines can be added without modifying the factory. (#1550)
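The <modes>:<path> metaInfo format can be illustrated with a small parser. This is a hypothetical sketch, not the shared file_path_mode helper: FilePathSpec and parseFilePathMeta are invented names, the sketch assumes mode tokens are comma-separated (the separator is not stated above), and it assumes a Linux toolchain where O_DIRECT and O_NOATIME are available. Any unrecognized token yields std::nullopt, modeling the fallback to the fd-in-devId mode.

```cpp
#include <cassert>
#include <fcntl.h>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical parse result; not the NIXL file_path_mode type.
struct FilePathSpec {
    std::string path;
    int openFlags;
};

// Parses "<modes>:<path>", ASSUMING comma-separated mode tokens.
// Splits at the first ':' so paths may themselves contain ':'.
// Returns std::nullopt for malformed input or any unknown token, so a caller
// could fall back to the fd-in-devId mode.
std::optional<FilePathSpec> parseFilePathMeta(const std::string& meta) {
    const auto colon = meta.find(':');
    if (colon == std::string::npos || colon + 1 == meta.size()) {
        return std::nullopt;
    }

    int access = -1;  // set by "ro" or "rw"; if both appear, the last one wins
    int extra = 0;    // optional open(2) flags
    std::istringstream modes(meta.substr(0, colon));
    std::string tok;
    while (std::getline(modes, tok, ',')) {
        if (tok == "ro") access = O_RDONLY;
        else if (tok == "rw") access = O_RDWR;
        else if (tok == "direct") extra |= O_DIRECT;
        else if (tok == "sync") extra |= O_SYNC;
        else if (tok == "noatime") extra |= O_NOATIME;
        else if (tok == "create") extra |= O_CREAT;
        else return std::nullopt;  // unknown token: treat as "not a path descriptor"
    }
    if (access < 0) {
        return std::nullopt;  // an access mode is required
    }
    return FilePathSpec{meta.substr(colon + 1), access | extra};
}
```

A backend following this pattern would pass openFlags to open(2) in its registration path and close the descriptor on deregistration, as the release note describes for registerMem and deregisterMem.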
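The self-registration pattern behind objAccelEngineRegistrar can be sketched generically. All names below (ObjEngine, EngineRegistrar, engineRegistry, createEngine) are hypothetical stand-ins rather than the object plugin's actual types; the point is that each engine registers itself from its own translation unit, so the factory reduces to a registry lookup with no per-vendor #ifdef.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical engine interface; illustrative only, not the NIXL API.
struct ObjEngine {
    virtual ~ObjEngine() = default;
    virtual std::string name() const = 0;
};

using EngineFactory = std::function<std::unique_ptr<ObjEngine>()>;

// Function-local static: constructed on first use, which avoids the
// static-initialization-order problem between translation units.
inline std::map<std::string, EngineFactory>& engineRegistry() {
    static std::map<std::string, EngineFactory> registry;
    return registry;
}

// Constructing a registrar at namespace scope adds the engine during static init.
struct EngineRegistrar {
    EngineRegistrar(const std::string& type, EngineFactory factory) {
        engineRegistry().emplace(type, std::move(factory));
    }
};

// The factory only consults the registry; returns nullptr for unknown types.
inline std::unique_ptr<ObjEngine> createEngine(const std::string& type) {
    auto it = engineRegistry().find(type);
    return it == engineRegistry().end() ? nullptr : it->second();
}

// A vendor engine registers itself; in a real tree this would sit in its own source file.
namespace {
struct VendorEngine : ObjEngine {
    std::string name() const override { return "vendor"; }
};
const EngineRegistrar vendorRegistrar{"vendor", [] { return std::make_unique<VendorEngine>(); }};
}  // namespace
```

One caveat of this pattern is that a linker may drop an otherwise unreferenced object file from a static library, which would skip that file's registrar; shared-library plugins or explicit linking avoid the issue.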
API Changes
- [Telemetry] Slimmed telemetry event schema: Removed the redundant category field from telemetry events, simplifying telemetry_event.h and the backend telemetry surface. Consumers that parse telemetry events should stop reading the category field. (#1649)
- [NIXL EP] Refactored rank and expert semantics: Added new fields, a public mask-update capability, and host-side tracking of active ranks for elastic rank handling. Dispatch/combine now take an active-rank bound and an experts-per-rank parameter, and internal buffers and layouts now use the active-range model. Removed the legacy mask-clean API. (#1693)
Enhancements
Performance
- [Azure] Parallelized memory query for the AZURE_BLOB plugin: Batched memory queries now issue HEAD requests in parallel (mirroring the OBJ plugin), improving initial blob-existence checks for integrations such as KV-cache lookups in LMCache. (#1721)
- [Python] Release the GIL during makeXferReq: The Python binding now releases the GIL while building transfer requests, reducing contention for multithreaded callers. (#1712)
- [Telemetry] Pre-allocate the event buffer: The telemetry event buffer is pre-allocated to avoid reallocation on the hot path. (#1719)
- [Core] Batched descriptor-list removal: remDescList and removeLocalData now remove descriptors in bulk instead of one at a time, eliminating the previous O(N*M) per-deregister cost. (#1597)
- [Core] Use C++20 [[likely]]/[[unlikely]]: Replaced __builtin_expect with the standard C++20 branch-prediction attributes across the core, UCX backend, and nixlbench. (#1714)
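The parallel memory query can be pictured as fanning out one existence probe per key and gathering the results in input order. This is an illustration under assumptions, not the AZURE_BLOB plugin code: ExistsProbe is a hypothetical stand-in for a HEAD request, and the sketch launches one std::async task per key, whereas a production implementation would likely bound concurrency.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical probe: returns true if the blob exists.
// In a real plugin this would wrap a HEAD request to the storage service.
using ExistsProbe = std::function<bool(const std::string&)>;

// Issues every probe concurrently, then collects results in the same order as `keys`.
std::vector<bool> queryAllParallel(const std::vector<std::string>& keys, const ExistsProbe& probe) {
    std::vector<std::future<bool>> pending;
    pending.reserve(keys.size());
    for (const auto& key : keys) {
        // std::launch::async forces a new thread rather than deferred execution.
        pending.push_back(std::async(std::launch::async, probe, key));
    }
    std::vector<bool> present;
    present.reserve(keys.size());
    for (auto& f : pending) {
        present.push_back(f.get());  // blocks until that probe completes
    }
    return present;
}
```

Total latency then tracks the slowest single probe rather than the sum of all probes, which is why parallel HEAD requests help bulk existence checks.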
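The switch from __builtin_expect to the standard attributes looks like this on a toy function (sumOrZero is a hypothetical example, not NIXL code). The attribute attaches to the statement of the branch and, unlike __builtin_expect, is standard C++20 rather than a GCC/Clang extension; compilers treat it as an optimization hint only.

```cpp
#include <cassert>
#include <cstddef>

// Before (compiler extension):  if (__builtin_expect(len == 0, 0)) { return 0; }
// After (standard C++20 attribute on the branch statement):
inline int sumOrZero(const int* data, std::size_t len) {
    if (len == 0) [[unlikely]] {
        return 0;  // rare case, laid out off the hot path
    }
    int total = 0;
    for (std::size_t i = 0; i < len; ++i) {
        total += data[i];
    }
    return total;
}
```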
Networking & Backend
- [Azure] Updated CA certificate discovery: Expanded the list of CA certificate file paths the Azure Blob client checks, improving TLS trust-store discovery across environments. (#1694)
NIXL-EP
- Removed a declared-but-undefined method: Removed a method declaration that had no definition in the device/EP example. (#1684)
Packaging & Distribution
- nixl_ep wheel packs CUDA in a separate namespace: The nixl_ep wheel now packages the bundled CUDA libraries under a distinct namespace (with a load fallback), avoiding collisions with other CUDA installations. (#1727)
- Wheel build excludes DDN partner libraries: Extended the auditwheel --exclude list in build-wheel.sh so DDN partner libraries are not vendored into the wheel. (#1733)
Benchmarks
- [nixlbench] AMD ROCm/HIP build support: Enabled building nixlbench for AMD ROCm/HIP as a follow-up to the core AMD support. (#1647)
- [nixlbench] Use max_block_size for object-storage buffer size: Sized the object-storage buffer from max_block_size. (#1636)
- [nixlbench] Fixed object-storage device-ID collisions across threads: Resolved colliding device IDs when multiple threads target object storage. (#1638)
- [nixlbench] Fixed deallocation memory ordering: Corrected the order in which memory is deallocated. (#1590)
- [nixlbench] Fixed a missing closing bracket in print output: Repaired malformed benchmark print output. (#1725)
- [kvbench] Fixed OBJ backend configuration and buffer-size setup: Corrected the OBJ backend configuration and buffer-size initialization in kvbench. (#1549)
Bugfixes
- [Core] Fixed addElement using a hardcoded VRAM_SEG lookup: mem_section no longer assumes VRAM_SEG when adding an element. (#1634)
- [Core] Erase empty sections in removeLocalData: Empty section map entries are now removed, preventing sectionMap/memToBackend from growing without bound in long-running register/deregister workloads. (#1597)
- [Core] Fixed remDescList returning NIXL_ERR_NOT_FOUND when len=0: Zero-length descriptor lists are handled correctly. (#1551)
- [POSIX] Fixed backend queue fallback handling: Corrected fallback handling in the POSIX backend transfer queue. (#1605)
- [Telemetry] Fixed DOCA exporter build: Added nixl_common_dep so tomlplusplus resolves for the DOCA telemetry plugin, which otherwise failed to compile after common/configuration.h began including toml++/toml.hpp directly. (#1640)
Known Issues
- [POSIX] File-path mode has a double-free issue. (#1766)
Full Changelog: 1.2.0...1.3.0