v1.3.0

@nv-nmailhot released this 15 Jun 23:33
· 5 commits to main since this release
5949ccf

1.3.0

Summary

NIXL 1.3.0 expands platform reach and backend capabilities. It adds AMD ROCm/HIP support for AMD Instinct GPUs (MI300X, MI325X, MI350X, MI355X), including nixlbench. The core build now targets C++20, which enables modern C++ features and provides a stronger foundation for future development; anyone compiling NIXL or its plugins from source now needs a C++20-capable toolchain. NIXL 1.3.0 also broadens the storage ecosystem: a new DDN Infinia backend joins the object-storage family, the obj plugin can now auto-register vendor backends without factory changes, and path-based file registration is now supported across all file-based backends.

Secondary updates focus on performance and reliability. Azure Blob Storage memory queries now run in parallel, the Python binding releases the global interpreter lock (GIL) during transfer-request creation to reduce multithreaded contention, and a pre-allocated telemetry buffer improves hot-path performance. The telemetry schema is simplified, descriptor-list paths gain batched bulk removal and an empty-section leak fix, and the benchmark tools (nixlbench/kvbench) include several correctness improvements.

Major Features

  • AMD ROCm/HIP support: Added AMD ROCm/HIP build support for AMD Instinct GPUs (gfx942: MI300X, MI325X; gfx950: MI350X, MI355X), including hardware-info detection plumbing and a follow-up enabling the same support for nixlbench. (#1642, #1647)
  • Move to C++20: Switched the core and plugins to the C++20 standard and updated the plugin READMEs accordingly. Downstream builds-from-source now require a C++20-capable toolchain. (#1571)
  • DDN Infinia backend plugin: Added a new NIXL backend plugin for DDN Infinia storage. (#1569)
  • Path-based file registration for all FILE_SEG backends: Callers can now declare files by path in nixlBlobDesc::metaInfo (<modes>:<path> with ro/rw access and direct/sync/noatime/create flags); backends open the file in registerMem and close it in deregisterMem. Wired through a shared src/utils/file/file_path_mode helper into POSIX, HF3FS, CUDA_GDS, and GDS_MT. Strictly additive — unknown tokens fall back to the existing fd-in-devId mode. (#1635)
  • Object plugin vendor backend registry: Replaced the #ifdef ladder in obj_backend.cpp with a self-registration pattern in which accelerated/vendor engines register themselves via objAccelEngineRegistrar, so new object-storage engines can be added without modifying the factory. (#1550)
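
The <modes>:<path> convention described for path-based file registration can be illustrated with a small parser. This is a minimal Python sketch, not the src/utils/file/file_path_mode helper itself: the function name parse_meta_info, the comma separator between mode tokens, and the requirement of exactly one access token are assumptions made for illustration. Only the ro/rw access tokens, the direct/sync/noatime/create flags, and the fallback on unknown tokens come from the notes above.

```python
# Hypothetical sketch of parsing a "<modes>:<path>" metaInfo string.
# Assumption: mode tokens are comma-separated; the notes do not state the separator.
ACCESS_TOKENS = {"ro", "rw"}
FLAG_TOKENS = {"direct", "sync", "noatime", "create"}


def parse_meta_info(meta_info, sep=","):
    """Return (access, flags, path), or None to signal the fd-in-devId fallback."""
    # Split on the first colon only, so paths containing ':' stay intact.
    modes, colon, path = meta_info.partition(":")
    if not colon or not modes or not path:
        return None

    tokens = modes.split(sep)
    access = [t for t in tokens if t in ACCESS_TOKENS]
    flags = [t for t in tokens if t in FLAG_TOKENS]

    # Any unknown token (or a missing/duplicated access token, by assumption)
    # means the string is not treated as a path declaration.
    if len(access) != 1 or len(access) + len(flags) != len(tokens):
        return None
    return access[0], frozenset(flags), path


parsed = parse_meta_info("rw,direct,create:/data/kv.bin")
fallback = parse_meta_info("bogus:/data/kv.bin")
```

Returning None rather than raising mirrors the "strictly additive" behavior: an unrecognized string leaves the caller on the existing fd-in-devId path.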

API Changes

  • [Telemetry] Slimmed telemetry event schema: Removed the redundant category field from telemetry events, simplifying telemetry_event.h and the backend telemetry surface. Consumers parsing telemetry events should drop the category field. (#1649)
  • [NIXL EP] Refactor rank and expert semantics: Added new fields, a public mask-update capability, and host-side tracking of active ranks for elastic rank handling. Dispatch/combine now accept an active-rank bound and an experts-per-rank parameter; internal buffers and layouts are updated to use the active-range model. Removed the legacy mask-clean API. (#1693)

Enhancements

Performance

  • [Azure] Parallelized memory query for the AZURE_BLOB plugin: Batched memory queries now issue HEAD requests in parallel (mirroring the OBJ plugin), improving initial blob-existence checks for integrations such as KV-cache lookups in LMCache. (#1721)
  • [Python] Release the GIL during makeXferReq: The Python binding now releases the GIL while building transfer requests, reducing contention for multithreaded callers. (#1712)
  • [Telemetry] Pre-allocate the event buffer: The telemetry event buffer is pre-allocated to avoid reallocation on the hot path. (#1719)
  • [Core] Batched descriptor-list removal: remDescList and removeLocalData now remove descriptors in bulk instead of one-at-a-time, eliminating the previous O(N*M) per-deregister cost. (#1597)
  • [Core] Use C++20 [[likely]]/[[unlikely]]: Replaced __builtin_expect with the standard C++20 branch-prediction attributes across the core, UCX backend, and nixlbench. (#1714)
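
The parallel existence check described for the AZURE_BLOB plugin can be sketched as a fan-out over a thread pool. This is a Python illustration of the general technique only, not the plugin's implementation (which is C++). The names query_exists_parallel and fake_head and the in-memory store are hypothetical stand-ins for a real HEAD request.

```python
# Hypothetical sketch: issue per-key existence checks concurrently
# instead of one at a time, preserving input order in the result.
from concurrent.futures import ThreadPoolExecutor


def query_exists_parallel(keys, head_fn, max_workers=8):
    """Return one boolean per key, in the same order as `keys`."""
    if not keys:
        return []
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which request finishes first.
        return list(pool.map(head_fn, keys))


# Stand-in for a blob store; a real check would send an HTTP HEAD request.
_fake_store = {"kv/block-0", "kv/block-2"}


def fake_head(key):
    return key in _fake_store


results = query_exists_parallel(
    ["kv/block-0", "kv/block-1", "kv/block-2"], fake_head
)
```

With network-bound checks, total latency approaches that of the slowest request rather than the sum of all requests, which is where the improvement for batched lookups comes from.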

Networking & Backend

  • [Azure] Updated CA certificate discovery: Expanded the list of CA certificate file paths the Azure Blob client checks, improving TLS trust-store discovery across environments. (#1694)

NIXL-EP

  • Removed a declared-but-undefined method: Cleaned up a method without a definition in the device/EP example. (#1684)

Packaging & Distribution

  • nixl_ep wheel packs CUDA in a separate namespace: The nixl_ep wheel now packages the bundled CUDA libraries under a distinct namespace (with a load fallback), avoiding collisions with other CUDA installations. (#1727)
  • Wheel build excludes DDN partner libraries: Extended the auditwheel --exclude list in build-wheel.sh so DDN partner libraries are not vendored into the wheel. (#1733)

Benchmarks

  • [nixlbench] AMD ROCm/HIP build support: Enabled building nixlbench for AMD ROCm/HIP as a follow-up to the core AMD support. (#1647)
  • [nixlbench] Use max_block_size for object-storage buffer size: Sized the object-storage buffer from max_block_size. (#1636)
  • [nixlbench] Fixed object-storage device-ID collisions across threads: Resolved colliding device IDs when multiple threads target object storage. (#1638)
  • [nixlbench] Fixed deallocation memory ordering: Corrected the order in which memory is deallocated. (#1590)
  • [nixlbench] Fixed a missing closing bracket in print output: Repaired malformed benchmark print output. (#1725)
  • [kvbench] Fixed OBJ backend configuration and buffer-size setup: Corrected the OBJ backend configuration and buffer-size initialization in kvbench. (#1549)

Bugfixes

  • [Core] Fixed addElement using a hardcoded VRAM_SEG lookup: mem_section no longer assumes VRAM_SEG when adding an element. (#1634)
  • [Core] Erase empty sections in removeLocalData: Empty section map entries are now removed, preventing sectionMap/memToBackend from growing without bound in long-running register/deregister workloads. (#1597)
  • [Core] Fixed remDescList returning NIXL_ERR_NOT_FOUND when len=0: Zero-length descriptor lists are handled correctly. (#1551)
  • [POSIX] Fixed backend queue fallback handling: Corrected fallback handling in the POSIX backend transfer queue. (#1605)
  • [Telemetry] Fixed DOCA exporter build: Added nixl_common_dep so tomlplusplus resolves for the DOCA telemetry plugin, which otherwise failed to compile after common/configuration.h began including toml++/toml.hpp directly. (#1640)
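
The descriptor-list changes (bulk removal, erasing empty sections, and zero-length lists) can be pictured with a toy model. This is a Python sketch under stated assumptions, not NIXL's mem_section code: the SectionMap class, its key shape, and the tuple descriptors are hypothetical, and treating a zero-length list as a successful no-op is an assumption about what "handled correctly" means.

```python
# Hypothetical model of a section map keyed by (memory type, backend).
class SectionMap:
    def __init__(self):
        self.sections = {}  # key -> list of descriptor tuples

    def add(self, key, descs):
        self.sections.setdefault(key, []).extend(descs)

    def remove_bulk(self, key, descs):
        """Remove all `descs` in one pass; return False if any is missing."""
        if not descs:
            return True  # assumption: an empty list is a no-op, not "not found"
        current = self.sections.get(key)
        if current is None:
            return False
        to_remove = set(descs)
        if not to_remove.issubset(current):
            return False
        # One filtering pass over the section instead of one scan per descriptor.
        remaining = [d for d in current if d not in to_remove]
        if remaining:
            self.sections[key] = remaining
        else:
            # Erase the now-empty entry so the map does not grow without bound.
            del self.sections[key]
        return True


key = ("DRAM_SEG", "UCX")
sm = SectionMap()
sm.add(key, [(0, 64, 0), (64, 64, 0), (128, 64, 0)])
first = sm.remove_bulk(key, [(0, 64, 0), (128, 64, 0)])
empty_ok = sm.remove_bulk(key, [])
second = sm.remove_bulk(key, [(64, 64, 0)])
```

After the last call the key is gone from the map entirely, which is the property that keeps long-running register/deregister workloads from accumulating empty entries.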

Known Issues

  • [POSIX] File-path mode has a double-free issue. (#1766)

Full Changelog: 1.2.0...1.3.0