Dynamo v1.2.1

@dagil-nvidia released this 13 Jun 18:48
919682d

Dynamo v1.2.1 - Release Notes

Summary

Dynamo v1.2.1 is a patch release on top of v1.2.0, focused on ModelExpress 0.4.0 engine-side model loading (including object-storage model sources), AMD ROCm / Python 3.10 import compatibility, EFA container build fixes, and backend correctness fixes for SGLang and the gpt-oss-120b recipe.

Base Branch: release/1.2.1

Features & Improvements

  • ModelExpress 0.4.0 Integration: Added engine-side ModelExpress model loading to the vLLM and SGLang runtimes (#10578). The runtime images now ship modelexpress==0.4.0 by default (installed with --no-deps so the upstream engine dependency stacks are untouched), the new vLLM path is owned by the ModelExpress vLLM plugin, and the legacy Dynamo-owned --model-express-url / MODEL_EXPRESS_URL wrapper is retained only as deprecated compatibility parsing.

Bug Fixes

  • ModelExpress Object-Storage Loading: Fixed a model-load stall on the vLLM and SGLang ModelExpress / RunAI Model Streamer object-storage path (--model s3://…, also gs:// / az://) (#10674). The fix no longer constructs the vLLM ModelConfig twice (the second construction ran duplicate __post_init__() side effects during engine startup), and register_model() now uses the engine's pulled local directory rather than re-resolving the object-storage URI.
  • SGLang Routed-Experts Encoding: Fixed a crash on the first decoded token when --enable-return-routed-experts is used with non-DeepSeek-V4 MoE models on SGLang 0.5.11+ (#10543). SGLang v0.5.11 moved the base64 encoding of routed_experts upstream into tokenizer_manager, so the Dynamo decode handler no longer re-encodes the already-encoded string; nvext.routed_experts is emitted as a base64 UTF-8 string at both emit sites.
  • ROCm / Python 3.10 Import Compatibility: Fixed two import-time failures that blocked import dynamo.* on AMD ROCm / Python 3.10 hosts (#10545). nixl_connect now defers the CUDA-only NIXL ImportError until first use so the router, planner, and frontend import cleanly on hosts without a NIXL wheel, and dynamo.common.configuration uses typing_extensions.Self instead of typing.Self for Python 3.10. CUDA behavior is unchanged.
  • gpt-oss-120b Recipe Revert: Reverted the gpt-oss-120b TensorRT-LLM aggregated recipe to runtime image 1.0.0 to restore the prior recipe baseline (#10549).
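
The routed-experts fix above comes down to encoding the payload exactly once. The following is a minimal sketch of that idea, not the Dynamo decode handler itself: the function name routed_experts_to_wire and its input types are hypothetical, chosen only for illustration. It treats a str as already base64-encoded upstream and encodes only raw bytes. As background, Python's base64.b64encode rejects str input with a TypeError; these notes do not say whether that was the exact failure in #10543.

```python
import base64


def routed_experts_to_wire(value):
    """Return routed-experts data as a base64 UTF-8 string, encoding at most once.

    Hypothetical helper: `value` is either raw bytes (nothing upstream has
    encoded it yet) or a str (the engine already base64-encoded it).
    """
    if isinstance(value, str):
        # Already encoded upstream: pass it through unchanged.
        return value
    # Raw bytes: encode exactly once, then decode to a UTF-8 string.
    return base64.b64encode(bytes(value)).decode("utf-8")


raw = bytes([0, 1, 2, 3])
encoded_once = routed_experts_to_wire(raw)
# Feeding the already-encoded string back in leaves it untouched.
encoded_again = routed_experts_to_wire(encoded_once)
print(encoded_once, encoded_again == encoded_once)
```

A consumer can recover the original bytes with base64.b64decode on the emitted string.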
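
The two techniques named in the ROCm / Python 3.10 fix, deferring an ImportError until first use and falling back to the typing_extensions backport of Self, can be sketched generically as below. This is an illustration under assumptions, not the code in nixl_connect or dynamo.common.configuration: LazyModule, Config, and the module name "hypothetical_nixl_wheel" are invented for the example.

```python
import importlib

try:
    from typing import Self  # available on Python 3.11+
except ImportError:
    from typing_extensions import Self  # backport needed on Python 3.10


class LazyModule:
    """Hypothetical proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not set in __init__, i.e. module members.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError as exc:
                raise ImportError(
                    f"{self._name} is required for this feature but is not installed"
                ) from exc
        return getattr(self._module, attr)


class Config:
    """Hypothetical class showing a Self-typed fluent setter."""

    def __init__(self):
        self.name = ""

    def with_name(self, name: str) -> Self:
        self.name = name
        return self


# Creating the proxy never imports anything, so importing this file succeeds
# even when the optional dependency is absent.
nixl = LazyModule("hypothetical_nixl_wheel")
print(Config().with_name("router").name)
```

With this shape, components that never touch the optional module import cleanly, and the ImportError surfaces only at the first call that actually needs it.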

Build, CI and Test

  • EFA Container Build Fixes: Backported two EFA container build fixes to the release branch (#10425): bumped nixl_gdrcopy_ref to v2.5.2 for Linux kernel ≥6.15 compatibility, and built and overlaid libfabric v2.5.1 (the first release with the CUDA dmabuf fix for GB200 EFA) onto the EFA installer's stock binary. Together these restore the Dynamo EFA RDMA container build.

Key Dependencies

Dependency                 Version
ModelExpress               0.4.0
SGLang runtime             0.5.11
typing_extensions          >=4.10.0
libfabric (EFA overlay)    v2.5.1
NIXL GDRCopy               v2.5.2

Backend runtime versions (vLLM, TensorRT-LLM, CUDA) are unchanged from v1.2.0.
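
To confirm that an environment matches the Python-level entries in the table above, a small check along these lines may help. It is a sketch under assumptions: the distribution names modelexpress, sglang, and typing_extensions are assumed to be the installed package names, the helper names are hypothetical, and the version parsing is deliberately simplistic (it compares only the first three numeric components). libfabric and GDRCopy are native libraries and are not covered.

```python
import re
from importlib import metadata

# Assumed distribution names mapped to (operator, version) from the table.
EXPECTED = {
    "modelexpress": ("==", "0.4.0"),
    "sglang": ("==", "0.5.11"),
    "typing_extensions": (">=", "4.10.0"),
}


def version_tuple(text):
    """Simplified parse: first three runs of digits, e.g. '0.5.11' -> (0, 5, 11)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def check_dependencies(lookup=metadata.version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for dist, (op, wanted) in EXPECTED.items():
        try:
            found = lookup(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist}: not installed (expected {op}{wanted})")
            continue
        have, want = version_tuple(found), version_tuple(wanted)
        ok = have == want if op == "==" else have >= want
        if not ok:
            problems.append(f"{dist}: found {found}, expected {op}{wanted}")
    return problems


for problem in check_dependencies():
    print(problem)
```

The lookup parameter exists so the check can be exercised without the packages installed; for anything beyond simple dotted versions, a full version parser such as packaging.version would be the more robust choice.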

Known Issues

  • SGLang ModelExpress peer-to-peer: SGLang ModelExpress peer-to-peer transfer is not included in this release; the bundled SGLang 0.5.11 runtime does not carry the required upstream support.