Dynamo v1.2.1 - Release Notes
Summary
Dynamo v1.2.1 is a patch release on top of v1.2.0, focused on ModelExpress 0.4.0 engine-side model loading (including object-storage model sources), AMD ROCm / Python 3.10 import compatibility, EFA container build fixes, and backend correctness fixes for SGLang and the gpt-oss-120b recipe.
Base Branch: release/1.2.1
Features & Improvements
- ModelExpress 0.4.0 Integration: Added engine-side ModelExpress model loading to the vLLM and SGLang runtimes (#10578). The runtime images now ship `modelexpress==0.4.0` by default (installed with `--no-deps` so the upstream engine dependency stacks are untouched), the new vLLM path is owned by the ModelExpress vLLM plugin, and the legacy Dynamo-owned `--model-express-url` / `MODEL_EXPRESS_URL` wrapper is retained only as deprecated compatibility parsing.
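
One way to confirm that an environment carries the expected pin is to read the installed distribution version from package metadata. The following is a minimal sketch: `check_pin` is a hypothetical helper written for illustration and is not part of Dynamo or ModelExpress; the distribution name and version come from the pin stated above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(dist_name: str, expected: str) -> str:
    """Return 'ok', 'mismatch', or 'missing' for an installed distribution.

    Hypothetical helper for illustration; not part of Dynamo or ModelExpress.
    """
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return "missing"
    return "ok" if installed == expected else "mismatch"


if __name__ == "__main__":
    # Distribution name and version are taken from the pin in these notes.
    print(check_pin("modelexpress", "0.4.0"))
```

Because the check reads distribution metadata only, it should behave the same for a package installed with `--no-deps`.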
Bug Fixes
- ModelExpress Object-Storage Loading: Fixed a model-load stall on the vLLM and SGLang ModelExpress / RunAI Model Streamer object-storage path (`--model s3://…`, also `gs://` / `az://`) (#10674). Avoided constructing the vLLM `ModelConfig` twice (which ran duplicate `__post_init__()` side effects during engine startup) and made `register_model()` use the engine's pulled local directory rather than re-resolving the object-storage URI.
- SGLang Routed-Experts Encoding: Fixed a crash on the first decoded token when `--enable-return-routed-experts` is used with non-DeepSeek-V4 MoE models on SGLang 0.5.11+ (#10543). SGLang v0.5.11 moved the base64 encoding of `routed_experts` upstream into `tokenizer_manager`, so the Dynamo decode handler no longer re-encodes the already-encoded string; `nvext.routed_experts` is emitted as a base64 UTF-8 string at both emit sites.
- ROCm / Python 3.10 Import Compatibility: Fixed two import-time failures that blocked `import dynamo.*` on AMD ROCm / Python 3.10 hosts (#10545). `nixl_connect` now defers the CUDA-only NIXL `ImportError` until first use so the router, planner, and frontend import cleanly on hosts without a NIXL wheel, and `dynamo.common.configuration` uses `typing_extensions.Self` instead of `typing.Self` for Python 3.10. CUDA behavior is unchanged.
- gpt-oss-120b Recipe Revert: Reverted the gpt-oss-120b TensorRT-LLM aggregated recipe to runtime image `1.0.0` to restore the prior recipe baseline (#10549).
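
The deferred-import behavior described in the ROCm / Python 3.10 fix can be illustrated with a small proxy that postpones the import, and therefore any `ImportError`, until an attribute is first accessed. This is a generic sketch of the pattern, not the actual `nixl_connect` code: `LazyModule` and the module name `hypothetical_cuda_only_module` are assumptions made for illustration.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Proxy that imports a module on first attribute access.

    Illustrative only; the real nixl_connect implementation may differ.
    """

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def _load(self) -> ModuleType:
        # Import once, on demand, and cache the result.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError as exc:
                raise ImportError(
                    f"{self._name} is required for this feature but is not installed"
                ) from exc
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Invoked only for names not found on the proxy itself,
        # i.e. members of the wrapped module.
        return getattr(self._load(), attr)


# Constructing the proxy never fails, even when the module is absent.
missing = LazyModule("hypothetical_cuda_only_module")  # placeholder name

try:
    missing.some_function
except ImportError as exc:
    print(f"deferred failure: {exc}")

# A module that is present behaves normally through the proxy.
json_proxy = LazyModule("json")
print(json_proxy.dumps({"ok": True}))
```

With this shape, components that never touch the optional module import cleanly, while code paths that do need it still fail with a clear error at the point of use.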
Build, CI and Test
- EFA Container Build Fixes: Backported two EFA container build fixes to the release branch (#10425): bumped `nixl_gdrcopy_ref` to v2.5.2 for Linux kernel ≥6.15 compatibility, and built and overlaid libfabric v2.5.1 (the first release with the CUDA dmabuf fix for GB200 EFA) onto the EFA installer's stock binary. Together these restore the Dynamo EFA RDMA container build.
Key Dependencies
| Dependency | Version |
|---|---|
| ModelExpress | 0.4.0 |
| SGLang runtime | 0.5.11 |
| typing_extensions | >=4.10.0 |
| libfabric (EFA overlay) | v2.5.1 |
| NIXL GDRCopy | v2.5.2 |
Backend runtime versions (vLLM, TensorRT-LLM, CUDA) are unchanged from v1.2.0.
Known Issues
- SGLang ModelExpress peer-to-peer: SGLang ModelExpress peer-to-peer transfer is not included in this release; the bundled SGLang 0.5.11 runtime does not carry the required upstream support.