Release v1.6.1 · NVIDIA/cloudai

New Changes

Added support for the following workloads:
- vLLM - LLM serving benchmark support with Slurm execution, disaggregated prefill/decode mode, multi-node serving, reporting, DSE metrics, and NIXL-related options
- SGLang - LLM serving benchmark support sharing the common vLLM/SGLang serving flow, reporting, health checks, and multi-node execution
- NIXL EP - NIXL Expert Parallelism workload with Slurm command generation, log parsing, reporting, and tests
Added DSE reporting, including richer visualization of design-space exploration results and best-configuration selection
Added report generation for MegatronRun and OSU benchmarks
Added support for CNI specification configuration for NCCL and AI Dynamo workloads on Kubernetes

Backward Compatibility Notes

AI Dynamo configuration schema
- Worker settings now use explicit prefill_worker and decode_worker blocks with nested args.
- Older fields such as prefill-cmd, decode-cmd, top-level worker parallelism keys, run_script, and huggingface_home_container_path should be migrated to the new schema.
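
One way to audit existing test configurations before upgrading is to scan them for the retired field names. The following is a minimal sketch and not part of CloudAI: `find_legacy_keys` is a hypothetical helper, and the nesting of the sample dictionary is invented for illustration. Only the legacy names come from these notes. The top-level worker parallelism keys are not named here, so the sketch does not try to detect them.

```python
from typing import Any, Iterator

# Legacy AI Dynamo field names listed in these release notes.
LEGACY_KEYS = {
    "prefill-cmd",
    "decode-cmd",
    "run_script",
    "huggingface_home_container_path",
}


def find_legacy_keys(config: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of legacy keys found anywhere in a parsed config."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if key in LEGACY_KEYS:
            yield path
        if isinstance(value, dict):
            # Recurse into nested tables so deeply nested fields are found too.
            yield from find_legacy_keys(value, path)


# Hypothetical parsed configuration; the layout is illustrative only.
old_config = {
    "cmd_args": {
        "run_script": "run.sh",
        "dynamo": {"prefill-cmd": "...", "decode-cmd": "..."},
    }
}

legacy_paths = sorted(find_legacy_keys(old_config))
print(legacy_paths)
```

An empty result suggests that none of the listed legacy names remain, although it says nothing about whether the new prefill_worker and decode_worker blocks are filled in correctly.
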
Megatron-Bridge configuration schema
- model_family_name and model_recipe_name replace the earlier model_name and model_size fields.
- time_limit is now taken from the test run rather than cmd_args.
- A Megatron-Bridge git repo only overrides the container copy when mount_as = "/opt/Megatron-Bridge" is set.
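
The mount rule above can be expressed as a small predicate. This is a sketch under stated assumptions rather than CloudAI code: `overrides_container_copy` is a hypothetical helper, the repo entries are plain dictionaries, and the `url` values are placeholders. Only the `mount_as` key and its value are taken from these notes.

```python
from typing import Any, Mapping

# Mount target quoted in these release notes.
CONTAINER_BRIDGE_PATH = "/opt/Megatron-Bridge"


def overrides_container_copy(git_repo: Mapping[str, Any]) -> bool:
    """Return True only when the repo entry is mounted over the container copy."""
    return git_repo.get("mount_as") == CONTAINER_BRIDGE_PATH


# Hypothetical repo entries; the URLs are placeholders.
mounted = {
    "url": "https://example.com/Megatron-Bridge.git",
    "mount_as": "/opt/Megatron-Bridge",
}
not_mounted = {"url": "https://example.com/Megatron-Bridge.git"}

print(overrides_container_copy(mounted))      # True
print(overrides_container_copy(not_mounted))  # False
```
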
Custom workload implementations
- Custom workloads that override constraint_check(self, tr) should update the method signature to accept the new system argument.
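
A custom override might be updated along the following lines. This is an illustrative sketch only: `StubSystem`, `StubTestRun`, and `MyWorkload` are stand-ins invented here, and the position, name, and type of the new `system` parameter are assumptions. Check the base class in the CloudAI source for the actual signature.

```python
from dataclasses import dataclass


@dataclass
class StubSystem:
    """Stand-in for a CloudAI system object; the field is illustrative."""

    gpus_per_node: int


@dataclass
class StubTestRun:
    """Stand-in for a CloudAI test run; the field is illustrative."""

    num_nodes: int


class MyWorkload:
    """Hypothetical custom workload with the updated override."""

    # Before this release the override took only the test run:
    #     def constraint_check(self, tr) -> bool
    def constraint_check(self, tr: StubTestRun, system: StubSystem) -> bool:
        # Example constraint: reject runs that need more than 16 GPUs in total.
        return tr.num_nodes * system.gpus_per_node <= 16


workload = MyWorkload()
print(workload.constraint_check(StubTestRun(num_nodes=2), StubSystem(gpus_per_node=8)))  # True
print(workload.constraint_check(StubTestRun(num_nodes=4), StubSystem(gpus_per_node=8)))  # False
```
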

LLM Serving Improvements

CloudAI now includes first-class support for vLLM and SGLang serving workloads. The implementation includes shared serving infrastructure, Slurm command generation, result reporting, disaggregated prefill/decode support, two-node serving flows, custom health check endpoints, and more robust startup, shutdown, and cleanup handling. vLLM also supports DSE metrics, NIXL thread options, boolean flag handling, and constraint checks.

Megatron and Megatron-Bridge Improvements

Megatron-Bridge support was updated for r0.3.0 recipes and gained improved configuration handling. GPU counts can be derived from the system configuration, time limits are managed by the test run, VP parameters are handled more reliably, and status checks reduce false passes. MegatronRun now has report generation support and improved success detection, including timeout handling.

NIXL, Kubernetes, and Networking

NIXL support gained a new EP workload, updated CLI argument handling, support for separate ETCD containers, improved ETCD failure handling, and safer mount cleanup. Installables received fixes for nested Docker image paths and submodules. Kubernetes support was improved with CNI spec handling for NCCL and AI Dynamo, while NCCL Kubernetes tests were refactored for better reuse and temporary-resource management.

Reporting, Configuration, and Parsing

Reporting now includes DSE reports, OSU benchmark reports, MegatronRun reports, and reward override support for constraint failures. Configuration handling is more robust with improved duplicate-key errors, system config detection, path expansion/storage, first-sweep messaging, and agent configuration/caching updates.

Architecture, Reliability, and Tooling

Job monitoring no longer relies on asyncio, heavy imports are blocked at module level, and command shell checks no longer run during object creation. Slurm handling was improved around node exclusion, reservation nodes, GPU resource requests, and propagation of extra Slurm arguments. Tooling was refreshed with pre-commit, updated CI workflows, uv usage in CI, Node 24-compatible GitHub Actions, broader tests organized by system/workload, and dependency updates.

Documentation

Documentation was expanded for vLLM, SGLang, NIXL EP, Systems, workload requirements, reporting, troubleshooting, and tutorial/user guide content. Workload pages and release configurations were updated to match the new workloads and configuration flows.

All Changed

Bump to v1.6 + upgrade dependencies by @amaslenn in #798
Upgrade GitHub Actions to latest versions by @salmanmkc in #751
Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #750
Ban "heavy" imports on module level by @amaslenn in #801
Remove asyncio usage in jobs monitoring by @amaslenn in #796
Bump pillow from 12.1.0 to 12.1.1 by @dependabot[bot] in #802
Add report generation strategy for the MegatronRun by @juntaowww in #787
Fix accidentally reverted version bump by @amaslenn in #805
Add support for running vLLM by @amaslenn in #799
Unit-tests per system/workload by @podkidyshev in #808
Fix nsys subfield merging behavior by @juntaowww in #795
Add support for setting NIXL num threads for vLLM CLI by @amaslenn in #809
Fix base_tr fixture dependency by @podkidyshev in #810
Fixes CLOUDAI-15: Updated copyright check by @podkidyshev in #811
Add report generation for OSU Benchmark by @allkoow in #807
Single sbatch + NIXL + ETCD issues by @podkidyshev in #812
Support separate ETCD container for NIXL workloads by @amaslenn in #813
Yet another attempt on the right copyright by @podkidyshev in #815
Refactor NCCL k8s test cases to improve re-use and temp resources management by @amaslenn in #817
Support DSE metrics for vLLM by @amaslenn in #816
Agent configs by @podkidyshev in #818
AI Dynamo updates by @karya0 in #814
Avoid silent failure when commit hash is invalid by @juntaowww in #820
Warning on using first sweep by @podkidyshev in #822
Update CLI args format for NIXL bench by @amaslenn in #823
Fix commit verification: commit/branch/tag support by @podkidyshev in #824
Megatron-Bridge updates by @podkidyshev in #821
pre-commit by @podkidyshev in #827
Add documentation for Systems by @amaslenn in #826
Bump werkzeug from 3.1.5 to 3.1.6 by @dependabot[bot] in #828
Address doc issues by @amaslenn in #831
Use uv in ci by @podkidyshev in #835
Bump tornado from 6.5.4 to 6.5.5 by @dependabot[bot] in #833
Add SGLang workload by @amaslenn in #834
Merge common part of vLLM and SGLang by @amaslenn in #836
NIXL update: filepath and device_list by @podkidyshev in #829
Agents caching by @podkidyshev in #837
Add support for x2 nodes serving for vLLM and SGLang by @amaslenn in #839
Megatron-Bridge r0.3.0 enhancement by @juntaowww in #830
Avoid real system calls by @amaslenn in #842
Do not run CommandShell check during object creation by @amaslenn in #843
Cleanup NIXL file mounts by @podkidyshev in #840
Formatting changes by @RulaHallak in #838
Add NIXL EP workload by @amaslenn in #845
DSE reporting by @podkidyshev in #846
Support CNI spec for NCCL over k8s by @amaslenn in #848
Bump requests from 2.32.5 to 2.33.0 by @dependabot[bot] in #852
MBridge: time limit managed by test run by @podkidyshev in #849
CNI spec support for Dynamo @ k8s by @amaslenn in #854
MBridge: using gpus-per-node from system by @podkidyshev in #847
Update CODEOWNERS by @amaslenn in #856
VLLM: boolean flags and constraints by @podkidyshev in #857
Allow profiling ranks in string format with comma as separator by @juntaowww in #855
MBridge: fix vp parameter handling by @podkidyshev in #858
Bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #853
MBridge: revert metrics parsing by @podkidyshev in #862
Installables: nested docker image path by @podkidyshev in #861
Megatron Run: status check by @podkidyshev in #859
Fix path expansion/storage by @amaslenn in #864
Constraint failure reward override by @alexmanle in #865
Amanley/reward overrides by @alexmanle in #869
MegatronRun: fix .load test + allow timeouts by @podkidyshev in #866
Bump pytest from 9.0.2 to 9.0.3 by @dependabot[bot] in #871
Bump pillow from 12.1.1 to 12.2.0 by @dependabot[bot] in #870
Bump uv from 0.10.0 to 0.11.6 by @dependabot[bot] in #867
Installables: submodules fix by @podkidyshev in #872
Fix broken duplicate test name detection in TestParser.parse_all() by @rutayan-nv in #875
Parsing: enhance error handling by @podkidyshev in #876
fix various vllm/sglang bugs by @podkidyshev in #877
vLLM, SGLang: fix long server start by @podkidyshev in #879
vLLM, SGLang: cleanup fix for single-sbatch by @podkidyshev in #880
Parsing: fix system config detection by @podkidyshev in #881
vLLM, SGLang: custom healthcheck endpoint by @podkidyshev in #882
fix secret scan false positive by @podkidyshev in #883

New Contributors

@salmanmkc made their first contribution in #751

Full Changelog: v1.5.0...v1.6.1
