Release v1.1.0
Highlights
Container
| Software Component | Version |
|---|---|
| NeMo-RL | v0.6.0 |
| NeMo-Skills | 0229040 (commit) |
| vLLM (eval/SDG) | 0.18.1 |
| vLLM (GRPO) | 0.17.1 |
| sglang | v0.5.10.post1 |
GRPO Multi-Environment Training
Two-environment GRPO pipeline with split configs to prevent cross-environment leaks:
- `equivalence_llm_judge`: FSDP v2 backend, 16 GPUs
- `finance_sec_search`: Megatron backend with YaRN (131K context), 64 GPUs
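The 131K figure for `finance_sec_search` follows from simple YaRN arithmetic. The notes do not name the base model for this environment, so the sketch below assumes a native context of 32,768 tokens, which is what Qwen3 models ship with. The helper name `yarn_rope_scaling` is hypothetical. The keys follow the Hugging Face `rope_scaling` convention shown in Qwen3 model cards, not Megatron's own config schema.

```python
# Hypothetical helper: derive a YaRN rope_scaling block for a target context length.
# Assumption: the base model's native context is 32,768 tokens (as for Qwen3 models).


def yarn_rope_scaling(target_context: int, native_context: int = 32_768) -> dict:
    """Return a Hugging Face-style rope_scaling dict for YaRN context extension."""
    if target_context <= native_context:
        raise ValueError("YaRN is only needed when extending beyond the native context")
    # YaRN stretches the RoPE frequencies by the ratio of target to native length.
    factor = target_context / native_context
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_context,
    }


if __name__ == "__main__":
    print(yarn_rope_scaling(131_072))
    # {'rope_type': 'yarn', 'factor': 4.0, 'original_max_position_embeddings': 32768}
```

131,072 / 32,768 = 4, so "131K context" corresponds to a YaRN factor of 4.0 under this assumption.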
Qwen3-30B-A3B Production Pipeline
Full GRPO config for Qwen3-30B-A3B MoE with curriculum ordering, dynamic sampling, and context parallelism.
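The notes do not define "dynamic sampling" or "curriculum ordering." The sketch below assumes the common DAPO-style meaning of dynamic sampling: prompt groups in which every response receives the same reward are dropped, because GRPO's group-normalized advantages would all be zero. For curriculum ordering, it assumes a precomputed per-prompt difficulty score. `PromptGroup`, `dynamic_sample`, and `curriculum_order` are hypothetical names, not NeMo-RL APIs.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class PromptGroup:
    prompt_id: str
    rewards: list[float]  # one scalar reward per sampled response


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward within its own group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dynamic_sample(groups: list[PromptGroup]) -> list[PromptGroup]:
    """Drop groups whose rewards are all identical (every advantage would be ~0)."""
    return [g for g in groups if pstdev(g.rewards) > 0]


def curriculum_order(prompt_ids: list[str], difficulty: dict[str, float]) -> list[str]:
    """Order prompts easy-to-hard using a precomputed difficulty score (hypothetical)."""
    return sorted(prompt_ids, key=lambda p: difficulty[p])


if __name__ == "__main__":
    groups = [
        PromptGroup("all_correct", [1.0, 1.0, 1.0, 1.0]),
        PromptGroup("mixed", [1.0, 0.0, 0.0, 1.0]),
        PromptGroup("all_wrong", [0.0, 0.0, 0.0, 0.0]),
    ]
    kept = dynamic_sample(groups)
    print([g.prompt_id for g in kept])  # ['mixed']
    print([round(a, 3) for a in group_advantages(kept[0].rewards)])  # [1.0, -1.0, -1.0, 1.0]
    print(curriculum_order(["hard", "easy"], {"hard": 0.8, "easy": 0.2}))  # ['easy', 'hard']
```

Only the "mixed" group contributes a learning signal. The all-correct and all-wrong groups would waste a slot in the batch, which is why this kind of filter typically over-samples prompts to keep the batch full.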
Rollout Scaling
Scale-independent rollout pipeline with multi-node vLLM, logical chunking, and fault-tolerant multi-seed execution via `dependent_jobs`.
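The chunking and multi-seed ideas can be pictured as follows. The dataset is split into a fixed number of logical chunks, independent of how many nodes run them. One job is created per (seed, chunk) pair, and any job whose output already exists is skipped, so a failed run can be resumed. This is a minimal sketch under those assumptions. The function names and the `output-rs{seed}-chunk{idx}.jsonl` file naming are hypothetical, and the actual `dependent_jobs` mechanism is not modeled.

```python
from pathlib import Path


def chunk_ranges(num_items: int, num_chunks: int) -> list[tuple[int, int]]:
    """Split [0, num_items) into num_chunks contiguous, near-equal [start, end) ranges."""
    base, extra = divmod(num_items, num_chunks)
    ranges, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def pending_jobs(output_dir: Path, seeds, num_items: int, num_chunks: int):
    """List (seed, chunk_idx, start, end) jobs whose output file does not exist yet."""
    jobs = []
    for seed in seeds:
        for idx, (start, end) in enumerate(chunk_ranges(num_items, num_chunks)):
            done_file = output_dir / f"output-rs{seed}-chunk{idx}.jsonl"  # hypothetical naming
            if not done_file.exists():
                jobs.append((seed, idx, start, end))
    return jobs


if __name__ == "__main__":
    import tempfile

    print(chunk_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "output-rs0-chunk1.jsonl").touch()  # pretend this chunk already finished
        print(len(pending_jobs(out, seeds=[0, 1], num_items=10, num_chunks=3)))  # 5
```

Two seeds times three chunks gives six jobs. One is already done, so five remain.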
Eval Pipeline Hardening
- DTensor v2 safetensors checkpoint conversion with `.hf_metadata` auto-recreation
- Separate per-environment eval output directories
- Standalone eval support (cross-session Slurm dependency handling)
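One way to approach cross-session Slurm dependencies is to persist the training job IDs and read them back in a later session. If any IDs are found, eval is chained with `--dependency=afterok:...`; if none are found, eval is submitted standalone. This sketch only builds the `sbatch` command; it does not run it. `--parsable` and `--dependency=afterok:<id>[:<id>...]` are standard `sbatch` options. The JSON file and helper names are hypothetical. The sketch also does not check whether saved jobs are still known to the scheduler, which a real implementation would need to handle.

```python
import json
from pathlib import Path


def save_job_ids(path: Path, job_ids) -> None:
    """Persist job IDs so a later session can depend on them (hypothetical format)."""
    path.write_text(json.dumps([int(j) for j in job_ids]))


def load_job_ids(path: Path) -> list[int]:
    """Return previously saved job IDs, or [] when running eval standalone."""
    return json.loads(path.read_text()) if path.exists() else []


def eval_sbatch_cmd(script: str, dependency_ids) -> list[str]:
    """Build an sbatch command that waits for all dependency jobs to succeed."""
    cmd = ["sbatch", "--parsable"]
    if dependency_ids:
        cmd.append("--dependency=afterok:" + ":".join(str(j) for j in dependency_ids))
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        ids_file = Path(tmp) / "train_job_ids.json"  # hypothetical location
        print(eval_sbatch_cmd("eval.sbatch", load_job_ids(ids_file)))
        # ['sbatch', '--parsable', 'eval.sbatch']
        save_job_ids(ids_file, [1234, 1235])
        print(eval_sbatch_cmd("eval.sbatch", load_job_ids(ids_file)))
        # ['sbatch', '--parsable', '--dependency=afterok:1234:1235', 'eval.sbatch']
```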
Documentation
- Quick-start rewrite with per-environment stage-by-stage execution
- Dual-backend guide (FSDP for demos, Megatron for production)