Release v1.1.0

Released by @psgundecha-nv on 09 May 02:02 (tag v1.1.0, commit 080ee83)

Highlights

Container

Software Component    Version
NeMo-RL               v0.6.0
NeMo-Skills           0229040 (commit)
vLLM (eval/SDG)       0.18.1
vLLM (GRPO)           0.17.1
sglang                v0.5.10.post1

GRPO Multi-Environment Training

Two-environment GRPO pipeline with split configs to prevent cross-environment leaks:

  • equivalence_llm_judge — FSDP v2 backend, 16 GPUs
  • finance_sec_search — Megatron backend with YaRN (131K context), 64 GPUs
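
One way to picture the split-config idea is a check that no two environments share a writable path. The sketch below is illustrative only: the `EnvConfig` fields, the directory layout, and the `find_shared_paths` helper are hypothetical and are not taken from the release's config files. Only the environment names, backends, and GPU counts come from the notes above.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical per-environment settings (field names are assumptions)."""
    name: str
    backend: str
    num_gpus: int
    checkpoint_dir: str
    log_dir: str

    def paths(self) -> set[str]:
        # Every location this environment writes to.
        return {self.checkpoint_dir, self.log_dir}


def find_shared_paths(configs: list[EnvConfig]) -> list[tuple[str, str, str]]:
    """Return (env_a, env_b, path) for every path used by two environments."""
    shared = []
    for a, b in combinations(configs, 2):
        for path in sorted(a.paths() & b.paths()):
            shared.append((a.name, b.name, path))
    return shared


# Assumed directory layout: one subtree per environment.
CONFIGS = [
    EnvConfig("equivalence_llm_judge", "fsdp2", 16,
              "results/equivalence_llm_judge/ckpt",
              "results/equivalence_llm_judge/logs"),
    EnvConfig("finance_sec_search", "megatron", 64,
              "results/finance_sec_search/ckpt",
              "results/finance_sec_search/logs"),
]

leaks = find_shared_paths(CONFIGS)
print("shared paths:", leaks)
```

Running a check like this before launch would flag a copy-pasted checkpoint or log path before one environment overwrites another's outputs.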

Qwen3-30B-A3B Production Pipeline

Full GRPO config for Qwen3-30B-A3B MoE with curriculum ordering, dynamic sampling, and context parallelism.
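
The notes do not define these terms, so the following is a minimal sketch under stated assumptions: that "dynamic sampling" refers to the DAPO-style filter that drops prompt groups whose rollouts all received the same reward (and therefore carry zero group-relative advantage), and that "curriculum ordering" means presenting prompts from easy to hard. The function names and the difficulty scores are hypothetical. Context parallelism is not illustrated.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward normalized within its own prompt group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dynamic_sample(groups):
    """Keep only prompt groups whose rewards are not all identical.

    A group with identical rewards has zero advantage for every rollout,
    so it contributes no gradient signal (assumed DAPO-style filter).
    """
    return {p: r for p, r in groups.items() if max(r) != min(r)}


def curriculum_order(prompts, difficulty):
    """Sort prompts from lowest to highest (hypothetical) difficulty score."""
    return sorted(prompts, key=lambda p: difficulty[p])


# Toy rewards for three prompts, four rollouts each.
groups = {
    "q1": [1.0, 1.0, 1.0, 1.0],  # all correct: filtered out
    "q2": [1.0, 0.0, 0.0, 1.0],  # mixed: kept
    "q3": [0.0, 0.0, 0.0, 0.0],  # all wrong: filtered out
}
kept = dynamic_sample(groups)
advantages = group_advantages(kept["q2"])
ordered = curriculum_order(["a", "b", "c"], {"a": 0.9, "b": 0.1, "c": 0.5})
print(list(kept), ordered)
```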

Rollout Scaling

Scale-independent rollout pipeline with multi-node vLLM, logical chunking, and fault-tolerant multi-seed execution via dependent_jobs.
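
As a rough illustration of why logical chunking makes the pipeline independent of cluster size, the sketch below splits a prompt set into fixed-size index ranges and crosses them with seeds, so each (seed, chunk) pair is a unit of work that can be retried on its own. The helper names and the completed-unit bookkeeping are assumptions. The notes name dependent_jobs but do not show how it is used, so it does not appear here.

```python
def make_chunks(num_prompts, chunk_size):
    """Split [0, num_prompts) into half-open index ranges of at most chunk_size."""
    return [
        (start, min(start + chunk_size, num_prompts))
        for start in range(0, num_prompts, chunk_size)
    ]


def plan_work(num_prompts, chunk_size, seeds):
    """One work unit per (seed, chunk); the plan does not depend on node count."""
    chunks = make_chunks(num_prompts, chunk_size)
    return [
        (seed, idx, start, end)
        for seed in seeds
        for idx, (start, end) in enumerate(chunks)
    ]


def pending(units, done):
    """Drop units already finished, so a rerun only redoes failed work."""
    return [u for u in units if (u[0], u[1]) not in done]


units = plan_work(num_prompts=10, chunk_size=4, seeds=[0, 1])
# Pretend seed 0, chunk 1 finished in an earlier run.
todo = pending(units, done={(0, 1)})
print(len(units), len(todo))
```

Because the plan is a pure function of the prompt count, chunk size, and seeds, the same units come out whether they are later spread over one node or many.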

Eval Pipeline Hardening

  • DTensor v2 safetensors checkpoint conversion with .hf_metadata auto-recreation
  • Separate per-environment eval output directories
  • Standalone eval support (cross-session Slurm dependency handling)
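
The last two items can be sketched as a small command builder. This is a hedged illustration, not the release's launcher: the function name, the `eval_results` root, and the `run_eval.sh` script name are hypothetical. The `sbatch` flags used (`--job-name`, `--output`, `--dependency=afterok:<jobid>`) are standard Slurm options. The assumption is that a standalone eval started in a fresh session has no training job ID to depend on, so the dependency is attached only when one is supplied.

```python
from pathlib import PurePosixPath


def build_eval_sbatch(env, script, out_root="eval_results", after_job_id=None):
    """Build an sbatch argv for one environment's eval job.

    Each environment gets its own output directory under out_root.
    A dependency is added only if a training job ID is passed in.
    """
    out_dir = PurePosixPath(out_root) / env
    cmd = [
        "sbatch",
        f"--job-name=eval-{env}",
        f"--output={out_dir}/slurm-%j.out",  # %j expands to the Slurm job ID
    ]
    if after_job_id is not None:
        cmd.append(f"--dependency=afterok:{after_job_id}")
    cmd.append(script)
    return cmd


# Chained to a training job submitted earlier (job ID is a placeholder).
chained = build_eval_sbatch("finance_sec_search", "run_eval.sh", after_job_id=12345)
# Standalone: no dependency, eval starts as soon as resources are free.
standalone = build_eval_sbatch("equivalence_llm_judge", "run_eval.sh")
print(" ".join(chained))
print(" ".join(standalone))
```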

Documentation

  • Quick-start rewrite with per-environment stage-by-stage execution
  • Dual-backend guide (FSDP for demo, Megatron for production)