v1.1.1 is a focused upgrade adding air-gapped Slurm support to the v1.1.0 stack. The underlying software versions (NeMo-RL v0.6.0, NeMo-Skills @ 0229040, vLLM 0.18.1 / 0.17.1, sglang v0.5.10.post1) are unchanged — this release ships reproducible wrappers around those bases plus the runtime plumbing to use them without internet.
Drop-in upgrade from v1.1.0 — no breaking changes for developer-mode users.
Highlights
Air-gapped Slurm support
Run NVFlow on Slurm clusters with no network access from compute nodes.
The new dockerfiles/ directory ships reproducible build recipes for self-sufficient, air-gapped nemo-skills, nemo-rl, vllm, and vllm-grpo images. Each image extends the same upstream base used in v1.1.0, with dependencies (tiktoken, openai_harmony, etc.) pre-baked. Build with docker build, push to your registry, then provision via the existing scripts/setup_containers.sh. Offline env vars (HF_*_OFFLINE, UV_OFFLINE, TIKTOKEN_*) are wired through the cluster config templates.
See dockerfiles/README.md and dockerfiles/docker_instructions.md for the build/push workflow.
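As a rough illustration of what the offline env vars do, the sketch below builds an environment for a child process that keeps Hugging Face libraries, uv, and tiktoken off the network. It is not the NVFlow cluster config template. The helper name offline_env, the /opt/cache path, and the exact variables chosen to stand in for HF_*_OFFLINE and TIKTOKEN_* are assumptions made for this example.

```python
import os


def offline_env(cache_root: str) -> dict:
    """Hypothetical helper: env vars that keep common tools off the network."""
    return {
        # Hugging Face libraries: read only from the local cache.
        "HF_HUB_OFFLINE": "1",
        "HF_DATASETS_OFFLINE": "1",
        "TRANSFORMERS_OFFLINE": "1",
        # uv: do not contact package indexes.
        "UV_OFFLINE": "1",
        # tiktoken: load encodings from a pre-populated cache directory.
        # The directory layout under cache_root is an assumption.
        "TIKTOKEN_CACHE_DIR": f"{cache_root}/tiktoken",
    }


# Merge over the current environment; pass the result as env= to subprocess.run.
env = {**os.environ, **offline_env("/opt/cache")}
```

In the release itself these values come from the cluster config templates, so nothing like this needs to be written by hand.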
Dual-mode runtime
A single code path now serves both air-gapped and developer modes. The new nvflow/lib/runtime.py auto-resolves the right venv and Python interpreter for the checkpoint converter, vLLM serving, training stages, and SDG scripts. Dev-mode users see no behaviour change.
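The idea behind dual-mode resolution can be sketched as follows. This is not the code in nvflow/lib/runtime.py: the function name resolve_python, its venv_dir parameter, and the bin/python layout are assumptions chosen for illustration. The sketch prefers a pre-built venv when one exists (as it might inside an air-gapped image) and otherwise falls back to the current interpreter (developer mode).

```python
import sys
from pathlib import Path
from typing import Optional


def resolve_python(venv_dir: Optional[str] = None) -> str:
    """Hypothetical resolver: pick the interpreter a stage should run under."""
    if venv_dir:
        # Assumed POSIX venv layout: <venv_dir>/bin/python.
        candidate = Path(venv_dir) / "bin" / "python"
        if candidate.is_file():
            return str(candidate)
    # No usable venv found: keep using the interpreter already running.
    return sys.executable
```

Because the fallback is the running interpreter, a caller that never passes a venv sees the same behaviour as before.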
Slurm submission ergonomics
New nvflow/lib/sbatch.py plumbs extra_sbatch_args through every Slurm submission via get_executor, so cluster-specific flags (--exclude, --account, etc.) propagate to all stages without per-stage code changes.
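A minimal sketch of the pass-through idea, assuming a plain command-list builder rather than the actual get_executor path. The function build_sbatch_command, the script name, and the account and node names are hypothetical; --nodes, --account, and --exclude are standard sbatch options.

```python
from typing import List, Optional


def build_sbatch_command(
    script: str,
    base_args: List[str],
    extra_sbatch_args: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical builder: append cluster-specific flags to every submission."""
    # Extra flags go after the stage's own arguments and before the script.
    return ["sbatch", *base_args, *(extra_sbatch_args or []), script]


cmd = build_sbatch_command(
    "train.sh",
    ["--nodes=2"],
    extra_sbatch_args=["--account=my_project", "--exclude=node07"],
)
```

Keeping the extra flags in one shared argument means a stage only supplies its own base arguments, and the cluster-specific ones are added in a single place.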
GRPO / eval / SFT / SDG fixes
- Per-environment GRPO eval outputs (step-9-eval/<env>/step-N/…): no more cross-env overwrite.
- Restored equivalence-LLM-judge training policy (sequence_packing, logprob_chunk_size, make_sequence_length_divisible_by) that was dropped during a prior config refactor.
- Tiktoken cache no longer attempts a download when TIKTOKEN_* env vars are pre-configured.
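To show why per-environment directories prevent overwrites, here is a small sketch that builds the step-9-eval/<env>/step-N prefix. The helper eval_output_dir, the results root, and the environment names are assumptions for this example, and N is assumed to be the training step. Whatever the pipeline writes below that prefix is not modelled here.

```python
from pathlib import PurePosixPath


def eval_output_dir(root: str, env: str, step: int) -> PurePosixPath:
    """Hypothetical helper: one eval directory per (environment, step) pair."""
    return PurePosixPath(root) / "step-9-eval" / env / f"step-{step}"


# Two environments evaluated at the same step get distinct directories.
math_dir = eval_output_dir("results", "math", 100)
code_dir = eval_output_dir("results", "code", 100)
```

With the environment name in the path, two evaluations at the same step can no longer collide.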
CI
Lightweight unit-tests workflow is now genuinely lightweight on both GitLab and GitHub (no transitive heavy-dep imports; uv run --no-sync pytest on both runners).
Documentation
INSTALL.md rewritten end-to-end for both runtime modes. New dockerfiles/README.md, dockerfiles/docker_instructions.md, and docs/recipes/finance/troubleshooting.md.
Full Changelog: v1.1.0...v1.1.1