Release v1.1.1 · NVIDIA/nvflow

v1.1.1 is a focused upgrade adding air-gapped Slurm support to the v1.1.0 stack. The underlying software versions (NeMo-RL v0.6.0, NeMo-Skills @ 0229040, vLLM 0.18.1 / 0.17.1, sglang v0.5.10.post1) are unchanged — this release ships reproducible wrappers around those bases plus the runtime plumbing to use them without internet.

Drop-in upgrade from v1.1.0 — no breaking changes for developer-mode users.

Highlights

Air-gapped Slurm support

Run NVFlow on Slurm clusters with no network access from compute nodes.

The new dockerfiles/ directory ships reproducible build recipes for self-sufficient nemo-skills, nemo-rl, vllm, and vllm-grpo images. Each image extends the same upstream base used in v1.1.0, with dependencies pre-baked (tiktoken, openai_harmony, etc.). Build the images with docker build, push them to your registry, then provision them via the existing scripts/setup_containers.sh. The offline environment variables (HF_*_OFFLINE, UV_OFFLINE, TIKTOKEN_*) are wired through the cluster config templates.

See dockerfiles/README.md and dockerfiles/docker_instructions.md for the build/push workflow.
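
As a rough illustration of what the offline environment variables amount to, the sketch below builds a set of overrides in Python. The function name offline_env and the /opt/cache path are hypothetical, and the exact variables set by the cluster config templates may differ. HF_HUB_OFFLINE, HF_DATASETS_OFFLINE, UV_OFFLINE, and TIKTOKEN_CACHE_DIR are shown only as plausible members of the HF_*_OFFLINE, UV_OFFLINE, and TIKTOKEN_* families named above.

```python
def offline_env(cache_root: str) -> dict[str, str]:
    """Return environment overrides that keep common tools off the network.

    Hypothetical helper: the variable list is an assumption, not the
    actual contents of the NVFlow cluster config templates.
    """
    return {
        # Hugging Face Hub and datasets: use local caches only.
        "HF_HUB_OFFLINE": "1",
        "HF_DATASETS_OFFLINE": "1",
        # uv: do not contact package indexes.
        "UV_OFFLINE": "1",
        # tiktoken: read encodings from a pre-populated cache directory.
        "TIKTOKEN_CACHE_DIR": f"{cache_root}/tiktoken",
    }


# Example: overrides for a container whose caches live under /opt/cache
# (an illustrative path).
env = offline_env("/opt/cache")
```

Such a dictionary could be merged into the environment of a job before launch. The authoritative list lives in the cluster config templates and dockerfiles/README.md.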

Dual-mode runtime

A single code path now serves both air-gapped and developer modes. New nvflow/lib/runtime.py auto-resolves the right venv / Python — checkpoint converter, vLLM serving, training stages, and SDG scripts all benefit. Dev-mode users see no behaviour change.
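
The idea of resolving the interpreter from a single code path can be sketched as follows. This is not the contents of nvflow/lib/runtime.py: the function resolve_python, its parameter, and the bin/python layout are assumptions made for illustration.

```python
import sys
from pathlib import Path
from typing import Optional


def resolve_python(baked_venv: Optional[Path] = None) -> str:
    """Pick the Python interpreter for a stage.

    Hypothetical logic: prefer a venv pre-baked into the container image
    (air-gapped mode); otherwise fall back to the interpreter that is
    already running (developer mode).
    """
    if baked_venv is not None:
        candidate = baked_venv / "bin" / "python"
        if candidate.is_file():
            return str(candidate)
    # No usable baked venv: behave exactly as before.
    return sys.executable
```

Because the fallback is the current interpreter, a caller that never passes a baked venv sees the same behaviour as before, which matches the no-change promise for dev-mode users.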

Slurm submission ergonomics

New nvflow/lib/sbatch.py plumbs extra_sbatch_args through every Slurm submission via get_executor, so cluster-specific flags (--exclude, --account, etc.) propagate to all stages without per-stage code changes.
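
A minimal sketch of how cluster-wide flags might be merged with per-stage flags is shown below. The helper merge_sbatch_args is hypothetical and is not the signature of get_executor. The rule that extra arguments override stage defaults is likewise an assumption, and the account and node names are illustrative.

```python
from typing import Mapping, Optional


def merge_sbatch_args(
    stage_args: Mapping[str, str],
    extra_sbatch_args: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Combine per-stage sbatch options with cluster-wide extras.

    Assumption: on a key collision, the cluster-wide extra wins.
    """
    merged = {**stage_args, **(extra_sbatch_args or {})}
    # Render each option in sbatch's long form, e.g. --account=finance.
    return [f"--{key}={value}" for key, value in merged.items()]


flags = merge_sbatch_args(
    {"nodes": "2", "account": "default"},
    {"account": "finance", "exclude": "node[01-03]"},
)
# flags == ["--nodes=2", "--account=finance", "--exclude=node[01-03]"]
```

Doing the merge in one shared helper is what lets every stage pick up a new flag without per-stage changes.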

GRPO / eval / SFT / SDG fixes

Per-environment GRPO eval outputs (step-9-eval/<env>/step-N/…) — no more cross-env overwrite.
Restored equivalence-LLM-judge training policy (sequence_packing, logprob_chunk_size, make_sequence_length_divisible_by) that was dropped during a prior config refactor.
Tiktoken cache no longer attempts a download when TIKTOKEN_* env vars are pre-configured.
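
The per-environment output layout from the first fix above can be sketched as a small path helper. The function name eval_output_dir, the run root, and the environment names are hypothetical. Only the step-9-eval/<env>/step-N prefix comes from the release notes, and whatever follows it is left unspecified here.

```python
from pathlib import Path


def eval_output_dir(run_root: Path, env_name: str, step: int) -> Path:
    """Return the eval output directory for one environment at one step.

    Keying the path on env_name is what keeps two environments evaluated
    at the same step from overwriting each other.
    """
    return run_root / "step-9-eval" / env_name / f"step-{step}"


# Two illustrative environments evaluated at the same training step.
math_dir = eval_output_dir(Path("runs/demo"), "math", 100)
code_dir = eval_output_dir(Path("runs/demo"), "code", 100)
```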

CI

The lightweight unit-tests workflow is now genuinely lightweight on both GitLab and GitHub: it no longer imports heavy dependencies transitively, and both runners invoke uv run --no-sync pytest.

Documentation

INSTALL.md rewritten end-to-end for both runtime modes. New dockerfiles/README.md, dockerfiles/docker_instructions.md, and docs/recipes/finance/troubleshooting.md.

Full Changelog: v1.1.0...v1.1.1
