Release 2.8.0: Major release · NVIDIA/NVFlare

2.8.0 Release Contributors (PR Count Order)

Total PRs counted in this release: 352

Yuan-Ting Hsieh (謝沅廷) — 64 PRs
Peter Cnudde — 59 PRs
Holger Roth — 58 PRs
Chester Chen — 50 PRs
Isaac Yang — 29 PRs
Zhihong Zhang — 26 PRs
nvkevlu — 19 PRs
Ziyue Xu — 17 PRs
GeorgeWang-nv — 8 PRs
nvshaxie — 6 PRs
Vijay Govindarajan — 4 PRs
Douwe van der Wal — 3 PRs
Yuanyuan Chen — 1 PR
Kevin Ta — 1 PR
Ioannis Christofilogiannis — 1 PR
gn00295120 — 1 PR
Zare2001 — 1 PR
Peixin — 1 PR
rollingsu — 1 PR
Hop Le — 1 PR
Mohan Krishna G R — 1 PR

🎉 Welcome First-Time Contributors!

nvshaxie — 6 PRs
Vijay Govindarajan — 4 PRs
Douwe van der Wal — 3 PRs
Kevin Ta — 1 PR
Ioannis Christofilogiannis — 1 PR
gn00295120 — 1 PR
Zare2001 — 1 PR
rollingsu — 1 PR
Hop Le — 1 PR
Mohan Krishna G R — 1 PR

Feature Highlights

NVIDIA FLARE 2.8.0 focuses on making production federated learning easier to operate across organizations, studies, and runtime environments. The release adds Docker and Kubernetes job launchers, a broader automation-friendly CLI, distributed provisioning, multi-study support, stronger observability, and additional production hardening. It also adds new examples and research bundles for multimodal, language-model, Docker, Kubernetes, and privacy-oriented federated learning workflows.

Modern NVFlare CLI: expanded nvflare command groups for jobs, system operations, local config, startup kits, recipes, distributed provisioning, and deployment preparation, with JSON output and schema support so operators and automation systems can run FLARE workflows without relying on console-only behavior.
Distributed provisioning: new nvflare cert and nvflare package workflows let participants keep private keys local while Project Admins approve certificate requests and generate signed packages, improving security ownership in cross-organization deployments.
Deployment prepare and runtime packaging: new nvflare deploy prepare flow packages existing startup kits for Docker and Kubernetes runtimes, including Kubernetes environments on AWS, Azure, and GCP, so provisioning and runtime packaging can be handled as separate repeatable steps.
Docker and Kubernetes job launchers: each site can configure a process, Docker, or Kubernetes job launcher. With the matching launcher configured, host-based jobs run as subprocesses, Docker-based jobs run as job containers, and Kubernetes-based jobs run as separate job pods, giving production sites Docker/Kubernetes isolation and resource handling plus study-scoped dataset mounts for stronger data isolation.
Multi-study support: study definitions in project.yml, study-scoped sessions, study-aware admin operations, and study CLI commands let one FLARE deployment host multiple collaborations without mixing participants, authorization, data access, or operational context.
Live log streaming: site and job logs stream to the server while jobs are running, reducing time to diagnose remote training failures and making CLI automation more responsive.
Security and production hardening: origin-bound auth tokens, safer archive handling, stricter private-key file permissions, safer loading paths, stronger job metadata validation, and additional dashboard/API hardening reduce common operational risk in federated deployments.
Feature election: a new federated feature selection workflow lets clients perform local feature selection for tabular datasets and share feature scores, not raw data, so FLARE can aggregate a global feature mask for downstream training.
Tensor disk offload for FedAvg: setting enable_tensor_disk_offload=True significantly reduces server peak memory during FedAvg aggregation. Instead of holding all client tensor updates in memory simultaneously, the server writes each update to a temporary safetensors file on disk and reads it back lazily during aggregation. The benefit scales with model size and client count.
Large-model streaming reliability: large tensor broadcasts are more robust when many clients retry after delayed EOF responses. Finished download refs are handled idempotently, and subprocess Client API jobs now reject unbounded result resends or missing download-completion waits that can turn one slow transfer into repeated large-model retries.
New examples and contributed research: MedGemma, Qwen3-VL, Codon-FM, FedUMM, financial-services fraud detection, Docker job examples, distributed provisioning examples, Hello JAX, and Hello log streaming help teams start from working patterns instead of assembling production and research workflows from scratch.
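
The feature election highlight above says clients share feature scores rather than raw data, and that the server aggregates them into a global feature mask. A minimal sketch of that idea follows. It is not NVFlare's implementation: the function name, the sample-count weighting, and the top-k selection rule are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def aggregate_feature_mask(
    client_scores: Dict[str, Tuple[List[float], int]],
    keep: int,
) -> List[bool]:
    """Combine per-client feature scores into one global boolean mask.

    client_scores maps a site name to (scores, sample_count).
    Hypothetical rule: weight each client's scores by its sample count,
    then keep the `keep` features with the highest weighted mean score.
    """
    if not client_scores:
        raise ValueError("no client scores to aggregate")

    n_features = len(next(iter(client_scores.values()))[0])
    total_samples = sum(count for _, count in client_scores.values())
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    # Weighted mean of the scores; only scores cross the site boundary.
    mean = [0.0] * n_features
    for scores, count in client_scores.values():
        if len(scores) != n_features:
            raise ValueError("all clients must score the same features")
        for i, score in enumerate(scores):
            mean[i] += score * count / total_samples

    # Rank features by aggregated score and mark the top `keep`.
    ranked = sorted(range(n_features), key=lambda i: mean[i], reverse=True)
    selected = set(ranked[:keep])
    return [i in selected for i in range(n_features)]
```

Each site would then apply the returned mask to its local columns before downstream training, so the raw tabular data never leaves the site.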

See the full 2.8.0 release note: https://nvflare.readthedocs.io/en/2.8.0/release_notes/flare_280.html

What's Changed

Expanded NVFlare CLI commands, shared plumbing, POC/provision/backend flows, docs, examples, and startup-kit workflows by @chesterxgchen in #4449, #4448, #4447, #4479
Added distributed provisioning with nvflare cert / nvflare package, job CLI connection args, system commands, workflow enhancements, and provision-version support by @chesterxgchen in #4380, #4462, #4481, #4508
Added deploy prepare for Docker and Kubernetes runtime packaging by @YuanTingHsieh in #4499
Added Docker and Kubernetes job launcher support, job handles, multicloud Kubernetes tooling, CellNet workspace transfer, and study-scoped job pod isolation by @IsaacYangSLA, @YuanTingHsieh, and @pcnudde in #4336, #4409, #4450, #4469, #4474
Added multi-study deployment and administration support, including study plumbing, runtime study commands, registry support, and PoC environment support by @pcnudde and @chesterxgchen in #4386, #4398, #4472, #4415
Added live job log streaming and per-site log streaming control by @nvidianz in #4454, #4476
Added the federated Feature Election workflow by @christofilojohn in #3876
Added tensor disk offload for PyTorch FedAvg, in-flight cleanup, run-scoped temp cleanup, server tempdir guidance, and release-note memory chart coverage by @pcnudde and @chesterxgchen in #4221, #4501, #4534, #4495, #4668, #4769
Improved large-model streaming reliability with finished download-ref retry handling, Client API launcher resend validation, incomplete-download protection, and updated docs by @chesterxgchen in #4708, #4710, #4725, #4714
Hardened auth, archive handling, artifact writer paths, JsonStats encoder loading, BYOC FOBS decomposer loading, and confidential-computing class allow-list support by @pcnudde, @nvidianz, @chesterxgchen, and @IsaacYangSLA in #4605, #4509, #4738, #4740, #4749, #4756, #4701
Added safer deserialization for torch.load / np.load and stricter private-key file permissions by @gn00295120 and @GeorgeWang-nv in #4344, #4431
Removed deprecated FLAdminAPI and HA/Overseer code by @pcnudde and @nvidianz in #4400, #4503
Aligned CLI Python support with Python 3.10 through 3.14 by @pcnudde in #4533
Added Codon-FM, Qwen3-VL, MedGemma, FedUMM, financial-services fraud detection, and Hello JAX examples/research by @holgerroth, @ZiyueXu77, and @rollingsu in #3889, #4212, #4277, #4359, #4424, #4158, #4395, #4358
Added the NVFlare CLI tutorial and refreshed the tutorial example catalog by @chesterxgchen in #4639, #4672
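
The tensor disk offload entries above describe writing each client update to a temporary file and consuming it lazily during FedAvg aggregation. The sketch below illustrates why that lowers peak memory. It is an assumption-laden stand-in, not NVFlare code: it uses Python's standard `array` module instead of safetensors, flat lists of floats instead of tensors, and hypothetical helper names (`spill_update`, `load_update`, `fedavg_from_disk`).

```python
import os
import tempfile
from array import array
from typing import Iterable, List, Tuple


def spill_update(values: List[float], directory: str) -> str:
    """Write one client update to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin", dir=directory)
    with os.fdopen(fd, "wb") as f:
        array("d", values).tofile(f)
    return path


def load_update(path: str) -> array:
    """Read one spilled update back into memory."""
    data = array("d")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


def fedavg_from_disk(spilled: Iterable[Tuple[str, int]]) -> List[float]:
    """Weighted average over (path, weight) pairs, one update at a time.

    Only the running sum and a single update are in memory at once,
    so peak memory does not grow with the number of clients.
    """
    running = None
    total_weight = 0
    for path, weight in spilled:
        update = load_update(path)
        if running is None:
            running = [0.0] * len(update)
        elif len(update) != len(running):
            raise ValueError("updates must have the same length")
        for i, value in enumerate(update):
            running[i] += value * weight
        total_weight += weight
        os.remove(path)  # Free disk space once the update is consumed.
    if running is None or total_weight == 0:
        raise ValueError("no weighted updates to aggregate")
    return [value / total_weight for value in running]
```

In this sketch, spilling `[1.0, 2.0]` with weight 1 and `[3.0, 6.0]` with weight 3 and then calling `fedavg_from_disk` yields `[2.5, 5.0]`. The trade-off is extra disk I/O in exchange for a memory footprint that stays roughly constant as clients are added.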

Full Changelog: 2.7.2...2.8.0
