2.8.0 Release Contributors (PR Count Order)
Total PRs counted in this release: 352
- Yuan-Ting Hsieh (謝沅廷) — 64 PRs
- Peter Cnudde — 59 PRs
- Holger Roth — 58 PRs
- Chester Chen — 50 PRs
- Isaac Yang — 29 PRs
- Zhihong Zhang — 26 PRs
- nvkevlu — 19 PRs
- Ziyue Xu — 17 PRs
- GeorgeWang-nv — 8 PRs
- nvshaxie — 6 PRs
- Vijay Govindarajan — 4 PRs
- Douwe van der Wal — 3 PRs
- Yuanyuan Chen — 1 PR
- Kevin Ta — 1 PR
- Ioannis Christofilogiannis — 1 PR
- gn00295120 — 1 PR
- Zare2001 — 1 PR
- Peixin — 1 PR
- rollingsu — 1 PR
- Hop Le — 1 PR
- Mohan Krishna G R — 1 PR
🎉 Welcome First-Time Contributors!
- nvshaxie — 6 PRs
- Vijay Govindarajan — 4 PRs
- Douwe van der Wal — 3 PRs
- Kevin Ta — 1 PR
- Ioannis Christofilogiannis — 1 PR
- gn00295120 — 1 PR
- Zare2001 — 1 PR
- rollingsu — 1 PR
- Hop Le — 1 PR
- Mohan Krishna G R — 1 PR
Feature Highlights
NVIDIA FLARE 2.8.0 focuses on making production federated learning easier to operate across organizations, studies, and runtime environments. The release adds Docker and Kubernetes job launchers, a broader automation-friendly CLI, distributed provisioning, multi-study support, stronger observability, and additional production hardening. It also adds new examples and research bundles for multimodal, language-model, Docker, Kubernetes, and privacy-oriented federated learning workflows.
- Modern NVFlare CLI: expanded `nvflare` command groups for jobs, system operations, local config, startup kits, recipes, distributed provisioning, and deployment preparation, with JSON output and schema support so operators and automation systems can run FLARE workflows without relying on console-only behavior.
- Distributed provisioning: new `nvflare cert` and `nvflare package` workflows let participants keep private keys local while Project Admins approve certificate requests and generate signed packages, improving security ownership in cross-organization deployments.
- Deployment prepare and runtime packaging: new `nvflare deploy prepare` flow packages existing startup kits for Docker and Kubernetes runtimes, including Kubernetes environments on AWS, Azure, and GCP, so provisioning and runtime packaging can be handled as separate repeatable steps.
- Docker and Kubernetes job launchers: each site can configure a process, Docker, or Kubernetes job launcher. With the matching launcher configured, host-based jobs run as subprocesses, Docker-based jobs run as job containers, and Kubernetes-based jobs run as separate job pods, giving production sites Docker/Kubernetes isolation and resource handling plus study-scoped dataset mounts for stronger data isolation.
- Multi-study support: study definitions in `project.yml`, study-scoped sessions, study-aware admin operations, and study CLI commands let one FLARE deployment host multiple collaborations without mixing participants, authorization, data access, or operational context.
- Live log streaming: site and job logs stream to the server while jobs are running, reducing time to diagnose remote training failures and making CLI automation more responsive.
- Security and production hardening: origin-bound auth tokens, safer archive handling, stricter private-key file permissions, safer loading paths, stronger job metadata validation, and additional dashboard/API hardening reduce common operational risk in federated deployments.
- Feature election: a new federated feature selection workflow lets clients perform local feature selection for tabular datasets and share feature scores, not raw data, so FLARE can aggregate a global feature mask for downstream training.
- Tensor disk offload for FedAvg: enabling `enable_tensor_disk_offload=True` significantly reduces server peak memory during FedAvg aggregation. Instead of holding all client tensor updates in memory simultaneously, each update is written to a temporary safetensors file on disk and consumed lazily during aggregation. The benefit scales with model size and client count.
- Large-model streaming reliability: large tensor broadcasts are more robust when many clients retry after delayed EOF responses. Finished download refs are handled idempotently, and subprocess Client API jobs now reject unbounded result resends or missing download-completion waits that can turn one slow transfer into repeated large-model retries.
- New examples and contributed research: MedGemma, Qwen3-VL, Codon-FM, FedUMM, financial-services fraud detection, Docker job examples, distributed provisioning examples, Hello JAX, and Hello log streaming help teams start from working patterns instead of assembling production and research workflows from scratch.
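
The Feature election highlight above describes clients sharing per-feature scores so that a global feature mask can be aggregated. As a rough illustration only, the sketch below combines scores with a sample-count-weighted mean and keeps the top fraction of features. The function name `aggregate_feature_mask`, the weighting scheme, and the `keep_fraction` parameter are assumptions made for this example, not the FLARE API or its actual aggregation rule.

```python
from typing import Dict, List


def aggregate_feature_mask(
    client_scores: Dict[str, List[float]],
    client_weights: Dict[str, float],
    keep_fraction: float = 0.5,
) -> List[bool]:
    """Combine per-client feature scores into one global boolean mask.

    Hypothetical helper: clients send only scores, never raw rows.
    """
    names = sorted(client_scores)
    n_features = len(client_scores[names[0]])
    total_weight = sum(client_weights[n] for n in names)

    # Weighted mean of each feature's score across clients.
    global_scores = [
        sum(client_weights[n] * client_scores[n][i] for n in names) / total_weight
        for i in range(n_features)
    ]

    # Keep the highest-scoring features; ties go to the lower index.
    n_keep = max(1, round(keep_fraction * n_features))
    ranked = sorted(range(n_features), key=lambda i: (-global_scores[i], i))
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(n_features)]


# Two hypothetical sites scoring four tabular features.
scores = {
    "site-1": [0.9, 0.1, 0.5, 0.2],
    "site-2": [0.7, 0.3, 0.6, 0.0],
}
weights = {"site-1": 100, "site-2": 300}  # assumed local sample counts
mask = aggregate_feature_mask(scores, weights, keep_fraction=0.5)
print(mask)  # [True, False, True, False]
```

Each site would then apply the same mask to its local columns before downstream training, so the selected feature set is consistent across participants.
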
See the full 2.8.0 release note: https://nvflare.readthedocs.io/en/2.8.0/release_notes/flare_280.html
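
The tensor disk offload highlight above describes writing each client update to a temporary file and consuming it lazily during aggregation. The general pattern can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: it uses JSON files and plain lists instead of safetensors and tensors, and the helper names `spool_updates`, `lazy_updates`, and `weighted_average` are hypothetical, not part of FLARE.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple

Update = Tuple[List[float], float]  # (flattened values, aggregation weight)


def spool_updates(updates: Iterable[Update], spool_dir: str) -> List[str]:
    """Write each update to its own file so it need not stay in memory."""
    paths = []
    for idx, (values, weight) in enumerate(updates):
        path = os.path.join(spool_dir, f"update_{idx}.json")
        with open(path, "w") as f:
            json.dump({"values": values, "weight": weight}, f)
        paths.append(path)
    return paths


def lazy_updates(paths: List[str]) -> Iterator[Update]:
    """Load one update at a time and delete its file once it has been read."""
    for path in paths:
        with open(path) as f:
            record = json.load(f)
        os.remove(path)
        yield record["values"], record["weight"]


def weighted_average(stream: Iterable[Update]) -> List[float]:
    """FedAvg-style weighted mean that holds only one update plus the running sum."""
    total = None
    total_weight = 0.0
    for values, weight in stream:
        if total is None:
            total = [0.0] * len(values)
        for i, v in enumerate(values):
            total[i] += weight * v
        total_weight += weight
    if total is None:
        raise ValueError("no updates to aggregate")
    return [t / total_weight for t in total]


with tempfile.TemporaryDirectory() as spool_dir:
    paths = spool_updates([([1.0, 2.0], 1.0), ([3.0, 4.0], 3.0)], spool_dir)
    averaged = weighted_average(lazy_updates(paths))

print(averaged)  # [2.5, 3.5]
```

In this sketch, peak memory is bounded by one update plus the accumulator rather than by the number of clients, which is why the saving grows with model size and client count.
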
What's Changed
- Expanded NVFlare CLI commands, shared plumbing, POC/provision/backend flows, docs, examples, and startup-kit workflows by @chesterxgchen in #4449, #4448, #4447, #4479
- Added distributed provisioning with `nvflare cert`/`nvflare package`, job CLI connection args, system commands, workflow enhancements, and provision-version support by @chesterxgchen in #4380, #4462, #4481, #4508
- Added deploy prepare for Docker and Kubernetes runtime packaging by @YuanTingHsieh in #4499
- Added Docker and Kubernetes job launcher support, job handles, multicloud Kubernetes tooling, CellNet workspace transfer, and study-scoped job pod isolation by @IsaacYangSLA, @YuanTingHsieh, and @pcnudde in #4336, #4409, #4450, #4469, #4474
- Added multi-study deployment and administration support, including study plumbing, runtime study commands, registry support, and PoC environment support by @pcnudde and @chesterxgchen in #4386, #4398, #4472, #4415
- Added live job log streaming and per-site log streaming control by @nvidianz in #4454, #4476
- Added the federated Feature Election workflow by @christofilojohn in #3876
- Added tensor disk offload for PyTorch FedAvg, in-flight cleanup, run-scoped temp cleanup, server tempdir guidance, and release-note memory chart coverage by @pcnudde and @chesterxgchen in #4221, #4501, #4534, #4495, #4668, #4769
- Improved large-model streaming reliability with finished download-ref retry handling, Client API launcher resend validation, incomplete-download protection, and updated docs by @chesterxgchen in #4708, #4710, #4725, #4714
- Hardened auth, archive handling, artifact writer paths, JsonStats encoder loading, BYOC FOBS decomposer loading, and confidential-computing class allow-list support by @pcnudde, @nvidianz, @chesterxgchen, and @IsaacYangSLA in #4605, #4509, #4738, #4740, #4749, #4756, #4701
- Added safer deserialization for `torch.load`/`np.load` and stricter private-key file permissions by @gn00295120 and @GeorgeWang-nv in #4344, #4431
- Removed deprecated FLAdminAPI and HA/Overseer code by @pcnudde and @nvidianz in #4400, #4503
- Aligned CLI Python support with Python 3.10 through 3.14 by @pcnudde in #4533
- Added Codon-FM, Qwen3-VL, MedGemma, FedUMM, financial-services fraud detection, and Hello JAX examples/research by @holgerroth, @ZiyueXu77, and @rollingsu in #3889, #4212, #4277, #4359, #4424, #4158, #4395, #4358
- Added the NVFlare CLI tutorial and refreshed the tutorial example catalog by @chesterxgchen in #4639, #4672
Full Changelog: 2.7.2...2.8.0