
# Federated Learning System Literature

A curated list of system-level optimization approaches for synchronous federated learning. PRs are welcome!

This repository serves as a complement to the survey below:

Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies (IEEE TBD 2022)

```bibtex
@article{jiang2022towards,
  author={Jiang, Zhifeng and Wang, Wei and Li, Bo and Yang, Qiang},
  journal={IEEE Transactions on Big Data},
  title={Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies},
  year={2023},
  volume={9},
  number={2},
  pages={437-454},
  doi={10.1109/TBDATA.2022.3177222}
}
```

If you find this repository helpful, please cite the survey above.

## How to Search?

Search for keywords such as a conference name (e.g., OSDI), a target phase (e.g., Client Selection), or a performance metric (e.g., Communication Cost) on this page to quickly locate the related papers.

## Quick Links

**Recent Optimization Approaches:**

- **Optimizing the Selection Phase:** At the beginning of each round, the server waits for a sufficient number of clients with eligible status (i.e., currently charging and connected to an unmetered network) to check in. The server then selects a subset of them based on certain strategies (e.g., randomly or selectively) for participation, and notifies the others to reconnect later.
- **Optimizing the Configuration Phase:** The server next sends the global model status and configuration profiles (e.g., the number of local epochs or the reporting deadline) to each of the selected clients. Based on the instructed configuration, the clients perform local model training independently on their private data.
- **Optimizing the Reporting Phase:** The server then waits for the participating clients to report local updates until the predefined deadline is reached. The current round is aborted if not enough clients report in time. Otherwise, the server aggregates the received local updates, uses the aggregate to update the global model status, and concludes the round. (A minimal sketch of one such round follows the Quick Links.)

**Measuring and Benchmarking Tools:**
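To make the three phases concrete, here is a minimal end-to-end sketch of one synchronous round in plain Python with NumPy. It follows the description above but is not taken from any listed system; `Client`, `check_in`, `run_round`, `target_clients`, and `min_reports` are hypothetical names, and local training is stubbed with random deltas.

```python
import random
import numpy as np


class Client:
    """A hypothetical client holding private data (illustrative only)."""

    def __init__(self, cid):
        self.cid = cid

    def check_in(self):
        # Eligible iff, e.g., the device is charging and on an unmetered
        # network; modeled here as a coin flip.
        return random.random() < 0.8

    def train(self, global_model, local_epochs):
        # Stub for local training on private data: return a fake model delta.
        delta = np.zeros_like(global_model)
        for _ in range(local_epochs):
            delta -= 0.01 * np.random.randn(*global_model.shape)
        return delta


def run_round(global_model, population, target_clients=10, min_reports=8,
              local_epochs=1):
    # 1) Selection: wait for enough eligible clients to check in, then
    #    sample a subset; the rest are told to reconnect later.
    checked_in = [c for c in population if c.check_in()]
    if len(checked_in) < target_clients:
        return global_model, False
    selected = random.sample(checked_in, target_clients)

    # 2) Configuration: ship the model and the training profile; each
    #    selected client trains independently.
    deltas = [c.train(global_model.copy(), local_epochs) for c in selected]

    # 3) Reporting: a real server would drop stragglers at the reporting
    #    deadline; here all clients report, and the round is aborted if
    #    fewer than `min_reports` updates arrive.
    if len(deltas) < min_reports:
        return global_model, False
    return global_model + np.mean(deltas, axis=0), True


# Usage: run a few rounds over a toy population and a 4-parameter model.
population = [Client(i) for i in range(100)]
model = np.zeros(4)
for _ in range(3):
    model, completed = run_round(model, population)
```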

## 2 Recent Optimization Approaches

### 2.1 Optimizing the Selection Phase

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2021 | AutoFL: Enabling heterogeneity-aware energy efficient federated learning | Co-design (Fine-grained) | ACM MICRO | Link |
| 2021 | Oort: Efficient federated learning via guided participant selection | Co-design (Fine-grained) | USENIX OSDI | Link |
| 2021 | Client selection for federated learning with non-IID data in mobile edge computing | Partial optimization (Statistics-oriented) | IEEE Access | Link |
| 2020 | TiFL: A tier-based federated learning system | Co-design (Coarse-grained) | ACM HPDC | Link |
| 2020 | Optimizing federated learning on non-IID data with reinforcement learning | Partial optimization (Statistics-oriented) | IEEE INFOCOM | Link |
| 2019 | Client selection for federated learning with heterogeneous resources in mobile edge | Partial optimization (System-oriented) | IEEE ICC | Link |

### 2.2 Optimizing the Configuration Phase

#### 2.2.1 Synchronization Frequency Reduction

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2021 | Communication-efficient federated learning with adaptive parameter freezing | Parameter-level | IEEE ICDCS | Link |
| 2020 | Communication-efficient federated deep learning with layerwise asynchronous model update and temporally weighted aggregation | Layer-level | IEEE TNNLS | Link |
| 2019 | CMFL: Mitigating communication overhead for federated learning | Client-level | IEEE ICDCS | Link |
| 2018 | Efficient decentralized deep learning by dynamic model averaging | Client-level | ECML-PKDD | Link |
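To illustrate the client-level category in the table above, the following sketch skips an upload when the local model has barely drifted since the last synchronization. This is a generic dynamic-averaging-style heuristic, not the exact criterion of any listed paper; `should_report` and the relative-norm threshold are assumptions.

```python
import numpy as np


def should_report(local_model, last_synced_model, threshold=0.05):
    """Client-level frequency reduction (illustrative): skip the upload
    when the local model has drifted little since the last sync."""
    divergence = np.linalg.norm(local_model - last_synced_model)
    scale = np.linalg.norm(last_synced_model) + 1e-12
    return divergence / scale > threshold
```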

#### 2.2.2 Model Update Size Reduction

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2020 | FetchSGD: Communication-efficient federated learning with sketching | Sketch | ICML | Link |
| 2019 | Compressing gradient optimizers via count-sketches | Sketch | ICML | Link |
| 2019 | Communication-efficient distributed SGD with sketching | Sketch | NeurIPS | Link |
| 2019 | Error feedback fixes SignSGD and other gradient compression schemes | Quantization | ICML | Link |
| 2019 | SignSGD with majority vote is communication efficient and fault tolerant | Quantization | ICLR | Link |
| 2019 | A distributed synchronous SGD algorithm with global top-k sparsification for low bandwidth networks | Sparsification | IEEE ICDCS | Link |
| 2018 | Sparsified SGD with memory | Sparsification | NeurIPS | Link |
| 2018 | Deep gradient compression: Reducing the communication bandwidth for distributed training | Sparsification | ICLR | Link |
| 2018 | Gradient sparsification for communication-efficient distributed optimization | Sparsification | NeurIPS | Link |
| 2018 | SketchML: Accelerating distributed machine learning with data sketches | Sketch | ACM SIGMOD | Link |
| 2018 | Error compensated quantized SGD and its applications to large-scale distributed optimization | Quantization | ICML | Link |
| 2017 | Gaia: Geo-distributed machine learning approaching LAN speeds | Client-level | USENIX NSDI | Link |
| 2017 | Sparse communication for distributed gradient descent | Sparsification | ACL EMNLP | Link |
| 2017 | TernGrad: Ternary gradients to reduce communication in distributed deep learning | Quantization | NeurIPS | Link |
| 2017 | QSGD: Communication-efficient SGD via gradient quantization and encoding | Quantization | NeurIPS | Link |
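For intuition about the categories in this table, here are textbook-style sketches of top-k sparsification and 1-bit sign quantization (a sketch-based compressor such as Count-Sketch is omitted for brevity). These are generic illustrations, not the exact algorithms of the listed papers; `topk_sparsify` and `sign_quantize` are hypothetical helpers.

```python
import numpy as np


def topk_sparsify(update, k):
    """Sparsification (illustrative): keep only the k largest-magnitude
    entries of the update and zero out the rest."""
    kept = np.argpartition(np.abs(update), -k)[-k:]
    sparse = np.zeros_like(update)
    sparse[kept] = update[kept]
    return sparse


def sign_quantize(update):
    """1-bit quantization (illustrative): transmit only the signs plus a
    single scale factor (the mean magnitude)."""
    return np.sign(update) * np.abs(update).mean()
```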

#### 2.2.3 Training Latency Reduction

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2021 | Accelerating DNN training in wireless federated edge learning systems | Load balancing (Communication) | IEEE JSAC | Link |
| 2021 | HeteroFL: Computation and communication efficient federated learning for heterogeneous clients | Load balancing (Optimization step) | ICLR | Link |
| 2021 | Towards efficient scheduling of federated mobile devices under computational and statistical heterogeneity | Load balancing (Data amount) | IEEE TPDS | Link |
| 2020 | Federated optimization in heterogeneous networks | Load balancing (Optimization step) | MLSys | Link |
| 2020 | Resource allocation in mobility-aware federated learning networks: A deep reinforcement learning approach | Load balancing (Data amount) | IEEE WF-IoT | Link |
| 2019 | Efficient training management for mobile crowd-machine learning: A deep reinforcement learning approach | Load balancing (Data amount) | IEEE WCL | Link |

#### 2.2.4 Training Round Reduction

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2021 | Breaking the centralized barrier for cross-device federated learning | Client bias reduction | NeurIPS | Link |
| 2021 | Federated learning based on dynamic regularization | Client bias reduction | ICLR | Link |
| 2020 | Federated learning via posterior averaging: A new perspective and practical algorithms | Client bias reduction | ICLR | Link |
| 2020 | SCAFFOLD: Stochastic controlled averaging for federated learning | Client bias reduction | ICML | Link |
| 2020 | Federated optimization in heterogeneous networks | Client bias reduction | MLSys | Link |
| 2020 | Accelerating federated learning via momentum gradient descent | Optimizer state synchronization | IEEE TPDS | Link |
| 2020 | Federated accelerated stochastic gradient descent | Optimizer state synchronization | NeurIPS | Link |
| 2019 | FedDANE: A federated Newton-type method | Client bias reduction | IEEE ACSSC | Link |
| 2019 | On the linear speedup analysis of communication efficient momentum SGD for distributed nonconvex optimization | Optimizer state synchronization | ICML | Link |

### 2.3 Optimizing the Reporting Phase

#### 2.3.1 Aggregation Latency Reduction

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2022 | LightSecAgg: Rethinking secure aggregation in federated learning | Lightweight privacy-preserving aggregation | MLSys | Link |
| 2021 | FLASHE: Additively symmetric homomorphic encryption for cross-silo federated learning | Lightweight privacy-preserving aggregation | arXiv | Link |
| 2021 | Turbo-Aggregate: Breaking the quadratic aggregation barrier in secure federated learning | Lightweight privacy-preserving aggregation | IEEE JSAIT | Link |
| 2020 | FastSecAgg: Scalable secure aggregation for privacy-preserving federated learning | Lightweight privacy-preserving aggregation | ICML Workshop | Link |
| 2020 | Secure single-server aggregation with (poly) logarithmic overhead | Lightweight privacy-preserving aggregation | ACM CCS | Link |
| 2020 | BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning | Lightweight privacy-preserving aggregation | USENIX ATC | Link |
| 2020 | Accelerating federated learning over reliability-agnostic clients in mobile edge computing systems | Hierarchical aggregation | IEEE TPDS | Link |
| 2020 | Hierarchical federated learning across heterogeneous cellular networks | Hierarchical aggregation | IEEE ICASSP | Link |
| 2020 | Client-edge-cloud hierarchical federated learning | Hierarchical aggregation | IEEE ICC | Link |

#### 2.3.2 Adaptive Aggregation

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2021 | Adaptive federated optimization | Server-side optimizer | ICLR | Link |
| 2020 | SlowMo: Improving communication-efficient distributed SGD with slow momentum | Server-side optimizer | ICLR | Link |
| 2019 | Measuring the effects of nonidentical data distribution for federated visual classification | Server-side optimizer | NeurIPS Workshop | Link |
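The "Server-side optimizer" category replaces the plain averaging step with a stateful optimizer on the server. Below is a hedged sketch of server-side momentum in that spirit; the `ServerMomentum` class and its hyperparameters are assumptions for illustration, not the exact algorithm of any listed paper.

```python
import numpy as np


class ServerMomentum:
    """Illustrative server-side optimizer: the averaged client delta is
    fed into a momentum step instead of being applied directly."""

    def __init__(self, lr=1.0, beta=0.9):
        self.lr, self.beta, self.velocity = lr, beta, None

    def step(self, global_model, mean_client_delta):
        # Accumulate momentum over rounds, then apply it to the model.
        if self.velocity is None:
            self.velocity = np.zeros_like(global_model)
        self.velocity = self.beta * self.velocity + mean_client_delta
        return global_model + self.lr * self.velocity
```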

## 3 Measuring and Benchmarking Tools

### 3.1 Measurement-Based Research

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2021 | Characterizing impacts of heterogeneity in federated learning upon large-scale smartphone data | Mobile | ACM WWW | Link |

### 3.2 Benchmarking Suites

| Year | Title | Category | Venue | Paper Link |
|------|-------|----------|-------|------------|
| 2022 | The OARF benchmark suite: Characterization and implications for federated learning systems | Training datasets | ACM TIST | Link |
| 2022 | FedScale: Benchmarking model and system performance of federated learning | Training datasets | ICML Workshop | Link |
| 2021 | FATE: An industrial grade platform for collaborative learning with data protection | Production systems and simulation platforms | JMLR | Link |
| 2020 | Flower: A friendly federated learning research framework | Production systems and simulation platforms | arXiv | Link |
| 2020 | FedML: A research library and benchmark for federated machine learning | Production systems and simulation platforms | NeurIPS Workshop | Link |
| 2018 | LEAF: A benchmark for federated settings | Training datasets | arXiv | Link |
