A simulation framework for Asynchronous Multi-Server Federated Learning (FL) with support for multiple aggregation strategies, client-server configurations, and dataset settings.
Use Python 3.9 to create a virtual environment, then install the required packages manually:

```bash
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118 torchdata==0.9.0
pip install -r requirements.txt
```

If you get errors, you can install `torchdata==0.7.1` instead, but you will have to make the adjustment described here: GitHub Issue. Add the following to `torch/utils/data/datapipes/utils/common.py` at line 23:

```python
# BC for torchdata
DILL_AVAILABLE = dill_available()
```

Run an experiment:

```bash
python3 -m mobilefl <experiment-name> --print   # Verbose mode
python3 -m mobilefl <experiment-name>           # Quiet mode
python3 -m mobilefl example_mnist_nomad_latency_w0 --print   # Example run (verbose)
```

To run on the cluster:

- Ensure the conda environment is installed on scratch.
- Load the GPU module: `module load cuda11.2/toolkit`

When you run new experiments, check that the configuration does not contain the `speed_flag` field; if it does, rewrite the config file to match the example file shown above. Also delete the `config_client_idcs.json` file.
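The clean-up step above can be scripted. Below is a minimal sketch; the `speed_flag` key and the `config_client_idcs.json` file are taken from the note above, while the function name and signature are illustrative, not part of the repository's API:

```python
import json
import os


def sanitize_config(config_path: str, client_idcs_path: str) -> None:
    """Drop the legacy 'speed_flag' field from a config file and remove
    the cached client-index file, as required before new experiments."""
    with open(config_path) as f:
        config = json.load(f)

    if "speed_flag" in config:
        config.pop("speed_flag")
        with open(config_path, "w") as f:
            json.dump(config, f, indent=4)

    if os.path.exists(client_idcs_path):
        os.remove(client_idcs_path)
```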
To log runs with Weights & Biases:

```bash
wandb login
python3 -m mobilefl example1 --print --wandb
nohup python3 -m mobilefl example1 --print --wandb > debug.out &
```

On the cluster:

```bash
bash ./runDas.sh example                          # Test config
nohup bash ./runDas.sh example1 -y > debug.out &  # Confirmed run
```
- Create a folder: `./configurations/<NewConfig>`
- Copy an existing `config.json` into it.
- Edit parameters:
  - `"result_file"`: output directory under `./results/`
  - `"name"`: name of the experiment (used in paths)
For example, with `./configurations/cifar10/config.json`, run:

```bash
python3 -m mobilefl cifar10 --print
```

Run multiple experiments (e.g., FedAsync, MultiAsync, FedAvg) by changing the `"name"` field and the other algorithm-specific parameters.

Note: make sure to also create the world config files.

To generate a fixed client population:

```bash
python3 -m mobilefl.generate_client 100 client100
```

This creates `./client100.pkl`, which lets experiments reuse a fixed set of clients.
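The client-generation script presumably samples a fixed client population and pickles it so that every experiment sees the same clients. The sketch below illustrates that idea only; the field names, latency range, and function signature are assumptions, not the module's actual API:

```python
import pickle
import random


def generate_clients(num_clients: int, out_name: str, seed: int = 0) -> list[dict]:
    """Sample a fixed, reproducible population of clients and pickle it.

    Each client gets an id and a per-client training delay (illustrative
    fields); seeding the RNG makes the population reproducible across runs.
    """
    rng = random.Random(seed)
    clients = [
        {"id": i, "training_delay": rng.uniform(0.1, 2.0)}
        for i in range(num_clients)
    ]
    with open(f"./{out_name}.pkl", "wb") as f:
        pickle.dump(clients, f)
    return clients
```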
```bash
python3 -m mobilefl.plot cifar10
python3 -m mobilefl.plot cifar10 FedAsync MultiAsync
python3 -m mobilefl.plot80 145000 45000 comm100_cifar MultiAsync MultiSync FedAsync HierFAVG FedAvg
python3 -m mobilefl.plotq 300 200 cifar10 FedAsync MultiAsync qlen
```

Log details are in `./log_tools/`.
- Add dataset logic in `./data/data.py`
- Add models in `./models/`
- Modify aggregation logic in `./models/aggregator.py`
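As a starting point for custom aggregation rules, here is a hedged sketch of staleness-weighted asynchronous mixing in the spirit of FedAsync. It operates on plain dicts of floats rather than the repository's actual model objects, and the `eta` default and polynomial staleness decay are illustrative choices, not the values used in `aggregator.py`:

```python
from typing import Dict


def fedasync_aggregate(
    global_model: Dict[str, float],
    client_update: Dict[str, float],
    staleness: int,
    eta: float = 0.5,
) -> Dict[str, float]:
    """Mix a (possibly stale) client model into the global model.

    The mixing weight alpha decays polynomially with staleness, so updates
    computed against an old global model contribute less.
    """
    alpha = eta * (1 + staleness) ** -0.5
    return {
        k: (1 - alpha) * global_model[k] + alpha * client_update[k]
        for k in global_model
    }
```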
```bash
ruff check .
# Or automatically fix problems
ruff check --fix .
black --check .
mypy .
```

| Parameter | Description |
|---|---|
| `name` | Name of the experiment |
| `dataset` | `mnist`, `fmnist`, `cifar` |
| `num_rounds` | Number of local updates or communication rounds |
| `alpha` | Dirichlet distribution parameter |
| `server_iid`, `client_iid` | IID control at server and client level |
| `num_servers`, `num_clients` | Number of servers and clients |
| `server_X_clients` | % of clients assigned to each server |
| `server_X_training_delay` | Average speed (latency) of each server's clients |
| `client_async` | Whether clients update asynchronously |
| `server_async` | Only true for MultiAsync |
| `sync_period` | Synchronization period for MultiSync (null for others) |
| `decay_start` | Decay start point (0.8 or 9999 depending on algorithm) |
| `agr` | Aggregation rule (`fedsgd`, `fedavg`) |
| `leader` | True only for HierFAVG |
| `hier_period` | Hierarchical sync interval |
| `client_fraction_per_round` | For FedAvg only (e.g., 0.2 for 20%) |
| `cuda`, `cuda_to_use` | GPU usage flags |
| `num_local_epochs`, `batch_size` | Local training hyperparameters |
| `aggregation_buffer_size` | For FedAsync, buffer size (usually 1) |
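For illustration, a hypothetical `config.json` combining several of the parameters above might look like the following; the values are made up for illustration and are not recommended settings:

```json
{
    "name": "MultiAsync",
    "result_file": "cifar10_run",
    "dataset": "cifar",
    "num_rounds": 2000,
    "alpha": 0.5,
    "server_iid": false,
    "client_iid": false,
    "num_servers": 3,
    "num_clients": 100,
    "client_async": true,
    "server_async": true,
    "sync_period": null,
    "agr": "fedavg",
    "num_local_epochs": 1,
    "batch_size": 32,
    "cuda": true
}
```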
Run all experiments:

```bash
./runAll.sh cifar <ResultFolderName>
```

Plot results:

```bash
./plotAll.sh <time> <updates> <ResultFolderName>
```

- Sliding window size: 100 (edit in `plot.py`, line 466)
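The window-100 smoothing mentioned above is a plain trailing moving average. A minimal, self-contained sketch of that smoothing (this is not the code in `plot.py`):

```python
def sliding_window_average(values: list[float], window: int = 100) -> list[float]:
    """Smooth a metric curve with a trailing moving average.

    For the first `window` points the average runs over everything seen so
    far; afterwards it runs over the last `window` values only.
    """
    smoothed = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        smoothed.append(total / min(i + 1, window))
    return smoothed
```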
```bash
./runAll.sh mnist <ResultFolderName>
./plotAll.sh <ResultFolderName> <time> <updates>
```

If you use this code in your research or project, please cite the following paper:
```bibtex
@article{yuncong2024spyker,
  title   = {Spyker: Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients},
  author  = {Yuncong Zuo and Bart Cox and Lydia Y. Chen and J{\'{e}}r{\'{e}}mie Decouchant},
  journal = {Middleware 2024},
  year    = {2024},
  url     = {https://doi.org/10.1145/3652892.3700778},
  doi     = {10.1145/3652892.3700778},
}
```

For more details, you can access the paper at https://doi.org/10.1145/3652892.3700778
For any questions or issues, please refer to each module's README or contact the maintainers.
Happy Federated Learning! 🤖