Add docs for scripts #305

Merged
merged 10 commits on Aug 11, 2022
20 changes: 10 additions & 10 deletions README.md
@@ -161,9 +161,9 @@ Here we demonstrate how to run a standard FL task with FederatedScope

```bash
# Run with default configurations
-python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml
+python federatedscope/main.py --cfg scripts/example_configs/femnist.yaml
# Or with custom configurations
-python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml federate.total_round_num 50 data.batch_size 128
+python federatedscope/main.py --cfg scripts/example_configs/femnist.yaml federate.total_round_num 50 data.batch_size 128
```
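
The yaml file bundles the whole experiment specification. As a rough sketch of the sections such a config covers (illustrative values only; see `scripts/example_configs/femnist.yaml` for the actual file):

```yaml
use_gpu: True
device: 0
federate:
  mode: 'standalone'
  total_round_num: 300   # illustrative; overridable from the command line
data:
  type: femnist
  batch_size: 128        # illustrative value
train:
  optimizer:
    lr: 0.01             # illustrative value
eval:
  freq: 10
  metrics: ['acc', 'correct']
```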

Then you can observe some monitored metrics during the training process as:
@@ -203,12 +203,12 @@ We prepare a synthetic example for running with distributed mode:

```bash
# For server
-python main.py --cfg federatedscope/example_configs/distributed_server.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx

# For clients
-python main.py --cfg federatedscope/example_configs/distributed_client_1.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
-python main.py --cfg federatedscope/example_configs/distributed_client_2.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
-python main.py --cfg federatedscope/example_configs/distributed_client_3.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_1.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_2.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_3.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
```

An executable example with generated toy data can be run with (a script can be found in `scripts/distributed_scripts/run_distributed_lr.sh`):
@@ -217,14 +217,14 @@
python scripts/gen_data.py

# Firstly start the server that is waiting for clients to join in
-python federatedscope/main.py --cfg federatedscope/example_configs/distributed_server.yaml distribute.data_file toy_data/server_data distribute.server_host 127.0.0.1 distribute.server_port 50051
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server.yaml distribute.data_file toy_data/server_data distribute.server_host 127.0.0.1 distribute.server_port 50051

# Start the client #1 (with another process)
-python federatedscope/main.py --cfg federatedscope/example_configs/distributed_client_1.yaml distribute.data_file toy_data/client_1_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50052
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_1.yaml distribute.data_file toy_data/client_1_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50052
# Start the client #2 (with another process)
-python federatedscope/main.py --cfg federatedscope/example_configs/distributed_client_2.yaml distribute.data_file toy_data/client_2_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50053
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_2.yaml distribute.data_file toy_data/client_2_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50053
# Start the client #3 (with another process)
-python federatedscope/main.py --cfg federatedscope/example_configs/distributed_client_3.yaml distribute.data_file toy_data/client_3_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50054
+python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_3.yaml distribute.data_file toy_data/client_3_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50054
```

And you can observe the results as (the IP addresses are anonymized with 'x.x.x.x'):
3 changes: 1 addition & 2 deletions demo/bbo.py
@@ -23,8 +23,7 @@ def eval_fl_algo(x):
    from federatedscope.core.fed_runner import FedRunner

    init_cfg = global_cfg.clone()
-    init_cfg.merge_from_file(
-        "federatedscope/example_configs/single_process.yaml")
+    init_cfg.merge_from_file("scripts/example_configs/single_process.yaml")
    init_cfg.merge_from_list(["train.optimizer.lr", float(x[0])])

    update_logger(init_cfg, True)
3 changes: 1 addition & 2 deletions demo/hpbandster/rs.py
@@ -43,8 +43,7 @@ def eval_fl_algo(x, b):
    from federatedscope.core.fed_runner import FedRunner

    init_cfg = global_cfg.clone()
-    init_cfg.merge_from_file(
-        "federatedscope/example_configs/single_process.yaml")
+    init_cfg.merge_from_file("scripts/example_configs/single_process.yaml")
    # specify the configuration of interest
    init_cfg.merge_from_list([
        "train.optimizer.lr",
3 changes: 1 addition & 2 deletions demo/smac/gp.py
@@ -19,8 +19,7 @@ def eval_fl_algo(x):
    from federatedscope.core.fed_runner import FedRunner

    init_cfg = global_cfg.clone()
-    init_cfg.merge_from_file(
-        "federatedscope/example_configs/single_process.yaml")
+    init_cfg.merge_from_file("scripts/example_configs/single_process.yaml")
    # specify the configuration of interest
    init_cfg.merge_from_list([
        "optimizer.lr",
3 changes: 1 addition & 2 deletions demo/smac/rf.py
@@ -19,8 +19,7 @@ def eval_fl_algo(x):
    from federatedscope.core.fed_runner import FedRunner

    init_cfg = global_cfg.clone()
-    init_cfg.merge_from_file(
-        "federatedscope/example_configs/single_process.yaml")
+    init_cfg.merge_from_file("scripts/example_configs/single_process.yaml")
    # specify the configuration of interest
    init_cfg.merge_from_list([
        "optimizer.lr",
8 changes: 0 additions & 8 deletions federatedscope/example_configs/cora/run.sh

This file was deleted.

8 changes: 0 additions & 8 deletions federatedscope/example_configs/femnist/run.sh

This file was deleted.

2 changes: 1 addition & 1 deletion federatedscope/organizer/client.py
@@ -156,7 +156,7 @@ def do_create_room(self, line):
            ' command, extra command to launch FS\n' \
            ' psw, password for room \n\n' \
            'Example:\n' \
-           ' create_room --cfg ../../federatedscope/example_configs' \
+           ' create_room --cfg ../../scripts/example_configs' \
            '/distributed_femnist_server.yaml 12345\n'
        try:
            global organizer
2 changes: 1 addition & 1 deletion run_reorganized_standalone.sh
@@ -10,7 +10,7 @@ echo "Starts..."

lr=0.01

-python federatedscope/main.py --cfg federatedscope/example_configs/single_process.yaml device ${cudaid} data.type toy data.splitter ooxx \
+python federatedscope/main.py --cfg scripts/example_configs/single_process.yaml device ${cudaid} data.type toy data.splitter ooxx \
optimizer.lr ${lr} model.type lr federate.mode standalone trainer.type general federate.total_round_num 50 \
>>out_reorganize/lr.out \
2>>out_reorganize/lr.err
58 changes: 58 additions & 0 deletions scripts/README.md
@@ -0,0 +1,58 @@
## Scripts for Reproduction
We provide some scripts for reproducing existing algorithms with FederatedScope; this collection is constantly being updated.
We greatly appreciate any [contribution](https://federatedscope.io/docs/contributor/) to FederatedScope!

- [Distribute Mode](#distribute-mode)
- [Federated Learning in Computer Vision (FL-CV)](#federated-learning-in-computer-vision-fl-cv)
- [Asynchronous Training Strategy](#asynchronous-training-strategy)
- [Graph Federated Learning](#graph-federated-learning)

### Distribute Mode
Users can train an LR model on generated toy data in distribute mode via:
```shell script
bash distributed_scripts/run_distributed_lr.sh
```
The FL course consists of 1 server and 3 clients, all running on a single device as a simulation. Each client owns its private data, and the server holds a test set for global evaluation.
- To run with multiple devices, specify the host (IP address) and port of each participant in the configurations (i.e., the yaml files) and make sure the devices are connected.
Then launch the participants (i.e., `python federatedscope/main.py --cfg xxx.yaml`) on each device, remembering to launch the server first; a config sketch is given after this list.
- For the case where the server owns no data and evaluation is performed at the clients, use `distributed_server_no_data.yaml` at this [line](https://github.com/alibaba/FederatedScope/blob/master/scripts/distributed_scripts/run_distributed_lr.sh#L11).
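
For reference, the `distribute` section of a client yaml used by this script looks like the sketch below (drawn from the configs in this PR; the loopback addresses and ports are placeholders to replace with your devices' real ones):

```yaml
distribute:
  use: True
  server_host: '127.0.0.1'   # IP address of the server
  server_port: 50051         # port the server listens on
  client_host: '127.0.0.1'   # IP address of this client
  client_port: 50052         # port this client listens on
  role: 'client'             # the server-side yaml uses 'server'
  data_idx: 2                # which data partition this participant loads
```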

Users can also run distribute mode with other provided datasets and models. Take training a ConvNet on FEMNIST as an example:
```shell script
bash distributed_scripts/run_distributed_conv_femnist.sh
```

### Federated Learning in Computer Vision (FL-CV)
We provide several configurations (yaml files) as examples to demonstrate how to apply FL in CV with FederatedScope.
Users can run the following commands for reproduction, and modify/add yaml files for customization, such as using provided/customized datasets and models, tuning hyperparameters, etc.; a sketch of such a customization follows.
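
For instance, a customized run might override a few fields of a copied baseline yaml (an illustrative sketch; the keys mirror those appearing elsewhere in this PR, and the values are placeholders):

```yaml
trainer:
  type: cvtrainer           # the trainer used by these CV baselines
federate:
  total_round_num: 50       # placeholder: shorten the FL course
eval:
  freq: 10                  # evaluate every 10 rounds
  metrics: ['acc', 'correct']
```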

Train ConvNet on FEMNIST with vanilla FedAvg:
```shell script
cd ..
python federatedscope/main.py --cfg federatedscope/cv/baseline/fedavg_convnet2_on_femnist.yaml
# or
# python federatedscope/main.py --cfg scripts/example_configs/femnist.yaml
```

Train ConvNet on CelebA with vanilla FedAvg:
```shell script
cd ..
python federatedscope/main.py --cfg federatedscope/cv/baseline/fedavg_convnet2_on_celeba.yaml
```

Train ConvNet on FEMNIST with FedBN:
```shell script
cd ..
python federatedscope/main.py --cfg federatedscope/cv/baseline/fedbn_convnet2_on_femnist.yaml
```

### Asynchronous Training Strategy
We provide an example for training ConvNet on CIFAR-10 with asynchronous training strategies:
```shell script
cd ..
python federatedscope/main.py --cfg scripts/example_configs/asyn_cifar10.yaml
```
The FL course consists of 1 server and 200 clients; it applies the `goal_achieved` strategy with `min_received_num=10` and `staleness_toleration=10`.
Users can change the asynchronous-training configurations for customization; please see [configurations](https://github.com/alibaba/FederatedScope/tree/master/federatedscope/core/configs), and the sketch below.
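
For orientation, the corresponding excerpt of such a yaml might look like the following (a sketch assuming the option names match the settings quoted above; consult `scripts/example_configs/asyn_cifar10.yaml` and the config docs for the authoritative names and values):

```yaml
asyn:
  use: True
  aggregator: 'goal_achieved'   # assumed key for the aggregation goal above
  min_received_num: 10          # aggregate once 10 client updates arrive
  staleness_toleration: 10      # tolerate updates at most 10 rounds stale
```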

### Graph Federated Learning
Please refer to [gfl](https://github.com/alibaba/FederatedScope/tree/master/federatedscope/gfl) for more details.
@@ -1,5 +1,5 @@
use_gpu: True
-device: 1
+device: 0
early_stop:
  patience: 5
seed: 12345
@@ -8,11 +8,13 @@ federate:
  mode: 'distributed'
  make_global_eval: False
  online_aggr: False
-  total_round_num: 300
+  total_round_num: 20
distribute:
  use: True
  server_host: '127.0.0.1'
  server_port: 50051
+  client_host: '127.0.0.1'
+  client_port: 50052
  role: 'client'
  data_idx: 2
data:
@@ -40,7 +42,7 @@ criterion:
trainer:
  type: cvtrainer
eval:
-  freq: 1
+  freq: 10
  metrics: ['acc', 'correct']
  report: [ 'weighted_avg', 'raw' ]
  count_flops: False
@@ -8,11 +8,13 @@ federate:
  mode: 'distributed'
  make_global_eval: False
  online_aggr: False
-  total_round_num: 300
+  total_round_num: 20
distribute:
  use: True
  server_host: '127.0.0.1'
  server_port: 50051
+  client_host: '127.0.0.1'
+  client_port: 50053
  role: 'client'
  data_idx: 3
data:
@@ -40,7 +42,7 @@ criterion:
trainer:
  type: cvtrainer
eval:
-  freq: 1
+  freq: 10
  metrics: ['acc', 'correct']
  report: [ 'weighted_avg', 'raw' ]
  count_flops: False
@@ -1,5 +1,5 @@
use_gpu: True
-device: 1
+device: 0
early_stop:
  patience: 5
seed: 12345
@@ -8,11 +8,13 @@ federate:
  mode: 'distributed'
  make_global_eval: False
  online_aggr: False
-  total_round_num: 300
+  total_round_num: 20
distribute:
  use: True
  server_host: '127.0.0.1'
  server_port: 50051
+  client_host: '127.0.0.1'
+  client_port: 50054
  role: 'client'
  data_idx: 4
data:
@@ -40,7 +42,7 @@ criterion:
trainer:
  type: cvtrainer
eval:
-  freq: 1
+  freq: 10
  metrics: ['acc', 'correct']
  report: [ 'weighted_avg', 'raw' ]
  count_flops: False
@@ -8,7 +8,7 @@ federate:
  mode: 'distributed'
  make_global_eval: False
  online_aggr: False
-  total_round_num: 300
+  total_round_num: 20
distribute:
  use: True
  server_host: '127.0.0.1'
@@ -40,7 +40,7 @@ criterion:
trainer:
  type: cvtrainer
eval:
-  freq: 1
+  freq: 10
  metrics: ['acc', 'correct']
  report: ['weighted_avg', 'raw']
  count_flops: False
File renamed without changes.
17 changes: 17 additions & 0 deletions scripts/distributed_scripts/run_distributed_conv_femnist.sh
@@ -0,0 +1,17 @@
set -e

cd ..

echo "Run distributed mode with ConvNet-2 on FEMNIST..."

### server
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_server.yaml &
sleep 2

# clients
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_client_1.yaml &
sleep 2
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_client_2.yaml &
sleep 2
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_client_3.yaml &

22 changes: 22 additions & 0 deletions scripts/distributed_scripts/run_distributed_lr.sh
@@ -0,0 +1,22 @@
set -e

cd ..

echo "Test distributed mode with LR..."

echo "Data generation"
python scripts/gen_data.py

### server owns global test data
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server.yaml &
### server doesn't own data
# python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server_no_data.yaml &
sleep 2

# clients
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_1.yaml &
sleep 2
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_2.yaml &
sleep 2
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_3.yaml &

8 changes: 8 additions & 0 deletions scripts/example_configs/cora/run.sh
@@ -0,0 +1,8 @@
# SHA
python hpo.py --cfg scripts/example_configs/cora/sha.yaml

# SHA wrap FedEX (FedEX related param)
python hpo.py --cfg scripts/example_configs/cora/sha_wrap_fedex.yaml

# SHA wrap FedEX (arm)
python hpo.py --cfg scripts/example_configs/cora/sha_wrap_fedex_arm.yaml
@@ -38,7 +38,7 @@ hpo:
  scheduler: sha
  num_workers: 0
  init_cand_num: 81
-  ss: 'federatedscope/example_configs/cora/hpo_ss_sha.yaml'
+  ss: 'scripts/example_configs/cora/hpo_ss_sha.yaml'
  sha:
    budgets: [2, 4, 12, 36]
  metric: 'server_global_eval.val_avg_loss'
@@ -38,10 +38,10 @@ hpo:
  scheduler: sha
  num_workers: 0
  init_cand_num: 81
-  ss: 'federatedscope/example_configs/cora/hpo_ss_fedex.yaml'
+  ss: 'scripts/example_configs/cora/hpo_ss_fedex.yaml'
  sha:
    budgets: [2, 4, 12, 36]
  fedex:
    use: True
-    ss: 'federatedscope/example_configs/cora/hpo_ss_fedex_grid.yaml'
+    ss: 'scripts/example_configs/cora/hpo_ss_fedex_grid.yaml'
  metric: 'server_global_eval.val_avg_loss'
@@ -38,9 +38,9 @@ hpo:
  scheduler: wrap_sha
  num_workers: 0
  init_cand_num: 81
-  ss: 'federatedscope/example_configs/cora/hpo_ss_fedex_arm_table.yaml'
+  ss: 'scripts/example_configs/cora/hpo_ss_fedex_arm_table.yaml'
  table:
-    ss: 'federatedscope/example_configs/cora/hpo_ss_fedex_arm.yaml'
+    ss: 'scripts/example_configs/cora/hpo_ss_fedex_arm.yaml'
    num: 4
    cand: 81
  sha:
@@ -21,6 +21,6 @@ model:
hpo:
  fedex:
    use: True
-    # ss: 'federatedscope/example_configs/fedex_flat_search_space.yaml'
-    ss: 'federatedscope/example_configs/fedex_grid_search_space.yaml'
+    # ss: 'scripts/example_configs/fedex_flat_search_space.yaml'
+    ss: 'scripts/example_configs/fedex_grid_search_space.yaml'
    diff: True
@@ -43,7 +43,7 @@ hpo:
  scheduler: sha
  num_workers: 0
  init_cand_num: 10
-  ss: 'federatedscope/example_configs/femnist/avg/ss.yaml'
+  ss: 'scripts/example_configs/femnist/avg/ss.yaml'
  sha:
    budgets: [50]
    elim_rate: 10