# Federated Learning for Image Classification using FedScale

FedScale is a diverse set of challenging and realistic benchmark datasets that facilitates scalable, comprehensive, and reproducible federated learning (FL) research. The datasets are large-scale and cover a range of important FL tasks, such as image classification, object detection, language modeling, speech recognition, and reinforcement learning. For each dataset, we provide a unified evaluation protocol with realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we have also built an efficient evaluation platform, FedScale Automated Runtime (FAR), which simplifies and standardizes FL experimental setup and model evaluation. The platform provides flexible APIs for implementing new FL algorithms and adding new execution backends with minimal developer effort.

In [1]:
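import os
import sys

# Make the FedScale core modules importable from the notebook's directory.
sys.path.insert(1, os.path.join(sys.path[0], './FedScale/core/'))
from FedScale.core.client import Client
from FedScale.core.aggregator import Aggregator
from FedScale.core.fl_client_libs import args  # job arguments (printed in the log below)

# Create the aggregator with the job arguments and start the training loop.
Demo_Aggregator = Aggregator(args)
Demo_Aggregator.run()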

(03-03) 20:36:04 INFO     [aggregator.py:21] Job args Namespace(adam_epsilon=1e-08, backbone='./resnet50.pth', backend='gloo', base_port=10001, batch_size=30, bidirectional=True, blacklist_max_len=0.3, blacklist_rounds=-1, block_size=64, cfg_file='./utils/rcnn/cfgs/res101.yml', clf_block_size=32, clip_bound=0.9, clock_factor=1.1624548736462095, conf_path='~/dataset/', cuda_device=None, cut_off_util=0.05, data_cache='', data_dir='~/cifar10/', data_map_file=None, data_set='cifar10', decay_epoch=10, decay_factor=0.98, device_avail_file=None, device_conf_file='/tmp/client.cfg', dump_epoch=10000000000.0, embedding_file='glove.840B.300d.txt', epochs=50, epsilon=0.9, eval_interval=5, executor_configs='127.0.0.1:[1]', exploration_alpha=0.3, exploration_decay=0.98, exploration_factor=0.9, exploration_min=0.3, filter_less=32, filter_more=1000000000000000.0, finetune=False, gamma=0.9, gradient_policy=None, hidden_layers=7, hidden_size=256, input_dim=0, job_name='demo_job', labels_path='labels.jso

(03-03) 20:38:52 INFO     [aggregator.py:292] Wall clock: 3058299 s, Epoch: 10, Planned participants: 4, Succeed participants: 4, Training loss: 4.463670654565373
(03-03) 20:38:52 INFO     [client_manager.py:163] Wall clock time: 3058299, 4 clients online, 0 clients offline
(03-03) 20:38:52 INFO     [aggregator.py:308] Selected participants to run: [1, 2, 3, 4]:
{1: {'computation': 1.8, 'communication': 339809.15625}, 2: {'computation': 1.8, 'communication': 339809.15625}, 3: {'computation': 1.8, 'communication': 339809.15625}, 4: {'computation': 1.8, 'communication': 339809.15625}}
(03-03) 20:38:57 INFO     [aggregator.py:376] FL Testing in epoch: 10, virtual_clock: 3058298.6062499997, top_1: 16.35 %, top_5: 65.38 %, test loss: 2.3310, test len: 10000
(03-03) 20:39:13 INFO     [aggregator.py:292] Wall clock: 3398110 s, Epoch: 11, Planned participants: 4, Succeed participants: 4, Training loss: 4.252917483430275
(03-03) 20:39:13 INFO     [client_manager.py:163] Wall clock time: 3398110

(03-03) 20:42:40 INFO     [client_manager.py:163] Wall clock time: 7475841, 4 clients online, 0 clients offline
(03-03) 20:42:40 INFO     [aggregator.py:308] Selected participants to run: [1, 2, 3, 4]:
{1: {'computation': 1.8, 'communication': 339809.15625}, 2: {'computation': 1.8, 'communication': 339809.15625}, 3: {'computation': 1.8, 'communication': 339809.15625}, 4: {'computation': 1.8, 'communication': 339809.15625}}
(03-03) 20:42:56 INFO     [aggregator.py:292] Wall clock: 7815652 s, Epoch: 24, Planned participants: 4, Succeed participants: 4, Training loss: 3.089161296527864
(03-03) 20:42:56 INFO     [client_manager.py:163] Wall clock time: 7815652, 4 clients online, 0 clients offline
(03-03) 20:42:56 INFO     [aggregator.py:308] Selected participants to run: [1, 2, 3, 4]:
{1: {'computation': 1.8, 'communication': 339809.15625}, 2: {'computation': 1.8, 'communication': 339809.15625}, 3: {'computation': 1.8, 'communication': 339809.15625}, 4: {'computation': 1.8, 'communication'

(03-03) 20:46:25 INFO     [client_manager.py:163] Wall clock time: 11893383, 4 clients online, 0 clients offline
(03-03) 20:46:25 INFO     [aggregator.py:308] Selected participants to run: [1, 2, 3, 4]:
{1: {'computation': 1.8, 'communication': 339809.15625}, 2: {'computation': 1.8, 'communication': 339809.15625}, 3: {'computation': 1.8, 'communication': 339809.15625}, 4: {'computation': 1.8, 'communication': 339809.15625}}
(03-03) 20:46:42 INFO     [aggregator.py:292] Wall clock: 12233194 s, Epoch: 37, Planned participants: 4, Succeed participants: 4, Training loss: 2.705342978469112
(03-03) 20:46:42 INFO     [client_manager.py:163] Wall clock time: 12233194, 4 clients online, 0 clients offline
(03-03) 20:46:42 INFO     [aggregator.py:308] Selected participants to run: [1, 2, 3, 4]:
{1: {'computation': 1.8, 'communication': 339809.15625}, 2: {'computation': 1.8, 'communication': 339809.15625}, 3: {'computation': 1.8, 'communication': 339809.15625}, 4: {'computation': 1.8, 'communicati

(03-03) 20:50:21 INFO     [aggregator.py:292] Wall clock: 16650737 s, Epoch: 50, Planned participants: 4, Succeed participants: 4, Training loss: 2.4984337478283076
(03-03) 20:50:21 INFO     [client_manager.py:163] Wall clock time: 16650737, 4 clients online, 0 clients offline
(03-03) 20:50:21 INFO     [aggregator.py:308] Selected participants to run: [1, 2, 3, 4]:
{1: {'computation': 1.8, 'communication': 339809.15625}, 2: {'computation': 1.8, 'communication': 339809.15625}, 3: {'computation': 1.8, 'communication': 339809.15625}, 4: {'computation': 1.8, 'communication': 339809.15625}}
(03-03) 20:50:21 INFO     [aggregator.py:503] Terminating the aggregator ...


AttributeError: 'Aggregator' object has no attribute 'control_manager'
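
The per-round summary lines in the output above are easier to follow once they are pulled into a small table. The following is a minimal sketch that assumes the log keeps the `Wall clock: ... Epoch: ... Training loss: ...` layout shown here; `ROUND_RE`, `parse_rounds`, and `sample_log` are illustrative names, not part of FedScale.

```python
import re

# Pattern for the per-round summary line printed by aggregator.py in the log above.
# The layout is taken from this notebook's output and may differ in other versions.
ROUND_RE = re.compile(
    r"Wall clock: (?P<clock>\d+) s, Epoch: (?P<epoch>\d+), .*?"
    r"Training loss: (?P<loss>[0-9.eE+-]+)"
)


def parse_rounds(log_text):
    """Return a list of (epoch, wall_clock_seconds, training_loss) tuples."""
    rounds = []
    for line in log_text.splitlines():
        match = ROUND_RE.search(line)
        if match:
            rounds.append(
                (int(match["epoch"]), int(match["clock"]), float(match["loss"]))
            )
    return rounds


# Two summary lines copied verbatim from the output above.
sample_log = """\
(03-03) 20:38:52 INFO     [aggregator.py:292] Wall clock: 3058299 s, Epoch: 10, Planned participants: 4, Succeed participants: 4, Training loss: 4.463670654565373
(03-03) 20:39:13 INFO     [aggregator.py:292] Wall clock: 3398110 s, Epoch: 11, Planned participants: 4, Succeed participants: 4, Training loss: 4.252917483430275
"""

rounds = parse_rounds(sample_log)
for epoch, clock, loss in rounds:
    print(f"epoch {epoch:3d}  clock {clock:>9d} s  loss {loss:.4f}")

# Difference in the reported wall clock between the two consecutive rounds.
clock_step = rounds[1][1] - rounds[0][1]
print(clock_step)
```

For these two rounds the reported clock advances by 339811 s, even though the log timestamps are only about 20 seconds apart. That step is close to the sum of the per-client `computation` and `communication` values printed in the log (1.8 + 339809.15625), which suggests the wall clock is a simulated one. This reading is an inference from the output, not something the log states.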

In [7]:
!tensorboard --logdir=./logs/demo_job --port=6006 --bind_all

/users/Yinwei
TensorFlow installation not found - running with reduced feature set.

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

TensorBoard 2.8.0 at http://clnode219.clemson.cloudlab.us:6006/ (Press CTRL+C to quit)
^C
