# Federated MNIST multiclass classification using ConvNet

- Split the MNIST training set by digit.
- The first server holds digits 0-3, the second 4-6, and the third 7-9.
- By the end of federated training with averaging, every server's model can correctly classify all 10 digits in the test set.
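A minimal sketch of this split, assuming the labels are available as a plain Python list. `split_by_digit` and the worker-to-range mapping are illustrative, not part of NeoGlia. With the standard per-digit counts of the 60,000-image MNIST training set, the digits 0-3 shard contains 24,754 images, which matches the `mnist_train` size that h1 reports further down.

```python
# Digit ranges per worker, following the split described above (names are illustrative).
DIGIT_GROUPS = {
    "h1": range(0, 4),   # digits 0-3
    "h2": range(4, 7),   # digits 4-6
    "h3": range(7, 10),  # digits 7-9
}


def split_by_digit(labels, groups=DIGIT_GROUPS):
    """Return, for each worker, the indices of the examples whose label falls in its range."""
    shards = {name: [] for name in groups}
    for idx, label in enumerate(labels):
        for name, digits in groups.items():
            if label in digits:
                shards[name].append(idx)
                break
    return shards


# Per-digit counts of the standard MNIST training set (60,000 images in total).
MNIST_TRAIN_COUNTS = {0: 5923, 1: 6742, 2: 5958, 3: 6131, 4: 5842,
                      5: 5421, 6: 5918, 7: 6265, 8: 5851, 9: 5949}

# Stand-in label list with the same class counts; real code would use the MNIST targets.
labels = [d for d, c in MNIST_TRAIN_COUNTS.items() for _ in range(c)]
shards = split_by_digit(labels)
for name, idx in shards.items():
    print(name, len(idx), sorted({labels[i] for i in idx}))
```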

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import sys
import logging

from neoglia.workers.connect_workers import connect
from neoglia.learn.utils import setup_logging
from neoglia.learn.config import LearnConfig
from neoglia.learn.losses import cross_entropy
from neoglia.learn.models import ConvNet
from neoglia.learn.learner import Learner

In [3]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

## Connect to data nodes

In this demo, we have 3 distinct hospitals, each running as an independent EC2 instance on AWS.

In [4]:
h1, h2, h3 = connect(local=False)

neoglia.workers.connect_workers - INFO - Connected to worker h1.
neoglia.workers.connect_workers - INFO - Connected to worker h2.
neoglia.workers.connect_workers - INFO - Connected to worker h3.


In [5]:
logger.info(h1.list_datasets())

root - INFO - -mnist_train:
	data size: [24754, 28, 28],
	target size: [24754]
-mnist_test:
	data size: [10000, 28, 28],
	target size: [10000]
-eicu_class_train:
	data size: [4777, 103],
	target size: [4777]
-eicu_class_test:
	data size: [5389, 103],
	target size: [5389]
-eicu_reg_train:
	data size: [4777, 103],
	target size: [4777]
-eicu_reg_test:
	data size: [5389, 103],
	target size: [5389]



This lists the datasets a worker holds and their dimensions; the output above is for h1.

## Train a convolutional neural network on the MNIST dataset with federated averaging

Each hospital holds a different subset of the training data, but they all share the same test data.
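To make federated averaging concrete, here is a toy sketch, not NeoGlia's implementation. Three workers fit y ≈ w·x by local SGD on disjoint slices of the input range (mirroring the disjoint digit split), and after every round the server replaces the global weight with the average of the local weights, weighted by each worker's number of samples. Sample-size weighting follows the usual FedAvg formulation; whether `Learner` weights workers this way is an assumption.

```python
import random


def local_sgd(w, data, lr=0.1, steps=10, rng=random):
    """A few SGD steps on the squared error (w*x - y)**2, using only this worker's data."""
    for _ in range(steps):
        x, y = rng.choice(data)
        grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
        w -= lr * grad
    return w


def federated_averaging(worker_data, rounds=30, seed=0):
    """Toy FedAvg: broadcast w, train locally on each worker, average weighted by data size."""
    rng = random.Random(seed)
    w_global = 0.0
    total = sum(len(d) for d in worker_data.values())
    for _ in range(rounds):
        # Every worker starts the round from the same global weight.
        local = {name: local_sgd(w_global, data, rng=rng) for name, data in worker_data.items()}
        w_global = sum(local[name] * len(worker_data[name]) for name in worker_data) / total
    return w_global


# Each worker only sees its own slice of x, like the disjoint digit ranges.
# The true relationship shared by all workers is y = 3x.
worker_data = {
    "h1": [(i / 100, 3 * i / 100) for i in range(0, 30)],
    "h2": [(i / 100, 3 * i / 100) for i in range(30, 60)],
    "h3": [(i / 100, 3 * i / 100) for i in range(60, 101)],
}
w = federated_averaging(worker_data)
print(round(w, 3))
```

No worker ever shares raw data; only the weight travels, yet the averaged model converges to the relationship that holds across all three slices.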

## Define the config file for this experiment

This holds everything from the learning rate to the batch size. 

First, let's check the available parameters. Note that this object can either take a YAML config file (good for reproducible experiments) or be parametrised with keyword arguments when instantiated.
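A minimal sketch of that pattern: a config object that starts from defaults, overlays values from a config file, and then applies keyword arguments. `SketchConfig` is hypothetical and is not NeoGlia's `LearnConfig`; to stay dependency-free it takes the already-parsed file contents as a dict (in practice a YAML library such as PyYAML would do the parsing). The precedence order (keyword arguments over file values over defaults) is an assumption.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class SketchConfig:
    """Hypothetical config mirroring a few defaults from the LearnConfig signature."""
    train_dataset_name: Optional[str] = None
    test_dataset_name: Optional[str] = None
    train_batch_size: int = 64
    test_batch_size: int = 128
    train_epochs: int = 40
    fed_after_n_batches: int = 10
    metrics: List[str] = field(default_factory=lambda: ["accuracy"])
    cuda: bool = False

    @classmethod
    def build(cls, file_values=None, **overrides):
        """Merge defaults < parsed config-file values < keyword overrides."""
        known = {f.name for f in fields(cls)}
        merged = {**(file_values or {}), **overrides}
        unknown = set(merged) - known
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**merged)


# Values as they might come out of a parsed YAML file.
file_values = {"train_dataset_name": "mnist_train", "test_dataset_name": "mnist_test",
               "train_batch_size": 128, "train_epochs": 50}
cfg = SketchConfig.build(file_values, train_epochs=5)  # the keyword argument wins
print(cfg)
```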

In [14]:
?LearnConfig

[0;31mInit signature:[0m
[0mLearnConfig[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0mconfig_file[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtrain_dataset_name[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtest_dataset_name[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtrain_batch_size[0m[0;34m=[0m[0;36m64[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtest_batch_size[0m[0;34m=[0m[0;36m128[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mtrain_epochs[0m[0;34m=[0m[0;36m40[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mfed_after_n_batches[0m[0;34m=[0m[0;36m10[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mmetrics[0m[0;34m=[0m[0;34m[[0m[0;34m'accuracy'[0m[0;34m][0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mlr[0m[0;34m=[0m[0;36m0.1[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mcuda[0m[0;34m=[0m[0;32mFalse[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mseed[0m[0;34m=[0m[0;36m

In [7]:
config = LearnConfig("config_mnist.yml")
config

{'config_file': 'config_mnist.yml',
 'train_dataset_name': 'mnist_train',
 'test_dataset_name': 'mnist_test',
 'train_batch_size': 128,
 'test_batch_size': 128,
 'train_epochs': 50,
 'fed_after_n_batches': 10,
 'metrics': ['accuracy'],
 'optimizer': 'SGD',
 'optimizer_params': {'lr': 0.1, 'momentum': 0.9},
 'cuda': False,
 'seed': 42,
 'save_model': True,
 'verbose': True,
 'regression': False}
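The config selects `'optimizer': 'SGD'` with `{'lr': 0.1, 'momentum': 0.9}`. Assuming these are passed to `torch.optim.SGD` (not confirmed here), the update follows PyTorch's documented momentum rule with zero dampening: a velocity buffer v ← μ·v + g, then p ← p − lr·v. The scalar sketch below reproduces that rule without PyTorch; `sgd_momentum_steps` is a hypothetical helper.

```python
def sgd_momentum_steps(p, grads, lr=0.1, momentum=0.9):
    """Apply SGD with (non-Nesterov) momentum to a scalar parameter, one gradient per step."""
    v = 0.0  # velocity buffer; starting from 0 gives v = g on the first step
    history = []
    for g in grads:
        v = momentum * v + g
        p = p - lr * v
        history.append(p)
    return history


# With a constant gradient of 1.0, the effective step size grows as momentum builds up.
steps = sgd_momentum_steps(0.0, [1.0, 1.0, 1.0])
print([round(s, 4) for s in steps])  # prints [-0.1, -0.29, -0.561]
```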

## Define model architecture and loss function

Define a model architecture in PyTorch, or simply load one of NeoGlia's predefined ones.

In [8]:
model = ConvNet()

In [8]:
%psource ConvNet

[0;32mclass[0m [0mConvNet[0m[0;34m([0m[0mnn[0m[0;34m.[0m[0mModule[0m[0;34m)[0m[0;34m:[0m[0;34m[0m
[0;34m[0m    [0;34m"""[0m
[0;34m    Simple convolutional neural network for multi-class image data.[0m
[0;34m[0m
[0;34m    Returns probabilities after softmax and not logits.[0m
[0;34m    """[0m[0;34m[0m
[0;34m[0m    [0;32mdef[0m [0m__init__[0m[0;34m([0m[0mself[0m[0;34m)[0m[0;34m:[0m[0;34m[0m
[0;34m[0m        [0msuper[0m[0;34m([0m[0mConvNet[0m[0;34m,[0m [0mself[0m[0;34m)[0m[0;34m.[0m[0m__init__[0m[0;34m([0m[0;34m)[0m[0;34m[0m
[0;34m[0m        [0mself[0m[0;34m.[0m[0mconv1[0m [0;34m=[0m [0mnn[0m[0;34m.[0m[0mConv2d[0m[0;34m([0m[0;36m1[0m[0;34m,[0m [0;36m20[0m[0;34m,[0m [0;36m5[0m[0;34m,[0m [0;36m1[0m[0;34m)[0m[0;34m[0m
[0;34m[0m        [0mself[0m[0;34m.[0m[0mconv2[0m [0;34m=[0m [0mnn[0m[0;34m.[0m[0mConv2d[0m[0;34m([0m[0;36m20[0m[0;34m,[0m [0;36m50[0m[0;34m,[0m 

We use cross-entropy as the loss function because this is a multi-class problem.
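Because the `ConvNet` docstring says it returns probabilities after softmax rather than logits, a matching loss has to work on probabilities: cross-entropy is then the mean of −log p(true class). The NumPy sketch below illustrates that computation; it is not NeoGlia's `cross_entropy`. For contrast, PyTorch's `torch.nn.functional.cross_entropy` expects raw logits, so feeding it softmax outputs would apply softmax twice.

```python
import numpy as np


def cross_entropy_from_probs(probs, targets, eps=1e-12):
    """Mean negative log-likelihood of the true classes, given per-class probabilities.

    probs: array of shape (batch, n_classes) whose rows sum to 1.
    targets: integer class indices of shape (batch,).
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets)
    # Probability the model assigned to each example's true class.
    p_true = probs[np.arange(len(targets)), targets]
    # Clip to avoid log(0) when the model gives the true class zero probability.
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
loss = cross_entropy_from_probs(probs, [0, 1])
print(round(loss, 4))
```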

## Start training and evaluating the model in a federated manner

In [9]:
fed_learner = Learner(
    config=config,
    model=model, 
    model_input_dim=[1, 1, 28, 28],
    loss_fn=cross_entropy, 
    workers=(h1, h2, h3)
)
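`model_input_dim=[1, 1, 28, 28]` most likely describes one example in PyTorch's usual (batch, channels, height, width) layout, i.e. a single grayscale 28×28 image; that reading is an assumption. The spatial size after a convolution follows PyTorch's `Conv2d` formula, floor((in + 2·padding − dilation·(kernel − 1) − 1) / stride) + 1, which for stride 1 and no padding reduces to in − kernel + 1. The hypothetical helper below applies it to `conv1 = nn.Conv2d(1, 20, 5, 1)`; it does not cover the layers truncated in the source listing.

```python
def conv2d_output_shape(in_shape, out_channels, kernel_size, stride=1, padding=0, dilation=1):
    """Output (channels, height, width) of a square-kernel 2D convolution (PyTorch Conv2d formula)."""
    _, h, w = in_shape

    def out_dim(n):
        return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

    return (out_channels, out_dim(h), out_dim(w))


# conv1 = nn.Conv2d(1, 20, 5, 1): in_channels=1, out_channels=20, kernel_size=5, stride=1
print(conv2d_output_shape((1, 28, 28), out_channels=20, kernel_size=5, stride=1))  # prints (20, 24, 24)
```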

In [10]:
fed_learner.train_eval()

neoglia.learn.learner - INFO - Starting epoch 1/50
neoglia.learn.learner - INFO - Training round: 1, worker: h2, avg_loss: 1.4678
neoglia.learn.learner - INFO - Training round: 1, worker: h3, avg_loss: 1.2387
neoglia.learn.learner - INFO - Training round: 1, worker: h1, avg_loss: 1.7557
neoglia.learn.learner - INFO - Starting epoch 2/50
neoglia.learn.learner - INFO - Training round: 2, worker: h2, avg_loss: 1.6916
neoglia.learn.learner - INFO - Training round: 2, worker: h3, avg_loss: 0.8026
neoglia.learn.learner - INFO - Training round: 2, worker: h1, avg_loss: 1.9385
neoglia.learn.learner - INFO - Starting epoch 3/50
neoglia.learn.learner - INFO - Training round: 3, worker: h2, avg_loss: 0.4517
neoglia.learn.learner - INFO - Training round: 3, worker: h1, avg_loss: 0.9103
neoglia.learn.learner - INFO - Training round: 3, worker: h3, avg_loss: 0.9021
neoglia.learn.learner - INFO - Starting epoch 4/50
neoglia.learn.learner - INFO - Training round: 4, worker: h1, avg_loss: 0.4707
neogli

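The per-worker losses in the log are easier to follow once aggregated per round. A small sketch that parses lines in the format printed by `train_eval` and averages `avg_loss` across workers for each round; the regex assumes exactly the line format shown above and is not part of NeoGlia. Since each worker trains on different digits, this unweighted mean is only a rough progress indicator.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r"Training round: (\d+), worker: (\w+), avg_loss: ([0-9.]+)")


def mean_loss_per_round(log_text):
    """Average the avg_loss values reported by all workers in each training round."""
    losses = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        round_no, _worker, loss = match.groups()
        losses[int(round_no)].append(float(loss))
    return {r: sum(v) / len(v) for r, v in sorted(losses.items())}


log_text = """\
neoglia.learn.learner - INFO - Training round: 1, worker: h2, avg_loss: 1.4678
neoglia.learn.learner - INFO - Training round: 1, worker: h3, avg_loss: 1.2387
neoglia.learn.learner - INFO - Training round: 1, worker: h1, avg_loss: 1.7557
neoglia.learn.learner - INFO - Training round: 2, worker: h2, avg_loss: 1.6916
neoglia.learn.learner - INFO - Training round: 2, worker: h3, avg_loss: 0.8026
neoglia.learn.learner - INFO - Training round: 2, worker: h1, avg_loss: 1.9385
"""
result = mean_loss_per_round(log_text)
for r, loss in result.items():
    print(f"round {r}: mean avg_loss {loss:.4f}")
```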
In [6]:
for worker in (h1, h2, h3):
    worker.close()

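## Load the trained federated model

Then test it locally on a few examples. `torch.load` returns the saved state dict (an `OrderedDict` of parameter tensors, as the output below shows) rather than a model object, so to make predictions, create a `ConvNet()` and pass the dict to its `load_state_dict` method.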
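Once the weights are in a model, testing on a few examples comes down to comparing the arg-max of the predicted probabilities with the true labels, which is presumably what the `'accuracy'` metric in the config measures (NeoGlia's exact definition is not shown here). A NumPy sketch with illustrative probabilities, not real model outputs:

```python
import numpy as np


def accuracy_from_probs(probs, targets):
    """Fraction of examples whose highest-probability class equals the true label."""
    preds = np.argmax(np.asarray(probs), axis=1)  # predicted class per row
    return float(np.mean(preds == np.asarray(targets)))


# Illustrative 4-example batch over 3 classes.
probs = [[0.9, 0.05, 0.05],
         [0.2, 0.7, 0.1],
         [0.3, 0.3, 0.4],
         [0.6, 0.3, 0.1]]
targets = [0, 1, 2, 1]
print(accuracy_from_probs(probs, targets))  # prints 0.75
```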

In [20]:
import torch
m = torch.load('mnist_train_model.pt')
m

OrderedDict([('conv1.weight',
              tensor([[[[ 1.8860e-02, -1.0073e-01, -1.0674e-01,  9.7052e-03, -1.5782e-01],
                        [ 2.8249e-03,  6.3920e-03, -1.7211e-01,  9.3360e-02, -1.1974e-01],
                        [ 1.3968e-02, -1.1998e-01, -1.4551e-01,  7.7240e-02, -7.2984e-02],
                        [-2.7283e-02,  2.0283e-01,  1.5646e-01,  2.4561e-01,  1.5579e-01],
                        [-1.5279e-01, -1.0797e-01,  7.7242e-02, -6.1008e-02,  2.3347e-01]]],
              
              
                      [[[ 2.6632e-02, -5.6882e-02, -1.1255e-01, -1.1662e-01,  1.5993e-01],
                        [ 1.3329e-01, -2.0634e-01, -1.1618e-01,  1.7164e-01,  1.2429e-01],
                        [-1.7148e-01, -1.3992e-01, -7.1745e-02, -2.8118e-03,  2.0056e-01],
                        [-1.9243e-03, -9.0390e-02, -1.9782e-01,  1.2490e-01,  2.0530e-01],
                        [-1.1707e-01, -3.7539e-02, -2.7524e-02,  1.5139e-01,  2.2522e-01]]],
              
           