# Performing Testing at Each Round of Training 

Use for development (automatically reloads changes made across packages).

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
The node must be configured beforehand:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is due to a torch issue: https://github.com/pytorch/vision/issues/3549)
  * Data must have been added (if you get a warning saying that data must be unique, it is because the data has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a `MyTrainingPlan` class (a torch.nn model) to send to the node for training.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super(MyTrainingPlan, self).__init__(model_args)
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies needed by our custom data loading code.
        # Since we will train on MNIST, we need datasets and transforms from torchvision
        deps = ["from torchvision import datasets, transforms"]
        
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Load MNIST from the node's dataset path and wrap it in a Fed-BioMed DataManager
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
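The `9216` passed to `fc1` is not arbitrary: it is the number of values left after flattening the output of the convolution and pooling layers for one 28x28 MNIST image. It can be checked with the standard output-size formula for convolution and pooling layers (no dilation):

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d or MaxPool2d layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                    # MNIST images are 28x28
side = conv2d_out(side, kernel=3)            # conv1: 3x3, stride 1 -> 26
side = conv2d_out(side, kernel=3)            # conv2: 3x3, stride 1 -> 24
side = conv2d_out(side, kernel=2, stride=2)  # F.max_pool2d(x, 2) -> 12
fc1_in_features = 64 * side * side           # conv2 outputs 64 channels
print(side, fc1_in_features)                 # 12 9216
```

Dropout layers do not change the shape, which is why they do not appear in the calculation.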


These arguments correspond to:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓
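Fed-BioMed does its own validation. As a purely hypothetical, researcher-side sketch (not part of the Fed-BioMed API), a misspelled key can also be caught before the experiment starts. The set of accepted keys below is an assumption, taken only from the `training_args` used in this notebook:

```python
import difflib

# Hypothetical: accepted keys, taken from the training_args used in this notebook
KNOWN_TRAINING_ARGS = {
    'batch_size', 'lr', 'epochs', 'dry_run', 'batch_maxnum',
    'test_ratio', 'test_on_local_updates', 'test_on_global_updates', 'test_metric',
}

def check_training_args(args, known=KNOWN_TRAINING_ARGS):
    """Raise ValueError listing unknown keys, with the closest known key as a hint."""
    problems = []
    for key in args:
        if key not in known:
            hint = difflib.get_close_matches(key, known, n=1)
            problems.append(f"{key!r}" + (f" (did you mean {hint[0]!r}?)" if hint else ""))
    if problems:
        raise ValueError("unknown training_args: " + ", ".join(problems))

check_training_args({'batch_size': 48, 'lr': 1e-3})   # passes silently
try:
    check_training_args({'batch_sise': 48})
except ValueError as err:
    print(err)  # unknown training_args: 'batch_sise' (did you mean 'batch_size'?)
```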

In [None]:
from fedbiomed.common.metrics import MetricTypes

model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100, # Fast pass for development: only use (batch_maxnum * batch_size) samples
    'test_ratio': .3,
    'test_on_local_updates': True, 
    'test_on_global_updates': True,
    'test_metric': MetricTypes.PRECISION
}
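To get a feel for how much data each round touches, the arithmetic behind `test_ratio` and `batch_maxnum` can be sketched as below. It assumes the node holds out `test_ratio` of its samples for testing and caps training at `batch_maxnum` batches per epoch. `samples_per_round` is a hypothetical helper, and `None` meaning "no cap" is this sketch's own convention. For the 60000-image MNIST training set, the scikit-learn run later in this notebook logs 12000 test and 48000 training samples with `test_ratio` .2, which is consistent with this split.

```python
def samples_per_round(n_samples, test_ratio, batch_size, batch_maxnum=None, epochs=1):
    """Return (n_test, n_train, n_train_used) for one node in one round.

    Hypothetical model: test_ratio of the data is held out for testing, and
    training stops after batch_maxnum batches per epoch (None = no cap).
    """
    n_test = round(n_samples * test_ratio)
    n_train = n_samples - n_test
    per_epoch = n_train if batch_maxnum is None else min(n_train, batch_maxnum * batch_size)
    return n_test, n_train, per_epoch * epochs

# MNIST training split (60000 images) with the training_args above
print(samples_per_round(60000, 0.3, batch_size=48, batch_maxnum=100))  # (18000, 42000, 4800)
```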

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on the nodes with the model defined in `model_class`, then federate the results with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between rounds
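The node search in the first step can be pictured as a filter over each node's dataset catalogue. The sketch below is hypothetical: the `node_datasets` layout and node IDs are made up, and it does not reflect how Fed-BioMed actually queries nodes over the network. It assumes a dataset matches only when it carries every requested tag, and that `nodes`, when given, restricts the search.

```python
def select_nodes(node_datasets, tags, nodes=None):
    """Return IDs of nodes holding at least one dataset that carries every tag.

    node_datasets: {node_id: [tags_of_dataset_1, tags_of_dataset_2, ...]} (hypothetical layout)
    nodes: optional list of node IDs to restrict the search to
    """
    wanted = set(tags)
    selected = []
    for node_id, datasets in node_datasets.items():
        if nodes is not None and node_id not in nodes:
            continue
        if any(wanted <= set(ds_tags) for ds_tags in datasets):
            selected.append(node_id)
    return selected

# Made-up catalogue for illustration
catalog = {
    'node_A': [{'#MNIST', '#dataset'}],
    'node_B': [{'#MNIST'}],                       # missing '#dataset': not selected
    'node_C': [{'#MNIST', '#dataset', '#extra'}],
}
print(select_nodes(catalog, ['#MNIST', '#dataset']))                   # ['node_A', 'node_C']
print(select_nodes(catalog, ['#MNIST', '#dataset'], nodes=['node_C'])) # ['node_C']
```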

In [None]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

Let's start the experiment.

By default, `exp.run()` does not return until all `round_limit` rounds are completed for all the nodes.

In [None]:
exp.run()

In [None]:
exp.run(rounds=2, increase=True)
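The second cell asks for two more rounds with `increase=True`. Assuming `increase=True` extends `round_limit` when the requested rounds exceed the ones remaining, the experiment ends up with four completed rounds (indices 0 to 3). The toy class below models only that bookkeeping; it is a hypothetical sketch, not Fed-BioMed's `Experiment`, and it deliberately does not model what the real library does when `increase=False`.

```python
class RoundBookkeeping:
    """Hypothetical toy model of round counting (not Fed-BioMed's Experiment)."""

    def __init__(self, round_limit):
        self.round_limit = round_limit
        self.round_current = 0  # number of rounds already completed

    def run(self, rounds=None, increase=False):
        remaining = self.round_limit - self.round_current
        if rounds is None:
            rounds = remaining  # default: finish all remaining rounds
        elif rounds > remaining:
            if not increase:
                # This sketch simply refuses; the real library may behave differently
                raise ValueError(f"only {remaining} round(s) left before round_limit")
            self.round_limit = self.round_current + rounds  # extend the limit
        for _ in range(rounds):
            self.round_current += 1  # stands for one round of training + aggregation
        return rounds


exp_toy = RoundBookkeeping(round_limit=2)
exp_toy.run()                         # rounds 0 and 1
exp_toy.run(rounds=2, increase=True)  # round_limit grows to 4, rounds 2 and 3
print(exp_toy.round_current, exp_toy.round_limit)  # 4 4
```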

Local training results for each round and each node are available via `exp.training_replies()` (indexed by round, starting from 0).

For example, you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training` real time (clock time) spent in the training function on the node
- `ptime_training` process time (user and system CPU) spent in the training function on the node
- `rtime_total` real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
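The difference between the `rtime_*` and `ptime_*` timings is the difference between wall-clock time and process CPU time. Python exposes both measures through `time.perf_counter()` and `time.process_time()`; the sketch below uses them to show why the two can diverge. It illustrates the two measures only and makes no claim about how Fed-BioMed instruments its nodes.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, real_seconds, process_seconds)."""
    r0, p0 = time.perf_counter(), time.process_time()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - r0, time.process_time() - p0

# Waiting (sleep, I/O, network) costs wall-clock time but almost no CPU time
_, rtime, ptime = timed(time.sleep, 0.2)
print(f"sleep: rtime={rtime:.2f}s ptime={ptime:.2f}s")

# A CPU-bound loop brings the two measures close (on an otherwise idle machine)
_, rtime, ptime = timed(sum, range(5_000_000))
print(f"sum:   rtime={rtime:.2f}s ptime={ptime:.2f}s")
```

Because `time.process_time()` sums CPU time over all threads of the process, multi-threaded training can report a process time larger than the real time.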

In [None]:
print("\nList the training rounds: ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings:")
last_round = max(exp.training_replies().keys())  # last round actually run
round_data = exp.training_replies()[last_round].data()
for reply in round_data:
    timing = reply['timing']
    print(f"\t- {reply['node_id']}:"
          f"\n\t\trtime_training={timing['rtime_training']:.2f} seconds"
          f"\n\t\tptime_training={timing['ptime_training']:.2f} seconds"
          f"\n\t\trtime_total={timing['rtime_total']:.2f} seconds")
print('\n')

exp.training_replies()[last_round].dataframe()
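Since `rtime_total` is measured on the researcher side from sending the request to handling the response, while `rtime_training` only covers the training function on the node, their difference gives a rough idea of the time spent outside training (for example messaging and parameter transfers). The helper below is a hypothetical sketch that assumes the reply layout used in the cell above (`node_id` plus a `timing` dict); the timing values in the demo are made-up placeholders, not measurements.

```python
def non_training_time(round_data):
    """Per-node seconds between rtime_total and rtime_training.

    round_data: list of reply dicts with 'node_id' and a 'timing' dict holding
    'rtime_training' and 'rtime_total' (layout assumed from the notebook cell).
    """
    return {
        reply['node_id']: reply['timing']['rtime_total'] - reply['timing']['rtime_training']
        for reply in round_data
    }

# Made-up placeholder replies; in the notebook you would pass
# exp.training_replies()[rounds - 1].data() instead
demo = [
    {'node_id': 'node_X', 'timing': {'rtime_training': 7.5, 'rtime_total': 10.0}},
    {'node_id': 'node_Y', 'timing': {'rtime_training': 3.0, 'rtime_total': 4.0}},
]
print(non_training_time(demo))  # {'node_X': 2.5, 'node_Y': 1.0}
```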

Federated parameters for each round are available via `exp.aggregated_params()` (indexed by round, starting from 0).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds: ", exp.aggregated_params().keys())

last_round = max(exp.aggregated_params().keys())  # last round actually run
print("\nAccess the federated params for the last training round:")
print("\t- params_path: ", exp.aggregated_params()[last_round]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[last_round]['params'].keys())
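`FedAverage` combines the node updates by federated averaging. As a minimal sketch of the rule (not Fed-BioMed's implementation), each parameter is averaged across nodes with weights proportional to, for instance, each node's number of training samples. The parameter names and values below are made up for illustration.

```python
import numpy as np

def fed_average(node_params, weights):
    """Weighted average of per-node parameter dicts (FedAvg-style).

    node_params: list of {param_name: np.ndarray}, one dict per node
    weights: non-negative numbers, e.g. each node's number of training samples
    """
    total = float(sum(weights))
    return {
        name: sum(w * params[name] for w, params in zip(weights, node_params)) / total
        for name in node_params[0]
    }

# Two hypothetical nodes; node_b holds three times as many samples as node_a
node_a = {'fc.weight': np.array([1.0, 2.0]), 'fc.bias': np.array([0.0])}
node_b = {'fc.weight': np.array([3.0, 6.0]), 'fc.bias': np.array([1.0])}
avg = fed_average([node_a, node_b], weights=[100, 300])
print(avg['fc.weight'], avg['fc.bias'])  # weight -> [2.5, 5.0], bias -> [0.75]
```

With equal weights this reduces to a plain mean; weighting by sample count keeps a small node from pulling the global model as hard as a large one.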


## Testing with a scikit-learn model

In [1]:

from fedbiomed.common.training_plans import SGDSkLearnModel
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms
import numpy as np


class SkLearnClassifierTrainingPlan(SGDSkLearnModel):
    def __init__(self, model_args):
        super(SkLearnClassifierTrainingPlan, self).__init__(model_args)
        # Dependencies needed on the node side to run this training plan
        self.add_dependency(["import torch",
                             "from sklearn.linear_model import Perceptron",
                             "from torchvision import datasets, transforms",
                             "from torch.utils.data import DataLoader"])

    def training_data(self):
        # Load MNIST with torchvision, then extract the raw pixels as NumPy arrays for scikit-learn
        # (`transform` is only applied when indexing the dataset, not to `dataset.data`)
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)

        X_train = dataset.data.numpy()
        X_train = X_train.reshape(-1, 28*28)  # flatten each 28x28 image into 784 features
        Y_train = dataset.targets.numpy()

        return DataManager(dataset=X_train, target=Y_train)
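`training_data` flattens each 28x28 image into a row of 784 features, the `(n_samples, n_features)` layout scikit-learn expects and the value given as `'n_features': 28*28` in `model_args`. Note that `dataset.data` holds the raw `uint8` pixels: torchvision applies `transform` only when samples are indexed, so the normalization defined in `training_data` is not applied to `X_train`. The NumPy sketch below uses random stand-in images (not MNIST) to show the reshape, plus an optional manual equivalent of `ToTensor` + `Normalize` that the training plan above does not perform.

```python
import numpy as np

# Stand-in for dataset.data.numpy(): uint8 array of shape (n_images, 28, 28)
images = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

X = images.reshape(-1, 28 * 28)  # one row of 784 pixel features per image
print(X.shape)                   # (5, 784)

# Row-major reshape keeps pixels in reading order: row r, column c -> feature r*28 + c
assert X[0, 3 * 28 + 7] == images[0, 3, 7]

# Optional, not done in the training plan: ToTensor (scale to [0, 1]) + Normalize(mean, std)
X_norm = (X / 255.0 - 0.1307) / 0.3081
```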

In [2]:
model_args = {'max_iter': 1000,
              'tol': 1e-4,
              'model': 'Perceptron',
              'n_features': 28*28,
              'n_classes': 10,
              'eta0': 1e-6,
              'random_state': 1234,
              'alpha': 0.1}

training_args = {
    'epochs': 1, 
    'test_metric': "PRECISION",
    'test_ratio': .2,
    'test_on_global_updates': True
}
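For a 10-class target like MNIST, the node log for this run reports that PRECISION falls back to a multiclass `weighted` calculation. A minimal NumPy sketch of weighted precision: per-class precision TP / (TP + FP), averaged with weights equal to each class's support (count) in `y_true`. It should agree with scikit-learn's `precision_score(y_true, y_pred, average='weighted')`; Fed-BioMed's own code may differ in details such as how classes with no predictions are handled (this sketch scores them 0).

```python
import numpy as np

def weighted_precision(y_true, y_pred):
    """Multiclass precision averaged over classes, weighted by class support in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, support = np.unique(y_true, return_counts=True)
    per_class = []
    for c in classes:
        predicted_c = y_pred == c
        n_pred = predicted_c.sum()
        tp = np.sum(predicted_c & (y_true == c))
        per_class.append(tp / n_pred if n_pred else 0.0)
    return float(np.average(per_class, weights=support))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
# per-class precision: 1/2, 2/3, 1/1; equal supports -> (1/2 + 2/3 + 1) / 3 = 13/18
print(weighted_precision(y_true, y_pred))
```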

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 50

# Select the nodes participating in this experiment
exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=SkLearnClassifierTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)


2022-03-25 16:22:10,292 fedbiomed INFO - Component environment:
2022-03-25 16:22:10,293 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-03-25 16:22:10,481 fedbiomed INFO - Messaging researcher_fbbb1111-3e40-402c-a9cb-a9587cdf201c successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fde5b8547f0>
2022-03-25 16:22:10,548 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-03-25 16:22:20,561 fedbiomed INFO - Node selected for training -> node_21eb288e-4f34-4c6e-954e-28a86b218ec5
2022-03-25 16:22:20,570 fedbiomed DEBUG - Model file has been saved: /home/ybouilla/fedbiomed/var/experiments/Experiment_0000/my_model_204252c7-f88f-4f97-bd0b-4689becd3e6b.py
2022-03-25 16:22:20,746 fedbiomed DEBUG - upload (HTTP POST request) of file /home/ybouilla/fedbiomed/var/experiments/Experiment_0000/my_model_204252c7-f88f-4f97-bd0b-4689becd3e6b.py successful, with status code 201
2022-03-25 16:22:20,878 fedbi

In [4]:
exp.run()

2022-03-25 16:22:20,883 fedbiomed INFO - Sampled nodes in round 0 ['node_21eb288e-4f34-4c6e-954e-28a86b218ec5']
2022-03-25 16:22:20,884 fedbiomed INFO - Send message to node node_21eb288e-4f34-4c6e-954e-28a86b218ec5 - {'researcher_id': 'researcher_fbbb1111-3e40-402c-a9cb-a9587cdf201c', 'job_id': '17ba35af-2b6a-4d39-baab-feb1427a6915', 'training_args': {'test_ratio': 0.2, 'test_on_local_updates': False, 'test_on_global_updates': True, 'test_metric': 'PRECISION', 'test_metric_args': {}, 'epochs': 1}, 'training': True, 'model_args': {'max_iter': 1000, 'tol': 0.0001, 'model': 'Perceptron', 'n_features': 784, 'n_classes': 10, 'eta0': 1e-06, 'random_state': 1234, 'alpha': 0.1, 'verbose': 1}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/03/25/my_model_204252c7-f88f-4f97-bd0b-4689becd3e6b.py', 'params_url': 'http://localhost:8844/media/uploads/2022/03/25/aggregated_params_init_60f87a24-ece7-4a61-9a98-cb7031055831.pt', 'model_class': 'SkLearnClassifierTrainingPlan'

2022-03-25 16:22:32,007 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.027291
----------------------------------------
2022-03-25 16:22:32,009 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.094896
----------------------------------------
2022-03-25 16:22:32,011 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.121548
----------------------------------------
2022-03-25 16:22:32,012 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.064962
----------------------------------------
2022-03-25 16:22:32,014 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:


2022-03-25 16:22:41,972 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
results uploaded successfully
----------------------------------------
2022-03-25 16:22:51,061 fedbiomed INFO - Downloading model params after training on node_21eb288e-4f34-4c6e-954e-28a86b218ec5 - from http://localhost:8844/media/uploads/2022/03/25/node_params_065b8517-d04a-4664-a5dc-664145f00ea9.pt
2022-03-25 16:22:51,079 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_6cccbd9d-dd62-42f5-8eca-dd13b28a8204.pt successful, with status code 200
2022-03-25 16:22:51,091 fedbiomed INFO - Nodes that successfully reply in round 2 ['node_21eb288e-4f34-4c6e-954e-28a86b218ec5']
2022-03-25 16:22:51,127 fedbiomed DEBUG - upload (HTTP POST request) of file /home/ybouilla/fedbiomed/var/experiments/Experiment_0000/aggregated_params_c31728fc-3bdb-4304-8776-4350bd770fed.pt successful, with status code 201
2022-03-25 16:22:51,128 fedbiomed INFO - Saved aggrega

2022-03-25 16:23:01,275 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
training with arguments {'history_monitor': <fedbiomed.node.history_monitor.HistoryMonitor object at 0x7fc7a3aadc40>, 'node_args': {'gpu': False, 'gpu_num': None, 'gpu_only': False}, 'epochs': 1}
----------------------------------------
2022-03-25 16:23:01,359 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Actual/True values (y_true) has more than two levels, using multiclass `weighted` calculation for the metric PRECISION
----------------------------------------
2022-03-25 16:23:01,368 fedbiomed INFO - TESTING ON GLOBAL PARAMETERS
					 NODE_ID: node_21eb288e-4f34-4c6e-954e-28a86b218ec5
					 Completed: 12000/12000 (100%)
					 PRECISION: 0.870530
					 ---------
2022-03-25 16:23:02,063 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:

2022-03-25 16:23:12,115 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.055435
----------------------------------------
2022-03-25 16:23:12,115 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.241006
----------------------------------------
2022-03-25 16:23:12,116 fedbiomed INFO - INFO FROM NODE node_21eb288e-4f34-4c6e-954e-28a86b218ec5
MESSAGE:
Train Epoch: 0 [Batch All Samples]	Loss: 0.152345
----------------------------------------
MESSAGE:
Loss plot displayed on Tensorboard may be inaccurate (due to some plain SGD scikit learn limitations)
----------------------------------------
2022-03-25 16:23:12,117 fedbiomed INFO - TRAINING
					 NODE_ID: node_21eb288e-4f34-4c6e-954e-28a86b218ec5
					 Epoch: 0 | Completed: 48000/48000 (100%)
					 Loss:

2022-03-25 16:23:31,404 fedbiomed INFO - Send message to node node_21eb288e-4f34-4c6e-954e-28a86b218ec5 - {'researcher_id': 'researcher_fbbb1111-3e40-402c-a9cb-a9587cdf201c', 'job_id': '17ba35af-2b6a-4d39-baab-feb1427a6915', 'training_args': {'test_ratio': 0.2, 'test_on_local_updates': False, 'test_on_global_updates': True, 'test_metric': 'PRECISION', 'test_metric_args': {}, 'epochs': 1}, 'training': True, 'model_args': {'max_iter': 1000, 'tol': 0.0001, 'model': 'Perceptron', 'n_features': 784, 'n_classes': 10, 'eta0': 1e-06, 'random_state': 1234, 'alpha': 0.1, 'verbose': 1}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/03/25/my_model_204252c7-f88f-4f97-bd0b-4689becd3e6b.py', 'params_url': 'http://localhost:8844/media/uploads/2022/03/25/aggregated_params_ce8ff597-6723-4674-aef8-b76793b698f0.pt', 'model_class': 'SkLearnClassifierTrainingPlan', 'training_data': {'node_21eb288e-4f34-4c6e-954e-28a86b218ec5': ['dataset_677de9ca-c83d-4d3e-9e58-ebf34c4512d3']}}
2


--------------------
Fed-BioMed researcher stopped due to exception:
FB604: repository error : bad URL when downloading file node_params_4499458e-8754-4e6c-bf3c-a918a66be185.pt(details :Invalid URL '': No scheme supplied. Perhaps you meant http://? )
--------------------


Feel free to run other sample notebooks or try your own models :D

## Testing using your own testing metric

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlanCM')
class MyTrainingPlanCM(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super(MyTrainingPlanCM, self).__init__(model_args)
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies needed by our custom data loading code.
        # Since we will train on MNIST, we need datasets and transforms from torchvision
        deps = ["from torchvision import datasets, transforms"]
        
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Load MNIST from the node's dataset path and wrap it in a Fed-BioMed DataManager
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss

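    def testing_step(self, data, target):
        # Custom testing metric(s): the values returned here are reported as the test results
        output = self.forward(data)
        loss1 = torch.nn.functional.nll_loss(output, target)
        # self(data) runs forward() through nn.Module.__call__, so in eval mode loss2 equals loss1;
        # returning a list shows that several metric values can be reported at once
        output = self(data)
        loss2 = torch.nn.functional.nll_loss(output, target)
        return [loss1, loss2]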
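The two values returned by `testing_step` above compute the same loss twice. A pair that carries more information is, for example, the loss together with the accuracy. The NumPy sketch below is an illustration outside Fed-BioMed, with made-up inputs, of what those two quantities compute from log-softmax outputs: `nll_loss` mirrors `torch.nn.functional.nll_loss` with its default `reduction='mean'`, and `accuracy` compares the arg-max class to the target. Inside a training plan the same values would be computed on torch tensors.

```python
import numpy as np

def nll_loss(log_probs, target):
    """Mean negative log-likelihood, like F.nll_loss with reduction='mean'."""
    log_probs, target = np.asarray(log_probs), np.asarray(target)
    return float(-log_probs[np.arange(len(target)), target].mean())

def accuracy(log_probs, target):
    """Fraction of samples whose highest-scoring class equals the target."""
    return float((np.argmax(log_probs, axis=1) == np.asarray(target)).mean())

# Made-up log-softmax outputs for 3 samples and 2 classes
log_probs = np.log([[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]])
target = [0, 1, 1]
metrics = [nll_loss(log_probs, target), accuracy(log_probs, target)]
print(metrics)  # loss = -(log 0.5 + log 0.75 + log 0.1) / 3, accuracy = 2/3
```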

In [None]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100, # Fast pass for development: only use (batch_maxnum * batch_size) samples
    'test_ratio': .3,
    'test_on_local_updates': True, 
    'test_on_global_updates': True
}

In [None]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlanCM,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

In [None]:
exp.run()

In [None]:
exp.run(rounds=1, increase=True)