# Fed-BioMed Researcher base example

Use this cell during development: it auto-reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running this notebook:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is needed because of the torch issue https://github.com/pytorch/vision/issues/3549)
  * The data must then be added (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a torch.nn MyTrainingPlan class to send for training on the node

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # used as `F` in forward()
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the model to be used. 
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super(MyTrainingPlan, self).__init__(model_args)
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms"]
        
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
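
The input size `9216` of `fc1` follows from the layer shapes defined in `__init__` and `forward`. A minimal sketch of that arithmetic in plain Python, assuming the standard 28x28 single-channel MNIST images; the helper `conv_out` is hypothetical and introduced only for illustration:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                   # assumed MNIST image side length
side = conv_out(side, kernel=3)             # conv1 (3x3, stride 1): 28 -> 26
side = conv_out(side, kernel=3)             # conv2 (3x3, stride 1): 26 -> 24
side = conv_out(side, kernel=2, stride=2)   # max_pool2d(x, 2): 24 -> 12
flattened = 64 * side * side                # conv2 outputs 64 channels
print(flattened)  # 9216
```

If the image size or the convolution settings change, the first argument of `nn.Linear` in `fc1` has to change accordingly.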


The experiment takes two groups of arguments:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓

In [2]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
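
The request log printed when the experiment runs shows a type scheme for these keys (`lr` as float, `batch_size`, `epochs` and `batch_maxnum` as int, `dry_run` as bool). The sketch below illustrates how a typo or a wrongly typed value can be caught early. It is not Fed-BioMed's own validation: `EXPECTED_TYPES` and `check_training_args` are hypothetical names used only for this example.

```python
# Types copied from the scheme shown in the notebook's request log.
EXPECTED_TYPES = {
    'batch_size': int,
    'lr': float,
    'epochs': int,
    'dry_run': bool,
    'batch_maxnum': int,
}

def check_training_args(args: dict) -> None:
    """Raise on unknown keys (likely typos) or wrongly typed values."""
    for key, value in args.items():
        if key not in EXPECTED_TYPES:
            raise KeyError(f"unknown training argument: {key!r}")
        expected = EXPECTED_TYPES[key]
        # bool is a subclass of int, so compare exact types
        if type(value) is not expected:
            raise TypeError(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
            )

args = {'batch_size': 48, 'lr': 1e-3, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}
check_training_args(args)  # passes silently

# With batch_maxnum set, at most batch_maxnum * batch_size samples are used
max_samples = args['batch_maxnum'] * args['batch_size']
print(max_samples)  # 4800
```

Calling `check_training_args({'batch_sise': 48})` raises a `KeyError`, which is the kind of typo the note above warns about.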

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filter on a list of node ID with `nodes`
- run a round of local training on nodes with the model defined in `model_class` + federation with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between the rounds
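
The federation step can be pictured as a weighted average of the parameters returned by the nodes. A minimal sketch of federated averaging in plain Python, assuming each node returns a flat dict of floats together with its number of training samples; `fed_avg` is a hypothetical helper and not Fed-BioMed's `FedAverage` implementation:

```python
def fed_avg(node_params: list, node_samples: list) -> dict:
    """Average parameters across nodes, weighted by each node's sample count."""
    total = sum(node_samples)
    weights = [n / total for n in node_samples]
    keys = node_params[0].keys()
    return {
        k: sum(w * p[k] for w, p in zip(weights, node_params))
        for k in keys
    }

# Two hypothetical nodes with 100 and 300 samples
params_a = {'w': 1.0, 'b': 0.0}
params_b = {'w': 3.0, 'b': 2.0}
aggregated = fed_avg([params_a, params_b], [100, 300])
print(aggregated)
```

The node with more samples pulls the aggregated values towards its own parameters. With a single node, as in this notebook, the average simply returns that node's parameters.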

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-05-24 08:31:01,620 fedbiomed INFO - Component environment:
2022-05-24 08:31:01,624 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-05-24 08:31:01,781 fedbiomed INFO - Messaging researcher_f1dca854-b636-4cfc-9227-5ba0d8af8601 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fd06c565250>
2022-05-24 08:31:01,808 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-05-24 08:31:11,846 fedbiomed INFO - Node selected for training -> node_19fdb3db-c00c-4e3e-a1f8-55b6654fd842
2022-05-24 08:31:11,906 fedbiomed DEBUG - Model file has been saved: /home/scansiz/Desktop/Inria/development/fedbiomed/var/experiments/Experiment_0007/my_model_4c27bd1d-6ca0-4d87-bc43-3cd05be16f3c.py
2022-05-24 08:31:12,082 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/Desktop/Inria/development/fedbiomed/var/experiments/Experiment_0007/my_model_4c27bd1d-6ca0-4d87-bc43-3cd05be16f3c.py successful, 

Let's start the experiment.

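`exp.run()` doesn't stop until all the `round_limit` rounds are done for all the nodes, whereas `exp.run_once()` runs a single round. With `increase=True`, `round_limit` is increased automatically if it has already been reached.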

In [9]:
exp.run_once(increase=True)

2022-05-24 09:18:51,742 fedbiomed INFO - Sampled nodes in round 0 ['node_19fdb3db-c00c-4e3e-a1f8-55b6654fd842']
2022-05-24 09:18:51,745 fedbiomed INFO - Sending request 
					 To: node_19fdb3db-c00c-4e3e-a1f8-55b6654fd842 
					 Request: Perform training with the arguments: {'researcher_id': 'researcher_f1dca854-b636-4cfc-9227-5ba0d8af8601', 'job_id': '17230b96-3b09-454e-8b9e-6759aa903d5e', 'training_args': scheme:
{'lr': {'rules': [<class 'float'>, <function TrainingArgs._lr_hook at 0x7fd06c660ee0>], 'required': False}, 'batch_size': {'rules': [<class 'int'>], 'required': False}, 'epochs': {'rules': [<class 'int'>], 'required': False}, 'dry_run': {'rules': [<class 'bool'>], 'required': False}, 'batch_maxnum': {'rules': [<class 'int'>], 'required': False}, 'test_ratio': {'rules': [<class 'float'>, <function TrainingArgs._test_ratio_hook at 0x7fd06c660dc0>], 'required': False, 'default': 0.0}, 'test_on_local_updates': {'rules': [<class 'bool'>], 'required': Fal

1


In [None]:
exp.run(rounds=8, increase=True)

Local training results for each round and each node are available via `exp.training_replies()` (indexed from 0 to `rounds` - 1).

For example you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training` real time (clock time) spent in the training function on the node
- `ptime_training` process time (user and system CPU) spent in the training function on the node
- `rtime_total` real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
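
The difference between real time and process time can be reproduced with Python's standard `time` module. This is only an illustrative sketch; whether Fed-BioMed measures its timings with these exact functions is an assumption.

```python
import time

start_real = time.perf_counter()   # wall-clock style timer
start_cpu = time.process_time()    # user + system CPU time of this process

time.sleep(0.2)                        # waiting: adds real time, almost no CPU time
sum(i * i for i in range(200_000))     # computing: adds both real and CPU time

rtime = time.perf_counter() - start_real
ptime = time.process_time() - start_cpu
print(f"rtime={rtime:.2f} seconds, ptime={ptime:.2f} seconds")
```

A large gap between the two values usually means the process spent most of its time waiting (I/O, network, sleeping) rather than computing.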

In [None]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()

Federated parameters for each round are available via `exp.aggregated_params()` (indexed from 0 to `rounds` - 1).

For example you can view the federated parameters for the last round of the experiment :

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())
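
Since `exp.run_once()` and `exp.run(rounds=8, increase=True)` are both called in this notebook, more rounds may have been run than the initial `rounds` value, in which case `rounds - 1` is not the last round. A sketch of picking the highest available round index instead, using a plain dict as a hypothetical stand-in for what `exp.training_replies()` or `exp.aggregated_params()` return (the keys and placeholder values are illustrative only):

```python
# Hypothetical stand-in: a mapping from round index to that round's results
results_by_round = {
    0: {'params_path': 'placeholder_round_0'},
    1: {'params_path': 'placeholder_round_1'},
    2: {'params_path': 'placeholder_round_2'},
}

# The last round is the largest key, whatever `rounds` was set to initially
last_round = max(results_by_round.keys())
print(last_round, results_by_round[last_round]['params_path'])
```

In the notebook, the same idea would read `max(exp.aggregated_params().keys())`, assuming the returned object is keyed by round index as the `.keys()` calls above suggest.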


Feel free to run other sample notebooks or try your own models :D