# Fed-BioMed Researcher base example

Use this cell during development: it automatically reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running the experiment:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is due to a PyTorch issue: https://github.com/pytorch/vision/issues/3549)
  * The data must be added successfully (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node start`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a torch training plan class, `MyTrainingPlan`, to send to the node for training.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # used in Net.forward below
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms
from declearn.optimizer import Optimizer
from declearn.optimizer.modules import AdamModule
from declearn.optimizer.regularizers import LassoRegularizer


# Here we define the training plan to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    
    # Defines and returns the model
    def init_model(self, model_args):
        return self.Net(model_args=model_args)
    
    # Defines and returns a declearn optimizer
    # (the learning rate is hard-coded here; `optimizer_args` is not used)
    def init_optimizer(self, optimizer_args):
        return Optimizer(lrate=.1, modules=[AdamModule()], regularizers=[LassoRegularizer()])
    
    # Declares and returns dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms",
                "from declearn.optimizer import Optimizer",
                "from declearn.optimizer.modules import AdamModule",
                "from declearn.optimizer.regularizers import LassoRegularizer"]
        return deps
    
    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)


            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
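
The input size of `fc1` (9216) in the `Net` class above is not arbitrary: it is the number of features left after the two convolutions and the pooling layer. The following is a small sketch of that arithmetic in plain Python, assuming 28x28 single-channel MNIST images; `conv2d_out` is a hypothetical helper written for this illustration, not part of torch or Fed-BioMed.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                     # assumed MNIST input: 28x28, 1 channel
side = conv2d_out(side, kernel=3)             # conv1 (3x3, stride 1) -> 26
side = conv2d_out(side, kernel=3)             # conv2 (3x3, stride 1) -> 24
side = conv2d_out(side, kernel=2, stride=2)   # max_pool2d(x, 2)      -> 12

channels = 64                                 # output channels of conv2
flat_features = channels * side * side        # size after torch.flatten(x, 1)
print(flat_features)  # 9216
```

If you change the kernel sizes, the number of channels, or the input resolution, the first argument of `nn.Linear(9216, 128)` has to be recomputed in the same way.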


These arguments are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing required (positional) arguments will raise an error. 🤓

In [2]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'optimizer_args': {
        "lr" : 1e-3
    },
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
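
Since a misspelled key in `training_args` raises an error, it can help to check the dictionary before creating the experiment. The sketch below is one possible way to do that, not a Fed-BioMed feature: `KNOWN_KEYS` is copied from the training request log shown later in this notebook and may differ in your Fed-BioMed version, and `suggest_key_fixes` is a hypothetical helper.

```python
import difflib

# Keys as they appear in the request log of this notebook.
# Assumption: illustrative only, not the authoritative list for every version.
KNOWN_KEYS = {
    'batch_size', 'optimizer_args', 'epochs', 'dry_run', 'batch_maxnum',
    'num_updates', 'test_ratio', 'test_on_local_updates',
    'test_on_global_updates', 'test_metric', 'test_metric_args',
    'log_interval', 'fedprox_mu', 'use_gpu', 'dp_args',
}


def suggest_key_fixes(args: dict) -> dict:
    """Map each unknown key to its closest known key (None if nothing is close)."""
    fixes = {}
    for key in args:
        if key not in KNOWN_KEYS:
            match = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            fixes[key] = match[0] if match else None
    return fixes


candidate = {'batch_size': 48, 'epoch': 1, 'batch_maxnum': 100}
print(suggest_key_fixes(candidate))  # {'epoch': 'epochs'}
```

An empty result means every key was recognised; anything else points to a probable typo.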

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on nodes with the model defined in `training_plan_class`, then federate the results with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between the rounds

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2023-03-08 16:50:56,610 fedbiomed INFO - Messaging researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f3ac4b8b190>
2023-03-08 16:50:56,616 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2023-03-08 16:51:06,630 fedbiomed INFO - Node selected for training -> node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
2023-03-08 16:51:06,633 fedbiomed INFO - Node selected for training -> node_f2b2d532-f811-424d-966f-4b21c0bfd618
2023-03-08 16:51:06,639 fedbiomed INFO - Checking data quality of federated datasets...
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.
2023-03-08 16:51:06,694 fedbiomed DEBUG - Model file has been saved: /home/ybouilla/fedbiomed_2/fedbiomed/var/experiments/Experiment_004
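
The experiment above is configured with the `FedAverage` aggregator. To give an intuition of what federated averaging does after each round, here is a minimal sketch under simplifying assumptions: each node's parameters are plain floats instead of tensors, the parameter values are made up, and `federated_average` is a hypothetical function written for this example, not Fed-BioMed's implementation.

```python
from typing import Dict, List


def federated_average(params: List[Dict[str, float]],
                      weights: List[float]) -> Dict[str, float]:
    """Weighted average of per-node parameters (weights = samples per node)."""
    total = sum(weights)
    return {
        name: sum(w * p[name] for p, w in zip(params, weights)) / total
        for name in params[0]
    }


# Made-up scalar values standing in for two nodes' trained parameters.
node_a = {'fc2.bias': 0.25, 'fc2.weight': 1.0}
node_b = {'fc2.bias': 0.75, 'fc2.weight': 3.0}

# Both nodes are assumed to have trained on the same number of samples,
# so they contribute equally to the average.
averaged = federated_average([node_a, node_b], weights=[4800, 4800])
print(averaged)  # {'fc2.bias': 0.5, 'fc2.weight': 2.0}
```

With unequal sample counts, the node that trained on more data pulls the average towards its own parameters.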

Let's start the experiment.

By default, this function doesn't return until all `round_limit` rounds are done for all the nodes.

In [4]:
exp.run()

2023-03-08 16:51:06,878 fedbiomed INFO - Sampled nodes in round 0 ['node_57b462ee-4d35-4dd9-b3cb-ea964a889e92', 'node_f2b2d532-f811-424d-966f-4b21c0bfd618']
2023-03-08 16:51:06,879 fedbiomed INFO - Sending request
					 To: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_cdac117b-3411-4777-9eb7-bb3c477c3f29', 'job_id': '455d769d-7ddd-4105-aa9d-4e1114489196', 'training_args': {'batch_size': 48, 'optimizer_args': {'lr': 0.001}, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100, 'num_updates': None, 'test_ratio': 0.0, 'test_on_local_updates': False, 'test_on_global_updates': False, 'test_metric': None, 'test_metric_args': {}, 'log_interval': 10, 'fedprox_mu': None, 'use_gpu': False, 'dp_args': None}, 'training': True, 'model_args': {}, 'command': 'train', 'training_plan_url': 'http://localhost:8844/media/uploads/2023/03/08/my_model_a0ae608e-cd8a-43c4-91f5-ed0e6470bd1c.py', 'params_url'

2023-03-08 16:51:15,988 fedbiomed INFO - TRAINING
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 1 Epoch: 1 | Iteration: 100/100 (100%) | Samples: 4800/4800
 					 Loss: 2.295616
					 ---------
2023-03-08 16:51:16,062 fedbiomed INFO - INFO
					 NODE node_f2b2d532-f811-424d-966f-4b21c0bfd618
					 MESSAGE: results uploaded successfully
-----------------------------------------------------------------
2023-03-08 16:51:16,133 fedbiomed INFO - INFO
					 NODE node_57b462ee-4d35-4dd9-b3cb-ea964a889e92
					 MESSAGE: results uploaded successfully
-----------------------------------------------------------------
2023-03-08 16:51:21,901 fedbiomed INFO - Downloading model params after training on node_f2b2d532-f811-424d-966f-4b21c0bfd618 - from http://localhost:8844/media/uploads/2023/03/08/node_params_491208e6-419b-4c5b-b9bc-fe3ec30cef37.pt
2023-03-08 16:51:21,955 fedbiomed DEBUG - upload (HTTP

2023-03-08 16:51:28,255 fedbiomed INFO - TRAINING
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 2 Epoch: 1 | Iteration: 60/100 (60%) | Samples: 2880/4800
 					 Loss: 2.293581
					 ---------
2023-03-08 16:51:28,530 fedbiomed INFO - TRAINING
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 2 Epoch: 1 | Iteration: 60/100 (60%) | Samples: 2880/4800
 					 Loss: 2.322720
					 ---------
2023-03-08 16:51:28,852 fedbiomed INFO - TRAINING
					 NODE_ID: node_f2b2d532-f811-424d-966f-4b21c0bfd618 
					 Round 2 Epoch: 1 | Iteration: 70/100 (70%) | Samples: 3360/4800
 					 Loss: 2.291246
					 ---------
2023-03-08 16:51:29,140 fedbiomed INFO - TRAINING
					 NODE_ID: node_57b462ee-4d35-4dd9-b3cb-ea964a889e92 
					 Round 2 Epoch: 1 | Iteration: 70/100 (70%) | Samples: 3360/4800
 					 Loss: 2.323550
					 ---------
2023-03-08 16:51:29,541 fedbiomed INFO - TRAINING
					

2
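
The `Iteration: 100/100` and `Samples: 4800/4800` figures in the log above follow from the `batch_size` and `batch_maxnum` training arguments. The sketch below reproduces that arithmetic; it assumes each node holds the full MNIST training set (60000 images) and that `batch_maxnum` simply caps the number of batches per epoch. `samples_per_round` is a hypothetical helper, not a Fed-BioMed function.

```python
import math
from typing import Optional


def samples_per_round(dataset_size: int, batch_size: int,
                      batch_maxnum: Optional[int] = None,
                      epochs: int = 1) -> int:
    """Number of samples a node processes in one round of local training."""
    batches_available = math.ceil(dataset_size / batch_size)
    if batch_maxnum:
        # Fast pass for development: stop each epoch after batch_maxnum batches.
        batches_used = min(batches_available, batch_maxnum)
    else:
        batches_used = batches_available
    return min(batches_used * batch_size, dataset_size) * epochs


# Values from this notebook's training_args; 60000 is an assumed dataset size.
print(samples_per_round(60000, batch_size=48, batch_maxnum=100))  # 4800
```

Removing `batch_maxnum` from `training_args` would, under the same assumptions, make each node iterate over all 60000 samples per epoch.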

Local training results for each round and each node are available via `exp.training_replies()` (indexed from 0 to `rounds` - 1).

For example, you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training`: real time (clock time) spent in the training function on the node
- `ptime_training`: process time (user and system CPU) spent in the training function on the node
- `rtime_total`: real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
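
To illustrate the difference between real time and process time described above, here is a small standalone sketch using Python's `time` module. It only shows the two kinds of clocks; whether Fed-BioMed measures its timings with exactly these calls is an assumption, and `timed` is a hypothetical helper.

```python
import time


def timed(fn):
    """Run fn() and return its result with real and process time in seconds."""
    start_real = time.perf_counter()   # wall-clock time
    start_cpu = time.process_time()    # user + system CPU time of this process
    result = fn()
    timing = {
        'rtime': time.perf_counter() - start_real,
        'ptime': time.process_time() - start_cpu,
    }
    return result, timing


# Sleeping consumes real time but almost no CPU time.
_, timing = timed(lambda: time.sleep(0.2))
print("rtime={rtime:.2f} seconds, ptime={ptime:.2f} seconds".format(**timing))
```

A large gap between the two values usually means the process was waiting (for I/O, the network, or another process) rather than computing.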

In [None]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
for reply in round_data:
    timing = reply['timing']
    print(f"\t- {reply['node_id']} :"
          f"\n\t\trtime_training={timing['rtime_training']:.2f} seconds"
          f"\n\t\tptime_training={timing['ptime_training']:.2f} seconds"
          f"\n\t\trtime_total={timing['rtime_total']:.2f} seconds")
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()

Federated parameters for each round are available via `exp.aggregated_params()` (indexed from 0 to `rounds` - 1).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D