# Fed-BioMed Researcher base example

Use this cell when developing: it automatically reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from fedbiomed.researcher.requests import Requests 
req = Requests()
req.list(verbose=True)


## Setting the node up
A node must be configured beforehand:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm the default tags by typing "y" and pressing ENTER
  * Pick the folder where MNIST is downloaded (this step is needed because of a PyTorch issue: https://github.com/pytorch/vision/issues/3549)
  * The data must have been added at this point (a warning saying that the data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node start`. Wait until you see `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a Torch training plan class, `MyTrainingPlan`, to send to the node for training.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # used as `F` in Net.forward below
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the training plan to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):

    # Defines and returns the model
    def init_model(self, model_args):
        return self.Net(model_args = model_args)

    # Defines and returns the optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr = optimizer_args["lr"])

    # Declares and returns the dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps
    
    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)

            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch DataLoader for MNIST data
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss

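As a sanity check on the `nn.Linear(9216, 128)` layer above, the following sketch recomputes the flattened feature count by hand. It assumes 28x28 single-channel MNIST inputs and the default `padding=0`; the helpers `conv2d_out` and `flattened_features` are hypothetical names introduced here for illustration, not part of Fed-BioMed or PyTorch.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(side: int = 28) -> int:
    """Number of features reaching fc1 for a square input of the given side."""
    side = conv2d_out(side, kernel=3)  # conv1 (3x3, stride 1): 28 -> 26
    side = conv2d_out(side, kernel=3)  # conv2 (3x3, stride 1): 26 -> 24
    side = side // 2                   # max_pool2d(x, 2): 24 -> 12
    channels = 64                      # output channels of conv2
    return channels * side * side      # size after torch.flatten(x, 1)


print(flattened_features())  # 64 * 12 * 12 = 9216
```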

The two groups of arguments are:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓

In [None]:
model_args = {}

training_args = {
    'loader_args': { 'batch_size': 48, }, 
    'optimizer_args': {
        "lr" : 1e-3
    },
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}

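To get a feel for how much data the `batch_maxnum` cap leaves per epoch, this small sketch works out the arithmetic from the comment above. It assumes the node holds the full 60,000-image MNIST training split; `samples_per_epoch` is a hypothetical helper written for this example, not a Fed-BioMed function.

```python
def samples_per_epoch(batch_size: int, batch_maxnum: int, dataset_size: int) -> int:
    """Upper bound on the samples seen in one epoch when batches are capped."""
    return min(batch_size * batch_maxnum, dataset_size)


MNIST_TRAIN_SIZE = 60_000  # assumption: the node serves the full MNIST training split

used = samples_per_epoch(batch_size=48, batch_maxnum=100, dataset_size=MNIST_TRAIN_SIZE)
print(used)                                # 4800
print(f"{used / MNIST_TRAIN_SIZE:.0%}")    # 8%
```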
## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on nodes with the model defined in `training_plan_class` + federation with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between the rounds

In [None]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

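The sketch below illustrates the general idea behind federated averaging: each node's parameters are weighted by the number of samples it trained on, then summed. It is a plain-Python illustration under that assumption, not Fed-BioMed's `FedAverage` implementation; `weighted_average`, the parameter names, and the sample counts are made up for the example.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of values


def weighted_average(node_params: List[Params], sample_counts: List[int]) -> Params:
    """Average parameters across nodes, weighting each node by its sample count."""
    total = sum(sample_counts)
    weights = [count / total for count in sample_counts]
    averaged: Params = {}
    for name, reference in node_params[0].items():
        averaged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, node_params))
            for i in range(len(reference))
        ]
    return averaged


# Two hypothetical nodes; the second holds three times as much data.
node_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
node_b = {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}

result = weighted_average([node_a, node_b], sample_counts=[100, 300])
print(result)  # {'fc.weight': [2.5, 3.5], 'fc.bias': [0.75]}
```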
Let's start the experiment.

By default, `exp.run()` doesn't stop until all `round_limit` rounds are done for all the nodes.

In [None]:
exp.info()

In [None]:
exp.run()

2023-11-08 16:35:27,894 fedbiomed DEBUG - Node: node_56066d82-ae6e-411b-a06b-8eec48f17acd polling for the tasks
2023-11-08 16:36:23,922 fedbiomed DEBUG - Node: node_01f09341-906f-4cc9-81c7-0bd9882c5c15 polling for the tasks
2023-11-08 16:36:27,896 fedbiomed DEBUG - Node: node_56066d82-ae6e-411b-a06b-8eec48f17acd polling for the tasks
2023-11-08 16:37:23,925 fedbiomed DEBUG - Node: node_01f09341-906f-4cc9-81c7-0bd9882c5c15 polling for the tasks
2023-11-08 16:37:27,900 fedbiomed DEBUG - Node: node_56066d82-ae6e-411b-a06b-8eec48f17acd polling for the tasks
2023-11-08 16:38:23,928 fedbiomed DEBUG - Node: node_01f09341-906f-4cc9-81c7-0bd9882c5c15 polling for the tasks
2023-11-08 16:38:27,902 fedbiomed DEBUG - Node: node_56066d82-ae6e-411b-a06b-8eec48f17acd polling for the tasks
2023-11-08 16:39:23,931 fedbiomed DEBUG - Node: node_01f09341-906f-4cc9-81c7-0bd9882c5c15 polling for the tasks
2023-11-08 16:39:27,904 fedbiomed DEBUG - Node: node_56066d82-ae6e-411b-a06b-8eec48f17acd polling for th

Local training results for each round and each node are available via `exp.training_replies()` (indexed from 0 to `rounds - 1`).

For example, you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training` real time (clock time) spent in the training function on the node
- `ptime_training` process time (user and system CPU) spent in the training function on the node
- `rtime_total` real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer

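The difference between real time and process time can be seen with Python's standard-library clocks. The sketch below is only an illustration: it assumes `time.perf_counter()` and `time.process_time()` as stand-ins for the two kinds of measurement, and the helper `timed` is hypothetical, not part of Fed-BioMed. Sleeping takes wall-clock time but almost no CPU time, so the two values diverge.

```python
import time


def timed(func):
    """Run func once and report elapsed real (clock) time and process (CPU) time."""
    real_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    return {
        "rtime": time.perf_counter() - real_start,
        "ptime": time.process_time() - cpu_start,
    }


sleeping = timed(lambda: time.sleep(0.2))                  # waits, barely uses the CPU
busy = timed(lambda: sum(i * i for i in range(200_000)))   # keeps the CPU working

print("sleeping: rtime={rtime:.2f}s ptime={ptime:.2f}s".format(**sleeping))
print("busy:     rtime={rtime:.2f}s ptime={ptime:.2f}s".format(**busy))
```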
In [None]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()

Federated parameters for each round are available via `exp.aggregated_params()` (indexed from 0 to `rounds - 1`).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D