# Fed-BioMed Researcher base example

Use this cell when developing: it automatically reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running the experiment:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is due to a PyTorch issue: https://github.com/pytorch/vision/issues/3549)
  * The data must be added at this step (a warning saying that the data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a `MyTrainingPlan` class, wrapping a `torch.nn` model, to send to the node for training.

In [5]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # used as `F` in Net.forward below
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the custom dependencies that will be needed by our custom Dataloader
# In this case, we need the torch DataLoader classes
# Since we will train on MNIST, we need datasets and transform from torchvision
#deps = ["from torchvision import datasets, transforms"]

#self.add_dependency(deps)

# Here we define the model to be used. 
# You can use any class name (here 'Net')
class MyTrainingPlan(TorchTrainingPlan):
    
    
    def build_model(self, model_args):
        return self.Net(model_args = model_args)
    
    def build_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model.parameters(), lr = optimizer_args["lr"])
    
    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)


            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.model.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
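
The `9216` input size of `fc1` follows from the shape of the MNIST images as they pass through `Net`. The arithmetic can be checked with a minimal sketch in plain Python, assuming 28x28 single-channel inputs (the standard MNIST image size). The helper `conv2d_out` is a hypothetical name introduced for illustration; it is not part of Fed-BioMed or PyTorch.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                       # MNIST images are 28x28
side = conv2d_out(side, 3, 1)   # conv1: kernel 3, stride 1
side = conv2d_out(side, 3, 1)   # conv2: kernel 3, stride 1
side = side // 2                # max_pool2d(x, 2) halves each side

channels = 64                   # output channels of conv2
flat_features = channels * side * side
print(flat_features)            # 9216, the in_features of fc1
```

If you change the kernel sizes, strides, or the input resolution, the first argument of `nn.Linear` in `fc1` has to be updated accordingly.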


These arguments are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing required arguments will raise an error. 🤓

In [6]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'optimizer_args': {
        "lr" : 1e-3
    },
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
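
To get a feel for what `batch_maxnum` does to the amount of data seen, the comment above (`batch_maxnum * batch_size` samples) can be turned into a small calculation. This is a rough sketch: `samples_per_epoch` is a hypothetical helper, and it assumes the node holds the full 60,000-image MNIST training split.

```python
MNIST_TRAIN_SIZE = 60_000  # assumption: the node holds the full MNIST training split


def samples_per_epoch(batch_size, batch_maxnum, dataset_size):
    """Upper bound on the samples used in one epoch when batches are capped.

    Hypothetical helper for illustration only, not a Fed-BioMed function.
    """
    return min(batch_size * batch_maxnum, dataset_size)


capped = samples_per_epoch(batch_size=48, batch_maxnum=100,
                           dataset_size=MNIST_TRAIN_SIZE)
print(capped)                     # 4800
print(capped / MNIST_TRAIN_SIZE)  # 0.08, i.e. 8% of the training split
```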

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filter on a list of node IDs with `nodes`
- run a round of local training on nodes with the model defined in `model_class`, followed by federation with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between the rounds
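
To illustrate what the federation step does conceptually, here is a minimal sketch of federated averaging in plain Python. It is not Fed-BioMed's `FedAverage` implementation: `fed_average`, the parameter names, and the sample counts are all made up for the example, and parameters are plain lists of floats instead of tensors.

```python
from typing import Dict, List


def fed_average(node_params: List[Dict[str, List[float]]],
                sample_counts: List[int]) -> Dict[str, List[float]]:
    """Average per-node parameters, weighted by each node's sample count.

    Hypothetical illustration of federated averaging, not library code.
    """
    total = sum(sample_counts)
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        averaged[name] = [
            sum(params[name][i] * n / total
                for params, n in zip(node_params, sample_counts))
            for i in range(length)
        ]
    return averaged


# Two made-up nodes: the second one trained on three times more samples
node_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
node_b = {"fc.weight": [3.0, 6.0], "fc.bias": [4.0]}

global_params = fed_average([node_a, node_b], sample_counts=[100, 300])
print(global_params)  # {'fc.weight': [2.5, 5.0], 'fc.bias': [3.0]}
```

The averaged parameters sit closer to `node_b` because it contributed more samples. With a single node, as in this notebook, the average is simply that node's parameters.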

In [7]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-09-09 18:44:08,243 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-09-09 18:44:18,258 fedbiomed INFO - Node selected for training -> node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
2022-09-09 18:44:18,261 fedbiomed INFO - {'batch_maxnum': 100, 'fedprox_mu': None, 'log_interval': 10, 'dry_run': False, 'epochs': 1}
2022-09-09 18:44:18,287 fedbiomed DEBUG - Model file has been saved: /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/my_model_98d7e626-1df5-4671-92da-a1472c4129fa.py
2022-09-09 18:44:18,343 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/my_model_98d7e626-1df5-4671-92da-a1472c4129fa.py successful, with status code 201


2022-09-09 18:44:18,520 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_init_ac622e7b-cd4f-487a-b441-0bb30b995c15.pt successful, with status code 201


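Let's start the experiment.

By default, `exp.run()` doesn't stop until all the `round_limit` rounds are done for all the nodes. Here we first run a single round with `exp.run_once()`; with `increase=True`, the experiment raises `round_limit` automatically when it has already been reached (see the `Auto increasing total rounds` message in the log).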
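
The `Auto increasing total rounds` messages in the logs of the next two cells (from 2 to 3, then from 3 to 11) are consistent with a simple rule: when `increase=True` and the requested rounds would go past `round_limit`, the limit is raised just enough to fit them. The sketch below is a toy model inferred from those log lines, not Fed-BioMed's actual code; `next_round_limit` is a hypothetical name.

```python
def next_round_limit(round_current, round_limit, rounds, increase):
    """Toy model of how `round_limit` evolves (inferred from the logs only)."""
    needed = round_current + rounds
    if increase and needed > round_limit:
        return needed      # extend the limit to fit the requested rounds
    return round_limit     # otherwise the limit is left unchanged


# exp.run_once(increase=True) after 2 completed rounds with round_limit=2
limit_after_run_once = next_round_limit(2, 2, rounds=1, increase=True)
print(limit_after_run_once)  # 3

# exp.run(rounds=8, increase=True) after 3 completed rounds with round_limit=3
limit_after_run = next_round_limit(3, limit_after_run_once, rounds=8, increase=True)
print(limit_after_run)       # 11
```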

In [10]:
exp.run_once(increase=True)

2022-09-09 18:45:08,864 fedbiomed DEBUG - Auto increasing total rounds for experiment from 2 to 3
2022-09-09 18:45:08,865 fedbiomed INFO - Sampled nodes in round 2 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:45:08,866 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_size': {'rules': [<class 'int'>], 'required': True, 'default': 48}, 'epochs': {'rules': [<class 'int'>], 'required': True, 'default': 1}, 'dry_run': {'rules': [<class 'bool'>], 'required': True, 'default': False}, 'batch_maxnum': {'rules': [<class 'int'>], 'required': True, 'default': 100}, 'test_ratio': {'rules': [<class 'float'>, <function TrainingArgs._test_ratio_

2022-09-09 18:45:24,144 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_307d18f5-e0f1-4cf6-a6fe-a37841ceb464.pt successful, with status code 201
2022-09-09 18:45:24,146 fedbiomed INFO - Saved aggregated params for round 2 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_307d18f5-e0f1-4cf6-a6fe-a37841ceb464.pt


1

In [11]:
exp.run(rounds=8, increase=True)

2022-09-09 18:45:31,087 fedbiomed DEBUG - Auto increasing total rounds for experiment from 3 to 11
2022-09-09 18:45:31,088 fedbiomed INFO - Sampled nodes in round 3 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:45:31,089 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_size': {'rules': [<class 'int'>], 'required': True, 'default': 48}, 'epochs': {'rules': [<class 'int'>], 'required': True, 'default': 1}, 'dry_run': {'rules': [<class 'bool'>], 'required': True, 'default': False}, 'batch_maxnum': {'rules': [<class 'int'>], 'required': True, 'default': 100}, 'test_ratio': {'rules': [<class 'float'>, <function TrainingArgs._test_ratio

2022-09-09 18:45:46,311 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_a68c183d-deb0-4782-88ad-62732e157d41.pt successful, with status code 201
2022-09-09 18:45:46,312 fedbiomed INFO - Saved aggregated params for round 3 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_a68c183d-deb0-4782-88ad-62732e157d41.pt
2022-09-09 18:45:46,313 fedbiomed INFO - Sampled nodes in round 4 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:45:46,314 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_siz

2022-09-09 18:46:01,565 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_531528e1-96e3-4d2f-b16b-25bc1c293fe2.pt successful, with status code 201
2022-09-09 18:46:01,567 fedbiomed INFO - Saved aggregated params for round 4 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_531528e1-96e3-4d2f-b16b-25bc1c293fe2.pt
2022-09-09 18:46:01,568 fedbiomed INFO - Sampled nodes in round 5 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:46:01,568 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_siz

2022-09-09 18:46:16,827 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_bf1d3a90-3808-421d-bc5e-37c162b282d1.pt successful, with status code 201
2022-09-09 18:46:16,828 fedbiomed INFO - Saved aggregated params for round 5 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_bf1d3a90-3808-421d-bc5e-37c162b282d1.pt
2022-09-09 18:46:16,829 fedbiomed INFO - Sampled nodes in round 6 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:46:16,830 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_siz

2022-09-09 18:46:32,076 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_66f83d1b-1592-4e38-9882-e8524394cc69.pt successful, with status code 201
2022-09-09 18:46:32,077 fedbiomed INFO - Saved aggregated params for round 6 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_66f83d1b-1592-4e38-9882-e8524394cc69.pt
2022-09-09 18:46:32,078 fedbiomed INFO - Sampled nodes in round 7 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:46:32,078 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_siz

2022-09-09 18:46:47,367 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_62c57fb9-4e20-48aa-b156-550b7edb58fe.pt successful, with status code 201
2022-09-09 18:46:47,368 fedbiomed INFO - Saved aggregated params for round 7 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_62c57fb9-4e20-48aa-b156-550b7edb58fe.pt
2022-09-09 18:46:47,369 fedbiomed INFO - Sampled nodes in round 8 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:46:47,369 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_siz

2022-09-09 18:47:02,625 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_29ef0f7f-1eb1-4b32-b757-c3d7941c357e.pt successful, with status code 201
2022-09-09 18:47:02,626 fedbiomed INFO - Saved aggregated params for round 8 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_29ef0f7f-1eb1-4b32-b757-c3d7941c357e.pt
2022-09-09 18:47:02,626 fedbiomed INFO - Sampled nodes in round 9 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:47:02,627 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_siz

2022-09-09 18:47:17,876 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_772ec5e2-4309-4955-850a-00fce1b68a29.pt successful, with status code 201
2022-09-09 18:47:17,877 fedbiomed INFO - Saved aggregated params for round 9 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0007/aggregated_params_772ec5e2-4309-4955-850a-00fce1b68a29.pt
2022-09-09 18:47:17,877 fedbiomed INFO - Sampled nodes in round 10 ['node_6ef13f5b-dc01-42c1-b2cc-c93106987bff']
2022-09-09 18:47:17,878 fedbiomed INFO - Sending request
					 To: node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 Request: : Perform training with the arguments: {'researcher_id': 'researcher_de1cf39f-f3cc-4399-938c-9e97b8312127', 'job_id': '1239e602-d174-4fb5-8989-f9fe135af39b', 'training_args': scheme:
{'optimizer_args': {'rules': [<class 'dict'>], 'required': True, 'default': {}}, 'batch_si

8

2022-09-09 18:47:47,223 fedbiomed INFO - CRITICAL
					 NODE node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 MESSAGE: Node stopped in signal_handler, probably by user decision (Ctrl C)
-----------------------------------------------------------------
2022-09-09 18:47:47,398 fedbiomed INFO - CRITICAL
					 NODE node_6ef13f5b-dc01-42c1-b2cc-c93106987bff
					 MESSAGE: Node stopped in signal_handler, probably by user decision (Ctrl C)
-----------------------------------------------------------------


Local training results for each round and each node are available via `exp.training_replies()` (indexed from 0 to `rounds - 1`).

For example, you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training` real time (clock time) spent in the training function on the node
- `ptime_training` process time (user and system CPU) spent in the training function on the node
- `rtime_total` real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
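
The difference between real time and process time can be reproduced with Python's standard `time` module. The sketch below is only an analogy: it assumes the node measures these values in a similar way, which the notebook does not confirm, and `timed` is a hypothetical helper.

```python
import time


def timed(fn):
    """Run `fn` and report real (clock) time and process (CPU) time.

    Hypothetical helper; the keys mirror the names used in the notebook.
    """
    real_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return {
        "rtime_training": time.perf_counter() - real_start,
        "ptime_training": time.process_time() - cpu_start,
    }


# Sleeping consumes clock time but almost no CPU time
timing = timed(lambda: time.sleep(0.05))
print("rtime_training={rtime_training:.3f} s, "
      "ptime_training={ptime_training:.3f} s".format(**timing))
```

A large gap between the two values usually means the training function spent time waiting (for example on I/O) and not computing.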

In [None]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()

Federated parameters for each round are available via `exp.aggregated_params()` (indexed from 0 to `rounds - 1`).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D