# Fedbiomed Researcher base example

Run the following cell during development: it auto-reloads changes made across packages.

In [1]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running the experiment:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm the default tags by typing "y" and pressing ENTER
  * Pick the folder where MNIST is downloaded (this is needed because of a torch issue: https://github.com/pytorch/vision/issues/3549)
  * The data only needs to be added once (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a torch.nn MyTrainingPlan class to send for training on the node

In [15]:
from fedbiomed.researcher.environ import environ
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+'/')
model_file = tmp_dir_model.name + '/class_export_mnist.py'

Note : write **only** the code to export in the following cell

In [16]:
%%writefile "$model_file"

import torch
import torch.nn as nn
import torch.nn.functional as F  # `F` is used in forward()
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms",
               "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss


Writing /home/scansiz/Desktop/Inria/development/fedbiomed/var/tmp/tmpmpzy4run/class_export_mnist.py
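
The input size of `fc1` in the training plan above (9216) follows from the layer shapes. The sketch below recomputes it with the standard output-size formula for convolution and pooling, assuming 28x28 single-channel MNIST images and no padding (the defaults used by `nn.Conv2d(1, 32, 3, 1)` and `F.max_pool2d(x, 2)`). The helper name `conv_out` is introduced here for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                      # MNIST images are 28x28, one channel
side = conv_out(side, 3, 1)    # conv1: kernel 3, stride 1 -> 26
side = conv_out(side, 3, 1)    # conv2: kernel 3, stride 1 -> 24
side = conv_out(side, 2, 2)    # max_pool2d(x, 2): kernel 2, stride 2 -> 12

channels = 64                  # out_channels of conv2
flat_features = channels * side * side
print(flat_features)           # 9216
```

If you change a kernel size or add a layer, the first argument of `nn.Linear` in `fc1` has to be updated accordingly.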


The arguments defined in the next cell are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓
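
Since a typo in a key only surfaces once the arguments reach the node, it can help to check the dictionary locally first. The following is a minimal sketch of such a check, not part of Fed-BioMed: the function `check_training_args` is hypothetical, and the set of known keys is an assumption taken from the example in this notebook, not an official list.

```python
import difflib

# Assumption: keys taken from this notebook's example, not an official list.
KNOWN_TRAINING_KEYS = {"batch_size", "lr", "epochs", "dry_run", "batch_maxnum"}

def check_training_args(args, known=KNOWN_TRAINING_KEYS):
    """Return a list of human-readable problems found in `args`."""
    problems = []
    for key in args:
        if key not in known:
            # Suggest the closest known key, if any is similar enough.
            guess = difflib.get_close_matches(key, sorted(known), n=1)
            hint = f" (did you mean '{guess[0]}'?)" if guess else ""
            problems.append(f"unknown key '{key}'{hint}")
    return problems

print(check_training_args({"batch_size": 48, "lr": 1e-3}))   # no problems
print(check_training_args({"batchsize": 48, "lr": 1e-3}))    # flags the typo
```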

In [17]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
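
As a quick check of the `batch_maxnum` comment above, the arithmetic below estimates how much data this fast pass uses. It assumes the node holds the full standard MNIST training split of 60,000 images; a node with a smaller dataset would see a different fraction.

```python
batch_size = 48
batch_maxnum = 100
mnist_train_size = 60_000  # assumption: full standard MNIST training split

# At most batch_maxnum batches of batch_size samples are used.
samples_used = min(batch_maxnum * batch_size, mnist_train_size)
print(samples_used)                               # 4800
print(f"{samples_used / mnist_train_size:.0%}")   # 8%
```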

In [18]:
import uuid 
str(uuid.uuid4())

'730b0f41-3fa4-4c66-a1db-d12b52e116a6'

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filter on a list of node ID with `nodes`
- run a round of local training on nodes with model defined in `model_path` + federation with `aggregator`
- run for `rounds` rounds, applying the `node_selection_strategy` between the rounds
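
To make the "federation with `aggregator`" step concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration of the usual FedAvg idea (a sample-count-weighted mean of node parameters), not Fed-BioMed's `FedAverage` implementation; the function name, the dict-of-lists parameter format, and the weighting by sample count are all assumptions of this sketch.

```python
def federated_average(node_params, node_sample_counts):
    """Weighted average of per-node parameter dicts (name -> list of floats)."""
    total = sum(node_sample_counts)
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        averaged[name] = [
            sum(params[name][i] * count / total
                for params, count in zip(node_params, node_sample_counts))
            for i in range(length)
        ]
    return averaged

# Two hypothetical nodes; the second one holds three times more samples.
node_a = {"w": [1.0, 2.0], "b": [0.0]}
node_b = {"w": [3.0, 4.0], "b": [1.0]}
print(federated_average([node_a, node_b], [100, 300]))
# {'w': [2.5, 3.5], 'b': [0.75]}
```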

In [20]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 #nodes=None,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator='asdads',  # a plain string is not an Aggregator: triggers the FB419 error in the log below
                 node_selection_strategy=None)

2022-01-20 16:14:17,918 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-01-20 16:14:17,922 fedbiomed INFO - log from: node_f290cd48-a70a-4e55-9262-81f802f9c95c / DEBUG - Message received: {'researcher_id': 'researcher_420cfc13-37cb-447c-af20-f7ac5cb2b6ab', 'tags': ['#MNIST', '#dataset'], 'command': 'search'}
2022-01-20 16:14:27,930 fedbiomed INFO - Node selected for training -> node_f290cd48-a70a-4e55-9262-81f802f9c95c
2022-01-20 16:14:27,932 fedbiomed CRITICAL - FB419: Aggregator type is '<class 'str'>'  and it is not instance of fedbiomed.researcher.aggregators.aggregator.Aggregator.  
2022-01-20 16:14:28,072 fedbiomed DEBUG - torchnn saved model filename: /home/scansiz/Desktop/Inria/development/fedbiomed/var/experiments/Experiment_0025/my_model_cd916954-e4cb-4bc3-b138-a3913c6f3ad4.py


In [21]:
print(exp)

<fedbiomed.researcher.experiment.Experiment object at 0x7f6d509c6760>


Let's start the experiment.

By default, this function doesn't stop until all the `rounds` are done for all the nodes

In [22]:
exp.run()

Exception: Error while running the experiment: 

- FB415: Please set an aggregator

## Declare an Experiment Step by Step 
### Building an Empty Experiment

In [2]:
from fedbiomed.researcher.experiment import Experiment
exp2 = Experiment()

2022-01-20 15:46:33,030 fedbiomed INFO - Component environment:
2022-01-20 15:46:33,031 fedbiomed INFO - - type = ComponentType.RESEARCHER
2022-01-20 15:46:34,832 fedbiomed INFO - Messaging researcher_420cfc13-37cb-447c-af20-f7ac5cb2b6ab successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f6d522478b0>


### Setting Tags 

In [None]:
tags = ['#MNIST', '#dataset']
exp2.set_tags(tags = tags)

### Setting Model Path and Model Class

In [None]:
exp2.set_model_path(model_path = model_file)
exp2.set_model_class(model_class = 'MyTrainingPlan')

### Setting Model Arguments and Training Arguments

In [None]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100
}

exp2.set_model_args(model_args = model_args)
exp2.set_training_args(training_args = training_args)

### Setting Training Data

The method `set_training_data` takes three arguments:

- `tags`: List of tags (strings) for the search request. If it is not provided, the method tries to use the `tags` attribute of the object.
- `nodes`: List of node IDs to which the search request is sent. If this argument is not provided, the search request is sent to all active nodes.
- `training_data`: A dictionary or `FederatedDataset` object. If `training_data` is provided, the search request with `tags` and `nodes` is ignored.
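
The precedence between these three arguments can be sketched as follows. This is a stand-alone illustration of the rules listed above, not the library's code: `resolve_search_request` is a hypothetical function that only returns a description of what would happen, and raising `ValueError` when no tags are available is an assumption of the sketch.

```python
def resolve_search_request(training_data=None, tags=None, nodes=None, default_tags=None):
    """Mimic the precedence described above; sends nothing, returns a plain dict."""
    if training_data is not None:
        # Explicit training data wins: tags and nodes are ignored.
        return {"source": "training_data", "search": None}
    effective_tags = tags if tags is not None else default_tags
    if effective_tags is None:
        # Assumption: with no tags anywhere, there is nothing to search for.
        raise ValueError("no tags given and none set on the experiment")
    target = nodes if nodes is not None else "all active nodes"
    return {"source": "search", "search": {"tags": effective_tags, "nodes": target}}

# No arguments: fall back to the tags already set on the experiment.
print(resolve_search_request(default_tags=["#MNIST", "#dataset"]))
```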

In [None]:
exp2.set_training_data()

### Setting Job 

Setting the job prepares all the assets necessary to run a round. Therefore, the `Job` should be set before running the experiment.

To set the `Job`, you must already have set `model_path`, `model_class` and `training_data`. Otherwise `set_job()` will raise an Exception. The error output below shows that training arguments are checked as well.
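
A pre-flight check of this kind can be sketched in a few lines. The helper below is hypothetical and works on a plain dictionary rather than on an `Experiment` object; the list of required names comes from the text above, and the real check may cover more (the logged error also mentions training arguments).

```python
# Names taken from the paragraph above; the real check may cover more.
REQUIRED_BEFORE_JOB = ("model_path", "model_class", "training_data")

def missing_prerequisites(settings, required=REQUIRED_BEFORE_JOB):
    """Return the required names that are absent or None in `settings`."""
    return [name for name in required if settings.get(name) is None]

settings = {"model_path": "/tmp/class_export_mnist.py", "model_class": None}
print(missing_prerequisites(settings))   # ['model_class', 'training_data']
```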

In [12]:
exp2.set_job()

2022-01-20 16:00:38,293 fedbiomed CRITICAL - Error while setting Job: 

- FB410: Please set training arguments with `.set_training_args()` before setting a `Job`.
- FB410: No Federated Dataset is found. Please use `.set_training_data()` before setting a `Job`.
- FB410: `model_class` is mandatory for setting `Job`.  Please initialize experiment with model class or use `.set_model_class()` method of the experiment


In [14]:
print(exp2.job())

None


In [None]:
exp2.set_node_selection_strategy()

In [None]:
exp2.run_once()

In [None]:
exp2.run_once()

### Changing Experiment Parameters with Setters
If the `Job` is already initialized and arguments related to the model are modified, the `Job` should be reinitialized with the method `.set_job()`. The Experiment also reports this after the model file is set.
  
    
    
<div class="note">
    <p>After running the experiment, changing the model might have some consequences.</p>
</div>

In [None]:
exp2.set_model_path(model_file)
exp2.set_model_class('MyTrainingPlan')

In [None]:
exp2.set_job()

#### Changing Aggregator

The aggregator should be an instance of `fedbiomed.researcher.aggregators.aggregator.Aggregator`. Otherwise `set_aggregator` will raise an Exception. The aggregator can be passed either as a `Callable` class or as an already built object.

The following cell raises an Exception:

In [7]:
exp2.set_aggregator('ThisIsNotAnAggregator')

Exception: FB419: Aggregator type is '<class 'str'>'  and it is not instance of fedbiomed.researcher.aggregators.aggregator.Aggregator.  

Correct usage: 

In [8]:
from fedbiomed.researcher.aggregators.fedavg import FedAverage
# Can be passed as a Callable class
exp2.set_aggregator(FedAverage)

# Can be passed as an already built object
fedavg = FedAverage()
exp2.set_aggregator(fedavg)
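
One way a setter could accept both forms shown above (a class or an already built object) while rejecting anything else is sketched below. This is not Fed-BioMed's code: `BaseAggregator`, `MeanAggregator` and `normalize_aggregator` are stand-ins invented for this illustration, and raising `TypeError` is a choice of the sketch.

```python
import inspect

class BaseAggregator:
    """Stand-in base class for this sketch only."""

class MeanAggregator(BaseAggregator):
    """Stand-in subclass for this sketch only."""

def normalize_aggregator(aggregator):
    """Return an aggregator instance from either a class or an instance."""
    if inspect.isclass(aggregator):
        if not issubclass(aggregator, BaseAggregator):
            raise TypeError(f"{aggregator} is not a BaseAggregator subclass")
        return aggregator()          # build the object on behalf of the caller
    if isinstance(aggregator, BaseAggregator):
        return aggregator            # already built: use as-is
    raise TypeError(f"expected an aggregator class or instance, got {type(aggregator)}")

print(type(normalize_aggregator(MeanAggregator)).__name__)     # MeanAggregator
print(type(normalize_aggregator(MeanAggregator())).__name__)   # MeanAggregator
```

Passing a string such as `'ThisIsNotAnAggregator'` to this sketch falls through to the final `TypeError`, which mirrors the kind of rejection shown in the failing cell earlier.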

In [None]:
print("\nList the training rounds : ", exp.training_replies.keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies[rounds - 1].data
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')
    
exp.training_replies[rounds - 1].dataframe

Federated parameters for each round are available in `exp.aggregated_params` (index 0 to (`rounds` - 1) ).

For example you can view the federated parameters for the last round of the experiment :

In [None]:
print("\nList the training rounds : ", exp.aggregated_params.keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D