# Fedbiomed Researcher POC with LDP

In [1]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the nodes up
A node must be configured beforehand:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is due to a torch issue: https://github.com/pytorch/vision/issues/3549)
  * The data must have been added (a warning saying that the data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node start`. Wait until you get `Starting task manager`; this means you are online.
4. Following the same procedure, you can create a second node for client 2.

Check available clients:

In [2]:
from fedbiomed.researcher.requests import Requests
req = Requests()
req.list(verbose=True)

2022-03-03 23:14:54,254 fedbiomed INFO - Component environment:
2022-03-03 23:14:54,255 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-03-03 23:14:54,738 fedbiomed INFO - Messaging researcher_939c8feb-35dd-42ba-ae62-221123361b5f successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x108e86c70>
2022-03-03 23:14:54,835 fedbiomed INFO - Listing available datasets in all nodes... 
2022-03-03 23:14:54,840 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - Message received: {'researcher_id': 'researcher_939c8feb-35dd-42ba-ae62-221123361b5f', 'command': 'list'}
2022-03-03 23:15:04,843 fedbiomed INFO - 
 Node: node_583b7f60-63bb-4d67-9f85-66b65b24179f | Number of Datasets: 1 
+--------+-------------+------------------------+----------------+--------------------+
| name   | data_type   | tags                   | description    | shape              |
| MNIST  | default     | ['#MNIST', '#dataset'] | MNIST databa

{'node_583b7f60-63bb-4d67-9f85-66b65b24179f': [{'name': 'MNIST',
   'data_type': 'default',
   'tags': ['#MNIST', '#dataset'],
   'description': 'MNIST database',
   'shape': [60000, 1, 28, 28]}]}
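
The dictionary returned by `req.list()` maps each node ID to a list of dataset descriptions. As a minimal sketch, assuming the structure shown in the output above and assuming that the first entry of `shape` is the number of samples, the listing can be summarised per node. The `listing` variable and the `summarise` helper are illustrative names, not part of Fed-BioMed:

```python
# Copy of the dictionary shown in the output above.
listing = {
    'node_583b7f60-63bb-4d67-9f85-66b65b24179f': [
        {'name': 'MNIST',
         'data_type': 'default',
         'tags': ['#MNIST', '#dataset'],
         'description': 'MNIST database',
         'shape': [60000, 1, 28, 28]},
    ]
}


def summarise(listing):
    """Return {node_id: [(dataset name, number of samples), ...]}.

    Assumption: shape[0] is the number of samples in the dataset.
    """
    return {
        node_id: [(d['name'], d['shape'][0]) for d in datasets]
        for node_id, datasets in listing.items()
    }


summary = summarise(listing)
for node_id, datasets in summary.items():
    for name, n_samples in datasets:
        print(f"{node_id}: {name} ({n_samples} samples)")
```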

## Define an experiment model and parameters

Declare a torch.nn-based `MyTrainingPlan` class to send for training on the node.

Note: write **only** the code to export in the following cell.

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # referenced as `F` in forward()
from fedbiomed.common.training_plans import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super(MyTrainingPlan, self).__init__(model_args)
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        self.C = model_args['DP']['C']
        self.sigma = model_args['DP']['sigma']
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms",
               "from torch.utils.data import DataLoader",
               "import torch"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)      
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch DataLoader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
    
    def postprocess(self, params):
        for name, param in params.items():
            
            ###
            ### Extracting the update
            ###
            delta_theta = params[name] - self.init_params[name]
            
            ###
            ### DP on update (element-wise clipping to [-C, C] + Gaussian noise)
            ###
            delta_theta_tilde = torch.clamp(delta_theta, -self.C, self.C) \
                            + torch.sqrt(torch.tensor([2.0]))*self.sigma*self.C * torch.randn_like(delta_theta)
            
            ###
            ### Other kinds of perturbation model are possible, e.g.
            ### torch.distributions.laplace.Laplace(torch.tensor([loc]), torch.tensor([scale]))
            ###
            params[name] = self.init_params[name] + delta_theta_tilde
            
        return params
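
The `postprocess` method above clamps every coordinate of the update to `[-C, C]` and adds Gaussian noise with standard deviation `sqrt(2) * sigma * C`. The following torch-free sketch reproduces that arithmetic on a plain list, so the effect of `C` and `sigma` can be inspected without a node. For comparison only, it also shows a hypothetical `clip_l2` helper that rescales the whole update to an L2 norm of at most `C`; that variant is an assumption added for illustration and is not what the training plan does. All function names are illustrative.

```python
import math
import random


def clip_elementwise(delta, C):
    """Clamp each coordinate to [-C, C], like torch.clamp(delta, -C, C)."""
    return [max(-C, min(C, d)) for d in delta]


def clip_l2(delta, C):
    """Hypothetical alternative: rescale so that the L2 norm is at most C."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]


def perturb(delta, C, sigma, rng):
    """Element-wise clipping followed by Gaussian noise of std sqrt(2)*sigma*C."""
    std = math.sqrt(2) * sigma * C
    return [d + rng.gauss(0.0, std) for d in clip_elementwise(delta, C)]


C, sigma = 5e-3, 5e-3              # same values as model_args['DP'] in this notebook
delta = [0.02, -0.001, 0.004]      # made-up update, for illustration only
rng = random.Random(0)             # seeded for reproducibility

clipped = clip_elementwise(delta, C)
l2_clipped = clip_l2(delta, C)
noisy = perturb(delta, C, sigma, rng)

print("element-wise clipped:", clipped)
print("L2-norm clipped:     ", l2_clipped)
print("clipped + noise:     ", noisy)
```

With element-wise clipping only the first coordinate changes, whereas L2-norm clipping shrinks all coordinates by a common factor.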

These arguments are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.
If FedProx optimisation is requested, the `fedprox_mu` parameter must be defined here. It must also be a float between XX and YY.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓
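
For context, the general FedProx formulation adds a proximal term `(mu / 2) * ||w - w_global||^2` to the local loss, which penalises local weights that drift away from the current global model. How Fed-BioMed applies `fedprox_mu` internally is not shown in this notebook, so the following is only a sketch of the general idea, with illustrative names and made-up numbers:

```python
def fedprox_penalty(local_weights, global_weights, mu):
    """Proximal term (mu / 2) * squared L2 distance between local and global weights."""
    squared_distance = sum((w - g) ** 2 for w, g in zip(local_weights, global_weights))
    return 0.5 * mu * squared_distance


def fedprox_loss(task_loss, local_weights, global_weights, mu):
    """Task loss plus the proximal penalty."""
    return task_loss + fedprox_penalty(local_weights, global_weights, mu)


mu = 0.01                          # same value as 'fedprox_mu' in this notebook
global_weights = [0.0, 0.0]        # made-up global model
local_weights = [1.0, 2.0]         # made-up local model after some training steps

penalty = fedprox_penalty(local_weights, global_weights, mu)
total = fedprox_loss(0.3, local_weights, global_weights, mu)
print(f"penalty={penalty:.4f}, total loss={total:.4f}")
```

Setting `mu` to 0 removes the penalty and leaves the plain task loss.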

In [14]:
model_args = {'DP': {'C' : 5e-3, 'sigma' : 5e-3}}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'fedprox_mu': 0.01, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
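
As a quick sanity check on these values, the number of samples a node processes can be estimated from `batch_size` and `batch_maxnum`. This is a back-of-the-envelope sketch: it assumes a single dataset of 60000 samples (the MNIST shape listed earlier), 3 rounds (the value used for the experiment in this notebook), and that every batch is full.

```python
batch_size = 48
batch_maxnum = 100
epochs = 1
rounds = 3               # assumption: value used for the experiment in this notebook
dataset_size = 60000     # assumption: first entry of the MNIST 'shape' listed earlier

# At most batch_maxnum batches are used per epoch, capped by the dataset size.
samples_per_epoch = min(dataset_size, batch_maxnum * batch_size)
samples_per_round = samples_per_epoch * epochs
samples_total = samples_per_round * rounds

print(f"samples per epoch: {samples_per_epoch}")                       # 4800
print(f"fraction of dataset per epoch: {samples_per_epoch / dataset_size:.0%}")  # 8%
print(f"samples over all rounds: {samples_total}")                     # 14400
```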

## Declare and run the experiment

- search for nodes serving data with these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on the nodes with the model defined in `model_class`, then federate with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between rounds
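
The federation step can be pictured as a weighted average of the parameters returned by the nodes. The sketch below illustrates federated averaging in general on plain Python lists; it is not Fed-BioMed's `FedAverage` implementation, and the function name, the weighting by sample count, and the numbers are assumptions made for illustration.

```python
def federated_average(node_params, node_weights):
    """Weighted average of per-node parameter dicts.

    node_params:  list of {parameter name: list of floats}, one dict per node
    node_weights: list of non-negative weights (e.g. sample counts), one per node
    """
    total = sum(node_weights)
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        averaged[name] = [
            sum(w / total * params[name][i] for params, w in zip(node_params, node_weights))
            for i in range(length)
        ]
    return averaged


# Two hypothetical nodes, the second holding three times as many samples.
node_params = [{'w': [1.0, 2.0]}, {'w': [3.0, 6.0]}]
node_weights = [1, 3]

aggregated = federated_average(node_params, node_weights)
print(aggregated)  # {'w': [2.5, 5.0]}
```

With a single node, as in the run recorded in this notebook, the average is simply that node's parameters.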

In [15]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 3

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2022-03-03 23:18:48,324 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-03-03 23:18:48,329 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - Message received: {'researcher_id': 'researcher_939c8feb-35dd-42ba-ae62-221123361b5f', 'tags': ['#MNIST', '#dataset'], 'command': 'search'}
2022-03-03 23:18:58,332 fedbiomed INFO - Node selected for training -> node_583b7f60-63bb-4d67-9f85-66b65b24179f
2022-03-03 23:18:58,345 fedbiomed DEBUG - Model file has been saved: /Users/mlorenzi/works/temp/fedbiomed/var/experiments/Experiment_0027/my_model_d61dca64-7189-4092-bc21-c7847a618fad.py
2022-03-03 23:18:58,405 fedbiomed DEBUG - upload (HTTP POST request) of file /Users/mlorenzi/works/temp/fedbiomed/var/experiments/Experiment_0027/my_model_d61dca64-7189-4092-bc21-c7847a618fad.py successful, with status code 201
2022-03-03 23:18:58,815 fedbiomed DEBUG - upload (HTTP POST request) of file /Users/mlorenzi/works/temp/fedbiomed/var

Let's start the experiment.

By default, this function doesn't stop until all `round_limit` rounds are done for all the nodes.

In [16]:
exp.run()

2022-03-03 23:18:58,873 fedbiomed INFO - Sampled nodes in round 0 ['node_583b7f60-63bb-4d67-9f85-66b65b24179f']
2022-03-03 23:18:58,873 fedbiomed INFO - Send message to node node_583b7f60-63bb-4d67-9f85-66b65b24179f - {'researcher_id': 'researcher_939c8feb-35dd-42ba-ae62-221123361b5f', 'job_id': 'dfa14066-9e15-40d0-9b71-db12e29e0b3a', 'training_args': {'batch_size': 48, 'lr': 0.001, 'fedprox_mu': 0.01, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {'DP': {'C': 0.005, 'sigma': 0.005}}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/03/03/my_model_d61dca64-7189-4092-bc21-c7847a618fad.py', 'params_url': 'http://localhost:8844/media/uploads/2022/03/03/aggregated_params_init_ece0914f-9c80-42a4-a262-ac793c83cafb.pt', 'model_class': 'MyTrainingPlan', 'training_data': {'node_583b7f60-63bb-4d67-9f85-66b65b24179f': ['dataset_2e82a5a5-a186-447a-8dc8-54d4baa018df']}}
2022-03-03 23:18:58,874 fedbiomed DEBUG - researcher_939c8feb-35dd-42ba-ae62-221123

2022-03-03 23:19:14,485 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - [TASKS QUEUE] Item:{'researcher_id': 'researcher_939c8feb-35dd-42ba-ae62-221123361b5f', 'job_id': 'dfa14066-9e15-40d0-9b71-db12e29e0b3a', 'params_url': 'http://localhost:8844/media/uploads/2022/03/03/aggregated_params_025820ce-c142-4bae-8ac0-5652f5709609.pt', 'training_args': {'batch_size': 48, 'lr': 0.001, 'fedprox_mu': 0.01, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'training_data': {'node_583b7f60-63bb-4d67-9f85-66b65b24179f': ['dataset_2e82a5a5-a186-447a-8dc8-54d4baa018df']}, 'model_args': {'DP': {'C': 0.005, 'sigma': 0.005}}, 'model_url': 'http://localhost:8844/media/uploads/2022/03/03/my_model_d61dca64-7189-4092-bc21-c7847a618fad.py', 'model_class': 'MyTrainingPlan', 'command': 'train'}
2022-03-03 23:19:14,503 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - upload (HTTP GET request) of file my_model_07a1c26d96224f579f13689f006adee4.py su

2022-03-03 23:19:30,254 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / INFO - training with arguments {'monitor': <fedbiomed.node.history_monitor.HistoryMonitor object at 0x140741f70>, 'node_args': {'gpu': False, 'gpu_num': None, 'gpu_only': False}, 'batch_size': 48, 'lr': 0.001, 'fedprox_mu': 0.01, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}
2022-03-03 23:19:30,255 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - Dataset path has been set as/Users/mlorenzi/works/test
2022-03-03 23:19:30,257 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - Using device cpu for training (cuda_available=False, gpu=False, gpu_only=False, use_gpu=False, gpu_num=None)
2022-03-03 23:19:38,333 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f / DEBUG - Reached 100 batches for this epoch, ignore remaining data
2022-03-03 23:19:38,335 fedbiomed INFO - log from: node_583b7f60-63bb-4d67-9f85-66b65b24179f

3

Local training results for each round and each node are available via `exp.training_replies()` (indexed from 0 to `rounds` - 1).

For example, you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training`: real time (clock time) spent in the training function on the node
- `ptime_training`: process time (user and system CPU) spent in the training function on the node
- `rtime_total`: real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
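
The difference between real time and process time can be reproduced with Python's standard `time` module. This sketch is only an illustration of the two notions and does not claim to show how Fed-BioMed measures them; the `timed` helper is a hypothetical name.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, timings) with real and process time in seconds."""
    real_start = time.perf_counter()     # wall-clock timer
    cpu_start = time.process_time()      # user + system CPU time of this process
    result = fn(*args, **kwargs)
    timing = {
        'rtime': time.perf_counter() - real_start,
        'ptime': time.process_time() - cpu_start,
    }
    return result, timing


# Sleeping takes real time but almost no CPU time.
_, sleep_timing = timed(time.sleep, 0.05)

# A busy loop consumes CPU time for most of its real time.
_, busy_timing = timed(lambda: sum(i * i for i in range(200_000)))

print("sleep:", sleep_timing)
print("busy: ", busy_timing)
```

Process time counts CPU time across all threads of the process, so it can exceed real time when work is spread over several cores. That would be consistent with `ptime_training` being larger than `rtime_training` in the output below.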

In [10]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
for c in range(len(round_data)):
    print("\t- {id} :\
    \n\t\trtime_training={rtraining:.2f} seconds\
    \n\t\tptime_training={ptraining:.2f} seconds\
    \n\t\trtime_total={rtotal:.2f} seconds".format(id = round_data[c]['node_id'],
        rtraining = round_data[c]['timing']['rtime_training'],
        ptraining = round_data[c]['timing']['ptime_training'],
        rtotal = round_data[c]['timing']['rtime_total']))
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()


List the training rounds :  dict_keys([0, 1, 2])

List the nodes for the last training round and their timings : 
	- node_583b7f60-63bb-4d67-9f85-66b65b24179f :    
		rtime_training=8.19 seconds    
		ptime_training=11.03 seconds    
		rtime_total=15.01 seconds




| | success | msg | dataset_id | node_id | params_path | params | timing |
|---|---|---|---|---|---|---|---|
| 0 | True | | dataset_2e82a5a5-a186-447a-8dc8-54d4baa018df | node_583b7f60-63bb-4d67-9f85-66b65b24179f | /Users/mlorenzi/works/temp/fedbiomed/var/exper... | {'conv1.weight': [[tensor([[ 0.1881, -0.0455, ... | {'rtime_training': 8.194219430999965, 'ptime_t... |


Federated parameters for each round are available via `exp.aggregated_params()` (indexed from 0 to `rounds` - 1).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D