# Fedbiomed Researcher Listing Datasets and Selecting Particular Nodes

Use for developing (autoreloads changes made across packages)

In [None]:
%load_ext autoreload
%autoreload 2

## Setting the client up
Several nodes must be configured beforehand:
1. `./scripts/fedbiomed_run node config config-n1.ini add`
  * Select option 2 (default) to add MNIST to the client
  * Confirm default tags by hitting "y" and ENTER
  * Pick the folder where MNIST is downloaded (this is needed because of the torch issue https://github.com/pytorch/vision/issues/3549)
  * The data must have been added (a warning saying that data must be unique means it has already been added)
  * Start node with `./scripts/fedbiomed_run node config config-n1.ini start`
  
2. Add data to the second node:
    * Open a new terminal and create a new node, pointing it to the MNIST dataset that you already downloaded
    `./scripts/fedbiomed_run node config config-n2.ini --add-mnist path/to/your/mnist/data`
    * Start node: `./scripts/fedbiomed_run node config config-n2.ini start`
3. Add a third node by following the same instructions as in step 2.

## Create a Model and an Experiment

Declare a `MyTrainingPlan` class (a `torch.nn` model) that will be sent to the nodes for training.

In [1]:
from fedbiomed.researcher.environ import TMP_DIR
import tempfile
tmp_dir_model = tempfile.TemporaryDirectory(dir=TMP_DIR+'/')
model_file = tmp_dir_model.name + '/class_export_mnist.py'

Note : write **only** the code to export in the following cell

In [2]:
%%writefile "$model_file"

import torch
import torch.nn as nn
import torch.nn.functional as F  # `F` is used in forward() below
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms",
               "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss


Writing /home/scansiz/Desktop/Inria/development/fedbiomed/var/tmp/tmp31ushc4w/class_export_mnist.py
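
The input size of `fc1` (9216) in the training plan above follows from the shapes produced by the two convolutions and the pooling layer. The sketch below reproduces that arithmetic in plain Python; the helper `conv2d_out` is a hypothetical name introduced only for this illustration, and it assumes no padding and a pooling stride equal to the pooling kernel size (the PyTorch default for `max_pool2d`).

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one spatial dimension of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                     # MNIST images are 28x28
side = conv2d_out(side, kernel=3)             # conv1: 28 -> 26
side = conv2d_out(side, kernel=3)             # conv2: 26 -> 24
side = conv2d_out(side, kernel=2, stride=2)   # max_pool2d(x, 2): 24 -> 12

channels = 64                                 # output channels of conv2
flat_features = channels * side * side        # size after torch.flatten(x, 1)
print(flat_features)  # 9216
```

If you change the kernel sizes or the number of channels, the first argument of `nn.Linear` in `fc1` has to be updated accordingly.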


### List Dataset Available in Nodes

You can easily list the datasets available on online nodes using the `list()` method of the `Requests` class.

**Arguments**
* `verbose`: prints the list of datasets in table format
* `client`: list of client IDs; only the datasets of the given clients are listed
 
 

In [4]:
from fedbiomed.researcher.requests import Requests

req = Requests()
datasets = req.list(verbose=True)


2021-10-15 16:39:27,054 fedbiomed INFO - Messaging researcher_8e2ef743-72d6-49da-a496-b4ca24495c50 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f73c8575880>
2021-10-15 16:39:27,086 fedbiomed INFO - Listing avaialbe dataset in nodes: 
2021-10-15 16:39:27,088 fedbiomed INFO - log from: client_f55bd856-ff6c-4fd0-b3d1-50910564ff4b - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'command': 'list'}
2021-10-15 16:39:27,089 fedbiomed INFO - log from: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'command': 'list'}
2021-10-15 16:39:27,089 fedbiomed INFO - log from: client_40369a72-c962-455d-ba8a-a88fb7d10154 - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'command': 'list'}
2021-10-15 16:39:27,090 fedbiomed INFO - log from: client_32f6445c-b1fa-4eb4-8


 Node: client_40369a72-c962-455d-ba8a-a88fb7d10154 | Number of Datasets: 1
+--------+-------------+------------------------+----------------+--------------------+
| name   | data_type   | tags                   | description    | shape              |
| MNIST  | default     | ['#MNIST', '#dataset'] | MNIST database | [60000, 1, 28, 28] |
+--------+-------------+------------------------+----------------+--------------------+

 Node: client_f55bd856-ff6c-4fd0-b3d1-50910564ff4b | Number of Datasets: 0
 No data has been set up for this node.

 Node: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 | Number of Datasets: 1
+--------+-------------+------------------------+----------------+--------------------+
| name   | data_type   | tags                   | description    | shape              |
| MNIST  | default     | ['#MNIST', '#dataset'] | MNIST database | [60000, 1, 28, 28] |
+--------+-------------+------------------------+----------------+--------------------+

 Node: client_32f6445c-b1fa

You can also access this information from the object returned by the `list()` method.

In [8]:
print('Datasets -----------------------------  ')
print(datasets)
print('Node ids -----------------------------  ')
print(datasets.keys())


Datasets -----------------------------  
{'client_40369a72-c962-455d-ba8a-a88fb7d10154': [{'name': 'MNIST', 'data_type': 'default', 'tags': ['#MNIST', '#dataset'], 'description': 'MNIST database', 'shape': [60000, 1, 28, 28]}], 'client_f55bd856-ff6c-4fd0-b3d1-50910564ff4b': [], 'client_9c1defaa-9967-4919-9277-7a6ccffc19f2': [{'name': 'MNIST', 'data_type': 'default', 'tags': ['#MNIST', '#dataset'], 'description': 'MNIST database', 'shape': [60000, 1, 28, 28]}], 'client_32f6445c-b1fa-4eb4-845a-5102418a9165': [{'name': 'MNIST', 'data_type': 'default', 'tags': ['#MNIST', '#dataset'], 'description': 'MNIST database', 'shape': [60000, 1, 28, 28]}]}
Node ids -----------------------------  
dict_keys(['client_40369a72-c962-455d-ba8a-a88fb7d10154', 'client_f55bd856-ff6c-4fd0-b3d1-50910564ff4b', 'client_9c1defaa-9967-4919-9277-7a6ccffc19f2', 'client_32f6445c-b1fa-4eb4-845a-5102418a9165'])


You can create a list containing the IDs of the nodes on which you want to run your experiment. You then need to initialize your Experiment with this node (client) ID list.

In [9]:
# WARNING: Please change values based on your listing result
clients = ['client_40369a72-c962-455d-ba8a-a88fb7d10154', 'client_9c1defaa-9967-4919-9277-7a6ccffc19f2']
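
Instead of copying IDs by hand, the list could also be derived from the dictionary returned by `list()`. The following is a minimal sketch that assumes the structure shown in the output above (node ID mapped to a list of dataset descriptions); `sample_listing` uses shortened placeholder IDs and `nodes_with_data` is a hypothetical helper, not part of Fed-BioMed.

```python
# Placeholder data with the same structure as the `datasets` dictionary above
sample_listing = {
    "client_a": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
    "client_b": [],  # a node with no dataset set up
    "client_c": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
}

def nodes_with_data(listing: dict) -> list:
    """Return the IDs of the nodes that report at least one dataset."""
    return [node_id for node_id, node_datasets in listing.items() if node_datasets]

selected = nodes_with_data(sample_listing)
print(selected)  # ['client_a', 'client_c']
```

With a real listing you would pass `datasets` instead of `sample_listing`, and possibly slice the result to keep only a subset of nodes.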

The following arguments are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the client side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the client side.

**NOTE:** typos and/or missing positional (required) arguments will raise an error. 🤓

In [10]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
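
To get a feel for the effect of `batch_maxnum`, the sketch below computes how many samples a node would see per epoch. It assumes that training simply stops after `batch_maxnum` batches, as the comment in the cell above suggests; `samples_per_epoch` is a hypothetical helper used only for this illustration.

```python
def samples_per_epoch(dataset_size: int, batch_size: int, batch_maxnum: int) -> int:
    """Upper bound on the samples used in one epoch when at most batch_maxnum batches are run."""
    return min(dataset_size, batch_size * batch_maxnum)

# MNIST training set (60000 samples) with the training_args defined above
print(samples_per_epoch(60000, 48, 100))  # 4800
```

With these settings each node trains on at most 4800 of its 60000 samples per epoch, which keeps development runs short.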

Define an experiment
- search nodes serving data for these `tags`, optionally filter on a list of client IDs with `clients`
- run a round of local training on nodes with model defined in `model_path` + federation with `aggregator`
- run for `rounds` rounds, applying the `client_selection_strategy` between the rounds

In [11]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 clients=clients,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 client_selection_strategy=None)

2021-10-15 16:50:12,611 fedbiomed INFO - Searching for clients with data tags: ['#MNIST', '#dataset']
2021-10-15 16:50:12,614 fedbiomed INFO - log from: client_f55bd856-ff6c-4fd0-b3d1-50910564ff4b - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'tags': ['#MNIST', '#dataset'], 'command': 'search'}
2021-10-15 16:50:12,616 fedbiomed INFO - log from: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'tags': ['#MNIST', '#dataset'], 'command': 'search'}
2021-10-15 16:50:12,617 fedbiomed INFO - log from: client_32f6445c-b1fa-4eb4-845a-5102418a9165 - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'tags': ['#MNIST', '#dataset'], 'command': 'search'}
2021-10-15 16:50:12,617 fedbiomed INFO - log from: client_40369a72-c962-455d-ba8a-a88fb7d10154 - DEBUG Message received: {'researcher_id': 'researcher_8e2ef743-72d6-49da
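
The `aggregator` passed to the experiment combines the parameters trained on each node into a single set of federated parameters. The sketch below illustrates the general idea of federated averaging on plain Python lists. It is not the code of `FedAverage`; the function name, the parameter layout and the choice of weights are assumptions made for this example.

```python
def federated_average(node_params: list, node_weights: list) -> dict:
    """Weighted average of per-node parameter dictionaries.

    node_params:  one {parameter_name: list_of_floats} dict per node
    node_weights: one non-negative weight per node (e.g. its sample count)
    """
    total = sum(node_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    averaged = {}
    for name in node_params[0]:
        length = len(node_params[0][name])
        averaged[name] = [
            sum(w * params[name][i] for params, w in zip(node_params, node_weights)) / total
            for i in range(length)
        ]
    return averaged

# Two hypothetical nodes with equal weights
node_1 = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
node_2 = {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}
print(federated_average([node_1, node_2], [1, 1]))
# {'fc.weight': [2.0, 3.0], 'fc.bias': [0.5]}
```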

Let's start the experiment.

By default, this function doesn't stop until all the `rounds` are done for all the clients

In [12]:
exp.run()

2021-10-15 16:50:27,290 fedbiomed INFO - Sampled clients in round 0 ['client_9c1defaa-9967-4919-9277-7a6ccffc19f2', 'client_40369a72-c962-455d-ba8a-a88fb7d10154']
2021-10-15 16:50:27,293 fedbiomed INFO - Send message to client client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - {'researcher_id': 'researcher_8e2ef743-72d6-49da-a496-b4ca24495c50', 'job_id': 'ef13c92d-743c-4343-8c8a-ed67b3348945', 'training_args': {'batch_size': 48, 'lr': 0.001, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2021/10/15/my_model_253c24ea-e357-4376-8f4e-607b6f3cf532.py', 'params_url': 'http://localhost:8844/media/uploads/2021/10/15/my_model_d36ad93c-14e9-474f-989a-f6587a0889b7.pt', 'model_class': 'MyTrainingPlan', 'training_data': {'client_9c1defaa-9967-4919-9277-7a6ccffc19f2': ['dataset_2f7087c6-61e7-4219-8765-7dde9a58363c']}}
2021-10-15 16:50:27,295 fedbiomed DEBUG - researcher_8e2ef743-72d6-49da-a496-b4ca24495c50
2021

2021-10-15 16:50:36,931 fedbiomed INFO - log from: client_40369a72-c962-455d-ba8a-a88fb7d10154 - DEBUG Reached 100 batches for this epoch, ignore remaining data
2021-10-15 16:50:37,020 fedbiomed INFO - log from: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - DEBUG Reached 100 batches for this epoch, ignore remaining data
2021-10-15 16:50:37,170 fedbiomed INFO - log from: client_40369a72-c962-455d-ba8a-a88fb7d10154 - INFO results uploaded successfully 
2021-10-15 16:50:37,188 fedbiomed INFO - log from: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - INFO results uploaded successfully 
2021-10-15 16:50:42,344 fedbiomed INFO - Downloading model params after training on client_40369a72-c962-455d-ba8a-a88fb7d10154 - from http://localhost:8844/media/uploads/2021/10/15/node_params_0fb68f49-390b-4dcc-805f-91452862fc36.pt
2021-10-15 16:50:42,402 fedbiomed INFO - Downloading model params after training on client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - from http://localhost:8844/media/uploads/2021/10

2021-10-15 16:50:51,519 fedbiomed INFO - log from: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - DEBUG Reached 100 batches for this epoch, ignore remaining data
2021-10-15 16:50:51,724 fedbiomed INFO - log from: client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - INFO results uploaded successfully 
2021-10-15 16:50:52,094 fedbiomed INFO - log from: client_40369a72-c962-455d-ba8a-a88fb7d10154 - DEBUG Reached 100 batches for this epoch, ignore remaining data
2021-10-15 16:50:52,259 fedbiomed INFO - log from: client_40369a72-c962-455d-ba8a-a88fb7d10154 - INFO results uploaded successfully 
2021-10-15 16:50:57,641 fedbiomed INFO - Downloading model params after training on client_9c1defaa-9967-4919-9277-7a6ccffc19f2 - from http://localhost:8844/media/uploads/2021/10/15/node_params_05f1e5c9-ee0f-4140-9251-6a65f2273416.pt
2021-10-15 16:50:57,704 fedbiomed INFO - Downloading model params after training on client_40369a72-c962-455d-ba8a-a88fb7d10154 - from http://localhost:8844/media/uploads/2021/10

Local training results for each round and each node are available in `exp.training_replies` (index 0 to (`rounds` - 1) ).

For example you can view the training results for the last round below.

Different timings (in seconds) are reported for each dataset of a node participating in a round :
- `rtime_training` real time (clock time) spent in the training function on the node
- `ptime_training` process time (user and system CPU) spent in the training function on the node
- `rtime_total` real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer

In [None]:
print("\nList the training rounds : ", exp.training_replies.keys())

print("\nList the clients for the last training round and their timings : ")
round_data = exp.training_replies[rounds - 1].data
for reply in round_data:
    timing = reply['timing']
    print(f"\t- {reply['client_id']} :"
          f"\n\t\trtime_training={timing['rtime_training']:.2f} seconds"
          f"\n\t\tptime_training={timing['ptime_training']:.2f} seconds"
          f"\n\t\trtime_total={timing['rtime_total']:.2f} seconds")
print('\n')
    
exp.training_replies[rounds - 1].dataframe
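
The difference between real time and process time reported above can be seen with Python's standard timers. This is only an illustrative sketch: whether Fed-BioMed measures its timings with these exact functions is an assumption, and `timed` is a hypothetical helper.

```python
import time

def timed(func):
    """Run func() and return (real_seconds, cpu_seconds)."""
    real_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    return time.perf_counter() - real_start, time.process_time() - cpu_start

# Sleeping consumes real time but almost no CPU time
real, cpu = timed(lambda: time.sleep(0.2))
print(f"real={real:.2f}s cpu={cpu:.2f}s")
```

A large gap between the two values on a node usually points to time spent waiting (I/O, sleeping, other processes) rather than computing.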

Federated parameters for each round are available in `exp.aggregated_params` (index 0 to (`rounds` - 1) ).

For example you can view the federated parameters for the last round of the experiment :

In [None]:
print("\nList the training rounds : ", exp.aggregated_params.keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params[rounds - 1]['params'].keys())


## Optional : searching the data

In [None]:
import pandas as pd
from fedbiomed.researcher.requests import Requests

r = Requests()
data = r.search(tags)

# Show the matching datasets of each node as a table
for client_id in data.keys():
    print('\n','Data for ', client_id, '\n\n', pd.DataFrame(data[client_id]))
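
To make the idea of a tag search more concrete, the sketch below filters a listing-style dictionary by tags on the researcher side. It assumes that a dataset matches when it carries every requested tag; this matching rule, the placeholder data and the helper `match_tags` are assumptions for illustration and do not describe how the nodes actually answer a `search` request.

```python
def match_tags(listing: dict, tags: list) -> dict:
    """Keep, for each node, the datasets that carry every requested tag."""
    wanted = set(tags)
    matches = {}
    for node_id, node_datasets in listing.items():
        found = [d for d in node_datasets if wanted.issubset(d.get("tags", []))]
        if found:
            matches[node_id] = found
    return matches

# Placeholder listing with shortened node IDs
listing = {
    "client_a": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
    "client_b": [],
    "client_c": [{"name": "other", "tags": ["#dataset"]}],
}
print(list(match_tags(listing, ["#MNIST", "#dataset"])))  # ['client_a']
```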

## Optional : clean file repository (do not run unless necessary)
Clean all the files in the repository via the REST API.

In [None]:
# import requests
# from fedbiomed.researcher.environ import UPLOADS_URL

# uploaded_models = requests.get(UPLOADS_URL).json()
# for m in uploaded_models:
#   requests.delete(m['url'])

Feel free to try your own models :D