# Fedbiomed Researcher base example

Use this cell during development: it automatically reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running the experiment:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm the default tags by typing "y" and pressing ENTER
  * Pick the folder where MNIST is downloaded (this step is needed because of the torch issue https://github.com/pytorch/vision/issues/3549)
  * The data must have been added (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node run`. Wait until you see `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a `MyTrainingPlan` class, built from `torch.nn` modules, to send to the node for training.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # used as `F` in forward()
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super(MyTrainingPlan, self).__init__(model_args)
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms"]
        
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        
        
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size = 48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
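
The input size of `fc1` (9216) in the training plan above follows from the layer shapes. The sketch below redoes that arithmetic in plain Python, assuming 28x28 single-channel MNIST images and the standard output-size formula for unpadded, undilated convolutions. The helper names `conv2d_out` and `max_pool_out` are illustrative and are not part of Fed-BioMed or PyTorch.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a 2D convolution (no dilation) along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_out(size: int, kernel: int) -> int:
    """Output size of max pooling with stride equal to the kernel size and no padding."""
    return (size - kernel) // kernel + 1


side = 28                     # assumed MNIST input: 28x28, one channel
side = conv2d_out(side, 3)    # conv1: nn.Conv2d(1, 32, 3, 1)  -> 26
side = conv2d_out(side, 3)    # conv2: nn.Conv2d(32, 64, 3, 1) -> 24
side = max_pool_out(side, 2)  # F.max_pool2d(x, 2)             -> 12

# conv2 produces 64 channels, so flattening yields 64 * 12 * 12 features.
flattened_features = 64 * side * side
print(flattened_features)  # 9216, the in_features of fc1
```

If the kernel sizes or the input resolution change, `fc1` has to be resized accordingly.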


The two dictionaries defined in the next cell are:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** Typos or missing required (positional) arguments will raise an error. 🤓

In [2]:
model_args = {}

training_args = {
    'batch_size': 48, 
    'lr': 1e-3, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
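
As a quick sanity check on the values above, the sketch below computes how many samples the `batch_maxnum` cap allows, using the formula from the inline comment (`batch_maxnum * batch_size`). It also includes a small typo check. `KNOWN_KEYS` and both helper functions are hypothetical: they only list the keys used in this notebook and do not reflect the full set of arguments Fed-BioMed accepts or how it validates them.

```python
training_args = {
    'batch_size': 48,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100,
}

# Hypothetical: only the keys used in this notebook, not an authoritative list.
KNOWN_KEYS = {'batch_size', 'lr', 'epochs', 'dry_run', 'batch_maxnum'}


def max_samples(args: dict) -> int:
    """Upper bound on samples used when batch_maxnum limits the number of batches."""
    return args['batch_maxnum'] * args['batch_size']


def unknown_keys(args: dict) -> list:
    """Return keys that are not in KNOWN_KEYS, e.g. because of a typo."""
    return sorted(set(args) - KNOWN_KEYS)


print(max_samples(training_args))   # 4800
print(unknown_keys(training_args))  # []
```

Running such a check on the researcher side catches a misspelled key before the request is sent to the node.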

## Declare and run the experiment

In the experiment, we set `tensorboard` to `True` so that training/testing feedback can be displayed in the TensorBoard app.

In [3]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 model_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None,
                 tensorboard=True
                )

2022-03-22 15:03:09,842 fedbiomed INFO - Component environment:
2022-03-22 15:03:09,844 fedbiomed INFO - type = ComponentType.RESEARCHER
2022-03-22 15:03:10,096 fedbiomed INFO - Messaging researcher_1fecd236-3507-4a58-9921-8d364492a6d1 successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7f840f8d2190>
2022-03-22 15:03:10,128 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2022-03-22 15:03:20,167 fedbiomed INFO - Node selected for training -> node_fa6a1655-e676-42a8-a6a5-fe2630057d46
2022-03-22 15:03:20,200 fedbiomed DEBUG - Model file has been saved: /home/scansiz/Desktop/Inria/development/fedbiomed/var/experiments/Experiment_0025/my_model_69b4088c-ec8c-4c54-ab8d-9810fcb3c1bd.py
2022-03-22 15:03:20,234 fedbiomed DEBUG - upload (HTTP POST request) of file /home/scansiz/Desktop/Inria/development/fedbiomed/var/experiments/Experiment_0025/my_model_69b4088c-ec8c-4c54-ab8d-9810fcb3c1bd.py successful, 
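
The experiment above uses `FedAverage()` as its aggregator. As a rough illustration of what federated averaging does in general, the sketch below averages per-node parameters weighted by the number of samples each node trained on. This is a plain-Python sketch of the textbook FedAvg rule, not Fed-BioMed's implementation. The function `federated_average`, the parameter layout (name mapped to a flat list of floats), and the sample-count weighting are assumptions made for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # assumed layout: parameter name -> flat list of values


def federated_average(node_params: List[Params], sample_counts: List[int]) -> Params:
    """Weighted average of node parameters, with weights proportional to sample counts."""
    if not node_params or len(node_params) != len(sample_counts):
        raise ValueError("need one sample count per node and at least one node")
    total = sum(sample_counts)
    weights = [count / total for count in sample_counts]

    averaged: Params = {}
    for name, reference in node_params[0].items():
        averaged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, node_params))
            for i in range(len(reference))
        ]
    return averaged


# Two hypothetical nodes holding 100 and 300 samples.
node_a = {"fc2.bias": [0.0, 2.0]}
node_b = {"fc2.bias": [4.0, 6.0]}
result = federated_average([node_a, node_b], [100, 300])
print(result)  # {'fc2.bias': [3.0, 5.0]}
```

With a single node, as in this notebook, the weighted average reduces to that node's own parameters.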

Let's start the experiment. During training, the logs coming from the nodes are categorized as `TRAINING` and `TESTING`.

In [4]:
exp.run()

2022-03-22 15:03:20,488 fedbiomed INFO - Sampled nodes in round 0 ['node_fa6a1655-e676-42a8-a6a5-fe2630057d46']
2022-03-22 15:03:20,489 fedbiomed INFO - Send message to node node_fa6a1655-e676-42a8-a6a5-fe2630057d46 - {'researcher_id': 'researcher_1fecd236-3507-4a58-9921-8d364492a6d1', 'job_id': '64aa60f1-f826-44a8-97c9-a7b43307a986', 'training_args': {'batch_size': 48, 'lr': 0.001, 'epochs': 1, 'dry_run': False, 'batch_maxnum': 100}, 'model_args': {}, 'command': 'train', 'model_url': 'http://localhost:8844/media/uploads/2022/03/22/my_model_69b4088c-ec8c-4c54-ab8d-9810fcb3c1bd.py', 'params_url': 'http://localhost:8844/media/uploads/2022/03/22/aggregated_params_init_0fecda6f-5bd1-41f3-8a4b-c2139a4d3bda.pt', 'model_class': 'MyTrainingPlan', 'training_data': {'node_fa6a1655-e676-42a8-a6a5-fe2630057d46': ['dataset_9c41414b-1bf1-4783-a437-cbe7dec0ae78']}}
2022-03-22 15:03:20,490 fedbiomed DEBUG - researcher_1fecd236-3507-4a58-9921-8d364492a6d1
2022-03-22 15:03:20,603 fedbiomed INFO - log fr

2022-03-22 15:03:57,255 fedbiomed INFO - TRAINING 
					 NODE_ID: node_fa6a1655-e676-42a8-a6a5-fe2630057d46 
					 Epoch: 1 | Completed: 4320/48000 (9%) 
 					 Loss: 0.188302 
					 ---------
2022-03-22 15:04:04,290 fedbiomed INFO - TESTING AFTER TRAINING 
					 NODE_ID: node_fa6a1655-e676-42a8-a6a5-fe2630057d46 
					 Completed: 12000/12000 (100%) 
 					 ACCURACY: 0.962500 
					 ---------
2022-03-22 15:04:04,507 fedbiomed INFO - log from: node_fa6a1655-e676-42a8-a6a5-fe2630057d46 / INFO - results uploaded successfully 
2022-03-22 15:04:10,896 fedbiomed INFO - Downloading model params after training on node_fa6a1655-e676-42a8-a6a5-fe2630057d46 - from http://localhost:8844/media/uploads/2022/03/22/node_params_866c7ecc-5fc8-4a19-aba1-82fa04ff7ef5.pt
2022-03-22 15:04:10,943 fedbiomed DEBUG - upload (HTTP GET request) of file node_params_9ed1a1b0-18d2-4309-926b-3055dd302994.pt successful, with status code 200
2022-03-22 15:04:10,959 fedbiomed INFO - Nodes

2

### Displaying Training and Testing Metrics on Tensorboard

In [5]:
from fedbiomed.researcher.environ import environ
tensorboard_dir = environ['TENSORBOARD_RESULTS_DIR']
%load_ext tensorboard

#### Launch Tensorboard App

In [6]:
tensorboard --logdir "$tensorboard_dir"

Reusing TensorBoard on port 6006 (pid 269009), started 1 day, 3:34:35 ago. (Use '!kill 269009' to kill it.)

2022-03-22 15:05:02,996 fedbiomed INFO - log from: node_fa6a1655-e676-42a8-a6a5-fe2630057d46 / CRITICAL - Node stopped in signal_handler, probably by user decision (Ctrl C)
2022-03-22 16:07:17,422 fedbiomed INFO - log from: test_logger_node_23b043e7-84aa-4068-87f9-c8ec310d4a95 / ERROR - mqtt+console ERROR message
2022-03-22 16:07:25,535 fedbiomed INFO - log from: node_1234 / INFO - Messaging mock_researcher_XXX successfully connected to the message broker, object = <fedbiomed.common.messaging.Messaging object at 0x7fe10bb14880>
2022-03-22 16:07:27,621 fedbiomed INFO - log from: node_1234 / INFO - Controlling Models Dir
2022-03-22 16:07:27,622 fedbiomed INFO - log from: node_1234 / INFO - /tmp/_nod_/default_models
2022-03-22 16:07:29,217 fedbiomed INFO - log from: node_1234 / INFO - {'name': 'pytorch-mnist.txt', 'description': 'Default model', 'hash': '006b7adac048184fbf96f62b87d0fd491278a61079a9e32f758100f99e951f8b', 'model_path': '/tmp/_nod_/default_models/pytorch-mnist.txt', 'model_i

2022-03-22 16:07:30,210 fedbiomed INFO - log from: node_1234 / INFO - Recreating hashing for : monai-image-registration.txt 	 model_fe4dd25f-dc25-4f98-8161-4b63e22031f9
2022-03-22 16:07:30,249 fedbiomed INFO - log from: node_1234 / INFO - Recreating hashing for : sklearn-perceptron.txt 	 model_5eb7a903-1fd9-4776-bede-a804c868e07f
2022-03-22 16:07:30,253 fedbiomed INFO - log from: node_1234 / INFO - Recreating hashing for : pytorch-celaba.txt 	 model_cc1424d0-6fa6-4877-940d-eb177d4bdd8c
2022-03-22 16:07:30,285 fedbiomed INFO - log from: node_1234 / INFO - Recreating hashing for : pytorch-usedcars.txt 	 model_c1ab805b-7f72-4f02-bc44-9f6b0fa78472
2022-03-22 16:07:30,307 fedbiomed INFO - log from: node_1234 / INFO - Recreating hashing for : variational-autoencoder.txt 	 model_d16c4e74-22e9-46c1-a1ed-febadb8c8e01
2022-03-22 16:07:30,451 fedbiomed INFO - log from: node_1234 / INFO - Recreating hashing for : pytorch-csv.txt 	 model_f2e4b95c-c532-4788-9c96-cfb05f854e0c
2022-03-22 16:07:30,473 

2022-03-22 16:07:31,590 fedbiomed INFO - log from: node_1234 / INFO - Removed default model file has been detected, it will be removed from DB as well: test-model
2022-03-22 16:07:31,765 fedbiomed INFO - log from: node_1234 / INFO - Modified default model file has been detected. Hashing will be updated for: monai-image-classification.txt
2022-03-22 16:07:31,793 fedbiomed INFO - log from: node_1234 / INFO - Modified default model file has been detected. Hashing will be updated for: sklearn-sgdregressor.txt
2022-03-22 16:07:31,815 fedbiomed INFO - log from: node_1234 / INFO - Modified default model file has been detected. Hashing will be updated for: pytorch-mnist.txt
2022-03-22 16:07:31,837 fedbiomed INFO - log from: node_1234 / INFO - Modified default model file has been detected. Hashing will be updated for: pytorch-mnist-opacus-dp.txt
2022-03-22 16:07:31,881 fedbiomed INFO - log from: node_1234 / INFO - Modified default model file has been detected. Hashing will be updated for: pytor