# Fed-BioMed Researcher base example

Use this cell during development: it automatically reloads changes made across packages.

In [None]:
%load_ext autoreload
%autoreload 2

## Start the network
Before running this notebook, start the network with `./scripts/fedbiomed_run network`

## Setting the node up
A node must be configured before running this notebook:
1. `./scripts/fedbiomed_run node add`
  * Select option 2 (default) to add MNIST to the node
  * Confirm the default tags by typing "y" and pressing ENTER
  * Pick the folder where MNIST is downloaded (this is needed because of a PyTorch issue: https://github.com/pytorch/vision/issues/3549)
  * The data must have been added (a warning saying that data must be unique means it has already been added)
  
2. Check that your data has been added by executing `./scripts/fedbiomed_run node list`
3. Run the node using `./scripts/fedbiomed_run node start`. Wait until you get `Starting task manager`, which means the node is online.

## Define an experiment model and parameters

Declare a torch training plan class, `MyTrainingPlan`, to send to the node for training.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu, F.max_pool2d and F.log_softmax in Net.forward
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms


# Here we define the training plan to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    
    # Defines and returns the model
    def init_model(self, model_args):
        return self.Net(model_args = model_args)
    
    # Defines and returns the optimizer
    def init_optimizer(self, optimizer_args):
        return torch.optim.Adam(self.model().parameters(), lr = optimizer_args["lr"])
    
    # Declares and returns the dependencies
    def init_dependencies(self):
        deps = ["from torchvision import datasets, transforms"]
        return deps
    
    class Net(nn.Module):
        def __init__(self, model_args):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)

            output = F.log_softmax(x, dim=1)
            return output

    def training_data(self):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'shuffle': True}
        return DataManager(dataset=dataset1, **train_kwargs)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.nll_loss(output, target)
        return loss
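
The input size of `fc1` (9216) follows from the layer shapes in `Net`. The sketch below reproduces that arithmetic in plain Python, assuming 28x28 MNIST inputs and no padding; the helper name `conv_out` is hypothetical and is not part of Fed-BioMed or PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                    # MNIST images are 28x28 (assumption stated above)
side = conv_out(side, 3, 1)  # conv1: kernel 3, stride 1
side = conv_out(side, 3, 1)  # conv2: kernel 3, stride 1
side = conv_out(side, 2, 2)  # max_pool2d(x, 2): kernel 2, stride defaults to 2

flattened = 64 * side * side  # conv2 outputs 64 channels
print(side, flattened)
```

If you change a kernel size or the number of channels in `Net`, the first argument of `nn.Linear` in `fc1` has to be updated to match.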


These arguments are, respectively:
* `model_args`: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node side.
* `training_args`: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node side.

**NOTE:** typos or missing required (positional) arguments will raise an error. 🤓

In [2]:
model_args = {}

training_args = {
    'loader_args': { 'batch_size': 48, }, 
    'optimizer_args': {
        "lr" : 1e-3
    },
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
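
As a rough check of what `batch_maxnum` implies, the sketch below computes how many samples one epoch touches with these settings. It uses a separate dictionary (`example_args`, a hypothetical name) so it does not overwrite `training_args`, and it takes the MNIST training set size of 60000 from the dataset shape reported by the node.

```python
# Values copied from the training_args cell above.
example_args = {
    'loader_args': {'batch_size': 48},
    'epochs': 1,
    'batch_maxnum': 100,
}
dataset_size = 60000  # MNIST training samples

batch_size = example_args['loader_args']['batch_size']
batch_maxnum = example_args['batch_maxnum']

# With batch_maxnum set, an epoch stops after that many batches.
samples_per_epoch = min(batch_maxnum * batch_size, dataset_size)
# Without the cap, one epoch would need this many batches (ceiling division).
full_epoch_batches = -(-dataset_size // batch_size)

print(samples_per_epoch, full_epoch_batches)
```

The capped figure corresponds to the `Samples: .../4800` counter in the training logs of this run.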

## Declare and run the experiment

- search nodes serving data for these `tags`, optionally filtering on a list of node IDs with `nodes`
- run a round of local training on the nodes with the model defined in `training_plan_class`, then federate with `aggregator`
- run for `round_limit` rounds, applying the `node_selection_strategy` between rounds
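
To make the federation step more concrete, here is a minimal sketch of the general federated averaging idea: parameters returned by each node are combined into a weighted mean. This is not Fed-BioMed's `FedAverage` implementation; the function name `federated_average`, the use of plain Python lists instead of tensors, and the weighting by sample count are all assumptions made for illustration.

```python
def federated_average(node_params, node_sample_counts):
    """Weighted average of per-node parameter dicts (illustrative only)."""
    total = sum(node_sample_counts)
    averaged = {}
    for key in node_params[0]:
        length = len(node_params[0][key])
        averaged[key] = [
            sum(p[key][i] * n for p, n in zip(node_params, node_sample_counts)) / total
            for i in range(length)
        ]
    return averaged

# Two hypothetical nodes holding 100 and 300 samples.
params_a = {"w": [1.0, 2.0]}
params_b = {"w": [3.0, 6.0]}
result = federated_average([params_a, params_b], [100, 300])
print(result)
```

With a single node, as in this notebook's run, the average is simply that node's parameters.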

In [5]:
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MNIST', '#dataset']
rounds = 2

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)

2023-09-01 17:45:38,215 fedbiomed INFO - Searching dataset with data tags: ['#MNIST', '#dataset'] for all nodes
2023-09-01 17:45:38,221 fedbiomed INFO - Received request form node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:45:38,222 fedbiomed INFO - Node agent created node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:45:38,222 fedbiomed INFO - Waiting for tasks
2023-09-01 17:45:38,224 fedbiomed INFO - {'protocol_version': '1', 'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'success': True, 'databases': [{'name': 'MNIST', 'data_type': 'default', 'tags': ['#MNIST', '#dataset'], 'description': 'MNIST database', 'shape': [60000, 1, 28, 28], 'dataset_id': 'dataset_3013e72d-de68-42a6-bacf-adf1ab3af98b', 'dtypes': [], 'dataset_parameters': None}], 'node_id': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'count': 1, 'command': 'search'}


Sending request!
Received reply!!!!
Printing the class
<class 'fedbiomed.common.message.TaskRequest'>
<class 'fedbiomed.common.message.TaskRequest'>
Printing_dict
{'node': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'protocol_version': '1'}
{'protocol_version': '1', 'node': 'node_41533df5-d07b-4027-a826-d1f67410d627'}
<class 'dict'>
Executing on message


2023-09-01 17:45:48,223 fedbiomed INFO - Node selected for training -> node_41533df5-d07b-4027-a826-d1f67410d627
Secure RNG turned off. This is perfectly fine for experimentation as it allows for much faster training performance, but remember to turn it on and retrain one last time before production with ``secure_mode`` turned on.


Let's start the experiment.

By default, `exp.run()` does not return until all `round_limit` rounds have completed on all the nodes.

In [6]:
exp.run()

2023-09-01 17:45:48,241 fedbiomed INFO - Sampled nodes in round 0 ['node_41533df5-d07b-4027-a826-d1f67410d627']
2023-09-01 17:45:48,245 fedbiomed INFO - Sending request
					 To: node_41533df5-d07b-4027-a826-d1f67410d627
					 Request: : TRAIN
 -----------------------------------------------------------------
2023-09-01 17:45:48,328 fedbiomed INFO - Received request form node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:45:48,329 fedbiomed INFO - Node agent created node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:45:48,330 fedbiomed INFO - Waiting for tasks


Sending request!
Printing the class
<class 'fedbiomed.common.message.TaskRequest'>
<class 'fedbiomed.common.message.TaskRequest'>
Printing_dict
{'node': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'protocol_version': '1'}
{'protocol_version': '1', 'node': 'node_41533df5-d07b-4027-a826-d1f67410d627'}
<class 'dict'>


2023-09-01 17:45:48,563 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 1/100 (1%) | Samples: 48/4800
					 Loss: 2.288929
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 2.28892875
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 48
  iteration: 1
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 2.28892875
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 48
iteration: 1

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id': 'nod

2023-09-01 17:45:49,257 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 10/100 (10%) | Samples: 480/4800
					 Loss: 1.646553
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 1.6465534
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 480
  iteration: 10
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 1.6465534
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 480
iteration: 10

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id': 'n

2023-09-01 17:45:50,035 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 20/100 (20%) | Samples: 960/4800
					 Loss: 0.975171
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.975170612
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 960
  iteration: 20
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.975170612
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 960
iteration: 20

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:45:50,757 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 30/100 (30%) | Samples: 1440/4800
					 Loss: 0.962453
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.96245259
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 1440
  iteration: 30
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.96245259
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 1440
iteration: 30

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:45:51,620 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 40/100 (40%) | Samples: 1920/4800
					 Loss: 0.756580
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.756579697
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 1920
  iteration: 40
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.756579697
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 1920
iteration: 40

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:45:52,468 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 50/100 (50%) | Samples: 2400/4800
					 Loss: 1.224749
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 1.22474897
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 2400
  iteration: 50
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 1.22474897
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 2400
iteration: 50

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:45:53,201 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 60/100 (60%) | Samples: 2880/4800
					 Loss: 0.346564
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.346563548
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 2880
  iteration: 60
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.346563548
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 2880
iteration: 60

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:45:53,905 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 70/100 (70%) | Samples: 3360/4800
					 Loss: 0.469229
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.469229072
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 3360
  iteration: 70
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.469229072
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 3360
iteration: 70

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:45:54,698 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 80/100 (80%) | Samples: 3840/4800
					 Loss: 0.301287
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.301286697
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 3840
  iteration: 80
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.301286697
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 3840
iteration: 80

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:45:55,554 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 90/100 (90%) | Samples: 4320/4800
					 Loss: 0.168976
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.168975964
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 4320
  iteration: 90
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.168975964
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 4320
iteration: 90

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:45:56,419 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 1 Epoch: 1 | Iteration: 100/100 (100%) | Samples: 4800/4800
					 Loss: 0.281088
					 ---------
2023-09-01 17:45:56,472 fedbiomed INFO - {'protocol_version': '1', 'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'job_id': 'efb42ef3-6485-4797-8fd9-9517eeeb2da2', 'success': True, 'node_id': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'dataset_id': 'dataset_3013e72d-de68-42a6-bacf-adf1ab3af98b', 'timing': {'rtime_training': 7.9441012059978675, 'ptime_training': 31.405729196}, 'sample_size': 60000, 'msg': '', 'command': 'train', 'encrypted': False, 'params': {'conv1.weight': tensor([[[[-0.2975,  0.3334,  0.1305],
          [ 0.3259, -0.1353,  0.2768],
          [-0.0689,  0.2435,  0.3076]]],


        [[[-0.2072, -0.0828,  0.0129],
          [-0.0412, -0.1195, -0.2938],
          [ 0.0011,  0.2108, -0.0204]]],


        [[[-0.1287, 

protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.281087518
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 4800
  iteration: 100
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.281087518
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 4800
iteration: 100

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node

2023-09-01 17:46:03,273 fedbiomed INFO - Nodes that successfully reply in round 0 ['node_41533df5-d07b-4027-a826-d1f67410d627']
2023-09-01 17:46:03,289 fedbiomed INFO - Saved aggregated params for round 0 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0073/aggregated_params_c4335588-2734-44e3-a782-d4eb0d6392d9.mpk
2023-09-01 17:46:03,290 fedbiomed INFO - Sampled nodes in round 1 ['node_41533df5-d07b-4027-a826-d1f67410d627']
2023-09-01 17:46:03,292 fedbiomed INFO - Sending request
					 To: node_41533df5-d07b-4027-a826-d1f67410d627
					 Request: : TRAIN
 -----------------------------------------------------------------
2023-09-01 17:46:03,329 fedbiomed INFO - Received request form node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:46:03,329 fedbiomed INFO - Node agent created node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:46:03,330 fedbiomed INFO - Waiting for tasks
2023-09-01 17:46:03,486 fedbiomed INFO - TRAINI

Sending request!
Printing the class
<class 'fedbiomed.common.message.TaskRequest'>
<class 'fedbiomed.common.message.TaskRequest'>
Printing_dict
{'node': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'protocol_version': '1'}
{'protocol_version': '1', 'node': 'node_41533df5-d07b-4027-a826-d1f67410d627'}
<class 'dict'>
protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.386612147
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 48
  iteration: 1
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.386612147
}
epoc

2023-09-01 17:46:04,154 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 10/100 (10%) | Samples: 480/4800
					 Loss: 0.268395
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.268395096
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 480
  iteration: 10
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.268395096
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 480
iteration: 10

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:46:04,850 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 20/100 (20%) | Samples: 960/4800
					 Loss: 0.319451
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.319450587
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 960
  iteration: 20
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.319450587
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 960
iteration: 20

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:46:05,524 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 30/100 (30%) | Samples: 1440/4800
					 Loss: 0.154476
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.154475853
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 1440
  iteration: 30
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.154475853
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 1440
iteration: 30

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:46:06,323 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 40/100 (40%) | Samples: 1920/4800
					 Loss: 0.415504
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.415504247
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 1920
  iteration: 40
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.415504247
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 1920
iteration: 40

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:46:07,113 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 50/100 (50%) | Samples: 2400/4800
					 Loss: 0.262338
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.26233843
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 2400
  iteration: 50
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.26233843
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 2400
iteration: 50

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:46:07,817 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 60/100 (60%) | Samples: 2880/4800
					 Loss: 0.262114
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.26211378
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 2880
  iteration: 60
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.26211378
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 2880
iteration: 60

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_id'

2023-09-01 17:46:08,592 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627
					 Round 2 Epoch: 1 | Iteration: 70/100 (70%) | Samples: 3360/4800
					 Loss: 0.224867
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.224867105
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 3360
  iteration: 70
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.224867105
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 3360
iteration: 70

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:46:09,358 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627 
					 Round 2 Epoch: 1 | Iteration: 80/100 (80%) | Samples: 3840/4800
 					 Loss: 0.248535
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.248535097
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 3840
  iteration: 80
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.248535097
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 3840
iteration: 80

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:46:10,059 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627 
					 Round 2 Epoch: 1 | Iteration: 90/100 (90%) | Samples: 4320/4800
 					 Loss: 0.344932
					 ---------


protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.344931751
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 4320
  iteration: 90
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.344931751
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 4320
iteration: 90

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node_i

2023-09-01 17:46:10,847 fedbiomed INFO - TRAINING
					 NODE_ID: node_41533df5-d07b-4027-a826-d1f67410d627 
					 Round 2 Epoch: 1 | Iteration: 100/100 (100%) | Samples: 4800/4800
 					 Loss: 0.103830
					 ---------
2023-09-01 17:46:10,881 fedbiomed INFO - {'protocol_version': '1', 'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'job_id': 'efb42ef3-6485-4797-8fd9-9517eeeb2da2', 'success': True, 'node_id': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'dataset_id': 'dataset_3013e72d-de68-42a6-bacf-adf1ab3af98b', 'timing': {'rtime_training': 7.417088494999916, 'ptime_training': 29.450622437000003}, 'sample_size': 60000, 'msg': '', 'command': 'train', 'encrypted': False, 'params': {'conv1.weight': tensor([[[[-3.0680e-01,  3.3002e-01,  1.2871e-01],
          [ 3.1714e-01, -1.4004e-01,  2.7606e-01],
          [-8.2937e-02,  2.3947e-01,  3.1087e-01]]],


        [[[-2.1010e-01, -7.8932e-02,  1.9804e-02],
          [-3.7826e-02, -1.2928e-01, -2.8969e-01]

protocol_version: "1"
scalar {
  researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
  node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
  job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
  train: true
  metric {
    key: "Loss"
    value: 0.103829898
  }
  epoch: 1
  total_samples: 4800
  batch_samples: 48
  num_batches: 100
  num_samples_trained: 4800
  iteration: 100
}

Printing the class
<class 'fedbiomed.common.message.FeedbackMessage'>
researcher_id: "researcher_b69d4946-1045-4d62-bd6a-150907516a12"
node_id: "node_41533df5-d07b-4027-a826-d1f67410d627"
job_id: "efb42ef3-6485-4797-8fd9-9517eeeb2da2"
train: true
metric {
  key: "Loss"
  value: 0.103829898
}
epoch: 1
total_samples: 4800
batch_samples: 48
num_batches: 100
num_samples_trained: 4800
iteration: 100

Printing conversion
Printing the class
<class 'fedbiomed.common.message.Scalar'>
<class 'fedbiomed.common.message.Scalar'>
Printing_dict
{'researcher_id': 'researcher_b69d4946-1045-4d62-bd6a-150907516a12', 'node

2023-09-01 17:46:18,318 fedbiomed INFO - Nodes that successfully reply in round 1 ['node_41533df5-d07b-4027-a826-d1f67410d627']
2023-09-01 17:46:18,376 fedbiomed INFO - Saved aggregated params for round 1 in /home/scansiz/projects/fedbiomed-dev/fedbiomed/var/experiments/Experiment_0073/aggregated_params_3ac260a7-b5c2-441f-9329-5cc63b315c4e.mpk


2

2023-09-01 17:47:03,333 fedbiomed INFO - Received request form node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:47:03,335 fedbiomed INFO - Node agent created node_41533df5-d07b-4027-a826-d1f67410d627
2023-09-01 17:47:03,338 fedbiomed INFO - Waiting for tasks


Printing the class
<class 'fedbiomed.common.message.TaskRequest'>
<class 'fedbiomed.common.message.TaskRequest'>
Printing_dict
{'node': 'node_41533df5-d07b-4027-a826-d1f67410d627', 'protocol_version': '1'}
{'protocol_version': '1', 'node': 'node_41533df5-d07b-4027-a826-d1f67410d627'}
<class 'dict'>


Local training results for each round and each node are available via `exp.training_replies()` (indexed from 0 to `rounds - 1`).

For example, you can view the training results for the last round below.

Several timings (in seconds) are reported for each dataset of a node participating in a round:
- `rtime_training`: real time (clock time) spent in the training function on the node
- `ptime_training`: process time (user and system CPU) spent in the training function on the node
- `rtime_total`: real time (clock time) spent in the researcher between sending the request and handling the response, at the `Job()` layer
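
The difference between real time and process time can be reproduced with Python's standard `time` module. The sketch below is only an illustration: `measure` is a hypothetical helper, and it is an assumption that the node measures its timings in a comparable way.

```python
import time


def measure(func, *args, **kwargs):
    """Run func and return (result, real_seconds, process_seconds)."""
    real_start = time.perf_counter()     # wall-clock timer
    process_start = time.process_time()  # user + system CPU time of this process
    result = func(*args, **kwargs)
    process_elapsed = time.process_time() - process_start
    real_elapsed = time.perf_counter() - real_start
    return result, real_elapsed, process_elapsed


# Sleeping consumes clock time but almost no CPU time.
_, sleep_rtime, sleep_ptime = measure(time.sleep, 0.2)
print(f"sleep: rtime={sleep_rtime:.2f}s ptime={sleep_ptime:.2f}s")

# A busy loop consumes both clock time and CPU time.
total, loop_rtime, loop_ptime = measure(lambda: sum(i * i for i in range(200_000)))
print(f"loop:  rtime={loop_rtime:.2f}s ptime={loop_ptime:.2f}s")
```

Process time counts the CPU time of the whole process, across all its threads, so it can exceed real time when training runs on several cores.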

In [None]:
print("\nList the training rounds : ", exp.training_replies().keys())

print("\nList the nodes for the last training round and their timings : ")
round_data = exp.training_replies()[rounds - 1].data()
# Print the timings reported by each node that replied in the last round
for reply in round_data:
    timing = reply['timing']
    print(f"\t- {reply['node_id']} :"
          f"\n\t\trtime_training={timing['rtime_training']:.2f} seconds"
          f"\n\t\tptime_training={timing['ptime_training']:.2f} seconds"
          f"\n\t\trtime_total={timing['rtime_total']:.2f} seconds")
print('\n')
    
exp.training_replies()[rounds - 1].dataframe()
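
To compare timings across all rounds rather than only the last one, the replies can be flattened into one row per round and node. The sketch below is a minimal illustration, not part of the Fed-BioMed API: `timing_rows` is a hypothetical helper, and `sample_replies` is stand-in data shaped like the replies used above, with timing values rounded from the training reply logged earlier in this notebook.

```python
def timing_rows(replies_by_round):
    """Flatten {round: [reply, ...]} into (round, node_id, rtime, ptime, ratio) rows."""
    rows = []
    for round_index in sorted(replies_by_round):
        for reply in replies_by_round[round_index]:
            timing = reply["timing"]
            rtime = timing["rtime_training"]
            ptime = timing["ptime_training"]
            # A ptime / rtime ratio above 1 suggests training used more than one CPU core
            rows.append((round_index, reply["node_id"], rtime, ptime, ptime / rtime))
    return rows


# Stand-in data. With a real experiment, one could presumably build it as
# {r: resp.data() for r, resp in exp.training_replies().items()}
sample_replies = {
    1: [{"node_id": "node_41533df5-d07b-4027-a826-d1f67410d627",
         "timing": {"rtime_training": 7.42, "ptime_training": 29.45}}],
}

rows = timing_rows(sample_replies)
for round_index, node_id, rtime, ptime, ratio in rows:
    print(f"round {round_index} {node_id}: "
          f"rtime={rtime:.2f}s ptime={ptime:.2f}s ratio={ratio:.2f}")
```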

Federated parameters for each round are available via `exp.aggregated_params()` (indexed from 0 to `rounds - 1`).

For example, you can view the federated parameters for the last round of the experiment:

In [None]:
print("\nList the training rounds : ", exp.aggregated_params().keys())

print("\nAccess the federated params for the last training round :")
print("\t- params_path: ", exp.aggregated_params()[rounds - 1]['params_path'])
print("\t- parameter data: ", exp.aggregated_params()[rounds - 1]['params'].keys())


Feel free to run other sample notebooks or try your own models :D