# nnUNet Experiment in NVFlare

This tutorial guides you through running an nnUNet experiment in a Federated Learning setting, using the NVIDIA Flare API.

We will use the nnUNet framework to train a model on a dataset that is distributed across multiple sites. The data never leaves its site: training relies on the Federated Learning capabilities of the NVIDIA Flare API to keep the process secure and privacy-preserving.

### Model Training

In detail, the following steps will be performed:
1. Dataset Preparation: The dataset in the different sites will be prepared for training, harmonizing the data and creating the necessary files for training according to the nnUNet framework.
2. nnUNet Experiment Planning and Preprocessing: One of the sites will be selected as the main site, where the nnUNet experiment will be planned and the data will be preprocessed. This step has to be done only on one site, as the nnUNet plans will be shared with the other sites.
3. nnUNet Preprocessing: The data will be preprocessed according to the nnUNet plan in all the other sites.
4. nnUNet Training: The model will be trained on the data of all the sites, using the nnUNet framework and aggregating the local gradients from the different sites.

### Offline Training
Offline training is also available to the clients. It lets each site test the training process and evaluate the model locally, establishing a baseline before moving on to the Federated Learning phase.
In this phase, Data Preparation, Experiment Planning and Preprocessing, Training, Model Conversion, Validation and Packaging are done on each site separately.

### Model Deployment
In addition, some specific jobs will be executed to:
- Convert the model from the MONAI Bundle standard to the nnUNet format.
- Validate the model using the nnUNet framework.
- Package the model into a MONAI Bundle, ready for deployment.
- Evaluate the model across different sites.

### Prerequisites
This tutorial assumes that you have already installed the NVIDIA Flare API and have access to a Federated Learning cluster with the ``Lead`` role. If you haven't done so, please refer to the [Getting Started](../README.md#getting-started) section of the tutorials.

Additionally, every site should have the training data ready in a known location (not necessarily the same location across all sites).

## Configure the Federation in POC Mode

Before starting the tutorial, we need to configure the federation in NVFlare POC (Proof of Concept) mode. This mode allows the federation to be configured within a single site, which is useful for testing and development purposes.

In [None]:
%%bash

export NVFLARE_POC_WORKSPACE="/home/maia-user/Tutorials/NVFlare_POC"

/opt/conda/envs/MAIA/bin/nvflare poc prepare

In [None]:
%%bash

export NVFLARE_POC_WORKSPACE="/home/maia-user/Tutorials/NVFlare_POC"

/opt/conda/envs/MAIA/bin/nvflare poc start

Alternatively, the same two steps can be run from Python:

In [None]:
from nvflare.tool.poc.poc_commands import _prepare_poc, _start_poc, _stop_poc, _clean_poc

_prepare_poc(
    ["site-1","site-2"],
    2,
    "/home/maia-user/Tutorials/NVFlare_POC",
)
_start_poc("/home/maia-user/Tutorials/NVFlare_POC",[0])

## Split the Data Across Sites

After starting the federation in POC mode, we need to split the data across the two different sites. In this tutorial, we will randomly split the Decathlon Spleen dataset into two parts, one for each site.


In [None]:
from PyMAIA.utils.file_utils import subfiles
from random import shuffle
import shutil
from pathlib import Path
image_dir = "/home/maia-user/Tutorials/MAIA/Task09_Spleen/imagesTr"
label_dir = "/home/maia-user/Tutorials/MAIA/Task09_Spleen/labelsTr"

# Site 1 will have the "Decathlon" dataset format
Path("/home/maia-user/Tutorials/NVFlare_POC/data/site-1/imagesTr").mkdir(parents=True, exist_ok=True)
Path("/home/maia-user/Tutorials/NVFlare_POC/data/site-1/labelsTr").mkdir(parents=True, exist_ok=True)

# Site 2 will have the "Subfolders" dataset format
Path("/home/maia-user/Tutorials/NVFlare_POC/data/site-2").mkdir(parents=True, exist_ok=True)

images = subfiles(image_dir, suffix=".nii.gz")
labels = subfiles(label_dir, suffix=".nii.gz")

print(f"Images: {len(images)}")
print(f"Labels: {len(labels)}")

shuffle(images)

for image in images[:len(images)//2]:
    shutil.copy(image, "/home/maia-user/Tutorials/NVFlare_POC/data/site-1/imagesTr")
    shutil.copy(image.replace("imagesTr", "labelsTr"), "/home/maia-user/Tutorials/NVFlare_POC/data/site-1/labelsTr")

for image in images[len(images)//2:]:
    # Use the file name without its extensions as the subject folder name
    # (named case_id to avoid shadowing the built-in id()).
    case_id = image.split("/")[-1].split(".")[0]
    Path(f"/home/maia-user/Tutorials/NVFlare_POC/data/site-2/{case_id}").mkdir(parents=True, exist_ok=True)
    shutil.copy(image, f"/home/maia-user/Tutorials/NVFlare_POC/data/site-2/{case_id}/{case_id}_CT.nii.gz")
    shutil.copy(image.replace("imagesTr", "labelsTr"), f"/home/maia-user/Tutorials/NVFlare_POC/data/site-2/{case_id}/{case_id}_label.nii.gz")
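
The split above changes on every run, because `shuffle` uses the global random state. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable for the tutorial (the helper name `split_cases` and the seed value are illustrative and not part of PyMAIA):

```python
import random


def split_cases(cases, seed=42):
    """Shuffle a copy of `cases` with a fixed seed and split it into two halves."""
    shuffled = sorted(cases)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


site_1_cases, site_2_cases = split_cases([f"spleen_{i}.nii.gz" for i in range(1, 8)])
print(len(site_1_cases), len(site_2_cases))
```

The two returned lists can then be used in place of `images[:len(images)//2]` and `images[len(images)//2:]`.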


## FL Cluster Authentication

In [3]:
from nvflare.fuel.flare_api.flare_api import new_secure_session
sess = new_secure_session(
    "admin@nvidia.com",
    "NVFlare_POC/example_project/prod_00/admin@nvidia.com"
)

In [4]:
print(sess.get_system_info())

SystemInfo
server_info:
status: stopped, start_time: Tue Dec 17 07:16:29 2024
client_info:
site-2(last_connect_time: Tue Dec 17 07:16:36 2024)
site-1(last_connect_time: Tue Dec 17 07:16:37 2024)
job_info:



## List Jobs

In [5]:
jobs = sess.list_jobs()

In [6]:
jobs[-1]

{'job_id': '6b26b950-4f40-40e0-81ae-27430d5f24a7',
 'job_name': 'fl_bundle',
 'status': 'FINISHED:ABANDONED',
 'submit_time': '2024-12-16T16:36:25.865534+00:00',
 'duration': 'N/A'}

In [8]:
job_id = jobs[-1]['job_id']

## Terminate Jobs

In [10]:
sess.abort_job(job_id)

## Job Preparation

The first step in the Federated Learning process is to prepare the job configuration files. The job configuration files contain the necessary information to run the job on the Federated Learning cluster, such as the job type, the resources required, and the parameters for the job execution.

The job configuration files are automatically generated by the script `generate_job_configs`, which is installed together with this package. The script takes the client-specific configurations and the experiment-specific configuration as input, and generates the job configuration files for each client.


### Client Configuration
The client-specific configuration file should be in the following format:

```yaml
data_dir: "<DATASET_FOLDER>"
modality_dict:
  ct: "<CT_SUFFIX>"
  label: "<SEG_MASK_SUFFIX>"
dataset_format: "<DATASET_FORMAT>"
patient_id_in_file_identifier: True
nnunet_root_folder: "<NNUNET_ROOT_FOLDER>"
Client_Name: "<CLIENT_NAME>"
pymaia_config_file: "<CONFIG_FILE_PATH>"
nnunet_model_folder: "<NNUNET_MODEL_FOLDER>"
bundle_root: "<BUNDLE_ROOT>"
```

`dataset_format` should refer to one of these three formats, according to the `data_dir` structure:
1. `subfolders`: The dataset is organized in subfolders, where each subfolder corresponds to a subject and contains the images and labels for that subject.
```plaintext
  [Dataset_folder]
        [Subject_0]
            - Subject_0_image0.nii.gz    # Subject_0 modality 0
            - Subject_0_image1.nii.gz    # Subject_0 modality 1
            - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        [Subject_1]
            - Subject_1_image0.nii.gz    # Subject_1 modality 0
            - Subject_1_image1.nii.gz    # Subject_1 modality 1
            - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
        ...

```
2. `decathlon`: The dataset is organized in the format of the Medical Decathlon challenge, where the images and labels are stored in separate folders.
```plaintext
  [Dataset_folder]
        [imagesTr]
            - Subject_0_image0.nii.gz    # Subject_0 modality 0
            - Subject_0_image1.nii.gz    # Subject_0 modality 1
            - Subject_1_image0.nii.gz    # Subject_1 modality 0
            - Subject_1_image1.nii.gz    # Subject_1 modality 1
        [labelsTr]
            - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
            - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        ...

```

3. `nnunet`: The dataset has been already prepared according to the nnUNet framework, with the images and labels stored in separate folders.
```plaintext
  [nnUNet_raw]
      [DatasetXYZ_TaskName]  # THIS IS THE DATASET FOLDER
          dataset.json
          [imagesTr]
              - Subject_0_image0.nii.gz    # Subject_0 modality 0
              - Subject_0_image1.nii.gz    # Subject_0 modality 1
              - Subject_1_image0.nii.gz    # Subject_1 modality 0
              - Subject_1_image1.nii.gz    # Subject_1 modality 1
          [labelsTr]
              - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
              - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        ...

```
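
The three layouts above can be told apart from the folder structure alone. The following is a heuristic sketch only, assuming the layouts are exactly as drawn; `guess_dataset_format` is a hypothetical helper and not a PyMAIA function:

```python
from pathlib import Path


def guess_dataset_format(data_dir):
    """Guess the `dataset_format` value from the folder layout (heuristic only)."""
    root = Path(data_dir)
    has_images = (root / "imagesTr").is_dir()
    has_labels = (root / "labelsTr").is_dir()
    if has_images and has_labels:
        # Assumption: an nnUNet dataset lives in nnUNet_raw/DatasetXYZ_TaskName
        # and ships a dataset.json next to imagesTr and labelsTr.
        is_nnunet = (
            root.parent.name == "nnUNet_raw"
            and root.name.startswith("Dataset")
            and (root / "dataset.json").is_file()
        )
        return "nnunet" if is_nnunet else "decathlon"
    # Otherwise, assume one subfolder per subject (hidden folders are ignored).
    subject_dirs = [p for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")]
    return "subfolders" if subject_dirs else None
```

When the guess disagrees with the value written in the client configuration, the folder is worth a manual look before submitting any job.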

`nnunet_root_folder` should refer to the root folder used by the nnUNet framework, where the nnUNet experiments are stored.
For the `subfolders` and `decathlon` dataset formats, this folder is created during the dataset preparation step.
For the `nnunet` dataset format, this folder should already contain the nnUNet experiments, with the following structure:
```plaintext
  [nnunet_root_folder]
      [nnUNet_raw]
          [DatasetXYZ_TaskName]
              dataset.json
              [imagesTr]
                  - Subject_0_image0.nii.gz    # Subject_0 modality 0
                  - Subject_0_image1.nii.gz    # Subject_0 modality 1
                  - Subject_1_image0.nii.gz    # Subject_1 modality 0
                  - Subject_1_image1.nii.gz    # Subject_1 modality 1
              [labelsTr]
                  - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
                  - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
            ...

```

`modality_dict` is a dictionary that maps the modality names to the file suffixes. The suffixes are used to identify the files that correspond to the different modalities in the dataset. For example, if the CT images have the suffix `_CT.nii.gz`, the entry in the `modality_dict` should be `ct: "_CT.nii.gz"`.
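
To illustrate suffix matching, here is a minimal sketch that groups file names by modality. It assumes plain "ends with" matching and gives priority to longer suffixes; how PyMAIA resolves the suffixes internally is not shown in this tutorial, and `group_by_modality` is a hypothetical helper:

```python
def group_by_modality(filenames, modality_dict):
    """Map each modality name to the files whose name ends with its suffix."""
    # Check longer suffixes first, so "_CT.nii.gz" wins over a generic ".nii.gz".
    ordered = sorted(modality_dict.items(), key=lambda item: len(item[1]), reverse=True)
    groups = {name: [] for name in modality_dict}
    for filename in filenames:
        for name, suffix in ordered:
            if filename.endswith(suffix):
                groups[name].append(filename)
                break
    return groups


files = ["spleen_2_CT.nii.gz", "spleen_2_label.nii.gz"]
print(group_by_modality(files, {"ct": "_CT.nii.gz", "label": "_label.nii.gz"}))
```

A file that ends up in no group usually means that a suffix in `modality_dict` does not match the naming used in `data_dir`.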

`pymaia_config_file` is the path to the configuration file generated by PyMAIA. The file can be found in the `results_folder` (`nnunet_root_folder/TaskName/TaskName_results`) of the PyMAIA experiment, with the name `DatasetXYZ_TaskName.json`.

`nnunet_model_folder` is the folder where the nnUNet folds and models are stored. This folder is created during the nnUNet experiment run:
`results_folder/DatasetXYZ_TaskName/TrainerClass__PlansName__3d_fullres`
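
The folder name can be composed from the experiment settings. This is a sketch under stated assumptions: the task ID is zero-padded to three digits, and the defaults `nnUNetTrainer`, `nnUNetPlans` and `3d_fullres` apply; `nnunet_model_folder` is an illustrative helper name:

```python
from pathlib import Path


def nnunet_model_folder(results_folder, task_id, task_name,
                        trainer="nnUNetTrainer", plans="nnUNetPlans",
                        configuration="3d_fullres"):
    """Compose results_folder/DatasetXYZ_TaskName/Trainer__Plans__configuration."""
    dataset_name = f"Dataset{int(task_id):03d}_{task_name}"
    return Path(results_folder) / dataset_name / f"{trainer}__{plans}__{configuration}"


print(nnunet_model_folder("Task09_Spleen_results", "109", "Task09_Spleen").as_posix())
```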

`bundle_root` is the root folder where the MONAI Bundle will be stored in the client. It can be retrieved at the end of the `prepare_bundle` job.

### Experiment Configuration

The experiment-specific configuration file should be in the following format:

```yaml
task_ID: <NNUNET_TASK_ID>
Experiment Name: <NNUNET_TASK_NAME>
label_dict:
  background: 0
  Lesion: 1
MLFlow_Token: <MLFLOW_TOKEN>
Tracking_URI: <MLFLOW_URL>
Bundle_File: <BUNDLE_FILE.zip>
num_rounds: 100
start_round: 0
local_epochs: 10
server_bundle_root: "<SERVER_BUNDLE_ROOT>"
Minio_Endpoint: "<MINIO_URL>"
Minio_Access_Key: "<ACCESS_KEY>"
Minio_Secret_Key: "<SECRET_KEY>"
Minio_Bucket: "<BUCKET_NAME>"
```
`task_ID` and `Experiment Name` are used as a reference to the nnUNet task ID and name, respectively. These values are used to identify the nnUNet experiment in the nnUNet framework. `Experiment Name` is also used to identify the experiment in the MLFlow server, and to generate the zipped nnUNet model file (as `<Experiment Name>.zip`).

`label_dict` is a dictionary that maps the label names to the label values. The label values are used to identify the different classes in the semantic segmentation masks. For example, if the background class is labeled as `0` and the lesion class is labeled as `1`, the entry in the `label_dict` should be `background: 0, Lesion: 1`.

`MLFlow_Token` and `Tracking_URI` are used to connect to the MLFlow server, where the experiments are logged, and the trained models are uploaded.

`Bundle_File` is the name of the zipped MONAI Bundle file that contains the Python code and the configuration files needed to run the model training. This file is uploaded to each site when the training job is executed.

`server_bundle_root` is the root folder where the MONAI Bundle will be stored on the server.

The MinIO parameters are needed to connect to the MinIO server and upload the trained FL models. You can leave these parameters empty if you are not using MinIO.
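
To get a feeling for the training budget implied by `num_rounds`, `start_round` and `local_epochs`, the arithmetic can be sketched as below. It assumes that every round runs exactly `local_epochs` epochs on each client and that rounds before `start_round` are skipped; the helper name is illustrative:

```python
def local_epochs_per_client(num_rounds, local_epochs, start_round=0):
    """Total number of local epochs each client runs over the remaining rounds."""
    if not 0 <= start_round <= num_rounds:
        raise ValueError("start_round must lie between 0 and num_rounds")
    return (num_rounds - start_round) * local_epochs


print(local_epochs_per_client(num_rounds=100, local_epochs=10))  # 1000
```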


### Prepare the Job Configuration Files

To prepare the job configuration files, run the following command:

```bash
generate_job_configs -c <CLIENT_CONFIG_FILE_1> <CLIENT_CONFIG_FILE_2> ... -e <EXPERIMENT_CONFIG_FILE> -sd <SCRIPT_DIR> -j <JOB_DIR>
```
where:
- `<CLIENT_CONFIG_FILE_1>`, `<CLIENT_CONFIG_FILE_2>`, ... are the client-specific configuration files for each client.
- `<EXPERIMENT_CONFIG_FILE>` is the experiment-specific configuration file.
- `<SCRIPT_DIR>` is the directory where the job scripts and Python files are stored ([nnUNet_NVFlare](./nnUNet_NVFlare)).
- `<JOB_DIR>` is the directory where the job configuration files will be stored.
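
When the command is launched from a script, it can be easier to assemble it as an argument list. A minimal sketch using only the flags listed above (the paths in the example are placeholders, and `build_generate_job_configs_cmd` is a hypothetical helper):

```python
import shlex


def build_generate_job_configs_cmd(client_configs, experiment_config, script_dir, job_dir):
    """Return the `generate_job_configs` call as an argument list for `subprocess.run`."""
    return [
        "generate_job_configs",
        "-c", *client_configs,
        "-e", experiment_config,
        "-sd", script_dir,
        "-j", job_dir,
    ]


cmd = build_generate_job_configs_cmd(
    ["Clients/site-1.yaml", "Clients/site-2.yaml"],
    "Experiments/Spleen.yaml", "src/Spleen", "Jobs/Spleen",
)
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths that contain spaces.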

In [1]:
import sys
from pathlib import Path
 
ROOT_FOLDER = "/home/maia-user/shared"
sys.path.append(ROOT_FOLDER)

In [16]:
%%writefile /home/maia-user/shared/Experiments/Spleen.yaml

task_ID: "109"
Experiment Name: Task09_Spleen
label_dict:
  background: 0
  Spleen: 1
MLFlow_Token: ""
Tracking_URI: "https://monai-demo.maia.cloud.cbh.kth.se/mlflow"
Bundle_File: "Spleen_Bundle.zip"
num_rounds: 10
start_round: 0
local_epochs: 10
server_bundle_root: "/home/maia-user/shared/src/Spleen_Bundle"
Minio_Endpoint: ""
Minio_Access_Key: ""
Minio_Secret_Key: ""
Minio_Bucket: ""

Overwriting /home/maia-user/shared/Experiments/Spleen.yaml


In [65]:
%%writefile /home/maia-user/shared/Clients/site-1.yaml

data_dir: "/home/maia-user/Tutorials/NVFlare_POC/data/site-1"
modality_dict:
  ct: ".nii.gz"
  label: ".nii.gz"
dataset_format: "decathlon"
patient_id_in_file_identifier: True
nnunet_root_folder: "/home/maia-user/Tutorials/NVFlare_POC/Fed-nnUNet/site-1"
Client_Name: "site-1"
pymaia_config_file: "/home/maia-user/Tutorials/NVFlare_POC/Fed-nnUNet/site-1/Task09_Spleen/Task09_Spleen_results/Dataset109_Task09_Spleen.json"
nnunet_model_folder: "/home/maia-user/Tutorials/NVFlare_POC/Fed-nnUNet/site-1/Task09_Spleen/Task09_Spleen_results/Dataset109_Task09_Spleen/nnUNetTrainer__nnUNetPlans__3d_fullres"
bundle_root: "/home/maia-user/Tutorials/NVFlare_POC/example_project/prod_00/site-1/Spleen_Bundle"

Overwriting /home/maia-user/shared/Clients/site-1.yaml


In [66]:
%%writefile /home/maia-user/shared/Clients/site-2.yaml

data_dir: "/home/maia-user/Tutorials/NVFlare_POC/data/site-2"
modality_dict:
  ct: "_CT.nii.gz"
  label: "_label.nii.gz"
dataset_format: "subfolders"
patient_id_in_file_identifier: True
nnunet_root_folder: "/home/maia-user/Tutorials/NVFlare_POC/Fed-nnUNet/site-2"
Client_Name: "site-2"
pymaia_config_file: "/home/maia-user/Tutorials/NVFlare_POC/Fed-nnUNet/site-2/Task09_Spleen/Task09_Spleen_results/Dataset109_Task09_Spleen.json"
nnunet_model_folder: "/home/maia-user/Tutorials/NVFlare_POC/Fed-nnUNet/site-2/Task09_Spleen/Task09_Spleen_results/Dataset109_Task09_Spleen/nnUNetTrainer__nnUNetPlans__3d_fullres"
bundle_root: "/home/maia-user/Tutorials/NVFlare_POC/example_project/prod_00/site-2/Spleen_Bundle"

Overwriting /home/maia-user/shared/Clients/site-2.yaml


In [7]:
%%writefile /home/maia-user/shared/src/Spleen/requirements.txt
pymaia-learn==1.2.1
nnunetv2==2.5.1
fire
pytorch-ignite==0.4.11
monai[nibabel, skimage, scipy, pillow, tensorboard, gdown, ignite, torchvision, itk, tqdm, pandas, mlflow, matplotlib, pydicom]
pydicom==2.4.4
minio
numpy==1.24.0
pandas
mlflow
monai-nvflare

Overwriting /home/maia-user/shared/src/Spleen/requirements.txt


In [67]:
experiment = "Spleen"
clients = [
    "site-1",
    "site-2"
]

In [68]:
%%bash

# -f keeps the cell from failing when the folder does not exist yet
rm -rf /home/maia-user/shared/Jobs/Spleen/

In [69]:
from generate_jobs_config import generate_config

generate_config(
    [str(Path(ROOT_FOLDER).joinpath("Clients",client+".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments",experiment+".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src",experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs",experiment)),
    nvflare_exec="/home/maia-user/.conda/envs/NVFlare/bin/nvflare",
    pymaia_nvflare_exec="pymaia_nvflare"
    
)


The following are the variables you can change in the template

---------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                       
  job folder: /home/maia-user/shared/Jobs/Spleen/jobs/prepare                                                                            
                                                                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------
  file_name                      var_name                       value                               component                          
---------------------------------------------------------------------------------------------------------------------

In [70]:
JOB_DIR=str(Path(ROOT_FOLDER).joinpath("Jobs",experiment))

In [60]:
client_id = "site-2"

### Create and Modify Job Configuration

Job configuration files are created automatically, but they can be recreated or modified with the following NVFlare commands:

```bash
nvflare job create -j <CUSTOM_JOB_DIR>/<TASK_NAME> -w <JOB_DIR>/<TASK_NAME> -sd nnUNet_NVFlare/ --force
```

```bash
nvflare job show_variables -j <CUSTOM_JOB_DIR>/<TASK_NAME>
```

```bash
nvflare job create -j <CUSTOM_JOB_DIR>/<TASK_NAME> -w <JOB_DIR>/<TASK_NAME> -sd nnUNet_NVFlare/ -f <app>/<file_name> var_name=new_value
```

## 0. Check Python Packages

In [25]:
from pathlib import Path

job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","check_client_packages")))

In [26]:
sess.monitor_job(job_id)

<MonitorReturnCode.JOB_FINISHED: 0>

In [46]:
client_id = "site-2"

In [31]:
#print(sess.api.do_command(f"cat server log.txt")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} nvidia-smi.log")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} init_logfile_out.log")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} init_logfile_err.log")['data'][0]['data'])

Mon Dec 16 13:23:30 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0 Off |                  N/A |
| 30%   35C    P8              26W / 350W |    261MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [32]:
job_dir = sess.download_job_result(job_id)

In [33]:
import json
from pathlib import Path

with open(Path(job_dir).joinpath("workspace","package_report","package_report.json"),"r") as f:
    package_report = json.load(f)

In [34]:
package_report

{'site-2': {'nvflare': 'nvflare 2.5.2 is installed.',
  'pymaia-learn': 'pymaia-learn 1.2.1 is installed.',
  'torch': 'torch 2.5.1 is installed.',
  'monai': 'monai 1.4.0 is installed.',
  'numpy': 'numpy 1.24.0 is installed.',
  'nnunetv2': 'nnunetv2 2.5.1 is installed.',
  'MONAI': '1.4.0',
  'Numpy': '1.24.0',
  'Pytorch': '',
  'Pytorch Ignite': '0.4.11',
  'ITK': '5.4.0',
  'Nibabel': '5.3.2',
  'scikit-image': '0.25.0',
  'scipy': '1.14.1',
  'Pillow': '11.0.0',
  'Tensorboard': '2.18.0',
  'gdown': '5.2.0',
  'TorchVision': '0.20.1',
  'tqdm': '4.67.1',
  'lmdb': 'NOT INSTALLED or UNKNOWN VERSION.',
  'psutil': '6.1.0',
  'pandas': '2.2.3',
  'einops': '0.8.0',
  'transformers': 'NOT INSTALLED or UNKNOWN VERSION.',
  'mlflow': '2.19.0',
  'pynrrd': 'NOT INSTALLED or UNKNOWN VERSION.',
  'clearml': 'NOT INSTALLED or UNKNOWN VERSION.',
  'Num GPUs': '1',
  'cuDNN enabled': 'True',
  'GPU 0 Name': 'NVIDIA GeForce RTX 3090',
  'GPU 0 Total memory  GB ': '23.7',
  'CPU_Cores': 20,
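
The report is long, so it can help to filter it down to the entries flagged as missing. A minimal sketch, assuming the report keeps the structure printed above (client name, then package name, then a status string); `missing_packages` is an illustrative helper:

```python
def missing_packages(package_report):
    """Return, per client, the report entries flagged as not installed."""
    missing = {}
    for client, entries in package_report.items():
        missing[client] = sorted(
            name for name, value in entries.items()
            if isinstance(value, str) and "NOT INSTALLED" in value
        )
    return missing


report = {"site-2": {"monai": "monai 1.4.0 is installed.",
                     "lmdb": "NOT INSTALLED or UNKNOWN VERSION.",
                     "CPU_Cores": 20}}
print(missing_packages(report))
```

Not every missing entry is a problem: only the packages listed in `requirements.txt` are needed for the jobs in this tutorial.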
 

## 1. Prepare Dataset

In this step, the dataset in the different sites will be prepared for training, harmonizing the data and creating the necessary files for training according to the nnUNet framework. The following python packages will be installed during the execution of this step:

```text
pymaia-learn==1.1
nnunetv2==2.5.1
fire
pytorch-ignite==0.4.11
monai[nibabel, skimage, scipy, pillow, tensorboard, gdown, ignite, torchvision, itk, tqdm, pandas, mlflow, matplotlib, pydicom]
```    

### Submit the Job

In [35]:
from pathlib import Path

job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","prepare")))

In [36]:
sess.monitor_job(job_id)

<MonitorReturnCode.JOB_FINISHED: 0>

To monitor the job, or print the logs from either the server or the client side, you can use the following commands:

In [None]:
#print(sess.api.do_command(f"cat server log.txt")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} init_logfile_out.log")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} init_logfile_err.log")['data'][0]['data'])

#print(sess.api.do_command(f"cat {client_id} logfile_datadir.log")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} logfile_storage.log")['data'][0]['data'])

#print(sess.api.do_command(f"cat {client_id} logfile_prepare_out.log")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} logfile_prepare_err.log")['data'][0]['data'])
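
Since the same `cat` pattern recurs in every step, it can be wrapped in a small helper. This sketch only reuses the `sess.api.do_command` call and the reply structure shown in the cells above; the helper names are hypothetical:

```python
def read_remote_file(sess, target, path):
    """Return the content of `path` on `target` ("server" or a client name)."""
    reply = sess.api.do_command(f"cat {target} {path}")
    return reply["data"][0]["data"]


def print_logs(sess, target, paths):
    """Print several remote log files, each under a short header."""
    for path in paths:
        print(f"===== {target}: {path} =====")
        print(read_remote_file(sess, target, path))
```

For example, `print_logs(sess, client_id, ["logfile_prepare_out.log", "logfile_prepare_err.log"])` prints both Prepare Dataset logs of one client.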

### Download the Job Results
When the job is completed, you can download the results. For the Prepare Dataset job, they include the dataset.json file with the information about the dataset. In detail, for each client, the dataset.json file lists the dataset files and whether each of them is valid.

In [37]:
job_dir = sess.download_job_result(job_id)

In [38]:
import json
from pathlib import Path

with open(Path(job_dir).joinpath("workspace","prepare_dataset","data_prepare.json"),"r") as f:
    dataset_dict = json.load(f)

To display the training images of the prepared nnUNet dataset (or the PyMAIA Experiment Configuration, using the commented-out line), run the following command:

In [42]:
dataset_dict[client_id]['nnUNet_Dataset']['imagesTr']#['pymaia_config']
#dataset_dict[client_id]['pymaia_config']

['spleen_49_0000.nii.gz',
 'spleen_8_0000.nii.gz',
 'spleen_16_0000.nii.gz',
 'spleen_31_0000.nii.gz',
 'spleen_52_0000.nii.gz',
 'spleen_56_0000.nii.gz',
 'spleen_40_0000.nii.gz',
 'spleen_61_0000.nii.gz',
 'spleen_62_0000.nii.gz',
 'spleen_22_0000.nii.gz',
 'spleen_9_0000.nii.gz',
 'spleen_45_0000.nii.gz',
 'spleen_59_0000.nii.gz',
 'spleen_53_0000.nii.gz',
 'spleen_17_0000.nii.gz',
 'spleen_41_0000.nii.gz',
 'spleen_13_0000.nii.gz',
 'spleen_60_0000.nii.gz',
 'spleen_3_0000.nii.gz',
 'spleen_46_0000.nii.gz',
 'spleen_28_0000.nii.gz',
 'spleen_21_0000.nii.gz',
 'spleen_14_0000.nii.gz',
 'spleen_10_0000.nii.gz',
 'spleen_47_0000.nii.gz',
 'spleen_26_0000.nii.gz',
 'spleen_19_0000.nii.gz',
 'spleen_32_0000.nii.gz',
 'spleen_33_0000.nii.gz',
 'spleen_29_0000.nii.gz',
 'spleen_6_0000.nii.gz']

In [43]:
dataset_dict[client_id]['nnUNet_Dataset']

{'imagesTr': ['spleen_49_0000.nii.gz',
  'spleen_8_0000.nii.gz',
  'spleen_16_0000.nii.gz',
  'spleen_31_0000.nii.gz',
  'spleen_52_0000.nii.gz',
  'spleen_56_0000.nii.gz',
  'spleen_40_0000.nii.gz',
  'spleen_61_0000.nii.gz',
  'spleen_62_0000.nii.gz',
  'spleen_22_0000.nii.gz',
  'spleen_9_0000.nii.gz',
  'spleen_45_0000.nii.gz',
  'spleen_59_0000.nii.gz',
  'spleen_53_0000.nii.gz',
  'spleen_17_0000.nii.gz',
  'spleen_41_0000.nii.gz',
  'spleen_13_0000.nii.gz',
  'spleen_60_0000.nii.gz',
  'spleen_3_0000.nii.gz',
  'spleen_46_0000.nii.gz',
  'spleen_28_0000.nii.gz',
  'spleen_21_0000.nii.gz',
  'spleen_14_0000.nii.gz',
  'spleen_10_0000.nii.gz',
  'spleen_47_0000.nii.gz',
  'spleen_26_0000.nii.gz',
  'spleen_19_0000.nii.gz',
  'spleen_32_0000.nii.gz',
  'spleen_33_0000.nii.gz',
  'spleen_29_0000.nii.gz',
  'spleen_6_0000.nii.gz'],
 'labelsTr': ['spleen_9.nii.gz',
  'spleen_31.nii.gz',
  'spleen_28.nii.gz',
  'spleen_59.nii.gz',
  'spleen_10.nii.gz',
  'spleen_47.nii.gz',
  'spleen_4

To verify that every file listed in the dataset exists, run the following command:

In [47]:
verified = True

for case in dataset_dict[client_id]["train"]:
    for key in case:
        if key.endswith("_is_file") and not case[key]:
            file = case[key[:-len("_is_file")]]
            print(f"Error: {file} is not a valid file!")
            verified = False
if verified:
    print(f"Dataset successfully verified for client {client_id}")

Error: /home/maia-user/Tutorials/NVFlare_POC/data/site-2/.ipynb_checkpoints/.ipynb_checkpoints_CT.nii.gz is not a valid file!
Error: /home/maia-user/Tutorials/NVFlare_POC/data/site-2/.ipynb_checkpoints/.ipynb_checkpoints_label.nii.gz is not a valid file!


In [48]:
print(len(dataset_dict[client_id]["train"]))

25
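
The two errors reported above point at a `.ipynb_checkpoints` folder, which Jupyter creates on its own and which was apparently picked up as a subject folder. A minimal sketch for spotting such entries before re-running the Prepare Dataset job, assuming hidden entries are never real subjects (`hidden_entries` is an illustrative helper, not part of PyMAIA):

```python
from pathlib import Path


def hidden_entries(data_dir):
    """List hidden files and folders (names starting with ".") directly under `data_dir`."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.name.startswith("."))
```

Calling `hidden_entries("/home/maia-user/Tutorials/NVFlare_POC/data/site-2")` on the client machine lists the entries to remove or ignore.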


## 2. Plan and Preprocess

After the dataset has been prepared, the nnUNet experiment has to be planned and the data preprocessed. This step has to be done only on one site, as the nnUNet plans will be shared with the other sites.

The steps to plan and preprocess the nnUNet experiment are the following:
1. Run the `plan_and_preprocess` job on the chosen site.
2. Extract the nnUNet plans from the job results.
3. Share the nnUNet plans with the other sites.
4. Run the `preprocess` job on the other sites.

### Submit the Job

In [55]:
from pathlib import Path

job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","plan_and_preprocess")))

In [56]:
sess.monitor_job(job_id)

<MonitorReturnCode.JOB_FINISHED: 0>

In [None]:
#print(sess.api.do_command(f"tail server log.txt")['data'][0]['data'])
#print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_plan_and_preprocess_out.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_plan_and_preprocess_err.log")['data'][0]['data'])

In [57]:
job_dir = sess.download_job_result(job_id)

### Inspect nnUNetPlans.json

The nnUNet plans are stored in the `nnUNetPlans.json` file, which contains the configuration for the nnUNet experiment. The file can be found in the `workspace/nnUNet_preprocessing` folder of the job results.

In [58]:
import json

with open(Path(job_dir).joinpath("workspace","nnUNet_preprocessing","nnUNetPlans.json"),"r") as f:
    nnunet_plans = json.load(f)

In [59]:
print(json.dumps(nnunet_plans,indent=4))

{
    "site-1": {
        "dataset_name": "Dataset109_Task09_Spleen",
        "plans_name": "nnUNetPlans",
        "original_median_spacing_after_transp": [
            5.0,
            0.7929689884185791,
            0.7929689884185791
        ],
        "original_median_shape_after_transp": [
            83,
            512,
            512
        ],
        "image_reader_writer": "SimpleITKIO",
        "transpose_forward": [
            0,
            1,
            2
        ],
        "transpose_backward": [
            0,
            1,
            2
        ],
        "configurations": {
            "2d": {
                "data_identifier": "nnUNetPlans_2d",
                "preprocessor_name": "DefaultPreprocessor",
                "batch_size": 12,
                "patch_size": [
                    512,
                    512
                ],
                "median_image_size_in_voxels": [
                    512.0,
                    512.0
                ],
         

### Copy nnUNetPlans into Transfer Folder

To share the nnUNet plans with the other sites, copy the `nnUNetPlans.json` file into the `src/Spleen/pymaia_nvflare` folder.

In [61]:
with open(Path("/home/maia-user/shared/src/Spleen/pymaia_nvflare/nnUNetPlans.json"),"w") as f:
    json.dump(nnunet_plans,f)
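
The full plans file is verbose, so a compact summary can make it easier to check what is being shared. A minimal sketch, assuming the structure printed above (site name, then `configurations`, then `batch_size` and `patch_size`); `summarize_plans` is a hypothetical helper:

```python
def summarize_plans(nnunet_plans):
    """Collect batch and patch size for every configuration of every site."""
    rows = []
    for site, plans in nnunet_plans.items():
        for name, config in plans.get("configurations", {}).items():
            rows.append((site, name, config.get("batch_size"), config.get("patch_size")))
    return rows


example = {"site-1": {"configurations": {"2d": {"batch_size": 12, "patch_size": [512, 512]}}}}
for row in summarize_plans(example):
    print(row)
```

In the notebook, `summarize_plans(nnunet_plans)` gives one row per configuration found in the downloaded plans.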

## 3. Preprocess

After the nnUNet plans have been shared with the other sites, the data has to be preprocessed according to the nnUNet plans. This step has to be done on all the sites, except the one where the nnUNet experiment has been planned.

### Submit Job

In [34]:
job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","preprocess")))

In [35]:
sess.monitor_job(job_id)

<MonitorReturnCode.JOB_FINISHED: 0>

In [98]:
#print(sess.api.do_command(f"cat server log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_preprocess_out.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_preprocess_err.log")['data'][0]['data'])

Dataset109_Task09_Spleen
Preprocessing dataset Dataset109_Task09_Spleen
Configuration: 3d_fullres...
2024-12-16 14:13:18,552 - worker_process - INFO - Worker_process started.
2024-12-16 14:13:18,584 - AuxRunner - INFO - registered aux handler for topic __end_run__
2024-12-16 14:13:18,584 - AuxRunner - INFO - registered aux handler for topic __do_task__
2024-12-16 14:13:18,652 - CoreCell - INFO - site-2.aadd4d58-ad6c-4e6f-acba-ae0df79d03ce: created backbone internal connector to tcp://localhost:7367 on parent
2024-12-16 14:13:18,653 - CoreCell - INFO - site-2.aadd4d58-ad6c-4e6f-acba-ae0df79d03ce: created backbone external connector to grpc://localhost:8002
2024-12-16 14:13:18,653 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE tcp://localhost:7367] is starting
2024-12-16 14:13:18,653 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 ACTIVE grpc://localhost:8002] is starting
2024-12-16 14:13:18,654 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection 

## 4. MONAI Bundle nnUNet Training

Once the dataset has been prepared and preprocessed following the nnUNet plans, the nnUNet training is ready to begin. In this phase, we will train an nnUNet model using a MONAI Bundle. It’s important to note that the training at this stage will be conducted separately at each site, with the model saved in the MONAI Bundle format. There will be no aggregation of local gradients and no Federated Learning applied in this step. This approach is intended to test the training process and evaluate the model at each individual site, establishing a baseline before moving on to the Federated Learning phase.


To resume the training process from an existing checkpoint:

```bash
nvflare job create -j <JOB_DIR>/jobs/bundle_training -w <JOB_DIR>/bundle_training  -sd nnUNet_NVFlare/ -f bundle_training-client-<CLIENT_NAME>/config_fed_client.conf resume_training=epoch_number
```

In [None]:
from pathlib import Path

job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","bundle_training")))

In [None]:
job_id = "e5072797-f93b-4d43-9bbd-6d947f894856"

sess.monitor_job(job_id)

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
from pathlib import Path


with open(Path(job_dir).joinpath("workspace","run_bundle_train","cmd.sh"),"r") as f:
    print(f.read())

To check the training logs, run the following command:

In [None]:

#bundle_name = "<BUNDLE_NAME>" #AutoPETLymphoma_nnUNet
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_train_out.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_train_err.log")['data'][0]['data'])
#print(sess.api.do_command(f"ls {client_id} {bundle_name}/models")['data'][0]['data'])
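
The same `cat` pattern is repeated for every log file, so it can be wrapped in a small helper. The sketch below is only a convenience wrapper around the calls shown in the cell above; the function names `read_client_file` and `print_training_logs` are hypothetical, and the reply layout (`['data'][0]['data']`) is taken from the notebook cells rather than from the NVFlare documentation.

```python
def read_client_file(sess, client_id, remote_path):
    """Return the content of a file in a client's workspace via the admin `cat` command."""
    reply = sess.api.do_command(f"cat {client_id} {remote_path}")
    # Reply layout assumed from the notebook cells above: reply['data'][0]['data'].
    return reply["data"][0]["data"]


def print_training_logs(sess, client_id, job_id):
    """Print the job log and the training stdout/stderr logs of one client."""
    remote_paths = (
        f"{job_id}/log.txt",
        "logfile.log",
        "logfile_train_out.log",
        "logfile_train_err.log",
    )
    for remote_path in remote_paths:
        print(f"===== {client_id}: {remote_path} =====")
        print(read_client_file(sess, client_id, remote_path))
```

With an active session, `print_training_logs(sess, client_id, job_id)` prints the same four files as the cell above.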

## 4.1 Federated Learning Training

After the individual training at each site, the Federated Learning training can be started. It aggregates the local gradients from the different sites to update the global model and proceeds in rounds, each consisting of a number of local epochs. The training is secure and privacy-preserving: the data remains on the client side and only the local gradients are shared with the server.
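
To illustrate what the server does at the end of each round, here is a minimal aggregation sketch. It assumes a sample-weighted average of the client updates (FedAvg-style); the aggregation rule actually used by the FL job is not stated in this tutorial, and the function name `weighted_average` and the data layout (parameter name mapped to a flat list of floats) are hypothetical.

```python
def weighted_average(client_updates):
    """Aggregate client updates with a sample-weighted average.

    client_updates: list of (num_samples, {param_name: [float, ...]}) tuples,
    one tuple per site. All sites are assumed to share the same parameter names
    and shapes.
    """
    total_samples = sum(num_samples for num_samples, _ in client_updates)
    if total_samples <= 0:
        raise ValueError("at least one client must contribute samples")

    reference = client_updates[0][1]
    aggregated = {}
    for name, values in reference.items():
        aggregated[name] = [
            sum(n * update[name][i] for n, update in client_updates) / total_samples
            for i in range(len(values))
        ]
    return aggregated


# Two sites: the second one holds three times as much data as the first.
updates = [
    (1, {"w": [0.0, 2.0]}),
    (3, {"w": [4.0, 2.0]}),
]
global_update = weighted_average(updates)
```

Only these update tensors and the sample counts reach the server; the images never leave the sites.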

Prior to starting the FL training, follow the steps below to correctly configure the federation:

1. Copy the `<BUNDLE_ROOT>/src` folder to the src folder of the NVFlare jobs.

2. Upload `requirements.txt` and the MONAI Bundle to the server. 

3. Upload the `plans.json` and `dataset.json` files to the server (in `<BUNDLE_ROOT>/models`).
    `dataset.json` can be in the form:
    ```json
    {
        "task": "Dataset109_Task09_Spleen",
        "dim": 3,
        "test_labels": true,
        "tensorImageSize": "4D",
        "channel_names": {"0": "ct"},
        "labels": {"background": 0, "Spleen": 1},
        "numTraining": 0,
        "numTest": 0,
        "training": [],
        "test": [],
        "file_ending": ".nii.gz"
    }
    ```
4. Specify `BUNDLE_ROOT` in the server bundle configuration file (`<BUNDLE_ROOT>/configs/train.yaml`).

5. Install the required packages on the server:
    ```bash
    pip install -r requirements.txt
    ```

6. Reconfigure the clients accordingly, adding:
    ```yaml
    pymaia_config_file: "<RESULTS_FOLDER>/Dataset<TASK_ID>_<TASK_NAME>.json"
    nnunet_model_folder: "<RESULTS_FOLDER>/Dataset<TASK_ID>_<TASK_NAME>/<TRAINER_CLASS>__<PLANS_IDENTIFIER>_3d_fullres"
    bundle_root: "<BUNDLE_ROOT>"
    ```
7. Specify the following parameters in the `Experiment` configuration:
    ```yaml
    num_rounds: 100
    server_bundle_root: "/workspace/Spleen_Bundle"
    start_round: 0
    local_epochs: 10
    Bundle_File: "<BUNDLE_FILE>"
    ```

        
8. Copy the nnUNet Bundle into the `src/<Project>` folder.

9. Re-execute the `generate_job_configs` script to update the job configurations.
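
The server-side `dataset.json` from step 3 holds no training or test cases, so it can be generated rather than copied from a site. The following is a sketch under that assumption; the helper name `write_server_dataset_json` is hypothetical, and the field values simply mirror the example in step 3.

```python
import json
from pathlib import Path


def write_server_dataset_json(models_dir, task, channel_names, labels, file_ending=".nii.gz"):
    """Write a data-free dataset.json (as in step 3) into <BUNDLE_ROOT>/models."""
    dataset = {
        "task": task,
        "dim": 3,
        "test_labels": True,
        "tensorImageSize": "4D",
        "channel_names": channel_names,
        "labels": labels,
        # The server holds no images, so the case lists stay empty.
        "numTraining": 0,
        "numTest": 0,
        "training": [],
        "test": [],
        "file_ending": file_ending,
    }
    models_dir = Path(models_dir)
    models_dir.mkdir(parents=True, exist_ok=True)
    output_file = models_dir / "dataset.json"
    output_file.write_text(json.dumps(dataset, indent=4))
    return output_file
```

For the Spleen example this would be called as `write_server_dataset_json("<BUNDLE_ROOT>/models", "Dataset109_Task09_Spleen", {"0": "ct"}, {"background": 0, "Spleen": 1})`.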

### Prepare Bundle for FL Training

The MONAI Bundle has to be prepared for the Federated Learning training. In this step, the MONAI Bundle is extracted on each client, and the training and evaluation parameters of the bundle are overwritten to match the Federated Learning training configuration.

To prepare the MONAI Bundle for the Federated Learning training, run the following command:


In [22]:
from pathlib import Path

job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","prepare_bundle")))

In [23]:
sess.monitor_job(job_id)

<MonitorReturnCode.JOB_FINISHED: 0>

In [24]:
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

2024-12-16 15:01:02,444 - worker_process - INFO - Worker_process started.
2024-12-16 15:01:02,476 - AuxRunner - INFO - registered aux handler for topic __end_run__
2024-12-16 15:01:02,476 - AuxRunner - INFO - registered aux handler for topic __do_task__
2024-12-16 15:01:02,545 - CoreCell - INFO - site-2.140eb6cc-3036-4721-a247-9497f8392cdd: created backbone internal connector to tcp://localhost:7367 on parent
2024-12-16 15:01:02,545 - CoreCell - INFO - site-2.140eb6cc-3036-4721-a247-9497f8392cdd: created backbone external connector to grpc://localhost:8002
2024-12-16 15:01:02,545 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE tcp://localhost:7367] is starting
2024-12-16 15:01:02,546 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 ACTIVE grpc://localhost:8002] is starting
2024-12-16 15:01:02,546 - FederatedClient - INFO - Wait for client_runner to be created.
2024-12-16 15:01:02,547 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00002 127

In [25]:
job_dir = sess.download_job_result(job_id)

import json
from pathlib import Path

with open(Path(job_dir).joinpath("workspace","run_bundle_prepare","bundle_prepare.json"),"r") as f:
    print(json.load(f))

{'site-2': {'Bundle_Path': '/home/maia-user/Tutorials/NVFlare_POC/example_project/prod_00/site-2/startup/../140eb6cc-3036-4721-a247-9497f8392cdd/app_site-2/custom/Spleen_Bundle'}, 'site-1': {'Bundle_Path': '/home/maia-user/Tutorials/NVFlare_POC/example_project/prod_00/site-1/startup/../140eb6cc-3036-4721-a247-9497f8392cdd/app_site-1/custom/Spleen_Bundle'}}


### Run FL Training

In [None]:
from pathlib import Path

job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","fl_bundle")))

In [64]:
sess.abort_job(job_id)

In [72]:
client_id = "site-1"
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

session_inactive


To download the job result containing the trained global models and upload the models to a MinIO bucket, run the following commands:

In [9]:
job_dir = sess.download_job_result(job_id)

In [None]:
import minio

client = minio.Minio(
    endpoint="<MINIO_URL>",
    access_key="<MINIO_ACCESS_KEY>",
    secret_key="<MINIO_SECRET_KEY>",
    secure=True,
)

In [None]:
client.make_bucket("<EXPERIMENT_NAME>")

client.fput_object("<EXPERIMENT_NAME>","Lymphoma_nnUNet_Bundle/models/FL_global_model.pt","FL_global_model.pt")
client.fput_object("<EXPERIMENT_NAME>","Lymphoma_nnUNet_Bundle/models/best_FL_global_model.pt","best_FL_global_model.pt")
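
When the upload cell is re-run, `make_bucket` may fail because the bucket already exists. A possible way around this is to check with `bucket_exists` first, as in the sketch below. The helper name `upload_models` and its parameters are hypothetical; it only relies on the `bucket_exists`, `make_bucket` and `fput_object` methods of the MinIO client.

```python
from pathlib import Path


def upload_models(client, bucket, object_prefix, local_files):
    """Upload model files to a bucket, creating the bucket only if it is missing."""
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)

    uploaded = []
    for local_file in local_files:
        # Object name = prefix inside the bucket + file name of the local model.
        object_name = f"{object_prefix}/{Path(local_file).name}"
        client.fput_object(bucket, object_name, str(local_file))
        uploaded.append(object_name)
    return uploaded
```

For example, `upload_models(client, "<EXPERIMENT_NAME>", "Lymphoma_nnUNet_Bundle/models", ["FL_global_model.pt", "best_FL_global_model.pt"])` reproduces the two uploads above.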


### Resume FL Training

To resume the Federated Learning training from an existing checkpoint:

1. Download the global models from the server (`sess.download_job_result(job_id)`) and upload them to the MONAI Bundle on the server (`<BUNDLE_ROOT>/models`).

2. Add the following parameters to the server configuration file:
    ```yaml
    nnunet_trainer_def:
      pretrained_model: "<BUNDLE_ROOT>/models/FL_global_model.pt"
    ```

3. Update the Experiment Configuration file with the new `start_round` and `num_rounds` values:

    ```yaml
    start_round: <START_ROUND>
    num_rounds: <INITIAL_ROUNDS - START_ROUND>
    ```
4. Re-execute the `generate_job_configs` script to update the job configurations.
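
The arithmetic in step 3 can be sketched as a small helper. The function name `resume_round_config` and the argument `initial_rounds` (the `num_rounds` value of the original run) are hypothetical; the two returned keys are the ones used in the Experiment configuration.

```python
def resume_round_config(initial_rounds, start_round):
    """Return the start_round/num_rounds pair for resuming an interrupted FL run."""
    if not 0 <= start_round < initial_rounds:
        raise ValueError("start_round must lie in [0, initial_rounds)")
    return {
        "start_round": start_round,
        # Only the rounds that have not been run yet remain to be scheduled.
        "num_rounds": initial_rounds - start_round,
    }


# A run planned for 100 rounds that stopped after round 40 has 60 rounds left.
resume_config = resume_round_config(initial_rounds=100, start_round=40)
```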

## 5. MONAI Bundle to nnUNet Conversion

Once the training is completed, the model can be converted to the nnUNet format. Because training is performed using the MONAI Bundle format, this conversion is required before the model can be validated with the nnUNet framework.

The conversion is done by the `convert_model` job.

In [None]:
from pathlib import Path
job_id = sess.submit_job(str(Path("<JOB_DIR>").joinpath("jobs","convert_model")))

In [None]:
sess.monitor_job(job_id)

In [None]:
client_id = "KTH-Cloud"
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
print(sess.api.do_command("cat server log.txt")['data'][0]['data'])

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
from pathlib import Path


with open(Path(job_dir).joinpath("workspace","model_convert","path.txt"),"r") as f:
    print(f.read())

## 5.1 Fed Model Conversion

In [None]:
job_id = sess.submit_job(str(Path(JOB_DIR).joinpath("jobs","convert_fed_model")))

## 6. Validation

After the model has been converted to the nnUNet format, it can be validated with the nnUNet framework by running the `validate` job.

In [None]:
from pathlib import Path
job_id = sess.submit_job(str(Path("<JOB_DIR>").joinpath("jobs","validate")))

In [None]:
print(sess.api.do_command(f"cat {client_id} logfile_validate_out.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_validate_err.log")['data'][0]['data'])

In [None]:
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

To extract the validation results and display them in a table, run the following commands:

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
import json
from pathlib import Path


with open(Path(job_dir).joinpath("workspace","validation","summary.json"),"r") as f:
    summary = json.load(f)

In [None]:
config_dict = {
    "label_dict": {
        "background": 0,
        "Lesion": 1,
    },
    "label_suffix": ".nii.gz",
}

In [None]:
df = []

label_to_name = {v: k for k, v in config_dict["label_dict"].items()}

for case in summary['metric_per_case']:
    for label_id in case['metrics']:
        for metric in case['metrics'][label_id]:
            df.append({
                "Case": Path(case['reference_file']).name[:-len(config_dict["label_suffix"])],
                "Label": label_to_name[int(label_id)],
                "Metric": metric,
                "Value": case['metrics'][label_id][metric]
            })

In [None]:
import pandas as pd
df = pd.DataFrame(df)
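
Besides the per-case table, a per-label average is often the first number to look at. The sketch below computes it directly from the `summary` dictionary, without pandas. It assumes the `metric_per_case` layout used in the loop above and that the metric of interest is stored under the key `"Dice"`; the function name `mean_metric_per_label` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_metric_per_label(summary, label_to_name, metric="Dice"):
    """Average one metric over all cases, separately for each label."""
    values = defaultdict(list)
    for case in summary["metric_per_case"]:
        for label_id, metrics in case["metrics"].items():
            value = metrics.get(metric)
            # Skip missing entries and NaN (NaN is the only value not equal to itself).
            if value is not None and value == value:
                values[label_to_name[int(label_id)]].append(value)
    return {label: mean(label_values) for label, label_values in values.items()}


# Tiny synthetic summary, only to show the expected input layout.
example_summary = {
    "metric_per_case": [
        {"reference_file": "case_001.nii.gz", "metrics": {"1": {"Dice": 0.5}}},
        {"reference_file": "case_002.nii.gz", "metrics": {"1": {"Dice": 1.0}}},
    ]
}
example_means = mean_metric_per_label(example_summary, {0: "background", 1: "Lesion"})
```

On the real data, the call would be `mean_metric_per_label(summary, label_to_name)`.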

In [None]:
import dtale.app as dtale_app
import dtale

dtale_app.JUPYTER_SERVER_PROXY = False

d = dtale.show(df,
               host="127.0.0.1",
              )

In [None]:
DTALE_URL = d._main_url

In [None]:
DTALE_URL

## 7. Package MONAI Bundle


After the model has been validated, it can be packaged into a MONAI Bundle containing the model weights and the configuration files needed to run the model. The packaging is done by the `package_monai_bundle` job.

The MONAI Bundle will be stored in the MLFlow server, available for download and deployment.

To combine the MONAI Bundle with an existing MLFlow experiment, run the following command:
```bash
nvflare job create -j <JOB_DIR>/jobs/package_monai_bundle -w <JOB_DIR>/package_monai_bundle  -sd nnUNet_NVFlare/ -f package_monai_bundle-client-<CLIENT_NAME>/config_fed_client.conf MLFlow_Run_ID=<MLFLOW_RUN_ID>
```

In [None]:
from pathlib import Path


job_id = sess.submit_job(str(Path("<JOB_DIR>").joinpath("jobs","package_monai_bundle")))

In [None]:
client_id = "<CLIENT_NAME>"

print(sess.api.do_command(f"cat {client_id} logfile_package_out.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} logfile_package_err.log")['data'][0]['data'])
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])

In [None]:
job_dir = sess.download_job_result(job_id)

### 7.1 Package external nnUNet Experiments

If the nnUNet experiment has been planned, preprocessed and trained independently of the NVFlare API, the model can still be packaged into a MONAI Bundle using the `package_monai_bundle` job.
To package an external nnUNet experiment, run the following command:
```bash
nvflare job create -j <JOB_DIR>/jobs/package_monai_bundle -w <JOB_DIR>/package_monai_bundle  -sd nnUNet_NVFlare/ -f package_monai_bundle-client-<CLIENT_NAME>/config_fed_client.conf external_training=True
```

## 8. Cross Site Evaluation

In [None]:
!nvflare job create -j jobs/cross_Site_Evaluation_nnUNet -w cross_Site_Evaluation_nnUNet/  -sd nnUNet_NVFlare/ --force 

In [None]:
from pathlib import Path


job_id = sess.submit_job(str(Path("..").joinpath("..","jobs","cross_Site_Evaluation_nnUNet")))

In [None]:
sess.abort_job(job_id)

In [None]:
client_id = "msf"
print(sess.api.do_command(f"cat {client_id} {job_id}/log.txt")['data'][0]['data'])
#print(sess.api.do_command(f"ls {client_id} Cross-Site-Validation/kth-cloud_positive_baseline_v2/extra_files")['data'][0]['data'])
#print(sess.api.do_command(f"ls {client_id} Cross-Site-Validation/bundles/kth-cloud_positive_baseline_v2/models")['data'][0]['data'])
#print(sess.api.do_command(f"ls {client_id} Cross-Site-Validation/inputs")['data'][0]['data'])

In [None]:
job_dir = sess.download_job_result(job_id)

In [None]:
import json
from pathlib import Path


with open(Path(job_dir).joinpath("workspace","validation","summary.json"),"r") as f:
    cross_site_summary = json.load(f)