# FedLymphoma

## Connect to the Federation using NVFlare

In [None]:
username = "<YOUR_USERNAME>"

In [None]:
from nvflare.fuel.flare_api.flare_api import new_secure_session

sess = new_secure_session(
    username,
    f"StartupKit/{username}",
)

In [None]:
print(sess.get_system_info())

## List Jobs

In [None]:
jobs = sess.list_jobs()

In [None]:
jobs[-1]
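
Picking a job by list position depends on the order the server returns. The sketch below selects the most recently submitted job instead. It assumes each entry returned by `sess.list_jobs()` is a dictionary with `job_id`, `status` and `submit_time` keys; that record shape is an assumption, so inspect `jobs[-1]` to confirm it. `latest_job` is a hypothetical helper, not part of NVFlare.

```python
from typing import Dict, List, Optional


def latest_job(jobs: List[Dict], status: Optional[str] = None) -> Optional[Dict]:
    """Return the most recently submitted job, optionally filtered by status.

    Assumes (hypothetically) that each job record carries "status" and
    "submit_time" keys, with submit_time as an ISO-8601 string.
    """
    candidates = [j for j in jobs if status is None or j.get("status") == status]
    if not candidates:
        return None
    # ISO-8601 timestamps in the same timezone sort chronologically as strings.
    return max(candidates, key=lambda j: j.get("submit_time", ""))
```

With a live session this could be called as `latest_job(sess.list_jobs(), status="RUNNING")`, where the status string is again an assumption.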

## Terminate Jobs

In [None]:
job_id = "<JOB_ID>"  # ID of the job to stop, as reported by sess.list_jobs()
sess.abort_job(job_id)

## Job Preparation

The first step in the Federated Learning process is to prepare the job configuration files. The job configuration files contain the necessary information to run the job on the Federated Learning cluster, such as the job type, the resources required, and the parameters for the job execution.

The job configuration files are generated automatically by the script `nvflare_generate_job_configs`, which is installed together with this package. The script takes the client-specific configuration files and the experiment-specific configuration file as input, and generates the job configuration files for each client.


### Client Configuration
The client-specific configuration file should be in the following format:

```yaml
data_dir: "<DATASET_FOLDER>"
modality_dict:
  ct: "<CT_SUFFIX>"
  label: "<SEG_MASK_SUFFIX>"
dataset_format: "<DATASET_FORMAT>"
patient_id_in_file_identifier: True
nnunet_root_folder: "<NNUNET_ROOT_FOLDER>"
client_name: "<CLIENT_NAME>"
subfolder_suffix: "<SUBFOLDER_SUFFIX>[OPTIONAL]"
bundle_root: "<BUNDLE_ROOT>[OPTIONAL]"
```

where:

`dataset_format` must be one of the following three formats, depending on how `data_dir` is organized:
1. `subfolders`: The dataset is organized in subfolders, where each subfolder corresponds to a subject and contains the images and labels for that subject.
```plaintext
  [Dataset_folder]
        [Subject_0]
            - Subject_0_image0.nii.gz    # Subject_0 modality 0
            - Subject_0_image1.nii.gz    # Subject_0 modality 1
            - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        [Subject_1]
            - Subject_1_image0.nii.gz    # Subject_1 modality 0
            - Subject_1_image1.nii.gz    # Subject_1 modality 1
            - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
        ...

```
2. `decathlon`: The dataset is organized in the format of the Medical Segmentation Decathlon challenge, where the images and labels are stored in separate folders.
```plaintext
  [Dataset_folder]
        [imagesTr]
            - Subject_0_image0.nii.gz    # Subject_0 modality 0
            - Subject_0_image1.nii.gz    # Subject_0 modality 1
            - Subject_1_image0.nii.gz    # Subject_1 modality 0
            - Subject_1_image1.nii.gz    # Subject_1 modality 1
        [labelsTr]
            - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
            - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        ...

```

3. `nnunet`: The dataset has already been prepared according to the nnUNet framework, with the images and labels stored in separate folders.
```plaintext
  [nnUNet_raw]
      [DatasetXYZ_TaskName]  # THIS IS THE DATASET FOLDER
          dataset.json
          [imagesTr]
              - Subject_0_image0.nii.gz    # Subject_0 modality 0
              - Subject_0_image1.nii.gz    # Subject_0 modality 1
              - Subject_1_image0.nii.gz    # Subject_1 modality 0
              - Subject_1_image1.nii.gz    # Subject_1 modality 1
          [labelsTr]
              - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
              - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
        ...

```
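
The three layouts differ mainly in whether `imagesTr`/`labelsTr` folders and a `dataset.json` file are present. The sketch below guesses the format from those markers. `guess_dataset_format` is a hypothetical helper and not part of the package; a Decathlon download may also ship its own `dataset.json`, so treat the result as a hint for filling in `dataset_format`, not as a definitive check.

```python
from pathlib import Path


def guess_dataset_format(data_dir: str) -> str:
    """Guess the dataset_format value from the folder layout (heuristic)."""
    root = Path(data_dir)
    has_split_dirs = (root / "imagesTr").is_dir() and (root / "labelsTr").is_dir()
    if has_split_dirs and (root / "dataset.json").is_file():
        # imagesTr + labelsTr + dataset.json: looks like an nnUNet raw dataset.
        return "nnunet"
    if has_split_dirs:
        # imagesTr + labelsTr only: Decathlon-style layout.
        return "decathlon"
    # Otherwise assume one subfolder per subject.
    return "subfolders"
```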

`nnunet_root_folder` should refer to the root folder used by the nnUNet framework, where the nnUNet experiments are stored.
For the `subfolders` and `decathlon` dataset formats, this folder is created during the dataset preparation step.
For the `nnunet` dataset format, this folder should already contain the nnUNet experiments, with the following structure:
```plaintext
  [nnunet_root_folder]
      [nnUNet_raw_data_base]
          [DatasetXYZ_TaskName]
              dataset.json
              [imagesTr]
                  - Subject_0_image0.nii.gz    # Subject_0 modality 0
                  - Subject_0_image1.nii.gz    # Subject_0 modality 1
                  - Subject_1_image0.nii.gz    # Subject_1 modality 0
                  - Subject_1_image1.nii.gz    # Subject_1 modality 1
              [labelsTr]
                  - Subject_1_mask.nii.gz      # Subject_1 semantic segmentation mask
                  - Subject_0_mask.nii.gz      # Subject_0 semantic segmentation mask
            ...

```

`modality_dict` is a dictionary that maps the modality names to the file suffixes. The suffixes are used to identify the files that correspond to the different modalities in the dataset. For example, if the CT images have the suffix `_CT.nii.gz`, the entry in the `modality_dict` should be `ct: "_CT.nii.gz"`.


`patient_id_in_file_identifier` is a flag that specifies whether the patient ID is included in the file name. If it is set to `True`, the patient ID is extracted from the file name. If it is set to `False`, the patient ID is extracted from the file path, and the file name should contain only the modality suffix.

`client_name` is a unique identifier for the client.

`subfolder_suffix` is an optional parameter that specifies the suffix of the subfolders that contain the images and labels for each subject. This parameter is used when the dataset is organized in subfolders, and the subfolders have a specific suffix that needs to be removed to extract the patient ID.
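
The interaction between `modality_dict`, `patient_id_in_file_identifier` and `subfolder_suffix` can be illustrated with a small sketch. It shows one plausible reading of the rules above; the package's actual extraction logic may differ, and `extract_patient_id` is a hypothetical name.

```python
from pathlib import Path


def extract_patient_id(
    file_path: str,
    modality_suffix: str,
    patient_id_in_file_identifier: bool = True,
    subfolder_suffix: str = "",
) -> str:
    """Derive a patient ID from a file path (illustrative interpretation)."""
    path = Path(file_path)
    if patient_id_in_file_identifier:
        name = path.name
        if not name.endswith(modality_suffix):
            raise ValueError(f"{name!r} does not end with {modality_suffix!r}")
        # File name = <patient ID> + <modality suffix>.
        return name[: len(name) - len(modality_suffix)]
    # Otherwise the ID comes from the subject folder, minus any folder suffix.
    folder = path.parent.name
    if subfolder_suffix and folder.endswith(subfolder_suffix):
        folder = folder[: len(folder) - len(subfolder_suffix)]
    return folder
```

For example, under these assumptions `Subject_0/Subject_0_CT.nii.gz` with the suffix `_CT.nii.gz` yields `Subject_0`.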

`bundle_root` is an optional parameter that specifies the root folder where the MONAI Bundle is stored.
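
A client configuration can be sanity-checked before any jobs are generated. The sketch below works on an already-parsed configuration dictionary and treats every key not marked `[OPTIONAL]` in the template as required; that rule and the `validate_client_config` helper are assumptions for illustration, not behavior of the package.

```python
from typing import Dict, List

REQUIRED_KEYS = {
    "data_dir",
    "modality_dict",
    "dataset_format",
    "patient_id_in_file_identifier",
    "nnunet_root_folder",
    "client_name",
}
OPTIONAL_KEYS = {"subfolder_suffix", "bundle_root"}
DATASET_FORMATS = {"subfolders", "decathlon", "nnunet"}


def validate_client_config(cfg: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    keys = set(cfg)
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing required key: {key}")
    for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unknown key: {key}")
    fmt = cfg.get("dataset_format")
    if fmt is not None and fmt not in DATASET_FORMATS:
        problems.append(f"unsupported dataset_format: {fmt}")
    if "modality_dict" in cfg and not isinstance(cfg["modality_dict"], dict):
        problems.append("modality_dict must map modality names to suffixes")
    return problems
```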

### Experiment Configuration

The experiment-specific configuration file should be in the following format:

```yaml
dataset_name_or_id: "<DATASET_NAME_OR_ID>"
experiment_name: "<EXPERIMENT_NAME>"
tracking_uri: "<TRACKING_URI>"
mlflow_token: "<MLFLOW_TOKEN>"
nnunet_trainer: "<NNUNET_TRAINER>[OPTIONAL]"
```
`dataset_name_or_id` and `experiment_name` refer to the nnUNet Dataset ID and Experiment Name, respectively. These values identify the experiment in the nnUNet framework. `experiment_name` is also used to identify the experiment in the MLFlow server, and to name the zipped nnUNet model file (`<experiment_name>.zip`).


`mlflow_token` and `tracking_uri` are used to connect to the MLFlow server, where the experiments are logged, and the trained models are uploaded.

`nnunet_trainer` is an optional parameter that specifies the nnUNet trainer to be used for training. If this parameter is not specified, the default nnUNet trainer will be used.
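
The defaults and derived names described above can be sketched as follows. The default trainer name `nnUNetTrainer` matches the value used in the example configuration later in this notebook, but treating it as the fallback is an assumption; `resolve_experiment` and the `model_archive` key are hypothetical and only illustrate the `<experiment_name>.zip` naming rule.

```python
from typing import Dict

DEFAULT_TRAINER = "nnUNetTrainer"  # assumed default trainer name


def resolve_experiment(cfg: Dict) -> Dict:
    """Fill in the optional trainer and derive the model archive name."""
    resolved = dict(cfg)  # do not mutate the caller's dictionary
    resolved.setdefault("nnunet_trainer", DEFAULT_TRAINER)
    # The zipped model file is named after the experiment.
    resolved["model_archive"] = f"{resolved['experiment_name']}.zip"
    return resolved
```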


### Prepare the Job Configuration Files

To prepare the job configuration files, call the following function:

```python
monai.nvflare.nvflare_generate_job_configs.generate_configs(client_files, experiment_file, script_dir, job_dir)
```
where:
- `client_files` is the list of client-specific configuration files, one per client.
- `experiment_file` is the experiment-specific configuration file.
- `script_dir` is the directory where the job scripts and Python files are stored.
- `job_dir` is the directory where the job configuration files will be stored.
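
Because the function takes plain paths, a missing file is easy to overlook. The sketch below is a pre-flight check that can be run on the same arguments before generating the configuration files; `missing_inputs` is a hypothetical helper and not part of the package.

```python
from pathlib import Path
from typing import List, Sequence


def missing_inputs(
    client_files: Sequence[str], experiment_file: str, script_dir: str
) -> List[str]:
    """Return the input paths that do not exist (empty list means all present)."""
    missing = [
        str(p)
        for p in map(Path, [*client_files, experiment_file])
        if not p.is_file()
    ]
    if not Path(script_dir).is_dir():
        missing.append(str(script_dir))
    return missing
```

If the returned list is non-empty, fix the listed paths before generating the job configuration files.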

In [None]:
import sys
from pathlib import Path
 
ROOT_FOLDER = "/home/maia-user/shared"
sys.path.append(ROOT_FOLDER)

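Path(ROOT_FOLDER).joinpath("Experiments").mkdir(parents=True, exist_ok=True)
Path(ROOT_FOLDER).joinpath("Clients").mkdir(parents=True, exist_ok=True)
# The requirements.txt cell below writes into src/Lymphoma, so create it up front.
Path(ROOT_FOLDER).joinpath("src", "Lymphoma").mkdir(parents=True, exist_ok=True)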

In [None]:
%%writefile /home/maia-user/shared/Experiments/Lymphoma.yaml
dataset_name_or_id: "<EXPERIMENT_ID>"
experiment_name: "<EXPERIMENT_NAME>"
tracking_uri: "<MLFLOW_URI>"
nnunet_trainer: "nnUNetTrainer"

In [None]:
%%writefile /home/maia-user/shared/Clients/site-1.yaml
data_dir: "<DATASET_SITE_1_FOLDER>"
modality_dict:
  ct: ".nii.gz"
  label: ".nii.gz"
dataset_format: "decathlon"
patient_id_in_file_identifier: True
nnunet_root_folder: "<nnunet_root_folder>"
client_name: "site-1"

In [None]:
%%writefile /home/maia-user/shared/Clients/site-2.yaml
data_dir: "<DATASET_SITE_2_FOLDER>"
modality_dict:
  ct: "_CT.nii.gz"
  label: "_label.nii.gz"
dataset_format: "subfolders"
patient_id_in_file_identifier: True
nnunet_root_folder: "<nnunet_root_folder>"
client_name: "site-2"

In [None]:
%%writefile /home/maia-user/shared/src/Lymphoma/requirements.txt

fire
pytorch-ignite==0.4.11
pydicom==2.4.4
minio
numpy==1.24.0
pandas
mlflow
monai-nvflare==0.2.4
odict
pyhocon
monai[all]
git+https://github.com/SimoneBendazzoli93/MONAI.git@dev
git+https://github.com/SimoneBendazzoli93/nnUNet.git
#git network_architectures
#nnunetv2==2.5.1

In [None]:
experiment = "Lymphoma"  # must match Experiments/Lymphoma.yaml and src/Lymphoma written above
clients = [
    "site-1",
    "site-2"
]

Path(ROOT_FOLDER).joinpath("src").joinpath(experiment).mkdir(parents=True, exist_ok=True)

## Install MONAI with NVFlare Support

To install MONAI with NVFlare support, run the following command:

```python
!pip install git+https://github.com/SimoneBendazzoli93/MONAI.git@dev
```

In [None]:
%%bash
pip install git+https://github.com/SimoneBendazzoli93/MONAI.git@dev

In [None]:
from monai.nvflare.nvflare_generate_job_configs import generate_configs

generate_configs(
    [str(Path(ROOT_FOLDER).joinpath("Clients", client + ".yaml")) for client in clients],
    str(Path(ROOT_FOLDER).joinpath("Experiments", experiment + ".yaml")),
    str(Path(ROOT_FOLDER).joinpath("src", experiment)),
    str(Path(ROOT_FOLDER).joinpath("Jobs", experiment)),
    # nvflare_exec="/home/maia-user/.conda/envs/NVFlare/bin/nvflare",
)

In [None]:
JOB_DIR = str(Path(ROOT_FOLDER).joinpath("Jobs", experiment))