# Task1.2: Running nnU-Net

As seen in the previous task, the vanilla U-Net already achieves reasonable performance. However, there is still room for improvement.

In this task, we will use the same dataset as in the previous one, ACDC. There are many popular variants of the vanilla U-Net, one of which is nnU-Net. nnU-Net (No New U-Net) is a self-configuring deep learning framework designed specifically for medical image segmentation. It was introduced as a baseline that automatically adapts its architecture, preprocessing, and training pipeline to the characteristics of a given dataset, which makes it applicable across segmentation tasks without manual tuning. Key features of nnU-Net include:

+ Out-of-the-box Performance: It automatically configures the model to achieve state-of-the-art performance on various datasets without extensive user intervention.
+ Data-Driven Configurations: It customizes aspects like network architecture, patch size, and batch size depending on the size and properties of the input data.
+ Cross-dataset Applicability: nnU-Net works well across different datasets and imaging modalities, such as MRI, CT, and X-ray.
+ 3D and 2D Models: It supports both 2D and 3D U-Net models, adapting based on dataset dimensionality.
+ Robust Preprocessing: The framework includes built-in data augmentation and normalization techniques to enhance performance and generalization.

<img src="../img/nnU-Net.png" width="800" height="600">

For details of nnU-Net, please refer to the following paper.

> Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.

You need to run the pipeline on the ACDC dataset by following the documentation at https://github.com/MIC-DKFZ/nnUNet/tree/master/documentation.

## Install nnU-Net

In [None]:
%pip install nnunetv2

## Define Environment Variables

nnU-Net requires the following environment variables to be set: `nnUNet_raw`, `nnUNet_preprocessed`, `nnUNet_results`.

Here we define these variables.

In [None]:
# Define some variables
%env nnUNet_raw=../data/nnUNet/nnUNet_raw
%env nnUNet_preprocessed=../data/nnUNet/nnUNet_preprocessed
%env nnUNet_results=../data/nnUNet/nnUNet_results
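
Outside a notebook, the `%env` magic is not available. The sketch below shows one way to set the same three variables from plain Python and create the folders at the same time. The helper name `configure_nnunet_dirs` is hypothetical, and the demonstration uses a temporary directory rather than `../data/nnUNet`. As far as we know, nnU-Net reads these variables when `nnunetv2` is imported, so they should be set beforehand.

```python
import os
import tempfile


def configure_nnunet_dirs(base_dir):
    """Create the three nnU-Net folders under base_dir and export their paths.

    Hypothetical helper for illustration; it is not part of nnU-Net.
    """
    paths = {}
    for name in ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results"):
        path = os.path.join(base_dir, name)
        os.makedirs(path, exist_ok=True)  # safe to call repeatedly
        os.environ[name] = path           # visible to child processes as well
        paths[name] = path
    return paths


# Demonstration in a throwaway directory (assumption: in this task the base
# directory would be ../data/nnUNet instead).
demo_paths = configure_nnunet_dirs(tempfile.mkdtemp())
for name, path in demo_paths.items():
    print(f"{name} -> {path}")
```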

To use the dataset conversion script, we need to slightly modify the layout of the downloaded ACDC dataset: the target path must contain the extracted `training` and `testing` sub-folders.

In [None]:
import os
import shutil

# todo: Enter the same path as in the previous task.
# Hint: path_ACDC_all and path_ACDC_separate must be two different folders; the first one should already have been created in Task1.1.
path_ACDC_all = None
path_ACDC_separate = None

if os.path.exists(path_ACDC_separate):
    shutil.rmtree(path_ACDC_separate)
os.makedirs(path_ACDC_separate)

training_path = os.path.join(path_ACDC_separate, "training")
testing_path = os.path.join(path_ACDC_separate, "testing")
os.makedirs(training_path)
os.makedirs(testing_path)

for folder in os.listdir(path_ACDC_all):
    if folder.startswith("patient"):
        patient_name = folder
        patient_id = int(patient_name.split("t")[2])
        if patient_id <= 100:
            shutil.copytree(os.path.join(path_ACDC_all, folder), os.path.join(training_path, folder))
        else:
            shutil.copytree(os.path.join(path_ACDC_all, folder), os.path.join(testing_path, folder))
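
The expression `patient_name.split("t")[2]` works because "patient" contains exactly two letters "t". A more explicit alternative, shown here only as a sketch, parses the ID with a regular expression and returns the split without copying anything. The helper name `split_patient_folders` and the example folder names are hypothetical.

```python
import re

# Matches folder names such as "patient001" and captures the numeric ID.
PATIENT_RE = re.compile(r"^patient(\d+)$")


def split_patient_folders(folder_names, last_training_id=100):
    """Split patient folder names into (training, testing) by numeric ID."""
    training, testing = [], []
    for name in sorted(folder_names):
        match = PATIENT_RE.match(name)
        if match is None:
            continue  # skip anything that is not a patient folder
        patient_id = int(match.group(1))
        if patient_id <= last_training_id:
            training.append(name)
        else:
            testing.append(name)
    return training, testing


# Made-up directory listing for illustration.
names = ["patient001", "patient100", "patient101", "patient150", "README.md"]
training, testing = split_patient_folders(names)
print("training:", training)
print("testing:", testing)
```

Passing `os.listdir(path_ACDC_all)` as `folder_names` gives a quick way to check the split sizes before any files are copied.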

There is a ready-to-use script that converts the ACDC dataset to the format required by nnU-Net. Please find the script and put it inside the `Task1.2` folder, renaming it to `dataset_conversion.py`.

In [None]:
!python ./dataset_conversion.py -i $path_ACDC_separate

What is the suffix of the data in `imagesTr`? What does it represent? Can you think of a case in which it needs to be changed?

The suffix is `_0000`, a four-digit channel identifier. It indicates that there is only one modality (short-axis cine MRI) in the dataset. If a dataset has multiple modalities, each modality of a case gets its own identifier (`_0000`, `_0001`, and so on) so that multi-modality training can be performed.
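
To make the naming scheme concrete, the sketch below splits an image file name into its case identifier and channel index. It assumes the `.nii.gz` file ending used by ACDC. The helper name `parse_image_name` and the second example file name are hypothetical.

```python
import re

# nnU-Net image files follow the pattern {case_identifier}_{XXXX}.{file_ending},
# where XXXX is a four-digit channel index. The file ending is assumed to be .nii.gz.
IMAGE_NAME_RE = re.compile(r"^(?P<case>.+)_(?P<channel>\d{4})\.nii\.gz$")


def parse_image_name(file_name):
    """Return (case_identifier, channel_index) for an nnU-Net image file name."""
    match = IMAGE_NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"not an nnU-Net image file name: {file_name}")
    return match.group("case"), int(match.group("channel"))


# Single-channel case, as in ACDC.
print(parse_image_name("patient001_frame01_0000.nii.gz"))
# Made-up second channel of a multi-modality dataset.
print(parse_image_name("case42_0001.nii.gz"))
```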

By default, nnU-Net trains for 1000 epochs. To keep the training time manageable, please change `self.num_epochs` to 200 in `<virtual env>/lib/<python>/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py`.
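
The exact location of that file depends on your environment. The sketch below asks Python where an installed module lives, using only the standard library. The helper name `locate_module_file` is hypothetical, and the dotted module name is derived from the path given above.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file of an importable module, or None if it is missing."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is not installed.
        return None
    return spec.origin if spec is not None else None


# Prints the path of nnUNetTrainer.py, or None if nnunetv2 is not installed.
print(locate_module_file("nnunetv2.training.nnUNetTrainer.nnUNetTrainer"))
```

Keep in mind that editing a file in `site-packages` affects every project that uses the same environment, and the change is lost when the package is reinstalled.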

Now that the dataset is prepared, you only need to complete the following steps to run nnU-Net on the ACDC dataset.

+ Preprocess the dataset.
+ Train the 2D model using the training set. The training progress can be tracked in the `nnUNet_results` folder.
+ Determine the best model configuration.
+ Run inference on the testing set and store the results in the `labelsTs` folder.
+ Apply post-processing to the results.

In [None]:
!nnUNetv2_plan_and_preprocess -d 027 --verify_dataset_integrity

In [None]:
for fold in range(5):  # the later steps expect cross-validation folds 0-4
    !nnUNetv2_train 027 2d {fold} --npz

In [None]:
!nnUNetv2_find_best_configuration 027 -c 2d

In [None]:
path_ACDC_testing_image = os.path.join("../data/nnUNet/nnUNet_raw/Dataset027_ACDC/", "imagesTs")
path_ACDC_testing_label = os.path.join("../data/nnUNet/nnUNet_raw/Dataset027_ACDC/", "labelsTs")
# post-processed predictions
path_ACDC_testing_label_pp = os.path.join("../data/nnUNet/nnUNet_raw/Dataset027_ACDC/", "labelsTs_PP")
os.makedirs(path_ACDC_testing_label, exist_ok=True)
os.makedirs(path_ACDC_testing_label_pp, exist_ok=True)

In [None]:
!nnUNetv2_predict -d Dataset027_ACDC -i $path_ACDC_testing_image -o $path_ACDC_testing_label -f 0 1 2 3 4 -tr nnUNetTrainer -c 2d -p nnUNetPlans

In [None]:
!nnUNetv2_apply_postprocessing -i $path_ACDC_testing_label -o $path_ACDC_testing_label_pp -pp_pkl_file $$nnUNet_results/Dataset027_ACDC/nnUNetTrainer__nnUNetPlans__2d/crossval_results_folds_0_1_2_3_4/postprocessing.pkl -np 8 -plans_json $$nnUNet_results/Dataset027_ACDC/nnUNetTrainer__nnUNetPlans__2d/crossval_results_folds_0_1_2_3_4/plans.json
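
The long paths in the post-processing command follow a regular pattern: the model folder is named `<trainer>__<plans>__<configuration>`, and the cross-validation folder lists the folds that were used. The sketch below rebuilds both file paths from these parts. The helper name `crossval_results_dir` is hypothetical, and the pattern is inferred only from the paths used in the command above.

```python
import os


def crossval_results_dir(results_root, dataset, trainer="nnUNetTrainer",
                         plans="nnUNetPlans", configuration="2d",
                         folds=(0, 1, 2, 3, 4)):
    """Build the cross-validation results folder used in the command above."""
    model_dir = f"{trainer}__{plans}__{configuration}"
    folds_dir = "crossval_results_folds_" + "_".join(str(f) for f in folds)
    return os.path.join(results_root, dataset, model_dir, folds_dir)


cv_dir = crossval_results_dir("../data/nnUNet/nnUNet_results", "Dataset027_ACDC")
pp_pkl_file = os.path.join(cv_dir, "postprocessing.pkl")
plans_json = os.path.join(cv_dir, "plans.json")
print(pp_pkl_file)
print(plans_json)
```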

Use `torchmetrics` to calculate the Dice score and the Jaccard index, as in the previous task, for the last 30 test patients (121 to 150) after post-processing. Report the scores.
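
As a reminder of what the two metrics measure, here is a minimal plain-Python sketch of the per-class definitions, Dice = 2TP / (2TP + FP + FN) and Jaccard = TP / (TP + FP + FN). It is not the `torchmetrics` implementation, whose aggregated numbers depend on the chosen averaging. The function name `dice_and_jaccard` and the tiny label lists are made up for illustration.

```python
def dice_and_jaccard(pred, target, class_id):
    """Compute Dice and Jaccard for one class from two flat label sequences."""
    pairs = list(zip(pred, target))
    tp = sum(p == class_id and t == class_id for p, t in pairs)  # true positives
    fp = sum(p == class_id and t != class_id for p, t in pairs)  # false positives
    fn = sum(p != class_id and t == class_id for p, t in pairs)  # false negatives
    if tp + fp + fn == 0:
        # The class is absent from both prediction and target: undefined here.
        return float("nan"), float("nan")
    dice = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return dice, jaccard


# Toy flattened masks with labels 0 (background), 1, and 2.
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 1]
for class_id in (1, 2):
    dice, jaccard = dice_and_jaccard(pred, target, class_id)
    print(f"class {class_id}: Dice={dice:.3f}, Jaccard={jaccard:.3f}")
```

For a single class the two scores are related by Jaccard = Dice / (2 - Dice), so the Jaccard index is never higher than the Dice score.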

In [None]:
import random
import numpy as np
import torchmetrics
import torch

# Set the seed for reproducibility
torch.manual_seed(0)
np.random.seed(0)
torch.cuda.manual_seed(0)
random.seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

In [None]:
import sys

sys.path.append("..")
import nibabel as nib
from tqdm import tqdm
from utils.Task1_utils import load_config

config = load_config("../config/Task1.1_config.yaml")
NUM_CLASSES = config["dataset"]["number_classes"]
metrics = torchmetrics.MetricCollection(
    [
        # todo Evaluate Dice and Jaccard Index
        None,
        None
    ],
    prefix="metrics/",
)
test_metrics = metrics.clone(prefix="test_metrics/").to(device)
test_evaluator = test_metrics.clone().to(device)

for file in tqdm(os.listdir(path_ACDC_testing_label_pp)):
    if file.startswith("patient"):
        patient_name = file.split(".")[0]
        patient_id = int(patient_name.split("_")[0].split("t")[2])
        frame = patient_name.split("_")[1].split("e")[1]
        if patient_id > 120:
            # image_data = nib.load(os.path.join(path_ACDC_testing_image, f"{patient_name}_0000.nii.gz")).get_fdata()
            mask_data = nib.load(
                os.path.join(
                    path_ACDC_all,
                    f"patient{patient_id}",
                    f"patient{patient_id}_frame{frame}_gt.nii.gz",
                )
            ).get_fdata()
            pred_data = nib.load(
                os.path.join(path_ACDC_testing_label_pp, f"{patient_name}.nii.gz")
            ).get_fdata()
            for s in range(mask_data.shape[2]):
                # Move both tensors to the same device as the metric collection.
                preds_ = torch.from_numpy(pred_data[:, :, s].astype(np.uint8)).to(device)
                masks_ = torch.from_numpy(mask_data[:, :, s].astype(np.uint8)).to(device)
                test_evaluator.update(preds_, masks_)

In [None]:
print(test_evaluator.compute())

Please report the Dice score and the Jaccard index for nnU-Net. How do they compare to those of the vanilla U-Net?