# Distribution-Aware Replay for Continual MRI Segmentation - Preprocessing
This section covers the preparation and preprocessing of the data used in our paper.
It assumes that you have set the nnUNet-related paths according to [this file](documentation/setting_up_paths.md).
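The cells below read two of these paths directly, `nnUNet_raw_data_base` and `nnUNet_preprocessed`. Before running anything, you can check that both are set. The helper below is a hypothetical sketch, not part of nnUNet. It only checks that each variable is set and points to an existing folder.

```python
import os

# The environment variables that the cells in this section read directly
REQUIRED_VARS = ("nnUNet_raw_data_base", "nnUNet_preprocessed")


def missing_nnunet_paths(env=None):
    """Return the required variables that are unset or point to a non-existing folder.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS
            if not env.get(name) or not os.path.isdir(env[name])]
```

Calling `missing_nnunet_paths()` in a notebook cell should return an empty list before you continue.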

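Download the hippocampus and prostate data from the links below and place them in the `$nnUNet_raw_data_base/nnUNet_raw_data` folder. For the prostate data, make sure that the `INPUT_PATH` variable of the preparation script below points to the folder that contains one subfolder per site.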


| Anatomy     | Link                                                          |
|-------------|---------------------------------------------------------------|
| Hippocampus | http://medicaldecathlon.com/                                  |
|             | https://datadryad.org/stash/dataset/doi:10.5061/dryad.gc72v   |
|             | http://www.hippocampal-protocol.net/SOPs/index.php            |
| Prostate    | https://liuquande.github.io/SAML/                             |

## Preparation of the Hippocampus Data

```python
raise NotImplementedError("This script is not yet implemented")
```

After this preparation step, you should have the following datasets:
- Task097_DecathHip
- Task098_Dryad
- Task099_HarP
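Whatever tool you use for this step, each task folder has to follow nnUNet's raw-data layout. That means an `imagesTr` folder, a `labelsTr` folder and a `dataset.json`, with every label `<case>.nii.gz` matched by an image `<case>_0000.nii.gz`. The hypothetical helper below checks this layout. It assumes a single input modality per task, as in all tasks of this document. It does not validate the image contents.

```python
import os

SUFFIX = ".nii.gz"


def check_raw_task(task_dir):
    """Return a list of layout problems found in one nnUNet raw task folder (empty list = OK)."""
    problems = []
    for sub in ("imagesTr", "labelsTr"):
        if not os.path.isdir(os.path.join(task_dir, sub)):
            problems.append(f"missing folder {sub}")
    if not os.path.isfile(os.path.join(task_dir, "dataset.json")):
        problems.append("missing dataset.json")
    if problems:
        return problems

    # Every label <case>.nii.gz needs an image <case>_0000.nii.gz (single modality assumed)
    images = set(os.listdir(os.path.join(task_dir, "imagesTr")))
    labels = sorted(f[:-len(SUFFIX)] for f in os.listdir(os.path.join(task_dir, "labelsTr"))
                    if f.endswith(SUFFIX))
    for case in labels:
        if f"{case}_0000{SUFFIX}" not in images:
            problems.append(f"no image {case}_0000{SUFFIX} for label {case}{SUFFIX}")
    return problems
```

For example, `check_raw_task(os.path.join(os.environ["nnUNet_raw_data_base"], "nnUNet_raw_data", "Task097_DecathHip"))` should return `[]`.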

## Preparation of the Prostate Data

```python
import os
from pathlib import Path

import numpy as np
import SimpleITK as sitk
from nnunet.dataset_conversion.utils import generate_dataset_json


def copy_meta_data(original, img):
    """Copy spacing, origin and direction from `original` to `img`."""
    img.SetSpacing(original.GetSpacing())
    img.SetOrigin(original.GetOrigin())
    img.SetDirection(original.GetDirection())


def generate_dataset_json_method(base_path, task_name, vendor):
    task_dir = os.path.join(base_path, task_name)
    generate_dataset_json(os.path.join(task_dir, "dataset.json"),
                          os.path.join(task_dir, "imagesTr"),
                          os.path.join(task_dir, "imagesTs"),
                          ('T2',),  # the prostate scans are T2-weighted MRI
                          {0: 'background', 1: 'foreground'},
                          task_name,
                          dataset_description="VENDOR " + vendor)


if __name__ == '__main__':
    # Folder with one subfolder per site; adjust if you placed the prostate data elsewhere
    INPUT_PATH = os.path.join(os.environ['nnUNet_raw_data_base'], "nnUNet_raw_data", "prostate_data")
    BASE_PATH = os.path.join(os.environ['nnUNet_raw_data_base'], "nnUNet_raw_data")
    TASK_NO = 10  # id of the first prostate task

    # Sorting makes the site -> task id assignment reproducible (directory order is arbitrary)
    sites = sorted(entry.name for entry in os.scandir(INPUT_PATH) if entry.is_dir())
    for site in sites:
        task_name = f"Task{TASK_NO:03d}_Prostate-{site}"
        task_dir = os.path.join(BASE_PATH, task_name)
        # Create the nnUNet raw-data folder structure
        for sub in ("imagesTr", "imagesTs", "labelsTr"):
            Path(task_dir, sub).mkdir(parents=True, exist_ok=True)

        site_dir = os.path.join(INPUT_PATH, site)
        for file in sorted(os.listdir(site_dir)):
            if not file.endswith(".nii.gz"):
                continue  # skip non-NIfTI files such as .DS_Store
            if "segmentation" in file.lower():  # both "_segmentation" and "_Segmentation" occur
                seg = sitk.ReadImage(os.path.join(site_dir, file))
                # Binarize: every non-zero voxel becomes foreground (1)
                arr = (sitk.GetArrayFromImage(seg) != 0).astype(np.uint8)
                label = sitk.GetImageFromArray(arr)
                # Take the geometry from the image with the same case prefix
                image = sitk.ReadImage(os.path.join(site_dir, file.split("_")[0] + ".nii.gz"))
                copy_meta_data(image, label)
                case_id = file.rpartition("_")[0]
                sitk.WriteImage(label, os.path.join(task_dir, "labelsTr", f"{site}_{case_id}.nii.gz"))
            else:  # image: add the site prefix and nnUNet's "_0000" modality suffix
                case_id = file[:-len(".nii.gz")]
                image = sitk.ReadImage(os.path.join(site_dir, file))
                sitk.WriteImage(image, os.path.join(task_dir, "imagesTr", f"{site}_{case_id}_0000.nii.gz"))

        generate_dataset_json_method(BASE_PATH, task_name, site)
        TASK_NO += 1
```
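The script renames every file so that nnUNet can pair images with labels. Images get the site prefix plus the `_0000` modality suffix, and labels get only the site prefix. The pure function below reproduces that mapping for a single file, which makes it easy to check the output names before converting a whole site.

Some details are assumptions:
- `nnunet_target` is a hypothetical name.
- The example file names (`Case00.nii.gz`, `Case00_Segmentation.nii.gz`) are illustrative guesses about the download layout.

```python
import os


def nnunet_target(site, file_name):
    """Map one file of a site folder to (subfolder, output file name), following the script's naming."""
    if "segmentation" in file_name.lower():
        case = file_name.rpartition("_")[0]  # "Case00_Segmentation.nii.gz" -> "Case00"
        return "labelsTr", f"{site}_{case}.nii.gz"
    # Strip both extensions: "Case00.nii.gz" -> "Case00.nii" -> "Case00"
    case = os.path.splitext(os.path.splitext(file_name)[0])[0]
    return "imagesTr", f"{site}_{case}_0000.nii.gz"
```

Because both branches use the same case prefix, an image and its label always end up as `<site>_<case>_0000.nii.gz` and `<site>_<case>.nii.gz`. This is the pairing nnUNet expects.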
        


By default, this script produces tasks with ids 10 to 15, one per site folder. If you want different ids for the prostate datasets, change the `TASK_NO` variable, which defines the first id. In the following sections and the other notebooks, we use the following task ids, whose names indicate the source dataset:
- Task011_Prostate-BIDMC
- Task012_Prostate-I2CVB
- Task013_Prostate-HK
- Task015_Prostate-UCL
- Task016_Prostate-RUNMC

If your tasks ended up with different ids or names, rename their folders in `nnUNet_raw_data` so that they match the list above.
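Renaming only changes the `TaskXXX` prefix of a folder. The hypothetical helper below does this under the assumption that folder names follow the `TaskXXX_Name` pattern used throughout this document. It only renames the folder and leaves the files inside, including `dataset.json`, untouched. It is best run before `nnUNet_plan_and_preprocess`, so that the preprocessed folder is created under the new name.

```python
import os
import re


def renumber_task(base_path, task_name, new_id):
    """Rename base_path/TaskXXX_Name to base_path/Task<new_id>_Name and return the new name."""
    match = re.fullmatch(r"Task\d{3}_(.+)", task_name)
    if match is None:
        raise ValueError(f"not an nnUNet task folder name: {task_name}")
    new_name = f"Task{new_id:03d}_{match.group(1)}"
    os.rename(os.path.join(base_path, task_name), os.path.join(base_path, new_name))
    return new_name
```

For example, `renumber_task(BASE_PATH, "Task010_Prostate-BIDMC", 11)` yields `Task011_Prostate-BIDMC`.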

## Preprocessing

After preparing the datasets, we can run the command provided by nnUNet to preprocess the data for training:

```bash
nnUNet_plan_and_preprocess -t 11
nnUNet_plan_and_preprocess -t 12
nnUNet_plan_and_preprocess -t 13
nnUNet_plan_and_preprocess -t 15
nnUNet_plan_and_preprocess -t 16

nnUNet_plan_and_preprocess -t 97
nnUNet_plan_and_preprocess -t 98
nnUNet_plan_and_preprocess -t 99
```
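The same commands can also be driven from Python, which stops at the first failing task instead of silently continuing. The sketch below assumes that `nnUNet_plan_and_preprocess` is on your `PATH`. It only uses the `-t` flag shown above, and the function names are hypothetical.

```python
import subprocess

# Task ids used in this document: prostate sites first, then the hippocampus datasets
TASK_IDS = [11, 12, 13, 15, 16, 97, 98, 99]


def plan_and_preprocess_commands(task_ids):
    """Build one nnUNet_plan_and_preprocess call per task, mirroring the cell above."""
    return [["nnUNet_plan_and_preprocess", "-t", str(t)] for t in task_ids]


def run_all(task_ids, dry_run=True):
    """Print each command; with dry_run=False also execute it and stop on the first error."""
    for cmd in plan_and_preprocess_commands(task_ids):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

`run_all(TASK_IDS)` prints the commands. `run_all(TASK_IDS, dry_run=False)` runs them.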

Next, we generate a train/val/test split. Since the nnUNet framework only generates a train/val split, we create the test split ourselves by moving some of the training cases into it. Unfortunately, nnUNet only creates its random split (`splits_final.pkl` in the task's `nnUNet_preprocessed` folder) when a training is started. We therefore start a dummy training for each dataset to force nnUNet to generate the initial train/val split. You can stop each training once the progress bar appears.

```bash
nnUNet_train_sequential 2d -t 11 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
nnUNet_train_sequential 2d -t 12 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
nnUNet_train_sequential 2d -t 13 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
nnUNet_train_sequential 2d -t 15 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
nnUNet_train_sequential 2d -t 16 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv

nnUNet_train_sequential 2d -t 97 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
nnUNet_train_sequential 2d -t 98 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
nnUNet_train_sequential 2d -t 99 -f 0 -num_epoch 1 -save_interval 25 -s seg_outputs --store_csv
```
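Before stopping a dummy training, you can confirm that the split file was actually written. To our knowledge, nnUNet (v1) pickles a list with one entry per fold, each holding the `train` and `val` case identifiers. The hypothetical reader below relies on that layout and returns `None` if the file does not exist yet.

```python
import os
import pickle


def load_fold(preprocessed_dir, task_name, fold=0):
    """Return (train_cases, val_cases) of one fold in splits_final.pkl, or None if it is missing."""
    path = os.path.join(preprocessed_dir, task_name, "splits_final.pkl")
    if not os.path.isfile(path):
        return None  # the dummy training for this task has not created the split yet
    with open(path, "rb") as f:
        splits = pickle.load(f)
    # list() turns the stored identifier arrays into plain Python lists
    return list(splits[fold]["train"]), list(splits[fold]["val"])
```

For example, `load_fold(os.environ["nnUNet_preprocessed"], "Task011_Prostate-BIDMC")` returns the fold-0 split used by the dummy training (`-f 0`).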

Now that we have an initial train/val split, we can run our [create_3_split.py](create_3_split.py) script, which adds a test split by taking 30% of the training cases.

```bash
python create_3_split.py
```

The script copies the data and segmentations (via symlinks) into a new task whose ID is the original ID plus 100, and adds the test split there. Task 11 becomes task 111, task 12 becomes task 112, and so on.
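The two ideas behind this step are moving a fixed fraction of the training cases into a test list and shifting the task id by 100. The sketch below illustrates both under stated assumptions. It is **not** the code of `create_3_split.py`; the function names, the seed, and the `{"train", "val", "test"}` dictionary layout are hypothetical.

```python
import random
import re


def shifted_task_name(task_name, offset=100):
    """Shift the id of a TaskXXX_Name folder, e.g. Task011_Prostate-BIDMC -> Task111_Prostate-BIDMC."""
    match = re.fullmatch(r"Task(\d{3})_(.+)", task_name)
    if match is None:
        raise ValueError(f"not an nnUNet task folder name: {task_name}")
    return f"Task{int(match.group(1)) + offset:03d}_{match.group(2)}"


def add_test_split(train, val, test_fraction=0.3, seed=0):
    """Move round(test_fraction * len(train)) training cases into a new test list."""
    train = sorted(train)  # sort first so that the seed alone determines the result
    n_test = round(len(train) * test_fraction)
    test = sorted(random.Random(seed).sample(train, n_test))
    test_set = set(test)
    return {"train": [c for c in train if c not in test_set], "val": list(val), "test": test}
```

With ten training cases and the default fraction, three cases move to `test` and seven stay in `train`. The validation cases are not touched.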

## Augmented Datasets
As the last preprocessing step, we create copies of our datasets augmented with MRI artifacts (spikes, bias fields and ghosting). These copies are used in our out-of-distribution (OoD) experiments.

```python
import os

import torchio as tio
from dataset_augmentations import copy_dataset

ROOT = os.path.join(os.environ["nnUNet_raw_data_base"], "nnUNet_raw_data")
PREPROCESSED = os.environ["nnUNet_preprocessed"]

# Each image receives exactly one of the three artifacts, chosen with equal probability
transform = tio.OneOf({
    tio.RandomSpike(intensity=(3, 5)): 1,
    tio.RandomBiasField(coefficients=1): 1,
    tio.RandomGhosting(intensity=(3, 5)): 1,
})

TASK_PAIRS = [
    ("Task197_DecathHip", "Task400_DecathHipAugmented"),
    ("Task198_Dryad", "Task401_DryadAugmented"),
    ("Task199_HarP", "Task402_HarPAugmented"),

    ("Task111_Prostate-BIDMC", "Task403_Prostate-BIDMCAugmented"),
    ("Task112_Prostate-I2CVB", "Task404_Prostate-I2CVBAugmented"),
    ("Task113_Prostate-HK", "Task405_Prostate-HKAugmented"),
    ("Task115_Prostate-UCL", "Task406_Prostate-UCLAugmented"),
    ("Task116_Prostate-RUNMC", "Task407_Prostate-RUNMCAugmented"),
]

for src_task_name, dst_task_name in TASK_PAIRS:
    copy_dataset(os.path.join(ROOT, src_task_name), os.path.join(ROOT, dst_task_name), transform)

    # Reuse the split of the source task for its augmented copy
    dst_dir = os.path.join(PREPROCESSED, dst_task_name)
    os.makedirs(dst_dir, exist_ok=True)
    dst_split = os.path.join(dst_dir, "splits_final.pkl")
    if not os.path.lexists(dst_split):  # lexists also catches a dangling symlink from an earlier run
        os.symlink(os.path.join(PREPROCESSED, src_task_name, "splits_final.pkl"), dst_split)
```
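To give an idea of what one of these artifacts does, the NumPy sketch below builds a smooth multiplicative bias field. The field is `exp(P(x, y, z))`, where `P` is a random polynomial of total degree at most `order` on a grid normalized to [-1, 1]. The design loosely follows TorchIO's `RandomBiasField` (coefficients drawn uniformly from [-`max_coeff`, `max_coeff`], default order 3), but it is an illustrative re-implementation, not TorchIO's code. It also shows the key property for OoD data: only the intensities change, while the segmentation stays identical.

```python
import numpy as np


def polynomial_bias_field(shape, order=3, max_coeff=1.0, rng=None):
    """Smooth, strictly positive field exp(P(x, y, z)) for a 3D volume of the given shape."""
    rng = np.random.default_rng(rng)
    axes = [np.linspace(-1.0, 1.0, n) for n in shape]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    log_field = np.zeros(shape)
    # Sum over all monomials x^i * y^j * z^k with i + j + k <= order
    for i in range(order + 1):
        for j in range(order + 1 - i):
            for k in range(order + 1 - i - j):
                log_field += rng.uniform(-max_coeff, max_coeff) * x**i * y**j * z**k
    return np.exp(log_field)


def add_bias_field(image, label, **kwargs):
    """Multiply the image by a random bias field; the label is returned unchanged."""
    return image * polynomial_bias_field(image.shape, **kwargs), label
```

Because the field multiplies intensities smoothly, anatomy and labels keep their positions. That is why the augmented tasks can reuse the source task's split and segmentations.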