### InnerEye-DeepLearning framework implementation
This is prepared following https://github.com/microsoft/InnerEye-DeepLearning/blob/main/docs/creating_dataset.md

Since nnUNet is already trained at this stage, we will simply create the `dataset.csv` file by referencing the raw data files in the nnUNet standard folder structure.

In [12]:
from MEDIcaTe.file_folder_ops import *
from MEDIcaTe.utilities import *
from MEDIcaTe.nii_resampling import find_pix_dim_with_orientation
import pandas as pd

The requirements described at https://github.com/microsoft/InnerEye-DeepLearning/blob/main/docs/creating_dataset.md are already met by the data in the folders
`/homes/kovacs/project_data/hnc-auto-contouring/deepMedic/data_nifty/d_train_raw_data_base/images` and `/homes/kovacs/project_data/hnc-auto-contouring/deepMedic/data_nifty/d_train_raw_data_base/labels`.

These were created as a part of the pre-processing for DeepMedic. 

The guide specifies that images should be encoded as float32 and labels as binary masks.
So we will create a copy of the data in which the datatypes are changed to these formats.

We start by checking the current format of the data. 

In [3]:
path_to_image_folder_src = '/homes/kovacs/project_data/hnc-auto-contouring/deepMedic/data_nifty/d_train_raw_data_base/images'
path_to_label_folder_src = '/homes/kovacs/project_data/hnc-auto-contouring/deepMedic/data_nifty/d_train_raw_data_base/labels'

path_to_image_folder_dst = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye/d_train/images'
path_to_label_folder_dst = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye/d_train/labels'
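
One way to inspect the stored datatype without loading the voxel data is to read it straight from the NIfTI-1 header. The sketch below is not part of MEDIcaTe: `nifti1_datatype` is a hypothetical helper that assumes single-file NIfTI-1 data (`.nii` or `.nii.gz`) and decodes only the most common datatype codes. In practice a dedicated library such as nibabel would be the usual choice.

```python
import gzip
import struct

# Subset of the NIfTI-1 datatype codes (header field `datatype`).
NIFTI1_DTYPES = {
    2: 'uint8', 4: 'int16', 8: 'int32', 16: 'float32',
    64: 'float64', 256: 'int8', 512: 'uint16', 768: 'uint32',
}

def nifti1_datatype(path):
    """Return the datatype name stored in a NIfTI-1 header (hypothetical helper)."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rb') as fh:
        header = fh.read(348)  # a NIfTI-1 header is 348 bytes long
    if len(header) < 348:
        raise ValueError(f'{path}: file too short to hold a NIfTI-1 header')
    # sizeof_hdr (int32 at offset 0) must equal 348; use it to detect endianness.
    for endian in ('<', '>'):
        if struct.unpack_from(endian + 'i', header, 0)[0] == 348:
            break
    else:
        raise ValueError(f'{path}: not a NIfTI-1 header')
    code = struct.unpack_from(endian + 'h', header, 70)[0]  # datatype at offset 70
    return NIFTI1_DTYPES.get(code, f'unknown code {code}')
```

Calling `nifti1_datatype` on a few files from `path_to_image_folder_src` and `path_to_label_folder_src` would show whether a conversion is needed at all.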

### Converting dataset format to float32 for images and int8 for labels

In [5]:
# This only needs to be run once, so it is commented out.
# To run it in the background I used the file ./convert_dtypes.py
# convert images to float32
'''
for i,fi in enumerate(listdir(path_to_image_folder_src)):
    path_to_nii_src = join(path_to_image_folder_src,fi)
    path_to_nii_dst = join(path_to_image_folder_dst,fi)

    convert_nii_to_float32(path_to_nii_src,path_to_nii_dst)
    if i > 1:
        break

# convert labels to int8
for i,fi in enumerate(listdir(path_to_label_folder_src)):
    path_to_nii_src = join(path_to_label_folder_src,fi)
    path_to_nii_dst = join(path_to_label_folder_dst,fi)
    convert_nii_to_int8(path_to_nii_src,path_to_nii_dst)
    if i > 1:
        break
'''

### Generating the dataset.csv file

In [6]:
# Generate dataset.csv file
'''
This all depends on how you structured your files. This script works for my structure, which is:
-- data
    -- d_train
        -- images
            HNC01_000_0000.nii.gz
            HNC01_000_0001.nii.gz
            HNC01_001_0000.nii.gz
            HNC01_001_0001.nii.gz
            ...
        -- labels
            HNC01_000.nii.gz
            HNC01_001.nii.gz
            ...
where files ending in _0000.nii.gz are CT and files ending in _0001.nii.gz are PET.
'''
path_labels = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye/d_train/labels'
path_images = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye/d_train/images'

path_to_dataset_csv = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye'

# paths relative to location of dataset.csv:
rel_path_labels = 'd_train/labels'
rel_path_images = 'd_train/images'

subject = []
filePath = []
channel = []

for i,f in enumerate(listdir(path_labels)):
    case_id = f[:-7]  # strip the '.nii.gz' extension
    # add ct line
    filePath.append(join(rel_path_images,f'{case_id}_0000.nii.gz'))
    channel.append('ct')
    subject.append(i+1)

    # add pet line
    filePath.append(join(rel_path_images,f'{case_id}_0001.nii.gz'))
    channel.append('pet')
    subject.append(i+1)

    # add label line
    filePath.append(join(rel_path_labels,f'{case_id}.nii.gz'))
    channel.append('structure1')
    subject.append(i+1)

out_dat = pd.DataFrame(list(zip(subject, filePath, channel)), columns =['subject', 'filePath', 'channel'])
out_dat.to_csv(join(path_to_dataset_csv,'dataset.csv'),index=False)
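
The loop above assumes that every label file has both a CT and a PET image next to it. A minimal sketch of a pre-check is shown here. `find_incomplete_cases` is a hypothetical helper, not part of MEDIcaTe, and it assumes the file naming scheme described in the docstring of this cell.

```python
import os

def find_incomplete_cases(path_labels, path_images, suffixes=('_0000', '_0001')):
    """Return {case_id: [missing image file names]} for labels lacking a channel."""
    missing = {}
    for f in sorted(os.listdir(path_labels)):
        if not f.endswith('.nii.gz'):
            continue  # skip anything that is not a compressed NIfTI file
        case_id = f[:-len('.nii.gz')]
        absent = [
            f'{case_id}{s}.nii.gz' for s in suffixes
            if not os.path.isfile(os.path.join(path_images, f'{case_id}{s}.nii.gz'))
        ]
        if absent:
            missing[case_id] = absent
    return missing
```

Running `find_incomplete_cases(path_labels, path_images)` before building the lists would reveal cases that would otherwise end up in `dataset.csv` with a path to a non-existent file.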

In [10]:
# a quick print to see the result
print(out_dat.head(15))

    subject                              filePath     channel
0         1  d_train/images/HNC01_000_0000.nii.gz          ct
1         1  d_train/images/HNC01_000_0001.nii.gz         pet
2         1       d_train/labels/HNC01_000.nii.gz  structure1
3         2  d_train/images/HNC01_001_0000.nii.gz          ct
4         2  d_train/images/HNC01_001_0001.nii.gz         pet
5         2       d_train/labels/HNC01_001.nii.gz  structure1
6         3  d_train/images/HNC01_002_0000.nii.gz          ct
7         3  d_train/images/HNC01_002_0001.nii.gz         pet
8         3       d_train/labels/HNC01_002.nii.gz  structure1
9         4  d_train/images/HNC01_003_0000.nii.gz          ct
10        4  d_train/images/HNC01_003_0001.nii.gz         pet
11        4       d_train/labels/HNC01_003.nii.gz  structure1
12        5  d_train/images/HNC01_004_0000.nii.gz          ct
13        5  d_train/images/HNC01_004_0001.nii.gz         pet
14        5       d_train/labels/HNC01_004.nii.gz  structure1
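
As a sanity check on the generated file, each subject should appear with exactly the three channels written above. A minimal sketch using only the standard library follows. `check_channels` is a hypothetical helper, and the expected channel names are taken from this notebook rather than from InnerEye.

```python
import csv
from collections import defaultdict

def check_channels(csv_path, expected=('ct', 'pet', 'structure1')):
    """Return {subject: sorted channels} for subjects whose channels differ from `expected`."""
    channels = defaultdict(list)
    with open(csv_path, newline='') as fh:
        for row in csv.DictReader(fh):
            channels[row['subject']].append(row['channel'])
    # Subjects are keyed by the string read from the CSV, e.g. '1'.
    return {s: sorted(c) for s, c in channels.items() if sorted(c) != sorted(expected)}
```

An empty dictionary means every subject has one CT, one PET and one label row.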


### Adhering to the image size requirements
We check that this dataset adheres to the image size requirements as prescribed at https://github.com/microsoft/InnerEye-DeepLearning/blob/main/docs/creating_dataset.md.

In [None]:
# Note: takes about 7 minutes to run for 800-900 cases.
path_labels = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye/d_train/labels'
path_images = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye/d_train/images'
path_to_dataset_csv = '/homes/kovacs/project_data/hnc-auto-contouring/inner-eye'


ct_dim_list = []
pet_dim_list = []
label_dim_list = []

for i,f in enumerate(listdir(path_labels)):
    case_id = f[:-7]  # strip the '.nii.gz' extension

    ct_file = join(path_images,f'{case_id}_0000.nii.gz')
    pet_file = join(path_images,f'{case_id}_0001.nii.gz')
    label_file = join(path_labels,f)
    
    ct_dim = find_pix_dim_with_orientation(ct_file)
    pet_dim = find_pix_dim_with_orientation(pet_file)
    label_dim = find_pix_dim_with_orientation(label_file)

    ct_dim_list.append(ct_dim)
    pet_dim_list.append(pet_dim)
    label_dim_list.append(label_dim)

out_dat = pd.DataFrame(list(zip(ct_dim_list, pet_dim_list, label_dim_list)), columns =['ct', 'pet', 'label'])
out_dat.to_csv(join(path_to_dataset_csv,'image_dimensions.csv'),index=False)
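
With the per-file values collected, cases whose CT, PET and label disagree can be listed directly. The sketch below assumes that `find_pix_dim_with_orientation` returns a sequence of numbers that can be converted to a tuple; that return type is an assumption, and `find_mismatched_cases` is a hypothetical helper name.

```python
def find_mismatched_cases(case_ids, ct_dims, pet_dims, label_dims):
    """Return [(case_id, ct, pet, label)] where the three values are not all equal."""
    mismatched = []
    for case_id, ct, pet, label in zip(case_ids, ct_dims, pet_dims, label_dims):
        # Convert to tuples so lists and arrays compare element by element.
        ct, pet, label = tuple(ct), tuple(pet), tuple(label)
        if not (ct == pet == label):
            mismatched.append((case_id, ct, pet, label))
    return mismatched
```

Using it here would require also appending `case_id` to a list inside the loop, so that each row of values can be traced back to its case.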
