The following code serves as a representation of the workflow followed during this research internship. 

The goal is to use a dataset of whole-body PET/MRI (T2 and Dixon sequences) and CT scans to build and train a model that generates synthetic CT (sCT) images from the MRI images. These sCT images are then intended to be used for body composition analysis: https://github.com/UMEssen/Body-and-Organ-Analysis

Body composition is a biomarker that can help determine treatment plans in oncology and cardiology. The CT image is fed to software that separates it into different regions using a model that thresholds the Hounsfield units (HU) to a specific intensity range. The regions to be determined are:
- Subcutaneous adipose tissue
- Total adipose tissue
- Visceral adipose tissue
- Muscle volume
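
The thresholding idea can be illustrated with a minimal NumPy sketch. The HU ranges below (-190 to -30 HU for adipose tissue, -29 to 150 HU for muscle) are ranges commonly cited in the body composition literature and are an assumption here; they are not necessarily the values used by the software. The function name `tissue_volumes_ml` and the voxel size are hypothetical. Thresholding alone cannot separate subcutaneous from visceral fat: that also requires the anatomical regions produced by the segmentation model.

```python
import numpy as np

# Assumed HU ranges (commonly cited values, not taken from the software)
HU_RANGES = {
    "adipose": (-190, -30),
    "muscle": (-29, 150),
}

def tissue_volumes_ml(ct_hu, voxel_volume_mm3):
    """Return the volume in mL of each tissue class found by HU thresholding."""
    volumes = {}
    for tissue, (low, high) in HU_RANGES.items():
        mask = (ct_hu >= low) & (ct_hu <= high)
        # 1 mL = 1000 mm^3
        volumes[tissue] = float(mask.sum()) * voxel_volume_mm3 / 1000.0
    return volumes

# Toy volume: air (-1000 HU) with one block of fat and one block of muscle
ct = np.full((10, 10, 10), -1000, dtype=np.int16)
ct[2:4, :, :] = -100   # 200 voxels in the adipose range
ct[6:7, :, :] = 50     # 100 voxels in the muscle range
volumes = tissue_volumes_ml(ct, voxel_volume_mm3=1.0)
```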

The software also provides an organ segmentation of the trunk, which can be tested later with the sCT. 

The advantage of creating such an sCT is being able to produce this body composition report without the need for an irradiating scan. Past models for creating sCT at the Bordet Institute were intended for radiotherapy treatments. The main difference here is that body composition does not require the same precision, because we are more interested in the global composition than in the exact position of every organ in the scan. 

For the time being, the data has not yet been retrieved; the following code was used to explore torchio functionality and learn how to load medical data into a model. 

In [2]:
#imports
import torch 
import torchio as tio
from torch.utils.data import DataLoader
import pydicom
import os

### Load the data using torchio 

The NIfTI format is often recommended for managing medical images. Why? What are its advantages over working directly with the DICOM format?

In [3]:
def Create_dataset(rootdir):
    """Build a torchio SubjectsDataset with one subject per sub-directory of rootdir."""
    subjects_list = []

    # Each sub-directory of rootdir holds the DICOM series of one patient
    for subject_name in sorted(os.listdir(rootdir)):
        subject_path = os.path.join(rootdir, subject_name)
        if not os.path.isdir(subject_path):
            continue  # skip stray files that are not patient folders

        # Create the torchio subject with the scan and the id of the patient
        sub = tio.Subject(
            CT=tio.ScalarImage(subject_path),
            id_patient=subject_name,
        )
        subjects_list.append(sub)

    # Transforms applied to every subject of the dataset when it is loaded
    transform = tio.Compose([
        tio.ToCanonical(),
        tio.Clamp(out_min=0, out_max=2500),
        tio.RescaleIntensity(out_min_max=(-1, 1)),
    ])

    # Create the final dataset made of the different subjects
    dataset = tio.SubjectsDataset(subjects_list, transform=transform)
    print('Dataset successfully created! The set is made of', len(subjects_list), 'patients.')
    return dataset

In [4]:
dataset = Create_dataset('../sCT code/data prostate')

# Plotting a Subject shows the sagittal, coronal and axial views of its 3D volume
# dataset[0].plot()

dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

Dataset successfully created! The set is made of 8 patients.
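
To make the effect of the intensity transforms concrete, here is a minimal NumPy sketch of a clamp followed by a linear rescaling to [-1, 1]. It is not torchio's implementation: it assumes the rescaling uses the minimum and maximum of the clamped image, and the function name `clamp_and_rescale` is hypothetical.

```python
import numpy as np

def clamp_and_rescale(volume, clamp_min=0.0, clamp_max=2500.0,
                      out_min=-1.0, out_max=1.0):
    """Clip intensities to [clamp_min, clamp_max], then map them linearly to [out_min, out_max]."""
    clamped = np.clip(volume.astype(np.float32), clamp_min, clamp_max)
    in_min, in_max = clamped.min(), clamped.max()
    if in_max == in_min:
        # Constant image: this sketch simply maps everything to out_min
        return np.full_like(clamped, out_min)
    scaled = (clamped - in_min) / (in_max - in_min)   # now in [0, 1]
    return scaled * (out_max - out_min) + out_min

# Values below 0 and above 2500 are saturated before rescaling
volume = np.array([-500.0, 0.0, 1250.0, 2500.0, 4000.0])
result = clamp_and_rescale(volume)
```

With this toy input, the two out-of-range values end up at the same intensities as the clamp bounds, so extreme outliers no longer stretch the intensity scale.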


Now that the dataset is created with 3D images for each subject, we need to decide how to exploit them: do we keep them 3D in the model, or do we process them slice by slice for a 2D analysis?

The following part explores how this dataset can be loaded and preprocessed before being fed to the model. 
Note that with the real data, some preprocessing steps might be done using mice software to re-align the PET/MRI with the CT (making sure they share the same origin and that morphological differences are not too disturbing).

A crucial point in this project is to make sure that the chosen model and preprocessing steps match the concrete clinical applications. 

The main issue with medical images is their size: they often contain hundreds of millions of voxels and cannot always be downsampled. 
Other differences from ordinary images should be noted: they may be 3D, their format is often DICOM (which contains metadata about the patient), and they cannot easily be downsampled when details are needed. 

The batch size used with medical images tends to be much smaller than usual because of the quantity of information contained in a single medical image. 
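
A quick back-of-the-envelope calculation shows why. The volume shape below is hypothetical (it is not taken from the dataset), and the estimate only counts the input tensor stored as 32-bit floats, not the activations or gradients of the model.

```python
def volume_memory_mib(shape, bytes_per_voxel=4):
    """Memory in MiB needed to store one volume (float32 by default)."""
    n_voxels = 1
    for size in shape:
        n_voxels *= size
    return n_voxels * bytes_per_voxel / 2**20

shape = (512, 512, 1000)               # hypothetical whole-body volume
per_volume = volume_memory_mib(shape)  # memory for a single volume
per_batch = 16 * per_volume            # memory for a batch of 16 volumes
```

Under these assumptions, a single volume already takes about 1 GiB, and a batch of 16 would need about 16 GiB for the inputs alone.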


### Create patches

To train in 2D, one needs to extract slices from the 3D volumes and then aggregate the inference results to regenerate a 3D volume. This is a special case of patch-based training in which the patch size along one dimension is 1 (see the TorchIO documentation on patch-based pipelines). 
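
The slice-by-slice idea can be sketched with NumPy alone. This is only an illustration: `predict_slice` is a hypothetical stand-in for a trained 2D model, and no torchio functionality is used.

```python
import numpy as np

def predict_slice(slice_2d):
    """Hypothetical stand-in for a 2D model: it only shifts the intensities."""
    return slice_2d + 1.0

def predict_volume_slice_by_slice(volume, axis=0):
    """Apply a 2D prediction to every slice along `axis` and rebuild the 3D volume."""
    # Move the slicing axis to the front so that iterating yields 2D slices
    slices = np.moveaxis(volume, axis, 0)
    predicted = np.stack([predict_slice(s) for s in slices], axis=0)
    # Put the slicing axis back in its original position
    return np.moveaxis(predicted, 0, axis)

volume = np.zeros((4, 8, 6), dtype=np.float32)
output = predict_volume_slice_by_slice(volume, axis=2)
```

The output has the same shape as the input volume, whichever axis is used for slicing.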

In torchio it is possible to use patch samplers: objects that randomly extract patches from the volumes of a SubjectsDataset like the one created earlier. 
Patch sampling is commonly used with medical images because of their size: working with smaller patches reduces computation. Patch-based algorithms can also sometimes perform better, for example for denoising. 

You can choose uniform or weighted sampling: the uniform sampler takes random patches from a volume with uniform probability, while the weighted sampler randomly extracts patches according to a probability map. 
If you choose very small patches, it can be interesting to create a probability map for each slice in order to focus on the region of interest and give little weight to the background. 
With larger patches, however, a uniform sampler is easier to use. 
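
The difference between the two strategies can be sketched in plain NumPy. This is not torchio's implementation: the function `sample_patch_corner` is hypothetical, and it assumes that the probability map gives the probability of each voxel being the centre of a patch. A uniform sampler is then just the special case of a constant map.

```python
import numpy as np

def sample_patch_corner(probability_map, patch_size, rng):
    """Draw the corner of one patch whose centre follows probability_map."""
    shape = probability_map.shape
    half = [p // 2 for p in patch_size]
    # Keep only the centres for which the whole patch fits inside the volume
    region = tuple(
        slice(h, s - (p - h) + 1) for h, s, p in zip(half, shape, patch_size)
    )
    valid = np.zeros(shape, dtype=float)
    valid[region] = probability_map[region]
    if valid.sum() == 0:
        raise ValueError("No valid patch centre with non-zero probability")
    # Draw one voxel index with probability proportional to the map
    flat_index = rng.choice(valid.size, p=valid.ravel() / valid.sum())
    centre = np.unravel_index(flat_index, shape)
    return tuple(int(c) - h for c, h in zip(centre, half))

rng = np.random.default_rng(0)
shape = (16, 16, 16)
patch_size = (4, 4, 4)

uniform_map = np.ones(shape)              # every voxel is equally likely
weighted_map = np.zeros(shape)
weighted_map[8:12, 8:12, 8:12] = 1.0      # region of interest only

uniform_corner = sample_patch_corner(uniform_map, patch_size, rng)
weighted_corner = sample_patch_corner(weighted_map, patch_size, rng)
```

With the weighted map, every sampled patch is centred inside the region of interest, whereas the uniform map can return a patch anywhere in the volume, including the background.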



In [None]:
# UNIFORM sampler
# Choose the patch size for your data (example value, same as the weighted sampler below)
patch_size = (32, 32, 32)
sampler_uniform = tio.data.UniformSampler(patch_size)

In [None]:
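# WEIGHTED sampler
import numpy as np

dicom_dir = 'path_to_your_dicom_directory'

# Read every DICOM slice of the series (sorted by file name)
dicom_files = [
    pydicom.dcmread(os.path.join(dicom_dir, f))
    for f in sorted(os.listdir(dicom_dir))
    if f.endswith('.dcm')
]

# Stack the 2D slices into a 3D volume and convert it to a float tensor
image_np = np.stack([d.pixel_array for d in dicom_files], axis=0)
image_tensor = torch.from_numpy(image_np.astype(np.float32))

# Voxels above the threshold get probability 1, all the others 0
threshold = 1000
probability_map = (image_tensor > threshold).float()

# WeightedSampler expects the name of a probability map stored in the subject,
# so the map is added to the subject as an image (4D tensor, channels first)
subject = tio.Subject(
    CT=tio.ScalarImage(tensor=image_tensor.unsqueeze(0)),
    sampling_map=tio.Image(tensor=probability_map.unsqueeze(0), type=tio.SAMPLING_MAP),
)

sampler = tio.data.WeightedSampler(patch_size=(32, 32, 32), probability_map='sampling_map')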
