## Importing pydicom, necessary libraries and loading data
* DICOM (Digital Imaging and Communications in Medicine) is the bread and butter of medical image datasets, storage and transfer.
* Reading the patients' DICOM tumor mask files

## Hounsfield Units 

* The unit of measurement in CT scans is the Hounsfield Unit (HU), which is a measure of radiodensity. CT scanners are carefully calibrated to accurately measure this. From Wikipedia:

* By default, however, the returned values are not in this unit. Let's fix this.

* Some scanners have cylindrical scanning bounds, but the output image is square. The pixels that fall outside of these bounds get the fixed value -2000. The first step is setting these values to 0, which currently corresponds to air. Next, let's go back to HU by multiplying by the rescale slope and adding the intercept (which are conveniently stored in the metadata of the scans!).
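
As a quick illustration of the rescale step, here is a minimal sketch on a tiny synthetic array. The helper name `raw_to_hu` and the sample values are made up for this example; in the notebook the slope and intercept come from each slice's `RescaleSlope` and `RescaleIntercept`.

```python
import numpy as np

def raw_to_hu(raw, slope, intercept, padding_value=-2000):
    """Map raw stored CT values to Hounsfield units (hypothetical helper)."""
    raw = raw.astype(np.float64)       # copy in float to avoid int16 overflow
    raw[raw == padding_value] = 0      # outside-of-scan pixels become 0 (air after the shift)
    hu = slope * raw + intercept       # linear rescale: HU = slope * raw + intercept
    return np.round(hu).astype(np.int16)

# Synthetic 2x2 "slice": one padding pixel, one air pixel, two tissue pixels
raw = np.array([[-2000, 0], [1024, 1524]], dtype=np.int16)
hu = raw_to_hu(raw, slope=1.0, intercept=-1024.0)
print(hu)
```

With an intercept of -1024, both the padding pixel and the raw value 0 end up at -1024 HU, which is roughly the radiodensity of air.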

## Normalization
* A commonly used set of thresholds for normalizing CT intensities is -200 and 500 HU.
* Our values currently range from -1024 to around 2000. Anything above 500 is not interesting to us, since it is simply bone of varying radiodensity, and anything below -200 is mostly lung tissue and air.
* Here's some code you can use:
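
A minimal sketch of threshold-based normalization with the bounds mentioned above. The function name `normalize_window` is an assumption for this example and is not part of the notebook; the notebook's own `normalize2` uses min-max scaling instead, because it is applied to mask volumes rather than HU images.

```python
import numpy as np

MIN_BOUND = -200.0   # lower HU threshold from the text
MAX_BOUND = 500.0    # upper HU threshold from the text

def normalize_window(image, min_bound=MIN_BOUND, max_bound=MAX_BOUND):
    """Scale HU values linearly so [min_bound, max_bound] maps to [0, 1], then clip."""
    image = (image.astype(np.float32) - min_bound) / (max_bound - min_bound)
    return np.clip(image, 0.0, 1.0)

# Air, the lower threshold, soft tissue, the upper threshold, dense bone
hu_values = np.array([-1024, -200, 150, 500, 2000], dtype=np.int16)
normalized = normalize_window(hu_values)
print(normalized)
```

Everything below -200 HU collapses to 0 and everything above 500 HU collapses to 1, so the full output range is spent on the tissue of interest.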

## Resampling

* A scan may have a pixel spacing of [2.5, 0.5, 0.5], which means that the distance between slices is 2.5 millimeters. For a different scan this may be [1.5, 0.725, 0.725]. This can be problematic for automatic analysis (e.g. using ConvNets)!

* A common method of dealing with this is resampling the full dataset to a certain isotropic resolution. If we choose to resample everything to 1 mm × 1 mm × 1 mm voxels, we can use 3D ConvNets without worrying about learning zoom/slice thickness invariance.

* Whilst this may seem like a very simple step, it has quite some edge cases due to rounding. Also, it takes quite a while.
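
To make the rounding issue concrete, here is a small sketch of the shape arithmetic only, without the interpolation itself. The helper `resample_plan` and the example shape and spacing are assumptions chosen for illustration.

```python
import numpy as np

def resample_plan(shape, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Compute the integer target shape and the spacing actually achieved (hypothetical helper)."""
    shape = np.asarray(shape, dtype=np.float64)
    spacing = np.asarray(spacing, dtype=np.float64)
    new_spacing = np.asarray(new_spacing, dtype=np.float64)

    new_shape = np.round(shape * spacing / new_spacing)  # voxel counts must be whole numbers
    real_resize_factor = new_shape / shape               # zoom factor that hits that shape exactly
    achieved_spacing = spacing / real_resize_factor      # close to, but not exactly, new_spacing
    return new_shape.astype(int), real_resize_factor, achieved_spacing

# Hypothetical volume: 129 slices of 512x512 pixels, spacing in mm as (slice, row, column)
new_shape, factor, achieved = resample_plan((129, 512, 512), (1.6, 0.57, 0.57))
print(new_shape)   # target shape in voxels
print(achieved)    # spacing after resampling, slightly off 1 mm because of rounding
```

Because the target shape is rounded, the achieved spacing differs slightly from the requested 1 mm, and it differs per scan.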


In [2]:
import tensorflow as tf
import os # for doing directory operations 
import pandas as pd # for some simple data analysis (right now, just to load in the labels data and quickly reference it)
import cv2
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt
import pydicom
from tensorflow.keras.models import load_model


In [3]:
# Returns the slices as a 3D array in Hounsfield units (HU)
def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices])
    # Convert to int16 (the stored data is sometimes uint16),
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)

    # Set outside-of-scan pixels to 0
    # The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    
    # Convert to Hounsfield units (HU)
    for slice_number in range(len(slices)):
        
        intercept = slices[slice_number].RescaleIntercept
        slope = slices[slice_number].RescaleSlope
        
        if slope != 1:
            image[slice_number] = slope * image[slice_number].astype(np.float64)
            image[slice_number] = image[slice_number].astype(np.int16)
            
        image[slice_number] += np.int16(intercept)
    
    return np.array(image, dtype=np.int16)


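# Min-max scale a volume to [0, 1]. Used here for the tumor masks,
# so the bounds come from the data rather than from fixed HU thresholds.
def normalize2(image):
    image = image.astype(np.float64)  # avoid int16 overflow in the subtraction
    MAX_BOUND = np.max(image)
    MIN_BOUND = np.min(image)
    if MAX_BOUND == MIN_BOUND:  # constant volume (e.g. an empty mask): avoid 0/0
        return np.zeros_like(image)
    image = (image - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    return image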


# Resample the image to 1 mm in all dimensions
def resample(image, scan, new_spacing=(1, 1, 1)):
    # Determine current spacing as (slice, row, column) in millimeters
    spacing = np.array([scan[0].SliceThickness, scan[0].PixelSpacing[0], scan[0].PixelSpacing[1]], dtype=np.float32)

    resize_factor = spacing / np.asarray(new_spacing, dtype=np.float32)
    new_real_shape = image.shape * resize_factor
    new_shape = np.round(new_real_shape)  # the target shape must be integral
    real_resize_factor = new_shape / image.shape
    # Call zoom directly; the scipy.ndimage.interpolation namespace is deprecated.
    # mode='nearest' only controls edge padding; interpolation is still the default cubic spline.
    image = scipy.ndimage.zoom(image, real_resize_factor, mode='nearest')
    image = image.astype(np.uint8)

    return image



In [4]:
# tumur_count[i] is the number of tumor mask folders for patient i+1
def tumor_segm(tumur_count=(7,1,1,1,1,3,1,1,1,1,1,1,2,1,1)):
    # Build, per patient, the list of mask directories livertumor01, livertumor02, ...
    pathes = []
    for i in range(len(tumur_count)):
        s1 = "3Dircadb1/3Dircadb1."
        s2 = "/MASKS_DICOM/livertumor0"
        pathes.append([s1+str(i+1)+s2+str(k+1) for k in range(tumur_count[i])])

    
    
    tumor_slices = []
    tumor_norm = []
    for path in pathes:
        add = None
        for k, folder in enumerate(path):
            # pydicom.dcmread replaces the deprecated pydicom.read_file
            slices = [pydicom.dcmread(os.path.join(folder, s)) for s in os.listdir(folder)]
            # Sort by z position; float() avoids the ties that int() truncation can cause
            slices.sort(key=lambda x: float(x.ImagePositionPatient[2]))
            if k == 0:
                # Keep one slice list per patient; resample() reads its spacing metadata
                tumor_slices.append(slices)
            norm = normalize2(get_pixels_hu(slices))
            # Sum the per-tumor masks into a single volume per patient
            add = norm if add is None else add + norm
        tumor_norm.append(add)

    return tumor_norm, tumor_slices

normalized_tumor,tumor_slices = tumor_segm()


tumors_resampled = []
for i in range(15):
    pix_resampled = resample( normalized_tumor[i], tumor_slices[i], [1,1,1])
    tumors_resampled.append(pix_resampled)


## Saving preprocessed patient tumor masks offline

* Preprocessing is a time- and memory-consuming process, so we prefer to save our clean, ready-to-use data offline.


In [5]:
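# Resampled volumes differ in shape, so save them as an object array (load with allow_pickle=True)
np.save('offline-tumors.npy', np.array(tumors_resampled, dtype=object))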
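
Reading the file back later could look like the sketch below. It uses small dummy volumes and a temporary file (`demo-tumors.npy`, a made-up name) instead of the real data, and it fills the object array element by element, which also works when some volumes happen to share a leading dimension.

```python
import os
import tempfile

import numpy as np

# Dummy stand-ins for resampled mask volumes with different shapes
volumes = [np.zeros((2, 3, 3), dtype=np.uint8), np.ones((4, 3, 3), dtype=np.uint8)]

# Fill a 1-D object array one element at a time so NumPy never tries to stack the volumes
ragged = np.empty(len(volumes), dtype=object)
for i, volume in enumerate(volumes):
    ragged[i] = volume

path = os.path.join(tempfile.mkdtemp(), 'demo-tumors.npy')
np.save(path, ragged, allow_pickle=True)

# Object arrays are pickled on disk, so loading must opt in with allow_pickle=True
restored = np.load(path, allow_pickle=True)
print(len(restored), restored[0].shape, restored[1].shape)
```

Only load pickled `.npy` files that you created yourself or otherwise trust, since unpickling can execute arbitrary code.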