### **Pre-processing the HC18 Dataset**

**Description:** This notebook provides code for:

1.   Defining functions to pre-process images and masks
2.   Running the pre-processing over the HC18 dataset
3.   Saving the processed images to new folders

**STEP 1 - Functions to pre-process images and masks**

In [4]:
!pip install SimpleITK

import SimpleITK as sitk
import numpy as np
import os
import PIL
import pandas as pd
import matplotlib.pyplot as plt
import cv2


def resize_image(nump_image, orig_spacing, new_spacing, new_size):

  '''This function re-samples an image to a new spacing and crops (or zero-pads) it to a new size, using SimpleITK linear interpolation.
  nump_image: numpy array containing the image to process
  orig_spacing: original pixel size in mm, one value per axis
  new_spacing: pixel size in mm after re-sampling, one value per axis
  new_size: size in pixels of the output image

  Return: a SimpleITK image with the processed image (convert it with sitk.GetArrayFromImage to get a numpy array)
  '''

  image = sitk.GetImageFromArray(nump_image) # transform numpy array to SITK image
  image.SetSpacing(orig_spacing)

  resample = sitk.ResampleImageFilter()
  resample.SetInterpolator(sitk.sitkLinear) # Linear interpolation

  resample.SetOutputDirection(image.GetDirection())
  resample.SetOutputOrigin(image.GetOrigin())
  resample.SetOutputSpacing(new_spacing)

  # size of the output grid: the re-sampled image is cropped (or zero-padded) to new_size
  resample.SetSize(new_size)
  newimage = resample.Execute(image)

  return newimage

def format_masks(img):
  '''This function creates a filled binary mask from an image that contains only the segmentation contour.
    img: the contour image as a 2D numpy array

    return: a uint8 numpy array equal to 255 inside the contour (contour included) and 0 outside
  '''

  thresh = ((img>0)*255).astype(np.uint8) # ensure the contour is binary

  h,w = thresh.shape
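  mask = np.zeros((h+2,w+2)).astype(np.uint8) # flood-fill mask: OpenCV requires it to be 2 pixels larger than the image

  _ = cv2.floodFill(thresh, mask, (0,0), (255,255,255)) # fill from the top-left corner (assumed outside the contour): mask becomes 1 outside the contour
  mask = mask[1:-1, 1:-1] # drop the 1-pixel border so the mask has the same size as the image
  img = (((1 - mask)>0)*255).astype(np.uint8) # 255 inside the contour (contour included), 0 outside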

  return img
  

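To make the flood-fill step in `format_masks` more concrete, here is a minimal sketch of the same idea in plain NumPy, without OpenCV. It is not the notebook's implementation: `fill_contour` is a hypothetical helper that runs a 4-connected breadth-first fill from the top-left corner and then takes the complement, assuming the corner pixel lies outside a closed contour.

```python
from collections import deque

import numpy as np


def fill_contour(contour):
    """Return a uint8 mask: 255 inside a closed contour (contour included), 0 outside.

    Hypothetical helper for illustration; assumes pixel (0, 0) is outside the contour.
    """
    wall = contour > 0                      # contour pixels block the fill
    h, w = wall.shape
    if wall[0, 0]:
        raise ValueError("seed pixel (0, 0) lies on the contour")

    outside = np.zeros((h, w), dtype=bool)  # pixels reachable from the corner
    outside[0, 0] = True
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        # visit the 4-connected neighbours
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not wall[nr, nc] and not outside[nr, nc]:
                outside[nr, nc] = True
                queue.append((nr, nc))

    # everything the fill could not reach is the contour or its interior
    return (~outside).astype(np.uint8) * 255


# Toy example: a 5x5 square outline drawn in a 9x9 image
contour = np.zeros((9, 9), dtype=np.uint8)
contour[2, 2:7] = contour[6, 2:7] = 255
contour[2:7, 2] = contour[2:7, 6] = 255

mask = fill_contour(contour)
print(mask)
```

In the toy example the outline and its interior end up at 255 and the background stays at 0. A contour with a gap would let the fill leak inside, which is why the closed-contour assumption matters.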


**STEP 2 - For loop to pre-process the images and masks**

In [5]:
images = []
masks = []
filenames = []

mainpath = './Data/data_from_zenodo_org_record_1327317'

# image and annotation files
readpath = mainpath+'/training_set/'
files = os.listdir(readpath)
files.sort()

# pixel size in mm of each image
img_sizes = pd.read_csv(mainpath+'/training_set_pixel_size_and_HC.csv')

for filename in files:

  img = np.array(PIL.Image.open(readpath+filename))
  # look up the pixel size of the image; an annotation file shares the row of its image
  imsize = float(img_sizes[img_sizes['filename'] == filename.replace('_Annotation','')]['pixel size(mm)'].iloc[0])
  orig_spacing = [imsize, imsize]

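  if 'Annotation' in filename:
    img = format_masks(img) # turn the contour into a filled mask
  # re-sample to 0.35 mm pixels on a 256x256 grid, then convert back to a numpy array
  img = sitk.GetArrayFromImage(resize_image(img, orig_spacing, [0.35, 0.35],
                                            new_size=[256,256]))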
  
  if 'Annotation' in filename:
    masks.append((img>0).astype(float))
  else:
    images.append(img)
    filenames.append(filename)
  

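The loop above re-samples every image to 0.35 mm pixels on a 256 x 256 grid, so each output covers a fixed field of view of 256 x 0.35 mm = 89.6 mm per axis, measured from the image origin (`resize_image` keeps the input origin). The sketch below works through that arithmetic. `field_of_view` and `describe_resampling` are hypothetical helpers, and the 800 x 540 image size and 0.15 mm pixel size are illustrative values only.

```python
def field_of_view(size_px, spacing_mm):
    """Physical extent (mm) along each axis of a grid with the given size and spacing."""
    return [n * s for n, s in zip(size_px, spacing_mm)]


def describe_resampling(orig_size_px, orig_spacing_mm, new_size_px, new_spacing_mm):
    """For each axis, report whether the output grid crops or pads the input."""
    orig_fov = field_of_view(orig_size_px, orig_spacing_mm)
    new_fov = field_of_view(new_size_px, new_spacing_mm)
    # The output shares the input origin, so a smaller field of view crops the far
    # edge and a larger one is padded (SimpleITK's default fill value is 0).
    return ["crop" if new < orig else "pad" if new > orig else "exact"
            for orig, new in zip(orig_fov, new_fov)]


# Extent in mm of every pre-processed image
new_fov = field_of_view([256, 256], [0.35, 0.35])
print(new_fov)

# Illustrative input: 800 x 540 pixels at 0.15 mm per pixel (about 120 mm x 81 mm)
result = describe_resampling([800, 540], [0.15, 0.15], [256, 256], [0.35, 0.35])
print(result)
```

With these illustrative numbers the first axis is cropped (120 mm is wider than 89.6 mm) and the second is padded (81 mm is narrower), so it is worth checking a few outputs visually when pixel sizes vary across the dataset.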


**STEP 3 - Save images to new folders**

In [None]:
savetrainpath = './Data/format_train/'
savevalpath = './Data/format_val/'

for k, file in enumerate(filenames): 
  if k%10 == 0: # initial train/val split: every 10th image goes to validation (the split is changed later in the training loops, see the other notebooks)
    savepath = savevalpath
  else:
    savepath = savetrainpath
    
  os.makedirs(savepath, exist_ok=True) # create the folder if it does not exist yet
    
  cv2.imwrite(savepath+file,images[k])
  cv2.imwrite(savepath+file.replace('.png','_Annotation.png'),masks[k])
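The `k%10` rule sends every tenth image to the validation folder, roughly a 10% hold-out. As a quick illustration, the sketch below applies the same rule to a list of made-up file names; `split_every_nth` and the names are hypothetical and not part of the notebook.

```python
def split_every_nth(filenames, n=10):
    """Send every n-th file (indices 0, n, 2n, ...) to validation, the rest to training."""
    train, val = [], []
    for k, name in enumerate(filenames):
        (val if k % n == 0 else train).append(name)
    return train, val


# Made-up file names, already sorted as in the notebook
names = [f"img_{i:03d}.png" for i in range(25)]
train, val = split_every_nth(names)

print(len(train), len(val))
print(val)
```

Because the file list is sorted before the loop, this split is deterministic: the same files land in the validation folder on every run.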