# Module 11 - Image Processing

Today's dataset comes from [Kaggle](https://www.kaggle.com/c/data-science-bowl-2018/overview) as part of the 2018 Data Science Bowl. The task is to spot nuclei in images acquired under varied conditions. The dataset contains a large number of segmented nuclei images, which vary in cell type, magnification, and imaging modality (brightfield vs. fluorescence).

Each image is represented by an associated ImageId. Files belonging to an image are contained in a folder with this ImageId. Within this folder are two subfolders:

+ `images` contains the image file.
+ `masks` contains the segmented masks of each nucleus. This folder is only included in the training set. Each mask contains one nucleus. Masks are not allowed to overlap (no pixel belongs to two masks).
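
As a rough illustration of this layout, the sketch below collects the file paths for one sample. The helper name `sample_paths` is made up for this example; the `<ImageId>.png` file name follows the convention used by the loading code later in this notebook.

```python
import os

def sample_paths(root, image_id):
    """Return (image_path, sorted mask paths) for one ImageId folder.

    Hypothetical helper, not part of the competition tooling.
    """
    sample_dir = os.path.join(root, image_id)
    image_path = os.path.join(sample_dir, "images", image_id + ".png")
    masks_dir = os.path.join(sample_dir, "masks")
    # The masks folder only exists for training samples
    mask_paths = []
    if os.path.isdir(masks_dir):
        mask_paths = sorted(
            os.path.join(masks_dir, name)
            for name in os.listdir(masks_dir)
            if name.endswith(".png")
        )
    return image_path, mask_paths
```

For a sample without a `masks` folder, the second return value is simply an empty list.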

We will work only with the competition's stage 1 labeled training data today.

## Setup
Let's get all the requirements sorted before we move on to the exercise.

In [None]:
# Requirements
!pip install --upgrade ipykernel
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install tqdm #for progress bar
!pip install scikit-image
!pip install scipy


# Globals
seed = 1017

#imports
import os
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage as nd
from tqdm import tqdm
from skimage import img_as_float, img_as_ubyte
from skimage.io import imread, imshow
from skimage.transform import resize
from skimage.restoration import denoise_nl_means, estimate_sigma
from skimage.exposure import equalize_adapthist
from skimage.morphology import disk, diameter_closing, diameter_opening
from skimage.segmentation import clear_border
from skimage.color import rgb2gray
from skimage.feature import canny

#magic
%matplotlib inline

## Loading the data
The data for today can be found in the `data` folder distributed along with this notebook. You will have to unzip it manually.

In [None]:
#set path to training data
TRAIN_PATH="data/stage1_train/"
#get sample IDs
train_ids=next(os.walk(TRAIN_PATH))[1]
print(str(len(train_ids)) + " Samples found!")

## Formatting
Let's resize the images to something manageable so we can speed up calculations.

In [None]:
#Declare image shape
IMG_HEIGHT=128
IMG_WIDTH=128
IMG_CHANNELS=3

In [None]:
#resize images to speedup calculations
X_train = np.zeros((len(train_ids), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
Y_train = np.zeros((len(train_ids), IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool) #np.bool was removed from recent NumPy releases

print('Resizing training images and masks')
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):   
    path = TRAIN_PATH + id_
    img = imread(path + '/images/' + id_ + '.png')[:,:,:IMG_CHANNELS]  
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
    X_train[n] = img  #Fill empty X_train with values from img
    mask = np.zeros((IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool) #start from an empty combined mask
    for mask_file in next(os.walk(path + '/masks/'))[2]:
        mask_ = imread(path + '/masks/' + mask_file)
        mask_ = np.expand_dims(resize(mask_, (IMG_HEIGHT, IMG_WIDTH), mode='constant',  
                                      preserve_range=True), axis=-1)
        mask = np.maximum(mask, mask_)  
            
    Y_train[n] = mask  
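
The loop above merges the per-nucleus masks of a sample into one binary mask with `np.maximum`. Here is a small sketch of that idea on two toy masks (the arrays are made up for illustration, not taken from the dataset). Because the masks never overlap, the elementwise maximum acts as a union.

```python
import numpy as np

# Two toy per-nucleus masks (255 = nucleus, 0 = background), non-overlapping
mask_a = np.zeros((4, 4), dtype=np.uint8)
mask_a[0:2, 0:2] = 255
mask_b = np.zeros((4, 4), dtype=np.uint8)
mask_b[2:4, 2:4] = 255

combined = np.zeros((4, 4), dtype=bool)
for m in (mask_a, mask_b):
    # Elementwise maximum keeps a pixel if it is set in either mask
    combined = np.maximum(combined, m)

# Any nonzero value counts as nucleus once we cast back to bool
combined = combined.astype(bool)
print(combined.sum())  # 8 nucleus pixels (4 from each toy mask)
```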

In [None]:
#Have a look at a random image (seeded so the choice is reproducible)
random.seed(seed)
image_x=random.randint(0, len(train_ids) - 1) #randint includes both endpoints
print(X_train.shape)
imshow(X_train[image_x])
plt.show()


In [None]:
print(Y_train.shape)
imshow(img_as_ubyte(Y_train[image_x, :, :, 0]))
plt.show()

## Denoising
We will use a non-local means filter to denoise the images.
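
One detail to keep in mind when storing filtered images: unless `preserve_range=True` is passed, `denoise_nl_means` follows the `img_as_float` convention and returns floats in [0, 1], while `X_train` holds `uint8` values in 0-255. The toy example below (made-up numbers, NumPy only) sketches what happens when such floats are written into a `uint8` array directly, and how rescaling first avoids it.

```python
import numpy as np

# Hypothetical filter output: floats in [0, 1]
filtered = np.array([[0.0, 0.25], [0.5, 1.0]])

store = np.zeros((2, 2), dtype=np.uint8)
store[:] = filtered                  # fractional part is dropped on assignment
truncated = store.copy()             # almost everything collapses to 0

store[:] = np.round(filtered * 255)  # rescale to 0-255 before storing
rescaled = store.copy()

print(truncated)
print(rescaled)
```

In scikit-image, `img_as_ubyte` performs this kind of rescaling for float images in [0, 1].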

In [None]:
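#apply non-local means filter
patch_kw = dict(patch_size=5, #patch size
                patch_distance=6, #search area
                channel_axis=-1) #color channels on the last axis (replaces multichannel=True in scikit-image >= 0.19)
for ith in range(len(train_ids)):
    img = X_train[ith]
    #estimate the noise level, averaged over the color channels
    sigma_est = np.mean(estimate_sigma(img, channel_axis=-1))
    #the filter returns floats in [0, 1], so convert back to uint8 before storing
    denoised = denoise_nl_means(img, h=1.15 * sigma_est, fast_mode=True, **patch_kw)
    X_train[ith] = img_as_ubyte(np.clip(denoised, 0, 1))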

In [None]:
#replot the random image
print(X_train.shape)
imshow(X_train[image_x])
plt.show()

In [None]:
#apply adaptive histogram equalization
#for ith in range(len(train_ids)):
#    X_train[ith] = equalize_adapthist(X_train[ith], kernel_size=1, clip_limit=0.01, nbins=100)


In [None]:
#replot the random image
print(X_train.shape)
imshow(img_as_float(X_train[image_x]))
plt.show()

In [None]:
#clean masked regions with open/close ops

#Dilation enlarges bright regions and shrinks dark regions.
#Erosion shrinks bright regions and enlarges dark regions.

#declare operation size
opsize=3

for ith in range(len(train_ids)):
    mask = Y_train[ith, :, :, 0]
    
    #Closing on an image is defined as a dilation followed by an erosion.
    #Closing can remove small dark spots (i.e. “pepper”) and connect small bright cracks.
    #This tends to “close” up (dark) gaps between (bright) features.
    mask = nd.binary_closing(mask, disk(opsize//2))

    #diameter closing will remove dark spots but leave dark cracks
    #mask = diameter_closing(mask, opsize, connectivity=2)

    
    #Opening on an image is defined as an erosion followed by a dilation.
    #Opening can remove small bright spots (i.e. “salt”) and connect small dark cracks.
    #This tends to “open” up (dark) gaps between (bright) features
    mask = nd.binary_opening(mask, disk(opsize//2))
 
    #diameter opening will remove bright spots but leave bright lines
    #mask = diameter_opening(mask, opsize, connectivity=2)


    #fill in enclosed regions
    mask = nd.binary_fill_holes(mask)

    #remove segments connected to image border
    mask = clear_border(mask)
    
    Y_train[ith, :, :, 0] = mask


In [None]:
print(Y_train.shape)
imshow(img_as_ubyte(Y_train[image_x, :, :, 0]))
plt.show()
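
To see what the closing and opening steps do, here is a small sketch on a toy mask (made up for illustration). It assumes a 3x3 cross-shaped structuring element built with `scipy.ndimage.generate_binary_structure`, which has the same shape as `disk(1)` used above.

```python
import numpy as np
from scipy import ndimage as nd

# Toy mask: a 5x5 bright square with a one-pixel hole, plus an isolated bright pixel
mask = np.zeros((11, 11), dtype=bool)
mask[2:7, 2:7] = True
mask[4, 4] = False   # "pepper": small dark spot inside the nucleus
mask[8, 9] = True    # "salt": small bright spot in the background

# 3x3 cross-shaped structuring element
selem = nd.generate_binary_structure(2, 1)

closed = nd.binary_closing(mask, selem)    # dilation then erosion fills the hole
opened = nd.binary_opening(closed, selem)  # erosion then dilation drops the speck

print(mask.sum(), closed.sum(), opened.sum())
```

Closing fills the hole but keeps the isolated pixel; opening then removes that pixel. Opening also trims the four corners of the square, because the cross does not fit inside them, so expect small changes to the outline of each nucleus.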

## Feature detection

In [None]:
#create a dummy channel
Z = np.zeros(Y_train.shape)

#add two channels for img intensity and canny edges
X_train = np.concatenate((X_train, Z), axis = 3)
X_train = np.concatenate((X_train, Z), axis = 3)

print(X_train.shape)

In [None]:
for ith in range(len(train_ids)):
    #get grayscale image
    img_gray = rgb2gray(X_train[ith, :, :, 0:3])
    X_train[ith, :, :, 3] = img_gray #add new feature
    #calculate canny edges
    edges = canny(image=img_gray, sigma=2)
    X_train[ith, :, :, 4]=edges #add new feature


In [None]:
#replot the random image in grayscale
print(X_train.shape)
imshow(X_train[image_x, :, :, 3])
plt.show()

#replot the random image canny edges
print(X_train.shape)
imshow(X_train[image_x, :, :, 4])
plt.show()
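
For intuition about the two new channels, the sketch below reproduces the grayscale conversion with the luminance weights documented for `rgb2gray`, then computes a plain gradient magnitude on a toy image. This is not the Canny detector: Canny also smooths the image, thins the edges, and applies hysteresis thresholds. The gradient is only the stage that marks where intensity changes.

```python
import numpy as np

# Toy RGB image in [0, 1]: dark left half, bright right half
rgb = np.zeros((6, 6, 3))
rgb[:, 3:, :] = 1.0

# Luminance weights documented for skimage.color.rgb2gray
weights = np.array([0.2125, 0.7154, 0.0721])
gray = rgb @ weights

# Gradient magnitude: large values mark intensity changes (candidate edges)
gy, gx = np.gradient(gray)
magnitude = np.hypot(gx, gy)

# Columns where the gradient is nonzero sit on the dark/bright boundary
edge_columns = np.flatnonzero(magnitude.max(axis=0) > 0)
print(edge_columns)
```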