<a href="https://colab.research.google.com/github/a-parida12/furry-octo-spork/blob/main/LungSegmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hands-on ML Segmentation Session

This notebook follows the lifecycle of building a deep-learning-based segmentation algorithm. The task is to segment the left and right lungs in a chest X-ray (CXR) image.

We will use the Kaggle dataset from the CXR segmentation challenge.

# Getting The Playground Ready

This section prepares the Colab environment: we download the data from Kaggle and set up everything needed for the later tasks.

In [None]:
import os
from pathlib import Path
from glob import glob
from collections import defaultdict
import pandas as pd
import re

import matplotlib.pyplot as plt
import cv2

# configure the Kaggle downloader API: path_to_json is the location of your kaggle.json file
path_to_json = ??
os.environ['KAGGLE_CONFIG_DIR'] = str(Path(path_to_json).parent)

In [None]:

# confirm TensorFlow sees the GPU (uses the public tf.config API)
import tensorflow as tf
assert tf.config.list_physical_devices('GPU'), 'No GPU found'
print('All Good! GPU found!!')

In [None]:
# Download the dataset from kaggle
!kaggle datasets download -d nikhilpandey360/chest-xray-masks-and-labels
# unzip the images 
!unzip \*.zip  && rm *.zip

# Image Loading and Processing

This section shows how to load the data. The images must be processed before they can be used to train a model. We focus on two processing steps: resizing each image to a standard size and normalising the images across the dataset.

### Learning Targets

*   Reading Image data
*   Resizing the Image
*   Min-Max Normalisation of an Image
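
As a minimal sketch of the last target: min-max normalisation rescales pixel values linearly so that the smallest value maps to 0 and the largest to 1. The example uses a small synthetic NumPy array rather than a real CXR image, and the helper name `min_max_normalise` is hypothetical, not part of this notebook.

```python
import numpy as np

def min_max_normalise(image):
    """Linearly rescale pixel values to the range [0, 1] (hypothetical helper)."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # a constant image has no contrast; return zeros to avoid dividing by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

# synthetic 2x2 "image" with 8-bit pixel values
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)
normalised = min_max_normalise(pixels)
```

For an 8-bit image that already spans the full range [0, 255], this is the same as dividing by 255.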



In [None]:
data_path = "/content/"  # directory where the Lung Segmentation folder is located

# load the image paths and mask paths
lung_image_paths = glob(os.path.join(data_path,"Lung Segmentation/CXR_png/*.png"))
mask_image_paths = glob(os.path.join(data_path,"Lung Segmentation/masks/*.png"))

In [None]:
related_paths = defaultdict(list)

# find the annotation (mask) for each image to build the image-mask pairs for training
# the pairs are collected in a dict of lists
for img_path in lung_image_paths:
    # the image name is the file name without its .png extension
    img_match = re.search(r"CXR_png/(.*)\.png$", img_path)
    if not img_match:
        continue  # skip paths that do not look like a CXR image
    img_name = img_match.group(1)
    for mask_path in mask_image_paths:
        # a mask belongs to an image if its path contains the image name
        if re.search(re.escape(img_name), mask_path):
            related_paths["image_path"].append(img_path)
            related_paths["mask_path"].append(mask_path)
# store the hashmap as a pandas dataframe for easy reading
paths_df = pd.DataFrame.from_dict(related_paths)

In [None]:
# let's see what the image-mask path pairing looks like
print(paths_df.head())

In [None]:
x_train=[]
y_train=[]
# process the data to make it train ready
for xray_num in range (len(paths_df)):

    img_path = paths_df["image_path"][xray_num]
    mask_path = paths_df["mask_path"][xray_num]

    # read the 3 channel image and resize it to (256, 256, 3)
    img = ## read an image ##
    img = ## resize the image ##
    
    # bring the img from pixel value [0,255] -> [0,1]
    assert img.shape == (256, 256, 3)
    img = ## normalise the image ##
    
    # read the single channel mask and resize it to (256, 256)
    mask = ## read an image ##
    mask = ## resize the image ##
    mask = ## make single channel ##

    # bring the mask from pixel value [0,255] -> [0,1]
    assert mask.shape == (256, 256)
    mask= ## normalise the image ##

    x_train.append(img)
    y_train.append(mask)

assert len(x_train) == len(y_train) == len(paths_df) == 704
print(f"{len(x_train)} images added")

In [None]:
# let us see some image-mask pairing
for i in range (5):


    fig = plt.figure(figsize = (10,10))
    ax1 = fig.add_subplot(2,2,1)
    ax1.imshow(x_train[i], cmap = "gray")
    ax2 = fig.add_subplot(2,2,2)
    ax2.imshow(y_train[i], cmap = "gray")

In [None]:
# defining the image shape based on the pre-processing done above
input_shape = (??,?? ,?? ) ## fill in the size of the input

# Deep Learning Begins Here

This section covers training a deep learning model. We build a deep neural network, then select a loss function and an optimiser for it.

The model is then trained, and the best-performing model is saved for inference.

### Learning Targets

*   Building Layers in a Deep Neural Network (DNN)
*   Selecting Hyperparameters for a Network
*   Training a DNN
*   Saving the trained DNN
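
The choice of loss is left open in this notebook. As one illustration, assuming the network ends in a sigmoid that outputs a per-pixel lung probability, binary cross-entropy is a common candidate. The NumPy sketch below shows how it scores a mostly correct prediction against a mostly wrong one; the function name and the toy masks are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy (illustrative helper)."""
    # clip so that log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(per_pixel.mean())

# toy 2x2 ground-truth mask: 1 = lung, 0 = background
y_true = np.array([[0.0, 1.0], [1.0, 0.0]])

good_pred = np.array([[0.1, 0.9], [0.8, 0.2]])  # mostly agrees with the mask
bad_pred = np.array([[0.9, 0.1], [0.2, 0.8]])   # mostly disagrees with the mask

loss_good = binary_cross_entropy(y_true, good_pred)
loss_bad = binary_cross_entropy(y_true, bad_pred)
print(f"good prediction: {loss_good:.3f}, bad prediction: {loss_bad:.3f}")
```

The lower loss for the prediction that agrees with the mask is what the optimiser exploits during training.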



In [None]:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import * 
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint

In [None]:
# to measure the overfitting of a model we create a training and validation set
# split the dataset into train and validation sets
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=??) # fill in the split


In [None]:
# definition of the deep-learning-based segmenter
def Segmenter(input_shape=(256, 256, 3), latent_dim=8):
    # Input Layer: the shape must match the pre-processed images
    input_img = Input(shape=input_shape, name='encoder_input')
    
    # Encoder 
    # Conv2D(features, (kernelsize, kernelsize), padding_type, activation_layer)
    x = Conv2D(16, (3, 3), padding='same', activation='relu')(input_img)
    x = Conv2D(32, (3, 3), padding='same', activation='relu',strides=(2, 2))(x)
    x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
    x = ?? # a conv layer here. Recommendation: 128 features, kernel size (3, 3), same padding, relu activation
    conv_shape = K.int_shape(x) 
    x = Flatten()(x)
    x = Dense(32, activation='relu')(x)
    x = Dropout(0.1)(x)

    # Latent Space
    z = Dense(latent_dim, name='latent_vector')(x)
    
    # Decoder
    x = Dense(conv_shape[1] * conv_shape[2] * conv_shape[3], activation='relu')(z)
    x = Dropout(0.1)(x)
    x = Reshape((conv_shape[1], conv_shape[2], conv_shape[3]))(x)
    x = Conv2DTranspose(128, (3, 3), padding='same', activation='relu')(x)
    x = Conv2DTranspose(64, (3, 3), padding='same', activation='relu',strides=(2, 2))(x)
    output = ?? # a conv transpose layer here. Recommendation: 1 feature, kernel size (3, 3), same padding, sigmoid activation
    

    model = Model(inputs = input_img, outputs = output)
    # define the model optimiser, losses and metrics
    #https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile
    model.compile(optimizer = ?? , loss = ??, metrics = [??])
    return model

In [None]:
# load the model
model = Segmenter(input_shape)
# model layer by layer summary
model.summary()

In [None]:
# some more model viz
tf.keras.utils.plot_model(model, to_file="model.png")

In [None]:
# checkpointing: keep only the best model seen so far (lowest validation loss),
# so a later, overfitted epoch does not overwrite it
model_checkpoint = ModelCheckpoint("/content/trained_model.h5", monitor='val_loss', verbose=2, save_best_only=True)

# train the model
history = model.fit(x = np.array(x_train), 
                    y = np.array(y_train), 
                    epochs = ??, 
                    batch_size = ??,
                    validation_data=(np.array(x_val), np.array(y_val)),
                    callbacks = [model_checkpoint])

In [None]:
# let's take a look at how the model training went

# plot training vs validation accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# plot training vs validation loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()


# Inference

This section shows how to use a trained model for inference. We load the pre-trained model and run a prediction.

### Learning Targets

*   Loading a trained model
*   Running inference on an Image
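
If the final layer uses a sigmoid activation, as recommended above, the prediction is a per-pixel probability map rather than a hard mask. A minimal sketch of turning a batch of predictions into binary masks follows. The random array stands in for real model output of shape (N, 256, 256, 1), and the 0.5 cut-off is an assumption, not a value taken from this notebook.

```python
import numpy as np

# stand-in for model.predict output: 2 probability maps of shape (256, 256, 1)
rng = np.random.default_rng(0)
fake_pred = rng.random((2, 256, 256, 1))

# drop the trailing channel axis, then threshold at an assumed cut-off of 0.5
threshold = 0.5
binary_masks = (np.squeeze(fake_pred, axis=-1) > threshold).astype(np.uint8)

print(binary_masks.shape)
```

Each resulting mask contains only 0 (background) and 1 (lung), which makes it directly comparable to the training masks.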



In [None]:
# if your trained model is not good enough you can use a pre-trained model
! wget https://github.com/a-parida12/furry-octo-spork/raw/main/solution/trained_model.h5

In [None]:
# loading the model
pretrained_weights_path = ??
model = Segmenter(input_shape)

# adding weights from the pretrained model
model.load_weights(pretrained_weights_path)

In [None]:
test_path_dir = os.path.join(data_path, "Lung Segmentation/test/")
test_images = os.listdir(test_path_dir)

In [None]:
# load the data and do the same preprocessing as the training data
x_test = []
for image_name in test_images:
    test_path = os.path.join(test_path_dir, image_name)

    # read, resize to (256, 256) and scale pixel values from [0,255] to [0,1]
    img = cv2.imread(test_path)
    img = cv2.resize(img, (256, 256))
    img = img / 255
    x_test.append(img)
print(f"{len(x_test)} images added")

In [None]:
# inference
x_test= np.array(x_test)
y_pred=model.predict(x_test)

In [None]:
# let us visualise the outputs for the test set
for i in range (15,20):
    fig = plt.figure(figsize = (10,10))

    ax1 = fig.add_subplot(2,2,1)
    ax1.imshow(x_test[i], cmap = "gray")
    ax2 = fig.add_subplot(2,2,2)
    ax2.imshow(np.squeeze(y_pred[i]), cmap = "gray")