<a href="https://colab.research.google.com/github/amitchug/ALMlops/blob/main/M2_AST_05_Image_Segmentation_A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Programme in AI and MLOps
## A programme by IISc and TalentSprint
### Assignment: Image Segmentation using U-Net and DeepLabv3+

## Learning Objectives:

At the end of the experiment, you will be able to:

*  understand, prepare, and visualize the dataset of images and their corresponding segmentation masks
*  understand the encoder, bottleneck, and decoder regions of a U-Net architecture
*  build and train a U-Net architecture for segmentation
*  create a masked image (prediction)
*  calculate segmentation metrics such as IoU and Dice score
*  understand and implement the DeepLabv3+ architecture for segmentation

## Dataset
We will be training the model on the [Oxford-IIIT Pet](https://www.robots.ox.ac.uk/~vgg/data/pets/) dataset. It contains pet images, their class labels, segmentation masks, and head regions of interest. We will only use the `images` and `segmentation masks` for this experiment.

The dataset consists of images of 37 pet breeds, with roughly 200 images per breed. Each image comes with a breed label and a pixel-wise mask. The mask assigns each pixel one of three class labels:

* Class 1: Pixel belonging to the pet (foreground).
* Class 2: Pixel belonging to the background.
* Class 3: Not classified, i.e. a pixel on the border (outline) of the pet.

### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "" #@param {type:"string"}

In [None]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M2_AST_05_Image_Segmentation_A" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")

    # ipython.magic("wget https://cdn.iisc.talentsprint.com/AIandMLOps/Datasets/Acoustic_Extinguisher_Fire_Dataset.xlsx")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aimlops-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



### Import required packages

In [None]:
import os
from glob import glob
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers

### Download the Oxford-IIIT Pet dataset

In [None]:
# Download dataset

!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
!tar -xf images.tar.gz
!tar -xf annotations.tar.gz

### Visualize the images in dataset

* Images are present in `images/` directory
* Corresponding segmentation masks are present in `annotations/trimaps/` directory

In [None]:
# Visualize an image
an_img_path = sorted(glob("./images/*"))[0]        # The `glob` module is used to retrieve files and directories matching a specified pattern.
print(f"Path: {an_img_path}")

img = Image.open(an_img_path)
img_arr = np.array(img)
plt.imshow(img_arr);
plt.title("Image");

In [None]:
# Visualize a semantic part segmentation label image
a_segm_img_path = sorted(glob("./annotations/trimaps/*"))[0]
print(a_segm_img_path)

img = Image.open(# YOUR CODE HERE)
img_arr = # YOUR CODE HERE
# YOUR CODE HERE for plt.imshow
plt.title("Segmentation Mask");

### Load & Preprocess Images

**Train, validation, test Split:**
* Save the image paths in a list, and split the list into train, validation, and test sets
* Do the same for the segmentation mask images
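
As a rough illustration of the split described above, the sketch below applies every-fourth-item slicing to a toy list. The file names are placeholders, not real dataset paths. As long as the image and mask lists are sorted the same way and have matching entries, applying identical slices to both should keep each image aligned with its mask.

```python
# Toy stand-in for the sorted list of image paths (hypothetical file names).
paths = [f"img_{i:02d}.jpg" for i in range(12)]

val = paths[::4]         # items 0, 4, 8
test = paths[1::4]       # items 1, 5, 9

# Everything that is in neither held-out set goes to training.
held_out = set(val) | set(test)
train = [p for p in paths if p not in held_out]

print(len(train), len(val), len(test))
```

With 12 items this yields 6 training, 3 validation, and 3 test items, i.e. a 50/25/25 split. Using a set for the membership test is a design choice that keeps the filtering fast on longer lists.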

In [None]:
IMAGE_SIZE = 128
BATCH_SIZE = 64
NUM_CLASSES = 3        # After preprocessing: 0 = pet; 1 = background; 2 = outline (border of the pet)
IMAGE_DIR = "./images/"
MASK_IMAGE_DIR = "./annotations/trimaps/"

# The line below uses os.listdir to collect all the .jpg files in the "images/" directory.
# The resulting list of file paths is sorted in ascending order.
all_images = sorted([os.path.join(IMAGE_DIR, fname) for fname in os.listdir(IMAGE_DIR) if fname.endswith(".jpg")])

# The below line of code finds all the mask files in the masks subdirectory "annotations/trimaps/".
all_masks = sorted([os.path.join(MASK_IMAGE_DIR, fname) for fname in os.listdir(MASK_IMAGE_DIR) if fname.endswith(".png") and not fname.startswith(".")])

# Creating a list of validation image files by selecting every 4th image from the all_images list.
# An interval of 4 is chosen intentionally: with roughly 200 images per breed, 1/4th go to validation, 1/4th to test, and the remaining half to training.
val_images = all_images[::4]

# Creating a list of validation mask files by selecting every 4th image from the all_masks list.
val_masks = all_masks[::4]

# Creating a list of test image files by selecting every 4th image starting
# from the second image in the all_images list.
test_images = all_images[1::4]

# Creating a list of test mask files by selecting every 4th image starting
# from the second image in the all_masks list.
test_masks = # YOUR CODE HERE

# Creating an empty list for the training image files & appending remaining images in it.
train_images = []
for i in all_images:
    if (i not in val_images) and (i not in test_images):
        train_images.append(i)

# Creating an empty list for the training mask files & appending the remaining mask images to it.
train_masks = []
for i in all_masks:
    if (i not in val_masks) and (i not in test_masks):
        # YOUR CODE HERE for appending i in train_masks


In [None]:
len(all_images), len(train_images), len(val_images), len(test_images)

In [None]:
len(all_masks), len(train_masks), len(val_masks), len(test_masks)

**Load & Preprocess:**

* Read the image using its path
* Resize it to have size (128 x 128)
* If it is an input image, normalize the pixel values by dividing them by 255
* If it is a target mask image, subtract 1 from it. This is a preprocessing step to adjust the segmentation mask's pixel values.
    
    In `annotations/README` file of the dataset it is mentioned that the pixels in the segmentation mask are labeled as { 'foreground' : 1, 'background' : 2 , 'Not Classified' : 3 }.

    For the sake of convenience, we subtract 1 from the segmentation mask, resulting in labels that are : `{0, 1, 2}` and we will interpret these as {'pet', 'background', 'outline'}.
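
The label shift described above can be sketched on a toy mask. The 3x3 array is made up for illustration; only the raw label values {1, 2, 3} and the class names come from the text above.

```python
import numpy as np

# Toy 3x3 trimap using the raw label values {1, 2, 3}.
raw_mask = np.array([[2, 2, 3],
                     [2, 3, 1],
                     [3, 1, 1]], dtype=np.uint8)

shifted = raw_mask - 1   # labels become {0, 1, 2}
names = {0: "pet", 1: "background", 2: "outline"}

# Count how many pixels fall into each class after the shift.
labels, counts = np.unique(shifted, return_counts=True)
for label, count in zip(labels, counts):
    print(names[int(label)], int(count))
```

Zero-based labels are convenient because they can be used directly as class indices by the loss function and by `argmax` on the model output.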

In [None]:
# The below function reads an image file and returns a preprocessed image tensor.
# The mask argument is set to False by default, indicating that it is an image file, not a mask file.

def read_image(image_path, mask=False):
    if mask:
        image = Image.open(image_path)                        # Open mask image               eg. shape (500, 600)
        image = image.resize((IMAGE_SIZE, IMAGE_SIZE), resample=Image.NEAREST)  # Resize with nearest-neighbour so labels are not interpolated  eg. shape (128, 128)
        arr = (np.array(image) - 1)                           # Change pixel values from {1,2,3} --> {0,1,2}
        image = tf.convert_to_tensor(arr, dtype=tf.float32)   # Convert to tensor
        image = tf.expand_dims(image, axis=-1)                # Add an extra dimension        eg. shape (128, 128, 1)
    else:
        image = Image.open(image_path)                        # Open input image              eg. shape (500, 600, 3)
        image = # YOUR CODE HERE                              # Resize the image              eg. shape (128, 128, 3)
        arr = np.array(image) / 255                           # Normalize pixel values to lie between 0 & 1
        image = # YOUR CODE HERE                              # Convert to tensor

    return image


In [None]:
# Test `read_image` function for an input image
img = read_image(sorted(glob("./images/*"))[0])
print(f"Shape: {img.shape} \nMin pixel value: {img.numpy().min()} \nMax pixel value: {img.numpy().max()}")

plt.imshow(img);

In [None]:
# Test `read_image` function for a mask image
mask_img = read_image(sorted(glob("./annotations/trimaps/*"))[0], mask=True)
print(f"Shape: {mask_img.shape} \nMin pixel value: {mask_img.numpy().min()} \nMax pixel value: {mask_img.numpy().max()}")

plt.imshow(mask_img);

In [None]:
# Function `load_images` will apply `read_image` to all images in a list

def load_images(image_paths, mask_paths):
    images_tf = []
    masks_tf = []
    for i in range(len(image_paths)):
        if np.array(Image.open(image_paths[i])).shape[-1] == 3:    # check input images must have 3 channels
            images_tf.append(read_image(image_paths[i]))
            masks_tf.append(read_image(mask_paths[i], mask=True))
    return images_tf, masks_tf


In [None]:
# Load and preprocess train, val, test sets

train_images_tf, train_masks_tf = load_images(train_images, train_masks)
val_images_tf, val_masks_tf = # YOUR CODE HERE to load images using val_images, val_masks
test_images_tf, test_masks_tf = # YOUR CODE HERE to load images using test_images, test_masks

In [None]:
len(train_images_tf), len(val_images_tf), len(test_images_tf)

In [None]:
plt.imshow(val_images_tf[0]);

In [None]:
plt.imshow(val_masks_tf[0]);

### Create TensorFlow Dataset

Prepare batches for training, validation, and testing.
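
The `data_generator` function below batches with `drop_remainder=True`, so the number of batches is the sample count divided by the batch size, rounded down. A minimal sketch of that arithmetic follows; `num_batches` is a hypothetical helper and the sample count of 1000 is made up for illustration.

```python
def num_batches(num_samples, batch_size, drop_remainder=True):
    """Number of batches produced when splitting num_samples into fixed-size batches."""
    full, leftover = divmod(num_samples, batch_size)
    if drop_remainder or leftover == 0:
        return full
    return full + 1

# Hypothetical sample count, used only to illustrate the arithmetic.
print(num_batches(1000, 64))                         # full batches only
print(num_batches(1000, 64, drop_remainder=False))   # includes a smaller last batch
```

With these numbers, dropping the remainder gives 15 batches and leaves 40 samples unused, while keeping it gives 16 batches. This is worth remembering later when predictions for a whole dataset are compared against the full list of masks.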

In [None]:
def data_generator(image_list, mask_list):
    dataset = tf.data.Dataset.from_tensor_slices((image_list, mask_list))
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
    return dataset


train_dataset = data_generator(train_images_tf, train_masks_tf)
val_dataset = # YOUR CODE HERE to create TF dataset using (val_images_tf, val_masks_tf)
test_dataset = # YOUR CODE HERE to create TF dataset using (test_images_tf, test_masks_tf)

print("Train Dataset:", train_dataset)
print("Val Dataset:", val_dataset)
print("Test Dataset", test_dataset)

In [None]:
len(train_images_tf), len(val_images_tf), len(test_images_tf)

In [None]:
len(train_dataset), len(val_dataset), len(test_dataset)

In [None]:
# To Visualize an image, we need to iterate over a batch and access a particular image

sample_batch = next(iter(val_dataset))
random_index = np.random.choice(sample_batch[0].shape[0])
sample_image, sample_mask = sample_batch[0][random_index], sample_batch[1][random_index]

plt.figure(figsize=(10, 10))

plt.subplot(1, 2, 1)
plt.title("Image")
plt.imshow(sample_image);

plt.subplot(1, 2, 2)
plt.title("True Mask")
plt.imshow(sample_mask);

## **Implement the U-Net model**
With the dataset prepared, we can now define the model and build the U-Net. The overall architecture is shown below. A U-Net consists of an encoder (downsampler) and decoder (upsampler) with a bottleneck in between. The gray arrows correspond to the skip connections that concatenate encoder block outputs to each stage of the decoder. Let's see how to implement these, starting with the encoder.

<center>
<img src="https://cdn.iisc.talentsprint.com/AIandMLOps/Images/U-Net.png" width=800px height=500px/>
</center>
<br><br>



## Encoder
The encoder consists of repeating blocks, so it is best to create functions for it to keep the code modular. Each encoder block contains two Conv2D layers activated by ReLU, followed by a MaxPooling and a Dropout layer. Each stage has an increasing number of filters, while the spatial dimensions of the features shrink because of the pooling layer.

### Creating Encoder utilities with the following three functions:

* conv2d_block() - to add two convolution layers and ReLU activations
* encoder_block() - to add pooling and dropout to the conv2d blocks. Recall that in U-Net, you need to save the output of the convolution layers at each block for the skip connections, so this function returns two values: the output of the conv block and the output after pooling and dropout
* encoder() - to build the entire encoder. This will return the output of the last encoder block as well as the output of the previous conv blocks. These will be concatenated to the decoder blocks as you'll see later.
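
To make the shape bookkeeping concrete, here is a small sketch that traces tensor shapes through the encoder without building any layers. It assumes a 128 x 128 input, 'same' padding in the convolutions, 2 x 2 max pooling, and filter counts of 64, 128, 256, and 512; `trace_encoder` is a hypothetical helper, not part of the model code.

```python
def trace_encoder(size=128, filters=(64, 128, 256, 512), pool=2):
    """Return (skip_shape, pooled_shape) for each encoder block."""
    shapes = []
    for n_filters in filters:
        skip = (size, size, n_filters)      # conv block output ('same' padding keeps the size)
        size //= pool                       # max pooling halves height and width
        pooled = (size, size, n_filters)    # dropout does not change the shape
        shapes.append((skip, pooled))
    return shapes

for skip, pooled in trace_encoder():
    print("skip:", skip, "-> pooled:", pooled)
```

Under these assumptions the last block hands an 8 x 8 x 512 tensor to the bottleneck, while the four skip tensors keep spatial sizes of 128, 64, 32, and 16 for later use in the decoder.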

In [None]:
# Encoder Utilities

def conv2d_block(input_tensor, n_filters, kernel_size = 3):
  '''
  Adds 2 convolutional layers with the parameters passed to it

  Args:
    input_tensor (tensor) -- the input tensor
    n_filters (int) -- number of filters
    kernel_size (int) -- kernel size for the convolution

  Returns:
    tensor of output features
  '''
  # apply two Conv2D + ReLU layers in sequence
  x = input_tensor
  for i in range(2):
    x = tf.keras.layers.Conv2D(filters = n_filters, kernel_size = (kernel_size, kernel_size),
            kernel_initializer = 'he_normal', padding = 'same')(x)
    x = tf.keras.layers.Activation('relu')(x)

  return x


def encoder_block(inputs, n_filters=64, pool_size=(2,2), dropout=0.3):
  '''
  Adds two convolutional layers, then downsamples their output with max pooling and applies dropout.

  Args:
    inputs (tensor) -- the input tensor
    n_filters (int) -- number of filters
    pool_size (tuple) -- window size of the max pooling
    dropout (float) -- dropout rate applied after pooling

  Returns:
    f - the output features of the convolution block
    p - the maxpooled features with dropout
  '''

  f = conv2d_block(inputs, n_filters=n_filters)
  p = tf.keras.layers.MaxPooling2D(pool_size=pool_size)(f)
  p = tf.keras.layers.Dropout(dropout)(p)

  return f, p


def encoder(inputs):
  '''
  This function defines the encoder or downsampling path.

  Args:
    inputs (tensor) -- batch of input images

  Returns:
    p4 - the output maxpooled features of the last encoder block
    (f1, f2, f3, f4) - the output features of all the encoder blocks
  '''
  f1, p1 = encoder_block(inputs, n_filters=64, pool_size=(2,2), dropout=0.3)
  f2, p2 = encoder_block(p1, n_filters=128, pool_size=(2,2), dropout=0.3)
  f3, p3 = encoder_block(p2, n_filters=256, pool_size=(2,2), dropout=0.3)
  f4, p4 = encoder_block(p3, n_filters=512, pool_size=(2,2), dropout=0.3)

  return p4, (f1, f2, f3, f4)

### Bottleneck
A bottleneck follows the encoder block and is used to extract more features. This does not have a pooling layer so the dimensionality remains the same. You can use the conv2d_block() function defined earlier to implement this.

In [None]:
def bottleneck(inputs):
  '''
  This function defines the bottleneck convolutions to extract more features before the upsampling layers.
  '''
  bottle_neck = conv2d_block(inputs, n_filters=1024)

  return bottle_neck

## Decoder
Finally, we have the decoder which upsamples the features back to the original image size. At each upsampling level, you will take the output of the corresponding encoder block and concatenate it before feeding to the next decoder block.
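
The upsample-and-concatenate pattern can be traced the same way as the encoder. The sketch below assumes the bottleneck output is 8 x 8, each transposed convolution uses stride 2 with 'same' padding (which doubles height and width), and the skip tensor at each level has as many channels as the upsampled tensor. `trace_decoder` is a hypothetical helper for illustration only.

```python
def trace_decoder(size=8, filters=(512, 256, 128, 64), stride=2):
    """Return (upsampled, concatenated, output) shapes for each decoder block."""
    shapes = []
    for n_filters in filters:
        size *= stride                                # transposed conv with 'same' padding
        upsampled = (size, size, n_filters)
        concatenated = (size, size, 2 * n_filters)    # skip features add n_filters channels
        output = (size, size, n_filters)              # after the two 3x3 convolutions
        shapes.append((upsampled, concatenated, output))
    return shapes

for upsampled, concatenated, output in trace_decoder():
    print(upsampled, "->", concatenated, "->", output)
```

The concatenation only works when the upsampled tensor and the skip tensor have the same height and width, which is why the upsampling stride has to match the pooling factor used in the encoder.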

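### Creating Decoder utilities with the following two functions:

* decoder_block() - to upsample the input with a transposed convolution, concatenate it with the corresponding encoder output, apply dropout, and pass the result through a conv2d block
* decoder() - to chain four decoder blocks and add the final 1x1 convolution that produces the pixel-wise label map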

In [None]:
# Decoder Utilities

def decoder_block(inputs, conv_output, n_filters=64, kernel_size=3, strides=2, dropout=0.3):
  '''
  Defines one decoder block of the UNet.

  Args:
    inputs (tensor) -- batch of input features
    conv_output (tensor) -- features from an encoder block
    n_filters (int) -- number of filters
    kernel_size (int) -- kernel size
    strides (int) -- strides for the deconvolution/upsampling (2 undoes one 2x2 pooling step)
    dropout (float) -- dropout rate applied after the concatenation

  Returns:
    c (tensor) -- output features of the decoder block
  '''
  u = tf.keras.layers.Conv2DTranspose(n_filters, kernel_size, strides = strides, padding = 'same')(inputs)
  c = tf.keras.layers.concatenate([u, conv_output])
  c = # YOUR CODE HERE to add a Dropout layer
  c = conv2d_block(c, n_filters, kernel_size=3)

  return c


def decoder(inputs, convs, output_channels):
  '''
  Defines the decoder of the UNet chaining together 4 decoder blocks.

  Args:
    inputs (tensor) -- batch of input features
    convs (tuple) -- features from the encoder blocks
    output_channels (int) -- number of classes in the label map

  Returns:
    outputs (tensor) -- the pixel wise label map of the image
  '''

  f1, f2, f3, f4 = convs

  c6 = decoder_block(inputs, f4, n_filters=512, kernel_size=(3,3), strides=(2,2), dropout=0.3)
  c7 = decoder_block(c6, f3, n_filters=256, kernel_size=(3,3), strides=(2,2), dropout=0.3)
  c8 = decoder_block(c7, f2, n_filters=128, kernel_size=(3,3), strides=(2,2), dropout=0.3)
  c9 = decoder_block(c8, f1, n_filters=64, kernel_size=(3,3), strides=(2,2), dropout=0.3)

  outputs = tf.keras.layers.Conv2D(output_channels, (1, 1), activation='softmax')(c9)

  return outputs


### Putting it all together
We can finally build the U-Net by chaining the encoder, bottleneck, and decoder. We will specify the number of output channels, which for this dataset is 3, because there are three possible labels for each pixel: 'pet', 'background', and 'outline'.

In [None]:
OUTPUT_CHANNELS = 3

def unet():
  '''
  Defines the UNet by connecting the encoder, bottleneck and decoder.
  '''

  # specify the input shape
  inputs = tf.keras.layers.Input(shape=(128,128,3,))

  # feed the inputs to the encoder
  encoder_output, convs = encoder(inputs)

  # feed the encoder output to the bottleneck
  bottle_neck = bottleneck(encoder_output)

  # feed the bottleneck and encoder block outputs to the decoder
  # specify the number of classes via the `output_channels` argument
  outputs = decoder(bottle_neck, convs, output_channels=OUTPUT_CHANNELS)

  # create the model
  model = # YOUR CODE HERE to return a keras Model with (inputs=inputs, outputs=outputs)

  return model

# instantiate the model
unet_model = unet()

# see the resulting model architecture
unet_model.summary()

In [None]:
# tf.keras.utils.plot_model(unet_model, show_shapes=False)

### Compile and Train the model
Now, all that is left to do is to compile and train the model. The loss we will use is `sparse_categorical_crossentropy`. The reason is that the network assigns each pixel a label, just like a multi-class prediction. In the true segmentation mask, each pixel has a value in {0,1,2}. The network outputs three channels, one per class, and each channel learns to predict the probability of its class. `sparse_categorical_crossentropy` is the recommended loss for such a scenario because it accepts integer labels directly, without one-hot encoding.
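
To show what this loss computes per pixel, here is a minimal NumPy sketch, not the Keras implementation. It assumes integer labels of shape (H, W) and softmax probabilities of shape (H, W, C); the function name matches the loss only for readability, and the tiny 1 x 2 "mask" is made up.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(p[true class]) over all pixels.

    y_true: integer labels, shape (H, W)
    y_prob: per-class probabilities, shape (H, W, C)
    """
    y_prob = np.clip(y_prob, eps, 1.0)
    # Pick the probability assigned to the true class of each pixel.
    picked = np.take_along_axis(y_prob, y_true[..., None], axis=-1)[..., 0]
    return float(-np.log(picked).mean())

y_true = np.array([[0, 2]])                       # a 1x2 "mask"
y_prob = np.array([[[0.7, 0.2, 0.1],              # pixel 0: true class 0 gets p = 0.7
                    [0.1, 0.1, 0.8]]])            # pixel 1: true class 2 gets p = 0.8
loss = sparse_categorical_crossentropy(y_true, y_prob)
print(round(loss, 4))
```

Here the loss is the average of -ln(0.7) and -ln(0.8), roughly 0.29. The more probability the network places on the correct class of each pixel, the closer the loss gets to zero.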

In [None]:
# configure the optimizer, loss and metrics for training
unet_model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# model training

EPOCHS = 10

model_history = unet_model.fit(train_dataset,
                               epochs=EPOCHS,
                               validation_data=val_dataset)

In [None]:
# Save the model weights
unet_model.save_weights('unet_pets_model.weights.h5')

In [None]:
# To use the model later, create the model with the same architecture and load the model weights
## unet_model.load_weights('unet_pets_model.weights.h5')

We can plot the training and validation accuracy and loss to see how the training went. The loss should show generally decreasing values per epoch.

#### Learning curve from model history

In [None]:
def display_learning_curves(model_history):
  acc = model_history.history["accuracy"]
  val_acc = model_history.history["val_accuracy"]
  loss = model_history.history["loss"]
  val_loss = model_history.history["val_loss"]
  epochs_range = range(len(acc))

  fig = plt.figure(figsize=(8,5))

  plt.subplot(1,2,1)
  plt.plot(epochs_range, acc, label="train accuracy")
  plt.plot(epochs_range, val_acc, label="validation accuracy")
  plt.title("Accuracy")
  plt.xlabel("Epoch")
  plt.ylabel("Accuracy")
  plt.legend(loc="lower right")

  plt.subplot(1,2,2)
  plt.plot(epochs_range, loss, label="train loss")
  plt.plot(epochs_range, val_loss, label="validation loss")
  plt.title("Loss")
  plt.xlabel("Epoch")
  plt.ylabel("Loss")
  plt.legend(loc="upper right")

  fig.tight_layout()
  plt.show()

In [None]:
# Display learning curves
display_learning_curves(model_history)

### Make predictions

In [None]:
sample_batch = next(iter(test_dataset))
random_index = np.random.choice(sample_batch[0].shape[0])
sample_image, sample_mask = sample_batch[0][random_index], sample_batch[1][random_index]

In [None]:
out = unet_model.predict(tf.reshape(sample_image, (1, 128, 128, 3)))         # shape (1, 128, 128, 3)
out_img = np.squeeze(out)                                                    # shape (128, 128, 3)
result = np.argmax(out_img, axis=2)                                          # shape (128, 128)

plt.imshow(result);

### Visualize Predictions

In [None]:
# Inference from model

def infer(image_tensor, model, verbose=1):
    # predictions from model, output shape -> (1, 128, 128, 3)
    predictions = model.predict(np.expand_dims((image_tensor), axis=0), verbose=verbose)
    # Shape after squeeze -> (128, 128, 3)
    predictions = np.squeeze(predictions)
    # Select only maximum predicted value for every pixel, output shape -> (128, 128)

    predictions = np.argmax(predictions, axis=2)
    return predictions.reshape(128,128,1)

#### Plot the predictions

In [None]:
def plot_predictions(test_img, test_mask, model, verbose=1):
    pred = infer(image_tensor = test_img, model = model, verbose=verbose)

    fig = plt.figure(figsize=(8,5))

    plt.subplot(1,3,1)
    plt.imshow(test_img)
    plt.title("Input image")

    plt.subplot(1,3,2)
    plt.imshow(test_mask)
    plt.title("Actual label")

    plt.subplot(1,3,3)
    plt.imshow(pred)
    plt.title("Predicted label")

    fig.tight_layout()
    plt.show()

In [None]:
# Inference on 1 test image

plot_predictions(test_images_tf[0], test_masks_tf[0], model= unet_model)

In [None]:
# Inference on 5 test images

for i in range(10,15):
    plot_predictions(test_images_tf[i], test_masks_tf[i], model = unet_model)

## Compute class-wise metrics: IoU and Dice Score
* **Intersection over union (IoU)**: It is a good metric for measuring the overlap between two bounding boxes or masks (here, the ground truth mask vs. the predicted mask). It is the area of overlap divided by the area of union. If the prediction is completely correct, IoU = 1. The lower the IoU, the worse the prediction result.
<center>
<img src="https://cdn.iisc.talentsprint.com/AIandMLOps/Images/IoU.jpg" width=400px height=200px/>
</center>
<br><br>

* **Dice score/coefficient**: It measures the pixel-wise agreement between a predicted segmentation and its corresponding ground truth.
The Dice coefficient is 2 times the area of overlap divided by the total number of pixels in both masks.

<center>
<img src="https://cdn.iisc.talentsprint.com/AIandMLOps/Images/Dice_score.jpg" width=450px height=200px/>
</center>
<br><br>
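
Both definitions can be checked by hand on a tiny example. The sketch below uses made-up 2 x 3 masks and a hypothetical helper `iou_and_dice`. Unlike the function used later in this notebook, it has no smoothing term, so it assumes the class appears in at least one of the two masks.

```python
import numpy as np

def iou_and_dice(y_true, y_pred, cls):
    """IoU and Dice score of one class, computed from boolean masks."""
    true_cls = (y_true == cls)
    pred_cls = (y_pred == cls)
    intersection = np.logical_and(true_cls, pred_cls).sum()
    union = np.logical_or(true_cls, pred_cls).sum()
    total = true_cls.sum() + pred_cls.sum()
    return float(intersection / union), float(2 * intersection / total)

y_true = np.array([[0, 0, 1],
                   [0, 1, 1]])
y_pred = np.array([[0, 1, 1],
                   [0, 1, 1]])

iou, dice = iou_and_dice(y_true, y_pred, cls=1)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

For class 1 the overlap is 3 pixels, the union is 4, and the two masks contain 3 + 4 = 7 pixels of that class, so IoU = 3/4 and Dice = 6/7. The two metrics are related by Dice = 2 * IoU / (1 + IoU), so the Dice score is never lower than the IoU.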

In [None]:
# Function to compute class wise IoU and Dice score for a single test image and its prediction

def class_wise_metrics(y_true, y_pred):
    class_wise_iou = []
    class_wise_dice_score = []

    smoothening_factor = 0.00001
    for i in range(3):     # 3 -> no. of classes
        intersection = np.sum((y_pred == i) * (y_true == i))
        y_true_area = np.sum((y_true == i))
        y_pred_area = np.sum((y_pred == i))
        combined_area = y_true_area + y_pred_area

        iou = (intersection + smoothening_factor) / (combined_area - intersection + smoothening_factor)
        class_wise_iou.append(iou)

        dice_score =  2 * ((intersection + smoothening_factor) / (combined_area + 2 * smoothening_factor))
        class_wise_dice_score.append(dice_score)

    return class_wise_iou, class_wise_dice_score

In [None]:
# Test `class_wise_metrics` function

test_0 = test_images_tf[0]
true_0 = test_masks_tf[0]
pred_0 = infer(test_0, model= unet_model)

class_wise_metrics(true_0.numpy(), pred_0)

### Calculate the metrics

In [None]:
# Prediction for entire test_dataset
test_preds = unet_model.predict(test_dataset)

In [None]:
test_preds.shape

In [None]:
# Process the prediction output
test_predictions = np.argmax(test_preds, axis=3)
test_predictions = test_predictions.reshape(-1,128,128,1)
test_predictions.shape

In [None]:
# Compute the class wise metrics for all test images
cls_wise_iou_scores = []
cls_wise_dice_scores = []
for i in range(test_predictions.shape[0]):
    test_i = test_images_tf[i]
    true_i = test_masks_tf[i]
    pred_i = test_predictions[i,:,:,:]
    iou, dice = class_wise_metrics(true_i.numpy(), pred_i)
    cls_wise_iou_scores.append(iou)
    cls_wise_dice_scores.append(dice)

# Take average to get the final result over the test set
cls_wise_iou = np.array(cls_wise_iou_scores).mean(axis=0).round(2)
cls_wise_dice_score = np.array(cls_wise_dice_scores).mean(axis=0).round(2)

In [None]:
# show the IOU for each class
class_names = ["pet", "background", "outline"]

for idx, iou in enumerate(cls_wise_iou):
  spaces = ' ' * (10-len(class_names[idx]) + 2)
  print("{}{}{} ".format(class_names[idx], spaces, iou))

In [None]:
# show the Dice Score for each class
for idx, dice_score in enumerate(cls_wise_dice_score):
  spaces = ' ' * (10-len(class_names[idx]) + 2)
  print("{}{}{} ".format(class_names[idx], spaces, dice_score))

## **Implement DeepLabV3+**

Downsampling is widely adopted in deep convolutional neural networks (CNN) for reducing memory consumption while preserving the transformation invariance to some degree.

Repeated downsampling in a CNN shrinks the feature map resolution, which lowers prediction accuracy and loses boundary information in semantic segmentation.

DeepLabv3+ helps in solving these issues by including **atrous convolutions**. They aggregate context around a feature, which helps in segmenting it better.

<br>

#### **Atrous Convolution/Dilated Convolution**

It is a tool for refining the effective field of view of the convolution. It modifies the field of view using a parameter termed ***atrous rate*** or ***dilation rate (d)***.

With dilated convolution, as we go deeper in the network, we can keep the stride constant while enlarging the field of view, without increasing the number of parameters or the amount of computation. It also enables larger output feature maps, which is useful for semantic segmentation.

In the figure below, the atrous/dilated convolution has a wider field of view than the normal convolution with the same number of parameters. Only the pink cells are considered; the green ones are ignored.

<br>
<img src="https://cdn.iisc.talentsprint.com/AIandMLOps/Images/Dilated_Conv.jpg" width=500px>
<br><br>
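
The effect of the dilation rate on the field of view can be put into numbers. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) pixels along each axis, while the number of weights stays at k x k. The helper name below is hypothetical.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Span (in pixels) covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)

for rate in (1, 2, 6, 12, 18):
    span = effective_kernel_size(3, rate)
    print(f"rate={rate:2d}: 3x3 kernel covers {span}x{span} pixels with 9 weights")
```

A 3 x 3 kernel therefore covers 3, 5, 13, 25, and 37 pixels per axis at rates 1, 2, 6, 12, and 18, always with the same 9 weights.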



#### **DeepLabv3+**

The earlier version, DeepLabv3, takes too much time to process high-resolution images. DeepLabv3+ is a semantic segmentation architecture that improves upon DeepLabv3 in several ways, such as adding an effective decoder module to refine the segmentation results.

The below figure shows the typical architecture of DeepLabv3+. The encoder module processes multiscale contextual information by applying dilated/atrous convolution at multiple scales, while the decoder module refines the segmentation results along object boundaries.

<br>
<img src="https://cdn.iisc.talentsprint.com/AIandMLOps/Images/deeplabv3_plus_diagram.png" >
<br><br>

DeepLabv3+ employs an Aligned Xception network as its main feature extractor (encoder), although with substantial modifications: all max pooling operations are replaced by depth-wise separable convolutions with striding.

**Dilated Spatial Pyramid Pooling** applies atrous convolutions at several rates in parallel to capture context at multiple scales. It also includes an image-level pooling branch, because it was shown that as the sampling rate becomes larger, the number of valid filter weights (i.e., weights that are applied to the valid feature region, instead of padded zeros) becomes smaller.
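
The shrinking number of valid filter weights can be illustrated with a small count. The sketch below assumes an 8 x 8 feature map, which is what a backbone stage with an overall stride of 16 would produce from a 128 x 128 input (an assumption about the backbone), and a 3 x 3 kernel with zero padding. `valid_weight_fraction` is a hypothetical helper.

```python
def valid_weight_fraction(size, rate, kernel_size=3):
    """Fraction of kernel taps that land inside a size x size map under zero padding."""
    half = kernel_size // 2
    offsets = [k * rate for k in range(-half, half + 1)]
    # Count in-bounds taps along one axis; the 2-D count is its square.
    valid_1d = sum(0 <= pos + off < size for pos in range(size) for off in offsets)
    total_1d = size * kernel_size
    return (valid_1d / total_1d) ** 2

for rate in (1, 6, 12, 18):
    print(f"rate={rate:2d}: {valid_weight_fraction(8, rate):.3f} of the taps see real features")
```

On this small map, rates 12 and 18 leave only the centre tap inside the feature map (a fraction of 1/9), so those 3 x 3 branches behave like 1 x 1 convolutions. This degeneration is what the image-level pooling branch is meant to compensate for.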


In Model Playground, the feature extraction (encoder) network can be selected as either **ResNet** or EfficientNet.

For our model, we use the below architecture.

<br>
<img src="https://cdn.iisc.talentsprint.com/AIandMLOps/Images/deeplabv3_plus_model.png" width=1000px>
<br><br>

As we use ResNet-50 as the backbone network, let's check the different layers present in it.

In [None]:
# Create the ResNet-50 architecture for exploration
res_input = keras.Input(shape=(128, 128, 3))
resnet50 = keras.applications.ResNet50(weights="imagenet", include_top=False, input_tensor = res_input)

# Layers present in ResNet-50 network
resnet50.summary()

From the above layers,

- Use the low-level features from the `conv2_block3_2_relu` layer of the ResNet-50 network to feed into the Decoder.

- Use the features from the `conv4_block6_2_relu` layer of the ResNet-50 to feed into the Dilated Spatial Pyramid Pooling module.



Let's create a function, `convolution_block()`, that adds a convolution layer and a BatchNormalization layer, and applies ReLU activation in one go.

In [None]:
def convolution_block(block_input, num_filters=256, kernel_size=3, dilation_rate=1, padding="same", use_bias=False):
    x = layers.Conv2D(num_filters,
                      kernel_size=kernel_size,
                      dilation_rate=dilation_rate,
                      padding=padding,
                      use_bias=use_bias,
                      kernel_initializer=keras.initializers.HeNormal())(block_input)
    x = # YOUR CODE HERE to add a BatchNormalization layer
    x = keras.activations.relu(x)
    return x

Create another function to perform Dilated Spatial Pyramid Pooling. Use the function above to add the different convolution blocks.

In [None]:
def DilatedSpatialPyramidPooling(dspp_input):
    dims = dspp_input.shape

    # 1x1 Conv rate=1
    out_1 = convolution_block(dspp_input, kernel_size=1, dilation_rate=1)
    # 3x3 Conv rate=6
    out_6 = convolution_block(dspp_input, kernel_size=3, dilation_rate=6)
    # 3x3 Conv rate=12
    out_12 = convolution_block(dspp_input, kernel_size=3, dilation_rate=12)
    # 3x3 Conv rate=18
    out_18 = convolution_block(dspp_input, kernel_size=3, dilation_rate=18)

    # Image pooling
    x = layers.AveragePooling2D(pool_size=(dims[-3], dims[-2]))(dspp_input)
    x = convolution_block(x, kernel_size=1, use_bias=True)
    out_pool = layers.UpSampling2D(size = (dims[-3] // x.shape[1], dims[-2] // x.shape[2]), interpolation = "bilinear")(x)

    # Concat
    resultant = layers.Concatenate(axis=-1)([out_pool, out_1, out_6, out_12, out_18])

    return resultant
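
The image pooling branch and the final concatenation can be illustrated with a small NumPy sketch. It is a simplification, not the notebook's implementation: the 1x1 convolution block is omitted, the four convolution branches are replaced by zero-filled stand-ins, and the helper name `image_pooling_branch` is hypothetical. Tiling is used in place of bilinear upsampling, which gives the same constant map when the pooled input is 1x1.

```python
import numpy as np


def image_pooling_branch(feature_map):
    """Global-average-pool an (H, W, C) map and tile it back to (H, W, C)."""
    pooled = feature_map.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, C)
    return np.broadcast_to(pooled, feature_map.shape).copy()


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 256)).astype(np.float32)

out_pool = image_pooling_branch(features)
# Stand-ins for out_1, out_6, out_12, out_18 (256 channels each)
conv_branches = [np.zeros((8, 8, 256), dtype=np.float32) for _ in range(4)]

stacked = np.concatenate([out_pool] + conv_branches, axis=-1)
print(out_pool.shape, stacked.shape)
```

Every spatial position of `out_pool` carries the same per-channel mean, which is how the branch injects global context. Concatenating five 256-channel branches yields 1280 channels, which the encoder then reduces with a 1x1 convolution.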


### Create Encoder

Create a function to implement the Encoder block. Use **ResNet50** pretrained on ImageNet as the backbone network. Use the features from the `conv4_block6_2_relu` layer of the backbone to feed the Dilated Spatial Pyramid Pooling module. Then return the backbone network along with the encoder output.

In [None]:
def Encoder(model_input):
    # Backbone network
    resnet50 = keras.applications.ResNet50(weights="imagenet", include_top=False, input_tensor=model_input)
    # Features from the backbone network to feed into DSPP
    x = resnet50.get_layer("conv4_block6_2_relu").output
    # DSPP module
    concat_out = DilatedSpatialPyramidPooling(x)
    # 1x1 Conv
    output = convolution_block(concat_out, kernel_size=1)

    return resnet50, output


### Create Decoder

Create a function to implement the Decoder block. The encoder features are first bilinearly upsampled by a factor of 4, and then concatenated with the corresponding low-level features (the `conv2_block3_2_relu` layer) from the network backbone that have the same spatial resolution.
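
The shape bookkeeping of this step can be sketched with NumPy. This is only an illustration under stated assumptions: nearest-neighbour repetition stands in for bilinear upsampling, the arrays are zero-filled placeholders, the encoder output is assumed to be 8x8x256 and the reduced low-level features 32x32x48 for a 128x128 input, and `upsample_nearest` is a hypothetical helper.

```python
import numpy as np


def upsample_nearest(feature_map, factor):
    """Nearest-neighbour upsampling of an (H, W, C) array (stand-in for bilinear)."""
    return feature_map.repeat(factor, axis=0).repeat(factor, axis=1)


image_size = 128
encoder_out = np.zeros((image_size // 16, image_size // 16, 256), dtype=np.float32)
low_level = np.zeros((image_size // 4, image_size // 4, 48), dtype=np.float32)

# Upsampling factor needed to match the low-level resolution
factor = low_level.shape[0] // encoder_out.shape[0]
upsampled = upsample_nearest(encoder_out, factor)

# Channel-wise concatenation of the two feature sources
merged = np.concatenate([upsampled, low_level], axis=-1)
print(factor, upsampled.shape, merged.shape)
```

After concatenation the map has 256 + 48 = 304 channels at 32x32 resolution. A second 4x upsampling then restores the 128x128 input size.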


In [None]:
def Decoder(image_size, back_network, x):
    # Output from Encoder, upsample by 4
    input_a = layers.UpSampling2D(size = (image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]),
                                  interpolation = "bilinear")(x)
    # Low-level features from backbone network
    input_b = back_network.get_layer("conv2_block3_2_relu").output
    # Add 1x1 Conv on low-level features
    input_b = convolution_block(input_b, num_filters=48, kernel_size=1)

    # Concat
    x = layers.Concatenate(axis=-1)([input_a, input_b])
    # Add 3x3 Conv blocks
    x = convolution_block(x)
    x = convolution_block(x)

    # Resultant upsample by 4
    output = layers.UpSampling2D(size = (image_size // x.shape[1], image_size // x.shape[2]),
                            interpolation = "bilinear")(x)
    return output


### Create Model

Create a function to implement the DeepLabv3+ architecture.

In [None]:
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    # Encoder part
    back_network, x = Encoder(model_input)
    # Decoder part
    x = Decoder(image_size, back_network, x)

    # Output/prediction layer
    model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)

    return # YOUR CODE HERE to return a keras Model with (inputs=model_input, outputs=model_output)


In [None]:
# Create model
deeplab_model = DeeplabV3Plus(image_size = 128, num_classes = 3)
deeplab_model.summary()

### Training

We train the model using sparse categorical crossentropy as the loss function and Adam as the optimizer.
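
Because the prediction layer is a plain `Conv2D` without an activation, the model outputs raw logits, which is why the loss is created with `from_logits=True`. The NumPy sketch below shows what that computation amounts to. It is an illustrative re-implementation, not the Keras code. The function name is hypothetical, and the integer labels are assumed to start at 0.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels, computed from raw logits.

    logits: float array of shape (..., num_classes)
    labels: integer array of shape (...), with values in [0, num_classes)
    """
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class at every pixel
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())


# A 2x2 "image" with 3 classes and uniform logits
uniform_logits = np.zeros((2, 2, 3))
pixel_labels = np.array([[0, 1], [2, 0]])
uniform_loss = sparse_categorical_crossentropy_from_logits(uniform_logits, pixel_labels)
print(uniform_loss)
```

With uniform logits every class gets probability 1/3, so the loss equals ln(3). For three classes, that is roughly where the training loss should start before the model has learned anything.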

In [None]:
# Compile model
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
deeplab_model.compile(optimizer = keras.optimizers.Adam(learning_rate=0.001),
                      loss = loss,
                      metrics = ["accuracy"])

In [None]:
# configure the training parameters and train the model

EPOCHS = 20

deeplab_model_history = deeplab_model.fit(train_dataset,
                                          epochs=EPOCHS,
                                          validation_data=val_dataset)

In [None]:
# Save the model weights
deeplab_model.save_weights('deeplabv3plus_pets_model.weights.h5')

In [None]:
# To use the model later, create the model with the same architecture and load the model weights
## deeplab_model.load_weights('deeplabv3plus_pets_model.weights.h5')

In [None]:
# Display learning curves
display_learning_curves(deeplab_model_history)

### Visualize Predictions

In [None]:
# Inference on 1 test image

plot_predictions(test_images_tf[0], test_masks_tf[0], model= deeplab_model)

In [None]:
plot_predictions(test_images_tf[1], test_masks_tf[1], model= deeplab_model)

In [None]:
# Inference on 5 test images

for i in range(10,15):
    plot_predictions(test_images_tf[i], test_masks_tf[i], model= deeplab_model)

### Calculate the metrics

In [None]:
# Prediction for entire test_dataset

# feed the test set to the deeplab model to get the predicted masks
test_preds_deeplab = deeplab_model.predict(test_dataset)

In [None]:
test_preds_deeplab.shape

In [None]:
# Process prediction output
test_predictions_deeplab = np.argmax(test_preds_deeplab, axis=3)
test_predictions_deeplab = test_predictions_deeplab.reshape(-1,128,128,1)
test_predictions_deeplab.shape

In [None]:
# Compute the class wise metrics for all test images
deeplab_cls_wise_iou_scores = []
deeplab_cls_wise_dice_scores = []
for i in range(test_predictions_deeplab.shape[0]):
    test_i = # YOUR CODE HERE
    true_i = # YOUR CODE HERE
    pred_i = # YOUR CODE HERE
    iou, dice = # YOUR CODE HERE for class_wise_metrics(...)
    deeplab_cls_wise_iou_scores.append(iou)
    deeplab_cls_wise_dice_scores.append(dice)

# Take average to get the final result over the test set
deeplab_cls_wise_iou = np.array(deeplab_cls_wise_iou_scores).mean(axis=0).round(2)
deeplab_cls_wise_dice_score = np.array(deeplab_cls_wise_dice_scores).mean(axis=0).round(2)
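
For reference, per-class IoU and Dice can be computed from integer class maps as in the sketch below. This is an independent illustration: the notebook's own `class_wise_metrics` helper is defined earlier and may differ in details. The function name `iou_and_dice_per_class` and the `smoothing` term are assumptions.

```python
import numpy as np


def iou_and_dice_per_class(y_true, y_pred, num_classes=3, smoothing=1e-5):
    """Return per-class IoU and Dice lists for two integer class maps."""
    ious, dices = [], []
    for c in range(num_classes):
        true_c = (y_true == c)
        pred_c = (y_pred == c)
        intersection = np.logical_and(true_c, pred_c).sum()
        area_sum = true_c.sum() + pred_c.sum()
        union = area_sum - intersection
        # smoothing avoids division by zero when a class is absent
        ious.append((intersection + smoothing) / (union + smoothing))
        dices.append((2 * intersection + smoothing) / (area_sum + smoothing))
    return ious, dices


# Toy 2x2 example: logits -> class map via argmax, then compare with the truth
toy_logits = np.array([[[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]],
                       [[0.1, 0.1, 2.0], [2.0, 0.1, 0.1]]])
toy_pred = np.argmax(toy_logits, axis=-1)
toy_true = np.array([[0, 1], [2, 2]])

toy_ious, toy_dices = iou_and_dice_per_class(toy_true, toy_pred)
print(np.round(toy_ious, 2), np.round(toy_dices, 2))
```

In the toy example, class 1 is predicted perfectly. Classes 0 and 2 each overlap the ground truth on one of two pixels in the union, giving an IoU of about 0.5 and a Dice score of about 0.67.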

In [None]:
# Show the IoU for each class
class_names = ["pet", "background", "outline"]

for idx, iou in enumerate(deeplab_cls_wise_iou):
  spaces = ' ' * (10-len(class_names[idx]) + 2)
  print("{}{}{} ".format(class_names[idx], spaces, iou))

In [None]:
# show the Dice Score for each class
for idx, dice_score in enumerate(deeplab_cls_wise_dice_score):
  spaces = ' ' * (10-len(class_names[idx]) + 2)
  print("{}{}{} ".format(class_names[idx], spaces, dice_score))

### Compare with UNet

In [None]:
# Plot bar chart to show IoU scores for predictions from both models

fig = plt.figure(figsize =(6,4))
X = np.arange(3)
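# Place both bar groups on the same numeric x positions, then label the ticks
plt.bar(X, cls_wise_iou, color = 'b', width = 0.25, label = 'UNet')
plt.bar(X + 0.25, deeplab_cls_wise_iou, color = 'g', width = 0.25, label = 'Deeplabv3+')
plt.xticks(X + 0.125, class_names)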
plt.xlabel('Class label', fontsize = 12)
plt.ylabel('IoU score', fontsize = 12)
plt.legend()
plt.show()

In [None]:
# Plot bar chart to show Dice scores for predictions from both models

fig = plt.figure(figsize =(6,4))
X = np.arange(3)
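# Place both bar groups on the same numeric x positions, then label the ticks
plt.bar(X, cls_wise_dice_score, color = 'b', width = 0.25, label = 'UNet')
plt.bar(X + 0.25, deeplab_cls_wise_dice_score, color = 'g', width = 0.25, label = 'Deeplabv3+')
plt.xticks(X + 0.125, class_names)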
plt.xlabel('Class label', fontsize = 12)
plt.ylabel('Dice score', fontsize = 12)
plt.legend()
plt.show()

### Please answer the questions below to complete the experiment:

In [None]:
#@title The encoder also returns the complete resnet50 as an output while implementing Deeplabv3+ in this notebook. In the decoder, what layer is used from that resnet 50? {run: "auto", form-width: "500px", display-mode: "form" }
Answer = "" #@param ["", "conv4_block6_2_relu", "conv2_block3_2_relu", "the last layer of resnet50 i.e. conv5_block3_out"]

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")