Before answering the questions, I should note that my local machine did not have enough memory, so I had to train the model on a remote server. That is why I am attaching two files. The first is a Jupyter Notebook; to run it, you need to change the path in cell number 3. The second is a regular .py file that I wrote for training on the remote server. Thanks, and sorry for the inconvenience.
I will attach all the necessary files to GitHub.

Question 1

Image segmentation is a method in which a digital image is broken down into subgroups called image segments, which reduces the complexity of the image and makes further processing or analysis simpler. Put simply, segmentation is the assignment of a label to every pixel.

Similarity approach: This approach is based on detecting similarity between image pixels to form a segment, based on a threshold. ML algorithms like clustering are based on this type of approach to segment an image.
Discontinuity approach: This approach relies on the discontinuity of pixel intensity values of the image. Line, Point, and Edge Detection techniques use this type of approach for obtaining intermediate segmentation results which can be later processed to obtain the final segmented image.
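
As a rough illustration of the two approaches, the sketch below segments a tiny synthetic grayscale image once by thresholding (similarity) and once by marking large intensity jumps between neighbouring pixels (discontinuity). The image, the value 0.5 for both thresholds, and the function names are assumptions made for this example; they are not part of the submitted notebook.

```python
import numpy as np

def threshold_segment(image, threshold=0.5):
    """Similarity approach: pixels on the same side of the threshold share a label."""
    return (image > threshold).astype(np.uint8)

def edge_map(image, min_jump=0.5):
    """Discontinuity approach: mark pixels whose right or lower neighbour differs strongly."""
    edges = np.zeros(image.shape, dtype=np.uint8)
    # Absolute difference between each pixel and its right neighbour
    dx = np.abs(np.diff(image, axis=1))
    # Absolute difference between each pixel and the neighbour below it
    dy = np.abs(np.diff(image, axis=0))
    edges[:, :-1] |= (dx >= min_jump).astype(np.uint8)
    edges[:-1, :] |= (dy >= min_jump).astype(np.uint8)
    return edges

# Synthetic 4x4 image: a bright 2x2 square on a dark background
image = np.zeros((4, 4))
image[1:3, 1:3] = 1.0

labels = threshold_segment(image)  # 1 inside the square, 0 elsewhere
edges = edge_map(image)            # 1 where the intensity jumps
print(labels)
print(edges)
```

The threshold result is already a segmentation, while the edge map is only an intermediate result that would still need to be closed into regions.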


Types of Image Segmentation tasks:


Image segmentation tasks can be classified into three groups based on the amount and type of information they convey.


1) Semantic segmentation refers to the classification of pixels in an image into semantic classes. Pixels belonging to a particular class are simply classified to that class with no other information or context taken into consideration.


2) Instance segmentation models classify pixels into categories on the basis of “instances” rather than classes.


3) Panoptic segmentation, the most recently developed segmentation task, can be expressed as the combination of semantic segmentation and instance segmentation where each instance of an object in the image is segregated and the object’s identity is predicted.
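
To make the difference between class labels and instance labels concrete, here is a minimal sketch on a toy mask. It derives instance ids from a semantic mask with a simple 4-connected flood fill. This is only a stand-in for illustration: real instance segmentation models predict instances directly, and the function name and toy mask are assumptions of this example.

```python
import numpy as np

def label_instances(semantic_mask):
    """Give each 4-connected foreground region its own instance id (1, 2, ...)."""
    instances = np.zeros_like(semantic_mask, dtype=np.int32)
    next_id = 0
    rows, cols = semantic_mask.shape
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already have an id
            if semantic_mask[r, c] == 0 or instances[r, c] != 0:
                continue
            next_id += 1
            instances[r, c] = next_id
            stack = [(r, c)]
            # Flood fill: spread the id to all connected foreground pixels
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and semantic_mask[ny, nx] == 1 and instances[ny, nx] == 0:
                        instances[ny, nx] = next_id
                        stack.append((ny, nx))
    return instances

# Semantic mask: both objects carry the same class label 1
semantic = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
])
instances = label_instances(semantic)
print(instances)
```

In the semantic mask the two objects are indistinguishable; in the instance map the left block and the right column receive different ids.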


According to the studies and articles I read, one of the most widely used and effective approaches to a segmentation problem is the UNET network.
What exactly is a UNET network?


UNET is a U-shaped encoder-decoder network architecture, which consists of four encoder blocks and four decoder blocks that are connected via a bridge. The encoder network (contracting path) halves the spatial dimensions and doubles the number of filters (feature channels) at each encoder block. Likewise, the decoder network doubles the spatial dimensions and halves the number of feature channels at each decoder block.
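
The halving and doubling can be traced with a few lines of arithmetic. The sketch below lists the spatial size and channel count at each level; the input size of 256 matches IMAGE_SIZE in the notebook, while the starting value of 64 filters is an assumption taken from the classic UNET configuration and is not the filter count of the MobileNetV2-based model used later.

```python
def unet_shapes(input_size=256, base_filters=64, depth=4):
    """Return (encoder, bridge, decoder) as lists of (spatial_size, channels)."""
    encoder = []
    size, filters = input_size, base_filters
    for _ in range(depth):
        # Output of this encoder block, before downsampling
        encoder.append((size, filters))
        size //= 2      # downsampling halves the spatial size
        filters *= 2    # the next block doubles the channels
    bridge = (size, filters)
    decoder = []
    for _ in range(depth):
        size *= 2       # upsampling doubles the spatial size
        filters //= 2   # the decoder block halves the channels
        decoder.append((size, filters))
    return encoder, bridge, decoder

encoder, bridge, decoder = unet_shapes()
print("encoder:", encoder)
print("bridge: ", bridge)
print("decoder:", decoder)
```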


The encoder network acts as the feature extractor and learns an abstract representation of the input image through a sequence of encoder blocks. The output of each encoder block is also passed to the corresponding decoder block through a skip connection.
These skip connections provide additional information that helps the decoder to generate better semantic features. They also act as shortcut connections that help gradients flow to the earlier layers without degradation.


The bridge connects the encoder and the decoder network and completes the flow of information.


The decoder network is used to take the abstract representation and generate a semantic segmentation mask.
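
One decoder step can be sketched with plain NumPy arrays to show what happens to the shapes: the decoder features are upsampled by a factor of two and then concatenated with the skip connection along the channel axis. This mirrors the UpSampling2D and Concatenate steps in the model cell of the notebook, but the helper names and the channel counts (64 and 32) are arbitrary assumptions for the example.

```python
import numpy as np

def upsample_2x(features):
    """Nearest-neighbour upsampling of an (H, W, C) feature map to (2H, 2W, C)."""
    return features.repeat(2, axis=0).repeat(2, axis=1)

def decoder_merge(decoder_features, skip_features):
    """Upsample the decoder features and concatenate the encoder skip features."""
    upsampled = upsample_2x(decoder_features)
    if upsampled.shape[:2] != skip_features.shape[:2]:
        raise ValueError("spatial sizes must match before concatenation")
    # Channels from both paths are stacked side by side
    return np.concatenate([upsampled, skip_features], axis=-1)

bridge_out = np.random.rand(16, 16, 64)  # abstract representation from the bridge
skip = np.random.rand(32, 32, 32)        # higher-resolution encoder features
merged = decoder_merge(bridge_out, skip)
print(merged.shape)  # (32, 32, 96)
```

In the real network, the merged tensor is then passed through convolution layers that reduce the channel count again.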


In my case, I decided to use MobileNetV2 as the encoder (the left part of the U shape). MobileNetV2 is an architecture that is optimized for mobile devices. According to its authors, it improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes.


Advantages of using MobileNetV2 as an Encoder:


1) MobileNetV2 has fewer parameters, which makes it easier to train.


2) Using a pre-trained encoder helps the model converge much faster than a model that is not pre-trained.


3) A pre-trained encoder helps the model achieve higher performance than a model that is not pre-trained.


As a loss function, I used dice loss. Dice Loss is widely used in medical image segmentation tasks to address the data imbalance problem. However, it only addresses the imbalance problem between foreground and background yet overlooks another imbalance between easy and hard examples that also severely affects the training process of a learning model.
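
A small worked example shows why Dice loss helps with the foreground/background imbalance. The sketch below re-implements the same formula as the notebook's dice_coef in NumPy (including the smoothing term 1e-15) and compares it with plain pixel accuracy; the 10x10 toy mask is an assumption made for the example.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1e-15):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), with a small smoothing term."""
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coefficient(y_true, y_pred)

# 10x10 ground truth with only 4 foreground pixels (4% of the image)
y_true = np.zeros((10, 10))
y_true[4:6, 4:6] = 1.0

# A useless prediction that labels every pixel as background
all_background = np.zeros((10, 10))

pixel_accuracy = np.mean(all_background == y_true)
loss_background = dice_loss(y_true, all_background)
loss_perfect = dice_loss(y_true, y_true)

print(pixel_accuracy)   # 0.96, looks good although no object was found
print(loss_background)  # close to 1.0, the worst possible Dice loss
print(loss_perfect)     # 0.0
```

Pixel accuracy rewards the all-background prediction because background dominates, while the Dice loss penalises it fully.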


Question 2


The base model I used is SegNet, which is built as a convolutional encoder-decoder. I then built the UNET, a more complex network, with the aim of getting better results.


The most commonly used metrics for semantic segmentation are the IoU and the Dice coefficient. The Dice coefficient is very similar to the IoU. They are positively correlated, meaning that if one says model A is better than model B at segmenting an image, then the other will say the same. Both range from 0 to 1, with 1 signifying the greatest similarity between the prediction and the ground truth.
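
The close relationship between the two metrics can be checked numerically: for binary masks, Dice = 2 * IoU / (1 + IoU), which is why they always rank models the same way. The sketch below is a minimal NumPy version for non-empty binary masks; the function names and the toy masks are assumptions of this example.

```python
import numpy as np

def iou(y_true, y_pred):
    """Intersection over union of two boolean masks (assumes a non-empty union)."""
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return intersection / union

def dice(y_true, y_pred):
    """Dice coefficient of two boolean masks (assumes at least one foreground pixel)."""
    intersection = np.logical_and(y_true, y_pred).sum()
    return 2.0 * intersection / (y_true.sum() + y_pred.sum())

truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]], dtype=bool)
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0]], dtype=bool)

iou_value = iou(truth, pred)    # 3 shared pixels / 5 pixels in the union
dice_value = dice(truth, pred)  # 2 * 3 / (4 + 4)
print(iou_value, dice_value)
print(2 * iou_value / (1 + iou_value))  # same value as dice_value
```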


At first I trained the UNET network from scratch. Then, to improve the results, I used transfer learning, starting from an already trained network. The results with transfer learning are indeed better. After the model converged, I tried to improve it further through fine-tuning.

In [None]:
import os
import numpy as np
import cv2
import glob
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from tensorflow.keras.layers import Conv2D, Activation, BatchNormalization
from tensorflow.keras.layers import UpSampling2D, Input, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.metrics import Recall, Precision
from tensorflow.keras import backend as K

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

#Initializing parameters
IMAGE_SIZE = 256
EPOCHS = 15
BATCH = 16
LR = 1e-4


In [None]:
BASE_OUTPUT = '/mnt/md0/mory/DUTS-TR/'
PLOT_PATH = os.path.join(BASE_OUTPUT, "loss_plot.png")

images_path ='/mnt/md0/mory/DUTS-TR/DUTS-TR-Image'
mask_path = '/mnt/md0/mory/DUTS-TR/DUTS-TR-Mask'

In [None]:
#Sorting the names of the files
images_names = sorted(
    [
        os.path.join(images_path, fname)
        for fname in os.listdir(images_path)
        if fname.endswith(".jpg")
    ]
)
mask_names = sorted(
    [
        os.path.join(mask_path, fname)
        for fname in os.listdir(mask_path)
        if fname.endswith(".png") and not fname.startswith(".")
    ]
)

print(len(images_names))
print(len(mask_names))

In [None]:
#Load data
print("Loading Images...")

#Capture training image info as a list
train_images = []

for directory_path in glob.glob(images_path):
    #glob returns files in arbitrary order, so sort to keep images and masks aligned
    for img_path in sorted(glob.glob(os.path.join(directory_path, "*.jpg"))):
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
        #Scale to [0, 1]; float32 needs half the memory of the default float64
        img = img.astype(np.float32) / 255.0
        train_images.append(img)

train_images = np.array(train_images)
print(train_images.shape)

In [None]:
print("Loading Masks...")

#Capture mask/label info as a list
train_masks = []

for directory_path in glob.glob(mask_path):
    #Same sorted order as the images; mask_file avoids overwriting the mask_path directory variable
    for mask_file in sorted(glob.glob(os.path.join(directory_path, "*.png"))):
        mask = cv2.imread(mask_file, cv2.IMREAD_GRAYSCALE)
        #Optional: pass interpolation=cv2.INTER_NEAREST here, otherwise ground truth changes due to interpolation
        mask = cv2.resize(mask, (IMAGE_SIZE, IMAGE_SIZE))
        mask = np.expand_dims(mask, axis=-1)
        mask = mask.astype(np.float32) / 255.0
        train_masks.append(mask)

#Convert list to array for machine learning processing
train_masks = np.array(train_masks)
print(train_masks.shape)

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

#Picking 20% for testing and remaining for training
train_x, test_x, train_y, test_y = train_test_split(train_images, train_masks, test_size = 0.20, random_state = 42)

#Checking the shapes after the split
print(train_x.shape)
print(test_x.shape)
print(train_y.shape)
print(test_y.shape)

In [None]:
#Defining model architecture 
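def model():
    inputs = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name="input_image")

    #Pre-trained MobileNetV2 encoder without the classification head
    encoder = MobileNetV2(input_tensor=inputs, weights="imagenet", include_top=False, alpha=0.35)
    #Encoder layers that feed the decoder, ordered from highest to lowest resolution
    skip_connection_names = ["input_image", "block_1_expand_relu", "block_3_expand_relu", "block_6_expand_relu"]
    #Deepest encoder output used here; it plays the role of the bridge
    encoder_output = encoder.get_layer("block_13_expand_relu").output

    #Filters per decoder block; the loop reads this list from the end
    f = [16, 32, 48, 64]
    x = encoder_output
    for i in range(1, len(skip_connection_names)+1, 1):
        #Decoder block: upsample, concatenate the matching skip connection, then two conv layers
        x_skip = encoder.get_layer(skip_connection_names[-i]).output
        x = UpSampling2D((2, 2))(x)
        x = Concatenate()([x, x_skip])

        x = Conv2D(f[-i], (3, 3), padding="same")(x)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)

        x = Conv2D(f[-i], (3, 3), padding="same")(x)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)

    #1x1 convolution and sigmoid give a per-pixel foreground probability
    x = Conv2D(1, (1, 1), padding="same")(x)
    x = Activation("sigmoid")(x)

    model = Model(inputs, x)
    return model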

In [None]:
#Defining loss function 
smooth = 1e-15
def dice_coef(y_true, y_pred):
    y_true = tf.keras.layers.Flatten()(y_true)
    y_pred = tf.keras.layers.Flatten()(y_pred)
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2. * intersection + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)

In [None]:
model = model()
model.summary()  

In [None]:
#Compiling the model
print("[INFO] compiling model...")
opt = tf.keras.optimizers.Adam(LR)
metrics = [dice_coef, Recall(), Precision()]
model.compile(loss=dice_loss, optimizer=opt, metrics=metrics)

In [None]:
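#This cell appears to belong to the UNET that was trained from scratch.
#U_Net is not defined in this notebook, and running the cell would replace
#the compiled MobileNetV2 model above, so it is left commented out.
#input_shape = (128, 128, 3)
#model = U_Net(input_shape)
#model.summary()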

In [None]:
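#Defining callbacks
#WandbCallback comes from the wandb package, which is not imported in the cells above;
#it also expects wandb.init() to have been called before training starts
from wandb.keras import WandbCallback

callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=4),
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=False),
    WandbCallback()
]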

In [None]:
print("[INFO] training model...")
history=model.fit(
    train_x, 
    train_y,
    batch_size=BATCH, 
    epochs=EPOCHS,
    verbose=1,
    validation_data=(test_x, test_y),
    callbacks=callbacks)

print("[INFO] saving model...")
model.save("model_seg.h5")

In [None]:
def plot_training(H, plotPath):
    #Construct a plot of the training history and save it to plotPath
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(H.history["loss"], label="train_loss")
    plt.plot(H.history["val_loss"], label="val_loss")
    #dice_coef is the Dice coefficient metric, not accuracy
    plt.plot(H.history["dice_coef"], label="train_dice")
    plt.plot(H.history["val_dice_coef"], label="val_dice")
    plt.title("Training Loss and Dice Coefficient")
    plt.xlabel("Epochs")
    plt.ylabel("Loss/Dice Coefficient")
    plt.legend(loc="lower left")
    plt.savefig(plotPath)

plot_training(history, PLOT_PATH)  