# Semantic Segmentation
The objective of segmentation is to associate each pixel of an image with a well-defined class by providing a map of labels.
In this notebook we will use U-Net to perform semantic segmentation.
To this end, we will use functions from the libraries **keras_hub** and **easy_cv_dataset**, which are installed with the following instruction:

In [None]:
!pip install -q --upgrade keras-hub git+https://github.com/davin11/easy-cv-dataset 

Then, the file `unet_hub.py` must be downloaded with the following code:

In [None]:
SITE="https://raw.githubusercontent.com/davin11/easy-cv-dataset/master"
!wget -nc {SITE}/examples/segmentation/unet_hub.py

Now, we will import the standard libraries, keras_hub, and easy_cv_dataset (with the alias ds).

In [None]:
%reset -f
import numpy as np
import matplotlib.pyplot as plt
import skimage.io as io
import keras
import keras_hub
import easy_cv_dataset as ds

### Data preparation
We will use the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), which includes images of cats and dogs. For each image, a segmentation map is available that separates the object (foreground) from the background.
In the notebook, you can directly execute the following instructions to download the dataset, which is already split into training, validation and test sets:

In [None]:
# Oxford Pets Dataset
SITE=""
!wget -nc {SITE}/guide_TF/oxford_pets_dataset.zip
!unzip -q -n oxford_pets_dataset.zip

At this point you will find a folder called `oxford_pets_dataset` which contains images and segmentation maps.
In addition, you will find three CSV (Comma-Separated Values) files: `tab_train.csv`, `tab_val.csv` and `tab_test.csv`, for the training, validation and test sets, respectively.
CSV files are text files that contain a table and use the comma character (,) to separate columns. In our case, these files contain two columns, *image* and *segmentation_mask*, with the file paths of the images and of the segmentation maps.
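
As a quick check, a table of this kind can be read with Python's standard `csv` module. The sketch below uses a small in-memory table with the same two column names; the file paths in it are made up for illustration and do not come from the dataset.

```python
import csv
import io

# Hypothetical two-row table with the same header as the dataset's CSV files.
# The paths below are invented for illustration only.
csv_text = (
    "image,segmentation_mask\n"
    "images/cat_001.jpg,masks/cat_001.png\n"
    "images/dog_002.jpg,masks/dog_002.png\n"
)

# Each row becomes a dictionary keyed by the column names of the header line
rows = list(csv.DictReader(io.StringIO(csv_text)))
columns = list(rows[0].keys())

print(columns)
for row in rows:
    print(row["image"], "->", row["segmentation_mask"])
```

To inspect the real tables, `io.StringIO(csv_text)` can be replaced with `open('oxford_pets_dataset/tab_train.csv', newline='')`.
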
We will use the function `ds.image_segmentation_dataset_from_dataframe` to prepare images in the training, validation and test set.

In [None]:
class_names = ["Background", "Foreground",]
num_classes = 2

BATCH_SIZE=16
IMAGE_SIZE=160

from keras.layers import Resizing, RandomColorDegeneration, RandomRotation, Pipeline

pre_batching_processing = Resizing(IMAGE_SIZE, IMAGE_SIZE)
post_batching_processing = Pipeline(
    layers=[
        RandomColorDegeneration(0.5),
        RandomRotation((-0.06, 0.06)), # from -6% to +6% of 360°
    ]
)

print('test-set')
test_ds  = ds.image_segmentation_dataset_from_dataframe('oxford_pets_dataset/tab_test.csv', class_mode='categorical', class_names=class_names,
                                                        pre_batching_processing=pre_batching_processing, shuffle=False, batch_size=BATCH_SIZE)

print('training-set')
train_ds = ds.image_segmentation_dataset_from_dataframe('oxford_pets_dataset/tab_train.csv', class_mode='categorical', class_names=class_names,
                                                        pre_batching_processing=pre_batching_processing, shuffle=True , batch_size=BATCH_SIZE, post_batching_processing=post_batching_processing)

print('validation-set')
valid_ds = ds.image_segmentation_dataset_from_dataframe('oxford_pets_dataset/tab_val.csv' , class_mode='categorical', class_names=class_names,
                                                        pre_batching_processing=pre_batching_processing, shuffle=False, batch_size=BATCH_SIZE)
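
The rotation range passed to `RandomRotation` is expressed as a fraction of a full turn, so converting it to degrees is a one-line computation. The helper name below is hypothetical and only serves this illustration.

```python
def rotation_factor_to_degrees(factor):
    """Convert a rotation factor (fraction of a full 360-degree turn) to degrees."""
    return factor * 360.0

low, high = -0.06, 0.06
# Rounded to one decimal to hide floating-point noise
print(round(rotation_factor_to_degrees(low), 1),
      round(rotation_factor_to_degrees(high), 1))
```

With the range used above, images are rotated by at most about 21.6 degrees in either direction.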

The function `ds.image_segmentation_dataset_from_dataframe` requires the CSV file as its first parameter.
The second parameter, `class_mode`, indicates the format into which the segmentation maps are converted, here `'categorical'`.
Finally, the third parameter, `class_names`, is the list of class names.
The other parameters of `ds.image_segmentation_dataset_from_dataframe` are the same as those of
the function `ds.image_classification_dataset_from_dataframe`, already seen in previous examples.
Note that all the images of the three datasets are resized to `160x160` pixels (`IMAGE_SIZE`) so that they all have the same dimensions.
For the training set, data augmentation operations are also applied.
Let's use the following instruction to visualize some examples of the test set:


In [None]:
from easy_cv_dataset.visualization import plot_segmentation_mask_gallery
for images, segms in test_ds.take(1): # takes the first batch of test-set
  plot_segmentation_mask_gallery( # function to display images and segmentation masks
    images, y_true=segms, num_classes=num_classes
  )
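
The `'categorical'` format can be pictured as a one-hot encoding of the label map, with one channel per class. The NumPy sketch below assumes this interpretation and a channels-last layout; the exact tensors produced by `easy_cv_dataset` are not shown in this notebook, so it should be read as an illustration only.

```python
import numpy as np

num_classes = 2  # Background, Foreground

# Tiny 2x3 label map: 0 = Background, 1 = Foreground
label_map = np.array([[0, 1, 1],
                      [0, 0, 1]])

# One-hot encoding: shape (H, W) -> (H, W, num_classes)
one_hot = np.eye(num_classes, dtype=np.float32)[label_map]
print(one_hot.shape)

# argmax along the class axis recovers the original label map
recovered = one_hot.argmax(axis=-1)
print(np.array_equal(recovered, label_map))
```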

### Neural network definition
For this example, we will use the U-Net architecture for image segmentation.
The architecture is shown in the following figure and consists of two parts: on the left the *encoder*, also called *contracting path*, and on the right the *decoder*, also called *expansive path*.

![U-Net](https://raw.githubusercontent.com/davin11/easy-cv-dataset/master/examples/segmentation/u_net.png)

The *encoder* compactly extracts high-level information such as context,
while the *decoder* restores the spatial dimensions, ensuring precise localization.
The *encoder* follows the typical architecture of a convolutional neural network, composed of convolutional layers and pooling layers.
Each pooling layer halves the spatial dimensions of the *feature maps*.
At the output of the *encoder*, the *feature maps* have a spatial resolution equal to one sixteenth of that of the input image.
The *decoder*, in addition to the usual convolutional layers, uses operations named *up-conv*, each of which doubles the spatial dimensions.
The *up-conv* operation is formed by cascading a nearest-neighbor interpolation and a `2x2` spatial convolution.
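
The halving and doubling of the spatial dimensions can be illustrated with plain NumPy. This is a minimal sketch, not the code in `unet_hub.py`: it assumes `2x2` max pooling with stride 2, keeps the channel count fixed, and leaves out the convolutions (including the `2x2` convolution of the *up-conv*).

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an array of shape (H, W, C)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_nearest_2x(x):
    """Nearest-neighbor upsampling by a factor of 2 on an array of shape (H, W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.random.rand(160, 160, 3)

# Four pooling steps: 160 -> 80 -> 40 -> 20 -> 10 (one sixteenth of the input)
features = image
for _ in range(4):
    features = max_pool_2x2(features)
print(features.shape)

# Four upsampling steps bring the resolution back to 160x160
restored = features
for _ in range(4):
    restored = upsample_nearest_2x(restored)
print(restored.shape)
```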

The U-Net architecture also includes shortcuts between the two parts, called *skip connections*,
shown as gray arrows in the figure. In particular, the feature maps computed at the various levels of the *encoder* are supplied as input to the corresponding levels of the *decoder* by concatenating them with those coming from the previous decoder level. Using *skip connections* helps to improve the accuracy of the segmentation map.
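
A skip connection of this kind amounts to a concatenation along the channel axis. The sketch below uses NumPy arrays as stand-ins for feature maps; the spatial size and the channel counts are hypothetical values chosen for the example.

```python
import numpy as np

# Stand-ins for feature maps at the same decoder level (channels-last layout).
# The 40x40 resolution and 128 channels are hypothetical values.
encoder_features = np.zeros((40, 40, 128))   # saved from the encoder
upsampled_features = np.ones((40, 40, 128))  # output of the previous up-conv

# The skip connection stacks the two sets of feature maps along the channel axis
merged = np.concatenate([encoder_features, upsampled_features], axis=-1)
print(merged.shape)
```

The spatial dimensions of the two inputs must match, which is why each decoder level is paired with the encoder level of the same resolution.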

Note that the U-Net architecture is a fully convolutional network: it has no *fully connected* layers. Therefore, it can be applied to images of different sizes. The only constraint is that the number of rows and columns of the input image must be multiples of `16`.
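
The size constraint is easy to check before choosing an input resolution. The helper names below are hypothetical; the sketch only tests divisibility by `16` and rounds a size up to the next valid value.

```python
def is_valid_unet_size(rows, cols, factor=16):
    """Return True if both dimensions are multiples of `factor`."""
    return rows % factor == 0 and cols % factor == 0

def next_valid_size(size, factor=16):
    """Round `size` up to the nearest multiple of `factor`."""
    return -(-size // factor) * factor  # ceiling division, then scale back

print(is_valid_unet_size(160, 160))  # the IMAGE_SIZE used in this notebook
print(is_valid_unet_size(150, 200))
print(next_valid_size(150), next_valid_size(200))
```
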
The `unet_hub.py` file provides `UnetBackbone`, which instantiates the U-Net backbone architecture in KerasHub.


In [None]:
from unet_hub import UnetBackbone, ImageSemanticSegmenter
from keras_hub.layers import ImageConverter
from keras_hub.models import ImageSegmenterPreprocessor
backbone = UnetBackbone(use_batchnorm=True) 
normalization = ImageConverter(image_size=(IMAGE_SIZE, IMAGE_SIZE), scale=1./255)
model = ImageSemanticSegmenter(
    preprocessor = ImageSegmenterPreprocessor(normalization),
    backbone = backbone,
    num_classes = 2,
    activation=None)
model.summary()

The parameter `num_classes` specifies the number of output classes, while `use_batchnorm=True` adds batch normalization layers, which are not part of the original architecture described in the literature.

### Training
In this example, we will use the Nadam optimizer and, as loss function, the Focal Loss, a variant of the cross-entropy loss that is widely used in segmentation problems.


In [None]:
from keras.losses import CategoricalFocalCrossentropy
from keras.optimizers import Nadam
from keras.metrics import MeanIoU
model.compile(
  loss=CategoricalFocalCrossentropy(from_logits=True),
  optimizer=Nadam(learning_rate=0.001),
  metrics=[MeanIoU(num_classes=2, sparse_y_true=False, sparse_y_pred=False),],
)
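
To give an idea of how the focal loss differs from the plain cross-entropy, here is a minimal NumPy sketch for one-hot targets and raw logits. It is not the Keras implementation, which may differ in details such as clipping and reduction; the values `alpha=0.25` and `gamma=2.0` are assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_focal_loss(y_true, logits, alpha=0.25, gamma=2.0):
    """Per-pixel focal loss: cross-entropy scaled by alpha * (1 - p)**gamma."""
    probs = softmax(logits)
    cross_entropy = -y_true * np.log(probs + 1e-7)
    modulating = alpha * (1.0 - probs) ** gamma  # small when the pixel is already well classified
    return (modulating * cross_entropy).sum(axis=-1)

y_true = np.array([[0.0, 1.0]])        # one pixel whose true class is Foreground
easy_logits = np.array([[-3.0, 3.0]])  # confident and correct
hard_logits = np.array([[0.5, -0.5]])  # leaning towards the wrong class

print(categorical_focal_loss(y_true, easy_logits))
print(categorical_focal_loss(y_true, hard_logits))
```

With `gamma=0` and `alpha=1` the expression reduces to the ordinary cross-entropy; increasing `gamma` shrinks the contribution of pixels that are already classified correctly.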

We do not use accuracy as a performance index, but the Intersection over Union (IoU).
For segmentation, accuracy is not a reliable index, because it is influenced by the size of the region to be segmented relative to that of the image.
The IoU, instead, is the ratio between the number of pixels in the intersection of the predicted region and the region to be segmented and the number of pixels in their union.
The parameters `sparse_y_true=False` and `sparse_y_pred=False` tell the metric that the segmentation maps and the network output, respectively, are in categorical format.
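
The difference between accuracy and IoU shows up clearly on a small example. The NumPy sketch below computes both on a single image; it is a simplified stand-in for the Keras metric, which accumulates its statistics over all batches, and the function name is hypothetical.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU for categorical arrays of shape (H, W, num_classes)."""
    true_labels = y_true.argmax(axis=-1)
    pred_labels = y_pred.argmax(axis=-1)
    ious = []
    for c in range(num_classes):
        true_c = true_labels == c
        pred_c = pred_labels == c
        union = np.logical_or(true_c, pred_c).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(true_c, pred_c).sum() / union)
    return float(np.mean(ious))

# 10x10 image with a small 2x2 foreground object
labels = np.zeros((10, 10), dtype=int)
labels[4:6, 4:6] = 1
y_true = np.eye(2)[labels]

# A degenerate prediction that labels every pixel as background
y_pred = np.eye(2)[np.zeros((10, 10), dtype=int)]

accuracy = float((y_true.argmax(-1) == y_pred.argmax(-1)).mean())
miou = mean_iou(y_true, y_pred, num_classes=2)
print(accuracy)  # 96 of the 100 pixels are labeled correctly
print(miou)      # the foreground IoU is 0, which pulls the mean down
```

Although the prediction misses the object entirely, the accuracy stays high because the object is small; the mean IoU exposes the failure.
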
We carry out the training using the `fit` function:

In [None]:
model.fit(train_ds, epochs=5, validation_data=valid_ds, verbose=True)
model.save_weights('net.weights.h5')

After training, the `save_weights` method is used to save the network parameters to a file in the HDF5 format. 

### Evaluation
Let's use the following instructions to see the network result on some examples from the test set:

In [None]:
from easy_cv_dataset.visualization import plot_segmentation_mask_gallery
for images, segms in test_ds.take(1):
  pred = model.predict(images)
  plot_segmentation_mask_gallery(
    images, y_true=segms, y_pred=pred, num_classes=num_classes
  )
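
Since the model was built with `activation=None` and the loss uses `from_logits=True`, the predictions are presumably raw scores rather than probabilities. The sketch below shows how such scores can be turned into class probabilities and label maps; it uses random numbers as a stand-in for `pred` and assumes the shape `(batch, height, width, num_classes)`.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Stand-in for `pred`: random scores for 2 images of 4x4 pixels and 2 classes
rng = np.random.default_rng(0)
fake_pred = rng.normal(size=(2, 4, 4, 2))

probabilities = softmax(fake_pred)      # per-pixel class probabilities
label_maps = fake_pred.argmax(axis=-1)  # per-pixel class index (0 or 1)

print(probabilities.shape, label_maps.shape)
```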

Finally, let's use the function `evaluate` to compute the average intersection over union on the whole test set.

In [None]:
metrics = model.evaluate(test_ds, return_dict=True, verbose=True)
print(metrics)

### Finetuning
Now, we will use a partially pre-trained architecture. In particular, instead of the classic encoder of the U-Net architecture, we will use a ResNet50 network, pre-trained for classification on the ImageNet dataset.


In [None]:
from unet_hub import UnetBackbone, ImageSemanticSegmenter
from keras_hub.models import Backbone, ImageSegmenterPreprocessor
from keras_hub.layers import ImageConverter

pretrained_model = 'resnet_50_imagenet'
image_encoder = Backbone.from_preset(pretrained_model)  # Encoder 
backbone = UnetBackbone(image_encoder=image_encoder, use_batchnorm=True) # Encoder + Decoder
normalization = ImageConverter.from_preset(pretrained_model, image_size=(IMAGE_SIZE, IMAGE_SIZE))
model = ImageSemanticSegmenter(
    preprocessor = ImageSegmenterPreprocessor(normalization),
    backbone = backbone,
    num_classes = 2,
    activation=None)

To reduce the risk of overfitting, we can skip training the first layers of the network. To freeze the entire encoder, so that only the decoder is trained, use the following code:

In [None]:
model.backbone.image_encoder.trainable = False
model.summary()

In [None]:
model.backbone.summary()