# Computer vision and deep learning - Laboratory 4
 
In this laboratory we'll work with a semantic segmentation model. The task of semantic segmentation implies the labeling/classification of __all__ the pixels in the input image. So, in this case, the output is not a single class label, as for classification, but a 2D array with one label per pixel.
 
For the next two labs, you'll build and train a fully convolutional neural network inspired by U-Net. Finally, you'll implement several metrics suitable for evaluating segmentation models.

Today, we'll focus on the data loading and preprocessing part, and we'll study the building blocks of the semantic segmentation model.


In [1]:
!pip install wget

import os
import cv2
import wget
import glob
import torch
import shutil
import numpy as np

import matplotlib.pyplot as plt




## Data loading
 
You will work with the OxfordPets dataset. This dataset is also available in the _torchvision_ module, but in this laboratory you are required to write the data loading from scratch.

Each image has a segmentation mask assigned (with the same size as the input image); three classes are defined on each segmentation mask:
- Label 1: pet;
- Label 3: border of the pet;
- Label 2: background.
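
To make the label convention concrete, here is a minimal sketch on a hand-made 3x3 trimap. The array values are invented for illustration and are not read from the dataset; on a real mask you would load the png file first.

```python
import numpy as np

# hypothetical 3x3 trimap: 1 = pet, 2 = background, 3 = border of the pet
trimap = np.array([[2, 3, 2],
                   [3, 1, 3],
                   [2, 3, 2]], dtype=np.uint8)

# list the labels that occur in the mask and how many pixels carry each one
labels, counts = np.unique(trimap, return_counts=True)
for label, count in zip(labels, counts):
    print(f"label {label}: {count} pixels")
# label 1: 1 pixels
# label 2: 4 pixels
# label 3: 4 pixels
```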
 
First let's write the code that will allow us to load this data.
As you remember from the previous lab, in _torch_ you have two data primitives that allow you to interact with the data: torch.utils.data.Dataset and torch.utils.data.DataLoader.

torch.utils.data.Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.

[More details in the doc.](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)




## Writing a custom Dataset

In the file oxford_pets.py you have the boilerplate code for creating the custom dataset for this image segmentation problem.

Each custom dataset must implement the following methods:
- \_\_init\_\_ : the constructor is run when instantiating the Dataset object. Here you should do the initializations (input dirs, dataset splits) and the transforms (covered in more detail in the next section).
- \_\_len\_\_ : this should return the number of samples in the dataset;
- \_\_getitem\_\_ : this should load and return a sample from the dataset at the given index _idx_ (passed as a parameter). Based on the index, it identifies the image and its corresponding segmentation map location on disk, calls the transform functions on them (if applicable), and returns the image and the mask in a dictionary with the keys 'image' and 'segmentation'.
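
The three methods can be sketched on a toy dataset that lives entirely in memory. `ToySegmentationDataset` and its random data are hypothetical and only illustrate the interface; the real OxfordPets class reads files from disk instead.

```python
import numpy as np
import torch


class ToySegmentationDataset(torch.utils.data.Dataset):
    """Hypothetical in-memory dataset that only illustrates the three methods."""

    def __init__(self, num_samples=8, size=(16, 16), transforms=None):
        rng = np.random.default_rng(0)
        # random H x W x 3 images and binary H x W masks
        self.images = rng.integers(0, 256, size=(num_samples, *size, 3), dtype=np.uint8)
        self.masks = rng.integers(0, 2, size=(num_samples, *size), dtype=np.uint8)
        self.transforms = transforms

    def __len__(self):
        # number of samples in the dataset
        return len(self.images)

    def __getitem__(self, idx):
        # build the sample dictionary, then apply the optional transforms to it
        sample = {'image': self.images[idx], 'segmentation': self.masks[idx]}
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample


toy_data = ToySegmentationDataset()
sample = toy_data[0]
print(len(toy_data), sample['image'].shape, sample['segmentation'].shape)
# 8 (16, 16, 3) (16, 16)
```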

In [2]:
!pip install wget
import os
import cv2
import wget
import torch
import random
import tarfile
import numpy as np
import matplotlib.pyplot as plt
import torchvision

class OxfordPets(torch.utils.data.Dataset):
    """
    OxfordPets segmentation dataset
    :param root_dir (string) - the root directory where the data is stored
    :param is_train (bool) - whether to use the train or the test split of the data
    :param transforms (callable, optional): A function/transform that takes as input a sample and returns its
        transformed version (like a horizontal flip)
    :param download (bool, optional): whether to download the data
    """
    # note the comma between the two URLs: without it Python concatenates the strings
    _URLS = [
        "https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz",
        "https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz"
    ]
    TARGET_SIZE = (224, 224)

    def __init__(self, root_dir, is_train, transforms=None, download=True):
        self.root_dir = root_dir
        self.transforms = transforms

        # if needed, download the dataset
        if download:
            self.download_dataset(root_dir)

        # the images are stored in the 'images' subfolder from root_dir as jpg files
        self.images_folder = os.path.join(self.root_dir, 'images')
        # the corresponding segmentation maps are stored in the 'annotations/trimaps'
        # subfolder from root_dir as png files
        anno_folder = os.path.join(self.root_dir, 'annotations')
        self.segmentations_folder = os.path.join(anno_folder, 'trimaps')

        # the dataset is already divided into train/test splits
        # these are stored in the 'annotations' subfolder in trainval.txt and test.txt respectively
        splits_file = os.path.join(anno_folder, 'trainval.txt' if is_train else 'test.txt')
        # TODO your code here
        # in these files, on each line you have the information about an image
        # split each line by spaces and take only the image name and store it to image_ids
        image_ids = []

        # then populate the lists images and segmentations with the full path of the images and their seg maps
        # for position i in these lists, you will have the ith image (images[i])
        # and its corresponding segmentation map (segmentations[i])
        # keep in mind that the images are jpg files stored in self.images_folder
        # and that the segmentation maps are png files stored in self.segmentations_folder
        self.images = []
        self.segmentations = []
        # end TODO your code here

    def __getitem__(self, idx: int):
        """
        Returns the idx-th sample from the dataset and its corresponding segmentation map
        """
        # TODO your code here
        # load the image and the segmentation map from position idx
        image = None
        seg = None # read the segmentation as grayscale
        # preprocess the segmentation mask using the preprocess_segmentation function
        # resize the mask and the input image to OxfordPets.TARGET_SIZE (use make_image_square and resize)
        # think about what interpolation should you be using when resizing the mask
        # if self.transforms is not None, apply the transformation function
        # return a dictionary with the image and the segmentation map
        return {'image': image, 'segmentation': seg}

    @staticmethod
    def preprocess_segmentation(mask: np.ndarray) -> np.ndarray:
        """
        Preprocesses the segmentation mask such that the background pixels have a value of 0
        and the pet and border pixels have a value of 1
        :param mask - the segmentation mask
        """
        # TODO your code here
        # the pixels that have a value of 2 (background) should be set to 0
        # the pixels that have a value of 1 or 3 (pet or pet border) should be set to 1
        return None

    @staticmethod
    def make_image_square(img: np.ndarray, padding_mode: str = 'edge', padding_value: int = 0):
        """
        Pads the image such that it has an aspect ratio of 1.
        The smallest dimension is padded such that it has the same size as the largest dimension
        :param img - input image
        :param padding_mode - string describing the padding strategy (can be constant or edge).
                edge - the first and last rows and columns are replicated
                constant - the image is padded with constant values
        :param padding_value - in case of constant padding, the value used to pad the image.
        """
        # TODO your code here
        # determine the padding value
        padding = -1
        # use np.pad to pad the image
        # return the padded image
        return None


    @staticmethod
    def resize_image(img: np.ndarray, target_size: tuple = (224, 224), interpolation: int = cv2.INTER_LINEAR):
        """
        Resizes the image to the specified size
        :param img - input image
        :param target_size - the requested size to which the image should be resized
        :param interpolation - interpolation type
        """
        # TODO your code here
        # resize the input image and return the result
        # cv2.resize might prove useful
        return None


    def __len__(self) -> int:
        """
        Returns the size of this dataset (the number of images in the dataset)
        """
        # TODO your code here : return the number of images in the dataset
        return -1

    def download_dataset(self, root_dir: str):
        """
        Downloads the OxfordPets images and annotations and saves them to root_dir
        :param root_dir(string): the directory where the data is downloaded
        """
        if not os.path.exists(root_dir):
            os.makedirs(root_dir)

            for url in OxfordPets._URLS:
                archive = wget.download(url, root_dir)
                with tarfile.open(archive) as archive_file:
                    archive_file.extractall(root_dir)
                os.remove(archive)
        else:
            print('Folder already exists, skipping download')



In [None]:
# let's test the code
root_dir = 'oxford_pets'
# no transforms are passed yet: RandomCrop is only defined in the Transforms section below
training_data = OxfordPets(
    root_dir=root_dir,
    is_train=True,
    download=True,
)

figure = plt.figure(figsize=(8, 8))
cols, rows = 4, 4
for i in range(1, cols * rows + 1, 2):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    sample = training_data[sample_idx]
    img, seg = sample['image'], sample['segmentation']*120
    figure.add_subplot(rows, cols, i)
    plt.title('image')
    plt.axis("off")
    plt.imshow(img)

    figure.add_subplot(rows, cols, i+1)
    plt.title('seg')
    plt.axis("off")
    plt.imshow(seg, cmap='gray')
plt.show()

test_data = OxfordPets(
    root_dir=root_dir,
    is_train=False,
    download=True,
)

Inspect the pixel values that you have in a segmentation mask.

In [None]:
# TODO your code here

## Transforms

Often you want to apply some transformations to the input data (for example, resizing it to a pre-established shape, normalizing it, or converting it to a tensor), or you need to apply some augmentation techniques.

This can be easily achieved via the _transforms_ callable that you pass to the constructor of the dataset (and apply in \_\_getitem\_\_).
Moreover, _torchvision_ offers an easy way to compose transforms through the torchvision.transforms.Compose callable class, which allows you to chain several transforms.

Let's implement a simple augmentation, in which you randomly crop a region from the image. As you might notice, in the case of image segmentation, if we crop the input image, we must also crop the segmentation mask. 
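
The key point, using the same window for the image and for the mask, can be sketched on random arrays. The array sizes and the crop size below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 10, 3), dtype=np.uint8)  # H x W x C
mask = rng.integers(0, 2, size=(8, 10), dtype=np.uint8)        # H x W

crop_h, crop_w = 4, 6
# draw the top-left corner so that the whole crop stays inside the image
top = int(rng.integers(0, image.shape[0] - crop_h + 1))
left = int(rng.integers(0, image.shape[1] - crop_w + 1))

# the same window is applied to both arrays, so pixels and labels stay aligned
image_crop = image[top:top + crop_h, left:left + crop_w]
mask_crop = mask[top:top + crop_h, left:left + crop_w]
print(image_crop.shape, mask_crop.shape)
# (4, 6, 3) (4, 6)
```

If the two crops were drawn independently, the mask would no longer describe the pixels of the cropped image.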

In [None]:
class RandomCrop(object):
    """Crop randomly the image and the segmentation mask

    param: output_size (tuple or int): requested output size.
           if int, square crop is made.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, segmentation = sample['image'], sample['segmentation']
        h, w = image.shape[:2]
        new_h, new_w = self.output_size
        # skip the crop if the image is smaller than the requested output size
        if h < new_h or w < new_w:
            return sample

        # TODO your code here
        # randomly generate the coordinates for the top left of the crop
        top = None
        left = None
        # crop both the image and the segmentation mask
        image = None
        segmentation = None
        return {'image': image, 'segmentation': segmentation}



Now, create the Dataset with the transform that you just wrote and analyse its effect.

In [None]:
# TODO your code here
# hint: the code is already written for you (see above), you just need to pass the transform

Now, let's write another class to transform the image and the segmentation map to tensors.

Then, chain these transforms using torchvision.transforms.Compose.

In [None]:
class ToTensor(object):
    def __call__(self, sample):
        image, segmentation = sample['image'], sample['segmentation']
        # NEXT TIME
        # numpy image: H x W x C
        # torch image: C x H x W
        # image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'segmentation': torch.from_numpy(segmentation)}



## Dataloaders

By now you have a class that can easily retrieve one sample (image and segmentation map) at a time from your dataset. However, when training a model, you usually pass the data in batches, reshuffle it at each epoch, and use Python’s multiprocessing to speed up data retrieval.
Fortunately, all these are provided by the DataLoader.
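
A small sketch of the batching behaviour, using a TensorDataset filled with zeros instead of the real images (the sizes are arbitrary). With 10 samples and a batch size of 4, the last batch is smaller unless you pass `drop_last=True`.

```python
import torch

# 10 dummy samples: H x W x C images and H x W label maps
images = torch.zeros(10, 16, 16, 3)
masks = torch.zeros(10, 16, 16, dtype=torch.long)
toy_dataset = torch.utils.data.TensorDataset(images, masks)

loader = torch.utils.data.DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
for imgs, segs in loader:
    # the DataLoader stacks the samples along a new first (batch) dimension
    batch_sizes.append(imgs.shape[0])
    print(imgs.shape, segs.shape)
print(batch_sizes)
# [4, 4, 2]
```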


In [None]:
# let's create a DataLoader to easily iterate over this dataset
bs = 4
dataloader = torch.utils.data.DataLoader(training_data, batch_size=bs, shuffle=True, num_workers=0)

for i_batch, sample_batched in enumerate(dataloader):
    imgs = sample_batched['image']
    segs = sample_batched['segmentation']
    print(i_batch, imgs.size(), segs.size())

    rows, cols = bs, 2
    figure = plt.figure(figsize=(bs, 2))
    for i in range(0, bs):
        figure.add_subplot(rows, cols, 2*i+1)
        plt.title('image')
        plt.axis("off")
        plt.imshow(imgs[i].numpy())

        figure.add_subplot(rows, cols, 2*i+2)
        plt.title('seg')
        plt.axis("off")
        plt.imshow(segs[i].numpy(), cmap="gray")
    plt.show()
    # display the first 3 batches
    if i_batch == 2:
        break


## Building the model
 
The model that will be used in this laboratory is inspired by the [U-Net](https://arxiv.org/abs/1505.04597) architecture.
U-Net is a fully convolutional neural network comprising two symmetric paths: a contracting path (to capture context) and an expanding path (which enables precise localization).
The network also uses skip connections between each layer in the downsampling path and the corresponding layer in the upsampling path, and thus directly fast-forwards high-resolution feature maps from the encoder to the decoder network.
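
A skip connection of this kind is commonly implemented as a concatenation along the channel dimension. The sketch below assumes that both feature maps already share the same spatial size; the channel counts and resolutions are made up for illustration.

```python
import torch

# hypothetical feature maps in N x C x H x W layout
encoder_features = torch.rand(1, 64, 56, 56)    # high-resolution map from the downsampling path
decoder_features = torch.rand(1, 128, 56, 56)   # upsampled map from the upsampling path

# concatenate along the channel dimension (dim=1); the spatial size is unchanged
merged = torch.cat([decoder_features, encoder_features], dim=1)
print(merged.shape)
# torch.Size([1, 192, 56, 56])
```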

The output of the model is a volume with depth C, where C is the number of pixel classes. For example, if you want to classify the pixels into pet and background, the output will be a volume of depth 2. 
If you want to classify the pixels into pet, pet border and background the output will be a volume of depth 3.
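
One common way to turn such an output volume into a label map is to take the argmax over the class dimension. The sketch below uses random values in place of a real model output and assumes the N x C x H x W layout used by torch.

```python
import torch

num_classes = 3                               # pet, pet border, background
logits = torch.randn(2, num_classes, 8, 8)    # stand-in for a model output: N x C x H x W

# for every pixel, keep the index of the class with the highest score
predicted_labels = logits.argmax(dim=1)       # N x H x W
print(predicted_labels.shape)
# torch.Size([2, 8, 8])
```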

**Read the U-Net paper and try to understand the architecture.**
 
An overview of the U-Net architecture is depicted in the figure below:
<img src="https://miro.medium.com/max/1400/1*J3t2b65ufsl1x6caf6GiBA.png"/>
 



## The downsampling path
 

For the downsampling path we'll use a convolutional neural network from the pretrained torchvision models.
We'll cover this in detail in the next laboratory session.


## The upsampling path


In the upsampling path, we'll use transposed convolutions to progressively increase the resolution of the activation maps. The layer for the transposed convolution is [ConvTranspose2d](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html).
 
Let's write a function to implement an upsampling block, consisting of a transposed convolution, a batch normalization layer and a ReLU activation.
 
Remember, the output size $W_o$ of a transposed convolutional layer is:  
\begin{equation}
W_o = (W_i - 1) \cdot S - 2P + F,
\end{equation}
 
where $W_i$ is the size of the input, $S$ is the stride, $P$ is the amount of padding and $F$ is the filter size.
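
The formula can be checked numerically against torch.nn.ConvTranspose2d. This is a small sketch that assumes no padding ($P = 0$) and the layer's default dilation and output padding; the helper `transposed_conv_output_size` is a hypothetical name introduced only for this check.

```python
import torch


def transposed_conv_output_size(w_in, stride, padding, filter_size):
    # W_o = (W_i - 1) * S - 2P + F
    return (w_in - 1) * stride - 2 * padding + filter_size


x = torch.rand(1, 4, 128, 128)  # N x C x H x W
sizes = {}
with torch.no_grad():
    for stride in [2, 4, 8]:
        layer = torch.nn.ConvTranspose2d(4, 2, kernel_size=4, stride=stride, padding=0)
        expected = transposed_conv_output_size(128, stride, padding=0, filter_size=4)
        sizes[stride] = (expected, layer(x).shape[-1])
        print('stride', stride, '-> formula:', expected, ', layer output:', layer(x).shape[-1])
# stride 2 -> formula: 258 , layer output: 258
# stride 4 -> formula: 512 , layer output: 512
# stride 8 -> formula: 1020 , layer output: 1020
```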
 

In [None]:
import torch
def upsample_block(x, filters, size, stride = 2):
  """
  x - the input of the upsample block
  filters - the number of filters to be applied
  size - the size of the filters
  """

  # TODO your code here
  # transposed convolution
  # BN
  # relu activation
  return x

Now let's test this upsampling block

In [None]:
in_layer = torch.rand((32, 32, 128, 128))

filter_sz = 4
num_filters = 16

for stride in [2, 4, 8]:
  x = upsample_block(in_layer, num_filters, filter_sz, stride)
  print('in shape: ', in_layer.shape, ' upsample with filter size ', filter_sz, '; stride ', stride, ' -> out shape ', x.shape)

in shape:  torch.Size([32, 32, 128, 128])  upsample with filter size  4 ; stride  2  -> out shape  torch.Size([32, 16, 258, 258])
in shape:  torch.Size([32, 32, 128, 128])  upsample with filter size  4 ; stride  4  -> out shape  torch.Size([32, 16, 512, 512])
in shape:  torch.Size([32, 32, 128, 128])  upsample with filter size  4 ; stride  8  -> out shape  torch.Size([32, 16, 1020, 1020])
