# Exercise 5: Scene-Dependent Image Segmentation

The goal of this homework is to implement a model that separates foreground and background objects for a specific scene.  
We will use the highway scene from the Change Detection dataset:  
http://jacarini.dinf.usherbrooke.ca/dataset2014#

![input image](highway/input/in001600.jpg "Title") ![gt image](highway/groundtruth/gt001600.png "Title")

## Task 1: Create a custom (PyTorch) dataset


Reference: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

You need to create a class that inherits from **torch.utils.data.Dataset** and implements two methods:
- **def \_\_len\_\_(self)**:  returns the length of the dataset
- **def \_\_getitem\_\_(self, idx)**: given an integer idx, returns the pair (x, y)
    - x is the image as a float tensor of shape $(3,H,W)$
    - y is the label image as a mask of shape $(H,W)$; each pixel contains the label 0 (background) or 1 (foreground). It is recommended to use the type torch.long.
    
**Tips**:
- The first 470 images are not labeled. Just ignore these images.
- If possible, load all images into memory, or even directly onto the GPU, to increase speed.
- You can change the resolution to fit your model or your memory.
- Add data augmentation to increase the data size and model robustness.
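
For segmentation, any geometric augmentation has to be applied to the image and its mask together, otherwise pixels and labels no longer line up. The following is a minimal sketch of that idea, not a reference solution: the class name `AugmentedSegmentationDataset` and the parameter `flip_prob` are made up for illustration, and it assumes the images and masks are already loaded as tensors of shape $(3,H,W)$ and $(H,W)$.

```python
import torch
from torch.utils.data import Dataset


class AugmentedSegmentationDataset(Dataset):
    """Hypothetical wrapper: random horizontal flip applied to image and mask together."""

    def __init__(self, images, masks, flip_prob=0.5):
        assert len(images) == len(masks)
        self.images = images        # list of float tensors, each (3, H, W)
        self.masks = masks          # list of long tensors, each (H, W)
        self.flip_prob = flip_prob  # probability of flipping a sample

    def __len__(self):
        return len(self.masks)

    def __getitem__(self, idx):
        x, y = self.images[idx], self.masks[idx]
        # Flip both tensors along the width axis (last dimension),
        # so that every pixel keeps its label.
        if torch.rand(1).item() < self.flip_prob:
            x = torch.flip(x, dims=[-1])
            y = torch.flip(y, dims=[-1])
        return x, y
```

Because the flip is drawn in `__getitem__`, each epoch sees a different mix of original and mirrored frames without storing extra copies in memory.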

In [None]:
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image
from sklearn.model_selection import train_test_split

import glob

In [None]:
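# Load the dataset. The first 470 frames are not labeled, so skip them.
NUM_UNLABELED = 470
img_paths = sorted(glob.glob("highway/input/*.jpg"))[NUM_UNLABELED:]
img_label_paths = sorted(glob.glob("highway/groundtruth/*.png"))[NUM_UNLABELED:]

# Images: float tensors of shape (3, H, W) with raw values in [0, 255].
imgs = [read_image(img_path).float() for img_path in img_paths]
# Labels: long tensors of shape (H, W); any nonzero pixel is treated as foreground (1).
img_labels = [read_image(img_label_path).bool().long().squeeze(0) for img_label_path in img_label_paths]

# Split the dataset (80% training, 20% validation).
X_train, X_valid, y_train, y_valid = train_test_split(imgs, img_labels, test_size=0.2, random_state=42)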

class ChangeDetectionDataset(Dataset):
    """Holds preloaded images (3, H, W) and their label masks (H, W)."""

    def __init__(self, data, labels):
        self.data = data
        self.labels = labels
    
    def __len__(self):
        return len(self.labels)
    
    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

tensor([[[ 60.,  29.,  27.,  ...,  82.,  76.,  65.],
         [ 46.,  38.,  23.,  ..., 145., 137., 129.],
         [ 24.,  40.,  19.,  ..., 189., 189., 191.],
         ...,
         [255., 255., 255.,  ..., 103., 108., 126.],
         [250., 251., 252.,  ..., 105., 112., 115.],
         [252., 254., 255.,  ..., 110., 119., 109.]],

        [[ 68.,  37.,  35.,  ...,  77.,  71.,  60.],
         [ 54.,  46.,  31.,  ..., 140., 132., 124.],
         [ 32.,  48.,  27.,  ..., 184., 184., 186.],
         ...,
         [255., 255., 255.,  ...,  88.,  93., 111.],
         [250., 251., 252.,  ...,  92.,  99., 102.],
         [252., 254., 255.,  ...,  97., 106.,  96.]],

        [[ 53.,  22.,  20.,  ...,  81.,  77.,  66.],
         [ 39.,  31.,  16.,  ..., 144., 138., 130.],
         [ 17.,  33.,  12.,  ..., 188., 190., 192.],
         ...,
         [255., 255., 255.,  ...,  81.,  86., 104.],
         [250., 251., 252.,  ...,  84.,  91.,  94.],
         [252., 254., 255.,  ...,  89.,  98.,  88.]]])

## Task 2: Create a custom Segmentation Model

- input: a batch of images $(B,3,H,W)$ 
- output: a batch of pixel-wise class predictions $(B,C,H,W)$, where $C=2$

**Tips**:
- It is recommended to use a Fully-Convolutional Neural Network, because it is flexible with respect to the input and output resolution.
- Use Residual Blocks with convolutional layers.
- Base your model on established segmentation models:
    - U-Net: https://arxiv.org/abs/1505.04597
    - DeepLab: https://arxiv.org/abs/1606.00915
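
To make the input/output contract and the residual-block tip concrete, here is a small sketch of a fully convolutional model. It is only a starting point under stated assumptions: the names `ResidualBlock` and `TinySegNet` and the parameters `width` and `num_blocks` are invented for this example, and unlike U-Net or DeepLab the network has no downsampling path, so its receptive field is small.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection; keeps the shape (B, C, H, W)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # residual connection


class TinySegNet(nn.Module):
    """Hypothetical fully convolutional model: (B, 3, H, W) -> (B, num_classes, H, W)."""

    def __init__(self, num_classes=2, width=16, num_blocks=2):
        super().__init__()
        self.stem = nn.Conv2d(3, width, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(num_blocks)])
        # A 1x1 convolution maps the features to per-pixel class logits.
        self.head = nn.Conv2d(width, num_classes, kernel_size=1)

    def forward(self, x):
        return self.head(self.blocks(torch.relu(self.stem(x))))


model = TinySegNet()
logits = model(torch.randn(4, 3, 60, 80))
print(logits.shape)  # torch.Size([4, 2, 60, 80])
```

Since every layer is a padded convolution, the same weights accept any resolution, which is the flexibility the first tip refers to.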

## Task 3: Create a training loop
- Split the data into training and test data, e.g. 80% training data and 20% test data, using your custom dataset.
- Create a DataLoader for each of your custom datasets.
- Define a training loop for a single epoch:
    - forward pass
    - loss function, e.g. cross entropy
    - optimizer
    - backward pass
    - logging
- Define a validation loop:
    - forward pass
    - extract binary labels, e.g. by thresholding or taking the argmax for each pixel
    - compute evaluation metrics for each image: accuracy, precision, recall and Intersection over Union (IoU)
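
The steps listed above could be sketched roughly as follows. This is one possible arrangement, not the required solution: the function names `train_one_epoch`, `binary_metrics` and `validate` are hypothetical, the metrics are computed for the foreground class only, and the small constant `eps` is an assumption that makes precision, recall and IoU evaluate to 0 instead of dividing by zero when an image contains no foreground.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """Runs one epoch and returns the mean cross-entropy loss per batch."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    total_loss = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(x)            # forward pass: (B, C, H, W)
        loss = criterion(logits, y)  # y: (B, H, W) with dtype torch.long
        loss.backward()              # backward pass
        optimizer.step()
        total_loss += loss.item()    # accumulate for logging
    return total_loss / len(loader)


def binary_metrics(pred, target, eps=1e-8):
    """Accuracy, precision, recall and IoU of the foreground class for one image.

    pred and target are (H, W) tensors containing only 0 and 1.
    """
    pred, target = pred.bool(), target.bool()
    tp = (pred & target).sum().item()
    fp = (pred & ~target).sum().item()
    fn = (~pred & target).sum().item()
    tn = (~pred & ~target).sum().item()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
    }


@torch.no_grad()
def validate(model, loader, device="cpu"):
    """Returns a list with one metrics dict per validation image."""
    model.eval()
    results = []
    for x, y in loader:
        # argmax over the class dimension gives binary labels of shape (B, H, W)
        preds = model(x.to(device)).argmax(dim=1).cpu()
        results.extend(binary_metrics(p, t) for p, t in zip(preds, y))
    return results
```

Calling `train_one_epoch` and `validate` once per epoch and storing their return values gives the per-epoch history needed for the report.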

## Task 4: Short report on your model and training
- Visualize the training and test error over the epochs.
- Report the evaluation metrics of the final model.
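
As a rough sketch of how the report could be produced, the snippet below plots two error curves and averages per-image metrics. The helper names `plot_history` and `mean_metrics` are hypothetical, and the code assumes you collected one training and one test error value per epoch, plus a list of per-image metric dictionaries for the final model.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_history(train_errors, test_errors):
    """Plots training and test error per epoch and returns the figure."""
    epochs = range(1, len(train_errors) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_errors, label="training")
    ax.plot(epochs, test_errors, label="test")
    ax.set_xlabel("epoch")
    ax.set_ylabel("error")
    ax.legend()
    return fig


def mean_metrics(per_image):
    """Averages a non-empty list of metric dicts (one dict per image)."""
    keys = per_image[0].keys()
    return {k: sum(m[k] for m in per_image) / len(per_image) for k in keys}
```

The returned figure can be saved with `fig.savefig(...)`, and the averaged dictionary can be printed as a small table next to the plot.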