# Week 1: Introduction to Computer Vision

## Notebook 4: Semantic Segmentation with a U-Net Convolutional Neural Network using PyTorch

Welcome to the fourth notebook of this week's Applied AI Study Group! We will study the semantic segmentation problem using the MSRC-v2 image dataset provided by Microsoft. Our task is to segment the objects in the given images.

### 1. Semantic Segmentation

Semantic segmentation aims to label each pixel of a given image, that is, to classify the image pixel-wise. Different objects of the same class are treated as a single object. In contrast, instance segmentation treats each object of the same class as a separate object and labels them differently, such as object 1, object 2, etc. In this notebook, we will tackle the problem of semantic segmentation. Segmentation makes pixel-wise operations on images possible. For example, portrait mode needs to distinguish the foreground of an image from its background, so that the pixels classified as background can be blurred.
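To make the portrait-mode example concrete, here is a minimal sketch using only NumPy. The function names `box_blur` and `portrait_mode`, the choice of class id 0 as background, and the simple mean filter are all assumptions made for illustration; a real pipeline would take the label map from a trained segmentation model and would likely use a better blur.

```python
import numpy as np

def box_blur(img, k=5):
    """Blur an HxWxC float image with a k x k mean filter (k odd, edge-padded)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    # Sum the k*k shifted copies of the image, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def portrait_mode(img, label_map, background_id=0, k=5):
    """Keep foreground pixels sharp and replace background pixels with a blurred copy."""
    blurred = box_blur(img, k)
    is_background = (label_map == background_id)[..., None]  # HxWx1 boolean mask
    return np.where(is_background, blurred, img)

# Toy data: a random 8x8 RGB image whose central 4x4 square is "foreground" (class 1).
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
labels = np.zeros((8, 8), dtype=np.int64)
labels[2:6, 2:6] = 1
result = portrait_mode(image, labels)
```

The per-pixel label map is what makes this possible: the blur is applied only where the predicted class equals the background class, and every other pixel is copied through unchanged.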

So, how do we build a model for this task? Convolution filters have proven themselves at processing structured data such as images. However, without padding they shrink their input by an amount that depends on the kernel size, and strided convolutions and pooling reduce it further. We need a model whose output has the same spatial size as its input, since we want one class prediction for every pixel of the image we feed in. Luckily for us, the U-Net architecture is designed for this kind of task. We will study U-Nets in the following section.
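The effect of kernel size on the output size can be checked with a little arithmetic. The sketch below implements the output-size formulas given in the PyTorch documentation for `nn.Conv2d` and `nn.ConvTranspose2d` along one spatial dimension; the helper names `conv_out` and `conv_transpose_out` are hypothetical and chosen only for this example.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Output length of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

# A 3x3 convolution without padding trims one pixel from each border.
shrunk = conv_out(240, kernel=3)                          # 238
# Padding of 1 keeps the size unchanged for a 3x3 kernel.
same = conv_out(240, kernel=3, padding=1)                 # 240
# A stride of 2 halves the resolution.
halved = conv_out(240, kernel=3, stride=2, padding=1)     # 120
# A transposed convolution with the same kernel undoes the shrinkage.
restored = conv_transpose_out(shrunk, kernel=3)           # 240
```

An encoder built from strided convolutions or pooling therefore needs a matching decoder, made of upsampling or transposed convolutions, to return to the input resolution.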

### 2. U-Net Convolutional Neural Network

U-Net
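As a rough illustration of the idea, a U-Net pairs a downsampling encoder with an upsampling decoder and concatenates encoder feature maps onto the decoder features at the same resolution (skip connections). The sketch below is a deliberately tiny two-level version. The class name `TinyUNet`, the channel widths, and the use of padded 3x3 convolutions (so that input and output sizes match without cropping) are assumptions for illustration. It is not the model trained later in this notebook, which uses a MobileNetV2 encoder without skip connections.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two padded 3x3 convolutions with ReLU; spatial size is preserved."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Hypothetical two-level U-Net; input height and width must be divisible by 4."""

    def __init__(self, in_channels=3, num_classes=23, base=16):
        super().__init__()
        self.enc1 = double_conv(in_channels, base)
        self.enc2 = double_conv(base, base * 2)
        self.bottleneck = double_conv(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = double_conv(base * 4, base * 2)  # upsampled + skip channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = double_conv(base * 2, base)      # upsampled + skip channels
        self.head = nn.Conv2d(base, num_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                      # full resolution
        e2 = self.enc2(self.pool(e1))          # 1/2 resolution
        b = self.bottleneck(self.pool(e2))     # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection from e2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection from e1
        return self.head(d1)

unet = TinyUNet()
logits = unet(torch.randn(1, 3, 64, 64))  # one score map per class, same height and width as the input
```

The skip connections let the decoder reuse fine spatial detail from the encoder that would otherwise be lost in the downsampling path.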

### 3. Imports and Checks

You should have installed NumPy and Matplotlib using `pip`, and PyTorch by following [Week 0 - Notebook 2](https://github.com/inzva/Applied-AI-Study-Group/blob/add-frameworks-week/Applied%20AI%20Study%20Group%20%236%20-%20January%202022/Week%200/2-mnist_classification_convnet_pytorch.ipynb).


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os
import torch
from datasets.segmentation_dataset import SegmentationData, label_img_to_rgb

In [None]:
print(torch.__version__)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

In [None]:
data_root = os.path.join('./datasets','segmentation')

train_data = SegmentationData(image_paths_file=f'{data_root}/segmentation_data/train.txt')
val_data = SegmentationData(image_paths_file=f'{data_root}/segmentation_data/val.txt')
test_data = SegmentationData(image_paths_file=f'{data_root}/segmentation_data/test.txt')

In [None]:
print("Train size: %i" % len(train_data))
print("Validation size: %i" % len(val_data))
print("Img size: ", train_data[0][0].size())
print("Segmentation size: ", train_data[0][1].size())

num_example_imgs = 4
plt.figure(figsize=(10, 5 * num_example_imgs))
for i, (img, target) in enumerate(train_data[:num_example_imgs]):
    # img
    plt.subplot(num_example_imgs, 2, i * 2 + 1)
    plt.imshow(img.numpy().transpose(1,2,0))
    plt.axis('off')
    if i == 0:
        plt.title("Input image")
    
    # target
    plt.subplot(num_example_imgs, 2, i * 2 + 2)
    plt.imshow(label_img_to_rgb(target.numpy()))
    plt.axis('off')
    if i == 0:
        plt.title("Target image")
plt.show()

In [None]:
import torch
import torch.nn as nn
import torchvision.models as models

class SegmentationNN(nn.Module):

    def __init__(self, num_classes=23, hparams=None):
        super().__init__()
        self.hparams = hparams

        # MobileNetV2 pretrained on ImageNet serves as the encoder.
        mobile_network = models.mobilenet_v2(pretrained=True)

        # Shape comments assume a 1x3x240x240 input.
        layers = list(mobile_network.children())[:-1]  # drop classifier: 1x1280x8x8
        layers.append(nn.Conv2d(1280, 120, 1, 1))  # 1x120x8x8
        layers.append(nn.LeakyReLU(0.1))
        layers.append(nn.Upsample(scale_factor=4))  # 1x120x32x32
        layers.append(nn.ConvTranspose2d(120, 80, 3, 2))  # 1x80x65x65
        layers.append(nn.LeakyReLU(0.1))
        layers.append(nn.ConvTranspose2d(80, 60, 9, dilation=2))  # 1x60x81x81
        layers.append(nn.LeakyReLU(0.1))
        layers.append(nn.ConvTranspose2d(60, 40, 9, dilation=2))  # 1x40x97x97
        layers.append(nn.LeakyReLU(0.1))
        layers.append(nn.ConvTranspose2d(40, 40, 11, dilation=2))  # 1x40x117x117
        layers.append(nn.LeakyReLU(0.1))
        layers.append(nn.Upsample(scale_factor=2))  # 1x40x234x234
        # Use num_classes instead of a hard-coded 23 so the argument takes effect.
        layers.append(nn.ConvTranspose2d(40, num_classes, 7))  # 1xnum_classesx240x240
        layers.append(nn.LeakyReLU(0.1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):

        x = self.network(x)

        return x

In [None]:
hparams = {
    "lr" : 0.001,
    "batch_size" : 4,
    "num_epochs" : 4
}  

model = SegmentationNN(hparams=hparams)

model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=hparams["lr"])
criterion = torch.nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')

train_loader = torch.utils.data.DataLoader(train_data, batch_size=hparams["batch_size"], shuffle=True)
print(train_loader)
print(next(iter(train_loader)))

In [None]:
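# Sanity check: run the untrained model on the first four training samples
# and print the loss for each one.
for (inputs, targets) in train_data[0:4]:
    # unsqueeze(0) adds the batch dimension the model and the loss expect.
    outputs = model(inputs.unsqueeze(0).to(device))
    loss = criterion(outputs, targets.unsqueeze(0).to(device))
    print(loss)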

In [None]:
print('training starts!')
for epoch in range(hparams["num_epochs"]):
    
    epoch_loss = 0.0
    
    for i, data in enumerate(train_loader):
        images, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        predictions = model(images)
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
    # Each loss is already a per-batch mean, so average over the number of batches.
    print("Epoch: %d Loss: %.3f" % (epoch + 1, epoch_loss / len(train_loader)))
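
The notebook loads `val_data` and `test_data` but stops after training. As one possible way to check the trained model, the sketch below computes pixel accuracy while skipping pixels labeled `-1`, mirroring the `ignore_index=-1` passed to the loss. The helper name `pixel_accuracy` and the toy tensors are assumptions for illustration, not part of the original notebook.

```python
import torch

def pixel_accuracy(logits, targets, ignore_index=-1):
    """Fraction of labeled pixels whose highest-scoring class matches the target.

    logits:  (N, C, H, W) raw class scores from the model
    targets: (N, H, W) integer class ids, with ignore_index marking unlabeled pixels
    """
    preds = logits.argmax(dim=1)        # (N, H, W) predicted class per pixel
    valid = targets != ignore_index     # mask out unlabeled pixels
    num_valid = valid.sum().item()
    if num_valid == 0:
        return 0.0
    correct = ((preds == targets) & valid).sum().item()
    return correct / num_valid

# Toy check with 2 classes on a 1x2 image: pixel 0 scores highest for class 0,
# pixel 1 scores highest for class 1.
toy_logits = torch.tensor([[[[2.0, 0.0]], [[0.0, 3.0]]]])  # shape (1, 2, 1, 2)
acc_ignored = pixel_accuracy(toy_logits, torch.tensor([[[0, -1]]]))  # second pixel ignored
acc_half = pixel_accuracy(toy_logits, torch.tensor([[[1, 1]]]))      # first pixel wrong
```

On real data, the same function could be applied to `model(images)` and `labels` for each batch of a validation loader, with the model in `eval()` mode and inside `torch.no_grad()`.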