# U10 - Abdullah Barhoum (5041774), Katharina Müller (5284090)


# Assignment 10 - UNet

In this assignment we are going to implement our own U-Net (https://arxiv.org/pdf/1505.04597.pdf), a simple but powerful network designed to produce a segmentation map. This segmentation map can be slightly smaller than the input image but keeps the same spatial structure. The map is composed of several layers (channels), one per class. The goal of the network is to activate, pixel by pixel, the layer of the class that each pixel belongs to.
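To make the "one layer per class" idea concrete, here is a minimal sketch that uses a random tensor as a stand-in for the network output. The sizes and the names `scores` and `pred` are illustrative assumptions, not part of the assignment. Taking the argmax over the class dimension turns the stack of per-class maps into a single map of class indices.

```python
import torch

n_classes, height, width = 21, 64, 85   # assumed: 20 object classes + background

# Stand-in for a network output: one score map per class, for a batch of 2 images.
scores = torch.randn(2, n_classes, height, width)

# For every pixel, pick the class whose map has the highest score.
pred = scores.argmax(dim=1)

print(scores.shape)   # torch.Size([2, 21, 64, 85])
print(pred.shape)     # torch.Size([2, 64, 85])
```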

In [1]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/u-net-architecture.png", width=700)

The network looks like this. The descending part simply consists of convolution layers and pooling. This part of the network moves from the "Where?" information to the "What?" information. The information is then spatially expanded by a so-called "transposed convolution", which resembles a convolution combined with an inverse pooling, followed by regular convolutions. As stated above, there is one output layer per class. Do not trust the drawing on this point: the original version of this network was only designed to answer yes or no, which is why it shows two output layers.
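The effect of pooling and transposed convolution on the spatial size can be checked with plain arithmetic. The sketch below assumes 2x2 max pooling and a transposed convolution with kernel 4, stride 2 and padding 1 (one common choice that doubles the size), and an example width of 85 pixels; the helper names are hypothetical. It shows that an odd size is not recovered exactly on the way up, so the upsampled maps may need padding or resizing before they can be combined with earlier features.

```python
def pooled(size, kernel=2):
    # Max pooling with kernel 2 and stride 2 floors odd sizes.
    return size // kernel

def transposed(size, kernel=4, stride=2, padding=1):
    # Output size of a transposed convolution (dilation 1, no output padding).
    return (size - 1) * stride - 2 * padding + kernel

down = [85]
for _ in range(4):
    down.append(pooled(down[-1]))

up = [down[-1]]
for _ in range(4):
    up.append(transposed(up[-1]))

print(down)  # [85, 42, 21, 10, 5]
print(up)    # [5, 10, 20, 40, 80]
```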

In [2]:
Image(url= "https://miro.medium.com/max/3200/0*mk6U6zQDuoQLK7Ca", width=700)

After each major convolution step, the resulting feature maps are concatenated onto the corresponding stage in the second half of the network (grey arrows), thereby reinjecting the "Where?" information.
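A minimal sketch of one such reinjection, assuming the stacking is a concatenation along the channel dimension. The tensors are random stand-ins and their sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-in feature maps (sizes are illustrative assumptions).
encoder_features = torch.randn(1, 128, 8, 10)   # saved on the way down
bottleneck = torch.randn(1, 256, 4, 5)          # deepest features

# Upsample by a factor of 2 and halve the number of channels.
upsample = nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1)
decoder_features = upsample(bottleneck)         # shape (1, 128, 8, 10)

# Skip connection: stack both feature maps along the channel dimension.
merged = torch.cat([encoder_features, decoder_features], dim=1)
print(merged.shape)  # torch.Size([1, 256, 8, 10])
```

The convolution block that follows the concatenation therefore has to accept the sum of both channel counts as its input.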

## 8.1

You have to reproduce this network yourself. The images used for this work come from the PascalVOC database (http://host.robots.ox.ac.uk/pascal/VOC/). You feed RGB images into your network and get out a "cube" of maps. The labels are single-channel images: the background is represented by 0 and each class by its own unique label (for example, the pixels filled with ones typically represent a plane).

Use dtype=torch.float32 for the images and dtype=torch.long for the masks, and everything should run fine. The criterion should be criterion = nn.CrossEntropyLoss(), because it accepts this type of label (https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss).
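As a hedged illustration of these dtypes: `nn.CrossEntropyLoss` takes float scores of shape (N, C, H, W) and, in its class-index form, a `torch.long` target of shape (N, H, W). If the mask is stored as one binary channel per class, it can be converted with an argmax over the channel dimension. All tensors below are random stand-ins, and the names are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

n_classes = 21

# Stand-in network output (float32 scores) and a random class-index map.
logits = torch.randn(4, n_classes, 64, 85, dtype=torch.float32)
class_map = torch.randint(0, n_classes, (4, 64, 85))

# The same mask stored as one binary uint8 channel per class: (N, C, H, W).
one_hot = nn.functional.one_hot(class_map, n_classes).permute(0, 3, 1, 2).to(torch.uint8)

# Back to class indices of dtype torch.long with shape (N, H, W).
target = one_hot.to(torch.long).argmax(dim=1)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
print(target.dtype, target.shape, loss.item())
```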

Try to start on this early, as training can be slow (about 1 h for 50 epochs with a batch size of 100).

In [1]:
import numpy as np
import sys
np.set_printoptions(threshold=sys.maxsize)
import torch
import torchvision
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
from tqdm import tqdm
from torch.autograd import Function
from torch.utils.data import DataLoader, random_split
from matplotlib import pyplot as plt
from PIL import Image

In [2]:
class VOCSegLoader(torchvision.datasets.VOCSegmentation):
    def __getitem__(self, index):
        """
        Args:
            index (int): Index

        Returns:
            tuple: (image, target) where target is the image segmentation.
        """
        img = Image.open(self.images[index]).convert('RGB')
        target = Image.open(self.masks[index])
        
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        
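        target = np.array(target)
        # 255 marks void/boundary pixels in PascalVOC; treat them as background.
        target[target == 255] = 0
        # One binary map per class (20 object classes + background).
        labels = np.zeros((21, *target.shape))
        for i in range(21):
            labels[i] = (target == i) & 1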
                
        labels = torch.as_tensor(np.asarray(labels, dtype=np.uint8), dtype=torch.uint8)
        return img, labels

In [3]:
n_epochs = 3
batch_size_train = 100
batch_size_val = 100
learning_rate = 0.001
momentum = 0.9
log_interval = 10
image_size = (64, 85)


transform_data = torchvision.transforms.Compose([torchvision.transforms.Resize(image_size), 
                                                 torchvision.transforms.ToTensor()])
transform_label = torchvision.transforms.Compose([torchvision.transforms.Resize(image_size, interpolation=0)])


train_dataset = VOCSegLoader('./data', year='2012', image_set='train', download=True,
                                         transform=transform_data, target_transform=transform_label)
val_dataset = VOCSegLoader('./data', year='2012', image_set='val', download=True,
                                         transform=transform_data , target_transform=transform_label)


train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = batch_size_train)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = batch_size_val)

Using downloaded and verified file: ./data\VOCtrainval_11-May-2012.tar
Using downloaded and verified file: ./data\VOCtrainval_11-May-2012.tar
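The `learning_rate` and `momentum` values defined above suggest an SGD optimizer with momentum. The following is only a sketch of a single training step under that assumption. The model is a hypothetical stand-in (a single 1x1 convolution), and the batch is random data in place of what a data loader would yield.

```python
import torch
import torch.nn as nn
from torch import optim

learning_rate, momentum = 0.001, 0.9   # same values as in the cell above

# Hypothetical stand-in model: a 1x1 convolution from RGB to 21 class maps.
model = nn.Conv2d(3, 21, kernel_size=1)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)

# One random batch; the targets are class indices (all background here).
images = torch.rand(8, 3, 64, 85)
targets = torch.zeros(8, 64, 85, dtype=torch.long)

optimizer.zero_grad()                  # clear gradients from the previous step
loss = criterion(model(images), targets)
loss.backward()                        # backpropagate
optimizer.step()                       # update the weights
print(loss.item())
```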


In [5]:
image, target = train_dataset[0]
print(type(image), image.size())
print(type(target), target.size())

plt.figure()
# plt.imshow(np.asarray(image))
plt.show()

def imshow(inp, title=None):
    inp = inp.numpy().transpose((1, 2, 0))
    plt.figure(figsize=(20, 20))
    plt.axis('off')
    plt.imshow(inp)
    if title is not None:
        plt.title(title)

def show_databatch(inputs):
    out = torchvision.utils.make_grid(inputs)
    imshow(out)

<class 'torch.Tensor'> torch.Size([3, 64, 85])
<class 'torch.Tensor'> torch.Size([21, 64, 85])


<Figure size 432x288 with 0 Axes>

In [11]:
target.numpy()[8].sum()

0

In [13]:
sys.getsizeof(target.numpy())

128

In [None]:
#TODO
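def create_block(c1, c2, c3):
    # Two 3x3 convolutions; padding=1 keeps height and width unchanged.
    return nn.Sequential(
        nn.Conv2d(c1, c2, 3, stride=1, padding=1),
        nn.ReLU(),
        nn.Conv2d(c2, c3, 3, stride=1, padding=1),
        nn.ReLU()
    )
    
class UNet(nn.Module):
    def __init__(self, n_classes=21):
        super(UNet, self).__init__()
        # Contracting path.
        self.conv1 = create_block(3, 16, 16)
        self.conv2 = create_block(16, 32, 32)
        self.conv3 = create_block(32, 64, 64)
        self.conv4 = create_block(64, 128, 128)
        self.conv5 = create_block(128, 256, 256)
        
        # Expanding path: each transposed convolution doubles height and width
        # and halves the number of channels.
        self.tconv1 = nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False)
        self.tconv2 = nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False)
        self.tconv3 = nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False)
        self.tconv4 = nn.ConvTranspose2d(32, 16, 4, 2, 1, bias=False)

        # Decoder blocks (one possible completion): their input is the skip
        # connection concatenated with the upsampled features.
        self.dconv1 = create_block(256, 128, 128)
        self.dconv2 = create_block(128, 64, 64)
        self.dconv3 = create_block(64, 32, 32)
        self.dconv4 = create_block(32, 16, 16)
        self.out = nn.Conv2d(16, n_classes, 1)  # one output map per class

    def up(self, tconv, block, x, skip):
        x = tconv(x)
        # Pooling floors odd sizes (85 -> 42), so resize to the skip's size.
        x = F.interpolate(x, size=skip.shape[2:])
        return block(torch.cat([skip, x], dim=1))
    
    def forward(self, img):
        x1 = self.conv1(img)
        x2 = self.conv2(F.max_pool2d(x1, 2))
        x3 = self.conv3(F.max_pool2d(x2, 2))
        x4 = self.conv4(F.max_pool2d(x3, 2))
        x5 = self.conv5(F.max_pool2d(x4, 2))

        x = self.up(self.tconv1, self.dconv1, x5, x4)
        x = self.up(self.tconv2, self.dconv2, x, x3)
        x = self.up(self.tconv3, self.dconv3, x, x2)
        x = self.up(self.tconv4, self.dconv4, x, x1)
        return self.out(x)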

## 8.2
Once you have done that, redesign the network with the reinjection links (grey arrows in the drawing) removed. You can choose which of them to remove; just try it and tell us whether the network still works, and why.
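Structurally, removing a reinjection link means dropping the concatenation and shrinking the input of the layer that follows it. The sketch below illustrates this on a hypothetical two-level network (`TinyUNet` and its `use_skip` flag are assumptions made for this example, not the assignment's network). It only checks that both variants produce an output of the same shape and says nothing about segmentation quality.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Hypothetical two-level network with an optional skip connection."""

    def __init__(self, n_classes=21, use_skip=True):
        super().__init__()
        self.use_skip = use_skip
        self.down = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.bottom = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1)
        # With the skip connection the head sees 16 + 16 channels, otherwise 16.
        self.head = nn.Conv2d(32 if use_skip else 16, n_classes, kernel_size=1)

    def forward(self, img):
        x1 = self.down(img)
        x = self.up(self.bottom(F.max_pool2d(x1, 2)))
        if self.use_skip:
            x = torch.cat([x1, x], dim=1)   # reinject the high-resolution features
        return self.head(x)

img = torch.randn(1, 3, 64, 84)   # even sizes, so no resizing is needed
for use_skip in (True, False):
    out = TinyUNet(use_skip=use_skip)(img)
    print(use_skip, out.shape)    # both give torch.Size([1, 21, 64, 84])
```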

In [None]:
#TODO

## 8.3 Bonus (to come)