# Lecture 3, Notebook 2: CNN Architectures

Tutorial by Cher Bass
(edited by Emma Robinson)

Let's start by importing the modules and data that we need for the notebook. As before, we begin by testing on the MNIST dataset.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F #contains some useful functions like activation functions & convolution operations you can use

import torchvision
import numpy as np
from torchvision import datasets, models, transforms

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Convert the images to tensors and normalise them from [0, 1] to [-1, 1]
transform = transforms.Compose(
   [transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])])

training = torchvision.datasets.MNIST(root='./data', train=True,
                                       download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(training, batch_size=8,
                                         shuffle=True, num_workers=2)

testing = torchvision.datasets.MNIST(root='./data', train=False,
                                      download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(testing, batch_size=8,
                                        shuffle=False, num_workers=2)

classes = ('0', '1', '2', '3',
          '4', '5', '6', '7', '8', '9')


## ResNet with pytorch

ResNet was first introduced in 2016 (He et al.) as a way to deal with the vanishing gradient problem. This can occur when a network is very deep: the gradients shrink towards zero as they are propagated back through many layers. As a result, the weights of the early layers are barely updated, since their gradients are close to zero.
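To get a feel for how quickly gradients can shrink, the sketch below multiplies together the per-layer derivative factors of a hypothetical chain of sigmoid layers. It assumes, purely for illustration, that every layer contributes the maximum sigmoid derivative of 0.25 and that the weights can be ignored, so it is a bound for a toy model rather than a statement about any particular network. The function name `gradient_bound` is ours.

```python
# Toy illustration: backpropagation multiplies one derivative factor per layer.
# Assumption: every layer is a sigmoid contributing its maximum derivative (0.25),
# and the weights are ignored.
max_sigmoid_grad = 0.25


def gradient_bound(num_layers):
    """Upper bound on the gradient factor after num_layers sigmoid layers."""
    return max_sigmoid_grad ** num_layers


for depth in (2, 10, 20):
    print(f"{depth:2d} layers: gradient factor <= {gradient_bound(depth):.3e}")
```

Even under this optimistic assumption, the factor after 20 layers is many orders of magnitude smaller than after 2 layers.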

ResNets counter this problem with additive skip (residual) connections: the input of each block is added to its output, which allows gradients to flow directly backwards through the network.
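The effect of the additive connection on the gradient can be checked with autograd on a one-parameter toy example. This is only a sketch: the "layer" is a multiplication by a hypothetical, very small weight `w`, which stands in for a path through which almost no gradient flows.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
w = torch.tensor([1e-6])  # hypothetical, very small weight

# Plain layer: y = w * x, so dy/dx = w (almost zero)
(w * x).sum().backward()
grad_plain = x.grad.clone()

x.grad.zero_()  # reset the accumulated gradient before the second experiment

# Residual layer: y = x + w * x, so dy/dx = 1 + w
(x + w * x).sum().backward()
grad_residual = x.grad.clone()

print('plain:', grad_plain.item(), 'residual:', grad_residual.item())
```

The identity term contributes a gradient of 1 regardless of how small the gradient through the layer itself is.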

An example of a resnet block (from the original 2016 paper) is illustrated below:

![resnet-block](imgs/resnet-block.png)
source: https://d2l.ai/chapter_convolutional-modern/resnet.html

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

http://www.pabloruizruiz10.com/resources/CNNs/ResNet-PyTorch.html

https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8


### Using existing ResNet 

Existing networks can be loaded through the PyTorch library torchvision: `torchvision.models` contains networks such as ResNet, AlexNet, VGG, DenseNet, etc.
https://pytorch.org/docs/stable/torchvision/models.html

For example, a pretrained ResNet-18 model can be loaded in PyTorch as follows:
```python
torchvision.models.resnet18(pretrained=True, **kwargs)
```

You can also load a model that hasn't been pretrained in the following way:
```python
torchvision.models.resnet18(pretrained=False, **kwargs)
```

You can find examples of how to use pretrained models in: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

However, you will find that using a pretrained model doesn't always suit your needs. For example, the ResNet models shown above have been trained on RGB images (i.e. they expect 3 input channels), which means that you can't use them without adjustment on grayscale images, or on 3D medical data.


## Exercise 2.1 Programming your own ResNet

The first thing we need to do is implement a `ResidualBlock` class, representing a single ResNet (2016) block. It consists of the following steps:

1. Convolution, followed by batchnorm, followed by relu
2. Convolution, followed by batchnorm 
3. shortcut step, where 
    - the input is first transformed through a strided $1 \times 1$ convolutional operation to match the dimensions of the output of the residual block
    - added to the output of the convolutions. 
4. relu

The only slightly challenging bit here is the first part of the shortcut step. So let's start by ignoring it to create the main body of the residual block. This will work provided we maintain input dimensions. 
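Why do a $3 \times 3$ kernel, `padding=1` and `stride=1` maintain the input dimensions? The sketch below evaluates the output-size formula given in the `nn.Conv2d` documentation (for dilation 1). The helper `conv_output_size` is a hypothetical name used only for this illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 3x3 kernel, padding 1, stride 1: the spatial size is preserved
print(conv_output_size(100, kernel_size=3, stride=1, padding=1))

# the same kernel with stride 2 halves the spatial size
print(conv_output_size(100, kernel_size=3, stride=2, padding=1))
```

With stride 1 the addition in the shortcut step works without any resizing; with a larger stride the input no longer matches the output, which is why the shortcut needs its own transformation.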

#### To do 2.1.1 - Create the Residual  block

As shown above, the residual block performs `Conv2d > BatchNorm2d > ReLU > Conv2d > BatchNorm2d > ADD > ReLU`. Let us create a `ResidualBlock` and define (parametrise) the required `Conv2d` and `BatchNorm2d` steps in the constructor (`__init__`).

Tasks: edit `__init__` to define

1. `self.conv1`: a 2D convolution with arguments `in_channels=channels1, out_channels=channels2, kernel_size=3, stride=res_stride, padding=1, bias=False`. Here, `channels1`, `channels2` and `res_stride` are input arguments to `__init__()`.
2. `self.bn1`: a 2D batchnorm layer with `num_features=channels2`
3. `self.conv2`: the second convolutional layer. *This time the stride should be 1*. What should its input and output channel dimensions be? (note `kernel_size=3, padding=1, bias=False` as before)
4. `self.bn2`: the second 2D batchnorm layer. What does it expect for the number of input features (`num_features`)?

See the PyTorch documentation for each of these classes, including argument names and expected input/output shapes:
https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d
https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm2d

Note that biases are set to `False` in the block because they would be redundant: a constant bias is removed by the batchnorm layer that follows, which has its own learnable shift (see https://discuss.pytorch.org/t/why-does-the-resnet-model-given-by-pytorch-omit-biases-from-the-convolutional-layer/10990/2). Also observe that the ReLU is applied in the forward pass function rather than being defined as a layer in the constructor.
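The claim about biases can be checked numerically. The sketch below (with arbitrary layer sizes chosen for illustration) passes the same batch through a convolution followed by batchnorm, once with the convolution bias and once with the bias set to zero. In training mode batchnorm subtracts the per-channel batch mean, so a constant per-channel offset should make no difference up to floating point error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 8, 8)

conv = nn.Conv2d(3, 5, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(5)
bn.train()  # training mode: normalise with the statistics of the current batch

with torch.no_grad():
    out_with_bias = bn(conv(x))
    conv.bias.zero_()                 # remove the bias
    out_without_bias = bn(conv(x))

same = torch.allclose(out_with_bias, out_without_bias, atol=1e-4)
print(same)
```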


In [None]:
class ResidualBlock(nn.Module):

    def __init__(self, channels1,channels2,res_stride=1):
        super(ResidualBlock, self).__init__()
        self.inplanes=channels1
        # Exercise 2.1.1 construct the block without shortcut
        self.conv1 = None
        self.bn1 = None
        self.conv2 = None
        self.bn2 = None

        if res_stride != 1 or channels2 != channels1:
        # Exercise 2.1.3 the shortcut; create option for resizing input 
            self.shortcut=nn.Sequential()
        else:
            self.shortcut=nn.Sequential()
            

    def forward(self, x):
        
        # forward pass: Conv2d > BatchNorm2d > ReLU > 
        #Conv2D >  BatchNorm2d > ADD > ReLU
        out=self.conv1(x)
        out=self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # THIS IS WHERE WE ADD THE INPUT
        #print('input shape',x.shape,self.inplanes)
        out += self.shortcut(x)
       # print('res block output shape',  out.shape)
        # final ReLu
        out = F.relu(out)

        return out

Note the steps of the forward pass. The network performs the first convolution (followed by batchnorm and relu), then the second convolution (followed just by batchnorm). Then it adds the input. Here, the input is passed through `self.shortcut`, an `nn.Sequential()` which we have currently left empty (an empty `nn.Sequential` returns its input unchanged). Finally, the last operation of the block is a relu.

#### To do 2.1.2. Perform a test forward pass (keeping input and output dimensions constant)

1. Instantiate an instance of class `ResidualBlock` (create a network called `blk`)
2. Create a random tensor of size $5 \times 3 \times 100 \times 100$, which matches the input dimensions $(N, C_{in}, H, W)$ expected by `nn.Conv2d`
3. Pass the input through a forward pass of a block with 3 input channels and 3 output channels

**hint** look at how this was done in the last lecture. Remember - we don't need to explicitly call the forward function.
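As a reminder of the calling convention, here is a minimal sketch using a plain `nn.Conv2d` layer instead of the `ResidualBlock` (so that the exercise itself is left to you). Calling the module object runs its `forward` method.

```python
import torch
import torch.nn as nn

layer = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1)
x = torch.randn(5, 3, 100, 100)   # (N, C_in, H, W)

y = layer(x)   # calling the module runs layer.forward(x)
print(y.shape)
```

The same pattern applies to any `nn.Module`, including the classes defined in this notebook.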

In [None]:
##  Student To do 

#### To do 2.1.3. Implement the shortcut 

Now, let us go back and edit the function to support resizing the input. This will allow us to downsample and change the number of feature dimensions within our residual block.

Change line 14 in `ResidualBlock.__init__()` (the `self.shortcut` assignment inside the `if` branch) to implement a Sequential block with two steps:
1. A $1 \times 1$ `nn.Conv2d` layer with `stride=res_stride, bias=False`. The stride supports changes of spatial dimensions, and the $1 \times 1$ convolution supports changes of feature dimensions. What should your input and output channels be so that the shortcut output matches the output of the residual block?
2. A batchnorm layer. Think carefully about the input dimension.

Once you have done this, test the network again, but this time change the number of output channels.
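To see why a strided $1 \times 1$ convolution is a suitable resizing step, the sketch below compares its output shape with that of a strided $3 \times 3$ convolution with padding 1, as used in the main path of the block. The channel count of 16 and the stride of 2 are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(5, 3, 100, 100)

# main path: 3x3 convolution that changes both channels and spatial size
main_path = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1, bias=False)
# shortcut path: 1x1 convolution with the same stride and output channels
one_by_one = nn.Conv2d(3, 16, kernel_size=1, stride=2, bias=False)

main_shape = tuple(main_path(x).shape)
shortcut_shape = tuple(one_by_one(x).shape)
print(main_shape, shortcut_shape)
```

Because the two shapes agree, the outputs can be added element-wise.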
 

In [None]:
##  Student To do 

We now have all the building blocks we need to build a residual network. In what follows we will construct a ResNet with four residual layers. Each layer will contain 2 residual blocks.

#### To do 2.1.4 : Complete the Residual Network class

**Step 1** Following the definition in the original paper, the network starts with a convolutional layer with a $7 \times 7$ kernel, followed by a batchnorm. However, as we intend to test on MNIST (whose images are very small), let's change the $7 \times 7$ kernel to a $3 \times 3$ one (**check how this is implemented**).

**Step 2 (Student complete)** Comment the function `_make_layer`. What is each line doing? Complete the class constructor, using `_make_layer` to create 4 residual layers,  with `num_blocks` residual blocks per layer, `num_strides[i]` strides per block (where $i$ indexes the layer, starting from the initial convolution) and `num_features[i]` represents the number of output channels per layer.

**Step 4 (Student complete)** The last layer is a fully connected (softmax) layer. Complete this function. The number of inputs must match the number of outputs from the previous layer and the number of outputs must match the number of classes.


In [None]:
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_strides, num_features, in_channels, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = num_features[0]

        # step 1. Initialising the network with a 3 x 3 conv and batch norm
        self.conv1 = nn.Conv2d(in_channels, num_features[0], kernel_size=3, stride=num_strides[0], padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(num_features[0])
        # Step 2: TO DO Using function _make_layer() create 4 residual layers
        # the number of blocks per layer is given by input argument num_blocks (an integer)
        self.layer1 = None
        self.layer2 = None
        self.layer3 = None
        self.layer4 = None
        self.linear = None

    def _make_layer(self, block, planes, num_blocks, stride):
        layers = []
        
        for i in range(num_blocks - 1):
            layers.append(block(self.in_planes, planes))
            self.in_planes = planes 
        
        layers.append(block(self.in_planes, planes, stride))
        self.in_planes = planes
        
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        #print('init',out.shape)
        out = self.layer1(out)
        #print('layer1',out.shape)
        out = self.layer2(out)
       # print('layer2',out.shape)

        out = self.layer3(out)
        #print('layer3',out.shape)

        out = self.layer4(out)
       # print('layer4',out.shape)

        
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
    

**Step 5** Observe below the creation of an instance of class `ResNet`. This requires as an argument the `ResidualBlock` class defined above. We hard code the arguments for the number of blocks, as well as the lists defining the number of strides and features per layer. These lists have length 5 so that they also cover the initial convolution.
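It can help to trace how the spatial size of an MNIST image changes under the strides used here. The sketch below is plain arithmetic, not part of the network: it assumes a $3 \times 3$ convolution with padding 1 wherever a stride is applied, that each stride in the list is applied once per layer, and that the final pooling uses a $4 \times 4$ window as in `forward`. The helper name `conv3x3_size` is ours.

```python
def conv3x3_size(size, stride):
    """Output size of a 3x3 convolution with padding 1."""
    return (size + 2 - 3) // stride + 1


size = 28                        # MNIST images are 28 x 28
sizes = []
for stride in [1, 1, 2, 2, 2]:   # initial convolution, then the four layers
    size = conv3x3_size(size, stride)
    sizes.append(size)
print(sizes)

# F.avg_pool2d(out, 4): a 4 x 4 window with stride 4
pooled = (size - 4) // 4 + 1
print(pooled)
```

After pooling, each channel is reduced to a single value, so the flattened vector passed to the final linear layer has one entry per channel of the last residual layer.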

In [None]:
def my_ResNet4(in_channels=1):
    return ResNet(ResidualBlock,2, [1,1,2,2,2], [64,64,128,256,512], in_channels=in_channels)

#### To do 2.1.5:  run your ResNet on MNIST for classification
Create a ResNet network and run with the same code as above for classification, and then test.
Remember to define your loss function, optimizer (we suggest you first try SGD with lr=0.001 and momentum=0.9), dataloaders, and your resnet network. 
Then run the training and testing, as before.
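As a reminder of the order of operations in one optimisation step, here is a minimal sketch on a toy problem. The linear model and the random batch are stand-ins invented for illustration (they are not the ResNet or MNIST); only the loss, the optimiser settings and the sequence of calls mirror what is suggested above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 3)                      # stand-in for the ResNet
criterion = nn.CrossEntropyLoss()            # expects raw scores and integer labels
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

data = torch.randn(8, 4)                     # fake batch of 8 samples
labels = torch.randint(0, 3, (8,))           # fake class labels

losses = []
for step in range(50):
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = criterion(model(data), labels)    # forward pass and loss
    loss.backward()                          # back-propagate
    optimizer.step()                         # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```

In the real loop the batch comes from `train_loader`, and both the data and the labels need to be moved to `device` first.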



In [None]:
#-----------------------------------------------------task 4 -----------------------------------------------------
# Task 4: Train and test ResNet on MNIST dataset for classification
# hints: define your resnet network, loss function, optimizer and dataloaders. 
# Then you can run the same training and testing code as above.
# ----------------------------------------------------------------------------------------------------------------

In [None]:
import torch.optim as optim

resnet = None
resnet = resnet.to(device)

loss_fun = None
loss_fun = loss_fun.to(device)

optimizer = None

In [None]:
epochs = 1
for epoch in range(epochs): 

    # enumerate can be used to output iteration index i, as well as the data 
    for i, (data, labels) in enumerate(train_loader, 0):
        # Student to complete

        
        
        
        
        
        
        
        
        
        # print statistics (expecting loss to be output to variable `loss`)
        ce_loss = loss.item()
        if i % 10 == 0:
            print('[%d, %5d] loss: %.3f' %
                 (epoch + 1, i + 1, ce_loss))


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

#make an iterator from test_loader
#Get a batch of testing images
test_iterator = iter(test_loader)
images, labels = next(test_iterator)

images = images.to(device)
labels = labels.to(device)

y_score = resnet(images)
# get predicted class from the class probabilities
_, y_pred = torch.max(y_score, 1)

print('Predicted: ', ' '.join('%5s' % classes[y_pred[j]] for j in range(8)))
rows = 2
columns = 4
# plot y_score - true label (t) vs predicted label (p)
fig2 = plt.figure()
for i in range(8):
    fig2.add_subplot(rows, columns, i+1)
    plt.title('t: ' + classes[labels[i].cpu()] + ' p: ' + classes[y_pred[i].cpu()])
    img = images[i] / 2 + 0.5     # this is to unnormalize the image
    img = torchvision.transforms.ToPILImage()(img.cpu())
    plt.axis('off')
    plt.imshow(img)
plt.show()


In [None]:
y_true = labels.data.cpu().numpy()
y_pred = y_pred.data.cpu().numpy()

In [None]:
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average='macro')
precision = precision_score(y_true, y_pred, average='macro')
recall = recall_score(y_true, y_pred, average='macro')
print('accuracy:', accuracy, ', f1 score:', f1, ', precision:', precision, ', recall:', recall)

## (Optional Exercise) Use ResNet for classification - CIFAR10

Use the built-in torchvision ResNet for RGB images and train it for classification on the CIFAR10 dataset.

Here are some example images from the CIFAR10 dataset, which has 10 classes:

![cifar10](imgs/cifar10.jpg)
source: https://appliedmachinelearning.blog/2018/03/24/achieving-90-accuracy-in-object-recognition-task-on-cifar-10-dataset-with-keras-convolutional-neural-networks/

You can load the CIFAR10 dataset using torchvision in the following way:
```python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=8,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=8,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
```
You can use this tutorial as a reference for training on CIFAR10 - https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

Remember to define your loss function, optimizer, dataloaders, and your resnet network. 
Then run the training and testing, same as with MNIST.

First, import the PyTorch ResNet

In [None]:
from torchvision import models

resnet_cifar = models.resnet18(pretrained=True)
resnet_cifar = resnet_cifar.to(device)

In [None]:
#-----------------------------------------------------task 5 -----------------------------------------------------
# Task 5: Train and test ResNet on CIFAR10 dataset for classification
# hints: define your resnet network, loss function, optimizer and dataloaders. 
# Then you can run the same training and testing code as above.
# ----------------------------------------------------------------------------------------------------------------

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=8,
                                          shuffle=True, num_workers=2)

testset = None
test_loader = None

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
import torch.optim as optim

loss_fun = None
loss_fun = loss_fun.to(device)

optimizer = None

In [None]:
epochs = 10
for epoch in range(epochs): 

    # enumerate can be used to output iteration index i, as well as the data 
    for i, (data, labels) in enumerate(train_loader, 0):
        # Student to complete
        
        
        
        
        
        
        
        # print statistics
        ce_loss = loss.item()
        if i % 10 == 0:
            print('[%d, %5d] loss: %.3f' %
                 (epoch + 1, i + 1, ce_loss))


In [None]:
#make an iterator from test_loader
#Get a batch of testing images
test_iterator = iter(test_loader)
images, labels = next(test_iterator)
images = images.to(device)
labels = labels.to(device)

y_score = resnet_cifar(images)
# get predicted class from the class probabilities
_, y_pred = torch.max(y_score, 1)

print('Predicted: ', ' '.join('%5s' % classes[y_pred[j]] for j in range(8)))

# plot y_score - true label (t) vs predicted label (p)
fig2 = plt.figure()
for i in range(8):
    fig2.add_subplot(rows, columns, i+1)
    plt.title('t: ' + classes[labels[i].cpu()] + ' p: ' + classes[y_pred[i].cpu()])
    img = images[i] / 2 + 0.5     # this is to unnormalize the image
    img = torchvision.transforms.ToPILImage()(img.cpu())
    plt.axis('off')
    plt.imshow(img)
plt.show()


In [None]:
y_true = labels.data.cpu().numpy()
y_pred = y_pred.data.cpu().numpy()

In [None]:
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average='macro')
precision = precision_score(y_true, y_pred, average='macro')
recall = recall_score(y_true, y_pred, average='macro')
print('accuracy:', accuracy, ', f1 score:', f1, ', precision:', precision, ', recall:', recall)

## Exercise 3.2.2 Image segmentation with pytorch using U-net

U-net was first developed in 2015 by Ronneberger et al., as a segmentation network for biomedical image analysis.
It has been extremely successful, with 9,000+ citations, and many newer methods build on the U-net architecture.


The architecture of U-net is based on the idea of using skip connections (i.e. concatenation) at different levels of the network to retain both high- and low-level features.

Here is the architecture of a U-net:

---

![U-net](imgs/unet.png)
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.

### Two-photon microscopy dataset of cortical axons

In this tutorial we use a dataset of cortical neurons with their corresponding segmentation binary labels.

These images were collected using in-vivo two-photon microscopy from the mouse somatosensory cortex. To generate the 2D images, a max projection was used over the 3D stack. The labels are binary segmentation maps of the axons.

Here we will use 100 [64x64] crops during training and validation. 

These are some example images [256x256] from the original dataset:
![axon_dataset](imgs/axon_dataset.png)

Bass, Cher, et al. "Image synthesis with a convolutional capsule generative adversarial network." Medical Imaging with Deep Learning (2019).


In [4]:
#load modules
from __future__ import print_function
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
import torch
from torch.autograd import Variable
from AxonDataset import AxonDataset
import torch.nn as nn
from torch.utils.data.sampler import SubsetRandomSampler
import time
import torch.nn.functional as F
import torchvision.utils as vutils
import os
import matplotlib.pyplot as plt


In [None]:
# Setting parameters
timestr = time.strftime("%d%m%Y-%H%M")
__location__ = os.path.realpath(
    os.path.join(os.getcwd(), os.path.dirname('__file__')))

print(__location__)

path = os.path.join(__location__,'results')
if not os.path.exists(path):
    os.makedirs(path)
    
# Define your batch_size
batch_size = 16


### Creating a dataloader

In this example, a custom dataloader was created, and we import it from `AxonDataset.py`

We utilise the `torch.utils.data.sampler.SubsetRandomSampler` to create two DataLoaders for train and validation. Here, a random subset of 20% of the subject indices is selected for validation, and the remaining 80% are used for training. The lists of train and validation indices are passed to `torch.utils.data.sampler.SubsetRandomSampler` to create bespoke train/validation samplers; these are passed to the `DataLoader` using the argument `sampler`, and replace the default use of `shuffle`.

#### 3.2.1 Create a list of random indices for train and validation sets (**hint** use np.random.choice)

Check you understand how the bespoke samplers are implemented
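The mechanics of the samplers can be tried out on a toy dataset first. In the sketch below the dataset of 10 fake samples and all variable names are invented for illustration; the split is drawn with `np.random.choice` (without replacement) and the remaining indices are found with `np.setdiff1d`, which is one of several ways to do this.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

toy_dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))  # 10 fake samples

all_idx = np.arange(len(toy_dataset))
val_idx = np.random.choice(all_idx, size=2, replace=False)  # 20% for validation
trn_idx = np.setdiff1d(all_idx, val_idx)                    # the rest for training

# each sampler only ever draws from its own list of indices
trn_loader = DataLoader(toy_dataset, batch_size=4,
                        sampler=SubsetRandomSampler(trn_idx.tolist()))
vl_loader = DataLoader(toy_dataset, batch_size=4,
                       sampler=SubsetRandomSampler(val_idx.tolist()))

n_train = sum(batch[0].shape[0] for batch in trn_loader)
n_val = sum(batch[0].shape[0] for batch in vl_loader)
print(n_train, n_val)
```

Both loaders wrap the same dataset object; only the indices they are allowed to visit differ.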

In [None]:
#First we create a dataloader for our example dataset- two photon microscopy with axons
axon_dataset = AxonDataset(data_name='org64', type='train')

# -----------------------------------------------------task 1----------------------------------------------------------------
# Task 1: create random lists of indices for training and validation with an 80%/20% split

# We need to further split our training dataset into training and validation sets.
# Define the indices
indices = None # start with all the indices in training set
split = None # define the split size

# Get indices for train and validation datasets, and split the data
validation_idx = None
train_idx = None
# ----------------------------------------------------------------------------------------------------------------------------

# feed indices into the sampler
train_sampler = SubsetRandomSampler(train_idx)
validation_sampler = SubsetRandomSampler(validation_idx)

# Create a dataloader instance 
train_loader = torch.utils.data.DataLoader(axon_dataset, batch_size = batch_size,
                                           sampler=train_sampler) 
val_loader = torch.utils.data.DataLoader(axon_dataset, batch_size = batch_size,
                                        sampler=validation_sampler) 


## Build a U-net 

We next build our u-net network.

First we define a layer `double_conv` that performs two sets of convolution followed by ReLU. This is set up as an `nn.Sequential` block.

In [None]:
# define U-net
def double_conv(in_channels, out_channels, padding=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=padding),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=padding),
        nn.ReLU(inplace=True)
    )


Next we need to define how we perform a downsample and an upsample step. The original U-net performs downsampling through a $2 \times 2$ max pool (however, strided convolutions are equally viable). Upsampling is performed through use of `nn.Upsample` (https://pytorch.org/docs/stable/nn.html#torch.nn.Upsample), which interpolates the data to a higher resolution grid. The function expects arguments `scale_factor` and (interpolation) `mode`. There are several options for the interpolation mode; we recommend bilinear. In this example we upsample by a `scale_factor` of 2 each time (to match the $2 \times 2$ max pool used during downsampling).
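The effect of the two operations on tensor shapes can be checked directly. In this sketch the input size and channel count are arbitrary, and `align_corners=False` is passed explicitly (it is the documented default behaviour for bilinear mode).

```python
import torch
import torch.nn as nn

maxpool = nn.MaxPool2d(2)
upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

x = torch.randn(1, 32, 64, 64)
down = maxpool(x)      # halves height and width
up = upsample(down)    # doubles them again
print(down.shape, up.shape)
```

The shape is restored by upsampling, but the values are interpolated: the detail discarded by the max pool is not recovered.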

Thus, in what follows, a single level of encoding can be represented as:

`conv1 = self.dconv_down1(x)
 conv1 = self.dropout(conv1)
 x = self.maxpool(conv1)`
        
In other words a double convolution followed by a maxpool. Here, a dropout layer is inserted between the convolutional layer and the maxpool for regularisation. An alternative approach is to insert a batchnorm between the `nn.Conv2d` and the `nn.ReLU` e.g. https://github.com/milesial/Pytorch-UNet

A single level of decoding might be represented as:

`deconv4 = self.upsample(conv5)
 deconv4  = self.dconv_up4(deconv4)
 deconv4 = self.dropout(deconv4)`
 
However, we are missing something vital...

### Skip connections

The U-net is a symmetric network with equal numbers of encoding and decoding layers. These form pairs where the spatial dimensions of each encoder/decoder layer in the pair are consistent.

A key feature of the U-net is that, to support segmentation of sharp boundaries with preservation of high spatial resolution features, it is necessary to pass features learnt during encoding across the network. The theory is that the early layers, with their small receptive fields, learn the high spatial frequency information (i.e. they act as edge detectors and/or texture filters). As the receptive field increases during encoding, spatial specificity is lost, but spatial localisation (where class relevant objects broadly are in the image) is gained. In order to import the high spatial frequency information of the early encoding layers into the final decoding layers, the *activations* learnt during encoding are directly concatenated onto the upsampled activations of the paired decoding layer.

In other words, the first decoding layer (which for a 5-layer U-Net is the layer that directly follows the bottleneck `conv5`) is:

`deconv4 = self.upsample(conv5)
 deconv4 = torch.cat([deconv4, conv4], dim=1)
 deconv4  = self.dconv_up4(deconv4)
 deconv4 = self.dropout(deconv4)`
 
The activations (output) of the encoder layer `conv4` are concatenated directly onto the output of `self.upsample`, where concatenation is performed along the channel axis (`dim=1`). We now put this all together.
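Concatenation along the channel axis adds up the channel counts, which is why the decoder convolutions take more input channels than the upsampled tensor alone provides. A minimal sketch with random tensors (the batch size and spatial size are arbitrary; the channel counts follow the 512-channel bottleneck and 256-channel `conv4` used in this notebook):

```python
import torch

upsampled = torch.randn(2, 512, 8, 8)   # stands in for self.upsample(conv5)
conv4 = torch.randn(2, 256, 8, 8)       # stands in for the paired encoder activations

merged = torch.cat([upsampled, conv4], dim=1)   # dim=1 is the channel axis
print(merged.shape)
```

The spatial dimensions of the two tensors must match exactly for `torch.cat` to succeed; only the concatenation axis may differ.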

## Exercise 2.2. Building a U-Net

We then define our U-net network.

We first initialise all the different layers in the network in `__init__`:
1. `self.dconv_down1` is a double convolutional layer (defined above)
2. `self.maxpool` is a max pooling layer that is used to reduce the size of the input, and increase the receptive field
3. `self.upsample` is an upsampling layer that is used to increase the size of the input
4. `self.dropout` is a dropout layer that is applied to regularise the training
5. `self.dconv_up4` is also a double convolutional layer - note that it takes in additional channels from previous layers (i.e. the skip connections).


### To do 2.2.1  complete the forward pass

1. Following the example for `conv1`, complete encoder layers 2, 3 and 4. How many features does each layer have?
2. Complete layer `conv5`; this is the bottleneck layer (the bottom of the network) and thus has no maxpool.
3. Using the upsampling and skip connection example above, implement the decoder layers `deconv4`, `deconv3`, `deconv2`, `deconv1`. Note - you should be concatenating the activations of the paired layers from the encoding path; these are the activations with the matching spatial dimensions (see above notes for more details)
4. We are expecting (binary) class labels as output for each pixel; thus the output requires a sigmoid transformation. Check that you understand what this does.
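As a quick illustration of the sigmoid step, the sketch below applies `torch.sigmoid` to a few made-up scores and thresholds the result at 0.5 to obtain a binary mask. The threshold is an assumption for this example; the network itself only outputs the values between 0 and 1.

```python
import torch

logits = torch.tensor([[-3.0, 0.0], [0.5, 4.0]])   # made-up raw network outputs

probs = torch.sigmoid(logits)       # squashes each value into the range (0, 1)
mask = (probs > 0.5).float()        # hypothetical threshold to obtain binary labels

print(probs)
print(mask)
```

A raw score of 0 maps to exactly 0.5, negative scores map below 0.5 and positive scores above it.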

In [None]:

class UNet(nn.Module):

    def __init__(self):
        super().__init__()
        
        self.dconv_down1 = double_conv(1, 32)
        self.dconv_down2 = double_conv(32, 64)
        self.dconv_down3 = double_conv(64, 128)
        self.dconv_down4 = double_conv(128, 256)
        self.dconv_down5 = double_conv(256, 512)

        self.maxpool = nn.MaxPool2d(2)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear')
        self.dropout = nn.Dropout2d(0.5)
        self.dconv_up4 = double_conv(256 + 512, 256)
        self.dconv_up3 = double_conv(128 + 256, 128)
        self.dconv_up2 = double_conv(128 + 64, 64)
        self.dconv_up1 = double_conv(64 + 32, 32)

        self.conv_last = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        
        #######   ENCODER ###############
        
        conv1 = self.dconv_down1(x)
        conv1 = self.dropout(conv1)
        x = self.maxpool(conv1)

        # --------------------------------------------------- task 2.2.1 ----------------------------------------------------------
        # implement encoder layers conv2, conv3 and conv4
        
        

        # --------------------------------------------------- task 2.2.2 ----------------------------------------------------------
        # implement bottleneck layer conv5
        
        conv5 = self.dconv_down5(x)
        conv5 = self.dropout(conv5)
        # ---------------------------------------------------------------------------------------------------------------------
       
        #######   DECODER ###############
        
        # --------------------------------------------------- task 2.2.3 ----------------------------------------------------------
        # Implement the decoding layers
        

        #---------------------------------------------------------------------------------------------------------------------
        out = torch.sigmoid(self.conv_last(deconv1))  # F.sigmoid is deprecated

        return out

To save time we initialise the network with the weights of a previously trained network.

*For practical reasons: training this network from scratch would take too long and require large computational resources.*

In [None]:
# initialise network - and load weights
net = UNet()
#net.load_state_dict(torch.load(path+'/'+'model.pt')) #this function loads a pretrained network
net.load_state_dict(torch.load(path+'/'+'model.pt',map_location=torch.device('cpu')))

## Defining an appropriate loss function
We next define our loss function - in this case we use Dice loss, a commonly used loss for image segmentation.

The Dice coefficient can be used as a loss function, and is essentially a measure of overlap between two samples.

Dice is in the range of 0 to 1, where a Dice coefficient of 1 denotes perfect and complete overlap. The Dice coefficient was originally developed for binary data, and can be calculated as:

$Dice = \dfrac{2|A\cap B|}{|A| + |B|}$

where $|A\cap B|$ represents the number of elements common to sets $A$ and $B$, and $|A|$ represents the number of elements in set $A$ (and likewise for set $B$).

For the case of evaluating a Dice coefficient on predicted segmentation masks, we can approximate $|A\cap B|$ as the sum of the element-wise product of the prediction and target masks.
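A small worked example with two made-up $2 \times 2$ binary masks shows the calculation. This sketch uses NumPy and the plain formula above, without the smoothing used in the loss function further down.

```python
import numpy as np

pred = np.array([[1, 1], [0, 0]], dtype=float)     # |A| = 2
target = np.array([[1, 0], [0, 0]], dtype=float)   # |B| = 1

intersection = (pred * target).sum()               # element-wise product, summed: 1
dice = 2 * intersection / (pred.sum() + target.sum())
print(dice)
```

Only one pixel is shared between the two masks, so the Dice coefficient is $2 \cdot 1 / (2 + 1) = 2/3$.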

An **alternative loss** function would be pixel-wise cross entropy loss. It would examine each pixel individually, comparing the class predictions (depth-wise pixel vector) to our one-hot encoded target vector.
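For a binary segmentation with one output channel, pixel-wise cross entropy reduces to binary cross entropy. PyTorch offers two variants that differ in the input they expect: `nn.BCELoss` takes probabilities (values after a sigmoid), while `nn.BCEWithLogitsLoss` takes raw scores and applies the sigmoid internally. The sketch below, using made-up values, checks that the two agree when each is given the input it expects.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, -1.0], [0.0, 3.0]])   # made-up raw scores per pixel
target = torch.tensor([[1.0, 0.0], [0.0, 1.0]])    # binary ground truth mask

loss_from_logits = nn.BCEWithLogitsLoss()(logits, target)
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), target)

print(loss_from_logits.item(), loss_from_probs.item())
```

Which variant is appropriate therefore depends on whether the network's forward pass already ends in a sigmoid.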


In [None]:
# dice loss
def dice_coeff(pred, target):
    """This definition generalises to real-valued pred and target tensors,
    and should be differentiable.
    pred: tensor with first dimension as batch
    target: tensor with first dimension as batch
    """

    smooth = 1.
    epsilon = 10e-8

    # we have to call contiguous() since the tensors may come from a torch.view op
    iflat = pred.contiguous().view(-1)
    tflat = target.contiguous().view(-1)
    intersection = (iflat * tflat).sum()

    A_sum = torch.sum(iflat * iflat)
    B_sum = torch.sum(tflat * tflat)

    dice = (2. * intersection + smooth) / (A_sum + B_sum + smooth)
    dice = dice.mean(dim=0)
    dice = torch.clamp(dice, 0, 1.0-epsilon)

    return  dice

# binary cross entropy loss
# nn.BCELoss expects probabilities; the network already applies a sigmoid to its output
loss_BCE = nn.BCELoss()


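Here the smoothing term `smooth` is added to both the numerator and the denominator to prevent division by zero, for example when both the prediction and the target mask are empty.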
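To see why this term matters, the sketch below compares a Dice computation with and without smoothing on two empty masks. The helper names `dice_no_smooth` and `dice_smooth` are hypothetical and only used for this illustration; they mirror the structure of `dice_coeff` without the final clamping.

```python
import torch

def dice_no_smooth(pred, target):
    # plain Dice: undefined (0/0) when both masks are empty
    inter = (pred * target).sum()
    return 2. * inter / ((pred * pred).sum() + (target * target).sum())

def dice_smooth(pred, target, smooth=1.):
    # smoothed Dice: the extra term keeps the ratio well defined
    inter = (pred * target).sum()
    return (2. * inter + smooth) / ((pred * pred).sum() + (target * target).sum() + smooth)

# Both prediction and target contain no foreground pixels
empty = torch.zeros(1, 1, 4, 4)

print(dice_no_smooth(empty, empty))  # 0/0 evaluates to nan
print(dice_smooth(empty, empty))     # smooth/smooth evaluates to 1
```

With smoothing, two empty masks are treated as a perfect match instead of producing `nan`, which would otherwise propagate into the gradients.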

As before, we define the optimiser to train our network - here we use Adam.


In [None]:
#define your optimiser
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=1e-05, betas=(0.5, 0.999))
optimizer.zero_grad()
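
The `filter(lambda p: p.requires_grad, ...)` expression passes only trainable parameters to the optimiser. A minimal sketch of the effect, assuming a toy two-layer model (`toy`) in place of the U-Net and a made-up squared-output loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model standing in for the U-Net
toy = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 1, 3, padding=1),
)

# Freeze the first convolution so that it is excluded from training
for p in toy[0].parameters():
    p.requires_grad = False

# Only parameters with requires_grad=True are handed to Adam
trainable = list(filter(lambda p: p.requires_grad, toy.parameters()))
toy_optimizer = torch.optim.Adam(trainable, lr=1e-05, betas=(0.5, 0.999))

frozen_before = toy[0].weight.clone()
last_before = toy[2].weight.clone()

# One optimisation step on random input with an illustrative loss
loss = toy(torch.randn(2, 1, 8, 8)).pow(2).mean()
loss.backward()
toy_optimizer.step()

print(torch.equal(toy[0].weight, frozen_before))  # frozen layer is unchanged
print(torch.equal(toy[2].weight, last_before))    # trainable layer has moved
```

If no parameters are frozen, the filter simply returns every parameter, so the call behaves like passing `net.parameters()` directly.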


## Training and evaluating our segmentation network
We next train and evaluate our network.

Note that the results are saved to the folder \results, so please check there.

In [None]:
epochs=10
save_every=10
all_error = np.zeros(0)
all_error_L1 = np.zeros(0)
all_error_dice = np.zeros(0)
all_dice = np.zeros(0)
all_val_dice = np.zeros(1)
all_val_error = np.zeros(0)

for epoch in range(epochs):

    ##########
    # Train
    ##########
    t0 = time.time()
    for i, (data, label) in enumerate(train_loader):
        
        # setting your network to train mode enables training-time behaviour of layers
        # such as dropout (gradients are computed in either mode)
        net.train()
        net.zero_grad()

        target_real = torch.ones(data.size()[0])
        batch_size = data.size()[0]
        pred = net(data)
        
        # dice loss = 1-dice_coeff
        # ----------------------------------------------- task 3 ------------------------------------------------------------
        # Task 3: change loss function here
        err = 1 - dice_coeff(pred, label)
        # err = loss_BCE(pred, label)
        # -------------------------------------------------------------------------------------------------------------------

        dice_value = dice_coeff(pred, label).item()

        err.backward()
        optimizer.step()
        optimizer.zero_grad()

        time_elapsed = time.time() - t0
        print('[{:d}/{:d}][{:d}/{:d}] Elapsed_time: {:.0f}m{:.0f}s Loss: {:.4f} Dice: {:.4f}'
              .format(epoch, epochs, i, len(train_loader), time_elapsed // 60, time_elapsed % 60,
                      err.item(), dice_value))

        if i % save_every == 0:
            # setting your network to eval mode to remove dropout during testing
            net.eval()

            vutils.save_image(data.data, '%s/epoch_%03d_i_%03d_train_data.png' % (path, epoch, i),
                                  normalize=True)
            vutils.save_image(label.data, '%s/epoch_%03d_i_%03d_train_label.png' % (path, epoch, i),
                                  normalize=True)
            vutils.save_image(pred.data, '%s/epoch_%03d_i_%03d_train_pred.png' % (path, epoch, i),
                                  normalize=True)

            error = err.item()

            all_error = np.append(all_error, error)
            all_dice = np.append(all_dice, dice_value)

    # #############
    # # Validation
    # #############
    mean_error = np.zeros(0)
    mean_dice = np.zeros(0)
    t0 = time.time()
    for i, (data, label) in enumerate(val_loader):

        net.eval()
        batch_size = data.size()[0]

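        # Variable wrappers are no longer needed in current PyTorch;
        # gradients are not required for validation, so autograd is disabled
        with torch.no_grad():
            pred = net(data)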
        
        # ----------------------------------------------- task 3 ------------------------------------------------------------
        # Task 3: change loss function here
        err = 1-dice_coeff(pred, label)
        # err = loss_BCE(pred, label)
        # -------------------------------------------------------------------------------------------------------------------

        # compare the predicted mask to the label - Dice is the validation metric
        dice_value = dice_coeff(pred, label).item()

        if i == 0:
            vutils.save_image(data.data, '%s/epoch_%03d_i_%03d_val_data.png' % (path, epoch, i),
                              normalize=True)
            vutils.save_image(label.data, '%s/epoch_%03d_i_%03d_val_label.png' % (path, epoch, i),
                              normalize=True)
            vutils.save_image(pred.data, '%s/epoch_%03d_i_%03d_val_pred.png' % (path, epoch, i),
                              normalize=True)

        error = err.item()
        mean_error = np.append(mean_error, error)
        mean_dice = np.append(mean_dice, dice_value)

    all_val_error = np.append(all_val_error, np.mean(mean_error))
    all_val_dice = np.append(all_val_dice, np.mean(mean_dice))

    time_elapsed = time.time() - t0

    print('Elapsed_time: {:.0f}m{:.0f}s Val dice: {:.4f}'
          .format(time_elapsed // 60, time_elapsed % 60, mean_dice.mean()))
    
    
    num_it_per_epoch_train = ((train_loader.dataset.x_data.shape[0] * (1 - 0.2)) // (
            save_every * batch_size)) + 1
    epochs_train = np.arange(1,all_error.size+1) / num_it_per_epoch_train
    epochs_val = np.arange(0,all_val_dice.size)

    plt.figure()
    plt.plot(epochs_val, all_val_dice, label='dice_val')
    plt.xlabel('epochs')
    plt.legend()
    plt.title('Dice score')
    plt.savefig(path + '/dice_val.png')
    plt.close()



## Results 
The results are saved to the folder \results, so please check there.

The results are saved per epoch for both training and validation, as the
1. real data, 
2. binary labels, 
3. predicted labels. 

In this example, since we trained on a small sample of the data (100 crops), the results are far from optimal, and the network is likely to overfit to the data.

### Task 3

1. Change the Dice loss to a cross entropy loss in the code - is Dice loss or cross entropy loss better?
2. Run the training with dropout - what's the effect?
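
As background for the second question, the sketch below shows how a dropout layer behaves in train and eval mode, which is what `net.train()` and `net.eval()` switch between. It uses a standalone `nn.Dropout` layer with an assumed rate of `p=0.5` rather than the U-Net itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Standalone dropout layer; p=0.5 is an assumed rate for illustration
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

# Train mode: each element is zeroed with probability p,
# and the surviving elements are scaled by 1 / (1 - p)
drop.train()
y_train = drop(x)

# Eval mode: dropout is switched off and the input passes through unchanged
drop.eval()
y_eval = drop(x)

print(y_train.unique())         # only the values 0 and 2 appear
print(torch.equal(y_eval, x))   # True
```

This is why the network should be in eval mode whenever validation scores are computed: otherwise the predictions would be randomly perturbed by dropout.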

**Note down your dice validation scores for each experiment, then change**
