# Homework 3, exercise 2 - Residual Neural Network on CIFAR10

In this exercise we implement a (slightly modified) ResNet as introduced in [this paper](https://arxiv.org/pdf/1512.03385.pdf).

Group 6
Jardi Timmerhuis, Patrick Vine, Ryan Sijstermans

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import time

For this exercise it is recommended to use the GPU!

In [None]:

use_cuda = True

if use_cuda and torch.cuda.is_available():
  device = torch.device('cuda')
else:
  device = torch.device('cpu')

device

device(type='cuda')

### Load the CIFAR10 dataset

In [None]:
import torchvision
import torchvision.transforms as transforms

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = torchvision.datasets.CIFAR10(root='./data_cifar', train=True,
                                        download=True, transform=transform_train)

testset = torchvision.datasets.CIFAR10(root='./data_cifar', train=False,
                                       download=True, transform=transform_test)

batch_size = 128

c, w, h = 3, 32, 32

trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=batch_size,
                                          shuffle=True)

testloader = torch.utils.data.DataLoader(testset,
                                         batch_size=batch_size,
                                         shuffle=False)  # Evaluation does not need shuffling

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data_cifar/cifar-10-python.tar.gz


Extracting ./data_cifar/cifar-10-python.tar.gz to ./data_cifar
Files already downloaded and verified


## Exercise - Implement a Residual Block

Residual neural networks mainly consist of components called Residual Blocks. One residual block can be expressed as **y** = *F*(**x**) + **x** where **x** and **y** are the input and output of the block, respectively. So the input **x** is added to the result of *F*(**x**) using a *skip connection*. In this exercise, *F* consists of:
* a convolutional layer with `in_channels` input channels, `hidden_channels` output channels, a kernel size of (3, 3), a stride of 1, padding of 1 and no bias parameter.
* a batch normalisation layer 
* ReLU activation
* a convolutional layer with `hidden_channels` input channels, `out_channels` output channels, a kernel size of (3, 3), a stride of 1, padding of 1 and no bias parameter.
* a batch normalisation layer

After this the `skip_connection` is applied. If the dimensions of *F*(**x**) and **x** don't match, an extra linear projection is applied to **x** so that they do. This has already been implemented for you; you only need to call it in the right place.
Finally, a ReLU activation is applied to the output **y**.
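
To make the projection case concrete, here is a minimal sketch of what happens when the channel counts differ. The sizes (64 and 128 channels, a 16x16 feature map) and the single-convolution stand-in for *F* are assumptions chosen for illustration, not the block required by the exercise.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration: 64 input channels, 128 output channels.
in_channels, out_channels = 64, 128

x = torch.randn(2, in_channels, 16, 16)  # (batch, channels, height, width)

# Stand-in for F: a single 3x3 convolution that changes the channel count.
f = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)

# A 1x1 convolution acts as a per-pixel linear projection from 64 to 128 channels.
projection = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False)

fx = f(x)                 # F(x) has 128 channels
shortcut = projection(x)  # x projected to 128 channels so the shapes match
y = torch.relu(fx + shortcut)

print(fx.shape, shortcut.shape, y.shape)
```

Without the projection, `fx + x` would fail here because the channel dimensions (128 and 64) cannot be added element-wise.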


In [None]:
class ResidualBlock(nn.Module):

  def __init__(self, in_channels, hidden_channels, out_channels):
    super().__init__()

    # Complete the code here!
    self.conv1 = nn.Conv2d(in_channels, hidden_channels, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(hidden_channels)

    self.conv2 = nn.Conv2d(hidden_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(out_channels)

    if in_channels != out_channels:  # F(x) and x dimensions do not match! Define a projection for input x
      self.skip_connection = nn.Sequential(
          nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False),
          nn.BatchNorm2d(out_channels)
      )
    else:
      self.skip_connection = nn.Identity()  # The dimensions already match! No need to do a projection on x

  def forward(self, x):
    out = self.conv1(x)
    out = self.bn1(out)
    out = F.relu(out)

    out = self.conv2(out)
    out = self.bn2(out)
    out = out + self.skip_connection(x)
    out = F.relu(out)
    
    return out


  

## Exercise - Implement a Residual Neural Network
Now you can use the previously defined Residual Block to create your ResNet.

The network consists of:
* a convolutional layer with `in_channels` input channels, 64 output channels, a stride of 1, padding of 1 and no bias parameter,
* a batch normalisation layer
* ReLU activation
* a max pooling layer with kernel size (3, 3), a stride of 2 and padding of 1,
* eight residual blocks, with (64, 64, 128, 128, 256, 256, 512, 512) channels, respectively (see code below) 
* an average pooling layer over all feature maps (already present)
* a dense layer to form the output distribution (already present)
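
As a sanity check on the spatial sizes, the standard output-size formula for convolution and pooling layers can be evaluated by hand. The sketch below assumes a 3x3 kernel for the first convolution (the list above does not fix a kernel size) and a hypothetical helper name `conv_output_size`.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32  # CIFAR10 images are 32x32

# First convolution: assumed 3x3 kernel, stride 1, padding 1.
after_conv = conv_output_size(size, kernel_size=3, stride=1, padding=1)

# Max pooling: kernel (3, 3), stride 2, padding 1.
after_pool = conv_output_size(after_conv, kernel_size=3, stride=2, padding=1)

print(after_conv, after_pool)  # 32 16
```

Under these assumptions the residual blocks (3x3 kernels, stride 1, padding 1) keep the 16x16 resolution, so the final average pooling runs over 16x16 feature maps.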

In [None]:
class ResNet(nn.Module):

  def __init__(self, in_channels, out_size):
    super().__init__()

    # Complete the code here!
    # Kernel size: 3x3 seems like a reasonable choice.
    # Also tried:
    #   5x5          - trains slower with similar results
    #   7x7 stride 2 - what was used in the paper; faster training, results not as good
    #   5x5 stride 2 - faster training, results not as good
    self.conv1 = nn.Conv2d(in_channels, 64, stride=1, kernel_size=3, padding=1, bias=False) 
    self.bn1 = nn.BatchNorm2d(64)
    self.pool1 = nn.MaxPool2d(3, stride=2, padding=1)


    self.res_blocks = nn.ModuleList(
        [
         ResidualBlock(64, 64, 64),
         ResidualBlock(64, 64, 64),
         
         ResidualBlock(64, 128, 128),
         ResidualBlock(128, 128, 128),
         
         ResidualBlock(128, 256, 256),
         ResidualBlock(256, 256, 256),

         ResidualBlock(256, 512, 512),
         ResidualBlock(512, 512, 512),
        ]
    )

    self.dense_layer = nn.Linear(512, out_size)
    
    for module in self.modules():
      if isinstance(module, nn.Conv2d):
          nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')

  def forward(self, x):  

    # Complete the code here!
    # Add everything that needs to be done before the average pooling
    x = self.conv1(x)
    x = self.bn1(x)
    x = F.relu(x)
    x = self.pool1(x)

    for res_block in self.res_blocks:
      x = res_block(x)

    x = F.avg_pool2d(x, x.shape[2:])
    
    x = x.view(x.size(0), -1)
    x = self.dense_layer(x)

    return x



### Initialize the network, Loss function and Optimizer

In [None]:
net = ResNet(c, len(classes)).to(device)

criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

## Exercise - Train/evaluate the network
Train the network you built using the code below. Answer the following questions in your report:
* What test accuracy were you able to get?
* How many layers does your network have? (counting only convolutional and dense layers)
* Why do the skip connections help for training deep neural networks?
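
For the layer-count question, one possible approach is to walk over the modules of a model and count the convolutional and dense ones. The helper name `count_conv_and_dense_layers` and the toy model below are hypothetical and only illustrate the idea.

```python
import torch.nn as nn


def count_conv_and_dense_layers(model):
    """Count nn.Conv2d and nn.Linear modules anywhere inside `model`."""
    return sum(isinstance(m, (nn.Conv2d, nn.Linear)) for m in model.modules())


# Hypothetical toy model, used only to show the helper in action.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1, bias=False),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

toy_count = count_conv_and_dense_layers(toy)
print(toy_count)  # 3
```

Applied to a ResNet, a helper like this also counts the 1x1 projection convolutions in the skip connections, so it is worth stating in the report whether those are included in the total.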

In [None]:
start=time.time()

for epoch in range(0,200):

  net.train()  # Put the network in train mode
  for i, (x_batch, y_batch) in enumerate(trainloader):
    x_batch, y_batch = x_batch.to(device), y_batch.to(device)  # Move the data to the device that is used
    
    optimizer.zero_grad()  # Set all currently stored gradients to zero

    y_pred = net(x_batch)

    loss = criterion(y_pred, y_batch)

    loss.backward()

    optimizer.step()

    # Compute relevant metrics
    
    y_pred_max = torch.argmax(y_pred, dim=1)  # Get the labels with highest output probability

    correct = torch.sum(torch.eq(y_pred_max, y_batch)).item()  # Count how many are equal to the true labels

    elapsed = time.time() - start  # Keep track of how much time has elapsed

    # Show progress every 20 batches
    if not i % 20:
      # Divide by the actual batch size; the last batch of an epoch can be smaller
      print(f'epoch: {epoch}, time: {elapsed:.3f}s, loss: {loss.item():.3f}, train accuracy: {correct / y_batch.size(0):.3f}')

  net.eval()  # Put the network in eval mode
  correct_total = 0  # Reset the test counter once per epoch
  with torch.no_grad():  # No gradients are needed for evaluation
    for i, (x_batch, y_batch) in enumerate(testloader):
      x_batch, y_batch = x_batch.to(device), y_batch.to(device)  # Move the data to the device that is used

      y_pred = net(x_batch)
      y_pred_max = torch.argmax(y_pred, dim=1)

      correct_total += torch.sum(torch.eq(y_pred_max, y_batch)).item()

  print(f'Accuracy on the test set: {correct_total / len(testset):.3f}')




epoch: 0, time: 0.411s, loss: 2.506, train accuracy: 0.070
epoch: 0, time: 6.150s, loss: 1.824, train accuracy: 0.297
epoch: 0, time: 11.899s, loss: 1.680, train accuracy: 0.352
epoch: 0, time: 17.655s, loss: 1.641, train accuracy: 0.359
epoch: 0, time: 23.438s, loss: 1.483, train accuracy: 0.406
epoch: 0, time: 29.232s, loss: 1.482, train accuracy: 0.461
epoch: 0, time: 35.041s, loss: 1.414, train accuracy: 0.492
epoch: 0, time: 40.894s, loss: 1.273, train accuracy: 0.578
epoch: 0, time: 46.788s, loss: 1.246, train accuracy: 0.539
epoch: 0, time: 52.665s, loss: 1.391, train accuracy: 0.508
epoch: 0, time: 58.580s, loss: 1.332, train accuracy: 0.555
epoch: 0, time: 64.467s, loss: 1.320, train accuracy: 0.531
epoch: 0, time: 70.385s, loss: 1.196, train accuracy: 0.555
epoch: 0, time: 76.302s, loss: 1.197, train accuracy: 0.539
epoch: 0, time: 82.226s, loss: 1.192, train accuracy: 0.617
epoch: 0, time: 88.156s, loss: 1.179, train accuracy: 0.586
epoch: 0, time: 94.122s, loss: 1.272, trai

In [None]:
net.eval()  # Make sure batch normalisation uses its running statistics
correct_total = 0

with torch.no_grad():  # No gradients are needed for evaluation
  for i, (x_batch, y_batch) in enumerate(testloader):
    x_batch, y_batch = x_batch.to(device), y_batch.to(device)  # Move the data to the device that is used

    y_pred = net(x_batch)
    y_pred_max = torch.argmax(y_pred, dim=1)

    correct_total += torch.sum(torch.eq(y_pred_max, y_batch)).item()

print(f'Accuracy on the test set: {correct_total / len(testset):.3f}')

Accuracy on the test set: 0.899
