<a href="https://colab.research.google.com/github/chendingyan/Pytorch_Exercise/blob/master/460cw2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Coursework2: Convolutional Neural Networks 

## instructions

Please submit a version of this notebook containing your answers **together with your trained model** on CATe as CW2.zip. Write your answers in the cells below each question.

A PDF version of this notebook is also provided in case the figures do not render correctly.

**The deadline for submission is 19:00, Thu 14th February, 2019**

### Setting up working environment 

For this coursework you will need to train a large network, therefore we recommend you work with Google Colaboratory, which provides free GPU time. You will need a Google account to do so. 

Please log in to your account and go to the following page: https://colab.research.google.com. Then upload this notebook.

For GPU support, go to "Edit" -> "Notebook Settings", and select "Hardware accelerator" as "GPU".

You will need to install pytorch by running the following cell:

In [0]:
!pip install torch torchvision

Collecting pillow>=4.1.1 (from torchvision)
[?25l  Downloading https://files.pythonhosted.org/packages/0d/f3/421598450cb9503f4565d936860763b5af413a61009d87a5ab1e34139672/Pillow-5.4.1-cp27-cp27mu-manylinux1_x86_64.whl (2.0MB)
[K    100% |████████████████████████████████| 2.0MB 10.0MB/s 
[31mfastai 0.7.0 has requirement torch<0.4, but you'll have torch 1.0.0 which is incompatible.[0m
[?25hInstalling collected packages: pillow
  Found existing installation: Pillow 4.0.0
    Uninstalling Pillow-4.0.0:
      Successfully uninstalled Pillow-4.0.0
Successfully installed pillow-5.4.1


## Introduction

For this coursework you will implement one of the most commonly used model for image recognition tasks, the Residual Network. The architecture is introduced in 2015 by Kaiming He, et al. in the paper ["Deep residual learning for image recognition"](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf). 
<br>

In a residual network, each block contains some convolutional layers, plus "skip" connections, which allow the activations to by pass a layer, and then be summed up with the activations of the skipped layer. The image below illustrates a building block in residual networks.

![resnet-block](resnet-block.png)

Depending on the number of building blocks, resnets can have different architectures, for example ResNet-50, ResNet-101 and etc. Here you are required to build ResNet-18 to perform classification on the CIFAR-10 dataset, therefore your network will have the following architecture:

![resnet](utils/resnet.png)

## Part 1 (40 points)

In this part, you will use basic pytorch operations to define the 2D convolution and max pooling operation. 

### YOUR TASK

- implement the forward pass for Conv2D and MaxPool2D
- You can only fill in the parts which are specified as "YOUR CODE HERE"
- You are **NOT** allowed to use the torch.nn module and the conv2d/maxpooling functions in torch.nn.functional

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [3]:
def run():
    in_channels = 2
    out_channels = 3
    size = 4
    torch.manual_seed(123)
    X = torch.rand(1, in_channels, size, size)
    conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, padding=1, bias=False)
    out = conv(X)
    print('out', out)
    print('out.size()', out.size())
    print('')

    Xunfold = F.unfold(X, kernel_size=3, padding=1)
    print('X.size()', X.size())
    print('Xunfold.size()', Xunfold.size())

    kernels_flat = conv.weight.data.view(out_channels, -1)
    print('kernels_flat.size()', kernels_flat.size())

#     res = torch.mul(kernels_flat, Xunfold)
    res = kernels_flat @ Xunfold
    res = res.view(1, out_channels, size, size)
    print('res', res)
    print('res.size()', res.size())


run()

out tensor([[[[ 0.1043,  0.1039,  0.0015,  0.0616],
          [ 0.3468,  0.1187,  0.2039,  0.1497],
          [ 0.3702,  0.1094,  0.3367, -0.0799],
          [ 0.2783,  0.0509,  0.2248, -0.0560]],

         [[-0.2539, -0.1130, -0.5706, -0.2353],
          [-0.1587, -0.1629, -0.2579, -0.2327],
          [-0.1034, -0.2095, -0.2072, -0.2556],
          [-0.0035, -0.2657, -0.1382, -0.2666]],

         [[-0.1875, -0.1812, -0.0619, -0.1588],
          [ 0.1102, -0.0302,  0.1760, -0.0813],
          [ 0.0375, -0.1669, -0.2365, -0.2954],
          [ 0.2358,  0.1364, -0.0573,  0.0137]]]],
       grad_fn=<MkldnnConvolutionBackward>)
out.size() torch.Size([1, 3, 4, 4])

X.size() torch.Size([1, 2, 4, 4])
Xunfold.size() torch.Size([1, 18, 16])
kernels_flat.size() torch.Size([3, 18])
res tensor([[[[ 0.1043,  0.1039,  0.0015,  0.0616],
          [ 0.3468,  0.1187,  0.2039,  0.1497],
          [ 0.3702,  0.1094,  0.3367, -0.0799],
          [ 0.2783,  0.0509,  0.2248, -0.0560]],

         [[-0.2539, -

In [45]:
class Conv2D(nn.Module):
    
    def __init__(self, inchannel, outchannel, kernel_size, stride, padding, bias = True):
        
        super(Conv2D, self).__init__()
        
        self.inchannel = inchannel
        self.outchannel = outchannel
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        
        self.weights = nn.Parameter(torch.Tensor(outchannel, inchannel, 
                                                 kernel_size, kernel_size))
        self.weights.data.normal_(-0.1, 0.1)

        if bias:
            self.bias = nn.Parameter(torch.Tensor(outchannel, ))
            self.bias.data.normal_(-0.1, 0.1)
        else:
            self.bias = None
  
    def forward(self, x):
        
        ##############################################################
        #                       YOUR CODE HERE                       #       
        ##############################################################
        print(x.shape)
        input = F.unfold(x, kernel_size = self.kernel_size, padding = self.padding, stride = self.stride)
        print(input)
        output = 0.0
#         for batch_x in range(x.shape[0]):

#           width = (x.shape[3] - self.kernel_size + 2* self.padding)/self.stride + 1

# #           output = nn.Parameter(torch.Tensor(x.shape[0],self.outchannel, width, width))
#           output = torch.zeros((x.shape[0], self.outchannel, width, width), device ='cuda')
#           input = torch.zeros((self.inchannel, x.shape[3] + 2*self.padding, x.shape[3] + 2*self.padding))
#           input[:, self.padding: x.shape[3]+ self.padding, self.padding: x.shape[3]+self.padding ] = x[batch_x]
#           print(input)
#           input = input.unfold(0, self.kernel_size, self.stride)
#           print(input)
#           start_pixel_h = 0
#           end_pixel_h = start_pixel_h +self.kernel_size
#           for i in range(width):
#             start_pixel_w = 0
#             end_pixel_w = start_pixel_w + self.kernel_size
#             for j in range(width):
#               matrix_result = torch.mul(self.weights.type(torch.cuda.FloatTensor), input[:, start_pixel_h:end_pixel_h, start_pixel_w:end_pixel_w].type(torch.cuda.FloatTensor))
# #               print(matrix_result.type())
              
#               if self.bias:
#                 output[batch_x, :, i, j] = torch.sum(matrix_result).type(torch.cuda.FloatTensor) + self.bias[:]
#               else:
#                 output[batch_x, :, i, j] = torch.sum(matrix_result).type(torch.cuda.FloatTensor)
#               start_pixel_w+=self.stride
#               end_pixel_w = start_pixel_w +self.kernel_size
#             start_pixel_h+= self.stride
#             end_pixel_h = start_pixel_h + self.kernel_size
#           for channel in range(self.outchannel):
#             for i in range(width):
#               for j in range(width):
#                 sum_channel = 0
#                 for inchan in range(self.inchannel):
#                   sum = 0
#                   for m in range(self.kernel_size):
#                     for n in range(self.kernel_size):
#                       sum += self.weights[channel][inchan][m][n] * input[inchan][i+m][j+n]
#                   sum_channel += sum
#                 output[batch_x][channel][i][j] = sum_channel
#                 if self.bias:
#                   output[batch_x][channel][i][j] += self.bias[channel]
        print('Finish one conv2d')

        ##############################################################
        #                       END OF YOUR CODE                     #
        ##############################################################
        

        return output
conv2d = Conv2D(1,1,3,1,1,True)
x = torch.tensor([[1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,1,1,0,0,1,1,0,0]])
x=x.view(1,1,5,5)
print(x.shape[3])
output = conv2d(x)
print (output)

5
torch.Size([1, 1, 5, 5])


KeyError: ignored

In [26]:
class MaxPool2D(nn.Module):
    
    def __init__(self, pooling_size):
        # assume pooling_size = kernel_size = stride
        
        super(MaxPool2D, self).__init__()
        
        self.pooling_size = pooling_size
        

    def forward(self, x):
        
        
        ##############################################################
        #                       YOUR CODE HERE                       #       
        ##############################################################
        for batch_x in range(x.shape[0]): 
          
          width = x.shape[3]/self.pooling_size
#             output = nn.Parameter(torch.Tensor(x.shape[0], x.shape[1], width, width), device='cuda')
          for channel in range(x.shape[1]):
            output = torch.zeros((x.shape[0], x.shape[1], width, width), device ='cuda')
            start_pixel_h = 0
            end_pixel_h = start_pixel_h + self.pooling_size
            for i in range(width):
              start_pixel_w = 0
              end_pixel_w = start_pixel_w + self.pooling_size
              for j in range(width):

                max = torch.max(x[batch_x, channel, start_pixel_h: end_pixel_h, start_pixel_w: end_pixel_w])

                output[batch_x][channel][i][j] = max

                start_pixel_w += self.pooling_size
                end_pixel_w = start_pixel_w + self.pooling_size
              start_pixel_h += self.pooling_size
              end_pixel_h = start_pixel_h + self.pooling_size
        print('Finish one batch pooling')
#                 max = x[batch_x][channel][self.pooling_size*i][self.pooling_size*j]
#                 for m in range(self.pooling_size):
#                   for n in range(self.pooling_size):
#                     if max < x[batch_x][channel][self.pooling_size*i+m][self.pooling_size*j+n]:
#                       max = x[batch_x][channel][self.pooling_size*i+m][self.pooling_size*j+n]
#                 output[batch_x][channel][i][j] = max
                


        ##############################################################
        #                       END OF YOUR CODE                     #
        ##############################################################
                
        
        return output
x = torch.tensor([[ 0,  1,  2,  3, 4,  5,  6,  7, 8,  9, 10, 11, 12, 13, 14, 15]])
x = x.view(1,1,4,4)
print(x)
maxpool = MaxPool2D(2)
output = maxpool(x)
print(output)

tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])
Finish one batch pooling
tensor([[[[ 5.,  7.],
          [13., 15.]]]], device='cuda:0')


In [0]:
# define resnet building blocks

class ResidualBlock(nn.Module): 
    def __init__(self, inchannel, outchannel, stride=1): 
        
        super(ResidualBlock, self).__init__() 
        
        self.left = nn.Sequential(Conv2D(inchannel, outchannel, kernel_size=3, 
                                         stride=stride, padding=1, bias=False), 
                                  nn.BatchNorm2d(outchannel), 
                                  nn.ReLU(inplace=True), 
                                  Conv2D(outchannel, outchannel, kernel_size=3, 
                                         stride=1, padding=1, bias=False), 
                                  nn.BatchNorm2d(outchannel)) 
        
        self.shortcut = nn.Sequential() 
        
        if stride != 1 or inchannel != outchannel: 
            
            self.shortcut = nn.Sequential(Conv2D(inchannel, outchannel, 
                                                 kernel_size=1, stride=stride, 
                                                 padding = 0, bias=False), 
                                          nn.BatchNorm2d(outchannel) ) 
            
    def forward(self, x): 
        
        out = self.left(x) 
        
        out += self.shortcut(x) 
        
        out = F.relu(out) 
        print('Finish one residual block')
        return out


In [0]:
# define resnet

class ResNet(nn.Module):
    
    def __init__(self, ResidualBlock, num_classes = 10):
        
        super(ResNet, self).__init__()
        
        self.inchannel = 64
        self.conv1 = nn.Sequential(Conv2D(3, 64, kernel_size = 3, stride = 1,
                                            padding = 1, bias = False), 
                                  nn.BatchNorm2d(64), 
                                  nn.ReLU())
        
#         self.conv1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias = False),
#                                   nn.BatchNorm2d(64), 
#                                   nn.ReLU())
        
        self.layer1 = self.make_layer(ResidualBlock, 64, 2, stride = 1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride = 2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride = 2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride = 2)
        self.maxpool = MaxPool2D(4)
        self.fc = nn.Linear(512, num_classes)
        
    
    def make_layer(self, block, channels, num_blocks, stride):
        
        strides = [stride] + [1] * (num_blocks - 1)
        
        layers = []
        
        for stride in strides:
            
            layers.append(block(self.inchannel, channels, stride))
            
            self.inchannel = channels
            
        return nn.Sequential(*layers)
    
    
    def forward(self, x):
        x = self.conv1(x)
        print('Finish first convolution layer')
        x = self.layer1(x)
        print('Finish first Resnet layer')
        x = self.layer2(x)
        print('Finish second Resnet layer')
        x = self.layer3(x)
        print('Finish third Resnet layer')
        x = self.layer4(x)
        print('Finish fourth Resnet layer')
        x = self.maxpool(x)
        
        x = x.view(x.size(0), -1)

        x = self.fc(x)
        
        return x
    
    
def ResNet18():
    return ResNet(ResidualBlock)

## Part 2 (40 points)

In this part, you will train the ResNet-18 defined in the previous part on the CIFAR-10 dataset. Code for loading the dataset, training and evaluation are provided. 

### Your Task

1. Train your network to achieve the best possible test set accuracy after a maximum of 10 epochs of training.

2. You can use techniques such as optimal hyper-parameter searching, data pre-processing

3. If necessary, you can also use another optimiser

4. **Answer the following question:**
Given such a network with a large number of trainable parameters, and a training set of a large number of data, what do you think is the best strategy for hyperparameter searching? 

**YOUR ANSWER FOR 2.4 HERE**

A:

In [32]:
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset

import numpy as np

import torchvision.transforms as T


transform = T.ToTensor()


# load data

NUM_TRAIN = 49000
print_every = 100


data_dir = './data'
cifar10_train = dset.CIFAR10(data_dir, train=True, download=True, transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10(data_dir, train=True, download=True, transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10(data_dir, train=False, download=True, transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)


USE_GPU = True
dtype = torch.float32 

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


In [0]:
import time
def check_accuracy(loader, model):
    # function for test accuracy on validation and test set
    
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device
            y = y.to(device=device, dtype=torch.long)
            scores = model(x.cuda())
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))


def train_part(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        print(len(loader_train))
        start = time.time()
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x.cuda())
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            loss.backward()

            # Update the parameters of the model using the gradients
            optimizer.step()
            print('Start')
            if t % print_every == 0:
                print('Epoch: %d, Iteration %d, loss = %.4f ,time = %.4f sec' % (e, t, loss.item(),(time.time() - start)))
                check_accuracy(loader_val, model)
                print()

In [34]:
# code for optimising your network performance

##############################################################
#                       YOUR CODE HERE                       #       
##############################################################



##############################################################
#                       END OF YOUR CODE                     #
##############################################################


# define and train the network
model = ResNet18()

optimizer = optim.Adam(model.parameters())

train_part(model, optimizer, epochs = 1)


# report test set accuracy

check_accuracy(loader_test, model)


# save the model
torch.save(model.state_dict(), 'model.pt')

766
Finish one conv2d
Finish first convolution layer
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish first Resnet layer
Finish one conv2d
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish second Resnet layer
Finish one conv2d
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish third Resnet layer
Finish one conv2d
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish one conv2d
Finish one conv2d
Finish one residual block
Finish fourth Resnet layer
Finish one batch pooling
Start
Epoch: 0, Iteration 0, loss = 2.2964 ,time = 78.9193 sec
Checking accuracy on validation set
Finish one conv2d
Finish first convolution layer


KeyboardInterrupt: ignored

In [0]:
## Part 3 (20 points)

The code provided below will allow you to visualise the feature maps computed by different layers of your network. Run the code (install matplotlib if necessary) and **answer the following questions**: 

1. Compare the feature maps from low-level layers to high-level layers, what do you observe? 

2. Use the training log, reported test set accuracy and the feature maps, analyse the performance of your network. If you think the performance is sufficiently good, explain why; if not, what might be the problem and how can you improve the performance?

3. What are the other possible ways to analyse the performance of your network?

**YOUR ANSWER FOR PART 3 HERE**

A:

In [0]:
#!pip install matplotlib

import matplotlib.pyplot as plt

plt.tight_layout()


activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

vis_labels = ['conv1', 'layer1', 'layer2', 'layer3', 'layer4']

for l in vis_labels:

    getattr(model, l).register_forward_hook(get_activation(l))
    
    
data, _ = cifar10_test[0]
data = data.unsqueeze_(0).to(device = device, dtype = dtype)

output = model(data)



for idx, l in enumerate(vis_labels):

    act = activation[l].squeeze()

    if idx < 2:
        ncols = 8
    else:
        ncols = 32
        
    nrows = act.size(0) // ncols
    
    fig, axarr = plt.subplots(nrows, ncols)
    fig.suptitle(l)


    for i in range(nrows):
        for j in range(ncols):
            axarr[i, j].imshow(act[i * nrows + j].cpu())
            axarr[i, j].axis('off')

**=============== END OF CW2 ===============**