# Residual Network Experiments in PyTorch

**|| Jonty Sinai ||** 13-04-2019

To date, ResNet is one of the most successful neural network architectures in computer vision. Using what's called a skip connection, ResNet mitigates vanishing gradients and allows information to propagate through much deeper networks.

In [1]:
%matplotlib inline

import os
import re  # we'll use this later to process layer type keys in an OrderedDict 
import random

from collections import OrderedDict

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
from torchvision import transforms

import matplotlib.pyplot as plt


HOME = os.environ['AI_HOME']
ROOT = os.path.join(HOME, 'artificial_neural_networks')
DATA = os.path.join(ROOT, 'data')
MNIST = os.path.join(DATA, 'mnist')
CIFAR10 = os.path.join(DATA, 'cifar10')

random.seed(1901)
np.random.seed(1901)
torch.manual_seed(1901)

<torch._C.Generator at 0x7f3f4c12e6d0>

## Deep Residual Networks

The key idea behind ResNet is to observe that if we have two networks, a shallow network and its deeper counterpart, then the deeper network should perform no worse than the shallower network if the additional layers are identity mappings from the input. 

In particular (and as outlined in the paper referenced below), let the mapping computed by a block with a skip connection be denoted by $\mathcal{H}(x)$. The layers inside the block learn the _residual mapping_ $\mathcal{F}(x)$, which differs from $\mathcal{H}(x)$ by the identity:

\begin{align}
    \mathcal{F}(x) = \mathcal{H}(x) - x
\end{align}

Rearranging gives the mapping computed by the deeper network:

\begin{align}
    \mathcal{H}(x) = \mathcal{F}(x) + x
\end{align}

The picture looks something like this:

<img src="./assets/resnet_skip_connection.png" width="300">

source: [Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, _Deep Residual Learning for Image Recognition_, 2015](https://arxiv.org/abs/1512.03385)

So how does this help solve the vanishing gradient problem? Suppose that the layers with mapping $\mathcal{F}$ degrade, so that the input is mapped to approximately zero. Then:

\begin{align}
    \mathcal{F}(x) \approx 0 \ \Rightarrow \ \mathcal{F}(x) + x \approx x
\end{align}

This means that the residual network $\mathcal{H}$ can be used to "reset" the input if $\mathcal{F}$ has degraded:

\begin{align}
    \mathcal{H}(x) \approx x
\end{align}
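
As a minimal sketch of this "reset" behaviour, the snippet below zeroes the weights of a single convolution standing in for $\mathcal{F}$ and checks that $\mathcal{H}(x) = \mathcal{F}(x) + x$ returns the input. The layer sizes and the use of a single `nn.Conv2d` are illustrative assumptions, not the architecture used later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# F(x): a single 3x3 convolution with same padding (illustrative stand-in)
residual_fn = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# simulate a fully "degraded" F by zeroing its weights and bias
nn.init.zeros_(residual_fn.weight)
nn.init.zeros_(residual_fn.bias)

x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    f_x = residual_fn(x)   # F(x) is zero everywhere
    h_x = f_x + x          # H(x) = F(x) + x

print(f_x.abs().max().item())    # largest entry of F(x)
print(torch.allclose(h_x, x))    # whether H(x) matches the input
```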

With this in mind, suppose that we have stacks of such residual networks, called _residual blocks_, mapping into each other:

\begin{align}
x \rightarrow \mathcal{H}_1 \rightarrow \mathcal{H}_2 \cdots \rightarrow \mathcal{H}_L \rightarrow y
\end{align}

If any one of these intermediate blocks degrades, its input is simply reset and passed on to the next block. This allows information to propagate through a deep network even if some weights shrink towards zero.
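
To illustrate why stacking such blocks helps gradients, here is a toy scalar sketch (an assumption for illustration, far simpler than a convolutional network). Each "layer" multiplies by a small weight `w`: the plain chain computes `h = w * h`, while the residual chain computes `h = h + w * h`. Autograd then gives the derivative of the output with respect to the input.

```python
import torch

def input_gradient(num_layers, w, residual):
    """Return d(output)/d(input) for a chain of scalar layers."""
    x = torch.tensor(1.0, requires_grad=True)
    h = x
    for _ in range(num_layers):
        f = w * h                       # F(h): a layer with a small weight
        h = h + f if residual else f    # with or without the skip connection
    h.backward()
    return x.grad.item()

plain_grad = input_gradient(num_layers=20, w=0.1, residual=False)
skip_grad = input_gradient(num_layers=20, w=0.1, residual=True)

print(f"plain chain:    {plain_grad:.3e}")
print(f"residual chain: {skip_grad:.3e}")
```

In the plain chain the gradient is `w` raised to the number of layers, which shrinks towards zero. In the residual chain each layer contributes a factor of `1 + w`, so the identity path keeps the gradient from vanishing.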

An example of how these stacked blocks look and map to the output is shown below:

<img src="./assets/resnet_architectures.png" width="800">

source: [Andrew Ng, Coursera](https://www.coursera.org/learn/convolutional-neural-networks/home/welcome)

Notice how ResNet only uses two pooling layers, one at the start and one after the last residual block.

## Residual Block

We will see how we can improve on CIFAR-10 using a simple architecture with two resnet blocks. A key implementation detail is that both $x, \mathcal{F}(x) \in \mathbb{R}^d$ so that we can add them. This means that all convolutional layers will need same padding and the same number of intermediate channels.

In [114]:
class ResBlock(nn.Module):
    
    def __init__(self, kernel_size, num_channels, depth, stride=1, input_size=None):
        super().__init__()
        # same padding calculation
        if stride > 1:
            if input_size is None:
                raise ValueError("input size is needed to calculate same padding when stride > 1")
            same_padding = round((kernel_size + (stride - 1) * input_size - stride) / 2)
        else:
            same_padding = round((kernel_size - 1) / 2)
        # successive convolutions
        self.convolution_block = nn.ModuleList(
            [nn.Conv2d(num_channels, num_channels, kernel_size, stride, same_padding) for _ in range(depth)])
        
    def forward(self, x):
        h = x.clone()
        for convolution in self.convolution_block:
            h = F.relu(convolution(h))
        return h + x


### Unit test: ResBlock

In [116]:
res_block = ResBlock(kernel_size=3,
                     num_channels=3,
                     depth=4, 
                     stride=2,
                     input_size=10)

print(res_block)

ResBlock(
  (convolution_block): ModuleList(
    (0): Conv2d(3, 3, kernel_size=(3, 3), stride=(2, 2), padding=(6, 6))
    (1): Conv2d(3, 3, kernel_size=(3, 3), stride=(2, 2), padding=(6, 6))
    (2): Conv2d(3, 3, kernel_size=(3, 3), stride=(2, 2), padding=(6, 6))
    (3): Conv2d(3, 3, kernel_size=(3, 3), stride=(2, 2), padding=(6, 6))
  )
)


In [117]:
x = torch.randn(5, 3, 10, 10)

z = res_block(x)
print(z.size())

torch.Size([5, 3, 10, 10])
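
The shape check above can be reproduced by hand. As a sketch, the helpers below combine the standard output-size formula for a convolution, $\lfloor (n + 2p - k) / s \rfloor + 1$, with the padding rule from `ResBlock`. The function names are illustrative.

```python
def same_padding(kernel_size, stride=1, input_size=None):
    # mirrors the padding rule in ResBlock.__init__
    if stride > 1:
        return round((kernel_size + (stride - 1) * input_size - stride) / 2)
    return round((kernel_size - 1) / 2)

def conv_out_size(n, kernel_size, stride, padding):
    # output width/height of a convolution on an n x n input
    return (n + 2 * padding - kernel_size) // stride + 1

# stride 1: padding and output size for a 3x3 kernel on a 10x10 input
p1 = same_padding(kernel_size=3)
print(p1, conv_out_size(10, 3, 1, p1))

# stride 2 on a 10x10 input, as in the unit test
p2 = same_padding(kernel_size=3, stride=2, input_size=10)
print(p2, conv_out_size(10, 3, 2, p2))
```

The stride-2 padding computed here is the same value that appears in the printed module, and in both cases the output size equals the input size.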


## Simple ResNet

Now let's implement a simple ResNet for CIFAR-10 which contains an initial convolution layer with max pooling for dimension reduction and channel expansion, two shallow residual blocks, followed by an average pooling layer and finally a fully connected layer mapping to the output.

We'll use the same generic [`ConvNet`](conv_net.ipynb) module as before, but this time we'll add a `"ResBlock"` option and allow only the `"ReLU"` nonlinear activation. For simplicity we'll also restrict ResBlock strides to 1 so that the input size is not needed to calculate same padding.
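
The architecture is passed to `ConvNet` as an `OrderedDict` whose keys are layer types with an optional suffix such as `"Conv2d_1"`. As a quick sketch, the regular expression used in the class strips that suffix as follows (the helper name `base_layer_type` is illustrative):

```python
import re

def base_layer_type(key):
    # drop an underscore followed by letters/digits, e.g. "_1" or "_1a"
    return re.sub(r"_[\d\w]+", "", key)

for key in ["Conv2d_1", "ReLU_1a", "ResBlock_2", "Linear"]:
    print(key, "->", base_layer_type(key))
```

Since `\w` already matches digits, the shorter pattern `r"_\w+"` would behave the same way.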

In [118]:
class Flatten(nn.Module):
    # ref: https://discuss.pytorch.org/t/flatten-layer-of-pytorch-build-by-sequential-container/5983/3
    def forward(self, x):
        x = x.view(x.size()[0], -1)
        return x

In [119]:
class ConvNet(nn.Module):
    
    def __init__(self, arch_dict: OrderedDict):
        """
        Args:
            arch_dict (OrderedDict) : Specifies the CNN architecture where
                key, value pairs correspond to layer_type, layer_params.
                Layer parameters are specified as a tuple of integers or
                they can be None.
                
                The supported layer types with their parameters are:
                
                    Conv2d : (in_channels, out_channels, kernel_size, stride, padding)
                    AvgPool2d : (kernel_size, stride, padding)
                    MaxPool2d : (kernel_size, stride, padding)
                    ResBlock : (kernel_size, num_channels, depth)
                    Flatten : None
                    Linear : (input_size, output_size)
                    ReLU : None
                    
                If a layer type is used more than once, then each key should be
                post-fixed with an underscore followed by an alphanumeric
                index.
                
                Eg: "Conv2d_1", "Conv2d_2", "ReLU_1a", "ReLU_1b"
                    
        """
        super().__init__()
        
        self.layers = nn.ModuleList()
        
        # make sure arch_dict is an OrderedDict
        # for activation layers, use None for layer_params
        for layer_type, layer_params in arch_dict.items():
            
            layer_type = re.sub(r"_[\d\w]+", "", layer_type) # remove number/letter post-fixing of layer types
            
            if layer_type == "Conv2d":
                in_channels, out_channels, kernel_size, stride, padding = layer_params
                self.layers.append(
                    nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding))
            elif layer_type == "AvgPool2d":
                kernel_size, stride, padding = layer_params
                self.layers.append(
                    nn.AvgPool2d(kernel_size, stride, padding))
            elif layer_type == "MaxPool2d":
                kernel_size, stride, padding = layer_params
                self.layers.append(
                    nn.MaxPool2d(kernel_size, stride, padding))
            elif layer_type == "ResBlock":
                kernel_size, num_channels, depth = layer_params
                self.layers.append(
                    ResBlock(kernel_size, num_channels, depth=depth))
            elif layer_type == "Flatten":
                self.layers.append(
                    Flatten())
            elif layer_type == "Linear":
                input_size, output_size = layer_params
                self.layers.append(
                    nn.Linear(input_size, output_size))
            elif layer_type == "ReLU":
                self.layers.append(
                    nn.ReLU())
            else:
                raise ValueError(f"Unsupported layer type: {layer_type}")
                
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


### Unit test: Simple ResNet

In [137]:
res_arch = OrderedDict()

# 1: Convolutional layer
res_arch["Conv2d_1"] = (3, 16, 3, 1, 1)  # 32x32x3 -> 32x32x16, 3x3 kernel, stride=1, padding=1
res_arch["MaxPool2d_1"] = (2, 2, 0)  # 32x32x16 -> 16x16x16, 2x2 kernel, stride=2, padding = 0
res_arch["ReLU_1"] = None
# 2: ResBlock
res_arch["ResBlock_2"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 3: Convolutional layer
res_arch["Conv2d_3"] = (16, 32, 1, 1, 0)  # 16x16x16 -> 16x16x32, 1x1 kernel, stride=1, padding=0
res_arch["MaxPool2d_3"] = (2, 2, 0)  # 16x16x32 -> 8x8x32, 2x2 kernel, stride=2, padding = 0
res_arch["ReLU_3"] = None
# 4: ResBlock
res_arch["ResBlock_4"] = (3, 32, 2) # 3x3 kernels, 32 channels, 2 layers
# 5: AvgPool
res_arch["AvgPool2d_5"] = (2, 2, 0) # 8x8x32 -> 4x4x32, 2x2 kernel, stride=2, padding = 0
# 6: Flatten
res_arch["Flatten_6"] = None  # 4x4x32 -> 512
# 7: Fully Connected Layer
res_arch["Linear"] = (512, 10) # 512 hidden units -> 10 output units

res_arch

OrderedDict([('Conv2d_1', (3, 16, 3, 1, 1)),
             ('MaxPool2d_1', (2, 2, 0)),
             ('ReLU_1', None),
             ('ResBlock_2', (3, 16, 2)),
             ('Conv2d_3', (16, 32, 1, 1, 0)),
             ('MaxPool2d_3', (2, 2, 0)),
             ('ReLU_3', None),
             ('ResBlock_4', (3, 32, 2)),
             ('AvgPool2d_5', (2, 2, 0)),
             ('Flatten_6', None),
             ('Linear', (512, 10))])

In [138]:
res_small = ConvNet(res_arch)

print(res_small)

ConvNet(
  (layers): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (4): Conv2d(16, 32, kernel_size=(1, 1), stride=(1, 1))
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): ReLU()
    (7): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (8): AvgPool2d(kernel_size=2, stride=2, padding=0)
    (9): Flatten()
    (10): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [139]:
x = torch.randn(10, 3, 32, 32)

y = res_small(x)
print(y.size())

torch.Size([10, 10])
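
The size comments in the architecture can be checked with a short trace. This sketch tracks only the spatial size and channel count, treats each `ResBlock` as shape-preserving, and uses illustrative names throughout:

```python
def out_size(n, kernel_size, stride, padding):
    # output width/height of a convolution or pooling layer
    return (n + 2 * padding - kernel_size) // stride + 1

# (name, out_channels or None to keep the current count, kernel, stride, padding)
layers = [
    ("Conv2d_1",    16,   3, 1, 1),
    ("MaxPool2d_1", None, 2, 2, 0),
    ("Conv2d_3",    32,   1, 1, 0),
    ("MaxPool2d_3", None, 2, 2, 0),
    ("AvgPool2d_5", None, 2, 2, 0),
]

size, channels = 32, 3   # CIFAR-10 images are 32x32 RGB
for name, out_channels, k, s, p in layers:
    size = out_size(size, k, s, p)
    channels = out_channels or channels
    print(f"{name}: {size}x{size}x{channels}")

# number of features seen by the Linear layer after Flatten
flat_features = size * size * channels
print(flat_features)
```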


## Train and Evaluate Functions

In [127]:
def train(model, training_data, optimiser, loss_function, num_epochs):
    
    for epoch in range(num_epochs):
        print(f"Epoch: {epoch + 1} " + "="*80 + ">")
        
        total_loss = 0.0
        for batch_idx, batch in enumerate(training_data):
            images, labels = batch
            
            # zero accumulated gradients
            optimiser.zero_grad()
            
            # forward pass
            output = model(images)
            # backward pass
            loss = loss_function(output, labels)
            loss.backward()
            optimiser.step()
            
            total_loss += loss.item()
            # print progress
            
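            if (batch_idx + 1) % 1000 == 0:    # print every 1000 mini-batches
                # NOTE: the denominator 6000 is hard-coded; with CIFAR-10 and
                # batch_size=10 there are 5000 mini-batches per epoch
                print("[%4d/6000] loss: %.3f" %
                      (batch_idx + 1, total_loss / 1000))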
                total_loss = 0.0
                
    print("Finished Training " + "="*71 + ">")
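
The progress line prints a fixed denominator of 6000. As a sketch of what to expect on CIFAR-10, which has 50,000 training images, the number of mini-batches per epoch and the resulting progress milestones can be computed directly. The variable names are illustrative, and the batch size of 10 matches the data loaders used in this notebook.

```python
import math

num_train_images = 50_000   # size of the CIFAR-10 training set
batch_size = 10             # as used by the DataLoader in this notebook
print_every = 1000          # reporting interval used in train()

batches_per_epoch = math.ceil(num_train_images / batch_size)
milestones = [b for b in range(1, batches_per_epoch + 1) if b % print_every == 0]

print(batches_per_epoch)
print(milestones)
```

For a `DataLoader`, `len(training_data)` gives the same batch count, so it could replace the hard-coded value in the progress message.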

In [128]:
def evaluate(model, test_data):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_data:
            images, truth = data
            output = model(images)
            _, predicted = torch.max(output.data, 1)
            total += truth.size(0)
            correct += (predicted == truth).sum().item()

    print('Test accuracy on %d test images: %.4f %%' % (total, 100 * correct / total))
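
As a small sketch of the prediction step in `evaluate`, the example below applies `torch.max` along the class dimension to a hand-written batch of logits. The numbers are made up for illustration.

```python
import torch

# logits for 3 examples over 4 classes (made-up numbers)
output = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                       [1.5, 0.2,  0.1, 0.0],
                       [0.0, 0.1,  0.2, 3.0]])
truth = torch.tensor([1, 2, 3])

# torch.max over dim 1 returns (max values, their indices);
# the indices are the predicted classes
_, predicted = torch.max(output, 1)
correct = (predicted == truth).sum().item()

print(predicted.tolist())
print(correct / truth.size(0))
```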

## CIFAR-10 Data

In [126]:
cifar10_transforms = transforms.Compose([
                    transforms.ToTensor(),
                    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]  # note that we normalise by rank-1 tensors
                )

# Training Set
cifar10_trainset = torchvision.datasets.CIFAR10(root=CIFAR10, train=True, download=True, transform=cifar10_transforms)
cifar10_trainloader = torch.utils.data.DataLoader(cifar10_trainset, batch_size=10, shuffle=True, num_workers=0)

# Test Set
cifar10_testset = torchvision.datasets.CIFAR10(root=CIFAR10, train=False, download=False, transform=cifar10_transforms)
cifar10_testloader = torch.utils.data.DataLoader(cifar10_testset, batch_size=10, shuffle=False, num_workers=0)

cifar10_classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
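
`ToTensor` scales pixel values to $[0, 1]$, and `Normalize(mean, std)` then computes `(x - mean) / std` per channel. Here is a small sketch of that arithmetic with mean and standard deviation both 0.5, as in the transform above. The helper is illustrative and works on a single value.

```python
def normalise(value, mean=0.5, std=0.5):
    # same arithmetic as transforms.Normalize, applied to one number
    return (value - mean) / std

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalise(pixel))
```

With these settings the inputs to the network lie in $[-1, 1]$.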


Now train the model:

In [140]:
cross_entropy = nn.CrossEntropyLoss()
adam = optim.Adam(res_small.parameters(), lr=0.001)

In [141]:
train(res_small, cifar10_trainloader, adam, cross_entropy, num_epochs=3)

[1000/6000] loss: 1.788
[2000/6000] loss: 1.459
[3000/6000] loss: 1.373
[4000/6000] loss: 1.309
[5000/6000] loss: 1.263
[1000/6000] loss: 1.191
[2000/6000] loss: 1.167
[3000/6000] loss: 1.141
[4000/6000] loss: 1.110
[5000/6000] loss: 1.080
[1000/6000] loss: 1.037
[2000/6000] loss: 1.020
[3000/6000] loss: 0.998
[4000/6000] loss: 1.005
[5000/6000] loss: 0.972


In [142]:
evaluate(res_small, cifar10_testloader)

Test accuracy on 10000 test images: 64.4800 %


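## Corrected Implementation of ResBlock

The first `ResBlock` applied a ReLU to every convolution, including the last one, before adding the input. In the version below the last convolution is left without an activation and the ReLU is applied after the addition, as in the original paper.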

In [146]:
class ResBlock(nn.Module):
    
    def __init__(self, kernel_size, num_channels, depth, stride=1, input_size=None):
        super().__init__()
        # same padding calculation
        if stride > 1:
            if input_size is None:
                raise ValueError("input size is needed to calculate same padding when stride > 1")
            same_padding = round((kernel_size + (stride - 1) * input_size - stride) / 2)
        else:
            same_padding = round((kernel_size - 1) / 2)
        # successive convolutions
        self.convolution_block = nn.ModuleList(
            [nn.Conv2d(num_channels, num_channels, kernel_size, stride, same_padding) for _ in range(depth)])
        
    def forward(self, x):
        h = x.clone()
        for convolution in self.convolution_block[:-1]:
            h = F.relu(convolution(h))
        # don't apply relu to last convolution in block
        h = self.convolution_block[-1](h)
        # add and then apply relu
        return F.relu(h + x)
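
To see what the change does, the sketch below applies both orderings to the same hand-picked values, where `f` stands for the output of the last convolution and `x` for the block input. Both tensors are made up for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, -2.0, 0.5])    # block input
f = torch.tensor([-3.0, 0.5, -0.2])   # output of the last convolution

first_version = F.relu(f) + x   # ReLU before the addition
corrected = F.relu(f + x)       # addition first, then ReLU

print(first_version)
print(corrected)
```

In the first version the residual branch can only add non-negative values to `x`. In the corrected version the branch can also subtract from the input, and the ReLU is applied once to the sum.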

In [147]:
res_small = ConvNet(res_arch)

print(res_small)

ConvNet(
  (layers): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (4): Conv2d(16, 32, kernel_size=(1, 1), stride=(1, 1))
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): ReLU()
    (7): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (8): AvgPool2d(kernel_size=2, stride=2, padding=0)
    (9): Flatten()
    (10): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [148]:
x = torch.randn(10, 3, 32, 32)

y = res_small(x)
print(y.size())

torch.Size([10, 10])


In [149]:
cross_entropy = nn.CrossEntropyLoss()
adam = optim.Adam(res_small.parameters(), lr=0.001)

In [150]:
train(res_small, cifar10_trainloader, adam, cross_entropy, num_epochs=3)

[1000/6000] loss: 1.791
[2000/6000] loss: 1.488
[3000/6000] loss: 1.370
[4000/6000] loss: 1.302
[5000/6000] loss: 1.224
[1000/6000] loss: 1.155
[2000/6000] loss: 1.120
[3000/6000] loss: 1.116
[4000/6000] loss: 1.084
[5000/6000] loss: 1.050
[1000/6000] loss: 0.996
[2000/6000] loss: 0.977
[3000/6000] loss: 0.986
[4000/6000] loss: 0.957
[5000/6000] loss: 0.937


In [151]:
evaluate(res_small, cifar10_testloader)

Test accuracy on 10000 test images: 66.4600 %


## ResNet with Transitions

In [193]:
class ResBlockTransition(nn.Module):
    
    def __init__(self, kernel_size, in_channels, out_channels, depth):
        super().__init__()
        # padding calculation
        padding = round((kernel_size - 1) / 2)
        # convolution layers
        # first layer changes number of channels and halves output size
        self.convolution_block = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, kernel_size, 2, padding)])
        # remaining layers preserve number of channels
        self.convolution_block.extend(
            [nn.Conv2d(out_channels, out_channels, kernel_size, 1, padding) for _ in range(depth - 1)])
        # transition convolution
        self.transition = nn.Conv2d(in_channels, out_channels, 1, 2, 0)
        
    def forward(self, x):
        h = x.clone()
        for convolution in self.convolution_block[:-1]:
            h = F.relu(convolution(h))
        # don't apply relu to last convolution in block
        h = self.convolution_block[-1](h)
        # transition input through a 1x1 convolution to change the number of channels and halve the resolution
        x = self.transition(x)
        return F.relu(h + x)

### Unit test: ResBlockTransition

In [194]:
res_block = ResBlockTransition(
                     kernel_size=3,
                     in_channels=3,
                     out_channels=6,
                     depth=4)

print(res_block)

ResBlockTransition(
  (convolution_block): ModuleList(
    (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (transition): Conv2d(3, 6, kernel_size=(1, 1), stride=(2, 2))
)


In [195]:
x = torch.randn(5, 3, 10, 10)

z = res_block(x)
print(z.size())

torch.Size([5, 6, 5, 5])
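
The addition in `ResBlockTransition` only works if the main path and the shortcut produce the same spatial size. As a sketch, the two helpers below (illustrative names) apply the convolution output-size formula to the first 3x3 stride-2 convolution and to the 1x1 stride-2 transition:

```python
def main_path_size(n):
    # 3x3 convolution, stride 2, padding 1
    return (n + 2 * 1 - 3) // 2 + 1

def shortcut_size(n):
    # 1x1 convolution, stride 2, padding 0
    return (n + 2 * 0 - 1) // 2 + 1

for n in (10, 16, 32):
    print(n, main_path_size(n), shortcut_size(n))

# check that the two paths agree over a range of input sizes
all_agree = all(main_path_size(n) == shortcut_size(n) for n in range(1, 65))
print(all_agree)
```

The remaining convolutions in the block use stride 1 with same padding, so they leave this size unchanged.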


### Train with automatic transitions

In [196]:
class ConvNet(nn.Module):
    
    def __init__(self, arch_dict: OrderedDict):
        """
        Args:
            arch_dict (OrderedDict) : Specifies the CNN architecture where
                key, value pairs correspond to layer_type, layer_params.
                Layer parameters are specified as a tuple of integers or
                they can be None.
                
                The supported layer types with their parameters are:
                
                    Conv2d : (in_channels, out_channels, kernel_size, stride, padding)
                    AvgPool2d : (kernel_size, stride, padding)
                    MaxPool2d : (kernel_size, stride, padding)
                    ResBlock : (kernel_size, num_channels, depth)
                    ResBlockTransition : (kernel_size, in_channels, out_channels, depth)
                    Flatten : None
                    Linear : (input_size, output_size)
                    ReLU : None
                    
                If a layer type is used more than once, then each key should be
                post-fixed with an underscore followed by an alphanumeric
                index.
                
                Eg: "Conv2d_1", "Conv2d_2", "ReLU_1a", "ReLU_1b"
                    
        """
        super().__init__()
        
        self.layers = nn.ModuleList()
        
        # make sure arch_dict is an OrderedDict
        # for activation layers, use None for layer_params
        for layer_type, layer_params in arch_dict.items():
            
            layer_type = re.sub(r"_[\d\w]+", "", layer_type) # remove number/letter post-fixing of layer types
            
            if layer_type == "Conv2d":
                in_channels, out_channels, kernel_size, stride, padding = layer_params
                self.layers.append(
                    nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding))
            elif layer_type == "AvgPool2d":
                kernel_size, stride, padding = layer_params
                self.layers.append(
                    nn.AvgPool2d(kernel_size, stride, padding))
            elif layer_type == "MaxPool2d":
                kernel_size, stride, padding = layer_params
                self.layers.append(
                    nn.MaxPool2d(kernel_size, stride, padding))
            elif layer_type == "ResBlock":
                kernel_size, num_channels, depth = layer_params
                self.layers.append(
                    ResBlock(kernel_size, num_channels, depth=depth))
            elif layer_type == "ResBlockTransition":
                kernel_size, in_channels, out_channels, depth = layer_params
                self.layers.append(
                    ResBlockTransition(kernel_size, in_channels, out_channels, depth=depth))
            elif layer_type == "Flatten":
                self.layers.append(
                    Flatten())
            elif layer_type == "Linear":
                input_size, output_size = layer_params
                self.layers.append(
                    nn.Linear(input_size, output_size))
            elif layer_type == "ReLU":
                self.layers.append(
                    nn.ReLU())
            else:
                raise ValueError(f"Unsupported layer type: {layer_type}")
                
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


In [201]:
res_arch = OrderedDict()

# 1: Convolutional layer
res_arch["Conv2d_1"] = (3, 16, 3, 1, 1)  # 32x32x3 -> 32x32x16, 3x3 kernel, stride=1, padding=1
res_arch["MaxPool2d_1"] = (2, 2, 0)  # 32x32x16 -> 16x16x16, 2x2 kernel, stride=2, padding = 0
res_arch["ReLU_1"] = None
# 2: ResBlock
res_arch["ResBlock_2"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 3: ResBlockTransition
res_arch["ResBlockTransition_3"] = (3, 16, 32, 2) # 3x3 kernels, 16 in channels, 32 out channels, 2 layers
# 4: AvgPool
res_arch["AvgPool2d_4"] = (2, 2, 0) # 8x8x32 -> 4x4x32, 2x2 kernel, stride=2, padding = 0
# 5: Flatten
res_arch["Flatten_5"] = None  # 4x4x32 -> 512
# 6: Fully Connected Layer
res_arch["Linear"] = (512, 10) # 512 hidden units -> 10 output units

res_arch

OrderedDict([('Conv2d_1', (3, 16, 3, 1, 1)),
             ('MaxPool2d_1', (2, 2, 0)),
             ('ReLU_1', None),
             ('ResBlock_2', (3, 16, 2)),
             ('ResBlockTransition_3', (3, 16, 32, 2)),
             ('AvgPool2d_4', (2, 2, 0)),
             ('Flatten_5', None),
             ('Linear', (512, 10))])

In [202]:
res_small = ConvNet(res_arch)

print(res_small)

ConvNet(
  (layers): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (4): ResBlockTransition(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (transition): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
    )
    (5): AvgPool2d(kernel_size=2, stride=2, padding=0)
    (6): Flatten()
    (7): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [203]:
x = torch.randn(10, 3, 32, 32)

y = res_small(x)
print(y.size())

torch.Size([10, 10])


In [204]:
cross_entropy = nn.CrossEntropyLoss()
adam = optim.Adam(res_small.parameters(), lr=0.001)

In [205]:
train(res_small, cifar10_trainloader, adam, cross_entropy, num_epochs=3)

[1000/6000] loss: 1.751
[2000/6000] loss: 1.434
[3000/6000] loss: 1.340
[4000/6000] loss: 1.264
[5000/6000] loss: 1.201
[1000/6000] loss: 1.121
[2000/6000] loss: 1.101
[3000/6000] loss: 1.070
[4000/6000] loss: 1.038
[5000/6000] loss: 1.010
[1000/6000] loss: 0.967
[2000/6000] loss: 0.957
[3000/6000] loss: 0.930
[4000/6000] loss: 0.954
[5000/6000] loss: 0.901


In [206]:
evaluate(res_small, cifar10_testloader)

Test accuracy on 10000 test images: 68.2100 %


## Deeper ResNet

In [222]:
resnet10_arch = OrderedDict()

# 1: Convolutional layer
resnet10_arch["Conv2d_1"] = (3, 16, 3, 1, 1)  # 32x32x3 -> 32x32x16, 3x3 kernel, stride=1, padding=1
resnet10_arch["MaxPool2d_1"] = (2, 2, 0)  # 32x32x16 -> 16x16x16, 2x2 kernel, stride=2, padding = 0
resnet10_arch["ReLU_1"] = None
# 2: ResBlock
resnet10_arch["ResBlock_2"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 3: ResBlock
resnet10_arch["ResBlock_3"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 4: ResBlockTransition
resnet10_arch["ResBlockTransition_4"] = (3, 16, 32, 2) # 3x3 kernels, 16 in channels, 32 out channels, 2 layers
# 5: AvgPool
resnet10_arch["AvgPool2d_5"] = (2, 2, 0) # 8x8x32 -> 4x4x32, 2x2 kernel, stride=2, padding = 0
# 6: Flatten
resnet10_arch["Flatten"] = None  # 4x4x32 -> 512
# 7: Fully Connected Layer
resnet10_arch["Linear"] = (512, 10) # 512 hidden units -> 10 output units

resnet10_arch

OrderedDict([('Conv2d_1', (3, 16, 3, 1, 1)),
             ('MaxPool2d_1', (2, 2, 0)),
             ('ReLU_1', None),
             ('ResBlock_2', (3, 16, 2)),
             ('ResBlock_3', (3, 16, 2)),
             ('ResBlockTransition_4', (3, 16, 32, 2)),
             ('AvgPool2d_5', (2, 2, 0)),
             ('Flatten', None),
             ('Linear', (512, 10))])

In [213]:
resnet10 = ConvNet(resnet10_arch)

print(resnet10)

ConvNet(
  (layers): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (4): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (5): ResBlockTransition(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (transition): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
    )
    (6): AvgPool2d(kern

In [217]:
x = torch.randn(10, 3, 32, 32)

y = resnet10(x)
print(y.size())

torch.Size([10, 10])


In [218]:
cross_entropy = nn.CrossEntropyLoss()
adam = optim.Adam(resnet10.parameters(), lr=0.001)

In [219]:
train(resnet10, cifar10_trainloader, adam, cross_entropy, num_epochs=3)

[1000/6000] loss: 1.779
[2000/6000] loss: 1.416
[3000/6000] loss: 1.299
[4000/6000] loss: 1.211
[5000/6000] loss: 1.157
[1000/6000] loss: 1.086
[2000/6000] loss: 1.047
[3000/6000] loss: 1.018
[4000/6000] loss: 1.013
[5000/6000] loss: 0.971
[1000/6000] loss: 0.911
[2000/6000] loss: 0.921
[3000/6000] loss: 0.908
[4000/6000] loss: 0.895
[5000/6000] loss: 0.904


In [221]:
evaluate(resnet10, cifar10_testloader)

Test accuracy on 10000 test images: 68.7200 %


## ResNet 12 for CIFAR-10

Note: ResNet12 takes a little too long to train

In [223]:
resnet12_arch = OrderedDict()

# 1: Convolutional layer
resnet12_arch["Conv2d_1"] = (3, 16, 3, 1, 1)  # 32x32x3 -> 32x32x16, 3x3 kernel, stride=1, padding=1
resnet12_arch["MaxPool2d_1"] = (2, 2, 0)  # 32x32x16 -> 16x16x16, 2x2 kernel, stride=2, padding = 0
resnet12_arch["ReLU_1"] = None
# 2: ResBlock
resnet12_arch["ResBlock_2"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 3: ResBlock
resnet12_arch["ResBlock_3"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 4: ResBlockTransition
resnet12_arch["ResBlockTransition_4"] = (3, 16, 32, 2) # 3x3 kernels, 16 in channels, 32 out channels, 2 layers
# 5: ResBlock
resnet12_arch["ResBlock_5"] = (3, 32, 2) # 3x3 kernels, 32 channels, 2 layers
# 6: AvgPool
resnet12_arch["AvgPool2d_5"] = (2, 2, 0) # 8x8x32 -> 4x4x32, 2x2 kernel, stride=2, padding = 0
# 7: Flatten
resnet12_arch["Flatten"] = None  # 4x4x32 -> 512
# 8: Fully Connected Layer
resnet12_arch["Linear"] = (512, 10) # 512 hidden units -> 10 output units

resnet12_arch

OrderedDict([('Conv2d_1', (3, 16, 3, 1, 1)),
             ('MaxPool2d_1', (2, 2, 0)),
             ('ReLU_1', None),
             ('ResBlock_2', (3, 16, 2)),
             ('ResBlock_3', (3, 16, 2)),
             ('ResBlockTransition_4', (3, 16, 32, 2)),
             ('ResBlock_5', (3, 32, 2)),
             ('AvgPool2d_5', (2, 2, 0)),
             ('Flatten', None),
             ('Linear', (512, 10))])

In [227]:
resnet12 = ConvNet(resnet12_arch)

print(resnet12)

ConvNet(
  (layers): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (4): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (5): ResBlockTransition(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (transition): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
    )
    (6): ResBlock(
    

In [228]:
x = torch.randn(10, 3, 32, 32)

y = resnet12(x)
print(y.size())

torch.Size([10, 10])


In [229]:
cross_entropy = nn.CrossEntropyLoss()
adam = optim.Adam(resnet12.parameters(), lr=0.001)

In [230]:
train(resnet12, cifar10_trainloader, adam, cross_entropy, num_epochs=3)

[1000/6000] loss: 1.786
[2000/6000] loss: 1.481
[3000/6000] loss: 1.334
[4000/6000] loss: 1.242
[5000/6000] loss: 1.168
[1000/6000] loss: 1.076
[2000/6000] loss: 1.053
[3000/6000] loss: 1.034
[4000/6000] loss: 1.027
[5000/6000] loss: 0.976
[1000/6000] loss: 0.914
[2000/6000] loss: 0.904
[3000/6000] loss: 0.919
[4000/6000] loss: 0.903
[5000/6000] loss: 0.880


In [231]:
evaluate(resnet12, cifar10_testloader)

Test accuracy on 10000 test images: 67.9800 %


## Extra Fully Connected Layer

In [235]:
resnet_fc_arch = OrderedDict()

# 1: Convolutional layer
resnet_fc_arch["Conv2d_1"] = (3, 16, 3, 1, 1)  # 32x32x3 -> 32x32x16, 3x3 kernel, stride=1, padding=1
resnet_fc_arch["MaxPool2d_1"] = (2, 2, 0)  # 32x32x16 -> 16x16x16, 2x2 kernel, stride=2, padding = 0
resnet_fc_arch["ReLU_1"] = None
# 2: ResBlock
resnet_fc_arch["ResBlock_2"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 3: ResBlock
resnet_fc_arch["ResBlock_3"] = (3, 16, 2) # 3x3 kernels, 16 channels, 2 layers
# 4: ResBlockTransition
resnet_fc_arch["ResBlockTransition_4"] = (3, 16, 32, 2) # 3x3 kernels, 16 in channels, 32 out channels, 2 layers
# 5: AvgPool
resnet_fc_arch["AvgPool2d_5"] = (2, 2, 0) # 8x8x32 -> 4x4x32, 2x2 kernel, stride=2, padding = 0
# 6: Flatten
resnet_fc_arch["Flatten_6"] = None  # 4x4x32 -> 512
# 7: Fully Connected Layer
resnet_fc_arch["Linear_1"] = (512, 256) # 512 hidden units -> 256 hidden units
# 8: Fully Connected Layer
resnet_fc_arch["Linear_2"] = (256, 10) # 256 hidden units -> 10 output units

resnet_fc_arch

OrderedDict([('Conv2d_1', (3, 16, 3, 1, 1)),
             ('MaxPool2d_1', (2, 2, 0)),
             ('ReLU_1', None),
             ('ResBlock_2', (3, 16, 2)),
             ('ResBlock_3', (3, 16, 2)),
             ('ResBlockTransition_4', (3, 16, 32, 2)),
             ('AvgPool2d_5', (2, 2, 0)),
             ('Flatten_6', None),
             ('Linear_1', (512, 256)),
             ('Linear_2', (256, 10))])

In [236]:
resnet_fc = ConvNet(resnet_fc_arch)

print(resnet_fc)

ConvNet(
  (layers): ModuleList(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): ReLU()
    (3): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (4): ResBlock(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (5): ResBlockTransition(
      (convolution_block): ModuleList(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (transition): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
    )
    (6): AvgPool2d(kern

In [237]:
x = torch.randn(10, 3, 32, 32)

y = resnet_fc(x)
print(y.size())

torch.Size([10, 10])


In [238]:
cross_entropy = nn.CrossEntropyLoss()
adam = optim.Adam(resnet_fc.parameters(), lr=0.001)

In [239]:
train(resnet_fc, cifar10_trainloader, adam, cross_entropy, num_epochs=3)

[1000/6000] loss: 1.727
[2000/6000] loss: 1.422
[3000/6000] loss: 1.301
[4000/6000] loss: 1.195
[5000/6000] loss: 1.176
[1000/6000] loss: 1.092
[2000/6000] loss: 1.063
[3000/6000] loss: 1.024
[4000/6000] loss: 1.021
[5000/6000] loss: 1.003
[1000/6000] loss: 0.923
[2000/6000] loss: 0.924
[3000/6000] loss: 0.923
[4000/6000] loss: 0.936
[5000/6000] loss: 0.931


In [241]:
evaluate(resnet_fc, cifar10_testloader)

Test accuracy on 10000 test images: 65.8000 %
