#### Implementation of ResNet architecture using PyTorch

A Residual Network (ResNet) is a deep neural network architecture built from residual blocks, which make it possible to train extremely deep networks. Traditional deep neural networks suffer from vanishing gradients: the gradients become increasingly small as they propagate backward through many layers. This makes very deep networks difficult to train, because the earlier layers receive weak gradients and have trouble learning meaningful representations.

![Residual Connections](./residual_connections.png)

To address this issue, ResNet introduces residual connections, also called skip connections, that let information flow directly from earlier layers to later layers. Instead of learning the direct mapping H(x) from input to output, a residual block learns the residual mapping F(x) = H(x) - x, the difference between the desired output and the input, and outputs F(x) + x. This makes the network easier to optimize.
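
To make the effect of the skip connection concrete, here is a minimal sketch. It is not part of the ResNet implementation: the `branch` layer is a hypothetical stand-in for the residual branch F(x), zero-initialized on purpose. With F(x) = 0, the residual form passes both the input and its gradient through unchanged, whereas the plain form passes nothing back.

```python
import torch
import torch.nn as nn

# Hypothetical toy branch F(x), zero-initialized so that F(x) = 0 for every input.
branch = nn.Linear(4, 4)
nn.init.zeros_(branch.weight)
nn.init.zeros_(branch.bias)

x = torch.randn(2, 4, requires_grad=True)

# Residual form: output = F(x) + x
y = branch(x) + x
y.sum().backward()
residual_grad = x.grad.clone()

# Plain form: output = F(x), with no skip connection
x.grad = None
branch(x).sum().backward()
plain_grad = x.grad.clone()

print(torch.equal(y.detach(), x.detach()))  # the block acts as the identity
print(residual_grad)  # gradient reaches x through the skip connection
print(plain_grad)     # gradient is blocked by the zero-weight branch
```

The same reasoning applies when F is a stack of convolutions: the identity path always contributes a gradient term that does not depend on the weights of F.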

##### Residual Block

A basic residual block is composed of two 3x3 convolutional layers, each followed by batch normalization. A ReLU is applied after the first normalization and again after the shortcut has been added. In the picture below, the lines represent the residual operation.

![Basic ResNet Block](residual_resnet_block.png)

In [68]:
import torch
import torch.nn as nn
import torch.nn.functional as functional

In [69]:
class ResidualBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False
        )
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(
            out_channels,
            out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels,
                    self.expansion * out_channels,
                    kernel_size=1,
                    stride=stride,
                    bias=False
                ),
                nn.BatchNorm2d(self.expansion * out_channels)
            )

        self.relu = nn.ReLU(inplace=True)

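    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # First 3x3 conv -> batch norm -> ReLU
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        # Second 3x3 conv -> batch norm (ReLU is applied after the addition)
        out = self.conv2(out)
        out = self.bn2(out)

        # Add the shortcut (identity or 1x1 projection), then apply ReLU
        out += self.shortcut(x)
        return self.relu(out)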
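
The `shortcut` branch switches to a 1x1 convolution whenever the stride or the channel count changes, since otherwise the two tensors could not be added. The sketch below checks this with plain `nn.Conv2d` layers. The input size and channel counts are assumptions chosen for illustration, and batch normalization is left out because it does not affect shapes.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # hypothetical input: 64 channels, 32x32 pixels

# Main path of a downsampling block: 3x3 conv with stride 2, then 3x3 conv with stride 1.
main = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=False),
)

# Projection shortcut: 1x1 conv with the same stride and the same output channels.
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

main_shape = tuple(main(x).shape)
proj_shape = tuple(projection(x).shape)

print(main_shape)      # shape produced by the main path
print(proj_shape)      # shape produced by the projection shortcut
print(tuple(x.shape))  # the raw input no longer matches, so an identity shortcut would fail
```

Both paths halve the spatial size and double the channels, so the element-wise addition in `forward` is well defined.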

##### Bottleneck Version for Deeper Networks

To increase the network depth while keeping the parameter count as low as possible, the authors defined a bottleneck block: “The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.”

![BottleNeck Block](./bottleneck_resnet_block.png)
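
A quick back-of-the-envelope count shows how much the bottleneck saves. The sketch below compares two 3x3 convolutions at 256 channels with a 256 -> 64 -> 64 -> 256 bottleneck. The helper `conv_params` is a hypothetical name introduced here; it counts convolution weights only and ignores batch normalization parameters.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a k x k convolution without bias."""
    return c_in * c_out * k * k


# Two stacked 3x3 convolutions operating directly on 256 channels.
basic = 2 * conv_params(256, 256, 3)

# Bottleneck: 1x1 reduce to 64, 3x3 at 64 channels, 1x1 restore to 256.
bottleneck = (
    conv_params(256, 64, 1)
    + conv_params(64, 64, 3)
    + conv_params(64, 256, 1)
)

print(f"two 3x3 convs at 256 channels: {basic:,}")
print(f"bottleneck 256 -> 64 -> 256:   {bottleneck:,}")
print(f"ratio: {basic / bottleneck:.1f}x")
```

Under these assumptions the bottleneck needs roughly 17 times fewer convolution weights for the same input and output width.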

In [70]:
class BottleneckBlock(nn.Module):
    expansion = 4

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super(BottleneckBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(
            out_channels,
            out_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(
            out_channels,
            self.expansion * out_channels,
            kernel_size=1,
            bias=False
        )
        self.bn3 = nn.BatchNorm2d(self.expansion * out_channels)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels,
                    self.expansion * out_channels,
                    kernel_size=1,
                    stride=stride,
                    bias=False
                ),
                nn.BatchNorm2d(self.expansion * out_channels)
            )

        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += self.shortcut(x)
        return self.relu(out)


##### ResNet Layer
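Each ResNet layer (stage) is a stack of blocks of the same type. Only the first block of a stage may downsample (stride 2) and change the channel count; the remaining blocks use stride 1.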

In [71]:
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, in_channel=3, zero_init_residual=False):
        super(ResNet, self).__init__()

        self.in_planes = 64
        self.conv1 = nn.Conv2d(
            in_channel,
            64,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False
        )
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, BottleneckBlock):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, ResidualBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for i in range(num_blocks):
            stride = strides[i]
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion

        return nn.Sequential(*layers)

    def forward(self, x):
        out = functional.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool(out)
        out = torch.flatten(out, 1)
        return out
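
The channel bookkeeping in `_make_layer` is easier to follow when traced without building any modules. The sketch below uses a hypothetical helper, `plan_stages`, that mirrors the stride list and the `in_planes` update and records `(input channels, output channels, stride)` for each block. The stage widths and first-block strides are the ones passed in `__init__` above.

```python
def plan_stages(expansion, num_blocks, widths=(64, 128, 256, 512),
                first_strides=(1, 2, 2, 2), in_planes=64):
    """Return (in_channels, out_channels, stride) for every block, in order."""
    plan = []
    for planes, count, first_stride in zip(widths, num_blocks, first_strides):
        # Only the first block of a stage uses the stage stride; the rest use 1.
        for stride in [first_stride] + [1] * (count - 1):
            out_channels = planes * expansion
            plan.append((in_planes, out_channels, stride))
            in_planes = out_channels  # the next block consumes this block's output
    return plan


basic_plan = plan_stages(expansion=1, num_blocks=[2, 2, 2, 2])
bottleneck_plan = plan_stages(expansion=4, num_blocks=[3, 4, 6, 3])

for entry in basic_plan:
    print(entry)
print(len(bottleneck_plan), bottleneck_plan[0], bottleneck_plan[-1])
```

With `expansion=4` the last block outputs 2048 channels, which is the size of the feature vector that `forward` returns after pooling and flattening.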

##### Model Architecture
ResNet architectures vary in depth, with deeper versions having more residual blocks. Common variants include ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, where the number counts the weighted layers (convolutional and fully connected) in the network.

![ResNet 18](./resnet_18_architecture.png)
![ResNet 50](./resnet_50_architecture.png)
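
The variant names can be recovered from the block counts. The sketch below assumes the usual counting convention: one stem convolution, the convolutions on the main path of every block (two per basic block, three per bottleneck block), and one final fully connected layer. Projection shortcuts are not counted. The helper `nominal_depth` is a hypothetical name, and the fully connected layer is part of the convention even though the `ResNet` class above stops at the pooled features.

```python
def nominal_depth(convs_per_block: int, num_blocks: list) -> int:
    """Stem conv + main-path convs of all blocks + final fully connected layer."""
    return 1 + convs_per_block * sum(num_blocks) + 1


variants = {
    "ResNet-18": nominal_depth(2, [2, 2, 2, 2]),
    "ResNet-34": nominal_depth(2, [3, 4, 6, 3]),
    "ResNet-50": nominal_depth(3, [3, 4, 6, 3]),
    "ResNet-101": nominal_depth(3, [3, 4, 23, 3]),
    "ResNet-152": nominal_depth(3, [3, 8, 36, 3]),
}

for name, depth in variants.items():
    print(name, depth)
```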

In [72]:
def resnet18():
    return ResNet(ResidualBlock, [2, 2, 2, 2])


def resnet34():
    return ResNet(ResidualBlock, [3, 4, 6, 3])


def resnet50():
    return ResNet(BottleneckBlock, [3, 4, 6, 3])


def resnet101():
    return ResNet(BottleneckBlock, [3, 4, 23, 3])


def resnet152():
    return ResNet(BottleneckBlock, [3, 8, 36, 3])

In [75]:
from torchsummary import summary

model = resnet50()
summary(model, (3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 224, 224]           1,728
       BatchNorm2d-2         [-1, 64, 224, 224]             128
            Conv2d-3         [-1, 64, 224, 224]           4,096
       BatchNorm2d-4         [-1, 64, 224, 224]             128
              ReLU-5         [-1, 64, 224, 224]               0
            Conv2d-6         [-1, 64, 224, 224]          36,864
       BatchNorm2d-7         [-1, 64, 224, 224]             128
              ReLU-8         [-1, 64, 224, 224]               0
            Conv2d-9        [-1, 256, 224, 224]          16,384
      BatchNorm2d-10        [-1, 256, 224, 224]             512
           Conv2d-11        [-1, 256, 224, 224]          16,384
      BatchNorm2d-12        [-1, 256, 224, 224]             512
             ReLU-13        [-1, 256, 224, 224]               0
  BottleneckBlock-14        [-1, 256, 2

In [74]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.001, momentum=0.9)
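
Note that `ResNet.forward` above returns pooled, flattened features rather than class scores, so `nn.CrossEntropyLoss` would presumably be applied to the output of a linear head placed on top of the backbone. The sketch below shows one way this could look; it is not the author's training code. To stay small and self-contained it uses a stand-in backbone instead of `resnet50()`, and `num_classes`, the batch size, and the image size are all assumptions.

```python
import torch
import torch.nn as nn

num_classes = 10     # assumption: the number of target classes is not given in the text
feature_dim = 2048   # 512 * BottleneckBlock.expansion, the width of resnet50() features

# Stand-in backbone (NOT a ResNet): maps images to flattened feature vectors,
# playing the role that resnet50() plays in the cells above.
backbone = nn.Sequential(
    nn.Conv2d(3, feature_dim, kernel_size=3, padding=1, bias=False),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
)

# Linear head that turns features into per-class logits.
classifier = nn.Sequential(backbone, nn.Linear(feature_dim, num_classes))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    classifier.parameters(), lr=0.1, weight_decay=0.001, momentum=0.9
)

# One training step on a random batch (placeholder data, for shape checking only).
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, num_classes, (4,))

optimizer.zero_grad()
logits = classifier(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

print(tuple(logits.shape), loss.item())
```

Swapping `backbone` for `resnet50()` should leave the rest unchanged, since both produce a `(batch, 2048)` tensor.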