A simple ResNet block that applies a 1x1 convolution to the shortcut when the dimensions need to be matched.
If downsampling is applied (the width and height of the input feature map are halved),
we also apply a 1x1 convolution with stride 2 to the identity branch, so that its dimensions match
those of the main branch before summation.

![Screenshot%202023-02-01%20at%2021-04-36%201512.03385.pdf.png](attachment:Screenshot%202023-02-01%20at%2021-04-36%201512.03385.pdf.png)
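
The shape matching described above can be checked with the standard convolution output-size formula. This is a minimal sketch: the helper `conv_out_size` is a hypothetical name introduced for illustration, and it assumes a dilation of 1.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    return (n + 2 * padding - kernel_size) // stride + 1

# Main branch: 3x3 convolution, stride 2, padding 1
main_size = conv_out_size(110, kernel_size=3, stride=2, padding=1)

# Shortcut: 1x1 convolution, stride 2, no padding
shortcut_size = conv_out_size(110, kernel_size=1, stride=2)

print(main_size, shortcut_size)
```

Both branches map a 110x110 feature map to 55x55, so the element-wise sum is well defined.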

In [1]:
from torch import nn
from torchsummary import summary
import cv2
import torchvision.transforms as transforms
import torch

class ResNetBlock(nn.Module):

    def __init__(self, c_in, downsample=False, c_out=-1):
        """
        Inputs:
            c_in - Number of input features
            downsample - If True, we want to apply a stride inside the block and reduce the output shape by 2 in height and width
            c_out - Number of output features. Note that this is only relevant if downsample is True, as otherwise, c_out = c_in
        """
        super().__init__()
        self.downsample = downsample
        if not downsample:
            c_out = c_in

        # Network representing F
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, stride=1 if not self.downsample else 2), 
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out)
        )

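        # 1x1 convolution on the shortcut; only built when the main branch changes the shape.
        # Note: the original paper applies no activation on the shortcut; the ReLU is kept from this notebook's version.
        self.downsample_net = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=1, stride=2),
            nn.ReLU()
        ) if downsample else None

    def forward(self, x):
        z = self.net(x)
        if self.downsample:
            x = self.downsample_net(x)
        out = z + x
        print("feature map shape after resnet block applied with downsample:", self.downsample, " ", out.shape)
        # nn.ReLU(out) would construct a module instead of applying the activation
        out = torch.relu(out)
        return out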



Read the image, apply the necessary transforms, and define the initial convolution

In [2]:
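img = cv2.imread("data_flowers/daisy/100080576_f52e8ee070_n.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR; convert to RGB before ToPILImage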

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224,224)),
    transforms.ToTensor()
])  

img = transform(img)
img = img.unsqueeze(0)

initial_conv = nn.Conv2d(3, 64, kernel_size=7, padding=1, stride=2)  # first convolution: creates 64 feature maps
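
As a rough check of the spatial size this layer produces, the sketch below runs it on a random tensor standing in for the image (an assumption, so no file is needed). For comparison it also builds a layer with `padding=3`, the value torchvision's ResNet stem uses. That second layer is not part of this notebook.

```python
import torch
from torch import nn

# Random stand-in for a 224x224 RGB image batch (assumption for illustration)
x = torch.randn(1, 3, 224, 224)

conv_pad1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=1)  # as in this notebook
conv_pad3 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)  # as in torchvision's ResNet stem

with torch.no_grad():
    shape_pad1 = tuple(conv_pad1(x).shape)
    shape_pad3 = tuple(conv_pad3(x).shape)

print(shape_pad1, shape_pad3)
```

With `padding=1` the feature map is 110x110, which is the size the following cells work with. With `padding=3` it would be 112x112.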


Pass the image through a simple ResNet block without downsampling

In [3]:
out = initial_conv(img)
print("initial feature map shape", out.shape)
resnetblock = ResNetBlock(64)
out = resnetblock(out)

initial feature map shape torch.Size([1, 64, 110, 110])
feature map shape after resnet block applied with downsample: False   torch.Size([1, 64, 110, 110])


Pass the image through a ResNet block with downsampling

In [4]:
out = initial_conv(img)
print("initial feature map shape", out.shape)
resnetblock = ResNetBlock(64, downsample=True, c_out=64)
out = resnetblock(out)

initial feature map shape torch.Size([1, 64, 110, 110])
feature map shape after resnet block applied with downsample: True   torch.Size([1, 64, 55, 55])


Implementing the bottleneck block

![Screenshot%202023-02-01%20at%2020-50-11%201512.03385.pdf%20%28copy%29.png](attachment:Screenshot%202023-02-01%20at%2020-50-11%201512.03385.pdf%20%28copy%29.png)

In [5]:
class BottleneckBlock(nn.Module):

    def __init__(self, c_in, downsample=False):
        """
        Inputs:
            c_in - Number of input features (also the number of output features)
            downsample - Currently unused; the block always keeps the input height and width
        """
        super().__init__()

        c_out = c_in // 2  # number of channels inside the bottleneck
        # Network representing F
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=1, padding=0, stride=1),  # squeeze
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1, stride=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.Conv2d(c_out, c_in, kernel_size=1, padding=0, stride=1),  # expand
        )

    def forward(self, x):
        z = self.net(x)
        out = z + x
        print("feature map shape after bottlenet block:", out.shape)
        # nn.ReLU(out) would construct a module instead of applying the activation
        out = torch.relu(out)
        return out


In [6]:
out = initial_conv(img)
print("initial feature map shape", out.shape)
bottlenetblock = BottleneckBlock(64)  # the downsample argument is ignored by this block, so it is omitted
out = bottlenetblock(out)

initial feature map shape torch.Size([1, 64, 110, 110])
feature map shape after bottlenet block: torch.Size([1, 64, 110, 110])
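
To illustrate why the squeeze and expand layout is attractive, the sketch below counts convolution weights only (biases and batch norm are ignored, which is a simplifying assumption). The helper `conv_weights` is a hypothetical name for illustration. The 256 to 64 to 256 configuration is the one shown in the paper's figure, while this notebook's block halves the channels instead.

```python
def conv_weights(c_in, c_out, kernel_size):
    """Number of weights in a 2D convolution, ignoring the bias."""
    return c_in * c_out * kernel_size * kernel_size

# This notebook's blocks at 64 channels
basic_64 = 2 * conv_weights(64, 64, 3)
bottleneck_64 = conv_weights(64, 32, 1) + conv_weights(32, 32, 3) + conv_weights(32, 64, 1)

# The paper's 256-d bottleneck (256 -> 64 -> 64 -> 256) versus two 3x3 convolutions at 256 channels
basic_256 = 2 * conv_weights(256, 256, 3)
bottleneck_256 = conv_weights(256, 64, 1) + conv_weights(64, 64, 3) + conv_weights(64, 256, 1)

print(basic_64, bottleneck_64)
print(basic_256, bottleneck_256)
```

In both cases the 3x3 convolution runs on fewer channels, so the bottleneck needs far fewer weights than two full-width 3x3 convolutions.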


<b> Quick Tip </b> <br>
1. Note that ResNet uses a simple element-wise sum (+) to add the identity input to the output of the main branch, whereas the Inception module uses torch.concat to concatenate the feature maps coming from its different branches. That is why the identity and main branches in ResNet must produce feature maps with exactly the same dimensions. In the Inception module only the width and height must match; the channel dimensions may differ, and the number of output channels is the sum of the channels of all branches.

2. Shallow ResNet architectures such as ResNet18 and ResNet34 do not use bottleneck blocks; they use only basic residual blocks. Deeper variants (ResNet50, ResNet101, ResNet152) are built from bottleneck blocks, and the deeper the architecture, the more of them are stacked.
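
The difference between summing and concatenating in the first tip can be sketched with random tensors. The shapes below are arbitrary values chosen for illustration, not taken from any particular network.

```python
import torch

# Residual connection: both branches must have identical shapes
main = torch.randn(1, 64, 28, 28)
identity = torch.randn(1, 64, 28, 28)
summed = main + identity

# Inception-style merge: height and width match, channel counts may differ
branch_a = torch.randn(1, 64, 28, 28)
branch_b = torch.randn(1, 32, 28, 28)
concatenated = torch.cat([branch_a, branch_b], dim=1)  # channels add up: 64 + 32

# Summing branches with different channel counts raises an error
try:
    branch_a + branch_b
    sum_failed = False
except RuntimeError:
    sum_failed = True

print(summed.shape, concatenated.shape, sum_failed)
```

The sum keeps 64 channels, while the concatenation produces 96, and adding the mismatched branches fails.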

![Screenshot%20from%202023-02-06%2015-55-27.png](attachment:Screenshot%20from%202023-02-06%2015-55-27.png)