<a href="https://colab.research.google.com/github/Redcoder815/Deep_Learning_PyTorch/blob/main/22ResNetAndResNext.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
import torch
from torch import nn
import torchvision.transforms as transforms
from torch.utils import data
from torchvision import datasets
import torch.optim as optim
from torch.nn import functional as F

In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

In [3]:
class Residual(nn.Module):
    """The Residual block of ResNet models."""
    def __init__(self, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1,
                                   stride=strides)
        self.conv2 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.LazyConv2d(num_channels, kernel_size=1,
                                       stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.LazyBatchNorm2d()
        self.bn2 = nn.LazyBatchNorm2d()

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

In [4]:
blk = Residual(3)
X = torch.randn(4, 3, 6, 6)
blk(X).shape

torch.Size([4, 3, 6, 6])

For blk = Residual(6, use_1x1conv=True, strides=2) applied to the same input X (shape 4 x 3 x 6 x 6), the output shape is torch.Size([4, 6, 3, 3]) for the following reasons:

Batch Size and Output Channels: The first two dimensions 4 and 6 correspond to the batch size of the input X (which is 4) and the num_channels specified in the Residual block constructor (which is 6).

Spatial Dimensions (Height and Width):

Both the main convolutional path (conv1) and the shortcut path (conv3) apply a stride of 2, which roughly halves the spatial dimensions (height and width) of the input feature map; for the 6x6 input here it halves them exactly.
The input spatial dimensions are 6x6.
For a convolutional layer, the output spatial dimension can be calculated as floor((Input_Dim - Kernel_Size + 2 * Padding) / Stride) + 1.
For self.conv1 (kernel_size=3, padding=1, stride=2): floor((6 - 3 + 2*1) / 2) + 1 = floor(5 / 2) + 1 = 2 + 1 = 3. So, the height and width become 3x3.
For self.conv3 (kernel_size=1, stride=2; padding is not passed, so it takes the nn.Conv2d default of 0): floor((6 - 1 + 2*0) / 2) + 1 = floor(5 / 2) + 1 = 2 + 1 = 3. So, the height and width become 3x3.
Since both paths result in a 6-channel, 3x3 feature map, they can be added together, and the final output retains this shape.
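As a sanity check on the arithmetic above, the sketch below compares the output-size formula with the shapes PyTorch actually produces. It assumes plain nn.Conv2d layers with an explicit in_channels=3 as stand-ins for the lazy layers in Residual; conv_out_size is a hypothetical helper name, not part of PyTorch.

```python
import torch
from torch import nn

def conv_out_size(n, kernel_size, padding=0, stride=1):
    """Hypothetical helper: floor((n - kernel_size + 2 * padding) / stride) + 1."""
    return (n - kernel_size + 2 * padding) // stride + 1

X = torch.randn(4, 3, 6, 6)

# Same hyperparameters as conv1 and conv3 in Residual(6, use_1x1conv=True, strides=2)
conv1 = nn.Conv2d(3, 6, kernel_size=3, padding=1, stride=2)
conv3 = nn.Conv2d(3, 6, kernel_size=1, stride=2)  # padding defaults to 0

print(conv_out_size(6, 3, padding=1, stride=2), conv1(X).shape)  # 3 torch.Size([4, 6, 3, 3])
print(conv_out_size(6, 1, padding=0, stride=2), conv3(X).shape)  # 3 torch.Size([4, 6, 3, 3])
```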

-----------------------------------------------

Next, let's break down self.conv2:

self.conv2 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1):

It also outputs num_channels (which is 6 in the example).
It has a kernel_size=3 and padding=1.
Crucially, conv2 is never given the strides argument, so it uses nn.LazyConv2d's default stride of 1.
When kernel_size=3, padding=1, and stride=1, the convolutional layer preserves the spatial dimensions of its input.

The output of self.conv1 (after BatchNorm and ReLU) has shape (4, 6, 3, 3). When this passes through self.conv2, its spatial size stays 3x3: self.conv2 takes a 3x3 input and produces a 3x3 output.

In summary, self.conv1 (with strides=2) reduces the spatial dimensions from 6x6 to 3x3. self.conv2 then processes this 3x3 feature map but does not further change its spatial dimensions because its stride is 1. Therefore, the output of the main path (self.conv2(F.relu(self.bn1(self.conv1(X))))) and the shortcut path (self.conv3(X)) both have the spatial dimensions 3x3, allowing them to be added together.
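To see both paths side by side, the sketch below copies the Residual class from this notebook (so it runs on its own) and registers forward hooks, a standard nn.Module feature, to record the output shape of conv1, conv2, and conv3 during one forward pass. The helper name record is hypothetical.

```python
import torch
from torch import nn
from torch.nn import functional as F

class Residual(nn.Module):
    """Copy of the notebook's Residual block so this sketch is self-contained."""
    def __init__(self, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1)
        self.conv3 = (nn.LazyConv2d(num_channels, kernel_size=1, stride=strides)
                      if use_1x1conv else None)
        self.bn1 = nn.LazyBatchNorm2d()
        self.bn2 = nn.LazyBatchNorm2d()

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

blk = Residual(6, use_1x1conv=True, strides=2)
X = torch.randn(4, 3, 6, 6)

shapes = {}
def record(name):
    # Build a forward hook that stores the output shape of the named sub-layer
    def hook(module, inputs, output):
        shapes[name] = tuple(output.shape)
    return hook

for name in ('conv1', 'conv2', 'conv3'):
    getattr(blk, name).register_forward_hook(record(name))

out = blk(X)
print(shapes)     # {'conv1': (4, 6, 3, 3), 'conv2': (4, 6, 3, 3), 'conv3': (4, 6, 3, 3)}
print(out.shape)  # torch.Size([4, 6, 3, 3])
```

conv1 does the downsampling, conv2 keeps the 3x3 size, and conv3 brings the shortcut to the same shape, so the addition is valid.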

---------------------------------------------------

Let's look more closely at the first two dimensions of the torch.Size([4, 6, 3, 3]) output:

The 4 (Batch Size): This first dimension directly corresponds to the batch size of your input tensor X. When you defined X = torch.randn(4, 3, 6, 6), the 4 specified that there are 4 independent samples in the batch. This dimension is preserved throughout the network unless an operation explicitly changes it (e.g., a batch-wise aggregation).

The 6 (Output Channels): This second dimension comes from the num_channels parameter you passed to the Residual block constructor. In your example, you created blk = Residual(6, use_1x1conv=True, strides=2). The 6 here means that self.conv1, self.conv2, and (because use_1x1conv=True) self.conv3 all output 6 feature channels, so the block's output depth is always num_channels.

So, torch.Size([batch_size, num_channels, height, width]) is torch.Size([4, 6, 3, 3]) because:

4 is your input batch size.
6 is the num_channels you set in Residual(6, ...).
3x3 are the spatial dimensions after the strided convolutions, as we discussed.
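The same reasoning can be checked on a single layer. This is a minimal sketch, assuming one nn.LazyConv2d with the same settings as conv1 stands in for the whole block: the batch dimension follows the input, while the channel dimension follows the requested number of output channels.

```python
import torch
from torch import nn

# Same settings as conv1 in Residual(6, use_1x1conv=True, strides=2)
conv = nn.LazyConv2d(6, kernel_size=3, padding=1, stride=2)

results = []
for batch_size in (1, 4, 16):
    out = conv(torch.randn(batch_size, 3, 6, 6))
    n, c, h, w = out.shape  # torch.Size unpacks like a tuple: (N, C, H, W)
    results.append((n, c, h, w))
    print(n, c, h, w)
# 1 6 3 3
# 4 6 3 3
# 16 6 3 3
```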

In [5]:
blk = Residual(6, use_1x1conv=True, strides=2)
blk(X).shape

torch.Size([4, 6, 3, 3])

Let's break down the line self.net.add_module(f'b{i+2}', self.block(*b, first_block=(i==0))) from the ResNet class:

self.net: This refers to the nn.Sequential container that makes up the main body of your ResNet model. It was initialized earlier with self.net = nn.Sequential(self.b1()).
.add_module(name, module): This is a method available for every nn.Module (and thus for nn.Sequential). It registers a new child module under the given name; because nn.Sequential runs its children in registration order, the new module is applied after everything added before it.
f'b{i+2}': This is the name given to the new module. The f-string generates a name for each stage of residual layers: 'b2' when i=0, 'b3' when i=1, and so on. Numbering starts at 2 to follow the b1 method that builds the stem; the stem itself was passed to the nn.Sequential constructor, so it is registered under the default name '0' rather than 'b1', as the (0): Sequential(...) entry in the model printout shows.
self.block(*b, first_block=(i==0)): This is the actual module that is being added. It calls the self.block method (defined within your ResNet class) to construct a sequence of Residual layers. Let's look at its arguments:
*b: The arch parameter in the ResNet constructor is expected to be a sequence (list or tuple) of tuples, such as ((2, 64), (2, 128), ...). Each tuple b contains two values: num_residuals (the number of Residual layers in this stage) and num_channels (the number of output channels for these layers). The *b syntax unpacks the tuple so that self.block receives them as separate arguments: self.block(num_residuals, num_channels, ...). For example, if b = (2, 64), it calls self.block(2, 64, ...).
first_block=(i==0): This boolean flag is True only in the first iteration of the loop (i=0). Inside self.block, the first Residual layer of a stage uses use_1x1conv=True and strides=2 (halving height and width and projecting the shortcut to the new channel count) unless first_block is True. The first stage skips this downsampling because the b1 stem has already reduced the resolution with a stride-2 convolution and a stride-2 max-pooling layer, and it already outputs 64 channels, matching that stage's num_channels. Downsampling at the start of every stage except the first is a common ResNet pattern.
In summary: This line of code iteratively adds stages of Residual layers to the ResNet model. The stages are named sequentially (b2, b3, ...), and the first Residual layer of every stage after the first halves the feature-map size while increasing the number of channels.
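The naming behaviour can be checked on a small example. This is a sketch with hypothetical stand-in layers (plain nn.Conv2d modules in place of self.b1() and self.block(...)); it shows that the module passed to the nn.Sequential constructor is named '0', that add_module appends 'b2' and 'b3' after it, and that the forward pass runs them in that order.

```python
import torch
from torch import nn

# Stand-in for self.b1(): passed to the constructor, so it is registered as '0'
net = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1))

# Hypothetical stand-ins for the modules returned by self.block(*b, first_block=(i == 0))
stages = [nn.Conv2d(4, 4, kernel_size=3, padding=1),            # first stage keeps H and W
          nn.Conv2d(4, 8, kernel_size=3, padding=1, stride=2)]  # later stage halves H and W
for i, stage in enumerate(stages):
    net.add_module(f'b{i+2}', stage)

names = [name for name, _ in net.named_children()]
out = net(torch.randn(2, 1, 8, 8))  # children run in registration order: '0', 'b2', 'b3'
print(names)      # ['0', 'b2', 'b3']
print(out.shape)  # torch.Size([2, 8, 4, 4])
```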

----------------------------------------

blk.append(Residual(num_channels, use_1x1conv=True, strides=2)):

This line of code is found within the block method of your ResNet class. It's responsible for creating and adding a specific type of Residual block to a list (blk), which will then form part of the larger nn.Sequential block.

Here's what each part signifies:

blk.append(...): This means you're adding an instance of a Residual module to the list blk. The block method aggregates several Residual layers into a single nn.Sequential module.

Residual(num_channels, use_1x1conv=True, strides=2):

num_channels: This argument specifies the number of output channels for the convolutional layers within this particular Residual block. For example, if num_channels is 64, then self.conv1, self.conv2, and self.conv3 (if used) inside this Residual instance will all output 64 channels.
use_1x1conv=True: This is a crucial parameter. When set to True, it tells the Residual block to employ a 1x1 convolution in its shortcut path (the self.conv3 in your Residual class). This 1x1 convolution serves two main purposes in ResNet:
Channel Matching: If the main path of the residual block changes the number of channels (which it often does in ResNet stages), the 1x1 convolution in the shortcut path ensures that the number of channels matches between the input X and the output Y of the main path, so they can be added together (Y += X).
Spatial Downsampling: When combined with strides > 1 (as in this case), the 1x1 convolution in the shortcut path also performs spatial downsampling.
strides=2: This parameter further defines the behavior of the Residual block. When strides=2 is used in conjunction with use_1x1conv=True (which it is here):
The first convolutional layer in the main path (self.conv1) will use a stride of 2, effectively halving the spatial dimensions (height and width) of the feature map.
The 1x1 convolution in the shortcut path (self.conv3) will also use a stride of 2, ensuring that its output also has halved spatial dimensions, matching the output of the main path.
In summary: This specific line creates a Residual block that is designed to change the number of channels to num_channels and simultaneously reduce the spatial dimensions of the feature map by half. This is typically used at the beginning of a new stage in a ResNet architecture to downsample the feature maps while increasing their depth (number of channels).
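The sketch below illustrates why the 1x1 shortcut is needed when the channel count changes. It copies the notebook's Residual class so it runs standalone and assumes a hypothetical 64-channel, 8x8 input standing in for the output of a previous stage: with use_1x1conv=True the shapes line up, while without it the addition Y += X fails with a shape-mismatch RuntimeError.

```python
import torch
from torch import nn
from torch.nn import functional as F

class Residual(nn.Module):
    """Copy of the notebook's Residual block so this sketch is self-contained."""
    def __init__(self, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.LazyConv2d(num_channels, kernel_size=3, padding=1)
        self.conv3 = (nn.LazyConv2d(num_channels, kernel_size=1, stride=strides)
                      if use_1x1conv else None)
        self.bn1 = nn.LazyBatchNorm2d()
        self.bn2 = nn.LazyBatchNorm2d()

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

X = torch.randn(4, 64, 8, 8)  # hypothetical output of a previous 64-channel stage

down = Residual(128, use_1x1conv=True, strides=2)
out = down(X)
print(out.shape)  # torch.Size([4, 128, 4, 4])

plain = Residual(128)  # no 1x1 shortcut: the identity path keeps 64 channels
mismatch_raised = False
try:
    plain(X)  # Y has 128 channels but X still has 64, so Y += X cannot broadcast
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)  # True
```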

-----------------------------------

The line blk.append(Residual(num_channels, use_1x1conv=True, strides=2)) itself does not use or call the forward method of the Residual module.

Here's why:

Instantiation (__init__): When you write Residual(...), you are creating an instance of the Residual class. At this point, the __init__ method of the Residual class is called. This method is responsible for setting up the layers (conv1, conv2, bn1, bn2, conv3) and their configurations (like num_channels, kernel_size, strides).
Execution (forward): The forward method of a PyTorch nn.Module is called implicitly when you pass an input tensor through an instance of that module. For example, if you had my_residual_block = Residual(...), then output = my_residual_block(input_tensor) would cause the forward method to be executed.
So, the blk.append(...) line is involved in constructing the network by adding a configured Residual block. The forward method of that Residual block will only be called later, when data is actually passed through the entire ResNet model (which contains this blk as part of its self.net sequential container).
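A minimal sketch of the construction/execution split, assuming a single nn.LazyConv2d as a stand-in for the layers inside Residual: a forward hook counts forward calls, and torch.nn.parameter.UninitializedParameter shows that a lazy layer has not even allocated its weights after construction, because it infers the input channel count from the first input it sees.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

calls = []  # output shapes recorded each time forward() runs

# Constructing the layer only runs __init__
layer = nn.LazyConv2d(6, kernel_size=3, padding=1)
layer.register_forward_hook(lambda module, inputs, output: calls.append(tuple(output.shape)))

calls_before = len(calls)
was_uninitialized = isinstance(layer.weight, UninitializedParameter)
print(calls_before, was_uninitialized)  # 0 True

# Calling the module (rather than .forward directly) runs forward() and its hooks
out = layer(torch.randn(4, 3, 6, 6))
print(calls)               # [(4, 6, 6, 6)]
print(layer.weight.shape)  # torch.Size([6, 3, 3, 3])
```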

---------------------------------------

Let's break down how i and b take values from arch in the loop, and how *b passes them to self.block.

How i and b get values from arch (for i, b in enumerate(arch):)

arch is a sequence of tuples, for example ((2, 64), (2, 128), (2, 256), (2, 512)) as passed by ResNet18. Each tuple describes one 'stage' of residual layers as (num_residuals, num_channels).
enumerate(arch): This built-in Python function is used to iterate over a sequence (like arch) while also keeping track of the index of each item. It yields pairs of (index, item).
for i, b in enumerate(arch):
In each iteration of the loop, i receives the current index.
b receives the current item (which is one of the tuples from arch).
Let's trace it:

First iteration: i will be 0, and b will be (2, 64).
Second iteration: i will be 1, and b will be (2, 128).
And so on, for each tuple in arch.
How *b passes values to self.block (self.block(*b, ...) )

You defined self.block to accept arguments like this: def block(self, num_residuals, num_channels, first_block=False):
The * (asterisk) operator before b is called the unpacking operator.
When you use *b in a function call, it takes the elements of the iterable b (which is a tuple in this case) and unpacks them as separate, positional arguments to the function.
Let's use our example values:

When b is (2, 64) (from the first iteration):

self.block(*b, ...) effectively becomes self.block(2, 64, ...).
So, num_residuals inside self.block will be 2, and num_channels will be 64.
When b is (2, 128) (from the second iteration):

self.block(*b, ...) effectively becomes self.block(2, 128, ...).
So, num_residuals inside self.block will be 2, and num_channels will be 128.
In essence, enumerate provides both the count (i) and the architectural tuple (b), and the *b syntax conveniently passes the components of that tuple directly as arguments to the block method, allowing you to define the number of residual layers and channels for each stage of your ResNet.
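The loop can be traced in plain Python. In this sketch, block is a hypothetical stand-in that returns the arguments it receives instead of building layers, so the output shows exactly what each call self.block(*b, first_block=(i==0)) would be passed for the ResNet18 configuration.

```python
def block(num_residuals, num_channels, first_block=False):
    """Hypothetical stand-in for ResNet.block: report the arguments instead of building layers."""
    return (num_residuals, num_channels, first_block)

arch = ((2, 64), (2, 128), (2, 256), (2, 512))  # the ResNet18 configuration

calls = []
for i, b in enumerate(arch):
    # *b unpacks (num_residuals, num_channels) into two positional arguments
    calls.append((f'b{i+2}', block(*b, first_block=(i == 0))))

for name, args in calls:
    print(name, args)
# b2 (2, 64, True)
# b3 (2, 128, False)
# b4 (2, 256, False)
# b5 (2, 512, False)
```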

In [6]:
class ResNet(nn.Module):
    def __init__(self, arch, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(self.b1())
        for i, b in enumerate(arch):
            self.net.add_module(f'b{i+2}', self.block(*b, first_block=(i==0)))
        self.net.add_module('last', nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
            nn.LazyLinear(num_classes)))

    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    def block(self, num_residuals, num_channels, first_block=False):
        blk = []
        for i in range(num_residuals):
            if i == 0 and not first_block:
                blk.append(Residual(num_channels, use_1x1conv=True, strides=2))
            else:
                blk.append(Residual(num_channels))
        return nn.Sequential(*blk)

    def forward(self, X):
        return self.net(X)

In [7]:
class ResNet18(ResNet):
    def __init__(self, num_classes=10):
        super().__init__(((2, 64), (2, 128), (2, 256), (2, 512)),
                       num_classes)

In [8]:
model = ResNet18()
model.to(device)

ResNet18(
  (net): Sequential(
    (0): Sequential(
      (0): LazyConv2d(0, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
      (1): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    )
    (b2): Sequential(
      (0): Residual(
        (conv1): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (bn2): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): Residual(
        (conv1): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): LazyConv2d(0, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (bn1): LazyBatchNorm2d(0, eps=1e-05, momentu

In [9]:
batch_size = 256

In [10]:
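Transform = transforms.Compose([
    transforms.Resize((227, 227)),  # upsample the 28x28 images to 227x227
    transforms.ToTensor(),          # PIL image -> float tensor in [0, 1], shape (1, H, W)
    # mean/std commonly used for the original MNIST digits, reused here for FashionMNIST
    transforms.Normalize((0.1307,), (0.3081,))
])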

In [11]:
mnist_train = datasets.FashionMNIST(root="../data", train=True, transform=Transform, download=True)
mnist_val = datasets.FashionMNIST(root="../data", train=False, transform=Transform, download=True)

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=2)
val_iter = data.DataLoader(mnist_val, batch_size, shuffle=False, num_workers=2)

100%|██████████| 26.4M/26.4M [00:00<00:00, 111MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 3.69MB/s]
100%|██████████| 4.42M/4.42M [00:00<00:00, 58.3MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 4.00MB/s]


In [12]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [13]:
max_epochs = 3

In [14]:
for epoch in range(max_epochs):
    model.train()
    train_loss_sum, train_accuracy_sum, n = 0.0, 0.0, 0
    for images, labels in train_iter:
        images, labels = images.to(device), labels.to(device)
        y_pred = model(images)
        l = criterion(y_pred, labels)  # mean cross-entropy over the batch
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
        # .item() extracts a Python float; adding the tensor itself would chain
        # every batch's loss into one ever-growing autograd graph
        train_loss_sum += l.item()
        predicted_labels = torch.argmax(y_pred, dim=1)
        train_accuracy_sum += (predicted_labels == labels).float().sum().item()
        n += labels.numel()

    model.eval()
    test_accuracy_sum, test_n = 0.0, 0
    with torch.no_grad():
        for images, labels in val_iter:
            images, labels = images.to(device), labels.to(device)
            y_pred = model(images)
            predicted_labels = torch.argmax(y_pred, dim=1)
            test_accuracy_sum += (predicted_labels == labels).float().sum().item()
            test_n += labels.numel()
    test_accuracy = test_accuracy_sum / test_n
    # NOTE: train_loss_sum adds up per-batch *mean* losses, so dividing it by the
    # sample count n reports roughly (mean loss / batch_size), hence the small values
    print(f'Epoch {epoch + 1}, Loss: {train_loss_sum / n:.4f}, Train Accuracy: {train_accuracy_sum / n:.4f}, Validation Accuracy: {test_accuracy:.4f}')

Epoch 1, Loss: 0.0022, Train Accuracy: 0.7973, Validation Accuracy: 0.8231
Epoch 2, Loss: 0.0012, Train Accuracy: 0.8914, Validation Accuracy: 0.8833
Epoch 3, Loss: 0.0009, Train Accuracy: 0.9132, Validation Accuracy: 0.9045


In [4]:
class ResNeXtBlock(nn.Module):
    """The ResNeXt block."""
    def __init__(self, num_channels, groups, bot_mul, use_1x1conv=False,
                 strides=1):
        super().__init__()
        bot_channels = int(round(num_channels * bot_mul))
        self.conv1 = nn.LazyConv2d(bot_channels, kernel_size=1, stride=1)
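        # `groups` here is the number of channels per group, so this 3x3 conv
        # splits bot_channels into bot_channels // groups convolution groups
        self.conv2 = nn.LazyConv2d(bot_channels, kernel_size=3,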
                                   stride=strides, padding=1,
                                   groups=bot_channels//groups)
        self.conv3 = nn.LazyConv2d(num_channels, kernel_size=1, stride=1)
        self.bn1 = nn.LazyBatchNorm2d()
        self.bn2 = nn.LazyBatchNorm2d()
        self.bn3 = nn.LazyBatchNorm2d()
        if use_1x1conv:
            self.conv4 = nn.LazyConv2d(num_channels, kernel_size=1,
                                       stride=strides)
            self.bn4 = nn.LazyBatchNorm2d()
        else:
            self.conv4 = None

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = F.relu(self.bn2(self.conv2(Y)))
        Y = self.bn3(self.conv3(Y))
        if self.conv4:
            X = self.bn4(self.conv4(X))
        return F.relu(Y + X)

In [5]:
blk = ResNeXtBlock(32, 16, 1)
X = torch.randn(4, 32, 96, 96)
blk(X).shape

torch.Size([4, 32, 96, 96])

In [8]:
class ResNeXt(nn.Module):
    def __init__(self, arch, num_classes=10, groups=32, bot_mul=2):
        super().__init__()
        self.net = nn.Sequential(self.b1())
        for i, b in enumerate(arch):
            # Pass groups and bot_mul to the block method
            self.net.add_module(f'b{i+2}', self.block(*b, groups=groups, bot_mul=bot_mul, first_block=(i==0)))
        self.net.add_module('last', nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
            nn.LazyLinear(num_classes)))

    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    def block(self, num_residuals, num_channels, groups, bot_mul, first_block=False):
        blk = []
        for i in range(num_residuals):
            if i == 0 and not first_block:
                # Pass groups and bot_mul to ResNeXtBlock
                blk.append(ResNeXtBlock(num_channels, groups, bot_mul, use_1x1conv=True, strides=2))
            else:
                # Pass groups and bot_mul to ResNeXtBlock
                blk.append(ResNeXtBlock(num_channels, groups, bot_mul))
        return nn.Sequential(*blk)

    def forward(self, X):
        return self.net(X)

In [9]:
class ResNext18(ResNeXt):
    def __init__(self, num_classes=10):
        super().__init__(((2, 64), (2, 128), (2, 256), (2, 512)),
                       num_classes)

In [11]:
model = ResNext18()