<a href="https://colab.research.google.com/github/Redcoder815/Deep_Learning_PyTorch/blob/main/23DenseNet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
from torch import nn
import torchvision.transforms as transforms
from torch.utils import data
from torchvision import datasets
import torch.optim as optim

In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

Dense block

In [3]:
def conv_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=3, padding=1))

In [4]:
class DenseBlock(nn.Module):
    def __init__(self, num_convs, num_channels):
        super().__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = torch.cat((X, Y), dim=1)
        return X

With num_convs=2 and num_channels=10, a DenseBlock turns a 3-channel input into a 23-channel output, because forward concatenates the input and output of each conv_block along the channel dimension. Tracing the channel count for an input X = torch.randn(4, 3, 8, 8):

1. Initial input: X has 3 channels.
2. First conv_block: receives X (3 channels) and outputs num_channels = 10 channels.
3. First concatenation: X (3 channels) is joined with that output (10 channels), giving 3 + 10 = 13 channels.
4. Second conv_block: receives the new X (13 channels) and again outputs 10 channels.
5. Second concatenation: X (13 channels) is joined with that output (10 channels), giving 13 + 10 = 23 channels.

In general, a dense block with c_in input channels produces c_in + num_convs * num_channels output channels.
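The channel bookkeeping traced above can be sketched in plain Python, without building any layers. The helper name dense_block_channels is hypothetical and not part of the notebook; it only mirrors the arithmetic of DenseBlock.forward.

```python
def dense_block_channels(in_channels, num_convs, growth_rate):
    """Channel count of X before the first conv_block and after each concatenation.

    Hypothetical helper: it mirrors the arithmetic of DenseBlock.forward only.
    """
    counts = [in_channels]
    for _ in range(num_convs):
        # Each conv_block emits growth_rate channels, which are appended to X.
        counts.append(counts[-1] + growth_rate)
    return counts


print(dense_block_channels(3, 2, 10))  # [3, 13, 23]
```

The last entry is the number of channels the dense block returns, here 23.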

The key operation in forward is torch.cat((X, Y), dim=1):

- torch.cat() concatenates (joins) tensors along a specified dimension.
- (X, Y) are the two tensors being joined. X is the input accumulated so far (initially the block's input), and Y is the output of the latest conv_block.
- dim=1 selects the channel dimension. Feature maps are laid out as (batch_size, channels, height, width), so concatenating along dim=1 adds the channel counts of X and Y, while batch size, height and width must match and stay unchanged.

The same trace in terms of shapes, for a 3-channel input and num_channels=10:

1. Initial state: X is (batch_size, 3, H, W).
2. First conv_block: Y = blk(X) is (batch_size, 10, H, W).
3. First concatenation: X = torch.cat((X, Y), dim=1) is (batch_size, 13, H, W), since 3 + 10 = 13.
4. Second conv_block: blk receives the 13-channel X, and Y = blk(X) is again (batch_size, 10, H, W) because num_channels is 10.
5. Second concatenation: X = torch.cat((X, Y), dim=1) is (batch_size, 23, H, W), since 13 + 10 = 23.

Each conv_block therefore appends its output channels to the existing input channels, so the channel count grows additively with every step.
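The shape rule for concatenation can be written out as a small pure-Python sketch. The function cat_shape is a hypothetical helper, not a PyTorch API; it computes only the result shape that joining tensors along one dimension would have and rejects inputs whose other dimensions differ.

```python
def cat_shape(shapes, dim):
    """Shape of the result of concatenating tensors of the given shapes along dim.

    Hypothetical helper that models the shape rule only; it moves no data.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all tensors must have the same number of dimensions")
        for axis, (a, b) in enumerate(zip(first, shape)):
            # Every axis except the concatenation axis must agree exactly.
            if axis != dim and a != b:
                raise ValueError(f"size mismatch on axis {axis}: {a} vs {b}")
    out = list(first)
    # Sizes along the concatenation axis are summed.
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


x = (4, 3, 8, 8)
y = (4, 10, 8, 8)
x = cat_shape([x, y], dim=1)
print(x)  # (4, 13, 8, 8)
x = cat_shape([x, y], dim=1)
print(x)  # (4, 23, 8, 8)
```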

In [5]:
blk = DenseBlock(2, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape

torch.Size([4, 23, 8, 8])

Transition layer

In [6]:
def transition_block(num_channels):
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))

Applying transition_block(10) to the dense block output Y yields the shape [4, 10, 4, 4]. Layer by layer:

1. Input: Y has shape [4, 23, 8, 8] (batch size 4, 23 channels, 8x8 spatial dimensions).
2. nn.LazyBatchNorm2d() and nn.ReLU(): normalization and activation change neither the channel count nor the spatial dimensions, so the shape stays [4, 23, 8, 8].
3. nn.LazyConv2d(num_channels, kernel_size=1): a 1x1 convolution with num_channels=10 sets the number of output channels to 10. With kernel_size=1 and the default stride (1) and padding (0), height and width are unchanged, so the shape becomes [4, 10, 8, 8].
4. nn.AvgPool2d(kernel_size=2, stride=2): average pooling over 2x2 windows with stride 2 halves the height and width and leaves the channel count unchanged, so 8x8 becomes (8/2)x(8/2) = 4x4 and the final shape is [4, 10, 4, 4].

In short, the 1x1 convolution reduces the channels from 23 to 10, and the average pooling reduces the spatial dimensions from 8x8 to 4x4.
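The shape changes described above follow the standard output-size formula for convolution and pooling windows, floor((size + 2 * padding - kernel_size) / stride) + 1. A minimal sketch, assuming dilation 1 and no ceil mode; conv_out_size and transition_shape are hypothetical helper names used only for illustration.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Output length of a conv/pool window along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def transition_shape(shape, num_channels):
    """Shape after BatchNorm -> ReLU -> 1x1 conv -> 2x2 average pooling with stride 2."""
    batch, _, height, width = shape
    # BatchNorm and ReLU leave the shape untouched.
    # The 1x1 convolution sets the channel count and keeps height and width.
    height, width = conv_out_size(height, 1), conv_out_size(width, 1)
    # Average pooling with kernel_size=2, stride=2 halves height and width.
    height = conv_out_size(height, 2, stride=2)
    width = conv_out_size(width, 2, stride=2)
    return (batch, num_channels, height, width)


print(transition_shape((4, 23, 8, 8), 10))  # (4, 10, 4, 4)
```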

In [7]:
blk = transition_block(10)
blk(Y).shape

torch.Size([4, 10, 4, 4])

In [8]:
class DenseNet(nn.Module):
    def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
                 num_classes=10):
        super().__init__()
        self.net = nn.Sequential(self.b1())
        for i, num_convs in enumerate(arch):
            self.net.add_module(f'dense_blk{i+1}', DenseBlock(num_convs,
                                                              growth_rate))
            # The number of output channels in the previous dense block
            num_channels += num_convs * growth_rate
            # A transition layer that halves the number of channels is added
            # between the dense blocks
            if i != len(arch) - 1:
                num_channels //= 2
                self.net.add_module(f'tran_blk{i+1}', transition_block(
                    num_channels))
        self.net.add_module('last', nn.Sequential(
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
            nn.LazyLinear(num_classes)))

    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    def forward(self, X):
        return self.net(X)
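The channel bookkeeping in DenseNet.__init__ can be replayed without constructing any layers. This is a sketch under the class defaults (num_channels=64, growth_rate=32, arch=(4, 4, 4, 4)); densenet_channels is a hypothetical helper that copies only the arithmetic of the constructor loop.

```python
def densenet_channels(num_channels=64, growth_rate=32, arch=(4, 4, 4, 4)):
    """Replay the channel arithmetic of DenseNet.__init__ as (stage, channels) pairs."""
    trace = [("b1", num_channels)]
    for i, num_convs in enumerate(arch):
        # A dense block adds growth_rate channels per conv_block.
        num_channels += num_convs * growth_rate
        trace.append((f"dense_blk{i+1}", num_channels))
        if i != len(arch) - 1:
            # A transition layer halves the channel count between dense blocks.
            num_channels //= 2
            trace.append((f"tran_blk{i+1}", num_channels))
    return trace


for stage, channels in densenet_channels():
    print(stage, channels)
# b1 64
# dense_blk1 192
# tran_blk1 96
# dense_blk2 224
# tran_blk2 112
# dense_blk3 240
# tran_blk3 120
# dense_blk4 248
```

Under these defaults the final dense block outputs 248 channels, which is the number of features reaching the last linear layer after global average pooling and flattening.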

In [9]:
model = DenseNet()
model.to(device)

DenseNet(
  (net): Sequential(
    (0): Sequential(
      (0): LazyConv2d(0, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
      (1): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    )
    (dense_blk1): DenseBlock(
      (net): Sequential(
        (0): Sequential(
          (0): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU()
          (2): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (1): Sequential(
          (0): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU()
          (2): LazyConv2d(0, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (2): Sequential(
          (0): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        

In [10]:
batch_size = 256

In [11]:
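Transform = transforms.Compose([
    # Upsample the 28x28 Fashion-MNIST images to 227x227
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
    # 0.1307 / 0.3081 are the mean / std usually quoted for MNIST;
    # Fashion-MNIST's own statistics are roughly 0.286 / 0.353
    transforms.Normalize((0.1307,), (0.3081,))
])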

In [12]:
mnist_train = datasets.FashionMNIST(root="../data", train=True, transform=Transform, download=True)
mnist_val = datasets.FashionMNIST(root="../data", train=False, transform=Transform, download=True)

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=2)
val_iter = data.DataLoader(mnist_val, batch_size, shuffle=False, num_workers=2)


In [13]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [14]:
max_epochs = 3

In [15]:
for epoch in range(max_epochs):
    model.train()
    train_loss_sum, train_accuracy_sum, n = 0.0, 0.0, 0
    for images, labels in train_iter:
        images, labels = images.to(device), labels.to(device)
        y_pred = model(images)
        l = criterion(y_pred, labels)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
        # .item() turns the scalar tensors into plain floats for the running sums.
        # l is already the mean over the batch, so train_loss_sum / n below is
        # roughly the mean loss divided by batch_size, not a per-sample loss.
        train_loss_sum += l.item()
        predicted_labels = torch.argmax(y_pred, dim=1)
        train_accuracy_sum += (predicted_labels == labels).float().sum().item()
        n += labels.numel()

    # Evaluate on the validation split without tracking gradients
    model.eval()
    test_accuracy_sum, test_n = 0.0, 0
    with torch.no_grad():
        for images, labels in val_iter:
            images, labels = images.to(device), labels.to(device)
            y_pred = model(images)
            predicted_labels = torch.argmax(y_pred, dim=1)
            test_accuracy_sum += (predicted_labels == labels).float().sum().item()
            test_n += labels.numel()
    test_accuracy = test_accuracy_sum / test_n
    print(f'Epoch {epoch + 1}, Loss: {train_loss_sum / n:.4f}, Train Accuracy: {train_accuracy_sum / n:.4f}, Validation Accuracy: {test_accuracy:.4f}')

Epoch 1, Loss: 0.0022, Train Accuracy: 0.8059, Validation Accuracy: 0.7739
Epoch 2, Loss: 0.0013, Train Accuracy: 0.8827, Validation Accuracy: 0.8632
Epoch 3, Loss: 0.0010, Train Accuracy: 0.9028, Validation Accuracy: 0.8713
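The Loss column is small because nn.CrossEntropyLoss() averages over each batch by default (reduction='mean'), and the loop then divides the sum of those batch means by the number of samples. The sketch below contrasts that quantity with a per-sample average. The helper loss_summaries and the batch losses fed to it are made up for illustration and are not values from the run above.

```python
def loss_summaries(batch_mean_losses, batch_sizes):
    """Return (sum of batch means / n, sample-weighted mean loss)."""
    n = sum(batch_sizes)
    # What the training loop prints: batch means summed, then divided by n.
    sum_of_means_over_n = sum(batch_mean_losses) / n
    # Per-sample average: weight every batch mean by its batch size first.
    per_sample = sum(l * b for l, b in zip(batch_mean_losses, batch_sizes)) / n
    return sum_of_means_over_n, per_sample


# Hypothetical batch-mean losses for three full batches of 256 samples
reported, per_sample = loss_summaries([0.6, 0.5, 0.4], [256, 256, 256])
print(f"{reported:.4f} {per_sample:.4f}")  # 0.0020 0.5000
```

With equal batch sizes the two numbers differ by exactly a factor of batch_size, so multiplying the printed loss by 256 gives an approximate per-sample cross-entropy.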
