# Deep Learning

# Tutorial 17: ResNet architecture

In this tutorial, we will cover:

- The ResNet (residual network) architecture for deep neural networks, introduced in 2015

Prerequisites:

- Python, Tensor basics, PyTorch

My contact:

- Niklas Beuter (niklas.beuter@th-luebeck.de)

Course:

- Slides and notebooks will be available at https://lernraum.th-luebeck.de/course/view.php?id=5383

## Expected Outcomes
* Understand the basic components of neural networks: layers, neurons, weights, biases, activations, and loss functions.
* Gain hands-on experience with the computational aspects of setting up neural networks, including training and usage.
* Learn how to add layers with correct sizes to a deep neural network.

# Introduction to ResNet

[ResNet](https://arxiv.org/abs/1512.03385), short for Residual Network, is a type of convolutional neural network (CNN) introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their 2015 paper "Deep Residual Learning for Image Recognition" and entered in the 2015 ImageNet competition. ResNet changed how deep neural networks are constructed and made it practical to train much deeper networks than were previously feasible.

## Architecture Overview

The primary innovation in ResNet is the introduction of "residual blocks." These blocks allow signals to propagate directly from earlier layers to later layers through what is known as a "skip connection" or "shortcut connection." A typical residual block has two main paths: the main path, where data goes through convolutional layers, and the shortcut path, which skips these layers.

The formula for a basic Residual block can be represented as:

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
$$

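Here, $\mathbf{x}$ and $\mathbf{y}$ are the input and output of the block. The function $\mathcal{F}(\mathbf{x}, \{W_i\})$ is the residual mapping to be learned, with $\{W_i\}$ the weights of the layers on the main path. The addition is element-wise, so $\mathcal{F}(\mathbf{x})$ and $\mathbf{x}$ must have the same shape; when they differ, the shortcut applies a projection to $\mathbf{x}$ first.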
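To make the formula concrete, here is a minimal sketch in plain Python without any framework. The names `residual_block`, `zero_residual`, and `toy_residual` are hypothetical and only serve the illustration. The sketch shows that a block whose residual branch outputs zeros reduces to the identity mapping.

```python
def residual_block(x, residual_fn):
    """Return y = F(x) + x for a list of floats x."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x for the addition")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    # A residual branch that contributes nothing: F(x) = 0.
    return [0.0 for _ in x]


def toy_residual(x):
    # Hypothetical residual branch: a small correction of 10% of the input.
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)   # same values as x
corrected_out = residual_block(x, toy_residual)   # x plus a small correction
print(identity_out)
print(corrected_out)
```

The block only has to learn the correction on top of the input, which is the sense in which the mapping is "residual".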

## Key Features of ResNet

- **Deeper Networks**: By using residual blocks, ResNets can effectively train networks with much greater depth (over 100 layers) without the loss in accuracy that plain networks show when more layers are added. The paper calls this the degradation problem.
- **Improved Training**: The skip connections in ResNet facilitate the flow of gradients throughout the network, which improves the convergence during training.
- **Versatility**: ResNet has proven to be highly effective for a wide range of applications beyond image classification, including object detection and segmentation.
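The claim about gradient flow can be illustrated with a toy calculation. The sketch below is a strong simplification: it assumes a chain of scalar layers that all share the same local derivative and it ignores activations and normalization. The function name `end_to_end_gradient` is hypothetical. For a plain layer the derivative is $\mathcal{F}'(x)$, while for a residual layer it is $1 + \mathcal{F}'(x)$ because of the added identity.

```python
def end_to_end_gradient(depth, local_derivative, residual):
    """Product of the per-layer derivatives along a chain of scalar layers."""
    grad = 1.0
    for _ in range(depth):
        # Plain layer:    dy/dx = F'(x)
        # Residual layer: dy/dx = 1 + F'(x), the 1 comes from the skip connection
        grad *= (1.0 + local_derivative) if residual else local_derivative
    return grad


plain = end_to_end_gradient(depth=50, local_derivative=0.01, residual=False)
skip = end_to_end_gradient(depth=50, local_derivative=0.01, residual=True)
print(f"plain chain:    {plain:.3e}")
print(f"residual chain: {skip:.3f}")
```

In this toy setting the gradient of the plain chain shrinks towards zero, while the residual chain keeps it on the order of one.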

## Training

ResNet architectures are often trained using stochastic gradient descent (SGD) with momentum and weight decay. Batch normalization is applied after every convolution in a residual block, which stabilizes and accelerates training.
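As a sketch of what SGD with momentum and weight decay does to a single scalar weight, the following plain Python function follows the update that `torch.optim.SGD` documents for its default settings (no dampening, no Nesterov momentum). The function name and the example values are assumptions made for illustration.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD update for a scalar weight w."""
    g = grad + weight_decay * w          # weight decay adds an L2 penalty gradient
    velocity = momentum * velocity + g   # running accumulation of past gradients
    w = w - lr * velocity                # step against the accumulated direction
    return w, velocity


w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
print(w, v)
```

With momentum, consecutive gradients that point in the same direction build up a larger step, while weight decay continuously pulls the weights towards zero.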

## Impact and Legacy

ResNet was the winner of the ILSVRC 2015 classification task, and its design has influenced a wide array of subsequent deep learning architectures. Its ability to enable the training of networks that are significantly deeper than those previously possible has made it a standard choice for many computer vision tasks.

ResNet has several popular variants, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, which differ in the number of layers. Each variant trades computational cost against predictive power, so the right choice depends on the available resources and the accuracy requirements.
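The numbers in the variant names can be reproduced from the block configuration given in the paper. The sketch below counts the weighted layers: the first convolution, the convolutions on the main path of every block (two in a basic block, three in a bottleneck block), and the final fully connected layer. Projection shortcuts are not counted. The dictionary and function names are hypothetical.

```python
# Blocks per stage as listed in the ResNet paper
STAGE_BLOCKS = {
    "ResNet-18": ("basic", [2, 2, 2, 2]),
    "ResNet-34": ("basic", [3, 4, 6, 3]),
    "ResNet-50": ("bottleneck", [3, 4, 6, 3]),
    "ResNet-101": ("bottleneck", [3, 4, 23, 3]),
    "ResNet-152": ("bottleneck", [3, 8, 36, 3]),
}
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}


def weighted_layers(block_type, blocks):
    # first convolution + convolutions in all blocks + fully connected layer
    return 1 + CONVS_PER_BLOCK[block_type] * sum(blocks) + 1


depths = {name: weighted_layers(*cfg) for name, cfg in STAGE_BLOCKS.items()}
for name, depth in depths.items():
    print(name, depth)
```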

## Conclusion

ResNet's introduction of residual learning fundamentally changed the landscape of deep neural networks by demonstrating that with proper architectural considerations, deeper networks can be trained effectively. Its legacy continues to impact the design of new architectures aiming for greater depths and improved performance.


# Let us implement ResNet

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [2]:
# Loading CIFAR10 dataset
transform = transforms.Compose([
    transforms.Resize(224),  # Resize the 32x32 CIFAR10 images to ResNet's usual 224x224 input size
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified
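As a small illustration of the `Normalize` step in the transform above: `ToTensor` scales pixel values to the range [0, 1], and `Normalize` then computes `(value - mean) / std` per channel. With mean 0.5 and standard deviation 0.5, this maps the range to [-1, 1]. The helper below is a plain Python sketch for a single value, not the torchvision implementation.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-value version of what Normalize applies to each channel."""
    return (pixel - mean) / std


lo, mid, hi = normalize(0.0), normalize(0.5), normalize(1.0)
print(lo, mid, hi)
```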


## The basic block of ResNet

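The basic block consists of two 3x3 convolutions, each followed by BatchNorm, plus the shortcut. The shortcut is added to the output of the second BatchNorm before the final ReLU. If the block changes the spatial size (stride 2) or the number of channels, the shortcut uses a 1x1 convolution with BatchNorm so that both tensors have the same shape.

The ResNet itself stacks these blocks into four stages (`layer1` to `layer4`) with 64, 128, 256, and 512 channels. Note that this implementation starts with a single 3x3 convolution without max pooling, which differs from the 7x7 stride-2 stem of the original network. The full architecture can be found in the paper.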
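Getting the layer sizes right comes down to the standard output-size formula for a convolution, `floor((n + 2p - k) / s) + 1`. The sketch below, with the hypothetical helper `conv_out_size`, checks that the main path and the shortcut of a downsampling block produce the same spatial size, and traces the feature-map size through four stages with strides 1, 2, 2, 2 for a 224x224 input.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Downsampling block: 3x3 conv (stride 2), then 3x3 conv (stride 1) on the main path,
# and a 1x1 conv (stride 2, no padding) on the shortcut.
main_size = conv_out_size(conv_out_size(224, kernel=3, stride=2, padding=1),
                          kernel=3, stride=1, padding=1)
shortcut_size = conv_out_size(224, kernel=1, stride=2, padding=0)
print(main_size, shortcut_size)

# Feature-map size after each stage; only the first conv of a stage uses the stride.
stage_sizes = []
size = 224
for stride in (1, 2, 2, 2):
    size = conv_out_size(size, kernel=3, stride=stride, padding=1)
    stage_sizes.append(size)
print(stage_sizes)
```

Because both paths end up with the same size, the element-wise addition in the block is well defined.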

In [3]:
class BasicBlock(nn.Module):
    # Basic blocks keep the channel count, so the expansion factor is 1
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        # Main path: two 3x3 convolutions, each followed by BatchNorm.
        # Only the first convolution may downsample (stride > 1).
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        # Shortcut path: identity by default. If the spatial size or the channel
        # count changes, use a 1x1 convolution with BatchNorm so the shapes match.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # y = F(x) + x
        out = torch.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        # Only the first block of a stage uses the given stride, the others use stride 1
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for s in strides:
            layers.append(block(self.in_planes, planes, s))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = nn.functional.adaptive_avg_pool2d(out, (1, 1))  # global average pooling
        out = torch.flatten(out, 1)  # (N, C, 1, 1) -> (N, C)
        out = self.linear(out)
        return out

def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])
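To get a feeling for the size of this network, the following plain Python sketch counts its learnable parameters from the layer sizes alone. It mirrors the structure of the `ResNet` and `BasicBlock` classes above (3x3 stem, four stages, convolutions without bias). The helper names are hypothetical, and the running statistics of BatchNorm are not counted because they are buffers rather than parameters.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k  # convolutions are created with bias=False


def bn_params(c):
    return 2 * c  # one scale and one shift per channel


def basic_block_params(c_in, c_out, stride):
    n = conv_params(c_in, c_out, 3) + bn_params(c_out)
    n += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if stride != 1 or c_in != c_out:
        # projection shortcut: 1x1 convolution plus BatchNorm
        n += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return n


def resnet_params(num_blocks=(2, 2, 2, 2), num_classes=10):
    total = conv_params(3, 64, 3) + bn_params(64)  # stem
    c_in = 64
    for c_out, n_blocks, stride in zip((64, 128, 256, 512), num_blocks, (1, 2, 2, 2)):
        for s in [stride] + [1] * (n_blocks - 1):
            total += basic_block_params(c_in, c_out, s)
            c_in = c_out
    total += 512 * num_classes + num_classes  # final linear layer with bias
    return total


total = resnet_params()
print(f"{total:,} parameters")
```

If PyTorch is available, the result can be compared with `sum(p.numel() for p in ResNet18().parameters())`.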

In [4]:
# Training loop (simplified)
def train(model, device, train_loader, criterion, optimizer, epochs):
    model.train()  # set model in training mode
    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            optimizer.zero_grad()  # clear the gradients of the previous step
            loss.backward()        # backpropagate the loss
            optimizer.step()       # update the parameters

        # Note: this is the loss of the last mini-batch only, so the value is noisy
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

In [5]:
def initialize_weights(m):
    if isinstance(m, nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        torch.nn.init.constant_(m.weight, 1)
        torch.nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        torch.nn.init.constant_(m.bias, 0)
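For intuition about what `kaiming_normal_` with `mode='fan_out'` and `nonlinearity='relu'` does: the weights are drawn from a normal distribution with mean 0 and standard deviation `gain / sqrt(fan_out)`, where the gain for ReLU is `sqrt(2)`. The sketch below computes this standard deviation for two layers of the network. The helper name is hypothetical, and the fan-out values follow from the weight shapes (`out_channels * kH * kW` for a convolution, `out_features` for a linear layer).

```python
import math


def kaiming_normal_std(fan, gain=math.sqrt(2.0)):
    """Standard deviation used by Kaiming normal initialization."""
    return gain / math.sqrt(fan)


conv_fan_out = 64 * 3 * 3   # Conv2d(64, 64, kernel_size=3)
linear_fan_out = 10         # Linear(512, 10)

conv_std = kaiming_normal_std(conv_fan_out)
linear_std = kaiming_normal_std(linear_fan_out)
print(f"conv std:   {conv_std:.4f}")
print(f"linear std: {linear_std:.4f}")
```

Layers with a larger fan-out receive smaller initial weights, which keeps the scale of the signal roughly constant from layer to layer.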

In [6]:
def evaluate_model(model, test_loader):
    model.eval()  # Set model to evaluation mode
    device = next(model.parameters()).device  # evaluate on the device the model lives on
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)  # class with the highest score per image
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    print(f'Accuracy: {accuracy:.2f}%')

In [12]:
# Define the model
net = ResNet18()

# Initialize weights
net.apply(initialize_weights)
net.to(device)  # Move the model to the appropriate device

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.5, weight_decay=5e-4)

In [13]:
# Train the model
train(net, device, trainloader, criterion, optimizer, epochs=10)
evaluate_model(net, testloader)

Epoch [1/10], Loss: 1.7727
Epoch [2/10], Loss: 1.2070
Epoch [3/10], Loss: 1.4251
Epoch [4/10], Loss: 0.8442
Epoch [5/10], Loss: 1.6031
Epoch [6/10], Loss: 1.0799
Epoch [7/10], Loss: 1.0879
Epoch [8/10], Loss: 1.0204
Epoch [9/10], Loss: 0.7621
Epoch [10/10], Loss: 0.9608
Accuracy: 69.68%


In [9]:
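# Note: the output below carries execution count 9, so it appears to have been
# recorded before the training run above (counts 12 and 13)
evaluate_model(net, testloader)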

Accuracy: 60.90%


## Standard PyTorch Model Implementation of ResNet

In [None]:
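# Load the pretrained ResNet18 model (ImageNet weights).
# torchvision >= 0.13 uses the `weights` argument; `pretrained=True` is deprecated.
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)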

# Modify the final layer to fit CIFAR10 (10 classes)
net.fc = nn.Linear(net.fc.in_features, 10)
net.to(device)  # Move the model to the appropriate device

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

In [None]:
# Train the model
train(net, device, trainloader, criterion, optimizer, epochs=2)
evaluate_model(net, testloader)

## Load the standard model with pretrained weights and fine-tune it

Freeze all pretrained parameters and train only the new fully connected layer at the end. Training still needs ~40 minutes on a good GPU.
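To see how much smaller the training problem becomes, the sketch below compares the number of parameters in the new fully connected layer with the number in the last residual stage (`layer4`) of ResNet18, using only the layer sizes. It assumes the standard ResNet18 layout (512 input features for the final layer, two basic blocks in `layer4`, the first with a 1x1 projection from 256 to 512 channels). The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k  # no bias in ResNet convolutions


def bn_params(c):
    return 2 * c  # scale and shift per channel


# New classification head: Linear(512, 10) with bias
fc_params = 512 * 10 + 10

# layer4, block 0: two 3x3 convs plus a 1x1 projection shortcut, each with BatchNorm
block0 = (conv_params(256, 512, 3) + conv_params(512, 512, 3)
          + conv_params(256, 512, 1) + 3 * bn_params(512))
# layer4, block 1: two 3x3 convs with BatchNorm, identity shortcut
block1 = 2 * conv_params(512, 512, 3) + 2 * bn_params(512)
layer4_params = block0 + block1

print(f"fc only:      {fc_params:,}")
print(f"fc + layer4:  {fc_params + layer4_params:,}")
```

Training only the head updates a few thousand weights, while also fine-tuning the last stage would update several million.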

In [14]:
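# Load the pretrained ResNet18 model (ImageNet weights).
# torchvision >= 0.13 uses the `weights` argument; `pretrained=True` is deprecated.
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)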
net.fc = nn.Linear(net.fc.in_features, 10)

print(net)



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [7]:
# Freeze all parameters in the model
for param in net.parameters():
    param.requires_grad = False

# Unfreeze the parameters in the last fully connected layer
for param in net.fc.parameters():
    param.requires_grad = True

# Note: to fine-tune the last feature stage as well, unfreeze net.layer4 here
# AND pass its parameters to the optimizer below. Unfreezing it without doing
# so only computes gradients that are never used.

net.to(device)  # Move the model to the appropriate device

# Define the optimizer so that it only updates the last fully connected layer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.01)

In [8]:
# Train the model
train(net, device, trainloader, criterion, optimizer, epochs=2)
evaluate_model(net, testloader)

Epoch [1/2], Loss: 0.7819
Epoch [2/2], Loss: 0.4660
Accuracy: 78.35%


# References

This notebook uses the following sources:
* [Dive into deep learning](https://d2l.ai/)