# **CNN Tutorial, Fall 2022**


# A Convolutional ResNet and Residual Blocks

Note that this example does not implement a very deep ResNet of the kind described in the literature; rather, it illustrates how the residual blocks described in He et al. [1] can be implemented in PyTorch.

- [1] He, Kaiming, et al. "Deep residual learning for image recognition." *Proceedings of the IEEE conference on computer vision and pattern recognition*. 2016.

## Imports

In [1]:
import time
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms

## Settings and Dataset

In [2]:
print(torch.cuda.is_available())

False


In [3]:
##########################
### SETTINGS
##########################

# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hyperparameters
random_seed = 123
learning_rate = 0.01
num_epochs = 5
batch_size = 128

# Architecture
num_classes = 10


##########################
### MNIST DATASET
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.MNIST(root='data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data', 
                              train=False, 
                              transform=transforms.ToTensor())


train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size, 
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=batch_size, 
                         shuffle=False)

# Checking the dataset
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break

Image batch dimensions: torch.Size([128, 1, 28, 28])
Image label dimensions: torch.Size([128])


## ResNet with identity blocks

The following code implements the residual blocks with skip connections such that the input passed via the shortcut matches the dimensions of the main path's output, which allows the network to learn identity functions. Such a residual block is illustrated below:

![](./2-resnet-ex/resnet-ex-1-1.png)
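
To see why matching dimensions let a block fall back to the identity function, here is a minimal sketch in plain Python (no PyTorch). The names `residual_block` and `zero_main_path` are hypothetical and used only for illustration: `zero_main_path` stands in for a main path whose output is all zeros, as would happen if the scale and shift of its final batch-norm layer were both zero. Because pixel values scaled by `transforms.ToTensor()` are non-negative, the final ReLU then passes the shortcut through unchanged.

```python
def relu(values):
    """Element-wise ReLU on a flat list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, main_path):
    """Compute relu(main_path(x) + x), mirroring the forward pass of one block."""
    shortcut = x
    out = main_path(x)
    # The element-wise sum only works because both lists have the same length,
    # which is the plain-Python analogue of matching tensor dimensions.
    return relu([o + s for o, s in zip(out, shortcut)])


def zero_main_path(x):
    """Hypothetical main path that has learned to output zeros."""
    return [0.0 for _ in x]


pixels = [0.0, 0.25, 0.5, 1.0]  # non-negative, like MNIST pixels in the 0-1 range
output = residual_block(pixels, zero_main_path)
print(output == pixels)
```

In this simplified setting the block reproduces its input exactly, so stacking more such blocks need not make the function the network represents any worse.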

In [None]:
##########################
### MODEL
##########################


class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        #########################
        ### 1st residual block
        #########################
        
        self.block_1 = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=1,
                                out_channels=4,
                                kernel_size=(1, 1),
                                stride=(1, 1),
                                padding=0),
                torch.nn.BatchNorm2d(4),
                torch.nn.ReLU(inplace=True),
                torch.nn.Conv2d(in_channels=4,
                                out_channels=1,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.BatchNorm2d(1)
        )
        
        #########################
        ### 2nd residual block
        #########################

        self.block_2 = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=1,
                                out_channels=4,
                                kernel_size=(1, 1),
                                stride=(1, 1),
                                padding=0),
                torch.nn.BatchNorm2d(4),
                torch.nn.ReLU(inplace=True),
                torch.nn.Conv2d(in_channels=4,
                                out_channels=1,
                                kernel_size=(3, 3),
                                stride=(1, 1),
                                padding=1),
                torch.nn.BatchNorm2d(1)
        )

        #########################
        ### Fully connected
        #########################        
        self.linear_1 = torch.nn.Linear(1*28*28, num_classes)

        
    def forward(self, x):
        
        #########################
        ### 1st residual block
        #########################
        shortcut = x
        x = self.block_1(x)
        x = torch.nn.functional.relu(x + shortcut)
        
        #########################
        ### 2nd residual block
        #########################
        shortcut = x
        x = self.block_2(x)
        x = torch.nn.functional.relu(x + shortcut)
        
        #########################
        ### Fully connected
        #########################
        logits = self.linear_1(x.view(-1,  1*28*28))
        return logits

    
torch.manual_seed(random_seed)
model = ConvNet(num_classes=num_classes)
model = model.to(device)
    
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  

In [None]:
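### Added: learning-rate scheduling experiments
# NOTE: `train_model` is not defined in this notebook; it is assumed to be
# provided elsewhere and to return three values (unpacked below as acc, lr, loss).
# NOTE: all experiments reuse the same `model` instance, so trained weights
# carry over between experiments unless `train_model` re-initializes the model.

# Experiment 1: no learning-rate scheduling (baseline)
optimizer1 = torch.optim.Adam(model.parameters(), lr=0.001)
acc1, lr1, loss1 = train_model(optimizer1, scheduler=None, scheduler_name="No Scheduling")

# Experiment 2: step decay
optimizer2 = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler2 = torch.optim.lr_scheduler.StepLR(optimizer2, step_size=30, gamma=0.1)
acc2, lr2, loss2 = train_model(optimizer2, scheduler2, "Step Decay")

# Experiment 3: exponential decay
optimizer3 = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler3 = torch.optim.lr_scheduler.ExponentialLR(optimizer3, gamma=0.95)
acc3, lr3, loss3 = train_model(optimizer3, scheduler3, "Exponential Decay")

# Experiment 4: cosine annealing
# (`epochs` was undefined; `num_epochs` from the settings cell is used instead)
optimizer4 = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler4 = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer4, T_max=num_epochs)
acc4, lr4, loss4 = train_model(optimizer4, scheduler4, "Cosine Annealing")

# Experiment 5: cosine annealing with warm restarts
optimizer5 = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler5 = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer5, T_0=25, T_mult=2)
acc5, lr5, loss5 = train_model(optimizer5, scheduler5, "Cosine Warm Restarts")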
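
To make the scheduler settings above easier to compare, the following sketch evaluates the closed-form learning-rate rules for step decay, exponential decay, and cosine annealing in plain Python. The helper names (`step_lr`, `exponential_lr`, `cosine_annealing_lr`) are hypothetical and are not part of PyTorch; they assume the scheduler is stepped once per epoch and only mirror the formulas given in the PyTorch documentation for `StepLR`, `ExponentialLR`, and `CosineAnnealingLR`. Warm restarts are left out to keep the sketch short.

```python
import math


def step_lr(base_lr, epoch, step_size, gamma):
    """Step decay: multiply the rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def exponential_lr(base_lr, epoch, gamma):
    """Exponential decay: multiply the rate by gamma every epoch."""
    return base_lr * gamma ** epoch


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Cosine annealing from base_lr down to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


base_lr = 0.001
t_max = 5  # assumed here to equal num_epochs from the settings cell
for epoch in range(t_max + 1):
    print(epoch,
          step_lr(base_lr, epoch, step_size=30, gamma=0.1),
          exponential_lr(base_lr, epoch, gamma=0.95),
          cosine_annealing_lr(base_lr, epoch, t_max))
```

Note that with `step_size=30` the step schedule leaves the rate unchanged for the first 30 scheduler steps, so it only differs from the baseline in runs at least that long.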

### Training

In [5]:
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):            
        features = features.to(device)
        targets = targets.to(device)
        logits = model(features)
        _, predicted_labels = torch.max(logits, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100


start_time = time.time()
for epoch in range(num_epochs):
    model = model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.to(device)
        targets = targets.to(device)
        
        ### FORWARD AND BACK PROP
        logits = model(features)
        cost = torch.nn.functional.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 250:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    model = model.eval() # eval mode to prevent upd. batchnorm params during inference
    with torch.set_grad_enabled(False): # save memory during inference
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

Epoch: 001/005 | Batch 000/469 | Cost: 2.6800
Epoch: 001/005 | Batch 250/469 | Cost: 0.3114
Epoch: 001/005 training accuracy: 90.64%
Time elapsed: 0.80 min
Epoch: 002/005 | Batch 000/469 | Cost: 0.3199
Epoch: 002/005 | Batch 250/469 | Cost: 0.2050
Epoch: 002/005 training accuracy: 91.25%
Time elapsed: 1.58 min
Epoch: 003/005 | Batch 000/469 | Cost: 0.2060
Epoch: 003/005 | Batch 250/469 | Cost: 0.2725
Epoch: 003/005 training accuracy: 92.58%
Time elapsed: 2.36 min
Epoch: 004/005 | Batch 000/469 | Cost: 0.2786
Epoch: 004/005 | Batch 250/469 | Cost: 0.3188
Epoch: 004/005 training accuracy: 93.00%
Time elapsed: 3.12 min
Epoch: 005/005 | Batch 000/469 | Cost: 0.3492
Epoch: 005/005 | Batch 250/469 | Cost: 0.3000
Epoch: 005/005 training accuracy: 93.23%
Time elapsed: 3.91 min
Total Training Time: 3.91 min


In [6]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 92.46%


## ResNet with convolutional blocks for resizing

The following code implements residual blocks with skip connections in which the input passed via the shortcut is resized to match the dimensions of the main path's output. Such a residual block is illustrated below:

![](./2-resnet-ex/resnet-ex-1-2.png)
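
As a quick sanity check on the resizing, the sketch below applies the standard convolution output-size formula, `floor((size + 2*padding - kernel) / stride) + 1`, to the kernel, stride, and padding settings used in the `ResidualBlock` class defined in the next cell. The helper `conv_output_size` is a hypothetical name introduced only for this illustration. It confirms that the main path and the shortcut produce the same spatial size in each block, and that two blocks reduce a 28x28 MNIST image to 7x7.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # MNIST height and width
for block in (1, 2):
    # Main path: 3x3 conv (stride 2, padding 1) followed by 1x1 conv (stride 1)
    main = conv_output_size(size, kernel=3, stride=2, padding=1)
    main = conv_output_size(main, kernel=1, stride=1, padding=0)
    # Shortcut: 1x1 conv with stride 2 and no padding
    shortcut = conv_output_size(size, kernel=1, stride=2, padding=0)
    assert main == shortcut, "main path and shortcut must agree before adding"
    print(f"block {block}: {size}x{size} -> {main}x{main}")
    size = main

out_channels = 32  # last entry of channels=[8, 16, 32] in the second block
flat_features = size * size * out_channels
print("flattened features:", flat_features)
```

The final value corresponds to the `7*7*32` input size of the fully connected layer in the model cell.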

In [7]:
class ResidualBlock(torch.nn.Module):
    """ Helper Class"""

    def __init__(self, channels):
        
        super(ResidualBlock, self).__init__()
        
        self.block = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=channels[0],
                                out_channels=channels[1],
                                kernel_size=(3, 3),
                                stride=(2, 2),
                                padding=1),
                torch.nn.BatchNorm2d(channels[1]),
                torch.nn.ReLU(inplace=True),
                torch.nn.Conv2d(in_channels=channels[1],
                                out_channels=channels[2],
                                kernel_size=(1, 1),
                                stride=(1, 1),
                                padding=0),   
                torch.nn.BatchNorm2d(channels[2])
        )

        self.shortcut = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=channels[0],
                                out_channels=channels[2],
                                kernel_size=(1, 1),
                                stride=(2, 2),
                                padding=0),
                torch.nn.BatchNorm2d(channels[2])
        )
            
    def forward(self, x):
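        # Main path: 3x3 conv (stride 2) -> BN -> ReLU -> 1x1 conv -> BN
        block = self.block(x)
        # Shortcut: 1x1 conv (stride 2) + BN resizes x to match the main path
        shortcut = self.shortcut(x)
        x = torch.nn.functional.relu(block + shortcut)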

        return x

In [8]:
##########################
### MODEL
##########################



class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        self.residual_block_1 = ResidualBlock(channels=[1, 4, 8])
        self.residual_block_2 = ResidualBlock(channels=[8, 16, 32])
    
        self.linear_1 = torch.nn.Linear(7*7*32, num_classes)

        
    def forward(self, x):

        out = self.residual_block_1(x)
        out = self.residual_block_2(out)
         
        logits = self.linear_1(out.view(-1, 7*7*32))
        return logits

    
torch.manual_seed(random_seed)
model = ConvNet(num_classes=num_classes)

model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  

### Training

In [None]:
for epoch in range(num_epochs):
    model = model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.to(device)
        targets = targets.to(device)
            
        ### FORWARD AND BACK PROP
        logits = model(features)
        cost = torch.nn.functional.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_dataset)//batch_size, cost))

    model = model.eval() # eval mode to prevent upd. batchnorm params during inference
    with torch.set_grad_enabled(False): # save memory during inference
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))

Epoch: 001/005 | Batch 000/468 | Cost: 2.3534
Epoch: 001/005 | Batch 050/468 | Cost: 0.2685
Epoch: 001/005 | Batch 100/468 | Cost: 0.2457
Epoch: 001/005 | Batch 150/468 | Cost: 0.0959
Epoch: 001/005 | Batch 200/468 | Cost: 0.0691
Epoch: 001/005 | Batch 250/468 | Cost: 0.1175
Epoch: 001/005 | Batch 300/468 | Cost: 0.2868
Epoch: 001/005 | Batch 350/468 | Cost: 0.1927
Epoch: 001/005 | Batch 400/468 | Cost: 0.0589
Epoch: 001/005 | Batch 450/468 | Cost: 0.1199
Epoch: 001/005 training accuracy: 97.39%
Epoch: 002/005 | Batch 000/468 | Cost: 0.1081
Epoch: 002/005 | Batch 050/468 | Cost: 0.0487
Epoch: 002/005 | Batch 100/468 | Cost: 0.0918
Epoch: 002/005 | Batch 150/468 | Cost: 0.1325
Epoch: 002/005 | Batch 200/468 | Cost: 0.1049
Epoch: 002/005 | Batch 250/468 | Cost: 0.0362
Epoch: 002/005 | Batch 300/468 | Cost: 0.0348
Epoch: 002/005 | Batch 350/468 | Cost: 0.0418
Epoch: 002/005 | Batch 400/468 | Cost: 0.0290
Epoch: 002/005 | Batch 450/468 | Cost: 0.0717
Epoch: 002/005 training accuracy: 98.30%

In [10]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 98.21%
