<div class="alert alert-block alert-info">
<b>Deadline:</b> March 15, 2023 (Wednesday) 23:00
</div>

# Exercise 2. Convolutional networks. VGG-style network.

In the second part you need to train a convolutional neural network with an architecture inspired by a VGG-network [(Simonyan \& Zisserman, 2015)](https://arxiv.org/abs/1409.1556).

In [46]:
skip_training = True  # Set this flag to True before validation and submission

In [47]:
# During evaluation, this cell sets skip_training to True
# skip_training = True

import tools, warnings
warnings.showwarning = tools.customwarn

In [48]:
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import tools
import tests

In [49]:
# When running on your own computer, you can specify the data directory by:
# data_dir = tools.select_data_dir('/your/local/data/directory')
data_dir = tools.select_data_dir()

The data directory is /coursedata


In [50]:
# Select the device for training (use GPU if you have one)
#device = torch.device('cuda:0')
device = torch.device('cpu')
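Instead of editing this cell by hand, a common PyTorch idiom picks the GPU automatically when one is visible. This is a minimal sketch; the network and every batch still have to be moved to the chosen device with `.to(device)`.

```python
import torch

# Use the first CUDA GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
```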

In [51]:
if skip_training:
    # The models are always evaluated on CPU
    device = torch.device("cpu")

## FashionMNIST dataset

Let us use the FashionMNIST dataset. It consists of 60,000 training images and 10,000 test images (28x28 pixels, grayscale) from 10 classes: 'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'.

In [52]:
transform = transforms.Compose([
    transforms.ToTensor(),  # Transform to tensor
    transforms.Normalize((0.5,), (0.5,))  # Scale images to [-1, 1]
])

trainset = torchvision.datasets.FashionMNIST(root=data_dir, train=True, download=True, transform=transform)
testset = torchvision.datasets.FashionMNIST(root=data_dir, train=False, download=True, transform=transform)

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal',
           'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=5, shuffle=False)
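`transforms.Normalize((0.5,), (0.5,))` computes `(x - mean) / std` for the single channel. Since `ToTensor()` produces values in [0, 1], subtracting 0.5 and dividing by 0.5 maps them to [-1, 1]. A pure-Python sketch of that arithmetic (the helper `normalize` is hypothetical, not part of torchvision):

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-pixel arithmetic applied by transforms.Normalize to one channel."""
    return (x - mean) / std

# ToTensor() maps uint8 pixel values 0..255 to floats in [0, 1]
for pixel in (0, 128, 255):
    x = pixel / 255.0
    print(pixel, round(normalize(x), 3))
```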

## VGG-style network

Let us now define a convolutional neural network with an architecture inspired by the [VGG-net](https://arxiv.org/abs/1409.1556).

The architecture:
- A block of three convolutional layers with:
    - 3x3 kernel
    - 20 output channels
    - zero-padding of one pixel on each side
    - 2d batch normalization after each convolutional layer
    - ReLU nonlinearity after each 2d batch normalization layer
- Max pooling layer with 2x2 kernel and stride 2.
- A block of three convolutional layers with:
    - 3x3 kernel
    - 40 output channels
    - zero-padding of one pixel on each side
    - 2d batch normalization after each convolutional layer
    - ReLU nonlinearity after each 2d batch normalization layer
- Max pooling layer with 2x2 kernel and stride 2.
- One convolutional layer with:
    - 3x3 kernel
    - 60 output channels
    - *no padding*
    - 2d batch normalization after the convolutional layer
    - ReLU nonlinearity after the 2d batch normalization layer
- One convolutional layer with:
    - 1x1 kernel
    - 40 output channels
    - *no padding*
    - 2d batch normalization after the convolutional layer
    - ReLU nonlinearity after the 2d batch normalization layer
- One convolutional layer with:
    - 1x1 kernel
    - 20 output channels
    - *no padding*
    - 2d batch normalization after the convolutional layer
    - ReLU nonlinearity after the 2d batch normalization layer
- Global average pooling (compute the average value of each channel across all the input locations):
    - 5x5 kernel (the input of the layer should be 5x5)
- A fully-connected layer with 10 outputs (no nonlinearity)

Notes:
* Batch normalization should be placed directly after each convolutional layer, before the nonlinearity.
* We recommend that you check the number of modules with trainable parameters in your network.
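The 5x5 input of the global average pooling layer follows from the standard output-size formula for convolution and pooling layers, `out = floor((in + 2*padding - kernel) / stride) + 1`. The sketch below traces the spatial size through the layers listed above; the helper `out_size` is hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d/MaxPool2d/AvgPool2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (description, kernel, stride, padding) for every layer that can change H and W
layers = (
    [("conv 3x3, pad 1", 3, 1, 1)] * 3
    + [("maxpool 2x2", 2, 2, 0)]
    + [("conv 3x3, pad 1", 3, 1, 1)] * 3
    + [("maxpool 2x2", 2, 2, 0)]
    + [("conv 3x3, no pad", 3, 1, 0),
       ("conv 1x1", 1, 1, 0),
       ("conv 1x1", 1, 1, 0),
       ("global avgpool 5x5", 5, 5, 0)]  # AvgPool2d: stride defaults to kernel_size
)

size = 28  # FashionMNIST images are 28x28
sizes = [size]
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name:20s} -> {size}x{size}")
```

Only the unpadded 3x3 convolution shrinks the 7x7 map (to 5x5); the 1x1 convolutions change the number of channels but not the spatial size.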

In [53]:
class VGGNet(nn.Module):
    def __init__(self):
        super(VGGNet, self).__init__()
        # YOUR CODE HERE
        # raise NotImplementedError()

        # Convolutional layer
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, 
        # dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
        
        # Max pooling layer
        # torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, 
        #                    return_indices=False, ceil_mode=False)
        
        # ReLU function
        # torch.nn.ReLU(inplace=False)
        
        # Fully connected (fc) layer
        # torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
        
        # torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, 
        #                      track_running_stats=True, device=None, dtype=None)
        # num_features (int) – C from the 4D shape input (N, C, H, W)
        
        # torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, 
        #                    count_include_pad=True, divisor_override=None)
        
        # Shape of the input tensor: torch.Size([32, 1, 28, 28])
        
        # A block of three convolutional layers 
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=3, stride=1, padding=1)
        self.batchnorm1 = nn.BatchNorm2d(num_features=20)
        self.relu1 = nn.ReLU()
    
        self.conv2 = nn.Conv2d(in_channels=20, out_channels=20, kernel_size=3, stride=1, padding=1)
        self.batchnorm2 = nn.BatchNorm2d(num_features=20)
        self.relu2 = nn.ReLU()
        
        self.conv3 = nn.Conv2d(in_channels=20, out_channels=20, kernel_size=3, stride=1, padding=1)
        self.batchnorm3 = nn.BatchNorm2d(num_features=20)
        self.relu3 = nn.ReLU()
        
        # torch.Size([32, 20, 28, 28])
        
        # Max pooling layer with 2x2 kernel and stride 2
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        # torch.Size([32, 20, 14, 14])
        
        # A block of three convolutional layers 
        self.conv4 = nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, stride=1, padding=1)
        self.batchnorm4 = nn.BatchNorm2d(num_features=40)
        self.relu4 = nn.ReLU()
    
        self.conv5 = nn.Conv2d(in_channels=40, out_channels=40, kernel_size=3, stride=1, padding=1)
        self.batchnorm5 = nn.BatchNorm2d(num_features=40)
        self.relu5 = nn.ReLU()
        
        self.conv6 = nn.Conv2d(in_channels=40, out_channels=40, kernel_size=3, stride=1, padding=1)
        self.batchnorm6 = nn.BatchNorm2d(num_features=40)
        self.relu6 = nn.ReLU()
        
        # torch.Size([32, 40, 14, 14])
        
        # Max pooling layer with 2x2 kernel and stride 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        # torch.Size([32, 40, 7, 7])
        
        # One convolutional layer
        self.conv7 = nn.Conv2d(in_channels=40, out_channels=60, kernel_size=3, stride=1, padding=0)
        self.batchnorm7 = nn.BatchNorm2d(num_features=60)
        self.relu7 = nn.ReLU()

        # torch.Size([32, 60, 5, 5])
        
        # One convolutional layer
        self.conv8 = nn.Conv2d(in_channels=60, out_channels=40, kernel_size=1, stride=1, padding=0)
        self.batchnorm8 = nn.BatchNorm2d(num_features=40)
        self.relu8 = nn.ReLU()

        # torch.Size([32, 40, 5, 5])
        
        # One convolutional layer
        self.conv9 = nn.Conv2d(in_channels=40, out_channels=20, kernel_size=1, stride=1, padding=0)
        self.batchnorm9 = nn.BatchNorm2d(num_features=20)
        self.relu9 = nn.ReLU()
        
        # torch.Size([32, 20, 5, 5])
        
        # Global average pooling
        self.avgpool1 = nn.AvgPool2d(kernel_size=5)
        
        # torch.Size([32, 20, 1, 1])
        
        # Flatten in the forward to turn this into the shape below
        # torch.Size([32, 20])
        
        # A fully-connected layer with 10 outputs
        self.fc1 = nn.Linear(in_features=20, out_features=10)
        
        # torch.Size([32, 10])
        
    def forward(self, x, verbose=False):
        """
        Args:
          x of shape (batch_size, 1, 28, 28): Input images.
          verbose: True if you want to print the shapes of the intermediate variables.
        
        Returns:
          y of shape (batch_size, 10): Outputs of the network.
        """
        # YOUR CODE HERE
        # raise NotImplementedError()

        # A block of three convolutional layers 
 
        #print(x.shape)
        
        y = self.conv1(x)
        y = self.batchnorm1(y)
        y = self.relu1(y)
    
        y = self.conv2(y)
        y = self.batchnorm2(y)
        y = self.relu2(y)
        
        y = self.conv3(y)
        y = self.batchnorm3(y)
        y = self.relu3(y)
        
        #print(y.shape)
        
        # Max pooling layer with 2x2 kernel and stride 2
        y = self.maxpool1(y)
        
        #print(y.shape)
        
        # A block of three convolutional layers 
        y = self.conv4(y)
        y = self.batchnorm4(y)
        y = self.relu4(y)
    
        y = self.conv5(y)
        y = self.batchnorm5(y)
        y = self.relu5(y)
        
        y = self.conv6(y)
        y = self.batchnorm6(y)
        y = self.relu6(y)
        
        #print(y.shape)

        # Max pooling layer with 2x2 kernel and stride 2
        y = self.maxpool2(y)
        
        #print(y.shape)
        
        # One convolutional layer
        y = self.conv7(y)
        y = self.batchnorm7(y)
        y = self.relu7(y)

        # One convolutional layer
        y = self.conv8(y)
        y = self.batchnorm8(y)
        y = self.relu8(y)

        # One convolutional layer
        y = self.conv9(y)
        y = self.batchnorm9(y)
        y = self.relu9(y)
        
        #print(y.shape)
        
        # Global average pooling
        y = self.avgpool1(y)
        
        #print(y.shape)
        
        # Flatten y from (batch_size, 20, 1, 1) to (batch_size, 20)
        y = torch.flatten(y, start_dim=1)
        
        #print(y.shape)
        
        # A fully-connected layer with 10 outputs
        y = self.fc1(y)
        
        #print(y.shape)
        
        return y
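The same architecture can be written more compactly with `nn.Sequential` and a small helper for the repeated Conv2d, BatchNorm2d, ReLU unit. This is an alternative sketch, not the required solution: the helper `conv_bn_relu` and the class `VGGNetSequential` are hypothetical, and because the module names differ, its `state_dict` keys do not match those of `VGGNet`, so it cannot load `2_vgg_net.pth`.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_channels, out_channels, kernel_size, padding):
    """Conv2d -> BatchNorm2d -> ReLU, the repeated unit of the architecture."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
    )


class VGGNetSequential(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(1, 20, 3, padding=1),
            conv_bn_relu(20, 20, 3, padding=1),
            conv_bn_relu(20, 20, 3, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 28x28 -> 14x14
            conv_bn_relu(20, 40, 3, padding=1),
            conv_bn_relu(40, 40, 3, padding=1),
            conv_bn_relu(40, 40, 3, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 14x14 -> 7x7
            conv_bn_relu(40, 60, 3, padding=0),     # 7x7 -> 5x5
            conv_bn_relu(60, 40, 1, padding=0),
            conv_bn_relu(40, 20, 1, padding=0),
            nn.AvgPool2d(kernel_size=5),            # global average pooling: 5x5 -> 1x1
        )
        self.fc = nn.Linear(20, 10)

    def forward(self, x):
        y = self.features(x)               # (batch_size, 20, 1, 1)
        y = torch.flatten(y, start_dim=1)  # (batch_size, 20)
        return self.fc(y)                  # (batch_size, 10)


net_seq = VGGNetSequential()
out = net_seq(torch.randn(4, 1, 28, 28))
print(out.shape)
```

The layers and parameter shapes are the same as in `VGGNet`; only the grouping into containers differs.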

In [54]:
def test_VGGNet_shapes():
    net = VGGNet()
    net.to(device)

    # Feed a batch of images from the training data to test the network
    with torch.no_grad():
        images, labels = next(iter(trainloader))
        images = images.to(device)
        print('Shape of the input tensor:', images.shape)

        y = net(images, verbose=True)
        assert y.shape == torch.Size([trainloader.batch_size, 10]), f"Bad y.shape: {y.shape}"

    print('Success')

test_VGGNet_shapes()

Shape of the input tensor: torch.Size([32, 1, 28, 28])
Success


In [55]:
# Check the number of layers
def test_vgg_layers():
    net = VGGNet()
    
    # get gradients for parameters in forward path
    net.zero_grad()
    x = torch.randn(1, 1, 28, 28)
    outputs = net(x)
    outputs[0,0].backward()

    n_conv_layers = sum(1 for module in net.modules()
                        if isinstance(module, nn.Conv2d) and next(module.parameters()).grad is not None)
    assert n_conv_layers == 9, f"Wrong number of convolutional layers ({n_conv_layers})"

    n_bn_layers = sum(1 for module in net.modules()
                      if isinstance(module, nn.BatchNorm2d) and next(module.parameters()).grad is not None)
    assert n_bn_layers == 9, f"Wrong number of batch norm layers ({n_bn_layers})"

    n_linear_layers = sum(1 for module in net.modules()
                          if isinstance(module, nn.Linear) and next(module.parameters()).grad is not None)
    assert n_linear_layers == 1, f"Wrong number of linear layers ({n_linear_layers})"

    print('Success')

def test_vgg_net():
    net = VGGNet()
    
    # get gradients for parameters in forward path
    net.zero_grad()
    x = torch.randn(1, 1, 28, 28)
    outputs = net(x)
    outputs[0,0].backward()
    
    parameter_shapes = sorted(tuple(p.shape) for p in net.parameters() if p.grad is not None)
    print(parameter_shapes)
    expected = [
        (10,), (10, 20), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,),
        (20,), (20,), (20,), (20, 1, 3, 3), (20, 20, 3, 3), (20, 20, 3, 3), (20, 40, 1, 1),
        (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,),
        (40, 20, 3, 3), (40, 40, 3, 3), (40, 40, 3, 3), (40, 60, 1, 1), (60,), (60,), (60,),
        (60, 40, 3, 3)]
    assert parameter_shapes == expected, "Wrong number of training parameters."
    
    print('Success')

test_vgg_layers()
test_vgg_net()

Success
[(10,), (10, 20), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20,), (20, 1, 3, 3), (20, 20, 3, 3), (20, 20, 3, 3), (20, 40, 1, 1), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40,), (40, 20, 3, 3), (40, 40, 3, 3), (40, 40, 3, 3), (40, 60, 1, 1), (60,), (60,), (60,), (60, 40, 3, 3)]
Success
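Each entry of the list printed above is the shape of one trainable tensor: every Conv2d contributes a weight of shape (out_channels, in_channels, k, k) and a bias of shape (out_channels,), every BatchNorm2d a weight and a bias of shape (num_features,), and the Linear layer a (10, 20) weight and a (10,) bias. The running mean and variance of batch normalization are buffers, not parameters, so they do not appear. A pure-Python sketch that rebuilds these shapes from the architecture description and totals the number of trainable scalars:

```python
from math import prod

# (in_channels, out_channels, kernel_size) of the nine convolutional layers
convs = [(1, 20, 3), (20, 20, 3), (20, 20, 3),
         (20, 40, 3), (40, 40, 3), (40, 40, 3),
         (40, 60, 3), (60, 40, 1), (40, 20, 1)]

shapes = []
for c_in, c_out, k in convs:
    shapes += [(c_out, c_in, k, k), (c_out,)]  # Conv2d weight and bias
    shapes += [(c_out,), (c_out,)]             # BatchNorm2d weight (gamma) and bias (beta)
shapes += [(10, 20), (10,)]                    # Linear weight and bias

n_tensors = len(shapes)
n_params = sum(prod(s) for s in shapes)
print(n_tensors, n_params)
```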


## Train the network

In [56]:
# This function computes the accuracy on the test dataset
def compute_accuracy(net, testloader):
    net.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct / total
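`torch.max(outputs, 1)` returns a pair (maximum values, indices) along dimension 1, and the indices are the predicted class labels. The accuracy is the fraction of predictions equal to the true labels. A NumPy sketch of the same computation on made-up logits (illustrative numbers, not outputs of the trained network):

```python
import numpy as np

# Hypothetical logits for 4 images and 3 classes (rows: images, columns: classes)
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 1.5],
                   [0.0, 3.0, 0.5],
                   [1.0, 1.2, 0.9]])
labels = np.array([0, 2, 1, 0])

predicted = logits.argmax(axis=1)  # same role as the indices returned by torch.max(outputs, 1)
accuracy = (predicted == labels).sum() / labels.size
print(predicted, accuracy)
```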

### Training loop

Your task is to implement the training loop. The recommended hyperparameters:
* Adam optimizer with learning rate 0.01.
* Cross-entropy loss. Note that the final layer of our network has no softmax nonlinearity, so we need a loss function that applies log_softmax internally, such as [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss).
* Number of epochs: 10
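For one example with logits `z` and true class `y`, `nn.CrossEntropyLoss` computes `-log_softmax(z)[y] = -z[y] + log(sum_j exp(z[j]))` and, by default, averages it over the mini-batch. This is why the network can output raw logits. A NumPy sketch of the formula (the function `cross_entropy` is hypothetical and is not the PyTorch implementation):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of raw logits, via a numerically stable log_softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
print(cross_entropy(logits, labels))
```

Subtracting the row maximum does not change the result, because softmax is invariant to adding a constant to all logits.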

We recommend using the function `compute_accuracy()` defined above to track the accuracy during training. The test accuracy should be above 0.89.

**Note: the function `compute_accuracy()` puts the network into evaluation mode, in which batch normalization uses its running statistics instead of the statistics of the current mini-batch. You need to put the network back into training mode (by calling `net.train()`) before you continue training.**
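The effect of the two modes can be seen on a single `nn.BatchNorm2d` layer: in training mode it normalizes with the mean and variance of the current mini-batch and updates its running estimates, while in evaluation mode it uses the stored running estimates. A minimal sketch on random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=3)
x = 5.0 + 2.0 * torch.randn(8, 3, 4, 4)  # batch whose statistics are far from mean 0, var 1

bn.train()
y_train = bn(x)  # normalized with batch statistics; also updates bn.running_mean / bn.running_var
bn.eval()
y_eval = bn(x)   # normalized with running statistics (initialized to 0 and 1, updated once with momentum 0.1)

print(y_train.mean().item(), y_eval.mean().item())
```

After a single update the running mean is only 10% of the batch mean, so the evaluation-mode output stays far from zero mean. This is why the training loop must call `net.train()` again after every call to `compute_accuracy()`.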

In [57]:
net = VGGNet()

In [58]:
# Implement the training loop in this cell
if not skip_training:
    # YOUR CODE HERE
    #raise NotImplementedError()

    # Move the network to the selected device (the batches are moved inside the loop)
    net.to(device)
    
    # The loss function (applies log_softmax to the raw outputs internally)
    crossEntropyLoss = nn.CrossEntropyLoss()
    
    # The optimizer
    optimizer = optim.Adam(net.parameters(), lr=0.01)
    
    # Number of training epochs 
    epochs = 10
    
    # Training over the dataset multiple times
    for epoch in range(epochs):
        # compute_accuracy() switches the network to evaluation mode,
        # so switch back to training mode at the start of every epoch
        net.train()
        for images, labels in trainloader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            labels_pred = net(images)
            loss = crossEntropyLoss(labels_pred, labels)
            loss.backward()
            optimizer.step()
        
        accuracy = compute_accuracy(net, testloader)
        print(f"Accuracy at epoch {epoch}: {accuracy:.3f}")

In [59]:
# Save the model to disk (the pth-files will be submitted automatically together with your notebook)
# Set confirm=False if you do not want to be asked for confirmation before saving.
if not skip_training:
    tools.save_model(net, '2_vgg_net.pth', confirm=False)

In [60]:
if skip_training:
    net = VGGNet()
    tools.load_model(net, '2_vgg_net.pth', device)

Model loaded from 2_vgg_net.pth.
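The helpers `tools.save_model` and `tools.load_model` are course-specific and their internals are not shown here. Assuming they follow the usual PyTorch pattern, saving and loading amount to storing the model's `state_dict` (parameters plus buffers such as the batch-norm running statistics) and restoring it into a freshly constructed network, with `map_location` placing the tensors on the CPU. A sketch under that assumption, using a tiny hypothetical stand-in model and a temporary file instead of `VGGNet` and `2_vgg_net.pth`:

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for VGGNet; it contains a BatchNorm2d so buffers are saved too
    return nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 10))


model = make_model()
model.train()
model(torch.randn(8, 1, 28, 28))  # one training-mode forward pass updates the running statistics

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")
    torch.save(model.state_dict(), path)
    restored = make_model()
    restored.load_state_dict(torch.load(path, map_location=torch.device("cpu")))

# Compare both models in evaluation mode, as compute_accuracy() does
model.eval()
restored.eval()
x = torch.randn(2, 1, 28, 28)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```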


In [61]:
# Compute the accuracy on the test set
accuracy = compute_accuracy(net, testloader)
print(f'Accuracy of the VGG net on the test images: {accuracy: .3f}')
assert accuracy > 0.89, 'Poor accuracy'
print('Success')

Accuracy of the VGG net on the test images:  0.913
Success
