# Convolutional Neural Networks

## Using PyTorch to handle tensors

In [1]:
import torch

In [7]:
x = torch.arange(12, dtype=torch.float32) # arange(n) creates a 1-D tensor with the values 0, 1, ..., n - 1.
x
x.numel() # Total number of elements in the tensor (12).
X = x.reshape(3, 4) # Rearrange the tensor as a tensor of size 3x4.
torch.zeros((2, 3, 4)) # Create a 2x3x4 tensor with just zeros.
torch.ones((2, 3, 4)) # Create a 2x3x4 tensor with just ones.
torch.randn(3, 4) # Create a 3x4 tensor with random values from a standard normal distribution.

tensor([[-0.7630,  0.4577, -0.1429, -0.1219],
        [-1.4268, -1.8814,  0.2850,  0.7874],
        [ 0.0273, -0.4959, -0.7572,  2.6761]])

Indexing and slicing let us access elements of a tensor.
They can also be used to modify a tensor in place.

In [9]:
X[-1] # Indexing: a negative index counts from the end, so this is the last row.
X[1:3] # Slicing: rows 1 and 2 (the end index 3 is excluded).
X[1, 2] = 17 # Change X[1, 2] from 6 to 17.
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5., 17.,  7.],
        [ 8.,  9., 10., 11.]])
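A minimal sketch of a few more indexing patterns, assuming the same 3x4 tensor `X` built from `torch.arange(12)`: negative indices, half-open slices, whole-axis selection with `:`, and in-place assignment to a slice.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# Negative indices count from the end: X[-1] is the last row, not a single element.
print(X[-1])
# Slices exclude their end index: X[1:3] selects rows 1 and 2.
print(X[1:3].shape)  # torch.Size([2, 4])
# ':' keeps a whole axis; here it selects the second column of every row.
print(X[:, 1])  # tensor([1., 5., 9.])

# Assigning to a slice modifies X in place: rows 0 and 1 are overwritten with 12.
X[0:2, :] = 12
print(X)
```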

In [16]:
torch.exp(x) # Apply e^x on each entry of x.
x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y # Point-wise sum.
x - y # Point-wise subtraction.
x / y # Point-wise division.
x * y # Point-wise multiplication.

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1) # Concatenation along rows (dim=0, 6x4 result) and along columns (dim=1, 3x8 result).

X == Y # Boolean tensor whose entry (i, j) is True where X[i, j] == Y[i, j].

X.sum() # Summing all elements returns a 0-dimensional (scalar) tensor with the result.

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b
a + b # Broadcasting: a (3x1) and b (1x2) are both expanded to 3x2 before the element-wise sum.

tensor([[0, 1],
        [1, 2],
        [2, 3]])
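The shapes involved in concatenation and broadcasting can be checked directly. The sketch below is standalone (its `Y` is a tensor of ones rather than the `Y` above); it verifies that `dim=0` and `dim=1` grow the row and column counts respectively, and that `a + b` behaves as if each size-1 axis had been copied to match the other operand, using `Tensor.expand` and `torch.broadcast_shapes`.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(3, 4)

# dim=0 stacks along rows (more rows); dim=1 stacks along columns (more columns).
rows = torch.cat((X, Y), dim=0)
cols = torch.cat((X, Y), dim=1)
print(rows.shape, cols.shape)  # torch.Size([6, 4]) torch.Size([3, 8])

# Broadcasting aligns shapes from the trailing dimension; a size-1 axis is
# virtually repeated to match the other operand, so (3, 1) + (1, 2) -> (3, 2).
a = torch.arange(3).reshape(3, 1)
b = torch.arange(2).reshape(1, 2)
expanded_a = a.expand(3, 2)  # a's single column repeated twice
expanded_b = b.expand(3, 2)  # b's single row repeated three times
same = torch.equal(a + b, expanded_a + expanded_b)
print(torch.broadcast_shapes(a.shape, b.shape), same)
```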

In [17]:
a = torch.tensor([3.5])
a.item() # Convert a one-element tensor into a Python scalar.

3.5
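Full reductions such as `sum()` return a 0-dimensional tensor rather than a 1x1 one, while reducing along a single axis keeps the other axes; `item()` converts any one-element tensor into a Python number. A short sketch, reusing the same `arange(12)` tensor:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

total = X.sum()  # full reduction: a 0-dimensional tensor
print(total.shape, total.dim())  # torch.Size([]) 0
print(total.item())  # 66.0, a plain Python float

# Summing over rows (dim=0) gives one total per column.
print(X.sum(dim=0))  # tensor([12., 15., 18., 21.])

# item() works on any one-element tensor, e.g. torch.tensor([3.5]) of shape (1,).
print(torch.tensor([3.5]).item(), float(total))
```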

## AlexNet

In [None]:
!git clone https://github.com/d2l-ai/d2l-pytorch-colab.git
%cd 'd2l-pytorch-colab'
!pip install -e .

In [None]:
import torch
from torch import nn
from d2l import torch as d2l

AlexNet is an 8-layer convolutional neural network (five convolutional layers followed by three fully connected layers). Although it is shallower than more modern models, it still achieves good classification performance; nowadays, however, it is used less often because of its large memory footprint.

In [None]:
class AlexNet(d2l.Classifier):
    def __init__(self, lr=0.1, num_classes=10):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Sequential(
            nn.LazyConv2d(96, kernel_size=11, stride=4, padding=1),
            nn.ReLU(), nn.MaxPool2d(kernel_size=3, stride=2),
            nn.LazyConv2d(256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.LazyConv2d(384, kernel_size=3, padding=1), nn.ReLU(),
            nn.LazyConv2d(384, kernel_size=3, padding=1), nn.ReLU(),
            nn.LazyConv2d(256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(p=0.5),
            nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(p=0.5),
            nn.LazyLinear(num_classes))
        self.net.apply(d2l.init_cnn)

In [None]:
AlexNet().layer_summary((1, 1, 224, 224)) # Print the output shape of every layer for one 1x224x224 input.
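The spatial sizes reported by `layer_summary` follow from the usual output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$ for a convolution or pooling layer with kernel size $k$, stride $s$ and padding $p$. The plain-Python sketch below (no PyTorch needed; the helper names `out_size` and `trace` are illustrative) reproduces them for the feature extractor of the `AlexNet` class above on a 1x224x224 input, which shows where the 6400 inputs of the first fully connected layer come from.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, out_channels or None for pooling),
# copied from the AlexNet class definition above.
layers = [
    ("conv1", 11, 4, 1, 96), ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256), ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384), ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256), ("pool3", 3, 2, 0, None),
]

def trace(size=224, channels=1):
    """Return (name, channels, height, width) after each layer."""
    shapes = []
    for name, k, s, p, c_out in layers:
        size = out_size(size, k, s, p)
        if c_out is not None:  # pooling keeps the channel count
            channels = c_out
        shapes.append((name, channels, size, size))
    return shapes

shapes = trace()
for row in shapes:
    print(row)
flat = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("Flatten ->", flat)  # 256 * 5 * 5
```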

In [None]:
model = AlexNet(lr=0.01) # Create an instance of the model with learning rate 0.01.
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224)) # For simplicity, train on Fashion-MNIST, upsampled to AlexNet's 224x224 input size.
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
trainer.fit(model, data)

'''
Overall, this instance achieves very good classification accuracy throughout the epochs.
In particular, most of the reduction in loss happens during the first two epochs, after which the loss decreases steadily.
'''

Questions:
1. **Analyze the computational properties of AlexNet.**
   1. **Compute the memory footprint for convolutions and fully connected layers.** <br>
      Most of the parameter memory belongs to the first fully connected layer, because it maps the flattened output of the last pooling layer ($256 \times 5 \times 5 = 6400$ values for a 224x224 input) to 4096 outputs, which requires a $6400 \times 4096$ weight matrix. <br>
      Generally speaking, it is possible to measure the memory footprint in terms of the number of parameters, which, for a convolutional layer with kernel size $F$, $C$ input channels and $K$ output channels, is $(F^2C + 1)K$ and, for a fully connected layer with $I$ inputs and $O$ outputs, is $(I + 1)O$. <br>
      Therefore, most of the parameter memory is concentrated in the fully connected layers.
   2. **Calculate the computational cost for the convolutions and the fully connected layers.** <br>
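      The original AlexNet has approximately 60 million parameters, but the parameter count alone does not determine the computational cost. A convolutional layer with kernel size $F$, $C$ input channels, $K$ output channels and an $H \times W$ output performs about $HWF^2CK$ multiply-adds, because each kernel is applied at every spatial position, whereas a fully connected layer with $I$ inputs and $O$ outputs performs only $IO$ multiply-adds per example. As a result, the convolutional layers account for most of the floating-point operations, even though the fully connected layers hold most of the parameters.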
   3. **How does the memory affect computation?** <br>
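      From a general point of view, the fully connected layers use each weight only once per input example, so they perform very few operations for each value loaded from memory and are limited by memory bandwidth rather than by arithmetic throughput. The latency introduced by AlexNet's memory footprint is therefore most significant during the computations of the fully connected layers.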
2. **You are a chip designer and need to trade off computation and memory bandwidth. How do you optimize?**
3. **Why do engineers no longer report performance benchmarks on AlexNet?**
4. **Try increasing the number of epochs when training AlexNet. Compared to LeNet, how do the results differ?**
5. **AlexNet may be too complex for the Fashion-MNIST dataset.**
   1. **Try simplifying the model to make the training faster, while ensuring that the accuracy does not drop significantly.**
   2. **Design a better model that works directly on 28x28 images.**
6. **Modify the batch size and observe the changes in throughput, accuracy and GPU memory.**
7. **Apply dropout and ReLU to LeNet-5. Does it improve?**
8. **Can you make AlexNet overfit?**
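The answers to question 1 can be checked with a back-of-the-envelope count. The plain-Python sketch below applies $(F^2C + 1)K$ and $(I + 1)O$ for parameters, and $HWF^2CK$ and $IO$ for multiply-accumulate operations (MACs) per example, where $H \times W$ is a convolution's output size. It assumes the AlexNet variant defined above (1 input channel, 10 classes, 224x224 input); the layer sizes are copied from the class definition, the helper names are illustrative, and bias additions and activation memory are ignored.

```python
def conv_params(k, c_in, c_out):
    return (k * k * c_in + 1) * c_out  # (F^2 C + 1) K

def linear_params(n_in, n_out):
    return (n_in + 1) * n_out  # (I + 1) O

def conv_macs(k, c_in, c_out, out_size):
    # Each of the out_size^2 * c_out outputs needs k * k * c_in multiply-adds.
    return out_size * out_size * c_out * k * k * c_in

def linear_macs(n_in, n_out):
    return n_in * n_out  # one multiply-add per weight per example

# (kernel, in_channels, out_channels, output spatial size) for a 1x224x224 input
convs = [(11, 1, 96, 54), (5, 96, 256, 26), (3, 256, 384, 12),
         (3, 384, 384, 12), (3, 384, 256, 12)]
fcs = [(6400, 4096), (4096, 4096), (4096, 10)]

conv_p = sum(conv_params(k, ci, co) for k, ci, co, _ in convs)
fc_p = sum(linear_params(i, o) for i, o in fcs)
conv_m = sum(conv_macs(k, ci, co, s) for k, ci, co, s in convs)
fc_m = sum(linear_macs(i, o) for i, o in fcs)

print(f"params: conv {conv_p:,}  fc {fc_p:,}")
print(f"MACs:   conv {conv_m:,}  fc {fc_m:,}")
print(f"float32 weights: {4 * (conv_p + fc_p) / 2**20:.1f} MiB")
```

Under these assumptions the fully connected layers hold roughly 92% of the parameters but account for under 5% of the MACs, so the parameter memory is dominated by the fully connected layers while the computation is dominated by the convolutions.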

## VGGNet

VGGNet is a convolutional neural network built from repeated blocks, each consisting of several 3x3 convolutional layers (with padding 1, each followed by a ReLU) and a final 2x2 max-pooling layer with stride 2.

In [None]:
def vgg_block(num_convs, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.LazyConv2d(out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

In [None]:
class VGG(d2l.Classifier):
    def __init__(self, arch, lr=0.1, num_classes=10):
        super().__init__()
        self.save_hyperparameters()
        conv_blks = []
        for (num_convs, out_channels) in arch:
            conv_blks.append(vgg_block(num_convs, out_channels))
        self.net = nn.Sequential(
            *conv_blks, nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(0.5),
            nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(0.5),
            nn.LazyLinear(num_classes))
        self.net.apply(d2l.init_cnn)

The original VGG network, VGG-11, uses five convolutional blocks. <br>
The first two blocks contain one convolutional layer each, whereas the remaining three contain two each; together with the three fully connected layers, this gives 8 + 3 = 11 weight layers, hence the name.

In [None]:
VGG(arch=((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))).layer_summary(
    (1, 1, 224, 224))
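A quick sanity check of the naming and of the final feature-map size, as a plain-Python sketch with an illustrative helper `vgg_summary`: every 3x3 convolution with padding 1 preserves the spatial size and every block's pooling halves it, so a 224x224 input shrinks to 7x7 after five blocks. The VGG-16 tuple follows the commonly used configuration with 2, 2, 3, 3 and 3 convolutions per block.

```python
def vgg_summary(arch, size=224, num_fc=3):
    """Return (number of weight layers, final channels, final spatial size)."""
    num_convs = 0
    channels = None
    for convs, out_channels in arch:
        num_convs += convs  # 3x3 convs with padding 1 keep the spatial size
        channels = out_channels
        size = (size - 2) // 2 + 1  # 2x2 max-pooling with stride 2 halves it
    return num_convs + num_fc, channels, size

vgg11 = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
vgg16 = ((2, 64), (2, 128), (3, 256), (3, 512), (3, 512))

layers, channels, size = vgg_summary(vgg11)
print(layers, channels, size, channels * size * size)  # 11 512 7 25088
print(vgg_summary(vgg16))  # (16, 512, 7)
```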

In [None]:
model = VGG(arch=((1, 16), (1, 32), (2, 64), (2, 128), (2, 128)), lr=0.01) # VGG-11 with every channel count divided by 4, to reduce computation.
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224)) # For simplicity, perform training on the FashionMNIST dataset.
model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
trainer.fit(model, data)

'''
As with AlexNet, this instance achieves very good classification accuracy throughout the epochs.
Moreover, since the training and validation losses remain very close, the model does not appear to be significantly overfitting the data.
'''

Questions:
1. **Compared with AlexNet, VGG is much slower in terms of computation, and it also needs more GPU memory.**
   1. **Compare the number of parameters needed for AlexNet and VGG.**
   2. **Compare the number of floating point operations used in the convolutional layers and in the fully connected layers.**
   3. **How could you reduce the computational cost created by the fully connected layers?**
2. **When displaying the dimensions associated with the various layers of the network, information is shown for only eight blocks. Where did the other layers go?**
3. **Construct other common models, such as VGG-16 or VGG-19.**
4. **Upsampling the resolution in Fashion-MNIST is very wasteful. Try modifying the network architecture and resolution conversion. Can you do so without reducing the accuracy of the network?**
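For question 1.1, the parameter count of VGG-11 can be estimated with the same formulas used for AlexNet: $(F^2C + 1)K$ per convolution and $(I + 1)O$ per fully connected layer. A plain-Python sketch, assuming the full-width VGG-11 architecture above with 1 input channel, 10 classes and a 224x224 input (helper names are illustrative):

```python
def conv_params(k, c_in, c_out):
    return (k * k * c_in + 1) * c_out  # (F^2 C + 1) K

def linear_params(n_in, n_out):
    return (n_in + 1) * n_out  # (I + 1) O

def vgg_params(arch, in_channels=1, size=224, num_classes=10):
    """Return (conv parameters, fully connected parameters)."""
    conv, c_in = 0, in_channels
    for num_convs, c_out in arch:
        for _ in range(num_convs):
            conv += conv_params(3, c_in, c_out)  # 3x3 kernels throughout
            c_in = c_out
        size //= 2  # each block ends with 2x2, stride-2 max-pooling
    flat = c_in * size * size  # 512 * 7 * 7 for a 224x224 input
    fc = (linear_params(flat, 4096) + linear_params(4096, 4096)
          + linear_params(4096, num_classes))
    return conv, fc

conv, fc = vgg_params(((1, 64), (1, 128), (2, 256), (2, 512), (2, 512)))
print(f"VGG-11 params: conv {conv:,}  fc {fc:,}  total {conv + fc:,}")
```

This gives about 128.8 million parameters, compared with roughly 46.8 million for the AlexNet variant in this notebook under the same assumptions; the first fully connected layer alone ($25088 \times 4096$ weights) accounts for most of the difference.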