### **LAB 2 -** Computer Vision. Convolutional Neural Networks.
Group: M24-RO-01 / M24-RO15-01

Instructor: Alexey Kornaev

TA: Kirill Yakovlev

**INTRODUCTION**

**Today we start with one of the most important and fundamental concepts in Computer Vision: convolutions. A convolution is an operation used to extract features from data (usually high-dimensional). The operation takes a small matrix of numbers, slides it across the data, and at each position computes the sum of products between the matrix and the data it covers. This matrix is called a kernel or filter.**
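**The sliding sum of products described above can be sketched in plain Python. This is a minimal illustration with no padding and stride 1; the helper name `conv2d_valid` and the toy `image` and `kernel` values are made up for this example. Note that, like `nn.Conv2d`, it does not flip the kernel, so strictly speaking it computes a cross-correlation.**

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and collect the sums of products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # multiply the kernel with the patch it currently covers and sum the products
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# hypothetical 4x4 single-channel "image" and a 2x2 diagonal-difference kernel
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

**A 4x4 input and a 2x2 kernel give a 3x3 feature map; in a CNN the kernel values are not fixed by hand but learned during training.**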

**Let's work through a simple classification example using the PyTorch framework for a better understanding.**

In [34]:
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

**Let's start with the sequential transformations. Do you know exactly what each of them does?**

In [35]:
transform_cifar_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

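# Test-time transform: no augmentation, only tensor conversion and normalization
# (assign it to its own name so that it does not overwrite the training transform)
transform_cifar_test = transforms.Compose([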
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# Loading the data and finalizing dataset preparation

cifar10_train = datasets.CIFAR10(root='data', train=True, transform=transform_cifar_train, download=True)
cifar10_test = datasets.CIFAR10(root='data', train=False, transform=transform_cifar_test, download=True)

cifar10_loader_train = DataLoader(cifar10_train, batch_size=32, shuffle=True)
cifar10_loader_test = DataLoader(cifar10_test, batch_size=32)

Files already downloaded and verified
Files already downloaded and verified
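
**As a hint for the question above: `RandomHorizontalFlip` and `RandomCrop` are random augmentations, while `ToTensor` and `Normalize` are deterministic rescalings. The arithmetic of the last two can be sketched for a single pixel value as follows. The helper names are made up for this illustration and only mimic the scaling part of the real transforms.**

```python
def to_tensor_scale(pixel):
    """Mimic the scaling done by ToTensor: map an 8-bit value in [0, 255] to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize for one channel value: (value - mean) / std."""
    return (value - mean) / std


# with mean = std = 0.5 the range [0, 1] is mapped to [-1, 1]
for pixel in (0, 51, 255):
    scaled = to_tensor_scale(pixel)
    print(pixel, round(scaled, 3), round(normalize(scaled), 3))
```

**So the darkest pixel ends up at -1 and the brightest at 1, which centers the inputs around zero.**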


**Now let's design our first architecture. We keep it as simple as possible in this first attempt and gradually improve it in the following sections. This CNN consists of one convolutional layer followed by a max pooling layer, with a fully connected layer at the end for classification.**

In [36]:
class My_First_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 16 * 16, 10)
        self.out = nn.LogSoftmax(dim=1)
        # self.out = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = x.view(-1, 16 * 16 * 16)
        x = self.fc(x)
        x = self.out(x)
        return x
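
**The model ends with `LogSoftmax`, which is why it is paired with `NLLLoss` later on. A minimal sketch, using made-up logits and targets, of how this combination relates to `cross_entropy` applied to raw logits, and why the predicted class can be read directly from the logits:**

```python
import torch
import torch.nn.functional as F

# hypothetical raw scores (logits) for 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
target = torch.tensor([0, 1])

# option 1: log-softmax output + negative log-likelihood loss
log_probs = F.log_softmax(logits, dim=1)
nll = F.nll_loss(log_probs, target)

# option 2: raw logits + cross-entropy (applies log-softmax internally)
ce = F.cross_entropy(logits, target)

print(torch.allclose(nll, ce))  # True: both options give the same loss

# softmax and log-softmax are monotonic, so the predicted class is the same
# whether we take argmax over logits, log-probabilities, or probabilities
pred = logits.argmax(dim=1)
accuracy = (pred == target).float().mean().item()
print(accuracy)  # 0.5
```

**If you switch the last layer to plain `Softmax`, `NLLLoss` no longer receives log-probabilities, so the loss has to be adapted accordingly.**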

**Let's take a look at our model!**

In [37]:
model = My_First_CNN()
print(model)

My_First_CNN(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=4096, out_features=10, bias=True)
  (out): LogSoftmax(dim=1)
)
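
**The value `in_features=4096` in the printout follows from the spatial size of the feature map after the convolution and the pooling. A small sketch of this arithmetic, using the standard output-size formula for convolution and pooling layers (dilation 1); the helper name `conv_output_size` is made up for this illustration:**

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation = 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


side = 32  # CIFAR10 images are 32x32
side = conv_output_size(side, kernel_size=3, stride=1, padding=1)  # Conv2d keeps 32
side = conv_output_size(side, kernel_size=2, stride=2, padding=0)  # MaxPool2d halves to 16

channels = 16  # number of filters in the convolutional layer
flat_features = channels * side * side
print(side, flat_features)  # 16 4096
```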


**We can also inspect the model and analyze its properties using `summary`. It can also help you match the sizes of your layers and avoid shape errors during training.**

In [38]:
from torchsummary import summary

model = My_First_CNN()
summary(model, input_size = (3,32,32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 32, 32]             448
         MaxPool2d-2           [-1, 16, 16, 16]               0
            Linear-3                   [-1, 10]          40,970
        LogSoftmax-4                   [-1, 10]               0
Total params: 41,418
Trainable params: 41,418
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.16
Params size (MB): 0.16
Estimated Total Size (MB): 0.33
----------------------------------------------------------------
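
**The `Param #` column can be reproduced by hand: every output unit has one weight per input it sees plus one bias. A short sketch (the helper names are made up for this illustration):**

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Each filter has in_channels * k * k weights plus one bias."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def linear_params(in_features, out_features):
    """Each output neuron has in_features weights plus one bias."""
    return out_features * (in_features + 1)


conv = conv2d_params(3, 16, 3)      # Conv2d-1
fc = linear_params(16 * 16 * 16, 10)  # Linear-3
print(conv, fc, conv + fc)  # 448 40970 41418
```

**Pooling and LogSoftmax have no trainable parameters, so almost all of the weights of this model sit in the fully connected layer.**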


**Next, let's train our initial model and see what result we get. We wrap the training loop in a function so that we can reuse it in further experiments.**

In [39]:
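def train(model, train_loader, test_loader, optimizer, criterion = torch.nn.NLLLoss(),
          n_epochs = 10, max_epochs_stop = 3, save_file = 'model-cifar.pt'):
    # Train `model`, evaluate it on `test_loader` after every epoch, save the best weights
    # to `save_file`, and stop early after `max_epochs_stop` epochs without improvement.
    # Note: relies on the global flag `train_on_gpu`, which must be defined before calling.

    # early-stopping state: epochs without improvement and the best (lowest) test loss so far
    epochs_no_improve = 0
    test_loss_min = np.inf  # np.Inf was removed in NumPy 2.0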

    for epoch in range(1, n_epochs+1):

        # keep track of training and Test loss
        train_loss = 0.0
        test_loss = 0.0

        train_acc = 0
        test_acc = 0



        # TRAIN STEP


        model.train()

        for i, (data, target) in enumerate(train_loader):
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item()

            # Calculate accuracy (don't forget to modify this if you are using Cross-Entropy and the Softmax function)
            ps = torch.exp(output)
            topk, topclass = ps.topk(1, dim = 1)
            equals = topclass == target.view(*topclass.shape)
            accuracy = torch.mean(equals.type(torch.FloatTensor))
            train_acc += accuracy.item()

            print(f'Epoch: {epoch} \t {100 * i / len(train_loader):.2f}% complete.', end = '\r')

        # VALIDATION STEP


        model.eval()
        # gradients are not needed for evaluation, so disable tracking to save memory and time
        with torch.no_grad():
            for data, target in test_loader:
                # move tensors to GPU if CUDA is available
                if train_on_gpu:
                    data, target = data.cuda(), target.cuda()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the batch loss
                loss = criterion(output, target)
                # update average Test loss
                test_loss += loss.item()

                # Calculate accuracy
                ps = torch.exp(output)
                topk, topclass = ps.topk(1, dim = 1)
                equals = topclass == target.view(*topclass.shape)
                accuracy = torch.mean(equals.type(torch.FloatTensor))
                test_acc += accuracy.item()

        # calculate average losses
        train_loss = train_loss/len(train_loader)
        test_loss = test_loss/len(test_loader)

        train_acc = train_acc/len(train_loader)
        test_acc = test_acc/len(test_loader)

        # print training/Test statistics
        print('\nEpoch: {} \tTraining Loss: {:.6f} \tTest Loss: {:.6f}'.format(
            epoch, train_loss, test_loss))
        print(f'Training Accuracy: {100 * train_acc:.2f}%\t Test Accuracy: {100 * test_acc:.2f}%')

        # save model if Test loss has decreased
        if test_loss <= test_loss_min:
            print('Test loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            test_loss_min,
            test_loss))
            torch.save(model.state_dict(), save_file)
            epochs_no_improve = 0
            test_loss_min = test_loss
        else:
            epochs_no_improve += 1
            print(f'{epochs_no_improve} epochs with no improvement.')
            if epochs_no_improve >= max_epochs_stop:
                print('Early Stopping')
                break
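
**The early-stopping rule at the end of `train` can be isolated from the rest of the loop. A minimal sketch in plain Python, applied to a made-up list of per-epoch test losses (the function name and the loss values are hypothetical):**

```python
def early_stopping_epoch(test_losses, patience=3):
    """Return (epoch at which training stops, best loss seen so far)."""
    best_loss = float('inf')
    epochs_no_improve = 0
    for epoch, loss in enumerate(test_losses, start=1):
        if loss <= best_loss:
            # improvement: remember the new best loss and reset the counter
            best_loss = loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                return epoch, best_loss
    # the patience was never exhausted: training ran for all epochs
    return len(test_losses), best_loss


losses = [1.28, 1.23, 1.09, 1.10, 1.12, 1.11, 1.05]  # hypothetical values
stop_epoch, best = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best)  # 6 1.09
```

**Here training stops at epoch 6 and never sees the better loss of epoch 7, which shows the trade-off behind the choice of `max_epochs_stop`: a small value saves time but may stop too early.**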

**Let's specify our training parameters, such as the number of training epochs, the optimizer, etc. It's up to you to decide what else can be tuned to improve your results.**

In [40]:
n_epochs = 20 # you may increase this number when training a final model
optimizer = optim.Adam(model.parameters()) # Choosing the optimizer; here we use the classic Adam optimizer
save_file_name = 'model-cifar.pt' # define name to save weights of our model

**As a last step, we specify whether to train the model on the CPU or on a GPU. In Colab you can select a T4 GPU in the Runtime type settings, but usage time is strictly limited.**

In [41]:
train_on_gpu = torch.cuda.is_available()

if train_on_gpu:
    model.cuda()
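
**An alternative to the boolean flag is a `torch.device` object combined with `.to(device)`, which works the same way on CPU and GPU. A small sketch with a stand-in layer and a random batch (both are placeholders for this illustration, not part of the lab code):**

```python
import torch
import torch.nn as nn

# pick the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

layer = nn.Linear(4, 2).to(device)   # stand-in for a model
batch = torch.randn(8, 4).to(device)  # stand-in for a batch of data

# model and data must live on the same device for the forward pass to work
output = layer(batch)
print(output.shape, output.device.type)
```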

**Start training...**

In [42]:
train(model, cifar10_loader_train, cifar10_loader_test, optimizer = optimizer,
      n_epochs = n_epochs, save_file = save_file_name)


Epoch: 1 	Training Loss: 1.449822 	Test Loss: 1.277729
Training Accuracy: 49.13%	 Test Accuracy: 56.01%
Test loss decreased (inf --> 1.277729).  Saving model ...

Epoch: 2 	Training Loss: 1.193094 	Test Loss: 1.226409
Training Accuracy: 58.57%	 Test Accuracy: 57.04%
Test loss decreased (1.277729 --> 1.226409).  Saving model ...

Epoch: 3 	Training Loss: 1.082447 	Test Loss: 1.094553
Training Accuracy: 62.26%	 Test Accuracy: 62.00%
Test loss decreased (1.226409 --> 1.094553).  Saving model ...

Epoch: 4 	Training Loss: 1.006337 	Test Loss: 1.077619
Training Accuracy: 65.33%	 Test Accuracy: 62.81%
Test loss decreased (1.094553 --> 1.077619).  Saving model ...

Epoch: 5 	Training Loss: 0.958029 	Test Loss: 1.059042
Training Accuracy: 66.81%	 Test Accuracy: 63.65%
Test loss decreased (1.077619 --> 1.059042).  Saving model ...

Epoch: 6 	Training Loss: 0.924292 	Test Loss: 1.038461
Training Accuracy: 68.09%	 Test Accuracy: 64.78%
Test loss decreased (1.059042 --> 1.038461).  Saving model .

**Even with such a trivial architecture we reach a test accuracy of ~64.8%, which is pretty decent for a 10-class problem (random guessing would give 10%).**

**But let's switch to something more advanced: a DNN (Deep Neural Network). Basically, this means stacking many layers sequentially, which makes the model more complex and increases its ability to capture more advanced relations between the training data and the classes.**

**To make this more convenient, PyTorch provides a special container, [Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html), which makes it easier to construct complex architectures. However, it requires organizing our code a bit differently. We can arrange the architecture as two blocks: one for feature extraction and one for classification.**

In [43]:
class CNN_advanced(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature_extraction_block = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )

        self.flatten = nn.Flatten()

        self.classification_block = nn.Sequential(
            nn.Linear(in_features=16 * 4 * 4, out_features=10),  # 16 channels x 4 x 4 feature map = 256
        )
        self.out = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.feature_extraction_block(x)
        x = self.flatten(x)
        x = self.classification_block(x)
        x = self.out(x)
        return x
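
**Instead of computing the input size of the classification block by hand, one can pass a dummy tensor through the feature extraction block and read the size off the result. A sketch under the assumption of 32x32 RGB inputs, with the block rebuilt locally so that the snippet is self-contained:**

```python
import torch
import torch.nn as nn

# same layer stack as the feature extraction block above
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 32, 32)  # one fake CIFAR10-sized image
    feature_map = features(dummy)
    n_flat = feature_map.flatten(start_dim=1).shape[1]

print(feature_map.shape)  # torch.Size([1, 16, 4, 4])
print(n_flat)             # 256
```

**Each pooling layer halves the spatial size (32, 16, 8, 4), so 16 channels of 4x4 give 256 input features for the linear layer.**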

In [44]:
model = CNN_advanced()
summary(model, input_size = (3,32,32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 32, 32]             448
              ReLU-2           [-1, 16, 32, 32]               0
         MaxPool2d-3           [-1, 16, 16, 16]               0
            Conv2d-4           [-1, 32, 16, 16]           4,640
              ReLU-5           [-1, 32, 16, 16]               0
         MaxPool2d-6             [-1, 32, 8, 8]               0
            Conv2d-7             [-1, 16, 8, 8]           4,624
              ReLU-8             [-1, 16, 8, 8]               0
         MaxPool2d-9             [-1, 16, 4, 4]               0
          Flatten-10                  [-1, 256]               0
           Linear-11                   [-1, 10]           2,570
       LogSoftmax-12                   [-1, 10]               0
Total params: 12,282
Trainable params: 12,282
Non-trainable params: 0
---------------------------------

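**Notice that adding new convolutional layers does not significantly increase the number of parameters: the three convolutions together hold fewer than 10,000 parameters, and the total actually drops from 41,418 to 12,282 because the extra pooling stages shrink the input of the fully connected layer from 4096 to 256 features. This is crucial when you are dealing with limited computational resources.**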

**Let's try to train our new model...**

**Don't hesitate to repeat the training process several times. Remember that training converges only to a LOCALLY optimal set of parameters, which depends on the random initialization, so results can vary between runs.**

In [45]:
n_epochs = 30
if train_on_gpu:
    model.cuda()  # the new model must be on the same device as the data
optimizer = optim.Adam(model.parameters(), lr=1e-3)
save_file_name = 'model-cifar-advanced.pt'

train(model, cifar10_loader_train, cifar10_loader_test, optimizer = optimizer,
      n_epochs = n_epochs, save_file = save_file_name)


Epoch: 1 	Training Loss: 1.525545 	Test Loss: 1.336467
Training Accuracy: 44.65%	 Test Accuracy: 52.57%
Test loss decreased (inf --> 1.336467).  Saving model ...

Epoch: 2 	Training Loss: 1.232501 	Test Loss: 1.162223
Training Accuracy: 56.21%	 Test Accuracy: 59.50%
Test loss decreased (1.336467 --> 1.162223).  Saving model ...

Epoch: 3 	Training Loss: 1.107738 	Test Loss: 1.054220
Training Accuracy: 61.15%	 Test Accuracy: 62.54%
Test loss decreased (1.162223 --> 1.054220).  Saving model ...

Epoch: 4 	Training Loss: 1.025046 	Test Loss: 1.004530
Training Accuracy: 64.08%	 Test Accuracy: 64.75%
Test loss decreased (1.054220 --> 1.004530).  Saving model ...

Epoch: 5 	Training Loss: 0.970392 	Test Loss: 0.981290
Training Accuracy: 65.97%	 Test Accuracy: 65.92%
Test loss decreased (1.004530 --> 0.981290).  Saving model ...

Epoch: 6 	Training Loss: 0.926637 	Test Loss: 0.952419
Training Accuracy: 67.56%	 Test Accuracy: 66.80%
Test loss decreased (0.981290 --> 0.952419).  Saving model .

**As we can see, adding a couple of new convolutional layers gave us a ~4.9% increase in accuracy, which is not bad.**

**The same principle of repeatedly applying convolution with ReLU followed by max pooling is found in advanced deep architectures such as VGG16 (image below). For CIFAR10, however, such an architecture looks excessive.**

![](https://www.researchgate.net/profile/Jose-Cano-6/publication/327070011/figure/fig1/AS:660549306159105@1534498635256/VGG-16-neural-network-architecture.png)

**Nevertheless, for real-world problems we usually stick to existing solutions and architectures. PyTorch (through torchvision) provides models for classification tasks, usually pretrained on the [ImageNet](https://www.image-net.org/) dataset. As of today, the following models are available:**



* AlexNet
* ConvNeXt
* DenseNet
* EfficientNet
* EfficientNetV2
* GoogLeNet
* Inception V3
* MaxVit
* MNASNet
* MobileNet V2
* MobileNet V3
* RegNet
* ResNet
* ResNeXt
* ShuffleNet V2
* SqueezeNet
* SwinTransformer
* VGG
* VisionTransformer
* Wide ResNets

**Let's take something fairly straightforward from an architectural perspective. For example, DenseNet looks appropriate for demonstration purposes. You can apply two different strategies:**

1.   pretrained=**True**: use the pretrained weights of the model, which were trained on a larger dataset (recommended)
2.   pretrained=**False**: begin with randomly initialized weights and the densenet121 architecture.

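**Let's load a model with pretrained weights and check its architecture. (In torchvision 0.13 and later the `pretrained` flag is deprecated in favor of the `weights` argument, e.g. `weights=models.DenseNet121_Weights.DEFAULT`.)**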

In [46]:
model = models.densenet121(pretrained=True)
print(model)



DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu

**As the next step, we define the transformations applied to our data before training the architecture. It is up to you which transformations you consider relevant.**

In [47]:
transform_cifar_train_densenet = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

transform_cifar_test_densenet = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

In [48]:
cifar10_train_pre = datasets.CIFAR10(root='data', train=True, transform=transform_cifar_train_densenet, download=True)
cifar10_test_pre = datasets.CIFAR10(root='data', train=False, transform=transform_cifar_test_densenet, download=True)

cifar10_loader_train_pre = DataLoader(cifar10_train_pre, batch_size=32, shuffle=True)
cifar10_loader_test_pre = DataLoader(cifar10_test_pre, batch_size=32)

Files already downloaded and verified
Files already downloaded and verified


**Since we want to base our solution on a pretrained architecture, we are not interested in retraining the layers that already exist in this model.**

**First, let's check which parameters are currently trainable (`requires_grad=True`).**

In [49]:
for name, param in model.named_parameters():
    print(name,param.requires_grad)

features.conv0.weight True
features.norm0.weight True
features.norm0.bias True
features.denseblock1.denselayer1.norm1.weight True
features.denseblock1.denselayer1.norm1.bias True
features.denseblock1.denselayer1.conv1.weight True
features.denseblock1.denselayer1.norm2.weight True
features.denseblock1.denselayer1.norm2.bias True
features.denseblock1.denselayer1.conv2.weight True
features.denseblock1.denselayer2.norm1.weight True
features.denseblock1.denselayer2.norm1.bias True
features.denseblock1.denselayer2.conv1.weight True
features.denseblock1.denselayer2.norm2.weight True
features.denseblock1.denselayer2.norm2.bias True
features.denseblock1.denselayer2.conv2.weight True
features.denseblock1.denselayer3.norm1.weight True
features.denseblock1.denselayer3.norm1.bias True
features.denseblock1.denselayer3.conv1.weight True
features.denseblock1.denselayer3.norm2.weight True
features.denseblock1.denselayer3.norm2.bias True
features.denseblock1.denselayer3.conv2.weight True
features.denseb

**Next, let's change this by setting `requires_grad` to False for every parameter in the pretrained model. Since we know that the classifier layer is specifically responsible for distinguishing classes, we can replace it with a new trainable module. For this purpose we can again use the Sequential container from the previous section.**

In [50]:
for param in model.parameters():
    param.requires_grad = False

**Let's unfreeze one block (the last dense block) for retraining...**

In [51]:
for param in model.features.denseblock4.parameters():
    param.requires_grad = True

In [52]:
num_features = model.classifier.in_features

In [55]:
model.classifier = nn.Sequential(
                        nn.Linear(num_features, 256),
                        nn.LeakyReLU(),
                        nn.Dropout(0.3),
                        nn.Linear(256, 10),
                        nn.LogSoftmax(dim=1))

                        # nn.Softmax(dim=1))
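
**To verify that freezing worked as intended, it helps to count trainable parameters and, optionally, to hand only those to the optimizer. A sketch on a tiny stand-in model (the toy `backbone` and `head` are hypothetical and only replace DenseNet to keep the example small):**

```python
import torch.nn as nn
import torch.optim as optim

# toy "pretrained" backbone and a new classification head
backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
head = nn.Linear(4, 10)

# freeze the whole backbone, then unfreeze only its last layer
for param in backbone.parameters():
    param.requires_grad = False
for param in backbone[2].parameters():
    param.requires_grad = True

toy_model = nn.Sequential(backbone, head)

trainable = [p for p in toy_model.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in toy_model.parameters())
print(f'{n_trainable} of {n_total} parameters are trainable')  # 86 of 158

# passing only the trainable parameters makes the intent explicit
toy_optimizer = optim.Adam(trainable, lr=1e-4)
```

**Newly created layers such as the head are trainable by default, which is why the replaced classifier does not need to be unfrozen explicitly.**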

In [None]:
n_epochs = 40
if train_on_gpu:
    model.cuda()  # the new model must be on the same device as the data
optimizer = optim.Adam(model.parameters(), lr=1e-4)
save_file_name = 'model-cifar-pretrained.pt'

train(model, cifar10_loader_train_pre, cifar10_loader_test_pre, optimizer = optimizer,
      n_epochs = n_epochs, criterion = torch.nn.NLLLoss(), max_epochs_stop = 10, save_file = save_file_name)


Epoch: 1 	Training Loss: 1.272059 	Test Loss: 1.039625
Training Accuracy: 55.48%	 Test Accuracy: 63.54%
Test loss decreased (inf --> 1.039625).  Saving model ...

Epoch: 2 	Training Loss: 1.045023 	Test Loss: 0.984338
Training Accuracy: 63.40%	 Test Accuracy: 65.91%
Test loss decreased (1.039625 --> 0.984338).  Saving model ...

Epoch: 3 	Training Loss: 0.954804 	Test Loss: 0.971447
Training Accuracy: 66.41%	 Test Accuracy: 66.74%
Test loss decreased (0.984338 --> 0.971447).  Saving model ...

Epoch: 4 	Training Loss: 0.883186 	Test Loss: 0.968694
Training Accuracy: 69.02%	 Test Accuracy: 67.18%
Test loss decreased (0.971447 --> 0.968694).  Saving model ...

Epoch: 5 	Training Loss: 0.824159 	Test Loss: 0.944177
Training Accuracy: 70.96%	 Test Accuracy: 68.38%
Test loss decreased (0.968694 --> 0.944177).  Saving model ...

Epoch: 6 	Training Loss: 0.773601 	Test Loss: 0.927481
Training Accuracy: 72.81%	 Test Accuracy: 68.61%
Test loss decreased (0.944177 --> 0.927481).  Saving model .

**As we can see, the result is not drastically better than that of our initial model with convolutional layers. Still, you can try other models as well as a different shape for the classifier module.**

**That's it for today...**

---

## **Home Task**

You can choose any multi-class classification dataset for the following tasks (you can still stick with CIFAR10)


1.   In this task you need to improve the last custom architecture (CNN_advanced) using methods and tricks that we have not used yet. For example:

*   Batch Normalization
*   Dropout
*   Random Filter Pruning
*   etc.

The result should be improved compared to the one we obtained during the lab session.
2.  Train an additional architecture using any pretrained model

Requirements:


*   You should have your own classification layer (usually the last one) or a layer that is responsible for classification. You can add a new layer if a classification layer is initially absent
*   You should choose at least one layer from the pretrained architecture to retrain
*   The architecture should beat the initial model from the 1st task
*   You can choose any architecture from the pretrained list

3.  Explain which methods (or model additions) had the biggest impact on the test set accuracy. Provide a graphical comparison of the metrics of the models you obtained in tasks 1 and 2


Remember that you have approximately **3 days** to complete these tasks. You are allowed to use any available computational resources.

**Expected result:** a Jupyter notebook with the completed tasks, uploaded to the Moodle assignment section.

In [None]:
# *** YOUR CODE ***