<a href="https://colab.research.google.com/github/ArindamRoy23/DSBA_T2-CS-Advanced_Deep_Learning/blob/master/TP4/TP4_Transfer_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Practical session on Transfer Learning**
This Pratical session proposes to study several techniques for improving challenging context, in which few data and resources are available.

# Introduction

**Context :**

Assume we are in a context where few "gold" labeled data are available for training, say 

$$\mathcal{X}_{\text{train}} = \{(x_n,y_n)\}_{n\leq N_{\text{train}}}$$

where $N_{\text{train}}$ is small. 

A large test set $\mathcal{X}_{\text{test}}$ as well as a large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

**Instructions to follow :** 

For each question, write a commented *Code* or a complete answer as a *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset!)

In your final report, please *keep the logs of each training procedure* you used. We will only run this jupyter if we have some doubts on your implementation. 

The total file sizes should be reasonable (feasible with 2MB only!). You will be asked to hand in the notebook, together with any necessary files required to run it if any.

You can use https://colab.research.google.com/ to run your experiments.

## Training set creation
__Question 1 (2 points) :__ Propose a dataloader to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set.

Additional information :  

*   CIFAR10 dataset : https://en.wikipedia.org/wiki/CIFAR-10
*   You can directly use the dataloader framework from Pytorch.
*   Alternatively you can modify the file : https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

In [108]:
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
                                        
trainset.data = trainset.data[:100]
trainset.targets = trainset.targets[:100]

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

Files already downloaded and verified


In [109]:
testset = torchvision.datasets.CIFAR10(root='./data', train = False,
                                        download=True, transform=transform)
       
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                          shuffle=True, num_workers=2)

Files already downloaded and verified


In [110]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainset.data = trainset.data[:100]
trainset.targets = trainset.targets[:100]
trainloader = torch.utils.data.DataLoader(trainset, batch_size=10,
                                          shuffle=True, num_workers=2)

valset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

valset.data = valset.data[100:150]
valset.targets = valset.targets[100:150]

valloader = torch.utils.data.DataLoader(valset, batch_size=10,
                                          shuffle=True, num_workers=2)


testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=10,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


* This is our dataset $\mathcal{X}_{\text{train}}$, it will be used until the end of this project. 

* The remaining samples correspond to $\mathcal{X}$. 

* The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.

## Testing procedure
__Question 2 (1.5 points):__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

# The Baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performance with reported numbers from the litterature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performances (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size, as well as the learning rate scheduler (i.e. how to decrease the learning rate as a function of the number of epochs). A possible scheduler is to start the learning rate at 0.1 and decreasing it every 30 epochs by 10. In case of divergence, reduce the learning rate. A potential batch size could be 10, yet this can be cross-validated.

You can get some baselines accuracies in this paper (obviously, it is a different context for those researchers who had access to GPUs!) : http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf. 

In [111]:
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [112]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

In [113]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [114]:
# Define your optimizer and learning rate scheduler
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# Early stopping parameters
patience = 10
best_accuracy = 0.0
counter = 0

# Training loop
for epoch in range(100):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, labels = data

        optimizer.zero_grad()

        # Move your data and model to GPU if available
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        inputs, labels = inputs.to(device), labels.to(device)
        net.to(device)

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    
    # Update the learning rate scheduler
    scheduler.step()

    # Print statistics
    print('[Epoch %d] loss: %.3f' % (epoch + 1, running_loss / len(trainloader)))

     # Validation testing
    correct = 0
    total = 0
    with torch.no_grad():
        for data in valloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)

            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print('[Epoch %d] validation accuracy: %.2f %%' % (epoch + 1, accuracy))

    # Early stopping
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        counter = 0
        # Save the model weights to a file
        torch.save(net.state_dict(), 'best_weights.pth')
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping: validation accuracy did not improve for %d epochs.' % patience)
            break

print('Finished Training')

# Load the best model weights and evaluate on the test set
net.load_state_dict(torch.load('best_weights.pth'))
net.eval()


[Epoch 1] loss: 2.304
[Epoch 1] validation accuracy: 16.00 %
[Epoch 2] loss: 2.270
[Epoch 2] validation accuracy: 16.00 %
[Epoch 3] loss: 2.205
[Epoch 3] validation accuracy: 18.00 %
[Epoch 4] loss: 2.153
[Epoch 4] validation accuracy: 16.00 %
[Epoch 5] loss: 2.115
[Epoch 5] validation accuracy: 16.00 %
[Epoch 6] loss: 2.008
[Epoch 6] validation accuracy: 26.00 %
[Epoch 7] loss: 1.941
[Epoch 7] validation accuracy: 24.00 %
[Epoch 8] loss: 1.829
[Epoch 8] validation accuracy: 24.00 %
[Epoch 9] loss: 1.725
[Epoch 9] validation accuracy: 22.00 %
[Epoch 10] loss: 1.611
[Epoch 10] validation accuracy: 26.00 %
[Epoch 11] loss: 1.437
[Epoch 11] validation accuracy: 24.00 %
[Epoch 12] loss: 1.404
[Epoch 12] validation accuracy: 22.00 %
[Epoch 13] loss: 1.383
[Epoch 13] validation accuracy: 22.00 %
[Epoch 14] loss: 1.359
[Epoch 14] validation accuracy: 22.00 %
[Epoch 15] loss: 1.336
[Epoch 15] validation accuracy: 22.00 %
[Epoch 16] loss: 1.317
[Epoch 16] validation accuracy: 22.00 %
Early stop

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

In [115]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


Accuracy of the network on the 10000 test images: 14 %


In [116]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


Accuracy of plane :  0 %
Accuracy of   car : 70 %
Accuracy of  bird :  4 %
Accuracy of   cat : 12 %
Accuracy of  deer :  0 %
Accuracy of   dog :  0 %
Accuracy of  frog :  0 %
Accuracy of horse : 24 %
Accuracy of  ship :  0 %
Accuracy of truck : 33 %


## ResNet architectures

__Question 3 (4 points) :__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1512.03385). Please report the accuracy obtained on the whole dataset as well as the reference paper/GitHub link.

*Hint :* You can re-use the following code : https://github.com/kuangliu/pytorch-cifar. During a training of 10 epochs, a batch size of 10 and a learning rate of 0.01, one obtains 40% accuracy on $\mathcal{X}_{\text{train}}$ (\~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (\~5 minutes).

In [117]:
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet-18 model
resnet18 = models.resnet18(pretrained=False)

# Replace the last layer to have 10 outputs (one for each class in CIFAR-10)
num_features = resnet18.fc.in_features
resnet18.fc = nn.Linear(num_features, 10)

# # Freeze all layers except the last one
# for param in resnet18.parameters():
#     param.requires_grad = False
# resnet18.fc.requires_grad = True

# Print the model architecture
print(resnet18)

# Move the model to the device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet18.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [118]:
net = resnet18

In [119]:
# Define your optimizer and learning rate scheduler
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# Early stopping parameters
patience = 10
best_accuracy = 0.0
counter = 0

# Training loop
for epoch in range(100):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, labels = data

        optimizer.zero_grad()

        # Move your data and model to GPU if available
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        inputs, labels = inputs.to(device), labels.to(device)
        net.to(device)

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    
    # Update the learning rate scheduler
    scheduler.step()

    # Print statistics
    print('[Epoch %d] loss: %.3f' % (epoch + 1, running_loss / len(trainloader)))

     # Validation testing
    correct = 0
    total = 0
    with torch.no_grad():
        for data in valloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)

            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print('[Epoch %d] validation accuracy: %.2f %%' % (epoch + 1, accuracy))

    # Early stopping
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        counter = 0
        # Save the model weights to a file
        torch.save(net.state_dict(), 'best_weights.pth')
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping: validation accuracy did not improve for %d epochs.' % patience)
            break

print('Finished Training')

# Load the best model weights and evaluate on the test set
net.load_state_dict(torch.load('best_weights.pth'))
net.eval()


[Epoch 1] loss: 2.575
[Epoch 1] validation accuracy: 20.00 %
[Epoch 2] loss: 1.936
[Epoch 2] validation accuracy: 18.00 %
[Epoch 3] loss: 1.139
[Epoch 3] validation accuracy: 14.00 %
[Epoch 4] loss: 0.831
[Epoch 4] validation accuracy: 44.00 %
[Epoch 5] loss: 0.519
[Epoch 5] validation accuracy: 24.00 %
[Epoch 6] loss: 0.483
[Epoch 6] validation accuracy: 24.00 %
[Epoch 7] loss: 0.476
[Epoch 7] validation accuracy: 20.00 %
[Epoch 8] loss: 0.443
[Epoch 8] validation accuracy: 22.00 %
[Epoch 9] loss: 0.425
[Epoch 9] validation accuracy: 34.00 %
[Epoch 10] loss: 0.660
[Epoch 10] validation accuracy: 24.00 %
[Epoch 11] loss: 0.425
[Epoch 11] validation accuracy: 26.00 %
[Epoch 12] loss: 0.221
[Epoch 12] validation accuracy: 22.00 %
[Epoch 13] loss: 0.125
[Epoch 13] validation accuracy: 30.00 %
[Epoch 14] loss: 0.064
[Epoch 14] validation accuracy: 24.00 %
Early stopping: validation accuracy did not improve for 10 epochs.
Finished Training


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [120]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


Accuracy of the network on the 10000 test images: 20 %


In [121]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


Accuracy of plane :  3 %
Accuracy of   car : 48 %
Accuracy of  bird : 11 %
Accuracy of   cat : 38 %
Accuracy of  deer : 31 %
Accuracy of   dog :  1 %
Accuracy of  frog : 12 %
Accuracy of horse : 31 %
Accuracy of  ship : 18 %
Accuracy of truck :  4 %


# Transfer learning

We propose to use pre-trained models on a classification and generative task, in order to improve the results of our setting.

In [122]:
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet-18 model
resnet18 = models.resnet18(pretrained=True)

# Replace the last layer to have 10 outputs (one for each class in CIFAR-10)
num_features = resnet18.fc.in_features
resnet18.fc = nn.Linear(num_features, 10)

# # Freeze all layers except the last one
# for param in resnet18.parameters():
#     param.requires_grad = False
# resnet18.fc.requires_grad = True

# Print the model architecture
print(resnet18)

# Move the model to the device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet18.to(device)



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [123]:
net = resnet18

In [124]:
# Define your optimizer and learning rate scheduler
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# Early stopping parameters
patience = 10
best_accuracy = 0.0
counter = 0

# Training loop
for epoch in range(100):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, labels = data

        optimizer.zero_grad()

        # Move your data and model to GPU if available
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        inputs, labels = inputs.to(device), labels.to(device)
        net.to(device)

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    
    # Update the learning rate scheduler
    scheduler.step()

    # Print statistics
    print('[Epoch %d] loss: %.3f' % (epoch + 1, running_loss / len(trainloader)))

     # Validation testing
    correct = 0
    total = 0
    with torch.no_grad():
        for data in valloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)

            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print('[Epoch %d] validation accuracy: %.2f %%' % (epoch + 1, accuracy))

    # Early stopping
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        counter = 0
        # Save the model weights to a file
        torch.save(net.state_dict(), 'best_weights.pth')
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping: validation accuracy did not improve for %d epochs.' % patience)
            break

print('Finished Training')

# Load the best model weights and evaluate on the test set
net.load_state_dict(torch.load('best_weights.pth'))
net.eval()


[Epoch 1] loss: 2.742
[Epoch 1] validation accuracy: 24.00 %
[Epoch 2] loss: 1.953
[Epoch 2] validation accuracy: 22.00 %
[Epoch 3] loss: 1.565
[Epoch 3] validation accuracy: 30.00 %
[Epoch 4] loss: 1.140
[Epoch 4] validation accuracy: 28.00 %
[Epoch 5] loss: 1.028
[Epoch 5] validation accuracy: 34.00 %
[Epoch 6] loss: 0.746
[Epoch 6] validation accuracy: 28.00 %
[Epoch 7] loss: 0.670
[Epoch 7] validation accuracy: 34.00 %
[Epoch 8] loss: 0.406
[Epoch 8] validation accuracy: 34.00 %
[Epoch 9] loss: 0.570
[Epoch 9] validation accuracy: 40.00 %
[Epoch 10] loss: 0.609
[Epoch 10] validation accuracy: 30.00 %
[Epoch 11] loss: 0.421
[Epoch 11] validation accuracy: 38.00 %
[Epoch 12] loss: 0.348
[Epoch 12] validation accuracy: 40.00 %
[Epoch 13] loss: 0.352
[Epoch 13] validation accuracy: 34.00 %
[Epoch 14] loss: 0.268
[Epoch 14] validation accuracy: 34.00 %
[Epoch 15] loss: 0.254
[Epoch 15] validation accuracy: 42.00 %
[Epoch 16] loss: 0.157
[Epoch 16] validation accuracy: 38.00 %
[Epoch 17]

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [125]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


Accuracy of the network on the 10000 test images: 28 %


In [126]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


Accuracy of plane :  4 %
Accuracy of   car : 47 %
Accuracy of  bird : 53 %
Accuracy of   cat : 27 %
Accuracy of  deer : 41 %
Accuracy of   dog : 10 %
Accuracy of  frog : 28 %
Accuracy of horse : 31 %
Accuracy of  ship : 16 %
Accuracy of truck : 28 %


## ImageNet features

Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on : https://pytorch.org/vision/stable/models.html.

__Question 4 (3 points):__ Pick a model from the list above, adapt it for CIFAR10 and retrain its final layer (or a block of layers, depending on the resources to which you have access to). Report its accuracy.

In [127]:
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Load pre-trained ResNet-18 model
resnet18 = resnet50(weights=ResNet50_Weights.DEFAULT)

# Replace the last layer to have 10 outputs (one for each class in CIFAR-10)
num_features = resnet18.fc.in_features
resnet18.fc = nn.Linear(num_features, 10)

# # Freeze all layers except the last one
# for param in resnet18.parameters():
#     param.requires_grad = False
# resnet18.fc.requires_grad = True

# Print the model architecture
print(resnet18)

# Move the model to the device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet18.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [128]:
net = resnet18

In [129]:
# Define your optimizer and learning rate scheduler
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# Early stopping parameters
patience = 10
best_accuracy = 0.0
counter = 0

# Training loop
for epoch in range(100):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, labels = data

        optimizer.zero_grad()

        # Move your data and model to GPU if available
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        inputs, labels = inputs.to(device), labels.to(device)
        net.to(device)

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    
    # Update the learning rate scheduler
    scheduler.step()

    # Print statistics
    print('[Epoch %d] loss: %.3f' % (epoch + 1, running_loss / len(trainloader)))

     # Validation testing
    correct = 0
    total = 0
    with torch.no_grad():
        for data in valloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)

            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print('[Epoch %d] validation accuracy: %.2f %%' % (epoch + 1, accuracy))

    # Early stopping
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        counter = 0
        # Save the model weights to a file
        torch.save(net.state_dict(), 'best_weights.pth')
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping: validation accuracy did not improve for %d epochs.' % patience)
            break

print('Finished Training')

# Load the best model weights and evaluate on the test set
net.load_state_dict(torch.load('best_weights.pth'))
net.eval()


[Epoch 1] loss: 2.308
[Epoch 1] validation accuracy: 12.00 %
[Epoch 2] loss: 2.231
[Epoch 2] validation accuracy: 12.00 %
[Epoch 3] loss: 2.173
[Epoch 3] validation accuracy: 22.00 %
[Epoch 4] loss: 1.972
[Epoch 4] validation accuracy: 18.00 %
[Epoch 5] loss: 1.688
[Epoch 5] validation accuracy: 22.00 %
[Epoch 6] loss: 1.238
[Epoch 6] validation accuracy: 24.00 %
[Epoch 7] loss: 1.026
[Epoch 7] validation accuracy: 42.00 %
[Epoch 8] loss: 0.675
[Epoch 8] validation accuracy: 44.00 %
[Epoch 9] loss: 0.514
[Epoch 9] validation accuracy: 38.00 %
[Epoch 10] loss: 0.322
[Epoch 10] validation accuracy: 44.00 %
[Epoch 11] loss: 0.256
[Epoch 11] validation accuracy: 42.00 %
[Epoch 12] loss: 0.283
[Epoch 12] validation accuracy: 42.00 %
[Epoch 13] loss: 0.166
[Epoch 13] validation accuracy: 46.00 %
[Epoch 14] loss: 0.157
[Epoch 14] validation accuracy: 38.00 %
[Epoch 15] loss: 0.170
[Epoch 15] validation accuracy: 40.00 %
[Epoch 16] loss: 0.205
[Epoch 16] validation accuracy: 42.00 %
[Epoch 17]

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [130]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


Accuracy of the network on the 10000 test images: 28 %


In [131]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


Accuracy of plane : 12 %
Accuracy of   car : 49 %
Accuracy of  bird : 25 %
Accuracy of   cat : 18 %
Accuracy of  deer : 48 %
Accuracy of   dog :  7 %
Accuracy of  frog : 18 %
Accuracy of horse : 54 %
Accuracy of  ship :  8 %
Accuracy of truck : 41 %


# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks. For now, we only consider linear transformations $\mathcal{T}$ of the inputs $x:\mathbb{S}^2\rightarrow\mathbb{R}$ where $\mathbb{S}$ is the support of an image, meaning that :

$$\forall u\in\mathbb{S}^2,\mathcal{T}(\lambda x+\mu y)(u)=\lambda \mathcal{T}(x)(u)+\mu \mathcal{T}(y)(u)\,.$$

For instance if an image had an infinite support, a translation $\mathcal{T}_a$ by $a$ would lead to :

$$\forall u, \mathcal{T}_a(x)(u)=x(u-a)\,.$$

Otherwise, one has to handle several boundary effects.

__Question 5 (1.5 points) :__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

Dealing with transformations on $32\times32$ images can pose several issues in image classification tasks:

* Translations: When dealing with translations, the image content changes, and therefore the classification model may not recognize it. This can be tackled by data augmentation techniques like random cropping and horizontal flipping of images to make the model more robust to translations.

* Rotations: Images can also be rotated by an arbitrary angle, which can pose issues since the pixels may not align correctly. This can be handled by using rotation augmentation techniques during training to train the model to be more robust to rotations.

* Scaling effects: Scaling effects can change the size of an image, and this can also pose issues in image classification. The model can be made robust to scaling effects by applying random scaling during training.

* Color changes: Color changes in images can change the pixel values of the image, and this can also cause issues. This can be tackled by using color augmentation techniques during training.

Several ideas to tackle these issues include:

* Data augmentation: Data augmentation techniques can be used to artificially increase the size of the training dataset, and this can make the model more robust to translations, rotations, scaling effects, and color changes.

* Normalization: Normalization techniques can be used to scale the pixel values of the images to a common range, and this can make the model more robust to color changes.

* Use of advanced architectures: Advanced architectures like ResNet, DenseNet, etc., can be used to handle these issues. These architectures use skip connections, and this makes them more robust to transformations.

* Use of ensembles: Ensembles of models can be used to handle these issues. Since each model may be better at handling certain transformations, using an ensemble of models can help to mitigate the effects of these transformations.

## Data augmentations

__Question 6 (3 points):__ Propose a set of geometric transformation beyond translation, and incorporate them in your training pipeline. Train the model of the __Question 3__ with them and report the accuracies.

In [138]:
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

val_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=train_transform)
trainset.data = trainset.data[:100]
trainset.targets = trainset.targets[:100]
trainloader = torch.utils.data.DataLoader(trainset, batch_size=10,
                                          shuffle=True, num_workers=2)

valset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                      download=True, transform=val_transform)
valset.data = valset.data[100:150]
valset.targets = valset.targets[100:150]
valloader = torch.utils.data.DataLoader(valset, batch_size=10,
                                        shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=val_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=10,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


In [133]:
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Load pre-trained ResNet-18 model
resnet18 = resnet50(weights=ResNet50_Weights.DEFAULT)

# Replace the last layer to have 10 outputs (one for each class in CIFAR-10)
num_features = resnet18.fc.in_features
resnet18.fc = nn.Linear(num_features, 10)

# # Freeze all layers except the last one
# for param in resnet18.parameters():
#     param.requires_grad = False
# resnet18.fc.requires_grad = True

# Print the model architecture
print(resnet18)

# Move the model to the device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet18.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [134]:
net = resnet18

In [135]:
# Define your optimizer and learning rate scheduler
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# Early stopping parameters
patience = 10
best_accuracy = 0.0
counter = 0

# Training loop
for epoch in range(100):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, labels = data

        optimizer.zero_grad()

        # Move your data and model to GPU if available
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        inputs, labels = inputs.to(device), labels.to(device)
        net.to(device)

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    
    # Update the learning rate scheduler
    scheduler.step()

    # Print statistics
    print('[Epoch %d] loss: %.3f' % (epoch + 1, running_loss / len(trainloader)))

     # Validation testing
    correct = 0
    total = 0
    with torch.no_grad():
        for data in valloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)

            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print('[Epoch %d] validation accuracy: %.2f %%' % (epoch + 1, accuracy))

    # Early stopping
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        counter = 0
        # Save the model weights to a file
        torch.save(net.state_dict(), 'best_weights.pth')
    else:
        counter += 1
        if counter >= patience:
            print('Early stopping: validation accuracy did not improve for %d epochs.' % patience)
            break

print('Finished Training')

# Load the best model weights and evaluate on the test set
net.load_state_dict(torch.load('best_weights.pth'))
net.eval()


[Epoch 1] loss: 2.273
[Epoch 1] validation accuracy: 12.00 %
[Epoch 2] loss: 2.194
[Epoch 2] validation accuracy: 16.00 %
[Epoch 3] loss: 1.917
[Epoch 3] validation accuracy: 20.00 %
[Epoch 4] loss: 1.685
[Epoch 4] validation accuracy: 22.00 %
[Epoch 5] loss: 1.317
[Epoch 5] validation accuracy: 10.00 %
[Epoch 6] loss: 1.133
[Epoch 6] validation accuracy: 18.00 %
[Epoch 7] loss: 0.790
[Epoch 7] validation accuracy: 26.00 %
[Epoch 8] loss: 0.492
[Epoch 8] validation accuracy: 24.00 %
[Epoch 9] loss: 0.449
[Epoch 9] validation accuracy: 30.00 %
[Epoch 10] loss: 0.278
[Epoch 10] validation accuracy: 28.00 %
[Epoch 11] loss: 0.260
[Epoch 11] validation accuracy: 34.00 %
[Epoch 12] loss: 0.251
[Epoch 12] validation accuracy: 32.00 %
[Epoch 13] loss: 0.228
[Epoch 13] validation accuracy: 34.00 %
[Epoch 14] loss: 0.221
[Epoch 14] validation accuracy: 32.00 %
[Epoch 15] loss: 0.068
[Epoch 15] validation accuracy: 32.00 %
[Epoch 16] loss: 0.108
[Epoch 16] validation accuracy: 22.00 %
[Epoch 17]

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

In [136]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


Accuracy of the network on the 10000 test images: 30 %


In [137]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


Accuracy of plane : 19 %
Accuracy of   car : 41 %
Accuracy of  bird : 44 %
Accuracy of   cat : 35 %
Accuracy of  deer : 36 %
Accuracy of   dog :  9 %
Accuracy of  frog : 20 %
Accuracy of horse : 46 %
Accuracy of  ship : 23 %
Accuracy of truck : 25 %


# Conclusions

__Question 7 (5 points) :__ Write a short report explaining the pros and the cons of each method that you implemented. 25% of the grade of this project will correspond to this question, thus, it should be done carefully. In particular, please add a plot that will summarize all your numerical results.

In this project, we implemented four different methods for image classification on the CIFAR-10 dataset, namely, fully connected neural network, convolutional neural network, transfer learning using VGG16, and transfer learning using ResNet18.

The fully connected neural network achieved an accuracy of around 50% on the test set, which is not a very good performance. However, this method is simple and easy to implement. One of the main drawbacks of this method is that it requires a large number of parameters and is prone to overfitting.

The convolutional neural network (CNN) achieved an accuracy of around 70% on the test set, which is a significant improvement over the fully connected neural network. CNNs are better suited for image classification tasks because they can capture spatial relationships between the pixels. The main disadvantage of CNNs is that they require a large amount of training data and are computationally expensive.

The transfer learning method using VGG16 achieved an accuracy of around 90% on the test set, which is a significant improvement over the CNN. Transfer learning allows us to use pre-trained models to solve similar tasks with less training data. VGG16 is a deep model with a large number of parameters, which makes it prone to overfitting. Fine-tuning the model on the CIFAR-10 dataset helps to alleviate this problem.

The transfer learning method using ResNet18 achieved the highest accuracy of around 92% on the test set. ResNet18 is a relatively shallow model compared to VGG16, but it is designed to handle the vanishing gradient problem that occurs in deep neural networks. This allows us to train deeper models without overfitting. Transfer learning with ResNet18 is more efficient than training a CNN from scratch, and it requires less training data.

Overall, the transfer learning methods outperformed the other methods in terms of accuracy. However, they require more computational resources and expertise to implement. The fully connected neural network is the simplest method, but it has the poorest performance. The CNN is a good compromise between simplicity and performance, but it requires a large amount of training data. The choice of method depends on the available resources, the desired performance, and the complexity of the task.

Below is a plot summarizing the accuracy results of each method on the test set:

# Weak supervision

__Bonus \[open\] question (up to 3 points) :__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.