# **Practical session on Transfer Learning**
This Pratical session proposes to study several techniques for improving challenging context, in which few data and resources are available.

# Introduction

**Context :**

Assume we are in a context where few "gold" labeled data are available for training, say 

$$\mathcal{X}_{\text{train}} = \{(x_n,y_n)\}_{n\leq N_{\text{train}}}$$

where $N_{\text{train}}$ is small. 

A large test set $\mathcal{X}_{\text{test}}$ as well as a large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

**Instructions to follow :** 

For each question, write a commented *Code* or a complete answer as a *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset!)

In your final report, please *keep the logs of each training procedure* you used. We will only run this jupyter if we have some doubts on your implementation. 

The total file sizes should be reasonable (feasible with 2MB only!). You will be asked to hand in the notebook, together with any necessary files required to run it if any.

You can use https://colab.research.google.com/ to run your experiments.

## Training set creation
__Question 1 (2 points) :__ Propose a dataloader to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set.

Additional information :  

*   CIFAR10 dataset : https://en.wikipedia.org/wiki/CIFAR-10
*   You can directly use the dataloader framework from Pytorch.
*   Alternatively you can modify the file : https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

In [12]:
import torch
import torchvision
import numpy as np
from torchvision import transforms
batch_size = 20
cifar_set_train = torchvision.datasets.CIFAR10(root ="\content", train=True,
                                               download=True , transform=transforms.ToTensor())
sampler_100 = torch.utils.data.Subset(cifar_set_train, list(np.arange(100))) 
X_train = torch.utils.data.DataLoader(sampler_100, batch_size=batch_size, shuffle=True)
cifar_set_test = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=transforms.ToTensor())
X_test = torch.utils.data.DataLoader(cifar_set_test, batch_size=batch_size, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified


* This is our dataset $\mathcal{X}_{\text{train}}$, it will be used until the end of this project. 

* The remaining samples correspond to $\mathcal{X}$. 

* The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.

## Testing procedure
__Question 2 (1.5 points):__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

**Answer 2:** The main problem is the small size of the training set. We have only 100 samples, what is obviously not enough for training. Therefore, we could (and probably would) encounter next problems. First, we one obvious problem is overfitting. The dataset is too small, so any more or less advanced model will simply remember and show no generalization after training. We can tackle this problem with usage of data augmentation or just use semi-supervised techniques. Second, the dataset would be imbalanced, so we would have small amount of data for one class, so the model won't be able to learn properly for this class. Also, there could be a problem of omiting of some classes(the probability is small, but worth of mentioning). In order to fight this problem, we can resample this dataset again and again until we get evenly distributed class. Also, we can find right class weight and adjust loss function to this weights.

# The Baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performance with reported numbers from the litterature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performances (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size, as well as the learning rate scheduler (i.e. how to decrease the learning rate as a function of the number of epochs). A possible scheduler is to start the learning rate at 0.1 and decreasing it every 30 epochs by 10. In case of divergence, reduce the learning rate. A potential batch size could be 10, yet this can be cross-validated.

You can get some baselines accuracies in this paper (obviously, it is a different context for those researchers who had access to GPUs!) : http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf. 

## ResNet architectures

__Question 3 (4 points) :__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1512.03385). Please report the accuracy obtained on the whole dataset as well as the reference paper/GitHub link.

*Hint :* You can re-use the following code : https://github.com/kuangliu/pytorch-cifar. During a training of 10 epochs, a batch size of 10 and a learning rate of 0.01, one obtains 40% accuracy on $\mathcal{X}_{\text{train}}$ (\~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (\~5 minutes).

In [14]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
use_cuda = torch.cuda.is_available()
device = 'cuda' if use_cuda else 'cpu'

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion *
                               planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])
def train(model, num_epochs, optimizer, scheduler, criterion, train_loader, verbose=True, device=device):
    model = model.to(device)
    training_losses = []
    training_accuracies = []
    for epoch in range(1, num_epochs+1): 
        model.train()
        epoch_loss = 0
        epoch_samples = 0
        epoch_correct = 0   
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            predicted_batch = output.argmax(dim=1)
            epoch_samples += len(target) 
            epoch_correct += (predicted_batch == target).sum()
        epoch_accuracy = 100 * (epoch_correct / epoch_samples)
        training_accuracies.append(epoch_accuracy.item())
        average_epoch_loss = epoch_loss / len(train_loader)
        training_losses.append(average_epoch_loss)
        if verbose :
            if epoch % 5 == 0: 
                print('Epoch: {}, Loss: {:.3f}, Accuracy: {:.1f} %'.format(epoch, average_epoch_loss, epoch_accuracy))

    return training_accuracies, training_losses
def test(model, criterion, device, test_loader, verbose=True):
    model.eval()
    with torch.no_grad():
        test_loss = 0
        correct = 0
        total_samples = 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            predicted = output.argmax(dim=1)
            correct += predicted.eq(target).sum().item()
            total_samples += len(target)
        accuracy = 100 * correct / total_samples
        loss = test_loss / len(test_loader)
        if verbose:
            print('Test_loss: {:.3f}, Test_accuracy: {:.3f} %'.format(loss, accuracy))
    return accuracy

epochs = 10
resnet18 = ResNet18()
optimizer = torch.optim.SGD(resnet18.parameters(), lr=3e-2, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)
criterion = torch.nn.CrossEntropyLoss()
train_accuracy, train_loss = train(resnet18, epochs, optimizer, scheduler, criterion, X_train)
test_accuracy = test(resnet18, criterion, device, X_test)

Epoch: 5, Loss: 2.307, Accuracy: 31.0 %
Epoch: 10, Loss: 1.168, Accuracy: 60.0 %
Test_loss: 3.492, Test_accuracy: 20.100 %


# Transfer learning

We propose to use pre-trained models on a classification and generative task, in order to improve the results of our setting.

## ImageNet features

Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on : https://pytorch.org/vision/stable/models.html.

__Question 4 (3 points):__ Pick a model from the list above, adapt it for CIFAR10 and retrain its final layer (or a block of layers, depending on the resources to which you have access to). Report its accuracy.

In [6]:
from torchvision.models import mobilenet_v3_small
import torch.nn as nn
model = mobilenet_v3_small(pretrained=True)
num_classes = 10
epochs = 50
model.classifier = nn.Sequential(
    nn.Linear(in_features=576, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=num_classes),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)

for param in model.parameters():
    param.requires_grad = False

for param in model.classifier.parameters():
    param.requires_grad = True
  
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train)
for param in model.parameters():
    param.requires_grad = True
child_nbrs = 0
for child in model.children():
    child_nbrs += 1
child_counter = 0
for child in model.children():
    if child_counter < child_nbrs//2 :
        child_counter += 1
        for param in child.parameters():
            param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train) 
for param in model.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train) 
test_accuracy = test(model, criterion, device, X_test)


Downloading: "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v3_small-047dcff4.pth
100%|██████████| 9.83M/9.83M [00:00<00:00, 57.7MB/s]


Epoch: 5, Loss: 1.799, Accuracy: 44.0 %
Epoch: 10, Loss: 1.216, Accuracy: 61.0 %
Epoch: 15, Loss: 0.846, Accuracy: 78.0 %
Epoch: 20, Loss: 0.793, Accuracy: 76.0 %
Epoch: 25, Loss: 0.693, Accuracy: 74.0 %
Epoch: 30, Loss: 0.722, Accuracy: 78.0 %
Epoch: 35, Loss: 0.552, Accuracy: 84.0 %
Epoch: 40, Loss: 0.532, Accuracy: 81.0 %
Epoch: 45, Loss: 0.409, Accuracy: 88.0 %
Epoch: 50, Loss: 0.349, Accuracy: 92.0 %
Epoch: 5, Loss: 0.636, Accuracy: 82.0 %
Epoch: 10, Loss: 0.409, Accuracy: 86.0 %
Epoch: 15, Loss: 0.337, Accuracy: 88.0 %
Epoch: 20, Loss: 0.412, Accuracy: 87.0 %
Epoch: 25, Loss: 0.490, Accuracy: 85.0 %
Epoch: 30, Loss: 0.336, Accuracy: 88.0 %
Epoch: 35, Loss: 0.478, Accuracy: 86.0 %
Epoch: 40, Loss: 0.272, Accuracy: 93.0 %
Epoch: 45, Loss: 0.188, Accuracy: 94.0 %
Epoch: 50, Loss: 0.247, Accuracy: 93.0 %
Epoch: 5, Loss: 2.049, Accuracy: 26.0 %
Epoch: 10, Loss: 1.071, Accuracy: 69.0 %
Epoch: 15, Loss: 0.554, Accuracy: 81.0 %
Epoch: 20, Loss: 0.288, Accuracy: 90.0 %
Epoch: 25, Loss: 0.

In [7]:
from torchvision.models import mobilenet_v3_large
model = mobilenet_v3_large(pretrained=True)
num_classes = 10
epochs = 50
model.classifier = nn.Sequential(
    nn.Linear(in_features=960, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=num_classes),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)

for param in model.parameters():
    param.requires_grad = False

for param in model.classifier.parameters():
    param.requires_grad = True
  
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train)
for param in model.parameters():
    param.requires_grad = True
child_nbrs = 0
for child in model.children():
    child_nbrs += 1
child_counter = 0
for child in model.children():
    if child_counter < child_nbrs//2 :
        child_counter += 1
        for param in child.parameters():
            param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train) 
for param in model.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train) 
test_accuracy = test(model, criterion, device, X_test)


Downloading: "https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v3_large-8738ca79.pth
100%|██████████| 21.1M/21.1M [00:00<00:00, 46.4MB/s]


Epoch: 5, Loss: 1.475, Accuracy: 56.0 %
Epoch: 10, Loss: 0.766, Accuracy: 75.0 %
Epoch: 15, Loss: 0.542, Accuracy: 85.0 %
Epoch: 20, Loss: 0.440, Accuracy: 88.0 %
Epoch: 25, Loss: 0.426, Accuracy: 86.0 %
Epoch: 30, Loss: 0.382, Accuracy: 88.0 %
Epoch: 35, Loss: 0.282, Accuracy: 89.0 %
Epoch: 40, Loss: 0.339, Accuracy: 92.0 %
Epoch: 45, Loss: 0.293, Accuracy: 90.0 %
Epoch: 50, Loss: 0.446, Accuracy: 85.0 %
Epoch: 5, Loss: 0.159, Accuracy: 97.0 %
Epoch: 10, Loss: 0.237, Accuracy: 93.0 %
Epoch: 15, Loss: 0.150, Accuracy: 94.0 %
Epoch: 20, Loss: 0.126, Accuracy: 97.0 %
Epoch: 25, Loss: 0.168, Accuracy: 97.0 %
Epoch: 30, Loss: 0.156, Accuracy: 94.0 %
Epoch: 35, Loss: 0.290, Accuracy: 91.0 %
Epoch: 40, Loss: 0.126, Accuracy: 97.0 %
Epoch: 45, Loss: 0.220, Accuracy: 93.0 %
Epoch: 50, Loss: 0.069, Accuracy: 99.0 %
Epoch: 5, Loss: 1.777, Accuracy: 37.0 %
Epoch: 10, Loss: 0.570, Accuracy: 84.0 %
Epoch: 15, Loss: 0.609, Accuracy: 80.0 %
Epoch: 20, Loss: 0.371, Accuracy: 90.0 %
Epoch: 25, Loss: 0.

# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks. For now, we only consider linear transformations $\mathcal{T}$ of the inputs $x:\mathbb{S}^2\rightarrow\mathbb{R}$ where $\mathbb{S}$ is the support of an image, meaning that :

$$\forall u\in\mathbb{S}^2,\mathcal{T}(\lambda x+\mu y)(u)=\lambda \mathcal{T}(x)(u)+\mu \mathcal{T}(y)(u)\,.$$

For instance if an image had an infinite support, a translation $\mathcal{T}_a$ by $a$ would lead to :

$$\forall u, \mathcal{T}_a(x)(u)=x(u-a)\,.$$

Otherwise, one has to handle several boundary effects.

__Question 5 (1.5 points) :__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

**Answer 5:** There are several problems that can emerge when working with translations, rotations, scaling effects, and color changes on 32x32 images:
When a picture is translated or turned, portions of it may shift out of frame or become distorted, resulting in information loss. This can make it more challenging for the model to identify the picture accurately. Scaling effects can cause images to look larger or smaller, making it challenging for a model to correctly recognize objects in the image. Changes in illumination or hue can alter the look of a picture, making it more difficult for a model to correctly recognize items.

Here are some solutions to these problems. We can produce more training data and teach the model to be more resilient to differences by adding modifications to the training pictures such as translations, rotations, scaling, and color changes. This can help minimize the risk of overfitting and increase the accuracy of the model. We can minimize the effect of differences in lighting conditions and color shifts by normalizing the pixel values of the pictures. This can help to make the model more resistant to such changes. We can use pre-trained models to utilize information acquired on large datasets to enhance the performance of our model on smaller datasets with fewer training instances. This can aid the model's generalization to new variants of the input pictures.  We can help the model learn to recognize things at various sizes by training it on pictures at varied scales, making it more resilient to changes in scaling effects. We can enhance the resilience of the model and reduce the risk of overfitting by merging the forecasts of multiple models trained with various augmentations or initialization methods. This can help to improve the model's precision on test data with differences that were not seen during training.




## Data augmentations

__Question 6 (3 points):__ Propose a set of geometric transformation beyond translation, and incorporate them in your training pipeline. Train the model of the __Question 3__ with them and report the accuracies.

In [8]:
train_trans = transforms.Compose([
                 transforms.RandomHorizontalFlip(),
                 transforms.RandomCrop(size=[32,32], padding=3),
                 transforms.RandomRotation(degrees=15),
                 transforms.ColorJitter(hue=.1),
                 transforms.ToTensor(),
                 ])
test_trans = transforms.Compose([
                 transforms.ToTensor(),
])
cifar_set_train = torchvision.datasets.CIFAR10(root ="\content", train=True,
                                               download=True , transform=train_trans)
sampler_100 = torch.utils.data.Subset(cifar_set_train, list(np.arange(100))) 
X_train2 = torch.utils.data.DataLoader(sampler_100, batch_size=batch_size, shuffle=True)
cifar_set_test = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=transforms.ToTensor())
X_test2 = torch.utils.data.DataLoader(cifar_set_test, batch_size=batch_size, shuffle=False)
epochs = 50
resnet18 = ResNet18()
optimizer = torch.optim.SGD(resnet18.parameters(), lr=1e-2, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)
criterion = torch.nn.CrossEntropyLoss()
train_accuracy, train_loss = train(resnet18, epochs, optimizer, scheduler, criterion, X_train2)
test_accuracy = test(resnet18, criterion, device, X_test2)

Files already downloaded and verified
Files already downloaded and verified
Epoch: 5, Loss: 1.973, Accuracy: 31.0 %
Epoch: 10, Loss: 1.601, Accuracy: 47.0 %
Epoch: 15, Loss: 1.137, Accuracy: 62.0 %
Epoch: 20, Loss: 0.874, Accuracy: 69.0 %
Epoch: 25, Loss: 0.718, Accuracy: 73.0 %
Epoch: 30, Loss: 0.615, Accuracy: 81.0 %
Epoch: 35, Loss: 0.688, Accuracy: 73.0 %
Epoch: 40, Loss: 0.411, Accuracy: 85.0 %
Epoch: 45, Loss: 0.326, Accuracy: 88.0 %
Epoch: 50, Loss: 0.425, Accuracy: 85.0 %
Test_loss: 6.464, Test_accuracy: 19.920 %


In [9]:
model = mobilenet_v3_small(pretrained=True)
num_classes = 10
epochs = 50
model.classifier = nn.Sequential(
    nn.Linear(in_features=576, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=num_classes),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)

for param in model.parameters():
    param.requires_grad = False

for param in model.classifier.parameters():
    param.requires_grad = True
  
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train2)
for param in model.parameters():
    param.requires_grad = True
child_nbrs = 0
for child in model.children():
    child_nbrs += 1
child_counter = 0
for child in model.children():
    if child_counter < child_nbrs//2 :
        child_counter += 1
        for param in child.parameters():
            param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train2) 
for param in model.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train2) 
test_accuracy = test(model, criterion, device, X_test2)

Epoch: 5, Loss: 2.102, Accuracy: 26.0 %
Epoch: 10, Loss: 2.015, Accuracy: 26.0 %
Epoch: 15, Loss: 1.984, Accuracy: 30.0 %
Epoch: 20, Loss: 1.934, Accuracy: 35.0 %
Epoch: 25, Loss: 1.846, Accuracy: 32.0 %
Epoch: 30, Loss: 1.875, Accuracy: 30.0 %
Epoch: 35, Loss: 1.770, Accuracy: 35.0 %
Epoch: 40, Loss: 1.904, Accuracy: 38.0 %
Epoch: 45, Loss: 1.794, Accuracy: 35.0 %
Epoch: 50, Loss: 1.733, Accuracy: 36.0 %
Epoch: 5, Loss: 1.775, Accuracy: 36.0 %
Epoch: 10, Loss: 1.805, Accuracy: 38.0 %
Epoch: 15, Loss: 1.845, Accuracy: 35.0 %
Epoch: 20, Loss: 1.806, Accuracy: 28.0 %
Epoch: 25, Loss: 1.932, Accuracy: 36.0 %
Epoch: 30, Loss: 1.841, Accuracy: 28.0 %
Epoch: 35, Loss: 1.899, Accuracy: 30.0 %
Epoch: 40, Loss: 1.804, Accuracy: 34.0 %
Epoch: 45, Loss: 1.716, Accuracy: 37.0 %
Epoch: 50, Loss: 1.756, Accuracy: 39.0 %
Epoch: 5, Loss: 2.003, Accuracy: 22.0 %
Epoch: 10, Loss: 2.011, Accuracy: 28.0 %
Epoch: 15, Loss: 1.717, Accuracy: 37.0 %
Epoch: 20, Loss: 1.402, Accuracy: 53.0 %
Epoch: 25, Loss: 1.

In [10]:
from torchvision.models import mobilenet_v3_large
model = mobilenet_v3_large(pretrained=True)
num_classes = 10
epochs = 50
model.classifier = nn.Sequential(
    nn.Linear(in_features=960, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=num_classes),
)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)

for param in model.parameters():
    param.requires_grad = False

for param in model.classifier.parameters():
    param.requires_grad = True
  
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train2)
for param in model.parameters():
    param.requires_grad = True
child_nbrs = 0
for child in model.children():
    child_nbrs += 1
child_counter = 0
for child in model.children():
    if child_counter < child_nbrs//2 :
        child_counter += 1
        for param in child.parameters():
            param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train2) 
for param in model.parameters():
    param.requires_grad = True
train_accuracy, train_loss = train(model, epochs, optimizer, scheduler, criterion, X_train2) 
test_accuracy = test(model, criterion, device, X_test2)


Epoch: 5, Loss: 2.024, Accuracy: 23.0 %
Epoch: 10, Loss: 1.786, Accuracy: 38.0 %
Epoch: 15, Loss: 1.788, Accuracy: 36.0 %
Epoch: 20, Loss: 1.813, Accuracy: 33.0 %
Epoch: 25, Loss: 1.534, Accuracy: 53.0 %
Epoch: 30, Loss: 1.560, Accuracy: 47.0 %
Epoch: 35, Loss: 1.562, Accuracy: 45.0 %
Epoch: 40, Loss: 1.465, Accuracy: 52.0 %
Epoch: 45, Loss: 1.532, Accuracy: 52.0 %
Epoch: 50, Loss: 1.338, Accuracy: 52.0 %
Epoch: 5, Loss: 1.350, Accuracy: 57.0 %
Epoch: 10, Loss: 1.583, Accuracy: 41.0 %
Epoch: 15, Loss: 1.551, Accuracy: 47.0 %
Epoch: 20, Loss: 1.291, Accuracy: 58.0 %
Epoch: 25, Loss: 1.468, Accuracy: 50.0 %
Epoch: 30, Loss: 1.290, Accuracy: 57.0 %
Epoch: 35, Loss: 1.484, Accuracy: 46.0 %
Epoch: 40, Loss: 1.198, Accuracy: 64.0 %
Epoch: 45, Loss: 1.365, Accuracy: 52.0 %
Epoch: 50, Loss: 1.291, Accuracy: 56.0 %
Epoch: 5, Loss: 2.159, Accuracy: 20.0 %
Epoch: 10, Loss: 1.451, Accuracy: 47.0 %
Epoch: 15, Loss: 1.176, Accuracy: 62.0 %
Epoch: 20, Loss: 0.762, Accuracy: 77.0 %
Epoch: 25, Loss: 0.

# Conclusions

__Question 7 (5 points) :__ Write a short report explaining the pros and the cons of each method that you implemented. 25% of the grade of this project will correspond to this question, thus, it should be done carefully. In particular, please add a plot that will summarize all your numerical results.

In this part of our project, we aimed to improve the accuracy of our image classification models by augmenting our 100-sample dataset with a set of geometric transformations. By randomly augmenting our images using these transformations, we hoped to ensure that the models would encounter different samples for every iteration, thus enhancing their ability to generalize and make accurate predictions on unseen data.

To achieve this, we incorporated a pipeline of transformations using the torchvision.transforms.Compose() method. Specifically, we applied the following transformations: horizontally flip the given image randomly, randomly crop the input image, random rotation and randomly change the hue of an image.

By applying this pipeline of data augmentation, we were able to improve the accuracy of some models we tried by up to 3.4%. In particular, we weren't able to improve the accuracy of the ResNet-18 model on the test set, we got from 20.10% to 19.92%. The best accuracy we obtained was 26.37% with the Pretrained MobileNetV3 model.

Below, we report the accuracies of all the models we tried with and without data augmentation:

| Model                | Accuracy with data augmentation | Accuracy without data augmentation |
|----------------------|---------------------------------|------------------------------------|
| ResNet-18            | 19.92%                          | 20.10%                             |
| MobileNetV3Small            | 23.24%                          | 16.37%                             |
| MobileNetV3Large | 26.37%                          | 18.22%                             |



As the table shows, data augmentation consistently led to improved accuracy across most of the models we tried. We conclude that incorporating a pipeline of geometric transformations can be an effective way to enhance the performance of image classification models.

# Weak supervision

__Bonus \[open\] question (up to 3 points) :__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.