# **Practical session on Transfer Learning**
This practical session studies several techniques for improving performance in a challenging setting where little data and few resources are available.

# Introduction

**Context :**

Assume we are in a context where few "gold" labeled data are available for training, say

$$\mathcal{X}_{\text{train}} = \{(x_n,y_n)\}_{n\leq N_{\text{train}}}$$

where $N_{\text{train}}$ is small.

A large test set $\mathcal{X}_{\text{test}}$ as well as a large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

**Instructions to follow :**

For each question, write a commented *Code* or a complete answer as a *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset!)

In your final report, please *keep the logs of each training procedure* you used. We will only run this Jupyter notebook if we have some doubts about your implementation.

The total file sizes should be reasonable (feasible with 2MB only!). You will be asked to hand in the notebook, together with any necessary files required to run it if any.

You can use https://colab.research.google.com/ to run your experiments.

## Training set creation
__Question 1 (1 point) :__ Propose a dataloader to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set.

Additional information :  

*   CIFAR10 dataset : https://en.wikipedia.org/wiki/CIFAR-10
*   You can directly use the dataloader framework from Pytorch.
*   Alternatively you can modify the file : https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from tqdm import tqdm
import numpy as np
import matplotlib.pyplot as plt

In [2]:
transform = transforms.Compose([
    transforms.ToTensor(),
])

batch_size = 10

# Only use the first 100 samples
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_set.data = train_set.data[:100]
train_set.targets = train_set.targets[:100]
train_dataloader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)

# For validation during train loop
unlabel_set =  torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
unlabel_set.data = unlabel_set.data[100:]
unlabel_set.targets = unlabel_set.targets[100:]
unlabel_dataloader = torch.utils.data.DataLoader(unlabel_set, batch_size=batch_size, shuffle=True)

# Test set
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_dataloader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:03<00:00, 47874294.44it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Files already downloaded and verified


* This is our dataset $\mathcal{X}_{\text{train}}$; it will be used until the end of this project.

* The remaining samples correspond to $\mathcal{X}$.

* The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.

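As a possible alternative to slicing `.data` and `.targets` in place, the same labeled/unlabeled split can be sketched with `torch.utils.data.Subset`. The snippet below is only an illustration on a synthetic stand-in dataset: the random tensors and the names `full_set`, `labeled_set` and `unlabeled_set` are assumptions made for the example and are not part of the notebook. With CIFAR-10, `full_set` would be the `torchvision.datasets.CIFAR10` training set.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for the CIFAR-10 training set:
# 500 random 3x32x32 "images" with labels in [0, 10)
full_set = TensorDataset(torch.rand(500, 3, 32, 32), torch.randint(0, 10, (500,)))

n_train = 100
# First 100 samples play the role of X_train, the rest the role of X
labeled_set = Subset(full_set, range(n_train))
unlabeled_set = Subset(full_set, range(n_train, len(full_set)))

train_loader = DataLoader(labeled_set, batch_size=10, shuffle=True)
print(len(labeled_set), len(unlabeled_set), len(train_loader))
```

Both subsets share the underlying dataset object, so the data is loaded only once.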
## Testing procedure
__Question 2 (0.5 points):__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

Evaluating the training procedure is difficult because a model can reach a very high accuracy on the training set while overfitting, and then perform poorly on the test set. This tension between fitting the training data and generalizing is common, and it is especially acute on a small dataset, where overfitting is very easy.


**Solutions**

* Data Augmentation: We can use different data augmentation techniques to artificially increase the size of the dataset. This can help reduce overfitting and improve the generalization of the model.
* Regularization: We can implement regularization techniques such as dropout or weight decay to penalize large weights and reduce the risk of overfitting.
* Early Stopping: We can monitor the validation loss during training and stop the training process if the validation loss does not improve for a certain number of epochs. This can also help reduce overfitting.

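The early-stopping idea from the list above can be sketched as a small helper. This is a minimal illustration and not the notebook's implementation: the class name `EarlyStopping`, its `patience` and `min_delta` parameters, and the validation losses are all hypothetical.

```python
class EarlyStopping:
    """Signal a stop when the monitored validation loss stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses: three improving epochs, then a plateau
val_losses = [2.30, 1.90, 1.70, 1.75, 1.72, 1.80]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)  # 6 1.7
```

In practice the weights from the best epoch would also be saved, so that they can be restored once training stops.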

# The Baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performance with reported numbers from the literature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performance (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size and the learning rate scheduler (i.e., how the learning rate decreases as a function of the number of epochs). A possible schedule is to start with a learning rate of 0.1 and divide it by 10 every 30 epochs. In case of divergence, reduce the learning rate. A reasonable batch size is 10, though this can be cross-validated.

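The step schedule described above can be written down as a short function. This is a sketch under stated assumptions: the function name `step_decay_lr` and its parameter names are hypothetical, and the defaults simply mirror the values from the text. In PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` applies the same rule to an optimizer.

```python
def step_decay_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Learning rate at a given epoch: multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# The rate stays at 0.1 for epochs 0-29, drops to about 0.01 for
# epochs 30-59, then to about 0.001 from epoch 60 onwards.
for epoch in (0, 29, 30, 59, 60):
    print(epoch, step_decay_lr(epoch))
```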
You can get some baseline accuracies in this paper (obviously, it is a different context for those researchers who had access to GPUs!) : http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf.

## ResNet architectures

__Question 3 (2 points) :__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1512.03385). Please report the accuracy obtained on the whole dataset as well as the reference paper/GitHub link.

*Hint :* You can re-use the following code : https://github.com/kuangliu/pytorch-cifar. During a training of 10 epochs, a batch size of 10 and a learning rate of 0.01, one obtains 40% accuracy on $\mathcal{X}_{\text{train}}$ (\~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (\~5 minutes).

In [3]:
#########################################
### Training Pipeline
#########################################

def train(model, epochs, trainloader, valloader, optimizer, scheduler, criterion, device):
    best_acc = 0.0
    stop_time = 0

    # Move the model to the target device once, before training starts
    model.to(device)

    # Train
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        correct = 0
        total = 0
        for batch_idx, (inputs, targets) in enumerate(trainloader):
            # Move the batch to the same device as the model
            inputs, targets = inputs.to(device), targets.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

        # Update learning rate with scheduler
        scheduler.step()
        # Print out information
        print('Epoch %d Train Loss: %.3f | Train Acc: %.3f%%' % (epoch + 1, train_loss / len(trainloader), 100.*correct/total))

        # Validation
        correct = 0
        total = 0
        model.eval()
        with torch.no_grad():
            for batch_idx, (inputs, targets) in enumerate(valloader):
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = model(inputs)
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()
            # Accuracy over the whole validation set
            acc = 100.*correct/total

        print('Epoch %d Validation Acc: %.3f%%' % (epoch + 1, acc))


        # Early Stop
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), 'best_weights.pth')
            stop_time = 0
        else:
            stop_time += 1
            patience = 5
            if stop_time >= patience:
                print('Validation accuracy no longer improves for %d epochs, early stopping.' % patience)
                break

    model.load_state_dict(torch.load('best_weights.pth'))
    return model

In [4]:
#########################################
### ResNet-18, architecture from https://github.com/kuangliu/pytorch-cifar
#########################################

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion *
                               planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])

In [7]:
model = ResNet18()

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
criterion = nn.CrossEntropyLoss()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(device)

cuda


In [8]:
best_model = train(model, 100, train_dataloader, unlabel_dataloader, optimizer, scheduler, criterion, device)

Epoch 1 Train Loss: 2.291 | Train Acc: 18.000%
Epoch 1 Validation Acc: 9.994%
Epoch 2 Train Loss: 1.957 | Train Acc: 34.000%
Epoch 2 Validation Acc: 10.597%
Epoch 3 Train Loss: 1.634 | Train Acc: 51.000%
Epoch 3 Validation Acc: 10.142%
Epoch 4 Train Loss: 1.278 | Train Acc: 63.000%
Epoch 4 Validation Acc: 10.852%
Epoch 5 Train Loss: 0.898 | Train Acc: 84.000%
Epoch 5 Validation Acc: 20.567%
Epoch 6 Train Loss: 0.534 | Train Acc: 98.000%
Epoch 6 Validation Acc: 21.174%
Epoch 7 Train Loss: 0.362 | Train Acc: 98.000%
Epoch 7 Validation Acc: 21.487%
Epoch 8 Train Loss: 0.259 | Train Acc: 99.000%
Epoch 8 Validation Acc: 21.882%
Epoch 9 Train Loss: 0.096 | Train Acc: 100.000%
Epoch 9 Validation Acc: 22.369%
Epoch 10 Train Loss: 0.068 | Train Acc: 100.000%
Epoch 10 Validation Acc: 22.457%
Epoch 11 Train Loss: 0.087 | Train Acc: 100.000%
Epoch 11 Validation Acc: 22.523%
Epoch 12 Train Loss: 0.041 | Train Acc: 100.000%
Epoch 12 Validation Acc: 22.900%
Epoch 13 Train Loss: 0.047 | Train Acc: 100

In [9]:
correct = 0
total = 0
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(test_dataloader):
      inputs, targets = inputs.to(device), targets.to(device)
      outputs = model(inputs)
      _, predicted = outputs.max(1)
      total += targets.size(0)
      correct += (predicted == targets).sum().item()

print('Test Acc: %d %%' % (100 * correct / total))

Test Acc: 23 %



| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   ResNet18  | 18 | 100% | 24% |

# Transfer learning

We propose to use pre-trained models on a classification and generative task, in order to improve the results of our setting.

## ImageNet features

Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on : https://pytorch.org/vision/stable/models.html.

__Question 4 (1 point):__ Pick a model from the list above, adapt it for CIFAR10 and retrain its final layer (or a block of layers, depending on the resources you have access to). Report its accuracy.

In [None]:
from torchvision.models import resnet18, ResNet18_Weights

# Pre-trained resnet18 model
# IMAGENET1K_V1 weights (ImageNet top-1 accuracy 69.758%, top-5 accuracy 89.078%)
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

# Reset the last layer
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)

print(model)

In [None]:
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
criterion = nn.CrossEntropyLoss()

In [None]:
best_model = train(model, 100, train_dataloader, unlabel_dataloader, optimizer, scheduler, criterion, device)

Epoch 1 Train Loss: 2.570 | Train Acc: 12.000%
Epoch 1 Validation Acc: 13.437%
Epoch 2 Train Loss: 1.600 | Train Acc: 51.000%
Epoch 2 Validation Acc: 18.038%
Epoch 3 Train Loss: 1.262 | Train Acc: 55.000%
Epoch 3 Validation Acc: 22.032%
Epoch 4 Train Loss: 0.883 | Train Acc: 70.000%
Epoch 4 Validation Acc: 23.118%
Epoch 5 Train Loss: 0.639 | Train Acc: 82.000%
Epoch 5 Validation Acc: 24.487%
Epoch 6 Train Loss: 0.514 | Train Acc: 86.000%
Epoch 6 Validation Acc: 24.575%
Epoch 7 Train Loss: 0.369 | Train Acc: 89.000%
Epoch 7 Validation Acc: 25.242%
Epoch 8 Train Loss: 0.401 | Train Acc: 88.000%
Epoch 8 Validation Acc: 25.772%
Epoch 9 Train Loss: 0.367 | Train Acc: 87.000%
Epoch 9 Validation Acc: 26.545%
Epoch 10 Train Loss: 0.310 | Train Acc: 86.000%
Epoch 10 Validation Acc: 26.547%
Epoch 11 Train Loss: 0.263 | Train Acc: 89.000%
Epoch 11 Validation Acc: 28.246%
Epoch 12 Train Loss: 0.281 | Train Acc: 93.000%
Epoch 12 Validation Acc: 28.413%
Epoch 13 Train Loss: 0.172 | Train Acc: 94.000

In [None]:
correct = 0
total = 0
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(test_dataloader):
      inputs, targets = inputs.to(device), targets.to(device)
      outputs = model(inputs)
      _, predicted = outputs.max(1)
      total += targets.size(0)
      correct += (predicted == targets).sum().item()

print('Test Acc: %d %%' % (100 * correct / total))

Test Acc: 29 %


| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   ResNet18  | 17 | 96% | 29% |

# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks, though one might have to handle several boundary effects.

__Question 5 (0.5 points) :__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

Translating, rotating, rescaling, or recoloring a $32\times32$ image discards part of its content. Because such small images carry little detail to begin with, these transformations can distort them so much that the features the model needs to learn are no longer clearly visible.

To limit this, we can restrict ourselves to small perturbations, such as shifts of a few pixels or rotations of a few degrees, so that most of the important content stays intact. Another option is to upsample the images with an interpolation method that fills in the gaps caused by resizing, which can help preserve more of the original information.

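A small worked computation makes the boundary effect concrete. The helper below is a hypothetical illustration (the name `lost_fraction` is not from the notebook): it computes the fraction of a square image that leaves the frame when the image is shifted by the same number of pixels along both axes.

```python
def lost_fraction(size, shift):
    """Fraction of a size x size image pushed out of the frame
    by a shift of `shift` pixels along both axes."""
    kept = (size - shift) ** 2
    return 1 - kept / size ** 2


# A 4-pixel shift removes about 23% of a 32x32 image,
# but under 4% of a 224x224 image.
print(lost_fraction(32, 4))   # 0.234375
print(lost_fraction(224, 4))
```

The same shift is therefore far more destructive at CIFAR resolution than at ImageNet resolution, which is one reason to keep perturbations small.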
## Data augmentations

__Question 6 (4 points):__ Propose a set of geometric transformations beyond translation, and incorporate them in your training pipeline. Train the model from __Question 3__ with them and report the accuracies.

In [10]:
#########################################
### Geometric transformation
### Reference: https://github.com/kuangliu/pytorch-cifar
#########################################

original_transform = transforms.Compose([
    transforms.ToTensor(),
])

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])



batch_size = 10

# X_train(100 samples from cifar10) union geometric_transformation(X_train)

origin_train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=original_transform)
origin_train_set.data = train_set.data[:100]
origin_train_set.targets = train_set.targets[:100]

transformed_train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
transformed_train_set.data = train_set.data[:100]
transformed_train_set.targets = train_set.targets[:100]

combined_train_set = torch.utils.data.ConcatDataset([origin_train_set, transformed_train_set])
train_dataloader = torch.utils.data.DataLoader(combined_train_set, batch_size=batch_size, shuffle=True)

# For validation during train loop
unlabel_set =  torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
unlabel_set.data = unlabel_set.data[100:]
unlabel_set.targets = unlabel_set.targets[100:]
unlabel_dataloader = torch.utils.data.DataLoader(unlabel_set, batch_size=batch_size, shuffle=True)

# Test set
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
test_dataloader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


In [12]:
print("Number of batches in the train_dataloader:", len(train_dataloader))
print("Batch size:", train_dataloader.batch_size)
print("Number of samples in the train_dataloader:", len(train_dataloader.dataset))

Number of batches in the train_dataloader: 20
Batch size: 10
Number of samples in the train_dataloader: 200


In [13]:
model = ResNet18()

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
criterion = nn.CrossEntropyLoss()

print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=

In [14]:
best_model = train(model, 100, train_dataloader, unlabel_dataloader, optimizer, scheduler, criterion, device)

Epoch 1 Train Loss: 2.393 | Train Acc: 12.000%
Epoch 1 Validation Acc: 9.994%
Epoch 2 Train Loss: 2.355 | Train Acc: 15.500%
Epoch 2 Validation Acc: 9.988%
Epoch 3 Train Loss: 2.177 | Train Acc: 19.000%
Epoch 3 Validation Acc: 15.912%
Epoch 4 Train Loss: 2.017 | Train Acc: 27.500%
Epoch 4 Validation Acc: 15.697%
Epoch 5 Train Loss: 1.873 | Train Acc: 35.500%
Epoch 5 Validation Acc: 20.627%
Epoch 6 Train Loss: 1.722 | Train Acc: 39.000%
Epoch 6 Validation Acc: 20.926%
Epoch 7 Train Loss: 1.575 | Train Acc: 45.500%
Epoch 7 Validation Acc: 18.361%
Epoch 8 Train Loss: 1.463 | Train Acc: 45.500%
Epoch 8 Validation Acc: 17.505%
Epoch 9 Train Loss: 1.388 | Train Acc: 52.500%
Epoch 9 Validation Acc: 22.283%
Epoch 10 Train Loss: 1.211 | Train Acc: 56.500%
Epoch 10 Validation Acc: 20.637%
Epoch 11 Train Loss: 1.038 | Train Acc: 66.500%
Epoch 11 Validation Acc: 22.481%
Epoch 12 Train Loss: 0.871 | Train Acc: 71.000%
Epoch 12 Validation Acc: 23.944%
Epoch 13 Train Loss: 0.832 | Train Acc: 71.500%


In [15]:
correct = 0
total = 0
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(test_dataloader):
      inputs, targets = inputs.to(device), targets.to(device)
      outputs = model(inputs)
      _, predicted = outputs.max(1)
      total += targets.size(0)
      correct += (predicted == targets).sum().item()

print('Test Acc: %d %%' % (100 * correct / total))

Test Acc: 27 %


| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   ResNet18  | 18 | 86% | 27% |

In [18]:
from torchvision.models import resnet18, ResNet18_Weights

# Pre-trained resnet18 model
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
criterion = nn.CrossEntropyLoss()

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 140MB/s]


In [19]:
best_model = train(model, 100, train_dataloader, unlabel_dataloader, optimizer, scheduler, criterion, device)

Epoch 1 Train Loss: 2.518 | Train Acc: 14.000%
Epoch 1 Validation Acc: 17.315%
Epoch 2 Train Loss: 2.097 | Train Acc: 27.500%
Epoch 2 Validation Acc: 16.689%
Epoch 3 Train Loss: 1.957 | Train Acc: 38.500%
Epoch 3 Validation Acc: 20.529%
Epoch 4 Train Loss: 1.640 | Train Acc: 45.000%
Epoch 4 Validation Acc: 23.385%
Epoch 5 Train Loss: 1.608 | Train Acc: 46.000%
Epoch 5 Validation Acc: 21.996%
Epoch 6 Train Loss: 1.432 | Train Acc: 48.000%
Epoch 6 Validation Acc: 25.012%
Epoch 7 Train Loss: 1.282 | Train Acc: 56.000%
Epoch 7 Validation Acc: 26.036%
Epoch 8 Train Loss: 1.161 | Train Acc: 59.000%
Epoch 8 Validation Acc: 25.912%
Epoch 9 Train Loss: 1.186 | Train Acc: 60.500%
Epoch 9 Validation Acc: 27.124%
Epoch 10 Train Loss: 0.804 | Train Acc: 73.000%
Epoch 10 Validation Acc: 27.559%
Epoch 11 Train Loss: 0.839 | Train Acc: 74.500%
Epoch 11 Validation Acc: 26.373%
Epoch 12 Train Loss: 0.803 | Train Acc: 72.000%
Epoch 12 Validation Acc: 25.333%
Epoch 13 Train Loss: 0.927 | Train Acc: 72.500

In [20]:
correct = 0
total = 0
with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(test_dataloader):
      inputs, targets = inputs.to(device), targets.to(device)
      outputs = model(inputs)
      _, predicted = outputs.max(1)
      total += targets.size(0)
      correct += (predicted == targets).sum().item()

print('Test Acc: %d %%' % (100 * correct / total))

Test Acc: 32 %


# Conclusions

__Question 7 (3 points) :__ Write a short report explaining the pros and the cons of each method that you implemented. 25% of the grade of this project will correspond to this question, thus, it should be done carefully. In particular, please add a plot that will summarize all your numerical results.

### Results

<center>

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   ResNet18  | 18 | 100% | 24% |
|   ResNet18 (pre-trained)  | 17 | 96% | 29% |
|   ResNet18 (data augmentation)  | 18 | 86% | 27% |
|   ResNet18 (pre-trained & data augmentation)  | 24 | 86% | 32% |


</center>

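The numbers in the table above can be summarized in a single plot. The snippet below is one possible sketch: the accuracies are copied from the table, while the function name `plot_results` and the layout choices are assumptions. It also prints the gap between train and test accuracy for each model.

```python
import matplotlib.pyplot as plt

# (model, train accuracy in %, test accuracy in %), copied from the results table
results = [
    ("ResNet18", 100, 24),
    ("ResNet18 (pre-trained)", 96, 29),
    ("ResNet18 (data augmentation)", 86, 27),
    ("ResNet18 (pre-trained & data augmentation)", 86, 32),
]


def plot_results(results):
    """Grouped bar chart of train and test accuracy per model."""
    labels = [name for name, _, _ in results]
    train_acc = [train for _, train, _ in results]
    test_acc = [test for _, _, test in results]
    positions = list(range(len(results)))
    width = 0.4

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar([p - width / 2 for p in positions], train_acc, width, label="Train accuracy")
    ax.bar([p + width / 2 for p in positions], test_acc, width, label="Test accuracy")
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=20, ha="right")
    ax.set_ylabel("Accuracy (%)")
    ax.legend()
    fig.tight_layout()
    return fig


fig = plot_results(results)

# Train/test gap in percentage points, a rough indicator of overfitting
gaps = [train - test for _, train, test in results]
print(gaps)  # [76, 67, 59, 54]
```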
# Weak supervision

__Bonus \[open\] question (up to 3 points) :__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.