# Small data and deep learning
This mini-project proposes to study several techniques for improving challenging context, in which few data and resources are available.

# Introduction
Assume we are in a context where few "gold" labeled data are available for training, say $\mathcal{X}_{\text{train}}\triangleq\{(x_n,y_n)\}_{n\leq N_{\text{train}}}$, where $N_{\text{train}}$ is small. A large test set $\mathcal{X}_{\text{test}}$ is available. A large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

For each question, write a commented *Code* or a complete answer as a *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question:

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset)

In your final report, please keep the logs of each training procedure you used. We will only run this jupyter if we have some doubts on your implementation. 

__The total file sizes should not exceed 2MB. Please name your notebook (LASTNAME)\_(FIRSTNAME).ipynb, zip/tar it with any necessary files required to run your notebook, in a compressed file named (LASTNAME)\_(FIRSTNAME).X where X is the corresponding extension. Zip/tar files exceeding 2MB will not be considered for grading. Submit the compressed file via the submission link provided on the website of the class.__

You can use https://colab.research.google.com/ to run your experiments.

## Training set creation
__Question 1:__ Propose a dataloader or modify the file located at https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py in order to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set. 

In [2]:
import torch
import torchvision
import torchvision.transforms as transforms

class SubsetSampler(torch.utils.data.Sampler):
  def __init__(self, subset_indice):
    self.subset_indice = subset_indice

  def __iter__(self):
    return iter(self.subset_indice)

  def __len__(self):
    return len(self.subset_indice)
      
class SubsetDataLoader(torch.utils.data.DataLoader):
  def __init__(self, subset_indice, dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0):
    torch.utils.data.DataLoader.__init__(self, dataset, batch_size=batch_size, shuffle=shuffle, sampler=SubsetSampler(subset_indice), batch_sampler=batch_sampler, num_workers=num_workers)
      
  
my_transform = transforms.Compose([
    transforms.ToTensor()
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=my_transform)           
X_train = SubsetDataLoader(range(0, 100), trainset)
X = SubsetDataLoader(range(100, len(trainset)), trainset)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=my_transform)
X_test = torch.utils.data.DataLoader(testset)

print("Train size : "+str(len(X_train)))
print("X size : "+str(len(X)))
print("Test size : "+str(len(X_test)))

Files already downloaded and verified
Files already downloaded and verified


170500096it [00:50, 7279672.64it/s]                               

Train size : 100
X size : 49900
Test size : 10000


This is our dataset $\mathcal{X}_{\text{train}}$, it will be used until the end of this project. The remaining samples correspond to $\mathcal{X}$. The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.

## Testing procedure
__Question 2:__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

The training is complicated since it saturates quickly at 100% (large overfitting)

# Raw approach: the baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performances with reported number from the litterature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performances (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size, as well as the learning rate schedule, i.e. how to decrease the learning rate as a function of the number of epochs. A possible schedule is to start the learning rate at 0.1 and decreasing it every 30 epochs by 10. In case of divergence, reduce the laerning rate. A potential batch size could be 10, yet this can be cross-validated.

You can get some baselines accuracies in this paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf. Obviously, it is a different context, as those researchers had access to GPUs.

## ResNet architectures

__Question 3:__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1409.1556 ). If possible, please report the accuracy obtained on the whole dataset, as well as the reference paper/GitHub link you might have used.

*Hint:* You can re-use the following code: https://github.com/kuangliu/pytorch-cifar. During a training of 10 epochs, a batch size of 10 and a learning rate of 0.01, one obtains 40% accuracy on $\mathcal{X}_{\text{train}}$ (~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (~5 minutes).

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2,2,2,2])

In [0]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

X_test = torch.utils.data.DataLoader(testset, batch_size=512)
  
def train_and_test(net, X_train, transfert_learning=None):
  net = net.to(device)
  if device == 'cuda':
      net = torch.nn.DataParallel(net)
      cudnn.benchmark = True

  criterion = nn.CrossEntropyLoss()
  if transfert_learning is None:
    optimizer = optim.SGD(net.parameters(), lr=lr)
  else:
    for child in net.children():
      for name, param in child.named_parameters():
        param.requires_grad = False
    transfert_learning.requires_grad = True
    transfert_learning.weight.requires_grad = True
    transfert_learning.bias.requires_grad = True
    optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=lr)
    #optimizer.add_param_group({'params': transfert_learning.parameters()})


  # Training
  def train(epoch, trainloader):
      net.train()
      train_loss = 0
      correct = 0
      total = 0
      for batch_idx, (inputs, targets) in enumerate(trainloader):
          inputs, targets = inputs.to(device), targets.to(device)
          optimizer.zero_grad()
          outputs = net(inputs)
          loss = criterion(outputs, targets)
          loss.backward()
          optimizer.step()

          train_loss += loss.item()
          _, predicted = outputs.max(1)
          total += targets.size(0)
          correct += predicted.eq(targets).sum().item()

      print('Loss: %.3f | Acc: %.3f%% (%d/%d)'
              % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

  def test(testloader):
      global best_acc
      net.eval()
      test_loss = 0
      correct = 0
      total = 0
      with torch.no_grad():
          for batch_idx, (inputs, targets) in enumerate(testloader):
              inputs, targets = inputs.to(device), targets.to(device)
              outputs = net(inputs)
              loss = criterion(outputs, targets)

              test_loss += loss.item()
              _, predicted = outputs.max(1)
              total += targets.size(0)
              correct += predicted.eq(targets).sum().item()

          print('Loss: %.3f | Acc: %.3f%% (%d/%d)'
                  % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

  for epoch in range(0, nb_epoch):
      print('\nEpoch: %d' % epoch)
      print("Train")
      train(epoch, X_train)

  print("\nTest on : ")
  print("X_test")
  test(X_test)

In [0]:
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms

from tqdm import tqdm

#params
nb_epoch = 10
batch_size = 10
lr = 0.01


print("Partial Dataset")
X_train = SubsetDataLoader(range(0, 100), trainset, batch_size=batch_size)
train_and_test(ResNet18(), X_train)
print("Complete Dataset")
X_train_complete = torch.utils.data.DataLoader(trainset, batch_size=batch_size)
train_and_test(ResNet18(), X_train_complete)


Epoch: 0
Train
Loss: 2.292 | Acc: 11.000% (11/100)

Epoch: 1
Train
Loss: 1.981 | Acc: 30.000% (30/100)

Epoch: 2
Train
Loss: 1.663 | Acc: 56.000% (56/100)

Epoch: 3
Train
Loss: 1.400 | Acc: 67.000% (67/100)

Epoch: 4
Train
Loss: 1.094 | Acc: 83.000% (83/100)

Epoch: 5
Train
Loss: 0.753 | Acc: 96.000% (96/100)

Epoch: 6
Train
Loss: 0.504 | Acc: 98.000% (98/100)

Epoch: 7
Train
Loss: 0.320 | Acc: 100.000% (100/100)

Epoch: 8
Train
Loss: 0.219 | Acc: 100.000% (100/100)

Epoch: 9
Train
Loss: 0.161 | Acc: 100.000% (100/100)

Test on : 
X_test
Loss: 2.289 | Acc: 19.580% (1958/10000)
Complete Dataset

Epoch: 0
Train
Loss: 1.276 | Acc: 54.148% (27074/50000)

Epoch: 1
Train
Loss: 0.778 | Acc: 73.050% (36525/50000)

Epoch: 2
Train
Loss: 0.541 | Acc: 81.770% (40885/50000)

Epoch: 3
Train
Loss: 0.359 | Acc: 88.084% (44042/50000)

Epoch: 4
Train
Loss: 0.241 | Acc: 91.894% (45947/50000)

Epoch: 5
Train
Loss: 0.175 | Acc: 94.046% (47023/50000)

Epoch: 6
Train
Loss: 0.123 | Acc: 95.762% (47881/50000)

###Training on 100 samples :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   ResNet18  | 10 | 100% | 19.6% |

The accuracy on the training set is already at 100%, no need to increase the number of epochs.

###Training on the whole dataset :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   ResNet18  | 10 | 98.7% | 80.5% |

## VGG-like architectures

__Question 4:__ Same question as before, but with a *VGG*. Which model do you recommend?

In [0]:
cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name])
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)


In [0]:
print("Partial Dataset")
X_train = SubsetDataLoader(range(0, 100), trainset, batch_size=batch_size)
train_and_test(VGG('VGG16'), X_train)
print("Complete Dataset")
X_train_complete = torch.utils.data.DataLoader(trainset, batch_size=batch_size)
train_and_test(VGG('VGG16'), X_train_complete)

###Training on 100 samples :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16 | 10 | 100% | 24.7% |
The accuracy on the training set is already at 100%. Even if it could decrease the loss, I took it as a stop criterion. So, there is no need to increase the number of epochs.

###Training on the whole dataset :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 98.7% | 80.5% |

# Transfer learning

We propose to use pre-trained models on a classification and generative task, in order to improve the results of our setting.

## ImageNet features

Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on: https://pytorch.org/docs/stable/torchvision/models.html.

__Question 5:__ Pick a model from the list above, adapt it to CIFAR and retrain its final layer (or a block of layers, depending on the resources to which you have access to). Report its accuracy.

In [0]:
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms

from tqdm import tqdm

my_transform = transforms.Compose([
    transforms.ToTensor()
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=my_transform) 

#params
nb_epoch = 80
batch_size = 10
lr = 0.01

print("Partial Dataset")
net = models.vgg16(pretrained=True)
last_layer = net.classifier[6]
X_train = SubsetDataLoader(range(0, 100), trainset, batch_size=batch_size)
train_and_test(net, X_train, transfert_learning=last_layer)

Files already downloaded and verified
Partial Dataset

Epoch: 0
Train
Loss: 11.737 | Acc: 3.000% (3/100)

Epoch: 1
Train
Loss: 5.159 | Acc: 30.000% (30/100)

Epoch: 2
Train
Loss: 2.553 | Acc: 50.000% (50/100)

Epoch: 3
Train
Loss: 1.707 | Acc: 56.000% (56/100)

Epoch: 4
Train
Loss: 0.953 | Acc: 76.000% (76/100)

Epoch: 5
Train
Loss: 0.795 | Acc: 78.000% (78/100)

Epoch: 6
Train
Loss: 0.554 | Acc: 84.000% (84/100)

Epoch: 7
Train
Loss: 0.482 | Acc: 84.000% (84/100)

Epoch: 8
Train
Loss: 0.482 | Acc: 82.000% (82/100)

Epoch: 9
Train
Loss: 0.369 | Acc: 91.000% (91/100)

Epoch: 10
Train
Loss: 0.223 | Acc: 94.000% (94/100)

Epoch: 11
Train
Loss: 0.221 | Acc: 97.000% (97/100)

Epoch: 12
Train
Loss: 0.214 | Acc: 93.000% (93/100)

Epoch: 13
Train
Loss: 0.213 | Acc: 95.000% (95/100)

Epoch: 14
Train
Loss: 0.279 | Acc: 94.000% (94/100)

Epoch: 15
Train
Loss: 0.211 | Acc: 92.000% (92/100)

Epoch: 16
Train
Loss: 0.214 | Acc: 93.000% (93/100)

Epoch: 17
Train
Loss: 0.120 | Acc: 97.000% (97/100)

Ep

Training on 100 samples :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 93% | 27.7% |
|   VGG16  | 20 | 98% | 28.2% |
|   VGG16  | 40 | 99% | 29.3% |
|   VGG16  | 80 | 100% | 30.0% |

We have increased the number of epochs (by a factor 2) until the accuracy on the training set reach 100%

## DCGan features

GANs correspond to an unsupervised technique for generating images. In https://arxiv.org/pdf/1511.06434.pdf, Sec. 5.1 shows that the representation obtained from the Discriminator has some nice generalization properties on CIFAR10.

__Question 6:__  Using for instance a pretrained model from https://github.com/soumith/dcgan.torch combined with https://github.com/pytorch/examples/tree/master/dcgan, propose a model to train on $\mathcal{X}_{\text{train}}$. Train it and report its accuracy.

*Hint:* You can use the library: https://github.com/bshillingford/python-torchfile to load the weights of a model from torch(Lua) to pytorch(python).

# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks. For now, we only consider linear transformations $\mathcal{T}$ of the inputs $x:\mathbb{S}^2\rightarrow\mathbb{R}$ where $\mathbb{S}$ is the support of an image, meaning that:

$$\forall u\in\mathbb{S}^2,\mathcal{T}(\lambda x+\mu y)(u)=\lambda \mathcal{T}(x)(u)+\mu \mathcal{T}(y)(u)\,.$$

For instance if an image had an infinite support, a translation $\mathcal{T}_a$ by $a$ would lead to:

$$\forall u, \mathcal{T}_a(x)(u)=x(u-a)\,.$$

Otherwise, one has to handle several boundary effects.

__Question 7:__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

- A lot of information is lost (the trimmed parts)
- A lot of noise is added (the padding added opposite the trimmed part)
- It can take more time to train (more epochs before the training score saturates)



## Data augmentations

__Question 8:__ Propose a set of geometric transformation beyond translation, and incorporate them in your training pipeline. Train the model of the __Question 3__ and __Question 4__ with them and report the accuracies.

In [0]:
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms
from torch.autograd import Variable

from tqdm import tqdm
import numpy as np

class GaussianNoise(object):
    def __init__(self, sigma):
        self.sigma = sigma

    def __call__(self, image):
        return image + Variable(torch.randn(image.size()) * self.sigma)
      
      
my_transform_beyond = transforms.Compose([
    #transforms.RandomAffine(2, translate=(0.05, 0.05), scale=(0.95, 1.05), shear=0.05),
    transforms.ToTensor(),
    GaussianNoise(sigma=0.3)
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=my_transform_beyond)           

#params
nb_epoch = 40
batch_size = 10
lr = 0.01

print("Partial Dataset")
X_train = SubsetDataLoader(range(0, 100), trainset, batch_size=batch_size)
train_and_test(ResNet18(), X_train)

Files already downloaded and verified
Partial Dataset

Epoch: 0
Train
Loss: 2.320 | Acc: 10.000% (10/100)

Epoch: 1
Train
Loss: 2.258 | Acc: 11.000% (11/100)

Epoch: 2
Train
Loss: 2.211 | Acc: 16.000% (16/100)

Epoch: 3
Train
Loss: 2.171 | Acc: 19.000% (19/100)

Epoch: 4
Train
Loss: 2.116 | Acc: 22.000% (22/100)

Epoch: 5
Train
Loss: 2.079 | Acc: 24.000% (24/100)

Epoch: 6
Train
Loss: 2.018 | Acc: 26.000% (26/100)

Epoch: 7
Train
Loss: 1.948 | Acc: 32.000% (32/100)

Epoch: 8
Train
Loss: 1.867 | Acc: 35.000% (35/100)

Epoch: 9
Train
Loss: 1.821 | Acc: 33.000% (33/100)

Epoch: 10
Train
Loss: 1.706 | Acc: 38.000% (38/100)

Epoch: 11
Train
Loss: 1.615 | Acc: 41.000% (41/100)

Epoch: 12
Train
Loss: 1.504 | Acc: 51.000% (51/100)

Epoch: 13
Train
Loss: 1.379 | Acc: 60.000% (60/100)

Epoch: 14
Train
Loss: 1.238 | Acc: 64.000% (64/100)

Epoch: 15
Train
Loss: 1.115 | Acc: 71.000% (71/100)

Epoch: 16
Train
Loss: 1.006 | Acc: 78.000% (78/100)

Epoch: 17
Train
Loss: 0.875 | Acc: 80.000% (80/100)

E

Training on 100 samples :

###With *RandomAffine(2, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=0.1)* :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 77% | 21.9% |
|   **VGG16**  | 20 | 100% | **25.6%** |
|   ResNet18  | 10 | 51% | 19.7% |
|   ResNet18  | 20 | 89% | 18.4% |
|   ResNet18  | 40 | 100% | 23.3% |

###With *RandomAffine(2, translate=(0.05, 0.05), scale=(0.95, 1.05), shear=0.05)* :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 100% | 22.9% |
|   ResNet18  | 10 | 67% | 20.5% |
|   ResNet18  | 20 | 100% | 23.1% |

###With *RandomAffine(2, translate=(0.1, 0.1), scale=(0.8, 1.2), shear=0.2)* :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 70% | 13.4% |
|   VGG16  | 20 | 100% | 22.9% |
|   ResNet18  | 10 | 52% | 22.6% |
|   ResNet18  | 20 | 82% | 19.0% |
|   **ResNet18**  | 40 | 100% | **24.4%** |

###With *GaussianNoise(sigma=0.05)* only :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 100% | 22.9% |
|   ResNet18  | 10 | 97% | 20.9% |
|   ResNet18  | 20 | 100% | 22.4% |

###With *GaussianNoise(sigma=0.1)*  :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 100% | 21.7% |
|   ResNet18  | 10 | 74% | 21.1% |
|   ResNet18  | 20 | 100% | 21.1% |

###With *GaussianNoise(sigma=0.3)*  :

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   VGG16  | 10 | 93% | 12.4% |
|   VGG16  | 20 | 100% | 17.6% |
|   ResNet18  | 10 | 33% | 14.9% |
|   ResNet18  | 20 | 88% | 18.3% |
|   ResNet18  | 40 | 100% | 19.6% |

## Wavelets

__Question 9:__ Use a Scattering Transform as an input to a ResNet-like architecture. You can find a baseline here: https://arxiv.org/pdf/1703.08961.pdf.

*Hint:* You can use the following package: https://www.kymat.io/

In [0]:
!pip install kymatio
from kymatio import Scattering2D

import torchvision.models as models
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms
from torch.autograd import Variable

from tqdm import tqdm
import numpy as np


class MyScattingNetwork(nn.Module):
    def __init__(self, net):
        nn.Module.__init__(self)
        self.net = net

    def forward(self, x):
        return self.net(x.view(x.size()[0], x.size()[1]*x.size()[2], x.size()[3], x.size()[4]))


def ResNet18():
    return ResNet(BasicBlock, [2,2,2,2])
  

class ScatteringTransform(object):
    def __init__(self, J):
        self.scattering = Scattering2D(J=J, shape=(32, 32))

    def __call__(self, image):
        return self.scattering(image)
      
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(243, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2,2,2,2])
      
my_transform_wavelet = transforms.Compose([
    transforms.ToTensor(),
    ScatteringTransform(J=2)
])


trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=my_transform_wavelet) 
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=my_transform_wavelet)
X_test = torch.utils.data.DataLoader(testset, batch_size=512)

#params
nb_epoch = 10
batch_size = 10
lr = 0.01

print("Partial Dataset")
X_train = SubsetDataLoader(range(0, 100), trainset, batch_size=batch_size)
train_and_test(MyScattingNetwork(ResNet18()), X_train)


Files already downloaded and verified
Files already downloaded and verified
Partial Dataset

Epoch: 0
Train
Loss: 2.433 | Acc: 7.000% (7/100)

Epoch: 1
Train
Loss: 2.169 | Acc: 22.000% (22/100)

Epoch: 2
Train
Loss: 1.471 | Acc: 55.000% (55/100)

Epoch: 3
Train
Loss: 0.860 | Acc: 79.000% (79/100)

Epoch: 4
Train
Loss: 0.410 | Acc: 99.000% (99/100)

Epoch: 5
Train
Loss: 0.172 | Acc: 100.000% (100/100)

Epoch: 6
Train
Loss: 0.085 | Acc: 100.000% (100/100)

Epoch: 7
Train
Loss: 0.057 | Acc: 100.000% (100/100)

Epoch: 8
Train
Loss: 0.044 | Acc: 100.000% (100/100)

Epoch: 9
Train
Loss: 0.037 | Acc: 100.000% (100/100)

Test on : 
X_test


The training is much longer (on Colab, about 10s per ecoch vs 0.5s before)

# Weak supervision

Weakly supervised techniques permit to tackle the issue of labeled data. An introduction to those techniques can be found here: https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html.

__(Open) Question 10:__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.

# Conclusions

__Question 11:__ Write a short report explaining the pros and the cons of each methods that you implemented. 25% of the grade of this project will correspond to this question, thus, it should be done carefully. In particular, please add a plot that will summarize all your numerical results.