# Small data and deep learning
This mini-project studies several techniques for improving deep learning performance in a challenging context, where little data and few resources are available.

# Introduction
Assume we are in a context where few "gold" labeled data are available for training, say $\mathcal{X}_{\text{train}}\triangleq\{(x_n,y_n)\}_{n\leq N_{\text{train}}}$, where $N_{\text{train}}$ is small. A large test set $\mathcal{X}_{\text{test}}$ is available. A large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

For each question, write commented *Code* or a complete answer in *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question:

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset)

In your final report, please keep the logs of each training procedure you used. We will only run this Jupyter notebook if we have doubts about your implementation.

__The total file sizes should not exceed 2MB. Please name your notebook (LASTNAME)\_(FIRSTNAME).ipynb, zip/tar it with any necessary files required to run your notebook, in a compressed file named (LASTNAME)\_(FIRSTNAME).X where X is the corresponding extension. Zip/tar files exceeding 2MB will not be considered for grading. Submit the compressed file via the submission link provided on the website of the class.__

You can use https://colab.research.google.com/ to run your experiments.

## Training set creation
__Question 1:__ Propose a dataloader or modify the file located at https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py in order to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set. 

In [117]:
from __future__ import print_function
from PIL import Image
import os
import os.path
import numpy as np
import sys
import torch.utils.data as data
import hashlib
import errno
from torch.utils.model_zoo import tqdm
if sys.version_info[0] == 2:
    import cPickle as pickle
else:
    import pickle

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms


def download_url(url, root, filename=None, md5=None):
    """Download a file from a url and place it in root.
    Args:
        url (str): URL to download file from
        root (str): Directory to place downloaded file in
        filename (str, optional): Name to save the file under. If None, use the basename of the URL
        md5 (str, optional): MD5 checksum of the download. If None, do not check
    """
    from six.moves import urllib

    root = os.path.expanduser(root)
    if not filename:
        filename = os.path.basename(url)
    fpath = os.path.join(root, filename)

    makedir_exist_ok(root)

    # downloads file
    if os.path.isfile(fpath):
        print('Using downloaded and verified file: ' + fpath)
    else:
        try:
            print('Downloading ' + url + ' to ' + fpath)
            urllib.request.urlretrieve(
                url, fpath,
                reporthook=gen_bar_updater()
            )
        except OSError:
            if url[:5] == 'https':
                url = url.replace('https:', 'http:')
                print('Failed download. Trying https -> http instead.'
                      ' Downloading ' + url + ' to ' + fpath)
                urllib.request.urlretrieve(
                    url, fpath,
                    reporthook=gen_bar_updater()
                )

                
def makedir_exist_ok(dirpath):
    """
    Python2 support for os.makedirs(.., exist_ok=True)
    """
    try:
        os.makedirs(dirpath)
    except OSError as e:
        if e.errno == errno.EEXIST:
            pass
        else:
            raise


def gen_bar_updater():
    pbar = tqdm(total=None)

    def bar_update(count, block_size, total_size):
        if pbar.total is None and total_size:
            pbar.total = total_size
        progress_bytes = count * block_size
        pbar.update(progress_bytes - pbar.n)

    return bar_update


class CIFAR10(data.Dataset):
    """`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
    Args:
        root (string): Root directory of dataset where directory
            ``cifar-10-batches-py`` exists or will be saved to if download is set to True.
        transform (callable, optional): A function/transform that takes in an PIL image
            and returns a transformed version. E.g, ``transforms.RandomCrop``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
        download (bool, optional): If true, downloads the dataset from the internet and
            puts it in root directory. If dataset is already downloaded, it is not
            downloaded again.
    """
    base_folder = 'cifar-10-batches-py'
    url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    filename = "cifar-10-python.tar.gz"
    tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
    train_list = [
        ['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
        ['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
        ['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
        ['data_batch_4', '634d18415352ddfa80567beed471001a'],
        ['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
    ]

    test_list = [
        ['test_batch', '40351d587109b95175f43aff81a1287e'],
    ]
    
    meta = {
        'filename': 'batches.meta',
        'key': 'label_names',
        'md5': '5ff9c542aee3614f3951f8cda6e48888',
    }

    
    def __init__(self, root, download=False, x_train=True, x=False, x_test=False, transform=None, target_transform=None):
        self.root = os.path.expanduser(root)
        self.transform = transform
        self.target_transform = target_transform
        self.x_train = x_train
        self.x = x
        self.x_test = x_test
      
               
        if download:
            self.download()


        if self.x_train or self.x:
            downloaded_list = self.train_list
        elif self.x_test:
            downloaded_list = self.test_list

        self.data = []
        self.targets = []

        for file_name, checksum in downloaded_list:
            file_path = os.path.join(self.root, self.base_folder, file_name)
            with open(file_path, 'rb') as f:
                if sys.version_info[0] == 2:
                    entry = pickle.load(f)
                else:
                    entry = pickle.load(f, encoding='latin1')
                self.data.append(entry['data'])
                if 'labels' in entry:
                    self.targets.extend(entry['labels'])
                else:
                    self.targets.extend(entry['fine_labels'])

        self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
        self.data = self.data.transpose((0, 2, 3, 1))
        
        if self.x_train:
            self.data = self.data[0:100]
            self.targets = self.targets[0:100]
        elif self.x:
            self.data = self.data[100:]
            self.targets = self.targets[100:]
        
        self._load_meta()

    
    def _load_meta(self):
        path = os.path.join(self.root, self.base_folder, self.meta['filename'])
        with open(path, 'rb') as infile:
            if sys.version_info[0] == 2:
                data = pickle.load(infile)
            else:
                data = pickle.load(infile, encoding='latin1')
            self.classes = data[self.meta['key']]
        self.class_to_idx = {_class: i for i, _class in enumerate(self.classes)}

    
    def __getitem__(self, index):
        """
        Args:
            index (int): Index
        Returns:
            tuple: (image, target) where target is index of the target class.
        """
        img, target = self.data[index], self.targets[index]

        img = Image.fromarray(img)
        
        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img, target

    def __len__(self):
        return len(self.data)

    def download(self):
        import tarfile

        download_url(self.url, self.root, self.filename, self.tgz_md5)

        with tarfile.open(os.path.join(self.root, self.filename), "r:gz") as tar:
            tar.extractall(path=self.root)


In [118]:
#TEST

# transforms.ToTensor must be instantiated: the dataset calls self.transform(img)
x_train = CIFAR10("data", download=True, x_train=True, x=False, x_test=False, transform=transforms.ToTensor(), target_transform=None)
x = CIFAR10("data", download=True, x_train=False, x=True, x_test=False, transform=None, target_transform=None)
x_test = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=None, target_transform=None)

Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz


This is our dataset $\mathcal{X}_{\text{train}}$; it will be used until the end of this project. The remaining samples correspond to $\mathcal{X}$. The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.
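
The same split can also be expressed without modifying `cifar.py`, by wrapping a dataset in `torch.utils.data.Subset`. The following is a minimal sketch under stated assumptions: `full_train` is a hypothetical stand-in built from random tensors so that the snippet runs without downloading CIFAR-10, and the names `labeled` and `unlabeled` are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical stand-in for the CIFAR-10 training set: 500 fake 3x32x32 images.
full_train = TensorDataset(torch.randn(500, 3, 32, 32),
                           torch.randint(0, 10, (500,)))

n_labeled = 100
# First 100 samples play the role of X_train, the rest the role of X.
labeled = Subset(full_train, range(n_labeled))
unlabeled = Subset(full_train, range(n_labeled, len(full_train)))

loader = DataLoader(labeled, batch_size=10, shuffle=True)
print(len(labeled), len(unlabeled), len(loader))
```

With a real `torchvision.datasets.CIFAR10` object in place of `full_train`, the two `Subset` lines would stay unchanged.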

## Testing procedure
__Question 2:__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

> We are training on only 100 samples.
> This is very few compared to the number of parameters of a deep learning model, so we should expect heavy overfitting during training. A model may be a good one and still perform poorly simply because it has not seen enough examples.
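
One possible way to get a less noisy estimate from so few labeled samples is stratified k-fold cross-validation on $\mathcal{X}_{\text{train}}$. The sketch below is an illustration, not the method used in this notebook: the helper `stratified_kfold` and the `toy_labels` list (10 samples per class) are assumptions introduced for the example, and the real first 100 CIFAR-10 labels are not perfectly balanced.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose folds are roughly class-balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i
                           for idx in fold)
        yield train_idx, val_idx

# Hypothetical labels standing in for the 100 targets of X_train.
toy_labels = [i % 10 for i in range(100)]
splits = list(stratified_kfold(toy_labels, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Averaging the validation accuracy over the folds, and repeating with several seeds, gives a rough idea of the variance of the estimate.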

# Raw approach: the baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performance with numbers reported in the literature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performance (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size and the learning rate schedule, i.e. how to decrease the learning rate as a function of the number of epochs. A possible schedule is to start the learning rate at 0.1 and divide it by 10 every 30 epochs. In case of divergence, reduce the learning rate. A potential batch size could be 10, yet this can be cross-validated.

You can get some baseline accuracies in this paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf. Obviously, the context is different: those researchers had access to GPUs.
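
The step schedule described above can be written as a small function. This is a minimal sketch; the name `step_decay_lr` and its parameter names are illustrative assumptions, not part of the assignment.

```python
def step_decay_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Learning rate at a given epoch: multiplied by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Print the rate at the epochs where it changes.
for epoch in (0, 29, 30, 59, 60):
    print(epoch, step_decay_lr(epoch))
```

In PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` applies the same rule when `scheduler.step()` is called once per epoch.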

## ResNet architectures

__Question 3:__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1512.03385 ). Please report the accuracy obtained on the whole dataset as well as the reference paper/GitHub link.

*Hint:* You can re-use the following code: https://github.com/kuangliu/pytorch-cifar. With 10 epochs of training, a batch size of 10 and a learning rate of 0.01, one obtains 40% accuracy on $\mathcal{X}_{\text{train}}$ (~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (~5 minutes).

In [88]:
"""The code below is inspired by https://github.com/kuangliu/pytorch-cifar"""
import sys
import time
import math

import torch.nn as nn
import torch.nn.init as init


def get_mean_and_std(dataset):
    '''Compute the mean and std value of dataset.'''
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2)
    mean = torch.zeros(3)
    std = torch.zeros(3)
    print('==> Computing mean and std..')
    for inputs, targets in dataloader:
        for i in range(3):
            mean[i] += inputs[:,i,:,:].mean()
            std[i] += inputs[:,i,:,:].std()
    mean.div_(len(dataset))
    std.div_(len(dataset))
    return mean, std

def init_params(net):
    '''Init layer parameters.'''
    for m in net.modules():
        if isinstance(m, nn.Conv2d):
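            # In-place initializers (trailing underscore) replace the deprecated ones.
            init.kaiming_normal_(m.weight, mode='fan_out')
            # `if m.bias:` is ambiguous for a multi-element tensor; test against None instead.
            if m.bias is not None:
                init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            init.constant_(m.weight, 1)
            init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            init.normal_(m.weight, std=1e-3)
            if m.bias is not None:
                init.constant_(m.bias, 0)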

term_width = 0

TOTAL_BAR_LENGTH = 65.
last_time = time.time()
begin_time = last_time
def progress_bar(current, total, msg=None):
    global last_time, begin_time
    if current == 0:
        begin_time = time.time()  # Reset for new bar.

    cur_len = int(TOTAL_BAR_LENGTH*current/total)
    rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1

    sys.stdout.write(' [')
    for i in range(cur_len):
        sys.stdout.write('=')
    sys.stdout.write('>')
    for i in range(rest_len):
        sys.stdout.write('.')
    sys.stdout.write(']')

    cur_time = time.time()
    step_time = cur_time - last_time
    last_time = cur_time
    tot_time = cur_time - begin_time

    L = []
    L.append('  Step: %s' % format_time(step_time))
    L.append(' | Tot: %s' % format_time(tot_time))
    if msg:
        L.append(' | ' + msg)

    msg = ''.join(L)
    sys.stdout.write(msg)
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
        sys.stdout.write(' ')

    # Go back to the center of the bar.
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)+2):
        sys.stdout.write('\b')
    sys.stdout.write(' %d/%d ' % (current+1, total))

    if current < total-1:
        sys.stdout.write('\r')
    else:
        sys.stdout.write('\n')
    sys.stdout.flush()

def format_time(seconds):
    days = int(seconds / 3600/24)
    seconds = seconds - days*3600*24
    hours = int(seconds / 3600)
    seconds = seconds - hours*3600
    minutes = int(seconds / 60)
    seconds = seconds - minutes*60
    secondsf = int(seconds)
    seconds = seconds - secondsf
    millis = int(seconds*1000)

    f = ''
    i = 1
    if days > 0:
        f += str(days) + 'D'
        i += 1
    if hours > 0 and i <= 2:
        f += str(hours) + 'h'
        i += 1
    if minutes > 0 and i <= 2:
        f += str(minutes) + 'm'
        i += 1
    if secondsf > 0 and i <= 2:
        f += str(secondsf) + 's'
        i += 1
    if millis > 0 and i <= 2:
        f += str(millis) + 'ms'
        i += 1
    if f == '':
        f = '0ms'
    return f

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

def ResNet18():
    return ResNet(BasicBlock, [2,2,2,2])



device = 'cpu'
best_acc = 0  

trainset = CIFAR10("data", download=True, x_train=True, x=False, x_test=False, transform=transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)

testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=10, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=10, shuffle=False)


classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


net = ResNet18()
net = net.to(device)

criterion = nn.CrossEntropyLoss()
lr = 0.1  # initial learning rate; must be defined before the optimizer is created
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)


def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))
        

print("\n Training")
lr = 0.1
nb_epochs = 20

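for epoch in range(nb_epochs):
    if epoch % 30 == 0 and epoch != 0:
        lr = lr/10
        # Rebinding `lr` does not reach the optimizer, so push the new value explicitly.
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
    train(epoch)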
    
print("\n Testing")
test(epoch)

==> Preparing data..
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
==> Building model..

 Training

Epoch: 0

Epoch: 1

Epoch: 2

Epoch: 3

Epoch: 4

Epoch: 5



Epoch: 6

Epoch: 7

Epoch: 8

Epoch: 9

Epoch: 10



Epoch: 11

Epoch: 12

Epoch: 13

Epoch: 14

Epoch: 15



Epoch: 16

Epoch: 17

Epoch: 18

Epoch: 19

 Testing


 Step: 545ms | Tot: 41ms | Loss: 3.023 | Acc: 0.000% (0/10) 1/1000
 Step: 550ms | Tot: 592ms | Loss: 3.216 | Acc: 15.000% (3/20) 2/1000
 Step: 573ms | Tot: 1s165ms | Loss: 2.920 | Acc: 20.000% (6/30) 3/1000
 Step: 590ms | Tot: 1s756ms | Loss: 2.690 | Acc: 20.000% (8/40) 4/1000
 Step: 564ms | Tot: 2s321ms | Loss: 2.820 | Acc: 20.000% (10/50) 5/1000
 Step: 585ms | Tot: 2s906ms | Loss: 2.912 | Acc: 16.667% (10/60) 6/1000
 Step: 536ms | Tot: 3s443ms | Loss: 3.011 | Acc: 17.143% (12/70) 7/1000
 ...

Saving..


> The overfitting is severe, as expected with so few training samples.
> The accuracy on Full Data was taken from the following GitHub repository:  https://github.com/kuangliu/pytorch-cifar.

> | Model | Number of epochs | Train accuracy | Test accuracy | Accuracy on Full Data |
> |------|------|------|------|------|
> | ResNet18 | 20 | 50.00% | 23.25% | 93.02% |


## VGG-like architectures

__Question 4:__ Same question as before, but with a *VGG*. Which model do you recommend?

In [121]:
"""The code below is inspired by:  https://github.com/kuangliu/pytorch-cifar"""


cfg = {
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name])
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

      
device = 'cpu'
best_acc = 0


trainset = CIFAR10("data", download=True, x_train=True, x=False, x_test=False, transform=transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)


testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)


trainloader = torch.utils.data.DataLoader(trainset, batch_size=10, shuffle=True)

testloader = torch.utils.data.DataLoader(testset, batch_size=10, shuffle=False)



classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')



net = VGG('VGG16')
net = net.to(device)

criterion = nn.CrossEntropyLoss()
lr = 0.15  # initial learning rate; must be defined before the optimizer is created
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)


def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))


print("\n Training")
lr = 0.15
nb_epochs = 40

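for epoch in range(nb_epochs):
    if epoch % 30 == 0 and epoch != 0:
        lr = lr/10
        # Rebinding `lr` does not reach the optimizer, so push the new value explicitly.
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
    train(epoch)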
    
print("\n Testing")
test(epoch)

==> Preparing data..
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
==> Building model..

 Training

Epoch: 0

KeyboardInterrupt: 

> The accuracy on Full Data was taken from the following GitHub repository:  https://github.com/kuangliu/pytorch-cifar.

> | Model | Number of epochs | Train accuracy | Test accuracy | Accuracy on Full Data |
> |------|------|------|------|------|
> | VGG16 | 40 | 17.00% | 10.151% | 92.64% |

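> VGG16 has no residual connections, which makes it harder and slower to optimize from scratch, and it needs a bigger dataset to converge: here its test accuracy stays at chance level. In a context similar to ours (small dataset and limited computation power) I would recommend the ResNet architecture.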
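
To put a number on the size of the VGG16 defined above, its parameters can be counted directly from the `cfg` list. This is a worked-arithmetic sketch: the helper `count_vgg_params` is an assumption introduced here, and it mirrors `_make_layers` (3x3 convolutions with bias, each followed by a `BatchNorm2d`, then one linear classifier).

```python
# Same configuration as the 'VGG16' entry of `cfg` above.
vgg16_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def count_vgg_params(layer_cfg, in_channels=3, num_classes=10, kernel_size=3):
    """Count the learnable parameters of the VGG built by `_make_layers`."""
    total = 0
    for x in layer_cfg:
        if x == 'M':
            continue  # max pooling has no parameters
        total += in_channels * x * kernel_size * kernel_size  # conv weights
        total += x                                            # conv bias
        total += 2 * x                                        # batch norm weight and bias
        in_channels = x
    total += in_channels * num_classes + num_classes          # final linear layer
    return total

total = count_vgg_params(vgg16_cfg)
print(total)
```

Roughly 14.7 million parameters are being fitted to 100 images, which is consistent with the difficulty observed in the table above.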

# Transfer learning

We propose to use models pre-trained on a classification task and on a generative task, in order to improve the results in our setting.

## ImageNet features

Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on: https://pytorch.org/docs/stable/torchvision/models.html.

__Question 5:__ Pick a model from the list above, adapt it for CIFAR and retrain its final layer (or a block of layers, depending on the resources you have access to). Report its accuracy.
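
Retraining only the final layer amounts to freezing every other parameter and handing the optimizer only the trainable ones. The sketch below illustrates this by parameter name rather than by position; `TinyNet` and `freeze_all_but` are hypothetical names used so that the snippet runs without downloading pretrained weights, and `TinyNet` only imitates the `fc` attribute of a torchvision ResNet.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained backbone with an `fc` head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def freeze_all_but(model, trainable_prefixes):
    """Freeze every parameter whose name does not start with one of the prefixes."""
    prefixes = tuple(trainable_prefixes)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
    return [p for p in model.parameters() if p.requires_grad]

net = TinyNet()
net.fc = nn.Linear(net.fc.in_features, 10)   # new 10-class head for CIFAR-10
trainable = freeze_all_but(net, ["fc."])
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)

trainable_names = sorted(n for n, p in net.named_parameters() if p.requires_grad)
print(trainable_names)
```

Unfreezing a whole block would only require adding its name prefix to the list passed to `freeze_all_but`.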

>Let's work with a ResNet18

In [100]:
device = 'cpu'


trainset = CIFAR10("data", download=True, x_train=True, x=False, x_test=False, transform=transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)


testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)


trainloader = torch.utils.data.DataLoader(trainset, batch_size=10, shuffle=True)


testloader = torch.utils.data.DataLoader(testset, batch_size=10, shuffle=False)


classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


net =  torchvision.models.resnet18(pretrained=True)
net.avgpool = torch.nn.AvgPool2d(kernel_size=1, stride=1, padding=0)

i=0
for param in net.parameters():
    i += 1
    if i<60:
        param.requires_grad = False
    
num_ftrs = net.fc.in_features
net.fc = nn.Linear(num_ftrs, 10)

net = net.to(device)

criterion = nn.CrossEntropyLoss()
lr = 0.01  # learning rate; must be defined before the optimizer is created
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)

# Training
def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))


print("\n Training")
lr = 0.01
nb_epochs = 20

for epoch in range(nb_epochs):
    train(epoch)
    
print("\n Testing")
test(epoch)

==> Preparing data..
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
==> Building model..

 Training

Epoch: 0

Epoch: 1

Epoch: 2

Epoch: 3

Epoch: 4

Epoch: 5



Epoch: 6

Epoch: 7

Epoch: 8

Epoch: 9

Epoch: 10



Epoch: 11

Epoch: 12

Epoch: 13

Epoch: 14

Epoch: 15



Epoch: 16

Epoch: 17

Epoch: 18

Epoch: 19

 Testing


 Step: 223ms | Tot: 51ms | Loss: 16.531 | Acc: 10.000% (1/10) 1/1000
 Step: 244ms | Tot: 296ms | Loss: 12.587 | Acc: 15.000% (3/20) 2/1000
 Step: 236ms | Tot: 533ms | Loss: 11.000 | Acc: 20.000% (6/30) 3/1000
 Step: 196ms | Tot: 730ms | Loss: 9.400 | Acc: 22.500% (9/40) 4/1000
 Step: 224ms | Tot: 955ms | Loss: 8.462 | Acc: 20.000% (10/50) 5/1000
 Step: 267ms | Tot: 1s222ms | Loss: 8.693 | Acc: 21.667% (13/60) 6/1000
 Step: 184ms | Tot: 1s407ms | Loss: 9.410 | Acc: 18.571% (13/70) 7/1000
 ...

Saving..


> | Model | Number of epochs | Train accuracy | Test accuracy |
> |------|------|------|------|
> | ResNet18 | 20 | 45.00% | 21.340% |

## DCGAN features

GANs are an unsupervised technique for generating images. Sec. 5.1 of https://arxiv.org/pdf/1511.06434.pdf shows that the representation learned by the Discriminator has some nice generalization properties on CIFAR10.
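
The feature extraction described in that section (max-pool the activations of each discriminator layer to a 4x4 grid, flatten, concatenate) can be sketched as follows. This is an illustration under stated assumptions: `TinyDiscriminator` is a hypothetical, randomly initialized stand-in with far fewer channels than a real DCGAN discriminator, and `extract_features` is a name introduced for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDiscriminator(nn.Module):
    """Hypothetical stand-in for the convolutional trunk of a DCGAN discriminator."""
    def __init__(self, nc=3, ndf=8):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), nn.LeakyReLU(0.2)),
        ])

def extract_features(disc, images, grid=4):
    """Max-pool each block's activations to a grid x grid map, flatten and concatenate."""
    feats = []
    x = images
    with torch.no_grad():
        for block in disc.blocks:
            x = block(x)
            pooled = F.adaptive_max_pool2d(x, grid)
            feats.append(pooled.flatten(start_dim=1))
    return torch.cat(feats, dim=1)

disc = TinyDiscriminator().eval()
images = torch.randn(5, 3, 32, 32)   # fake batch of CIFAR-sized images
features = extract_features(disc, images)
print(features.shape)
```

A linear classifier trained on such fixed features has few parameters to fit, which suits a training set of 100 samples.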

__Question 6:__  Using for instance a pretrained model from https://github.com/soumith/dcgan.torch combined with https://github.com/pytorch/examples/tree/master/dcgan, propose a model to train on $\mathcal{X}_{\text{train}}$. Train it and report its accuracy.

*Hint:* You can use the library: https://github.com/bshillingford/python-torchfile to load the weights of a model from torch(Lua) to pytorch(python).
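Before reusing a DCGAN discriminator on CIFAR-10, it helps to check how the spatial resolution shrinks through a stack of strided convolutions. The sketch below applies the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$ to an assumed stack of four 4×4, stride-2, padding-1 convolutions followed by a 2×2, stride-2 convolution. The helper name `conv_out` and the layer list are illustrative assumptions, not part of the referenced repositories.

```python
def conv_out(n, kernel, stride, padding):
    """Spatial size after a convolution on an n x n input (floor formula)."""
    return (n + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) per layer: assumed layout for 32x32 inputs
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (2, 2, 0)]

sizes = [32]
for kernel, stride, padding in layers:
    sizes.append(conv_out(sizes[-1], kernel, stride, padding))

print(sizes)  # [32, 16, 8, 4, 2, 1]
```

With 64×64 inputs the same stack would end at 2×2 rather than 1×1, which is why layer sizes written for 64×64 images need adjusting for CIFAR-10.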

In [123]:
# python dcgan.py --dataset cifar10 --dataroot /scratch/users/vision/yu_dl/raaz.rsk/data/cifar10 --imageSize 32 --cuda --outf out_cifar --manualSeed 13 --niter 100

import torchvision.utils as vutils

class Generator(nn.Module):
    def __init__(self, ngpu, nc=3, nz=100, ngf=64):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(    ngf,      nc, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input):
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output


class Discriminator(nn.Module):
    def __init__(self, ngpu, nc=3, ndf=64):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 32 x 32 (CIFAR-10 images)
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 16 x 16
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 8 x 8
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 4 x 4
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 2 x 2, reduced to 1 x 1 by the last convolution
            nn.Conv2d(ndf * 8, 1, 2, 2, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)

        return output.view(-1, 1).squeeze(1)
    
trainset = CIFAR10("data", download=True, x_train=True, x=False, x_test=False, transform=transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)

testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=10, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=10, shuffle=False)


classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

outf = 'out'
niter = 25
try:
    os.makedirs(outf)
except OSError:
    pass

cudnn.benchmark = True



dataset = trainset
nc=3

dataloader = trainloader

device = torch.device("cpu")
ngpu = 1
nz = 100
ngf = 64
ndf = 64


# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

netG = Generator(ngpu).to(device)
netG.apply(weights_init)
netG.load_state_dict(torch.load('weights/netG_epoch_199.pth'))
print(netG)



netD = Discriminator(ngpu).to(device)
netD.apply(weights_init)
netD.load_state_dict(torch.load('weights/netD_epoch_199.pth'))
print(netD)

criterion = nn.BCELoss()

batchSize = 64
beta1 = 0.5
fixed_noise = torch.randn(batchSize, nz, 1, 1, device=device)
real_label = 1
fake_label = 0
lr = 0.0002

# setup optimizer
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

for epoch in range(niter):
    for i, data in enumerate(dataloader, 0):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        # train with real
        netD.zero_grad()
        real_cpu = data[0].to(device)
        batch_size = real_cpu.size(0)
        # BCELoss expects float targets, so the dtype is set explicitly
        label = torch.full((batch_size,), real_label, dtype=torch.float, device=device)

        output = netD(real_cpu)
        errD_real = criterion(output, label)
        errD_real.backward()
        D_x = output.mean().item()

        # train with fake
        noise = torch.randn(batch_size, nz, 1, 1, device=device)
        fake = netG(noise)
        label.fill_(fake_label)
        output = netD(fake.detach())
        errD_fake = criterion(output, label)
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        errD = errD_real + errD_fake
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        output = netD(fake)
        errG = criterion(output, label)
        errG.backward()
        D_G_z2 = output.mean().item()
        optimizerG.step()

        print('[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f D(x): %.4f D(G(z)): %.4f / %.4f'
              % (epoch, niter, i, len(dataloader),
                 errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        if i % 100 == 0:
            vutils.save_image(real_cpu,
                    '%s/real_samples.png' % outf,
                    normalize=True)
            fake = netG(fixed_noise)
            vutils.save_image(fake.detach(),
                    '%s/fake_samples_epoch_%03d.png' % (outf, epoch),
                    normalize=True)

    # do checkpointing: save both networks at the end of every epoch
    torch.save(netG.state_dict(), '%s/netG_epoch_%d.pth' % (outf, epoch))
    torch.save(netD.state_dict(), '%s/netD_epoch_%d.pth' % (outf, epoch))

Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace)
    (12): C

[7/25][5/10] Loss_D: 0.1191 Loss_G: 12.8751 D(x): 0.9290 D(G(z)): 0.0001 / 0.0001
[7/25][6/10] Loss_D: 0.0067 Loss_G: 8.8384 D(x): 1.0000 D(G(z)): 0.0066 / 0.0040
[7/25][7/10] Loss_D: 0.0597 Loss_G: 9.4898 D(x): 1.0000 D(G(z)): 0.0452 / 0.0002
[7/25][8/10] Loss_D: 0.0055 Loss_G: 10.1681 D(x): 1.0000 D(G(z)): 0.0054 / 0.0037
[7/25][9/10] Loss_D: 0.0006 Loss_G: 11.2951 D(x): 0.9998 D(G(z)): 0.0004 / 0.0004
[8/25][0/10] Loss_D: 0.0601 Loss_G: 10.3865 D(x): 1.0000 D(G(z)): 0.0461 / 0.0013
[8/25][1/10] Loss_D: 0.0032 Loss_G: 13.0530 D(x): 0.9969 D(G(z)): 0.0001 / 0.0001
[8/25][2/10] Loss_D: 0.0002 Loss_G: 11.4949 D(x): 1.0000 D(G(z)): 0.0002 / 0.0002
[8/25][3/10] Loss_D: 0.0091 Loss_G: 8.0053 D(x): 1.0000 D(G(z)): 0.0089 / 0.0034
[8/25][4/10] Loss_D: 0.0014 Loss_G: 9.9052 D(x): 1.0000 D(G(z)): 0.0014 / 0.0012
[8/25][5/10] Loss_D: 0.0178 Loss_G: 8.6529 D(x): 0.9997 D(G(z)): 0.0169 / 0.0029
[8/25][6/10] Loss_D: 0.0046 Loss_G: 8.7705 D(x): 0.9987 D(G(z)): 0.0032 / 0.0016
[8/25][7/10] Loss_D: 0

[17/25][4/10] Loss_D: 0.0271 Loss_G: 15.4016 D(x): 0.9763 D(G(z)): 0.0000 / 0.0000
[17/25][5/10] Loss_D: 0.6977 Loss_G: 8.8172 D(x): 0.7749 D(G(z)): 0.0001 / 0.0020
[17/25][6/10] Loss_D: 0.0011 Loss_G: 10.5076 D(x): 1.0000 D(G(z)): 0.0011 / 0.0042
[17/25][7/10] Loss_D: 0.0017 Loss_G: 8.1099 D(x): 1.0000 D(G(z)): 0.0017 / 0.0028
[17/25][8/10] Loss_D: 0.0046 Loss_G: 6.7566 D(x): 0.9997 D(G(z)): 0.0042 / 0.0045
[17/25][9/10] Loss_D: 0.0722 Loss_G: 9.8707 D(x): 1.0000 D(G(z)): 0.0604 / 0.0004
[18/25][0/10] Loss_D: 0.0055 Loss_G: 10.1321 D(x): 0.9999 D(G(z)): 0.0053 / 0.0020
[18/25][1/10] Loss_D: 0.3412 Loss_G: 9.6047 D(x): 0.8643 D(G(z)): 0.0874 / 0.0009
[18/25][2/10] Loss_D: 0.7055 Loss_G: 19.2656 D(x): 1.0000 D(G(z)): 0.2984 / 0.0000
[18/25][3/10] Loss_D: 0.0045 Loss_G: 14.4139 D(x): 0.9972 D(G(z)): 0.0016 / 0.0003
[18/25][4/10] Loss_D: 0.0000 Loss_G: 15.6005 D(x): 1.0000 D(G(z)): 0.0000 / 0.0000
[18/25][5/10] Loss_D: 0.0003 Loss_G: 16.0237 D(x): 0.9997 D(G(z)): 0.0000 / 0.0000
[18/25][6
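To read these logs, recall that `nn.BCELoss` gives $-\log p$ for target 1 and $-\log(1-p)$ for target 0. A minimal sketch with hypothetical single-sample probabilities (chosen for illustration, not taken from the run; `bce` is a hypothetical helper) shows why a confident discriminator yields a tiny `Loss_D` and a large `Loss_G`:

```python
import math

def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    return -math.log(p) if target == 1 else -math.log(1.0 - p)

# Hypothetical discriminator outputs for one real and one generated image
d_real, d_fake = 0.99, 0.001

loss_d = bce(d_real, 1) + bce(d_fake, 0)  # discriminator: real -> 1, fake -> 0
loss_g = bce(d_fake, 1)                   # generator wants fakes scored as real

print(f"Loss_D = {loss_d:.4f}, Loss_G = {loss_g:.4f}")
```

The logged values are batch averages, so they will not match these single-sample numbers, but the pattern is the same: `D(G(z))` close to 0 means the discriminator separates real from generated images almost perfectly, which is plausible when it only sees 100 real images.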

In [139]:
best_acc = 0

def test(epoch):
    global best_acc
    netD.eval()
    test_loss = 0
    correct = 0
    total = 0
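    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            # netD returns one real/fake probability per image, not 10 class scores
            outputs = netD(inputs)
            
            print(outputs)
            # BCE between real/fake probabilities and class indices (0..9)
            loss = criterion(outputs, targets.float())

            test_loss += loss.item()
            # max over dim 0 returns the index of the most "real" image in the batch
            _, predicted = outputs.max(0)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()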

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))
            
            
test(20)


[per-batch discriminator outputs omitted; legible progress-bar readings only]
Loss: 7.076 | Acc: 30.000% (3/10) 1/1000
Loss: 5.677 | Acc: 25.000% (5/20) 2/1000
Loss: 9.983 | Acc: 12.093% (52/430) 43/1000
Loss: 9.872 | Acc: 11.786% (132/1120) 112/1000
Loss: 10.062 | Acc: 11.405% (211/1850) 185/1000
Loss: 10.399 | Acc: 10.814% (279/2580) 258/1000
Loss: 10.597 | Acc: 10.765% (352/3270) 327/1000
Loss: 10.462 | Acc: 10.553% (420/3980) 398/1000
Loss: 10.389 | Acc: 10.297% (486/4720) 472/1000
Loss: 10.662 | Acc: 10.182% (558/5480) 548/1000
Loss: 10.256 | Acc: 9.968% (625/6270) 627/1000
Loss: 10.123 | Acc: 9.858% (693/7030) 703/1000
Loss: 9.959 | Acc: 9.717% (756/7780) 778/1000
Loss: 10.025 | Acc: 9.779% (841/8600) 860/1000
Loss: 10.001 | Acc: 9.893% (925/9350) 935/1000
Loss: 10.031 | Acc: 9.936% (932/9380) 938/1000
[...]

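> About 10% accuracy on the test set, which is chance level for 10 classes: nothing useful is learned. The Discriminator outputs a single real/fake probability per image, so comparing it directly to the class labels cannot do better than chance; a classifier would have to be trained on top of its intermediate features.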
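Sec. 5.1 of the DCGAN paper classifies images from the discriminator's convolutional features rather than from its real/fake output. Below is a minimal sketch of that idea under stated assumptions: `trunk` and `features` are hypothetical names, the small convolutional stack stands in for a pretrained discriminator trunk (it is randomly initialized here), the batch is random dummy data, and pooling every layer to a 4×4 grid follows the paper's description only loosely.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a pretrained discriminator trunk (assumption: random weights here)
trunk = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
)
trunk.eval()
for p in trunk.parameters():
    p.requires_grad_(False)  # keep the features frozen

def features(x):
    """Concatenate max-pooled activations taken after each LeakyReLU."""
    feats = []
    for layer in trunk:
        x = layer(x)
        if isinstance(layer, nn.LeakyReLU):
            feats.append(F.adaptive_max_pool2d(x, 4).flatten(1))
    return torch.cat(feats, dim=1)

# Linear classifier trained on top of the frozen features (10 CIFAR-10 classes)
x = torch.randn(8, 3, 32, 32)   # dummy batch in place of real images
y = torch.randint(0, 10, (8,))  # dummy labels
f = features(x)
clf = nn.Linear(f.shape[1], 10)
opt = torch.optim.SGD(clf.parameters(), lr=0.1)

for _ in range(5):
    opt.zero_grad()
    loss = F.cross_entropy(clf(f), y)
    loss.backward()
    opt.step()

pred = clf(f).argmax(dim=1)     # argmax over the class dimension
print(f.shape, pred.shape)
```

With a real pretrained `netD`, the same loop could run over the convolutional layers of `netD.main` instead of `trunk`. Predictions are taken with an argmax over the 10 class scores, not over the batch dimension.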

# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks. For now, we only consider linear transformations $\mathcal{T}$ of the inputs $x:\mathbb{S}^2\rightarrow\mathbb{R}$ where $\mathbb{S}$ is the support of an image, meaning that:

$$\forall u\in\mathbb{S}^2,\mathcal{T}(\lambda x+\mu y)(u)=\lambda \mathcal{T}(x)(u)+\mu \mathcal{T}(y)(u)\,.$$

For instance if an image had an infinite support, a translation $\mathcal{T}_a$ by $a$ would lead to:

$$\forall u, \mathcal{T}_a(x)(u)=x(u-a)\,.$$

Otherwise, one has to handle several boundary effects.

__Question 7:__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

> Issue with translations: because the image has a finite support, a translation pushes some pixels out of the frame and leaves new, undefined pixels on the opposite side. We use padding to deal with this issue, setting the new pixels to zero, but it is far from being a perfect solution.

> Issue with rotations: unless the image is square and the angle is 90°, 180° or 270°, a rotation loses some pixels (the corners leave the frame) and requires new ones to be filled in. Again, we can tackle this with padding.

> Issue with scaling effects: rescaling changes the proportions of the image and can therefore distort its content. Using interpolation, we can change the scale of the image without altering the proportions too much.

> Issue with color changes: too strong a change can make the image unrecognizable. Adjusting a parameter such as brightness by a small amount changes the colors without affecting the image content too much.
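The boundary effect for translations can be made concrete on a tiny image. The sketch below implements $\mathcal{T}_a(x)(u)=x(u-a)$ on a finite grid in two ways, zero padding and circular wrap-around; `translate` and its `mode` argument are hypothetical names chosen for illustration.

```python
def translate(img, dy, dx, mode="zero"):
    """Shift a 2D list-of-lists image by (dy, dx): out[i][j] = img[i-dy][j-dx]."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = i - dy, j - dx
            if mode == "wrap":
                out[i][j] = img[si % h][sj % w]
            elif 0 <= si < h and 0 <= sj < w:
                out[i][j] = img[si][sj]
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

shifted = translate(img, 0, 1)        # zero padding: the last column is lost
restored = translate(shifted, 0, -1)  # shifting back does not recover it
wrapped = translate(translate(img, 0, 1, "wrap"), 0, -1, "wrap")

print(shifted)         # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
print(restored)        # [[1, 2, 0], [4, 5, 0], [7, 8, 0]]
print(wrapped == img)  # True
```

Zero padding loses information at the border, whereas wrap-around is invertible but creates content that never occurs in natural images.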

## Data augmentations

__Question 8:__ Propose a set of geometric transformations beyond translation, and incorporate them in your training pipeline. Train the model of the __Question 3__ and __Question 4__ with them and report the accuracies.
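Horizontal flips and rotations by multiples of 90° are geometric transformations that only permute pixels, so on a square image they need neither interpolation nor padding. A minimal sketch on a list-of-lists image (`hflip` and `rot90` are hypothetical helper names, independent of any library):

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]

print(hflip(img))  # [[2, 1], [4, 3]]
print(rot90(img))  # [[3, 1], [4, 2]]

# Four quarter turns, or two flips, give back the original image exactly
assert rot90(rot90(rot90(rot90(img)))) == img
assert hflip(hflip(img)) == img
```

Whether such transformations preserve the label depends on the class: a flipped truck is still a truck, but an upside-down one is rarely seen in the test set.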

In [119]:
device = 'cpu'
best_acc = 0


# Augmentation pipeline shared by all copies of the training set
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomGrayscale(p=0.1),
    transforms.RandomAffine(45, (0.15, 0.15)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=2, contrast=2, saturation=2, hue=0.25),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),])

# Materialize 5 independently augmented copies of the training images:
# the transforms are random, so each pass over the dataset gives different samples
trainset = []
for _ in range(5):
    augmented = CIFAR10("data", download=True, x_train=True, x=False, x_test=False,
                        transform=augment, target_transform=None)
    for element in augmented:
        trainset.append(element)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=50, shuffle=True)


testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)
testloader = torch.utils.data.DataLoader(testset, batch_size=10, shuffle=False)


classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

net = ResNet18()
net = net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)

def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

    acc = 100.*correct/total
        

print("\n Training")
# Note: the optimizer was created above, so assigning lr here does not change its learning rate
lr = 0.11
nb_epochs = 30

for epoch in range(nb_epochs):
    train(epoch)
    
print("\n Testing")
test(epoch)

==> Preparing data..
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
==> Building model..

 Training

Epoch: 0

Epoch: 1

Epoch: 2

Epoch: 3

Epoch: 4

Epoch: 5



Epoch: 6

Epoch: 7

Epoch: 8

Epoch: 9

Epoch: 10



Epoch: 11

Epoch: 12

Epoch: 13

Epoch: 14

Epoch: 15



Epoch: 16

Epoch: 17

Epoch: 18

Epoch: 19

 Testing


 [>................................................................]  Step: 873ms | Tot: 41ms | Loss: 2.253 | Acc: 30.000% (3/10) 1/1000
 [>................................................................]  Step: 608ms | Tot: 650ms | Loss: 2.465 | Acc: 15.000% (3/20) 2/1000
 [>................................................................]  Step: 653ms | Tot: 1s304ms | Loss: 2.430 | Acc: 13.333% (4/30) 3/1000
 [>................................................................]  Step: 620ms | Tot: 1s924ms | Loss: 2.431 | Acc: 12.500% (5/40) 4/1000
 [>................................................................]  Step: 649ms | Tot: 2s573ms | Loss: 2.435 | Acc: 10.000% (5/50) 5/1000
 [>................................................................]  Step: 658ms | Tot: 3s231ms | Loss: 2.432 | Acc: 11.667% (7/60) 6/1000
 [>................................................................]  Step: 620ms | Tot: 3s852ms | Loss: 2.379 | Acc: 14.286% (10/70) 7/1000
 [>...................................

In [112]:
device = 'cpu'

best_acc = 0  


# Shared augmentation pipeline: every read of a sample draws fresh random transforms.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomGrayscale(p=0.1),
    transforms.RandomAffine(45, (0.15, 0.15)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=2, contrast=2, saturation=2, hue=0.25),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

augmented_set = CIFAR10("data", download=True, x_train=True, x=False, x_test=False,
                        transform=augment, target_transform=None)

# Materialize 5 independently augmented copies of the training subset.
nb_copies = 5
trainset = []
for _ in range(nb_copies):
    for element in augmented_set:
        trainset.append(element)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=50, shuffle=True)


testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)
testloader = torch.utils.data.DataLoader(testset, batch_size=10, shuffle=False)


classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

net = VGG('VGG16')
net = net.to(device)

criterion = nn.CrossEntropyLoss()
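# Define the learning rate before building the optimizer: SGD reads `lr` once, at construction time.
lr = 0.12
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)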

def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

    acc = 100.*correct/total

print("\n Training")
lr = 0.12
nb_epochs = 40

for epoch in range(nb_epochs):
    train(epoch)
    
print("\n Testing")
test(epoch)

==> Preparing data..
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
==> Building model..

 Training

Epoch: 0

Epoch: 1

Epoch: 2

Epoch: 3

Epoch: 4

Epoch: 5



Epoch: 6

Epoch: 7

Epoch: 8

Epoch: 9

Epoch: 10



Epoch: 11

Epoch: 12

Epoch: 13

Epoch: 14

Epoch: 15



Epoch: 16

Epoch: 17

Epoch: 18

Epoch: 19

Epoch: 20



Epoch: 21

Epoch: 22

Epoch: 23

Epoch: 24

Epoch: 25



Epoch: 26

Epoch: 27

Epoch: 28

Epoch: 29

Epoch: 30



Epoch: 31

Epoch: 32

Epoch: 33

Epoch: 34

Epoch: 35



Epoch: 36

Epoch: 37

Epoch: 38

Epoch: 39

 Testing


 [>................................................................]  Step: 414ms | Tot: 37ms | Loss: 2.457 | Acc: 20.000% (2/10) 1/1000
 [>................................................................]  Step: 363ms | Tot: 401ms | Loss: 2.458 | Acc: 15.000% (3/20) 2/1000
 [>................................................................]  Step: 346ms | Tot: 747ms | Loss: 2.358 | Acc: 13.333% (4/30) 3/1000
 [>................................................................]  Step: 315ms | Tot: 1s63ms | Loss: 2.354 | Acc: 12.500% (5/40) 4/1000
 [>................................................................]  Step: 336ms | Tot: 1s399ms | Loss: 2.341 | Acc: 12.000% (6/50) 5/1000
 [>................................................................]  Step: 346ms | Tot: 1s746ms | Loss: 2.384 | Acc: 10.000% (6/60) 6/1000
 [>................................................................]  Step: 299ms | Tot: 2s46ms | Loss: 2.370 | Acc: 10.000% (7/70) 7/1000
 [>........................................

Saving..


>| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
| ResNet18 + DataAugmentation | 40 | 25.80 % | 21.30% |
| VGG16 + DataAugmentation | 40 | 17.40 % | 11.77% |
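
To make the augmentation step more concrete, here is a minimal NumPy sketch of two of the transforms used above: a zero-padded random crop (the idea behind `RandomCrop(32, padding=4)`) and a random horizontal flip. The function name `random_crop_flip` and the random placeholder image are assumptions made for illustration; this is not torchvision's implementation.

```python
import numpy as np

def random_crop_flip(image, rng, padding=4, flip_prob=0.5):
    """Zero-pad an HxWxC image, crop a random HxW window, then mirror it with probability flip_prob."""
    height, width, _ = image.shape
    padded = np.pad(image, ((padding, padding), (padding, padding), (0, 0)), mode="constant")
    # The crop origin is drawn uniformly among the 2 * padding + 1 possible offsets per axis.
    top = rng.integers(0, 2 * padding + 1)
    left = rng.integers(0, 2 * padding + 1)
    crop = padded[top:top + height, left:left + width]
    if rng.random() < flip_prob:
        crop = crop[:, ::-1]  # reverse the width axis
    return crop

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # placeholder standing in for one CIFAR-10 image
copies = [random_crop_flip(image, rng) for _ in range(5)]
print([c.shape for c in copies])
```

Each call returns a differently shifted (and possibly mirrored) version of the same image, which is why five passes over the 100 training images give 500 distinct training samples.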

## Wavelets

__Question 9:__ Use a Scattering Transform as an input to a ResNet-like architecture. You can find a baseline here: https://arxiv.org/pdf/1703.08961.pdf.

*Hint:* You can use the following package: https://www.kymat.io/

In [110]:
from kymatio import Scattering2D


scattering = Scattering2D(J=2, shape=(32, 32))


best_acc = 0 
device = 'cpu'

trainset = CIFAR10("data", download=True, x_train=True, x=False, x_test=False, transform=transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)

testset = CIFAR10("data", download=True, x_train=False, x=False, x_test=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),]), target_transform=None)

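# Precompute the scattering coefficients once: each (3, 32, 32) image becomes a (3, 81, 8, 8) tensor,
# flattened here to (3, 81, 64) so that ResNet18 still receives a 3-channel input.
new_train = []
for i in range(len(trainset)):
    image, label = trainset[i]
    new_train.append((scattering(image).view(-1, 81, 64), label))

new_test = []
for i in range(len(testset)):
    image, label = testset[i]
    new_test.append((scattering(image).view(-1, 81, 64), label))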

    
trainloader = torch.utils.data.DataLoader(new_train, batch_size=10, shuffle=True)
testloader = torch.utils.data.DataLoader(new_test, batch_size=10, shuffle=False)


classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

net = ResNet18()
net.linear = torch.nn.Linear(in_features=2048, out_features=10, bias=True)
net = net.to(device)

criterion = nn.CrossEntropyLoss()
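# Define the learning rate before building the optimizer: SGD reads `lr` once, at construction time.
lr = 0.11
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.8, weight_decay=5e-4)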

# Training
def train(epoch):
    print('\nEpoch: %d' % epoch)
    net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            outputs = net(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

            progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

    # Final test accuracy in percent (no checkpoint is saved in this version).
    acc = 100.*correct/total

print("\n Training")
lr = 0.11
nb_epochs = 40

for epoch in range(nb_epochs):
    train(epoch)
    
print("\n Testing")
test(epoch)

==> Preparing data..
Using downloaded and verified file: data\cifar-10-python.tar.gz
Using downloaded and verified file: data\cifar-10-python.tar.gz
==> Building model..

 Training

Epoch: 0

Epoch: 1

Epoch: 2

Epoch: 3

Epoch: 4

Epoch: 5



Epoch: 6

Epoch: 7

Epoch: 8

Epoch: 9

Epoch: 10



Epoch: 11

Epoch: 12

Epoch: 13

Epoch: 14

Epoch: 15



Epoch: 16

Epoch: 17

Epoch: 18

Epoch: 19

Epoch: 20



Epoch: 21

Epoch: 22

Epoch: 23

Epoch: 24

Epoch: 25



Epoch: 26

Epoch: 27

Epoch: 28

Epoch: 29

Epoch: 30



Epoch: 31

Epoch: 32

Epoch: 33

Epoch: 34

Epoch: 35



Epoch: 36

Epoch: 37

Epoch: 38

Epoch: 39

 Testing


 [>................................................................]  Step: 1s956ms | Tot: 34ms | Loss: 4.375 | Acc: 0.000% (0/10) 1/1000
 [>................................................................]  Step: 1s992ms | Tot: 2s27ms | Loss: 3.722 | Acc: 15.000% (3/20) 2/1000
 [>................................................................]  Step: 1s973ms | Tot: 4s1ms | Loss: 3.441 | Acc: 20.000% (6/30) 3/1000
 [>................................................................]  Step: 2s26ms | Tot: 6s28ms | Loss: 3.231 | Acc: 22.500% (9/40) 4/1000
 [>................................................................]  Step: 1s968ms | Tot: 7s996ms | Loss: 3.305 | Acc: 22.000% (11/50) 5/1000
 [>................................................................]  Step: 1s892ms | Tot: 9s889ms | Loss: 3.259 | Acc: 20.000% (12/60) 6/1000
 [>................................................................]  Step: 2s10ms | Tot: 11s900ms | Loss: 3.311 | Acc: 18.571% (13/70) 7/1000
 [>.......................

>| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
| Scattering Transform + ResNet18 | 40 | 60.00 % | 18.740% |
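
The reshape to `(-1, 81, 64)` in the cell above follows from the size of the scattering output. As a small sketch, assuming a second-order scattering transform with `L = 8` orientations (the helper name `scattering_output_shape` is hypothetical), the number of channels is $1 + LJ + L^2 J(J-1)/2$ and each spatial dimension is divided by $2^J$:

```python
def scattering_output_shape(J, L, height, width):
    """Channel count and spatial size of a 2D scattering transform truncated at order 2."""
    order0 = 1                          # low-pass average
    order1 = L * J                      # one wavelet modulus per (scale, orientation)
    order2 = L * L * J * (J - 1) // 2   # pairs of wavelets with increasing scales
    channels = order0 + order1 + order2
    return channels, height // 2 ** J, width // 2 ** J

channels, out_h, out_w = scattering_output_shape(J=2, L=8, height=32, width=32)
print(channels, out_h, out_w)  # 81 8 8
```

With `J=2` on 32x32 images this gives 81 channels of size 8x8 per colour channel, hence the 81 x 64 layout fed to the network.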

# Weak supervision

Weakly supervised techniques help address the scarcity of labeled data. An introduction to those techniques can be found here: https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html.

__(Open) Question 10:__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.

# Conclusions

__Question 11:__ Write a short report explaining the pros and cons of each method that you implemented. 25% of the grade of this project will correspond to this question, so it should be done carefully. In particular, please add a plot that summarizes all your numerical results.

> Summary of the results:

>| Model | Number of  epochs  | Train accuracy | Test accuracy | Accuracy on Full Data |
|------|------|------|------|------|
| ResNet18 | 20 | 50.00% | 23.25% | 93.02% |
| VGG16 | 40 | 17.00% | 10.151% | 92.64% |
| ResNet18 (transfer learning) | 20 | 45.00% | 21.340% | |
| ResNet18 + DataAugmentation | 40 | 25.80% | 21.30% | |
| VGG16 + DataAugmentation | 40 | 17.00% | 10.00% | |
| Scattering Transform + ResNet18 | 40 | 60.00% | 18.740% | |
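
The question asks for a plot summarizing the numerical results. One possible sketch, assuming `matplotlib` is available, is a grouped bar chart of train and test accuracy per model. The values are copied from the summary table above; the shortened labels and the output file name `summary_accuracies.png` are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# (model, train accuracy %, test accuracy %) copied from the summary table
results = [
    ("ResNet18", 50.00, 23.25),
    ("VGG16", 17.00, 10.151),
    ("ResNet18 (transfer)", 45.00, 21.340),
    ("ResNet18 + aug.", 25.80, 21.30),
    ("VGG16 + aug.", 17.00, 10.00),
    ("Scattering + ResNet18", 60.00, 18.740),
]
names = [r[0] for r in results]
train_acc = [r[1] for r in results]
test_acc = [r[2] for r in results]
# Train/test gap in percentage points, a rough indicator of overfitting.
gaps = [round(tr - te, 2) for tr, te in zip(train_acc, test_acc)]

positions = list(range(len(results)))
bar_width = 0.4
fig, ax = plt.subplots(figsize=(9, 4))
ax.bar([p - bar_width / 2 for p in positions], train_acc, bar_width, label="Train")
ax.bar([p + bar_width / 2 for p in positions], test_acc, bar_width, label="Test")
ax.axhline(10, color="gray", linestyle="--", label="Chance level (10 classes)")
ax.set_xticks(positions)
ax.set_xticklabels(names, rotation=30, ha="right")
ax.set_ylabel("Accuracy (%)")
ax.legend()
fig.tight_layout()
fig.savefig("summary_accuracies.png")
```

The `gaps` list makes the overfitting visible at a glance: the gap is largest for the scattering model and smallest for ResNet18 with data augmentation.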

> We tried several models in this project.
The first thing we can see is that the models overfit heavily (for instance 50.00% train accuracy versus 23.25% test accuracy for ResNet18). This is due to the lack of data: 100 images are far too few to train such deep architectures reliably. Data augmentation mitigated this issue somewhat, but not as much as more initial data would have.

> Moreover, most models would probably need more computation (more epochs), but we were limited in terms of computing power. The VGG and ResNet architectures illustrate this. In the literature, both give similar results on the full CIFAR-10 dataset (92.64% and 93.02% in the table above), yet here VGG16 stays close to chance level (10%). VGG16 has no residual connections, which makes it harder to train with little computing power and a small dataset.

> Transfer learning is an efficient way to train a network quickly with little computing power. With well-trained weights, good results can be reached in a small amount of time. However, because some of the weights were trained beforehand on a different kind of dataset, this method will generally remain less accurate than fully training a network on a large amount of in-domain data. In our case, it reached 21.34% test accuracy, close to the ResNet18 trained from scratch, in almost no time.
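
As an illustration of why transfer learning is cheap, here is a minimal PyTorch sketch of one common variant: freeze the feature extractor and train only a new classification head. The tiny `backbone` is a stand-in for a pretrained network, and the batch is random placeholder data; this is not necessarily the setup used in the experiment above.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained feature extractor (hypothetical; a real run would load trained weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for param in backbone.parameters():
    param.requires_grad = False  # frozen: no gradient is computed for these weights

head = nn.Linear(16, 10)  # new classifier for the 10 CIFAR-10 classes
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 3, 32, 32)      # placeholder batch of images
targets = torch.randint(0, 10, (8,))    # placeholder labels
frozen_before = [p.clone() for p in backbone.parameters()]

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", trainable)  # 170 (16 * 10 weights + 10 biases)
```

Because gradients are computed and applied for the head only, each step is much cheaper than full fine-tuning, at the cost of features that are not adapted to the new data.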

> DCGANs are an interesting architecture, but they are really difficult to make converge. Training yields both a generator and a discriminator. The discriminator will not be as efficient as a dedicated classifier, but the generator can later be used for data augmentation, which makes it particularly useful. Since the discriminator is also trained on generated samples, which are less accurate than real data, we can expect it to perform worse than a classifier trained on the full original data.

> Data augmentation reduced overfitting considerably: for ResNet18, the gap between train and test accuracy fell from about 27 points (50.00% vs 23.25%) to 4.5 points (25.80% vs 21.30%). However, because the augmented samples are less relevant than real data, the classifier does not perform better on the test set. It could be interesting to apply more augmentation techniques. On some runs, the ResNet overfitted a lot (~90% accuracy on the training set and ~20% on the test set), and I could not explain why.

> Finally, the data transformation (scattering transform) was not that effective here. Again, this section was very time-consuming, and I was not able to experiment with more epochs because training was already particularly long. In our case, it even seems to increase overfitting (60.00% train accuracy versus 18.74% test accuracy). Papers on the subject report that it is particularly effective (also on the CIFAR-10 dataset) in hybrid architectures, where the scattering transform is encoded within the network.

> In conclusion, the results obtained here are somewhat disappointing: most of the methods (data augmentation, scattering transform, transfer learning) are supposed to improve the results, at least in some contexts, but this is not reflected in the numbers obtained here.