# Small data and deep learning
This mini-project studies several techniques for improving performance in a challenging setting where little data and few computational resources are available.

# Introduction
Assume we are in a context where few "gold" labeled data are available for training, say $\mathcal{X}_{\text{train}}\triangleq\{(x_n,y_n)\}_{n\leq N_{\text{train}}}$, where $N_{\text{train}}$ is small. A large test set $\mathcal{X}_{\text{test}}$ is available. A large amount of unlabeled data, $\mathcal{X}$, is available. We also assume that we have a limited computational budget (e.g., no GPUs).

For each question, write a commented *Code* or a complete answer as a *Markdown*. When the objective of a question is to report a CNN accuracy, please use the following format to report it, at the end of the question:

| Model | Number of  epochs  | Train accuracy | Test accuracy |
|------|------|------|------|
|   XXX  | XXX | XXX | XXX |

If applicable, please add the field corresponding to the  __Accuracy on Full Data__ as well as a link to the __Reference paper__ you used to report those numbers. (You do not need to train a CNN on the full CIFAR10 dataset)

In your final report, please keep the logs of each training procedure you used. We will only run this Jupyter notebook if we have doubts about your implementation.

__The total file sizes should not exceed 2MB. Please name your notebook (LASTNAME)\_(FIRSTNAME).ipynb, zip/tar it with any necessary files required to run your notebook, in a compressed file named (LASTNAME)\_(FIRSTNAME).X where X is the corresponding extension. Zip/tar files exceeding 2MB will not be considered for grading. Submit the compressed file via the submission link provided on the website of the class.__

You can use https://colab.research.google.com/ to run your experiments.

**Library Dependency**: Numpy, PIL, matplotlib, torch, tqdm, torchsummary (for debug), tensorboardX

In [57]:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

import os, sys
if sys.version_info[0] == 2:
    import cPickle as pickle
else:
    import pickle

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data

from torch.autograd import Variable
from torch.utils.data import TensorDataset, DataLoader

import torchvision
import torchvision.models as models
import torchvision.transforms as transforms

import tensorboardX

from torchvision.datasets.utils import check_integrity, download_url

from torchsummary import summary

from tqdm import trange, tqdm

In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

## Training set creation
__Question 1:__ Propose a dataloader or modify the file located at https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py in order to obtain a training loader that will only use the first 100 samples of the CIFAR-10 training set. 

> The following class can be used for three purposes:
> * Get a train dataset with at most n examples (train = 1, max_num = n)
> * Get the whole CIFAR10 dataset of size 60000 (train = -1, max_num = 60000; the default max_num = 50000 would truncate it)
> * Get the test dataset of size 10000 (train = 0)

In [3]:
class CIFAR10(data.Dataset):
    """`CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.
    Args:
        root (string): Root directory of dataset where directory
            ``cifar-10-batches-py`` exists or will be saved to if download is set to True.
        train (int, optional): 1 loads the training set, 0 the test set, and -1
            the training and test sets together.
        max_num (int, optional): Maximum number of samples to keep (the first ones).
        transform (callable, optional): A function/transform that takes in an PIL image
            and returns a transformed version. E.g, ``transforms.RandomCrop``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
        download (bool, optional): If true, downloads the dataset from the internet and
            puts it in root directory. If dataset is already downloaded, it is not
            downloaded again.
    """
    base_folder = 'cifar-10-batches-py'
    url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    filename = "cifar-10-python.tar.gz"
    tgz_md5 = 'c58f30108f718f92721af3b95e74349a'
    train_list = [
        ['data_batch_1', 'c99cafc152244af753f735de768cd75f'],
        ['data_batch_2', 'd4bba439e000b95fd0a9bffe97cbabec'],
        ['data_batch_3', '54ebc095f3ab1f0389bbae665268c751'],
        ['data_batch_4', '634d18415352ddfa80567beed471001a'],
        ['data_batch_5', '482c414d41f54cd18b22e5b47cb7c3cb'],
    ]

    test_list = [
        ['test_batch', '40351d587109b95175f43aff81a1287e'],
    ]
    meta = {
        'filename': 'batches.meta',
        'key': 'label_names',
        'md5': '5ff9c542aee3614f3951f8cda6e48888',
    }

    def __init__(self, root, train=1, max_num = 50000,
                 transform=None, target_transform=None,
                 download=False):
        self.root = os.path.expanduser(root)
        self.transform = transform
        self.target_transform = target_transform
        self.train = train  # training set or test set

        if download:
            self.download()

        if not self._check_integrity():
            raise RuntimeError('Dataset not found or corrupted.' +
                               ' You can use download=True to download it')

        if self.train == 1:
            downloaded_list = self.train_list
        elif self.train == 0:
            downloaded_list = self.test_list
        elif self.train == -1:
            downloaded_list = self.train_list + self.test_list
        else:
            raise ValueError('train must be 1 (train), 0 (test) or -1 (all)')

        self.data = []
        self.targets = []

        # now load the pickled numpy arrays
        for file_name, checksum in downloaded_list:
            file_path = os.path.join(self.root, self.base_folder, file_name)
            with open(file_path, 'rb') as f:
                if sys.version_info[0] == 2:
                    entry = pickle.load(f)
                else:
                    entry = pickle.load(f, encoding='latin1')
                self.data.append(entry['data'])
                if 'labels' in entry:
                    self.targets.extend(entry['labels'])
                else:
                    self.targets.extend(entry['fine_labels'])

        self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
        self.data = self.data.transpose((0, 2, 3, 1))  # convert to HWC
        
        if len(self.data) > max_num:
            self.data = self.data[:max_num]
            self.targets = self.targets[:max_num]

        self._load_meta()

    def _load_meta(self):
        path = os.path.join(self.root, self.base_folder, self.meta['filename'])
        if not check_integrity(path, self.meta['md5']):
            raise RuntimeError('Dataset metadata file not found or corrupted.' +
                               ' You can use download=True to download it')
        with open(path, 'rb') as infile:
            if sys.version_info[0] == 2:
                data = pickle.load(infile)
            else:
                data = pickle.load(infile, encoding='latin1')
            self.classes = data[self.meta['key']]
        self.class_to_idx = {_class: i for i, _class in enumerate(self.classes)}

    def __getitem__(self, index):
        """
        Args:
            index (int): Index
        Returns:
            tuple: (image, target) where target is index of the target class.
        """
        img, target = self.data[index], self.targets[index]

        # doing this so that it is consistent with all other datasets
        # to return a PIL Image
        img = Image.fromarray(img)

        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img, target

    def __len__(self):
        return len(self.data)

    def _check_integrity(self):
        root = self.root
        for fentry in (self.train_list + self.test_list):
            filename, md5 = fentry[0], fentry[1]
            fpath = os.path.join(root, self.base_folder, filename)
            if not check_integrity(fpath, md5):
                return False
        return True

    def download(self):
        import tarfile

        if self._check_integrity():
            print('Files already downloaded and verified')
            return

        download_url(self.url, self.root, self.filename, self.tgz_md5)

        # extract file
        with tarfile.open(os.path.join(self.root, self.filename), "r:gz") as tar:
            tar.extractall(path=self.root)

    def __repr__(self):
        fmt_str = 'Dataset ' + self.__class__.__name__ + '\n'
        fmt_str += '    Number of datapoints: {}\n'.format(self.__len__())
        # self.train is an int (1, 0 or -1), so `is True` would never match
        tmp = {1: 'train', 0: 'test', -1: 'train + test'}.get(self.train, 'unknown')
        fmt_str += '    Split: {}\n'.format(tmp)
        fmt_str += '    Root Location: {}\n'.format(self.root)
        tmp = '    Transforms (if any): '
        fmt_str += '{0}{1}\n'.format(tmp, self.transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
        tmp = '    Target Transforms (if any): '
        fmt_str += '{0}{1}'.format(tmp, self.target_transform.__repr__().replace('\n', '\n' + ' ' * len(tmp)))
        return fmt_str

In [4]:
trainset = CIFAR10("../data", train = 1, max_num = 100)
testset = CIFAR10("../data", train = 0)

print("Train set size: ", len(trainset))
print("Test set size : ", len(testset))

Train set size:  100
Test set size :  10000


This is our dataset $\mathcal{X}_{\text{train}}$; it will be used until the end of this project. The remaining training samples correspond to $\mathcal{X}$. The testing set $\mathcal{X}_{\text{test}}$ corresponds to the whole testing set of CIFAR-10.
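The split described above can also be expressed with `torch.utils.data.Subset`, without modifying the dataset class. The following is a minimal sketch on a synthetic stand-in for the CIFAR-10 training set; the tensor dataset `fake_train` and the helper `split_labeled_unlabeled` are illustrative names, not part of the notebook. With the real data, one would pass a CIFAR-10 training dataset instead.

```python
import torch
from torch.utils.data import Subset, TensorDataset


def split_labeled_unlabeled(dataset, n_labeled=100):
    """Return (first n_labeled samples, remaining samples) as two Subsets."""
    n_total = len(dataset)
    labeled = Subset(dataset, range(n_labeled))
    unlabeled = Subset(dataset, range(n_labeled, n_total))
    return labeled, unlabeled


# Synthetic stand-in: 500 fake "images" of shape 3x32x32 with labels in [0, 10).
fake_train = TensorDataset(torch.zeros(500, 3, 32, 32),
                           torch.arange(500) % 10)

x_train, x_unlabeled = split_labeled_unlabeled(fake_train, n_labeled=100)
print(len(x_train), len(x_unlabeled))
```

Both subsets are views on the same underlying dataset, so no image data is copied.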

## Testing procedure
__Question 2:__ Explain why the evaluation of the training procedure is difficult. Propose several solutions.

> Since we have few training examples, it is difficult to set aside a separate validation set to evaluate the model. Without one, however, the model may easily overfit the training set without our noticing.

> (1) One possibility is to download pictures from the internet that correspond to our classes and use them as a validation set.

> (2) K-fold cross-validation is not widely used in deep learning because of its computational cost, but it is suitable when few examples are available. We can split the whole set into 5 (or 10) folds and, each time, train the model on 4 (or 9) folds and evaluate it on the remaining one.

> (3) Instead of fixing K folds, we can randomly draw a small number (3 to 5) of examples as the validation set at each run.
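Solution (2) can be sketched with NumPy alone. The helper `k_fold_indices` below is a hypothetical name introduced for illustration; it only produces index splits for 100 samples and leaves the actual training loop out.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(n_samples)
    folds = np.array_split(permutation, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx


splits = list(k_fold_indices(100, n_folds=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each sample is used for validation exactly once, at the price of training the model `n_folds` times.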

# Raw approach: the baseline

In this section, the goal is to train a CNN on $\mathcal{X}_{\text{train}}$ and compare its performance with numbers reported in the literature. You will have to re-use and/or design a standard classification pipeline. You should optimize your pipeline to obtain the best performance (image size, data augmentation by flip, ...).

The key ingredients for training a CNN are the batch size and the learning rate schedule, i.e. how the learning rate decreases as a function of the number of epochs. A possible schedule is to start with a learning rate of 0.1 and divide it by 10 every 30 epochs. In case of divergence, reduce the learning rate. A potential batch size is 10, yet this can be cross-validated.
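The step schedule mentioned above can be written as a one-line function. This is a plain-Python sketch with a hypothetical helper name, `step_lr`; in PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` applies the same rule to an optimizer.

```python
def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Learning rate at a given (0-indexed) epoch under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 29, 30, 60, 90):
    print(epoch, step_lr(epoch))
```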

You can get some baseline accuracies in this paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Keshari_Learning_Structure_and_CVPR_2018_paper.pdf. Obviously, the context is different: those researchers had access to GPUs.

## ResNet architectures

### Net Architecture

In [5]:
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])

### Train and Validation

In [6]:
def train_epoch(epoch, net, trainloader, optimizer, criterion):
    
    train_loss = 0
    correct = 0
    total = 0
    
    
    with trange(len(trainloader), file=sys.stderr) as t:
        net.train()
        for batch_idx, (inputs, targets) in enumerate(trainloader):
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            t.update()

        t.set_postfix(train_loss = train_loss, train_acc = correct * 100. / total, epoch = epoch + 1)
        t.update()
        

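def val_epoch(net, testloader, criterion=None):
    
    # Fall back to cross-entropy so existing two-argument calls keep working
    if criterion is None:
        criterion = nn.CrossEntropyLoss()
    
    test_loss = 0
    correct = 0
    total = 0
    
    # Evaluation mode: BatchNorm uses its running statistics instead of batch statistics
    net.eval()
    with trange(len(testloader), file=sys.stderr) as t:
        with torch.no_grad():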
            for batch_idx, (inputs, targets) in enumerate(testloader):
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = net(inputs)
                loss = criterion(outputs, targets)

                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()
                t.update()

        t.set_postfix(test_loss = test_loss / total, test_acc = correct * 100. / total)
        t.update(1)
            
    return correct * 100. / total


def update_model(net, best_acc, epoch, name = 'ckpt'):
    
    tqdm.write("Saving...", file=sys.stderr)
    state = {'net': net.state_dict(),
             'acc': best_acc,
             'epoch': epoch,
            }
    
    if not os.path.isdir('checkpoint'):
        os.mkdir('checkpoint')
    torch.save(state, './checkpoint/' + name + '.t7')

### Data

In [37]:
transform_train = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = CIFAR10("../data", train = 1, max_num = 100, transform = transform_train)
testset = CIFAR10("../data", train = 0, transform = transform_test)
cifarset = CIFAR10("../data", train = -1, max_num = 60000, transform = transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size = 10, shuffle = True, num_workers = 1)
testloader = torch.utils.data.DataLoader(testset, batch_size = 64, shuffle = False, num_workers = 1)
cifarloader = torch.utils.data.DataLoader(cifarset, batch_size = 64, shuffle = False, num_workers = 1)

In [38]:
# Define net
net = ResNet18()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.01, momentum = 0.9, weight_decay = 5e-4)
best_acc = 0

# Train
for epoch in range(100):
    
    train_epoch(epoch, net, trainloader, optimizer, criterion)
    
    if epoch % 10 == 0:
        acc = val_epoch(net, testloader)
        
        if acc > best_acc:
            best_acc = acc
            update_model(net, best_acc, epoch, "resnet18")

11it [00:00, 34.89it/s, epoch=1, train_acc=8, train_loss=24.8]                        
158it [00:03, 49.60it/s, test_acc=11.7, test_loss=0.0376]                         
Saving...
11it [00:00, 24.71it/s, epoch=2, train_acc=28, train_loss=21.2]                        
11it [00:00, 36.05it/s, epoch=3, train_acc=39, train_loss=19.5]                        
11it [00:00, 37.32it/s, epoch=4, train_acc=40, train_loss=16.2]                        
11it [00:00, 37.13it/s, epoch=5, train_acc=50, train_loss=14.4]                        
11it [00:00, 37.40it/s, epoch=6, train_acc=60, train_loss=12]                        
11it [00:00, 38.92it/s, epoch=7, train_acc=76, train_loss=8.11]                        
11it [00:00, 40.00it/s, epoch=8, train_acc=78, train_loss=6.15]                        
11it [00:00, 37.04it/s, epoch=9, train_acc=86, train_loss=4.32]                        
11it [00:00, 27.36it/s, epoch=10, train_acc=86, train_loss=3.69]                        
11it [00:00, 36.80it/s, epoch

158it [00:03, 52.06it/s, test_acc=24, test_loss=0.0695]                         
Saving...
11it [00:00, 36.60it/s, epoch=82, train_acc=100, train_loss=0.0121]                        
11it [00:00, 40.19it/s, epoch=83, train_acc=100, train_loss=0.0248]                        
11it [00:00, 40.14it/s, epoch=84, train_acc=100, train_loss=0.00552]                        
11it [00:00, 39.88it/s, epoch=85, train_acc=100, train_loss=0.00917]                        
11it [00:00, 39.85it/s, epoch=86, train_acc=100, train_loss=0.0203]                        
11it [00:00, 38.00it/s, epoch=87, train_acc=100, train_loss=0.0108]                        
11it [00:00, 40.12it/s, epoch=88, train_acc=100, train_loss=0.0347]                        
11it [00:00, 39.72it/s, epoch=89, train_acc=100, train_loss=0.0117]                        
11it [00:00, 36.03it/s, epoch=90, train_acc=100, train_loss=0.00831]                        
11it [00:00, 37.29it/s, epoch=91, train_acc=100, train_loss=0.031]            

In [39]:
# Test model on three datasets
net.load_state_dict(torch.load("checkpoint/resnet18.t7")['net'])

train_acc = val_epoch(net, trainloader)
val_acc = val_epoch(net, testloader)
cifar_acc = val_epoch(net, cifarloader)

print("Train Acc: ", train_acc)
print("Val   Acc: ", val_acc)
print("Cifar Acc: ", cifar_acc)

11it [00:00, 72.23it/s, test_acc=100, test_loss=5.23e-5]                        
158it [00:03, 51.93it/s, test_acc=24, test_loss=0.0695]                         
939it [00:17, 53.21it/s, test_acc=24.3, test_loss=0.0687]                         

Train Acc:  100.0
Val   Acc:  23.99
Cifar Acc:  24.346666666666668





__Question 3:__ Write a classification pipeline for $\mathcal{X}_{\text{train}}$, train from scratch and evaluate a *ResNet-18* architecture specific to CIFAR10 (details about the ImageNet model can be found here: https://arxiv.org/abs/1409.1556 ). Please report the accuracy obtained on the whole dataset as well as the reference paper/GitHub link.

*Hint:* You can re-use the following code: https://github.com/kuangliu/pytorch-cifar. Training for 10 epochs with a batch size of 10 and a learning rate of 0.01 yields 40% accuracy on $\mathcal{X}_{\text{train}}$ (~2 minutes) and 20% accuracy on $\mathcal{X}_{\text{test}}$ (~5 minutes).

> We evaluate the model on three datasets: the 100-sample training set, the test set, and the full CIFAR-10 dataset (60,000 images). The results are shown below:

| Model | Number of epochs | Train accuracy | Test accuracy | Accuracy on Full Data | Reference GitHub |
|------|------|------|------|------|------|
| ResNet18 | 100 | 100.0 | 23.99 | 24.35 | https://github.com/kuangliu/pytorch-cifar |

## VGG-like architectures

### Net Architecture

In [10]:
cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[vgg_name])
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

### Train and Evaluation

In [11]:
net = VGG("VGG11")
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.005, momentum = 0.9, weight_decay = 5e-4)
best_acc = 0

for epoch in range(100):
    
    train_epoch(epoch, net, trainloader, optimizer, criterion)
    
    if epoch % 10 == 0:
        acc = val_epoch(net, testloader)
        
        if acc > best_acc:
            best_acc = acc
            update_model(net, best_acc, epoch, "vgg")

11it [00:00, 45.77it/s, epoch=1, train_acc=9, train_loss=24.3]                        
158it [00:01, 85.67it/s, test_acc=16.6, test_loss=0.0381]                         
Saving...
11it [00:00, 44.27it/s, epoch=2, train_acc=23, train_loss=21.2]                        
11it [00:00, 58.60it/s, epoch=3, train_acc=42, train_loss=17.7]                        
11it [00:00, 58.33it/s, epoch=4, train_acc=48, train_loss=14.4]                        
11it [00:00, 57.83it/s, epoch=5, train_acc=68, train_loss=10.4]                        
11it [00:00, 51.96it/s, epoch=6, train_acc=80, train_loss=6.6]                        
11it [00:00, 54.03it/s, epoch=7, train_acc=82, train_loss=5.88]                        
11it [00:00, 55.73it/s, epoch=8, train_acc=90, train_loss=4.03]                        
11it [00:00, 58.84it/s, epoch=9, train_acc=91, train_loss=3.18]                        
11it [00:00, 55.24it/s, epoch=10, train_acc=96, train_loss=2.09]                        
11it [00:00, 58.70it/s, epoc

158it [00:01, 85.49it/s, test_acc=27.5, test_loss=0.0734]                         
11it [00:00, 42.66it/s, epoch=82, train_acc=100, train_loss=0.00266]                        
11it [00:00, 58.05it/s, epoch=83, train_acc=100, train_loss=0.00489]                        
11it [00:00, 56.36it/s, epoch=84, train_acc=100, train_loss=0.00341]                        
11it [00:00, 59.51it/s, epoch=85, train_acc=100, train_loss=0.0068]                        
11it [00:00, 59.17it/s, epoch=86, train_acc=100, train_loss=0.00348]                        
11it [00:00, 60.00it/s, epoch=87, train_acc=100, train_loss=0.00566]                        
11it [00:00, 58.62it/s, epoch=88, train_acc=100, train_loss=0.00472]                        
11it [00:00, 51.72it/s, epoch=89, train_acc=100, train_loss=0.00371]                        
11it [00:00, 53.55it/s, epoch=90, train_acc=100, train_loss=0.00889]                        
11it [00:00, 58.07it/s, epoch=91, train_acc=100, train_loss=0.00546]             

In [12]:
net.load_state_dict(torch.load("checkpoint/vgg.t7")['net'])

train_acc = val_epoch(net, trainloader)
val_acc = val_epoch(net, testloader)
cifar_acc = val_epoch(net, cifarloader)

print("Train Acc: ", train_acc)
print("Val   Acc: ", val_acc)
print("Cifar Acc: ", cifar_acc)

11it [00:00, 107.33it/s, test_acc=100, test_loss=0.000121]                       
158it [00:01, 85.32it/s, test_acc=27.7, test_loss=0.0719]                         
939it [00:10, 87.91it/s, test_acc=27.6, test_loss=0.0715]                         

Train Acc:  100.0
Val   Acc:  27.67
Cifar Acc:  27.628333333333334





__Question 4:__ Same question as before, but with a *VGG*. Which model do you recommend?

> We evaluate the model on three datasets: the 100-sample training set, the test set, and the full CIFAR-10 dataset (60,000 images). The results are shown below:

| Model | Number of epochs | Train accuracy | Test accuracy | Accuracy on Full Data | Reference GitHub |
|------|------|------|------|------|------|
|   VGG11  | 100 | 100.0 | 27.67 | 27.63 | https://github.com/kuangliu/pytorch-cifar |

> Compared to ResNet18, VGG11 achieved slightly better accuracy on both the test set and the full dataset. In the logs above, VGG11 also ran faster than ResNet18 (roughly 58 vs. 37 it/s during training and 85 vs. 50 it/s during evaluation). I therefore recommend VGG11 in this setting.

# Transfer learning

We propose to use models pre-trained on a classification task and on a generative task in order to improve the results in our setting.

## ImageNet features

### Get Pretrained Model

In [15]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

In [16]:
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
model = models.resnet18(pretrained = True)
set_parameter_requires_grad(model, True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)
model.to(device)

### Prepare Data

In [19]:
input_size = 224

transform_train = transforms.Compose([
    transforms.RandomResizedCrop(input_size),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

transform_test = transforms.Compose([
    transforms.Resize(input_size),
    transforms.CenterCrop(input_size),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

trainset = CIFAR10("../data", train = 1, max_num = 100, transform = transform_train)
testset = CIFAR10("../data", train = 0, transform = transform_test)
cifarset = CIFAR10("../data", train = -1, max_num = 60000, transform = transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True, num_workers = 1)
testloader = torch.utils.data.DataLoader(testset, batch_size = 128, shuffle = False, num_workers = 1)
cifarloader = torch.utils.data.DataLoader(cifarset, batch_size = 128, shuffle = False, num_workers = 1)

### Train and Evaluation

In [20]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 0.001)
best_acc = 0

for epoch in range(100):
    
    train_epoch(epoch, model, trainloader, optimizer, criterion)
    
    if epoch % 10 == 0:
        acc = val_epoch(model, testloader)
        
        if acc > best_acc:
            best_acc = acc
            update_model(model, best_acc, epoch, "pretrained_resnet18")

5it [00:00, 15.81it/s, epoch=1, train_acc=15, train_loss=9.37]                       
80it [00:15,  5.03it/s, test_acc=11.5, test_loss=0.0195]                        
Saving...
5it [00:00, 16.47it/s, epoch=2, train_acc=13, train_loss=9.03]                       
5it [00:00, 17.72it/s, epoch=3, train_acc=15, train_loss=9.72]                       
5it [00:00, 17.90it/s, epoch=4, train_acc=21, train_loss=8.55]                       
5it [00:00, 17.92it/s, epoch=5, train_acc=30, train_loss=8.59]                       
5it [00:00, 17.96it/s, epoch=6, train_acc=28, train_loss=8.13]                       
5it [00:00, 17.90it/s, epoch=7, train_acc=30, train_loss=7.65]                       
5it [00:00, 17.69it/s, epoch=8, train_acc=30, train_loss=7.88]                       
5it [00:00, 17.79it/s, epoch=9, train_acc=31, train_loss=7.68]                       
5it [00:00, 17.73it/s, epoch=10, train_acc=33, train_loss=7.55]                       
5it [00:00, 17.56it/s, epoch=11, train_acc=39, t

5it [00:00, 15.33it/s, epoch=86, train_acc=70, train_loss=4.26]                       
5it [00:00, 15.46it/s, epoch=87, train_acc=72, train_loss=4.26]                       
5it [00:00, 15.66it/s, epoch=88, train_acc=71, train_loss=3.72]                       
5it [00:00, 15.31it/s, epoch=89, train_acc=72, train_loss=4.01]                       
5it [00:00, 15.30it/s, epoch=90, train_acc=76, train_loss=4.27]                       
5it [00:00, 17.96it/s, epoch=91, train_acc=74, train_loss=3.85]                       
80it [00:16,  4.96it/s, test_acc=50.4, test_loss=0.0115]                        
Saving...
5it [00:00, 16.17it/s, epoch=92, train_acc=71, train_loss=4.18]                       
5it [00:00, 17.50it/s, epoch=93, train_acc=72, train_loss=3.33]                       
5it [00:00, 16.01it/s, epoch=94, train_acc=74, train_loss=4.09]                       
5it [00:00, 17.86it/s, epoch=95, train_acc=72, train_loss=3.53]                       
5it [00:00, 17.90it/s, epoch=96, train_

In [22]:
model.load_state_dict(torch.load("checkpoint/pretrained_resnet18.t7")['net'])

train_acc = val_epoch(model, trainloader)
val_acc = val_epoch(model, testloader)
cifar_acc = val_epoch(model, cifarloader)

print("Train Acc: ", train_acc)
print("Val   Acc: ", val_acc)
print("Cifar Acc: ", cifar_acc)

5it [00:00, 16.76it/s, test_acc=77, test_loss=0.0423]                       
80it [00:16,  4.94it/s, test_acc=50.4, test_loss=0.0115]                        
470it [01:33,  5.03it/s, test_acc=50.6, test_loss=0.0113]                         

Train Acc:  77.0
Val   Acc:  50.41
Cifar Acc:  50.61





Now, we will use some pre-trained models on ImageNet and see how well they compare on CIFAR. A list is available on: https://pytorch.org/docs/stable/torchvision/models.html.

__Question 5:__ Pick a model from the list above, adapt it for CIFAR and retrain its final layer (or a block of layers, depending on the resources you have access to). Report its accuracy.

> We evaluate the model on three datasets: the training set, the test set, and the full CIFAR10 dataset. The results are shown below:

| Model | Number of  epochs  | Train accuracy | Test accuracy | Accuracy on Full Data | Reference github|
|------|------|------|------|------|------|
|   Pretrained ResNet18  | 100 | 77.0 | 50.4 | 50.61 | - |

## DCGAN features

GANs correspond to an unsupervised technique for generating images. In https://arxiv.org/pdf/1511.06434.pdf, Sec. 5.1 shows that the representation obtained from the Discriminator has some nice generalization properties on CIFAR10.

### Pretrained Discriminator

In [23]:
class _netD(nn.Module):
    def __init__(self, nc=3, ndf=64):
        super(_netD, self).__init__()
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 2, 2, 0, bias=False),
            nn.Sigmoid()
        )
        
    def get_blocks(self, indices):
        
        self.blocks = []
        
        for index in indices:
            self.blocks.append(list(self.children())[0][:index])
            
        self.max_layers = [nn.MaxPool2d(8, stride = 8),
                           nn.MaxPool2d(4, stride = 4),
                           nn.MaxPool2d(2, stride = 2)]

    def forward(self, input):
        batch_size = input.shape[0]
        output1 = self.max_layers[0](self.blocks[0](input)).view(batch_size, -1)
        output2 = self.max_layers[1](self.blocks[1](input)).view(batch_size, -1)
        output3 = self.max_layers[2](self.blocks[2](input)).view(batch_size, -1)
        output4 = self.blocks[3](input).view(batch_size, -1)
        
        output = torch.cat([output1, output2, output3, output4], dim = 1)

        return output

In [24]:
# Number of colour channels
NC = 3
# Number of discriminator filters
NDF = 64

netD = _netD(NC, NDF)
netD.load_state_dict(torch.load("netD_epoch_199.pth"))
netD.get_blocks([2, 5, 8, 11])
netD.to(device)

_netD(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace)
    (11): Conv2d(512, 1, kernel_size=(2, 2), stride=(2, 2), bias=False)
    (12): Sigmoid()
  )
)

### Data Preparation

In [25]:
transform_test = transforms.Compose([
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = CIFAR10("../data", train = 1, max_num = 100, transform = transform_test)
testset = CIFAR10("../data", train = 0, transform = transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size = 64, shuffle = True, num_workers = 1)
testloader = torch.utils.data.DataLoader(testset, batch_size = 64, shuffle = False, num_workers = 1)

### Feature Extraction

In [26]:
def extract_features(model, dataloader):
    """Run `model` over `dataloader` and return (features, targets) as NumPy arrays."""
    
    X_features, X_targets = [], []
    
    # Gradients are not needed for feature extraction
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(dataloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            X_features.append(outputs.cpu().numpy())
            X_targets.append(targets.cpu().numpy())
        
    return np.vstack(X_features), np.hstack(X_targets)

In [27]:
train_features, train_targets = extract_features(netD, trainloader)
test_features, test_targets = extract_features(netD, testloader)

### SVM

In [28]:
from sklearn.svm import SVC

clf = SVC(C = 10, gamma='scale')
clf.fit(train_features, train_targets)

SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [29]:
print("Train accuracy: ", clf.score(train_features, train_targets))
print("Test  accuracy: ", clf.score(test_features, test_targets))

Train accuracy:  1.0
Test  accuracy:  0.277


__Question 6:__  Using for instance a pretrained model from https://github.com/soumith/dcgan.torch combined with https://github.com/pytorch/examples/tree/master/dcgan, propose a model to train on $\mathcal{X}_{\text{train}}$. Train it and report its accuracy.

*Hint:* You can use the library: https://github.com/bshillingford/python-torchfile to load the weights of a model from torch(Lua) to pytorch(python).

> Using a pretrained DCGAN discriminator, we extract a feature vector of dimension 15360 for each image. We train an SVM on the first 100 images of CIFAR10; the accuracies (in %) on the training and test sets are reported below:

| Model | Number of  epochs  | Train accuracy | Test accuracy | Reference github|
|------|------|------|------|------|
|   Pretrained DCGAN + SVM  | - | 100.0 | 27.7 | https://github.com/csinva/pytorch_gan_pretrained |
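
The feature dimension of 15360 can be checked by hand. The following is a minimal sketch, assuming 64×64 inputs (as produced by `transforms.Resize(64)`), `ndf = 64`, and the block indices `[2, 5, 8, 11]` passed to `get_blocks`. The helper name `dcgan_feature_dim` is illustrative and not part of the notebook.

```python
def dcgan_feature_dim(input_size=64, ndf=64):
    """Size of the concatenated, max-pooled discriminator features."""
    # (channels, spatial size) after each of the four stride-2 convolutions
    stages = [(ndf * 2 ** i, input_size // 2 ** (i + 1)) for i in range(4)]
    # Max-pooling kernel applied to each stage; the last stage is kept as is
    pools = [8, 4, 2, 1]
    dims = [c * (s // p) ** 2 for (c, s), p in zip(stages, pools)]
    return dims, sum(dims)

dims, total = dcgan_feature_dim()
print(dims, total)  # [1024, 2048, 4096, 8192] 15360
```

Each stage is pooled down to a 4×4 map, so the deeper stages, which have more channels, dominate the feature vector.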

# Incorporating *a priori*
Geometrical *a priori* are appealing for image classification tasks. For now, we only consider linear transformations $\mathcal{T}$ of the inputs $x:\mathbb{S}^2\rightarrow\mathbb{R}$ where $\mathbb{S}$ is the support of an image, meaning that:

$$\forall u\in\mathbb{S}^2,\mathcal{T}(\lambda x+\mu y)(u)=\lambda \mathcal{T}(x)(u)+\mu \mathcal{T}(y)(u)\,.$$

For instance if an image had an infinite support, a translation $\mathcal{T}_a$ by $a$ would lead to:

$$\forall u, \mathcal{T}_a(x)(u)=x(u-a)\,.$$

Otherwise, one has to handle several boundary effects.
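
On a finite grid the translation above is no longer well defined at the border. The following is a minimal sketch of two common conventions, zero padding and periodic wrap-around, shown on a 1D signal for brevity. The function names are illustrative and not part of the assignment.

```python
def translate_zero(x, a):
    """Shift x by a samples; samples coming from outside the support are set to 0."""
    n = len(x)
    return [x[u - a] if 0 <= u - a < n else 0 for u in range(n)]

def translate_periodic(x, a):
    """Shift x by a samples with wrap-around (circular boundary)."""
    n = len(x)
    return [x[(u - a) % n] for u in range(n)]

x = [1, 2, 3, 4, 5]
print(translate_zero(x, 2))      # [0, 0, 1, 2, 3]
print(translate_periodic(x, 2))  # [4, 5, 1, 2, 3]
```

Both operators are linear in `x`. Zero padding discards the samples that leave the support, while the periodic version keeps them but makes opposite borders artificially adjacent.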

__Question 7:__ Explain the issues when dealing with translations, rotations, scaling effects, color changes on $32\times32$ images. Propose several ideas to tackle them.

> Since the images are small, any transformation may introduce a large distortion, after which the image may no longer be recognizable as a member of its class. For example, a rotation by an angle that is not a multiple of 90 degrees pushes the corners outside the $32\times32$ frame and leaves empty regions that must be filled, and downscaling by a factor of 2 leaves only $16\times16=256$ pixels, which is too few to carry enough information.

> To overcome this issue, we can try the following methods:
> * Restrict each operation to a bounded range of distortion
> * Process the image in the frequency domain
> * Reject training examples with low image quality
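
To make the rotation issue concrete, the sketch below counts which output pixels of a rotated $32\times32$ image still have a source location inside the original frame. It assumes rotation about the image centre and checks pixel centres only. The function `covered_fraction` is a hypothetical helper, not part of the training pipeline.

```python
import math

def covered_fraction(angle_deg, size=32):
    """Fraction of output pixels whose source location lies inside the image
    after rotating about the image centre."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    c = (size - 1) / 2.0   # centre of the pixel grid
    eps = 1e-9             # tolerance for floating-point round-off
    inside = 0
    for i in range(size):
        for j in range(size):
            # Inverse rotation: where does output pixel (i, j) come from?
            y, x = i - c, j - c
            src_x = cos_t * x + sin_t * y + c
            src_y = -sin_t * x + cos_t * y + c
            if -eps <= src_x <= size - 1 + eps and -eps <= src_y <= size - 1 + eps:
                inside += 1
    return inside / size ** 2

for angle in (0, 15, 45, 90):
    print(angle, round(covered_fraction(angle), 3))
```

Rotations by multiples of 90 degrees keep every pixel, whereas intermediate angles leave corner regions without data. This is one reason to bound the rotation range used for augmentation.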

## Data augmentations

In [41]:
transform_train = transforms.Compose([
    transforms.Resize(36),
    transforms.RandomCrop(32, padding = 4),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = CIFAR10("../data", train = 1, max_num = 100, transform = transform_train)
testset = CIFAR10("../data", train = 0, transform = transform_test)
cifarset = CIFAR10("../data", train = -1, max_num = 60000, transform = transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size = 10, shuffle = True, num_workers = 1)
testloader = torch.utils.data.DataLoader(testset, batch_size = 64, shuffle = False, num_workers = 1)
cifarloader = torch.utils.data.DataLoader(cifarset, batch_size = 64, shuffle = False, num_workers = 1)

### ResNet18

In [42]:
net = ResNet18()
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.01, momentum = 0.9, weight_decay = 5e-4)
best_acc = 0

for epoch in range(100):
    
    train_epoch(epoch, net, trainloader, optimizer, criterion)
    
    if epoch % 10 == 0:
        acc = val_epoch(net, testloader)
        
    if acc > best_acc:
        best_acc = acc
        update_model(net, best_acc, epoch, "resnet18")

11it [00:00, 34.32it/s, epoch=1, train_acc=13, train_loss=23.7]                        
158it [00:03, 50.99it/s, test_acc=9.92, test_loss=0.0395]                         
Saving...
11it [00:00, 35.37it/s, epoch=2, train_acc=16, train_loss=23.5]                        
11it [00:00, 38.46it/s, epoch=3, train_acc=19, train_loss=23.5]                        
11it [00:00, 38.92it/s, epoch=4, train_acc=16, train_loss=23.2]                        
11it [00:00, 38.15it/s, epoch=5, train_acc=23, train_loss=21.6]                        
11it [00:00, 38.95it/s, epoch=6, train_acc=16, train_loss=22.8]                        
11it [00:00, 38.90it/s, epoch=7, train_acc=17, train_loss=22.4]                        
11it [00:00, 38.89it/s, epoch=8, train_acc=24, train_loss=21.6]                        
11it [00:00, 37.99it/s, epoch=9, train_acc=18, train_loss=22.6]                        
11it [00:00, 38.63it/s, epoch=10, train_acc=16, train_loss=21.5]                        
11it [00:00, 39.01it/s, ep

11it [00:00, 38.78it/s, epoch=84, train_acc=58, train_loss=11.3]                        
11it [00:00, 39.08it/s, epoch=85, train_acc=60, train_loss=10.8]                        
11it [00:00, 38.96it/s, epoch=86, train_acc=58, train_loss=10.6]                        
11it [00:00, 36.77it/s, epoch=87, train_acc=56, train_loss=11.8]                        
11it [00:00, 38.47it/s, epoch=88, train_acc=63, train_loss=10.2]                        
11it [00:00, 38.90it/s, epoch=89, train_acc=63, train_loss=11.4]                        
11it [00:00, 38.34it/s, epoch=90, train_acc=60, train_loss=12.5]                        
11it [00:00, 38.05it/s, epoch=91, train_acc=57, train_loss=10.6]                        
158it [00:03, 50.53it/s, test_acc=25.1, test_loss=0.0466]                         
11it [00:00, 34.72it/s, epoch=92, train_acc=55, train_loss=11.7]                        
11it [00:00, 26.12it/s, epoch=93, train_acc=72, train_loss=8.79]                        
11it [00:00, 37.53it/s, epo

In [43]:
net.load_state_dict(torch.load("checkpoint/resnet18.t7")['net'])

train_acc = val_epoch(net, trainloader)
val_acc = val_epoch(net, testloader)
cifar_acc = val_epoch(net, cifarloader)

print("Train Acc: ", train_acc)
print("Val   Acc: ", val_acc)
print("Cifar Acc: ", cifar_acc)

11it [00:00, 63.41it/s, test_acc=43, test_loss=0.143]                        
158it [00:03, 49.86it/s, test_acc=25.5, test_loss=0.0427]                         
939it [00:17, 52.50it/s, test_acc=25.5, test_loss=0.0425]                         

Train Acc:  43.0
Val   Acc:  25.52
Cifar Acc:  25.541666666666668





### VGG

In [49]:
net = VGG("VGG11")
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.005, momentum = 0.9, weight_decay = 5e-4)

best_acc = 0

for epoch in range(100):
    
    train_epoch(epoch, net, trainloader, optimizer, criterion)
    
    if epoch % 10 == 0:
        acc = val_epoch(net, testloader)
        
    if acc > best_acc:
        best_acc = acc
        update_model(net, best_acc, epoch, "vgg")

11it [00:00, 43.92it/s, epoch=1, train_acc=6, train_loss=24.6]                        
158it [00:01, 86.06it/s, test_acc=12.7, test_loss=0.0392]                         
Saving...
11it [00:00, 32.74it/s, epoch=2, train_acc=12, train_loss=23.6]                        
11it [00:00, 35.03it/s, epoch=3, train_acc=22, train_loss=24.5]                        
11it [00:00, 43.31it/s, epoch=4, train_acc=11, train_loss=27.4]                        
11it [00:00, 42.87it/s, epoch=5, train_acc=22, train_loss=22.9]                        
11it [00:00, 42.96it/s, epoch=6, train_acc=16, train_loss=22.7]                        
11it [00:00, 43.02it/s, epoch=7, train_acc=20, train_loss=23.4]                        
11it [00:00, 42.38it/s, epoch=8, train_acc=24, train_loss=22.5]                        
11it [00:00, 43.16it/s, epoch=9, train_acc=22, train_loss=21.9]                        
11it [00:00, 43.40it/s, epoch=10, train_acc=25, train_loss=20.9]                        
11it [00:00, 43.06it/s, epo

11it [00:00, 33.84it/s, epoch=84, train_acc=83, train_loss=6.15]                        
11it [00:00, 43.28it/s, epoch=85, train_acc=71, train_loss=6.65]                        
11it [00:00, 43.16it/s, epoch=86, train_acc=73, train_loss=9.53]                        
11it [00:00, 43.03it/s, epoch=87, train_acc=69, train_loss=8.91]                        
11it [00:00, 42.69it/s, epoch=88, train_acc=68, train_loss=8.4]                        
11it [00:00, 44.05it/s, epoch=89, train_acc=75, train_loss=7.29]                        
11it [00:00, 35.37it/s, epoch=90, train_acc=76, train_loss=6.66]                        
11it [00:00, 43.19it/s, epoch=91, train_acc=72, train_loss=6.89]                        
158it [00:01, 84.62it/s, test_acc=24.8, test_loss=0.0567]                         
11it [00:00, 24.65it/s, epoch=92, train_acc=76, train_loss=7.07]                        
11it [00:00, 43.37it/s, epoch=93, train_acc=75, train_loss=6.69]                        
11it [00:00, 43.73it/s, epoc

In [50]:
net.load_state_dict(torch.load("checkpoint/vgg.t7")['net'])

train_acc = val_epoch(net, trainloader)
val_acc = val_epoch(net, testloader)
cifar_acc = val_epoch(net, cifarloader)

print("Train Acc: ", train_acc)
print("Val   Acc: ", val_acc)
print("Cifar Acc: ", cifar_acc)

11it [00:00, 61.63it/s, test_acc=79, test_loss=0.0586]                        
158it [00:01, 83.08it/s, test_acc=27.5, test_loss=0.0483]                         
939it [00:10, 88.36it/s, test_acc=27.3, test_loss=0.0481]                         

Train Acc:  79.0
Val   Acc:  27.49
Cifar Acc:  27.255





__Question 8:__ Propose a set of geometric transformations beyond translation, and incorporate them in your training pipeline. Train the model of the __Question 3__ and __Question 4__ with them and report the accuracies.

> We use the following transformations: horizontal flip, random crop with padding, rescaling, and color jitter. The models from Q3 and Q4 are then retrained.

| Model | Number of  epochs  | Train accuracy | Test accuracy | Accuracy on Full Data | Reference github|
|------|------|------|------|------|------|
|   ResNet18  | 100 | 43.0 | 25.52 | 25.54 | https://github.com/kuangliu/pytorch-cifar |
|   VGG11  | 100 | 79.0 | 27.49 | 27.26 | https://github.com/kuangliu/pytorch-cifar |

## Wavelets

__Question 9:__ Use a Scattering Transform as an input to a ResNet-like architecture. You can find a baseline here: https://arxiv.org/pdf/1703.08961.pdf.

*Hint:* You can use the following package: https://www.kymat.io/

In [51]:
from kymatio import Scattering2D

### Modified Network

In [52]:
class ResNet_transform(nn.Module):
    def __init__(self, block, num_blocks, k, num_classes=10):
        super(ResNet_transform, self).__init__()
        self.in_planes = 64
        self.k = k
        
        self.bn0 = nn.BatchNorm2d(self.k)
        self.conv1 = nn.Conv2d(self.k, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.bn0(x.view(-1, self.k, 8, 8))
        out = F.relu(self.bn1(self.conv1(out)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ResNet18_transform(k):
    return ResNet_transform(BasicBlock, [2, 2, 2, 2], k)
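
The reshape `x.view(-1, self.k, 8, 8)` in `forward` relies on the output shape of the scattering transform. The following is a minimal sketch of that arithmetic, assuming a second-order transform with `L = 8` orientations (the Kymatio default, to our knowledge) applied independently to each of the 3 colour channels. The helper name is illustrative.

```python
def scattering_output_shape(image_size=32, J=2, L=8, in_channels=3):
    """Channel count and spatial size of a second-order 2D scattering transform."""
    order0 = 1                           # low-pass average
    order1 = L * J                       # one coefficient per (scale, orientation)
    order2 = L ** 2 * J * (J - 1) // 2   # pairs of wavelets with increasing scale
    coeffs_per_channel = order0 + order1 + order2
    spatial = image_size // 2 ** J       # output is subsampled by 2^J
    return in_channels * coeffs_per_channel, spatial

k, spatial = scattering_output_shape()
print(k, spatial)  # 243 8
```

With `J = 2` this gives 81 coefficients per colour channel, hence `K = 81 * 3` input channels on an 8×8 grid.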

### Modified Training and Evaluation

In [53]:
def train_epoch_transform(epoch, net, trainloader, optimizer, criterion, scattering):
    
    train_loss = 0
    correct = 0
    total = 0
    
    
    with trange(len(trainloader), file=sys.stderr) as t:
        net.train()
        for batch_idx, (inputs, targets) in enumerate(trainloader):
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = net(scattering(inputs))
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            t.update()

        t.set_postfix(train_loss = train_loss, train_acc = correct * 100. / total, epoch = epoch + 1)
        t.update()
        
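def val_epoch_transform(net, testloader, scattering):
    # Note: uses the global `criterion` defined in the training cell
    
    test_loss = 0
    correct = 0
    total = 0
    
    net.eval()  # evaluate with BatchNorm running statistics rather than batch statistics
    with trange(len(testloader), file=sys.stderr) as t:
        with torch.no_grad():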
            for batch_idx, (inputs, targets) in enumerate(testloader):
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = net(scattering(inputs))
                loss = criterion(outputs, targets)

                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()
                t.update()

        t.set_postfix(test_loss = test_loss / total, test_acc = correct * 100. / total)
        t.update(1)
            
    return correct * 100. / total

### Prepare Data

In [54]:
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = CIFAR10("../data", train = 1, max_num = 100, transform = transform_train)
testset = CIFAR10("../data", train = 0, transform = transform_test)
cifarset = CIFAR10("../data", train = -1, max_num = 60000, transform = transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size = 10, shuffle = True, num_workers = 1)
testloader = torch.utils.data.DataLoader(testset, batch_size = 64, shuffle = False, num_workers = 1)
cifarloader = torch.utils.data.DataLoader(cifarset, batch_size = 64, shuffle = False, num_workers = 1)

### Training

In [55]:
scattering = Scattering2D(J = 2, shape = (32, 32))
K = 81 * 3

net = ResNet18_transform(K)
net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.1, momentum = 0.9, weight_decay = 5e-4)

# Reset the best accuracy so checkpoints are not compared against an earlier run
best_acc = 0

if device == 'cuda':
    scattering = scattering.cuda()
    
for epoch in range(100):
    
    train_epoch_transform(epoch, net, trainloader, optimizer, criterion, scattering)
    
    if epoch % 10 == 0:
        acc = val_epoch_transform(net, testloader, scattering)
        
    if acc > best_acc:
        best_acc = acc
        update_model(net, best_acc, epoch, "resnet18_scattering")

11it [00:00, 11.60it/s, epoch=1, train_acc=13, train_loss=42.7]                        
158it [00:11, 13.67it/s, test_acc=12.9, test_loss=0.0671]                         
11it [00:00, 12.08it/s, epoch=2, train_acc=18, train_loss=64.3]                        
11it [00:00, 12.06it/s, epoch=3, train_acc=6, train_loss=48.9]                        
11it [00:00, 11.93it/s, epoch=4, train_acc=18, train_loss=29.4]                        
11it [00:00, 11.88it/s, epoch=5, train_acc=12, train_loss=26.7]                        
11it [00:00, 12.05it/s, epoch=6, train_acc=12, train_loss=23.4]                        
11it [00:00, 11.98it/s, epoch=7, train_acc=20, train_loss=22.5]                        
11it [00:00, 11.89it/s, epoch=8, train_acc=18, train_loss=21.3]                        
11it [00:00, 11.99it/s, epoch=9, train_acc=26, train_loss=21.2]                        
11it [00:00, 11.34it/s, epoch=10, train_acc=18, train_loss=21.6]                        
11it [00:00, 11.14it/s, epoch=11, tra

11it [00:00, 11.80it/s, epoch=84, train_acc=82, train_loss=6.22]                        
11it [00:00, 11.75it/s, epoch=85, train_acc=77, train_loss=7.02]                        
11it [00:00, 12.05it/s, epoch=86, train_acc=78, train_loss=7.11]                        
11it [00:00, 11.88it/s, epoch=87, train_acc=83, train_loss=5.27]                        
11it [00:00, 11.87it/s, epoch=88, train_acc=87, train_loss=4.41]                        
11it [00:00, 12.03it/s, epoch=89, train_acc=81, train_loss=5.77]                        
11it [00:00, 12.04it/s, epoch=90, train_acc=90, train_loss=2.91]                        
11it [00:00, 12.09it/s, epoch=91, train_acc=87, train_loss=2.66]                        
158it [00:11, 13.57it/s, test_acc=27.7, test_loss=0.0782]                         
11it [00:00, 11.61it/s, epoch=92, train_acc=88, train_loss=3.44]                        
11it [00:00, 12.07it/s, epoch=93, train_acc=81, train_loss=6.98]                        
11it [00:00, 11.43it/s, epo

In [56]:
net.load_state_dict(torch.load("checkpoint/resnet18_scattering.t7")['net'])

train_acc = val_epoch_transform(net, trainloader, scattering)
val_acc = val_epoch_transform(net, testloader, scattering)
cifar_acc = val_epoch_transform(net, cifarloader, scattering)

print("Train Acc: ", train_acc)
print("Val   Acc: ", val_acc)
print("Cifar Acc: ", cifar_acc)

11it [00:00, 13.14it/s, test_acc=89, test_loss=0.0433]                        
158it [00:11, 13.74it/s, test_acc=29.4, test_loss=0.0609]                         
939it [01:08, 14.86it/s, test_acc=29, test_loss=0.0608]                         

Train Acc:  89.0
Val   Acc:  29.43
Cifar Acc:  28.961666666666666





| Model | Number of  epochs  | Train accuracy | Test accuracy | Accuracy on Full Data | Reference github|
|------|------|------|------|------|------|
|  ResNet18 with Scattering  | 100 | 89 | 29.4 | 28.96 | https://www.kymat.io/gallery_2d/cifar.html#sphx-glr-gallery-2d-cifar-py |

# Weak supervision

Weakly supervised techniques make it possible to tackle the scarcity of labeled data. An introduction to those techniques can be found here: https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html.

### Useful Functions

In [66]:
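def log_sum_exp(x, axis = 1):
    # Numerically stable log(sum(exp(x))) along `axis`: subtract the max before exponentiating
    m = torch.max(x, dim = axis, keepdim = True)[0]
    return m.squeeze(axis) + torch.log(torch.sum(torch.exp(x - m), dim = axis))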

class LinearWeightNorm(torch.nn.Module):
    def __init__(self, in_features, out_features, bias=True, weight_scale=None, weight_init_stdv=0.1):
        super(LinearWeightNorm, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * weight_init_stdv)
        if bias:
            self.bias = nn.Parameter(torch.zeros(out_features))
        else:
            self.register_parameter('bias', None)
        if weight_scale is not None:
            assert type(weight_scale) == int
            self.weight_scale = nn.Parameter(torch.ones(out_features, 1) * weight_scale)
        else:
            self.weight_scale = 1 
    def forward(self, x):
        W = self.weight * self.weight_scale / torch.sqrt(torch.sum(self.weight ** 2, dim = 1, keepdim = True))
        return F.linear(x, W, self.bias)
    def __repr__(self):
        return self.__class__.__name__ + '(' \
            + 'in_features=' + str(self.in_features) \
            + ', out_features=' + str(self.out_features) \
            + ', weight_scale=' + str(self.weight_scale) + ')'
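
The max-subtraction in `log_sum_exp` matters because the logits can be large. The following is a plain-Python analogue for illustration only; the name `log_sum_exp_py` is ours, and the notebook itself uses the torch version above.

```python
import math

def log_sum_exp_py(values):
    """Stable log(sum(exp(v))): factor out the maximum before exponentiating."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

logits = [1000.0, 1000.0]
print(log_sum_exp_py(logits))  # 1000 + log(2)

# The naive form overflows for the same input
try:
    math.log(sum(math.exp(v) for v in logits))
except OverflowError as err:
    print("naive version failed:", err)
```

After subtracting the maximum, every exponent is at most 0, so the sum stays between 1 and the number of terms and cannot overflow.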

### Net Structure: GAN-based Model

In [116]:
class ImprovedGAN(object):
    
    def __init__(self, G, D, labeled, unlabeled, test, savedir = 'checkpoint/', batchsize = 64, 
                 lr = 0.001, momentum = 0.5, log_interval = 500, unlabel_weight = 0.1, eval_interval = 10):
        
        self.G = G
        self.D = D
        torch.save(self.G, os.path.join(savedir, 'G.pkl'))
        torch.save(self.D, os.path.join(savedir, 'D.pkl'))
        
        self.savedir = savedir
        self.writer = tensorboardX.SummaryWriter(log_dir = savedir)
        
        self.G.to(device)
        self.D.to(device)
        
        self.labeled = labeled
        self.unlabeled = unlabeled
        self.test = test
        self.log_interval = log_interval
        self.unlabel_weight = unlabel_weight
        self.eval_interval = eval_interval
        
        self.Doptim = optim.Adam(self.D.parameters(), lr=lr, betas= (momentum, 0.999))
        self.Goptim = optim.Adam(self.G.parameters(), lr=lr, betas = (momentum,0.999))
        
        self.use_cuda = True if device == 'cuda' else False
        self.batch_size = batchsize
        
        
    def trainD(self, x_label, y, x_unlabel):
        
        x_label, x_unlabel, y = Variable(x_label), Variable(x_unlabel), Variable(y, requires_grad = False)
        
        if self.use_cuda:
            x_label, x_unlabel, y = x_label.cuda(), x_unlabel.cuda(), y.cuda()
            
        output_label = self.D(x_label, cuda = self.use_cuda)
        output_unlabel = self.D(x_unlabel, cuda = self.use_cuda)
        output_fake = self.D(self.G(x_unlabel.size()[0], self.use_cuda).view(x_unlabel.size()).detach(), cuda = self.use_cuda)
        
        logz_label = log_sum_exp(output_label)
        logz_unlabel = log_sum_exp(output_unlabel)
        logz_fake = log_sum_exp(output_fake) # log ∑e^x_i

        prob_label = torch.gather(output_label, 1, y.unsqueeze(1)) # log e^x_label = x_label 
        loss_supervised = -torch.mean(prob_label) + torch.mean(logz_label)
        loss_unsupervised = 0.5 * (-torch.mean(logz_unlabel) + torch.mean(F.softplus(logz_unlabel))  + # real_data: log Z/(1+Z)
                            torch.mean(F.softplus(logz_fake)) ) # fake_data: log 1/(1+Z)
        loss = loss_supervised + self.unlabel_weight * loss_unsupervised
        acc = torch.mean((output_label.max(1)[1] == y).float())
        self.Doptim.zero_grad()
        loss.backward()
        self.Doptim.step()
        
        return loss_supervised.data.cpu().numpy(), loss_unsupervised.data.cpu().numpy(), acc
    
    def trainG(self, x_unlabel):
        fake = self.G(x_unlabel.size()[0], self.use_cuda).view(x_unlabel.size())
        #fake.retain_grad()
        mom_gen, output_fake = self.D(fake, feature=True, cuda = self.use_cuda)
        mom_unlabel, _ = self.D(Variable(x_unlabel), feature=True, cuda = self.use_cuda)
        mom_gen = torch.mean(mom_gen, dim = 0)
        mom_unlabel = torch.mean(mom_unlabel, dim = 0)
        loss_fm = torch.mean((mom_gen - mom_unlabel) ** 2)
        #loss_adv = -torch.mean(F.softplus(log_sum_exp(output_fake)))
        loss = loss_fm #+ 1. * loss_adv        
        self.Goptim.zero_grad()
        self.Doptim.zero_grad()
        loss.backward()
        self.Goptim.step()
        return loss.data.cpu().numpy()

    def train(self, epochs):
        assert self.unlabeled.__len__() > self.labeled.__len__()
        assert type(self.labeled) == TensorDataset
        times = int(np.ceil(self.unlabeled.__len__() * 1. / self.labeled.__len__()))
        t1 = self.labeled.tensors[0].clone()
        t2 = self.labeled.tensors[1].clone()
        tile_labeled = TensorDataset(t1.repeat(times, 1, 1, 1),t2.repeat(times))
        gn = 0
        for epoch in range(epochs):
            self.G.train()
            self.D.train()
            unlabel_loader1 = DataLoader(self.unlabeled, batch_size = self.batch_size, shuffle=True, drop_last=True, num_workers = 4)
            unlabel_loader2 = DataLoader(self.unlabeled, batch_size = self.batch_size, shuffle=True, drop_last=True, num_workers = 4).__iter__()
            label_loader = DataLoader(tile_labeled, batch_size = self.batch_size, shuffle=True, drop_last=True, num_workers = 4).__iter__()
            loss_supervised = loss_unsupervised = loss_gen = accuracy = 0.
            batch_num = 0
            for (unlabel1, _label1) in unlabel_loader1:
                #pdb.set_trace()
                batch_num += 1
                unlabel2, _label2 = next(unlabel_loader2)
                x, y = next(label_loader)
                if self.use_cuda:
                    x, y, unlabel1, unlabel2 = x.cuda(), y.cuda(), unlabel1.cuda(), unlabel2.cuda()
                ll, lu, acc = self.trainD(x, y, unlabel1)
                loss_supervised += ll
                loss_unsupervised += lu
                accuracy += acc
                lg = self.trainG(unlabel2)
                if epoch > 1 and lg > 1:
                    #pdb.set_trace()
                    lg = self.trainG(unlabel2)
                loss_gen += lg
                if (batch_num + 1) % self.log_interval == 0:
                    #print('Training: %d / %d' % (batch_num + 1, len(unlabel_loader1)))
                    gn += 1
                    self.writer.add_scalars('loss', {'loss_supervised':ll, 'loss_unsupervised':lu, 'loss_gen':lg}, gn)
                    # Histograms are for logging only, so no gradients are tracked
                    with torch.no_grad():
                        real_feature = self.D(x, cuda=self.use_cuda, feature = True)[0]
                        fake_feature = self.D(self.G(self.batch_size, cuda = self.use_cuda), cuda=self.use_cuda, feature = True)[0]
                    self.writer.add_histogram('real_feature', real_feature, gn)
                    self.writer.add_histogram('fake_feature', fake_feature, gn)
                    self.writer.add_histogram('fc3_bias', self.G.fc3.bias, gn)
                    self.writer.add_histogram('D_feature_weight', self.D.layers[-1].weight, gn)
                    #self.writer.add_histogram('D_feature_bias', self.D.layers[-1].bias, gn)
                    #print('Eval: correct %d/%d, %.4f' % (self.eval(), self.test.__len__(), acc))
                    self.D.train()
                    self.G.train()
            loss_supervised /= batch_num
            loss_unsupervised /= batch_num
            loss_gen /= batch_num
            accuracy /= batch_num
            print("Iteration %d, loss_supervised = %.4f, loss_unsupervised = %.4f, loss_gen = %.4f train acc = %.4f" % (epoch, loss_supervised, loss_unsupervised, loss_gen, accuracy))
            sys.stdout.flush()
            if (epoch + 1) % self.eval_interval == 0:
                print("Eval: correct %d / %d"  % (self.eval(), self.test.__len__()))
                torch.save(self.G, os.path.join(self.savedir, 'G.pkl'))
                torch.save(self.D, os.path.join(self.savedir, 'D.pkl'))
                

    def predict(self, x):
        # `volatile` no longer has any effect in PyTorch; torch.no_grad() disables gradient tracking instead
        with torch.no_grad():
            return torch.max(self.D(x, cuda=self.use_cuda), 1)[1]
    
    def eval(self):
        """Return the number of correctly classified test samples."""
        self.G.eval()
        self.D.eval()
        d, l = [], []
        for (datum, label) in self.test:
            d.append(datum)
            l.append(label)
        # The whole test set is classified as a single batch.
        x, y = torch.stack(d), torch.LongTensor(l)
        if self.use_cuda:
            x, y = x.cuda(), y.cuda()
        pred = self.predict(x)
        return torch.sum(pred == y).item()
    
    def draw(self, batch_size):
        self.G.eval()
        return self.G(batch_size, cuda=self.use_cuda)

In [109]:
class Discriminator(nn.Module):
    def __init__(self, input_dim = 3 * 32 ** 2, output_dim = 10):
        super(Discriminator, self).__init__()
        self.input_dim = input_dim
        self.layers = torch.nn.ModuleList([
            LinearWeightNorm(input_dim, 1000),
            LinearWeightNorm(1000, 500),
            LinearWeightNorm(500, 250),
            LinearWeightNorm(250, 250),
            LinearWeightNorm(250, 250)]
        )
        self.final = LinearWeightNorm(250, output_dim, weight_scale=1)
        #for layer in self.layers:
        #    reset_normal_param(layer, 0.1)
        #reset_normal_param(self.final, 0.1, 5)
    def forward(self, x, feature = False, cuda = False):
        # Flatten the images; Gaussian noise is added to the input and to every
        # hidden activation in training mode only. `cuda` is kept for interface
        # compatibility: torch.randn_like already samples on x's device.
        x = x.view(-1, self.input_dim)
        if self.training:
            x = x + torch.randn_like(x) * 0.3
        for m in self.layers:
            x_f = F.relu(m(x))
            if self.training:
                x = x_f + torch.randn_like(x_f) * 0.5
            else:
                x = x_f
        if feature:
            # x_f is the last hidden activation before noise is added
            return x_f, self.final(x)
        return self.final(x)
    
class Generator(nn.Module):
    def __init__(self, z_dim, output_dim = 3 * 32 ** 2):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        self.fc1 = nn.Linear(z_dim, 500, bias = False)
        self.bn1 = nn.BatchNorm1d(500, affine = False, eps=1e-6, momentum = 0.5)
        self.fc2 = nn.Linear(500, 500, bias = False)
        self.bn2 = nn.BatchNorm1d(500, affine = False, eps=1e-6, momentum = 0.5)
        self.fc3 = LinearWeightNorm(500, output_dim, weight_scale = 1)
        self.bn1_b = nn.Parameter(torch.zeros(500))
        self.bn2_b = nn.Parameter(torch.zeros(500))
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.xavier_uniform_(self.fc2.weight)
        #reset_normal_param(self.fc1, 0.1)
        #reset_normal_param(self.fc2, 0.1)
        #reset_normal_param(self.fc3, 0.1)
    def forward(self, batch_size, cuda = False):
        # Sample latent codes z ~ U[0, 1)^z_dim and map them to flattened images.
        x = torch.rand(batch_size, self.z_dim)
        if cuda:
            x = x.cuda()
        x = F.softplus(self.bn1(self.fc1(x)) + self.bn1_b)
        x = F.softplus(self.bn2(self.fc2(x)) + self.bn2_b)
        x = F.softplus(self.fc3(x))
        return x
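The `Discriminator` above outputs only `output_dim = 10` class logits and has no explicit "fake" output. In the Improved GAN formulation (Salimans et al., 2016), the probability that an input is real is recovered from these logits as $D(x) = Z(x) / (Z(x) + 1)$ with $Z(x) = \sum_k \exp(l_k(x))$. The sketch below illustrates that formula in plain Python. `log_sum_exp` and `prob_real` are hypothetical helper names, and it is an assumption (not checked in this section) that the training loop of this notebook uses the same convention.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum_k exp(l_k))."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))


def prob_real(logits):
    """D(x) = Z / (Z + 1) with Z = sum_k exp(l_k), i.e. sigmoid(log Z)."""
    log_z = log_sum_exp(logits)
    # Branch so that math.exp never receives a large positive argument.
    if log_z >= 0:
        return 1.0 / (1.0 + math.exp(-log_z))
    z = math.exp(log_z)
    return z / (1.0 + z)


# Ten zero logits give Z = 10, so D(x) = 10 / 11.
print(prob_real([0.0] * 10))
```

With this convention, confident class logits push $D(x)$ towards 1, while uniformly low logits push it towards 0.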

### Prepare Data

In [112]:
transform_train = transforms.Compose([
    transforms.Resize(36),
    transforms.RandomCrop(32, padding = 4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
#    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
#    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

In [113]:
def CIFAR10Label(transformer, class_num = 10):
    """Build a small labeled set with at most `class_num` images per class."""
    raw_dataset = CIFAR10("../data", train = 1, max_num = 100, transform = transformer)
    class_tot = [0] * 10  # number of images kept so far for each class
    data = []
    labels = []
    tot = 0
    # Visit the images in random order and keep them until every class is full.
    perm = np.random.permutation(len(raw_dataset))
    for i in range(len(raw_dataset)):
        datum, label = raw_dataset[perm[i]]
        if class_tot[label] < class_num:
            data.append(datum.numpy())
            labels.append(label)
            class_tot[label] += 1
            tot += 1
            if tot >= 10 * class_num:
                break
    return TensorDataset(torch.FloatTensor(np.array(data)), torch.LongTensor(np.array(labels)))

labelset = CIFAR10Label(transformer = transform_train)
unlabelset = CIFAR10("../data", train = -1, max_num = 50000, transform = transform_train)
testset = CIFAR10("../data", train = 0, transform = transform_test)
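The labeled set above is built by visiting the images in random order and keeping each one until its class quota is reached. A minimal, framework-free sketch of the same selection logic is given below; `balanced_indices` is a hypothetical helper and the synthetic `labels` list only stands in for the real CIFAR10 labels. Selecting indices rather than already-transformed tensors would also let random crops and flips be redrawn at every epoch, assuming the custom `CIFAR10` class applies `transform` inside `__getitem__` as torchvision datasets do.

```python
import random


def balanced_indices(labels, per_class, num_classes=10, seed=0):
    """Pick up to `per_class` indices for each class, visiting samples in random order."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    counts = [0] * num_classes
    picked = []
    for i in order:
        c = labels[i]
        if counts[c] < per_class:
            picked.append(i)
            counts[c] += 1
            if len(picked) == per_class * num_classes:
                break  # every class quota is filled
    return picked


# Synthetic stand-in for the dataset labels: 1000 samples, 100 per class.
labels = [i % 10 for i in range(1000)]
idx = balanced_indices(labels, per_class=10)
print(len(idx))
```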

In [117]:
gan = ImprovedGAN(Generator(200), Discriminator(), labelset, unlabelset, testset, eval_interval = 5)

In [118]:
gan.train(100)



Iteration 0, loss_supervised = 0.4802, loss_unsupervised = 0.6779, loss_gen = 0.2459 train acc = 0.8520
Iteration 1, loss_supervised = 0.0313, loss_unsupervised = 0.5754, loss_gen = 0.1850 train acc = 0.9907
Iteration 2, loss_supervised = 0.0053, loss_unsupervised = 0.5510, loss_gen = 0.1815 train acc = 0.9987
Iteration 3, loss_supervised = 0.1075, loss_unsupervised = 0.6064, loss_gen = 0.1511 train acc = 0.9695
Iteration 4, loss_supervised = 0.0033, loss_unsupervised = 0.5957, loss_gen = 0.1366 train acc = 0.9997
Eval: correct 1536 / 10000
Iteration 5, loss_supervised = 0.1382, loss_unsupervised = 0.6483, loss_gen = 0.1351 train acc = 0.9683
Iteration 6, loss_supervised = 0.0027, loss_unsupervised = 0.5964, loss_gen = 0.1584 train acc = 0.9998
Iteration 7, loss_supervised = 0.0119, loss_unsupervised = 0.5984, loss_gen = 0.1369 train acc = 0.9971
Iteration 8, loss_supervised = 0.0023, loss_unsupervised = 0.5725, loss_gen = 0.1706 train acc = 0.9996
Iteration 9, loss_supervised = 0.0010, loss_unsupervised = 0.5666, loss_gen = 0.1811 train acc = 1.0000
Eval: correct 1501 / 10000
Iteration 10, loss_supervised = 0.0074, loss_unsupervised = 0.5666, loss_gen = 0.1619 train acc = 0.9981
Iteration 11, loss_supervised = 0.0308, loss_unsupervised = 0.5872, loss_gen = 0.1793 train acc = 0.9920
Iteration 12, loss_supervised = 0.0006, loss_unsupervised = 0.5544, loss_gen = 0.2052 train acc = 1.0000
Iteration 13, loss_supervised = 0.0750, loss_unsupervised = 0.5994, loss_gen = 0.2482 train acc = 0.9804
Iteration 14, loss_supervised = 0
[...]
Eval: correct 1379 / 10000
Iteration 80, loss_supervised = 0.0001, loss_unsupervised = 0.5588, loss_gen = 1.2184 train acc = 1.0000
Iteration 81, loss_supervised = 0.0001, loss_unsupervised = 0.5609, loss_gen = 1.1777 train acc = 1.0000
Iteration 82, loss_supervised = 0.0179, loss_unsupervised = 0.5838, loss_gen = 1.0740 train acc = 0.9950
Iteration 83, loss_supervised = 0.0001, loss_unsupervised = 0.5714, loss_gen = 1.1056 train acc = 1.0000
Iteration 84, loss_supervised = 0.0001, loss_unsupervised = 0.5621, loss_gen = 1.2377 train acc = 1.0000
Eval: correct 1356 / 10000
Iteration 85, loss_supervised = 0.0001, loss_unsupervised = 0.5560, loss_gen = 1.2378 train acc = 1.0000
Iteration 86, loss_supervised = 0.0001, loss_unsupervised = 0.5628, loss_gen = 1.1886 train acc = 1.0000
Iteration 87, loss_supervised = 0.0001, loss_unsupervised = 0.5615, loss_gen = 1.1944 train acc = 1.0000
Iteration 88, loss_supervised = 0.0001, loss_unsupervised = 0.5633, loss_gen = 1.1822 train acc = 1.0000
[...]
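The `Eval:` lines in the log above can be converted into test accuracies with a few lines of Python, which is convenient when preparing a summary table or plot. This is a minimal sketch: `eval_accuracies` is a hypothetical helper, and `log_text` contains only lines copied from the output above.

```python
import re

# Lines copied from the training log above.
log_text = """\
Iteration 4, loss_supervised = 0.0033, loss_unsupervised = 0.5957, loss_gen = 0.1366 train acc = 0.9997
Eval: correct 1536 / 10000
Iteration 9, loss_supervised = 0.0010, loss_unsupervised = 0.5666, loss_gen = 0.1811 train acc = 1.0000
Eval: correct 1501 / 10000
Eval: correct 1379 / 10000
Eval: correct 1356 / 10000
"""

EVAL_RE = re.compile(r"Eval: correct (\d+) / (\d+)")


def eval_accuracies(text):
    """Return the test accuracy (in percent) of every 'Eval:' line, in order."""
    return [100.0 * int(correct) / int(total) for correct, total in EVAL_RE.findall(text)]


accs = eval_accuracies(log_text)
print(accs)
```

On these lines the test accuracy decreases over training while the training accuracy stays near 100%.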

### Test on whole dataset

In [125]:
cifar_set = CIFAR10("../data", train = -1, max_num = 60000, transform = transform_test)
cifarloader = torch.utils.data.DataLoader(cifar_set, batch_size = 64, shuffle = False, num_workers = 1)

In [136]:
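def val_epoch_gan(net, testloader, use_cuda):
    """Evaluate `net` on `testloader` and return the accuracy in percent.

    Expects the globals `device` and `criterion` to be defined in the notebook.
    """
    net.eval()  # make sure no noise is injected in Discriminator.forward
    test_loss = 0
    correct = 0
    total = 0

    with trange(len(testloader), file=sys.stderr) as t:
        with torch.no_grad():
            for batch_idx, (inputs, targets) in enumerate(testloader):
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = net(inputs, cuda=use_cuda)
                loss = criterion(outputs, targets)

                # loss.item() is a per-batch mean, so test_loss / total below
                # is not a per-sample mean loss.
                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()
                t.update()

        t.set_postfix(test_loss = test_loss / total, test_acc = correct * 100. / total)
        t.update(1)

    return correct * 100. / total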

In [137]:
criterion = nn.CrossEntropyLoss()
test_score = val_epoch_gan(gan.D, cifarloader, gan.use_cuda)

939it [00:07, 123.11it/s, test_acc=15.4, test_loss=0.0678]                         
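`nn.CrossEntropyLoss()` averages over the batch by default, so `test_loss / total` in `val_epoch_gan` divides a sum of per-batch means by the number of samples, which shrinks the reported loss by roughly the batch size (64 here). The accuracy is not affected. A minimal sketch of a sample-weighted average is shown below; `sample_weighted_mean` is a hypothetical helper and the numbers are made up for illustration only.

```python
def sample_weighted_mean(batch_means, batch_sizes):
    """Average per-batch mean losses, weighting each batch by its size."""
    total = sum(batch_sizes)
    return sum(m * n for m, n in zip(batch_means, batch_sizes)) / total


# Hypothetical values: two batches, the last one smaller than the first.
batch_means = [2.0, 4.0]
batch_sizes = [64, 32]

naive = sum(batch_means) / sum(batch_sizes)  # what `test_loss / total` computes
weighted = sample_weighted_mean(batch_means, batch_sizes)
print(naive, weighted)
```

In `val_epoch_gan` this would correspond to accumulating `loss.item() * targets.size(0)` instead of `loss.item()`.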


__(Open) Question 10:__ Pick a weakly supervised method that will potentially use $\mathcal{X}\cup\mathcal{X}_{\text{train}}$ to train a representation (a subset of $\mathcal{X}$ is also fine). Evaluate it and report the accuracies. You should be careful in the choice of your method, in order to avoid heavy computational effort.

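> We trained a GAN on both labeled and unlabeled images; the results are shown below. The model fits the labeled images perfectly (100% training accuracy), but its test accuracy stays only slightly above the 10% chance level, so it overfits heavily. One disadvantage of the model is that neither the discriminator nor the generator contains convolutional layers.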

| Model | Number of epochs | Train accuracy (%) | Test accuracy (%) | Accuracy on Full Data (%) | Reference GitHub |
|------|------|------|------|------|------|
| Weakly supervised learning (ImprovedGAN) | 100 | 100 | 16.20 | 15.40 | https://github.com/Sleepychord/ImprovedGAN-pytorch |
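Since the answer above points to the lack of convolutional layers as a weakness, here is a minimal sketch of what a convolutional discriminator with the same `forward(x, feature, cuda)` interface could look like. `ConvDiscriminator` is a hypothetical class that was not trained or evaluated in this notebook, and its layer sizes are arbitrary assumptions. It would not be a drop-in replacement as is: the trainer logs `self.D.layers[-1].weight`, which this class does not define.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvDiscriminator(nn.Module):
    """Hypothetical convolutional counterpart of the MLP `Discriminator`."""

    def __init__(self, output_dim=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.final = nn.Linear(128, output_dim)

    def forward(self, x, feature=False, cuda=False):
        # `cuda` is accepted only for signature compatibility and is unused.
        # Reshape so that flattened generator outputs are accepted as well.
        x = x.view(-1, 3, 32, 32)
        x = F.max_pool2d(F.leaky_relu(self.conv1(x), 0.2), 2)  # 32x32 -> 16x16
        x = F.max_pool2d(F.leaky_relu(self.conv2(x), 0.2), 2)  # 16x16 -> 8x8
        x = F.leaky_relu(self.conv3(x), 0.2)
        x_f = x.mean(dim=(2, 3))  # global average pooling -> (N, 128)
        logits = self.final(x_f)
        if feature:
            return x_f, logits
        return logits


# Shape check on a dummy batch of flattened images.
d = ConvDiscriminator()
print(d(torch.zeros(4, 3 * 32 ** 2)).shape)
```

Convolutions on a CPU-only budget are slower than the linear layers used above, so the channel widths would likely need tuning for the constraints of this project.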

# Conclusions

__Question 11:__ Write a short report explaining the pros and cons of each method that you implemented. 25% of the grade of this project will correspond to this question, so it should be done carefully. In particular, please add a plot that summarizes all your numerical results.