# Homework 2, *part 2* (60 points)

In this assignment you will build a heavy convolutional neural net (CNN) to classify Tiny ImageNet images. Aim for the highest accuracy you can.

## Deliverables

* This file.
* A "checkpoint file" created with `torch.save(model.state_dict(), ...)` that contains the model's weights, which a TA should be able to load to verify your accuracy.
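
Saving and restoring a checkpoint can be sketched as follows. This is a minimal illustration, not a required procedure: the tiny `torch.nn.Linear` model and the `checkpoint.pth` file name are hypothetical stand-ins for your own network and path.

```python
import os
import tempfile

import torch

# Hypothetical stand-in for the trained network.
model = torch.nn.Linear(4, 2)

# Hypothetical path; the assignment does not prescribe a file name.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(model.state_dict(), checkpoint_path)

# To verify the result, rebuild the same architecture and load the weights into it.
restored = torch.nn.Linear(4, 2)
restored.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
restored.eval()  # switch dropout / batch norm to inference behavior
```

Because only the weights are stored, the code that defines the architecture must be available wherever the checkpoint is loaded.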

## Grading

* 9 points for reproducible training code and a completed report (see below).
* 12 points for building a network that gets above 20% accuracy.
* 6.5 points for beating each of these milestones on the validation set:
  * 25.0%
  * 30.0%
  * 32.5%
  * 35.0%
  * 37.5%
  * 40.0%
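
The rubric above is additive, so a total can be tallied as in the sketch below. It assumes that "beating" a milestone means strictly exceeding it; `homework_score` and its parameters are hypothetical names used only for illustration.

```python
def homework_score(accuracy_percent, has_code_and_report=True):
    """Tally points under the rubric above (strict inequalities assumed)."""
    milestones = [25.0, 30.0, 32.5, 35.0, 37.5, 40.0]
    score = 9.0 if has_code_and_report else 0.0
    if accuracy_percent > 20.0:
        score += 12.0
    # 6.5 points for every milestone that the accuracy exceeds
    score += 6.5 * sum(accuracy_percent > m for m in milestones)
    return score

print(homework_score(41.0))  # 9 + 12 + 6 * 6.5 = 60.0
print(homework_score(33.0))  # 9 + 12 + 3 * 6.5 = 40.5
```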
    
## Restrictions

* Don't use pretrained networks.

## Tips

* One change at a time: never test several new things at once.
* Google a lot.
* Use a GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation.
* Use TensorBoard ([non-Colab](https://github.com/lanpa/tensorboardX) or [Colab](https://medium.com/@tommytao_54597/use-tensorboard-in-google-colab-16b4bb9812a6)) or a similar interactive tool for viewing progress.

In [1]:
# import subprocess
# import sys

# subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'torchsummary'])

In [11]:
import numpy as np
import torchvision
import torch
from torchvision import transforms
from torchsummary import summary
from torch.autograd import Variable
import time
import tiny_imagenet
import matplotlib.pyplot as plt
%matplotlib inline

# https://raw.githubusercontent.com/NikolayKozyrskiy/nla-project/blob/master/Article.pdf

In [3]:
tiny_imagenet.download(".")

./tiny-imagenet-200 already exists, not downloading


Training and validation images are now in `tiny-imagenet-200/train` and `tiny-imagenet-200/val`.

In [4]:
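# Labeled training images
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transforms.ToTensor())
# Official validation images; note that the name test_dataset is rebound by the second split below
test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', transform=transforms.ToTensor())
# Hold out 20000 of the training images
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])
# Split the held-out images evenly into test and validation subsets
test_dataset, val_dataset = torch.utils.data.random_split(val_dataset, [10000, 10000])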
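
Since the grading asks for reproducible training code, it may help to make the random split deterministic. The sketch below shows one way, passing a seeded `torch.Generator` to `random_split`; the toy dataset and the seed value 0 are assumptions made for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy dataset standing in for the image folder.
toy_dataset = TensorDataset(torch.arange(10))

# Two splits drawn with identically seeded generators select the same indices.
first_train, first_val = random_split(
    toy_dataset, [8, 2], generator=torch.Generator().manual_seed(0))
second_train, second_val = random_split(
    toy_dataset, [8, 2], generator=torch.Generator().manual_seed(0))

print(first_val.indices == second_val.indices)  # True
```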

In [5]:
batch_size = 50
train_batch_gen = torch.utils.data.DataLoader(train_dataset,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)
# shuffling is unnecessary for evaluation
val_batch_gen = torch.utils.data.DataLoader(val_dataset,
                                            batch_size=batch_size,
                                            shuffle=False,
                                            num_workers=1)

In [7]:
# a special module that converts [batch, channel, h, w] to [batch, units]
class Flatten(torch.nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)
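
Recent PyTorch versions also ship a built-in `torch.nn.Flatten` layer. The short check below, run on a random batch shaped like Tiny ImageNet images, suggests that it behaves the same as the custom module for this use case.

```python
import torch

class Flatten(torch.nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

x = torch.randn(2, 3, 64, 64)  # [batch, channel, h, w]
custom = Flatten()(x)
builtin = torch.nn.Flatten()(x)  # flattens every dimension after the batch one

print(custom.shape)                  # torch.Size([2, 12288])
print(torch.equal(custom, builtin))  # True
```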

In [9]:
n_classes = 200
model = torch.nn.Sequential()
# reshape from "images" to flat vectors
model.add_module('flatten', Flatten())
# dense "head"; a ReLU follows each hidden layer, because stacked linear
# layers without nonlinearities collapse into a single linear map
model.add_module('dense1', torch.nn.Linear(3 * 64 * 64, 1064))
model.add_module('dense1_relu', torch.nn.ReLU())
model.add_module('dense2', torch.nn.Linear(1064, 512))
model.add_module('dense2_relu', torch.nn.ReLU())
model.add_module('dropout0', torch.nn.Dropout(0.05))
model.add_module('dense3', torch.nn.Linear(512, 256))
model.add_module('dense3_relu', torch.nn.ReLU())
model.add_module('dropout1', torch.nn.Dropout(0.05))
model.add_module('dense4', torch.nn.Linear(256, 64))
model.add_module('dense4_relu', torch.nn.ReLU())
model.add_module('dropout2', torch.nn.Dropout(0.05))
model.add_module('logits', torch.nn.Linear(64, n_classes)) # logits for 200 classes
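
The model above is fully connected, whereas the assignment asks for a CNN. One possible starting point is sketched below. The channel widths, dropout rate and depth are illustrative assumptions, and no accuracy is claimed for this architecture.

```python
import torch

n_classes = 200
cnn = torch.nn.Sequential(
    # 3 x 64 x 64 -> 32 x 32 x 32
    torch.nn.Conv2d(3, 32, kernel_size=3, padding=1),
    torch.nn.BatchNorm2d(32),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    # 32 x 32 x 32 -> 64 x 16 x 16
    torch.nn.Conv2d(32, 64, kernel_size=3, padding=1),
    torch.nn.BatchNorm2d(64),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    # 64 x 16 x 16 -> 128 x 8 x 8
    torch.nn.Conv2d(64, 128, kernel_size=3, padding=1),
    torch.nn.BatchNorm2d(128),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    # classifier head on the flattened feature map
    torch.nn.Flatten(),
    torch.nn.Dropout(0.3),
    torch.nn.Linear(128 * 8 * 8, n_classes),
)

# Shape check on a dummy batch of two images.
cnn.eval()
with torch.no_grad():
    logits = cnn(torch.zeros(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 200])
```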

In [10]:
import torch.nn.functional as F

# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def compute_loss(X_batch, y_batch):
    logits = model(X_batch.to(device))
    # cross_entropy already averages over the batch
    return F.cross_entropy(logits, y_batch.to(device))

In [14]:
lr = 0.01
num_epochs = 50

opt = torch.optim.SGD(model.parameters(), lr=lr)
train_loss = []
val_accuracy = []
for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.item())
    
    model.train(False) # disable dropout / use averages for batch_norm
    with torch.no_grad(): # no gradients are needed for evaluation
        for X_batch, y_batch in val_batch_gen:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            val_accuracy.append((y_batch == y_pred).float().mean().item())
    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

In [None]:
# Your code here

When everything is done, please compute accuracy on the validation set and report it below.

In [11]:
val_accuracy = # Your code here
print("Validation accuracy: %.2f%%" % (val_accuracy * 100))
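
One way to compute the accuracy requested above is to count correct predictions over the whole loader, which stays exact even when the last batch is smaller than the others. The helper `evaluate_accuracy` and the toy model and data below are hypothetical and serve only to make the sketch self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, batch_gen, device="cpu"):
    """Return the fraction of correctly classified examples in batch_gen."""
    model.eval()  # disable dropout / use running averages for batch norm
    correct, total = 0, 0
    with torch.no_grad():
        for X_batch, y_batch in batch_gen:
            logits = model(X_batch.to(device))
            correct += (logits.argmax(dim=1).cpu() == y_batch).sum().item()
            total += y_batch.size(0)
    return correct / total

# Toy check: a model that always predicts class 0 on labels [0, 0, 1, 1].
toy_model = torch.nn.Linear(3, 2)
with torch.no_grad():
    toy_model.weight.zero_()
    toy_model.bias.copy_(torch.tensor([1.0, 0.0]))
toy_loader = DataLoader(
    TensorDataset(torch.randn(4, 3), torch.tensor([0, 0, 1, 1])), batch_size=3)

print(evaluate_accuracy(toy_model, toy_loader))  # 0.5
```

In this notebook the call would look like `evaluate_accuracy(model, val_batch_gen, device)`, where `device` is whichever device the model lives on.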

# Report

Below, please mention:

* a brief history of tweaks and improvements;
* the final architecture and why you chose it;
* the training method (batch size, optimization algorithm, ...) and why;
* any regularization and other techniques applied, and their effects.

The reference format is:

*"I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such".*