# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve Tiny ImageNet image classification.

This notebook is intended as a sequel to seminar 3; please give that one a try if you haven't done so yet.

(please read it at least diagonally)

* The ultimate quest is to create a network with as high an __accuracy__ as you can push it.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it in while you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in a report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on __TEST__ dataset:
    * 25% (50% points)
    * 30% (60% points)
    * 32.5% (70% points)
    * 35% (80% points)
    * 37.5% (90% points)
    * 40% (full points)
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 40%.
 * In other words, base milestones must be beaten without pre-trained nets (and such a net must be present in the anytask attachments). After that, you can use whatever you want.
* You __can't__ do anything with validation data apart from running the evaluation procedure. Please split the train images into train and validation parts.

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop when validation score is 10 iterations past maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add some L2 weight norm to the loss function, PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
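
The stopping rule above ("stop when validation score is 10 iterations past maximum") can be sketched as a small helper. This is a minimal illustration, not part of the assignment code: the class name `EarlyStopper` and the `patience` parameter are assumptions, and the scores in the usage part are made up.

```python
class EarlyStopper:
    """Tracks validation scores and signals when to stop training.

    `patience` (hypothetical name) is the number of epochs we allow
    to pass without a new best validation score.
    """

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.epoch = -1

    def update(self, val_score):
        """Record one epoch's validation score; return True if training should stop."""
        self.epoch += 1
        if val_score > self.best_score:
            self.best_score = val_score
            self.best_epoch = self.epoch
        return self.epoch - self.best_epoch >= self.patience


# Usage with made-up validation accuracies (not real results)
stopper = EarlyStopper(patience=3)
scores = [10.0, 12.0, 11.5, 11.9, 11.0, 13.0]
stopped_at = None
for i, score in enumerate(scores):
    if stopper.update(score):
        stopped_at = i
        break
```

With a small `patience` the loop may quit just before a late improvement (as with the last made-up score here), which is one reason to keep the patience reasonably large for slowly converging networks.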
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take long without GPU. That's okay.
     * If you are CPU-only, we still recommend that you try a simple convolutional architecture
     * a perfect option is to set it up to run overnight and check on it in the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely an overkill.
     * __To reduce computation__ time by a factor in exchange for some accuracy drop, try using __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the default (stride=1) one.
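
The "roughly 1/4" estimate for a stride=2 convolution follows from the output size: halving each spatial side leaves a quarter of the output positions. A rough sketch of that arithmetic is given here; the helper names `conv_output_size` and `conv_macs` and the layer sizes are assumptions chosen for illustration, and the count ignores bias terms and memory traffic.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(in_ch, out_ch, size, kernel, stride=1, padding=0):
    """Approximate multiply-accumulate count for a square 2D convolution."""
    out = conv_output_size(size, kernel, stride, padding)
    return out * out * in_ch * out_ch * kernel * kernel


# 64x64 RGB input, 3x3 kernel, padding=1, 16 output channels (illustrative numbers)
macs_s1 = conv_macs(3, 16, 64, 3, stride=1, padding=1)
macs_s2 = conv_macs(3, 16, 64, 3, stride=2, padding=1)
print(macs_s1, macs_s2, macs_s2 / macs_s1)
```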
 
   
### Data augmentation
   * Getting a 5x larger dataset for free is a great deal
     * Zoom-in+slice = move
     * Rotate+zoom(to remove black stripes)
     * Add noise (gaussian or bernoulli)
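   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize``` (removed in SciPy 1.3; on newer versions, `scipy.ndimage.rotate` and PIL's `Image.resize` are possible substitutes)
     * and a little slicing
     * Other cool libraries: cv2, skimage, PIL/Pillow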
   * A more advanced way is to use torchvision transforms:
   
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(64, padding=4),  # Tiny ImageNet images are 64x64
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.ImageFolder(root=path_to_tiny_imagenet, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
    ```
   
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * [Albumentations](https://github.com/albumentations-team/albumentations) is another awesome solution.
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.  
    * But sometimes there is! Some examples of advanced image augmentation approaches: [mixup](https://arxiv.org/pdf/1710.09412.pdf), [cutmix](https://arxiv.org/pdf/1905.04899.pdf)   
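
The simple augmentations listed above (a shift via pad-and-crop, a horizontal flip, additive Gaussian noise) can be sketched in plain NumPy. This is only an illustrative sketch: the function name `augment`, its parameters and their default values are assumptions, and it expects a float image in `[0, 1]` with shape `(H, W, C)`.

```python
import numpy as np


def augment(image, rng, max_shift=4, noise_std=0.02):
    """Randomly shift, flip and add noise to one HxWxC float image in [0, 1]."""
    h, w, _ = image.shape

    # Random shift: pad with edge pixels, then crop a window of the original size.
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad, mode="edge")
    top = rng.integers(0, 2 * max_shift + 1)
    left = rng.integers(0, 2 * max_shift + 1)
    out = padded[top:top + h, left:left + w]

    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Additive Gaussian noise, clipped back to the valid range.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # random stand-in for a Tiny ImageNet image
augmented = augment(image, rng)
print(augmented.shape)
```

Augmentation like this is meant for training batches only; validation and test images are normally left untouched so that the scores stay comparable between runs.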

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import torch, torch.nn as nn, torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision, torchvision.transforms as transforms

# Uncomment this to disable "Skipping walk through <class 'list'>" warnings in DataSphere's env
# %enable_full_walk

In [2]:
!wget https://raw.githubusercontent.com/yandexdataschool/Practical_DL/spring21/homework02/tiny_img.py

--2021-10-04 16:17:06--  https://raw.githubusercontent.com/yandexdataschool/Practical_DL/spring21/homework02/tiny_img.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4422 (4.3K) [text/plain]
Saving to: ‘tiny_img.py’


2021-10-04 16:17:06 (71.2 MB/s) - ‘tiny_img.py’ saved [4422/4422]



In [3]:
# downloading TinyImagenet
# you don't have to run this cell more than once

from tiny_img import download_tinyImg200, fix_test_data
data_path = '.'
download_tinyImg200(data_path)
fix_test_data(data_path)

./tiny-imagenet-200.zip


We will split `tiny-imagenet-200/train` dataset into train and val parts, and use  `tiny-imagenet-200/val` dataset as a test one.

You are free to use either the default ImageFolder Dataset or the custom one, which reads and stores the whole dataset in RAM. The second one is preferable only when you have a slow disk; in that case make sure you have a couple of extra GiB of memory (loading the images can also take some time):

In [4]:
import os
imagenet_dir = os.path.join(data_path, 'tiny-imagenet-200')
dataset = torchvision.datasets.ImageFolder(imagenet_dir + '/train', transform=transforms.ToTensor())
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000],
                                                           generator=torch.Generator().manual_seed(42))
test_dataset = torchvision.datasets.ImageFolder(imagenet_dir + '/val', transform=transforms.ToTensor())

# OR

# from tiny_img_ram import TinyImagenetRAM
# dataset = TinyImagenetRAM('tiny-imagenet-200/train', transform=transforms.ToTensor())
# train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000],
#                                                            generator=torch.Generator().manual_seed(42))
# test_dataset = TinyImagenetRAM('tiny-imagenet-200/val', transform=transforms.ToTensor())

In [5]:
batch_size = 50
train_batch_gen = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=1
)

val_batch_gen = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=1
)

In [6]:
class ImageClassifierLinear(nn.Module):
  def __init__(self):
    super(ImageClassifierLinear, self).__init__()
    self.fc1 = nn.Linear(3 * 64 * 64, 1024)
    self.fc2 = nn.Linear(1024, 512)
    self.fc3 = nn.Linear(512, 256)
    self.fc4 = nn.Linear(256, 64)
    self.fc5 = nn.Linear(64, 200)
  
  def forward(self, x):
    x = torch.flatten(x, 1)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = F.relu(self.fc3(x))
    x = F.relu(self.fc4(x))
    x = self.fc5(x)
    return x

if torch.cuda.is_available():
  device = torch.device('cuda')
else:
  device = torch.device('cpu')


modelLinear = ImageClassifierLinear().to(device)

In [46]:
def compute_loss(model, X_batch, y_batch):
    # batches from the DataLoader are already tensors; just move them to the model's device
    X_batch = X_batch.to(device)
    y_batch = y_batch.to(device)
    logits = model(X_batch)
    return F.cross_entropy(logits, y_batch)

In [8]:
import numpy as np
import time 

opt = torch.optim.Adam(modelLinear.parameters(), lr=0.001)

train_loss = []
val_accuracy = []
val_accuracy_by_epochs = []

epoch = 0
while True:
  epoch += 1
  start_time = time.time()
  modelLinear.train(True)
  for X_batch, y_batch in train_batch_gen:
    loss = compute_loss(modelLinear, X_batch, y_batch)
    loss.backward()
    opt.step()
    opt.zero_grad()
    train_loss.append(loss.cpu().data.numpy())

  modelLinear.train(False)
  for X_batch, y_batch in val_batch_gen:
    logits = modelLinear(X_batch.to(device))
    y_pred = logits.max(1)[1].data
    val_accuracy.append(np.mean((y_batch.to(device) == y_pred).cpu().numpy()))
  
  val_accuracy_by_epoch = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
  print("Epoch {} took {:.3f}s".format(epoch, time.time() - start_time))
  print("    training loss (in-iteration): \t{:.6f}".format(np.mean(train_loss[-len(train_dataset) // batch_size :])))
  print("    validation accuracy: \t\t\t{:.2f} %".format(val_accuracy_by_epoch))
  val_accuracy_by_epochs.append(val_accuracy_by_epoch)
  # early stopping: quit once the best epoch is at least 10 epochs behind the current one
  best_epoch = int(np.argmax(val_accuracy_by_epochs))
  if len(val_accuracy_by_epochs) - 1 - best_epoch >= 10:
    break

Epoch 1 took 40.232s
    training loss (in-iteration): 	5.390800
    validation accuracy: 			0.55 %
Epoch 2 took 39.072s
    training loss (in-iteration): 	5.303995
    validation accuracy: 			0.48 %
Epoch 3 took 38.895s
    training loss (in-iteration): 	5.304186
    validation accuracy: 			0.50 %


KeyboardInterrupt: ignored

In [None]:
class ImageClassifierConv(nn.Module):
  def __init__(self):
    super(ImageClassifierConv, self).__init__()
    self.conv1 = nn.Conv2d(3, 16, kernel_size=(3,3))
    self.conv2 = nn.Conv2d(16, 32, kernel_size=(3,3))
    self.conv3 = nn.Conv2d(32, 64, kernel_size=(3,3))

    self.fc1 = nn.Linear(2304, 1024)
    self.fc2 = nn.Linear(1024, 200)

  def forward(self, x):
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

    x = torch.flatten(x, 1)
    x = F.relu(self.fc1(x))
    x = self.fc2(x)

    return x

modelConv = ImageClassifierConv().to(device)
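
The `2304` input size of `fc1` follows from the layer shapes: each 3x3 convolution without padding shrinks a side by 2, and each 2x2 max-pooling halves it (rounding down). A minimal sketch of that arithmetic, where the helper name `flat_features` is an assumption made for illustration:

```python
def flat_features(size, channels, n_blocks, kernel=3, pool=2):
    """Flattened feature count after n_blocks of (conv without padding -> max-pool)."""
    for _ in range(n_blocks):
        size = size - kernel + 1   # convolution, stride 1, no padding
        size = size // pool        # max-pool with kernel = stride = pool
    return channels * size * size


# 64x64 input, three conv+pool blocks, 64 channels after the last convolution
print(flat_features(64, channels=64, n_blocks=3))  # 2304
```

Recomputing this number whenever a layer is added or a kernel size changes avoids shape-mismatch errors in the first linear layer.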

In [None]:
import numpy as np
import time 

opt = torch.optim.Adam(modelConv.parameters(), lr=0.01)

train_loss = []
val_accuracy = []
val_accuracy_by_epochs = []

epoch = 0
while True:
  epoch += 1
  start_time = time.time()
  modelConv.train(True)
  for X_batch, y_batch in train_batch_gen:
    loss = compute_loss(modelConv, X_batch, y_batch)
    loss.backward()
    opt.step()
    opt.zero_grad()
    train_loss.append(loss.cpu().data.numpy())

  modelConv.train(False)
  for X_batch, y_batch in val_batch_gen:
    logits = modelConv(X_batch.to(device))
    y_pred = logits.max(1)[1].data
    val_accuracy.append(np.mean((y_batch.cpu() == y_pred.cpu()).numpy()))
  
  val_accuracy_by_epoch = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
  print("Epoch {} took {:.3f}s".format(epoch, time.time() - start_time))
  print("    training loss (in-iteration): \t{:.6f}".format(np.mean(train_loss[-len(train_dataset) // batch_size :])))
  print("    validation accuracy: \t\t\t{:.2f} %".format(val_accuracy_by_epoch))
  val_accuracy_by_epochs.append(val_accuracy_by_epoch)
  # early stopping: quit once the best epoch is at least 10 epochs behind the current one
  best_epoch = int(np.argmax(val_accuracy_by_epochs))
  if len(val_accuracy_by_epochs) - 1 - best_epoch >= 10:
    break

In [118]:
class ImageClassifierConvNormalizationRegularization(nn.Module):
  def __init__(self):
    super(ImageClassifierConvNormalizationRegularization, self).__init__()
    self.conv1 = nn.Conv2d(3, 128, kernel_size=(3,3))
    self.conv2 = nn.Conv2d(128, 128, kernel_size=(3,3))
    self.conv3 = nn.Conv2d(128, 256, kernel_size=(3,3))
    self.conv4 = nn.Conv2d(256, 512, kernel_size=(3,3))

    self.batch_norm1 = nn.BatchNorm2d(128)
    self.batch_norm2 = nn.BatchNorm2d(128)
    self.batch_norm3 = nn.BatchNorm2d(256)
    self.batch_norm4 = nn.BatchNorm2d(512)

    self.fc1 = nn.Linear(2048, 512)
    self.fc2 = nn.Linear(512, 200)

  def forward(self, x):
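    # functional dropout defaults to training=True, so pass self.training to switch it off in eval mode
    x = F.max_pool2d(F.dropout2d(F.relu(self.batch_norm1(self.conv1(x))), p=0.3, training=self.training), (2,2))
    x = F.max_pool2d(F.dropout2d(F.relu(self.batch_norm2(self.conv2(x))), p=0.3, training=self.training), (2,2))
    x = F.max_pool2d(F.dropout2d(F.relu(self.batch_norm3(self.conv3(x))), p=0.3, training=self.training), (2,2))
    x = F.max_pool2d(F.dropout2d(F.relu(self.batch_norm4(self.conv4(x))), p=0.3, training=self.training), (2,2))

    x = torch.flatten(x, 1)
    x = F.dropout(F.relu(self.fc1(x)), p=0.3, training=self.training)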
    x = self.fc2(x)

    return x
  
modelConvNormalizationRegularization = ImageClassifierConvNormalizationRegularization().to(device)
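
One detail worth checking when dropout is applied through `torch.nn.functional`: `F.dropout` and `F.dropout2d` take a `training` argument that defaults to `True`, so they keep dropping activations even after `model.train(False)` unless the module's `self.training` flag is passed in (the `nn.Dropout` module handles this on its own). A small sketch to illustrate the difference on a dummy tensor:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Default call: dropout is applied regardless of the model's train/eval mode
always_on = F.dropout(x, p=0.5)

# With training=False the input passes through unchanged
eval_mode = F.dropout(x, p=0.5, training=False)

print("zeroed elements with the default call:", int((always_on == 0).sum()))
print("unchanged with training=False:", torch.equal(eval_mode, x))
```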

In [122]:
import numpy as np
import time 

opt = torch.optim.Adam(modelConvNormalizationRegularization.parameters(), lr=0.00001, weight_decay=0.0001)

train_loss = []
val_accuracy = []
val_accuracy_by_epochs = []

epoch = 0
while True:
  epoch += 1
  start_time = time.time()
  modelConvNormalizationRegularization.train(True)
  for X_batch, y_batch in train_batch_gen:
    loss = compute_loss(modelConvNormalizationRegularization, X_batch, y_batch)
    loss.backward()
    opt.step()
    opt.zero_grad()
    train_loss.append(loss.cpu().data.numpy())

  modelConvNormalizationRegularization.train(False)
  for X_batch, y_batch in val_batch_gen:
    logits = modelConvNormalizationRegularization(X_batch.to(device))
    y_pred = logits.max(1)[1].data
    val_accuracy.append(np.mean((y_batch.cpu() == y_pred.cpu()).numpy()))
  
  val_accuracy_by_epoch = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
  print("Epoch {} took {:.3f}s".format(epoch, time.time() - start_time))
  print("    training loss (in-iteration): \t{:.6f}".format(np.mean(train_loss[-len(train_dataset) // batch_size :])))
  print("    validation accuracy: \t\t\t{:.2f} %".format(val_accuracy_by_epoch))
  val_accuracy_by_epochs.append(val_accuracy_by_epoch)
  # early stopping: quit once the best epoch is at least 10 epochs behind the current one
  best_epoch = int(np.argmax(val_accuracy_by_epochs))
  if len(val_accuracy_by_epochs) - 1 - best_epoch >= 10:
    break

Epoch 1 took 72.361s
    training loss (in-iteration): 	3.964560
    validation accuracy: 			14.91 %


KeyboardInterrupt: ignored

In [123]:
from torchvision import transforms
means = np.array((0.4914, 0.4822, 0.4465))
stds = np.array((0.2023, 0.1994, 0.2010))

transform_augment = transforms.Compose([
    transforms.RandomRotation([-15, 15]),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(means, stds)                       
])

dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transform_augment)
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [90000, 10000])
train_batch_gen = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=1
)

val_batch_gen = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=1
)

In [124]:
import numpy as np
import time 

opt = torch.optim.Adam(modelConvNormalizationRegularization.parameters(), lr=0.00001, weight_decay=0.0001)

train_loss = []
val_accuracy = []
val_accuracy_by_epochs = []

epoch = 0
while True:
  epoch += 1
  start_time = time.time()
  modelConvNormalizationRegularization.train(True)
  for X_batch, y_batch in train_batch_gen:
    loss = compute_loss(modelConvNormalizationRegularization, X_batch, y_batch)
    loss.backward()
    opt.step()
    opt.zero_grad()
    train_loss.append(loss.cpu().data.numpy())

  modelConvNormalizationRegularization.train(False)
  for X_batch, y_batch in val_batch_gen:
    logits = modelConvNormalizationRegularization(X_batch.to(device))
    y_pred = logits.max(1)[1].data
    val_accuracy.append(np.mean((y_batch.cpu() == y_pred.cpu()).numpy()))
  
  val_accuracy_by_epoch = np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100
  print("Epoch {} took {:.3f}s".format(epoch, time.time() - start_time))
  print("    training loss (in-iteration): \t{:.6f}".format(np.mean(train_loss[-len(train_dataset) // batch_size :])))
  print("    validation accuracy: \t\t\t{:.2f} %".format(val_accuracy_by_epoch))
  val_accuracy_by_epochs.append(val_accuracy_by_epoch)
  # early stopping: quit once the best epoch is at least 10 epochs behind the current one
  best_epoch = int(np.argmax(val_accuracy_by_epochs))
  if len(val_accuracy_by_epochs) - 1 - best_epoch >= 10:
    break

Epoch 1 took 72.306s
    training loss (in-iteration): 	3.930057
    validation accuracy: 			15.80 %


Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f8d7d40cdd0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1328, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1320, in _shutdown_workers
    if w.is_alive():
  File "/usr/lib/python3.7/multiprocessing/process.py", line 151, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process

Epoch 2 took 73.634s
    training loss (in-iteration): 	3.915905
    validation accuracy: 			16.18 %



Epoch 3 took 73.574s
    training loss (in-iteration): 	3.903891
    validation accuracy: 			16.42 %
Epoch 4 took 72.013s
    training loss (in-iteration): 	3.887822
    validation accuracy: 			16.23 %
Epoch 5 took 72.872s
    training loss (in-iteration): 	3.874355
    validation accuracy: 			17.03 %
Epoch 6 took 72.535s
    training loss (in-iteration): 	3.861777
    validation accuracy: 			16.82 %
Epoch 7 took 72.277s
    training loss (in-iteration): 	3.849924
    validation accuracy: 			16.69 %
Epoch 8 took 71.644s
    training loss (in-iteration): 	3.835912
    validation accuracy: 			16.96 %
Epoch 9 took 71.465s
    training loss (in-iteration): 	3.834361
    validation accuracy: 			17.38 %
Epoch 10 took 71.941s
    training loss (in-iteration): 	3.811423
    validation accuracy: 			17.28 %
Epoch 11 took 70.613s
    training loss (in-iteration): 	3.812006
    validation accuracy: 			17.05 %
Epoch 12 took 70.494s
    training loss (in-iteration): 	3.793864
    validation accuracy

When everything is done, please calculate accuracy on `tiny-imagenet-200/val`
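
One possible way to do this is to run the model in evaluation mode over a `DataLoader` and count correct predictions. The sketch below is an assumption about how you might structure it (the function name `compute_accuracy` is hypothetical); the stand-in model and random data only make it self-contained and would be replaced by your trained network and a loader built over `test_dataset`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def compute_accuracy(model, batch_gen, device):
    """Fraction of correctly classified samples over all batches."""
    model.eval()                      # eval mode: nn.Dropout off, BatchNorm uses running stats
    correct, total = 0, 0
    with torch.no_grad():             # no gradients needed for evaluation
        for X_batch, y_batch in batch_gen:
            logits = model(X_batch.to(device))
            y_pred = logits.argmax(dim=1).cpu()
            correct += (y_pred == y_batch).sum().item()
            total += y_batch.shape[0]
    return correct / total


# Stand-in data and model, only to make the sketch runnable
device = torch.device('cpu')
toy_data = TensorDataset(torch.randn(20, 3, 64, 64), torch.randint(0, 200, (20,)))
toy_loader = DataLoader(toy_data, batch_size=5, shuffle=False)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 200)).to(device)

toy_accuracy = compute_accuracy(toy_model, toy_loader, device)
print(toy_accuracy)
```

In the notebook this would roughly correspond to building a loader over `test_dataset` with the same `batch_size` and assigning the returned fraction to `test_accuracy`.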

In [None]:
test_accuracy = test_acc # YOUR CODE

In [None]:
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 40:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 35:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 25:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")


# Report

All creative approaches are highly welcome, but at the very least it would be great to mention
* the idea;
* brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method and, again, why?
* Any regularizations and other techniques applied and their effects;


There is no need to write strict mathematical proofs (unless you want to).
 * "I tried this, this and this, and the second one turned out to be better. And I just didn't like the name of that one" - OK, but can be better
 * "I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such" - the ideal one
 * "I took that code from that demo without understanding it, but I'll never confess that and instead I'll make up some pseudoscientific explanation" - __not_ok__

### Hi, my name is `___ ___`, and here's my story

A long time ago in a galaxy far far away, when it was still more than an hour before the deadline, I got an idea:

##### I'm gonna build a neural network, that
* brief text on what was
* the original idea
* and why it was so

How could I be so naive?!

##### One day, with no signs of warning,
This thing has finally converged and
* Some explanation about what were the results,
* what worked and what didn't
* most importantly - what next steps were taken, if any
* and what were their respective outcomes

##### Finally, after __  iterations, __ mugs of [tea/coffee]
* what was the final architecture
* as well as training method and tricks

That, having wasted ____ [minutes, hours or days] of my life training, got

* accuracy on training: __
* accuracy on validation: __
* accuracy on test: __


[an optional afterword and mortal curses on assignment authors]