# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve CIFAR10 image classification.

This notebook is intended as a sequel to seminar 3; please give that one a try if you haven't done so yet.

(please at least skim through it)

* The ultimate quest is to create a network with as high an __accuracy__ as you can push it to.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it in as you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in a report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on the __TEST__ dataset:
    * 50% (50% points)
    * 60% (60% points)
    * 65% (70% points)
    * 70% (80% points)
    * 75% (90% points)
    * 80% (full points)
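
The scheme above is cumulative, so it may help to spell the arithmetic out. The following is only a sketch: the helper `grade_percent` and its arguments are hypothetical. It assumes that "beating" a milestone means strictly exceeding it and that the 20% threshold is also checked on test accuracy.

```python
# Hypothetical helper that turns the grading rules above into a number.
# Assumption: "beating" a milestone means strictly exceeding it.
MILESTONES = (50, 60, 65, 70, 75, 80)  # test accuracy thresholds, in percent


def grade_percent(test_accuracy, has_report=True):
    """Return the grade in percent for a given test accuracy (in percent)."""
    grade = 0                      # starting at zero points
    if has_report:
        grade += 20                # iteration path described in the report
    if test_accuracy > 20:
        grade += 20                # network gets above 20% accuracy
    # +10% for every milestone beaten on the test set
    grade += 10 * sum(test_accuracy > m for m in MILESTONES)
    return grade


print(grade_percent(57.73))  # 50: report + above 20% + the 50% milestone
print(grade_percent(79.67))  # 90: every milestone except 80%
```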
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 80%.
 * In other words, base milestones must be beaten without pre-trained nets (and such net must be present in the e-mail). After that, you can use whatever you want.
* You __can__ use validation data for training, but you __can't__ do anything with test data apart from running the evaluation procedure.

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge in 5 epochs, others take 500.
   * Way to go: stop once the validation score has gone 10 iterations without beating its maximum.
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add some L2 weight norm to the loss function, PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
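
The stopping rule from the list above can be sketched as a small patience counter. The class `EarlyStopper` and its `should_stop` method are hypothetical names, not part of the template below. An "iteration" is read here as one validation check, for example one epoch.

```python
class EarlyStopper:
    """Signals a stop once the score has not improved for `patience` checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.checks_since_best = 0

    def should_stop(self, val_score):
        if val_score > self.best_score:
            # New maximum: remember it and reset the counter.
            self.best_score = val_score
            self.checks_since_best = 0
        else:
            self.checks_since_best += 1
        return self.checks_since_best >= self.patience


# Toy usage with made-up validation accuracies and a short patience.
stopper = EarlyStopper(patience=3)
history = [0.53, 0.58, 0.61, 0.60, 0.59, 0.60, 0.62]
stopped_at = None
for epoch, score in enumerate(history):
    if stopper.should_stop(score):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_score)  # 5 0.61
```

In a real training loop you would typically also keep a copy of the weights from the best-scoring epoch, since the final weights are by construction several checks past the maximum.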
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take a long time without a GPU. That's okay.
     * If you are CPU-only, we still recommend that you try a simple convolutional architecture.
     * Ideally, set it up to run overnight and check on it in the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely overkill.
     * __To reduce computation__ time in exchange for some drop in accuracy, try the __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the time of the default (stride=1) one, since it evaluates the kernel at a quarter of the positions.
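
To see where the "roughly 1/4" estimate comes from, here is a rough sketch that counts output positions and multiply-accumulate operations for a single convolution. The function names are hypothetical. The size formula is the usual one for a convolution without dilation, and the cost model ignores biases and memory traffic.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(size, in_ch, out_ch, kernel, stride=1, padding=0):
    """Approximate multiply-accumulate count for one square image."""
    out = conv_output_size(size, kernel, stride, padding)
    return out * out * in_ch * out_ch * kernel * kernel


full = conv_macs(32, in_ch=3, out_ch=32, kernel=3, stride=1, padding=1)
half = conv_macs(32, in_ch=3, out_ch=32, kernel=3, stride=2, padding=1)
print(conv_output_size(32, 3, stride=1, padding=1))  # 32
print(conv_output_size(32, 3, stride=2, padding=1))  # 16
print(half / full)                                   # 0.25
```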
 
   
### Data augmentation
   * Getting a 5x larger dataset for free is a great deal
     * Zoom-in+slice = move
     * Rotate+zoom (to remove black stripes)
     * Add noise (Gaussian or Bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * `from scipy.misc import imrotate,imresize` (both were deprecated and later removed from SciPy, so this only works with old versions)
     * and a bit of slicing
     * Other cool libraries: cv2, skimage, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root=path_to_cifar_like_in_seminar, train=True, download=True, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

    ```
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.
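
If you feed the network NumPy arrays through a minibatch generator rather than a torchvision `DataLoader`, the same "shift and flip" idea can be approximated directly on a batch. This is a sketch only: `augment_batch` is a hypothetical helper, it assumes the `(N, C, H, W)` layout used by the template below, and `pad=4` simply mirrors the `RandomCrop(32, padding=4)` setting above.

```python
import numpy as np


def augment_batch(X, pad=4, rng=None):
    """Random shift (zero-pad, then crop back) and random horizontal flip.

    X is assumed to have shape (N, C, H, W); the output has the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c, h, w = X.shape
    # Zero-pad only the two spatial axes.
    padded = np.pad(X, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    out = np.empty_like(X)
    for i in range(n):
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, :, top:top + h, left:left + w]
        if rng.random() < 0.5:
            crop = crop[:, :, ::-1]  # flip along the width axis
        out[i] = crop
    return out


X_fake = np.random.rand(8, 3, 32, 32).astype(np.float32)  # stand-in for X_batch
X_aug = augment_batch(X_fake, rng=np.random.default_rng(0))
print(X_aug.shape, X_aug.dtype)
```

Apply it only to training batches; validation and test images should stay untouched.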
   


   
There is a template for your solution below that you can opt to use or throw away and write it your way.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import time
%matplotlib inline

In [2]:
from cifar import load_cifar10
X_train,y_train,X_val,y_val,X_test,y_test = load_cifar10("cifar_data")
class_names = np.array(['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck'])

print(X_train.shape,y_train.shape)

(40000, 3, 32, 32) (40000,)


In [3]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

In [4]:
def compute_loss(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch))
    y_batch = Variable(torch.LongTensor(y_batch))
    logits = model(X_batch)
    return F.cross_entropy(logits, y_batch).mean()
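
For intuition about what `F.cross_entropy` computes from raw logits, here is a plain-Python sketch: the negative log-softmax of the true class, averaged over the batch. The function below is an illustrative re-implementation, not PyTorch's code. With PyTorch's default `reduction='mean'` the result is already a scalar average, so the trailing `.mean()` in `compute_loss` leaves it unchanged.

```python
import math


def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target classes.

    logits: list of per-sample score lists; targets: list of class indices.
    """
    total = 0.0
    for scores, target in zip(logits, targets):
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[target]
    return total / len(targets)


# A model that outputs equal scores for 10 classes is at chance level,
# so its loss is log(10), about 2.3026.
print(cross_entropy([[0.0] * 10], [3]))
```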

__Training__

In [5]:
def iterate_minibatches(X, y, batchsize):
    indices = np.random.permutation(np.arange(len(X)))
    for start in range(0, len(indices), batchsize):
        ix = indices[start: start + batchsize]
        yield X[ix], y[ix]

In [6]:
def train(model, num_epochs=20, batch_size=50, lr=0.003):
    train_loss = []
    val_accuracy = []
    train_accuracy = []
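    for epoch in range(num_epochs):
        # In each epoch, we do a full pass over the training data:
        start_time = time.time()
        # NOTE: re-creating the optimizer every epoch resets Adam's running
        # moment estimates; create it once before the loop to keep them.
        opt = torch.optim.Adam(model.parameters(), lr=lr)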
        model.train(True) # enable dropout / batch_norm training behavior
        for X_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size):
            # train on batch
            loss = compute_loss(X_batch, y_batch)
            loss.backward()
            opt.step()
            opt.zero_grad()
            train_loss.append(loss.data.numpy())

        # And a full pass over the validation data:
        model.train(False) # disable dropout / use averages for batch_norm
        for X_batch, y_batch in iterate_minibatches(X_val, y_val, batch_size):
            logits = model(Variable(torch.FloatTensor(X_batch)))
            y_pred = logits.max(1)[1].data.numpy()
            val_accuracy.append(np.mean(y_batch == y_pred))
        
        for X_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size):
            logits = model(Variable(torch.FloatTensor(X_batch)))
            y_pred = logits.max(1)[1].data.numpy()
            train_accuracy.append(np.mean(y_batch == y_pred))


        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(
            epoch + 1, num_epochs, time.time() - start_time))
        print("  training loss (in-iteration): \t{:.6f}".format(
            np.mean(train_loss[-len(X_train) // batch_size :])))
        print("  training accuracy: \t\t\t{:.2f} %".format(
            np.mean(train_accuracy[-len(X_train) // batch_size :]) * 100))
        print("  validation accuracy: \t\t\t{:.2f} %".format(
            np.mean(val_accuracy[-len(X_val) // batch_size :]) * 100))

In [7]:
def test(model):
    model.train(False) # disable dropout / use averages for batch_norm
    test_batch_acc = []
    for X_batch, y_batch in iterate_minibatches(X_test, y_test, 500):
        logits = model(Variable(torch.FloatTensor(X_batch)))
        y_pred = logits.max(1)[1].data.numpy()
        test_batch_acc.append(np.mean(y_batch == y_pred))

    test_accuracy = np.mean(test_batch_acc)

    print("Final results:")
    print("  test accuracy:\t\t{:.2f} %".format(
        test_accuracy * 100))

    if test_accuracy * 100 > 95:
        print("Double-check, then consider applying for NIPS'17. SRSly.")
    elif test_accuracy * 100 > 90:
        print("U'r freakin' amazin'!")
    elif test_accuracy * 100 > 80:
        print("Achievement unlocked: 110lvl Warlock!")
    elif test_accuracy * 100 > 70:
        print("Achievement unlocked: 80lvl Warlock!")
    elif test_accuracy * 100 > 60:
        print("Achievement unlocked: 70lvl Warlock!")
    elif test_accuracy * 100 > 50:
        print("Achievement unlocked: 60lvl Warlock!")
    else:
        print("We need more magic! Follow instructons below")

In [8]:
model = nn.Sequential()

model.add_module('l1_flatten', Flatten())

model.add_module('l_end', nn.Linear(3072, 10))

In [9]:
train(model)

Epoch 1 of 20 took 1.128s
  training loss (in-iteration): 	2.103738
  training accuracy: 			33.69 %
  validation accuracy: 			32.91 %
Epoch 2 of 20 took 1.101s
  training loss (in-iteration): 	2.056705
  training accuracy: 			38.05 %
  validation accuracy: 			36.76 %
Epoch 3 of 20 took 1.091s
  training loss (in-iteration): 	2.042365
  training accuracy: 			36.61 %
  validation accuracy: 			34.70 %
Epoch 4 of 20 took 1.130s
  training loss (in-iteration): 	2.031201
  training accuracy: 			37.64 %
  validation accuracy: 			35.52 %
Epoch 5 of 20 took 1.099s
  training loss (in-iteration): 	2.006038
  training accuracy: 			36.86 %
  validation accuracy: 			35.54 %
Epoch 6 of 20 took 1.088s
  training loss (in-iteration): 	1.990934
  training accuracy: 			39.64 %
  validation accuracy: 			38.57 %
Epoch 7 of 20 took 1.103s
  training loss (in-iteration): 	1.991855
  training accuracy: 			36.56 %
  validation accuracy: 			34.40 %
Epoch 8 of 20 took 1.141s
  training loss (in-iteration): 	1.9

In [10]:
test(model)

Final results:
  test accuracy:		35.91 %
We need more magic! Follow instructons below


In [11]:
model = nn.Sequential()

model.add_module('l1_flatten', Flatten())

model.add_module('l2_linear', nn.Linear(3072, 128))

model.add_module('l_end', nn.Linear(128, 10))

In [12]:
train(model)

Epoch 1 of 20 took 4.083s
  training loss (in-iteration): 	2.091309
  training accuracy: 			34.38 %
  validation accuracy: 			33.73 %
Epoch 2 of 20 took 4.063s
  training loss (in-iteration): 	1.930001
  training accuracy: 			31.51 %
  validation accuracy: 			32.26 %
Epoch 3 of 20 took 4.075s
  training loss (in-iteration): 	1.905702
  training accuracy: 			32.00 %
  validation accuracy: 			31.65 %
Epoch 4 of 20 took 4.063s
  training loss (in-iteration): 	1.874966
  training accuracy: 			34.09 %
  validation accuracy: 			33.62 %
Epoch 5 of 20 took 4.093s
  training loss (in-iteration): 	1.859216
  training accuracy: 			38.23 %
  validation accuracy: 			38.15 %
Epoch 6 of 20 took 4.090s
  training loss (in-iteration): 	1.853886
  training accuracy: 			36.80 %
  validation accuracy: 			35.59 %
Epoch 7 of 20 took 4.215s
  training loss (in-iteration): 	1.833206
  training accuracy: 			37.09 %
  validation accuracy: 			36.16 %
Epoch 8 of 20 took 4.521s
  training loss (in-iteration): 	1.8

In [13]:
test(model)

Final results:
  test accuracy:		36.60 %
We need more magic! Follow instructons below


In [14]:
model = nn.Sequential()

model.add_module('l1_flatten', Flatten())

model.add_module('l1_linear', nn.Linear(3072, 128))
model.add_module('l1_relu', nn.ReLU())

model.add_module('l_end', nn.Linear(128, 10))

In [15]:
train(model)

Epoch 1 of 20 took 7.125s
  training loss (in-iteration): 	1.929971
  training accuracy: 			34.85 %
  validation accuracy: 			34.20 %
Epoch 2 of 20 took 4.258s
  training loss (in-iteration): 	1.797531
  training accuracy: 			34.04 %
  validation accuracy: 			34.04 %
Epoch 3 of 20 took 4.147s
  training loss (in-iteration): 	1.776362
  training accuracy: 			35.46 %
  validation accuracy: 			36.03 %
Epoch 4 of 20 took 4.217s
  training loss (in-iteration): 	1.746300
  training accuracy: 			35.43 %
  validation accuracy: 			35.48 %
Epoch 5 of 20 took 4.449s
  training loss (in-iteration): 	1.727408
  training accuracy: 			38.28 %
  validation accuracy: 			37.15 %
Epoch 6 of 20 took 4.461s
  training loss (in-iteration): 	1.718972
  training accuracy: 			38.91 %
  validation accuracy: 			38.07 %
Epoch 7 of 20 took 4.456s
  training loss (in-iteration): 	1.704920
  training accuracy: 			39.03 %
  validation accuracy: 			38.44 %
Epoch 8 of 20 took 4.444s
  training loss (in-iteration): 	1.6

In [16]:
test(model)

Final results:
  test accuracy:		39.21 %
We need more magic! Follow instructons below


In [19]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3)))
model.add_module('l1_pool', nn.MaxPool2d((2,2)))

model.add_module('l2_flatten', Flatten())

model.add_module('l3_linear', nn.Linear(7200, 128))
model.add_module('l3_relu', nn.ReLU())

model.add_module('l_end', nn.Linear(128, 10))

In [20]:
train(model, lr=0.003)

Epoch 1 of 20 took 37.792s
  training loss (in-iteration): 	1.558277
  training accuracy: 			57.04 %
  validation accuracy: 			53.64 %
Epoch 2 of 20 took 40.410s
  training loss (in-iteration): 	1.241197
  training accuracy: 			64.82 %
  validation accuracy: 			58.16 %
Epoch 3 of 20 took 39.284s
  training loss (in-iteration): 	1.105015
  training accuracy: 			67.72 %
  validation accuracy: 			58.51 %
Epoch 4 of 20 took 40.491s
  training loss (in-iteration): 	0.980606
  training accuracy: 			71.32 %
  validation accuracy: 			59.04 %
Epoch 5 of 20 took 40.267s
  training loss (in-iteration): 	0.878240
  training accuracy: 			75.11 %
  validation accuracy: 			60.78 %
Epoch 6 of 20 took 40.168s
  training loss (in-iteration): 	0.799167
  training accuracy: 			77.88 %
  validation accuracy: 			60.84 %
Epoch 7 of 20 took 41.460s
  training loss (in-iteration): 	0.720244
  training accuracy: 			80.34 %
  validation accuracy: 			60.56 %
Epoch 8 of 20 took 40.194s
  training loss (in-iteratio

In [21]:
test(model)

Final results:
  test accuracy:		57.73 %
Achievement unlocked: 60lvl Warlock!


In [24]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3)))
model.add_module('l1_bn', nn.BatchNorm2d(32))
model.add_module('l1_pool', nn.MaxPool2d((2,2)))

model.add_module('l2_flatten', Flatten())

model.add_module('l3_linear', nn.Linear(7200, 128))
model.add_module('l3_bn', nn.BatchNorm1d(128))
model.add_module('l3_relu', nn.ReLU())
model.add_module('l3_dropout', nn.Dropout(0.2))

model.add_module('l_end', nn.Linear(128, 10))

In [25]:
train(model)

Epoch 1 of 20 took 51.093s
  training loss (in-iteration): 	1.467252
  training accuracy: 			56.78 %
  validation accuracy: 			53.46 %
Epoch 2 of 20 took 52.101s
  training loss (in-iteration): 	1.162341
  training accuracy: 			67.42 %
  validation accuracy: 			60.68 %
Epoch 3 of 20 took 53.116s
  training loss (in-iteration): 	1.023534
  training accuracy: 			70.78 %
  validation accuracy: 			61.55 %
Epoch 4 of 20 took 53.027s
  training loss (in-iteration): 	0.917184
  training accuracy: 			75.13 %
  validation accuracy: 			62.58 %
Epoch 5 of 20 took 54.321s
  training loss (in-iteration): 	0.831120
  training accuracy: 			61.43 %
  validation accuracy: 			52.00 %
Epoch 6 of 20 took 57.811s
  training loss (in-iteration): 	0.751428
  training accuracy: 			69.69 %
  validation accuracy: 			56.68 %
Epoch 7 of 20 took 53.111s
  training loss (in-iteration): 	0.693150
  training accuracy: 			68.54 %
  validation accuracy: 			55.29 %
Epoch 8 of 20 took 54.159s
  training loss (in-iteratio

In [26]:
test(model)

Final results:
  test accuracy:		62.34 %
Achievement unlocked: 70lvl Warlock!


In [29]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3)))
model.add_module('l1_bn', nn.BatchNorm2d(32))

model.add_module('l2_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3)))
model.add_module('l2_bn', nn.BatchNorm2d(64))

model.add_module('l3_pool', nn.MaxPool2d((3,3)))
model.add_module('l3_dropout', nn.Dropout(0.2))

model.add_module('l4_flatten', Flatten())

model.add_module('l5_linear', nn.Linear(5184, 128))
model.add_module('l5_bn', nn.BatchNorm1d(128))
model.add_module('l5_relu', nn.ReLU())
model.add_module('l5_dropout', nn.Dropout(0.2))

model.add_module('l_end', nn.Linear(128, 10))

In [30]:
train(model)

Epoch 1 of 20 took 166.044s
  training loss (in-iteration): 	1.337681
  training accuracy: 			65.49 %
  validation accuracy: 			63.14 %
Epoch 2 of 20 took 166.754s
  training loss (in-iteration): 	1.054475
  training accuracy: 			71.43 %
  validation accuracy: 			66.46 %
Epoch 3 of 20 took 166.421s
  training loss (in-iteration): 	0.945181
  training accuracy: 			75.45 %
  validation accuracy: 			68.50 %
Epoch 4 of 20 took 166.230s
  training loss (in-iteration): 	0.867227
  training accuracy: 			76.05 %
  validation accuracy: 			67.66 %
Epoch 5 of 20 took 165.844s
  training loss (in-iteration): 	0.802791
  training accuracy: 			80.49 %
  validation accuracy: 			69.97 %
Epoch 6 of 20 took 166.553s
  training loss (in-iteration): 	0.744912
  training accuracy: 			83.20 %
  validation accuracy: 			70.64 %
Epoch 7 of 20 took 165.794s
  training loss (in-iteration): 	0.705314
  training accuracy: 			83.47 %
  validation accuracy: 			69.52 %
Epoch 8 of 20 took 166.687s
  training loss (in-

In [31]:
test(model)

Final results:
  test accuracy:		71.00 %
Achievement unlocked: 80lvl Warlock!


In [34]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3)))
model.add_module('l1_bn', nn.BatchNorm2d(32))
model.add_module('l1_relu', nn.ReLU())

model.add_module('l2_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3)))
model.add_module('l2_bn', nn.BatchNorm2d(64))
model.add_module('l2_relu', nn.ReLU())

model.add_module('l3_conv', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3)))
model.add_module('l3_bn', nn.BatchNorm2d(128))
model.add_module('l3_relu', nn.ReLU())

model.add_module('l4_pool', nn.MaxPool2d((3,3)))
model.add_module('l4_dropout', nn.Dropout(0.1))

model.add_module('l5_flatten', Flatten())

model.add_module('l6_linear', nn.Linear(8192, 512))
model.add_module('l6_bn', nn.BatchNorm1d(512))
model.add_module('l6_relu', nn.ReLU())
model.add_module('l6_dropout', nn.Dropout(0.2))

model.add_module('l7_linear', nn.Linear(512, 256))
model.add_module('l7_bn', nn.BatchNorm1d(256))
model.add_module('l7_relu', nn.ReLU())
model.add_module('l7_dropout', nn.Dropout(0.2))

model.add_module('l8_linear', nn.Linear(256, 128))
model.add_module('l8_bn', nn.BatchNorm1d(128))
model.add_module('l8_relu', nn.ReLU())
model.add_module('l8_dropout', nn.Dropout(0.2))

model.add_module('l_end', nn.Linear(128, 10))

In [35]:
train(model)

Epoch 1 of 20 took 540.335s
  training loss (in-iteration): 	1.312281
  training accuracy: 			62.09 %
  validation accuracy: 			59.37 %
Epoch 2 of 20 took 538.095s
  training loss (in-iteration): 	0.968383
  training accuracy: 			72.12 %
  validation accuracy: 			66.64 %
Epoch 3 of 20 took 532.319s
  training loss (in-iteration): 	0.779345
  training accuracy: 			79.95 %
  validation accuracy: 			70.70 %
Epoch 4 of 20 took 531.531s
  training loss (in-iteration): 	0.619937
  training accuracy: 			86.57 %
  validation accuracy: 			73.26 %
Epoch 5 of 20 took 560.437s
  training loss (in-iteration): 	0.494134
  training accuracy: 			91.47 %
  validation accuracy: 			74.89 %
Epoch 6 of 20 took 614.871s
  training loss (in-iteration): 	0.389112
  training accuracy: 			96.01 %
  validation accuracy: 			76.10 %
Epoch 7 of 20 took 581.483s
  training loss (in-iteration): 	0.310812
  training accuracy: 			96.56 %
  validation accuracy: 			74.70 %
Epoch 8 of 20 took 570.636s
  training loss (in-

In [36]:
test(model)

Final results:
  test accuracy:		75.58 %
Achievement unlocked: 80lvl Warlock!


In [8]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3), padding=(1,1)))
model.add_module('l1_bn', nn.BatchNorm2d(32))
model.add_module('l1_relu', nn.ReLU())
model.add_module('l1_pool', nn.MaxPool2d((2,2)))

model.add_module('l2_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3), padding=(1,1)))
model.add_module('l2_bn', nn.BatchNorm2d(64))
model.add_module('l2_relu', nn.ReLU())
model.add_module('l2_pool', nn.MaxPool2d((2,2)))

model.add_module('l3_conv', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3), padding=(1,1)))
model.add_module('l3_bn', nn.BatchNorm2d(128))
model.add_module('l3_relu', nn.ReLU())
model.add_module('l3_pool', nn.MaxPool2d((2,2)))

model.add_module('l4_conv', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=(3,3), padding=(1,1)))
model.add_module('l4_bn', nn.BatchNorm2d(256))
model.add_module('l4_relu', nn.ReLU())
model.add_module('l4_pool', nn.MaxPool2d((2,2)))

model.add_module('l5_dropout', nn.Dropout(0.2))
model.add_module('l5_flatten', Flatten())

model.add_module('l6_linear', nn.Linear(1024, 512))
model.add_module('l6_bn', nn.BatchNorm1d(512))
model.add_module('l6_relu', nn.ReLU())
model.add_module('l6_dropout', nn.Dropout(0.2))

model.add_module('l7_linear', nn.Linear(512, 256))
model.add_module('l7_bn', nn.BatchNorm1d(256))
model.add_module('l7_relu', nn.ReLU())
model.add_module('l7_dropout', nn.Dropout(0.2))

model.add_module('l8_linear', nn.Linear(256, 128))
model.add_module('l8_bn', nn.BatchNorm1d(128))
model.add_module('l8_relu', nn.ReLU())
model.add_module('l8_dropout', nn.Dropout(0.2))

model.add_module('l_end', nn.Linear(128, 10))

In [9]:
train(model)

Epoch 1 of 20 took 259.639s
  training loss (in-iteration): 	1.387094
  training accuracy: 			64.00 %
  validation accuracy: 			62.67 %
Epoch 2 of 20 took 272.455s
  training loss (in-iteration): 	1.009565
  training accuracy: 			70.03 %
  validation accuracy: 			67.47 %
Epoch 3 of 20 took 256.003s
  training loss (in-iteration): 	0.825947
  training accuracy: 			72.59 %
  validation accuracy: 			68.86 %
Epoch 4 of 20 took 247.037s
  training loss (in-iteration): 	0.707917
  training accuracy: 			77.68 %
  validation accuracy: 			71.84 %
Epoch 5 of 20 took 245.893s
  training loss (in-iteration): 	0.612925
  training accuracy: 			82.08 %
  validation accuracy: 			74.32 %
Epoch 6 of 20 took 250.760s
  training loss (in-iteration): 	0.534407
  training accuracy: 			81.88 %
  validation accuracy: 			72.46 %
Epoch 7 of 20 took 240.036s
  training loss (in-iteration): 	0.467823
  training accuracy: 			85.72 %
  validation accuracy: 			74.75 %
Epoch 8 of 20 took 241.148s
  training loss (in-

In [10]:
test(model)

Final results:
  test accuracy:		79.67 %
Achievement unlocked: 80lvl Warlock!


In [11]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3), padding=(1,1)))
model.add_module('l1_bn', nn.BatchNorm2d(32))
model.add_module('l1_relu', nn.ReLU())
model.add_module('l1_pool', nn.MaxPool2d((2,2)))
model.add_module('l1_dropout', nn.Dropout2d(0.1))

model.add_module('l2_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3), padding=(1,1)))
model.add_module('l2_bn', nn.BatchNorm2d(64))
model.add_module('l2_relu', nn.ReLU())
model.add_module('l2_pool', nn.MaxPool2d((2,2)))
model.add_module('l2_dropout', nn.Dropout2d(0.1))

model.add_module('l3_conv', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3), padding=(1,1)))
model.add_module('l3_bn', nn.BatchNorm2d(128))
model.add_module('l3_relu', nn.ReLU())
model.add_module('l3_pool', nn.MaxPool2d((2,2)))
model.add_module('l3_dropout', nn.Dropout2d(0.1))

model.add_module('l4_conv', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=(3,3), padding=(1,1)))
model.add_module('l4_bn', nn.BatchNorm2d(256))
model.add_module('l4_relu', nn.ReLU())
model.add_module('l4_pool', nn.MaxPool2d((2,2)))
model.add_module('l4_dropout', nn.Dropout(0.1))

model.add_module('l5_flatten', Flatten())

model.add_module('l6_linear', nn.Linear(1024, 512))
model.add_module('l6_bn', nn.BatchNorm1d(512))
model.add_module('l6_relu', nn.ReLU())
model.add_module('l6_dropout', nn.Dropout(0.1))

model.add_module('l7_linear', nn.Linear(512, 256))
model.add_module('l7_bn', nn.BatchNorm1d(256))
model.add_module('l7_relu', nn.ReLU())
model.add_module('l7_dropout', nn.Dropout(0.1))

model.add_module('l8_linear', nn.Linear(256, 128))
model.add_module('l8_bn', nn.BatchNorm1d(128))
model.add_module('l8_relu', nn.ReLU())
model.add_module('l8_dropout', nn.Dropout(0.1))

model.add_module('l_end', nn.Linear(128, 10))

In [12]:
train(model)

Epoch 1 of 20 took 251.582s
  training loss (in-iteration): 	1.479386
  training accuracy: 			58.77 %
  validation accuracy: 			58.12 %
Epoch 2 of 20 took 251.827s
  training loss (in-iteration): 	1.129340
  training accuracy: 			66.49 %
  validation accuracy: 			64.45 %
Epoch 3 of 20 took 253.363s
  training loss (in-iteration): 	0.959166
  training accuracy: 			66.14 %
  validation accuracy: 			63.40 %
Epoch 4 of 20 took 250.936s
  training loss (in-iteration): 	0.846651
  training accuracy: 			76.83 %
  validation accuracy: 			71.64 %
Epoch 5 of 20 took 251.528s
  training loss (in-iteration): 	0.758098
  training accuracy: 			79.85 %
  validation accuracy: 			72.96 %
Epoch 6 of 20 took 250.705s
  training loss (in-iteration): 	0.682973
  training accuracy: 			85.24 %
  validation accuracy: 			76.54 %
Epoch 7 of 20 took 248.430s
  training loss (in-iteration): 	0.626218
  training accuracy: 			87.75 %
  validation accuracy: 			76.90 %
Epoch 8 of 20 took 248.815s
  training loss (in-

In [13]:
test(model)

Final results:
  test accuracy:		78.61 %
Achievement unlocked: 80lvl Warlock!


In [14]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3), padding=(1,1)))
model.add_module('l1_bn', nn.BatchNorm2d(32))
model.add_module('l1_relu', nn.ReLU())
model.add_module('l1_pool', nn.MaxPool2d((2,2)))
model.add_module('l1_dropout', nn.Dropout2d(0.3))

model.add_module('l2_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3), padding=(1,1)))
model.add_module('l2_bn', nn.BatchNorm2d(64))
model.add_module('l2_relu', nn.ReLU())
model.add_module('l2_pool', nn.MaxPool2d((2,2)))
model.add_module('l2_dropout', nn.Dropout2d(0.3))

model.add_module('l3_conv', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3), padding=(1,1)))
model.add_module('l3_bn', nn.BatchNorm2d(128))
model.add_module('l3_relu', nn.ReLU())
model.add_module('l3_pool', nn.MaxPool2d((2,2)))
model.add_module('l3_dropout', nn.Dropout2d(0.3))

model.add_module('l4_conv', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=(3,3), padding=(1,1)))
model.add_module('l4_bn', nn.BatchNorm2d(256))
model.add_module('l4_relu', nn.ReLU())
model.add_module('l4_pool', nn.MaxPool2d((2,2)))
model.add_module('l4_dropout', nn.Dropout(0.2))

model.add_module('l5_flatten', Flatten())

model.add_module('l6_linear', nn.Linear(1024, 512))
model.add_module('l6_bn', nn.BatchNorm1d(512))
model.add_module('l6_relu', nn.ReLU())
model.add_module('l6_dropout', nn.Dropout(0.3))

model.add_module('l7_linear', nn.Linear(512, 256))
model.add_module('l7_bn', nn.BatchNorm1d(256))
model.add_module('l7_relu', nn.ReLU())
model.add_module('l7_dropout', nn.Dropout(0.3))

model.add_module('l8_linear', nn.Linear(256, 128))
model.add_module('l8_bn', nn.BatchNorm1d(128))
model.add_module('l8_relu', nn.ReLU())
model.add_module('l8_dropout', nn.Dropout(0.3))

model.add_module('l_end', nn.Linear(128, 10))

In [15]:
train(model)

Epoch 1 of 20 took 252.811s
  training loss (in-iteration): 	1.765514
  training accuracy: 			47.84 %
  validation accuracy: 			47.76 %
Epoch 2 of 20 took 254.639s
  training loss (in-iteration): 	1.507591
  training accuracy: 			56.28 %
  validation accuracy: 			55.14 %
Epoch 3 of 20 took 255.884s
  training loss (in-iteration): 	1.367354
  training accuracy: 			59.22 %
  validation accuracy: 			57.45 %
Epoch 4 of 20 took 249.273s
  training loss (in-iteration): 	1.264435
  training accuracy: 			66.00 %
  validation accuracy: 			63.53 %
Epoch 5 of 20 took 255.538s
  training loss (in-iteration): 	1.181103
  training accuracy: 			68.72 %
  validation accuracy: 			65.47 %
Epoch 6 of 20 took 250.049s
  training loss (in-iteration): 	1.115304
  training accuracy: 			69.08 %
  validation accuracy: 			66.40 %
Epoch 7 of 20 took 252.626s
  training loss (in-iteration): 	1.066446
  training accuracy: 			73.22 %
  validation accuracy: 			69.56 %
Epoch 8 of 20 took 250.414s
  training loss (in-

In [16]:
test(model)

Final results:
  test accuracy:		76.42 %
Achievement unlocked: 80lvl Warlock!


In [19]:
model = nn.Sequential()

model.add_module('l1_conv', nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3,3), padding=(1,1)))
model.add_module('l1_bn', nn.BatchNorm2d(16))
model.add_module('l1_relu', nn.ReLU())
model.add_module('l1_pool', nn.MaxPool2d((2,2)))
model.add_module('l1_dropout', nn.Dropout2d(0.1))

model.add_module('l2_conv', nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3,3), padding=(1,1)))
model.add_module('l2_bn', nn.BatchNorm2d(32))
model.add_module('l2_relu', nn.ReLU())
model.add_module('l2_pool', nn.MaxPool2d((2,2)))
model.add_module('l2_dropout', nn.Dropout2d(0.1))

model.add_module('l3_conv', nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3), padding=(1,1)))
model.add_module('l3_bn', nn.BatchNorm2d(64))
model.add_module('l3_relu', nn.ReLU())
model.add_module('l3_pool', nn.MaxPool2d((2,2)))
model.add_module('l3_dropout', nn.Dropout2d(0.1))

model.add_module('l4_conv', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3), padding=(1,1)))
model.add_module('l4_bn', nn.BatchNorm2d(128))
model.add_module('l4_relu', nn.ReLU())
model.add_module('l4_pool', nn.MaxPool2d((2,2)))
model.add_module('l4_dropout', nn.Dropout(0.1))

model.add_module('l5_flatten', Flatten())

model.add_module('l6_linear', nn.Linear(512, 256))
model.add_module('l6_bn', nn.BatchNorm1d(256))
model.add_module('l6_relu', nn.ReLU())
model.add_module('l6_dropout', nn.Dropout(0.1))

model.add_module('l7_linear', nn.Linear(256, 128))
model.add_module('l7_bn', nn.BatchNorm1d(128))
model.add_module('l7_relu', nn.ReLU())
model.add_module('l7_dropout', nn.Dropout(0.1))

model.add_module('l_end', nn.Linear(128, 10))

In [20]:
train(model)

Epoch 1 of 20 took 119.636s
  training loss (in-iteration): 	1.535589
  training accuracy: 			56.97 %
  validation accuracy: 			56.11 %
Epoch 2 of 20 took 119.444s
  training loss (in-iteration): 	1.218669
  training accuracy: 			67.20 %
  validation accuracy: 			64.98 %
Epoch 3 of 20 took 119.910s
  training loss (in-iteration): 	1.075655
  training accuracy: 			70.42 %
  validation accuracy: 			67.44 %
Epoch 4 of 20 took 120.112s
  training loss (in-iteration): 	0.977414
  training accuracy: 			73.53 %
  validation accuracy: 			70.20 %
Epoch 5 of 20 took 120.476s
  training loss (in-iteration): 	0.912952
  training accuracy: 			74.21 %
  validation accuracy: 			69.55 %
Epoch 6 of 20 took 120.270s
  training loss (in-iteration): 	0.865610
  training accuracy: 			77.85 %
  validation accuracy: 			71.73 %
Epoch 7 of 20 took 120.794s
  training loss (in-iteration): 	0.822262
  training accuracy: 			79.93 %
  validation accuracy: 			72.82 %
Epoch 8 of 20 took 120.469s
  training loss (in-

In [21]:
test(model)

Final results:
  test accuracy:		76.08 %
Achievement unlocked: 80lvl Warlock!




# Report

All creative approaches are highly welcome, but at the very least it would be great to mention
* the idea;
* brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method and, again, why?
* Any regularizations and other techniques applied and their effects;


There is no need to write strict mathematical proofs (unless you want to).
 * "I tried this, this and this, and the second one turned out to be better. And I just didn't like the name of that one" - OK, but can be better
 * "I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such" - the ideal one
 * "I took that code from that demo without understanding it, but I'll never confess that and instead I'll make up some pseudoscientific explanation" - __not_ok__

### Hi, my name is `Chikunov Anton`, and here's my story

A long time ago in a galaxy far far away, when it was still more than an hour before the deadline, I got an idea:

* First of all, I tried a simple linear model and got $\approx$ 36%
* Playing with only linear layers didn't give any real boost, so I added a conv layer with pooling and got 57%-58%
* Then I added BN after all base layers and dropout before the end layer
* Also, power of 2 is power: all layer sizes are powers of two
* I played with different layer orderings and found two powerful patterns:
    * Conv Layer template:
        * Conv
        * BN
        * RELU
        * Pool
        * (Dropout can go here too, but my strongest configuration got 79.67% without it)
    * Linear Layer template:
        * Linear
        * BN
        * RELU
        * Dropout
* After that I got 79.67%, and that was the maximum of my warlock power. =( 
* Any other values only decreased this power. This is really disgusting. ("What a story, Mark")
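
The two templates in the list above can be sketched as small builder functions. This is an illustration, not the code used for the reported runs: `conv_block` and `linear_block` are hypothetical helpers, and the kernel size, padding and pooling values are taken from the final architecture.

```python
import torch
import torch.nn as nn


def conv_block(in_channels, out_channels, dropout=0.0):
    """Conv -> BN -> ReLU -> Pool (-> optional Dropout2d)."""
    layers = [
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.MaxPool2d(2),
    ]
    if dropout > 0:
        layers.append(nn.Dropout2d(dropout))
    return nn.Sequential(*layers)


def linear_block(in_features, out_features, dropout=0.2):
    """Linear -> BN -> ReLU -> Dropout."""
    return nn.Sequential(
        nn.Linear(in_features, out_features),
        nn.BatchNorm1d(out_features),
        nn.ReLU(),
        nn.Dropout(dropout),
    )


# Shape check on a fake CIFAR-sized batch (random data, not real images).
block = conv_block(3, 32)
features = block(torch.randn(4, 3, 32, 32))
print(features.shape)  # torch.Size([4, 32, 16, 16])
```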

BTW, layers such as convolutions and poolings are typically used for image classification. Dropout is used to avoid overfitting.

My monster:
```
Conv2d(in_channels=3, out_channels=32, kernel_size=(3,3), padding=(1,1))
BatchNorm2d(32)
ReLU()
MaxPool2d((2,2))

Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3), padding=(1,1))
BatchNorm2d(64)
ReLU()
MaxPool2d((2,2))

Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3), padding=(1,1))
BatchNorm2d(128)
ReLU()
MaxPool2d((2,2))

Conv2d(in_channels=128, out_channels=256, kernel_size=(3,3), padding=(1,1))
BatchNorm2d(256)
ReLU()
MaxPool2d((2,2))

Dropout(0.2)
Flatten()

Linear(1024, 512)
BatchNorm1d(512)
ReLU()
Dropout(0.2)

Linear(512, 256)
BatchNorm1d(256)
ReLU()
Dropout(0.2)

Linear(256, 128)
BatchNorm1d(128)
ReLU()
Dropout(0.2)

Linear(128, 10)
```
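
The input size of `Linear(1024, 512)` follows from the shapes: each padded 3x3 convolution keeps the spatial size, and each 2x2 pooling halves it. The sketch below traces this. `trace_shape` is a hypothetical helper that assumes stride-1 convolutions on square 32x32 inputs.

```python
def trace_shape(channels_per_block, size=32, kernel=3, padding=1, pool=2):
    """Follow (channels, height, width) through Conv+Pool blocks."""
    channels = None
    for channels in channels_per_block:
        size = size + 2 * padding - kernel + 1  # stride-1 convolution
        size = size // pool                     # max pooling
    return channels, size, size


c, h, w = trace_shape([32, 64, 128, 256])
print((c, h, w), c * h * w)  # (256, 2, 2) 1024
```

The same helper reproduces the `Linear(7200, 128)` of the first convolutional experiment: one unpadded block gives `trace_shape([32], padding=0) == (32, 15, 15)`, and 32 * 15 * 15 = 7200.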

Training method: 
* cross_entropy as the loss
* iterating over epochs in minibatches
* Adam optimizer (learning rate 0.003, the template default)

training accuracy:   99.12 %

validation accuracy: 79.69 %

test accuracy:		 79.67 %

[an optional afterword and mortal curses on assignment authors]