# Homework 2, *part 2* (60 points)

In this assignment you will build a heavy convolutional neural net (CNN) to solve Tiny ImageNet image classification. Try to achieve as high accuracy as possible.

## Deliverables

* This file,
* a "checkpoint file" from `torch.save(model.state_dict(), ...)` that contains model's weights (which a TA should be able to load to verify your accuracy).

## Grading

* 9 points for reproducible training code and a filled report below.
* 12 points for building a network that gets above 20% accuracy.
* 6.5 points for beating each of these milestones on the validation set:
  * 25.0%
  * 30.0%
  * 32.5%
  * 35.0%
  * 37.5%
  * 40.0%
    
## Restrictions

* Don't use pretrained networks.

## Tips

* One change at a time: never test several new things at once.
* Google a lot.
* Use GPU.
* Use regularization: L2, batch normalization, dropout, data augmentation.
* Use Tensorboard ([non-Colab](https://github.com/lanpa/tensorboardX) or [Colab](https://medium.com/@tommytao_54597/use-tensorboard-in-google-colab-16b4bb9812a6)) or a similar interactive tool for viewing progress.

# I add here tips from  [SDA](https://github.com/yandexdataschool/Practical_DL/blob/spring2019/homework02/homework_part2.ipynb) for myself


## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it could didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!



### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop when validation score is 10 iterations past maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add some L2 weight norm to the loss function, PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
   

### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take long without GPU. That's okay.
     * If you are CPU-only, we still recomment that you try a simple convolutional architecture
     * a perfect option is if you can set it up to run at nighttime and check it up at the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely an overkill.
     * __To reduce computation__ time by a factor in exchange for some accuracy drop, try using __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the default (stride=1) one.

  
### Data augmemntation
   * getting 5x as large dataset for free is a great 
     * Zoom-in+slice = move
     * Rotate+zoom(to remove black stripes)
     * Add Noize (gaussian or bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize```
     * and a few slicing
     * Other cool libraries: cv2, skimake, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.ImageFolder(root=path_to_tiny_imagenet, train=True, download=True, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

    ```
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.

In [0]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [0]:
import torchvision
import torch
from torchvision import transforms

# if you're running in colab,
# 1. go to Runtime -> Change Runtimy Type -> GPU
# 2. uncomment this:
# !wget https://raw.githubusercontent.com/yandexdataschool/Practical_DL/spring2019/week03_convnets/tiny_img.py -O tiny_img.py


In [3]:
import tiny_imagenet
tiny_imagenet.download(".")

./tiny-imagenet-200 already exists, not downloading


#### Let's make Data Augmentation (like it was mentioned on seminar and https://github.com/yandexdataschool/Practical_DL/blob/spring2019/homework02/homework_part2.ipynb)

In [0]:
imagenet_mean = np.array((0.4914, 0.4822, 0.4465))
imagenet_std = np.array((0.2023, 0.1994, 0.2010))

transform_train = transforms.Compose([
    torchvision.transforms.RandomCrop(64),
    torchvision.transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    torchvision.transforms.Normalize(imagenet_mean, imagenet_std),
])

In [0]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transform_train)
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])

Training and validation images are now in `tiny-imagenet-200/train` and `tiny-imagenet-200/val`.

#### Fix random_seed for reproducibility

In [6]:
np.random.seed(127)
torch.manual_seed(127)

<torch._C.Generator at 0x7f3703daca50>

In [0]:
batch_size = 50
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [0]:
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [0]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# a special module that converts [batch, channel, w, h] to [batch, units]
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

#### Let's start with a dense network for our baseline(like seminar):

In [0]:
model = nn.Sequential()

# reshape from "images" to flat vectors
model.add_module('flatten', Flatten())

# dense "head"
model.add_module('dense1', nn.Linear(3 * 64 * 64, 1064))
model.add_module('dense2', nn.Linear(1064, 512))
model.add_module('dropout0', nn.Dropout(0.05)) 
model.add_module('dense3', nn.Linear(512, 256))
model.add_module('dropout1', nn.Dropout(0.05))
model.add_module('dense4', nn.Linear(256, 64))
model.add_module('dropout2', nn.Dropout(0.05))
model.add_module('dense1_relu', nn.ReLU())
model.add_module('dense2_logits', nn.Linear(64, 200)) # logits for 200 classes

In [0]:
#  negative log-likelihood aka crossentropy.
def compute_loss(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model.cuda()(X_batch)
    return F.cross_entropy(logits, y_batch).mean()

In [0]:
import time

opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []
import numpy as np

opt = torch.optim.SGD(model.parameters(), lr=0.01)

train_loss = []
val_accuracy = []

num_epochs = 50 # total amount of full passes over training data

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

Epoch 1 of 50 took 57.029s
  training loss (in-iteration): 	4.682038
  validation accuracy: 			6.42 %
Epoch 2 of 50 took 56.634s
  training loss (in-iteration): 	4.656852
  validation accuracy: 			6.67 %
Epoch 3 of 50 took 57.060s
  training loss (in-iteration): 	4.635980
  validation accuracy: 			6.66 %
Epoch 4 of 50 took 56.976s
  training loss (in-iteration): 	4.611738
  validation accuracy: 			7.24 %
Epoch 5 of 50 took 57.391s
  training loss (in-iteration): 	4.588957
  validation accuracy: 			6.99 %
Epoch 6 of 50 took 57.603s
  training loss (in-iteration): 	4.570151
  validation accuracy: 			7.63 %
Epoch 7 of 50 took 57.373s
  training loss (in-iteration): 	4.551406
  validation accuracy: 			6.95 %
Epoch 8 of 50 took 57.512s
  training loss (in-iteration): 	4.534232
  validation accuracy: 			7.67 %
Epoch 9 of 50 took 57.602s
  training loss (in-iteration): 	4.517220
  validation accuracy: 			8.01 %
Epoch 10 of 50 took 57.624s
  training loss (in-iteration): 	4.502809
  validation

In [0]:
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(imagenet_mean, imagenet_std),
])

test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', 
                                                transform=transform_test)

In [0]:
test_batch_gen = torch.utils.data.DataLoader(test_dataset,
                                             batch_size=batch_size,
                                             shuffle=True,
                                             num_workers=1)

In [0]:
model.train(False) # disable dropout / use averages for batch_norm
test_batch_acc = []
for X_batch, y_batch in test_batch_gen:
    logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
    y_pred = logits.max(1)[1].data
    test_batch_acc.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))


test_accuracy = np.mean(test_batch_acc)
    
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 70:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 40:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 20:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructons below")

Final results:
  test accuracy:		9.00 %
We need more magic! Follow instructons below


#### Now let's try simple CNN with BN<br> And make function for fit and predict

In [0]:
def compute_loss_model(model,X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model.cuda()(X_batch)
    return F.cross_entropy(logits, y_batch).mean()

In [None]:
from tqdm import tqdm
import time
from torchsummary import summary

def fit(model, train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen, optim=torch.optim.SGD, num_epochs=15,  lr=1e-2, batch_size=50):

    opt = optim(model.parameters(), lr=lr)

    train_loss = []
    val_accuracy = []
    flag = 0
    all_val_accuracy = [0,0]
    max_accuracy = 0
    
    flag_changes = 0
    print(num_epochs)
    for epoch in tqdm(range(num_epochs)):
        # In each epoch, we do a full pass over the training data:
        start_time = time.time()
        model.train(True) # enable dropout / batch_norm training behavior
        for (X_batch, y_batch) in train_batch_gen:
            # train on batch
            loss = compute_loss_model(model,X_batch, y_batch)
            loss.backward()
            opt.step()
            opt.zero_grad()
            train_loss.append(loss.data.cpu().numpy())  
        model.train(False) # disable dropout / use averages for batch_norm
        for X_batch, y_batch in val_batch_gen:
            logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
            y_pred = logits.max(1)[1].data
            val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))
    
        print (epoch)
        # Then we print the results for this epoch:
        print("Epoch {} of {} took {:.3f}s".format(
            epoch + 1, num_epochs, time.time() - start_time))
        print("  training loss (in-iteration): \t{:.6f}".format(
            np.mean(train_loss[-len(train_dataset) // batch_size :])))
        print("  validation accuracy: \t\t\t{:.2f} %".format(
            np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
        all_val_accuracy.append(np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100)
        
        # earling-stop step
        if max(all_val_accuracy) > max_accuracy: 
            max_accuracy = max(all_val_accuracy)
            flag = 0
        else:
            flag += 1
        if(flag == 10):
            break
            
        if all_val_accuracy[-1]<all_val_accuracy[-2]:
            if flag_changes >5:
                break
            flag_changes+=1
        else:
            flag_changes = 0
            

In [0]:
def predict(model, test_batch_gen,train_accuracy=False):
    model.train(False) # disable dropout / use averages for batch_norm
    test_batch_acc = []
    for X_batch, y_batch in test_batch_gen:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        test_batch_acc.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))
    

    
    
    print("Final results:")
    if train_accuracy:
        test_accuracy = np.mean(test_batch_acc[-len(train_dataset) // batch_size :])
        print("  train accuracy:\t\t{:.2f} %".format(
        test_accuracy * 100))
        
    else:
        test_accuracy = np.mean(test_batch_acc)
        print("  test accuracy:\t\t{:.2f} %".format(
        test_accuracy * 100))

    if test_accuracy * 100 > 40:
        print("full points U'r freakin' amazin'!")
    elif test_accuracy * 100 > 37.5:
        print("Achievement unlocked: 90lvl Warlock!")
    elif test_accuracy * 100 > 35:
        print("Achievement unlocked: 80lvl Warlock!")
    elif test_accuracy * 100 > 32.5:
        print("Achievement unlocked: 70lvl Warlock!")
    elif test_accuracy * 100 > 30:
        print("Achievement unlocked: 60lvl Warlock!")
    elif test_accuracy * 100 > 25:
        print("Achievement unlocked: 50lvl Warlock!")
    else:
        print("We need more magic! Follow instructons below")

In [0]:
def simple_cnn():
    model = nn.Sequential()
    model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding = (1, 1)))
    model.add_module('norm1', nn.BatchNorm2d(64))
    model.add_module('relu1', nn.ReLU())
    model.add_module('pool1', nn.MaxPool2d(2))

    model.add_module('conv2', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding = (1,1)))
    model.add_module('norm2', nn.BatchNorm2d(128))
    model.add_module('relu2', nn.ReLU())
    model.add_module('pool2', nn.MaxPool2d(2))

    model.add_module('flat', Flatten())
    model.add_module('dense2', nn.Linear(128*16*16, 1024))
    model.add_module('norm__2', nn.BatchNorm1d(1024))
    model.add_module('relu2', nn.ReLU())
    model.add_module('dropout', nn.Dropout(0.5))


    model.add_module('dense1_logits', nn.Linear(1024, 200)) # logits for 200 classes
    model = model.cuda()
    return model

In [0]:
from torchsummary import summary
simple_cnn_model = simple_cnn()

summary(simple_cnn_model, (3, 64, 64))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 64, 64]           1,792
       BatchNorm2d-2           [-1, 64, 64, 64]             128
              ReLU-3           [-1, 64, 64, 64]               0
         MaxPool2d-4           [-1, 64, 32, 32]               0
            Conv2d-5          [-1, 128, 32, 32]          73,856
       BatchNorm2d-6          [-1, 128, 32, 32]             256
              ReLU-7          [-1, 128, 32, 32]               0
         MaxPool2d-8          [-1, 128, 16, 16]               0
           Flatten-9                [-1, 32768]               0
           Linear-10                 [-1, 1024]      33,555,456
      BatchNorm1d-11                 [-1, 1024]           2,048
          Dropout-12                 [-1, 1024]               0
           Linear-13                  [-1, 200]         205,000
Total params: 33,838,536
Trainable para

In [0]:
fit(simple_cnn_model,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam)

  0%|          | 0/15 [00:00<?, ?it/s]

15


  7%|▋         | 1/15 [01:02<14:39, 62.85s/it]

0
Epoch 1 of 15 took 62.845s
  training loss (in-iteration): 	4.511542
  validation accuracy: 			15.96 %


 13%|█▎        | 2/15 [02:05<13:37, 62.88s/it]

1
Epoch 2 of 15 took 62.969s
  training loss (in-iteration): 	3.971709
  validation accuracy: 			11.21 %


 20%|██        | 3/15 [03:09<12:36, 63.04s/it]

2
Epoch 3 of 15 took 63.387s
  training loss (in-iteration): 	3.669898
  validation accuracy: 			19.92 %


 27%|██▋       | 4/15 [04:13<11:38, 63.47s/it]

3
Epoch 4 of 15 took 64.469s
  training loss (in-iteration): 	3.327875
  validation accuracy: 			22.12 %


 33%|███▎      | 5/15 [05:16<10:33, 63.34s/it]

4
Epoch 5 of 15 took 63.039s
  training loss (in-iteration): 	2.856931
  validation accuracy: 			21.60 %


 40%|████      | 6/15 [06:19<09:28, 63.18s/it]

5
Epoch 6 of 15 took 62.818s
  training loss (in-iteration): 	2.278643
  validation accuracy: 			20.35 %


 47%|████▋     | 7/15 [07:22<08:24, 63.12s/it]

6
Epoch 7 of 15 took 62.961s
  training loss (in-iteration): 	1.760238
  validation accuracy: 			19.68 %


 53%|█████▎    | 8/15 [08:26<07:23, 63.33s/it]

7
Epoch 8 of 15 took 63.829s
  training loss (in-iteration): 	1.417063
  validation accuracy: 			18.14 %


 60%|██████    | 9/15 [09:29<06:19, 63.31s/it]

8
Epoch 9 of 15 took 63.250s
  training loss (in-iteration): 	1.237917
  validation accuracy: 			17.69 %


 67%|██████▋   | 10/15 [10:32<05:15, 63.18s/it]

9
Epoch 10 of 15 took 62.886s
  training loss (in-iteration): 	1.120424
  validation accuracy: 			17.75 %


KeyboardInterrupt: ignored

In [0]:
simple_cnn_model = simple_cnn()

In [0]:
fit(simple_cnn_model,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam,num_epochs=5)



  0%|          | 0/5 [00:00<?, ?it/s][A[A

5




 20%|██        | 1/5 [01:03<04:14, 63.59s/it][A[A

0
Epoch 1 of 5 took 63.592s
  training loss (in-iteration): 	4.509953
  validation accuracy: 			11.10 %




 40%|████      | 2/5 [02:06<03:10, 63.47s/it][A[A

1
Epoch 2 of 5 took 63.161s
  training loss (in-iteration): 	3.963488
  validation accuracy: 			18.50 %




 60%|██████    | 3/5 [03:09<02:06, 63.32s/it][A[A

2
Epoch 3 of 5 took 62.960s
  training loss (in-iteration): 	3.682571
  validation accuracy: 			19.18 %




 80%|████████  | 4/5 [04:13<01:03, 63.40s/it][A[A

3
Epoch 4 of 5 took 63.587s
  training loss (in-iteration): 	3.367103
  validation accuracy: 			20.68 %




100%|██████████| 5/5 [05:16<00:00, 63.31s/it][A[A

[A[A

4
Epoch 5 of 5 took 63.079s
  training loss (in-iteration): 	2.961023
  validation accuracy: 			21.70 %


In [0]:
torch.save(simple_cnn_model, 'simple_cnn_model.ckpt')
torch.save(simple_cnn_model.state_dict(), 'simple_cnn_model.ckpt')

  "type " + obj.__name__ + ". It won't be checked "


In [0]:
predict(simple_cnn_model,test_batch_gen)

Final results:
  test accuracy:		21.07 %
Achievement unlocked: 60lvl Warlock!


### But it's not enough let's make another type of model

### Let's try VGG-like CNN: (it was mentioned in many papers)

* [1](http://cs231n.stanford.edu/reports/2017/posters/931.pdf)
* [2](http://cs231n.stanford.edu/reports/2017/pdfs/940.pdf)
* [3](https://neurohive.io/en/popular-networks/vgg16/) picture from here
* [4](https://www3.cs.stonybrook.edu/~zekzhang/cnn_classifier.html)

<img src="https://neurohive.io/wp-content/uploads/2018/11/Capture-564x570.jpg">

#### Firstly, I would like to make model like vgg16(D) without 2 last consistent layers(512 & 512) because <br> many VGG models which worked on ImageNet dataset have final picture shape (8*8)

In [0]:
def vgg_3_conv():
    model = nn.Sequential()
    
    model.add_module('conv1_1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding = 1))
    model.add_module('norm1_1', nn.BatchNorm2d(64))
    model.add_module('relu1_1', nn.ReLU())
    model.add_module('conv1_2', nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding = 1))
    model.add_module('norm1_2', nn.BatchNorm2d(64))
    model.add_module('relu1_2', nn.ReLU())
    model.add_module('pool1', nn.MaxPool2d(2))#64*32*32

    model.add_module('conv2_1', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding = 1))
    model.add_module('norm2_1', nn.BatchNorm2d(128))
    model.add_module('relu2_1', nn.ReLU())
    model.add_module('conv2_2', nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding = 1))
    model.add_module('norm2_2', nn.BatchNorm2d(128))
    model.add_module('relu2_2', nn.ReLU())
    model.add_module('pool2', nn.MaxPool2d(2))#128*16*16
    
    model.add_module('conv3_1', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding = 1))
    model.add_module('norm3_1', nn.BatchNorm2d(256))
    model.add_module('relu3_1', nn.ReLU())
    model.add_module('conv3_2', nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding = 1))
    model.add_module('norm3_2', nn.BatchNorm2d(256))
    model.add_module('relu3_2', nn.ReLU())
    model.add_module('conv3_3', nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding = 1))
    model.add_module('norm3_3', nn.BatchNorm2d(256))
    model.add_module('relu3_3', nn.ReLU())
    model.add_module('pool3', nn.MaxPool2d(2))#256*8*8
    
    

    model.add_module('flat', Flatten())
    model.add_module('dense3', nn.Linear(16384, 2048))
    model.add_module('norm__3', nn.BatchNorm1d(2048))
    model.add_module('relu__3', nn.ReLU())
    model.add_module('dropout3', nn.Dropout(0.5))
    
    model.add_module('dense2', nn.Linear(2048, 1024))
    model.add_module('norm__2', nn.BatchNorm1d(1024))
    model.add_module('relu__2', nn.ReLU())
    model.add_module('dropout2', nn.Dropout(0.5))
    #was without

    model.add_module('dense1_logits', nn.Linear(1024, 200)) # logits for 200 classes
    model = model.cuda()
    return model

In [0]:
model_vgg_3 = vgg_3_conv()

summary(model_vgg_3, (3, 64, 64))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 64, 64]           1,792
       BatchNorm2d-2           [-1, 64, 64, 64]             128
              ReLU-3           [-1, 64, 64, 64]               0
            Conv2d-4           [-1, 64, 64, 64]          36,928
       BatchNorm2d-5           [-1, 64, 64, 64]             128
              ReLU-6           [-1, 64, 64, 64]               0
         MaxPool2d-7           [-1, 64, 32, 32]               0
            Conv2d-8          [-1, 128, 32, 32]          73,856
       BatchNorm2d-9          [-1, 128, 32, 32]             256
             ReLU-10          [-1, 128, 32, 32]               0
           Conv2d-11          [-1, 128, 32, 32]         147,584
      BatchNorm2d-12          [-1, 128, 32, 32]             256
             ReLU-13          [-1, 128, 32, 32]               0
        MaxPool2d-14          [-1, 128,

In [0]:
fit(model_vgg_3,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam)





  0%|          | 0/15 [00:00<?, ?it/s][A[A[A[A

15






  7%|▋         | 1/15 [03:41<51:39, 221.39s/it][A[A[A[A

0
Epoch 1 of 15 took 221.387s
  training loss (in-iteration): 	4.873892
  validation accuracy: 			7.52 %






 13%|█▎        | 2/15 [07:20<47:50, 220.79s/it][A[A[A[A

1
Epoch 2 of 15 took 219.386s
  training loss (in-iteration): 	4.094277
  validation accuracy: 			15.68 %






 20%|██        | 3/15 [10:59<44:00, 220.03s/it][A[A[A[A

2
Epoch 3 of 15 took 218.244s
  training loss (in-iteration): 	3.643180
  validation accuracy: 			17.41 %






 27%|██▋       | 4/15 [14:37<40:13, 219.41s/it][A[A[A[A

3
Epoch 4 of 15 took 217.972s
  training loss (in-iteration): 	3.376798
  validation accuracy: 			24.31 %






 33%|███▎      | 5/15 [18:14<36:27, 218.71s/it][A[A[A[A

4
Epoch 5 of 15 took 217.054s
  training loss (in-iteration): 	3.148226
  validation accuracy: 			27.05 %






 40%|████      | 6/15 [21:49<32:40, 217.87s/it][A[A[A[A

5
Epoch 6 of 15 took 215.911s
  training loss (in-iteration): 	2.923218
  validation accuracy: 			29.88 %






 47%|████▋     | 7/15 [25:25<28:57, 217.18s/it][A[A[A[A

6
Epoch 7 of 15 took 215.540s
  training loss (in-iteration): 	2.686039
  validation accuracy: 			31.05 %






 53%|█████▎    | 8/15 [29:00<25:16, 216.59s/it][A[A[A[A

7
Epoch 8 of 15 took 215.211s
  training loss (in-iteration): 	2.430439
  validation accuracy: 			31.92 %






 60%|██████    | 9/15 [32:35<21:37, 216.18s/it][A[A[A[A

8
Epoch 9 of 15 took 215.201s
  training loss (in-iteration): 	2.154887
  validation accuracy: 			32.23 %






 67%|██████▋   | 10/15 [36:11<17:59, 215.85s/it][A[A[A[A

9
Epoch 10 of 15 took 215.062s
  training loss (in-iteration): 	1.885117
  validation accuracy: 			32.87 %






 73%|███████▎  | 11/15 [39:46<14:22, 215.59s/it][A[A[A[A

10
Epoch 11 of 15 took 215.002s
  training loss (in-iteration): 	1.623405
  validation accuracy: 			32.16 %






 80%|████████  | 12/15 [43:21<10:46, 215.41s/it][A[A[A[A

11
Epoch 12 of 15 took 214.975s
  training loss (in-iteration): 	1.404879
  validation accuracy: 			31.87 %






 87%|████████▋ | 13/15 [46:56<07:10, 215.32s/it][A[A[A[A

12
Epoch 13 of 15 took 215.086s
  training loss (in-iteration): 	1.211562
  validation accuracy: 			32.18 %






 93%|█████████▎| 14/15 [50:31<03:35, 215.23s/it][A[A[A[A

13
Epoch 14 of 15 took 215.004s
  training loss (in-iteration): 	1.054120
  validation accuracy: 			30.38 %
14
Epoch 15 of 15 took 215.040s
  training loss (in-iteration): 	0.922970
  validation accuracy: 			31.37 %


In [0]:
predict(model_vgg_3,test_batch_gen)

Final results:
  test accuracy:		31.90 %
Achievement unlocked: 70lvl Warlock!


In [0]:
predict(model_vgg_3, train_batch_gen,train_accuracy=True)

#### Let's try more epoch <br>may be it will exit from plato

In [0]:
fit(model_vgg_3,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam, num_epochs=5)






  0%|          | 0/5 [00:00<?, ?it/s][A[A[A[A[A

5







 20%|██        | 1/5 [03:36<14:24, 216.11s/it][A[A[A[A[A

0
Epoch 1 of 5 took 216.110s
  training loss (in-iteration): 	0.837569
  validation accuracy: 			30.99 %







 40%|████      | 2/5 [07:11<10:47, 215.79s/it][A[A[A[A[A

1
Epoch 2 of 5 took 215.033s
  training loss (in-iteration): 	0.740992
  validation accuracy: 			31.26 %







 60%|██████    | 3/5 [10:46<07:11, 215.54s/it][A[A[A[A[A

2
Epoch 3 of 5 took 214.938s
  training loss (in-iteration): 	0.672289
  validation accuracy: 			30.54 %







 80%|████████  | 4/5 [14:21<03:35, 215.35s/it][A[A[A[A[A

3
Epoch 4 of 5 took 214.914s
  training loss (in-iteration): 	0.614868
  validation accuracy: 			30.66 %







100%|██████████| 5/5 [17:56<00:00, 215.25s/it][A[A[A[A[A




[A[A[A[A[A

4
Epoch 5 of 5 took 215.013s
  training loss (in-iteration): 	0.559078
  validation accuracy: 			31.30 %


In [0]:
predict(model_vgg_3,test_batch_gen)

Final results:
  test accuracy:		32.18 %
Achievement unlocked: 70lvl Warlock!


In [0]:
predict(model_vgg_3, train_batch_gen,train_accuracy=True)

In [0]:
torch.save(model_vgg_3, 'model_vgg_3_20.ckpt')
torch.save(model_vgg_3.state_dict(), 'model_vgg_3_20_params.ckpt')

#### Despite val accuracy oscillates from one epoch to another <br> Our test accuracy increase

In [0]:
fit(model_vgg_3,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam, num_epochs=5)






  0%|          | 0/5 [00:00<?, ?it/s][A[A[A[A[A

5







 20%|██        | 1/5 [03:35<14:20, 215.18s/it][A[A[A[A[A

0
Epoch 1 of 5 took 215.182s
  training loss (in-iteration): 	0.550353
  validation accuracy: 			30.36 %







 40%|████      | 2/5 [07:10<10:45, 215.13s/it][A[A[A[A[A

1
Epoch 2 of 5 took 214.998s
  training loss (in-iteration): 	0.500117
  validation accuracy: 			30.78 %







 60%|██████    | 3/5 [10:45<07:10, 215.11s/it][A[A[A[A[A

2
Epoch 3 of 5 took 215.061s
  training loss (in-iteration): 	0.458434
  validation accuracy: 			31.20 %







 80%|████████  | 4/5 [14:20<03:35, 215.05s/it][A[A[A[A[A

3
Epoch 4 of 5 took 214.884s
  training loss (in-iteration): 	0.441444
  validation accuracy: 			31.40 %







100%|██████████| 5/5 [17:55<00:00, 215.04s/it][A[A[A[A[A




[A[A[A[A[A

4
Epoch 5 of 5 took 215.015s
  training loss (in-iteration): 	0.419600
  validation accuracy: 			29.89 %


In [0]:
predict(model_vgg_3,test_batch_gen)

Final results:
  test accuracy:		30.29 %
Achievement unlocked: 70lvl Warlock!


In [0]:
predict(model_vgg_3, train_batch_gen,train_accuracy=True)

In [0]:
torch.save(model_vgg_3, 'model_vgg_3.ckpt')
torch.save(model_vgg_3.state_dict(), 'model_vgg_3.ckpt')

  "type " + obj.__name__ + ". It won't be checked "


### There is no point in continuing. Let's try new model

### Model will be also VGG16-like(D) but without last consistent layer (512).<br> I suppose that CNN will have problem to train with image shape 2*2

In [0]:
def vgg_4_conv():
    model = nn.Sequential()
    
    model.add_module('conv1_1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding = 1))
    model.add_module('norm1_1', nn.BatchNorm2d(64))
    model.add_module('relu1_1', nn.ReLU())
    model.add_module('conv1_2', nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding = 1))
    model.add_module('norm1_2', nn.BatchNorm2d(64))
    model.add_module('relu1_2', nn.ReLU())
    model.add_module('pool1', nn.MaxPool2d(2))#64*32*32

    model.add_module('conv2_1', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding = 1))
    model.add_module('norm2_1', nn.BatchNorm2d(128))
    model.add_module('relu2_1', nn.ReLU())
    model.add_module('conv2_2', nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding = 1))
    model.add_module('norm2_2', nn.BatchNorm2d(128))
    model.add_module('relu2_2', nn.ReLU())
    model.add_module('pool2', nn.MaxPool2d(2))#128*16*16
    
    model.add_module('conv3_1', nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding = 1))
    model.add_module('norm3_1', nn.BatchNorm2d(256))
    model.add_module('relu3_1', nn.ReLU())
    model.add_module('conv3_2', nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding = 1))
    model.add_module('norm3_2', nn.BatchNorm2d(256))
    model.add_module('relu3_2', nn.ReLU())
    model.add_module('conv3_3', nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding = 1))
    model.add_module('norm3_3', nn.BatchNorm2d(256))
    model.add_module('relu3_3', nn.ReLU())
    model.add_module('pool3', nn.MaxPool2d(2))#256*8*8
    
    model.add_module('conv4_1', nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding = 1))
    model.add_module('norm4_1', nn.BatchNorm2d(512))
    model.add_module('relu4_1', nn.ReLU())
    model.add_module('conv4_2', nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding = 1))
    model.add_module('norm4_2', nn.BatchNorm2d(512))
    model.add_module('relu4_2', nn.ReLU())
    model.add_module('conv4_3', nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding = 1))
    model.add_module('norm4_3', nn.BatchNorm2d(512))
    model.add_module('relu4_3', nn.ReLU())
    model.add_module('pool4', nn.MaxPool2d(2))#512*4*4

    model.add_module('flat', Flatten())
    model.add_module('dense3', nn.Linear(512*4*4, 512*4))
    model.add_module('norm__3', nn.BatchNorm1d(512*4))
    model.add_module('relu__3', nn.ReLU())
    model.add_module('dropout', nn.Dropout(0.5))
    
    model.add_module('dense2', nn.Linear(512*4, 1024))
    model.add_module('norm__2', nn.BatchNorm1d(1024))
    model.add_module('relu__2', nn.ReLU())
#     model.add_module('dropout', nn.Dropout(0.5))
    

    model.add_module('dense1_logits', nn.Linear(1024, 200)) # logits for 200 classes
    model = model.cuda()
    return model

In [17]:
model_vgg_4 = vgg_4_conv()

summary(model_vgg_4, (3, 64, 64))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 64, 64]           1,792
       BatchNorm2d-2           [-1, 64, 64, 64]             128
              ReLU-3           [-1, 64, 64, 64]               0
            Conv2d-4           [-1, 64, 64, 64]          36,928
       BatchNorm2d-5           [-1, 64, 64, 64]             128
              ReLU-6           [-1, 64, 64, 64]               0
         MaxPool2d-7           [-1, 64, 32, 32]               0
            Conv2d-8          [-1, 128, 32, 32]          73,856
       BatchNorm2d-9          [-1, 128, 32, 32]             256
             ReLU-10          [-1, 128, 32, 32]               0
           Conv2d-11          [-1, 128, 32, 32]         147,584
      BatchNorm2d-12          [-1, 128, 32, 32]             256
             ReLU-13          [-1, 128, 32, 32]               0
        MaxPool2d-14          [-1, 128,

In [18]:
fit(model_vgg_4,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam,lr=1e-4)

  0%|          | 0/15 [00:00<?, ?it/s]

15


  7%|▋         | 1/15 [04:11<58:38, 251.34s/it]

0
Epoch 1 of 15 took 251.339s
  training loss (in-iteration): 	4.458200
  validation accuracy: 			15.25 %


 13%|█▎        | 2/15 [08:29<54:52, 253.28s/it]

1
Epoch 2 of 15 took 257.797s
  training loss (in-iteration): 	3.564189
  validation accuracy: 			22.86 %


 20%|██        | 3/15 [12:47<50:57, 254.76s/it]

2
Epoch 3 of 15 took 258.204s
  training loss (in-iteration): 	3.132055
  validation accuracy: 			29.72 %


 27%|██▋       | 4/15 [17:05<46:54, 255.83s/it]

3
Epoch 4 of 15 took 258.325s
  training loss (in-iteration): 	2.819005
  validation accuracy: 			33.34 %


 33%|███▎      | 5/15 [21:23<42:45, 256.55s/it]

4
Epoch 5 of 15 took 258.226s
  training loss (in-iteration): 	2.576796
  validation accuracy: 			37.38 %


 40%|████      | 6/15 [25:42<38:34, 257.15s/it]

5
Epoch 6 of 15 took 258.546s
  training loss (in-iteration): 	2.375805
  validation accuracy: 			40.72 %


 47%|████▋     | 7/15 [30:00<34:19, 257.48s/it]

6
Epoch 7 of 15 took 258.249s
  training loss (in-iteration): 	2.192314
  validation accuracy: 			41.89 %


 53%|█████▎    | 8/15 [34:19<30:04, 257.76s/it]

7
Epoch 8 of 15 took 258.411s
  training loss (in-iteration): 	2.028504
  validation accuracy: 			42.63 %


 60%|██████    | 9/15 [38:37<25:47, 257.86s/it]

8
Epoch 9 of 15 took 258.085s
  training loss (in-iteration): 	1.877708
  validation accuracy: 			44.66 %


 67%|██████▋   | 10/15 [42:55<21:29, 257.98s/it]

9
Epoch 10 of 15 took 258.259s
  training loss (in-iteration): 	1.727335
  validation accuracy: 			45.98 %


 73%|███████▎  | 11/15 [47:13<17:12, 258.03s/it]

10
Epoch 11 of 15 took 258.144s
  training loss (in-iteration): 	1.583926
  validation accuracy: 			47.16 %


 80%|████████  | 12/15 [51:31<12:54, 258.07s/it]

11
Epoch 12 of 15 took 258.150s
  training loss (in-iteration): 	1.444984
  validation accuracy: 			46.58 %


 87%|████████▋ | 13/15 [55:50<08:36, 258.13s/it]

12
Epoch 13 of 15 took 258.271s
  training loss (in-iteration): 	1.313184
  validation accuracy: 			47.36 %


 93%|█████████▎| 14/15 [1:00:08<04:18, 258.13s/it]

13
Epoch 14 of 15 took 258.138s
  training loss (in-iteration): 	1.180415
  validation accuracy: 			47.43 %


100%|██████████| 15/15 [1:04:26<00:00, 258.17s/it]

14
Epoch 15 of 15 took 258.256s
  training loss (in-iteration): 	1.051286
  validation accuracy: 			46.07 %





In [19]:
torch.save(model_vgg_4, 'model_vgg_4_15.ckpt')
torch.save(model_vgg_4.state_dict(), 'model_vgg_4_15_dict.ckpt')

  "type " + obj.__name__ + ". It won't be checked "


In [20]:
predict(model_vgg_4,test_batch_gen)

Final results:
  test accuracy:		45.73 %
full points U'r freakin' amazin'!


In [21]:
predict(model_vgg_4, train_batch_gen,train_accuracy=True)

Final results:
  train accuracy:		80.78 %
full points U'r freakin' amazin'!


#### The model took necessary points but try to increase her accuracy.

In [22]:
fit(model_vgg_4,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam,lr=1e-4,num_epochs=5)

  0%|          | 0/5 [00:00<?, ?it/s]

5


 20%|██        | 1/5 [04:11<16:44, 251.07s/it]

0
Epoch 1 of 5 took 251.074s
  training loss (in-iteration): 	0.937667
  validation accuracy: 			48.11 %


 40%|████      | 2/5 [08:28<12:38, 252.96s/it]

1
Epoch 2 of 5 took 257.361s
  training loss (in-iteration): 	0.826441
  validation accuracy: 			47.03 %


 60%|██████    | 3/5 [12:46<08:28, 254.49s/it]

2
Epoch 3 of 5 took 258.038s
  training loss (in-iteration): 	0.719867
  validation accuracy: 			45.55 %


 80%|████████  | 4/5 [17:04<04:15, 255.48s/it]

3
Epoch 4 of 5 took 257.807s
  training loss (in-iteration): 	0.639638
  validation accuracy: 			46.85 %


100%|██████████| 5/5 [21:22<00:00, 256.23s/it]

4
Epoch 5 of 5 took 257.955s
  training loss (in-iteration): 	0.559397
  validation accuracy: 			47.82 %





In [23]:
predict(model_vgg_4,test_batch_gen)

Final results:
  test accuracy:		47.27 %
full points U'r freakin' amazin'!


predict(model_vgg_4, train_batch_gen,train_accuracy=True)

In [24]:
predict(model_vgg_4, train_batch_gen,train_accuracy=True)

Final results:
  train accuracy:		93.92 %
full points U'r freakin' amazin'!


#### Let's try more epoch, may be it will increase test accuracy despite reduction of val accuracy

In [25]:
fit(model_vgg_4,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adam,lr=1e-4,num_epochs=5)

  0%|          | 0/5 [00:00<?, ?it/s]

5


 20%|██        | 1/5 [04:18<17:12, 258.03s/it]

0
Epoch 1 of 5 took 258.033s
  training loss (in-iteration): 	0.512136
  validation accuracy: 			46.56 %


 40%|████      | 2/5 [08:35<12:53, 257.98s/it]

1
Epoch 2 of 5 took 257.839s
  training loss (in-iteration): 	0.441928
  validation accuracy: 			46.19 %


 60%|██████    | 3/5 [12:53<08:35, 257.89s/it]

2
Epoch 3 of 5 took 257.701s
  training loss (in-iteration): 	0.395127
  validation accuracy: 			46.62 %


 80%|████████  | 4/5 [17:11<04:17, 257.86s/it]

3
Epoch 4 of 5 took 257.773s
  training loss (in-iteration): 	0.356969
  validation accuracy: 			46.74 %


100%|██████████| 5/5 [21:29<00:00, 257.85s/it]

4
Epoch 5 of 5 took 257.843s
  training loss (in-iteration): 	0.322891
  validation accuracy: 			46.85 %





In [0]:
predict(model_vgg_4,test_batch_gen)

Final results:
  test accuracy:		45.96 %
full points U'r freakin' amazin'!


In [0]:
predict(model_vgg_4,test_batch_gen)


Final results:
  test accuracy:		45.96 %
full points U'r freakin' amazin'!


#### As we can see model_vgg_4 has the maximum of test accuracy 47.27 <br> and after 25 epoch our train approximately 99% that'is why it will not give better quality

### Let's try final model without augmentation

In [0]:
dataset1 = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transforms.ToTensor())
test_dataset1 = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', transform=transforms.ToTensor())
train_dataset1, val_dataset1 = torch.utils.data.random_split(dataset, [80000, 20000])


In [0]:
batch_size = 50
train_batch_gen1 = torch.utils.data.DataLoader(train_dataset1, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [0]:
val_batch_gen1 = torch.utils.data.DataLoader(val_dataset1, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [0]:
test_batch_gen1 = torch.utils.data.DataLoader(test_dataset1, 
                                             batch_size=batch_size,
                                             shuffle=True,
                                             num_workers=1)

In [0]:
model_vgg_4 = vgg_4_conv()

In [31]:
fit(model_vgg_4,train_batch_gen=train_batch_gen1, val_batch_gen=val_batch_gen1,optim=torch.optim.Adam,lr=1e-4)

  0%|          | 0/15 [00:00<?, ?it/s]

15


  7%|▋         | 1/15 [04:13<59:12, 253.74s/it]

0
Epoch 1 of 15 took 253.740s
  training loss (in-iteration): 	4.437639
  validation accuracy: 			16.30 %


 13%|█▎        | 2/15 [08:31<55:14, 254.94s/it]

1
Epoch 2 of 15 took 257.731s
  training loss (in-iteration): 	3.536462
  validation accuracy: 			23.95 %


 20%|██        | 3/15 [12:49<51:10, 255.85s/it]

2
Epoch 3 of 15 took 257.960s
  training loss (in-iteration): 	3.106857
  validation accuracy: 			30.75 %


 27%|██▋       | 4/15 [17:07<47:02, 256.57s/it]

3
Epoch 4 of 15 took 258.263s
  training loss (in-iteration): 	2.805357
  validation accuracy: 			33.35 %


 33%|███▎      | 5/15 [21:26<42:51, 257.12s/it]

4
Epoch 5 of 15 took 258.400s
  training loss (in-iteration): 	2.563449
  validation accuracy: 			37.51 %


 40%|████      | 6/15 [25:44<38:36, 257.43s/it]

5
Epoch 6 of 15 took 258.160s
  training loss (in-iteration): 	2.362634
  validation accuracy: 			38.72 %


 47%|████▋     | 7/15 [30:02<34:21, 257.66s/it]

6
Epoch 7 of 15 took 258.199s
  training loss (in-iteration): 	2.184224
  validation accuracy: 			40.91 %


 53%|█████▎    | 8/15 [34:20<30:04, 257.78s/it]

7
Epoch 8 of 15 took 258.065s
  training loss (in-iteration): 	2.021948
  validation accuracy: 			42.99 %


 60%|██████    | 9/15 [38:38<25:47, 257.92s/it]

8
Epoch 9 of 15 took 258.220s
  training loss (in-iteration): 	1.867450
  validation accuracy: 			44.28 %


 67%|██████▋   | 10/15 [42:56<21:30, 258.01s/it]

9
Epoch 10 of 15 took 258.228s
  training loss (in-iteration): 	1.721002
  validation accuracy: 			44.81 %


 73%|███████▎  | 11/15 [47:15<17:12, 258.06s/it]

10
Epoch 11 of 15 took 258.158s
  training loss (in-iteration): 	1.574139
  validation accuracy: 			45.52 %


 80%|████████  | 12/15 [51:33<12:54, 258.05s/it]

11
Epoch 12 of 15 took 258.025s
  training loss (in-iteration): 	1.434782
  validation accuracy: 			45.85 %


 87%|████████▋ | 13/15 [55:51<08:36, 258.00s/it]

12
Epoch 13 of 15 took 257.899s
  training loss (in-iteration): 	1.300968
  validation accuracy: 			46.33 %


 93%|█████████▎| 14/15 [1:00:09<04:18, 258.07s/it]

13
Epoch 14 of 15 took 258.221s
  training loss (in-iteration): 	1.165500
  validation accuracy: 			45.41 %


100%|██████████| 15/15 [1:04:27<00:00, 258.02s/it]

14
Epoch 15 of 15 took 257.909s
  training loss (in-iteration): 	1.038347
  validation accuracy: 			46.23 %





In [32]:
predict(model_vgg_4,test_batch_gen)

Final results:
  test accuracy:		46.27 %
full points U'r freakin' amazin'!


In [33]:
torch.save(model_vgg_4, 'model_vgg_4_15_non_augmentation.ckpt')
torch.save(model_vgg_4.state_dict(), 'model_vgg_4_15_dict_non_augmentation.ckpt')

  "type " + obj.__name__ + ". It won't be checked "


### Let's try different optimizators for final model

### Adagrad

In [0]:
model_vgg_4 = vgg_4_conv()

In [35]:
fit(model_vgg_4,train_batch_gen=train_batch_gen, val_batch_gen=val_batch_gen,optim=torch.optim.Adagrad,lr=1e-4)

  0%|          | 0/15 [00:00<?, ?it/s]

15


  7%|▋         | 1/15 [04:12<59:01, 253.00s/it]

0
Epoch 1 of 15 took 252.994s
  training loss (in-iteration): 	5.190955
  validation accuracy: 			4.76 %


 13%|█▎        | 2/15 [08:26<54:51, 253.17s/it]

1
Epoch 2 of 15 took 253.583s
  training loss (in-iteration): 	5.065718
  validation accuracy: 			5.78 %


 20%|██        | 3/15 [12:40<50:39, 253.33s/it]

2
Epoch 3 of 15 took 253.680s
  training loss (in-iteration): 	4.995880
  validation accuracy: 			6.40 %


 27%|██▋       | 4/15 [16:53<46:27, 253.42s/it]

3
Epoch 4 of 15 took 253.646s
  training loss (in-iteration): 	4.944571
  validation accuracy: 			6.75 %


 33%|███▎      | 5/15 [21:07<42:15, 253.52s/it]

4
Epoch 5 of 15 took 253.743s
  training loss (in-iteration): 	4.899580
  validation accuracy: 			7.32 %


 40%|████      | 6/15 [25:21<38:01, 253.52s/it]

5
Epoch 6 of 15 took 253.503s
  training loss (in-iteration): 	4.862000
  validation accuracy: 			7.80 %


 47%|████▋     | 7/15 [29:34<33:48, 253.51s/it]

6
Epoch 7 of 15 took 253.487s
  training loss (in-iteration): 	4.825287
  validation accuracy: 			8.36 %


 53%|█████▎    | 8/15 [33:47<29:33, 253.34s/it]

7
Epoch 8 of 15 took 252.962s
  training loss (in-iteration): 	4.795421
  validation accuracy: 			8.50 %


 60%|██████    | 9/15 [38:01<25:20, 253.39s/it]

8
Epoch 9 of 15 took 253.487s
  training loss (in-iteration): 	4.767249
  validation accuracy: 			9.00 %


 67%|██████▋   | 10/15 [42:14<21:06, 253.35s/it]

9
Epoch 10 of 15 took 253.254s
  training loss (in-iteration): 	4.740455
  validation accuracy: 			9.29 %


 73%|███████▎  | 11/15 [46:27<16:53, 253.38s/it]

10
Epoch 11 of 15 took 253.455s
  training loss (in-iteration): 	4.716852
  validation accuracy: 			9.56 %


 80%|████████  | 12/15 [50:40<12:39, 253.27s/it]

11
Epoch 12 of 15 took 253.014s
  training loss (in-iteration): 	4.694594
  validation accuracy: 			9.74 %


 87%|████████▋ | 13/15 [54:54<08:26, 253.28s/it]

12
Epoch 13 of 15 took 253.299s
  training loss (in-iteration): 	4.672462
  validation accuracy: 			10.33 %


 93%|█████████▎| 14/15 [59:07<04:13, 253.30s/it]

13
Epoch 14 of 15 took 253.354s
  training loss (in-iteration): 	4.651579
  validation accuracy: 			10.41 %


100%|██████████| 15/15 [1:03:20<00:00, 253.35s/it]

14
Epoch 15 of 15 took 253.442s
  training loss (in-iteration): 	4.632497
  validation accuracy: 			10.66 %





In [36]:
predict(model_vgg_4,test_batch_gen)

Final results:
  test accuracy:		10.49 %
We need more magic! Follow instructons below


In [37]:
predict(model_vgg_4, train_batch_gen,train_accuracy=True)

Final results:
  train accuracy:		11.37 %
We need more magic! Follow instructons below


# Report

Below, please mention

* a brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method (batch size, optimization algorithm, ...) and why?
* Any regularization and other techniques applied and their effects;

The reference format is:

*"I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such".*

After models from seminar(where I took accuracy equal to 20%), I tried simple model with many layers < you wrote<br> "This task can be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.">.<br> (I understand that low dimensions of image shape can beat me)

<img src='photo_2019-04-16_00-36-56.jpg'>

And (after many hours) I achieved good val accuracy(more 40%) after 25 epochs but have a problem with low test accuracy approximately 7% (on val 40.99%).

After googling (you can see attached links before model_vgg_3) I found a good article about [VGG-model](https://neurohive.io/en/popular-networks/vgg16/) and tried to implement it.<br>
As you can see I made  VGG16(D)  models without the last (1 or 2) layers because I think that smaller shapes as in my first model will not achieved  a good result on test dataset.

Firstly, I tried VGG like model without 2 last convolution layers and take learning rate = 1e-2. Unfortunately, I didn't get necessary accuracy and want to try MOAR layers. 

Also, I added everywhere BN because many papers (seminars) report that it increases speed of convergence and I added everywhere Dropout to prevent overfitting.

If I added dropout after every linear-bn-relu layer it decreases accuracy of model and time of learning( I think that I excluded too much information) . And I decided to put one dropout after first dense layer.

Secondly, I made model_vgg_4 and try it with learing rate = 1e-2.Then I didn't take accuracy on val and test more then 34. After many changes and my tears, I change learning rate to 1e-4 and got results above (best result val and test accuracy approximately 48). After 15 iteration the model started to approach to overfittind and I continued my reserch.  

Thirdly, I build model_vgg_4 on not augmented data and got the same results. I think specially for last architecture  It doesn't influence on accuracy.

Finally, I changed optimizer to Adagrad and got bad quality(accuracy) of this model.

Also, you can see my child version of early stopping in fit function.
