# Задание 3.2 - сверточные нейронные сети (CNNs) в PyTorch

Это упражнение мы буде выполнять в Google Colab - https://colab.research.google.com/  
Google Colab позволяет запускать код в notebook в облаке Google, где можно воспользоваться бесплатным GPU!  

Авторы курса благодарят компанию Google и надеятся, что праздник не закончится.

Туториал по настройке Google Colab:  
https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d  
(Keras инсталлировать не нужно, наш notebook сам установит PyTorch)


In [1]:
# Intstall PyTorch and download data
!pip3 install torch torchvision

!wget -c http://ufldl.stanford.edu/housenumbers/train_32x32.mat http://ufldl.stanford.edu/housenumbers/test_32x32.mat

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
--2023-03-26 06:34:16--  http://ufldl.stanford.edu/housenumbers/train_32x32.mat
Resolving ufldl.stanford.edu (ufldl.stanford.edu)... 171.64.68.10
Connecting to ufldl.stanford.edu (ufldl.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 182040794 (174M) [text/plain]
Saving to: ‘train_32x32.mat’


2023-03-26 06:34:31 (12.2 MB/s) - ‘train_32x32.mat’ saved [182040794/182040794]

--2023-03-26 06:34:31--  http://ufldl.stanford.edu/housenumbers/test_32x32.mat
Reusing existing connection to ufldl.stanford.edu:80.
HTTP request sent, awaiting response... 200 OK
Length: 64275384 (61M) [text/plain]
Saving to: ‘test_32x32.mat’


2023-03-26 06:34:35 (15.7 MB/s) - ‘test_32x32.mat’ saved [64275384/64275384]

FINISHED --2023-03-26 06:34:35--
Total wall clock time: 19s
Downloaded: 2 files, 235M in 18s (13.0 MB/s)


In [2]:
from collections import namedtuple

import matplotlib
matplotlib.use('Qt5Agg')
import matplotlib.pyplot as plt
import numpy as np
import PIL
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dset
from torch.utils.data.sampler import SubsetRandomSampler

from torchvision import transforms

In [3]:
device = torch.device("cuda:0") # Let's make sure GPU is available!

# Загружаем данные

In [8]:
# First, lets load the dataset
data_train = dset.SVHN('./', 
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize(mean=[0.43,0.44,0.47],
                                               std=[0.20,0.20,0.20])                           
                       ])
                      )
data_test = dset.SVHN('./', split='test', transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize(mean=[0.43,0.44,0.47],
                                               std=[0.20,0.20,0.20])                           
                       ]))

Downloading http://ufldl.stanford.edu/housenumbers/train_32x32.mat to ./train_32x32.mat


  0%|          | 0/182040794 [00:00<?, ?it/s]

Downloading http://ufldl.stanford.edu/housenumbers/test_32x32.mat to ./test_32x32.mat


  0%|          | 0/64275384 [00:00<?, ?it/s]

Разделяем данные на training и validation.

На всякий случай для подробностей - https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

In [9]:
batch_size = 64

data_size = data_train.data.shape[0]
validation_split = .2
split = int(np.floor(validation_split * data_size))
indices = list(range(data_size))
np.random.shuffle(indices)

train_indices, val_indices = indices[split:], indices[:split]

train_sampler = SubsetRandomSampler(train_indices)
val_sampler = SubsetRandomSampler(val_indices)

train_loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size, 
                                           sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size,
                                         sampler=val_sampler)

In [10]:
# We'll use a special helper module to shape it into a flat tensor
class Flattener(nn.Module):
    def forward(self, x):
        batch_size, *_ = x.shape
        return x.view(batch_size, -1)

Создадим простейшую сеть с новыми слоями:  
Convolutional - `nn.Conv2d`  
MaxPool - `nn.MaxPool2d`

In [11]:
nn_model = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(4),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(4),    
            Flattener(),
            nn.Linear(64*2*2, 10),
          )

nn_model.type(torch.cuda.FloatTensor)
nn_model.to(device)

loss = nn.CrossEntropyLoss().type(torch.cuda.FloatTensor)
optimizer = optim.SGD(nn_model.parameters(), lr=1e-1, weight_decay=1e-4)

Восстановите функцию `compute_accuracy` из прошлого задания.  
Единственное отличие в новом - она должна передать данные на GPU прежде чем прогонять через модель. Сделайте это так же, как это делает функция `train_model`

In [12]:
def train_model(model, train_loader, val_loader, loss, optimizer, num_epochs, step_size=2, gamma=0.5, use_scheduler=False): 
    loss_history = []
    train_history = []
    val_history = []
    if use_scheduler:
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer=optimizer, step_size=step_size, gamma=gamma) 
    for epoch in range(num_epochs):
        model.train() # Enter train mode
        
        loss_accum = 0
        correct_samples = 0
        total_samples = 0
        for i_step, (x, y) in enumerate(train_loader):
          
            x_gpu = x.to(device)
            y_gpu = y.to(device)
            prediction = model(x_gpu)    
            loss_value = loss(prediction, y_gpu)
            optimizer.zero_grad()
            loss_value.backward()
            optimizer.step()
            
            _, indices = torch.max(prediction, 1)
            correct_samples += torch.sum(indices == y_gpu)
            total_samples += y.shape[0]
            
            loss_accum += loss_value

        ave_loss = loss_accum / i_step
        train_accuracy = float(correct_samples) / total_samples
        val_accuracy = compute_accuracy(model, val_loader)
        
        loss_history.append(float(ave_loss))
        train_history.append(train_accuracy)
        val_history.append(val_accuracy)

        print("Epoch: %f, Average loss: %f, Train accuracy: %f, Val accuracy: %f" % (epoch, ave_loss, train_accuracy, val_accuracy))
        if use_scheduler:
            scheduler.step()
        
    return loss_history, train_history, val_history
        
def compute_accuracy(model, loader):
    """
    Computes accuracy on the dataset wrapped in a loader
    
    Returns: accuracy as a float value between 0 and 1
    """
    model.eval() # Evaluation mode
    # TODO: Implement the inference of the model on all of the batches from loader,
    #       and compute the overall accuracy.
    # Hint: PyTorch has the argmax function!
    
    pred_list = []
    test_list = []

    for test_X, test_Y in loader:
        test_X_gpu = test_X.to(device)
        test_Y_gpu = test_Y.to(device)
        pred_list.append(torch.argmax(model(test_X_gpu), dim=1))
        test_list.append(test_Y_gpu)

    pred_list = torch.hstack(pred_list)
    test_list = torch.hstack(test_list)
    
    accuracy = torch.sum(test_list == pred_list)/len(pred_list)    
    
    return accuracy

loss_history, train_history, val_history = train_model(nn_model, train_loader, val_loader, loss, optimizer, 5)

Epoch: 0.000000, Average loss: 1.390235, Train accuracy: 0.534979, Val accuracy: 0.719473
Epoch: 1.000000, Average loss: 0.721146, Train accuracy: 0.779374, Val accuracy: 0.806361
Epoch: 2.000000, Average loss: 0.617981, Train accuracy: 0.814388, Val accuracy: 0.814074
Epoch: 3.000000, Average loss: 0.565337, Train accuracy: 0.831980, Val accuracy: 0.821650
Epoch: 4.000000, Average loss: 0.530758, Train accuracy: 0.843600, Val accuracy: 0.837963


# Аугментация данных (Data augmentation)

В работе с изображениями одним из особенно важных методов является аугментация данных - то есть, генерация дополнительных данных для тренировки на основе изначальных.   
Таким образом, мы получаем возможность "увеличить" набор данных для тренировки, что ведет к лучшей работе сети.
Важно, чтобы аугментированные данные были похожи на те, которые могут встретиться в реальной жизни, иначе польза от аугментаций уменьшается и может ухудшить работу сети.

С PyTorch идут несколько таких алгоритмов, называемых `transforms`. Более подробно про них можно прочитать тут -
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#transforms

Ниже мы используем следующие алгоритмы генерации:
- ColorJitter - случайное изменение цвета
- RandomHorizontalFlip - горизонтальное отражение с вероятностью 50%
- RandomVerticalFlip - вертикальное отражение с вероятностью 50%
- RandomRotation - случайный поворот

In [13]:
tfs = transforms.Compose([
    transforms.ColorJitter(hue=.50, saturation=.50),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(50, interpolation=PIL.Image.BILINEAR),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.43,0.44,0.47],
                       std=[0.20,0.20,0.20])                           
])

# Create augmented train dataset
data_aug_train = dset.SVHN('./', 
                       transform=tfs
                      )

train_aug_loader = torch.utils.data.DataLoader(data_aug_train, batch_size=batch_size, 
                                           sampler=train_sampler)



Визуализируем результаты агментации (вообще, смотреть на сгенерированные данные всегда очень полезно).

In [14]:
# TODO: Visualize some augmented images!
# hint: you can create new datasets and loaders to accomplish this

# Based on the visualizations, should we keep all the augmentations?

tfs = transforms.Compose([
    transforms.ColorJitter(hue=.20, saturation=.20),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(10, interpolation=PIL.Image.BILINEAR),
])

data_aug_vis = dset.SVHN('./', 
                       transform=tfs
                      )

plt.figure(figsize=(30, 3))

for i, (x, y) in enumerate(data_aug_vis):
    if i == 10:
        break
    plt.subplot(1, 10, i+1)
    plt.grid(False)
    plt.imshow(x)
    plt.axis('off')

Все ли агментации одинаково полезны на этом наборе данных? Могут ли быть среди них те, которые собьют модель с толку?

Выберите из них только корректные

In [15]:
# TODO: 
tfs = transforms.Compose([
    # TODO: Add good augmentations
    transforms.ColorJitter(hue=.20, saturation=.20),
    transforms.RandomRotation(10, interpolation=PIL.Image.BILINEAR),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.43,0.44,0.47],
                       std=[0.20,0.20,0.20])                           
])

data_aug_train = dset.SVHN('./', 
                       transform=tfs
                      )

# TODO create new instances of loaders with the augmentations you chose
train_aug_loader = torch.utils.data.DataLoader(data_aug_train, batch_size=batch_size, 
                                           sampler=train_sampler)

In [16]:
# Finally, let's train with augmentations!

# Note we shouldn't use augmentations on validation

loss_history, train_history, val_history = train_model(nn_model, train_aug_loader, val_loader, loss, optimizer, 5)

Epoch: 0.000000, Average loss: 0.617004, Train accuracy: 0.811931, Val accuracy: 0.835097
Epoch: 1.000000, Average loss: 0.565919, Train accuracy: 0.828379, Val accuracy: 0.845881
Epoch: 2.000000, Average loss: 0.546541, Train accuracy: 0.834334, Val accuracy: 0.848952
Epoch: 3.000000, Average loss: 0.529878, Train accuracy: 0.838805, Val accuracy: 0.850386
Epoch: 4.000000, Average loss: 0.512710, Train accuracy: 0.843258, Val accuracy: 0.846086


In [17]:
train_aug_loader.dataset.data[0].shape

(3, 32, 32)

# LeNet
Попробуем имплементировать классическую архитектуру сверточной нейронной сети, предложенную Яном ЛеКуном в 1998 году. В свое время она достигла впечатляющих результатов на MNIST, посмотрим как она справится с SVHN?
Она описана в статье ["Gradient Based Learning Applied to Document Recognition"](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf), попробуйте прочитать ключевые части и имплементировать предложенную архитетуру на PyTorch.

Реализовывать слои и функцию ошибки LeNet, которых нет в PyTorch, **не нужно** - просто возьмите их размеры и переведите в уже известные нам Convolutional, Pooling и Fully Connected layers.

Если в статье не очень понятно, можно просто погуглить LeNet и разобраться в деталях :)

In [18]:
# TODO: Implement LeNet-like architecture for SVHN task
lenet_model = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=6, kernel_size=(5, 5)),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.Tanh(),
          nn.Conv2d(in_channels=6, out_channels=16, kernel_size=(5, 5)),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.Tanh(),
          Flattener(),
          nn.Linear(in_features=16 * 5 * 5, out_features=120),
          nn.Tanh(),
          nn.Linear(120, 84),
          nn.Tanh(),
          nn.Linear(84, 10)
          )

lenet_model.type(torch.cuda.FloatTensor)
lenet_model.to(device)

loss = nn.CrossEntropyLoss().type(torch.cuda.FloatTensor)
optimizer = optim.SGD(lenet_model.parameters(), lr=1e-1, weight_decay=1e-4)

In [19]:
# Let's train it!
loss_history, train_history, val_history = train_model(lenet_model, train_aug_loader, val_loader, loss, optimizer, 10)

Epoch: 0.000000, Average loss: 1.228162, Train accuracy: 0.587602, Val accuracy: 0.817350
Epoch: 1.000000, Average loss: 0.570891, Train accuracy: 0.824335, Val accuracy: 0.846154
Epoch: 2.000000, Average loss: 0.479917, Train accuracy: 0.852421, Val accuracy: 0.866835
Epoch: 3.000000, Average loss: 0.436173, Train accuracy: 0.866669, Val accuracy: 0.864241
Epoch: 4.000000, Average loss: 0.403884, Train accuracy: 0.875218, Val accuracy: 0.875640
Epoch: 5.000000, Average loss: 0.382499, Train accuracy: 0.881292, Val accuracy: 0.880827
Epoch: 6.000000, Average loss: 0.364528, Train accuracy: 0.889005, Val accuracy: 0.881988
Epoch: 7.000000, Average loss: 0.353490, Train accuracy: 0.891649, Val accuracy: 0.886970
Epoch: 8.000000, Average loss: 0.337224, Train accuracy: 0.895181, Val accuracy: 0.884172
Epoch: 9.000000, Average loss: 0.326993, Train accuracy: 0.899021, Val accuracy: 0.888335


# Подбор гиперпараметров

In [20]:
# The key hyperparameters we're going to tune are learning speed, annealing rate and regularization
# We also encourage you to try different optimizers as well
from itertools import product
Hyperparams = namedtuple("Hyperparams", ['learning_rate', 'anneal_epochs', 'reg'])
RunResult = namedtuple("RunResult", ['model', 'train_history', 'val_history', 'final_val_accuracy'])

learning_rates = [1e-2, 1e-3]
anneal_coeff = 0.2
anneal_epochs = [1, 10]
reg = [1e-3, 1e-5]

batch_size = 512
epoch_num = 5

# Record all the runs here
# Key should be Hyperparams and values should be RunResult
run_record = {} 
def get_new_Lenet_model():
  lenet_model = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=6, kernel_size=(5, 5)),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.Tanh(),
          nn.Conv2d(in_channels=6, out_channels=16, kernel_size=(5, 5)),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.Tanh(),
          Flattener(),
          nn.Linear(in_features=16 * 5 * 5, out_features=120),
          nn.Tanh(),
          nn.Linear(120, 84),
          nn.Tanh(),
          nn.Linear(84, 10)
          )

  return lenet_model
  
# Use grid search or random search and record all runs in run_record dictionnary 
# Important: perform search in logarithmic space!
attemps = 10
for ae, rg, lr in product(anneal_epochs, reg, learning_rates):
  current_params = Hyperparams(lr, ae, rg)
  new_lenet_model = get_new_Lenet_model()
  new_lenet_model.type(torch.cuda.FloatTensor)
  new_lenet_model.to(device)
  loss = nn.CrossEntropyLoss().type(torch.cuda.FloatTensor)
  optimizer = optim.Adam(new_lenet_model.parameters(), lr=current_params.learning_rate, weight_decay=current_params.reg)
  loss_history, train_history, val_history = train_model(new_lenet_model, train_aug_loader, val_loader, loss, optimizer, epoch_num, current_params.anneal_epochs, anneal_coeff, use_scheduler=True)
  print(f" lr: {current_params.learning_rate} | weight_decay: {current_params.reg} | anneal_epochs: {current_params.anneal_epochs} | accuracy: {val_history[-1]}")
  result = RunResult(new_lenet_model, train_history, val_history, val_history[-1])
  run_record[current_params]=result
# TODO: Your code here!

Epoch: 0.000000, Average loss: 1.362503, Train accuracy: 0.537863, Val accuracy: 0.749164
Epoch: 1.000000, Average loss: 0.620845, Train accuracy: 0.809320, Val accuracy: 0.834550
Epoch: 2.000000, Average loss: 0.470903, Train accuracy: 0.857796, Val accuracy: 0.864241
Epoch: 3.000000, Average loss: 0.425055, Train accuracy: 0.872078, Val accuracy: 0.868473
Epoch: 4.000000, Average loss: 0.414892, Train accuracy: 0.875218, Val accuracy: 0.869087
 lr: 0.01 | weight_decay: 0.001 | anneal_epochs: 1 | accuracy: 0.8690873980522156
Epoch: 0.000000, Average loss: 0.921637, Train accuracy: 0.701993, Val accuracy: 0.839738
Epoch: 1.000000, Average loss: 0.512186, Train accuracy: 0.846364, Val accuracy: 0.860078
Epoch: 2.000000, Average loss: 0.476144, Train accuracy: 0.858257, Val accuracy: 0.862262
Epoch: 3.000000, Average loss: 0.468882, Train accuracy: 0.860799, Val accuracy: 0.864173
Epoch: 4.000000, Average loss: 0.466333, Train accuracy: 0.861772, Val accuracy: 0.863491
 lr: 0.001 | weigh

In [21]:
best_val_accuracy = None
best_hyperparams = None
best_run = None

for hyperparams, run_result in run_record.items():
    if best_val_accuracy is None or best_val_accuracy < run_result.final_val_accuracy:
        best_val_accuracy = run_result.final_val_accuracy
        best_hyperparams = hyperparams
        best_run = run_result
        
print("Best validation accuracy: %4.2f, best hyperparams: %s" % (best_val_accuracy, best_hyperparams))
        

Best validation accuracy: 0.88, best hyperparams: Hyperparams(learning_rate=0.001, anneal_epochs=10, reg=0.001)


# Свободное упражнение - догоним и перегоним LeNet!

Попробуйте найти архитектуру и настройки тренировки, чтобы выступить лучше наших бейзлайнов.

Что можно и нужно попробовать:
- BatchNormalization (для convolution layers он в PyTorch называется [batchnorm2d](https://pytorch.org/docs/stable/nn.html#batchnorm2d))
- Изменить количество слоев и их толщину
- Изменять количество эпох тренировки
- Попробовать и другие агментации

In [105]:
learning_rates = [1e-2, 1e-3]
anneal_coeff = 0.2
anneal_epochs = [1, 10]
reg = [1e-3, 1e-5]

batch_size = 512
epoch_num = 5

run_record = {} 
def get_new_model():
  best_model = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(5, 5), padding="same"),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.BatchNorm2d(num_features=16),
          nn.ReLU(),
          nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 5), padding="same"),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.BatchNorm2d(num_features=32),
          nn.ReLU(),
          nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(5, 5), padding="same"),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.BatchNorm2d(num_features=64),
          nn.ReLU(),
          Flattener(),
          nn.Linear(in_features=1024, out_features=100),
          nn.ReLU(),
          nn.Linear(100, 10))

  return best_model

attemps = 10
for ae, rg, lr in product(anneal_epochs, reg, learning_rates):
  current_params = Hyperparams(lr, ae, rg)
  new_lenet_model = get_new_model()
  new_lenet_model.type(torch.cuda.FloatTensor)
  new_lenet_model.to(device)
  loss = nn.CrossEntropyLoss().type(torch.cuda.FloatTensor)
  optimizer = optim.Adam(new_lenet_model.parameters(), lr=current_params.learning_rate, weight_decay=current_params.reg)
  loss_history, train_history, val_history = train_model(new_lenet_model, train_aug_loader, val_loader, loss, optimizer, epoch_num, current_params.anneal_epochs, anneal_coeff, use_scheduler=True)
  print(f" lr: {current_params.learning_rate} | weight_decay: {current_params.reg} | anneal_epochs: {current_params.anneal_epochs} | accuracy: {val_history[-1]}")
  result = RunResult(new_lenet_model, train_history, val_history, val_history[-1])
  run_record[current_params]=result
# TODO: Your code here!

Epoch: 0.000000, Average loss: 1.250682, Train accuracy: 0.571409, Val accuracy: 0.817009
Epoch: 1.000000, Average loss: 0.505729, Train accuracy: 0.845886, Val accuracy: 0.867449
Epoch: 2.000000, Average loss: 0.418108, Train accuracy: 0.874774, Val accuracy: 0.878848
Epoch: 3.000000, Average loss: 0.390278, Train accuracy: 0.883578, Val accuracy: 0.882875
Epoch: 4.000000, Average loss: 0.387233, Train accuracy: 0.884534, Val accuracy: 0.882670
 lr: 0.01 | weight_decay: 0.001 | anneal_epochs: 1 | accuracy: 0.8826701045036316
Epoch: 0.000000, Average loss: 0.712976, Train accuracy: 0.769495, Val accuracy: 0.872841
Epoch: 1.000000, Average loss: 0.327593, Train accuracy: 0.903423, Val accuracy: 0.907242
Epoch: 2.000000, Average loss: 0.283974, Train accuracy: 0.918268, Val accuracy: 0.914272
Epoch: 3.000000, Average loss: 0.275843, Train accuracy: 0.920998, Val accuracy: 0.915910
Epoch: 4.000000, Average loss: 0.272704, Train accuracy: 0.921578, Val accuracy: 0.915978
 lr: 0.001 | weigh

In [108]:
learning_rates = [1e-3]
anneal_coeff = 0.2
anneal_epochs = [10]
reg = [1e-5]

batch_size = 512
epoch_num = 5

run_record = {} 
def get_new_model():
  best_model = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(5, 5), padding="same"),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.BatchNorm2d(num_features=16),
          nn.ReLU(),
          nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 5), padding="same"),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.BatchNorm2d(num_features=32),
          nn.ReLU(),
          nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(5, 5), padding="same"),
          nn.MaxPool2d(kernel_size=(2, 2), stride=2),
          nn.BatchNorm2d(num_features=64),
          nn.ReLU(),
          Flattener(),
          nn.Linear(in_features=1024, out_features=100),
          nn.ReLU(),
          nn.Linear(100, 10))

  return best_model

attemps = 10
for ae, rg, lr in product(anneal_epochs, reg, learning_rates):
  current_params = Hyperparams(lr, ae, rg)
  new_lenet_model = get_new_model()
  new_lenet_model.type(torch.cuda.FloatTensor)
  new_lenet_model.to(device)
  loss = nn.CrossEntropyLoss().type(torch.cuda.FloatTensor)
  optimizer = optim.Adam(new_lenet_model.parameters(), lr=current_params.learning_rate, weight_decay=current_params.reg)
  loss_history, train_history, val_history = train_model(new_lenet_model, train_aug_loader, val_loader, loss, optimizer, epoch_num, current_params.anneal_epochs, anneal_coeff, use_scheduler=True)
  print(f" lr: {current_params.learning_rate} | weight_decay: {current_params.reg} | anneal_epochs: {current_params.anneal_epochs} | accuracy: {val_history[-1]}")
  result = RunResult(new_lenet_model, train_history, val_history, val_history[-1])
  run_record[current_params]=result
# TODO: Your code here!

Epoch: 0.000000, Average loss: 0.726617, Train accuracy: 0.767669, Val accuracy: 0.873729
Epoch: 1.000000, Average loss: 0.371329, Train accuracy: 0.887315, Val accuracy: 0.897686
Epoch: 2.000000, Average loss: 0.313304, Train accuracy: 0.905385, Val accuracy: 0.899324
Epoch: 3.000000, Average loss: 0.279224, Train accuracy: 0.916169, Val accuracy: 0.913726
Epoch: 4.000000, Average loss: 0.251518, Train accuracy: 0.924428, Val accuracy: 0.914340
 lr: 0.001 | weight_decay: 1e-05 | anneal_epochs: 10 | accuracy: 0.9143403172492981


# Финальный аккорд - проверим лучшую модель на test set

В качестве разнообразия - напишите код для прогона модели на test set вы.

В результате вы должны натренировать модель, которая покажет более **90%** точности на test set.  
Как водится, лучший результат в группе получит дополнительные баллы!

In [113]:
for hyperparams, run_result in run_record.items():
    if best_val_accuracy is None or best_val_accuracy < run_result.final_val_accuracy:
        best_val_accuracy = run_result.final_val_accuracy
        best_hyperparams = hyperparams
        best_run = run_result

In [118]:
# TODO Write the code to compute accuracy on test set
test_loader = torch.utils.data.DataLoader(data_test, batch_size=batch_size)
final_test_accuracy = compute_accuracy(best_run.model, test_loader)
print("Final test accuracy - ", final_test_accuracy)

Final test accuracy -  tensor(0.9195, device='cuda:0')
