## Practice: Basic Artificial Neural Networks
Credits: this notebook belongs to the [Practical DL](https://docs.google.com/forms/d/e/1FAIpQLScvrVtuwrHSlxWqHnLt1V-_7h2eON_mlRR6MUb3xEe5x9LuoA/viewform?usp=sf_link) course by the Yandex School of Data Analysis.

We will start working with neural networks in the practice session. Your homework will be to finish implementing the layers.

Our goal is simple, yet an actual implementation may take some time :). We are going to write an Artificial Neural Network (almost) from scratch. The software design is heavily inspired by [PyTorch](http://pytorch.org), which is the main framework of our course.

As for the homework (once again, it will be very similar to this seminar): it requires sending **multiple** files, so please do not forget to include all of them when sending it to the TA. The list of files:
- This notebook
- modules.ipynb with all blocks implemented (except maybe the `Conv2d` and `MaxPool2d` layers, which are part of the 'advanced' version of this homework)

In [1]:
%matplotlib inline
from time import time, sleep

import numpy as np
import matplotlib.pyplot as plt
from IPython import display

In [2]:
import torch
from torch.utils.data import DataLoader, Dataset, Subset
import torchvision
from torchvision import transforms

In [3]:
from typing import Tuple, List, Type, Dict, Any

In [4]:
from tqdm.notebook import tqdm

In [5]:
from torch import nn

# Framework

Implement everything in `modules.ipynb`. Read all the comments thoughtfully to ease the pain. Please try not to change the prototypes.

Do not forget that each module should return **AND** store `output` and `gradInput`.

The typical assumption is that `module.backward` is always executed after `module.forward`,
so `output` is already stored when `backward` runs; this is useful for `SoftMax`.
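
The contract above can be sketched with a small stand-alone layer. This is only an illustration, not one of the blocks from `modules.ipynb`: the `Sigmoid` class below is a hypothetical example chosen because, like `SoftMax`, its gradient is cheapest to compute from the stored `output`. The `backward(input, gradOutput)` signature is assumed from how `net.backward` is called later in this notebook.

```python
import numpy as np


class Sigmoid:
    """Hypothetical layer used only to illustrate the store-and-return contract."""

    def __init__(self):
        self.output = None
        self.gradInput = None

    def forward(self, input):
        # Store the result AND return it.
        self.output = 1.0 / (1.0 + np.exp(-input))
        return self.output

    def backward(self, input, gradOutput):
        # Reuse the output stored by forward: d(sigmoid)/dx = s * (1 - s).
        self.gradInput = gradOutput * self.output * (1.0 - self.output)
        return self.gradInput


layer = Sigmoid()
x = np.array([[0.0, 1.0]])
out = layer.forward(x)
grad = layer.backward(x, np.ones_like(x))
```

Because `backward` reads `self.output`, calling it before `forward` would fail, which is exactly why the forward-then-backward order is assumed.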

### Tech note
Prefer using `np.multiply`, `np.add`, `np.divide`, `np.subtract` instead of `*`,`+`,`/`,`-` for better memory handling.

Example: suppose you allocated a variable 

```
a = np.zeros(...)
```
So, instead of
```
a = b + c  # will be reallocated, GC needed to free
``` 
You can use: 
```
np.add(b,c,out = a) # puts result in `a`
```
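
A runnable version of the same idea, as a minimal sketch: the array shapes are arbitrary, and `buffer_before` is just a helper name introduced here to check whether `a` still refers to the original buffer.

```python
import numpy as np

b = np.arange(6, dtype=np.float64).reshape(2, 3)
c = np.ones((2, 3))

a = np.zeros((2, 3))
buffer_before = a  # second reference to the preallocated array

# Writes the sum into the existing buffer; no new array is bound to `a`.
np.add(b, c, out=a)
same_buffer = a is buffer_before

# Creates a new array and rebinds the name `a` to it.
a = b + c
rebound = a is not buffer_before
```

After the first call `buffer_before` holds the sum, while the plain `+` leaves the old buffer to the garbage collector.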

In [6]:
# (re-)load layers
%run modules.ipynb

# Toy example

Use this example to debug your code: start with logistic regression and then test other layers. You do not need to change anything here; this code is provided for you to test the layers. It is also easy to reuse this code for the MNIST task.

In [None]:
# Generate some data
N = 500

X1 = np.random.randn(N,2) + np.array([2,2])
X2 = np.random.randn(N,2) + np.array([-2,-2])

Y = np.concatenate([np.ones(N),np.zeros(N)])[:,None]
Y = np.hstack([Y, 1-Y])

X = np.vstack([X1,X2])
plt.scatter(X[:,0],X[:,1], c = Y[:,0], edgecolors= 'none')

Define a **logistic regression** for debugging. 

In [None]:
net = Sequential()
net.add(Linear(2, 2))
net.add(LogSoftMax())

criterion = ClassNLLCriterion()

print(net)

# Then test something like this:

# net = Sequential()
# net.add(Linear(2, 4))
# net.add(ReLU())
# net.add(Linear(4, 2))
# net.add(LogSoftMax())
# print(net)

Start with batch_size = 1000 (the full dataset) to make sure every step lowers the loss, then try the stochastic version.

In [None]:
# Optimizer params
optimizer_config = {'learning_rate' : 1e-1, 'momentum': 0.9}
optimizer_state = {}

# Looping params
n_epoch = 20
batch_size = 128

In [None]:
# batch generator
def get_batches(dataset, batch_size):
    X, Y = dataset
    n_samples = X.shape[0]
        
    # Shuffle at the start of epoch
    indices = np.arange(n_samples)
    np.random.shuffle(indices)
    
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        
        batch_idx = indices[start:end]
    
        yield X[batch_idx], Y[batch_idx]

### Train

Basic training loop. Examine it.

In [None]:
loss_history = []

for i in range(n_epoch):
    for x_batch, y_batch in get_batches((X, Y), batch_size):
        
        net.zeroGradParameters()
        
        # Forward
        predictions = net.forward(x_batch)
        loss = criterion.forward(predictions, y_batch)
    
        # Backward
        dp = criterion.backward(predictions, y_batch)
        net.backward(x_batch, dp)
        
        # Update weights
        sgd_momentum(net.getParameters(), 
                     net.getGradParameters(), 
                     optimizer_config,
                     optimizer_state)      
        
        loss_history.append(loss)

    # Visualize
    display.clear_output(wait=True)
    plt.figure(figsize=(8, 6))
        
    plt.title("Training loss")
    plt.xlabel("#iteration")
    plt.ylabel("loss")
    plt.plot(loss_history, 'b')
    plt.show()
    
    print('Current loss: %f' % loss)    

# Digit classification 

We are using good old [MNIST](http://yann.lecun.com/exdb/mnist/) as our dataset.

In [7]:
import mnist
X_train, y_train, X_val, y_val, X_test, y_test = mnist.load_dataset()

One-hot encode the labels first.

In [None]:
def OneHotEncoding(y):
    """
    Encoding targets using one hot encoding
    """
    assert len(y.shape) == 1

    y_coded = np.zeros((y.shape[0], np.max(np.unique(y)) + 1))

    for i in range(len(y)):
        y_coded[i][y[i]] = 1
    return y_coded

In [None]:
print(X_train.shape)

In [None]:
print(OneHotEncoding(y_train))

In [8]:
# class DatasetMNIST(Dataset):
#     def __init__(self, X, y, transforms=None):
#         self.data, self.labels = X, y
#         self.transform = transform

#     def __len__(self):
#         return len(self.data)

#     def __getitem__(self, index):
#         data = self.data[index]
#         label = self.labels[index]

#         if self.transform is not None:
#             data = self.transforms(data)

#         return data, label
class DatasetMNIST(Dataset):
    def __init__(self, X, y, transform=None):
        self.X = np.array(X)
        self.y = np.array(y)
        self.transform = transform
        # Compare the converted arrays so that plain lists are accepted too
        assert self.X.shape[0] == self.y.shape[0]
        
    def __len__(self):
        return self.X.shape[0]
    
    def __getitem__(self, index):
        obj = self.X[index]
        label = self.y[index]
        
        if self.transform is not None:
            obj = self.transform(obj)
            
        return obj, label

In [None]:
train_dataset = DatasetMNIST(X_train, OneHotEncoding(y_train))
val_dataset = DatasetMNIST(X_val, OneHotEncoding(y_val))
test_dataset = DatasetMNIST(X_test, OneHotEncoding(y_test))

train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=256, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False) 

In [None]:
def accuracy_score(y_true, y_pred):
    """
    Shape of y_true: (object_count, 10)
    Shape of y_pred: (object_count, 10)
    """
    assert y_true.shape == y_pred.shape
    # Fraction of rows whose predicted class matches the true class
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))

- **Compare** the `ReLU`, `ELU`, `LeakyReLU`, and `SoftPlus` activation functions. 
Ideally you would tune the optimizer params for each of them, but that is overkill for now. Use an architecture of your choice for the comparison.
- **Try** inserting `BatchNormalization` (followed by `ChannelwiseScaling`) between the `Linear` module and the activation function.
- Plot the losses from both the activation function comparison and the `BatchNormalization` comparison on one plot. Pick a scale (log?) on which the lines are distinguishable, do not forget to label the axes, and make the plot good-looking.
- Plot the losses for two networks: one trained with `sgd_momentum`, the other trained with Adam. Which one performs better?
- Hint: a good logloss for MNIST should be around 0.5.
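
For the optimizer comparison in the list above, it may help to see the two update rules side by side. The sketch below is not the `sgd_momentum` or `adam_optimizer` from `modules.ipynb`: the function names, the flat list-of-arrays parameter layout, and the Adam config keys (`beta1`, `beta2`, `epsilon`) are assumptions made for illustration. Only the `(params, grads, config, state)` call shape mirrors how the optimizer is invoked in this notebook.

```python
import numpy as np


def sgd_momentum_step(params, grads, config, state):
    """One in-place SGD-with-momentum step over a flat list of arrays."""
    velocity = state.setdefault("velocity", [np.zeros_like(p) for p in params])
    for p, g, v in zip(params, grads, velocity):
        # v <- momentum * v + lr * g, then p <- p - v
        np.add(config["momentum"] * v, config["learning_rate"] * g, out=v)
        p -= v


def adam_step(params, grads, config, state):
    """One in-place Adam step with bias-corrected moment estimates."""
    m = state.setdefault("m", [np.zeros_like(p) for p in params])
    v = state.setdefault("v", [np.zeros_like(p) for p in params])
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i *= config["beta1"]
        m_i += (1 - config["beta1"]) * g
        v_i *= config["beta2"]
        v_i += (1 - config["beta2"]) * g * g
        m_hat = m_i / (1 - config["beta1"] ** t)
        v_hat = v_i / (1 - config["beta2"] ** t)
        p -= config["learning_rate"] * m_hat / (np.sqrt(v_hat) + config["epsilon"])


def minimize(step, config, n_steps=200):
    # Toy objective f(w) = 0.5 * ||w||^2: its gradient is w, its minimum is w = 0.
    w = np.array([3.0, -2.0])
    state = {}
    for _ in range(n_steps):
        step([w], [w.copy()], config, state)
    return w


w_sgd = minimize(sgd_momentum_step, {"learning_rate": 1e-1, "momentum": 0.9})
w_adam = minimize(adam_step, {"learning_rate": 1e-1, "beta1": 0.9,
                              "beta2": 0.999, "epsilon": 1e-8})
```

Both runs move `w` towards zero on this toy problem. The point to take away is that the two optimizers keep different state and read different config keys, so their configs are not interchangeable.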

In [None]:
def Neural_Network(activation=ReLU,
                   input_resolution: Tuple[int, int]=(28, 28),
                   input_channels: int=1,
                   num_classes: int=10,
                   batch_norm: bool=False
                   ):
    # np.prod replaces np.product, which was removed in NumPy 2.0
    in_features = int(np.prod(input_resolution)) * input_channels

    sequential = Sequential()
    sequential.add(Flatten())
    sequential.add(Linear(in_features, 256))
    if batch_norm:
        # BatchNormalization followed by ChannelwiseScaling, placed before the activation
        sequential.add(BatchNormalization(0.001))
        sequential.add(ChannelwiseScaling(256))
    sequential.add(activation())
    sequential.add(Linear(256, num_classes))
    sequential.add(LogSoftMax())
    return sequential

In [None]:
def train_loop(train_loader,
               val_loader,
               net,
               criterion,
               optimizer, 
               optimizer_config,
               batch_size=128,
               n_epoch=20):

    loss_history = []
    loss_val_history = []
    optimizer_state = {}

    for i in tqdm(range(n_epoch)):
        loss = []
        loss_val = None
        
        net.train()
        for x_batch, y_batch in train_loader:

            net.zeroGradParameters()

            # Forward
            predictions = net.forward(x_batch.numpy())
            loss.append(criterion.forward(predictions, y_batch.numpy()))

            # Backward
            dp = criterion.backward(predictions, y_batch.numpy())
            net.backward(x_batch.numpy(), dp)

            # Update weights
            # Use the optimizer passed in, not a hard-coded sgd_momentum
            optimizer(net.getParameters(), net.getGradParameters(),
                      optimizer_config, optimizer_state)

            
        loss_history.append(np.mean(loss))

        print('Epoch: %i' % i)
        print('Current mean train loss: %f' % np.mean(loss))

    return loss_history



In [None]:
criterion = ClassNLLCriterion()

In [None]:
net_ReLU = Neural_Network(activation=ReLU)
net_ReLU_bn = Neural_Network(activation=ReLU, batch_norm=True)

net_ELU = Neural_Network(activation=ELU)
net_ELU_bn = Neural_Network(activation=ELU, batch_norm=True)

net_LeakyReLU = Neural_Network(activation=LeakyReLU)
net_LeakyReLU_bn = Neural_Network(activation=LeakyReLU, batch_norm=True)

net_SoftPlus = Neural_Network(activation=SoftPlus)
net_SoftPlus_bn = Neural_Network(activation=SoftPlus, batch_norm=True)

In [None]:
loss_ReLU = train_loop(train_loader=train_loader,
                       val_loader=val_loader,
                       net=net_ReLU,
                       criterion=criterion,
                       optimizer=sgd_momentum, 
                       optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
loss_ReLU_bn = train_loop(train_loader=train_loader,
                          val_loader=val_loader,
                          net=net_ReLU_bn,
                          criterion=criterion,
                          optimizer=sgd_momentum, 
                          optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
display.clear_output(wait=True)
plt.figure(figsize=(12, 9))

plt.title("Training loss")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.plot(loss_ReLU, 'b', label='ReLU')
plt.plot(loss_ReLU_bn, 'g', label='ReLU with BatchNorm')
plt.legend()
plt.show()
print(f'min ReLU loss: {np.min(loss_ReLU)}')
print(f'min ReLU with batchnorm loss: {np.min(loss_ReLU_bn)}')

In [None]:
loss_ELU = train_loop(train_loader=train_loader,
                       val_loader=val_loader,
                       net=net_ELU,
                       criterion=criterion,
                       optimizer=sgd_momentum, 
                       optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
loss_ELU_bn = train_loop(train_loader=train_loader,
                         val_loader=val_loader,
                         net=net_ELU_bn,
                         criterion=criterion,
                         optimizer=sgd_momentum, 
                         optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
display.clear_output(wait=True)
plt.figure(figsize=(12, 9))

plt.title("Training loss")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.plot(loss_ELU, 'b', label='ELU')
plt.plot(loss_ELU_bn, 'g', label='ELU with BatchNorm')
plt.legend()
plt.show()
print(f'min ELU loss: {np.min(loss_ELU)}')
print(f'min ELU with batchnorm loss: {np.min(loss_ELU_bn)}')

In [None]:
loss_LeakyReLU = train_loop(train_loader=train_loader,
                             val_loader=val_loader,
                             net=net_LeakyReLU,
                             criterion=criterion,
                             optimizer=sgd_momentum, 
                             optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
loss_LeakyReLU_bn = train_loop(train_loader=train_loader,
                                val_loader=val_loader,
                                net=net_LeakyReLU_bn,
                                criterion=criterion,
                                optimizer=sgd_momentum, 
                                optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
display.clear_output(wait=True)
plt.figure(figsize=(12, 9))

plt.title("Training loss")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.plot(loss_LeakyReLU, 'b', label='LeakyReLU')
plt.plot(loss_LeakyReLU_bn, 'g', label='LeakyReLU with BatchNorm')
plt.legend()
plt.show()
print(f'min LeakyReLU loss: {np.min(loss_LeakyReLU)}')
print(f'min LeakyReLU with batchnorm loss: {np.min(loss_LeakyReLU_bn)}')

In [None]:
loss_SoftPlus = train_loop(train_loader=train_loader,
                           val_loader=val_loader,
                           net=net_SoftPlus,
                           criterion=criterion,
                           optimizer=sgd_momentum, 
                           optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
loss_SoftPlus_bn = train_loop(train_loader=train_loader,
                              val_loader=val_loader,
                              net=net_SoftPlus_bn,
                              criterion=criterion,
                              optimizer=sgd_momentum, 
                              optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})  

In [None]:
display.clear_output(wait=True)
plt.figure(figsize=(12, 9))

plt.title("Training loss")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.plot(loss_SoftPlus, 'b', label='SoftPlus')
plt.plot(loss_SoftPlus_bn, 'g', label='SoftPlus with BatchNorm')
plt.legend()
plt.show()
print(f'min SoftPlus loss: {np.min(loss_SoftPlus)}')
print(f'min SoftPlus with batchnorm loss: {np.min(loss_SoftPlus_bn)}')

In [None]:
net_LeakyReLU_Adagrad_bn = Neural_Network(activation=LeakyReLU, batch_norm=True)
net_LeakyReLU_sgd_bn = Neural_Network(activation=LeakyReLU, batch_norm=True)


In [None]:
loss_LeakyReLU_sgd_bn = train_loop(train_loader=train_loader,
                           val_loader=val_loader,
                           net=net_LeakyReLU_sgd_bn,
                           criterion=criterion,
                           optimizer=sgd_momentum,
                           optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
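# Note: despite "Adagrad" in the variable name, this run uses adam_optimizer.
# The config is copied from the SGD runs; check that its keys match what
# adam_optimizer in modules.ipynb expects.
loss_LeakyReLU_Adagrad_bn = train_loop(train_loader=train_loader,
                               val_loader=val_loader,
                               net=net_LeakyReLU_Adagrad_bn,
                               criterion=criterion,
                               optimizer=adam_optimizer,
                               optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})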

In [None]:
display.clear_output(wait=True)
plt.figure(figsize=(12, 9))

plt.title("Training loss")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.plot(loss_LeakyReLU_sgd_bn, 'b', label='LeakyReLU with BatchNorm, SGD with momentum')
plt.plot(loss_LeakyReLU_Adagrad_bn, 'g', label='LeakyReLU with BatchNorm, Adam')
plt.legend()
plt.show()
print(f'min LeakyReLU (BatchNorm) with SGD optimizer loss: {np.min(loss_LeakyReLU_sgd_bn)}')
print(f'min LeakyReLU (BatchNorm) with Adam optimizer loss: {np.min(loss_LeakyReLU_Adagrad_bn)}')

In [None]:
def train_loop(train_loader,
               val_loader,
               net,
               criterion,
               optimizer, 
               optimizer_config,
               batch_size=128,
               n_epoch=20):

    loss_history = []
    loss_val_history = []
    optimizer_state = {}

    for i in tqdm(range(n_epoch)):
        loss = []
        loss_val = []
        
        net.train()
        for x_batch, y_batch in train_loader:

            net.zeroGradParameters()

            # Forward
            predictions = net.forward(x_batch.numpy())
            loss.append(criterion.forward(predictions, y_batch.numpy()))

            # Backward
            dp = criterion.backward(predictions, y_batch.numpy())
            net.backward(x_batch.numpy(), dp)

            # Update weights
            # Use the optimizer passed in, not a hard-coded sgd_momentum
            optimizer(net.getParameters(), net.getGradParameters(),
                      optimizer_config, optimizer_state)

            
        loss_history.append(np.mean(loss))

        net.evaluate()
        for x_val, y_val in val_loader:
            y_val_pred = net.forward(x_val.numpy())
            loss_val.append(criterion.forward(y_val_pred, y_val.numpy()))

        loss_val_history.append(np.mean(loss_val))

        print('Epoch: %i' % i)
        print('Current mean train loss: %f' % np.mean(loss))
        print('Current mean validation loss: %f' % np.mean(loss_val))

    return loss_history, loss_val_history




In [14]:
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToPILImage(),
    torchvision.transforms.RandomResizedCrop(size=(28, 28), scale=(.5, 1.0), ratio=(.8, 1.25)),
    torchvision.transforms.RandomRotation(degrees=30),
    torchvision.transforms.ToTensor(),
])

test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
])

In [None]:
train_dataset = DatasetMNIST(X_train, OneHotEncoding(y_train), transform=train_transforms)
val_dataset = DatasetMNIST(X_val, OneHotEncoding(y_val), transform=test_transforms)
test_dataset = DatasetMNIST(X_test, OneHotEncoding(y_test), transform=test_transforms)

train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=256, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False) 

In [None]:
def advanced_NN(activation=LeakyReLU,
                  input_resolution: Tuple[int, int]=(28, 28),
                  input_channels: int=1,
                  num_classes: int=10,
                  ):

    # np.prod replaces np.product, which was removed in NumPy 2.0
    in_features = int(np.prod(input_resolution)) * input_channels
    sequential = Sequential()
    flatten = Flatten()
    sequential.add(flatten)
    linear = Linear(in_features, 256)
    sequential.add(linear)
    bn = BatchNormalization(0.001)
    sequential.add(bn)
    cws = ChannelwiseScaling(256)
    sequential.add(cws)
    activation = activation()
    sequential.add(activation)
    dropout = Dropout()
    sequential.add(dropout)
    linear_2 = Linear(256, num_classes)
    sequential.add(linear_2)
    log_softmax = LogSoftMax()
    sequential.add(log_softmax)
    return sequential

In [None]:
model = advanced_NN()
criterion = ClassNLLCriterion()

In [None]:
print(model)

In [None]:
loss_train, loss_val = train_loop(train_loader=train_loader,
                               val_loader=val_loader,
                               net=model,
                               criterion=criterion,
                               optimizer=adam_optimizer,
                               optimizer_config={'learning_rate' : 1e-1, 'momentum': 0.9})

In [None]:
display.clear_output(wait=True)
plt.figure(figsize=(12, 9))

plt.title("Train and validation loss comparison")
plt.xlabel("Epoch")
plt.ylabel("loss")
plt.plot(loss_train, 'b', label='train')
plt.plot(loss_val, 'g', label='validation')
plt.legend()
plt.show()
print(f'min train loss: {np.min(loss_train)}')
print(f'min validation loss: {np.min(loss_val)}')

In [None]:
model.evaluate()
acc = []
for x_batch, y_batch in test_loader:
    # Score each batch against its own labels (already one-hot encoded by the dataset)
    y_pred = model.forward(x_batch.numpy())
    acc.append(accuracy_score(y_batch.numpy(), y_pred))
    

In [None]:
print(np.mean(acc))

Write your personal opinion on the activation functions, and think about computation times too. Does `BatchNormalization` help?

**Finally**, use all your knowledge to build a super cool model on this dataset. Use **dropout** to prevent overfitting and play with **learning rate decay**. You can use **data augmentation**, such as rotations and translations, to boost your score. Use your knowledge and imagination to train a model. Don't forget to call the `train()` and `evaluate()` methods to set the desired behaviour of the `BatchNormalization` and `Dropout` layers.

Print your accuracy on the test set here. It should be around 90%.
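
One simple way to play with the learning rate decay mentioned above is a step schedule that rewrites `optimizer_config['learning_rate']` once per epoch. The helper `step_decay` and its `drop` and `epochs_per_drop` parameters are hypothetical names introduced for this sketch; the training step itself is left as a placeholder comment.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=5):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


optimizer_config = {'learning_rate': 1e-1, 'momentum': 0.9}
initial_lr = optimizer_config['learning_rate']

schedule = []
for epoch in range(12):
    # Update the config in place before the epoch starts.
    optimizer_config['learning_rate'] = step_decay(initial_lr, epoch)
    schedule.append(optimizer_config['learning_rate'])
    # ... run one training epoch with optimizer_config here ...
```

Since `train_loop` in this notebook receives one `optimizer_config` for all epochs, a schedule like this would have to be applied inside its epoch loop.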

In [None]:
# Your answer goes here. ################################################

### Comparing with PyTorch implementation
The last (and maybe the easiest) step compared to the previous tasks: build a network with the same architecture as above, now with PyTorch.

You can refer to the `week0_09` or `Lab3_part2` notebooks for hints.

__Good Luck!__
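
One detail to keep in mind when matching the architectures: a PyTorch network trained with `nn.CrossEntropyLoss` can end in a plain `nn.Linear` layer, because that loss applies log-softmax to the raw scores before taking the negative log-likelihood. The small check below illustrates this on random data; the tensor shapes and the seed are arbitrary choices.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores, as produced by a final Linear layer
targets = torch.tensor([3, 0, 9, 1])  # integer class indices, not one-hot vectors

# CrossEntropyLoss on raw scores ...
loss_ce = nn.CrossEntropyLoss()(logits, targets)
# ... matches NLLLoss applied to explicit log-softmax outputs.
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
```

Note that the targets here are class indices, unlike the one-hot labels used with `ClassNLLCriterion` earlier in this notebook.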

In [9]:
def PyTorch_NN(activation=nn.LeakyReLU,
               input_resolution: Tuple[int, int]=(28, 28),
               input_channels: int=1,
               num_classes: int=10):
    
    # np.prod replaces np.product, which was removed in NumPy 2.0
    in_features = int(np.prod(input_resolution)) * input_channels
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_features, 256),
        nn.BatchNorm1d(256, momentum=0.001),
        activation(),
        nn.Dropout(),
        nn.Linear(256, num_classes)
    )

In [10]:
model = PyTorch_NN()
print(model)

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): BatchNorm1d(256, eps=1e-05, momentum=0.001, affine=True, track_running_stats=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Dropout(p=0.5, inplace=False)
  (5): Linear(in_features=256, out_features=10, bias=True)
)


In [11]:
criterion = nn.CrossEntropyLoss()

In [12]:
# train_transforms = torchvision.transforms.Compose([
#     torchvision.transforms.ToPILImage(),
#     torchvision.transforms.RandomResizedCrop(size=(28, 28), scale=(.5, 1.0), ratio=(.8, 1.25)),
#     torchvision.transforms.RandomRotation(degrees=30),
#     torchvision.transforms.ToTensor(),
# ])

# test_transforms = torchvision.transforms.Compose([
#     torchvision.transforms.ToTensor(),
# ])

In [15]:
train_dataset = DatasetMNIST(X_train, y_train, transform=train_transforms)
val_dataset = DatasetMNIST(X_val, y_val, transform=test_transforms)
test_dataset = DatasetMNIST(X_test, y_test, transform=test_transforms)

train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=256, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False) 

In [16]:
def train_loop(train_loader,
               val_loader,
               model,
               criterion,
               optimizer, 
               batch_size=128,
               n_epoch=20):

    loss_history = []
    loss_val_history = []

    for i in tqdm(range(n_epoch)):
        loss = []
        loss_val = []
        
        model.train()
        for x_batch, y_batch in train_loader:
            optimizer.zero_grad()
            y_pred = model(x_batch)
            loss_cur = criterion(y_pred, y_batch)
            loss.append(loss_cur.item())
            loss_cur.backward()
            optimizer.step()

            
        loss_history.append(np.mean(loss))

        model.eval()
        for x_batch, y_batch in val_loader:
            with torch.no_grad():
                y_pred = model(x_batch)
                loss_cur = criterion(y_pred, y_batch)
                loss_val.append(loss_cur.item())
                
        loss_val_history.append(np.mean(loss_val))

        print('Epoch: %i' % i)
        print('Current mean train loss: %f' % np.mean(loss))
        print('Current mean validation loss: %f' % np.mean(loss_val))

    return loss_history, loss_val_history

In [None]:
loss_train, loss_val = train_loop(train_loader=train_loader,
                                  val_loader=val_loader,
                                  model=model,
                                  criterion=criterion,
                                  optimizer=torch.optim.Adam(model.parameters(), lr=0.1))


In [None]:
model.eval()

In [None]:
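correct = 0
total = 0
with torch.no_grad():  # no gradients are needed for evaluation
    for x_batch, y_batch in test_loader:
        y_pred = model(x_batch)
        # Count correct predictions so a smaller last batch is weighted properly
        correct += (torch.argmax(y_pred, dim=1) == y_batch).sum().item()
        total += y_batch.shape[0]
print(correct / total)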