# Exercise 8: Convolutional Neural Networks - Solutions

In this exercise session, we will learn about PyTorch, one of the most widely used frameworks for deep learning in Python. We will use it to implement our own neural networks and train them with gradient descent.


We will use the following packages:

`torch`: The framework we will use for training deep nets, with useful sub-modules `torch.nn` and `torch.nn.functional` that we import below.

`torchvision`: Helper package consisting of popular datasets, model architectures, and common image transformations for computer vision. We will use it for loading the MNIST dataset and to perform simple data transformations.

`torchinfo`: Helper package for visualizing deep net architectures.

In [1]:
# 3rd party
import numpy as np

# We import PyTorch and some of its internal modules
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor, Lambda, Compose
from torchinfo import summary
import matplotlib.pyplot as plt

# Project files.
from helpers import accuracy, DrawingPad

In [2]:
%matplotlib notebook
%load_ext autoreload
%autoreload 2

# 1 PyTorch

## 1.1 Motivation

In the first part of the exercise we will revisit the MNIST dataset of hand-written digits, and we will train deep net models to classify the digits. Instead of doing all the hard coding work manually, we will simplify our life by using a deep learning framework: **PyTorch**.

Last week we implemented our own Multi-Layer Perceptron (MLP): we defined both the forward pass and back-propagation, together with a simple optimizer (the stochastic gradient descent, SGD, update rule), and successfully trained it to perform classification. Given the amount of code written, one can imagine that prototyping with various NN architectures and training strategies might get tedious. That is where PyTorch (and other deep learning frameworks) comes into play.

## 1.2 About PyTorch

[PyTorch](https://pytorch.org/) is an optimized tensor library for deep learning using GPUs and CPUs. It allows
for fast prototyping by providing high-level access to all necessary building blocks, including NN layers, activation functions, loss functions, and optimizers, to name a few. Most importantly, however, PyTorch implements the [autograd](https://pytorch.org/docs/stable/autograd.html) package, which allows for automatic differentiation of the operations we use to define NN architectures. In other words, one only has to implement the forward pass, namely to combine desired layers, while the **backpropagation is computed automatically**.
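
As a small illustration of automatic differentiation, the sketch below differentiates a hand-picked scalar function. The function $y = x^2 + 2x$ and the value $x = 3$ are arbitrary choices made for this example only.

```python
import torch

# A scalar input that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# Forward pass only: y = x^2 + 2x.
y = x ** 2 + 2 * x

# Backward pass: autograd computes dy/dx = 2x + 2 and stores it in x.grad.
y.backward()

print(x.grad)  # dy/dx evaluated at x = 3, i.e. 8
```

Only the forward computation is written by hand; the derivative is obtained from the recorded operations.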

## 1.3 Basic pipeline

To define and train deep net models, one would usually implement the following steps:

    1. Load the dataset.
    2. Define and instantiate a deep net architecture.
    3. Choose or implement a loss function (such as the mean squared error).
    4. Choose and instantiate an optimizer (such as the SGD).
    5. Repeat multiple times over the whole dataset; for each batch in the dataset:
        5.1. Load a batch.
        5.2. Run a forward pass through your model.
        5.3. Compute the loss.
        5.4. Run a backward pass, i.e., compute gradients of the loss w.r.t. the trainable parameters (weights).
        5.5. Update the weights using the optimizer.
        5.6. Zero-out the accumulated gradients before the next iteration.
        
We will see this exact pipeline in our code as well.
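
To make the steps concrete before turning to MNIST, here is a minimal sketch of the same pipeline on a toy regression problem. The synthetic data ($y = 2x + 1$), the names `toy_model`, `toy_criterion`, `toy_optimizer`, and all hyperparameters are assumptions chosen for illustration; they are not part of the exercise.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1. Load the dataset (synthetic here: y = 2x + 1).
x_all = torch.linspace(-1, 1, 64).unsqueeze(1)
y_all = 2 * x_all + 1
loader = DataLoader(TensorDataset(x_all, y_all), batch_size=16, shuffle=True)

# 2. Define and instantiate a model.
toy_model = nn.Linear(1, 1)
# 3. Choose a loss function.
toy_criterion = nn.MSELoss()
# 4. Choose an optimizer.
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

# 5. Iterate over the whole dataset several times.
losses = []
for epoch in range(20):
    for x_batch, y_batch in loader:          # 5.1 load a batch
        pred = toy_model(x_batch)            # 5.2 forward pass
        loss = toy_criterion(pred, y_batch)  # 5.3 compute the loss
        loss.backward()                      # 5.4 backward pass
        toy_optimizer.step()                 # 5.5 update the weights
        toy_optimizer.zero_grad()            # 5.6 reset the gradients
        losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```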

## 1.4 Essential building blocks

This section gives a high-level summary of the most important components representing the bare minimum that you will need to start playing with PyTorch and deep net models. You might want to skim through the official tutorials as well, namely [What is PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) and [Neural Networks](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py). Here is the list of the components that will be explained in more detail along with the code blocks.

  - **nn.Module**: Base class for NN architectures.
  - **criterion**: A loss function.
  - **backward-pass**: Derivatives computed by the auto-differentiation system.
  - **optimizer**: Updates the trainable parameters (weights) during training.

## 1.5 Loading the data

We are at step (1) of the training pipeline where we prepare the data. In PyTorch, loading the data is traditionally performed by creating:
* a *dataset* that manages the loading and transformations of the data
* and a *dataloader* that is a Python *iterator*, which returns the batches of data and associated labels from our dataset.

PyTorch provides us with the `Dataset` and `DataLoader` classes for this.

As in the previous week, we will work with the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset, where each sample is stored as a $28 \times 28$ pixel grayscale image. The data samples are loaded as `torch.Tensor` objects, multi-dimensional arrays similar to `numpy.ndarray`.

`MNIST` below is a sub-class of `Dataset`, which will download the dataset when used for the first time. It returns an image $x$ and its true label $y$. The dataloader will then prepare batches out of these images and labels.

In [3]:
batch_size = 128

# Dataset and DataLoader for MLP.
dataset_train = MNIST('data', train=True, download=True, transform=ToTensor())
dataset_test = MNIST('data', train=False, download=True, transform=ToTensor())
dataloader_train = DataLoader(dataset_train, batch_size=batch_size, shuffle=True)
dataloader_test = DataLoader(dataset_test, batch_size=batch_size, shuffle=False)

print('Loaded {} train and {} valid samples.'.format(len(dataset_train), len(dataset_test)))

Loaded 60000 train and 10000 valid samples.
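
To see what a dataloader yields without downloading anything, the sketch below builds a tiny stand-in dataset from random tensors. The names `fake_images`, `fake_labels`, `fake_dataset`, and `fake_loader` are hypothetical, and `TensorDataset` is used here in place of `MNIST` purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 10 random "images" of shape (1, 28, 28) with labels in 0..9.
fake_images = torch.rand(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# The dataloader groups samples into batches; the last batch may be smaller.
fake_loader = DataLoader(fake_dataset, batch_size=4, shuffle=False)

batch_shapes = [(x.shape, y.shape) for x, y in fake_loader]
for x_shape, y_shape in batch_shapes:
    print(tuple(x_shape), tuple(y_shape))
```

Each batch has the layout (batch, channel, height, width) for the images and (batch,) for the labels, which is the layout the MNIST dataloaders above produce as well.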


## 1.6 Multi-Layer Perceptron (MLP)

### Architecture

We are at step (2) of the training pipeline. We will start by implementing an MLP consisting of a 1D input layer (we flatten, i.e., vectorize, the input image) of shape ($784$, ), $3$ hidden fully connected layers and an output layer of shape ($10$, ), as we have $10$ classes.

As you saw last week, one layer of an MLP computes the following function:

$$ \mathbf{y} = \sigma \left(\mathbf{W}^\top\mathbf{x} + \mathbf{b}\right), $$

where $\sigma$ is the activation function, $\mathbf{W}$ the weight matrix, and $\mathbf{b}$ the bias vector. The type of layer that computes $\mathbf{W}^\top\mathbf{x} + \mathbf{b}$ is referred to as *fully-connected* (FC) because every input is connected to every output (recall the MLP diagram from the lectures). Additionally, we also call them *linear* layers because they compute a linear function with respect to their input (plus a bias).
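
The following sketch checks this formula against `nn.Linear` on a small example. The layer sizes (4 inputs, 3 outputs) and the random batch are arbitrary assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=4, out_features=3)
x = torch.randn(2, 4)  # a batch of 2 input vectors

# nn.Linear stores its weight with shape (out_features, in_features),
# so the stored tensor plays the role of W^T in the formula above.
manual = x @ layer.weight.T + layer.bias
y_relu = torch.relu(layer(x))  # sigma = ReLU applied to the linear output

print(layer.weight.shape)  # torch.Size([3, 4])
print(torch.allclose(layer(x), manual, atol=1e-6))
```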

### Optimization criterion

We would like to interpret the output vector $\hat{\mathbf{y}} \in \mathbb{R}^{10}$ as the probabilities of data sample $\mathbf{x} \in \mathbb{R}^{784}$ belonging to each class $j \in \{0, 1, 2, ..., 9\}$. Therefore, we will make use of the activation function **softmax**, defined as

$$ \hat{y}_j = P(\text{class}=j|\mathbf{z}) = \mathrm{Softmax}_j(\mathbf{z}) = \frac{\exp{z_j}}{\sum_{k=0}^{9}{\exp{z_k}}}, $$

on the final output of our network $\mathbf{z}$ (these values pre-softmax are referred to as "logits").
The softmax guarantees that $\sum_{j=0}^{9}\hat{y}_{j} = 1$ and $\hat{y}_j \geq 0, \, \forall j$, meaning that the predicted vector $\hat{\mathbf{y}}$ is indeed a valid probability distribution over classes.
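
As a quick numerical check of these two properties, the sketch below applies the softmax to a made-up logit vector (4 classes instead of 10, for brevity) and compares `F.softmax` with a direct implementation of the formula.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([2.0, 1.0, 0.1, -1.0])  # example logits, chosen arbitrarily

# Direct implementation of the formula: exp(z_j) / sum_k exp(z_k).
manual_probs = torch.exp(z) / torch.exp(z).sum()
probs = F.softmax(z, dim=0)

print(probs)
print(probs.sum())
```

The largest logit receives the largest probability, and the entries are non-negative and sum to one up to floating-point error.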

Finally, we would like to match the predicted distribution $\hat{\mathbf{y}}$ to the ground truth (GT) one $\mathbf{y}$, where $\mathbf{y}$ is given as a one-hot encoding ($\mathbf{y}$ is all zeros except for a $1$ at the index $j$, if $j$ is the correct class to be predicted). The optimization criterion of choice is then to minimize the [**cross-entropy**](https://en.wikipedia.org/wiki/Cross_entropy) (CE) of $\hat{\mathbf{y}}$ and $\mathbf{y}$. Therefore our final loss function $\mathcal{L}$ is defined as:

$$ \mathcal{L} = \text{CE}\left(\hat{\mathbf{y}}, \mathbf{y}\right).$$

Thankfully, PyTorch provides the implementation of $\mathcal{L}$ that directly works with the logits, **so you will only really need to provide the output $\mathbf{z}$** (i.e. the 10-dimensional output of your last layer *before* the softmax). We will get back to $\mathcal{L}$ later.
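
With a one-hot $\mathbf{y}$, the cross-entropy reduces to the negative log-probability of the correct class. The sketch below checks this on made-up logits (2 samples, 3 classes, chosen only for illustration), using `F.cross_entropy`, the functional counterpart of the loss used later.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # 2 samples, 3 classes
targets = torch.tensor([0, 2])            # class indices, not one-hot vectors

# Built-in: takes logits directly and averages over the batch by default.
loss_builtin = F.cross_entropy(logits, targets)

# Manual: softmax, then the negative log-probability of the correct class.
probs = F.softmax(logits, dim=1)
loss_manual = -torch.log(probs[torch.arange(2), targets]).mean()

print(loss_builtin.item(), loss_manual.item())
```

Note that the targets are passed as integer class indices, which is also the format in which the MNIST labels arrive.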

---

### nn.Module
Each custom NN architecture you choose to implement has to subclass [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=nn+module#torch.nn.Module), which conveniently keeps track of all the trainable parameters. From the programmer's perspective, you have to implement the constructor (`__init__`) and override the `forward()` method:

- **`__init__()`**

You will define your layers (e.g., fully connected layer, 2D convolutional layer, etc.) in the constructor, and `nn.Module` will automatically keep track of all the weights these layers contain. Here, we basically prepare and initialize what we need in the network.

- **`forward()`**

This function really defines the architecture, as you will sequentially call your layers in the desired order. Each time you call `forward()` (every training iteration), the so-called **computational graph** is built. It is a directed acyclic graph (DAG) of nodes corresponding to the operations you have called. Each node defines the derivative of its outputs w.r.t. its inputs. The computational graph is then traversed in reverse order once you call `backward()`, and the derivatives are computed.

**Note:** PyTorch allows us to call our model like a function on the input data, as in `y = model(x)`. This will automatically call the `forward()` function of the model.

All the trainable parameters that your model consists of can be accessed via a call to `model.parameters()`, implemented in `nn.Module`. This comes in handy when instantiating your optimizer, as you have to pass it all the parameters you want it to manage.
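
A minimal sketch of these mechanics on a deliberately tiny network follows. The class `TinyNet` and its layer sizes are hypothetical and unrelated to the MLP of this exercise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Hypothetical two-layer network, used only for illustration."""
    def __init__(self):
        super().__init__()
        # Layers with trainable parameters are registered automatically.
        self.fc1 = nn.Linear(4, 3)
        self.fc2 = nn.Linear(3, 2)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

tiny = TinyNet()
out = tiny(torch.randn(5, 4))  # calls tiny.forward() under the hood

# named_parameters() yields (name, tensor) pairs for every trainable parameter.
param_shapes = {name: tuple(p.shape) for name, p in tiny.named_parameters()}
print(out.shape)
print(param_shapes)
```

Every `nn.Linear` assigned as an attribute in the constructor contributes a weight and a bias to the list of parameters, without any extra bookkeeping.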

---

Your task is to define the MLP as depicted in the figure below. Please refer to the documentation and focus on
the class [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html?highlight=nn+linear#torch.nn.Linear) to define the layers and the function [`F.relu`](https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html?highlight=f+relu#torch.nn.functional.relu) to apply the activation function.

<img src="img/mlp.png" width=800></img>

In [4]:
class FC(nn.Module):
    """ Standard Multi layer perceptron for classification into 10 
    classes. Consists of 4 FC layers, ReLU activations are used 
    for the first 3.
    """
    def __init__(self):
        """ Constructor, layers definitions go here. Only specify
        those layers which have any trainable parameters (but for
        instance not the activation functions as the ones we use 
        do not have any trainable parameters). """
        # The following line is needed to initialize the nn.Module properly
        super(FC, self).__init__()

        ### WRITE YOUR CODE HERE
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 128)
        self.fc4 = nn.Linear(128, 10)

    def forward(self, x):
        """ Feed-forward pass, this is where the actual computation happens
        and the computational graph is built (from scratch each time this 
        function is called). """
        # Note: we first flatten the images into vectors
        # This is done over the last 3 dimensions: (channel, height, width)
        x = x.flatten(-3)
        
        ### WRITE YOUR CODE HERE
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return self.fc4(x)
    
# Instantiate the model.
model_fc = FC()

**Q:** How many learnable parameters (weights) does this model have?

**A:** (784 + 1) * 512 + (512 + 1) * 256 + (256 + 1) * 128 + (128 + 1) * 10 = 567,434
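
The same count can be reproduced programmatically from the layer widths. The sketch below uses plain Python; the list name `layer_sizes` is just a label chosen for this example.

```python
# Layer widths of the MLP: input, three hidden layers, output.
layer_sizes = [784, 512, 256, 128, 10]

# Each fully connected layer has (inputs + 1 bias) * outputs parameters.
per_layer = [(n_in + 1) * n_out
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [401920, 131328, 32896, 1290]
print(sum(per_layer))  # 567434
```

On the instantiated model, `sum(p.numel() for p in model_fc.parameters())` should give the same total.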

## 1.7 Inspecting the model architecture

Let us check the model architecture and see how many trainable parameters we really use. For this purpose we will use the `torchinfo` package.

Note the number of trainable parameters.

In [5]:
summary(model_fc, input_size=(1, 1, 28, 28))  # the first dimension=1 corresponds to the batch size

Layer (type:depth-idx)                   Output Shape              Param #
FC                                       [1, 10]                   --
├─Linear: 1-1                            [1, 512]                  401,920
├─Linear: 1-2                            [1, 256]                  131,328
├─Linear: 1-3                            [1, 128]                  32,896
├─Linear: 1-4                            [1, 10]                   1,290
Total params: 567,434
Trainable params: 567,434
Non-trainable params: 0
Total mult-adds (M): 0.57
Input size (MB): 0.00
Forward/backward pass size (MB): 0.01
Params size (MB): 2.27
Estimated Total Size (MB): 2.28

## 1.8 Loss function

We are at step (3) of our pipeline. As explained above, our loss function $\mathcal{L}$ will be $\text{CE}(\hat{\mathbf{y}}, \mathbf{y})$, which is provided for us by PyTorch; please refer to the documentation of [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss).

**Note:** as explained above, this function applies the *Softmax* to the predictions internally, so you don't need to call it manually.

There are [many commonly used loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions) defined in the `torch.nn` module, and you can also implement your own using PyTorch operations. 

Your task is to instantiate the CE loss function.

In [6]:
# Define the loss function.
criterion = nn.CrossEntropyLoss()  ### WRITE YOUR CODE HERE

## 1.9 Optimizer
We are at step (4) of the pipeline. The [Optimizer](https://pytorch.org/docs/stable/optim.html) updates the weights of the network given their currently computed gradients. It can be a simple state-less function (such as SGD) or a more advanced one that keeps track of additional information about the weights and their gradients (such as a running mean), which can be used for more advanced update rules.

We will opt for the simplest case, the state-less Stochastic Gradient Descent. Your task is to instantiate this optimizer for the parameters (:= weights) of our model; please refer to [`torch.optim.SGD`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD). We will also need to define its learning rate.

In [7]:
learning_rate = 1e-1
optimizer = torch.optim.SGD(model_fc.parameters(), lr=learning_rate)  ### WRITE YOUR CODE HERE
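
For plain SGD without momentum or weight decay, one optimizer step amounts to $w \leftarrow w - \eta \, \nabla_w \mathcal{L}$, with $\eta$ the learning rate. The sketch below checks this on a two-element parameter; the toy loss $\sum_i w_i^2$ and all values are assumptions made for illustration.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # gradient is 2 * w
loss.backward()

expected = w.detach() - 0.1 * w.grad  # manual update: w - lr * grad
sgd.step()

print(w.detach())  # value after one step
print(expected)    # same values, computed by hand
```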

## 1.10 Training loop

We are at step (5) of our pipeline. We would like to define a training loop where we iterate over the training samples, predict the outputs, and update the model based on its errors. Let us define a function `train_model()` that will be used for training any network architecture we come up with.

Fill in the code that follows the steps 5.2 - 5.6 of our training pipeline. For running the backward pass, use the function [`backward()`](https://pytorch.org/docs/stable/generated/torch.autograd.backward.html#torch.autograd.backward). For zeroing out the accumulated gradients, use the function [`zero_grad()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=zero_grad#torch.nn.Module.zero_grad).

In [None]:
def train_model(model, criterion, optimizer, dataloader_train, dataloader_test, epochs):
    """ Trains the model for the specified number of epochs on the dataset.

    Args:
        model: The model to train.
        criterion: The loss function.
        optimizer: The optimizer to use.
        dataloader_train: The DataLoader for the training set.
        dataloader_test: The DataLoader for the test set.
        epochs: The number of epochs to train for.
    """
    for ep in range(epochs):
        # Training.
        model.train()
        for it, batch in enumerate(dataloader_train):
            # 5.1 Load a batch, break it down in images and targets.
            x, y = batch

            # 5.2 Run forward pass.
            logits = model(x)  ### WRITE YOUR CODE HERE
            
            # 5.3 Compute loss (using 'criterion').
            loss = criterion(logits, y)  ### WRITE YOUR CODE HERE
            
            # 5.4 Run backward pass.
            loss.backward()  ### WRITE YOUR CODE HERE
            # `torch.autograd.backward(loss)` also works
            
            # 5.5 Update the weights using 'optimizer'.
            optimizer.step()  ### WRITE YOUR CODE HERE
            
            # 5.6 Zero-out the accumulated gradients.
            optimizer.zero_grad()  ### WRITE YOUR CODE HERE
            # `model.zero_grad()` also works

            print('\rEp {}/{}, it {}/{}: loss train: {:.2f}, accuracy train: {:.2f}'.
                  format(ep + 1, epochs, it + 1, len(dataloader_train), loss,
                         accuracy(logits, y)), end='')

        # Validation.
        model.eval()
        with torch.no_grad():
            acc_run = 0
            for it, batch in enumerate(dataloader_test):
                # Get batch of data.
                x, y = batch
                curr_bs = x.shape[0]
                acc_run += accuracy(model(x), y) * curr_bs
            acc = acc_run / len(dataloader_test.dataset)

            print(', accuracy test: {:.2f}'.format(acc))
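
Step 5.6 matters because PyTorch adds newly computed gradients to whatever is already stored in `.grad`. The sketch below shows this accumulation on a single scalar parameter; the function $3w$ is an arbitrary choice for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()    # gradient of 3w is 3

(3 * w).backward()       # no zeroing in between
second = w.grad.item()   # the new gradient was added to the old one

w.grad.zero_()           # manual reset of this one gradient
print(first, second, w.grad.item())
```

Calling `optimizer.zero_grad()` performs such a reset for all parameters managed by the optimizer, so that each update uses only the gradients of the current batch.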

### Training the model

Let's now use everything that we have prepared to train our model on MNIST:

In [14]:
epochs = 5
train_model(model_fc, criterion, optimizer, dataloader_train, dataloader_test, epochs)

Ep 1/5, it 1/469: loss train: 0.05, accuracy train: 0.98
Ep 1/5, it 2/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 17/469: loss train: 0.07, accuracy train: 0.99
Ep 1/5, it 18/469: loss train: 0.03, accuracy train: 1.00
Ep 1/5, it 38/469: loss train: 0.02, accuracy train: 0.99
Ep 1/5, it 39/469: loss train: 0.00, accuracy train: 1.00
Ep 1/5, it 55/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 56/469: loss train: 0.02, accuracy train: 0.99
Ep 1/5, it 77/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 78/469: loss train: 0.03, accuracy train: 0.99
Ep 1/5, it 103/469: loss train: 0.04, accuracy train: 0.99
Ep 1/5, it 104/469: loss train: 0.02, accuracy train: 1.00
Ep 1/5, it 128/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 129/469: loss train: 0.03, accuracy train: 0.99
Ep 1/5, it 153/469: loss train: 0.03, accuracy train: 0.99
Ep 1/5, it 154/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 178/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 179/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 205/469: loss train: 0.04, accuracy train: 0.99
Ep 1/5, it 206/469: loss train: 0.02, accuracy train: 1.00
Ep 1/5, it 231/469: loss train: 0.04, accuracy train: 0.98
Ep 1/5, it 232/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 257/469: loss train: 0.03, accuracy train: 0.98
Ep 1/5, it 258/469: loss train: 0.01, accuracy train: 0.99
Ep 1/5, it 283/469: loss train: 0.02, accuracy train: 1.00
Ep 1/5, it 284/469: loss train: 0.02, accuracy train: 0.99
Ep 1/5, it 310/469: loss train: 0.07, accuracy train: 0.98
Ep 1/5, it 311/469: loss train: 0.04, accuracy train: 0.98
Ep 1/5, it 335/469: loss train: 0.02, accuracy train: 1.00
Ep 1/5, it 336/469: loss train: 0.06, accuracy train: 0.97
Ep 1/5, it 362/469: loss train: 0.01, accuracy train: 1.00
Ep 1/5, it 363/469: loss train: 0.03, accuracy train: 0.98
Ep 1/5, it 389/469: loss train: 0.02, accuracy train: 1.00tensor([4, 9, 9, 8, 1, 1, 5, 1, 0, 6, 2, 5, 6, 0, 2, 1, 3, 7, 5, 2, 7, 9, 4, 4,
        4, 4, 5, 5, 4, 9, 4, 8, 2, 3, 9, 5, 6, 0, 5, 4, 8, 7, 9, 1, 2, 6, 3, 2,
        5, 6, 0, 8, 6, 7, 0, 0, 3, 1, 3, 8, 9, 2, 1, 3, 7, 6, 6, 0, 6, 8, 6, 2,
        3, 4, 7, 9, 1, 8, 3, 5, 7, 9, 1, 4, 8, 9, 1, 2, 4, 9, 9, 2, 1, 1, 9, 5,
        0, 5, 4, 0, 4, 7, 5, 3, 8, 4, 8, 6, 2, 2, 1, 7, 1, 1, 7, 7, 4, 5, 3, 0,
        5, 3, 5, 2, 8, 4, 3, 8])
Ep 1/5, it 390/469: loss train: 0.01, accuracy train: 1.00tensor([5, 8, 7, 0, 0, 2, 5, 4, 4, 8, 8, 7, 0, 9, 9, 8, 3, 4, 6, 7, 0, 2, 3, 9,
        5, 2, 6, 0, 9, 8, 1, 3, 4, 4, 9, 9, 5, 5, 2, 9, 3, 7, 0, 4, 0, 0, 0, 9,
        5, 9, 8, 8, 2, 4, 2, 3, 7, 3, 7, 7, 5, 6, 1, 8, 9, 5, 7, 8, 8, 0, 9, 2,
        1, 9, 5, 1, 2, 9, 7, 5, 5, 8, 6, 1, 9, 6, 1, 3, 3, 4, 1, 1, 2, 8, 9, 9,
        2, 2, 2, 0, 4, 2, 2, 1, 6, 7, 3, 9, 5, 7, 2, 8, 7, 9, 7, 2, 1, 6, 7, 5,
        0, 6, 6, 5, 3, 5, 7, 0])
Ep 1/5, it 391/469

Ep 1/5, it 417/469: loss train: 0.02, accuracy train: 1.00tensor([0, 0, 7, 1, 7, 0, 4, 2, 0, 1, 3, 0, 2, 8, 0, 9, 0, 8, 8, 6, 1, 5, 5, 2,
        9, 3, 9, 0, 2, 8, 1, 6, 8, 2, 1, 5, 1, 7, 9, 6, 4, 6, 8, 5, 1, 1, 0, 4,
        5, 8, 7, 8, 3, 9, 3, 4, 3, 4, 9, 8, 5, 8, 9, 1, 7, 9, 0, 1, 1, 2, 1, 8,
        9, 4, 7, 1, 0, 8, 0, 0, 0, 0, 3, 0, 9, 8, 2, 7, 8, 9, 5, 6, 3, 3, 1, 6,
        2, 2, 1, 8, 1, 9, 2, 4, 3, 8, 0, 2, 7, 0, 1, 8, 9, 4, 2, 7, 8, 8, 0, 6,
        0, 1, 3, 0, 1, 3, 0, 1])
Ep 1/5, it 418/469: loss train: 0.04, accuracy train: 0.99tensor([5, 5, 4, 5, 0, 9, 2, 9, 4, 7, 1, 1, 6, 5, 2, 5, 6, 3, 0, 2, 4, 8, 7, 4,
        8, 7, 3, 8, 8, 1, 2, 2, 0, 3, 7, 9, 3, 3, 0, 0, 7, 4, 3, 0, 6, 6, 6, 1,
        6, 6, 3, 8, 7, 5, 2, 4, 0, 0, 0, 6, 0, 1, 4, 4, 1, 1, 2, 6, 8, 7, 2, 8,
        5, 5, 5, 6, 8, 8, 7, 3, 2, 9, 7, 6, 4, 7, 0, 6, 5, 2, 8, 0, 9, 2, 2, 3,
        9, 3, 8, 3, 5, 1, 5, 6, 2, 9, 2, 2, 2, 3, 6, 1, 8, 1, 3, 4, 0, 2, 5, 4,
        6, 4, 5, 7, 2, 4, 1, 1])
Ep 1/5, it 419/469

Ep 1/5, it 446/469: loss train: 0.01, accuracy train: 1.00tensor([9, 5, 0, 7, 1, 2, 9, 7, 4, 9, 1, 2, 5, 8, 8, 2, 9, 3, 1, 0, 8, 0, 3, 2,
        4, 6, 1, 8, 8, 9, 3, 3, 0, 7, 6, 9, 6, 3, 0, 0, 9, 4, 6, 7, 7, 8, 7, 7,
        7, 1, 5, 8, 9, 0, 4, 6, 4, 3, 8, 7, 3, 0, 9, 7, 9, 1, 2, 6, 2, 7, 0, 6,
        1, 7, 0, 3, 1, 2, 9, 4, 3, 9, 4, 6, 0, 3, 8, 6, 5, 3, 6, 5, 0, 1, 3, 8,
        0, 0, 6, 5, 8, 1, 0, 5, 0, 9, 7, 0, 6, 1, 6, 6, 1, 6, 4, 9, 6, 1, 7, 0,
        6, 6, 0, 8, 8, 1, 1, 4])
Ep 1/5, it 447/469: loss train: 0.01, accuracy train: 1.00tensor([5, 5, 2, 7, 1, 0, 7, 4, 2, 5, 4, 2, 9, 3, 8, 4, 8, 9, 7, 3, 4, 2, 6, 1,
        5, 0, 5, 7, 4, 1, 4, 1, 9, 4, 8, 2, 8, 5, 4, 9, 3, 3, 5, 7, 6, 4, 3, 0,
        5, 2, 9, 9, 4, 7, 4, 5, 9, 6, 9, 6, 0, 0, 2, 8, 3, 4, 3, 7, 9, 9, 8, 9,
        8, 9, 0, 8, 8, 7, 1, 5, 5, 7, 0, 2, 6, 7, 0, 4, 5, 8, 4, 8, 9, 6, 3, 3,
        0, 0, 1, 9, 0, 3, 5, 6, 4, 5, 6, 5, 7, 9, 2, 8, 0, 0, 0, 3, 9, 1, 1, 3,
        4, 7, 4, 9, 4, 8, 6, 6])
Ep 1/5, it 448/469

Ep 1/5, it 469/469: loss train: 0.01, accuracy train: 1.00, accuracy test: 0.98
tensor([8, 4, 3, 8, 3, 3, 2, 8, 6, 8, 1, 9, 2, 2, 5, 9, 0, 8, 4, 3, 3, 0, 3, 6,
        1, 6, 4, 6, 3, 7, 4, 2, 2, 3, 0, 0, 5, 6, 1, 2, 9, 0, 8, 6, 1, 2, 2, 1,
        2, 0, 7, 5, 9, 1, 1, 4, 3, 2, 7, 0, 6, 4, 7, 1, 9, 4, 7, 6, 6, 6, 3, 3,
        8, 6, 3, 5, 6, 0, 4, 8, 1, 0, 1, 2, 0, 6, 7, 7, 0, 1, 2, 8, 6, 3, 8, 4,
        6, 1, 0, 6, 3, 9, 0, 3, 6, 0, 7, 5, 2, 3, 6, 4, 9, 4, 1, 2, 6, 4, 7, 6,
        5, 8, 2, 1, 8, 2, 0, 4])
Ep 2/5, it 1/469: loss train: 0.01, accuracy train: 1.00tensor([1, 7, 1, 1, 4, 4, 2, 3, 1, 5, 5, 8, 9, 3, 6, 9, 6, 3, 4, 0, 2, 1, 1, 4,
        8, 8, 2, 0, 9, 0, 3, 7, 9, 6, 2, 0, 5, 8, 7, 2, 7, 4, 4, 8, 3, 4, 7, 6,
        8, 0, 5, 4, 3, 5, 4, 0, 4, 1, 9, 1, 9, 3, 3, 4, 1, 0, 3, 2, 4, 1, 4, 7,
        7, 0, 5, 9, 7, 8, 1, 2, 0, 8, 2, 6, 1, 5, 7, 1, 1, 2, 5, 5, 2, 4, 4, 1,
        3, 3, 9, 7, 0, 2, 9, 8, 2, 3, 0, 5, 7, 7, 6, 6, 9, 6, 8, 1, 3, 7, 3, 2,
        4, 4, 3, 0, 3, 9, 6, 8])

Ep 2/5, it 27/469: loss train: 0.02, accuracy train: 0.99tensor([6, 7, 5, 3, 6, 1, 8, 1, 7, 2, 3, 7, 3, 8, 1, 6, 9, 4, 9, 7, 2, 1, 6, 1,
        8, 4, 5, 4, 9, 1, 3, 1, 1, 3, 8, 5, 9, 6, 7, 2, 3, 9, 9, 5, 9, 3, 2, 3,
        3, 3, 0, 9, 6, 5, 0, 3, 3, 3, 0, 8, 1, 7, 3, 7, 2, 3, 3, 4, 3, 9, 3, 0,
        8, 2, 1, 0, 3, 0, 0, 3, 8, 5, 9, 3, 1, 9, 4, 2, 7, 9, 4, 2, 3, 2, 9, 4,
        7, 7, 1, 1, 3, 2, 0, 9, 2, 4, 2, 4, 5, 0, 0, 8, 4, 3, 4, 8, 6, 3, 9, 1,
        3, 2, 1, 4, 0, 6, 5, 7])
Ep 2/5, it 28/469: loss train: 0.09, accuracy train: 0.98tensor([3, 6, 9, 9, 5, 6, 1, 6, 8, 4, 9, 2, 4, 7, 7, 4, 9, 9, 4, 5, 5, 9, 1, 7,
        0, 9, 4, 8, 1, 1, 8, 0, 9, 6, 0, 2, 3, 8, 8, 8, 6, 0, 2, 1, 1, 6, 2, 3,
        1, 7, 9, 5, 7, 4, 3, 9, 4, 1, 7, 3, 4, 2, 6, 9, 0, 8, 8, 4, 9, 4, 2, 7,
        7, 0, 3, 2, 3, 1, 2, 7, 6, 2, 7, 9, 9, 4, 4, 3, 7, 2, 9, 0, 0, 2, 1, 7,
        6, 5, 2, 4, 0, 4, 0, 7, 7, 8, 0, 4, 1, 0, 9, 8, 7, 0, 7, 9, 1, 3, 1, 1,
        9, 3, 4, 4, 2, 6, 3, 9])
Ep 2/5, it 29/469: l

Ep 2/5, it 54/469: loss train: 0.03, accuracy train: 0.98tensor([8, 6, 0, 2, 4, 6, 9, 3, 4, 7, 6, 1, 1, 6, 1, 4, 6, 1, 6, 0, 3, 4, 4, 1,
        4, 6, 7, 8, 5, 5, 9, 9, 3, 0, 3, 6, 9, 2, 0, 4, 2, 8, 6, 7, 2, 6, 1, 4,
        4, 2, 2, 7, 3, 6, 3, 4, 3, 7, 0, 8, 7, 5, 3, 1, 9, 6, 9, 8, 6, 9, 1, 3,
        2, 7, 4, 4, 3, 3, 1, 6, 9, 3, 0, 6, 7, 9, 9, 1, 8, 7, 7, 6, 0, 6, 3, 5,
        7, 2, 9, 2, 0, 9, 0, 2, 5, 4, 9, 5, 1, 7, 1, 6, 1, 3, 3, 7, 4, 9, 6, 0,
        4, 8, 9, 9, 1, 3, 4, 0])
Ep 2/5, it 55/469: loss train: 0.03, accuracy train: 0.99tensor([2, 6, 5, 8, 8, 6, 3, 5, 2, 5, 6, 8, 8, 7, 5, 2, 7, 9, 8, 5, 6, 7, 2, 2,
        0, 7, 7, 7, 2, 6, 9, 6, 9, 4, 1, 3, 3, 5, 0, 1, 6, 0, 9, 8, 0, 9, 3, 8,
        7, 4, 7, 1, 6, 9, 1, 6, 9, 5, 8, 6, 4, 0, 8, 1, 6, 6, 2, 0, 4, 0, 6, 2,
        4, 5, 0, 1, 5, 6, 6, 4, 8, 4, 8, 4, 8, 5, 3, 6, 1, 8, 1, 1, 3, 7, 5, 6,
        0, 4, 1, 3, 0, 6, 4, 9, 3, 0, 9, 6, 1, 5, 3, 2, 1, 5, 6, 5, 3, 5, 0, 2,
        0, 7, 4, 0, 2, 8, 6, 0])
Ep 2/5, it 56/469: l

Ep 2/5, it 81/469: loss train: 0.03, accuracy train: 0.98tensor([7, 9, 2, 0, 1, 8, 6, 7, 0, 1, 4, 1, 3, 7, 6, 6, 8, 2, 3, 4, 9, 3, 9, 4,
        8, 1, 8, 7, 4, 6, 5, 2, 8, 1, 2, 2, 4, 4, 4, 0, 2, 0, 8, 3, 6, 0, 7, 4,
        8, 6, 7, 2, 9, 4, 8, 2, 1, 6, 5, 3, 6, 4, 7, 4, 6, 9, 5, 7, 2, 0, 8, 8,
        6, 1, 3, 3, 0, 1, 5, 4, 0, 7, 2, 9, 2, 9, 8, 8, 0, 4, 6, 4, 9, 2, 2, 0,
        7, 3, 8, 9, 1, 0, 0, 7, 9, 9, 2, 2, 9, 7, 9, 9, 8, 2, 4, 5, 7, 3, 3, 8,
        2, 3, 0, 6, 8, 6, 7, 8])
Ep 2/5, it 82/469: loss train: 0.02, accuracy train: 0.99tensor([8, 2, 8, 3, 2, 2, 6, 5, 6, 0, 4, 1, 7, 9, 7, 6, 0, 7, 9, 4, 9, 4, 6, 5,
        8, 0, 4, 5, 2, 3, 2, 7, 1, 8, 0, 0, 5, 5, 3, 6, 0, 1, 9, 6, 2, 5, 5, 5,
        1, 8, 1, 5, 6, 6, 3, 5, 6, 1, 1, 2, 2, 1, 8, 2, 0, 3, 0, 0, 4, 3, 3, 6,
        1, 0, 2, 0, 5, 0, 1, 8, 0, 8, 0, 0, 2, 3, 2, 8, 2, 1, 8, 1, 3, 6, 0, 5,
        3, 6, 2, 4, 5, 7, 6, 2, 1, 8, 0, 8, 6, 5, 5, 3, 9, 2, 1, 3, 5, 8, 4, 1,
        8, 7, 3, 7, 7, 0, 4, 6])
Ep 2/5, it 83/469: l

Ep 2/5, it 110/469: loss train: 0.01, accuracy train: 1.00tensor([4, 1, 0, 1, 5, 3, 0, 8, 5, 4, 4, 5, 9, 3, 0, 5, 9, 2, 6, 9, 1, 9, 4, 3,
        5, 8, 7, 4, 0, 2, 1, 2, 1, 7, 1, 7, 2, 4, 6, 8, 1, 7, 0, 3, 5, 8, 0, 3,
        0, 5, 7, 8, 2, 6, 6, 5, 1, 2, 9, 1, 9, 3, 6, 1, 1, 8, 4, 8, 5, 0, 1, 4,
        1, 2, 7, 3, 6, 6, 1, 0, 2, 1, 6, 7, 5, 3, 9, 9, 6, 3, 8, 2, 8, 5, 2, 3,
        0, 3, 0, 9, 1, 1, 1, 7, 4, 9, 6, 9, 2, 1, 2, 6, 1, 5, 1, 0, 0, 9, 1, 9,
        4, 8, 9, 7, 5, 1, 6, 8])
Ep 2/5, it 111/469: loss train: 0.01, accuracy train: 1.00tensor([5, 1, 6, 1, 9, 2, 8, 8, 2, 1, 5, 1, 4, 2, 6, 6, 8, 5, 7, 7, 2, 9, 1, 2,
        3, 8, 2, 0, 4, 4, 5, 3, 3, 9, 3, 0, 9, 6, 1, 2, 9, 6, 1, 6, 1, 9, 0, 9,
        3, 9, 9, 1, 2, 5, 1, 7, 5, 1, 8, 5, 0, 5, 1, 5, 6, 5, 9, 5, 7, 1, 4, 4,
        5, 7, 2, 2, 2, 8, 4, 8, 1, 6, 1, 5, 1, 8, 2, 8, 5, 4, 5, 0, 2, 6, 5, 8,
        3, 4, 0, 2, 9, 3, 6, 5, 8, 1, 6, 8, 8, 7, 4, 1, 8, 8, 1, 7, 9, 8, 2, 1,
        5, 1, 5, 0, 3, 2, 9, 0])
Ep 2/5, it 112/469

Ep 2/5, it 137/469: loss train: 0.04, accuracy train: 0.98tensor([3, 4, 3, 3, 3, 9, 1, 2, 7, 5, 0, 4, 0, 7, 5, 0, 5, 7, 9, 4, 9, 6, 7, 0,
        7, 3, 8, 2, 7, 0, 0, 3, 2, 2, 9, 1, 3, 1, 2, 2, 2, 5, 9, 4, 2, 2, 4, 4,
        6, 8, 4, 2, 5, 0, 3, 9, 4, 2, 8, 7, 6, 4, 1, 9, 0, 6, 2, 6, 6, 6, 6, 9,
        1, 9, 3, 6, 2, 0, 4, 6, 9, 5, 3, 2, 7, 5, 3, 7, 1, 2, 5, 5, 9, 7, 1, 0,
        1, 8, 9, 9, 0, 9, 8, 6, 4, 6, 8, 0, 1, 0, 0, 8, 4, 5, 7, 3, 1, 7, 6, 6,
        7, 9, 7, 9, 5, 9, 1, 3])
Ep 2/5, it 138/469: loss train: 0.11, accuracy train: 0.98tensor([9, 9, 2, 5, 5, 9, 1, 1, 1, 4, 1, 0, 7, 3, 2, 3, 5, 7, 5, 1, 6, 5, 9, 9,
        6, 0, 1, 3, 0, 0, 9, 2, 7, 2, 4, 6, 4, 7, 5, 7, 3, 1, 1, 0, 7, 6, 3, 7,
        4, 0, 8, 7, 8, 8, 3, 2, 4, 5, 3, 2, 8, 5, 3, 1, 9, 3, 5, 0, 1, 4, 8, 2,
        2, 1, 7, 7, 3, 5, 2, 1, 7, 6, 3, 7, 2, 0, 3, 4, 0, 0, 2, 9, 3, 2, 3, 8,
        9, 8, 7, 2, 9, 2, 3, 6, 1, 0, 4, 9, 5, 0, 1, 9, 7, 5, 2, 3, 0, 9, 0, 6,
        4, 2, 7, 3, 1, 9, 3, 4])
Ep 2/5, it 139/469

Ep 2/5, it 166/469: loss train: 0.02, accuracy train: 0.99tensor([1, 4, 7, 2, 9, 7, 1, 5, 3, 1, 3, 4, 3, 9, 1, 9, 6, 3, 2, 1, 8, 7, 3, 1,
        9, 0, 3, 2, 3, 7, 5, 1, 0, 1, 5, 7, 4, 5, 0, 9, 7, 8, 6, 1, 0, 7, 2, 8,
        8, 0, 3, 8, 5, 9, 7, 1, 1, 6, 8, 4, 9, 9, 3, 2, 1, 9, 8, 9, 5, 0, 1, 2,
        9, 5, 7, 3, 0, 7, 5, 5, 7, 8, 1, 0, 0, 6, 8, 6, 1, 8, 5, 3, 7, 9, 8, 8,
        5, 0, 4, 3, 7, 0, 4, 4, 2, 3, 3, 7, 7, 3, 1, 7, 3, 1, 6, 4, 0, 0, 5, 4,
        4, 3, 6, 4, 0, 4, 2, 3])
Ep 2/5, it 167/469: loss train: 0.02, accuracy train: 0.99tensor([5, 0, 5, 4, 4, 6, 4, 8, 6, 4, 5, 3, 0, 7, 7, 8, 6, 3, 7, 1, 8, 1, 1, 2,
        2, 3, 2, 7, 7, 1, 4, 5, 9, 0, 8, 7, 5, 3, 6, 8, 5, 6, 1, 7, 9, 7, 9, 9,
        6, 2, 1, 2, 8, 8, 2, 9, 3, 0, 3, 8, 9, 7, 7, 6, 9, 6, 2, 4, 9, 5, 6, 2,
        7, 9, 6, 3, 8, 4, 9, 8, 2, 8, 9, 0, 5, 4, 5, 2, 3, 4, 4, 7, 4, 1, 7, 4,
        1, 1, 8, 6, 3, 9, 2, 3, 4, 1, 3, 2, 4, 8, 8, 5, 6, 8, 9, 6, 0, 1, 3, 7,
        6, 7, 5, 1, 9, 6, 1, 2])
Ep 2/5, it 168/469

Ep 2/5, it 194/469: loss train: 0.03, accuracy train: 0.99tensor([2, 4, 1, 4, 3, 7, 2, 4, 4, 6, 3, 8, 0, 1, 2, 7, 3, 5, 2, 1, 8, 1, 3, 3,
        4, 1, 3, 9, 3, 1, 0, 1, 3, 9, 4, 9, 0, 3, 6, 0, 1, 0, 2, 2, 2, 9, 1, 9,
        9, 1, 6, 0, 4, 7, 5, 8, 0, 5, 4, 0, 7, 3, 9, 5, 0, 4, 0, 4, 8, 8, 2, 1,
        5, 9, 3, 2, 8, 4, 2, 4, 2, 0, 2, 7, 5, 4, 8, 6, 1, 8, 5, 0, 8, 7, 7, 2,
        3, 7, 5, 0, 7, 6, 9, 2, 3, 0, 6, 8, 7, 3, 1, 3, 0, 7, 7, 5, 9, 9, 5, 6,
        8, 1, 5, 5, 1, 9, 7, 9])
Ep 2/5, it 195/469: loss train: 0.02, accuracy train: 0.99tensor([4, 3, 1, 6, 8, 1, 4, 6, 2, 0, 6, 7, 6, 4, 6, 8, 7, 4, 4, 1, 9, 5, 1, 9,
        0, 1, 0, 8, 9, 6, 6, 9, 0, 9, 3, 2, 4, 6, 8, 9, 0, 0, 4, 1, 8, 3, 7, 3,
        7, 7, 7, 6, 4, 7, 8, 5, 7, 9, 0, 1, 0, 1, 7, 7, 6, 3, 4, 3, 4, 7, 8, 5,
        6, 5, 1, 2, 9, 3, 0, 7, 6, 6, 1, 8, 9, 4, 3, 0, 8, 2, 8, 7, 9, 6, 8, 7,
        4, 1, 9, 9, 7, 1, 4, 2, 4, 5, 7, 2, 6, 0, 9, 4, 6, 7, 7, 8, 1, 5, 8, 3,
        9, 9, 7, 3, 9, 7, 0, 8])
Ep 2/5, it 196/469

Ep 2/5, it 220/469: loss train: 0.03, accuracy train: 0.99tensor([9, 6, 2, 8, 0, 4, 8, 8, 2, 4, 1, 4, 1, 1, 6, 5, 5, 0, 1, 6, 8, 0, 9, 1,
        8, 0, 5, 4, 8, 3, 6, 8, 2, 9, 0, 8, 0, 0, 8, 5, 5, 8, 8, 7, 2, 9, 6, 1,
        9, 7, 5, 1, 8, 4, 3, 6, 3, 3, 5, 0, 0, 4, 8, 4, 9, 5, 2, 3, 2, 3, 3, 9,
        1, 7, 2, 9, 3, 4, 4, 2, 2, 1, 5, 2, 7, 7, 4, 5, 4, 5, 4, 0, 5, 6, 5, 2,
        7, 2, 0, 8, 8, 4, 6, 1, 6, 2, 1, 1, 2, 8, 7, 8, 7, 8, 4, 7, 7, 8, 8, 7,
        6, 8, 7, 7, 3, 9, 4, 2])
Ep 2/5, it 221/469: loss train: 0.03, accuracy train: 0.99tensor([0, 3, 6, 9, 9, 7, 4, 0, 5, 2, 4, 6, 3, 3, 4, 6, 7, 6, 6, 2, 1, 1, 7, 3,
        3, 4, 2, 0, 5, 6, 7, 9, 0, 0, 6, 2, 7, 7, 5, 0, 1, 5, 5, 5, 9, 7, 1, 5,
        0, 6, 0, 4, 0, 4, 0, 2, 8, 0, 3, 1, 2, 4, 7, 7, 3, 6, 3, 3, 6, 1, 5, 7,
        3, 1, 4, 1, 8, 3, 2, 0, 7, 1, 2, 7, 8, 1, 1, 1, 2, 1, 1, 6, 6, 3, 6, 2,
        7, 8, 8, 0, 2, 6, 7, 6, 1, 0, 5, 9, 7, 3, 9, 6, 9, 7, 7, 3, 5, 1, 9, 0,
        2, 5, 5, 4, 4, 5, 1, 7])
Ep 2/5, it 222/469

Ep 2/5, it 246/469: loss train: 0.01, accuracy train: 1.00tensor([9, 1, 8, 0, 9, 4, 0, 2, 3, 6, 7, 7, 7, 6, 1, 3, 2, 0, 8, 8, 0, 1, 2, 5,
        9, 7, 1, 1, 6, 1, 1, 7, 4, 2, 0, 8, 2, 4, 5, 1, 1, 7, 5, 4, 1, 9, 6, 8,
        8, 3, 0, 5, 9, 2, 1, 8, 7, 0, 5, 4, 5, 1, 9, 8, 0, 8, 1, 6, 9, 6, 8, 2,
        8, 6, 6, 6, 1, 7, 1, 9, 8, 9, 4, 1, 6, 9, 5, 6, 0, 7, 8, 1, 4, 0, 3, 1,
        5, 6, 6, 5, 0, 3, 3, 4, 3, 9, 8, 1, 1, 9, 1, 7, 9, 9, 9, 2, 9, 8, 6, 2,
        0, 8, 5, 9, 4, 7, 4, 2])
Ep 2/5, it 247/469: loss train: 0.16, accuracy train: 0.99tensor([1, 6, 0, 3, 1, 3, 3, 4, 5, 9, 7, 9, 6, 9, 2, 0, 3, 2, 0, 9, 9, 9, 6, 1,
        0, 3, 1, 3, 0, 1, 4, 6, 4, 1, 6, 4, 8, 6, 8, 3, 5, 5, 9, 9, 5, 5, 9, 7,
        8, 5, 3, 4, 6, 4, 8, 6, 5, 3, 2, 9, 2, 1, 1, 6, 0, 7, 9, 4, 1, 6, 5, 5,
        4, 4, 6, 8, 0, 6, 9, 5, 1, 3, 0, 1, 2, 9, 8, 2, 5, 8, 1, 2, 6, 7, 1, 1,
        0, 7, 0, 8, 8, 7, 5, 9, 5, 9, 3, 4, 0, 6, 7, 2, 6, 3, 4, 2, 3, 5, 1, 6,
        2, 8, 8, 6, 0, 3, 5, 3])
Ep 2/5, it 248/469

Ep 2/5, it 274/469: loss train: 0.01, accuracy train: 0.99tensor([4, 7, 7, 6, 1, 8, 0, 4, 9, 6, 4, 2, 9, 3, 0, 1, 8, 6, 4, 9, 8, 6, 2, 6,
        1, 1, 8, 5, 1, 4, 6, 9, 1, 9, 7, 2, 0, 5, 2, 9, 1, 9, 1, 6, 0, 4, 9, 8,
        4, 0, 1, 3, 8, 6, 3, 4, 0, 5, 4, 7, 4, 6, 5, 9, 6, 1, 9, 3, 8, 5, 4, 0,
        2, 4, 9, 8, 7, 3, 4, 8, 0, 9, 7, 9, 4, 6, 4, 7, 8, 3, 4, 6, 6, 8, 7, 8,
        3, 9, 6, 9, 9, 5, 5, 8, 4, 1, 3, 7, 9, 3, 5, 5, 1, 5, 0, 2, 9, 6, 0, 0,
        7, 2, 9, 0, 9, 1, 1, 8])
Ep 2/5, it 275/469: loss train: 0.02, accuracy train: 0.99tensor([3, 3, 7, 3, 2, 1, 1, 1, 7, 3, 9, 1, 0, 3, 6, 3, 9, 3, 4, 0, 9, 7, 4, 1,
        7, 5, 5, 0, 4, 4, 7, 6, 6, 0, 7, 7, 2, 1, 5, 4, 3, 7, 5, 5, 3, 0, 6, 1,
        3, 1, 0, 9, 6, 0, 1, 1, 6, 7, 6, 1, 7, 6, 8, 0, 3, 8, 6, 2, 5, 9, 7, 2,
        2, 1, 9, 8, 3, 2, 4, 6, 0, 9, 4, 6, 9, 2, 2, 3, 4, 1, 3, 5, 7, 5, 8, 2,
        0, 3, 8, 7, 1, 3, 6, 6, 5, 6, 9, 6, 9, 8, 2, 8, 1, 6, 0, 9, 1, 0, 6, 6,
        9, 2, 6, 2, 1, 7, 6, 2])
Ep 2/5, it 276/469

Ep 2/5, it 303/469: loss train: 0.02, accuracy train: 1.00tensor([7, 3, 2, 4, 4, 1, 0, 5, 0, 3, 9, 2, 5, 0, 5, 3, 2, 0, 6, 1, 2, 6, 3, 1,
        0, 7, 0, 0, 1, 3, 8, 9, 9, 7, 5, 2, 2, 1, 1, 9, 9, 8, 5, 5, 9, 0, 5, 2,
        0, 8, 2, 1, 6, 2, 8, 5, 2, 9, 0, 8, 9, 4, 0, 6, 2, 3, 1, 4, 6, 8, 9, 1,
        9, 8, 1, 7, 5, 8, 8, 9, 5, 7, 5, 7, 4, 2, 3, 4, 3, 6, 4, 7, 0, 6, 9, 0,
        8, 6, 5, 6, 1, 0, 1, 2, 7, 0, 3, 2, 8, 7, 8, 8, 4, 3, 3, 4, 1, 5, 4, 4,
        6, 9, 7, 2, 1, 7, 2, 5])
Ep 2/5, it 304/469: loss train: 0.01, accuracy train: 1.00tensor([6, 4, 9, 3, 1, 6, 7, 1, 3, 1, 0, 0, 4, 1, 1, 7, 7, 5, 3, 9, 6, 9, 1, 4,
        9, 7, 6, 3, 6, 7, 2, 7, 2, 4, 0, 4, 7, 2, 3, 1, 2, 8, 5, 4, 5, 2, 2, 8,
        4, 8, 0, 6, 1, 4, 5, 7, 9, 0, 2, 7, 6, 0, 9, 8, 6, 7, 9, 4, 2, 2, 0, 8,
        4, 7, 8, 1, 3, 0, 8, 7, 5, 0, 6, 1, 4, 1, 4, 4, 5, 0, 6, 6, 0, 9, 1, 0,
        2, 9, 0, 8, 4, 4, 9, 0, 4, 0, 4, 3, 5, 8, 0, 6, 5, 8, 8, 6, 3, 2, 7, 6,
        6, 0, 7, 8, 3, 7, 4, 6])
Ep 2/5, it 305/469

Ep 2/5, it 330/469: loss train: 0.02, accuracy train: 0.99tensor([9, 9, 3, 4, 2, 6, 7, 7, 8, 7, 0, 6, 0, 6, 0, 2, 9, 5, 2, 1, 4, 9, 6, 9,
        6, 9, 5, 3, 2, 4, 4, 2, 7, 2, 7, 1, 9, 3, 6, 7, 5, 6, 0, 5, 9, 6, 4, 2,
        8, 2, 5, 4, 5, 9, 6, 0, 0, 0, 1, 3, 4, 4, 6, 5, 1, 5, 9, 6, 0, 8, 3, 4,
        2, 7, 0, 9, 4, 5, 1, 4, 3, 1, 1, 1, 6, 3, 8, 8, 4, 9, 7, 2, 3, 5, 3, 8,
        8, 7, 7, 0, 7, 0, 5, 4, 5, 0, 5, 1, 2, 1, 1, 0, 8, 5, 1, 3, 6, 2, 4, 8,
        4, 5, 3, 2, 8, 7, 2, 8])
Ep 2/5, it 331/469: loss train: 0.03, accuracy train: 0.98tensor([4, 3, 3, 3, 0, 5, 1, 3, 7, 2, 4, 8, 8, 8, 8, 7, 4, 7, 8, 5, 8, 3, 1, 3,
        3, 1, 7, 6, 4, 0, 8, 9, 8, 5, 6, 2, 0, 5, 3, 1, 6, 0, 9, 2, 1, 6, 5, 8,
        9, 5, 3, 1, 8, 1, 8, 0, 5, 1, 0, 0, 1, 2, 5, 0, 8, 7, 2, 9, 5, 7, 4, 4,
        2, 8, 9, 6, 8, 8, 5, 7, 9, 5, 6, 0, 5, 6, 0, 8, 5, 8, 9, 9, 0, 9, 2, 4,
        3, 0, 7, 1, 6, 4, 2, 7, 9, 0, 9, 9, 7, 2, 9, 2, 7, 9, 0, 2, 1, 3, 9, 3,
        6, 4, 3, 1, 6, 5, 7, 1])
Ep 2/5, it 332/469

Ep 2/5, it 356/469: loss train: 0.01, accuracy train: 0.99tensor([3, 3, 0, 8, 1, 0, 8, 7, 7, 0, 0, 8, 6, 3, 7, 6, 9, 7, 1, 1, 1, 7, 9, 3,
        2, 9, 3, 5, 3, 8, 9, 0, 5, 6, 8, 4, 9, 3, 6, 8, 2, 0, 9, 5, 6, 6, 7, 9,
        2, 2, 6, 8, 7, 7, 7, 3, 4, 5, 3, 9, 7, 6, 4, 4, 7, 7, 6, 5, 6, 6, 1, 0,
        4, 0, 0, 9, 3, 7, 6, 8, 7, 7, 1, 9, 8, 4, 2, 5, 1, 0, 6, 5, 9, 1, 1, 3,
        6, 2, 0, 9, 5, 5, 8, 5, 6, 0, 2, 4, 7, 2, 0, 0, 7, 0, 5, 0, 5, 6, 4, 6,
        4, 0, 0, 6, 7, 1, 6, 0])
Ep 2/5, it 357/469: loss train: 0.01, accuracy train: 1.00tensor([0, 4, 5, 2, 3, 2, 5, 3, 1, 4, 1, 4, 4, 6, 1, 4, 8, 1, 1, 4, 6, 2, 7, 0,
        4, 8, 4, 9, 0, 7, 3, 9, 4, 9, 5, 9, 7, 2, 0, 6, 3, 9, 9, 7, 8, 4, 3, 3,
        7, 6, 0, 0, 1, 1, 5, 9, 4, 2, 9, 6, 9, 1, 1, 3, 6, 6, 3, 0, 7, 8, 1, 6,
        0, 2, 9, 4, 7, 4, 8, 7, 7, 1, 2, 8, 1, 3, 2, 9, 5, 9, 8, 5, 3, 6, 3, 0,
        4, 4, 1, 6, 3, 4, 9, 1, 3, 9, 0, 4, 9, 4, 3, 5, 3, 2, 6, 9, 3, 4, 2, 1,
        3, 3, 6, 2, 1, 5, 5, 0])
Ep 2/5, it 358/469

Ep 2/5, it 382/469: loss train: 0.00, accuracy train: 1.00tensor([9, 2, 1, 6, 4, 8, 4, 3, 7, 7, 4, 6, 0, 9, 7, 6, 9, 6, 7, 7, 3, 1, 5, 6,
        7, 1, 4, 5, 6, 5, 3, 5, 6, 7, 7, 6, 5, 6, 5, 7, 3, 1, 5, 7, 5, 9, 5, 2,
        7, 5, 2, 5, 1, 8, 6, 6, 4, 7, 8, 7, 7, 9, 3, 8, 0, 7, 1, 0, 8, 2, 4, 7,
        6, 8, 5, 6, 5, 0, 5, 3, 1, 5, 0, 4, 6, 5, 9, 7, 4, 1, 7, 3, 4, 8, 9, 3,
        7, 5, 7, 9, 7, 2, 0, 5, 9, 7, 2, 5, 9, 3, 9, 2, 4, 9, 4, 1, 1, 0, 4, 4,
        1, 1, 6, 5, 0, 1, 7, 6])
Ep 2/5, it 383/469: loss train: 0.01, accuracy train: 1.00tensor([6, 5, 2, 7, 7, 4, 1, 0, 7, 3, 8, 6, 3, 4, 3, 6, 6, 7, 7, 0, 9, 5, 9, 0,
        1, 5, 0, 5, 1, 2, 8, 4, 6, 1, 8, 4, 0, 4, 3, 0, 8, 4, 1, 4, 1, 0, 5, 9,
        5, 7, 8, 9, 8, 1, 9, 6, 1, 8, 5, 8, 0, 2, 7, 1, 4, 5, 7, 5, 4, 8, 8, 7,
        0, 1, 3, 3, 9, 0, 4, 4, 9, 6, 2, 4, 5, 4, 4, 0, 1, 3, 5, 1, 5, 7, 4, 3,
        2, 3, 8, 8, 3, 8, 3, 5, 6, 6, 5, 7, 0, 8, 6, 7, 0, 1, 0, 8, 9, 1, 9, 0,
        8, 4, 3, 3, 3, 5, 8, 6])
Ep 2/5, it 384/469

Ep 2/5, it 410/469: loss train: 0.02, accuracy train: 1.00tensor([3, 1, 1, 0, 3, 1, 7, 0, 7, 4, 1, 4, 3, 7, 5, 1, 8, 6, 4, 3, 4, 5, 7, 4,
        2, 0, 2, 7, 9, 3, 0, 7, 9, 0, 8, 5, 3, 3, 0, 2, 6, 0, 7, 7, 5, 1, 5, 1,
        7, 6, 0, 7, 9, 2, 6, 2, 1, 9, 1, 2, 3, 2, 4, 4, 2, 2, 2, 8, 7, 8, 8, 9,
        9, 7, 8, 0, 1, 0, 9, 1, 6, 3, 9, 5, 6, 1, 3, 4, 5, 9, 1, 6, 5, 5, 9, 7,
        1, 6, 8, 2, 4, 7, 6, 2, 1, 5, 5, 8, 8, 7, 3, 2, 2, 8, 5, 8, 9, 3, 1, 1,
        4, 9, 9, 2, 6, 1, 0, 0])
Ep 2/5, it 411/469: loss train: 0.03, accuracy train: 0.98tensor([7, 0, 2, 6, 9, 9, 3, 5, 8, 7, 9, 9, 1, 8, 9, 5, 4, 9, 1, 3, 4, 8, 8, 2,
        5, 9, 1, 4, 4, 3, 6, 7, 9, 2, 9, 2, 4, 2, 8, 1, 4, 9, 4, 1, 0, 9, 6, 3,
        1, 5, 4, 8, 4, 3, 7, 8, 7, 5, 0, 4, 4, 1, 3, 8, 6, 7, 5, 7, 7, 6, 7, 5,
        3, 9, 0, 3, 6, 2, 2, 5, 2, 1, 2, 7, 3, 0, 8, 1, 4, 5, 2, 4, 4, 7, 4, 2,
        1, 5, 0, 4, 7, 8, 6, 0, 0, 3, 5, 4, 5, 9, 6, 4, 8, 9, 1, 4, 7, 1, 7, 9,
        0, 1, 2, 8, 0, 5, 5, 5])
Ep 2/5, it 412/469

Ep 2/5, it 437/469: loss train: 0.02, accuracy train: 0.99tensor([0, 5, 0, 9, 1, 0, 8, 7, 7, 8, 8, 9, 1, 2, 7, 0, 9, 3, 8, 9, 1, 4, 1, 9,
        9, 7, 6, 6, 5, 3, 1, 6, 0, 8, 4, 0, 3, 5, 3, 2, 1, 4, 1, 6, 9, 3, 1, 6,
        1, 4, 0, 8, 9, 7, 2, 4, 2, 0, 5, 5, 9, 8, 3, 9, 2, 2, 6, 3, 5, 0, 5, 7,
        7, 0, 2, 6, 1, 7, 6, 5, 3, 4, 8, 8, 1, 6, 5, 6, 3, 9, 9, 2, 7, 5, 1, 3,
        1, 6, 1, 2, 2, 3, 4, 8, 2, 8, 6, 0, 6, 9, 9, 4, 9, 7, 1, 4, 1, 8, 6, 4,
        1, 5, 9, 1, 3, 8, 1, 0])
Ep 2/5, it 438/469: loss train: 0.04, accuracy train: 0.98tensor([5, 9, 3, 4, 5, 8, 6, 7, 5, 3, 8, 7, 3, 4, 3, 8, 7, 8, 9, 0, 1, 8, 8, 9,
        4, 6, 6, 6, 7, 3, 0, 0, 7, 7, 7, 1, 7, 2, 6, 5, 8, 3, 3, 7, 8, 1, 5, 2,
        4, 2, 7, 1, 2, 0, 2, 7, 9, 4, 7, 9, 2, 5, 0, 3, 0, 6, 0, 6, 4, 6, 3, 2,
        6, 7, 2, 4, 1, 9, 0, 3, 8, 3, 7, 1, 6, 8, 8, 8, 3, 1, 6, 7, 5, 9, 3, 7,
        8, 4, 4, 1, 9, 6, 2, 7, 5, 4, 5, 1, 6, 7, 2, 4, 3, 7, 8, 6, 4, 3, 5, 3,
        6, 5, 7, 1, 8, 1, 7, 2])
Ep 2/5, it 439/469

Ep 2/5, it 462/469: loss train: 0.01, accuracy train: 1.00tensor([2, 5, 9, 9, 6, 2, 4, 4, 5, 8, 8, 6, 8, 3, 5, 8, 7, 3, 3, 4, 1, 2, 1, 0,
        1, 8, 4, 3, 7, 0, 3, 2, 9, 2, 9, 4, 3, 7, 6, 2, 2, 9, 6, 7, 0, 2, 4, 6,
        1, 4, 1, 5, 8, 0, 2, 4, 9, 4, 9, 7, 4, 0, 1, 3, 3, 3, 8, 9, 3, 4, 4, 0,
        3, 3, 8, 3, 3, 2, 3, 1, 9, 9, 1, 7, 3, 7, 2, 7, 6, 6, 5, 3, 7, 5, 1, 5,
        6, 0, 6, 6, 6, 4, 6, 0, 1, 2, 8, 2, 8, 0, 6, 3, 0, 8, 2, 2, 4, 8, 6, 5,
        5, 8, 1, 0, 6, 8, 1, 1])
Ep 2/5, it 463/469: loss train: 0.02, accuracy train: 1.00tensor([5, 2, 9, 4, 6, 3, 7, 7, 5, 7, 5, 7, 1, 4, 0, 4, 2, 8, 6, 8, 7, 1, 7, 6,
        2, 3, 5, 7, 0, 1, 3, 1, 3, 0, 2, 2, 6, 4, 5, 7, 2, 9, 3, 8, 1, 3, 6, 8,
        8, 8, 8, 4, 9, 3, 9, 4, 4, 7, 4, 5, 7, 4, 1, 0, 0, 4, 0, 7, 9, 6, 4, 0,
        0, 1, 1, 6, 5, 0, 0, 9, 9, 2, 6, 8, 8, 7, 2, 9, 9, 3, 8, 1, 1, 9, 8, 4,
        0, 5, 6, 7, 7, 7, 4, 5, 1, 2, 6, 1, 6, 1, 4, 4, 4, 6, 9, 8, 0, 8, 8, 9,
        1, 3, 4, 1, 7, 2, 1, 7])
Ep 2/5, it 464/469

Ep 3/5, it 13/469: loss train: 0.02, accuracy train: 0.99tensor([3, 5, 7, 1, 2, 8, 9, 0, 2, 1, 0, 6, 3, 6, 7, 2, 6, 7, 6, 3, 7, 7, 2, 7,
        9, 6, 3, 1, 1, 0, 1, 2, 1, 3, 4, 4, 2, 5, 4, 3, 1, 2, 5, 9, 9, 1, 4, 9,
        7, 1, 0, 5, 8, 4, 5, 0, 4, 6, 2, 4, 9, 9, 3, 6, 8, 0, 4, 2, 1, 9, 9, 3,
        5, 6, 3, 5, 1, 9, 2, 1, 9, 4, 8, 0, 7, 1, 1, 0, 0, 6, 8, 2, 0, 9, 2, 9,
        0, 5, 5, 6, 4, 1, 6, 9, 0, 0, 2, 2, 3, 1, 7, 7, 1, 6, 6, 9, 1, 1, 1, 1,
        2, 2, 6, 5, 5, 6, 0, 1])
Ep 3/5, it 14/469: loss train: 0.02, accuracy train: 0.99tensor([1, 4, 3, 9, 2, 7, 8, 6, 4, 5, 1, 1, 8, 4, 1, 4, 0, 9, 6, 9, 2, 2, 5, 9,
        4, 6, 6, 5, 0, 2, 3, 9, 3, 8, 4, 9, 5, 9, 9, 4, 0, 7, 8, 5, 1, 7, 1, 3,
        4, 7, 0, 3, 3, 2, 3, 0, 4, 6, 9, 2, 0, 5, 8, 4, 3, 3, 2, 1, 5, 2, 9, 7,
        3, 8, 3, 7, 1, 4, 9, 0, 7, 4, 9, 9, 2, 5, 1, 6, 7, 7, 3, 9, 7, 3, 4, 4,
        1, 2, 1, 1, 2, 8, 7, 7, 6, 9, 6, 9, 8, 0, 3, 7, 6, 0, 0, 8, 3, 1, 9, 4,
        6, 6, 3, 1, 8, 9, 0, 9])
Ep 3/5, it 15/469: l

Ep 3/5, it 43/469: loss train: 0.01, accuracy train: 1.00tensor([1, 1, 9, 7, 8, 5, 9, 3, 2, 3, 4, 6, 6, 5, 3, 1, 3, 8, 2, 0, 5, 6, 3, 9,
        2, 2, 5, 2, 6, 3, 3, 9, 4, 2, 4, 8, 5, 2, 3, 6, 6, 8, 9, 9, 0, 3, 3, 7,
        0, 6, 7, 0, 5, 7, 6, 2, 9, 7, 6, 2, 2, 2, 7, 9, 4, 0, 5, 2, 2, 9, 9, 2,
        9, 3, 6, 2, 8, 4, 2, 2, 7, 9, 0, 0, 7, 8, 3, 8, 0, 5, 8, 0, 6, 1, 2, 3,
        0, 3, 3, 5, 5, 5, 6, 8, 7, 2, 2, 6, 9, 0, 5, 0, 4, 2, 5, 6, 0, 1, 3, 7,
        4, 0, 6, 1, 7, 0, 4, 1])
Ep 3/5, it 44/469: loss train: 0.01, accuracy train: 0.99tensor([0, 0, 3, 7, 5, 5, 9, 4, 8, 6, 6, 3, 3, 4, 8, 8, 4, 3, 8, 0, 6, 2, 0, 3,
        0, 2, 0, 4, 2, 8, 1, 4, 5, 3, 9, 9, 3, 1, 1, 9, 5, 2, 0, 0, 2, 5, 6, 0,
        2, 3, 3, 3, 9, 2, 8, 2, 9, 1, 9, 6, 8, 0, 8, 4, 9, 3, 0, 0, 7, 7, 4, 9,
        7, 4, 1, 3, 7, 3, 2, 0, 7, 9, 4, 6, 0, 3, 1, 1, 3, 9, 1, 3, 3, 0, 7, 1,
        6, 8, 6, 3, 3, 5, 7, 8, 5, 8, 1, 9, 7, 6, 1, 0, 7, 2, 0, 9, 7, 3, 3, 0,
        3, 5, 5, 7, 0, 7, 0, 3])
Ep 3/5, it 45/469: l

Ep 3/5, it 71/469: loss train: 0.01, accuracy train: 1.00tensor([8, 1, 2, 1, 8, 9, 7, 9, 0, 9, 6, 4, 7, 2, 1, 0, 6, 5, 7, 2, 7, 7, 0, 3,
        1, 1, 7, 3, 9, 1, 2, 3, 2, 0, 0, 7, 3, 4, 6, 8, 9, 6, 3, 5, 5, 5, 9, 1,
        4, 4, 0, 1, 6, 7, 7, 0, 8, 3, 0, 6, 7, 9, 6, 1, 3, 5, 1, 5, 0, 0, 5, 5,
        9, 3, 1, 2, 9, 6, 7, 8, 0, 6, 9, 8, 1, 4, 4, 2, 7, 2, 0, 2, 3, 0, 2, 0,
        9, 6, 9, 0, 1, 3, 7, 2, 7, 9, 5, 1, 6, 3, 1, 6, 4, 7, 9, 9, 8, 4, 4, 0,
        8, 2, 2, 4, 9, 4, 9, 0])
Ep 3/5, it 72/469: loss train: 0.01, accuracy train: 1.00tensor([7, 8, 5, 9, 2, 7, 8, 2, 2, 9, 3, 6, 4, 5, 3, 6, 0, 2, 3, 2, 0, 8, 4, 5,
        8, 1, 7, 9, 5, 7, 3, 0, 0, 2, 0, 6, 4, 9, 9, 8, 1, 1, 6, 1, 7, 0, 2, 7,
        4, 6, 6, 5, 7, 3, 1, 1, 6, 8, 4, 6, 0, 8, 3, 4, 3, 5, 1, 9, 3, 2, 3, 7,
        7, 6, 2, 9, 8, 7, 1, 0, 1, 7, 8, 6, 3, 6, 6, 6, 6, 4, 5, 0, 1, 5, 1, 7,
        8, 1, 3, 0, 2, 9, 5, 7, 6, 3, 4, 3, 1, 0, 1, 8, 1, 2, 0, 2, 4, 0, 7, 4,
        8, 1, 4, 2, 1, 7, 0, 2])
Ep 3/5, it 73/469: l

Ep 3/5, it 100/469: loss train: 0.02, accuracy train: 1.00tensor([5, 3, 4, 7, 6, 1, 8, 1, 5, 4, 9, 4, 2, 5, 5, 2, 1, 8, 0, 3, 1, 1, 7, 5,
        3, 1, 0, 1, 7, 1, 4, 9, 6, 1, 4, 9, 5, 0, 7, 0, 7, 2, 7, 6, 7, 1, 1, 5,
        3, 9, 2, 6, 0, 5, 9, 4, 8, 0, 7, 1, 8, 6, 0, 8, 9, 4, 7, 7, 9, 9, 1, 9,
        3, 6, 0, 2, 1, 0, 4, 5, 7, 6, 7, 6, 8, 7, 1, 9, 1, 8, 8, 4, 8, 0, 2, 6,
        8, 5, 4, 8, 9, 3, 1, 6, 5, 8, 3, 0, 0, 7, 5, 7, 5, 8, 1, 3, 4, 3, 1, 0,
        0, 0, 7, 6, 4, 0, 7, 4])
Ep 3/5, it 101/469: loss train: 0.01, accuracy train: 1.00tensor([1, 1, 9, 0, 2, 1, 0, 1, 8, 8, 4, 7, 7, 3, 3, 5, 8, 1, 7, 7, 3, 0, 5, 2,
        3, 7, 7, 5, 7, 5, 3, 9, 7, 5, 7, 9, 3, 6, 8, 3, 9, 7, 7, 4, 9, 3, 6, 2,
        2, 1, 1, 0, 6, 8, 3, 8, 3, 8, 5, 7, 1, 0, 4, 6, 4, 5, 9, 8, 8, 0, 7, 0,
        1, 0, 9, 7, 9, 5, 0, 5, 8, 2, 2, 6, 9, 7, 3, 3, 4, 2, 8, 5, 1, 7, 2, 4,
        3, 6, 2, 4, 3, 5, 6, 3, 5, 6, 4, 9, 1, 1, 3, 6, 9, 8, 7, 9, 9, 4, 5, 2,
        5, 3, 8, 6, 4, 9, 1, 5])
Ep 3/5, it 102/469

Ep 3/5, it 128/469: loss train: 0.02, accuracy train: 1.00tensor([3, 2, 1, 0, 7, 1, 5, 3, 9, 3, 8, 2, 5, 3, 4, 8, 7, 5, 0, 9, 7, 7, 1, 2,
        4, 3, 5, 2, 5, 7, 9, 1, 8, 7, 3, 3, 2, 6, 5, 1, 4, 7, 0, 4, 7, 8, 1, 9,
        3, 5, 9, 7, 3, 7, 2, 8, 4, 6, 0, 1, 9, 5, 1, 2, 8, 4, 9, 2, 5, 6, 4, 2,
        7, 9, 4, 6, 0, 7, 0, 4, 6, 9, 7, 7, 7, 1, 4, 0, 4, 1, 5, 0, 4, 2, 1, 3,
        2, 4, 9, 2, 0, 7, 8, 0, 6, 0, 8, 7, 7, 1, 7, 2, 6, 0, 5, 5, 9, 7, 5, 8,
        0, 9, 1, 4, 6, 0, 4, 0])
Ep 3/5, it 129/469: loss train: 0.07, accuracy train: 0.98tensor([6, 4, 9, 7, 0, 6, 7, 0, 8, 0, 4, 9, 5, 8, 7, 2, 1, 2, 8, 1, 7, 3, 5, 4,
        1, 8, 9, 4, 3, 3, 1, 0, 0, 0, 2, 4, 4, 5, 4, 2, 4, 9, 4, 8, 4, 9, 5, 9,
        1, 1, 3, 6, 9, 2, 0, 7, 5, 7, 5, 9, 2, 9, 9, 7, 9, 2, 5, 4, 1, 3, 3, 9,
        9, 0, 7, 3, 8, 5, 0, 9, 2, 3, 4, 0, 9, 3, 0, 6, 0, 3, 3, 4, 0, 2, 9, 4,
        5, 0, 1, 7, 6, 9, 2, 9, 5, 7, 3, 1, 9, 8, 2, 8, 4, 0, 5, 9, 1, 0, 5, 5,
        7, 0, 2, 8, 6, 0, 0, 1])
Ep 3/5, it 130/469

Ep 3/5, it 154/469: loss train: 0.01, accuracy train: 1.00tensor([5, 7, 6, 4, 3, 0, 9, 9, 3, 5, 0, 6, 0, 9, 6, 3, 2, 4, 7, 5, 0, 0, 0, 8,
        3, 6, 4, 1, 5, 9, 4, 5, 2, 3, 5, 6, 4, 4, 9, 0, 5, 0, 9, 6, 4, 5, 8, 5,
        3, 4, 7, 4, 1, 5, 1, 8, 1, 9, 0, 3, 8, 4, 8, 6, 8, 9, 2, 5, 3, 8, 3, 2,
        7, 4, 8, 1, 9, 9, 8, 0, 4, 6, 5, 4, 3, 2, 4, 3, 7, 6, 9, 9, 5, 4, 2, 4,
        8, 9, 1, 1, 0, 9, 4, 7, 1, 8, 4, 0, 1, 6, 4, 8, 5, 5, 9, 6, 2, 4, 9, 3,
        1, 5, 9, 1, 8, 0, 7, 0])
Ep 3/5, it 155/469: loss train: 0.01, accuracy train: 1.00tensor([2, 8, 0, 1, 1, 6, 8, 5, 7, 3, 0, 7, 4, 1, 5, 3, 4, 4, 1, 2, 4, 1, 7, 9,
        4, 1, 9, 1, 2, 7, 7, 7, 3, 1, 2, 1, 7, 2, 3, 9, 1, 2, 4, 1, 5, 8, 7, 7,
        9, 3, 6, 5, 7, 8, 8, 7, 2, 3, 9, 8, 7, 3, 4, 0, 2, 3, 4, 9, 9, 5, 9, 6,
        4, 2, 7, 4, 9, 7, 3, 2, 1, 9, 3, 6, 9, 7, 1, 0, 8, 3, 4, 9, 8, 4, 3, 7,
        6, 1, 2, 4, 6, 5, 2, 1, 5, 2, 1, 0, 4, 6, 7, 2, 3, 1, 0, 9, 6, 1, 3, 1,
        8, 9, 1, 9, 2, 2, 7, 2])
Ep 3/5, it 156/469

Ep 3/5, it 181/469: loss train: 0.01, accuracy train: 1.00tensor([6, 5, 0, 1, 1, 9, 0, 8, 5, 4, 6, 1, 1, 0, 4, 6, 9, 0, 3, 5, 2, 0, 6, 1,
        4, 2, 8, 8, 2, 8, 3, 0, 4, 7, 4, 3, 9, 7, 1, 6, 1, 4, 5, 7, 6, 5, 5, 5,
        9, 3, 7, 0, 5, 3, 1, 9, 0, 3, 3, 2, 6, 3, 1, 5, 1, 4, 6, 3, 4, 5, 7, 2,
        0, 5, 1, 8, 2, 5, 9, 3, 3, 3, 0, 8, 7, 3, 4, 5, 8, 8, 3, 8, 3, 8, 4, 4,
        2, 8, 2, 0, 8, 8, 9, 3, 3, 2, 5, 2, 9, 9, 7, 9, 1, 1, 5, 9, 2, 7, 2, 1,
        9, 3, 5, 4, 3, 7, 9, 1])
Ep 3/5, it 182/469: loss train: 0.00, accuracy train: 1.00tensor([8, 7, 9, 6, 6, 5, 0, 7, 6, 4, 2, 7, 7, 1, 8, 2, 0, 2, 1, 7, 2, 8, 6, 9,
        9, 3, 4, 7, 7, 1, 3, 3, 1, 6, 4, 3, 2, 5, 2, 2, 9, 0, 8, 5, 0, 9, 6, 2,
        0, 1, 0, 1, 8, 8, 7, 1, 6, 5, 8, 5, 8, 5, 7, 9, 4, 4, 8, 7, 8, 1, 1, 8,
        0, 3, 7, 6, 9, 8, 7, 3, 0, 0, 2, 3, 9, 4, 1, 1, 1, 0, 5, 2, 7, 1, 3, 2,
        5, 6, 5, 3, 8, 5, 3, 4, 6, 7, 8, 8, 5, 5, 1, 1, 8, 2, 6, 6, 6, 4, 0, 6,
        2, 7, 0, 9, 3, 0, 8, 0])
Ep 3/5, it 183/469

Ep 3/5, it 208/469: loss train: 0.00, accuracy train: 1.00tensor([5, 8, 0, 4, 2, 8, 8, 1, 1, 0, 6, 4, 4, 0, 0, 6, 6, 9, 5, 0, 6, 5, 7, 2,
        6, 7, 3, 9, 5, 0, 7, 5, 4, 9, 4, 7, 4, 4, 2, 8, 8, 8, 9, 2, 7, 7, 6, 1,
        6, 1, 2, 7, 4, 4, 1, 0, 1, 0, 7, 0, 8, 9, 9, 5, 8, 4, 9, 1, 3, 7, 7, 1,
        0, 7, 7, 6, 8, 2, 8, 9, 9, 8, 9, 4, 0, 4, 8, 5, 7, 6, 4, 8, 1, 3, 7, 0,
        1, 2, 3, 7, 4, 8, 0, 7, 8, 4, 4, 4, 2, 9, 8, 6, 0, 0, 3, 3, 4, 3, 7, 0,
        9, 9, 4, 0, 8, 3, 7, 9])
Ep 3/5, it 209/469: loss train: 0.01, accuracy train: 1.00tensor([3, 4, 9, 0, 9, 8, 3, 4, 3, 0, 3, 1, 1, 2, 2, 2, 0, 3, 2, 2, 6, 3, 5, 2,
        1, 1, 2, 6, 0, 7, 6, 3, 8, 7, 7, 9, 2, 2, 5, 6, 2, 4, 7, 3, 2, 3, 0, 1,
        2, 4, 3, 9, 3, 7, 8, 6, 2, 9, 8, 8, 2, 8, 4, 2, 1, 5, 0, 1, 1, 6, 7, 6,
        5, 0, 9, 7, 8, 8, 4, 1, 0, 1, 7, 1, 4, 4, 6, 9, 4, 2, 3, 8, 1, 7, 6, 8,
        0, 5, 3, 8, 5, 9, 8, 6, 9, 3, 1, 4, 0, 2, 7, 9, 0, 6, 0, 4, 4, 4, 4, 6,
        8, 3, 9, 1, 4, 6, 7, 6])
Ep 3/5, it 210/469

Ep 3/5, it 238/469: loss train: 0.02, accuracy train: 0.99tensor([9, 7, 8, 1, 1, 3, 4, 8, 0, 0, 3, 1, 0, 7, 3, 4, 1, 2, 6, 1, 0, 7, 8, 2,
        6, 4, 7, 9, 1, 1, 7, 6, 8, 7, 7, 5, 9, 4, 7, 1, 8, 2, 3, 8, 1, 5, 7, 5,
        4, 3, 9, 3, 3, 4, 5, 3, 6, 0, 8, 7, 4, 3, 0, 8, 6, 0, 3, 6, 4, 9, 0, 1,
        3, 9, 1, 3, 5, 1, 7, 4, 6, 9, 6, 7, 0, 5, 8, 1, 1, 8, 0, 9, 2, 1, 6, 5,
        6, 9, 8, 2, 4, 8, 4, 9, 0, 7, 0, 8, 5, 9, 0, 2, 3, 2, 8, 1, 9, 0, 2, 2,
        9, 9, 8, 4, 0, 4, 8, 7])
Ep 3/5, it 239/469: loss train: 0.01, accuracy train: 1.00tensor([0, 9, 6, 8, 1, 3, 4, 3, 9, 7, 1, 3, 4, 3, 7, 3, 9, 9, 9, 4, 1, 0, 4, 1,
        4, 3, 0, 1, 6, 4, 6, 0, 2, 8, 5, 2, 0, 6, 5, 2, 1, 2, 0, 3, 7, 3, 3, 5,
        7, 3, 2, 5, 2, 1, 7, 3, 8, 9, 0, 7, 6, 7, 7, 9, 1, 8, 5, 4, 3, 3, 9, 6,
        3, 1, 6, 8, 2, 1, 4, 4, 7, 6, 0, 5, 2, 7, 7, 8, 3, 0, 9, 7, 3, 0, 1, 2,
        5, 4, 2, 8, 3, 3, 9, 8, 2, 9, 3, 1, 5, 4, 8, 9, 6, 1, 7, 3, 3, 8, 6, 2,
        3, 5, 1, 9, 4, 9, 1, 3])
Ep 3/5, it 240/469

Ep 3/5, it 267/469: loss train: 0.03, accuracy train: 0.99tensor([4, 1, 1, 4, 5, 9, 6, 5, 2, 9, 4, 3, 8, 2, 3, 1, 5, 5, 4, 5, 0, 4, 0, 5,
        8, 1, 9, 8, 2, 3, 8, 7, 8, 8, 4, 3, 6, 5, 4, 3, 1, 2, 0, 2, 5, 4, 1, 0,
        3, 2, 2, 4, 1, 8, 2, 0, 3, 1, 7, 6, 3, 1, 6, 8, 6, 7, 7, 6, 4, 6, 8, 2,
        8, 1, 7, 4, 5, 8, 9, 4, 7, 3, 4, 7, 0, 5, 1, 1, 1, 1, 7, 5, 5, 2, 7, 8,
        2, 0, 9, 6, 3, 8, 8, 5, 6, 9, 0, 8, 8, 1, 3, 8, 3, 2, 4, 9, 9, 2, 5, 2,
        2, 4, 4, 9, 3, 1, 0, 2])
Ep 3/5, it 268/469: loss train: 0.02, accuracy train: 0.99tensor([7, 6, 2, 7, 6, 6, 5, 1, 2, 5, 2, 5, 5, 6, 0, 0, 7, 1, 9, 7, 2, 5, 5, 7,
        7, 6, 5, 8, 2, 2, 1, 3, 4, 7, 1, 5, 5, 6, 3, 1, 2, 7, 4, 0, 7, 7, 9, 6,
        5, 2, 1, 6, 8, 7, 8, 2, 6, 2, 8, 6, 6, 1, 6, 2, 3, 9, 1, 3, 7, 0, 4, 1,
        9, 6, 8, 9, 3, 9, 7, 6, 3, 4, 5, 0, 1, 6, 4, 4, 6, 4, 1, 2, 5, 4, 2, 8,
        4, 7, 1, 0, 3, 3, 4, 9, 3, 1, 5, 0, 5, 4, 3, 0, 1, 7, 4, 1, 1, 7, 4, 6,
        4, 0, 1, 6, 1, 2, 0, 2])

Ep 3/5, it 296/469: loss train: 0.02, accuracy train: 1.00tensor([3, 3, 0, 7, 4, 0, 6, 8, 0, 8, 1, 1, 2, 9, 0, 3, 8, 6, 5, 4, 3, 1, 8, 5,
        5, 4, 6, 3, 2, 0, 0, 0, 3, 1, 1, 2, 6, 2, 0, 5, 6, 1, 5, 4, 9, 3, 7, 0,
        1, 7, 2, 9, 9, 2, 1, 2, 6, 5, 0, 0, 8, 6, 6, 5, 7, 4, 0, 4, 8, 8, 2, 6,
        8, 7, 7, 0, 2, 2, 7, 5, 8, 6, 8, 7, 0, 9, 5, 1, 3, 6, 1, 4, 4, 4, 7, 7,
        0, 3, 5, 8, 3, 3, 5, 8, 1, 0, 0, 8, 9, 6, 0, 4, 3, 4, 5, 2, 8, 7, 6, 7,
        6, 8, 4, 7, 1, 6, 5, 1])
Ep 3/5, it 297/469: loss train: 0.03, accuracy train: 0.99tensor([7, 5, 1, 2, 3, 4, 9, 3, 7, 3, 1, 4, 2, 2, 3, 2, 7, 7, 8, 3, 6, 2, 2, 1,
        4, 3, 2, 6, 4, 0, 7, 3, 9, 8, 7, 1, 2, 1, 8, 1, 2, 1, 6, 7, 0, 5, 8, 3,
        3, 8, 6, 3, 4, 6, 0, 3, 9, 3, 1, 6, 5, 6, 9, 4, 0, 4, 8, 1, 9, 9, 7, 3,
        8, 9, 9, 2, 2, 7, 3, 4, 8, 4, 9, 4, 0, 1, 5, 1, 1, 0, 6, 3, 9, 6, 0, 6,
        7, 2, 6, 8, 3, 6, 7, 2, 2, 5, 7, 2, 1, 4, 7, 4, 0, 2, 4, 0, 4, 1, 3, 2,
        6, 2, 9, 3, 7, 2, 3, 3])

Ep 3/5, it 323/469: loss train: 0.02, accuracy train: 0.99tensor([1, 6, 6, 5, 6, 9, 8, 1, 5, 5, 6, 2, 1, 1, 3, 7, 0, 2, 6, 3, 7, 9, 6, 2,
        8, 7, 4, 3, 1, 0, 2, 1, 9, 7, 0, 7, 3, 0, 3, 2, 2, 6, 2, 9, 6, 2, 1, 9,
        6, 7, 1, 9, 3, 3, 7, 4, 9, 8, 2, 6, 7, 5, 9, 8, 1, 8, 5, 0, 7, 7, 8, 7,
        7, 4, 7, 4, 1, 1, 3, 9, 1, 7, 0, 9, 1, 4, 7, 0, 2, 3, 9, 2, 2, 7, 1, 4,
        9, 1, 1, 9, 5, 3, 1, 5, 8, 8, 4, 8, 2, 8, 2, 9, 7, 2, 5, 5, 4, 1, 1, 2,
        3, 2, 0, 4, 6, 9, 1, 5])
Ep 3/5, it 324/469: loss train: 0.02, accuracy train: 1.00tensor([3, 5, 2, 1, 2, 8, 0, 3, 2, 6, 6, 8, 4, 3, 6, 1, 9, 3, 9, 8, 1, 5, 3, 1,
        7, 4, 8, 3, 1, 6, 1, 1, 3, 5, 5, 2, 3, 5, 4, 2, 1, 6, 4, 4, 3, 7, 5, 4,
        6, 4, 4, 5, 8, 3, 5, 9, 1, 6, 7, 7, 2, 9, 8, 6, 9, 4, 3, 4, 3, 1, 2, 9,
        3, 1, 1, 1, 2, 4, 1, 8, 5, 4, 9, 9, 7, 8, 2, 6, 5, 9, 7, 9, 2, 5, 6, 6,
        4, 4, 0, 1, 1, 0, 1, 0, 6, 5, 7, 1, 9, 9, 6, 2, 4, 0, 9, 7, 8, 7, 4, 3,
        9, 5, 7, 6, 4, 7, 5, 3])

Ep 3/5, it 353/469: loss train: 0.01, accuracy train: 1.00tensor([7, 5, 9, 1, 7, 9, 5, 2, 0, 9, 3, 3, 3, 9, 9, 4, 9, 0, 5, 3, 4, 3, 4, 7,
        4, 9, 9, 9, 4, 5, 6, 5, 5, 7, 4, 7, 0, 0, 1, 4, 3, 5, 2, 5, 9, 3, 1, 6,
        1, 7, 6, 0, 1, 0, 4, 4, 3, 7, 0, 8, 2, 3, 3, 2, 5, 2, 4, 7, 5, 1, 6, 9,
        8, 1, 7, 8, 6, 5, 1, 2, 7, 6, 2, 5, 2, 6, 7, 1, 0, 3, 4, 3, 2, 4, 8, 1,
        7, 1, 8, 1, 1, 0, 3, 5, 8, 3, 3, 3, 3, 5, 4, 2, 4, 0, 4, 3, 5, 0, 8, 8,
        0, 7, 4, 8, 7, 4, 7, 0])
Ep 3/5, it 354/469: loss train: 0.02, accuracy train: 0.99tensor([4, 0, 1, 9, 5, 2, 5, 4, 6, 8, 3, 8, 9, 2, 4, 8, 5, 8, 7, 4, 2, 6, 6, 0,
        2, 5, 0, 1, 6, 1, 3, 5, 2, 0, 5, 9, 9, 3, 1, 2, 8, 1, 6, 7, 7, 1, 1, 9,
        3, 7, 4, 8, 3, 3, 2, 4, 6, 9, 3, 4, 2, 5, 3, 3, 8, 8, 6, 5, 9, 5, 0, 4,
        9, 4, 3, 7, 3, 8, 1, 0, 3, 1, 5, 8, 9, 1, 8, 0, 1, 2, 7, 3, 7, 5, 6, 2,
        9, 2, 7, 3, 2, 3, 8, 7, 2, 4, 4, 7, 8, 4, 2, 7, 6, 2, 2, 0, 9, 5, 3, 9,
        0, 5, 5, 2, 5, 6, 5, 7])

Ep 3/5, it 379/469: loss train: 0.02, accuracy train: 0.98tensor([3, 8, 1, 4, 0, 6, 4, 9, 2, 8, 6, 8, 4, 4, 0, 2, 0, 7, 3, 9, 6, 8, 9, 0,
        0, 4, 8, 0, 9, 3, 3, 4, 0, 0, 7, 7, 1, 5, 8, 3, 8, 6, 7, 6, 1, 3, 8, 1,
        2, 4, 9, 8, 0, 7, 1, 2, 2, 8, 7, 0, 8, 9, 7, 4, 9, 0, 7, 5, 2, 0, 6, 8,
        3, 0, 9, 6, 3, 9, 1, 5, 8, 5, 9, 4, 6, 1, 4, 3, 5, 4, 0, 6, 1, 9, 3, 5,
        4, 9, 2, 1, 2, 7, 3, 8, 8, 4, 7, 2, 4, 6, 5, 8, 4, 7, 3, 5, 0, 1, 7, 8,
        4, 4, 3, 6, 8, 1, 8, 9])
Ep 3/5, it 380/469: loss train: 0.02, accuracy train: 1.00tensor([1, 0, 9, 3, 1, 6, 8, 3, 6, 2, 6, 6, 1, 8, 1, 9, 6, 9, 8, 4, 4, 1, 1, 3,
        1, 8, 5, 5, 4, 2, 6, 6, 9, 5, 1, 0, 7, 5, 0, 7, 7, 9, 6, 8, 6, 3, 8, 9,
        8, 8, 5, 4, 0, 3, 1, 5, 2, 7, 2, 2, 5, 0, 7, 2, 7, 1, 6, 1, 9, 8, 3, 6,
        5, 8, 2, 4, 2, 7, 0, 1, 7, 9, 5, 7, 7, 9, 0, 1, 2, 1, 9, 7, 5, 8, 2, 8,
        8, 8, 1, 1, 2, 8, 1, 8, 0, 8, 8, 9, 7, 2, 7, 5, 8, 0, 6, 3, 4, 7, 6, 6,
        8, 7, 7, 8, 7, 7, 0, 3])

Ep 3/5, it 406/469: loss train: 0.02, accuracy train: 0.99tensor([7, 4, 4, 1, 8, 7, 9, 0, 1, 3, 1, 4, 5, 1, 8, 7, 6, 7, 2, 7, 0, 5, 7, 8,
        1, 1, 5, 7, 3, 1, 4, 3, 5, 5, 3, 8, 3, 9, 3, 1, 7, 8, 3, 3, 0, 1, 4, 2,
        9, 3, 5, 5, 3, 9, 2, 2, 5, 1, 2, 5, 5, 6, 5, 7, 5, 8, 8, 7, 7, 2, 5, 7,
        7, 7, 7, 2, 4, 6, 2, 5, 0, 7, 3, 3, 5, 1, 6, 1, 6, 5, 7, 7, 9, 7, 3, 3,
        4, 5, 5, 0, 2, 2, 2, 1, 8, 7, 6, 1, 8, 3, 0, 1, 5, 9, 4, 3, 5, 1, 8, 8,
        0, 2, 3, 5, 8, 6, 6, 2])
Ep 3/5, it 407/469: loss train: 0.02, accuracy train: 0.99tensor([5, 2, 5, 1, 1, 7, 4, 3, 8, 3, 9, 9, 0, 1, 9, 1, 6, 3, 0, 9, 0, 8, 6, 8,
        1, 8, 2, 6, 2, 0, 8, 9, 3, 4, 2, 0, 6, 3, 6, 7, 7, 6, 5, 0, 3, 1, 3, 8,
        1, 3, 9, 7, 4, 4, 1, 6, 8, 4, 0, 0, 3, 0, 9, 9, 4, 1, 6, 3, 6, 5, 3, 2,
        4, 0, 6, 3, 8, 2, 1, 1, 0, 3, 2, 7, 6, 0, 0, 8, 1, 7, 9, 9, 5, 7, 0, 5,
        7, 9, 6, 5, 8, 6, 6, 5, 7, 6, 7, 5, 3, 9, 6, 5, 7, 8, 9, 1, 9, 2, 9, 1,
        7, 7, 6, 7, 8, 7, 9, 3])

Ep 3/5, it 433/469: loss train: 0.02, accuracy train: 0.99tensor([7, 2, 7, 1, 2, 5, 4, 4, 0, 8, 1, 4, 0, 3, 9, 0, 3, 1, 7, 2, 0, 5, 6, 5,
        0, 8, 3, 3, 2, 0, 5, 0, 4, 9, 7, 7, 4, 6, 1, 4, 7, 2, 3, 4, 7, 3, 8, 4,
        0, 4, 1, 2, 2, 8, 0, 6, 2, 5, 4, 8, 5, 4, 2, 1, 7, 3, 4, 6, 3, 9, 2, 4,
        5, 4, 1, 9, 6, 6, 2, 6, 0, 2, 4, 3, 9, 7, 9, 9, 1, 1, 0, 6, 4, 1, 3, 3,
        5, 0, 1, 1, 2, 9, 4, 5, 8, 7, 0, 6, 1, 2, 3, 3, 1, 7, 2, 1, 2, 5, 4, 9,
        1, 1, 4, 0, 4, 1, 2, 3])
Ep 3/5, it 434/469: loss train: 0.04, accuracy train: 0.98tensor([7, 9, 2, 3, 5, 0, 6, 9, 8, 5, 0, 4, 2, 6, 7, 1, 9, 1, 8, 3, 5, 6, 3, 8,
        5, 7, 3, 9, 1, 1, 6, 2, 7, 2, 0, 4, 1, 8, 2, 8, 9, 0, 4, 4, 7, 4, 9, 3,
        7, 5, 6, 0, 2, 6, 7, 9, 9, 7, 9, 1, 9, 6, 0, 7, 5, 3, 2, 2, 1, 8, 6, 7,
        6, 6, 3, 8, 5, 3, 2, 3, 4, 7, 5, 6, 0, 8, 3, 8, 8, 8, 8, 3, 3, 1, 5, 0,
        7, 2, 1, 2, 9, 8, 4, 0, 7, 9, 3, 1, 9, 6, 2, 4, 7, 2, 7, 3, 3, 8, 7, 8,
        1, 6, 5, 6, 6, 3, 8, 0])

Ep 3/5, it 458/469: loss train: 0.02, accuracy train: 0.99tensor([7, 6, 6, 8, 9, 6, 1, 8, 8, 8, 6, 9, 4, 3, 3, 8, 4, 1, 2, 2, 8, 0, 6, 3,
        5, 6, 4, 6, 8, 8, 8, 1, 7, 5, 6, 6, 6, 3, 1, 2, 9, 2, 1, 7, 7, 9, 1, 1,
        1, 2, 6, 5, 8, 6, 1, 2, 0, 3, 5, 6, 5, 6, 8, 0, 7, 9, 3, 8, 8, 3, 9, 2,
        4, 9, 0, 3, 3, 9, 9, 7, 3, 8, 3, 2, 3, 7, 4, 9, 8, 9, 0, 5, 0, 9, 9, 6,
        0, 1, 1, 4, 1, 0, 2, 5, 4, 3, 4, 0, 4, 0, 1, 5, 3, 6, 9, 7, 9, 7, 2, 8,
        3, 4, 7, 4, 9, 3, 7, 3])
Ep 3/5, it 459/469: loss train: 0.02, accuracy train: 1.00tensor([8, 1, 0, 8, 9, 1, 5, 1, 9, 3, 0, 4, 0, 8, 8, 8, 0, 1, 3, 8, 0, 6, 1, 2,
        3, 9, 8, 0, 6, 7, 3, 5, 0, 9, 6, 6, 4, 8, 3, 7, 6, 1, 9, 6, 9, 6, 8, 5,
        0, 1, 4, 3, 0, 6, 5, 6, 2, 4, 3, 5, 9, 0, 6, 0, 8, 6, 1, 7, 7, 1, 2, 3,
        6, 8, 9, 5, 9, 1, 3, 3, 4, 3, 9, 1, 4, 3, 5, 3, 7, 2, 6, 9, 4, 5, 6, 7,
        9, 7, 1, 7, 6, 8, 3, 4, 9, 9, 0, 6, 4, 8, 2, 5, 4, 7, 7, 9, 9, 0, 3, 6,
        2, 6, 0, 3, 6, 8, 7, 8])

Ep 4/5, it 12/469: loss train: 0.01, accuracy train: 1.00tensor([5, 4, 7, 6, 0, 8, 6, 6, 8, 8, 4, 3, 6, 9, 1, 4, 7, 1, 7, 8, 1, 3, 7, 2,
        7, 1, 3, 7, 0, 1, 8, 7, 0, 1, 2, 2, 5, 8, 9, 8, 5, 8, 4, 1, 5, 2, 8, 3,
        3, 7, 9, 8, 3, 1, 3, 5, 1, 9, 3, 0, 1, 1, 8, 6, 0, 8, 4, 9, 2, 5, 4, 6,
        1, 2, 9, 7, 7, 0, 7, 5, 9, 5, 6, 9, 2, 1, 7, 4, 3, 0, 1, 4, 3, 1, 7, 4,
        6, 3, 1, 1, 6, 3, 7, 5, 3, 5, 8, 4, 0, 3, 9, 9, 9, 5, 7, 1, 9, 9, 5, 7,
        6, 2, 4, 2, 9, 0, 6, 3])
Ep 4/5, it 13/469: loss train: 0.02, accuracy train: 0.99tensor([4, 6, 7, 1, 6, 1, 6, 8, 5, 8, 4, 7, 9, 9, 2, 9, 7, 7, 6, 5, 6, 1, 7, 2,
        0, 3, 9, 4, 9, 5, 5, 4, 8, 2, 9, 1, 3, 7, 8, 8, 9, 6, 4, 8, 3, 4, 0, 5,
        5, 0, 1, 0, 1, 5, 2, 9, 2, 5, 2, 2, 2, 6, 6, 0, 6, 2, 2, 4, 1, 6, 8, 5,
        7, 6, 6, 7, 3, 6, 7, 8, 2, 4, 7, 6, 9, 4, 9, 7, 2, 6, 9, 8, 8, 1, 0, 7,
        8, 8, 6, 5, 2, 3, 2, 0, 0, 0, 4, 3, 6, 2, 2, 1, 5, 8, 8, 7, 0, 8, 3, 4,
        1, 0, 4, 1, 2, 1, 0, 2])

Ep 4/5, it 37/469: loss train: 0.01, accuracy train: 1.00tensor([7, 7, 9, 2, 8, 5, 9, 6, 6, 7, 2, 6, 0, 5, 1, 0, 4, 5, 3, 8, 0, 6, 1, 5,
        8, 7, 7, 7, 5, 1, 8, 2, 6, 3, 0, 3, 3, 7, 3, 2, 3, 5, 7, 0, 0, 4, 3, 3,
        5, 4, 4, 3, 8, 6, 4, 1, 0, 3, 0, 0, 0, 7, 4, 3, 6, 8, 2, 9, 3, 7, 0, 9,
        0, 9, 1, 4, 7, 6, 0, 2, 2, 4, 6, 7, 6, 4, 5, 9, 9, 9, 5, 0, 2, 1, 1, 6,
        2, 2, 4, 6, 9, 8, 8, 6, 0, 8, 3, 2, 0, 1, 0, 5, 9, 8, 9, 9, 8, 7, 9, 0,
        3, 3, 5, 2, 9, 1, 7, 2])
Ep 4/5, it 38/469: loss train: 0.01, accuracy train: 1.00tensor([3, 0, 8, 4, 2, 4, 7, 6, 1, 5, 8, 9, 6, 3, 5, 0, 3, 5, 8, 0, 9, 9, 5, 2,
        5, 5, 0, 6, 7, 5, 8, 0, 7, 4, 8, 2, 3, 5, 5, 6, 5, 7, 1, 7, 3, 4, 0, 1,
        6, 2, 1, 9, 3, 2, 0, 2, 1, 0, 4, 1, 8, 7, 5, 2, 7, 3, 4, 3, 9, 9, 0, 9,
        4, 5, 8, 2, 5, 0, 3, 8, 6, 9, 3, 1, 9, 8, 5, 7, 6, 2, 5, 9, 8, 4, 1, 7,
        3, 5, 1, 3, 1, 8, 8, 4, 6, 2, 2, 9, 9, 3, 2, 4, 9, 6, 7, 7, 7, 7, 8, 3,
        8, 0, 6, 7, 1, 0, 8, 3])

Ep 4/5, it 60/469: loss train: 0.02, accuracy train: 0.99tensor([8, 9, 7, 2, 2, 4, 9, 7, 1, 1, 8, 0, 9, 8, 9, 2, 8, 5, 5, 5, 3, 0, 9, 3,
        1, 5, 3, 9, 2, 1, 0, 9, 5, 3, 0, 2, 3, 6, 7, 8, 5, 2, 6, 1, 3, 9, 0, 1,
        8, 4, 0, 6, 1, 7, 8, 0, 4, 1, 4, 7, 8, 0, 0, 0, 2, 1, 0, 0, 8, 1, 7, 1,
        8, 7, 0, 0, 9, 4, 2, 9, 8, 4, 3, 5, 6, 7, 8, 6, 8, 7, 1, 9, 4, 5, 7, 8,
        5, 4, 6, 9, 3, 2, 4, 9, 6, 4, 2, 0, 3, 3, 9, 1, 8, 6, 2, 0, 6, 9, 6, 5,
        7, 2, 2, 5, 8, 5, 9, 4])
Ep 4/5, it 61/469: loss train: 0.01, accuracy train: 1.00tensor([2, 1, 6, 5, 9, 5, 7, 9, 1, 1, 8, 5, 0, 5, 6, 1, 3, 8, 3, 8, 0, 2, 5, 3,
        2, 8, 6, 3, 2, 9, 5, 4, 3, 7, 7, 1, 7, 2, 5, 4, 6, 0, 2, 1, 5, 9, 3, 9,
        7, 4, 8, 0, 7, 2, 4, 3, 5, 7, 3, 4, 3, 0, 2, 8, 7, 9, 7, 5, 8, 3, 3, 0,
        8, 5, 9, 0, 8, 9, 5, 6, 0, 2, 4, 3, 7, 9, 7, 4, 5, 9, 6, 2, 4, 0, 7, 6,
        9, 1, 2, 2, 6, 6, 7, 2, 0, 0, 1, 3, 4, 5, 9, 1, 4, 2, 2, 3, 6, 4, 7, 1,
        9, 1, 0, 7, 8, 3, 3, 0])

Ep 4/5, it 85/469: loss train: 0.00, accuracy train: 1.00tensor([7, 5, 5, 1, 2, 6, 8, 2, 8, 3, 6, 9, 0, 2, 9, 9, 1, 1, 4, 8, 3, 7, 7, 0,
        7, 3, 9, 9, 1, 9, 0, 8, 7, 6, 2, 4, 9, 3, 0, 3, 3, 1, 8, 6, 2, 3, 1, 7,
        4, 6, 5, 4, 4, 1, 5, 3, 9, 0, 5, 6, 5, 0, 5, 1, 3, 1, 3, 4, 8, 1, 5, 2,
        1, 0, 3, 0, 4, 3, 3, 6, 7, 1, 5, 4, 8, 9, 7, 0, 9, 2, 0, 2, 3, 2, 8, 7,
        8, 7, 2, 6, 5, 4, 4, 2, 1, 9, 9, 3, 0, 1, 9, 5, 6, 4, 4, 6, 1, 1, 3, 0,
        4, 8, 5, 1, 6, 6, 7, 1])
Ep 4/5, it 86/469: loss train: 0.05, accuracy train: 0.99tensor([8, 2, 9, 0, 0, 2, 2, 0, 9, 2, 7, 8, 6, 5, 6, 3, 9, 1, 3, 0, 9, 4, 3, 9,
        7, 1, 5, 4, 7, 9, 3, 1, 1, 0, 0, 7, 7, 9, 0, 2, 6, 0, 1, 0, 1, 8, 1, 1,
        4, 3, 5, 0, 7, 7, 5, 6, 2, 2, 0, 7, 5, 8, 4, 5, 9, 3, 1, 1, 7, 1, 9, 5,
        3, 7, 8, 5, 0, 5, 6, 5, 2, 4, 6, 0, 7, 9, 3, 8, 2, 8, 4, 7, 9, 4, 3, 3,
        0, 8, 1, 3, 9, 8, 9, 7, 2, 7, 8, 0, 2, 1, 6, 1, 4, 4, 0, 0, 7, 9, 1, 5,
        0, 1, 9, 3, 5, 2, 7, 9])

Ep 4/5, it 108/469: loss train: 0.04, accuracy train: 0.99tensor([8, 5, 0, 1, 7, 3, 4, 8, 7, 7, 7, 6, 5, 5, 4, 2, 2, 6, 3, 9, 5, 4, 3, 9,
        1, 0, 0, 7, 3, 5, 2, 3, 6, 7, 6, 6, 4, 8, 8, 2, 1, 5, 4, 4, 7, 8, 8, 0,
        5, 8, 9, 1, 4, 6, 9, 4, 1, 7, 0, 5, 2, 3, 8, 7, 4, 4, 0, 4, 3, 2, 4, 4,
        0, 4, 4, 4, 1, 8, 7, 4, 1, 2, 9, 3, 7, 9, 5, 2, 2, 1, 9, 6, 0, 9, 1, 1,
        1, 5, 9, 3, 3, 5, 4, 3, 7, 7, 2, 8, 8, 2, 1, 1, 6, 8, 0, 7, 5, 3, 3, 3,
        4, 9, 9, 7, 6, 6, 7, 2])
Ep 4/5, it 109/469: loss train: 0.01, accuracy train: 1.00tensor([8, 9, 2, 5, 5, 2, 3, 8, 8, 1, 5, 4, 2, 9, 1, 3, 6, 6, 9, 1, 9, 4, 9, 3,
        2, 1, 0, 1, 3, 2, 5, 3, 7, 0, 9, 2, 6, 0, 3, 6, 7, 2, 9, 6, 3, 7, 3, 8,
        6, 7, 1, 0, 1, 2, 3, 7, 2, 4, 6, 4, 8, 0, 1, 4, 9, 3, 2, 2, 9, 0, 6, 5,
        9, 5, 8, 6, 8, 4, 3, 5, 9, 2, 8, 0, 8, 2, 7, 5, 1, 1, 4, 0, 7, 8, 0, 2,
        0, 0, 9, 7, 1, 0, 5, 1, 0, 8, 6, 7, 9, 5, 3, 1, 7, 0, 0, 9, 2, 4, 6, 0,
        7, 8, 8, 2, 0, 4, 7, 0])

Ep 4/5, it 132/469: loss train: 0.01, accuracy train: 1.00tensor([7, 8, 6, 7, 6, 3, 6, 7, 5, 9, 1, 1, 0, 3, 4, 9, 9, 3, 2, 5, 7, 1, 5, 6,
        6, 1, 5, 4, 5, 4, 9, 3, 9, 1, 7, 9, 1, 8, 3, 6, 4, 3, 2, 1, 4, 1, 1, 8,
        7, 9, 7, 5, 4, 9, 6, 0, 1, 3, 6, 2, 4, 8, 9, 8, 2, 1, 1, 8, 6, 3, 3, 3,
        2, 8, 0, 9, 8, 7, 6, 9, 9, 2, 9, 3, 3, 9, 2, 7, 0, 0, 0, 2, 3, 8, 9, 3,
        4, 1, 8, 0, 1, 3, 5, 9, 2, 5, 0, 7, 4, 3, 6, 1, 2, 2, 0, 7, 1, 6, 1, 1,
        0, 5, 9, 7, 8, 2, 5, 8])
Ep 4/5, it 133/469: loss train: 0.01, accuracy train: 1.00tensor([6, 9, 2, 3, 1, 8, 7, 8, 3, 7, 7, 9, 2, 3, 7, 1, 0, 7, 1, 7, 7, 3, 1, 4,
        9, 7, 1, 3, 6, 9, 3, 2, 6, 0, 4, 4, 1, 8, 2, 2, 7, 0, 5, 2, 1, 2, 4, 3,
        7, 4, 6, 1, 6, 4, 1, 7, 6, 8, 2, 6, 6, 4, 4, 8, 0, 3, 2, 7, 6, 4, 2, 4,
        7, 0, 3, 1, 7, 1, 8, 3, 6, 3, 2, 4, 0, 1, 5, 5, 9, 8, 1, 2, 5, 7, 0, 7,
        0, 5, 3, 9, 0, 7, 8, 1, 8, 8, 0, 1, 7, 8, 8, 3, 8, 3, 5, 3, 4, 9, 0, 3,
        1, 1, 7, 8, 6, 2, 5, 5])

Ep 4/5, it 158/469: loss train: 0.01, accuracy train: 1.00tensor([1, 5, 2, 2, 4, 4, 3, 5, 1, 2, 1, 4, 0, 4, 7, 4, 0, 8, 8, 4, 8, 0, 6, 7,
        2, 2, 1, 9, 6, 4, 8, 6, 6, 2, 4, 4, 8, 8, 8, 9, 9, 8, 7, 4, 9, 3, 3, 9,
        9, 3, 8, 9, 2, 8, 9, 3, 2, 2, 9, 3, 1, 1, 1, 2, 1, 1, 8, 3, 7, 8, 4, 3,
        1, 7, 2, 4, 2, 7, 9, 2, 8, 4, 0, 7, 5, 0, 4, 0, 9, 2, 8, 7, 8, 7, 8, 9,
        4, 3, 4, 1, 3, 8, 2, 1, 6, 7, 8, 0, 3, 1, 4, 6, 2, 2, 1, 2, 9, 6, 0, 1,
        1, 8, 5, 1, 9, 5, 7, 7])
Ep 4/5, it 159/469: loss train: 0.01, accuracy train: 1.00tensor([5, 0, 8, 5, 7, 3, 9, 4, 4, 0, 0, 5, 2, 3, 0, 3, 0, 2, 1, 2, 2, 0, 7, 9,
        9, 4, 0, 3, 1, 7, 3, 9, 8, 1, 7, 3, 5, 2, 4, 0, 0, 6, 1, 2, 0, 4, 1, 0,
        7, 3, 6, 9, 0, 2, 4, 1, 4, 8, 7, 8, 7, 2, 8, 8, 4, 5, 9, 8, 1, 0, 6, 8,
        1, 4, 1, 3, 2, 9, 9, 8, 8, 6, 7, 8, 1, 7, 1, 6, 7, 6, 2, 9, 6, 0, 8, 1,
        5, 2, 6, 6, 9, 2, 9, 7, 5, 5, 3, 3, 9, 4, 2, 7, 4, 6, 1, 9, 0, 8, 6, 1,
        4, 2, 7, 2, 9, 6, 8, 2])

Ep 4/5, it 183/469: loss train: 0.00, accuracy train: 1.00tensor([8, 5, 2, 4, 8, 0, 8, 7, 9, 6, 9, 0, 6, 4, 2, 7, 6, 8, 8, 9, 1, 1, 8, 3,
        4, 0, 3, 3, 1, 2, 9, 8, 9, 9, 1, 4, 1, 0, 4, 1, 4, 0, 6, 3, 6, 1, 3, 8,
        4, 8, 7, 5, 7, 2, 4, 0, 9, 2, 7, 1, 8, 9, 5, 1, 1, 5, 9, 9, 5, 5, 4, 1,
        8, 9, 4, 1, 3, 8, 0, 7, 3, 2, 2, 3, 6, 8, 3, 6, 5, 8, 3, 7, 5, 6, 1, 9,
        6, 3, 7, 9, 7, 3, 2, 1, 1, 2, 1, 3, 9, 7, 7, 7, 5, 3, 9, 1, 5, 3, 1, 4,
        3, 4, 9, 0, 0, 4, 1, 7])
Ep 4/5, it 184/469: loss train: 0.01, accuracy train: 1.00tensor([9, 6, 1, 2, 3, 5, 5, 5, 8, 7, 7, 9, 7, 1, 8, 0, 4, 0, 7, 9, 0, 1, 8, 8,
        6, 6, 3, 9, 4, 0, 1, 4, 8, 9, 8, 9, 9, 9, 5, 2, 2, 7, 4, 6, 5, 9, 4, 6,
        4, 7, 2, 2, 4, 6, 1, 8, 8, 1, 2, 9, 3, 0, 4, 3, 6, 6, 8, 1, 9, 6, 7, 2,
        0, 3, 4, 8, 1, 0, 6, 7, 7, 8, 7, 3, 9, 3, 2, 6, 8, 8, 0, 2, 4, 0, 1, 1,
        1, 8, 3, 2, 2, 5, 7, 1, 3, 4, 4, 9, 4, 8, 0, 7, 3, 4, 4, 9, 3, 6, 5, 8,
        9, 7, 9, 2, 0, 6, 4, 0])

Ep 4/5, it 208/469: loss train: 0.02, accuracy train: 0.99tensor([1, 9, 7, 5, 2, 3, 3, 4, 6, 0, 7, 7, 4, 6, 8, 5, 0, 2, 3, 1, 1, 6, 4, 2,
        4, 3, 0, 3, 8, 5, 8, 7, 9, 3, 1, 5, 5, 6, 1, 2, 5, 8, 9, 3, 7, 7, 1, 0,
        9, 5, 7, 7, 2, 6, 8, 0, 6, 6, 1, 0, 5, 3, 5, 1, 2, 2, 7, 1, 2, 9, 1, 2,
        9, 3, 3, 9, 6, 3, 6, 2, 7, 7, 3, 6, 8, 1, 6, 7, 9, 8, 1, 6, 9, 5, 5, 9,
        1, 8, 1, 7, 7, 6, 0, 2, 6, 5, 1, 1, 3, 1, 2, 6, 0, 9, 3, 5, 3, 7, 5, 3,
        1, 7, 2, 9, 6, 3, 9, 6])
Ep 4/5, it 209/469: loss train: 0.02, accuracy train: 0.98tensor([9, 0, 7, 7, 3, 8, 8, 7, 8, 9, 4, 5, 4, 5, 8, 3, 4, 5, 8, 9, 0, 5, 0, 3,
        9, 6, 3, 4, 5, 3, 1, 4, 4, 4, 3, 8, 2, 8, 9, 7, 7, 8, 1, 1, 1, 0, 4, 9,
        0, 6, 6, 3, 6, 1, 0, 6, 4, 0, 3, 9, 4, 3, 4, 3, 3, 3, 8, 6, 2, 6, 8, 9,
        4, 2, 3, 8, 7, 9, 7, 6, 6, 6, 4, 0, 1, 9, 3, 3, 6, 6, 3, 3, 9, 0, 7, 9,
        2, 8, 9, 2, 2, 7, 4, 3, 0, 2, 1, 5, 1, 5, 7, 8, 0, 3, 3, 2, 4, 1, 8, 8,
        5, 7, 2, 6, 4, 4, 0, 2])

Ep 4/5, it 231/469: loss train: 0.02, accuracy train: 0.99tensor([3, 2, 9, 2, 5, 6, 5, 9, 1, 7, 0, 2, 0, 4, 3, 4, 0, 4, 1, 5, 2, 6, 4, 5,
        4, 0, 3, 4, 3, 8, 3, 7, 7, 1, 6, 7, 3, 0, 8, 1, 3, 1, 9, 3, 4, 7, 9, 6,
        8, 4, 9, 2, 0, 9, 2, 1, 3, 5, 9, 9, 6, 3, 2, 6, 0, 6, 1, 6, 3, 9, 5, 6,
        2, 4, 3, 7, 4, 5, 8, 9, 2, 7, 6, 4, 8, 5, 4, 5, 8, 8, 8, 4, 1, 7, 0, 4,
        8, 0, 8, 7, 6, 1, 3, 3, 2, 3, 6, 0, 8, 5, 5, 0, 7, 0, 8, 3, 6, 0, 7, 6,
        1, 2, 4, 3, 1, 0, 4, 5])
Ep 4/5, it 232/469: loss train: 0.08, accuracy train: 0.98tensor([4, 4, 9, 1, 2, 1, 9, 3, 1, 4, 4, 9, 4, 0, 4, 7, 1, 8, 7, 0, 7, 8, 4, 2,
        9, 1, 7, 5, 2, 5, 9, 4, 0, 8, 1, 3, 2, 0, 1, 0, 4, 3, 4, 7, 1, 5, 6, 7,
        9, 2, 2, 3, 3, 8, 9, 4, 9, 0, 8, 9, 0, 9, 6, 0, 2, 8, 3, 9, 3, 0, 8, 9,
        0, 6, 2, 2, 7, 9, 3, 3, 3, 3, 1, 5, 5, 2, 3, 2, 6, 6, 6, 9, 3, 7, 9, 6,
        6, 6, 2, 1, 4, 3, 8, 7, 6, 1, 0, 9, 3, 7, 4, 1, 6, 8, 7, 4, 7, 8, 4, 8,
        0, 7, 7, 1, 4, 2, 8, 9])

Ep 4/5, it 258/469: loss train: 0.01, accuracy train: 1.00tensor([5, 7, 1, 3, 8, 4, 4, 7, 8, 3, 8, 2, 2, 5, 3, 2, 7, 9, 2, 6, 6, 4, 0, 3,
        1, 7, 0, 6, 2, 2, 0, 3, 3, 9, 4, 4, 0, 4, 5, 9, 3, 0, 1, 2, 6, 1, 1, 2,
        2, 7, 1, 6, 0, 3, 4, 4, 7, 1, 5, 4, 6, 0, 2, 7, 8, 4, 9, 1, 1, 3, 9, 0,
        1, 9, 5, 3, 2, 7, 3, 4, 7, 7, 3, 4, 9, 9, 7, 8, 5, 5, 0, 0, 9, 3, 7, 2,
        6, 2, 8, 5, 8, 8, 0, 8, 5, 9, 6, 5, 4, 1, 1, 2, 5, 1, 7, 5, 0, 6, 6, 6,
        0, 6, 5, 2, 7, 1, 7, 3])
Ep 4/5, it 259/469: loss train: 0.01, accuracy train: 1.00tensor([1, 8, 3, 0, 0, 0, 6, 7, 3, 6, 3, 3, 6, 2, 9, 7, 5, 5, 5, 3, 7, 1, 7, 6,
        1, 3, 7, 1, 7, 9, 6, 8, 4, 4, 1, 4, 4, 5, 7, 0, 2, 0, 7, 3, 8, 8, 3, 8,
        9, 1, 2, 3, 7, 1, 7, 2, 2, 7, 5, 0, 4, 3, 0, 8, 5, 9, 1, 7, 9, 0, 9, 0,
        4, 6, 1, 4, 0, 9, 1, 0, 7, 8, 0, 6, 0, 7, 2, 7, 9, 7, 8, 6, 5, 5, 6, 9,
        1, 7, 0, 5, 3, 6, 4, 9, 1, 3, 4, 0, 8, 7, 4, 8, 4, 1, 7, 3, 6, 1, 1, 4,
        6, 3, 2, 3, 7, 1, 6, 9])

Ep 4/5, it 281/469: loss train: 0.02, accuracy train: 1.00tensor([8, 3, 6, 0, 8, 8, 4, 1, 4, 5, 2, 1, 8, 3, 7, 2, 1, 2, 6, 2, 8, 9, 5, 4,
        4, 0, 8, 4, 9, 8, 8, 1, 1, 2, 3, 2, 0, 2, 0, 8, 9, 6, 1, 0, 9, 7, 4, 0,
        4, 4, 6, 3, 3, 3, 8, 4, 1, 2, 7, 5, 8, 8, 2, 3, 8, 5, 0, 5, 3, 0, 8, 5,
        4, 2, 2, 2, 6, 1, 1, 8, 1, 7, 8, 0, 5, 6, 8, 4, 0, 7, 0, 3, 6, 2, 3, 0,
        3, 0, 1, 2, 7, 1, 2, 0, 0, 9, 7, 7, 0, 3, 6, 6, 0, 4, 0, 9, 5, 6, 9, 2,
        0, 0, 9, 8, 5, 6, 7, 3])
Ep 4/5, it 282/469: loss train: 0.01, accuracy train: 1.00tensor([6, 8, 9, 5, 1, 4, 8, 1, 1, 4, 6, 8, 4, 3, 9, 7, 3, 0, 0, 3, 7, 2, 8, 2,
        7, 0, 2, 3, 4, 1, 3, 0, 0, 5, 0, 1, 0, 3, 5, 9, 3, 5, 2, 0, 2, 9, 2, 7,
        5, 0, 2, 8, 2, 3, 5, 9, 4, 9, 5, 6, 4, 9, 3, 9, 6, 7, 0, 0, 0, 4, 0, 6,
        5, 3, 9, 1, 2, 8, 0, 0, 7, 1, 3, 1, 3, 8, 1, 0, 2, 7, 9, 4, 9, 5, 2, 8,
        6, 5, 6, 4, 4, 7, 3, 9, 2, 8, 3, 7, 9, 9, 5, 6, 9, 3, 6, 0, 3, 7, 5, 7,
        9, 5, 0, 0, 2, 3, 2, 9])

Ep 4/5, it 304/469: loss train: 0.01, accuracy train: 1.00tensor([4, 6, 9, 4, 9, 1, 0, 4, 5, 2, 8, 0, 9, 8, 3, 5, 7, 3, 6, 6, 0, 1, 2, 9,
        6, 0, 0, 6, 3, 0, 5, 5, 2, 2, 8, 9, 6, 8, 2, 3, 1, 4, 9, 5, 1, 7, 0, 7,
        5, 5, 3, 9, 5, 8, 3, 4, 4, 6, 2, 9, 8, 8, 2, 1, 1, 9, 5, 7, 8, 9, 1, 3,
        5, 8, 5, 3, 9, 6, 3, 1, 4, 7, 2, 7, 9, 9, 5, 7, 5, 5, 9, 1, 4, 5, 5, 1,
        0, 7, 8, 5, 1, 9, 1, 6, 6, 3, 6, 1, 1, 5, 7, 5, 8, 7, 8, 1, 8, 5, 4, 1,
        7, 7, 4, 9, 6, 6, 6, 0])
Ep 4/5, it 305/469: loss train: 0.07, accuracy train: 0.99tensor([3, 2, 3, 8, 4, 2, 5, 4, 7, 7, 0, 9, 2, 3, 5, 6, 3, 1, 6, 7, 5, 0, 7, 2,
        0, 6, 2, 1, 1, 7, 4, 4, 3, 2, 4, 2, 9, 5, 0, 7, 7, 5, 7, 4, 6, 3, 1, 8,
        5, 4, 1, 1, 2, 0, 4, 0, 7, 7, 7, 9, 9, 4, 4, 9, 5, 8, 2, 2, 7, 9, 6, 0,
        2, 6, 4, 4, 6, 8, 0, 4, 0, 3, 3, 9, 3, 3, 8, 0, 4, 7, 7, 8, 0, 1, 1, 9,
        0, 1, 8, 9, 7, 0, 7, 4, 0, 7, 4, 8, 5, 8, 5, 2, 1, 3, 5, 1, 2, 9, 4, 3,
        0, 1, 3, 4, 4, 8, 1, 7])

Ep 4/5, it 329/469: loss train: 0.02, accuracy train: 0.99tensor([7, 1, 3, 2, 1, 4, 1, 5, 1, 2, 4, 8, 5, 7, 8, 5, 0, 8, 6, 3, 5, 7, 6, 6,
        8, 2, 0, 9, 2, 9, 6, 2, 6, 5, 9, 4, 4, 0, 3, 5, 5, 3, 0, 8, 7, 6, 4, 9,
        6, 3, 6, 4, 1, 2, 8, 2, 8, 9, 5, 7, 1, 1, 9, 5, 0, 5, 1, 5, 0, 3, 2, 9,
        0, 3, 1, 1, 6, 5, 9, 8, 1, 0, 5, 2, 8, 6, 6, 7, 2, 7, 1, 8, 6, 7, 5, 0,
        3, 9, 1, 4, 9, 8, 4, 9, 5, 6, 3, 0, 0, 6, 9, 2, 9, 9, 8, 5, 6, 0, 6, 3,
        7, 9, 5, 6, 7, 4, 4, 3])
Ep 4/5, it 330/469: loss train: 0.02, accuracy train: 0.99tensor([5, 7, 4, 8, 7, 0, 4, 5, 8, 0, 9, 3, 0, 1, 3, 4, 8, 0, 5, 4, 8, 9, 8, 1,
        0, 3, 5, 2, 5, 5, 7, 5, 7, 4, 5, 6, 3, 1, 1, 9, 1, 5, 8, 4, 3, 6, 8, 0,
        7, 0, 5, 7, 2, 0, 2, 7, 3, 5, 7, 6, 7, 6, 5, 0, 1, 8, 8, 1, 4, 6, 2, 3,
        3, 4, 9, 7, 6, 3, 2, 8, 8, 1, 2, 6, 7, 8, 0, 6, 3, 0, 0, 6, 1, 6, 2, 0,
        0, 1, 7, 8, 4, 8, 0, 1, 1, 7, 8, 9, 4, 5, 3, 1, 6, 3, 7, 8, 3, 4, 1, 5,
        4, 9, 1, 1, 6, 0, 7, 3])

Ep 4/5, it 354/469: loss train: 0.01, accuracy train: 1.00tensor([3, 6, 4, 3, 4, 9, 3, 7, 7, 6, 8, 6, 2, 6, 7, 3, 1, 5, 0, 9, 3, 4, 2, 9,
        7, 3, 2, 7, 4, 1, 8, 0, 8, 4, 1, 6, 5, 5, 3, 1, 5, 3, 4, 1, 6, 0, 5, 0,
        5, 7, 7, 4, 1, 7, 7, 0, 1, 8, 8, 3, 2, 5, 6, 1, 6, 0, 4, 6, 6, 8, 2, 0,
        6, 4, 6, 1, 5, 1, 6, 1, 3, 7, 0, 7, 9, 2, 6, 9, 6, 8, 2, 2, 3, 5, 7, 5,
        5, 2, 4, 2, 6, 9, 5, 0, 7, 0, 4, 4, 1, 9, 6, 2, 4, 3, 5, 9, 7, 1, 0, 0,
        1, 7, 8, 6, 7, 7, 3, 0])
Ep 4/5, it 355/469: loss train: 0.00, accuracy train: 1.00tensor([1, 1, 5, 5, 7, 0, 8, 0, 8, 6, 6, 8, 3, 2, 5, 8, 6, 5, 5, 6, 0, 9, 6, 8,
        3, 3, 9, 3, 3, 3, 5, 4, 9, 0, 1, 2, 8, 0, 6, 0, 0, 1, 2, 0, 3, 5, 9, 6,
        0, 2, 1, 5, 1, 7, 2, 8, 7, 4, 3, 3, 9, 1, 2, 6, 5, 0, 3, 5, 4, 3, 4, 4,
        7, 6, 0, 5, 9, 7, 4, 4, 4, 7, 3, 2, 8, 1, 4, 4, 6, 9, 1, 3, 8, 0, 6, 3,
        9, 0, 0, 0, 8, 0, 9, 3, 9, 2, 4, 3, 4, 7, 5, 1, 7, 0, 7, 4, 0, 7, 5, 9,
        8, 6, 8, 5, 1, 1, 5, 3])

Ep 4/5, it 378/469: loss train: 0.02, accuracy train: 0.99tensor([0, 1, 0, 4, 6, 0, 0, 1, 4, 6, 8, 4, 8, 4, 6, 5, 0, 4, 2, 8, 3, 5, 0, 8,
        7, 2, 4, 9, 7, 0, 4, 1, 5, 1, 1, 1, 7, 7, 5, 2, 0, 9, 1, 8, 2, 2, 9, 6,
        7, 5, 4, 5, 6, 1, 9, 2, 4, 3, 6, 3, 6, 4, 1, 2, 4, 4, 7, 2, 1, 0, 0, 0,
        9, 1, 4, 6, 5, 8, 0, 6, 9, 1, 0, 8, 8, 2, 3, 9, 8, 2, 2, 6, 4, 3, 2, 0,
        0, 7, 5, 5, 1, 7, 0, 2, 1, 1, 5, 3, 2, 8, 7, 1, 1, 4, 3, 2, 3, 0, 1, 2,
        1, 2, 8, 9, 7, 7, 9, 0])
Ep 4/5, it 379/469: loss train: 0.01, accuracy train: 1.00tensor([8, 2, 6, 5, 9, 8, 7, 0, 7, 3, 5, 7, 5, 5, 2, 8, 2, 7, 9, 4, 2, 1, 0, 5,
        7, 8, 1, 9, 4, 9, 2, 6, 5, 0, 7, 6, 1, 1, 2, 0, 9, 8, 9, 9, 1, 4, 2, 3,
        1, 8, 6, 0, 2, 7, 0, 0, 6, 9, 4, 9, 9, 0, 3, 3, 9, 4, 8, 0, 4, 7, 2, 1,
        4, 1, 2, 6, 6, 2, 2, 3, 2, 5, 4, 6, 3, 0, 3, 0, 0, 1, 9, 2, 8, 4, 6, 2,
        6, 5, 6, 9, 2, 3, 8, 2, 3, 6, 2, 7, 9, 8, 7, 1, 6, 0, 6, 9, 3, 6, 5, 9,
        0, 2, 4, 9, 1, 2, 8, 3])

Ep 4/5, it 402/469: loss train: 0.02, accuracy train: 0.99tensor([6, 0, 2, 7, 4, 9, 9, 6, 0, 4, 6, 5, 2, 1, 7, 6, 1, 6, 9, 6, 0, 8, 4, 2,
        8, 4, 0, 8, 6, 1, 6, 4, 6, 3, 3, 1, 1, 5, 5, 9, 0, 8, 6, 9, 6, 5, 7, 8,
        9, 9, 9, 3, 4, 2, 1, 1, 7, 9, 2, 9, 6, 0, 1, 8, 1, 8, 5, 3, 2, 4, 7, 9,
        6, 0, 8, 3, 7, 0, 2, 7, 7, 7, 9, 2, 6, 4, 6, 8, 6, 4, 0, 7, 0, 6, 2, 2,
        9, 3, 2, 7, 7, 3, 8, 8, 3, 3, 8, 8, 8, 5, 7, 8, 7, 9, 3, 9, 6, 3, 9, 1,
        2, 0, 3, 6, 9, 9, 3, 7])
Ep 4/5, it 403/469: loss train: 0.01, accuracy train: 1.00tensor([9, 6, 3, 8, 5, 9, 3, 5, 5, 1, 0, 9, 7, 1, 1, 5, 3, 2, 5, 7, 6, 3, 1, 0,
        7, 9, 2, 6, 7, 8, 7, 5, 9, 3, 0, 6, 1, 1, 8, 4, 6, 8, 6, 9, 2, 2, 5, 0,
        7, 7, 0, 5, 4, 5, 2, 3, 4, 8, 6, 0, 1, 5, 2, 9, 8, 0, 2, 6, 1, 0, 1, 8,
        9, 4, 3, 8, 8, 6, 1, 8, 3, 4, 0, 1, 9, 7, 9, 9, 1, 9, 8, 4, 8, 3, 9, 7,
        9, 9, 1, 6, 4, 1, 4, 6, 7, 3, 9, 6, 3, 9, 0, 7, 9, 6, 7, 4, 1, 3, 8, 0,
        3, 0, 9, 7, 0, 8, 7, 5])

Ep 4/5, it 422/469: loss train: 0.01, accuracy train: 1.00tensor([9, 7, 2, 7, 8, 8, 4, 6, 7, 3, 2, 9, 6, 4, 0, 8, 6, 5, 5, 3, 7, 0, 0, 3,
        1, 9, 2, 5, 1, 3, 4, 6, 4, 5, 3, 2, 6, 3, 9, 4, 8, 7, 3, 1, 6, 8, 1, 6,
        3, 4, 8, 3, 3, 5, 0, 9, 4, 7, 0, 3, 0, 2, 2, 7, 5, 8, 7, 5, 9, 4, 4, 6,
        4, 5, 1, 4, 9, 8, 8, 5, 5, 3, 1, 5, 8, 9, 1, 7, 1, 2, 8, 3, 7, 1, 3, 2,
        3, 9, 3, 1, 4, 3, 6, 5, 9, 9, 7, 0, 0, 4, 1, 0, 9, 2, 3, 1, 6, 6, 1, 0,
        6, 4, 4, 7, 4, 5, 3, 7])
Ep 4/5, it 423/469: loss train: 0.01, accuracy train: 1.00tensor([0, 0, 1, 4, 6, 7, 2, 8, 5, 9, 2, 6, 1, 5, 0, 6, 8, 4, 7, 5, 9, 0, 3, 0,
        1, 7, 3, 3, 9, 7, 3, 5, 1, 2, 5, 4, 8, 7, 2, 6, 6, 8, 1, 6, 5, 4, 9, 0,
        1, 5, 5, 7, 4, 4, 6, 4, 4, 5, 1, 8, 5, 8, 2, 9, 7, 3, 4, 9, 6, 3, 5, 2,
        0, 8, 9, 4, 0, 4, 8, 7, 1, 2, 3, 6, 6, 4, 2, 3, 0, 2, 5, 4, 0, 3, 1, 2,
        2, 1, 7, 6, 7, 0, 9, 4, 3, 1, 8, 6, 6, 0, 9, 0, 0, 0, 4, 5, 9, 1, 2, 3,
        4, 9, 4, 9, 0, 1, 2, 1])

Ep 4/5, it 441/469: loss train: 0.00, accuracy train: 1.00tensor([6, 9, 1, 0, 6, 7, 8, 6, 5, 4, 1, 2, 0, 1, 3, 8, 0, 2, 1, 0, 8, 3, 3, 5,
        3, 3, 1, 8, 5, 2, 8, 2, 5, 9, 8, 3, 4, 7, 5, 9, 3, 8, 8, 9, 1, 6, 8, 0,
        6, 8, 9, 3, 1, 9, 0, 4, 0, 5, 4, 1, 5, 2, 7, 5, 0, 4, 0, 9, 8, 0, 9, 8,
        9, 0, 8, 7, 8, 4, 5, 2, 2, 7, 2, 7, 8, 3, 3, 3, 5, 2, 3, 1, 7, 9, 5, 9,
        7, 9, 3, 7, 9, 5, 4, 3, 9, 2, 0, 0, 7, 2, 3, 2, 6, 1, 1, 1, 7, 9, 3, 4,
        0, 5, 4, 9, 3, 3, 9, 0])
Ep 4/5, it 442/469: loss train: 0.01, accuracy train: 1.00tensor([1, 0, 6, 0, 3, 5, 4, 0, 7, 4, 3, 1, 1, 3, 7, 0, 4, 3, 4, 0, 8, 3, 9, 3,
        7, 3, 6, 2, 1, 2, 3, 2, 1, 8, 4, 1, 6, 2, 1, 1, 0, 5, 5, 1, 7, 4, 7, 3,
        9, 8, 8, 1, 7, 7, 9, 3, 9, 1, 2, 8, 4, 6, 1, 6, 0, 7, 1, 2, 3, 9, 7, 2,
        1, 7, 8, 1, 0, 0, 9, 8, 4, 6, 6, 3, 1, 3, 9, 5, 9, 1, 3, 9, 9, 6, 0, 8,
        0, 1, 7, 5, 7, 6, 3, 2, 1, 2, 8, 8, 8, 3, 7, 6, 7, 8, 3, 4, 3, 4, 5, 0,
        7, 3, 6, 0, 1, 8, 0, 9])

Ep 4/5, it 460/469: loss train: 0.01, accuracy train: 1.00tensor([3, 4, 2, 5, 1, 1, 8, 1, 0, 1, 1, 4, 2, 8, 1, 0, 1, 8, 2, 8, 5, 1, 2, 3,
        6, 7, 8, 5, 9, 1, 0, 6, 9, 0, 7, 2, 1, 0, 1, 4, 3, 8, 5, 4, 6, 1, 0, 1,
        5, 3, 6, 8, 1, 6, 9, 3, 4, 9, 9, 2, 1, 6, 3, 6, 2, 4, 7, 1, 3, 4, 8, 5,
        1, 7, 0, 5, 7, 5, 5, 3, 5, 8, 8, 0, 8, 6, 8, 3, 1, 3, 6, 5, 2, 4, 8, 5,
        4, 5, 5, 4, 4, 2, 7, 2, 1, 4, 7, 4, 5, 2, 6, 2, 7, 9, 6, 9, 2, 9, 3, 0,
        3, 7, 5, 7, 6, 8, 5, 3])
Ep 4/5, it 461/469: loss train: 0.01, accuracy train: 1.00tensor([4, 5, 6, 5, 3, 3, 7, 7, 0, 3, 4, 3, 5, 0, 2, 4, 3, 3, 4, 5, 6, 8, 0, 4,
        4, 1, 4, 7, 9, 8, 7, 8, 8, 0, 0, 2, 9, 2, 3, 4, 5, 6, 1, 2, 1, 6, 6, 2,
        2, 6, 0, 3, 2, 0, 2, 2, 0, 4, 8, 0, 1, 9, 5, 8, 9, 6, 3, 1, 0, 2, 9, 3,
        4, 0, 4, 7, 4, 5, 5, 3, 4, 8, 0, 2, 8, 7, 0, 6, 9, 0, 9, 7, 8, 4, 3, 6,
        1, 3, 3, 8, 0, 9, 6, 3, 8, 0, 3, 6, 6, 1, 3, 2, 7, 0, 1, 4, 9, 1, 7, 4,
        6, 6, 9, 8, 9, 0, 8, 7])

Ep 5/5, it 11/469: loss train: 0.02, accuracy train: 1.00tensor([9, 1, 7, 5, 0, 6, 5, 3, 8, 9, 4, 0, 2, 6, 6, 8, 2, 5, 4, 5, 8, 7, 8, 8,
        9, 9, 0, 3, 7, 6, 9, 6, 8, 7, 9, 8, 7, 5, 4, 5, 8, 4, 4, 7, 6, 3, 0, 5,
        9, 7, 4, 0, 9, 2, 7, 5, 3, 3, 3, 5, 9, 1, 6, 9, 4, 7, 6, 2, 7, 9, 6, 3,
        1, 9, 6, 1, 7, 7, 9, 6, 3, 8, 7, 5, 8, 1, 2, 5, 2, 1, 1, 5, 7, 7, 0, 0,
        1, 9, 3, 0, 2, 6, 1, 4, 1, 1, 0, 8, 3, 3, 6, 0, 6, 0, 0, 4, 4, 2, 0, 1,
        5, 4, 8, 2, 6, 9, 6, 3])
Ep 5/5, it 12/469: loss train: 0.00, accuracy train: 1.00tensor([4, 9, 8, 1, 9, 4, 2, 7, 8, 8, 7, 8, 6, 1, 1, 3, 0, 8, 5, 7, 4, 7, 4, 9,
        1, 8, 4, 1, 9, 6, 1, 1, 7, 1, 6, 3, 0, 3, 9, 6, 6, 0, 5, 1, 3, 7, 5, 0,
        7, 9, 7, 9, 7, 3, 9, 4, 0, 5, 1, 0, 9, 3, 6, 7, 2, 2, 8, 9, 3, 3, 8, 7,
        8, 3, 6, 1, 0, 2, 4, 2, 8, 0, 9, 6, 2, 9, 2, 3, 1, 5, 4, 2, 6, 8, 1, 6,
        1, 6, 9, 3, 8, 4, 4, 8, 8, 1, 4, 1, 8, 8, 0, 4, 3, 8, 5, 8, 6, 0, 0, 3,
        6, 2, 8, 2, 6, 2, 0, 9])

Ep 5/5, it 28/469: loss train: 0.01, accuracy train: 1.00tensor([3, 5, 6, 3, 3, 9, 7, 1, 4, 0, 6, 2, 1, 3, 5, 7, 5, 6, 9, 0, 0, 7, 0, 3,
        4, 9, 0, 2, 7, 9, 4, 0, 1, 2, 6, 9, 0, 8, 2, 9, 2, 6, 3, 1, 7, 7, 1, 7,
        1, 3, 5, 6, 6, 4, 3, 4, 1, 4, 8, 1, 9, 0, 3, 8, 6, 6, 8, 7, 8, 3, 4, 1,
        9, 1, 1, 7, 1, 8, 7, 3, 1, 5, 0, 6, 3, 4, 7, 2, 0, 2, 4, 6, 7, 9, 6, 0,
        3, 1, 4, 1, 3, 0, 1, 9, 7, 3, 6, 2, 4, 3, 1, 7, 2, 0, 6, 8, 3, 7, 4, 7,
        9, 8, 2, 8, 5, 5, 8, 7])
Ep 5/5, it 29/469: loss train: 0.02, accuracy train: 0.99tensor([1, 0, 8, 4, 2, 9, 7, 7, 1, 9, 8, 0, 7, 9, 6, 8, 2, 2, 2, 1, 7, 8, 5, 9,
        1, 6, 3, 7, 1, 6, 4, 3, 7, 9, 6, 5, 0, 9, 4, 6, 0, 7, 1, 9, 5, 8, 2, 9,
        3, 7, 6, 4, 6, 1, 5, 6, 8, 0, 0, 4, 7, 2, 5, 0, 0, 0, 7, 2, 4, 5, 5, 3,
        6, 0, 9, 8, 5, 8, 9, 4, 2, 0, 5, 3, 6, 1, 7, 6, 4, 2, 5, 6, 5, 4, 9, 8,
        3, 4, 1, 8, 7, 5, 1, 4, 8, 3, 9, 2, 5, 4, 2, 2, 1, 7, 2, 7, 4, 9, 7, 7,
        0, 8, 3, 4, 7, 1, 5, 3])

Ep 5/5, it 46/469: loss train: 0.00, accuracy train: 1.00tensor([4, 0, 6, 6, 1, 2, 1, 9, 6, 8, 5, 1, 8, 9, 8, 2, 9, 9, 3, 4, 1, 0, 6, 4,
        5, 9, 8, 6, 5, 2, 1, 9, 4, 0, 1, 6, 0, 7, 2, 5, 7, 8, 8, 0, 4, 8, 1, 3,
        4, 3, 2, 4, 1, 3, 8, 0, 8, 2, 8, 4, 2, 7, 3, 0, 8, 8, 4, 2, 4, 3, 1, 2,
        0, 1, 1, 5, 4, 6, 1, 2, 5, 0, 3, 7, 7, 5, 8, 0, 9, 9, 9, 2, 4, 9, 2, 5,
        1, 6, 9, 3, 5, 5, 6, 0, 4, 4, 6, 2, 4, 8, 7, 1, 9, 2, 7, 3, 5, 6, 0, 7,
        2, 7, 0, 4, 6, 3, 4, 0])
Ep 5/5, it 47/469: loss train: 0.02, accuracy train: 0.99tensor([6, 8, 2, 5, 9, 4, 4, 8, 3, 7, 0, 1, 7, 1, 3, 6, 7, 4, 3, 1, 4, 7, 3, 8,
        2, 8, 6, 7, 6, 7, 4, 8, 0, 7, 0, 3, 6, 2, 9, 1, 0, 7, 4, 5, 9, 6, 8, 7,
        0, 8, 1, 2, 2, 9, 4, 3, 1, 5, 9, 3, 3, 3, 3, 9, 9, 4, 0, 6, 7, 6, 3, 1,
        8, 1, 0, 6, 8, 1, 9, 4, 1, 3, 4, 3, 1, 8, 5, 5, 1, 7, 4, 8, 5, 6, 2, 1,
        6, 7, 2, 3, 9, 3, 1, 0, 1, 2, 9, 3, 0, 9, 4, 4, 4, 8, 9, 8, 1, 2, 9, 2,
        9, 1, 9, 6, 0, 8, 6, 1])

Ep 5/5, it 63/469: loss train: 0.00, accuracy train: 1.00tensor([7, 4, 4, 1, 8, 0, 4, 1, 4, 6, 3, 5, 3, 3, 9, 3, 8, 0, 1, 6, 6, 5, 9, 3,
        0, 6, 0, 8, 0, 5, 7, 5, 6, 3, 7, 4, 8, 6, 9, 5, 3, 6, 9, 2, 8, 8, 8, 3,
        3, 5, 8, 3, 8, 0, 4, 6, 3, 3, 3, 7, 7, 9, 4, 6, 1, 6, 2, 4, 5, 3, 3, 0,
        8, 5, 9, 6, 1, 0, 1, 1, 4, 7, 8, 4, 0, 4, 6, 2, 6, 6, 4, 6, 3, 8, 0, 4,
        8, 7, 3, 1, 1, 1, 2, 7, 0, 5, 3, 6, 5, 5, 5, 2, 0, 1, 9, 4, 9, 5, 9, 6,
        8, 9, 3, 4, 7, 2, 8, 9])
Ep 5/5, it 64/469: loss train: 0.00, accuracy train: 1.00tensor([0, 5, 7, 1, 8, 3, 1, 3, 2, 0, 5, 9, 5, 7, 9, 1, 6, 7, 9, 1, 3, 6, 3, 1,
        7, 4, 1, 9, 6, 7, 7, 8, 5, 3, 1, 4, 6, 9, 3, 5, 3, 0, 3, 3, 9, 8, 3, 0,
        9, 7, 4, 0, 2, 6, 2, 3, 3, 9, 0, 8, 7, 3, 7, 4, 2, 4, 9, 9, 9, 8, 2, 0,
        9, 2, 7, 3, 7, 3, 2, 0, 8, 9, 3, 0, 1, 1, 1, 5, 6, 8, 3, 0, 3, 6, 1, 2,
        2, 5, 6, 6, 2, 8, 0, 2, 1, 6, 9, 6, 0, 9, 8, 7, 2, 6, 6, 5, 6, 6, 1, 6,
        0, 0, 3, 1, 2, 5, 3, 3])

Ep 5/5, it 81/469: loss train: 0.00, accuracy train: 1.00tensor([0, 1, 5, 7, 6, 3, 9, 2, 1, 0, 9, 2, 4, 7, 5, 4, 6, 9, 0, 6, 9, 9, 2, 1,
        7, 8, 5, 0, 0, 3, 5, 4, 4, 5, 2, 8, 7, 0, 0, 7, 9, 0, 8, 5, 9, 1, 9, 2,
        7, 7, 1, 1, 1, 5, 6, 4, 2, 7, 1, 1, 3, 8, 7, 1, 7, 3, 7, 4, 0, 9, 0, 7,
        9, 3, 5, 2, 2, 6, 8, 7, 8, 8, 4, 9, 2, 6, 1, 7, 6, 0, 0, 0, 8, 9, 4, 5,
        4, 8, 2, 7, 0, 5, 9, 0, 3, 8, 7, 3, 2, 3, 5, 9, 2, 2, 5, 1, 3, 3, 4, 3,
        3, 6, 0, 3, 5, 4, 0, 9])
Ep 5/5, it 82/469: loss train: 0.01, accuracy train: 1.00tensor([5, 0, 1, 3, 0, 7, 2, 6, 1, 4, 7, 4, 6, 7, 1, 4, 6, 5, 9, 2, 8, 5, 9, 9,
        1, 9, 6, 5, 0, 5, 8, 9, 7, 2, 6, 1, 2, 0, 0, 6, 7, 4, 8, 8, 0, 3, 5, 7,
        4, 7, 9, 2, 7, 9, 7, 2, 5, 4, 9, 0, 2, 6, 9, 6, 3, 1, 5, 7, 8, 4, 9, 1,
        4, 5, 3, 7, 0, 3, 8, 9, 8, 0, 2, 1, 7, 3, 7, 3, 2, 9, 6, 8, 7, 1, 0, 3,
        0, 4, 8, 8, 9, 7, 9, 8, 5, 6, 0, 0, 1, 6, 9, 1, 0, 0, 0, 2, 7, 7, 7, 1,
        1, 4, 4, 9, 0, 2, 6, 2])
Ep 5/5, it 83/469: l

Ep 5/5, it 99/469: loss train: 0.00, accuracy train: 1.00tensor([9, 0, 1, 8, 9, 8, 2, 8, 0, 6, 5, 4, 7, 3, 2, 8, 9, 0, 9, 8, 3, 7, 0, 1,
        7, 8, 4, 0, 7, 7, 0, 2, 5, 5, 4, 3, 3, 3, 4, 6, 6, 5, 4, 6, 1, 2, 0, 5,
        7, 1, 0, 7, 6, 6, 3, 3, 3, 6, 7, 0, 8, 5, 7, 3, 2, 8, 6, 9, 1, 9, 7, 4,
        9, 4, 0, 0, 7, 1, 5, 7, 3, 5, 4, 6, 3, 7, 5, 6, 1, 8, 0, 7, 1, 6, 4, 1,
        5, 2, 1, 6, 7, 7, 2, 0, 6, 5, 8, 4, 6, 8, 1, 7, 6, 7, 0, 2, 7, 3, 7, 7,
        0, 1, 3, 8, 8, 3, 3, 3])
Ep 5/5, it 100/469: loss train: 0.02, accuracy train: 0.99tensor([3, 6, 9, 0, 3, 1, 7, 1, 9, 7, 7, 9, 7, 1, 2, 6, 3, 7, 5, 0, 2, 5, 6, 5,
        4, 4, 9, 1, 6, 2, 0, 8, 3, 8, 4, 7, 6, 7, 6, 3, 4, 7, 2, 7, 8, 9, 6, 5,
        2, 4, 5, 2, 5, 1, 3, 5, 9, 6, 4, 8, 2, 9, 5, 8, 9, 6, 8, 1, 7, 6, 6, 5,
        2, 0, 4, 7, 8, 1, 3, 3, 9, 1, 5, 7, 9, 4, 8, 9, 2, 5, 9, 9, 5, 4, 6, 0,
        5, 3, 9, 5, 0, 5, 1, 3, 4, 7, 1, 6, 1, 9, 3, 1, 4, 0, 3, 6, 2, 5, 8, 9,
        1, 1, 5, 0, 5, 4, 7, 0])
Ep 5/5, it 101/469:

Ep 5/5, it 120/469: loss train: 0.01, accuracy train: 1.00tensor([4, 3, 6, 9, 0, 8, 4, 9, 5, 5, 7, 9, 1, 3, 6, 1, 0, 6, 7, 1, 1, 3, 2, 7,
        0, 0, 6, 2, 5, 5, 5, 6, 9, 1, 3, 1, 8, 7, 2, 6, 1, 7, 2, 9, 1, 6, 8, 0,
        3, 2, 3, 3, 7, 3, 1, 3, 5, 0, 9, 4, 0, 8, 8, 9, 9, 0, 7, 9, 6, 7, 4, 1,
        7, 4, 0, 6, 4, 7, 3, 9, 5, 6, 6, 0, 6, 5, 8, 1, 9, 9, 7, 0, 4, 2, 9, 4,
        6, 4, 7, 1, 3, 4, 4, 2, 9, 2, 4, 6, 6, 4, 6, 9, 1, 8, 6, 1, 9, 4, 1, 1,
        4, 0, 3, 0, 0, 9, 1, 2])
Ep 5/5, it 121/469: loss train: 0.02, accuracy train: 1.00tensor([3, 3, 5, 5, 6, 3, 0, 2, 9, 5, 2, 4, 7, 5, 7, 2, 1, 3, 9, 1, 5, 2, 1, 3,
        8, 9, 2, 3, 6, 0, 5, 0, 2, 3, 7, 9, 2, 9, 7, 9, 5, 6, 7, 5, 3, 8, 4, 7,
        0, 3, 4, 9, 9, 2, 7, 9, 1, 0, 0, 3, 1, 1, 8, 7, 5, 1, 8, 0, 4, 4, 7, 0,
        6, 4, 7, 1, 3, 8, 6, 0, 7, 2, 0, 3, 9, 2, 5, 3, 0, 2, 2, 1, 2, 6, 2, 5,
        1, 1, 6, 5, 4, 9, 7, 9, 5, 1, 7, 2, 2, 5, 2, 7, 7, 4, 2, 6, 9, 3, 5, 3,
        3, 9, 8, 6, 6, 1, 4, 1])
Ep 5/5, it 122/469

Ep 5/5, it 138/469: loss train: 0.03, accuracy train: 0.99tensor([0, 0, 1, 2, 3, 0, 7, 4, 1, 6, 4, 2, 5, 3, 5, 5, 1, 4, 3, 4, 2, 2, 8, 1,
        9, 7, 8, 4, 6, 5, 9, 3, 7, 1, 3, 4, 5, 6, 3, 0, 3, 5, 1, 8, 3, 6, 8, 3,
        8, 3, 3, 3, 1, 3, 0, 9, 1, 4, 6, 4, 6, 5, 8, 3, 9, 3, 5, 3, 0, 0, 0, 8,
        8, 6, 3, 8, 0, 7, 3, 0, 3, 4, 2, 4, 4, 1, 0, 3, 6, 1, 6, 1, 0, 2, 9, 0,
        4, 2, 5, 8, 5, 0, 7, 1, 9, 0, 8, 3, 6, 3, 0, 6, 8, 1, 7, 7, 6, 1, 0, 0,
        8, 1, 4, 5, 5, 9, 1, 8])
Ep 5/5, it 139/469: loss train: 0.00, accuracy train: 1.00tensor([6, 9, 4, 9, 6, 9, 6, 1, 0, 5, 7, 1, 6, 9, 2, 8, 2, 9, 4, 9, 8, 5, 7, 1,
        6, 7, 3, 9, 4, 7, 3, 1, 7, 6, 9, 5, 0, 9, 1, 8, 5, 4, 1, 9, 0, 2, 7, 1,
        9, 1, 9, 1, 5, 2, 2, 7, 6, 9, 3, 8, 3, 2, 4, 9, 7, 6, 9, 1, 2, 4, 3, 3,
        6, 2, 9, 9, 1, 1, 5, 0, 6, 9, 2, 2, 2, 5, 1, 9, 7, 8, 0, 0, 8, 7, 6, 3,
        9, 2, 4, 2, 0, 9, 5, 6, 0, 2, 6, 4, 8, 5, 1, 2, 4, 4, 3, 2, 1, 8, 1, 5,
        7, 1, 0, 1, 5, 7, 9, 3])
Ep 5/5, it 140/469

Ep 5/5, it 156/469: loss train: 0.00, accuracy train: 1.00tensor([1, 6, 5, 3, 0, 6, 2, 0, 2, 2, 1, 2, 1, 0, 0, 5, 1, 2, 0, 7, 5, 1, 9, 2,
        2, 8, 2, 6, 0, 6, 0, 3, 1, 4, 8, 7, 1, 1, 9, 1, 7, 7, 4, 7, 7, 1, 5, 1,
        7, 9, 2, 8, 4, 5, 3, 0, 3, 0, 0, 8, 9, 1, 0, 3, 1, 2, 0, 3, 7, 8, 2, 1,
        8, 4, 9, 1, 4, 5, 7, 9, 5, 4, 5, 4, 7, 7, 3, 8, 9, 9, 6, 3, 2, 7, 4, 3,
        9, 9, 7, 9, 3, 7, 9, 5, 0, 2, 1, 5, 0, 6, 8, 7, 7, 7, 1, 5, 6, 3, 6, 0,
        3, 1, 2, 5, 9, 6, 9, 6])
Ep 5/5, it 157/469: loss train: 0.01, accuracy train: 1.00tensor([0, 7, 2, 9, 6, 3, 4, 5, 4, 7, 9, 9, 5, 7, 2, 2, 3, 5, 8, 6, 3, 4, 9, 4,
        4, 0, 4, 1, 1, 8, 7, 2, 6, 3, 7, 1, 8, 7, 8, 0, 1, 2, 8, 3, 7, 3, 9, 5,
        6, 6, 1, 6, 9, 7, 1, 1, 2, 1, 4, 9, 2, 4, 3, 6, 0, 2, 9, 9, 8, 8, 5, 2,
        1, 3, 6, 4, 6, 3, 6, 7, 1, 6, 9, 2, 3, 2, 3, 6, 9, 1, 0, 7, 1, 3, 8, 4,
        4, 7, 7, 3, 3, 5, 3, 6, 0, 7, 7, 6, 5, 4, 1, 0, 2, 9, 7, 7, 7, 5, 1, 1,
        6, 9, 9, 8, 8, 1, 7, 6])
Ep 5/5, it 158/469

Ep 5/5, it 173/469: loss train: 0.01, accuracy train: 1.00tensor([1, 1, 8, 0, 1, 6, 7, 0, 3, 3, 0, 2, 0, 3, 9, 3, 7, 0, 8, 0, 8, 8, 3, 4,
        5, 4, 0, 1, 6, 0, 3, 5, 0, 3, 3, 3, 9, 8, 3, 2, 6, 8, 8, 6, 5, 3, 6, 8,
        9, 9, 3, 1, 1, 5, 0, 9, 8, 0, 0, 1, 2, 5, 3, 4, 1, 9, 6, 1, 9, 7, 2, 2,
        3, 9, 8, 8, 3, 3, 0, 2, 3, 1, 6, 3, 2, 7, 6, 0, 4, 7, 0, 1, 7, 5, 4, 4,
        4, 4, 5, 5, 9, 1, 6, 3, 4, 4, 2, 3, 5, 7, 6, 5, 6, 2, 8, 2, 3, 9, 1, 6,
        1, 8, 2, 5, 7, 9, 2, 0])
Ep 5/5, it 174/469: loss train: 0.01, accuracy train: 1.00tensor([2, 1, 6, 5, 9, 0, 5, 5, 2, 2, 5, 3, 4, 1, 2, 6, 6, 9, 3, 9, 4, 9, 6, 8,
        8, 7, 0, 5, 2, 9, 7, 6, 3, 7, 8, 3, 1, 9, 3, 3, 3, 5, 0, 3, 9, 7, 8, 2,
        5, 7, 6, 9, 5, 3, 7, 0, 5, 6, 0, 9, 6, 3, 7, 0, 5, 4, 4, 3, 1, 0, 7, 3,
        3, 4, 7, 1, 3, 8, 3, 2, 1, 4, 2, 1, 3, 6, 2, 5, 0, 3, 2, 4, 1, 9, 6, 2,
        9, 0, 3, 4, 6, 8, 9, 4, 7, 8, 0, 3, 0, 0, 3, 0, 9, 7, 9, 9, 6, 4, 6, 4,
        1, 8, 4, 7, 3, 3, 5, 4])
Ep 5/5, it 175/469

Ep 5/5, it 191/469: loss train: 0.01, accuracy train: 1.00tensor([8, 0, 8, 6, 1, 3, 5, 7, 0, 6, 0, 7, 5, 7, 8, 4, 7, 3, 2, 5, 3, 3, 9, 9,
        0, 6, 7, 7, 6, 0, 8, 7, 6, 3, 2, 1, 1, 4, 4, 8, 5, 9, 5, 3, 9, 9, 5, 3,
        4, 2, 1, 0, 6, 8, 5, 2, 9, 2, 0, 0, 6, 3, 0, 4, 6, 1, 1, 7, 2, 7, 3, 4,
        3, 1, 0, 5, 3, 6, 6, 9, 7, 7, 2, 2, 8, 3, 9, 6, 7, 2, 2, 7, 9, 9, 5, 5,
        4, 6, 4, 9, 1, 6, 4, 0, 0, 8, 7, 2, 8, 4, 6, 2, 2, 4, 9, 7, 9, 7, 8, 4,
        7, 5, 8, 9, 1, 8, 4, 7])
Ep 5/5, it 192/469: loss train: 0.01, accuracy train: 1.00tensor([5, 5, 1, 7, 8, 7, 2, 0, 6, 0, 8, 0, 9, 7, 6, 8, 4, 7, 6, 1, 3, 0, 7, 2,
        8, 3, 8, 9, 9, 1, 8, 0, 7, 1, 5, 1, 9, 3, 4, 2, 9, 8, 1, 8, 8, 5, 3, 9,
        0, 4, 5, 4, 3, 7, 0, 4, 3, 4, 3, 7, 3, 5, 2, 2, 1, 4, 7, 0, 0, 0, 1, 1,
        2, 0, 9, 6, 2, 1, 0, 3, 9, 8, 7, 2, 2, 6, 3, 3, 0, 4, 1, 6, 4, 6, 4, 4,
        1, 2, 7, 9, 5, 1, 9, 2, 0, 6, 3, 7, 5, 8, 5, 8, 3, 7, 2, 2, 7, 0, 5, 1,
        1, 8, 8, 7, 5, 3, 4, 1])
Ep 5/5, it 193/469

Ep 5/5, it 209/469: loss train: 0.01, accuracy train: 1.00tensor([8, 1, 8, 4, 8, 0, 1, 8, 2, 8, 4, 8, 4, 7, 1, 7, 3, 8, 7, 0, 7, 6, 2, 8,
        4, 8, 6, 6, 7, 6, 9, 2, 2, 3, 9, 7, 6, 5, 1, 6, 9, 6, 6, 1, 8, 9, 3, 0,
        8, 6, 3, 0, 6, 9, 1, 1, 2, 2, 9, 9, 6, 5, 2, 8, 4, 4, 6, 4, 5, 1, 1, 1,
        7, 3, 9, 1, 6, 8, 5, 4, 7, 6, 7, 3, 4, 5, 3, 5, 7, 0, 5, 4, 7, 5, 2, 9,
        5, 0, 5, 7, 3, 4, 7, 7, 8, 3, 1, 0, 1, 2, 1, 4, 6, 3, 1, 5, 5, 8, 5, 4,
        7, 5, 5, 7, 7, 1, 5, 1])
Ep 5/5, it 210/469: loss train: 0.02, accuracy train: 0.99tensor([9, 1, 2, 7, 1, 6, 1, 7, 1, 4, 6, 3, 6, 5, 1, 1, 1, 6, 5, 1, 4, 1, 2, 0,
        6, 8, 6, 0, 8, 1, 3, 4, 9, 4, 6, 6, 3, 5, 1, 0, 2, 2, 8, 7, 6, 7, 5, 5,
        7, 8, 8, 2, 4, 4, 1, 1, 5, 6, 4, 7, 4, 4, 0, 4, 7, 7, 4, 7, 7, 7, 6, 1,
        1, 0, 7, 9, 8, 6, 5, 5, 2, 5, 5, 0, 6, 7, 3, 0, 7, 9, 4, 9, 2, 0, 8, 9,
        2, 9, 2, 3, 3, 4, 6, 8, 4, 8, 6, 9, 6, 8, 5, 0, 8, 6, 9, 2, 0, 7, 9, 5,
        4, 6, 7, 2, 5, 9, 9, 5])
Ep 5/5, it 211/469

Ep 5/5, it 226/469: loss train: 0.01, accuracy train: 1.00tensor([1, 8, 8, 3, 7, 6, 6, 0, 7, 4, 2, 5, 1, 3, 7, 5, 6, 5, 4, 5, 4, 9, 0, 7,
        9, 1, 9, 4, 2, 2, 6, 8, 5, 1, 9, 5, 6, 1, 6, 8, 7, 5, 2, 1, 3, 6, 4, 0,
        2, 1, 5, 1, 8, 1, 7, 3, 1, 0, 2, 5, 6, 6, 4, 7, 4, 3, 0, 1, 2, 2, 1, 8,
        0, 5, 2, 6, 2, 7, 2, 4, 6, 1, 0, 2, 4, 0, 9, 2, 8, 7, 0, 9, 0, 0, 1, 4,
        2, 1, 2, 0, 1, 0, 6, 7, 6, 8, 0, 5, 3, 8, 5, 9, 4, 1, 1, 4, 9, 4, 4, 5,
        3, 1, 7, 1, 3, 3, 1, 3])
Ep 5/5, it 227/469: loss train: 0.01, accuracy train: 1.00tensor([2, 6, 2, 5, 4, 9, 1, 5, 5, 6, 1, 8, 0, 1, 5, 2, 5, 5, 5, 4, 9, 6, 5, 6,
        0, 5, 0, 5, 6, 3, 9, 6, 2, 1, 5, 5, 3, 3, 8, 9, 8, 1, 0, 0, 6, 6, 8, 9,
        4, 9, 1, 5, 0, 1, 3, 3, 7, 9, 3, 9, 8, 3, 1, 9, 8, 6, 3, 8, 3, 0, 6, 5,
        3, 5, 2, 0, 3, 8, 1, 3, 5, 7, 6, 7, 3, 6, 0, 0, 2, 6, 4, 6, 9, 8, 5, 5,
        6, 0, 1, 8, 0, 3, 9, 5, 3, 5, 1, 2, 9, 7, 4, 0, 3, 2, 3, 6, 2, 7, 6, 9,
        9, 4, 0, 5, 7, 7, 4, 8])
Ep 5/5, it 228/469

Ep 5/5, it 245/469: loss train: 0.04, accuracy train: 0.98tensor([5, 6, 9, 0, 1, 4, 0, 6, 9, 7, 5, 8, 7, 6, 5, 6, 8, 6, 0, 5, 6, 6, 8, 5,
        6, 6, 6, 8, 3, 6, 1, 8, 9, 3, 3, 8, 9, 1, 4, 0, 3, 2, 3, 6, 5, 5, 7, 3,
        6, 6, 0, 2, 8, 4, 2, 2, 4, 4, 9, 7, 2, 7, 6, 7, 1, 4, 3, 0, 6, 8, 2, 0,
        0, 1, 0, 4, 7, 0, 4, 2, 8, 5, 5, 8, 6, 3, 1, 3, 0, 3, 5, 1, 4, 2, 8, 8,
        1, 6, 3, 5, 7, 7, 4, 3, 1, 8, 9, 5, 1, 6, 9, 5, 0, 1, 2, 7, 4, 9, 3, 2,
        7, 1, 4, 3, 7, 2, 0, 0])
Ep 5/5, it 246/469: loss train: 0.01, accuracy train: 1.00tensor([0, 9, 6, 1, 4, 4, 4, 9, 1, 9, 9, 8, 1, 8, 1, 4, 9, 8, 6, 3, 5, 7, 0, 3,
        7, 0, 7, 1, 2, 1, 0, 1, 1, 4, 7, 6, 1, 7, 0, 2, 8, 0, 2, 5, 2, 1, 0, 5,
        6, 6, 6, 2, 0, 8, 3, 6, 3, 6, 7, 0, 2, 8, 1, 0, 8, 3, 0, 1, 2, 7, 0, 7,
        2, 2, 7, 3, 7, 8, 7, 2, 0, 9, 9, 8, 3, 4, 9, 3, 5, 7, 0, 9, 0, 1, 4, 8,
        2, 4, 5, 7, 5, 6, 3, 6, 5, 4, 2, 8, 6, 0, 0, 8, 6, 7, 4, 6, 3, 8, 4, 8,
        2, 4, 2, 1, 5, 3, 6, 9])
Ep 5/5, it 247/469

Ep 5/5, it 264/469: loss train: 0.02, accuracy train: 0.98tensor([8, 7, 3, 9, 8, 1, 2, 0, 2, 4, 3, 0, 8, 2, 7, 5, 7, 3, 4, 1, 4, 3, 0, 5,
        8, 7, 3, 0, 8, 5, 2, 2, 1, 5, 3, 9, 0, 8, 0, 0, 3, 2, 6, 4, 9, 2, 5, 9,
        0, 3, 3, 2, 9, 2, 5, 6, 7, 3, 1, 7, 2, 9, 4, 2, 7, 9, 1, 7, 2, 1, 9, 0,
        1, 0, 4, 4, 7, 2, 2, 6, 6, 2, 5, 0, 7, 2, 5, 8, 7, 7, 2, 9, 1, 5, 4, 9,
        9, 1, 5, 3, 3, 4, 6, 1, 7, 1, 0, 2, 1, 4, 9, 6, 1, 8, 1, 0, 9, 8, 1, 7,
        3, 0, 8, 7, 1, 4, 4, 7])
Ep 5/5, it 265/469: loss train: 0.01, accuracy train: 1.00tensor([2, 3, 3, 0, 7, 0, 4, 1, 1, 4, 1, 5, 1, 5, 0, 0, 5, 0, 1, 5, 8, 2, 2, 9,
        8, 9, 9, 0, 6, 3, 4, 7, 6, 6, 2, 5, 7, 9, 9, 7, 9, 5, 1, 7, 8, 0, 3, 2,
        1, 4, 0, 1, 6, 5, 6, 9, 5, 4, 6, 2, 5, 4, 3, 0, 6, 2, 0, 4, 0, 8, 0, 0,
        7, 5, 3, 3, 8, 1, 5, 7, 8, 3, 5, 6, 1, 4, 7, 3, 7, 3, 2, 5, 8, 8, 5, 3,
        0, 4, 4, 9, 8, 2, 1, 9, 5, 1, 6, 4, 3, 1, 2, 3, 6, 2, 4, 2, 0, 2, 8, 2,
        4, 1, 9, 9, 9, 5, 0, 3])
Ep 5/5, it 266/469

Ep 5/5, it 282/469: loss train: 0.02, accuracy train: 0.99tensor([0, 7, 5, 3, 2, 8, 7, 9, 2, 0, 3, 9, 7, 4, 2, 9, 3, 7, 5, 0, 0, 3, 6, 3,
        6, 6, 7, 8, 0, 2, 0, 1, 3, 4, 6, 9, 2, 6, 1, 5, 1, 7, 1, 4, 0, 5, 0, 4,
        7, 3, 1, 0, 7, 0, 3, 6, 5, 8, 9, 8, 9, 6, 3, 3, 3, 6, 5, 1, 8, 6, 1, 0,
        2, 8, 2, 4, 3, 0, 9, 4, 3, 3, 8, 5, 2, 6, 1, 0, 0, 2, 7, 4, 0, 2, 5, 6,
        0, 8, 7, 2, 5, 8, 0, 2, 1, 7, 4, 9, 9, 9, 3, 3, 4, 3, 9, 1, 7, 5, 3, 5,
        3, 0, 3, 8, 3, 5, 7, 9])
Ep 5/5, it 283/469: loss train: 0.01, accuracy train: 1.00tensor([3, 5, 0, 2, 3, 3, 4, 0, 4, 4, 3, 5, 1, 9, 8, 6, 1, 4, 6, 5, 2, 5, 6, 2,
        0, 7, 7, 2, 2, 3, 3, 5, 8, 5, 7, 1, 7, 7, 5, 3, 1, 2, 8, 0, 4, 0, 4, 5,
        3, 4, 9, 5, 3, 6, 0, 8, 7, 1, 3, 2, 1, 7, 6, 0, 0, 1, 4, 6, 0, 4, 3, 0,
        2, 4, 2, 7, 7, 1, 1, 2, 8, 7, 9, 7, 4, 1, 7, 4, 0, 8, 0, 8, 2, 3, 7, 9,
        8, 5, 8, 9, 0, 0, 1, 0, 4, 0, 1, 7, 8, 8, 3, 7, 3, 0, 7, 1, 2, 7, 4, 5,
        8, 0, 5, 3, 4, 1, 9, 7])
Ep 5/5, it 284/469

Ep 5/5, it 300/469: loss train: 0.00, accuracy train: 1.00tensor([1, 0, 6, 4, 8, 8, 2, 7, 7, 3, 9, 7, 6, 7, 2, 1, 3, 9, 0, 7, 7, 4, 1, 6,
        8, 7, 3, 4, 4, 1, 2, 5, 6, 6, 1, 0, 7, 6, 1, 2, 7, 5, 0, 0, 1, 0, 1, 2,
        9, 2, 5, 0, 9, 4, 6, 3, 3, 7, 6, 0, 9, 5, 6, 3, 1, 2, 2, 2, 1, 8, 6, 4,
        1, 7, 6, 3, 0, 5, 6, 5, 2, 7, 5, 2, 7, 6, 7, 3, 0, 1, 4, 5, 3, 7, 0, 4,
        4, 7, 3, 7, 1, 8, 3, 8, 1, 9, 9, 4, 7, 9, 6, 8, 5, 1, 2, 6, 1, 4, 4, 1,
        7, 0, 0, 3, 2, 7, 8, 8])
Ep 5/5, it 301/469: loss train: 0.01, accuracy train: 1.00tensor([9, 8, 8, 2, 9, 0, 8, 3, 2, 9, 1, 3, 4, 4, 1, 5, 9, 2, 3, 8, 9, 8, 8, 2,
        1, 5, 6, 8, 2, 4, 2, 5, 3, 0, 6, 2, 9, 0, 6, 0, 2, 3, 8, 7, 0, 0, 9, 1,
        1, 9, 7, 9, 6, 5, 1, 6, 3, 9, 9, 9, 2, 7, 8, 5, 1, 6, 5, 9, 2, 9, 1, 2,
        4, 1, 0, 6, 7, 6, 1, 2, 0, 6, 3, 1, 5, 0, 4, 9, 2, 1, 1, 5, 7, 4, 2, 0,
        8, 0, 6, 6, 0, 0, 0, 8, 7, 6, 7, 8, 8, 0, 9, 1, 7, 3, 9, 0, 7, 0, 1, 5,
        3, 9, 4, 3, 2, 0, 0, 0])
Ep 5/5, it 302/469

Ep 5/5, it 317/469: loss train: 0.00, accuracy train: 1.00tensor([4, 6, 0, 3, 4, 0, 5, 5, 3, 2, 3, 3, 7, 7, 7, 4, 7, 7, 7, 9, 8, 5, 6, 2,
        1, 7, 4, 1, 9, 8, 9, 0, 2, 6, 6, 9, 4, 7, 8, 6, 1, 1, 6, 2, 7, 6, 5, 3,
        0, 9, 8, 4, 2, 1, 0, 6, 7, 4, 4, 2, 9, 1, 7, 4, 1, 3, 2, 9, 6, 4, 7, 2,
        6, 4, 2, 6, 4, 9, 4, 8, 1, 3, 4, 0, 1, 9, 0, 2, 8, 7, 2, 8, 2, 0, 8, 4,
        2, 4, 7, 6, 7, 3, 7, 3, 8, 6, 7, 7, 3, 6, 6, 9, 9, 3, 1, 1, 5, 4, 3, 2,
        3, 7, 3, 8, 9, 3, 2, 9])
Ep 5/5, it 318/469: loss train: 0.01, accuracy train: 1.00tensor([0, 2, 1, 1, 5, 9, 9, 7, 9, 3, 6, 6, 3, 8, 1, 4, 6, 3, 5, 8, 2, 3, 0, 5,
        1, 7, 1, 8, 4, 9, 2, 0, 7, 4, 6, 6, 9, 0, 0, 0, 6, 6, 1, 5, 8, 3, 6, 7,
        2, 2, 3, 6, 1, 8, 5, 3, 2, 6, 4, 3, 2, 9, 1, 7, 0, 8, 3, 7, 8, 9, 8, 4,
        8, 4, 0, 4, 1, 3, 8, 0, 0, 9, 1, 1, 9, 5, 2, 8, 4, 8, 5, 2, 1, 6, 7, 2,
        7, 6, 4, 1, 2, 1, 6, 5, 3, 8, 7, 7, 5, 3, 9, 0, 1, 6, 5, 0, 2, 3, 8, 2,
        7, 9, 0, 1, 8, 6, 5, 2])
Ep 5/5, it 319/469

Ep 5/5, it 335/469: loss train: 0.03, accuracy train: 0.99tensor([2, 5, 9, 0, 9, 5, 5, 8, 7, 4, 1, 9, 3, 2, 7, 5, 3, 6, 6, 5, 9, 4, 9, 1,
        0, 9, 6, 8, 5, 3, 1, 4, 5, 6, 4, 1, 5, 1, 4, 3, 8, 4, 3, 2, 4, 7, 2, 0,
        7, 2, 8, 4, 1, 2, 3, 6, 1, 5, 4, 5, 9, 4, 1, 6, 1, 9, 5, 5, 8, 6, 9, 0,
        1, 3, 7, 0, 6, 0, 7, 1, 2, 6, 5, 3, 7, 5, 5, 4, 9, 4, 5, 1, 4, 4, 3, 5,
        1, 0, 7, 2, 3, 6, 0, 7, 6, 1, 2, 0, 4, 3, 1, 7, 6, 8, 1, 1, 9, 0, 8, 9,
        3, 4, 2, 9, 9, 2, 6, 8])
Ep 5/5, it 336/469: loss train: 0.00, accuracy train: 1.00tensor([9, 1, 1, 3, 1, 8, 0, 8, 6, 3, 6, 7, 4, 8, 7, 4, 4, 2, 8, 1, 5, 9, 8, 4,
        4, 0, 2, 5, 8, 4, 9, 9, 0, 5, 2, 2, 2, 9, 7, 1, 4, 7, 3, 6, 0, 8, 9, 1,
        1, 0, 5, 5, 6, 7, 3, 8, 7, 1, 6, 7, 3, 4, 6, 0, 6, 5, 0, 9, 8, 2, 0, 3,
        4, 7, 2, 5, 5, 0, 1, 6, 4, 3, 9, 3, 6, 3, 3, 5, 8, 1, 8, 2, 3, 4, 1, 6,
        6, 4, 8, 2, 0, 6, 0, 1, 9, 7, 9, 4, 9, 3, 0, 1, 3, 5, 8, 4, 7, 2, 2, 5,
        6, 5, 8, 4, 9, 7, 8, 2])
Ep 5/5, it 337/469

Ep 5/5, it 352/469: loss train: 0.01, accuracy train: 1.00tensor([2, 7, 6, 7, 4, 0, 0, 9, 4, 9, 8, 2, 1, 1, 0, 0, 8, 7, 8, 4, 5, 8, 9, 4,
        6, 7, 1, 1, 3, 5, 7, 9, 6, 2, 5, 4, 0, 3, 5, 4, 2, 7, 5, 9, 2, 1, 7, 5,
        4, 6, 5, 8, 4, 2, 7, 3, 7, 1, 4, 8, 6, 3, 6, 0, 0, 1, 7, 3, 4, 0, 3, 4,
        6, 6, 8, 1, 1, 7, 9, 4, 7, 3, 5, 9, 3, 1, 7, 2, 7, 8, 2, 3, 8, 6, 3, 6,
        1, 0, 0, 7, 8, 3, 4, 8, 9, 6, 1, 0, 2, 3, 2, 4, 2, 8, 8, 3, 3, 2, 6, 9,
        4, 6, 5, 3, 4, 4, 0, 2])
Ep 5/5, it 353/469: loss train: 0.01, accuracy train: 1.00tensor([6, 1, 6, 3, 0, 1, 6, 7, 8, 1, 3, 4, 4, 1, 3, 2, 4, 7, 9, 3, 5, 2, 8, 4,
        8, 5, 3, 8, 9, 2, 0, 8, 2, 1, 2, 0, 9, 5, 5, 4, 0, 8, 8, 0, 7, 5, 2, 4,
        4, 7, 1, 5, 6, 0, 1, 1, 9, 3, 2, 5, 4, 0, 7, 1, 0, 5, 8, 6, 1, 3, 9, 3,
        7, 9, 3, 7, 0, 6, 4, 7, 8, 1, 6, 4, 5, 4, 9, 8, 6, 4, 0, 2, 8, 1, 6, 2,
        3, 2, 2, 1, 9, 3, 5, 7, 8, 3, 6, 4, 7, 2, 0, 0, 8, 2, 4, 2, 1, 4, 7, 1,
        3, 5, 0, 7, 9, 9, 5, 3])
Ep 5/5, it 354/469

Ep 5/5, it 371/469: loss train: 0.01, accuracy train: 1.00tensor([7, 3, 2, 0, 4, 8, 7, 5, 0, 9, 1, 4, 6, 2, 2, 0, 9, 0, 3, 0, 7, 7, 1, 5,
        9, 2, 6, 3, 1, 7, 3, 9, 3, 8, 4, 3, 8, 0, 4, 7, 5, 4, 1, 4, 4, 9, 5, 6,
        9, 7, 6, 8, 2, 5, 9, 2, 2, 5, 9, 7, 8, 4, 1, 4, 6, 7, 9, 5, 8, 0, 1, 5,
        3, 0, 4, 2, 6, 0, 2, 7, 1, 8, 8, 7, 0, 4, 0, 1, 1, 1, 5, 7, 6, 8, 3, 9,
        4, 7, 8, 2, 0, 8, 1, 4, 6, 7, 9, 0, 3, 0, 6, 0, 9, 5, 5, 5, 9, 3, 9, 6,
        1, 9, 9, 6, 3, 6, 2, 0])
Ep 5/5, it 372/469: loss train: 0.01, accuracy train: 1.00tensor([2, 9, 1, 1, 8, 5, 9, 9, 3, 3, 8, 8, 5, 9, 6, 7, 7, 4, 6, 2, 2, 3, 9, 4,
        0, 7, 6, 7, 9, 3, 0, 7, 8, 8, 3, 1, 3, 5, 8, 6, 5, 2, 1, 9, 5, 4, 3, 1,
        6, 8, 3, 4, 8, 2, 2, 2, 1, 6, 7, 9, 0, 5, 7, 3, 0, 1, 2, 1, 8, 1, 6, 0,
        9, 6, 2, 9, 7, 4, 2, 7, 6, 3, 6, 4, 3, 5, 8, 0, 6, 9, 1, 1, 1, 5, 2, 0,
        2, 9, 1, 7, 4, 1, 4, 9, 1, 0, 3, 8, 3, 8, 4, 9, 0, 5, 3, 1, 7, 0, 9, 1,
        4, 0, 2, 3, 6, 9, 1, 1])
Ep 5/5, it 373/469

Ep 5/5, it 388/469: loss train: 0.01, accuracy train: 1.00tensor([8, 8, 4, 3, 3, 6, 2, 8, 7, 9, 7, 1, 4, 2, 6, 7, 0, 9, 1, 9, 7, 5, 5, 0,
        8, 6, 4, 9, 1, 1, 4, 3, 4, 8, 1, 7, 6, 6, 3, 6, 9, 9, 9, 2, 2, 4, 6, 3,
        3, 7, 7, 2, 1, 4, 5, 7, 3, 4, 7, 2, 0, 2, 0, 3, 9, 7, 8, 6, 8, 9, 8, 6,
        7, 8, 6, 3, 2, 1, 2, 5, 0, 3, 5, 3, 3, 0, 2, 6, 5, 9, 6, 0, 8, 9, 4, 4,
        9, 8, 7, 4, 3, 2, 9, 5, 4, 8, 5, 2, 1, 4, 3, 4, 1, 0, 7, 1, 8, 4, 3, 0,
        1, 0, 7, 9, 5, 9, 0, 4])
Ep 5/5, it 389/469: loss train: 0.01, accuracy train: 1.00tensor([8, 4, 8, 1, 3, 7, 9, 0, 4, 7, 1, 0, 6, 0, 6, 8, 7, 1, 7, 6, 3, 2, 7, 6,
        0, 5, 8, 0, 0, 6, 1, 2, 6, 7, 8, 0, 5, 0, 9, 9, 7, 5, 3, 8, 1, 6, 2, 4,
        7, 7, 2, 3, 1, 3, 1, 4, 4, 3, 5, 7, 1, 0, 2, 1, 4, 6, 1, 9, 0, 1, 9, 0,
        7, 2, 2, 9, 9, 2, 8, 1, 7, 6, 5, 3, 4, 1, 4, 2, 5, 0, 6, 2, 1, 8, 9, 0,
        8, 7, 5, 8, 1, 6, 6, 0, 4, 5, 2, 7, 5, 5, 8, 6, 5, 2, 4, 8, 6, 7, 6, 0,
        1, 5, 1, 2, 3, 0, 5, 8])
Ep 5/5, it 390/469


KeyboardInterrupt



# 2 Convolutional Neural Networks (CNNs)

Our 4-layer MLP network works well, reaching a test accuracy of ~0.96. However, this network uses ~0.5M weights. We can use even deeper architectures with fewer parameters and take advantage of the 2D structure of the input data (images) using CNNs.

## 2.1 LeNet-5

Let us define a simple CNN with 2 convolutional layers with max-pooling and 3 Fully-Connected (FC) layers. In particular, we will implement a variant of the architecture called [LeNet-5 introduced by Yann LeCun in 1999](http://yann.lecun.com/exdb/publis/pdf/lecun-99.pdf). 


Your task is to define the simple LeNet-5 architecture depicted in the figure below. Print the architecture and comment on the number of parameters. Finally, train the model. To specify the layers, please additionally refer to [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) and [`F.max_pool2d`](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool2d.html?highlight=functional+max+pool+2d).

### A word about **padding**
As you saw in the lecture, a convolution will decrease the size of the signal if we only consider the *valid* convolutions, i.e. when the kernel (or mask) is fully within the input signal. In order to avoid this, we can allow the kernel to go slightly beyond the original signal by padding it with zeros at its beginning and end.

To visualize this, imagine a 1D signal of size $4$ and a kernel of size $3$: we can move the kernel on the input only twice, and the output will then have a size of $2$. However, if we add one $0$ at the beginning and at the end of the input, therefore changing its effective size to $6$, the kernel will be able to move four times over it, which will give an output of size $4$, the same as the original input.
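The 1D example above can be checked numerically. The following is a minimal sketch using `F.conv1d`; the signal values and the all-ones kernel are arbitrary choices made for illustration and are not part of the exercise.

```python
import torch
import torch.nn.functional as F

# Illustrative values (assumption): any signal of size 4 and kernel of size 3 would do.
signal = torch.tensor([1., 2., 3., 4.]).reshape(1, 1, 4)  # (batch, channels, length)
kernel = torch.ones(1, 1, 3)                              # (out_channels, in_channels, k)

valid = F.conv1d(signal, kernel)            # no padding: only the "valid" positions
same = F.conv1d(signal, kernel, padding=1)  # one zero added on each side of the signal

print(valid.shape[-1])  # 2
print(same.shape[-1])   # 4
```

With `padding=1` the kernel can be placed at four positions instead of two, so the output keeps the size of the input.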

**Notes for the implementation:**
* a kernel of size $k$ decreases the size of the feature maps by $k-1$, so to keep the size unchanged we should pad each side with $(k-1)/2$ zeros (for odd $k$)
* we want the max-pooling to reduce the size by a factor of 2, so its `kernel_size` should be 2 (its stride defaults to the kernel size)
* to go from convolutional to fully-connected layers, we need to reshape the tensor so that each sample becomes a flat vector
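To see how these notes determine the input size of the first fully-connected layer, the spatial size can be traced by hand. The sketch below is only an illustration: the helper names `conv_out_size` and `pool_out_size` are hypothetical, and it assumes $3\times3$ kernels with padding 1 and 16 channels after the second convolution, as in the solution of this notebook.

```python
def conv_out_size(size, kernel_size, padding=0, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out_size(size, kernel_size):
    """Output size of a max-pooling whose stride equals its kernel size."""
    return size // kernel_size


size = 28  # MNIST images are 28 x 28
for _ in range(2):  # two (convolution + max-pooling) stages
    size = conv_out_size(size, kernel_size=3, padding=1)  # padding (3 - 1) / 2 = 1 keeps the size
    size = pool_out_size(size, kernel_size=2)             # halves the size

flat_features = size * size * 16  # assumed: 16 channels after the second convolution
print(size, flat_features)  # 7 784
```

The value `flat_features` is the number of inputs the first fully-connected layer has to accept after the reshape.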

<img src="img/lenet5.png" width=800></img>

In [10]:
class CNN_LeNet(nn.Module):
    """ CNN expects inputs of shape (1, 28, 28).
    The initial 1 corresponds to the number of channels:
    here 1 for the grayscale value.
    """
    def __init__(self):
        super(CNN_LeNet, self).__init__()

        ### WRITE YOUR CODE HERE
        self.conv2d1 = nn.Conv2d(1, 6, 3, padding=1)
        self.conv2d2 = nn.Conv2d(6, 16, 3, padding=1)
        self.fc1 = nn.Linear(7 * 7 * 16, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        ### WRITE YOUR CODE HERE
        x = F.max_pool2d(F.relu(self.conv2d1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2d2(x)), 2)
        x = x.reshape((x.shape[0], -1))  # or we could use `x.flatten(-3)`
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
    
# Instantiate the model.
model_lenet = CNN_LeNet()

**Q:** What is the number of trainable parameters in our LeNet model?

**A:** For each convolution layer, we have $C_{out}$ kernels of size $K\times K\times C_{in}$, where $K$ is the `kernel_size`, $C_{in}$ is the number of channels in the input, and $C_{out}$ is the number of channels in the output. So we have:

(6 * 3 * 3 * 1 + 6) + (16 * 3 * 3 * 6 + 16) + (784 + 1) * 120 + (120 + 1) * 84 + (84 + 1) * 10 = 106'154
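The same count can be reproduced with a few lines of plain Python. This is a small sketch; the helper names `conv2d_params` and `linear_params` are hypothetical and only encode the formulas stated above (weights plus one bias per output channel or output unit).

```python
def conv2d_params(c_in, c_out, k):
    """C_out kernels of size K x K x C_in, plus one bias per output channel."""
    return c_out * (k * k * c_in) + c_out


def linear_params(n_in, n_out):
    """One weight per (input, output) pair, plus one bias per output unit."""
    return (n_in + 1) * n_out


lenet_params = (
    conv2d_params(1, 6, 3)
    + conv2d_params(6, 16, 3)
    + linear_params(7 * 7 * 16, 120)
    + linear_params(120, 84)
    + linear_params(84, 10)
)
print(lenet_params)  # 106154
```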

Let us check the architecture again and the number of trainable parameters. We can directly see that this architecture needs just about 20% of the parameters the MLP used.

In [11]:
model_lenet(torch.randn(1, 1, 28, 28)).shape

torch.Size([1, 10])

In [12]:
# Print out the architecture and check the number of parameters.
summary(model_lenet, input_size=(1, 1, 28, 28))

Layer (type:depth-idx)                   Output Shape              Param #
CNN_LeNet                                [1, 10]                   --
├─Conv2d: 1-1                            [1, 6, 28, 28]            60
├─Conv2d: 1-2                            [1, 16, 14, 14]           880
├─Linear: 1-3                            [1, 120]                  94,200
├─Linear: 1-4                            [1, 84]                   10,164
├─Linear: 1-5                            [1, 10]                   850
Total params: 106,154
Trainable params: 106,154
Non-trainable params: 0
Total mult-adds (M): 0.32
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.42
Estimated Total Size (MB): 0.49

We can now train our new model. As the `train_model()` function we wrote is agnostic to the network used, and because PyTorch automatically computes the gradients for us, we can directly reuse it with our CNN.

However, we do need to define a new optimizer to apply SGD to the weights of our new model!
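The reason is that an optimizer only updates the parameters it was given at construction. The following sketch illustrates this with two tiny stand-in models; `model_a`, `model_b` and the toy loss are hypothetical and unrelated to the networks of this exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Linear(4, 2)  # stand-in for the model the optimizer knows about
model_b = nn.Linear(4, 2)  # stand-in for a new model without its own optimizer
optimizer_a = torch.optim.SGD(model_a.parameters(), lr=0.1)

x = torch.randn(8, 4)
before_a = model_a.weight.detach().clone()
before_b = model_b.weight.detach().clone()

# Toy loss that depends on both models, so both receive gradients.
loss = model_a(x).pow(2).mean() + model_b(x).pow(2).mean()
loss.backward()
optimizer_a.step()  # only touches the parameters of model_a

a_changed = not torch.equal(before_a, model_a.weight)
b_changed = not torch.equal(before_b, model_b.weight)
print(a_changed, b_changed)  # True False
```

Even though `model_b` received gradients, its weights stay untouched because no optimizer was created for them.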

In [13]:
# Train the model
epochs = 5
learning_rate = 1e-1
optimizer_lenet = torch.optim.SGD(model_lenet.parameters(), lr=learning_rate)  ### WRITE YOUR CODE HERE
train_model(model_lenet, criterion, optimizer_lenet, dataloader_train, dataloader_test, epochs)

Ep 1/5, it 469/469: loss train: 0.12, accuracy train: 0.97, accuracy test: 0.94
Ep 2/5, it 469/469: loss train: 0.20, accuracy train: 0.96, accuracy test: 0.97
Ep 3/5, it 469/469: loss train: 0.09, accuracy train: 0.97, accuracy test: 0.98
Ep 4/5, it 469/469: loss train: 0.01, accuracy train: 1.00, accuracy test: 0.98
Ep 5/5, it 469/469: loss train: 0.01, accuracy train: 1.00, accuracy test: 0.98


## 2.2 3-layered CNN

Let us now define an even deeper CNN with 3 convolutional layers and only 2 FC layers. This network should reach higher accuracy (or converge faster) and still use fewer parameters than the previous architectures.

Your task is to implement a 3-layer CNN as depicted in the figure below. Check the number of parameters using `torchinfo`. Train the model and play around with the number of filters (kernels) used by every layer. Comment on your findings.

<img src="img/cnn.png" width=800></img>

In [10]:
class CNN(nn.Module):
    """ CNN, expects input shape (1, 28, 28).
    """
    def __init__(self, filters=(16, 32, 64)):
        """
        Args
        ----
        filters: tuple or list of 3 integers
            The number of filters (:=kernels) used in the network.
            See the above image for reference.
        """
        super(CNN, self).__init__()

        ### WRITE YOUR CODE HERE
        self.conv2d1 = nn.Conv2d(1, filters[0], 3, 1, padding=1)
        self.conv2d2 = nn.Conv2d(filters[0], filters[1], 3, 1, padding=1)
        self.conv2d3 = nn.Conv2d(filters[1], filters[2], 3, 1, padding=1)
        self.fc1 = nn.Linear(3 * 3 * filters[2], 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        ### WRITE YOUR CODE HERE
        x = F.max_pool2d(F.relu(self.conv2d1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2d2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2d3(x)), 2)
        x = x.reshape((x.shape[0], -1))
        x = F.relu(self.fc1(x))
        return self.fc2(x)

# Instantiate the model.
filters = (16, 32, 64)
model_cnn = CNN(filters)

In [11]:
# Print out the architecture and number of parameters.
summary(model_cnn, input_size=(1, 1, 28, 28))

Layer (type:depth-idx)                   Output Shape              Param #
CNN                                      [1, 10]                   --
├─Conv2d: 1-1                            [1, 16, 28, 28]           160
├─Conv2d: 1-2                            [1, 32, 14, 14]           4,640
├─Conv2d: 1-3                            [1, 64, 7, 7]             18,496
├─Linear: 1-4                            [1, 128]                  73,856
├─Linear: 1-5                            [1, 10]                   1,290
Total params: 98,442
Trainable params: 98,442
Non-trainable params: 0
Total mult-adds (M): 2.02
Input size (MB): 0.00
Forward/backward pass size (MB): 0.18
Params size (MB): 0.39
Estimated Total Size (MB): 0.57
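When playing with the number of filters, it can help to estimate the parameter count before building the model. The sketch below is an illustration only; `cnn_params` is a hypothetical helper that assumes the layer layout of the solution above ($3\times3$ kernels, a $3\times3$ feature map before the first FC layer, and FC layers with 128 and 10 outputs).

```python
def cnn_params(filters, k=3, fc_hidden=128, n_classes=10, final_size=3):
    """Trainable parameters of the 3-layer CNN for a given tuple of filter counts."""
    channels = (1,) + tuple(filters)
    # Each convolution has C_out * K * K * C_in weights and C_out biases.
    conv = sum(c_out * k * k * c_in + c_out
               for c_in, c_out in zip(channels[:-1], channels[1:]))
    # Spatial size goes 28 -> 14 -> 7 -> 3 after the three max-poolings.
    fc1 = (final_size * final_size * filters[-1] + 1) * fc_hidden
    fc2 = (fc_hidden + 1) * n_classes
    return conv + fc1 + fc2


print(cnn_params((16, 32, 64)))  # 98442
for filters in [(8, 16, 32), (16, 32, 64), (32, 64, 128)]:
    print(filters, cnn_params(filters))
```

For the default `(16, 32, 64)` this matches the total reported by `torchinfo`, and most of the parameters sit in the first fully-connected layer.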

And again, we train our new model by reusing the same dataset, criterion, and training function, but with a new optimizer defined for this new model:

In [16]:
# Train the model.
learning_rate = 1e-1
optimizer_cnn = torch.optim.SGD(model_cnn.parameters(), lr=learning_rate)  ### WRITE YOUR CODE HERE
train_model(model_cnn, criterion, optimizer_cnn, dataloader_train, dataloader_test, epochs)

Ep 1/5, it 469/469: loss train: 0.15, accuracy train: 0.95, accuracy test: 0.96
Ep 2/5, it 469/469: loss train: 0.01, accuracy train: 1.00, accuracy test: 0.98
Ep 3/5, it 469/469: loss train: 0.03, accuracy train: 1.00, accuracy test: 0.98
Ep 4/5, it 469/469: loss train: 0.02, accuracy train: 0.99, accuracy test: 0.98
Ep 5/5, it 469/469: loss train: 0.07, accuracy train: 0.98, accuracy test: 0.98


## 2.3 Trying out your own input

We have provided a tool for you to draw your own digits and test your network. Play around with the inputs to get a sense of how accurate your model is. Use the button `reset` to reset the canvas and `predict` to run the prediction on the current canvas image. You can use the button `blur` to blur your drawn image so that it looks closer to the samples from the training set.

**Note:** the following cell may not work properly in VS Code or JupyterLab.

In [17]:
dp = DrawingPad((28, 28), model_lenet)

<IPython.core.display.Javascript object>

Button(description='reset', style=ButtonStyle())

Button(description='blur', style=ButtonStyle())

Button(description='predict', style=ButtonStyle())

Prediction: 1

# 3 Build your own model (Optional)

In this part, we have prepared an empty model definition for you. You can use it to build your own model and test it on MNIST!

In [18]:
class MyModel(nn.Module):
    """ Build your own model.
    It should take as input images of shape (1, 28, 28).
    """
    def __init__(self):
        """
        Initialize your model.
        
        Feel free to add arguments if you want to.
        """
        super(MyModel, self).__init__()

        ### WRITE YOUR CODE HERE
        ...

    def forward(self, x):
        """Write the forward pass."""
        ### WRITE YOUR CODE HERE
        return ...

# Instantiate the model and print its architecture.
my_model = MyModel()
summary(my_model, input_size=(1, 1, 28, 28))

RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: []

In [None]:
# Train your model.
learning_rate = 1e-1
my_optimizer = torch.optim.SGD(my_model.parameters(), lr=learning_rate)  ### WRITE YOUR CODE HERE
train_model(my_model, criterion, my_optimizer, dataloader_train, dataloader_test, epochs)

# 4 Written questions

**Q.1** (MCQ) Which of the following statements is/are correct?

1. Gradient Descent updates the weights of the network in the direction opposite to the gradient of the loss function.
2. The primary role of an activation function is to introduce non-linearity into the deep network.
3. Strided convolution (stride > 1) is used to increase the size of the feature maps.
4. Max pooling has learnable parameters.

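**A.1** Correct answers are 1 and 2. Statement 3 is wrong because a stride larger than 1 reduces the size of the feature maps, and statement 4 is wrong because max pooling has no learnable parameters.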
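Statements 3 and 4 can also be checked directly in PyTorch. The snippet below is a small sketch; the single-channel layers and the $28\times28$ input are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)

# Statement 3: a strided convolution shrinks the feature map.
strided_conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
out_size = strided_conv(x).shape[-1]
print(out_size)  # 14

# Statement 4: max pooling has no learnable parameters.
pool = nn.MaxPool2d(kernel_size=2)
n_pool_params = sum(p.numel() for p in pool.parameters())
print(n_pool_params)  # 0
```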

**Q.2** What advantages do CNNs have over MLPs? Can we always use a CNN instead of an MLP?

**A.2** CNNs exploit spatial information, which can be very meaningful, e.g., the relation between neighbouring pixels in an image. Additionally, a convolution applies the same operation at every spatial location of the input, so convolutions can drastically reduce the number of weights compared to an MLP. Together, these properties make CNNs both powerful and less prone to overfitting than MLPs.

However, CNNs cannot always be used in place of MLPs. As said, they rely on the signal having some form of *spatiality*, such as pixels in an image or a 1D signal like the evolution of temperature over multiple days. Otherwise, the convolution does not make sense.

**Q.3** Consider the following 1D signal:

$$[-1, 4, -6, 0, 3, 2, 1, -3, 5]$$

What is the output of the 1D convolution with the mask $w$ and $\texttt{stride=2}$ (no padding is used)?

$$w = [-1, 2, 1]$$

**A.3** The output would be $[3, 9, 2, -2]$.
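This result can be verified with a short script. The sketch below uses a hypothetical helper `conv1d` that slides the mask over the signal without flipping it (i.e., cross-correlation, which is also the convention of PyTorch's convolution layers).

```python
def conv1d(signal, mask, stride=1):
    """Valid 1D convolution (no padding, no mask flipping) with a given stride."""
    k = len(mask)
    return [
        sum(s * w for s, w in zip(signal[i:i + k], mask))
        for i in range(0, len(signal) - k + 1, stride)
    ]


signal = [-1, 4, -6, 0, 3, 2, 1, -3, 5]
w = [-1, 2, 1]
output = conv1d(signal, w, stride=2)
print(output)  # [3, 9, 2, -2]
```

For instance, the first output value is $(-1)\cdot(-1) + 4\cdot2 + (-6)\cdot1 = 3$, and the mask then moves two positions to the right.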

**Q.4** Imagine you have a CNN that you use to classify greyscale images of animals, where each image is of size $128\times128\times1$. You then receive an updated version of that dataset where the images are in color, so they now have three channels: Red, Green, and Blue.

What is the new shape of the images? What minimal change can you make to your CNN so that it accepts and uses this new color information?

**A.4** The new images are $128\times128\times3$ as they have $3$ channels instead of $1$. To adapt our CNN to this, we can simply modify the number of input channels of the first convolutional layer from $1$ to $3$.
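In PyTorch, this change amounts to a single argument of the first layer. The following is a minimal sketch; the 16 output channels, the $3\times3$ kernel, and the variable names are illustrative assumptions, not part of the question.

```python
import torch
import torch.nn as nn

# Hypothetical first layer of the network, before and after the change.
conv_gray = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
conv_rgb = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

# PyTorch stores images channel-first: (batch, channels, height, width).
images_rgb = torch.randn(2, 3, 128, 128)
features = conv_rgb(images_rgb)
print(features.shape)  # torch.Size([2, 16, 128, 128])

params_gray = sum(p.numel() for p in conv_gray.parameters())
params_rgb = sum(p.numel() for p in conv_rgb.parameters())
print(params_gray, params_rgb)  # 160 448
```

The output shape of the layer is unchanged, so the rest of the network can stay as it is; only the kernels of the first layer grow to cover three input channels.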