## Project 3: Digit Recognition

Good programmers can use neural nets. Great programmers can make them. This section will guide you through the implementation of a simple neural net with an architecture as shown in the figure below. You will implement the net from scratch (you will probably never do this again, don't worry) so that you later feel confident about using libraries. We provide some skeleton code in neural_nets.py for you to fill in.

![neural_net](../Media/images_neuralnet.png)

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch

  from .autonotebook import tqdm as notebook_tqdm


### 3. Activation Functions

#### Rectified Linear Unit

In [3]:
def rectified_linear_unit(x):
    """ Returns the ReLU of x, or the maximum between 0 and x."""
    return np.maximum(x, 0)

#### Taking the Derivative

In [62]:
def rectified_linear_unit_derivative(x):
    """ Returns the derivative of ReLU."""

    x[x > 0] = 1
    x[x <= 0] = 0

    # ReLu returns 1 for all positive values and 0 for all negative values
    # (Returns 0 when the value is equal to 0 as well)
    return x

------
### 4. Training the Network

In [135]:
def output_layer_activation(x):
    """ Linear function, returns input as is. """
    return x

def output_layer_activation_derivative(x):
    """ Returns the derivative of a linear function: 1. """
    return 1


class NeuralNetwork():
    """
        Contains the following functions:
            -train: tunes parameters of the neural network based on error obtained from forward propagation.
            -predict: predicts the label of a feature vector based on the class's parameters.
            -train_neural_network: trains a neural network over all the data points for the specified number of epochs during initialization of the class.
            -test_neural_network: uses the parameters specified at the time in order to test that the neural network classifies the points given in testing_points within a margin of error.
    """

    def __init__(self):

        # DO NOT CHANGE PARAMETERS (Initialized to floats instead of ints)
        self.input_to_hidden_weights = np.matrix('1. 1.; 1. 1.; 1. 1.')  # (3,2)
        self.hidden_to_output_weights = np.matrix('1. 1. 1.')
        self.biases = np.matrix('0.; 0.; 0.')
        self.learning_rate = .001
        self.epochs_to_train = 10
        self.training_points = [((2,1), 10), ((3,3), 21), ((4,5), 32), ((6, 6), 42)]
        self.testing_points = [(1,1), (2,2), (3,3), (5,5), (10,10)]


    # ============================================================

    def train(self, x1 : float, x2: float, y):

        ### Forward propagation ###
        input_values = np.matrix([[x1],[x2]]) # 2 by 1

        # Calculate the input and activation of the hidden layer
        hidden_layer_weighted_input = np.dot(self.input_to_hidden_weights, input_values) + self.biases  # (3,2) * (2x1) = (3,1) + (3,1) = (3,1)
        hidden_layer_activation = rectified_linear_unit(hidden_layer_weighted_input)                    # (3,1)

        output = np.dot(self.hidden_to_output_weights, hidden_layer_activation)                         # (1,3) * (3,1) = (1,1)
        activated_output = output_layer_activation(output)

        ### Backpropagation ###

        # Compute gradients
        output_layer_error = (y - activated_output)                                                                                 # Derivative of cost function
        hidden_layer_error = self.hidden_to_output_weights.T * output_layer_activation_derivative(output) *  output_layer_error     # (3 by 1 matrix)

        bias_gradients = hidden_layer_error * 1                                                     # Derivative of Z with respect of the bias is 1 (Z = W*a + b) 
        hidden_to_output_weight_gradients = np.dot(output_layer_error, hidden_layer_activation.T)   # Derivative of Z with respect of the weights is the weighted input of the layer (Z' = a)
        input_to_hidden_weight_gradients = np.dot(hidden_layer_error, input_values.T)
        
        # print(bias_gradients.shape)
        # print(input_to_hidden_weight_gradients.shape)
        # print(hidden_to_output_weight_gradients.shape)
        # print("======================")

        # Use gradients to adjust weights and biases using gradient descent
        self.biases = self.biases - self.learning_rate * bias_gradients
        self.input_to_hidden_weights = self.input_to_hidden_weights - self.learning_rate * input_to_hidden_weight_gradients
        self.hidden_to_output_weights = self.hidden_to_output_weights - self.learning_rate * hidden_to_output_weight_gradients

    # ============================================================

    def predict(self, x1, x2):

        input_values = np.matrix([[x1],[x2]])
        print(input_values.shape)
        print(self.input_to_hidden_weights.shape)

        # Compute output for a single input(should be same as the forward propagation in training)
        hidden_layer_weighted_input = np.dot(self.input_to_hidden_weights, input_values) + self.biases
        hidden_layer_activation = rectified_linear_unit(hidden_layer_weighted_input)
        output = np.dot(self.hidden_to_output_weights, hidden_layer_activation)

        activated_output = output_layer_activation(output)
        return activated_output.item()

    # Run this to train your neural network once you complete the train method
    def train_neural_network(self):

        for epoch in range(self.epochs_to_train):
            for x,y in self.training_points:
                self.train(x[0], x[1], y)

    # Run this to test your neural network implementation for correctness after it is trained
    def test_neural_network(self):

        for point in self.testing_points:
            print("Point,", point, "Prediction,", self.predict(point[0], point[1]))
            if abs(self.predict(point[0], point[1]) - 7*point[0]) < 0.1:
                print("Test Passed")
            else:
                print("Point ", point[0], point[1], " failed to be predicted correctly.")
                return

In [136]:
x = NeuralNetwork()
x.train_neural_network()
x.test_neural_network()

(2, 1)
(3, 2)
Point, (1, 1) Prediction, -inf
(2, 1)
(3, 2)
Point  1 1  failed to be predicted correctly.


### 8. Fully-Connected Neural Networks

#### Training and Testing Accuracy Over Time

In [153]:
import _pickle as cPickle, gzip
import numpy as np
from tqdm import tqdm
import torch
import torch.autograd as autograd
import torch.nn.functional as F
import torch.nn as nn
import sys

sys.path.append("..")
from mnist.utils import *
from mnist.part2_mnist.train_utils import batchify_data, run_epoch, train_model

# Specify seed for deterministic behavior, then shuffle. 
# Do not change seed for official submissions to edx
np.random.seed(12321)  # for reproducibility
torch.manual_seed(12321)  # for reproducibility

# Load the dataset
num_classes = 10
X_train, y_train, X_test, y_test = get_MNIST_data()

# Split into train and dev
dev_split_index = int(9 * len(X_train) / 10)
X_dev = X_train[dev_split_index:]
y_dev = y_train[dev_split_index:]
X_train = X_train[:dev_split_index]
y_train = y_train[:dev_split_index]

permutation = np.array([i for i in range(len(X_train))])
np.random.shuffle(permutation)
X_train = [X_train[i] for i in permutation]
y_train = [y_train[i] for i in permutation]

# Split dataset into batches
batch_size = 32
train_batches = batchify_data(X_train, y_train, batch_size)
dev_batches = batchify_data(X_dev, y_dev, batch_size)
test_batches = batchify_data(X_test, y_test, batch_size)

#################################
## Model specification TODO
model = nn.Sequential(
            nn.Linear(784, 128),
            nn.LeakyReLU(),
            nn.Linear(128, 10),
        )
lr=0.1
momentum=0
##################################

train_model(train_batches, dev_batches, model, lr=lr, momentum=momentum)

## Evaluate the model on test data
loss, accuracy = run_epoch(test_batches, model.eval(), None)

print ("Loss on test set:"  + str(loss) + " Accuracy on test set: " + str(accuracy))

# ===========================
# INITIAL ARCHITECTURE

#                                        TEST
#                       ----------------------------------------
# Baseline:             Accuracy = 0.920472 / Loss = 0.267226
# Batch Size 64:        Accuracy = 0.931490 / Loss = 0.24238465
# Learning Rate 0.01:   Accuracy = 0.920673 / Loss = 0.278865
# Momentum 0.9:         Accuracy = 0.859375 / Loss = 0.541848
# Leaky ReLU:           Accuracy = 0.920773 / Loss = 0.2689

# ============================
# HIDDEN REPRESENTATION WITH 128 NEURONS

#                           VALIDATION              TEST
#                       ----------------------------------------
# Baseline:             Accuracy = 0.978275   /   0.977163
# Batch Size 64:        Accuracy = 0.976983   /   0.97435
# Learning Rate 0.01:   Accuracy = 0.955047   /   0.942708
# Momentum 0.9:         Accuracy = 0.963402   /   0.962139
# Leaky ReLU:           Accuracy = 0.978944   /   0.977263

-------------
Epoch 1:



100%|██████████| 1687/1687 [00:00<00:00, 1855.88it/s]


Train loss: 0.366998 | Train accuracy: 0.897025


100%|██████████| 187/187 [00:00<00:00, 5053.80it/s]


Val loss:   0.179281 | Val accuracy:   0.947527
-------------
Epoch 2:



100%|██████████| 1687/1687 [00:00<00:00, 1815.93it/s]


Train loss: 0.175322 | Train accuracy: 0.948818


100%|██████████| 187/187 [00:00<00:00, 4794.60it/s]


Val loss:   0.126170 | Val accuracy:   0.966076
-------------
Epoch 3:



100%|██████████| 1687/1687 [00:00<00:00, 1819.84it/s]


Train loss: 0.123239 | Train accuracy: 0.965230


100%|██████████| 187/187 [00:00<00:00, 4794.81it/s]


Val loss:   0.104606 | Val accuracy:   0.970922
-------------
Epoch 4:



100%|██████████| 1687/1687 [00:00<00:00, 1757.29it/s]


Train loss: 0.095654 | Train accuracy: 0.973085


100%|██████████| 187/187 [00:00<00:00, 4921.19it/s]


Val loss:   0.092678 | Val accuracy:   0.973095
-------------
Epoch 5:



100%|██████████| 1687/1687 [00:00<00:00, 1783.30it/s]


Train loss: 0.077786 | Train accuracy: 0.977882


100%|██████████| 187/187 [00:00<00:00, 4794.69it/s]


Val loss:   0.084781 | Val accuracy:   0.975434
-------------
Epoch 6:



100%|██████████| 1687/1687 [00:00<00:00, 1810.08it/s]


Train loss: 0.065020 | Train accuracy: 0.981884


100%|██████████| 187/187 [00:00<00:00, 5054.16it/s]


Val loss:   0.079922 | Val accuracy:   0.977106
-------------
Epoch 7:



100%|██████████| 1687/1687 [00:00<00:00, 1821.82it/s]


Train loss: 0.055229 | Train accuracy: 0.984903


100%|██████████| 187/187 [00:00<00:00, 4794.78it/s]


Val loss:   0.076733 | Val accuracy:   0.976604
-------------
Epoch 8:



100%|██████████| 1687/1687 [00:00<00:00, 1779.53it/s]


Train loss: 0.047383 | Train accuracy: 0.987478


100%|██████████| 187/187 [00:00<00:00, 4794.75it/s]


Val loss:   0.074332 | Val accuracy:   0.977607
-------------
Epoch 9:



100%|██████████| 1687/1687 [00:00<00:00, 1753.63it/s]


Train loss: 0.040840 | Train accuracy: 0.989256


100%|██████████| 187/187 [00:00<00:00, 4794.66it/s]


Val loss:   0.072652 | Val accuracy:   0.978610
-------------
Epoch 10:



100%|██████████| 1687/1687 [00:00<00:00, 1759.12it/s]


Train loss: 0.035235 | Train accuracy: 0.991146


100%|██████████| 187/187 [00:00<00:00, 5342.92it/s]


Val loss:   0.070931 | Val accuracy:   0.978944


100%|██████████| 312/312 [00:00<00:00, 5114.76it/s]

Loss on test set:0.07416471567711806 Accuracy on test set: 0.9772636217948718





### 9. Convolutional Neural Networks

#### Convolutional Neural Networks

In [None]:
import _pickle as c_pickle, gzip
import numpy as np
from tqdm import tqdm
import torch
import torch.autograd as autograd
import torch.nn.functional as F
import torch.nn as nn
import sys

sys.path.append("..")
from mnist.utils  import *
from mnist.part2_mnist.train_utils import batchify_data, run_epoch, train_model, Flatten

# Specify seed for deterministic behavior, then shuffle. Do not change seed for official submissions to edx
np.random.seed(12321)  # for reproducibility
torch.manual_seed(12321)

# Load the dataset
num_classes = 10
X_train, y_train, X_test, y_test = get_MNIST_data()

# We need to rehape the data back into a 1x28x28 image
X_train = np.reshape(X_train, (X_train.shape[0], 1, 28, 28))
X_test = np.reshape(X_test, (X_test.shape[0], 1, 28, 28))

# Split into train and dev
dev_split_index = int(9 * len(X_train) / 10)
X_dev = X_train[dev_split_index:]
y_dev = y_train[dev_split_index:]
X_train = X_train[:dev_split_index]
y_train = y_train[:dev_split_index]

permutation = np.array([i for i in range(len(X_train))])
np.random.shuffle(permutation)
X_train = [X_train[i] for i in permutation]
y_train = [y_train[i] for i in permutation]

# Split dataset into batches
batch_size = 32
train_batches = batchify_data(X_train, y_train, batch_size)
dev_batches = batchify_data(X_dev, y_dev, batch_size)
test_batches = batchify_data(X_test, y_test, batch_size)

#################################
## Model specification TODO
model = nn.Sequential(
            nn.Conv2d(1, 32, (3, 3)),         # Channels: 1 (Monochrome image). 32 Image channels
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),             # Image reduced from 28x28 to 
            nn.Conv2d(32, 64, (3, 3)),        # The last Conv2d layer outputs 32 image channels. Here they are expanded to 64 channels
            nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Flatten(),
            nn.Linear(1600, 128),              # Input
            nn.Dropout(0.5),
            nn.Linear(128, 10),
        )
##################################

train_model(train_batches, dev_batches, model, nesterov=True)

## Evaluate the model on test data
loss, accuracy = run_epoch(test_batches, model.eval(), None)

print ("Loss on test set:"  + str(loss) + " Accuracy on test set: " + str(accuracy))


### 10. Overlapping, multi-digit MNIST

#### Fully Connected Network

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from mnist.part2_twodigit.train_utils import batchify_data, run_epoch, train_model, Flatten
import mnist.part2_twodigit.utils_multiMNIST as U
path_to_data_dir = './mnist/Datasets/'
use_mini_dataset = True

batch_size = 64
nb_classes = 10
nb_epoch = 30
num_classes = 10
img_rows, img_cols = 42, 28 # input image dimensions

class MLP(nn.Module):

    def __init__(self, input_dimension):
        super(MLP, self).__init__()

        self.flatten = Flatten()
        self.linear1 = nn.Linear(input_dimension, 64)

        # 20 output classes (Pairs of 10 possible digits different digits)
        self.linear2 = nn.Linear(64, 20)

    def forward(self, x):
        xf = self.flatten(x)

        # A ReLu activation function because... why not?
        xl1 = self.linear1(xf)

        # You need to use softmax because its a multi-class classification problem
        xl2 = self.linear2(xl1)

        # Re-structure the output as two separate variables
        out_first_digit = xl2[:, :10]
        out_second_digit = xl2[:, 10:]

        return out_first_digit, out_second_digit

def main():
    X_train, y_train, X_test, y_test = U.get_data(path_to_data_dir, use_mini_dataset)
    print(y_train[0].shape, y_train[1].shape)

    # Split into train and dev
    dev_split_index = int(9 * len(X_train) / 10)
    X_dev = X_train[dev_split_index:]
    y_dev = [y_train[0][dev_split_index:], y_train[1][dev_split_index:]]
    X_train = X_train[:dev_split_index]
    y_train = [y_train[0][:dev_split_index], y_train[1][:dev_split_index]]

    permutation = np.array([i for i in range(len(X_train))])
    np.random.shuffle(permutation)
    X_train = [X_train[i] for i in permutation]
    y_train = [[y_train[0][i] for i in permutation], [y_train[1][i] for i in permutation]]

    # Split dataset into batches
    train_batches = batchify_data(X_train, y_train, batch_size)
    dev_batches = batchify_data(X_dev, y_dev, batch_size)
    test_batches = batchify_data(X_test, y_test, batch_size)

    # Load model
    input_dimension = img_rows * img_cols
    model = MLP(input_dimension) # TODO add proper layers to MLP class above

    # Train
    train_model(train_batches, dev_batches, model)

    ## Evaluate the model on test data
    loss, acc = run_epoch(test_batches, model.eval(), None)
    print('Test loss1: {:.6f}  accuracy1: {:.6f}  loss2: {:.6f}   accuracy2: {:.6f}'.format(loss[0], acc[0], loss[1], acc[1]))

np.random.seed(12321)  # for reproducibility
torch.manual_seed(12321)  # for reproducibility
main()

#### Convolutional Model

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from mnist.part2_twodigit.train_utils import batchify_data, run_epoch, train_model, Flatten
import mnist.part2_twodigit.utils_multiMNIST as U
path_to_data_dir = './mnist/Datasets/'
use_mini_dataset = True

batch_size = 64
nb_classes = 10
nb_epoch = 30
num_classes = 10
img_rows, img_cols = 42, 28 # input image dimensions

class CNN(nn.Module):

    def __init__(self, input_dimension):
        super(CNN, self).__init__()
        self.conv_1 = nn.Conv2d(1, 32, (3,3))
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d((2, 2))
        self.conv_2 = nn.Conv2d(32, 64, (3, 3))
        self.flatten = Flatten()
        self.linear1 = nn.Linear(2880, 64)
        self.dropout = nn.Dropout(p = 0.5)
        self.linear2 = nn.Linear(64, 20)

    def forward(self, x):

        # TODO use model layers to predict the two digits

        x = self.conv_1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.conv_2(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.dropout(x)
        x = self.linear2(x)
        out_first_digit = x[:, :10]
        out_second_digit = x[:, 10:]
        return out_first_digit, out_second_digit

def main():
    X_train, y_train, X_test, y_test = U.get_data(path_to_data_dir, use_mini_dataset)

    # Split into train and dev
    dev_split_index = int(9 * len(X_train) / 10)
    X_dev = X_train[dev_split_index:]
    y_dev = [y_train[0][dev_split_index:], y_train[1][dev_split_index:]]
    X_train = X_train[:dev_split_index]
    y_train = [y_train[0][:dev_split_index], y_train[1][:dev_split_index]]

    permutation = np.array([i for i in range(len(X_train))])
    np.random.shuffle(permutation)
    X_train = [X_train[i] for i in permutation]
    y_train = [[y_train[0][i] for i in permutation], [y_train[1][i] for i in permutation]]

    # Split dataset into batches
    train_batches = batchify_data(X_train, y_train, batch_size)
    dev_batches = batchify_data(X_dev, y_dev, batch_size)
    test_batches = batchify_data(X_test, y_test, batch_size)

    # Load model
    input_dimension = img_rows * img_cols
    model = CNN(input_dimension) # TODO add proper layers to CNN class above

    # Train
    train_model(train_batches, dev_batches, model)

    ## Evaluate the model on test data
    loss, acc = run_epoch(test_batches, model.eval(), None)
    print('Test loss1: {:.6f}  accuracy1: {:.6f}  loss2: {:.6f}   accuracy2: {:.6f}'.format(loss[0], acc[0], loss[1], acc[1]))

if __name__ == '__main__':
    # Specify seed for deterministic behavior, then shuffle. Do not change seed for official submissions to edx
    np.random.seed(12321)  # for reproducibility
    torch.manual_seed(12321)  # for reproducibility
    main()


  'x': torch.tensor(x_data[i:i + batch_size],


-------------
Epoch 1:



100%|██████████| 562/562 [00:15<00:00, 36.28it/s]


Train | loss1: 0.821744  accuracy1: 0.731067 | loss2: 0.868528  accuracy2: 0.705488


100%|██████████| 62/62 [00:00<00:00, 87.32it/s]


Valid | loss1: 0.212937  accuracy1: 0.936744 | loss2: 0.249888  accuracy2: 0.922379
-------------
Epoch 2:



100%|██████████| 562/562 [00:15<00:00, 36.99it/s]


Train | loss1: 0.277024  accuracy1: 0.912784 | loss2: 0.326594  accuracy2: 0.891070


100%|██████████| 62/62 [00:00<00:00, 80.10it/s]


Valid | loss1: 0.140246  accuracy1: 0.956401 | loss2: 0.160948  accuracy2: 0.947329
-------------
Epoch 3:



100%|██████████| 562/562 [00:15<00:00, 36.34it/s]


Train | loss1: 0.202184  accuracy1: 0.938000 | loss2: 0.244636  accuracy2: 0.919512


100%|██████████| 62/62 [00:00<00:00, 86.71it/s]


Valid | loss1: 0.124825  accuracy1: 0.960433 | loss2: 0.130571  accuracy2: 0.959425
-------------
Epoch 4:



100%|██████████| 562/562 [00:15<00:00, 36.69it/s]


Train | loss1: 0.162735  accuracy1: 0.951151 | loss2: 0.197964  accuracy2: 0.934831


100%|██████████| 62/62 [00:00<00:00, 91.04it/s]


Valid | loss1: 0.096605  accuracy1: 0.971018 | loss2: 0.104877  accuracy2: 0.968246
-------------
Epoch 5:



100%|██████████| 562/562 [00:15<00:00, 36.93it/s]


Train | loss1: 0.135950  accuracy1: 0.957935 | loss2: 0.168082  accuracy2: 0.945257


100%|██████████| 62/62 [00:00<00:00, 87.45it/s]


Valid | loss1: 0.091181  accuracy1: 0.973034 | loss2: 0.089136  accuracy2: 0.970514
-------------
Epoch 6:



100%|██████████| 562/562 [00:14<00:00, 37.48it/s]


Train | loss1: 0.115992  accuracy1: 0.964246 | loss2: 0.148391  accuracy2: 0.952541


100%|██████████| 62/62 [00:00<00:00, 85.99it/s]


Valid | loss1: 0.088006  accuracy1: 0.973286 | loss2: 0.081496  accuracy2: 0.971018
-------------
Epoch 7:



100%|██████████| 562/562 [00:14<00:00, 37.88it/s]


Train | loss1: 0.103298  accuracy1: 0.967749 | loss2: 0.129339  accuracy2: 0.957073


100%|██████████| 62/62 [00:00<00:00, 86.96it/s]


Valid | loss1: 0.078259  accuracy1: 0.976058 | loss2: 0.078203  accuracy2: 0.973034
-------------
Epoch 8:



100%|██████████| 562/562 [00:15<00:00, 36.09it/s]


Train | loss1: 0.090895  accuracy1: 0.971919 | loss2: 0.119039  accuracy2: 0.960854


100%|██████████| 62/62 [00:01<00:00, 40.82it/s]


Valid | loss1: 0.074605  accuracy1: 0.977319 | loss2: 0.072699  accuracy2: 0.975806
-------------
Epoch 9:



100%|██████████| 562/562 [00:18<00:00, 29.69it/s]


Train | loss1: 0.082930  accuracy1: 0.973532 | loss2: 0.106901  accuracy2: 0.964746


100%|██████████| 62/62 [00:00<00:00, 81.47it/s]


Valid | loss1: 0.074024  accuracy1: 0.977067 | loss2: 0.072612  accuracy2: 0.975554
-------------
Epoch 10:



100%|██████████| 562/562 [00:15<00:00, 35.14it/s]


Train | loss1: 0.077913  accuracy1: 0.976340 | loss2: 0.097153  accuracy2: 0.967388


100%|██████████| 62/62 [00:00<00:00, 86.59it/s]


Valid | loss1: 0.069849  accuracy1: 0.980847 | loss2: 0.067006  accuracy2: 0.978831
-------------
Epoch 11:



100%|██████████| 562/562 [00:15<00:00, 36.36it/s]


Train | loss1: 0.069537  accuracy1: 0.977730 | loss2: 0.089413  accuracy2: 0.969501


100%|██████████| 62/62 [00:00<00:00, 86.47it/s]


Valid | loss1: 0.072650  accuracy1: 0.977823 | loss2: 0.070378  accuracy2: 0.978327
-------------
Epoch 12:



100%|██████████| 562/562 [00:15<00:00, 36.54it/s]


Train | loss1: 0.063722  accuracy1: 0.980038 | loss2: 0.082481  accuracy2: 0.972337


100%|██████████| 62/62 [00:00<00:00, 82.89it/s]


Valid | loss1: 0.071792  accuracy1: 0.978579 | loss2: 0.068295  accuracy2: 0.976310
-------------
Epoch 13:



100%|██████████| 562/562 [00:14<00:00, 38.50it/s]


Train | loss1: 0.059943  accuracy1: 0.980399 | loss2: 0.075474  accuracy2: 0.975117


100%|██████████| 62/62 [00:00<00:00, 90.91it/s]


Valid | loss1: 0.070934  accuracy1: 0.978327 | loss2: 0.057146  accuracy2: 0.980847
-------------
Epoch 14:



100%|██████████| 562/562 [00:14<00:00, 39.73it/s]


Train | loss1: 0.054315  accuracy1: 0.982874 | loss2: 0.070090  accuracy2: 0.976284


100%|██████████| 62/62 [00:00<00:00, 92.40it/s]


Valid | loss1: 0.072688  accuracy1: 0.979587 | loss2: 0.063578  accuracy2: 0.979083
-------------
Epoch 15:



100%|██████████| 562/562 [00:14<00:00, 39.98it/s]


Train | loss1: 0.052298  accuracy1: 0.983402 | loss2: 0.067280  accuracy2: 0.977035


100%|██████████| 62/62 [00:00<00:00, 93.66it/s]


Valid | loss1: 0.073164  accuracy1: 0.980091 | loss2: 0.059399  accuracy2: 0.980595
-------------
Epoch 16:



100%|██████████| 562/562 [00:14<00:00, 39.33it/s]


Train | loss1: 0.047330  accuracy1: 0.984987 | loss2: 0.060950  accuracy2: 0.978509


100%|██████████| 62/62 [00:00<00:00, 89.99it/s]


Valid | loss1: 0.075289  accuracy1: 0.979083 | loss2: 0.061226  accuracy2: 0.979335
-------------
Epoch 17:



100%|██████████| 562/562 [00:14<00:00, 37.47it/s]


Train | loss1: 0.043419  accuracy1: 0.985459 | loss2: 0.059985  accuracy2: 0.979148


100%|██████████| 62/62 [00:00<00:00, 81.58it/s]


Valid | loss1: 0.065354  accuracy1: 0.983115 | loss2: 0.058397  accuracy2: 0.980847
-------------
Epoch 18:



100%|██████████| 562/562 [00:15<00:00, 37.40it/s]


Train | loss1: 0.041996  accuracy1: 0.986015 | loss2: 0.055111  accuracy2: 0.980955


100%|██████████| 62/62 [00:00<00:00, 84.12it/s]


Valid | loss1: 0.066594  accuracy1: 0.982611 | loss2: 0.060197  accuracy2: 0.980847
-------------
Epoch 19:



100%|██████████| 562/562 [00:15<00:00, 37.31it/s]


Train | loss1: 0.038634  accuracy1: 0.987544 | loss2: 0.051527  accuracy2: 0.982707


100%|██████████| 62/62 [00:00<00:00, 86.59it/s]


Valid | loss1: 0.070924  accuracy1: 0.981351 | loss2: 0.057170  accuracy2: 0.979335
-------------
Epoch 20:



100%|██████████| 562/562 [00:14<00:00, 38.50it/s]


Train | loss1: 0.037148  accuracy1: 0.987906 | loss2: 0.051058  accuracy2: 0.982596


100%|██████████| 62/62 [00:00<00:00, 89.86it/s]


Valid | loss1: 0.069249  accuracy1: 0.981855 | loss2: 0.056713  accuracy2: 0.981351
-------------
Epoch 21:



100%|██████████| 562/562 [00:14<00:00, 38.89it/s]


Train | loss1: 0.034849  accuracy1: 0.988601 | loss2: 0.044885  accuracy2: 0.983819


100%|██████████| 62/62 [00:00<00:00, 91.31it/s]


Valid | loss1: 0.081609  accuracy1: 0.979587 | loss2: 0.060057  accuracy2: 0.979839
-------------
Epoch 22:



100%|██████████| 562/562 [00:14<00:00, 38.45it/s]


Train | loss1: 0.033033  accuracy1: 0.989268 | loss2: 0.042783  accuracy2: 0.984959


100%|██████████| 62/62 [00:00<00:00, 96.42it/s]


Valid | loss1: 0.069787  accuracy1: 0.983367 | loss2: 0.052537  accuracy2: 0.983115
-------------
Epoch 23:



100%|██████████| 562/562 [00:13<00:00, 41.02it/s]


Train | loss1: 0.029561  accuracy1: 0.990631 | loss2: 0.039795  accuracy2: 0.985960


100%|██████████| 62/62 [00:00<00:00, 89.72it/s]


Valid | loss1: 0.069037  accuracy1: 0.983619 | loss2: 0.054760  accuracy2: 0.981099
-------------
Epoch 24:



100%|██████████| 562/562 [00:14<00:00, 39.95it/s]


Train | loss1: 0.028170  accuracy1: 0.990909 | loss2: 0.036678  accuracy2: 0.986988


100%|██████████| 62/62 [00:00<00:00, 97.33it/s]


Valid | loss1: 0.077449  accuracy1: 0.981855 | loss2: 0.056224  accuracy2: 0.982611
-------------
Epoch 25:



100%|██████████| 562/562 [00:13<00:00, 40.87it/s]


Train | loss1: 0.026189  accuracy1: 0.990964 | loss2: 0.040427  accuracy2: 0.985737


100%|██████████| 62/62 [00:00<00:00, 94.51it/s]


Valid | loss1: 0.080519  accuracy1: 0.981603 | loss2: 0.049570  accuracy2: 0.983619
-------------
Epoch 26:



100%|██████████| 562/562 [00:13<00:00, 41.77it/s]


Train | loss1: 0.026640  accuracy1: 0.990881 | loss2: 0.035259  accuracy2: 0.987016


100%|██████████| 62/62 [00:00<00:00, 96.12it/s] 


Valid | loss1: 0.073372  accuracy1: 0.983619 | loss2: 0.050028  accuracy2: 0.982863
-------------
Epoch 27:



100%|██████████| 562/562 [00:13<00:00, 42.87it/s]


Train | loss1: 0.024795  accuracy1: 0.991687 | loss2: 0.030134  accuracy2: 0.989713


100%|██████████| 62/62 [00:00<00:00, 99.04it/s] 


Valid | loss1: 0.077170  accuracy1: 0.983367 | loss2: 0.053747  accuracy2: 0.983871
-------------
Epoch 28:



100%|██████████| 562/562 [00:15<00:00, 36.85it/s]


Train | loss1: 0.024006  accuracy1: 0.991576 | loss2: 0.031286  accuracy2: 0.988990


100%|██████████| 62/62 [00:00<00:00, 80.62it/s]


Valid | loss1: 0.073595  accuracy1: 0.983619 | loss2: 0.048271  accuracy2: 0.984375
-------------
Epoch 29:



100%|██████████| 562/562 [00:16<00:00, 34.49it/s]


Train | loss1: 0.022173  accuracy1: 0.992215 | loss2: 0.030829  accuracy2: 0.988573


100%|██████████| 62/62 [00:00<00:00, 77.99it/s]


Valid | loss1: 0.073494  accuracy1: 0.984375 | loss2: 0.055236  accuracy2: 0.981603
-------------
Epoch 30:



100%|██████████| 562/562 [00:16<00:00, 35.08it/s]


Train | loss1: 0.023195  accuracy1: 0.991993 | loss2: 0.029851  accuracy2: 0.989046


100%|██████████| 62/62 [00:00<00:00, 84.47it/s]


Valid | loss1: 0.076826  accuracy1: 0.984123 | loss2: 0.055838  accuracy2: 0.982611


100%|██████████| 62/62 [00:00<00:00, 85.40it/s]

Test loss1: 0.074613  accuracy1: 0.981099  loss2: 0.088537   accuracy2: 0.974294



