# Workshop 4: Training and Evaluation

In [None]:
import numpy as np
import random

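def get_data(path):
    """Load an MNIST CSV file where each line is a label followed by 784 pixel values."""
    # The context manager closes the file even if parsing fails
    with open(path, 'r') as f:
        lines = f.readlines()

    images = np.zeros((len(lines), 784))
    labels = np.zeros((len(lines), 10))
    for index, line in enumerate(lines):
        line = line.strip()
        label = int(line[0])
        images[index, :] = np.fromstring(line[2:], dtype=int, sep=',')
        # One-hot encode the label so that entry k of the vector stands for digit k
        labels[index, label] = 1.0

    # Scale pixel values from [0, 255] to [0, 1]
    return images / 255, labels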

def sigmoid(x):
    return np.where(x >= 0, 
                    1 / (1 + np.exp(-x)), 
                    np.exp(x) / (1 + np.exp(x)))

Last week, we implemented backpropagation and gradient descent for our neural network. This week, we will finally put everything together and train our network to read hand-written digits.

## Training / Validation

If you recall, we will need to show our network many different examples of numbers in order for it to learn effectively. The images we show our network are known as the *training set*. We will also want some way to evaluate the performance of our network after it learns. For this, we will need a separate set of data that the network has never seen before. This is the *validation set*. The training set should be larger than the validation set. In our case, there are 60000 (image, label) pairs in the training set and 10000 (image, label) pairs in the validation set.

The training process is simple. For every image in the training set, forward pass the image into the network, then backpropagate and update the weights. When we make it through the entire training set, we say that we have completed one *epoch*. Typically, we will need to train for several epochs -- that is to say, we often need to run through the training set multiple times.
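The loop described above can be sketched as follows. This is only a minimal illustration, not the workshop solution: `TinyModel` is a hypothetical linear stand-in that exposes the same `forward` / `backpropagate` / `update` interface as our network, and shuffling the samples each epoch is an optional extra that is not required by the description above.

```python
import numpy as np

class TinyModel:
    """Hypothetical stand-in with the same method names as our network."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.1 * rng.standard_normal((n_in, n_out))
        self.W_grad = np.zeros_like(self.W)
        self.x = None
        self.Z = None

    def forward(self, x):
        self.x = x
        self.Z = x @ self.W

    def backpropagate(self, label):
        # Gradient of sum((Z - label) ** 2) with respect to W
        self.W_grad = np.outer(self.x, 2 * (self.Z - label))

    def update(self, lr):
        self.W -= lr * self.W_grad


def train(model, images, labels, epochs, lr, seed=0):
    """Run `epochs` passes over the data and return the mean loss of each pass."""
    rng = np.random.default_rng(seed)
    history = []
    for epoch in range(epochs):
        total_loss = 0.0
        # Visit the samples in a fresh random order each epoch (optional)
        for i in rng.permutation(len(images)):
            model.forward(images[i])
            total_loss += np.sum((model.Z - labels[i]) ** 2)
            model.backpropagate(labels[i])
            model.update(lr)
        history.append(total_loss / len(images))
    return history


# Toy data: 20 samples with 4 inputs and 3 outputs
rng = np.random.default_rng(1)
X = rng.random((20, 4))
Y = X @ rng.standard_normal((4, 3))

history = train(TinyModel(4, 3), X, Y, epochs=30, lr=0.1)
print(history[0], history[-1])
```

The mean loss per epoch should shrink as training proceeds. The same loop structure applies to the real network, with the MNIST arrays in place of the toy data.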

We also want to evaluate our network. For every image in the validation set, we forward pass the image into the network and look at the output, a vector of length 10. We do not update the weights during evaluation. Our network is trained to place a 1 in the correct entry of the vector and a 0 in all other entries. Therefore, the position of the largest entry in the network's output is the network's guess for the digit in the image. We then compare the network's answer to the correct answer, the given label. After going through every image in the validation set, we record the proportion of samples that the network got correct. This proportion is the validation accuracy.
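As a small illustration of the comparison step, the sketch below computes accuracy from a batch of output vectors and one-hot labels. The function name `accuracy` and the three-class toy arrays are made up for this example; the real outputs would have length 10.

```python
import numpy as np

def accuracy(outputs, labels):
    """Fraction of rows where the largest output matches the one-hot label."""
    predicted = np.argmax(outputs, axis=1)  # position of the largest output
    correct = np.argmax(labels, axis=1)     # position of the 1 in each label
    return float(np.mean(predicted == correct))

# Four toy samples with three classes each
outputs = np.array([[0.1, 0.8, 0.1],
                    [0.7, 0.2, 0.1],
                    [0.2, 0.3, 0.5],
                    [0.6, 0.3, 0.1]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

print(accuracy(outputs, labels))  # 0.75, since the last sample is misclassified
```

Comparing positions on both sides means the check does not depend on which position encodes which digit, as long as the labels and the network agree.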

Sometimes, it can be helpful to run through the validation set after every epoch to see how the network's performance improves over time. Once the validation accuracy stops improving, more epochs are unlikely to help, so this is a useful guide for deciding how many epochs to train for.

Also, in order to train our network effectively, we will need to pick an appropriate learning rate. If the learning rate is too large, the network will take steps that are too large during gradient descent, and training will be ineffective and unstable. If the learning rate is too small, training could be slow. We will find a good learning rate through trial and error. Remember, the learning rate should be a small positive number, typically less than 1.
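To see both failure modes in isolation, here is a toy sketch that runs gradient descent on the one-variable function f(w) = w², whose minimum is at w = 0. The function `descend` and the three learning rates are chosen purely for illustration; good values for our network will differ.

```python
def descend(lr, steps=50, w=1.0):
    """Run `steps` gradient descent updates on f(w) = w ** 2, starting from w."""
    for _ in range(steps):
        w -= lr * 2 * w  # the gradient of w ** 2 is 2 * w
    return w

for lr in (0.001, 0.1, 1.1):
    print(lr, descend(lr))
```

With the smallest rate, `w` has barely moved from its starting value after 50 steps. With the middle rate it ends up very close to 0. With the largest rate each step overshoots the minimum by more than the previous one, so `w` grows instead of shrinking.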

Try to write your own training and validation methods. The code for forward pass and backpropagation is provided for you. See if you can get a validation accuracy of over 90 percent.

In [None]:
class NeuralNetwork():
    """
    A Fully Connected Neural Network. There are 784 input layer nodes, 12 hidden layer nodes, and 10 output layer
    nodes.
    """
    def __init__(self):
        rng = np.random.default_rng(888)
        
        # Arrays to hold node values
        self.N = np.zeros((784, ))
        self.H = np.zeros((12, ))
        self.Z = np.zeros((10, ))
        
        # Arrays to hold weight values (randomly initialized between -1 and 1)
        self.W = 2 * rng.random((784, 12)) - 1
        self.V = 2 * rng.random((12, 10)) - 1
        
        # Arrays to hold biases for hidden and output nodes
        self.B = 2 * rng.random(12) - 1
        self.C = 2 * rng.random(10) - 1
        
        
        # Arrays to store values for calculating backprop efficiently
        self.H_before = np.zeros((12, ))
        self.Z_before = np.zeros((10, ))
        
        self.W_grad = np.zeros((784, 12))
        self.V_grad = np.zeros((12, 10))
        self.B_grad = np.zeros((12, ))
        self.C_grad = np.zeros((10, ))
        
        
    def forward(self, x):
        self.N = x
        self.H_before = self.N @ self.W + self.B
        self.H = np.tanh(self.H_before)
        self.Z_before = self.H @ self.V + self.C
        self.Z = sigmoid(self.Z_before)

        
    def calculate_loss(self, x, y):
        # forward() stores the output in self.Z rather than returning it
        self.forward(x)
        loss = np.sum((self.Z - y) ** 2)
        return loss

        
    def backpropagate(self, label):
        dZ = -2 * (label - self.Z)
        dZ_before = dZ * sigmoid(self.Z_before) * (1 - sigmoid(self.Z_before))
        self.V_grad = np.outer(self.H, dZ_before)
        self.C_grad = dZ_before
        
        dH = dZ_before @ self.V.T
        dH_before = dH * (1 / (np.cosh(self.H_before) ** 2))
        self.W_grad = np.outer(self.N, dH_before)
        self.B_grad = dH_before
        
    
    def update(self, lr):
        self.V -= lr * self.V_grad
        self.C -= lr * self.C_grad
        self.W -= lr * self.W_grad
        self.B -= lr * self.B_grad
        
        
    def train(self):
        """
        Write a training method for the neural network. Add whatever method parameters you think you might need.
        """
                
            
    def evaluate(self, val_images, val_labels):
        """
        Write an evaluation method for the neural network. Add whatever method parameters you think you might need.
        """
            


Test your code below

In [None]:
training_images, training_labels = get_data("../data/mnist_train.csv")
val_images, val_labels = get_data("../data/mnist_test.csv")

## Challenge

After successfully training, you may notice that our network has a difficult time reaching accuracies above the low nineties. From the testing I've done, that seems to be the best our basic network can do. However, it has been shown that neural networks are capable of reaching accuracies over 99 percent on the MNIST dataset. What changes can we make to the structure of our network to accomplish this? More nodes in the hidden layer? Another hidden layer? Perhaps there are even more sophisticated methods...

Try to create a neural network that can achieve a validation accuracy of over 96 percent. It's probably easiest if you reuse code from the previous network. Feel free to try any method you can think of or find on the internet.
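One possible starting point, offered only as a sketch, is to make the layer sizes configurable so that wider or deeper networks are easy to try. The helpers `init_layers` and `count_parameters` below are hypothetical names. They reuse the same uniform initialization between -1 and 1 as the network above, which may or may not be a good choice for larger networks.

```python
import numpy as np

def init_layers(sizes, seed=888):
    """Create weights and biases for consecutive layer sizes, e.g. [784, 12, 10]."""
    rng = np.random.default_rng(seed)
    weights = [2 * rng.random((n_in, n_out)) - 1
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [2 * rng.random(n_out) - 1 for n_out in sizes[1:]]
    return weights, biases

def count_parameters(weights, biases):
    """Total number of trainable values across all layers."""
    return sum(w.size for w in weights) + sum(b.size for b in biases)

# The original architecture, a wider one, and a deeper one
for sizes in ([784, 12, 10], [784, 64, 10], [784, 64, 32, 10]):
    weights, biases = init_layers(sizes)
    print(sizes, count_parameters(weights, biases))
```

More parameters give the network more capacity but also make each epoch slower. Whether a given configuration actually reaches the target accuracy is something to check experimentally, and the forward pass and backpropagation would need to be generalized to loop over the layers.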