# Workshop 2: Forward Pass and Loss

In [None]:
import numpy as np

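def get_data(path):
    """
    Load an MNIST CSV file where each row is: label, pixel_1, ..., pixel_784.
    Returns the images scaled to [0, 1] and the labels as one-hot vectors.
    """
    with open(path, 'r') as f:
        lines = f.readlines()

    training_images = np.zeros((len(lines), 784))
    training_labels = np.zeros((len(lines), 10))
    for index, line in enumerate(lines):
        values = line.strip().split(',')
        label = int(values[0])
        training_images[index, :] = np.array(values[1:], dtype=float)
        # One-hot encode: digit d (0-9) goes in column d
        training_labels[index, label] = 1.0

    # Scale pixel intensities from [0, 255] to [0, 1]
    return training_images / 255, training_labels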

training_images, training_labels = get_data("mnist_test.csv")
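To see what loading produces for a single row, here is a small sketch using a made-up row. The helper `parse_row` is hypothetical, and the row has only 4 "pixels" instead of MNIST's 784 so the numbers stay readable. Note that digit `d` lands in column `d` of the label vector, so the digit 0 uses column 0.

```python
import numpy as np

def parse_row(line, num_pixels):
    """Hypothetical helper: split one CSV row into (pixels scaled to [0, 1], one-hot label)."""
    values = line.strip().split(',')
    label = int(values[0])
    pixels = np.array(values[1:], dtype=float) / 255
    if pixels.shape != (num_pixels,):
        raise ValueError("expected %d pixel values, got %d" % (num_pixels, pixels.size))
    one_hot = np.zeros(10)
    one_hot[label] = 1.0  # digit d -> column d
    return pixels, one_hot

# Made-up row: label 0 followed by 4 pixel intensities
pixels, one_hot = parse_row("0,0,255,51,102\n", num_pixels=4)
print(pixels)   # [0.  1.  0.2 0.4]
print(one_hot)  # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```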

Last week, we briefly examined how the forward pass works and saw a general implementation of the forward pass for a single data sample. However, that exercise didn't give much insight into what is actually going on inside the neural network. This week, we will start with a short explanation of the forward pass (skip it if you are already confident in your understanding), and then implement the forward pass step by step in an intuitive way.

## Forward Pass Explained

This is a picture of the network we will implement. Observe that it has 784 input layer nodes, 12 hidden layer nodes, and 10 output nodes. At the beginning, it is empty.
<img src="pics/network.png">
The goal of the forward pass is to generate values in the output layer nodes. These values will later be used to perform gradient descent. But how do we find these values in the output nodes?

First, an image is passed into the input layer:
<img src="pics/pass_1.png">
Then, these values in the input layer are used to calculate the values in the next layer, the hidden layer.
<img src="pics/pass_2.png">
Finally, the hidden layer values are used to calculate the values in the next and final layer, the output layer.
<img src="pics/pass_3.png">
As we can see, in order to calculate values in the output layer, we must first calculate the values for the layer before it. To calculate the values for that layer, we must in turn calculate the values for the layer before it. Thus, we fill the nodes starting from the beginning of the network, and then use these nodes to fill subsequent nodes going forward in the network; hence the name forward pass, or forward propagation.

## Filling the Hidden Layer Nodes

So what exact calculations do we need to make to fill these nodes? The input layer is simple: we just pass in an image vector. But what about the nodes in the hidden layer? How do we fill those? As it turns out, there is an exact formula. Here is a diagram explaining how to fill the first node in the hidden layer, h1:
<img src="pics/FCN_close.png">
To understand this formula, it is important to know that each connection you see between two nodes has a scalar value: the weight. Because there are 784 input nodes and 12 hidden nodes, and each input node is connected to each hidden node, there are 784 x 12 = 9408 weights. We store these weights in array W, a 784 x 12 ndarray, where the entry in row i and column j is the weight connecting input node i to hidden node j. To calculate the value of h1, we only need the weights from each input node to h1 (the bolded connections). These make up the first column of W (`W[:, 0]` in NumPy, which counts from 0).
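A small sketch of that claim with random stand-in arrays. The names `N` and `W` mirror the worksheet, but the values are random rather than a real image or trained weights, and this covers only the weighted sum (the bias and tanh come next). Summing "input value times weight" over the 784 bolded connections is the same as a dot product of the input vector with column 0 of `W`.

```python
import numpy as np

rng = np.random.default_rng(0)
N = rng.random(784)                # stand-in input vector (values in [0, 1))
W = 2 * rng.random((784, 12)) - 1  # weights in [-1, 1), one column per hidden node

# Explicit sum over the 784 connections into h1: input node i contributes N[i] * W[i, 0]
weighted_sum = 0.0
for i in range(784):
    weighted_sum += N[i] * W[i, 0]

# The same quantity as a dot product with the first column of W
print(np.isclose(weighted_sum, np.dot(N, W[:, 0])))  # True
```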

Also, each hidden node has a corresponding bias. We store these biases in B, a one-dimensional array of size 12. To calculate h1, we must use the value b1.
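Transcribed from the diagram, the formula for h1 should read roughly as follows. Here $n_i$ is the value of input node $i$, $W_{i,1}$ is the weight in row $i$ and column 1 of W (counting from 1, so column 0 in NumPy), and $b_1$ is the first entry of B:

$$
h_1 = \tanh\left( \sum_{i=1}^{784} n_i \, W_{i,1} + b_1 \right)
$$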

Why do we need to put everything into the tanh function? Think about this. If you don't know the answer, talk with people around you or ask me.

Take some time to study this diagram and ask questions until you understand this formula. Then, once you understand how to calculate h1, what do you think the formula is for h2, the next hidden layer node? We use almost exactly the same formula, except with the weights connecting the input nodes to h2. What does this formula look like? Again, take some time to think, talk, or ask questions until you understand.

## Filling the Output Layer Nodes

OK, so let's say that we used the formula we just worked out to calculate every value in the hidden layer. We can then use a similar process to calculate all of the values in the output layer.
<img src="pics/FCN_close2.png">
Just note that we are now using weight array V (a 12 x 10 ndarray), the weights connecting the hidden nodes to the output nodes, and bias array C (of size 10), the biases corresponding to the output nodes.
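Transcribed the same way, the diagram's formula for the first output node z1 should read roughly as follows. Here $h_j$ is the value of hidden node $j$, $V_{j,1}$ is the weight connecting $h_j$ to $z_1$, $c_1$ is the first entry of C, and $\sigma$ is the sigmoid activation:

$$
z_1 = \sigma\left( \sum_{j=1}^{12} h_j \, V_{j,1} + c_1 \right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}
$$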

Also note that the activation function for this layer is different. Previously, we used tanh to calculate the hidden node values. To calculate the output nodes, we will use sigmoid, because sigmoid outputs values from 0 to 1 (the same range as the 0s and 1s in our label vectors), whereas tanh outputs values from -1 to 1.
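A quick numerical check of those two ranges. The `sigmoid` helper below is our own definition, 1 / (1 + e^(-x)), written out by hand rather than taken from NumPy; `np.tanh` is NumPy's built-in.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(np.round(np.tanh(x), 3))   # values between -1 and 1 (negative inputs give negative outputs)
print(np.round(sigmoid(x), 3))   # values between 0 and 1 (sigmoid(0) is exactly 0.5)
```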

We can then use this formula to fill all the values in the output layer, completing the forward pass.
<img src="pics/pass_3.png">

## Implementing Forward Pass

Now that we have explicit formulas for calculating node values, let's try to implement the forward pass on our own.
We'll create a NeuralNetwork class to handle this. Read the comments as you go along. I've written the init function for you. Feel free to create more cells at the bottom of the worksheet to test your code whenever you wish.

In [None]:
class NeuralNetwork():
    """
    A Fully Connected Neural Network. There are 784 input layer nodes, 12 hidden layer nodes, and 10 output layer
    nodes.
    """
    def __init__(self):
        
        
        # We need to create arrays to store all of the parts of our network:
        
        # Arrays to hold node values
        self.N = np.zeros((784, ))
        self.H = np.zeros((12, ))
        self.Z = np.zeros((10, ))
        
        # Arrays to hold weight values (randomly initialized between -1 and 1)
        self.W = 2 * np.random.rand(784, 12) - 1
        self.V = 2 * np.random.rand(12, 10) - 1
        
        # Arrays to hold biases for hidden and output nodes
        self.B = 2 * np.random.rand(12) - 1
        self.C = 2 * np.random.rand(10) - 1
        

    def fill_input_nodes(self, x):
        """
        Given an image vector, fill self.N, the input node array. Remember, we just put values from the image
        into the input nodes, so this should be very easy.
        
        Parameters:
        x: input vector representing image data, one-dimensional vector
        """
        

        ### TODO: write the method
        self.N = x
        
        ###
        
    
    def calculate_hi(self, i): 
        """
        Assuming the input nodes array is full, use these values to calculate hi, the ith hidden layer node.
        Once the value is calculated, fill the corresponding entry in self.H.
        You will need to access weight array W and bias array B.
        
        Parameters:
        i: the index telling which hidden layer node to calculate
        """
        
        ### TODO: write the method
        
        
        ###
    
    def fill_hidden_nodes(self):
        """
        Use the calculate_hi method to iteratively fill every hidden layer node. This should be easy if your
        calculate_hi method works.
        
        For thought:
        Finding each hi value individually is a perfectly acceptable way of calculating node values,
        but is it the most efficient? Is there a way to calculate all hi values at once?
        """
        
        ### TODO: write the method
        
        
        ###
        
    def calculate_zi(self, i):
        """
        Assuming the hidden nodes array is full, use these values to calculate zi, the ith output layer node.
        Once the value is calculated, fill the corresponding entry in self.Z.
        You will need to access weight array V and bias array C.
        
        Parameters:
        i: the index telling which output layer node to calculate
        """
        
        ### TODO: write the method
        
        
        ###
        
    def fill_output_nodes(self):
        """
        Use the calculate_zi method to iteratively fill every output layer node. This should be easy if your
        calculate_zi method works.
        """
        
        ### TODO: write the method
        
        
        ###
    
    def forward(self, x):
        """
        Given an image vector, fill every node in the network. You have already written the necessary methods to
        complete this task; you just need to call them in the right order.
        
        Parameters:
        x: input vector representing image data, one-dimensional vector
        """
        
        ### TODO: write the method
        
        
        ###
        

    ### Challenge 1
    # So you've managed to complete the forward pass. However, the way we implemented the forward pass just now was
    # inefficient. There is actually a way to calculate values layer-by-layer instead of node-by-node. Can you
    # think of a formula that can calculate all hidden nodes at once, instead of one-at-a-time? 
    #
    # Hint: you will need to do matrix operations between the entire weight array, input array, and bias array 
    #       instead of just a column or element
    
    def fill_hidden_nodes_fast(self):
        """
        Assume the input nodes array is filled. Now, use these values to fill all hidden nodes at once.
        You will need to use self.N, self.W, and self.B.
        This should only take up a few lines.
        """
        
        ### TODO: write the method
        
        
        ###
        
    def fill_output_nodes_fast(self):
        """
        Assume the hidden nodes array is filled. Now, use these values to fill all output nodes at once.
        You will need to use self.H, self.V, and self.C.
        This should only take up a few lines.
        """
        
        ### TODO: write the method
        
        
        ###
    
    def forward_fast(self, x):
        """
        Given an image vector, fill every node in the network using the more efficient methods you just wrote.
        
        Parameters:
        x: input vector representing image data, one-dimensional vector
        """
        
        ### TODO: write the method
        
        
        ###
        
    
    ### Challenge 2
    # Ok, so you've completed forward pass for real this time. But what do we do with this information?
    # If we want our network to learn anything, we'll need to use the outputs from forward pass to generate
    # a loss, or error, that measures how far away from the target our network was.
    #
    # The process is simple: forward pass an image through the network, then read the output.
    # Compare this output to the label corresponding to that image. These will both be one-dimensional
    # arrays of the same size. The simplest loss function is just the (Euclidean) distance between these
    # two vectors.
    
    def calculate_loss(self, x, y):
        """
        Given an image vector and its corresponding label vector, calculate the loss.
        
        Parameters:
        x: input vector representing image data, one-dimensional vector
        y: input vector representing label, one-dimensional vector. Has a 1 in the position corresponding to the
           correct answer, and 0s everywhere else.
        """
        
        ### TODO: write the method
        
        
        ###
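For Challenge 2, "distance" is most naturally read as the Euclidean (L2) distance, the square root of the sum of squared differences between the output and the label. A sketch on made-up vectors (the output values below are invented for illustration, not produced by the network):

```python
import numpy as np

# Made-up network output after a forward pass (10 sigmoid values in [0, 1])
z = np.array([0.1, 0.05, 0.8, 0.2, 0.0, 0.1, 0.0, 0.3, 0.1, 0.05])
# One-hot label for the digit 2
y = np.zeros(10)
y[2] = 1.0

# Euclidean distance between the output and the target
loss = np.sqrt(np.sum((z - y) ** 2))
print(loss)
print(np.isclose(loss, np.linalg.norm(z - y)))  # True
```

A perfect prediction gives a loss of 0, and the further the output is from the label, the larger the loss. Squared distance (dropping the square root) is a common, closely related alternative.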

Test your code here in the space below. Feel free to make as many additional cells as you wish.