# Workshop 1: Using Numpy to Implement the Forward Pass

In [3]:
import numpy as np

## Numpy Arrays

In order to implement a Neural Network, we will need a convenient way to perform matrix operations. 

Thankfully, Numpy provides a simple way to create and work with matrices in the form of ndarrays (n-dimensional arrays). We can use ndarrays to represent vectors, matrices, or even tensors (higher-dimensional generalizations of matrices). Before we get into creating a Neural Network, let's briefly learn the basics of ndarrays. We'll start by creating a simple array.

In [4]:
A = np.array([1, 2, 3, 4])

print(A)

[1 2 3 4]


Remember, we can think of ndarrays similarly to the way we think of matrices or vectors. What do you think are the dimensions of A? Is it 4 x 1? Something else? Discuss among your group.

In [7]:
# We can find the dimensions of A like so:
dim_A = A.shape

print("Dimensions of A: " + str(dim_A))

Dimensions of A: (4,)


As it turns out, A is neither 1 x 4 nor 4 x 1. Instead, it is a one-dimensional array, or flattened array, of length 4. Whenever we want to represent vectors in Numpy, we will use one-dimensional arrays. For operations like matrix multiplication, a one-dimensional array can fulfill the role of either a 4 x 1 or a 1 x 4 matrix, so they are quite versatile.
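
To make the difference between a one-dimensional array and an explicit row or column concrete, here is a small sketch. The names `v`, `col`, `row`, and `M` are made up for this illustration (so that `A` is left untouched), and `reshape` is used only to build the explicit 4 x 1 and 1 x 4 versions for comparison.

```python
import numpy as np

v = np.array([1, 2, 3, 4])           # one-dimensional, shape (4,)
col = v.reshape(4, 1)                # explicit 4 x 1 column
row = v.reshape(1, 4)                # explicit 1 x 4 row
M = np.arange(1, 17).reshape(4, 4)   # 4 x 4 matrix with entries 1..16

print(v.shape, col.shape, row.shape)  # (4,) (4, 1) (1, 4)

# The one-dimensional array works on either side of the product
print(np.dot(M, v).shape)    # (4,)   v plays the role of a column
print(np.dot(v, M).shape)    # (4,)   v plays the role of a row

# The explicit versions only work on the matching side
print(np.dot(M, col).shape)  # (4, 1)
print(np.dot(row, M).shape)  # (1, 4)
```

Note that the result of multiplying with a one-dimensional array is itself one-dimensional, whereas the explicit row and column keep their second dimension.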

You can index into one-dimensional arrays to grab individual entries or slices. You can also modify entries of arrays:

In [4]:
x = A[0]
print(x)

a_slice = A[1:3]
print(a_slice)

print("Before modifying: {}".format(A))
A[2] = 7
print("After modifying: {}".format(A))

1
[2 3]
Before modifying: [1 2 3 4]
After modifying: [1 2 7 4]


Let's make another array.

In [9]:
B = np.array([ [1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16] ])

print(B)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]


What are the dimensions of B? Find the dimensions and print them below.

In [10]:
### TODO: print the dimensions of B below
print(B.shape)
###

(4, 4)


B is a two-dimensional array. These are useful for representing matrices, and they behave like matrices under most operations.

We can also index into two-dimensional arrays. However, we have more options, since there are now two dimensions to index into instead of one. Look at some of the following examples:

In [7]:
y = B[0, 2]
print("A single entry: {}".format(y))

column1 = B[:, 0]
print("The first column: {}".format(column1))

row1 = B[0, :]
row2 = B[1]    # we can leave out the second index; numpy will then fetch everything along the 2nd dimension
print("The first row: {}".format(row1))
print("The second row: {}".format(row2))

crop = B[0:2, 1:3]
print("A crop from B:\n{}".format(crop))

A single entry: 3
The first column: [ 1  5  9 13]
The first row: [1 2 3 4]
The second row: [5 6 7 8]
A crop from B:
[[2 3]
 [6 7]]


In [11]:
### TODO: print one slice containing the last three entries of the third column of B
print(B[1:, 2])
###

[ 7 11 15]


Observe that indexing individual entries will yield scalars; indexing rows or columns will yield one-dimensional arrays; and indexing crops or submatrices will yield a two-dimensional array.

We see that scalar entries make up one-dimensional arrays, which can then be used to make up two-dimensional arrays. It is also true that two-dimensional arrays can be used to make up three-dimensional arrays. In general, n-dimensional arrays can be thought of as a collection of (n - 1)-dimensional arrays.
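
The same pattern can be checked on a three-dimensional array. This is only an illustrative sketch: the array `T` and its shape (2, 3, 4) are made up for the example and are not used elsewhere in the workshop.

```python
import numpy as np

# A 3-dimensional array: 2 blocks, each a 3 x 4 matrix, holding 0..23
T = np.arange(24).reshape(2, 3, 4)

print(T.shape)        # (2, 3, 4)
print(T[0].shape)     # (3, 4)  one index leaves a two-dimensional array
print(T[0, 1].shape)  # (4,)    two indices leave a one-dimensional array
print(T[0, 1, 2])     # 6       three indices leave a scalar
```

Each additional index removes one dimension, which is the "collection of (n - 1)-dimensional arrays" idea in action.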

We can perform matrix multiplication and matrix-vector multiplication with np.dot().

In [9]:
right_product = np.dot(A, B)
print(right_product)
print()

left_product = np.dot(B, A)
print(left_product)

[126 140 154 168]

[ 42  98 154 210]


Just as with matrix multiplication on paper, the order in which you enter arguments into np.dot() is very important.
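
Here is a short sketch of two ways the order can matter. The small matrices `C`, `D`, `P`, and `Q` are made up for this example: for square matrices, swapping the arguments generally changes the result, and for non-square matrices the swapped product may not be defined at all.

```python
import numpy as np

C = np.array([[1, 2],
              [3, 4]])
D = np.array([[0, 1],
              [1, 0]])

print(np.dot(C, D))   # [[2 1], [4 3]]  swaps the columns of C
print(np.dot(D, C))   # [[3 4], [1 2]]  swaps the rows of C

P = np.ones((2, 3))
Q = np.ones((3, 4))
print(np.dot(P, Q).shape)   # (2, 4): inner dimensions 3 and 3 match

try:
    np.dot(Q, P)            # inner dimensions 4 and 2 do not match
    swapped_ok = True
except ValueError as err:
    swapped_ok = False
    print("np.dot(Q, P) failed:", err)
```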

In [14]:
### TODO: Print out the product between the upper-leftmost 3 x 3 crop of B and the lower-rightmost 3 x 3 crop of B
###       Then, replace the upper-leftmost 3 x 3 section of B with that product, and print B
upper_lower_product = np.dot(B[:3,:3], B[1:,1:])
print(upper_lower_product)
B[:3,:3] = upper_lower_product
print(B)
###

[[ 68  74  80]
 [188 206 224]
 [308 338 368]]
[[ 68  74  80   4]
 [188 206 224   8]
 [308 338 368  12]
 [ 13  14  15  16]]


There are many ways to instantiate ndarrays in numpy. Here are two common ways:

In [11]:
zero_array = np.zeros((2, 2))
print(zero_array)
print()

random_array = np.random.rand(4, 3)  # fills array with values between 0 and 1
print(random_array)

[[0. 0.]
 [0. 0.]]

[[0.00210941 0.03980142 0.25715043]
 [0.22498141 0.22288702 0.80170878]
 [0.28910275 0.5658836  0.64027261]
 [0.46922218 0.14394224 0.80510457]]


Both of these ways of instantiating arrays involve entering the desired dimensions into their respective functions. However, np.zeros() requires the dimensions to be passed as a single tuple (hence the extra pair of parentheses), while np.random.rand() wants the dimensions to be entered as separate arguments. These small details are difficult to remember, so you may often have to refer to the [numpy documentation](https://docs.scipy.org/doc/numpy/reference/index.html "Numpy Documentation").
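
The following sketch contrasts the two calling conventions. The variable names are made up for the example. It also shows what happens if the tuple is forgotten: np.zeros() treats its second positional argument as a data type, so the call fails rather than silently building the wrong array.

```python
import numpy as np

z = np.zeros((2, 3))        # shape passed as one tuple
r = np.random.rand(2, 3)    # shape passed as separate arguments
print(z.shape, r.shape)     # (2, 3) (2, 3)

try:
    np.zeros(2, 3)          # 3 is interpreted as a dtype, which is invalid
    forgot_tuple_ok = True
except TypeError as err:
    forgot_tuple_ok = False
    print("np.zeros(2, 3) failed:", err)
```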

In [16]:
### TODO: Construct a random 5 x 4 array. Matrix multiply this with B and then print the result.
###       Remember, order matters!
five_by_four = np.random.rand(5, 4)
print(np.dot(five_by_four, B))
###

[[270.89916639 296.9146038  322.93004121  15.28328081]
 [346.3348523  379.0421235  411.74939471  28.81515358]
 [108.20744301 117.97257038 127.73769774  19.16875258]
 [318.32631801 349.0067806  379.68724318  17.92003047]
 [366.84520179 401.98586296 437.12652412  20.0782226 ]]


## Forward Pass

Now that we know some basics of Numpy, we can begin to implement our Neural Network that will solve the MNIST problem. For today, we will only implement the forward pass.

However, before we can start writing the network, we will need the image data.

In [22]:
def get_training_data():
    # Each row of the csv holds a label followed by 784 pixel values (0-255)
    with open('../../data/mnist_train.csv', 'r') as f:
        lines = f.readlines()
    
    training_images = np.zeros((len(lines), 784))
    training_labels = np.zeros((len(lines), 10))
    for index, line in enumerate(lines):
        stringlist = line.strip().split(",")
        
        # fill the image array
        for h in range(784):
            training_images[index, h] = float(stringlist[h + 1])
        
        # fill the label array with a one-hot encoding: digit d sets column d
        # (indexing with answer - 1 would send the digit 0 to the last column)
        answer = int(stringlist[0])
        training_labels[index, answer] = 1.0
    
    # scale the pixel values from [0, 255] down to [0, 1]
    return training_images / 255, training_labels

I wrote a function that grabs the image data from the included csv file and puts it into ndarrays. When you move on to doing your own problems, you will have to learn to solve tasks like these. But for now we will just use the function.
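
To see what the function does to each row without needing the csv file, here is a minimal sketch that parses a single made-up line. The string `fake_line` is hypothetical data invented for this example (half black pixels, half white). It assumes the same layout the function relies on: the label first, then 784 pixel values. The label encoding is left out here.

```python
import numpy as np

# Hypothetical csv row: the label 5, then 392 zeros and 392 values of 255
fake_line = "5," + ",".join(["0"] * 392 + ["255"] * 392) + "\n"

fields = fake_line.strip().split(",")
label = int(fields[0])                            # first field is the digit
pixels = np.array(fields[1:], dtype=float) / 255  # remaining 784, scaled to [0, 1]

print(label)                       # 5
print(pixels.shape)                # (784,)
print(pixels.min(), pixels.max())  # 0.0 1.0
```

Converting the whole list of strings at once, as done here, is an alternative to the explicit loop over the 784 entries.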

In [23]:
training_images, training_labels = get_training_data()

### TODO: Print the shapes of training_images and training_labels. What do these dimensions mean? Can you explain why
###       they are of that size?
print(f"Shape of training_images: {training_images.shape}")
print(f"Shape of training_labels: {training_labels.shape}")
###

Shape of training_images: (60000, 784)
Shape of training_labels: (60000, 10)


Now that we have our data, we can create our Neural Network. We will take an object-oriented approach by creating a NeuralNetwork class.

In [24]:
class NeuralNetwork():
    """
    A Fully Connected Neural Network. There are 784 input layer nodes, 12 hidden layer nodes, and 10 output layer
    nodes.
    """
    def __init__(self):
        
        
        # First, we instantiate an array to hold all the weights from the input layer to the hidden layer.
        # Look at the line of code and discuss with your group what you think each part of it is doing.
        # Hint: You may need to google numpy.full
        self.W1 = np.full((784, 12), -1) + 2 * np.random.rand(784, 12)
        
        
        ### TODO: Initialize random values for W2, the weights from the hidden layer to the
        ###       output layer.
        self.W2 = np.full((12, 10), -1) + 2 * np.random.rand(12, 10)
        ###
        
        
        
        
        
        
        # Now, we instantiate an array to hold the biases for the hidden layer nodes
        self.B1 = np.full((12, ), -1) + 2 * np.random.rand(12)
        
        
        ### TODO: Initialize random values for B2, the biases for output layer nodes
        self.B2 = np.full((10, ), -1) + 2 * np.random.rand(10)
        ###
        

    def forward(self, x):
        """
        Given an individual input vector, forward propagate through the network.
        
        Parameters:
        x: input vector representing image data, one-dimensional vector
        """
        

        # Remember, the input vector is going to be an unrolled image -- a vector of 784 pixels
        # We can represent the multiplication of all inputs by their respective weights to nodes in one matrix
        # multiplication. We then add the biases to this result through an entrywise addition.
        # Then, we apply our activation function. For now, we will use the hyperbolic tangent function.
        # O1 is the vector of values that end up in the hidden layer nodes.
        Z1 = np.dot(x, self.W1) + self.B1
        O1 = np.tanh(Z1)
        
        
        ### TODO: Calculate O2, the vector of values that ends up in the output layer nodes.
        Z2 = np.dot(O1, self.W2) + self.B2
        O2 = np.tanh(Z2)
        ###
        
        return O2
    

Let's instantiate a Neural Network object now.

In [25]:
net = NeuralNetwork()
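
If you are unsure what the initialization expression in `__init__` produces, the sketch below reproduces it outside the class. The names `sample` and `alt` are made up for the example. Since np.random.rand() returns values in [0, 1), doubling them and adding -1 should give values in [-1, 1). The call to np.random.uniform() is shown only as one possible alternative way to draw from the same range, not as what the class uses.

```python
import numpy as np

# Same expression as used for W1: an array of -1s plus values in [0, 2)
sample = np.full((784, 12), -1) + 2 * np.random.rand(784, 12)

print(sample.shape)                            # (784, 12)
print(sample.min() >= -1, sample.max() < 1)    # True True

# A possible alternative that draws directly from [-1, 1)
alt = np.random.uniform(-1, 1, size=(784, 12))
print(alt.shape)                               # (784, 12)
```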

You can call methods using the "." operator, just like in Java. Index into training_images to get the first image, then input that image vector into net.forward(). Print the result.

In [26]:
### TODO: Index into training_images to get the first image, then input that image vector into net.forward().
###       Print the result.
net.forward(training_images[0])
###

array([-0.1656872 ,  0.6676499 ,  0.53630367,  0.63084143, -0.86111305,
        0.29571661, -0.99837726, -0.29497781,  0.97869909,  0.51368015])
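
To see why the result has ten entries, it can help to trace the shapes through the forward pass. The sketch below repeats the same computation outside the class. It uses randomly generated stand-ins for the weights, biases, and input (all hypothetical, not the values inside `net`), so it runs without the training data.

```python
import numpy as np

# Stand-in parameters with the same shapes as in NeuralNetwork
W1 = 2 * np.random.rand(784, 12) - 1
B1 = 2 * np.random.rand(12) - 1
W2 = 2 * np.random.rand(12, 10) - 1
B2 = 2 * np.random.rand(10) - 1

x = np.random.rand(784)     # stand-in for one unrolled 784-pixel image

Z1 = np.dot(x, W1) + B1     # (784,) times (784, 12) gives (12,)
O1 = np.tanh(Z1)            # activation keeps the shape (12,)
Z2 = np.dot(O1, W2) + B2    # (12,) times (12, 10) gives (10,)
O2 = np.tanh(Z2)            # one value per output node

print(Z1.shape, O1.shape, Z2.shape, O2.shape)  # (12,) (12,) (10,) (10,)
print(np.all(np.abs(O2) <= 1))                 # True: tanh is bounded by 1
```

Because tanh maps every input into the range from -1 to 1, every entry of the output falls in that range, which matches the printed result above.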

Congratulations, you have successfully implemented the forward pass. Right now, the output is meaningless because the weights and biases are random; however, in the future we will be able to use the forward pass to perform gradient descent. See you next week!