# (Optional) Colab Setup
If you aren't using Colab, you can delete the following code cells. They only help students mount Google Drive so that the notebook can access the other .py files and download the data, which is a little trickier on Colab than in Jupyter on your local machine.

In [1]:
# you will be prompted with a window asking to grant permissions
from google.colab import drive

drive.mount("/content/drive")

Mounted at /content/drive


In [97]:
# fill in the path in your Google Drive in the string below. Note: do not escape slashes or spaces
import os
datadir = "/content/assignment2"
if not os.path.exists(datadir):
  !ln -s "/content/drive/My Drive/CS444/assignment2/" $datadir
os.chdir(datadir)
!pwd

/content/drive/My Drive/CS444/assignment2


# Implement a Neural Network

This notebook contains testing code to help you develop a neural network by implementing the forward pass and backpropagation algorithm in the `models/neural_net.py` file. 

You will implement your network in the `NeuralNetwork` class in the file `models/neural_net.py`. The network parameters are stored in the instance variable `self.params`, a dictionary whose keys are string parameter names and whose values are NumPy arrays.
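As a rough illustration of that layout, the sketch below builds a dictionary with the same naming pattern (`W1`, `b1`, `W2`, ...). The function `init_params` and its scaling choice are hypothetical, not the assignment's actual initialization; only the key names and the "string key to NumPy array" structure come from the description above.

```python
import numpy as np


def init_params(input_size, hidden_sizes, output_size, seed=0):
    """Hypothetical helper: build a params dict keyed by 'W1', 'b1', ..."""
    rng = np.random.default_rng(seed)
    sizes = [input_size] + list(hidden_sizes) + [output_size]
    params = {}
    for i in range(1, len(sizes)):
        fan_in, fan_out = sizes[i - 1], sizes[i]
        # Assumption: small random weights scaled by fan-in, zero biases.
        params[f"W{i}"] = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
        params[f"b{i}"] = np.zeros(fan_out)
    return params


params = init_params(3, [5, 5], 2)
for name, value in params.items():
    print(name, value.shape)
```

Storing each weight matrix with shape `(fan_in, fan_out)` means a batch `X` of shape `(N, fan_in)` can be multiplied as `X @ W`.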

In [98]:
import numpy as np

from models.neural_net import NeuralNetwork

# For auto-reloading external modules
# See http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """Returns relative error"""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
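
To get a feel for the scale of `rel_error`, the short example below repeats the definition from the cell above and applies it to made-up arrays (`analytic` and `numeric` are illustrative values, not output from the network). Because the denominator is `|x| + |y|`, the result always lies between 0 and 1.

```python
import numpy as np


def rel_error(x, y):
    """Returns relative error (same definition as in the notebook)."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))


# Illustrative values only: a "gradient" and a slightly perturbed copy.
analytic = np.array([1.0, -2.0, 0.0])
numeric = analytic + np.array([1e-9, -1e-9, 0.0])

close = rel_error(analytic, numeric)                    # tiny: arrays nearly agree
far = rel_error(analytic, np.array([1.0, -2.0, 1.0]))   # one entry disagrees completely
print(close, far)
```

The `1e-8` floor in the denominator keeps the ratio defined when both entries are exactly zero.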


The cell below initializes a toy dataset and a corresponding model, which let you check your forward and backward passes with a numeric gradient check. Note that we set a random seed so that the experiments are repeatable.

In [99]:
input_size = 2
hidden_size = 10
num_classes = 3
num_inputs = 5
optimizer = 'SGD'


def init_toy_model(num_layers):
    """Initializes a toy model"""
    np.random.seed(0)
    hidden_sizes = [hidden_size] * (num_layers - 1)
    return NeuralNetwork(input_size, hidden_sizes, num_classes, num_layers)

def init_toy_data():
    """Initializes a toy dataset"""
    np.random.seed(0)
    X = np.random.randn(num_inputs, input_size)
    y = np.random.randn(num_inputs, num_classes)
    return X, y


# Implement forward and backward pass

The first thing you will do is implement the forward pass of your neural network. The forward pass should be implemented in the `forward` function. You can use helper functions like `linear`, `relu`, and `softmax` to help organize your code.
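
One possible shape for these helpers is sketched below. The argument order and the `(N, D)` batch layout are assumptions; the signatures expected by `models/neural_net.py` may differ, so treat this as an illustration rather than the reference solution.

```python
import numpy as np


def linear(W, X, b):
    """Fully connected layer: X is (N, D), W is (D, H), b is (H,)."""
    return X @ W + b


def relu(X):
    """Element-wise rectified linear unit."""
    return np.maximum(X, 0)


def softmax(X):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = X - X.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Tiny two-layer forward pass on random data (shapes chosen for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
W1, b1 = rng.standard_normal((2, 10)), np.zeros(10)
W2, b2 = rng.standard_normal((10, 3)), np.zeros(3)

hidden = relu(linear(W1, X, b1))
probs = softmax(linear(W2, hidden, b2))
print(probs.sum(axis=1))  # each row of probabilities sums to 1
```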

Next, you will implement the backward pass using the backpropagation algorithm. Backpropagation computes the gradient of the loss with respect to the model parameters `W1`, `b1`, etc. Use a softmax function with cross-entropy loss for the loss calculation. Fill in the code blocks in `NeuralNetwork.backward`.
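
The starting point of the backward pass is the gradient of the loss with respect to the final-layer scores. For softmax followed by mean cross-entropy this is `(probs - one_hot(labels)) / N`. The sketch below assumes integer class labels of shape `(N,)`, and the function name `softmax_cross_entropy` is hypothetical; adapt it to whatever target format your `backward` method receives.

```python
import numpy as np


def softmax_cross_entropy(scores, labels):
    """Mean cross-entropy loss and its gradient w.r.t. the raw scores.

    Assumes scores has shape (N, C) and labels holds integer classes (N,).
    """
    n = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(n), labels].mean()

    grad = np.exp(log_probs)            # softmax probabilities
    grad[np.arange(n), labels] -= 1.0   # subtract the one-hot targets
    grad /= n                           # account for the mean over the batch
    return loss, grad


scores = np.zeros((4, 3))               # uniform predictions over 3 classes
labels = np.array([0, 1, 2, 0])
loss, grad = softmax_cross_entropy(scores, labels)
print(loss, grad.sum(axis=1))
```

From this gradient, each earlier layer's gradients follow by the chain rule: for a linear layer `out = X @ W + b`, the weight gradient is `X.T @ upstream`, the bias gradient is `upstream.sum(axis=0)`, and the gradient passed to the previous layer is `upstream @ W.T`.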

# Gradient check

If you have implemented your forward pass through the network correctly, you can use the following cell to debug your backward pass with a numeric gradient check. If your backward pass has been implemented correctly, the max relative error between your analytic solution and the numeric solution should be around 1e-7 or less for all parameters.
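
For intuition, a numeric gradient check perturbs one entry of a parameter at a time and measures how the loss changes. The sketch below uses central differences; `numerical_gradient` is a hypothetical stand-in written for illustration and is not the course's `eval_numerical_gradient`, whose details may differ.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f.

    x is perturbed in place one entry at a time and restored afterwards.
    """
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old                      # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


# Check against a function with a known gradient: d/dx sum(x^2) = 2x.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
x_before = x.copy()
grad = numerical_gradient(lambda v: np.sum(v ** 2), x)
print(grad)
```

Because the array is modified in place, a loss function that reads the same array (for example, one stored in a model's parameter dictionary) sees each perturbation without needing it passed explicitly.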


In [129]:
from copy import deepcopy

from utils.gradient_check import eval_numerical_gradient

X, y = init_toy_data()


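def f(W):
    # W is unused: net reads its weights from net.params, so this simply
    # re-runs the forward pass and returns the value of net.backward(y).
    net.forward(X)
    return net.backward(y)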

for num in [2, 3]:
    net = init_toy_model(num)
    net.forward(X)
    net.backward(y)
    gradients = deepcopy(net.gradients)

    for param_name in net.params:
        param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
        print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, gradients[param_name])))

loss : 1.143849676959448


ValueError: ignored

In [87]:
net = NeuralNetwork(3, [5, 5], output_size=2, num_layers=3)


In [91]:
X_train = np.random.randint(low=2, high=10, size=(5, 3))
y_train = np.random.random(size=(5, 2))
print(X_train.shape)
net.forward(X_train)
net.backward(y_train)

net.params

(5, 3)


ValueError: ignored

In [92]:
net.outputs

{'X0': array([[3, 9, 5],
        [4, 7, 4],
        [3, 8, 2],
        [3, 2, 3],
        [3, 6, 8]]),
 'X1': array([[0.99999024, 0.25623199, 0.01589268, 0.91654518, 0.02372202],
        [0.999556  , 0.14241748, 0.02797859, 0.95225047, 0.16012916],
        [0.99995954, 0.15635153, 0.024448  , 0.68879676, 0.06157497],
        [0.67490171, 0.23025734, 0.19377771, 0.96122453, 0.66085873],
        [0.99880946, 0.43197337, 0.04543774, 0.99444084, 0.08628404]]),
 'X2': array([[0.6695334 , 0.43617121, 0.28224721, 0.17937867, 0.24223844],
        [0.66350115, 0.4112047 , 0.277329  , 0.19981065, 0.23196882],
        [0.61586083, 0.42839533, 0.29235528, 0.22065161, 0.2687497 ],
        [0.68843073, 0.39501315, 0.39565236, 0.30470873, 0.16569078],
        [0.70418643, 0.44612523, 0.30503705, 0.16387824, 0.19971806]]),
 'X3': array([[ 0.27882042, -0.63647013],
        [ 0.26734977, -0.64274692],
        [ 0.24400868, -0.59998966],
        [ 0.25546049, -0.75057619],
        [ 0.30218683, -0.673099

In [117]:
net.forward(X_train)
net.backward(y_train)
net.gradients

Error X.shape[1] != W.shape[0]


AssertionError: ignored

In [84]:
net.diagnose()

params are  {'W1': array([[ 0.36431721, -0.17269726, -0.32796902,  1.00760856,  0.17342863],
       [-0.77448201, -0.32038593,  0.25866302, -0.0274866 , -0.39598961],
       [ 1.10796316,  0.62341926,  0.77636748,  0.1365389 ,  0.0113708 ]]), 'b1': array([ 0.0004691 , -0.00055617, -0.00085262, -0.00032883, -0.00047243]), 'W2': array([[ 0.54235753, -0.20713248,  0.57421415, -0.06823129,  0.44586478],
       [ 0.04498326, -0.15678397,  0.34708932,  0.47221315, -1.01084691],
       [-0.65155574,  0.65682608, -0.46260763,  0.05172591, -0.51985933],
       [ 0.44188706, -0.14254766, -0.30447717,  0.17374963, -0.69894906],
       [ 0.07509001,  0.38070899, -0.6386164 ,  0.23882249, -0.49275292]]), 'b2': array([ 0.00010878, -0.00064503, -0.00010411,  0.00043693,  0.00082312]), 'W3': array([[ 0.11313823,  0.50949517],
       [-0.66732484,  0.31892178],
       [-0.10741302, -0.07908681],
       [ 0.45287277, -1.22617052],
       [ 0.85239748, -0.12023949]]), 'b3': array([ 0.00096588, -0.0009658

In [85]:
net.outputs

{'X0': array([[5, 5, 8],
        [6, 7, 4],
        [7, 3, 2],
        [6, 3, 5],
        [2, 5, 4]]),
 'X1': array([[0.9988843 , 0.92693814, 0.99724885, 0.99753707, 0.26776206],
        [0.76443765, 0.31811803, 0.95180952, 0.99836235, 0.15888736],
        [0.91901087, 0.28762574, 0.51446058, 0.9992928 , 0.51564009],
        [0.99545479, 0.7568229 , 0.93784175, 0.99871421, 0.48068273],
        [0.78194254, 0.63632643, 0.97731108, 0.91921143, 0.17129453]]),
 'X2': array([[0.59736906, 0.56536912, 0.48983065, 0.65883455, 0.13707526],
        [0.56507399, 0.58310134, 0.42645423, 0.58866381, 0.22233207],
        [0.6584532 , 0.53908351, 0.43931145, 0.5977921 , 0.24947751],
        [0.60831106, 0.58230262, 0.44755638, 0.65158017, 0.14877789],
        [0.55846418, 0.5780428 , 0.45724027, 0.62200137, 0.17799182]]),
 'X3': array([[ 0.05276767, -0.37826993],
        [ 0.08500616, -0.30829036],
        [ 0.15081408, -0.29020331],
        [ 0.05393728, -0.35646426],
        [ 0.06162368, -0.351246