# Building a Neural Network

Please note that I've already completed the "Neural Networks and Deep Learning" course by Andrew Ng, which is the course recommended in the class notes. Much of my code will look similar to what I wrote for that course's assignment. However, the activation functions and their use in backpropagation are my own. I'm overbuilding in some ways, but that lets me reuse this implementation to complete a stage of our research project.

In [5]:

from collections import deque

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split



# Feedforward Neural Network Implementation

The following functions implement a Feedforward Neural Network. They
are organized into the following sections:

* Parameter Initialization
* Activation functions and their gradients
* Forward propagation
* Backward propagation

These functions will be used to complete parts **(b)**
and **(c)** of Homework 5.

## Test Cases

## Parameter Initialization

In [10]:
def initialize_parameters(layer_dims, seed=42):
    """ Initialize parameters for each layer in NN
    
    :param layer_dims: dimensions for each layer
    :param seed: int to set random seed
    
    :return: weight matrices W and bias vectors b
    """
    parameters = {}
    L = len(layer_dims)
    np.random.seed(seed)
    
    for l in range(1, L):
        
        parameters['W' + str(l)] = \
        np.random.randn(layer_dims[l], 
                        layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = \
        np.zeros((layer_dims[l], 1))
        
        # Check the shapes of every layer, not only the last one
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
    
    return parameters
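
As a quick sanity check of the shape convention, here is a minimal sketch for a small `[4, 3, 1]` network. The helper name `init_params_sketch` is hypothetical, and it uses a local `np.random.default_rng` generator instead of the global `np.random.seed` call above, so the values will differ from `initialize_parameters`; only the shapes are meant to match.

```python
import numpy as np

def init_params_sketch(layer_dims, seed=42):
    """Hypothetical variant of the initializer using a local random generator."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        # W^[l] has one row per unit in layer l and one column per unit in layer l-1
        params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        # b^[l] is a column vector so it broadcasts across examples
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

params = init_params_sketch([4, 3, 1])
shapes = {name: value.shape for name, value in params.items()}
print(shapes)
```

Each `W` has shape `(layer_dims[l], layer_dims[l-1])` and each `b` has shape `(layer_dims[l], 1)`, which is what the asserts in the function check.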

## Activation functions

* Sigmoid $F(Z)$, range $(0, 1)$

    $$ \begin{align*} F(Z) &= \frac{1}{1 + e^{-Z}} = \sigma(Z) \\
     F^{\prime}(Z) &= F(Z)(1 - F(Z)) \end{align*}$$
    
* Tanh $F(Z)$, range $(-1, 1)$

    $$\begin{align*} F(Z) &= \frac{e^Z - e^{-Z}}{e^Z + e^{-Z}} = \tanh(Z) \\
       F^{\prime}(Z) &= 1 - F(Z)^2 \end{align*}$$
       
* ReLU $F(Z)$, range $[0, +\infty)$

    $$\begin{align*} F(Z) &= \max(0,Z) \\
       F^{\prime}(Z) &= \begin{cases} 1, & \text{ if } Z > 0 \\
                         \text{undefined}, & \text{ if } Z = 0 \\
                         0, & \text{ if } Z < 0 \end{cases} 
                         \end{align*}$$
                         
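During backpropagation, the gradient with respect to $Z^{[l]}$ is the elementwise product of the upstream gradient $dA^{[l]}$ and the derivative of the activation function $g$: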

$$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]}) $$

TODO: verify gradient equation from notes
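
One way to check the derivative formulas above without the notes is a central finite-difference comparison. The sketch below is self-contained and uses its own small helper functions (`numerical_derivative`, `sigmoid_fn`, `relu_fn` are hypothetical names, not the functions defined in this notebook). The test points deliberately avoid $Z = 0$, where the ReLU derivative is undefined.

```python
import numpy as np

def numerical_derivative(f, Z, eps=1e-6):
    """Central difference approximation of f'(Z), applied elementwise."""
    return (f(Z + eps) - f(Z - eps)) / (2 * eps)

def sigmoid_fn(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def relu_fn(Z):
    return np.maximum(Z, 0)

# Test points chosen away from Z = 0
Z = np.array([[-2.0, -0.5, 0.3, 1.7]])

analytic = {
    "sigmoid": sigmoid_fn(Z) * (1 - sigmoid_fn(Z)),
    "tanh": 1 - np.tanh(Z) ** 2,
    "relu": (Z > 0).astype(float),
}
numeric = {
    "sigmoid": numerical_derivative(sigmoid_fn, Z),
    "tanh": numerical_derivative(np.tanh, Z),
    "relu": numerical_derivative(relu_fn, Z),
}

# Largest absolute disagreement per activation
max_errors = {name: float(np.max(np.abs(analytic[name] - numeric[name])))
              for name in analytic}
print(max_errors)
```

If the analytic formulas are right, every entry of `max_errors` should be tiny compared with the derivative values themselves.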

In [11]:
def sigmoid(Z):
    """ Sigmoid activation function
    
    :param Z: -- the input of the activation function
    
    :return: sigmoid function applied to vector Z
    """
    A = (1 + np.exp(-Z))**(-1)
    cache = Z
    return A, cache
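
For inputs with a large magnitude, `np.exp(-Z)` can overflow. A minimal sketch of a numerically safer variant follows; `stable_sigmoid` is a hypothetical name and is not used elsewhere in this notebook. It only ever exponentiates a non-positive number.

```python
import numpy as np

def stable_sigmoid(Z):
    """Sigmoid that only exponentiates non-positive values."""
    Z = np.asarray(Z, dtype=float)
    t = np.exp(-np.abs(Z))  # always in [0, 1], so it cannot overflow
    # For Z >= 0 use 1 / (1 + e^{-Z}); for Z < 0 use e^{Z} / (1 + e^{Z})
    return np.where(Z >= 0, 1.0 / (1.0 + t), t / (1.0 + t))

Z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
A = stable_sigmoid(Z)
print(A)
```

Both branches are algebraically the same function, so this should agree with `sigmoid` wherever the original does not overflow.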

In [12]:
# TODO: figure out how the activation_cache works

def sigmoid_gradient(dA, cache):
    """ Gradient of sigmoid function """
    
    Z = cache
    s = np.power((1 + np.exp(-Z)),-1)
    dZ = dA * s * (1 - s)
    
    assert (dZ.shape == Z.shape)
    
    return dZ

In [13]:
def relu(Z):
    """ Relu activation function
    
    :param Z: -- the input of the activation function
    
    :return: relu function applied to vector Z
    """
    A = np.maximum(Z, 0)
    
    assert(A.shape == Z.shape)
    cache = Z
    return A, cache

In [14]:
def relu_gradient(dA, cache):
    """ Gradient of the relu function """
    Z = cache
    dZ = np.array(dA, copy=True)
    assert (dZ.shape == Z.shape)
    
    dZ[Z <= 0] = 0
    
    return dZ


In [15]:
def tanh(Z):
    """ Tanh activation function
    
    :param Z: -- the input of the activation function
    
    :return: tanh function applied to vector Z
    """
    e_z = np.exp(Z)
    e_nz = np.exp(-Z)
    A = (e_z - e_nz)/(e_z + e_nz)
    return A, Z

In [16]:
def tanh_gradient(dA, cache):
    """ Gradient of the tanh function """
    
    Z = cache
    
    s = np.tanh(Z)
    # Chain rule: dZ = dA * (1 - tanh(Z)^2)
    dZ = dA * (1 - np.power(s, 2))
    
    assert (dZ.shape == Z.shape)

    return dZ
        

## Forward propagation

Implementing forward propagation

In [17]:
def linear_forward(A, W, b):
    """ Linear part of a layer's forward propagation 
    
    :param A: activation from previous layer (or input data)
    :param W: weights matrix 
    :param b: bias vector
    
    :return: Z -- the input of the activation function
    """
    Z = np.dot(W, A) + b
    
    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    
    return Z, cache
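
To make the shapes concrete, here is a small worked example of the linear step with made-up numbers: 3 input features, 2 units, and 4 examples stored as columns. The bias column vector is broadcast across the examples.

```python
import numpy as np

W = np.arange(6, dtype=float).reshape(2, 3)  # 2 units, 3 input features
A_prev = np.ones((3, 4))                     # 3 features, 4 examples (one per column)
b = np.array([[10.0], [20.0]])               # one bias per unit

# b has shape (2, 1) and is broadcast across the 4 example columns
Z = W @ A_prev + b
print(Z.shape)
print(Z)
```

The result has one row per unit and one column per example, which is the shape the assert in `linear_forward` checks.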

In [18]:
def linear_activation_forward(A_prev, W, b, activation):
    """ Forward propagation for the LINEAR->ACTIVATION layer
    
    :param A_prev: activation from previous layer (or input data)
    :param W: weights matrix
    :param b: bias vector
    :param activation: activation function name of "sigmoid", 
                "relu", or "tanh"
                
    :return: A -- output of the activation function
    :return: cache -- contains "linear_cache" and "activation_cache"
    """
    Z, linear_cache =  linear_forward(A_prev, W, b)
    
    if activation == "relu":
        A, activation_cache = relu(Z)
        
    elif activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
        
    elif activation == "tanh":
        A, activation_cache = tanh(Z)
    
    else:
        raise(TypeError("Activation does not exist: {}".\
                        format(activation)))
        
    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return (A, cache)
    

In [19]:
def L_model_forward(X, parameters, activations):
    """ Forward propagation given a model specification """
    caches = []
    A = X
    L = len(parameters) //2 # Number of layers in the neural network
    
    assert L == len(activations)
    
    for l in range(1, L):
        A_prev = A
        activation = activations.popleft()
        
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], 
                                             parameters["b" + str(l)], 
                                             activation = activation)
        caches.append(cache)

    activation = activations.popleft()
    # Index the output layer with L so this also works for a single-layer model
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)], 
                                          parameters["b" + str(L)], 
                                          activation = activation)
    caches.append(cache)
    assert(AL.shape == (1,X.shape[1]))
    
    return (AL, caches)


## Cost Function

The model uses the cross-entropy cost $J$, given by the following formula: $$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left(y^{(i)}\log\left(a^{[L] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right)\right)$$

TODO: May need a different cost function depending on the notes

In [20]:
def compute_cost(AL, Y):
    """
    Cross entropy cost J
    """
    m = Y.shape[1]
    a = np.multiply(Y, np.log(AL))
    b = np.multiply((1 - Y), np.log(1 - AL))
    cost = - 1 * np.sum(a + b, axis=1) / m
    
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    return cost
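
If the output layer ever returns exactly 0 or 1, `np.log` in the cost produces infinities or NaNs. The sketch below is one possible guard, under the assumption that clipping the activations is acceptable; `clipped_cross_entropy` and its `eps` parameter are hypothetical and not part of the original code.

```python
import numpy as np

def clipped_cross_entropy(AL, Y, eps=1e-12):
    """Cross-entropy cost with activations clipped away from 0 and 1."""
    AL = np.clip(AL, eps, 1 - eps)
    m = Y.shape[1]
    losses = Y * np.log(AL) + (1 - Y) * np.log(1 - AL)
    return float(-np.sum(losses) / m)

Y = np.array([[1.0, 0.0, 1.0]])

# Predicting 0.5 for every example gives a cost of ln(2)
chance_cost = clipped_cross_entropy(np.full((1, 3), 0.5), Y)

# Fully saturated (and correct) predictions stay finite thanks to the clip
saturated_cost = clipped_cross_entropy(np.array([[1.0, 0.0, 1.0]]), Y)

print(chance_cost, saturated_cost)
```

The value $\ln 2 \approx 0.6931$ is a useful reference point: it is the cost of a model that outputs 0.5 for every example.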


## Backward propagation

Implementing backward propagation

In [21]:
def linear_backward(dZ, cache):
    """ Linear portion of backward propagation for a single layer """
    A_prev, W, b = cache
    m = A_prev.shape[1]
    
    dW = (1/m) * np.dot(dZ, A_prev.T)
    db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    
    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    
    return (dA_prev, dW, db)
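
The `dW` formula can be checked numerically on a toy problem. The sketch below is self-contained and assumes a squared-error cost $J = \frac{1}{2m}\sum (Z - T)^2$ purely for illustration (the notebook itself uses cross-entropy); the targets `T` and all array values are made up. With that cost, the per-example gradient is $dZ = Z - T$, and `dW` follows the same $\frac{1}{m}\, dZ\, A_{prev}^T$ convention as `linear_backward`.

```python
import numpy as np

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 5))  # 3 features, 5 examples
W = rng.standard_normal((2, 3))
b = rng.standard_normal((2, 1))
T = rng.standard_normal((2, 5))       # hypothetical targets
m = A_prev.shape[1]

def cost(W_):
    """Illustrative squared-error cost as a function of the weights."""
    Z = W_ @ A_prev + b
    return np.sum((Z - T) ** 2) / (2 * m)

# Analytic gradient, same convention as linear_backward
dZ = (W @ A_prev + b) - T
dW = (1 / m) * np.dot(dZ, A_prev.T)

# Central finite differences, one weight at a time
eps = 1e-6
dW_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus = W.copy()
        W_plus[i, j] += eps
        W_minus = W.copy()
        W_minus[i, j] -= eps
        dW_numeric[i, j] = (cost(W_plus) - cost(W_minus)) / (2 * eps)

max_abs_error = float(np.max(np.abs(dW - dW_numeric)))
print(max_abs_error)
```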


In [22]:
def linear_activation_backward(dA, cache, activation):
    """ Backward propagation for the LINEAR->ACTIVATION layer """
    
    linear_cache, activation_cache = cache
    
    if activation == "relu":
        dZ = relu_gradient(dA, activation_cache)
        
    elif activation == "sigmoid":
        dZ = sigmoid_gradient(dA, activation_cache)
        
    elif activation == "tanh":
        dZ = tanh_gradient(dA, activation_cache)
    
    else:
        raise(TypeError("Activation does not exist: {}".\
                        format(activation)))
    
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return (dA_prev, dW, db)

In [23]:
def L_model_backward(AL, Y, caches, activations):
    """ Implement backward propagation for the specified model """
    grads = {}
    L = len(caches)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)
    
    # Initialize the backpropagation
    
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    
    # Lth layer gradients
    
    activation = activations.pop()
    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = \
    linear_activation_backward(dAL, current_cache, activation)
    
    # Loop from l=L-2 to l=0
    
    for l in reversed(range(L-1)):
        
        # Use the activation configured for this layer rather than a fixed one
        activation = activations.pop()
        current_cache = caches[l]
        
        dA_prev_temp, dW_temp, db_temp = \
        linear_activation_backward(grads["dA" + str(l + 1)], current_cache, activation)
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        
    return grads
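
When the output layer is a sigmoid and the cost is cross-entropy, the initial `dAL` multiplied by the sigmoid derivative simplifies to `AL - Y`. The sketch below checks that identity on made-up values; it is only an illustration and does not call the functions above.

```python
import numpy as np

Z = np.array([[-1.5, 0.2, 2.0]])  # made-up pre-activations of the output layer
Y = np.array([[0.0, 1.0, 1.0]])   # made-up labels
AL = 1.0 / (1.0 + np.exp(-Z))

# Same initialization as in L_model_backward
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

# Chain rule through the sigmoid: dZ = dAL * s * (1 - s)
dZ_chain = dAL * AL * (1 - AL)

# Closed form for sigmoid + cross-entropy
dZ_short = AL - Y

print(np.allclose(dZ_chain, dZ_short))
```

The closed form also avoids dividing by `AL` or `1 - AL`, which can be very small when the output saturates.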
    

## Update parameters

Update parameters of the model using gradient descent.

In [24]:
def update_parameters(parameters, grads, learning_rate):
    """ Update parameters using gradient descent """
    L = len(parameters) // 2 # number of layers in the model
    
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - \
        learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - \
        learning_rate * grads["db" + str(l + 1)]
        
    return parameters
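
As an illustration of the update rule, the sketch below applies the same `parameter - learning_rate * gradient` step to a toy one-parameter problem, minimizing $(w - 1)^2$. The cost, starting value, and learning rate are made up for the example.

```python
import numpy as np

# Toy model with one weight; the made-up cost (w - 1)^2 has its minimum at w = 1
parameters = {"W1": np.array([[4.0]]), "b1": np.array([[0.0]])}
learning_rate = 0.1

for _ in range(50):
    # Gradient of (w - 1)^2 with respect to w; the bias is unused here
    grads = {"dW1": 2 * (parameters["W1"] - 1.0), "db1": np.zeros((1, 1))}
    parameters["W1"] = parameters["W1"] - learning_rate * grads["dW1"]
    parameters["b1"] = parameters["b1"] - learning_rate * grads["db1"]

final_w = float(parameters["W1"][0, 0])
print(final_w)
```

Each step shrinks the distance to the minimum by a factor of $1 - 2 \cdot 0.1 = 0.8$, so `final_w` ends up very close to 1.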
    

In [48]:
def train_model(X, Y, config):
    """ Run an L-Layer neural network using config file """
    
    # Unpack configuration file
    
    activations = config.get("activations")
    layers_dims = config.get("layers_dims") 
    learning_rate = config.get("learning_rate")
    num_iterations = config.get("num_iterations")
    print_cost = config.get("print_cost")
    random_seed = config.get("random_seed")
    
    np.random.seed(random_seed)
    costs = [] # keep track of costs
    
    # Initialize model
    
    # Pass the configured seed so it is not overridden by the default of 42
    parameters = initialize_parameters(layers_dims, seed=random_seed)
    
    for i in range(0, num_iterations):
        
        # Forward propagation
        
        fact = deque(activations.copy())
        AL, caches = L_model_forward(X, parameters, fact)
        
        # Compute cost
        
        cost = compute_cost(AL, Y)
        
        # Backward propagation
        
        bact = deque(activations.copy())
        grads = L_model_backward(AL, Y, caches, bact)
        
        # Update parameters
        
        parameters = update_parameters(parameters, grads, learning_rate)
        
        # Track cost and print values if specified
        if print_cost and i % 50 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
        if print_cost and i % 100 == 0:
            costs.append((i, cost))
            
    if print_cost:
        costs.append((i, cost))
             
    return parameters, costs
        

In [27]:
def predict(X, y, parameters, activations):
    """ Prediction using params from neural network """
    
    m = X.shape[1]
    n = len(parameters) // 2
    p = np.zeros((1,m))
    
    # Forward propagation
    
    facts = deque(activations.copy())
    probas, caches = L_model_forward(X, parameters, facts)
    
    # Convert probas to 0/1 predictions
    
    for i in range(0, probas.shape[1]):
        if probas[0,i] > 0.5:
            p[0,i] = 1
        else:
            p[0,i] = 0
            
    #print("Accuracy: "  + str(np.sum((p == y)/m)))
    return p
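
The thresholding loop in `predict` can also be written as a single vectorized comparison. A minimal sketch with made-up probabilities and labels:

```python
import numpy as np

probas = np.array([[0.1, 0.5, 0.51, 0.99]])  # made-up output probabilities
y = np.array([[0.0, 1.0, 1.0, 1.0]])         # made-up labels

# Strictly greater than 0.5 maps to 1, everything else to 0 (same rule as the loop)
p = (probas > 0.5).astype(float)

# Fraction of predictions that match the labels
accuracy = float(np.mean(p == y))
print(p, accuracy)
```

A probability of exactly 0.5 maps to class 0 here, matching the `> 0.5` test in the loop.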
    

In [28]:
def report_model(meta_config:dict):
    """ Generate a model report based on meta_config file """
    
    config_template = meta_config.get("config_template", None)
    learning_range = meta_config.get("learning_range", None)
    learning_step = meta_config.get("learning_step", None)
    X_train = meta_config.get("X_train", None)
    y_train = meta_config.get("y_train", None)
    X_test = meta_config.get("X_test", None)
    y_test = meta_config.get("y_test", None)
    X_validate = meta_config.get("X_validate", None)
    y_validate = meta_config.get("y_validate", None) 
    
    learning_min, learning_max = learning_range
    
    learning_rates = np.arange(learning_min, learning_max, learning_step)
    reports = []
    report = {}
    
    i = 0
    for learning_rate in learning_rates:
        
        config = config_template.copy()
        config["learning_rate"] = learning_rate
        
        # Train the model
        
        parameters, costs = train_model(X_train, y_train, config)
        
        # Predict on training set
        train_score = predict(X_train, y_train, parameters, 
                              config.get("activations"))
        
        # Predict on test set
        test_score = predict(X_test, y_test, parameters, 
                             config.get("activations"))
        
        # Predict on validation set
        validation_score = predict(X_validate, y_validate, 
                                   parameters, config.get("activations"))
        
        model_report = report.copy()
        model_report["learning_rate"] = learning_rate
        model_report["costs"] = costs
        model_report["train_parameters"] = parameters
        model_report["train_score"] = train_score
        model_report["test_score"] = test_score
        model_report["validation_score"] = validation_score
        
        reports.append(model_report)
        
        if i % 100 == 0:
            print("Computing the {}th model".format(i))
            
        i += 1
            
    return reports
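
`report_model` builds its grid with `np.arange`, which excludes the upper bound, and the NumPy documentation recommends `np.linspace` when the step is not an integer. A possible alternative is sketched below; the name `n_rates` and the choice of 20 points are assumptions for illustration.

```python
import numpy as np

learning_min, learning_max = 0.0005, 0.01
n_rates = 20  # assumed number of grid points

# linspace includes both endpoints and fixes the number of values up front
learning_rates = np.linspace(learning_min, learning_max, n_rates)
print(len(learning_rates), learning_rates[0], learning_rates[-1])
```

With these bounds and 20 points the spacing works out to 0.0005, the same step as in `meta_config`, but the grid includes 0.01.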
        
    

# Running the Model

In [None]:
#X_train, y_train

In [6]:
def load_data(filepath, y_label_value):
    x = np.load(filepath).transpose()
    y = np.full(x.shape[0], y_label_value)
    y = y.reshape(y.shape[0],1)
    print("X shape {}".format(x.shape))
    input_data = np.append(x, y, axis=1) 
    print("Input shape {}".format(input_data.shape))
    return input_data

In [7]:
def create_train_val_test(x, y):
    x_train, x_intermediate, y_train, y_intermediate = train_test_split(x, y, test_size=0.2, random_state=42)
    # Each x shape is printed alongside its matching y shape
    print("Train shape x: {}, y:{}".format(x_train.shape, y_train.shape))
    print("Intermediate shape x: {}, y:{}".format(x_intermediate.shape, y_intermediate.shape))
    x_val, x_test, y_val, y_test = train_test_split(x_intermediate, y_intermediate, test_size=0.5, random_state=42)
    print("Val shape x: {}, y:{}".format(x_val.shape, y_val.shape))
    print("Test shape x: {}, y:{}".format(x_test.shape, y_test.shape))
    return(x_train, y_train, x_val, y_val, x_test, y_test)

In [41]:
def load():
    filepath_fish = "/Users/karangm/Downloads/fish_1.npy"
    input_fish = load_data(filepath_fish, 0)
    filepath_fork = "/Users/karangm/Downloads/fork_1.npy"
    input_fork = load_data(filepath_fork, 1)
    input_data = np.concatenate((input_fish, input_fork), axis=0)
    np.random.shuffle(input_data)
    print("Input shape {}".format(input_data.shape))
    
    ## Split into x and y
    x = input_data[:, :65536]
    y = input_data[:, 65536]
    
    ## Split into train(80), validation(10), test(10)
    x_train, y_train, x_val, y_val, x_test, y_test = create_train_val_test(x, y)
    
    y_train = y_train.reshape(y_train.shape[0],1)
    y_val = y_val.reshape(y_val.shape[0],1)
    y_test = y_test.reshape(y_test.shape[0],1)
    
    return(x_train, y_train, x_val, y_val, x_test, y_test)

In [42]:
x_train, y_train, x_val, y_val, x_test, y_test = load()

X shape (1000, 65536)
Input shape (1000, 65537)
X shape (1000, 65536)
Input shape (1000, 65537)
Input shape (2000, 65537)
Train shape x: (1600, 65536), y:(1600,)
Intermediate shape x: (400, 65536), y:(400,)
Val shape x: (200, 65536), y:(200,)
Test shape x: (200, 65536), y:(200,)


In [44]:
# Need to figure out config for the data you have available
random_state = 42
config = {
    "activations": ["relu", "sigmoid"],
    "layers_dims" : [65536, 50, 1], #  2-layer model
    "learning_rate": 0.0075,
    "num_iterations" : 3000,
    "print_cost": True,
    "random_seed": random_state,
}

In [49]:
parameters, costs = train_model(x_train.T, y_train.T, config) 

Cost after iteration 0: 0.693155
Cost after iteration 50: 0.693078
Cost after iteration 100: 0.693011
Cost after iteration 150: 0.692952
Cost after iteration 200: 0.692899
Cost after iteration 250: 0.692852
Cost after iteration 300: 0.692809
Cost after iteration 350: 0.692770
Cost after iteration 400: 0.692733
Cost after iteration 450: 0.692699
Cost after iteration 500: 0.692666
Cost after iteration 550: 0.692635
Cost after iteration 600: 0.692605
Cost after iteration 650: 0.692576
Cost after iteration 700: 0.692548
Cost after iteration 750: 0.692519
Cost after iteration 800: 0.692491
Cost after iteration 850: 0.692463
Cost after iteration 900: 0.692435
Cost after iteration 950: 0.692407
Cost after iteration 1000: 0.692378
Cost after iteration 1050: 0.692349
Cost after iteration 1100: 0.692319
Cost after iteration 1150: 0.692289
Cost after iteration 1200: 0.692258
Cost after iteration 1250: 0.692227
Cost after iteration 1300: 0.692194
Cost after iteration 1350: 0.692161
Cost after iter

In [70]:
y_test_pred = predict(x_test.T, y_test.T, parameters, config.get("activations"))
y_test_pred = y_test_pred.T 
y_test_pred

array([[ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
      

In [69]:
y_test

array([[ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 0.],
       [ 1.],
      

In [71]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_test_pred)

0.44500000000000001
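
The visible part of `y_test_pred` above contains only 1s (the output is truncated, so this may not hold for every row). If a model predicts the same class for every example, its accuracy equals the share of that class in the labels. The sketch below shows this with made-up labels and plain NumPy; it does not use the notebook's data.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])  # made-up labels
y_pred = np.ones_like(y_true)                # a model that always predicts class 1

accuracy = float(np.mean(y_true == y_pred))
positive_fraction = float(np.mean(y_true))

# Count how many predictions fall in each class
counts = np.bincount(y_pred, minlength=2)

print(accuracy, positive_fraction, counts)
```

Checking the prediction counts per class this way is a quick test for a collapsed model before reading too much into the accuracy score.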

In [56]:
meta_config = {
    "config_template" : config,
    "learning_range": (0.0005, 0.01),
    "learning_step": 0.0005,
    "X_train": x_train.T,
    "y_train": y_train.T,
    "X_test": x_test.T,
    "y_test": y_test.T,
    "X_validate": x_val.T,
    "y_validate": y_val.T,
}

In [57]:
reports_partb = report_model(meta_config)

Cost after iteration 0: 0.693155
Cost after iteration 50: 0.693150
Cost after iteration 100: 0.693144
Cost after iteration 150: 0.693139


KeyboardInterrupt: 

In [None]:
len(reports_partb)

In [35]:
x_train.T

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])