# Back Propagation 
This notebook builds an understanding of back propagation by applying it to a small neural network model and then checking the resulting derivatives against estimates obtained with finite differences.

### Imports
Import the libraries needed to define the neural network and the back propagation algorithm.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

### Define Parameters 
Define the dimensions of the neural network model

In [2]:
# Set the seed so that the random values are reproducible
np.random.seed(0)

# Number of hidden layers
K = 5
# Number of neurons per hidden layer
D = 6
# Input dimension
D_i = 1
# Output dimension
D_o = 1

### Define Weight and Bias Structure 
Create one list for the weights and one for the biases. Each list has K+1 entries, one per layer of parameters.

In [3]:
all_weights = [None] * (K+1)
all_biases = [None] * (K+1)

### Define Weight and Bias Values (First and Last)
Draw random values for the first and last layers of weights and biases. Each layer's parameters are held in one array, and each array is stored as an element of the corresponding list.

In [4]:
all_weights[0] = np.random.normal(size=(D, D_i)) # First layer of weights
all_weights[-1] = np.random.normal(size=(D_o, D)) # Last layer of weights
all_biases[0] = np.random.normal(size =(D,1)) # First layer of biases
all_biases[-1]= np.random.normal(size =(D_o,1)) # Last layer of biases

### Define Weight and Bias Values (Intermediate)
Draw random values for the weights and biases of the intermediate hidden layers. As before, each layer's parameters are held in one array, and each array is stored as an element of the corresponding list.

In [5]:
for layer in range(1,K):
  all_weights[layer] = np.random.normal(size=(D,D))
  all_biases[layer] = np.random.normal(size=(D,1))
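
As a quick sanity check on the structure built so far, the sketch below rebuilds the two lists with the same dimensions and collects the shape of each array. The names `weight_shapes` and `bias_shapes` are introduced here only for illustration.

```python
import numpy as np

np.random.seed(0)
K, D, D_i, D_o = 5, 6, 1, 1  # same dimensions as above

all_weights = [None] * (K + 1)
all_biases = [None] * (K + 1)

# First and last layers
all_weights[0] = np.random.normal(size=(D, D_i))
all_weights[-1] = np.random.normal(size=(D_o, D))
all_biases[0] = np.random.normal(size=(D, 1))
all_biases[-1] = np.random.normal(size=(D_o, 1))

# Intermediate hidden layers
for layer in range(1, K):
    all_weights[layer] = np.random.normal(size=(D, D))
    all_biases[layer] = np.random.normal(size=(D, 1))

# Hypothetical helper lists, used only to inspect the shapes
weight_shapes = [w.shape for w in all_weights]
bias_shapes = [b.shape for b in all_biases]
print(weight_shapes)
print(bias_shapes)
```

Each weight matrix has as many columns as the previous layer has outputs, which is what allows the matrix products in the forward pass to line up.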

### Define the Rectified Linear Unit (ReLU) Function

In [6]:
def ReLU(preactivation):
  activation = preactivation.clip(0.0)
  return activation
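
To see what this function does, the sketch below applies a ReLU of the same form to a small hand-picked column vector. The array `relu_demo_input` is a made-up example, not a value from the network.

```python
import numpy as np

def ReLU(preactivation):
    # Clip everything below zero to zero; positive values pass through unchanged
    return preactivation.clip(0.0)

relu_demo_input = np.array([[-2.0], [0.0], [3.5]])  # hypothetical values
relu_demo_output = ReLU(relu_demo_input)
print(relu_demo_output)
```

The negative entry becomes zero, while the zero and positive entries are returned unchanged.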

### Define Neural Network 
Define a function that computes the forward pass. It takes the input, weights, and biases and returns the network output together with the pre-activations and activations of every layer, which the backward pass needs later.

In [7]:
def compute_network_output(net_input, all_weights, all_biases):

  # Retrieve number of layers
  K = len(all_weights) -1
  
  # Store the pre-activations (all_f) and the activations (all_h)
  all_f = [None] * (K+1)
  all_h = [None] * (K+1)

  all_h[0] = net_input

  # Calculate the pre-activations and activations for each hidden layer
  for layer in range(K):
    # Compute the pre-activations at this layer
    all_f[layer] = all_biases[layer] + np.matmul(all_weights[layer], all_h[layer])
    # Apply the activation function to obtain the input to the next layer
    all_h[layer+1] = ReLU(all_f[layer])

  # Compute the output of the neural network from the last hidden layer
  all_f[K] = all_biases[K] + np.matmul(all_weights[K], all_h[K])

  # Retrieve the output
  net_output = all_f[K]

  return net_output, all_f, all_h
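
The forward pass implemented above can be summarized in equations. The symbols are a notational assumption chosen to mirror the code: $\mathbf{x}$ for `net_input`, $\boldsymbol\Omega_k$ for `all_weights[k]`, $\boldsymbol\beta_k$ for `all_biases[k]`, $\mathbf{f}_k$ for `all_f[k]`, and $\mathbf{h}_k$ for `all_h[k]`.

$$
\begin{aligned}
\mathbf{h}_0 &= \mathbf{x}, \\
\mathbf{f}_k &= \boldsymbol\beta_k + \boldsymbol\Omega_k \mathbf{h}_k, & k &= 0, \dots, K, \\
\mathbf{h}_{k+1} &= \mathrm{ReLU}[\mathbf{f}_k], & k &= 0, \dots, K-1, \\
\text{net\_output} &= \mathbf{f}_K .
\end{aligned}
$$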

### Define Input Values

In [8]:
net_input = np.ones((D_i,1)) * 1.2

### Compute Neural Network
Compute the neural network using the defined input value and print the output of the neural network

In [9]:
net_output, all_f, all_h = compute_network_output(net_input,all_weights, all_biases)
print("True output = %3.3f, Your answer = %3.3f"%(1.907, net_output[0,0]))

True output = 1.907, Your answer = 1.907


### Define Loss Function 
Use the least squares loss, the sum of squared differences between the network output and the target, as the loss function for this neural network.

In [10]:
def least_squares_loss(net_output, y):
  return np.sum((net_output-y) * (net_output-y))

### Define the Derivative of the Least Squares Loss Function (with respect to net_output)
Define a function to take the derivative of the least squares loss function with respect to the variable net_output

In [11]:
def d_loss_d_output(net_output, y):
    return 2*(net_output -y)
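
The derivative can be checked in isolation before it is used in the backward pass. The sketch below compares it with a forward-difference estimate at one made-up point; `demo_output`, `demo_target`, and `step` are illustrative names and values, not part of the network above.

```python
import numpy as np

def least_squares_loss(net_output, y):
    return np.sum((net_output - y) * (net_output - y))

def d_loss_d_output(net_output, y):
    return 2 * (net_output - y)

demo_output = np.array([[1.5]])   # hypothetical network output
demo_target = np.array([[20.0]])  # hypothetical target
step = 1e-6                       # finite-difference step size

# Analytic derivative: 2 * (1.5 - 20.0) = -37.0
analytic = d_loss_d_output(demo_output, demo_target)[0, 0]

# Forward-difference estimate of the same derivative
fd_estimate = (least_squares_loss(demo_output + step, demo_target)
               - least_squares_loss(demo_output, demo_target)) / step

print(analytic, fd_estimate)
```

The two numbers should agree to several decimal places; the small gap comes from the finite step size and floating-point rounding.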

### Define Output Values
Define the target output value that the least squares loss compares the network output against.

In [12]:
y = np.ones((D_o,1)) * 20.0

### Compute the Loss
Compute the loss between the network output and the target value.

In [15]:
loss = least_squares_loss(net_output, y)
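
As a worked example of the arithmetic, the sketch below evaluates the same loss using the rounded output 1.907 printed earlier and the target 20.0. Because the output is rounded, the result only approximates the value stored in `loss`; `demo_output` and `demo_loss` are illustrative names.

```python
import numpy as np

def least_squares_loss(net_output, y):
    return np.sum((net_output - y) * (net_output - y))

demo_output = np.array([[1.907]])  # rounded network output from the earlier print
y = np.ones((1, 1)) * 20.0         # target value

# (1.907 - 20.0)^2 = (-18.093)^2, approximately 327.36
demo_loss = least_squares_loss(demo_output, y)
print("Approximate loss = %3.3f" % demo_loss)
```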

### Define an Indicator Function 
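Define an indicator function that returns 1 where its input is positive and 0 elsewhere. This is the derivative of ReLU (taken to be 0 at the origin) and is used in the backward pass.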

In [16]:
def indicator_function(x):
  x_in = np.array(x)
  x_in[x_in>0] = 1
  x_in[x_in<=0] = 0
  return x_in
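
The sketch below applies the same indicator function to a few made-up pre-activations and compares the result with an equivalent one-line boolean mask. The names `demo_f`, `demo_mask`, and `alternative_mask` are introduced only for this example.

```python
import numpy as np

def indicator_function(x):
    x_in = np.array(x)   # copy, so the caller's array is not modified
    x_in[x_in > 0] = 1
    x_in[x_in <= 0] = 0
    return x_in

demo_f = np.array([[-1.5], [0.0], [2.0]])  # hypothetical pre-activations
demo_mask = indicator_function(demo_f)

# An equivalent formulation using a boolean comparison
alternative_mask = (demo_f > 0).astype(float)

print(demo_mask)
print(np.array_equal(demo_mask, alternative_mask))
```

Only the strictly positive entry maps to 1, so gradients flow back only through units that were active in the forward pass.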

## Backward Pass

### Define a Backward Pass
Define a function for the backward pass. Starting from the output, it computes the derivatives of the loss with respect to the pre-activations, activations, biases, and weights of each layer, and returns the derivatives with respect to the weights and the biases.

In [17]:
# Define a function for the main backward pass
def backward_pass(all_weights, all_biases, all_f, all_h, y):
  # Retrieve the number of hidden layers from the parameters rather than from a global variable
  K = len(all_weights) - 1

  # Store the derivatives dl_dweights and dl_dbiases in lists
  all_dl_dweights = [None] * (K+1)
  all_dl_dbiases = [None] * (K+1)
  # Store the derivatives of the loss with respect to the pre-activations and activations in lists
  all_dl_df = [None] * (K+1)
  all_dl_dh = [None] * (K+1)

  # Compute the derivative of the loss with respect to the network output
  all_dl_df[K] = np.array(d_loss_d_output(all_f[K],y))

  # Work backward from the output layer to the first layer
  for layer in range(K,-1,-1):
    # Calculate the derivatives of the loss with respect to the biases at this layer
    all_dl_dbiases[layer] = all_dl_df[layer]

    # Calculate the derivatives of the loss with respect to the weights at this layer
    all_dl_dweights[layer] = np.matmul(all_dl_df[layer], all_h[layer].T)

    # Calculate the derivatives of the loss with respect to the activations feeding this layer
    all_dl_dh[layer] = np.matmul(all_weights[layer].T, all_dl_df[layer])

    if layer > 0:
      # Calculate the derivatives of the loss with respect to the previous pre-activations,
      # masking by the derivative of ReLU
      all_dl_df[layer-1] = indicator_function(all_f[layer-1]) * all_dl_dh[layer]

  return all_dl_dweights, all_dl_dbiases
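
The recursion in this function can be written as equations. The notation is an assumption chosen to mirror the code: $\ell$ is the loss, $\boldsymbol\Omega_k$ is `all_weights[k]`, $\boldsymbol\beta_k$ is `all_biases[k]`, $\mathbf{f}_k$ is `all_f[k]`, $\mathbf{h}_k$ is `all_h[k]`, $\mathbb{I}[\cdot]$ is the indicator function, and $\odot$ is the elementwise product.

$$
\begin{aligned}
\frac{\partial \ell}{\partial \mathbf{f}_K} &= 2(\mathbf{f}_K - \mathbf{y}), \\
\frac{\partial \ell}{\partial \boldsymbol\beta_k} &= \frac{\partial \ell}{\partial \mathbf{f}_k}, \\
\frac{\partial \ell}{\partial \boldsymbol\Omega_k} &= \frac{\partial \ell}{\partial \mathbf{f}_k}\, \mathbf{h}_k^{T}, \\
\frac{\partial \ell}{\partial \mathbf{f}_{k-1}} &= \mathbb{I}[\mathbf{f}_{k-1} > 0] \odot \left( \boldsymbol\Omega_k^{T} \frac{\partial \ell}{\partial \mathbf{f}_k} \right), \qquad k = K, \dots, 1 .
\end{aligned}
$$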

### Compute the Backward Pass

In [18]:
all_dl_dweights, all_dl_dbiases = backward_pass(all_weights, all_biases, all_f, all_h, y)

## Confirm Backward Pass
Confirm that the backward pass is correct by estimating the derivatives of the loss with respect to the biases and weights using finite differences and comparing them with the back propagation results.

In [19]:
np.set_printoptions(precision=3)

### Define Finite-Difference Derivative Structure 
Define lists to store the finite-difference estimates of the derivatives with respect to the weights and biases.

In [20]:
all_dl_dweights_fd = [None] * (K+1)
all_dl_dbiases_fd = [None] * (K+1)

### Define the Finite-Difference Step Size

In [21]:
delta_fd = 0.000001
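
The checks that follow use a forward difference, (f(x + delta) - f(x)) / delta. The sketch below shows that estimate on a simple scalar function whose derivative is known exactly, with a central difference included only as a commonly used alternative for comparison; it is not what this notebook uses. The function `cube` and the point `x0` are made up for the example.

```python
def cube(x):
    # Simple test function with known derivative 3 * x**2
    return x ** 3

x0 = 2.0             # hypothetical evaluation point
delta_fd = 0.000001  # same step size as above
true_derivative = 3 * x0 ** 2  # 12.0

# Forward difference: perturb in one direction only
forward_estimate = (cube(x0 + delta_fd) - cube(x0)) / delta_fd

# Central difference (alternative technique): perturb in both directions
central_estimate = (cube(x0 + delta_fd) - cube(x0 - delta_fd)) / (2 * delta_fd)

print(true_derivative, forward_estimate, central_estimate)
```

Both estimates land close to the exact value, which is why a small tolerance is used when comparing finite differences with back propagation.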

### Test the Derivatives of the Bias Vectors

In [22]:
for layer in range(K+1):
  dl_dbias  = np.zeros_like(all_dl_dbiases[layer])
  # For every element in the bias vector
  for row in range(all_biases[layer].shape[0]):
    # Copy the biases and perturb one element at a time
    all_biases_copy = [np.array(x) for x in all_biases]
    all_biases_copy[layer][row] += delta_fd
    network_output_1, *_ = compute_network_output(net_input, all_weights, all_biases_copy)
    network_output_2, *_ = compute_network_output(net_input, all_weights, all_biases)
    dl_dbias[row] = (least_squares_loss(network_output_1, y) - least_squares_loss(network_output_2,y))/delta_fd
  all_dl_dbiases_fd[layer] = np.array(dl_dbias)
  print("-----------------------------------------------")
  print("Bias %d, derivatives from backprop:"%(layer))
  print(all_dl_dbiases[layer])
  print("Bias %d, derivatives from finite differences"%(layer))
  print(all_dl_dbiases_fd[layer])
  if np.allclose(all_dl_dbiases_fd[layer],all_dl_dbiases[layer],rtol=1e-05, atol=1e-08, equal_nan=False):
    print("Success!  Derivatives match.")
  else:
    print("Failure!  Derivatives different.")

-----------------------------------------------
Bias 0, derivatives from backprop:
[[ -4.48594876]
 [  4.94744466]
 [  6.81152875]
 [ -3.88346281]
 [-24.93526672]
 [  0.        ]]
Bias 0, derivatives from finite differences
[[ -4.48594881]
 [  4.94744461]
 [  6.81152875]
 [ -3.88346285]
 [-24.93526625]
 [  0.        ]]
Success!  Derivatives match.
-----------------------------------------------
Bias 1, derivatives from backprop:
[[ -0.        ]
 [-11.29689608]
 [  0.        ]
 [  0.        ]
 [-10.72177079]
 [  0.        ]]
Bias 1, derivatives from finite differences
[[  0.        ]
 [-11.29689599]
 [  0.        ]
 [  0.        ]
 [-10.72177071]
 [  0.        ]]
Success!  Derivatives match.
-----------------------------------------------
Bias 2, derivatives from backprop:
[[-0.        ]
 [-0.        ]
 [ 0.93788659]
 [ 0.        ]
 [-9.99335294]
 [ 0.50756642]]
Bias 2, derivatives from finite differences
[[ 0.        ]
 [ 0.        ]
 [ 0.93788645]
 [ 0.        ]
 [-9.99335293]
 [ 0.50

### Test the Derivatives of the Weight Matrices

In [23]:
for layer in range(K+1):
  dl_dweight  = np.zeros_like(all_dl_dweights[layer])
  # For every element in the weight matrix
  for row in range(all_weights[layer].shape[0]):
    for col in range(all_weights[layer].shape[1]):
      # Copy the weights and perturb one element at a time
      all_weights_copy = [np.array(x) for x in all_weights]
      all_weights_copy[layer][row][col] += delta_fd
      network_output_1, *_ = compute_network_output(net_input, all_weights_copy, all_biases)
      network_output_2, *_ = compute_network_output(net_input, all_weights, all_biases)
      dl_dweight[row][col] = (least_squares_loss(network_output_1, y) - least_squares_loss(network_output_2,y))/delta_fd
  all_dl_dweights_fd[layer] = np.array(dl_dweight)
  print("-----------------------------------------------")
  print("Weight %d, derivatives from backprop:"%(layer))
  print(all_dl_dweights[layer])
  print("Weight %d, derivatives from finite differences"%(layer))
  print(all_dl_dweights_fd[layer])
  if np.allclose(all_dl_dweights_fd[layer],all_dl_dweights[layer],rtol=1e-05, atol=1e-08, equal_nan=False):
    print("Success!  Derivatives match.")
  else:
    print("Failure!  Derivatives different.")

-----------------------------------------------
Weight 0, derivatives from backprop:
[[ -5.38313851]
 [  5.93693359]
 [  8.1738345 ]
 [ -4.66015537]
 [-29.92232006]
 [  0.        ]]
Weight 0, derivatives from finite differences
[[ -5.38313856]
 [  5.93693352]
 [  8.17383454]
 [ -4.66015541]
 [-29.92231947]
 [  0.        ]]
Success!  Derivatives match.
-----------------------------------------------
Weight 1, derivatives from backprop:
[[  0.           0.           0.           0.           0.
    0.        ]
 [-32.51134334  -6.7991913  -18.28231837 -34.14764932 -42.19558628
    0.        ]
 [  0.           0.           0.           0.           0.
    0.        ]
 [  0.           0.           0.           0.           0.
    0.        ]
 [-30.85618994  -6.45304428 -17.35156504 -32.40919155 -40.04740781
    0.        ]
 [  0.           0.           0.           0.           0.
    0.        ]]
Weight 1, derivatives from finite differences
[[  0.           0.           0.           0.   