# Backpropagation

## Feed forward

In the following cells, we define the functions that set up our neural network.
Namely, an activation function, $\sigma(z)$, its derivative, $\sigma'(z)$, a function to initialise the weights and biases, and a function that calculates every activation of the network by feeding forward.

Recall the feed-forward equations,
$$ \mathbf{a}^{(n)} = \sigma(\mathbf{z}^{(n)}) $$
$$ \mathbf{z}^{(n)} = \mathbf{W}^{(n)}\mathbf{a}^{(n-1)} + \mathbf{b}^{(n)} $$
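
A minimal sketch of one layer of these equations, applied to a batch stored column-wise so that each column of `a_prev` is one training example. The layer sizes and the names `layer_forward`, `W`, `b` and `a_prev` are illustrative assumptions. $\sigma$ is the logistic function defined in the next paragraph, and the bias of shape `(n_out, 1)` broadcasts across the batch:

```python
import numpy as np

def sigma(z):
    # Logistic activation, applied elementwise.
    return 1 / (1 + np.exp(-z))

def layer_forward(W, b, a_prev):
    # z^(n) = W^(n) a^(n-1) + b^(n); b has shape (n_out, 1) and broadcasts over columns.
    z = W @ a_prev + b
    # a^(n) = sigma(z^(n))
    return sigma(z), z

rng = np.random.default_rng(0)
W = rng.random((3, 2))        # hypothetical layer: 2 inputs -> 3 neurons
b = rng.random((3, 1))
a_prev = rng.random((2, 5))   # 5 hypothetical training examples, one per column
a, z = layer_forward(W, b, a_prev)
print(a.shape, z.shape)       # (3, 5) (3, 5)
```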

In this worksheet we will use the *logistic function* as our activation function, rather than the more familiar $\tanh$.
$$ \sigma(\mathbf{z}) = \frac{1}{1 + \exp(-\mathbf{z})} $$
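
The code below implements $\sigma'$ as `np.cosh(z/2)**(-2) / 4`. Differentiating the logistic function shows that this matches the more familiar form $\sigma(1 - \sigma)$:
$$ \sigma'(z) = \frac{\exp(-z)}{(1 + \exp(-z))^2} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) $$
$$ \sigma'(z) = \frac{1}{\left(\exp(z/2) + \exp(-z/2)\right)^2} = \frac{1}{4\cosh^2(z/2)} $$
The second line follows from multiplying the numerator and denominator of the first fraction by $\exp(z)$.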

There is no need to edit the following cells; they do not form part of the assessment.
You may wish to study how they work, though.

**Run the following cells before continuing.**

In [3]:
# PACKAGE
import numpy as np
import matplotlib.pyplot as plt
import numpy.random as random

In [5]:
# PACKAGE
# First load the worksheet dependencies.
# Here is the activation function and its derivative.
sigma = lambda z : 1 / (1 + np.exp(-z))
d_sigma = lambda z : np.cosh(z/2)**(-2) / 4

def reset_network(n1=6, n2=7):
    """
    This function initialises the network's weights and biases for its 1 -> n1 -> n2 -> 2 structure.
    """
    global W1, W2, W3, b1, b2, b3 
    W1 = random.rand(n1, 1)/2
    W2 = random.rand(n2, n1)/2
    W3 = random.rand(2, n2)/2
    b1 = random.rand(n1, 1)/2
    b2 = random.rand(n2, 1)/2
    b3 = random.rand(2, 1)/2

def network_function(a0):
    """
    This function feeds the input a0 forward through each layer.
    It returns the input together with every activation and weighted sum.
    """
    z1 = W1@a0 + b1 
    a1 = sigma(z1)
    z2 = W2@a1 + b2 
    a2 = sigma(z2)
    z3 = W3@a2 + b3 
    a3 = sigma(z3)
    return a0, a1, z1, a2, z2, a3, z3 

def cost1(x, y):
    """
    This function calculates the cost: the squared error averaged over the training examples.
    """
    return np.linalg.norm(network_function(x)[-1] - y)**2 / x.size


## Backward pass
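
Before the gradient code, it may help to write out the chain rule it implements. For a single training example the cost is $C = \lVert \mathbf{a}^{(3)} - \mathbf{y} \rVert^2$ (`cost1` then averages this over the `x.size` examples), and the Jacobians for the third layer factor as
$$ \frac{\partial C}{\partial \mathbf{W}^{(3)}} = \frac{\partial C}{\partial \mathbf{a}^{(3)}} \frac{\partial \mathbf{a}^{(3)}}{\partial \mathbf{z}^{(3)}} \frac{\partial \mathbf{z}^{(3)}}{\partial \mathbf{W}^{(3)}}, \qquad \frac{\partial C}{\partial \mathbf{b}^{(3)}} = \frac{\partial C}{\partial \mathbf{a}^{(3)}} \frac{\partial \mathbf{a}^{(3)}}{\partial \mathbf{z}^{(3)}} \frac{\partial \mathbf{z}^{(3)}}{\partial \mathbf{b}^{(3)}} $$
with
$$ \frac{\partial C}{\partial \mathbf{a}^{(3)}} = 2(\mathbf{a}^{(3)} - \mathbf{y}), \quad \frac{\partial \mathbf{a}^{(3)}}{\partial \mathbf{z}^{(3)}} = \sigma'(\mathbf{z}^{(3)}), \quad \frac{\partial \mathbf{z}^{(3)}}{\partial \mathbf{W}^{(3)}} = \mathbf{a}^{(2)}, \quad \frac{\partial \mathbf{z}^{(3)}}{\partial \mathbf{b}^{(3)}} = 1 $$
For the second layer the chain continues through $\mathbf{a}^{(2)}$, using $\partial \mathbf{z}^{(3)} / \partial \mathbf{a}^{(2)} = \mathbf{W}^{(3)}$:
$$ \frac{\partial C}{\partial \mathbf{W}^{(2)}} = \frac{\partial C}{\partial \mathbf{a}^{(3)}} \frac{\partial \mathbf{a}^{(3)}}{\partial \mathbf{z}^{(3)}} \frac{\partial \mathbf{z}^{(3)}}{\partial \mathbf{a}^{(2)}} \frac{\partial \mathbf{a}^{(2)}}{\partial \mathbf{z}^{(2)}} \frac{\partial \mathbf{z}^{(2)}}{\partial \mathbf{W}^{(2)}}, \qquad \frac{\partial \mathbf{z}^{(2)}}{\partial \mathbf{W}^{(2)}} = \mathbf{a}^{(1)} $$
These products are schematic. In the code, the $\sigma'$ factors are applied elementwise, and the matrix products with `W3.T`, `a2.T` and `a1.T` arrange the shapes and sum over the training examples.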

In [None]:
###             Gradient function

def J_W3(x, y):
    """
    This function computes the Jacobian of the cost with respect to the third-layer weights, W3.
    """
    a0, a1, z1, a2, z2, a3, z3 = network_function(x)
    #       dC/da3
    J = 2*(a3 - y)
    #       da3/dz3, the derivative of sigma at z3 (elementwise)
    J = J*d_sigma(z3)
    #       dz3/dW3 = a2; the matrix product sums over the training examples,
    # and dividing by their number gives the average
    J = J @ a2.T / x.size
    return J

def J_b3(x, y):
    """
    This function computes the Jacobian of the cost with respect to the third-layer biases, b3.
    """
    # Unpack in the same order that network_function returns its values.
    a0, a1, z1, a2, z2, a3, z3 = network_function(x)
    # dC/da3 and da3/dz3: the same first two terms as in J_W3.
    J = 2 * (a3 - y)
    J = J * d_sigma(z3)
    # dz3/db3 = 1, so no further multiplication is needed.
    # Sum over the training examples and divide to take the average.
    J = np.sum(J, axis=1, keepdims=True) / x.size
    return J


In [None]:
###             Gradient function

def J_W2(x, y):
    """
    This function computes the Jacobian of the cost with respect to the second-layer weights, W2.
    """
    a0, a1, z1, a2, z2, a3, z3 = network_function(x)
    #       dC/da3
    J = 2*(a3 - y)
    #       da3/dz3, the derivative of sigma at z3
    J = J*d_sigma(z3)
    #       dz3/da2 = W3
    J = W3.T @ J
    #       da2/dz2, the derivative of sigma at z2
    J = J*d_sigma(z2)
    #       dz2/dW2 = a1; the matrix product sums over the training examples,
    # and dividing by their number gives the average
    J = J @ a1.T / x.size
    return J

def J_b2(x, y):
    """
    This function computes the Jacobian of the cost with respect to the second-layer biases, b2.
    """
    a0, a1, z1, a2, z2, a3, z3 = network_function(x)
    # dC/da3 and da3/dz3
    J = 2 * (a3 - y)
    J = J * d_sigma(z3)
    # dz3/da2 = W3, then da2/dz2
    J = W3.T @ J
    J = J * d_sigma(z2)
    # dz2/db2 = 1; sum over the training examples and divide to take the average.
    J = np.sum(J, axis=1, keepdims=True) / x.size
    return J
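
Extending the chain rule one layer further back, as `J_W2` does, suggests a recursion that works for any depth. Writing $\boldsymbol{\delta}^{(n)} = \partial C / \partial \mathbf{z}^{(n)}$, each step back is $\boldsymbol{\delta}^{(n-1)} = \bigl((\mathbf{W}^{(n)})^{\mathsf T} \boldsymbol{\delta}^{(n)}\bigr) \odot \sigma'(\mathbf{z}^{(n-1)})$. The following is a sketch under stated assumptions, not part of the worksheet: the list-based helpers `forward`, `backward`, `cost` and `fd_entry` are hypothetical, and the number of examples is taken as `x.shape[1]`, which equals `x.size` for the single-input network above. It computes every layer's gradients this way and checks a few entries of the second-layer weight gradient against finite differences:

```python
import numpy as np

sigma = lambda z: 1 / (1 + np.exp(-z))
d_sigma = lambda z: np.cosh(z / 2) ** (-2) / 4

def forward(Ws, bs, a0):
    # Returns the activations [a0, a1, ...] and weighted sums [z1, z2, ...].
    activations, sums = [a0], []
    for W, b in zip(Ws, bs):
        z = W @ activations[-1] + b
        sums.append(z)
        activations.append(sigma(z))
    return activations, sums

def backward(Ws, bs, x, y):
    # Backpropagate delta = dC/dz from the output layer towards the input.
    activations, sums = forward(Ws, bs, x)
    n_examples = x.shape[1]
    delta = 2 * (activations[-1] - y) * d_sigma(sums[-1])
    grads_W, grads_b = [None] * len(Ws), [None] * len(Ws)
    for n in reversed(range(len(Ws))):
        # activations[n] is the input to layer Ws[n].
        grads_W[n] = delta @ activations[n].T / n_examples
        grads_b[n] = np.sum(delta, axis=1, keepdims=True) / n_examples
        if n > 0:
            delta = (Ws[n].T @ delta) * d_sigma(sums[n - 1])
    return grads_W, grads_b

def cost(Ws, bs, x, y):
    # Squared error averaged over the training examples.
    return np.linalg.norm(forward(Ws, bs, x)[0][-1] - y) ** 2 / x.shape[1]

def fd_entry(Ws, bs, layer, idx, x, y, eps=1e-6):
    # Central difference of the cost with respect to the single entry Ws[layer][idx].
    Wp = [W.copy() for W in Ws]
    Wm = [W.copy() for W in Ws]
    Wp[layer][idx] += eps
    Wm[layer][idx] -= eps
    return (cost(Wp, bs, x, y) - cost(Wm, bs, x, y)) / (2 * eps)

rng = np.random.default_rng(2)
sizes = [1, 6, 7, 2]   # same layout as reset_network's defaults
Ws = [rng.random((m, n)) / 2 for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.random((m, 1)) / 2 for m in sizes[1:]]
x = rng.random((1, 20))   # hypothetical inputs
y = rng.random((2, 20))   # hypothetical targets

grads_W, grads_b = backward(Ws, bs, x, y)
# grads_W[1] plays the role of J_W2; the two printed numbers should agree closely.
print(grads_W[1][3, 2], fd_entry(Ws, bs, 1, (3, 2), x, y))
```

Reusing `delta` from the layer above is what makes backpropagation cheaper than differentiating each layer from scratch: the shared factors $2(\mathbf{a}^{(3)} - \mathbf{y}) \odot \sigma'(\mathbf{z}^{(3)})$ are computed once and passed backwards.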