# Neural Networks 2

## Forward and Backward Propagation Methods

Last time you built a neural network with multiple layers and multiple neurons per layer.  The consolidated code from that lesson is given below.  We will add to it throughout this lesson to build the two key algorithms that make all neural networks operate: Forward Propagation and Backward Propagation!

In [1]:
import random
import math

# From Lesson 1
def print_network(network):
    """
    Prints the current neural network weights and results for each neuron
    """
    print("Neuron_Index: [Weights] => Neuron_Output\n")

    for layer in network.Layers:
        if layer.Layer_Index == 0:
            layer_type = "Input"
        elif layer.Layer_Index == (len(network.Layers) - 1):
            layer_type = "Output"
        else:
            layer_type = "Hidden"
        print("Layer {0} - {1}".format(layer.Layer_Index, layer_type))
        for neuron in layer.Neurons:
            print("   Neuron {0}: {1} => {2}".format(neuron.Neuron_Index, neuron.Weights, neuron.Result))


class Network:

    def __init__(self, description, initialization_value, learning_rate):
        self.Layers = []
        # Initialize the input layer
        self.Layers.append(Layer(0, description[0], 0, initialization_value, learning_rate))
        # Initialize the subsequent layers
        for i in range(1, len(description)):
            self.Layers.append(Layer(description[i - 1], description[i], i, initialization_value, learning_rate))
        self.Learning_Rate = learning_rate

    def set_weights(self, layer_index, neuron_index, weights):
        self.Layers[layer_index].set_weights(neuron_index, weights)


class Layer:

    def __init__(self, number_neurons_previous_layer, number_neurons, layer_index, initialization_value, learning_rate):
        self.Neurons = []
        self.Layer_Index = layer_index
        for i in range(number_neurons):
            self.Neurons.append(Neuron(number_neurons_previous_layer, i, initialization_value, learning_rate))

    def set_weights(self, neuron_index, weights):
        self.Neurons[neuron_index].set_weights(weights)


class Neuron:

    def __init__(self, number_neurons_in_previous_layer, neuron_index, initialization_value, learning_rate):
        self.Weights = []
        self.Result = 0
        # self.learning_rate = learning_rate
        self.previous_values = []
        self.delta_weights = []
        self.Neuron_Index = neuron_index
        for i in range(number_neurons_in_previous_layer):
            self.Weights.append(random.uniform(-initialization_value, initialization_value))
        self.Weights.append(random.uniform(-initialization_value, initialization_value))

    def set_weights(self, weights):
        self.Weights = weights

### Forward Propagation

Forward Propagation is the algorithm that takes your input and pushes it through the neural network's layers and neurons to create one or more output values.  Recall from the last lesson that each neuron calculates its output as the weighted sum of its input vector, minus a bias.  In other words, given:

$\vec{W} = Weight\ vector$

$\vec{I} = Input\ vector$

$\theta = Bias$

Here is what our output looks like without an activation function:

$output = \vec{W}\cdot\vec{I} - \theta$
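
As a quick illustration of the formula above, here is a minimal sketch of the weighted sum for a single neuron. The helper name `weighted_sum` and the example numbers are assumptions made for this illustration. Note that it takes the bias as a separate argument, whereas the lesson's `Neuron` class stores the bias as the last entry of `Weights`.

```python
def weighted_sum(weights, inputs, bias):
    """Return W . I - theta for one neuron (no activation applied)."""
    total = 0.0
    for weight, value in zip(weights, inputs):
        total += weight * value
    return total - bias


# Hypothetical values: two inputs, two weights, and a bias of 0.8
example_output = weighted_sum([0.5, 0.4], [1, 1], 0.8)
print(example_output)  # approximately 0.1
```

The result is only approximately 0.1 because of floating point rounding, which is why the tests later in this lesson compare values with a small tolerance.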

The activation function transforms the weighted sum into the neuron's final output.  A common activation function is the sigmoid function, given by

$y^{sigmoid}=\frac{1}{1+e^{-X}}$

where $X$ is the weighted sum computed above.

A benefit of the sigmoid function for activation is that it bounds the output between 0 and 1.  I highly recommend you graph the sigmoid function so you know what it looks like.  Why do you think this form would be of use?

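A key attribute of any activation function used with backpropagation is that it is differentiable.  As we will see, its differentiability allows us to determine how to distribute the error.  For the sigmoid function the derivative can be written in terms of the function's own output: if $y$ is the sigmoid output, the derivative is $y(1-y)$.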
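
To see that the derivative really can be computed from the output alone, here is a small sanity check, offered as a sketch rather than as part of the lesson's code. It compares the closed form against a central difference approximation. The function names and the test points are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative_from_output(y):
    # y is the sigmoid OUTPUT, not the raw weighted sum
    return y * (1.0 - y)


def numerical_derivative(f, x, step=1e-6):
    # central difference approximation of f'(x)
    return (f(x + step) - f(x - step)) / (2.0 * step)


for x in [-2.0, 0.0, 0.5, 3.0]:
    analytic = sigmoid_derivative_from_output(sigmoid(x))
    numeric = numerical_derivative(sigmoid, x)
    print(x, analytic, numeric)
```

The two columns should agree to several decimal places, which is why the derivative method in this lesson takes the neuron's stored result rather than its input.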

For these lessons, every activation function class will implement two methods:  
<code>activation_function(value)</code>
and
<code>activation_function_derivative(result)</code>

The code below implements the sigmoid activation function and its derivative.

In [2]:
class SigmoidActivation:

    @staticmethod
    def activation_function(value):
        return 1 / (1 + math.exp(-value))

    @staticmethod
    def activation_function_derivative(result):
        return result * (1.0 - result)

Alright, given the sigmoid activation function, we will now build the forward propagation method.  The code below is complete except for two calculations: the weighted sum of the values and weights (including the bias), and the result of the activation function.  Complete the function below.

In [3]:
def forward_propagate(network, values, activation):
    """
    Runs the forward propagation algorithm
    """
    results = values
    for layer in network.Layers[1:]:
        # initialize a vector for the layer's values (to be sent to next layer)
        values_for_layer = []

        for neuron in layer.Neurons:
            # store the previous values for use in back propagation
            neuron.previous_values = results
            # reset the results
            neuron.Result = 0
            
            # YOUR CODE HERE
            # calculate the weighted sum of the weights by the results
           
        
        
            # update the neuron.Result as the activation_function of the weighted sum
            
            
            # END YOUR CODE HERE 
            # add the new output value to the values vector for the current layer
            values_for_layer.append(neuron.Result)
        results = values_for_layer

    # results holds the output layer's values (or the inputs if there are no other layers)
    return results

Once the forward propagation method is complete, each neuron will hold the vector of results from the previous layer as well as its own output.  One of the cool aspects of a neural network is that once the weights have been trained, the algorithm only needs to run the forward propagation method to calculate the resulting output of the network.  Then, based on a threshold, the input vector is classified.
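
The thresholding step mentioned above can be sketched as follows. The function name `classify` and the default threshold of 0.5 are assumptions for illustration. The lesson does not fix a particular threshold.

```python
def classify(outputs, threshold=0.5):
    """Map each network output to class 1 if it is at or above the threshold, else 0."""
    return [1 if value >= threshold else 0 for value in outputs]


# Hypothetical output values from a trained network
print(classify([0.51, 0.12, 0.97]))  # [1, 0, 1]
```

Since the sigmoid output always lies between 0 and 1, a threshold in that range is a natural choice, but the right value depends on the problem.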

### Back Propagation Algorithm

Take a look at the following video to get an idea of what is going on with backprop: https://youtu.be/Ilg3gGewQ5U

Backprop effectively provides a way of distributing the error across all of the weights based on their contribution.  To help walk through the math, let's unravel the equations presented in Negnevitsky's discussion.

We will start with the simple error function:

$e_k(p) = y_{d,k}(p) - y_k(p)$

This isn't anything crazy. All we are saying is that the error for output neuron $k$ at iteration $p$ is the desired output $y_{d,k}(p)$ minus the output we predicted, $y_k(p)$.

Next we describe the error gradient for each neuron in the output layer. This is simply the derivative of the activation function multiplied by the error.

Remember, the activation function is what we pass our final weighted sum to:

$output = activationfunction(\vec{W}\cdot\vec{I} - \theta)$

As the gradient is pretty boring for a linear function, let's suppose we have a sigmoid activation function, where

$sigmoid(x) = \frac{e^x}{1 + e^x}$

(this is the same function as $1/(1+e^{-x})$, with the numerator and denominator multiplied by $e^x$).

Now the error gradient for output neuron $k$ is

$\delta_k(p) = y_k(p)[1-y_k(p)]e_k(p)$
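
To make the output-layer gradient concrete, here is a small worked sketch using the output value that the test network later in this lesson produces for the input [1, 1] with a desired output of 0. The function name `output_error_gradient` is an assumption for this illustration and is not part of the lesson's code.

```python
def output_error_gradient(desired, actual):
    """delta_k = y_k * (1 - y_k) * e_k, with e_k = desired - actual (sigmoid output)."""
    error = desired - actual
    return actual * (1.0 - actual) * error


# Output of the lesson's test network for input [1, 1]; the desired output is 0
y5 = 0.5097242138886783
delta5 = output_error_gradient(0, y5)
print(delta5)  # roughly -0.1274
```

The gradient is negative because the network's output is above the desired value, so the weights feeding this neuron should be nudged to lower it.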

Finally, the weight correction for connection j,k is

$\Delta w_{jk}(p) = \alpha \ y_j(p) \ \delta_k(p)$

where $\alpha$ is our learning rate (you have seen this before), $y_j(p)$ is the output of neuron j, which feeds neuron k through the weight $w_{jk}$, and $\delta_k(p)$ is the error gradient we just calculated.

The new weight is then

$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$
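
As a worked sketch of the weight update, the snippet below applies the correction (learning rate times input times error gradient) to the output neuron of the test network used later in this lesson, with a learning rate of 0.1. The helper name `updated_weight` is an assumption for illustration. The bias is treated as a weight whose input is fixed at -1, matching the $\vec{W}\cdot\vec{I} - \theta$ convention used in this lesson.

```python
def updated_weight(old_weight, learning_rate, input_value, error_gradient):
    """Return old_weight plus the correction learning_rate * input * gradient."""
    correction = learning_rate * input_value * error_gradient
    return old_weight + correction


alpha = 0.1
y3 = 0.5249791874789399   # output of the first hidden neuron
y4 = 0.8807970779778823   # output of the second hidden neuron
y5 = 0.5097242138886783   # output of the output neuron
delta5 = y5 * (1.0 - y5) * (0 - y5)  # error gradient for a desired output of 0

w35 = updated_weight(-1.2, alpha, y3, delta5)
w45 = updated_weight(1.1, alpha, y4, delta5)
theta5 = updated_weight(0.3, alpha, -1.0, delta5)  # bias input is fixed at -1
print(w35, w45, theta5)  # roughly -1.2067, 1.0888, 0.3127
```

These values match the expected output-layer weights asserted in the test routine at the end of this lesson.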

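For the hidden layers the equation is slightly different. There is no desired output to compare against, so the error term is replaced by the weighted sum of the error gradients from the next layer:

$\delta_j(p) = y_j(p) \ (1-y_j(p)) \ \sum_{k = 1}^{l} {\delta_k(p)w_{jk}(p)}$

where $l$ is the number of neurons in the next layer.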
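
The hidden-layer gradient can be sketched the same way. The snippet below computes it for the first hidden neuron of the test network used later in this lesson, which feeds a single output neuron through a weight of -1.2. The function name `hidden_error_gradient` is an assumption for illustration. Note that the weight used is the one from before any update is applied.

```python
def hidden_error_gradient(output, next_deltas, outgoing_weights):
    """delta_j = y_j * (1 - y_j) * sum over k of delta_k * w_jk."""
    propagated = sum(d * w for d, w in zip(next_deltas, outgoing_weights))
    return output * (1.0 - output) * propagated


y3 = 0.5249791874789399   # output of the first hidden neuron
y5 = 0.5097242138886783   # output of the output neuron
delta5 = y5 * (1.0 - y5) * (0 - y5)  # output gradient for a desired output of 0

delta3 = hidden_error_gradient(y3, [delta5], [-1.2])
print(delta3)  # roughly 0.0381

# Weight from the first input (value 1) to this hidden neuron, learning rate 0.1
w13 = 0.5 + 0.1 * 1 * delta3
print(w13)  # roughly 0.5038
```

The updated weight agrees with the value the test routine expects for the first weight of the first hidden neuron.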

Now that we have the equations, it's time to implement!  

In [4]:
def backward_propagate(network, output_values, correct_values, activation):
    def layer_bp(layer, errors, activation):
        '''
        Takes the errors from the previous layer to update the weights
        :param layer:  the layer being updated
        :param errors:  errors calculated from the previous layer
        :param activation:  the activation function class
        :return:  updated errors for the next layer
        '''
        error_vector = []
        for (neuron, error) in zip(layer.Neurons, errors):
            # calculate each neuron's error vector
            error_vector_from_neuron = neuron_back_propagate(neuron,
                                                             error, activation)
            if len(error_vector) == 0:
                for i in range(len(error_vector_from_neuron)):
                    error_vector.append(0)
            for i in range(len(error_vector_from_neuron)):
                error_vector[i] += error_vector_from_neuron[i]
        return error_vector

    def neuron_back_propagate(neuron, error, activation):
        """
        Calculates weight updates for a neuron through back prop
        :param neuron:  the neuron being evaluated
        :param error:  the error associated with this neuron
        :param activation:  the activation function being used
        :return: the error_vector to propagate back to the preceding layer (one entry per weight)
        """
        derivative = activation.activation_function_derivative(neuron.Result)
        error_gradient = error * derivative
        
        # calculate the change in weights using the error gradient
        neuron.delta_weights.clear()
        
        # YOUR CODE HERE
        
        for i in range(len(neuron.previous_values)):
            delta = 0  # FIX THIS
            neuron.delta_weights.append(delta)
        neuron.delta_weights.append(0) # FIX THIS
        
        # YOUR CODE COMPLETE
        
        # The error vector becomes the weights modified by the error gradient
        error_vector = []
        for weight in neuron.Weights:
            error_vector.append(weight * error_gradient)
        return error_vector

    errors = []
    # Calculate the error of the prediction at the output layer
    for (output, correct) in zip(output_values, correct_values):
        errors.append(correct - output)

    # move through hidden layers to calculate the error (move from output to input)
    for reversed_layer in reversed(network.Layers[1:]):
        errors = layer_bp(reversed_layer, errors, activation)

    # Update the weights based on the delta calculated
    for layer in network.Layers[1:]:
        for neuron in layer.Neurons:
            for i in range(len(neuron.Weights)):
                neuron.Weights[i] += neuron.delta_weights[i]

Let's test this functionality using the calculations given in Negnevitsky.  Below is a test routine for you to compare the values.

In [5]:
def test_negnivitsky():
    init_value = 2.4/2  # per Haykin set the init to 2.4/<number input neurons>

    network = Network([2, 2, 1], init_value, 0.1)
    activation = SigmoidActivation()
    network.set_weights(1, 0, [0.5, 0.4, 0.8])  # layer #, neuron #, weights + bias
    network.set_weights(1, 1, [0.9, 1.0, -0.1])
    network.set_weights(2, 0, [-1.2, 1.1, 0.3])

    result = forward_propagate(network, [1, 1], activation)
    print("result:")
    print(result)
    print("\nnetwork:")
    print_network(network)

    # Outputs of neurons 3, 4, and 5 from the Negnevitsky diagram
    assert(abs(network.Layers[1].Neurons[0].Result - 0.5249791874789399) < 0.0001)
    assert(abs(network.Layers[1].Neurons[1].Result - 0.8807970779778823) < 0.0001)
    assert(abs(network.Layers[2].Neurons[0].Result - 0.5097242138886783) < 0.0001)

    print("\n-- BackProp --")
    backward_propagate(network, result, [0], activation)
    print_network(network)

    # Neuron 3 from the Negnevitsky diagram
    assert(abs(network.Layers[1].Neurons[0].Weights[0] - 0.5038119477996761) < 0.0001)
    assert(abs(network.Layers[1].Neurons[0].Weights[1] - 0.40381194779967616) < 0.0001)
    assert(abs(network.Layers[1].Neurons[0].Weights[2] - 0.796188052200324) < 0.0001)
    # Neuron 4 from the Negnevitsky diagram
    assert(abs(network.Layers[1].Neurons[1].Weights[0] - 0.89852881792090521) < 0.0001)
    assert(abs(network.Layers[1].Neurons[1].Weights[1] - 0.9985288179209052) < 0.0001)
    assert(abs(network.Layers[1].Neurons[1].Weights[2] - -0.09852881792090515) < 0.0001)
    # Neuron 5 from the Negnevitsky diagram
    assert(abs(network.Layers[2].Neurons[0].Weights[0] - -1.2066873347075837) < 0.0001)
    assert(abs(network.Layers[2].Neurons[0].Weights[1] - 1.0887801554606653) < 0.0001)
    assert(abs(network.Layers[2].Neurons[0].Weights[2] - 0.3127382853779363) < 0.0001)


In [6]:
test_negnivitsky()

result:
[0]

network:
Neuron_Index: [Weights] => Neuron_Output

Layer 0 - Input
   Neuron 0: [-0.8509993546018211] => 0
   Neuron 1: [0.9718916833623055] => 0
Layer 1 - Hidden
   Neuron 0: [0.5, 0.4, 0.8] => 0
   Neuron 1: [0.9, 1.0, -0.1] => 0
Layer 2 - Output
   Neuron 0: [-1.2, 1.1, 0.3] => 0


AssertionError: 

If you want answers for the above questions/problems, below is the code I used:

Answer for Forward Prop:
<code>
            # calculate the weighted sum of the weights by the results
            for i in range(len(results)):
                neuron.Result += neuron.Weights[i] * results[i]
            neuron.Result += -1.0 * neuron.Weights[-1]
            
            # update the result using the activation function
            neuron.Result = activation.activation_function(neuron.Result)
</code>

Answer for BackProp:
<code>
        for i in range(len(neuron.previous_values)):
            delta = network.Learning_Rate * error_gradient * neuron.previous_values[i]
            neuron.delta_weights.append(delta)
        neuron.delta_weights.append(network.Learning_Rate * error_gradient * -1.0)
</code>