# Backpropagation in Feed Forward Neural Networks - Research Workbook

Academic Exercise 0  
Started 1/5/2017

## Background

### Artificial Neural Network

<p><a href='https://en.wikipedia.org/wiki/Artificial_neural_network'>wikipedia:Artificial_neural_network</a></p>
<p>A neural network (one of the basic concepts of connectionism) is a computational attempt to mimic the way the biological brain works, based on the concept of a network of connected neurons.</p>
<p>The basic unit, the <b>neuron</b>, takes in inputs from the previous layer, sums them up, feeds that sum through some <b>activation function</b>, then sends the result forward to every neuron it is connected to. (This forward flow is referred to as <b>"feed forward"</b>.)</p>
<p>These networks are typically constructed in "layers": rows of neurons that each feed into every neuron in the next layer. The first layer is the input layer, where the problem inputs are supplied. The output layer is the one that everything eventually feeds into, where the result is read. Every layer in between is referred to as a <b>"hidden layer"</b>.</p>
<p>The key part of a network is that every connection has an associated <b>weight</b>. Every number passed along a connection from the originating neuron is multiplied by this weight before being supplied to the target neuron.</p>
<br>

<p>The network as a whole essentially acts as a "formula finder". The goal is to find a set of weights whose multiplications and activations produce the desired outputs for a given set of inputs. Learning works by adjusting the weight values step by step to get closer and closer to a working solution.</p>

## Feed-Forward



## Notation

(Reference: <a href="http://neuralnetworksanddeeplearning.com/chap2.html">neuralnetworksanddeeplearning.com</a>)

$w^l_{jk}$ - weight for the connection from the $k$th neuron in the $(l-1)$th layer to the $j$th neuron in the $l$th layer.   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ex: $w^3_{24}$ means the weight from the 4th node in the 2nd layer to the 2nd node in the 3rd layer

$w^l$ - weight matrix for layer $l$. This consists of the weights connecting to the $l$th layer of neurons; that is, the entry in the $j$th row and $k$th column is $w^l_{jk}$.

$b^l$ - the bias vector for layer $l$

$\sigma$ - the sigmoid function: 
$$
\sigma(t) = \frac{1}{1+e^{-t}}
$$
(It is commonly used as the activation function because it has a nice, easy derivative!)
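
As a minimal sketch of the definition above, the sigmoid and its derivative can be written in plain Python. The function names `sigmoid` and `sigmoid_prime` are illustrative choices, not names taken from this workbook. The derivative uses the standard identity $\sigma'(t) = \sigma(t)(1-\sigma(t))$, which is what makes it cheap to compute once $\sigma(t)$ is known.

```python
import math


def sigmoid(t):
    """Logistic sigmoid: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def sigmoid_prime(t):
    """Derivative of the sigmoid, computed from the sigmoid value itself."""
    s = sigmoid(t)
    return s * (1.0 - s)
```

Note that `math.exp(-t)` raises `OverflowError` for very large negative `t`, so this version is only suitable for moderately sized inputs.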

$a^l$ - activation vector for layer $l$ (the activation value from each neuron).

$a^l_j$ - activation for the $j$th node in layer $l$.

## Network Construction

NOTE: all of the example code assumes a network with an input layer of 2 nodes, 1 hidden layer of 3 nodes, and an output layer of 1 node (attempting to emulate an XOR gate).
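
To make the shapes concrete, here is a rough sketch of how the weights and biases for this 2-3-1 structure could be laid out using nested lists. The helper `init_network`, the `sizes` list, and the uniform random initialization in $[-1, 1]$ are all assumptions made for illustration; they are not the initialization used elsewhere in this workbook. Following the notation section, each weight matrix has one row per neuron in the receiving layer and one column per neuron in the sending layer.

```python
import random

# Layer sizes from the note above: 2 inputs, 3 hidden nodes, 1 output.
sizes = [2, 3, 1]


def init_network(sizes, seed=0):
    """Return (weights, biases) as nested lists.

    weights[i] has sizes[i + 1] rows and sizes[i] columns, so row j holds
    the weights leading into neuron j of the receiving layer.
    biases[i] has one entry per neuron in the receiving layer.
    The uniform(-1, 1) initialization is an assumption for this sketch.
    """
    rng = random.Random(seed)
    weights = [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]
    biases = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for n_out in sizes[1:]]
    return weights, biases


weights, biases = init_network(sizes)
```

With these sizes, `weights[0]` is a 3x2 matrix (input to hidden) and `weights[1]` is a 1x3 matrix (hidden to output).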

## Calculations

Activations: An activation is calculated by multiplying each of the previous layer's activations by the weight on its connection, summing the products, adding the neuron's bias, and then running the result through the activation function (in this case, the sigmoid).

For an individual activation, this can be written:
$$
a^l_j = \sigma\left(\sum_k w^l_{jk}a^{l-1}_{k} + b^l_j\right)
$$
Stepping through each part of the summation: $w^l_{jk}a^{l-1}_k$ is the weight on the connection from neuron $k$ in the previous layer multiplied by that neuron's activation (output). These products are summed over $k$, and then the neuron's bias $b^l_j$ is added once to the total.
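
The per-neuron formula can be sketched directly in Python. The function name `neuron_activation` and the numbers below are hypothetical, chosen only so the weighted sum works out to zero.

```python
import math


def sigmoid(t):
    """Logistic sigmoid: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def neuron_activation(weights_j, prev_activations, bias_j):
    """Compute a^l_j = sigma(sum_k w^l_jk * a^(l-1)_k + b^l_j) for one neuron."""
    # Weighted sum over the previous layer's activations, then add the bias once.
    z = sum(w * a for w, a in zip(weights_j, prev_activations)) + bias_j
    return sigmoid(z)


# Hypothetical values: two incoming connections into a single neuron.
a_prev = [1.0, 0.0]   # activations of the previous layer
w_j = [0.5, -0.5]     # weights leading into neuron j
b_j = -0.5            # bias of neuron j

print(neuron_activation(w_j, a_prev, b_j))  # 0.5, since the weighted sum is 0
```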

This can also be written in a more compact matrix/vectorized form:
$$
a^l = \sigma(w^la^{l-1}+b^l)
$$
This is equivalent to the form above, only done with vectors. $w^la^{l-1}$ multiplies the weight matrix for layer $l$ by the activation vector of the previous layer, and $\sigma$ is applied to each element of the resulting vector.
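
As a rough sketch of the matrix form, the snippet below computes one layer at a time and chains the layers into a full forward pass. It uses plain nested lists so that it has no dependencies (a library such as NumPy would be the more usual choice for the matrix product). The names `layer_forward` and `feed_forward` and the all-zero parameters are assumptions for illustration only.

```python
import math


def sigmoid(t):
    """Logistic sigmoid: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def layer_forward(w, a_prev, b):
    """Compute a^l = sigma(w^l a^(l-1) + b^l) for one layer.

    w is a list of rows (one row per neuron in layer l), a_prev is the
    previous layer's activation vector, and b is the bias vector.
    """
    return [
        sigmoid(sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j)
        for row, b_j in zip(w, b)
    ]


def feed_forward(weights, biases, x):
    """Push the input vector x through every layer in order."""
    a = x
    for w, b in zip(weights, biases):
        a = layer_forward(w, a, b)
    return a


# Hypothetical all-zero parameters for a 2-3-1 network.
weights = [
    [[0.0, 0.0] for _ in range(3)],  # input -> hidden, 3 rows x 2 columns
    [[0.0, 0.0, 0.0]],               # hidden -> output, 1 row x 3 columns
]
biases = [[0.0, 0.0, 0.0], [0.0]]

print(feed_forward(weights, biases, [1.0, 0.0]))  # [0.5]
```

With every weight and bias at zero, each neuron sees a weighted sum of 0, so every activation is $\sigma(0) = 0.5$ regardless of the input.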

In [None]:
# This contains the set of weights from layer i to layer i+1
layers = []

# structure
inputSize = 2
hiddenSize = 3
outputSize = 1

# initialize layer weights by structure

for i in range(0, inputSize):
    