# Cost Function and Backpropagation

## Cost Function

Notation used below:

* L = total number of layers in the network
* s_l = number of units (not counting the bias unit) in layer l
* K = number of output units / classes

The following cost function is a generalization of the logistic regression cost function:

![Back Propagation Cost Function](Resources/bpcf.png "Back Propagation Cost Function")

* The first term is the sum of the (unregularized) logistic regression costs over all output units
* The second (regularization) term sums the squares of all thetas in the matrices (the bias weights are conventionally left out)
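
For reference, the cost function is usually written as below. This is the standard form, which the figure presumably shows; `m` (number of training examples) and `\lambda` (regularization strength) are symbols assumed here, since the notation list above does not define them.

$$
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log \left( h_\Theta(x^{(i)}) \right)_k + \left( 1 - y_k^{(i)} \right) \log \left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2
$$

The regularization sum starts at `i = 1`, so the bias weights (column 0 of each theta matrix) are not penalized.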

## Back Propagation Algorithm

For each training example:

1. Run forward propagation with the current theta matrices -> We have a hypothesis output
2. Calculate the error of the output layer (hypothesis output - yi)
3. Backpropagate by calculating the delta of each earlier layer's nodes, weighting the next layer's deltas with the corresponding thetas

-> The result is the set of partial derivatives of the cost function with respect to the thetas
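
The three steps can be sketched in Octave for a single training example. This is a minimal illustration, not a full implementation: the network size, the weight values, and the names `a1`, `delta3`, `Grad1`, etc. are assumptions chosen for the example. It uses a sigmoid activation and the convention that the output error is the hypothesis output minus the label.

```octave
% Sketch: backpropagation for ONE training example in a 3-layer network.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical sizes: 2 inputs, 3 hidden units, 2 output classes.
Theta1 = [0.1 0.2 -0.1; -0.2 0.1 0.3; 0.05 -0.1 0.2];   % 3 x (2+1)
Theta2 = [0.1 -0.2 0.3 0.1; -0.1 0.2 0.1 -0.3];          % 2 x (3+1)
x = [1; 0.5];     % one training example
y = [0; 1];       % one-hot label (class 2)

% 1. Forward propagation
a1 = [1; x];                        % input plus bias unit
a2 = [1; sigmoid(Theta1 * a1)];     % hidden activations plus bias unit
a3 = sigmoid(Theta2 * a2);          % hypothesis output h(x)

% 2. Error of the output layer
delta3 = a3 - y;

% 3. Propagate the error back through Theta2 (the bias entry is dropped)
d = Theta2' * delta3;
delta2 = d(2:end) .* a2(2:end) .* (1 - a2(2:end));

% Partial derivatives contributed by this example (same shapes as the thetas)
Grad1 = delta2 * a1';
Grad2 = delta3 * a2';
```

Summing `Grad1` and `Grad2` over all training examples and dividing by the number of examples gives the unregularized gradient.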

![Back Propagation](Resources/bpalgo.png "Back Propagation")

Intuitively: forward propagation runs from left to right (input to output), backward propagation runs from right to left (output to input)

![Back Intuition](Resources/bpintuition.png "Back Propagation Intuition")

### Learning Algorithm

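Back propagation computes the partial derivatives of the cost function with respect to the thetas. The cost function implementation returns both the cost and these gradients, and its outputs are passed to an optimizer such as fminunc. Since fminunc works on vectors, the gradients and the theta matrices have to be unrolled into vectors before they are handed to it. Back propagation itself still works on the matrices, so the theta vector is reshaped back into matrices inside the cost function.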

#### Unroll matrices into a vector
thetaVector = [ Theta1(:); Theta2(:); Theta3(:); ]

#### Reshape an unrolled vector into matrices
Theta1 = reshape(thetaVector(1:110),10,11)
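
A round trip makes the index arithmetic explicit. The sketch below assumes three matrices of sizes 10x11, 10x11 and 1x11; only the size of `Theta1` is given above, so the other two sizes and the names `Theta1r`, `Theta2r`, `Theta3r` are assumptions for illustration.

```octave
% Hypothetical weight matrices (only Theta1's 10x11 size comes from the text).
Theta1 = rand(10, 11);
Theta2 = rand(10, 11);
Theta3 = rand(1, 11);

% Unroll: stack all entries column by column into one long vector.
thetaVector = [Theta1(:); Theta2(:); Theta3(:)];    % 231 x 1

% Reshape: cut the vector at the right offsets and restore each shape.
Theta1r = reshape(thetaVector(1:110),   10, 11);    % entries   1..110
Theta2r = reshape(thetaVector(111:220), 10, 11);    % entries 111..220
Theta3r = reshape(thetaVector(221:231), 1, 11);     % entries 221..231
```

Because both `(:)` and `reshape` work in column-major order, the restored matrices are identical to the originals.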

![Learning Algorithm](Resources/learningalgo.png "Learning Algorithm")

### Gradient checking

Gradient checking is used to debug your back propagation algorithm. With gradient checking, an approximation of the gradients (for each theta) is calculated using the following formula:

![Gradient checking](Resources/gradientcheck.png "Gradient checking")

Then we test whether the gradients returned by our back propagation algorithm are approximately equal to the ones computed numerically with the formula above. If they agree to within a small tolerance, the back propagation implementation is most likely correct.
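
A minimal sketch of such a check, assuming the usual two-sided difference `(J(theta + eps) - J(theta - eps)) / (2 * eps)` applied to one theta at a time. To keep it self-contained, a simple stand-in cost `J` with a known gradient replaces the neural network cost; `J`, `analyticGrad` and `EPSILON = 1e-4` are illustrative assumptions.

```octave
% Stand-in cost function and its exact gradient (NOT the network cost).
J = @(theta) sum(theta .^ 3);
analyticGrad = @(theta) 3 * theta .^ 2;

theta = [1; -2; 0.5];
EPSILON = 1e-4;

% Numerical gradient: perturb one component of theta at a time.
numGrad = zeros(size(theta));
for i = 1:numel(theta)
  perturb = zeros(size(theta));
  perturb(i) = EPSILON;
  numGrad(i) = (J(theta + perturb) - J(theta - perturb)) / (2 * EPSILON);
end

% Relative difference between numerical and analytic gradient.
grad = analyticGrad(theta);
relDiff = norm(numGrad - grad) / norm(numGrad + grad);
```

For a correct gradient, `relDiff` should be very small. Each numerical gradient needs two cost evaluations per parameter, which is why the check is slow for real networks.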

<font color='red'> Note: Turn gradient checking off before training the network, since it is very slow. This is also why we don't use the numerical approximation directly for training: it would be easier to implement, but much slower than back propagation. </font>

### Random initialization

It can be shown that if the initial values of all thetas are equal (e.g. 0), then back propagation will not work correctly: all units in a hidden layer compute the same value and receive the same update, so they stay identical. Therefore, we initialize each theta with a random value in the range (-epsilon, epsilon).

Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

This creates a 10x11 matrix of thetas (for example for the first layer of the neural network), where each value is a random value between -epsilon and epsilon.
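
The symmetry problem can be seen on a tiny example. The sketch below assumes a hypothetical network with 2 inputs, 3 hidden units and 1 output, a sigmoid activation, and `INIT_EPSILON = 0.12`; none of these values come from the text above.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical training example.
x = [1; 2];
y = 1;

% Symmetric initialization: every weight has the same value.
Theta1 = 0.5 * ones(3, 3);
Theta2 = 0.5 * ones(1, 4);

% Forward and backward pass for this example.
a1 = [1; x];
a2 = [1; sigmoid(Theta1 * a1)];
a3 = sigmoid(Theta2 * a2);
delta3 = a3 - y;
d = Theta2' * delta3;
delta2 = d(2:end) .* a2(2:end) .* (1 - a2(2:end));
Grad1 = delta2 * a1';

% Every hidden unit gets the same gradient row, so the units never diverge.
rowsIdentical = all(all(abs(Grad1 - repmat(Grad1(1, :), 3, 1)) < 1e-12));

% Random initialization breaks the symmetry and stays within the range.
INIT_EPSILON = 0.12;
Theta1 = rand(3, 3) * (2 * INIT_EPSILON) - INIT_EPSILON;
inRange = all(abs(Theta1(:)) <= INIT_EPSILON);
```

With identical gradient rows, a gradient step changes all hidden units in the same way, so the layer effectively behaves like a single unit.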

### Putting it all together

#### Neural Network design

* Input layer is defined by your data (e.g. a 20x20 image gives 400 input neurons, 1 for each pixel)
* Output layer is defined by how many classes you want to classify (1 unit for two classes, n units for n classes where n >= 3)
* Hidden layer:
    * Default: 1 hidden layer; if you use more, give every hidden layer the same number of units
    * Usually the more units you have the better, but more units cost more computation time

#### Training a Neural Network

1. Randomly initialize Theta
2. Implement forward propagation to compute h(x)
3. Implement cost function
4. Implement backward propagation to compute the gradient (all partial derivatives)
5. Use gradient checking to verify the gradient, then disable it (since it's slow)
6. Use the output (costs and gradient) for gradient descent or other more advanced optimization solvers

<b>Keep in mind that forward and backward propagation are looped over every training example.</b>
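
The steps can be sketched end to end on a toy problem. Everything specific here is an assumption for illustration: the data (logical OR), the 2-2-1 network, `INIT_EPSILON = 0.12`, the learning rate `alpha`, the iteration count, and plain gradient descent in place of fminunc. Regularization and the gradient checking step are left out to keep the sketch short.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy data: logical OR, 4 examples with 2 features each.
X = [0 0; 0 1; 1 0; 1 1];
y = [0; 1; 1; 1];
m = size(X, 1);

% 1. Random initialization (2 inputs -> 2 hidden units -> 1 output)
INIT_EPSILON = 0.12;
Theta1 = rand(2, 3) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(1, 3) * (2 * INIT_EPSILON) - INIT_EPSILON;

alpha = 0.5;                 % learning rate (assumed)
numIters = 1000;
costHistory = zeros(numIters, 1);

for iter = 1:numIters
  J = 0;
  Grad1 = zeros(size(Theta1));
  Grad2 = zeros(size(Theta2));
  for i = 1:m                % loop over every training example
    % 2. Forward propagation
    a1 = [1; X(i, :)'];
    a2 = [1; sigmoid(Theta1 * a1)];
    a3 = sigmoid(Theta2 * a2);
    % 3. Cost (unregularized)
    J = J - (y(i) * log(a3) + (1 - y(i)) * log(1 - a3)) / m;
    % 4. Backward propagation, accumulating the gradient
    delta3 = a3 - y(i);
    d = Theta2' * delta3;
    delta2 = d(2:end) .* a2(2:end) .* (1 - a2(2:end));
    Grad1 = Grad1 + delta2 * a1' / m;
    Grad2 = Grad2 + delta3 * a2' / m;
  end
  costHistory(iter) = J;
  % 6. Gradient descent step
  Theta1 = Theta1 - alpha * Grad1;
  Theta2 = Theta2 - alpha * Grad2;
end
```

Plotting `costHistory` is a quick sanity check: with a suitable learning rate the cost should go down over the iterations.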

<font color='red'>The cost function is not convex, so the optimizer can end up in a local optimum. In practice this usually turns out not to be a problem.</font>
