# Neural Networks: Learning

In this section, we will learn how to fit the parameters of the neural network given a training set. 

In our (classification) problems, we will be dealing with either:

1. Binary classification (1 output unit)
2. Multi-class classification ($K$ output units)

### Cost function

Our neural network cost function is a generalization of the logistic regression cost function:

$J(\Theta) = -\frac{1}{m} \left[ \sum_{i=1}^m \sum_{k=1}^K y_k^{(i)} \log\left((h_\Theta(x^{(i)}))_k\right) + (1-y_k^{(i)}) \log\left(1-(h_\Theta(x^{(i)}))_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\Theta_{ji}^{(l)})^2$

**Notes**:

1. The double sum adds up the logistic regression costs calculated for each unit in the output layer;
2. The triple sum adds up the squares of all the individual $\Theta$s in the entire network (the bias weights are excluded, since $i$ starts at 1);
3. The $i$ in the triple sum does **not** refer to training example $i$.
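
To make the two sums concrete, here is a minimal Python sketch of the cost function. It assumes the network outputs have already been computed and are passed in as `H`, that labels are 0/1 vectors with one entry per output unit, and that column 0 of every weight matrix holds the bias weights. The name `nn_cost` and the list-of-lists layout are illustrative choices, not part of the course material.

```python
import math

def nn_cost(H, Y, thetas, lam):
    """Regularized cost J(Theta) for a sigmoid-output network (sketch).

    H      : m x K list of lists, H[i][k] = (h_Theta(x^(i)))_k, each in (0, 1)
    Y      : m x K list of lists of 0/1 labels
    thetas : list of weight matrices; column 0 of each row is assumed
             to be the bias weight and is not regularized
    lam    : regularization strength lambda
    """
    m = len(Y)

    # Double sum: logistic cost of every output unit on every example.
    data_term = 0.0
    for h_row, y_row in zip(H, Y):
        for h, y in zip(h_row, y_row):
            data_term += y * math.log(h) + (1 - y) * math.log(1 - h)

    # Triple sum: squares of all non-bias weights (column 0 is skipped).
    reg_term = 0.0
    for theta in thetas:
        for row in theta:
            reg_term += sum(w * w for w in row[1:])

    return -data_term / m + lam / (2 * m) * reg_term
```

With `lam = 0` the second term vanishes and only the averaged logistic costs of the output units remain.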

### Backpropagation algorithm

Just as a reminder, what we did in the previous section (Week 4) was **forward propagation**: computing the activations layer by layer, starting from the first layer all the way to the output layer.

**Intuition**: for each node $j$ in layer $l$ we will calculate the _error_ $\delta_j^{(l)}$

The backpropagation algorithm works as follows:

Given a training set $\{ (x^{(1)}, y^{(1)}), \cdots, (x^{(m)}, y^{(m)}) \}$:

1. Set $\Delta^{(l)}_{ij} = 0$ for all $(l,i,j)$

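2. For each training example $t = 1, \ldots, m$ (steps 2 to 5 are repeated per example), set $a^{(1)} = x^{(t)}$ and perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$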

3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$,  
where $L$ is the index of the last layer

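4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$ **backwards**, starting from layer $L-1$:  
$\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)}) .* a^{(l)} .* (1-a^{(l)})$  
where $.*$ is element-wise multiplication and $a^{(l)} .* (1-a^{(l)}) = g'(z^{(l)})$ is the derivative of the sigmoid activation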

5. Update $\Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a^{(l)}_j \delta^{(l+1)}_i$
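
The steps above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: every unit uses the sigmoid activation, each weight matrix is a list of rows whose column 0 is the bias weight, and regularization is left out. The final division by $m$, which turns the accumulators into average gradients, is not stated in the steps above and is added here as an assumption. The names `sigmoid` and `backprop` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop(thetas, X, Y):
    """Return unregularized gradients with the same shapes as thetas (sketch).

    thetas[l][i][j] is the weight from unit j of one layer (j = 0 is the
    bias unit) to unit i of the next layer.
    """
    m = len(X)

    # Step 1: zero the accumulators Delta.
    Delta = [[[0.0] * len(row) for row in theta] for theta in thetas]

    for x, y in zip(X, Y):
        # Step 2: forward propagation. acts[l] is the input to thetas[l],
        # with the bias unit (value 1) prepended.
        acts = []
        a = list(x)
        for theta in thetas:
            a = [1.0] + a
            acts.append(a)
            a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]

        # Step 3: error of the output layer.
        delta = [ak - yk for ak, yk in zip(a, y)]

        # Steps 4 and 5: walk backwards through the layers.
        for l in range(len(thetas) - 1, -1, -1):
            a_l = acts[l]
            # Step 5: accumulate a_j * delta_i.
            for i, d in enumerate(delta):
                for j, v in enumerate(a_l):
                    Delta[l][i][j] += d * v
            # Step 4: propagate the error one layer back. The bias unit
            # (index 0) has no error term, so it is skipped.
            if l > 0:
                delta = [
                    sum(thetas[l][i][j] * delta[i] for i in range(len(delta)))
                    * a_l[j] * (1 - a_l[j])
                    for j in range(1, len(a_l))
                ]

    # Assumed final step: average the accumulated terms over the m examples.
    return [[[v / m for v in row] for row in D] for D in Delta]
```

The error for the input layer is never computed, which matches the fact that $\delta$ is only defined from layer 2 onwards.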

### Implementation Note: Unrolling Parameters

Optimization routines such as <code>fminunc</code> expect the parameters as one vector, so we unroll our initial parameter matrices $\Theta^{(1)}$, $\Theta^{(2)}$ and $\Theta^{(3)}$ into a **single vector** <code>initialTheta</code>.

At this point, we apply:

<code>fminunc(@costFunction, initialTheta, options)</code>
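
As an illustration of the idea, here is a small Python sketch of unrolling and its inverse. The helper names `unroll` and `reshape` are hypothetical, and the sketch flattens row by row, whereas Octave's `Theta(:)` flattens column by column; the ordering does not matter as long as both directions agree.

```python
def unroll(thetas):
    """Flatten a list of weight matrices into one long list (row by row)."""
    return [w for theta in thetas for row in theta for w in row]

def reshape(vec, shapes):
    """Rebuild the matrices from the flat list, given (rows, cols) per matrix."""
    thetas, pos = [], 0
    for rows, cols in shapes:
        theta = [vec[pos + r * cols: pos + (r + 1) * cols] for r in range(rows)]
        thetas.append(theta)
        pos += rows * cols
    return thetas

# Example: two small matrices with shapes 2x3 and 1x3.
theta1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
theta2 = [[0.7, 0.8, 0.9]]
flat = unroll([theta1, theta2])            # 9 numbers in one list
restored = reshape(flat, [(2, 3), (1, 3)])  # back to the two matrices
```

Inside the cost function, the flat vector is reshaped back into matrices before forward propagation and backpropagation, and the resulting gradients are unrolled again before being returned to the optimizer.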

### Gradient Checking 

