In [None]:
import numpy as np 

# Neural Network

**Implementation**

    1. Define the neural network structure (number of input units, number of hidden units, etc.).
    2. Initialize the model's parameters
    3. Loop:
        - Implement forward propagation
        - Compute loss
        - Implement backward propagation to get the gradients
        - Update parameters (gradient descent)
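
The steps above can be sketched for a network with one hidden layer. This is a minimal illustration, not a reference implementation: the toy data, the layer sizes, the `tanh` hidden activation, the sigmoid output, the cross-entropy loss, and the learning rate are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 2 features, m samples, linearly separable labels.
m = 200
X = rng.standard_normal((2, m))                      # shape (n_x, m)
Y = (X[0] + X[1] > 0).astype(float).reshape(1, m)    # shape (1, m)

# 1. Define the structure: input, hidden, and output sizes.
n_x, n_h, n_y = 2, 4, 1

# 2. Initialize parameters: small random weights, zero biases.
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

learning_rate = 0.5
losses = []

# 3. Loop.
for _ in range(500):
    # Forward propagation.
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

    # Cross-entropy loss (eps guards against log(0)).
    eps = 1e-12
    loss = -np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps))
    losses.append(loss)

    # Backward propagation.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    # Gradient descent update.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(losses[0], losses[-1])
```

The recorded loss should fall over the iterations; how quickly depends on the assumed learning rate and data.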

##  Propagation
**Notation**
- $L$: Number of layers
- $n^{[l]}$: Number of units in layer $l$
- $a^{[l]}$: Activation in layer $l$

### Forward Propagation

**Sample i**
$$
\begin{aligned}
a^{[0]}(i) &= x(i) \\
z^{[l]}(i) &= W^{[l]}a^{[l-1]}(i) + b^{[l]}\\
a^{[l]}(i) &= g^{[l]}(z^{[l]}(i))
\end{aligned}
$$

**Matrix Form (i=1...m)**

$$
\begin{aligned}
X &= [x(1), \dots, x(m)]_{n_x \times m} \\
Z^{[l]} &= [z^{[l]}(1), \dots, z^{[l]}(m)]\\
A^{[l]} &= [a^{[l]}(1), \dots, a^{[l]}(m)]\\
Z^{[l]} &= W^{[l]}A^{[l-1]} + b^{[l]}\\
A^{[l]} &= g^{[l]}(Z^{[l]})
\end{aligned}
$$
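
The matrix form maps directly onto NumPy. The sketch below is one possible way to write it; the function name `forward`, the `params` list of `(W, b)` pairs, and the example layer sizes are hypothetical choices for illustration. Each sample is a column of `X`, and `b` has shape $(n^{[l]}, 1)$ so that it broadcasts across the $m$ columns.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, params, activations):
    """Forward pass in matrix form.

    X           : array of shape (n_x, m), one sample per column
    params      : list of (W, b) pairs, one per layer
    activations : list of callables g, one per layer
    Returns the final activation and the list [A0, A1, ..., AL].
    """
    A = X
    cache = [A]
    for (W, b), g in zip(params, activations):
        Z = W @ A + b      # b broadcasts over the m columns
        A = g(Z)
        cache.append(A)
    return A, cache

# Example with assumed sizes: 3 inputs, 4 hidden units, 1 output, m = 5 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
params = [
    (rng.standard_normal((4, 3)), np.zeros((4, 1))),
    (rng.standard_normal((1, 4)), np.zeros((1, 1))),
]
activations = [np.tanh, sigmoid]

A_out, cache = forward(X, params, activations)
print(A_out.shape)  # (1, 5)
```

Running the same function on a single column of `X` gives the corresponding column of `A_out`, which is the per-sample form written out above.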

### Backward Propagation

## Activation Function

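Activation functions introduce nonlinearity. Without them, any stack of layers collapses into a single linear (affine) map of the input, no matter how many layers it has.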
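
A quick numerical check of why this matters, as a sketch with arbitrary assumed shapes: two layers with an identity activation give exactly the same output as one layer with $W = W^{[2]}W^{[1]}$ and $b = W^{[2]}b^{[1]} + b^{[2]}$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))

# Two layers with identity activation g(z) = z.
two_layer = W2 @ (W1 @ X + b1) + b2

# The equivalent single layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ X + b

print(np.allclose(two_layer, one_layer))  # True

# With a nonlinear activation between the layers, the outputs no longer match.
with_tanh = W2 @ np.tanh(W1 @ X + b1) + b2
```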

In [15]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(x, 0)

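def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

def tanh_derivative(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1 - tanh(x) ** 2

def relu_derivative(x):
    # 1 where x > 0, otherwise 0 (0 is used at x = 0)
    return np.maximum(np.sign(x), 0)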

## Random Initialization

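- $W^{[l]}$ cannot be initialized to all zeros: every unit in a layer would then compute the same output and receive the same gradient, so the units never become different from one another (the symmetry problem).
- $b^{[l]}$ does not have this issue and can be initialized to zeros.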
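
A minimal sketch of one common choice, assuming small Gaussian weights scaled by `0.01` and zero biases. The helper names `initialize_parameters` and `hidden_weight_gradient`, the scale, and the layer sizes are hypothetical. The second helper computes the gradient with respect to $W^{[1]}$ for a network with a `tanh` hidden layer and a sigmoid output, to compare zero and random initialization.

```python
import numpy as np

def initialize_parameters(layer_dims, scale=0.01, seed=0):
    """Return a list of (W, b) pairs: small random W, zero b."""
    rng = np.random.default_rng(seed)
    params = []
    for n_prev, n_curr in zip(layer_dims[:-1], layer_dims[1:]):
        W = rng.standard_normal((n_curr, n_prev)) * scale
        b = np.zeros((n_curr, 1))
        params.append((W, b))
    return params

def hidden_weight_gradient(W1, b1, W2, b2, X, Y):
    """Gradient of the cross-entropy loss w.r.t. W1 (tanh hidden, sigmoid output)."""
    m = X.shape[1]
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))
    dZ2 = A2 - Y
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    return dZ1 @ X.T / m

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 50))
Y = (X[0] > 0).astype(float).reshape(1, -1)

# Zero initialization: every hidden unit gets the same gradient row.
zeros = [(np.zeros((4, 3)), np.zeros((4, 1))), (np.zeros((1, 4)), np.zeros((1, 1)))]
dW1_zero = hidden_weight_gradient(*zeros[0], *zeros[1], X, Y)

# Random initialization: the rows differ, so the units can specialize.
(W1, b1), (W2, b2) = initialize_parameters([3, 4, 1])
dW1_rand = hidden_weight_gradient(W1, b1, W2, b2, X, Y)

print(np.allclose(dW1_zero, dW1_zero[0]))  # True: identical rows
```

In this particular setup the zero-initialized gradient for $W^{[1]}$ is identical across rows because it is exactly zero, so gradient descent never moves the hidden weights at all.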