# Shallow Neural Networks

- [Neural Network Overview](#0)
- [Neural Network Representation](#1)
  - [Example](#1-1)
  - [Details](#1-2)
  - [Vectors](#1-3)

<a name='0'></a>
## Neural Network Overview

For input $X$, prediction $\hat{y}$, weights $W^{[l]}$, and biases $b^{[l]}$, forward propagation computes the following for each layer $l = 1, \dots, L$:

$$X=A^{[0]}$$
$$Z^{[l]} = W^{[l]}A^{[l-1]} +b^{[l]}$$
$$A^{[l]} = g^{[l]}(Z^{[l]})$$
$$A^{[L]} = \hat{y}$$
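
The forward equations above can be sketched in NumPy as a loop over layers. This is a minimal illustration, not a reference implementation: the names `forward`, `params`, `activations`, and `cache`, the layer sizes, and the choice of tanh and sigmoid are all assumptions made for the example.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, params, activations):
    """Forward pass. params is a list of (W, b) per layer (hypothetical layout),
    activations is a list of callables, one g^{[l]} per layer."""
    A = X                      # A^{[0]} = X
    cache = [(None, A)]        # store (Z^{[l]}, A^{[l]}) for later use
    for (W, b), g in zip(params, activations):
        Z = W @ A + b          # b has shape (n_l, 1) and broadcasts over the m columns
        A = g(Z)               # A^{[l]} = g^{[l]}(Z^{[l]})
        cache.append((Z, A))
    return A, cache            # the last A is A^{[L]} = y_hat

# Assumed sizes: 3 input features, 4 hidden units, 1 output unit, 5 examples.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
params = [
    (rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))),
    (rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))),
]
y_hat, cache = forward(X, params, [np.tanh, sigmoid])
```

Each column of `y_hat` is the prediction for one training example, so its shape is (output units, $m$).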

For backpropagation, each layer $l$ computes the following, where $*$ is the element-wise product and $m$ is the number of training examples:

$$dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})$$
$$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$
$$db^{[l]} = \frac{1}{m} \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True})$$
$$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$
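
As a rough sketch, the four backward equations for one layer map onto NumPy line by line. The function name `layer_backward`, the helper `tanh_prime`, and the shapes below are assumptions for illustration; `dA` is filled with random values only to exercise the shapes.

```python
import numpy as np

def tanh_prime(Z):
    # derivative of tanh, used here as an example g'
    return 1.0 - np.tanh(Z) ** 2

def layer_backward(dA, Z, A_prev, W, g_prime):
    """Backward step for a single layer l (hypothetical helper)."""
    m = A_prev.shape[1]                          # number of training examples
    dZ = dA * g_prime(Z)                         # element-wise product
    dW = (dZ @ A_prev.T) / m                     # same shape as W
    db = np.sum(dZ, axis=1, keepdims=True) / m   # same shape as b
    dA_prev = W.T @ dZ                           # gradient passed to layer l-1
    return dA_prev, dW, db

# Assumed sizes: 3 units in layer l-1, 4 units in layer l, 5 examples.
rng = np.random.default_rng(1)
A_prev = rng.standard_normal((3, 5))
W = rng.standard_normal((4, 3))
b = np.zeros((4, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((4, 5))
dA_prev, dW, db = layer_backward(dA, Z, A_prev, W, tanh_prime)
```

Note that `dW` and `db` come out with the shapes of `W` and `b`, which is a quick sanity check when wiring layers together.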

<a name='1'></a>
## Neural Network Representation ##

<a name='1-1'></a>
### Example with one hidden layer:

> I really don't like to put it here, as I want to depict a general form.

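- input layer = layer 0
- hidden layer $k$ (counting from $k = 0$) = layer $k+1$
- output layer = layer $L$, where $L$ is the number of hidden layers plus one ($L = 2$ with one hidden layer)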

<a name='1-2'></a>
### Compute Details

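For node $i$ in layer $l$, where $W^{[l]}_{i}$ is row $i$ of $W^{[l]}$ and $a^{[l-1]}$ is the full activation vector of the previous layer:

$$z^{[l]}_{i} = W^{[l]}_{i}a^{[l-1]} + b^{[l]}_{i}$$
$$a^{[l]}_{i} = g^{[l]}(z^{[l]}_{i})$$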
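
A small sketch of the per-node computation for a single example, assuming $W^{[l]}_{i}$ is read as row $i$ of the weight matrix and each node takes a dot product with the whole activation vector of the previous layer. The sizes and the use of tanh as $g$ are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
a_prev = rng.standard_normal(3)     # a^{[l-1]}: 3 units in the previous layer
W = rng.standard_normal((4, 3))     # one row of weights per node in layer l
b = rng.standard_normal(4)          # one bias per node

# Node by node: row i of W dotted with the previous activations, plus bias i.
z_nodes = np.array([W[i] @ a_prev + b[i] for i in range(W.shape[0])])
a_nodes = np.tanh(z_nodes)          # tanh stands in for g^{[l]}

# The same values computed for the whole layer in one matrix-vector product.
z_layer = W @ a_prev + b
```

`z_nodes` and `z_layer` agree, which is why the node-level formula can be replaced by a single matrix product per layer.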

<a name='1-3'></a>
### Vectorized Form

$X = [x^{(1)}, x^{(2)}, \dots, x^{(m)}]$ is an $(n_x, m)$ matrix; each $x^{(i)}$ is a column vector holding one training example.

$$ Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$
$$ A^{[l]} = g^{[l]}(Z^{[l]})$$

<a name ='1-4'></a>
### Explanation for Vectorized Implementation

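For a single training example $x^{(i)}$, the first layer computes:

$$z^{[1](i)} = W^{[1]}x^{(i)} + b^{[1]}$$

Stacking the $m$ training examples horizontally as the columns of $X$ (and the results as the columns of $Z^{[1]}$) gives:

$$Z^{[1]} = W^{[1]}X + b^{[1]}$$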
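
To make the stacking argument concrete, the sketch below computes the first-layer linear step one example at a time and then in a single matrix product, and checks that the two agree. The sizes `n_x`, `n_1`, `m` and the variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_1, m = 3, 4, 6                    # features, hidden units, examples (assumed)
X = rng.standard_normal((n_x, m))        # examples stacked as columns
W1 = rng.standard_normal((n_1, n_x))
b1 = rng.standard_normal((n_1, 1))

# One example at a time: z = W x + b for each column of X.
Z_loop = np.zeros((n_1, m))
for i in range(m):
    x_i = X[:, i:i + 1]                  # keep the column shape (n_x, 1)
    Z_loop[:, i:i + 1] = W1 @ x_i + b1

# All examples at once: b1 broadcasts across the m columns.
Z_vec = W1 @ X + b1

same = np.allclose(Z_loop, Z_vec)
```

Column $i$ of `Z_vec` is exactly the result for example $i$, so the loop over examples can be dropped.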


## Activation Functions

$g(Z)$ can be any activation function, applied to the linear part $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$.

Examples:

Activation | Function
-|-
Linear | $g(Z) = Z$
Sigmoid | $g(Z) = \frac{1}{1 + e^{-Z}}$
tanh | $g(Z) = \frac{e^{Z} - e^{-Z}}{e^{Z} + e^{-Z}}$
ReLU | $g(Z) = \max(0, Z)$
Leaky ReLU | $g(Z) = \max(0.01Z, Z)$
softmax | $g(Z)_i = \frac{e^{Z_i}}{\sum_j e^{Z_j}}$
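
The activations in the table can be written as short NumPy functions. This is a sketch under a few assumptions: the function names are illustrative, softmax is taken over each column (one column per example), and the maximum is subtracted before exponentiating purely for numerical stability, which does not change the result.

```python
import numpy as np

def linear(Z):
    return Z

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def tanh(Z):
    return np.tanh(Z)

def relu(Z):
    return np.maximum(0.0, Z)

def leaky_relu(Z, slope=0.01):
    # max(slope * Z, Z) matches the table when slope is 0.01
    return np.maximum(slope * Z, Z)

def softmax(Z):
    # normalise each column so its entries sum to 1
    shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp_Z = np.exp(shifted)
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

# Small example input: 2 units, 2 examples.
Z = np.array([[-2.0, 0.0],
              [1.0, 3.0]])
probs = softmax(Z)
```

All of these act element-wise except softmax, which couples the entries within a column.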


## Gradient Descent