# Shallow Neural Networks

## Neural Network Overview 
For input $X$, output $\hat{y}$, weights $W$, and biases $b$, each layer $l = 1, \dots, L$ computes the following:

$$X=A^{[0]}$$
$$Z^{[l]} = W^{[l]}A^{[l-1]} +b^{[l]}$$
$$A^{[l]} = g^{[l]}(Z^{[l]})$$
$$A^{[L]} = \hat{y}$$
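The forward equations above can be sketched in NumPy as a loop over layers. This is a minimal illustration, not a reference implementation: the `forward` helper, the `params` dict keyed `"W1", "b1", ...`, and the list of activation functions are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, params, activations):
    """Apply Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]) for l = 1..L.

    params: dict with keys "W1", "b1", ..., "WL", "bL" (hypothetical layout)
    activations: list of L functions; activations[l-1] plays the role of g[l]
    Returns A[L] (= y_hat) and a cache holding every Z[l] and A[l].
    """
    A = X                                   # A[0] = X
    cache = {"A0": A}
    for l, g in enumerate(activations, start=1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]   # b[l] broadcasts across the m columns
        A = g(Z)
        cache[f"Z{l}"], cache[f"A{l}"] = Z, A
    return A, cache

# Example: 3 features, 4 tanh hidden units, 1 sigmoid output, 5 examples
rng = np.random.default_rng(0)
params = {"W1": rng.standard_normal((4, 3)) * 0.01, "b1": np.zeros((4, 1)),
          "W2": rng.standard_normal((1, 4)) * 0.01, "b2": np.zeros((1, 1))}
X = rng.standard_normal((3, 5))
Y_hat, cache = forward(X, params, [np.tanh, sigmoid])
print(Y_hat.shape)  # (1, 5)
```

Keeping each $Z^{[l]}$ and $A^{[l]}$ in a cache is a common design choice because back propagation needs them again.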

For back propagation, each layer computes the following ($*$ denotes the element-wise product):

$$dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})$$
$$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$
$$db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$
$$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$
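The four back-propagation equations above map onto one small function per layer. This is a sketch under assumptions: `layer_backward` and its argument names are hypothetical, and the derivative $g^{[l]'}$ is passed in as a function. The bias gradient uses `np.sum(..., axis=1, keepdims=True)` so that $db^{[l]}$ keeps the column shape of $b^{[l]}$.

```python
import numpy as np

def layer_backward(dA, Z, A_prev, W, g_prime):
    """One backward step for layer l.

    dA: dA[l], Z: Z[l], A_prev: A[l-1], W: W[l], g_prime: derivative of g[l].
    Returns dA[l-1], dW[l], db[l].
    """
    m = A_prev.shape[1]                          # number of examples (columns)
    dZ = dA * g_prime(Z)                         # element-wise product
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m   # average over examples, shape (n[l], 1)
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Shape check: a tanh layer with 4 units fed by 3 units, 5 examples
rng = np.random.default_rng(1)
W, A_prev = rng.standard_normal((4, 3)), rng.standard_normal((3, 5))
Z = W @ A_prev
dA = rng.standard_normal((4, 5))
dA_prev, dW, db = layer_backward(dA, Z, A_prev, W, lambda z: 1 - np.tanh(z) ** 2)
print(dA_prev.shape, dW.shape, db.shape)  # (3, 5) (4, 3) (4, 1)
```

Note that each gradient has the same shape as the quantity it belongs to ($dW^{[l]}$ like $W^{[l]}$, $dA^{[l-1]}$ like $A^{[l-1]}$), which is a quick sanity check when implementing this.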


## Neural Network Representation 

<a name='1-1'></a>
### Example with one hidden layer:

> I'd rather not put this example here, since I want to depict the general form.

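input layer = layer 0\
hidden layer $k$ (counting hidden layers from $k = 0$) = layer $k+1$\
output layer = layer $L$, where $L - 1$ is the number of hidden layers

With one hidden layer, the input is layer 0, the hidden layer is layer 1, and the output is layer 2, so $L = 2$ (the input layer is not counted).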

### Compute Details

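For each node $i$ in layer $l$, where $w^{[l]T}_{i}$ is row $i$ of $W^{[l]}$ and the input is the whole activation vector of the previous layer:

$$z^{[l]}_{i} = w^{[l]T}_{i}a^{[l-1]} + b^{[l]}_{i}$$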
$$a^{[l]}_{i} = g^{[l]}(z^{[l]}_{i})$$
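Computing the nodes one at a time and computing them all with a single matrix product give the same result, because stacking the rows $w^{[l]T}_{i}$ is exactly $W^{[l]}$. A small NumPy check with arbitrary random sizes (4 nodes fed by 3 nodes, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 3))       # W[l]: 4 nodes in layer l, 3 nodes in layer l-1
b = rng.standard_normal((4, 1))       # b[l]
a_prev = rng.standard_normal((3, 1))  # a[l-1] for a single example

# Node by node: z_i = w_i^T a[l-1] + b_i, where w_i^T is row i of W
z_loop = np.array([[W[i] @ a_prev[:, 0] + b[i, 0]] for i in range(4)])

# All nodes of the layer at once
z_vec = W @ a_prev + b

print(np.allclose(z_loop, z_vec))  # True
```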

### Vectorized Form

$X = [x^{(1)}, x^{(2)}, \dots, x^{(m)}]$ is an $(n_x, m)$ matrix: each column $x^{(i)}$ is one training example with $n_x$ features.

$$ Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$
$$ A^{[l]} = g^{[l]}(Z^{[l]})$$

### Explanation for Vectorized Implementation

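For a single training example $x^{(i)}$, the first layer computes a column vector:

$$z^{[1](i)} = W^{[1]}x^{(i)} + b^{[1]}$$

Stacking the training examples horizontally as the columns of $X$ stacks the vectors $z^{[1](i)}$ as the columns of $Z^{[1]}$, and $b^{[1]}$ is added to every column:

$$Z^{[1]} = W^{[1]}X + b^{[1]}$$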
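The stacking argument can be checked numerically: looping over the examples and concatenating the columns matches the single matrix product, with NumPy broadcasting the $(n_h, 1)$ bias across the $m$ columns. The sizes below are arbitrary and chosen only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_x, n_h, m = 3, 4, 5
W1 = rng.standard_normal((n_h, n_x))
b1 = rng.standard_normal((n_h, 1))
X = rng.standard_normal((n_x, m))     # column i is training example x^(i)

# One example at a time: z^[1](i) = W^[1] x^(i) + b^[1]
Z_loop = np.hstack([W1 @ X[:, [i]] + b1 for i in range(m)])

# All examples at once; b1 of shape (n_h, 1) is broadcast across the m columns
Z_vec = W1 @ X + b1

print(Z_vec.shape, np.allclose(Z_loop, Z_vec))  # (4, 5) True
```

Indexing with `X[:, [i]]` rather than `X[:, i]` keeps the example as an $(n_x, 1)$ column, so the per-example result has the same $(n_h, 1)$ shape as $b^{[1]}$.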


## Activation Functions

Here $Z = WX + b$; every activation except softmax is applied element-wise.

Activation | $g(Z)$ | $g'(Z)$
-|-|-
Linear | $Z$ | $1$
Sigmoid | $\frac{1}{1+e^{-Z}}$ | $g(Z)(1 - g(Z))$
tanh | $\frac{e^Z - e^{-Z}}{e^Z + e^{-Z}}$ | $1 - g(Z)^2$
ReLU | $\max(0, Z)$ | $1$ if $Z > 0$, else $0$ (undefined at $Z = 0$; usually taken as $0$)
Leaky ReLU | $\max(0.01Z, Z)$ | $1$ if $Z > 0$, else $0.01$
softmax | $\frac{e^{Z_i}}{\sum_j e^{Z_j}}$ | $\frac{\partial g_i}{\partial Z_j} = g_i(Z)(\delta_{ij} - g_j(Z))$; only the diagonal equals $g_i(Z)(1 - g_i(Z))$
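The functions and derivatives in the table can be sketched in NumPy and checked against central finite differences. This is an illustrative sketch, not a library API: the helper names are made up, and `softmax` assumes the document's layout of one example per column with classes along the rows. The softmax check compares the full Jacobian $\mathrm{diag}(s) - s s^T$ rather than only its diagonal.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)

def d_tanh(z):
    return 1 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0, z)

def d_relu(z):
    return (z > 0).astype(float)            # derivative at exactly 0 taken as 0

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)

def d_leaky_relu(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)

def softmax(Z):
    # Normalize over the class axis (rows) separately for each example (column).
    # Subtracting the column max leaves the result unchanged but avoids overflow in exp.
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

# Element-wise derivatives vs. central finite differences (points avoid z = 0)
z = np.array([-2.0, -0.5, 0.3, 1.7])
eps = 1e-6
for g, dg in [(sigmoid, d_sigmoid), (np.tanh, d_tanh),
              (relu, d_relu), (leaky_relu, d_leaky_relu)]:
    numeric = (g(z + eps) - g(z - eps)) / (2 * eps)
    assert np.allclose(numeric, dg(z), atol=1e-6)

# Softmax Jacobian for one example: J[i, j] = g_i (delta_ij - g_j) = diag(s) - s s^T
z3 = np.array([[1.0], [2.0], [3.0]])
s = softmax(z3)
J = np.diagflat(s) - s @ s.T
J_num = np.hstack([(softmax(z3 + eps * e) - softmax(z3 - eps * e)) / (2 * eps)
                   for e in np.eye(3)[:, :, None]])
print(np.allclose(J, J_num, atol=1e-6))  # True
```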


## Gradient Descent

### Forward propagation:
$$ Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \quad A^{[0]} = X \tag{1}$$
$$ A^{[l]} = g^{[l]}(Z^{[l]}) \tag{2}$$
$$ L(\hat{y}^{(i)}, y^{(i)}) = - y^{(i)} \log(\hat{y}^{(i)}) - (1-y^{(i)}) \log(1-\hat{y}^{(i)}) \tag{3}$$

### Cost function: 
$$J(W^{[1]}, b^{[1]}, ..., W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) \tag{4}$$
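Equations (3) and (4) vectorize directly over the $m$ columns. A minimal sketch, assuming a single sigmoid output so that `A_L` and `Y` both have shape $(1, m)$; the name `compute_cost` is illustrative. It does not guard against `A_L` being exactly 0 or 1, where the logarithm diverges.

```python
import numpy as np

def compute_cost(A_L, Y):
    """Binary cross-entropy cost J from equation (4); A_L and Y have shape (1, m)."""
    m = Y.shape[1]
    losses = -Y * np.log(A_L) - (1 - Y) * np.log(1 - A_L)   # equation (3), per example
    return float(np.sum(losses) / m)

Y = np.array([[1, 0, 1]])
A_L = np.array([[0.9, 0.2, 0.6]])
print(round(compute_cost(A_L, Y), 4))  # 0.2798
```

Here the three per-example losses are $-\log 0.9$, $-\log 0.8$, and $-\log 0.6$, and $J$ is their mean.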

### Backward propagation:
$$ dZ^{[L]} = A^{[L]} - Y \tag{1}$$
$$ dW^{[L]} = \frac{1}{m} dZ^{[L]} A^{[L-1]T} \tag{2}$$
$$ db^{[L]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[L](i)} \tag{3}$$
$$ dZ^{[l]} = W^{[l+1]T} dZ^{[l+1]} * g^{[l]'}(Z^{[l]}) \tag{4}$$ 
$$ dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T} \tag{5}$$
$$ db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \tag{6}$$
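One way to gain confidence in equations (1) to (6) is a gradient check: compare the analytic gradients with central finite differences of $J$. The sketch below assumes a two-layer network with a tanh hidden layer and a sigmoid output (the case where $dZ^{[L]} = A^{[L]} - Y$ holds for the cross-entropy loss); the sizes, seed, and helper names are arbitrary choices for the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_cost(params, X, Y):
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    m = X.shape[1]
    J = np.sum(-Y * np.log(A2) - (1 - Y) * np.log(1 - A2)) / m
    return J, (A1, A2)

def backward(params, X, Y, A1, A2):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    dZ2 = A2 - Y                                   # (1): sigmoid output + cross-entropy
    dW2 = dZ2 @ A1.T / m                           # (2)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m   # (3)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)             # (4): tanh'(Z1) = 1 - A1^2
    dW1 = dZ1 @ X.T / m                            # (5)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m   # (6)
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 8))
Y = (rng.random((1, 8)) > 0.5).astype(float)
params = [rng.standard_normal((4, 3)), rng.standard_normal((4, 1)),
          rng.standard_normal((1, 4)), rng.standard_normal((1, 1))]

J, (A1, A2) = forward_cost(params, X, Y)
grads = backward(params, X, Y, A1, A2)

# Perturb every parameter entry in place and compare (J+ - J-) / (2 eps) with the gradient
eps = 1e-6
max_err = 0.0
for P, dP in zip(params, grads):
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        J_plus, _ = forward_cost(params, X, Y)
        P[idx] = old - eps
        J_minus, _ = forward_cost(params, X, Y)
        P[idx] = old
        max_err = max(max_err, abs((J_plus - J_minus) / (2 * eps) - dP[idx]))
print(max_err < 1e-6)  # True
```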

### Update parameters: 
$$W^{[l]} = W^{[l]} - \alpha \frac{\partial J}{\partial W^{[l]}}\tag{1}$$
$$b^{[l]} = b^{[l]} - \alpha \frac{\partial J}{\partial b^{[l]}}\tag{2}$$
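Since $\frac{\partial J}{\partial W^{[l]}}$ and $\frac{\partial J}{\partial b^{[l]}}$ are the $dW^{[l]}$ and $db^{[l]}$ from back propagation, one update step is a subtraction per parameter. A minimal sketch, assuming gradients are stored under keys `"dW1", "db1", ...` (a hypothetical layout), tested on the toy cost $J(W) = \sum (W - 3)^2$ instead of a network so that the fixed point is obvious:

```python
import numpy as np

def update_parameters(params, grads, learning_rate):
    """W[l] := W[l] - alpha * dW[l] and b[l] := b[l] - alpha * db[l] for every key."""
    return {k: v - learning_rate * grads["d" + k] for k, v in params.items()}

# Toy cost J(W) = sum((W - 3)^2), whose gradient is dJ/dW = 2 (W - 3)
params = {"W1": np.zeros((2, 2))}
for _ in range(100):
    grads = {"dW1": 2 * (params["W1"] - 3)}
    params = update_parameters(params, grads, learning_rate=0.1)
print(np.round(params["W1"], 4))  # every entry has converged to 3
```

With $\alpha = 0.1$ the distance to the minimum shrinks by a factor of $1 - 2\alpha = 0.8$ per step; a learning rate above $1$ would make this toy problem diverge.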


## Initialization

Initializing to zeros:
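```python
import numpy as np

dim = 2                 # example number of input features
w = np.zeros((dim, 1))  # every weight starts at the same value, 0
b = 0
# w=[[0.]
#    [0.]]
```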
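If every weight starts at the same value (such as zero), all hidden units in a layer compute the same activation and receive the same gradient in back propagation, so they remain identical after every update. We have **Symmetric Neurons**: the hidden layer behaves like a single unit. (Logistic regression, which has no hidden layer, does not have this problem and can start from zeros.)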
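The symmetry can be observed directly. The sketch below trains a tiny two-layer network (tanh hidden layer, sigmoid output, toy data made up for the illustration) with every weight initialized to the same constant 0.5. A constant other than zero is used on purpose: with all zeros in this network, $A^{[1]} = 0$ and $W^{[2]} = 0$ make $dW^{[1]}$ and $dW^{[2]}$ exactly zero, so the weights would not move at all, which is a degenerate form of the same problem.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(5)
X = rng.standard_normal((2, 10))               # 2 features, 10 examples
Y = (X[0:1] + X[1:2] > 0).astype(float)        # toy labels, shape (1, 10)
m = X.shape[1]

# Every weight gets the same value, so the 3 hidden units start out identical
W1, b1 = np.full((3, 2), 0.5), np.zeros((3, 1))
W2, b2 = np.full((1, 3), 0.5), np.zeros((1, 1))

for _ in range(50):                            # plain gradient descent, alpha = 0.5
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y
    dW2, db2 = dZ2 @ A1.T / m, dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1, db1 = dZ1 @ X.T / m, dZ1.sum(axis=1, keepdims=True) / m
    W1, b1, W2, b2 = W1 - 0.5 * dW1, b1 - 0.5 * db1, W2 - 0.5 * dW2, b2 - 0.5 * db2

# The weights have changed, but every row of W1 (one per hidden unit) is still the same
print(np.allclose(W1, W1[0]))  # True
```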

Accordingly, we should initialize the weights randomly to break the symmetry (the biases can still start at zero).

Initializing randomly:
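```python
import numpy as np

np.random.seed(3)   # fixed seed so the printed values below are reproducible
dim, nodes = 2, 1   # example sizes: 2 input features, 1 unit
# randn takes the dimensions as separate arguments, not as a tuple.
# w has shape (dim, nodes) and is used as z = w.T @ x + b, like the zero example above;
# the W^{[l]} in the equations is its transpose, with shape (nodes, dim).
w = np.random.randn(dim, nodes) * 0.01  # small values keep tanh/sigmoid away from saturation
b = np.zeros((nodes, 1))                # biases can safely start at zero
# w=[[0.01788628]
#    [0.0043651 ]]
```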
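For a whole network it is convenient to build $W^{[l]}$ directly in the $(n^{[l]}, n^{[l-1]})$ shape used by $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$, looping over $l = 1, \dots, L$ since the input layer (layer 0) has no parameters. A minimal sketch; `initialize_parameters`, its `scale` and `seed` arguments, and the use of NumPy's `default_rng` are choices made for this illustration.

```python
import numpy as np

def initialize_parameters(layer_sizes, scale=0.01, seed=0):
    """Small random W[l] and zero b[l] for l = 1..L.

    layer_sizes = [n0, n1, ..., nL]; layer 0 is the input and has no parameters.
    W[l] has shape (n[l], n[l-1]) so that Z[l] = W[l] @ A[l-1] + b[l] works directly.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_sizes)):
        params[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * scale
        params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return params

# One hidden layer: 2 inputs, 4 hidden units, 1 output unit (L = 2)
params = initialize_parameters([2, 4, 1])
print({k: v.shape for k, v in params.items()})
# {'W1': (4, 2), 'b1': (4, 1), 'W2': (1, 4), 'b2': (1, 1)}
```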
