# Deep L-layer Neural Network
Here are examples of shallow and deep NNs:  
<img src="images/shallowdeep.png" width="600">

Some notation for deep NNs:  
<img src="images/4layerNN.png" width="400">  
$L$: number of layers.  
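$n^{[l]}:$ number of units in layer $l$. In the example, $L=4$, $n^{[0]}=n_x=3$, $n^{[1]}=n^{[2]}=5$, $n^{[3]}=3$, and $n^{[4]}=n^{[L]}=1$ (a single output unit computing $\hat{y}$).  
$a^{[l]}:$ activations (outputs) of layer $l$, $a^{[l]}=g^{[l]}(z^{[l]})$, where $g^{[l]}$ is the activation function of layer $l$; the input is $x=a^{[0]}$.  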
$w^{[l]}:$ weights for computing $z^{[l]}$.  
$b^{[l]}:$ bias for computing $z^{[l]}$.  

## Forward Propagation in a Deep Network  
<img src="images/4layerNN.png" width="400">    

In this NN,  
  
$z^{[1]}=w^{[1]}x+b^{[1]} \ \ \  , \ a^{[1]}=g^{[1]}(z^{[1]})$  
$z^{[2]}=w^{[2]}a^{[1]}+b^{[2]}, \ a^{[2]}=g^{[2]}(z^{[2]})$  
$z^{[3]}=w^{[3]}a^{[2]}+b^{[3]}, \ a^{[3]}=g^{[3]}(z^{[3]})$  
$z^{[4]}=w^{[4]}a^{[3]}+b^{[4]}, \ a^{[4]}=g^{[4]}(z^{[4]})=\hat{y}$  

In general:  
  
$z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]}, \ a^{[l]}=g^{[l]}(z^{[l]})$  

Vectorization (stacking the $m$ training examples as columns, so $A^{[0]}=X$):  
  
$Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}, \ A^{[l]}=g^{[l]}(Z^{[l]})$  
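A minimal NumPy sketch of this vectorized forward pass. It assumes ReLU for the hidden layers and sigmoid for the output layer (the notes leave $g^{[l]}$ unspecified) and a hypothetical `params` dict keyed `"W1"`, `"b1"`, `"W2"`, and so on. Adding $b^{[l]}$ of shape $(n^{[l]},1)$ to an $(n^{[l]},m)$ matrix relies on NumPy broadcasting, which copies $b^{[l]}$ across the $m$ columns.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_deep(X, params, L):
    """Vectorized forward pass through an L-layer network.

    X has shape (n[0], m), one training example per column, so A[0] = X.
    Assumption for this sketch: ReLU in layers 1..L-1, sigmoid in layer L.
    """
    A = X
    for l in range(1, L + 1):
        W, b = params[f"W{l}"], params[f"b{l}"]
        Z = W @ A + b                      # b (n[l], 1) is broadcast over the m columns
        A = sigmoid(Z) if l == L else relu(Z)
    return A                               # A[L] = Y_hat, shape (n[L], m)

# Usage with the 4-layer example: n = [3, 5, 5, 3, 1], m = 10 examples.
rng = np.random.default_rng(0)
n = [3, 5, 5, 3, 1]
params = {}
for l in range(1, len(n)):
    params[f"W{l}"] = rng.standard_normal((n[l], n[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((n[l], 1))
X = rng.standard_normal((3, 10))
Y_hat = forward_deep(X, params, L=4)
print(Y_hat.shape)  # (1, 10)
```

The loop over layers is explicit; vectorization removes the loop over the $m$ examples, not the loop over $l$.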

## Getting your Matrix Dimensions Right  
Here is the worked check used to make sure the matrix dimensions are right:  
<img src="images/new5layer.png" width="400"><img src="images/dimensionscript.png" width="300">   
In general:  
  
$w^{[l]}, dw^{[l]}: \ (n^{[l]},n^{[l-1]})$  
$b^{[l]},db^{[l]}\ \ \ : \ (n^{[l]},1)$  
$a^{[l]},z^{[l]}\quad : \ (n^{[l]},1)$  
  
Vectorization:  
  
$A^{[l]},Z^{[l]}\ \ \quad : \ (n^{[l]},m)$  
$dA^{[l]},dZ^{[l]}\ \  : \ (n^{[l]},m)$  
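The shape rules above can be turned into assertions. A minimal sketch, assuming the same hypothetical `params` dict layout (`"W1"`, `"b1"`, and so on) and the layer sizes $n=[3,5,5,3,1]$ of the 4-layer example; `np.tanh` is an arbitrary stand-in for $g^{[l]}$, since any element-wise activation keeps the shape of $Z^{[l]}$. Because $dW^{[l]}$ and $db^{[l]}$ must match $W^{[l]}$ and $b^{[l]}$, the same parameter check applies to the gradients.

```python
import numpy as np

def check_param_shapes(params, n):
    """Assert W[l] is (n[l], n[l-1]) and b[l] is (n[l], 1) for l = 1..L.

    A 1-D b of shape (n[l],) would broadcast along the wrong axis in
    W @ A + b, so checking for the explicit column shape matters.
    """
    for l in range(1, len(n)):
        assert params[f"W{l}"].shape == (n[l], n[l - 1]), f"W{l} has wrong shape"
        assert params[f"b{l}"].shape == (n[l], 1), f"b{l} has wrong shape"

def check_forward_shapes(X, params, n):
    """Propagate X layer by layer and assert Z[l], A[l] are (n[l], m)."""
    m = X.shape[1]
    A = X
    for l in range(1, len(n)):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        assert Z.shape == (n[l], m), f"Z{l} has shape {Z.shape}"
        A = np.tanh(Z)                 # element-wise, so A has the same shape as Z
        assert A.shape == (n[l], m)
    return A

n = [3, 5, 5, 3, 1]                    # layer sizes of the 4-layer example
rng = np.random.default_rng(0)
params = {}
for l in range(1, len(n)):
    params[f"W{l}"] = rng.standard_normal((n[l], n[l - 1]))
    params[f"b{l}"] = np.zeros((n[l], 1))
check_param_shapes(params, n)
A_L = check_forward_shapes(rng.standard_normal((3, 7)), params, n)
print(A_L.shape)  # (1, 7)
```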

## Why Deep Representations?

## Forward and Backward Propagation

### blocks  

<img src="images/blocks.png" width="900">
  


### computation  

**Forward propagation** for layer $l$:  
  
$\text{input} \ a^{[l-1]} \textcolor{red}{\Rightarrow} z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]} \textcolor{red}{\Rightarrow} a^{[l]}=g^{[l]}(z^{[l]})$.  
OUTPUT: $\textcolor{red}{a^{[l]}, \ \text{cache}(z^{[l]})}$.   
The "cache" is used in our implementation to store values computed during forward propagation to be used in backward propagation.  
In vectorized form:  
$\text{input} \ A^{[l-1]} \textcolor{red}{\Rightarrow} Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]} \textcolor{red}{\Rightarrow} A^{[l]}=g^{[l]}(Z^{[l]})$  
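One forward block can be sketched in NumPy as follows. The function name `layer_forward` and the cache layout `(A_prev, W, b, Z)` are hypothetical choices for this sketch: the notes only cache $z^{[l]}$, but also keeping $A^{[l-1]}$, $W^{[l]}$ and $b^{[l]}$ means the backward block does not have to look them up again.

```python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """One forward block: A[l-1] -> Z[l] = W[l] A[l-1] + b[l] -> A[l] = g(Z[l]).

    Returns A[l] and a cache for the matching backward block.
    """
    Z = W @ A_prev + b
    A = g(Z)
    cache = (A_prev, W, b, Z)   # values backprop needs for dZ, dW, db, dA_prev
    return A, cache

# Usage: 2 input features, 2 examples (columns), 1 ReLU unit.
relu = lambda z: np.maximum(0, z)
A_prev = np.array([[1.0, -1.0],
                   [2.0,  0.5]])
W = np.array([[1.0, 1.0]])
b = np.array([[-1.0]])
A1, cache1 = layer_forward(A_prev, W, b, relu)
print(A1)  # [[2. 0.]]  since Z = [[2, -1.5]]
```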
  
**Backward propagation** for layer $l$:  
$\text{input} \ da^{[l]} \textcolor{red}{\Rightarrow} dz^{[l]}=da^{[l]}*g^{\prime [l]}(z^{[l]}) \textcolor{red} {\Rightarrow} dw^{[l]}=dz^{[l]} \cdot a^{[l-1]T} \textcolor{red}{\Rightarrow} db^{[l]}=dz^{[l]} \textcolor{red}{\Rightarrow} da^{[l-1]}=w^{[l]T} \cdot dz^{[l]}$,  
where $*$ denotes element-wise multiplication.  
OUTPUT: $\textcolor{red}{da^{[l-1]}, \ dw^{[l]}, \ db^{[l]}}$.
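A vectorized version of this backward block, sketched in NumPy. It assumes the cost is the mean of the $m$ per-example losses and that `dA` holds the per-example derivatives, which gives the usual $dZ^{[l]}=dA^{[l]}*g^{\prime[l]}(Z^{[l]})$, $dW^{[l]}=\frac{1}{m}dZ^{[l]}A^{[l-1]T}$, $db^{[l]}=\frac{1}{m}\sum_{i=1}^{m}dZ^{[l](i)}$ and $dA^{[l-1]}=W^{[l]T}dZ^{[l]}$. The name `layer_backward` and the `(A_prev, W, b, Z)` cache layout are hypothetical. The usage part checks $dW^{[l]}$ against a central-difference numerical gradient on a toy cost.

```python
import numpy as np

def layer_backward(dA, cache, g_prime):
    """One backward block for layer l, vectorized over m examples.

    dA: per-example loss derivatives w.r.t. A[l], shape (n[l], m).
    cache: (A_prev, W, b, Z) saved by the forward block.
    g_prime: derivative of the activation g[l], applied element-wise.
    Assumes the cost averages the m per-example losses, hence the 1/m.
    """
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                          # element-wise product
    dW = (dZ @ A_prev.T) / m                      # same shape as W
    db = np.sum(dZ, axis=1, keepdims=True) / m    # same shape as b
    dA_prev = W.T @ dZ                            # shape (n[l-1], m)
    return dA_prev, dW, db

# Usage: gradient check on a toy cost J = (1/m) * sum of all entries of tanh(Z).
rng = np.random.default_rng(1)
A_prev = rng.standard_normal((3, 4))
W = rng.standard_normal((2, 3))
b = rng.standard_normal((2, 1))
Z = W @ A_prev + b
tanh_prime = lambda z: 1 - np.tanh(z) ** 2
# For this toy cost the per-example dA is all ones.
dA_prev, dW, db = layer_backward(np.ones((2, 4)), (A_prev, W, b, Z), tanh_prime)

def cost(W_):
    return np.tanh(W_ @ A_prev + b).sum() / A_prev.shape[1]

eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        E = np.zeros_like(W)
        E[i, j] = eps
        dW_num[i, j] = (cost(W + E) - cost(W - E)) / (2 * eps)
print(np.allclose(dW, dW_num, atol=1e-6))  # True
```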

## Feedforward Neural Networks in Depth

Here are some detailed math calculations. 
I referred to [these articles posted by jonaslali](https://community.deeplearning.ai/t/feedforward-neural-networks-in-depth/98811), but rewrote them in a format I am comfortable with.

## Parameters Vs. Hyperparameters

### what is a hyperparameter?  

Hyperparameters are parameters whose values control the learning process and determine the values of the model parameters that a learning algorithm ends up learning. The prefix ‘hyper-’ suggests that they are ‘top-level’ parameters that control the learning process and the model parameters that result from it [@https://towardsdatascience.com/parameters-and-hyperparameters-aa609601a9ac].  
The learning rate $\alpha$, the number of iterations, the number of hidden layers, the number of hidden units, and the choice of activation functions are all hyperparameters, whereas $W^{[l]}$ and $b^{[l]}$ are the parameters learned from the data.
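A small sketch of the distinction. The `hyperparams` dict, its example values, and the helpers `init_params` and `gradient_descent_step` are hypothetical: the hyperparameters are chosen before training and fix how many parameters exist and how they are updated, while the values of $W^{[l]}$ and $b^{[l]}$ are what training produces.

```python
import numpy as np

# Hyperparameters: chosen before training (values are arbitrary examples).
hyperparams = {
    "layer_dims": [3, 5, 5, 3, 1],   # n[0..L]: sets L and the hidden units per layer
    "learning_rate": 0.01,           # alpha
    "num_iterations": 1000,          # would bound the training loop (not run here)
}

def init_params(layer_dims, seed=0):
    """Parameters W[l], b[l]: their shapes follow from the hyperparameters,
    their values are what gradient descent learns."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def gradient_descent_step(params, grads, learning_rate):
    """One update W := W - alpha * dW, b := b - alpha * db for every layer."""
    return {k: params[k] - learning_rate * grads["d" + k] for k in params}

params = init_params(hyperparams["layer_dims"])
print(sum(p.size for p in params.values()))  # 72 learnable parameters
grads = {"d" + k: np.ones_like(v) for k, v in params.items()}  # dummy gradients
updated = gradient_descent_step(params, grads, hyperparams["learning_rate"])
```

Changing `layer_dims` changes the number of parameters; changing `learning_rate` changes only how they move during training.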