## Deep L-layer neural network

<img src="screenshot/11.PNG" style="width:600px;height:350px;">

- Deep neural network notation
    - $L=4$ (\# of layers; the input layer is not counted)
    - $n^{[l]}$ (\# of units in layer $l$): $n^{[0]}=3$, $n^{[1]}=5$, $n^{[2]}=5$, $n^{[3]}=3$, $n^{[4]}=1$
    - $a^{[l]}$ (activations in layer $l$): $a^{[l]}=g^{[l]}(z^{[l]})$
    - $w^{[l]}$, $b^{[l]}$: weights and bias used to compute $z^{[l]}$
<img src="screenshot/12.PNG" style="width:600px;height:350px;">
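
As a quick check of the notation, here is a minimal Python sketch that stores $n^{[0]},\dots,n^{[L]}$ for the example network in a list, recovers $L$, and counts the entries of all $w^{[l]}$ and $b^{[l]}$ using the shapes $(n^{[l]},n^{[l-1]})$ and $(n^{[l]},1)$ from the dimensions section below. The helper `count_params` is hypothetical, written only for illustration.

```python
# Layer sizes of the example network: n[0] (input features) through n[L] (output units).
layer_dims = [3, 5, 5, 3, 1]

# The input layer (layer 0) is not counted, so L = 4.
L = len(layer_dims) - 1

def count_params(layer_dims):
    """Count the entries of w[l] (n[l] x n[l-1]) and b[l] (n[l] x 1) for l = 1..L."""
    total = 0
    for l in range(1, len(layer_dims)):
        total += layer_dims[l] * layer_dims[l - 1] + layer_dims[l]
    return total

print("L =", L)
print("parameters:", count_params(layer_dims))
```

For the example network this gives $(5\cdot3+5)+(5\cdot5+5)+(3\cdot5+3)+(1\cdot3+1)=20+30+18+4=72$ parameters.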

## Forward propagation in a deep network 

$$
\begin{aligned}
&a^{[0]}=x\\
&\text{for } l=1,\dots,L:\\
&\quad z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]}\\
&\quad a^{[l]}=g^{[l]}(z^{[l]})
\end{aligned}
$$
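
The loop above can be sketched in NumPy, vectorized over $m$ examples stacked as columns ($A^{[0]}=X$). The activation choices (ReLU in the hidden layers, sigmoid in the output layer), the `W1`/`b1` dictionary keys, and the small random initialization are assumptions for illustration, not something these notes prescribe.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, params, L):
    """Run z[l] = w[l] a[l-1] + b[l], a[l] = g[l](z[l]) for l = 1..L.

    Assumes ReLU for hidden layers and sigmoid for the output layer.
    X has shape (n[0], m); each column is one example, so a[0] = X.
    """
    A = X
    for l in range(1, L + 1):
        Z = params["W" + str(l)] @ A + params["b" + str(l)]
        A = sigmoid(Z) if l == L else relu(Z)
    return A

# Example network from above: n = [3, 5, 5, 3, 1], with small random weights.
rng = np.random.default_rng(0)
layer_dims = [3, 5, 5, 3, 1]
L = len(layer_dims) - 1
params = {}
for l in range(1, L + 1):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.1
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((3, 4))   # m = 4 examples
AL = forward(X, params, L)
print(AL.shape)
```

The explicit `for` loop over layers is fine here: the examples are vectorized, but there is no way to vectorize across layers, since each layer needs the previous layer's output.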

## Getting your matrix dimensions right

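- Dimension of $w^{[l]}$ is $(n^{[l]},n^{[l-1]})$: read $z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]}$ from right to left, so $w^{[l]}$ takes the $n^{[l-1]}$ values of $a^{[l-1]}$ (columns) and produces the $n^{[l]}$ values of $z^{[l]}$ (rows)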
- Dimension of $b^{[l]}$ is $(n^{[l]},1)$
- $dw^{[l]}$ has the same shape as $w^{[l]}$, while $db^{[l]}$ has the same shape as $b^{[l]}$
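- For a single example, $z^{[l]}$ and $a^{[l]}$ have dimension $(n^{[l]},1)$; vectorized over $m$ examples, $Z^{[l]}$, $A^{[l]}$, $dZ^{[l]}$, $dA^{[l]}$ have dimension $(n^{[l]},m)$ ($b^{[l]}$ stays $(n^{[l]},1)$ and is broadcast across the $m$ columns)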
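
One way to catch dimension bugs is to assert these shapes while running a forward and a backward pass. A minimal sketch, assuming ReLU activations and a random upstream gradient $dA^{[L]}$ (neither affects any shape); the backward formulas are the ones from the building-blocks section, and `check_dims` is a hypothetical helper.

```python
import numpy as np

def check_dims(layer_dims, m, seed=0):
    """Assert the shape rules for w[l], b[l], Z[l], A[l] and their gradients."""
    rng = np.random.default_rng(seed)
    L = len(layer_dims) - 1
    A = {0: rng.standard_normal((layer_dims[0], m))}
    Z, W, b = {}, {}, {}
    for l in range(1, L + 1):
        W[l] = rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
        b[l] = np.zeros((layer_dims[l], 1))       # broadcast across the m columns
        Z[l] = W[l] @ A[l - 1] + b[l]
        A[l] = np.maximum(0, Z[l])                # ReLU (assumed)
        assert W[l].shape == (layer_dims[l], layer_dims[l - 1])
        assert b[l].shape == (layer_dims[l], 1)
        assert Z[l].shape == A[l].shape == (layer_dims[l], m)
    dA = rng.standard_normal(A[L].shape)          # stand-in for dA[L]
    for l in range(L, 0, -1):
        dZ = dA * (Z[l] > 0)                      # ReLU derivative, element-wise
        dW = dZ @ A[l - 1].T / m
        db = np.sum(dZ, axis=1, keepdims=True) / m
        assert dZ.shape == Z[l].shape and dA.shape == A[l].shape
        assert dW.shape == W[l].shape and db.shape == b[l].shape
        dA = W[l].T @ dZ                          # becomes dA[l-1]
    return True

print(check_dims([3, 5, 5, 3, 1], m=10))
```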

## Why deep representations?

- A deep NN builds its representation of the data from simple to complex: each layer combines the features computed by the previous layer into more complex ones.
<img src="screenshot/13.PNG" style="width:600px;height:350px;">

    - Face recognition application: image$\rightarrow$edges$\rightarrow$face parts$\rightarrow$faces$\rightarrow$desired face
    - Audio recognition application: audio$\rightarrow$low level sound features like "sss" or "bb"$\rightarrow$phonemes$\rightarrow$words$\rightarrow$sentences
    
**Some neural network researchers think that deep neural networks "think" like brains, detecting simple features first and then combining them into more complex ones.**

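- Circuit theory and deep learning: there are functions you can compute with a small L-layer deep neural network that shallower networks require exponentially more hidden units to compute (e.g., the XOR, or parity, of $n$ inputs).
<img src="screenshot/14.PNG" style="width:600px;height:350px;">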
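
The parity example can be made concrete. This sketch assumes 2-input XOR gates as the deep network's units: a balanced tree computes $x_1 \oplus \dots \oplus x_n$ with $n-1$ gates and depth $\lceil \log_2 n \rceil$. A single hidden layer built the straightforward way, with one unit per odd-parity input pattern, needs $2^{n-1}$ units. The function names are hypothetical, and the shallow count applies to that particular construction.

```python
from itertools import product

def parity_tree(bits):
    """XOR all bits with a balanced tree of 2-input XOR gates.

    Returns (parity, number_of_gates, depth).
    """
    level = list(bits)
    gates = depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])
            gates += 1
        if len(level) % 2 == 1:          # an odd leftover passes through to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], gates, depth

def shallow_units(n):
    """One hidden layer with one unit per odd-parity input pattern: 2**(n-1) units."""
    return sum(1 for x in product([0, 1], repeat=n) if sum(x) % 2 == 1)

for n in (4, 8, 16):
    _, gates, depth = parity_tree([0] * n)
    print(f"n={n}: deep tree {gates} gates, depth {depth}; one hidden layer {shallow_units(n)} units")
```

For $n=16$ the tree uses 15 gates in 4 levels, versus 32768 units in the one-hidden-layer construction.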

**When starting on an application, don't start directly with dozens of hidden layers. Try the simplest solution first (e.g., logistic regression), then a shallow neural network, and so on.**

## Building blocks of deep neural networks

- Forward and backward functions
    - forward propagation
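$$
\begin{aligned}
&\text{Input } a^{[l-1]}\\
&\quad z^{[l]}=w^{[l]}a^{[l-1]}+b^{[l]}\\
&\quad a^{[l]}=g^{[l]}(z^{[l]})\\
&\text{Output } a^{[l]}\\
&\text{Cache } z^{[l]}\ \text{(in practice also } w^{[l]},\ b^{[l]},\ a^{[l-1]} \text{, which backprop needs)}
\end{aligned}
$$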

    - backward propagation
$$
\begin{aligned}
&\text{Input } da^{[l]},\ \text{cache}\\
&\quad dz^{[l]}=da^{[l]}*g^{[l]\prime}(z^{[l]})\quad(*\ \text{is element-wise})\\
&\quad dw^{[l]}=\text{np.dot}(dz^{[l]},\ a^{[l-1]T})/m\\
&\quad db^{[l]}=\text{np.sum}(dz^{[l]},\ \text{axis=1},\ \text{keepdims=True})/m\\
&\quad da^{[l-1]}=\text{np.dot}(w^{[l]T},\ dz^{[l]})\\
&\text{Output } da^{[l-1]},\ dw^{[l]},\ db^{[l]}
\end{aligned}
$$
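
Here is a minimal NumPy sketch of one layer's forward and backward functions, followed by a numerical gradient check. It assumes $g^{[l]}=\tanh$, which is smooth, so finite differences behave well. It also assumes a toy cost $J=\frac{1}{m}\sum G * a^{[l]}$ for a random matrix $G$, so that the per-example upstream gradient $da^{[l]}$ is exactly $G$. The function names and the cache layout are hypothetical.

```python
import numpy as np

def layer_forward(A_prev, W, b):
    """Forward step for layer l with g = tanh (assumed): returns a[l] and the cache."""
    Z = W @ A_prev + b
    return np.tanh(Z), (A_prev, W, b, Z)

def layer_backward(dA, cache):
    """Backward step for layer l: takes da[l] and the cache, returns da[l-1], dw[l], db[l]."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = dA * (1 - np.tanh(Z) ** 2)              # dz = da * g'(z); tanh'(z) = 1 - tanh(z)^2
    dW = dZ @ A_prev.T / m                       # same shape as W: (n[l], n[l-1])
    db = np.sum(dZ, axis=1, keepdims=True) / m   # same shape as b: (n[l], 1)
    dA_prev = W.T @ dZ                           # same shape as A_prev: (n[l-1], m)
    return dA_prev, dW, db

# Toy cost J = (1/m) * sum(G * a[l]); its per-example da[l] is G.
rng = np.random.default_rng(1)
A_prev = rng.standard_normal((4, 6))   # n[l-1] = 4, m = 6
W = rng.standard_normal((3, 4))        # n[l] = 3
b = rng.standard_normal((3, 1))
G = rng.standard_normal((3, 6))
m = A_prev.shape[1]

def cost():
    A, _ = layer_forward(A_prev, W, b)
    return np.sum(G * A) / m

def numerical_grad(f, P, eps=1e-6):
    """Central differences of f() with respect to every entry of array P (modified in place, then restored)."""
    grad = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

_, cache = layer_forward(A_prev, W, b)
dA_prev, dW, db = layer_backward(G, cache)
num_dW = numerical_grad(cost, W)
num_db = numerical_grad(cost, b)
num_dA_prev = numerical_grad(cost, A_prev)

# da[l-1] is per-example (no 1/m), so it equals m times dJ/da[l-1].
print(np.max(np.abs(dW - num_dW)), np.max(np.abs(db - num_db)),
      np.max(np.abs(dA_prev / m - num_dA_prev)))
```

All three differences should be tiny. The check also shows why only $dw^{[l]}$ and $db^{[l]}$ carry the $1/m$: the averaging over examples happens once, where the parameters' gradients are summed over the $m$ columns.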

<img src="screenshot/15.PNG" style="width:600px;height:350px;">
<img src="screenshot/16.PNG" style="width:600px;height:350px;">

## Parameters vs hyperparameters

- The parameters of the NN are `w` and `b`; they are learned during training
- Hyperparameters (parameters that control the algorithm, and therefore the final values of `w` and `b`) include: learning rate, number of iterations, number of hidden layers `L`, number of hidden units $n^{[l]}$, and the choice of activation functions
- Good hyperparameter values usually have to be found empirically: try several values and see which works best
- In the earlier days of ML, the learning rate was often called a parameter, but it is really a hyperparameter
- In the next course, we will see how to tune hyperparameters
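
Because hyperparameters are found by trying values, here is a minimal sketch of that loop. It trains logistic regression (the simplest model suggested above) by gradient descent on synthetic data for a few candidate learning rates, then keeps the one with the lowest final cost. The data, the candidate grid, and `train_logreg` are illustrative assumptions, and for simplicity the sketch compares training cost only.

```python
import numpy as np

def train_logreg(X, y, learning_rate, num_iterations):
    """Batch gradient descent for logistic regression.

    Parameters (learned): w, b. Hyperparameters (chosen by us): learning_rate, num_iterations.
    """
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    for _ in range(num_iterations):
        A = 1 / (1 + np.exp(-(w.T @ X + b)))   # predictions, shape (1, m)
        dZ = A - y
        w -= learning_rate * (X @ dZ.T) / m
        b -= learning_rate * np.sum(dZ) / m
    A = np.clip(1 / (1 + np.exp(-(w.T @ X + b))), 1e-12, 1 - 1e-12)  # avoid log(0)
    cost = -np.mean(y * np.log(A) + (1 - y) * np.log(1 - A))
    return w, b, cost

# Synthetic data: the label is 1 when x1 + x2 > 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
y = (X[0:1] + X[1:2] > 0).astype(float)       # shape (1, 200)

costs = {}
for lr in [0.001, 0.01, 0.1, 1.0]:
    _, _, costs[lr] = train_logreg(X, y, learning_rate=lr, num_iterations=200)
    print(f"learning_rate={lr}: cost={costs[lr]:.4f}")

best_lr = min(costs, key=costs.get)
print("best learning_rate:", best_lr)
```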

## What does this have to do with the brain?

<img src="screenshot/17.PNG" style="width:600px;height:350px;">