# Deep Neural Networks
## 1. Notation
The following notation will be used throughout the notebook:
* $L$: the number of non-input layers in the network. The input layer is denoted as layer $0$
* $n^{[i]}$: the number of units in the $i$-th layer
* $z^{[i]}$: the weighted sum vector in the $i$-th layer
* $a^{[i]}$: the activations vector in the $i$-th layer
* $w^{[i]}$: the weight matrix used to compute the $i$-th layer, of shape $n^{[i]} \times n^{[i-1]}$
* $b^{[i]}$: the bias vector added in the $i$-th layer
* $g^{[i]}$: the activation function used in the $i$-th layer
* $a^{[0](i)} = x^{(i)}$: the activations of layer $0$ for the $i$-th training sample are that sample's features
* $n^{[0]} = n$: the number of features.
 

## 2. FeedForward Propagation
### 2.1 For a single training sample
Given a training sample $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n\end{bmatrix}$, the associated output $\hat{y}$ can be computed as follows:
1. $a^{[0]}$ = $x$
2. for $l = 1, 2, \dots, L$:
$$\begin{aligned}
z^{[l]} &= w^{[l]} \cdot a^{[l - 1]} + b^{[l]} \\
a^{[l]} &= g^{[l]} (z^{[l]})
\end{aligned}$$
3. $\hat{y} = a^{[L]}$
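
The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's implementation: the function name `forward_single`, the layer sizes, and the choice of ReLU and sigmoid for $g^{[l]}$ are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward_single(x, weights, biases, activations):
    """Forward pass for one sample x of shape (n, 1) (hypothetical helper)."""
    a = x  # step 1: a^{[0]} = x
    for w, b, g in zip(weights, biases, activations):
        z = w @ a + b  # z^{[l]} = w^{[l]} . a^{[l-1]} + b^{[l]}
        a = g(z)       # a^{[l]} = g^{[l]}(z^{[l]})
    return a           # step 3: y_hat = a^{[L]}

# Assumed toy network: n = 3 features, 4 hidden units, 1 output unit (L = 2).
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 1]
weights = [
    rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
    for l in range(1, len(layer_sizes))
]
biases = [np.zeros((layer_sizes[l], 1)) for l in range(1, len(layer_sizes))]
activations = [relu, sigmoid]  # assumed choices for g^{[1]} and g^{[2]}

x = np.array([[0.5], [-1.2], [2.0]])
y_hat = forward_single(x, weights, biases, activations)
print(y_hat.shape)
```

Each $w^{[l]}$ here has shape $n^{[l]} \times n^{[l-1]}$, so the product with the column vector $a^{[l-1]}$ yields a column vector of length $n^{[l]}$.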

### 2.2 For the entire training dataset
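This section extends (vectorizes) the algorithm so that it processes the whole dataset $X$ at once, where the $m$ columns of $X$ are the training samples $x^{(1)}, \dots, x^{(m)}$. This requires some additional notation: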
* $Z^{[i]}$ = $\begin{bmatrix} 
z^{[i](1)} & z^{[i](2)} & \dots & z^{[i](m)}
\end{bmatrix}$
* $A^{[i]}$ = $\begin{bmatrix} 
a^{[i](1)} & a^{[i](2)} & \dots & a^{[i](m)}
\end{bmatrix}$
* $W^{[i]} = w^{[i]}$: the weights do not depend on the training sample, so the same $n^{[i]} \times n^{[i-1]}$ matrix is applied to every column of $A^{[i-1]}$
* $B^{[i]}$ = $\begin{bmatrix} 
b^{[i]} & b^{[i]} & \dots & b^{[i]}
\end{bmatrix}$: the bias vector $b^{[i]}$ repeated across all $m$ columns.

The vectorized algorithm is then:
1. $A^{[0]}$ = $X$
2. for $l = 1, 2, \dots, L$:
$$\begin{aligned}
Z^{[l]} &= W^{[l]} \cdot A^{[l - 1]} + B^{[l]} \\
A^{[l]} &= g^{[l]} (Z^{[l]})
\end{aligned}$$
3. $\hat{Y} = A^{[L]}$
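
A possible NumPy sketch of the vectorized loop is shown below. It assumes that the samples are stored as the columns of `X` and that each bias is kept as a column vector of shape $(n^{[l]}, 1)$, which NumPy broadcasting adds to every column. The name `forward_batch`, the layer sizes, and the activation choices are assumptions for illustration only.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def relu(Z):
    return np.maximum(0.0, Z)

def forward_batch(X, weights, biases, activations):
    """Forward pass for a dataset X of shape (n, m), one sample per column."""
    A = X  # step 1: A^{[0]} = X
    for W, b, g in zip(weights, biases, activations):
        # b has shape (n_l, 1); broadcasting adds it to each of the m columns.
        Z = W @ A + b
        A = g(Z)
    return A  # step 3: Y_hat = A^{[L]}

# Assumed toy setup: n = 3 features, m = 5 samples, layers of 4 and 1 units.
rng = np.random.default_rng(1)
sizes = [3, 4, 1]
weights = [
    rng.standard_normal((sizes[l], sizes[l - 1])) * 0.01
    for l in range(1, len(sizes))
]
biases = [np.zeros((sizes[l], 1)) for l in range(1, len(sizes))]
activations = [relu, sigmoid]  # assumed choices for g^{[1]} and g^{[2]}

X = rng.standard_normal((3, 5))
Y_hat = forward_batch(X, weights, biases, activations)
print(Y_hat.shape)
```

Passing a single column, for example `X[:, [0]]`, through the same function should give the first column of `Y_hat`, which is a quick way to check that the vectorized version agrees with the per-sample one.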

## 3. Backward Propagation
Because the underlying derivation is lengthy, only the final algorithm is provided, as part of the code below.

## 4. Implementation