# Deep Neural Network

Let's say that we have the following inputs $X$ (three input examples, each with four features, so every column of $X$ is one example) and outputs $Y$:

\begin{equation}
X = 
\begin{bmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\
x_{31} & x_{32} & x_{33} \\
x_{41} & x_{42} & x_{43} 
\end{bmatrix} \hspace{1cm} \text{and} \hspace{1cm} Y = 
\begin{bmatrix}
y_{11} & y_{12} & y_{13}\\ 
\end{bmatrix}
\end{equation}
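
To make the layout concrete, here is a minimal NumPy sketch of the same shapes. The numeric values are hypothetical placeholders (the text keeps $X$ and $Y$ symbolic); only the shapes matter: features along rows, examples along columns.

```python
import numpy as np

# Hypothetical values; the text keeps X and Y symbolic.
X = np.arange(1.0, 13.0).reshape(4, 3)   # 4 features (rows) x 3 examples (columns)
Y = np.array([[1.0, 0.0, 1.0]])          # 1 output (row) x 3 examples (columns)

n0, m = X.shape            # n0 = number of features, m = number of examples
first_example = X[:, 0]    # one example is one column of X
print(X.shape, Y.shape)
```

Storing one example per column is what lets a weight matrix of shape $n^{[1]}\times n^{[0]}$ multiply $X$ from the left.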


Let's design a deep neural network with the following specifications:

| Quantity                 | Symbol    | Value |
|--------------------------|-----------|-------|
| No. of layers            | $L$       | 3 |
| No. of input features    | $n^{[0]}$ | 4 |
| No. of units in layer 1  | $n^{[1]}$ | 3 |
| No. of units in layer 2  | $n^{[2]}$ | 5 |
| No. of units in layer 3  | $n^{[3]}$ | 1 |

The architecture will look like the one shown below:

![Screen%20Shot%202020-12-16%20at%206.28.12%20PM.png](attachment:Screen%20Shot%202020-12-16%20at%206.28.12%20PM.png)

## Step 1: Initialize Weights

Let's initialize the weights $W^{[l]}\in \mathbb{R}^{n^{[l]}\times n^{[l-1]}}$ and the biases $b^{[l]}\in \mathbb{R}^{n^{[l]}\times 1}$, where $l$ denotes the current layer. The dimensions are therefore:

| Weight Matrix | Symbolic Dimension        | Numeric Dimension |
|---------------|---------------------------|-------------------|
| $W^{[1]}$     | $n^{[1]}\times n^{[0]}$   | $3\times4$ |
| $W^{[2]}$     | $n^{[2]}\times n^{[1]}$   | $5\times3$ |
| $W^{[3]}$     | $n^{[3]}\times n^{[2]}$   | $1\times5$ |
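
The dimension rule can be sketched in NumPy as follows. The helper name `init_params`, the dictionary keys, the random scaling factor, and the zero biases are all assumptions made for illustration; the text only fixes the shapes.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Return {'W1': ..., 'b1': ..., ...} for layer_sizes = [n0, n1, ..., nL]."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_sizes)):
        # W[l] has shape (n[l], n[l-1]); small random values are one common choice.
        params[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
        # b[l] has shape (n[l], 1); zeros are one common starting point.
        params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return params

# n0 = 4 input features, then 3, 5 and 1 units in layers 1 to 3.
params = init_params([4, 3, 5, 1])
for name, value in params.items():
    print(name, value.shape)
```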

### Weights and Bias of Layer 1
\begin{equation}
W^{[1]} = 
\begin{bmatrix}
w^{[1]}_{11} & w^{[1]}_{21} & w^{[1]}_{31} & w^{[1]}_{41}\\
w^{[1]}_{12} & w^{[1]}_{22} & w^{[1]}_{32} & w^{[1]}_{42}\\
w^{[1]}_{13} & w^{[1]}_{23} & w^{[1]}_{33} & w^{[1]}_{43} 
\end{bmatrix} \hspace{1cm} \text{and} \hspace{1cm} b^{[1]} = 
\begin{bmatrix}
b^{[1]}_{1} \\ b^{[1]}_{2} \\ b^{[1]}_{3}
\end{bmatrix}
\end{equation}

Here, in $w^{[l]}_{ij}$, the superscript $[l]$ indicates layer $l$, and the subscript $ij$ indicates the weight between node $i$ of the input layer $(l-1)$ and node $j$ of the output layer $(l)$. Row $j$ of $W^{[l]}$ therefore holds all the weights feeding into node $j$ of layer $l$.

### Weights and Bias of Layer 2
\begin{equation}
W^{[2]} = 
\begin{bmatrix}
w^{[2]}_{11} & w^{[2]}_{21} & w^{[2]}_{31} \\
w^{[2]}_{12} & w^{[2]}_{22} & w^{[2]}_{32} \\
w^{[2]}_{13} & w^{[2]}_{23} & w^{[2]}_{33} \\
w^{[2]}_{14} & w^{[2]}_{24} & w^{[2]}_{34} \\
w^{[2]}_{15} & w^{[2]}_{25} & w^{[2]}_{35} \\
\end{bmatrix} \hspace{1cm} \text{and} \hspace{1cm} b^{[2]} = 
\begin{bmatrix}
b^{[2]}_{1} \\ b^{[2]}_{2} \\ b^{[2]}_{3} \\ b^{[2]}_{4} \\ b^{[2]}_{5}
\end{bmatrix}
\end{equation}

### Weights and Bias of Layer 3
\begin{equation}
W^{[3]} = 
\begin{bmatrix}
w^{[3]}_{11} & w^{[3]}_{21} & w^{[3]}_{31} & w^{[3]}_{41} & w^{[3]}_{51} 
\end{bmatrix} \hspace{1cm} \text{and} \hspace{1cm} b^{[3]} = 
\begin{bmatrix}
b^{[3]}_{1} 
\end{bmatrix}
\end{equation}

## Step 2: Forward Propagation

### For layer 1; that is, $A^{[1]}$
\begin{equation}
A^{[1]} = W^{[1]}A^{[0]}+b^{[1]}, \qquad \text{where } A^{[0]} = X
\end{equation}

\begin{equation}
A^{[1]} = 
\begin{bmatrix}
w^{[1]}_{11} & w^{[1]}_{21} & w^{[1]}_{31} & w^{[1]}_{41}\\
w^{[1]}_{12} & w^{[1]}_{22} & w^{[1]}_{32} & w^{[1]}_{42}\\
w^{[1]}_{13} & w^{[1]}_{23} & w^{[1]}_{33} & w^{[1]}_{43} 
\end{bmatrix} 
\begin{bmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\
x_{31} & x_{32} & x_{33} \\
x_{41} & x_{42} & x_{43} 
\end{bmatrix}
+ 
\begin{bmatrix}
b^{[1]}_{1} \\ b^{[1]}_{2} \\ b^{[1]}_{3}
\end{bmatrix}
\end{equation}

\begin{equation}
A^{[1]} = 
\begin{bmatrix}
a^{[1]}_{11} & a^{[1]}_{12} & a^{[1]}_{13}  \\
a^{[1]}_{21} & a^{[1]}_{22} & a^{[1]}_{23} \\
a^{[1]}_{31} & a^{[1]}_{32} & a^{[1]}_{33}
\end{bmatrix}
\end{equation}

where
\begin{equation}
\begin{aligned}
a^{[1]}_{11}&= w^{[1]}_{11}x_{11}+w^{[1]}_{21}x_{21}+w^{[1]}_{31}x_{31}+w^{[1]}_{41}x_{41}+b^{[1]}_{1},\hspace{0.1cm}  
a^{[1]}_{12} = w^{[1]}_{11}x_{12}+w^{[1]}_{21}x_{22}+w^{[1]}_{31}x_{32}+w^{[1]}_{41}x_{42}+b^{[1]}_{1} ,\hspace{0.1cm}
a^{[1]}_{13}= w^{[1]}_{11}x_{13}+w^{[1]}_{21}x_{23}+w^{[1]}_{31}x_{33}+w^{[1]}_{41}x_{43}+b^{[1]}_{1}\\
%
a^{[1]}_{21}&= w^{[1]}_{12}x_{11}+w^{[1]}_{22}x_{21}+w^{[1]}_{32}x_{31}+w^{[1]}_{42}x_{41}+b^{[1]}_{2},\hspace{0.1cm}  
a^{[1]}_{22} = w^{[1]}_{12}x_{12}+w^{[1]}_{22}x_{22}+w^{[1]}_{32}x_{32}+w^{[1]}_{42}x_{42}+b^{[1]}_{2} ,\hspace{0.1cm}
a^{[1]}_{23}= w^{[1]}_{12}x_{13}+w^{[1]}_{22}x_{23}+w^{[1]}_{32}x_{33}+w^{[1]}_{42}x_{43}+b^{[1]}_{2}\\
%
a^{[1]}_{31}&= w^{[1]}_{13}x_{11}+w^{[1]}_{23}x_{21}+w^{[1]}_{33}x_{31}+w^{[1]}_{43}x_{41}+b^{[1]}_{3},\hspace{0.1cm}  
a^{[1]}_{32} = w^{[1]}_{13}x_{12}+w^{[1]}_{23}x_{22}+w^{[1]}_{33}x_{32}+w^{[1]}_{43}x_{42}+b^{[1]}_{3} ,\hspace{0.1cm}
a^{[1]}_{33}= w^{[1]}_{13}x_{13}+w^{[1]}_{23}x_{23}+w^{[1]}_{33}x_{33}+w^{[1]}_{43}x_{43}+b^{[1]}_{3}
\end{aligned}
\end{equation}
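
The element-wise expansion for layer 1 can be checked numerically. This is a small sketch with random placeholder values (an assumption; the text is symbolic). It relies on NumPy broadcasting to add the $3\times 1$ bias column to each of the three example columns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))    # A[0]: 4 features x 3 examples (hypothetical values)
W1 = rng.standard_normal((3, 4))   # n[1] x n[0]
b1 = rng.standard_normal((3, 1))   # n[1] x 1

# Matrix form: b1 is broadcast across the 3 example columns.
A1 = W1 @ X + b1

# Entry a_{11}: first row of W1 times first column of X, plus b_1.
a11 = sum(W1[0, i] * X[i, 0] for i in range(4)) + b1[0, 0]
# Entry a_{23}: second row of W1 times third column of X, plus b_2.
a23 = sum(W1[1, i] * X[i, 2] for i in range(4)) + b1[1, 0]

print(np.isclose(A1[0, 0], a11), np.isclose(A1[1, 2], a23))
```

In $a^{[1]}_{jk}$ the first index selects the unit (row of $W^{[1]}$ and entry of $b^{[1]}$) and the second selects the example (column of $X$).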

### For layer 2; that is, $A^{[2]}$
\begin{equation}
A^{[2]} = W^{[2]}A^{[1]}+b^{[2]}
\end{equation}

\begin{equation}
A^{[2]} = 
\begin{bmatrix}
w^{[2]}_{11} & w^{[2]}_{21} & w^{[2]}_{31} \\
w^{[2]}_{12} & w^{[2]}_{22} & w^{[2]}_{32} \\
w^{[2]}_{13} & w^{[2]}_{23} & w^{[2]}_{33} \\
w^{[2]}_{14} & w^{[2]}_{24} & w^{[2]}_{34} \\
w^{[2]}_{15} & w^{[2]}_{25} & w^{[2]}_{35} \\
\end{bmatrix} 
\begin{bmatrix}
a^{[1]}_{11} & a^{[1]}_{12} & a^{[1]}_{13}  \\
a^{[1]}_{21} & a^{[1]}_{22} & a^{[1]}_{23} \\
a^{[1]}_{31} & a^{[1]}_{32} & a^{[1]}_{33} 
\end{bmatrix} 
+ 
\begin{bmatrix}
b^{[2]}_{1} \\ b^{[2]}_{2} \\ b^{[2]}_{3} \\ b^{[2]}_{4} \\ b^{[2]}_{5}
\end{bmatrix}
\end{equation}

\begin{equation}
A^{[2]} = 
\begin{bmatrix}
a^{[2]}_{11} & a^{[2]}_{12} & a^{[2]}_{13}  \\
a^{[2]}_{21} & a^{[2]}_{22} & a^{[2]}_{23} \\
a^{[2]}_{31} & a^{[2]}_{32} & a^{[2]}_{33} \\
a^{[2]}_{41} & a^{[2]}_{42} & a^{[2]}_{43} \\
a^{[2]}_{51} & a^{[2]}_{52} & a^{[2]}_{53} 
\end{bmatrix} 
\end{equation}

where
\begin{equation}
\begin{aligned}
a^{[2]}_{11}&= w^{[2]}_{11}a^{[1]}_{11}+w^{[2]}_{21}a^{[1]}_{21}+w^{[2]}_{31}a^{[1]}_{31}+b^{[2]}_{1},\hspace{0.1cm}  
a^{[2]}_{12} = w^{[2]}_{11}a^{[1]}_{12}+w^{[2]}_{21}a^{[1]}_{22}+w^{[2]}_{31}a^{[1]}_{32}+b^{[2]}_{1} ,\hspace{0.1cm}
a^{[2]}_{13}= w^{[2]}_{11}a^{[1]}_{13}+w^{[2]}_{21}a^{[1]}_{23}+w^{[2]}_{31}a^{[1]}_{33}+b^{[2]}_{1}\\
%
a^{[2]}_{21}&= w^{[2]}_{12}a^{[1]}_{11}+w^{[2]}_{22}a^{[1]}_{21}+w^{[2]}_{32}a^{[1]}_{31}+b^{[2]}_{2},\hspace{0.1cm}  
a^{[2]}_{22} = w^{[2]}_{12}a^{[1]}_{12}+w^{[2]}_{22}a^{[1]}_{22}+w^{[2]}_{32}a^{[1]}_{32}+b^{[2]}_{2} ,\hspace{0.1cm}
a^{[2]}_{23}= w^{[2]}_{12}a^{[1]}_{13}+w^{[2]}_{22}a^{[1]}_{23}+w^{[2]}_{32}a^{[1]}_{33}+b^{[2]}_{2}\\
%
a^{[2]}_{31}&= w^{[2]}_{13}a^{[1]}_{11}+w^{[2]}_{23}a^{[1]}_{21}+w^{[2]}_{33}a^{[1]}_{31}+b^{[2]}_{3},\hspace{0.1cm}  
a^{[2]}_{32} = w^{[2]}_{13}a^{[1]}_{12}+w^{[2]}_{23}a^{[1]}_{22}+w^{[2]}_{33}a^{[1]}_{32}+b^{[2]}_{3} ,\hspace{0.1cm}
a^{[2]}_{33}= w^{[2]}_{13}a^{[1]}_{13}+w^{[2]}_{23}a^{[1]}_{23}+w^{[2]}_{33}a^{[1]}_{33}+b^{[2]}_{3}\\
%
a^{[2]}_{41}&= w^{[2]}_{14}a^{[1]}_{11}+w^{[2]}_{24}a^{[1]}_{21}+w^{[2]}_{34}a^{[1]}_{31}+b^{[2]}_{4},\hspace{0.1cm}  
a^{[2]}_{42} = w^{[2]}_{14}a^{[1]}_{12}+w^{[2]}_{24}a^{[1]}_{22}+w^{[2]}_{34}a^{[1]}_{32}+b^{[2]}_{4} ,\hspace{0.1cm}
a^{[2]}_{43}= w^{[2]}_{14}a^{[1]}_{13}+w^{[2]}_{24}a^{[1]}_{23}+w^{[2]}_{34}a^{[1]}_{33}+b^{[2]}_{4}\\
%
a^{[2]}_{51}&= w^{[2]}_{15}a^{[1]}_{11}+w^{[2]}_{25}a^{[1]}_{21}+w^{[2]}_{35}a^{[1]}_{31}+b^{[2]}_{5},\hspace{0.1cm}  
a^{[2]}_{52} = w^{[2]}_{15}a^{[1]}_{12}+w^{[2]}_{25}a^{[1]}_{22}+w^{[2]}_{35}a^{[1]}_{32}+b^{[2]}_{5} ,\hspace{0.1cm}
a^{[2]}_{53}= w^{[2]}_{15}a^{[1]}_{13}+w^{[2]}_{25}a^{[1]}_{23}+w^{[2]}_{35}a^{[1]}_{33}+b^{[2]}_{5}
\end{aligned}
\end{equation}
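
As a sanity check on the layer 2 expansion, the sketch below rebuilds $A^{[2]}$ entry by entry and compares it with the matrix product. The input values are random placeholders (an assumption), and `A2_loop` is a name introduced only for this illustration. Row $j$ of the result uses row $j$ of $W^{[2]}$ and the $j$-th bias $b^{[2]}_j$.

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = rng.standard_normal((3, 3))   # hypothetical layer-1 output: 3 units x 3 examples
W2 = rng.standard_normal((5, 3))   # n[2] x n[1]
b2 = rng.standard_normal((5, 1))   # n[2] x 1

# Matrix form of the layer.
A2 = W2 @ A1 + b2

# Explicit form: one weighted sum per unit j and example k.
A2_loop = np.zeros((5, 3))
for j in range(5):          # unit in layer 2
    for k in range(3):      # example
        A2_loop[j, k] = sum(W2[j, i] * A1[i, k] for i in range(3)) + b2[j, 0]

print(np.allclose(A2, A2_loop))
```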

### For layer 3; that is, $A^{[3]}$
\begin{equation}
A^{[3]} = W^{[3]}A^{[2]}+b^{[3]}
\end{equation}

\begin{equation}
A^{[3]} = 
\begin{bmatrix}
w^{[3]}_{11} & w^{[3]}_{21} & w^{[3]}_{31} & w^{[3]}_{41} & w^{[3]}_{51} 
\end{bmatrix}
\begin{bmatrix}
a^{[2]}_{11} & a^{[2]}_{12} & a^{[2]}_{13}  \\
a^{[2]}_{21} & a^{[2]}_{22} & a^{[2]}_{23} \\
a^{[2]}_{31} & a^{[2]}_{32} & a^{[2]}_{33} \\
a^{[2]}_{41} & a^{[2]}_{42} & a^{[2]}_{43} \\
a^{[2]}_{51} & a^{[2]}_{52} & a^{[2]}_{53} 
\end{bmatrix} 
+ 
\begin{bmatrix}
b^{[3]}_{1} 
\end{bmatrix}
\end{equation}

\begin{equation}
A^{[3]} =  
\begin{bmatrix}
a^{[3]}_{11}& a^{[3]}_{12}& a^{[3]}_{13}  
\end{bmatrix}
\end{equation}

where
\begin{equation}
\begin{aligned}
a^{[3]}_{11}&= w^{[3]}_{11}a^{[2]}_{11}+w^{[3]}_{21}a^{[2]}_{21}+w^{[3]}_{31}a^{[2]}_{31}+w^{[3]}_{41}a^{[2]}_{41}+w^{[3]}_{51}a^{[2]}_{51}+b^{[3]}_{1} ,\\
a^{[3]}_{12}&= w^{[3]}_{11}a^{[2]}_{12}+w^{[3]}_{21}a^{[2]}_{22}+w^{[3]}_{31}a^{[2]}_{32}+w^{[3]}_{41}a^{[2]}_{42}+w^{[3]}_{51}a^{[2]}_{52}+b^{[3]}_{1}  ,\\
a^{[3]}_{13}&= w^{[3]}_{11}a^{[2]}_{13}+w^{[3]}_{21}a^{[2]}_{23}+w^{[3]}_{31}a^{[2]}_{33}+w^{[3]}_{41}a^{[2]}_{43}+w^{[3]}_{51}a^{[2]}_{53}+b^{[3]}_{1} 
\end{aligned}
\end{equation}
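
The three layer computations share one pattern, $A^{[l]} = W^{[l]}A^{[l-1]}+b^{[l]}$, so they can be chained in a loop. The following is a minimal sketch: the function name `forward`, the parameter dictionary layout, and the random values are assumptions, and, as in the equations of this section, no activation function is applied.

```python
import numpy as np

def forward(X, params, num_layers):
    """Apply A[l] = W[l] A[l-1] + b[l] for l = 1..num_layers; return all A[l]."""
    activations = [X]                      # A[0] = X
    for l in range(1, num_layers + 1):
        A_prev = activations[-1]
        activations.append(params[f"W{l}"] @ A_prev + params[f"b{l}"])
    return activations

rng = np.random.default_rng(2)
sizes = [4, 3, 5, 1]                       # n0, n1, n2, n3
params = {}
for l in range(1, len(sizes)):
    params[f"W{l}"] = rng.standard_normal((sizes[l], sizes[l - 1]))
    params[f"b{l}"] = rng.standard_normal((sizes[l], 1))

X = rng.standard_normal((4, 3))            # hypothetical inputs: 4 features x 3 examples
A = forward(X, params, num_layers=3)
print([a.shape for a in A])                # [(4, 3), (3, 3), (5, 3), (1, 3)]
```

The final entry `A[3]` has one row and three columns: one prediction per example, matching the shape of $Y$.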

## Step 3: Cost computation
The cost is the mean squared error between the output of the last layer and the targets, averaged over the $m$ examples:
\begin{equation}
\mathcal{J}(W,b) = \frac{1}{m}\sum_{i=1}^{m}\left(a^{[3]}_{1i}-y_{1i}\right)^2
\end{equation}

For the given example, with $m=3$, it is calculated as follows:
\begin{equation}
\mathcal{J}(W,b) = \frac{1}{3}\left\{\left(a^{[3]}_{11}-y_{11}\right)^2+\left(a^{[3]}_{12}-y_{12}\right)^2+\left(a^{[3]}_{13}-y_{13}\right)^2\right\}
\end{equation}
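
A short sketch of the cost computation, using made-up output and target values (assumptions chosen so the arithmetic is easy to follow); the function name `mse_cost` is likewise hypothetical.

```python
import numpy as np

def mse_cost(A_out, Y):
    """Mean of squared differences over the m examples (columns)."""
    m = Y.shape[1]
    return float(np.sum((A_out - Y) ** 2) / m)

# Hypothetical last-layer outputs and targets for m = 3 examples.
A3 = np.array([[0.5, 1.0, 2.0]])
Y = np.array([[1.0, 1.0, 0.0]])

# Squared errors are 0.25, 0.0 and 4.0, so the cost is 4.25 / 3.
cost = mse_cost(A3, Y)
print(cost)
```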

## Step 4: Backpropagation

Next, we compute the gradient of the cost with respect to each weight matrix. Each gradient $dW^{[l]}$ has the same dimensions as $W^{[l]}$:

| Gradient     | Symbolic Dimension        | Numeric Dimension |
|--------------|---------------------------|-------------------|
| $dW^{[1]}$   | $n^{[1]}\times n^{[0]}$   | $3\times4$ |
| $dW^{[2]}$   | $n^{[2]}\times n^{[1]}$   | $5\times3$ |
| $dW^{[3]}$   | $n^{[3]}\times n^{[2]}$   | $1\times5$ |
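
The text stops at the gradient dimensions, so the formulas in the sketch below are an assumption rather than the author's derivation. They follow from the chain rule for the purely linear layers and the mean squared error cost used in this walkthrough: starting from $dA^{[3]} = \frac{2}{m}(A^{[3]}-Y)$, each layer gives $dW^{[l]} = dA^{[l]}\,A^{[l-1]T}$, $db^{[l]}$ as the row sums of $dA^{[l]}$, and $dA^{[l-1]} = W^{[l]T}dA^{[l]}$. All function and variable names are hypothetical.

```python
import numpy as np

def forward(X, P):
    """Linear forward pass: A[l] = W[l] A[l-1] + b[l], no activation."""
    A1 = P["W1"] @ X + P["b1"]
    A2 = P["W2"] @ A1 + P["b2"]
    A3 = P["W3"] @ A2 + P["b3"]
    return A1, A2, A3

def cost(X, Y, P):
    """Mean squared error between the last layer's output and Y."""
    A3 = forward(X, P)[2]
    return float(np.sum((A3 - Y) ** 2) / Y.shape[1])

def backward(X, Y, P):
    """Gradients of the cost for the linear network (assumed setup)."""
    m = Y.shape[1]
    A1, A2, A3 = forward(X, P)
    dA = 2.0 * (A3 - Y) / m                  # dJ/dA3
    grads = {}
    for l, A_prev in ((3, A2), (2, A1), (1, X)):
        grads[f"dW{l}"] = dA @ A_prev.T      # same shape as W[l]
        grads[f"db{l}"] = dA.sum(axis=1, keepdims=True)
        dA = P[f"W{l}"].T @ dA               # gradient passed to layer l-1
    return grads

rng = np.random.default_rng(3)
sizes = [4, 3, 5, 1]
P = {}
for l in range(1, 4):
    P[f"W{l}"] = rng.standard_normal((sizes[l], sizes[l - 1]))
    P[f"b{l}"] = rng.standard_normal((sizes[l], 1))
X = rng.standard_normal((4, 3))              # hypothetical inputs
Y = rng.standard_normal((1, 3))              # hypothetical targets

grads = backward(X, Y, P)
for l in (1, 2, 3):
    print(f"dW{l}", grads[f"dW{l}"].shape)
```

If an activation function were added to the forward pass, its derivative would have to be multiplied into `dA` before computing `dW` and `db` at each layer.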