# 📖 TABLE OF CONTENTS

- [1. Intro to Forward Propagation]()
- [2. Placement Classifier Problem]()
  - [1. MLP Architecture to solve Placement Classifier Problem]()
  - [2. Calculation of prediction $\hat y_i$ for a given input $X_i$]()
  - [3. Simplified Notation to understand Forward Propagation]()

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 1. Intro to Forward Propagation

A neural network produces its prediction for a given input through Forward Propagation. Whatever the architecture and however large the network, Forward Propagation reduces to Linear Algebra: a sequence of matrix dot products, each followed by a bias addition and an activation.

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 2. Placement Classifier Problem

## 1. MLP Architecture to solve Placement Classifier Problem

Here, we have 4 features ($cgpa$, $iq$, $10^{th}$ marks, $12^{th}$ marks) and 1 output ($placed$). The MLP architecture is illustrated below:

In [None]:
# Placement Classifier MLP Architecture

from IPython import display
display.Image("data/images/DL_04_Forward_Propagation-01-Placement-Classifier-MLP-Architecture.jpg")

<IPython.core.display.Image object>

- Number of trainable parameters, i.e. parameters updated during Backpropagation, = $(4 \times 3 + 3) + (3 \times 2 + 2) + (2 \times 1 + 1) = 15 + 8 + 3 = 26$. Each layer contributes (inputs $\times$ neurons) weights plus one bias per neuron. These 26 parameters are initialized with random values.

- Output of a layer of Perceptrons = $\sigma (W^T \cdot X + b)$
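The parameter count above can be checked with a short sketch. The helper name `count_parameters` is hypothetical, and drawing the initial values from a standard normal distribution is an assumption; the text only says the values are random.

```python
import numpy as np

# Layer sizes of the placement classifier: 4 inputs -> 3 -> 2 -> 1 output
layer_sizes = [4, 3, 2, 1]

def count_parameters(sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out  # weights + one bias per neuron
    return total

# Random initialization (standard normal is an assumption, not from the text)
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
weights = [rng.standard_normal((n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in layer_sizes[1:]]

# Cross-check the formula against the actual array sizes
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)

print(count_parameters(layer_sizes), n_params)  # 26 26
```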

## 2. Calculation of prediction $\hat y_i$ for a given input $X_i$

Given input $X_i$, we calculate the prediction $\hat y_i$ by computing the output of each layer in turn:

Output of Layer 1 is given by

$O^1 = \sigma ((W^1)^\top \cdot X + b_1)
= \sigma (
\left[
\begin{array}{ccc}
W^1_{11} & W^1_{12} & W^1_{13} \\
W^1_{21} & W^1_{22} & W^1_{23} \\
W^1_{31} & W^1_{32} & W^1_{33} \\
W^1_{41} & W^1_{42} & W^1_{43}
\end{array}
\right]^\top
\cdot
\left[
\begin{array}{ccc}
X_{i1} \\
X_{i2} \\
X_{i3} \\
X_{i4}
\end{array}
\right]
+
\left[
\begin{array}{ccc}
b_{11} \\
b_{12} \\
b_{13}
\end{array}
\right])
= \sigma (
\left[
\begin{array}{ccc}
W^1_{11}X_{i1} + W^1_{21}X_{i2} + W^1_{31}X_{i3} + W^1_{41}X_{i4} + b_{11} \\
W^1_{12}X_{i1} + W^1_{22}X_{i2} + W^1_{32}X_{i3} + W^1_{42}X_{i4} + b_{12} \\
W^1_{13}X_{i1} + W^1_{23}X_{i2} + W^1_{33}X_{i3} + W^1_{43}X_{i4} + b_{13}
\end{array}
\right]
)
=
\left[
\begin{array}{ccc}
O_{11} \\
O_{12} \\
O_{13}
\end{array}
\right]$

where:
- $W^1$ is the weights matrix between Input layer and Layer 1 and has a shape $(4 \times 3)$,
- $X$ is the Layer 1 input matrix of shape $(4 \times 1)$,
- $b_1$ is the Layer 1 bias matrix of shape $(3 \times 1)$,
- $O^1$ is the Layer 1 output matrix of shape $(3 \times 1)$.
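As a minimal sketch of the Layer 1 computation, the snippet below uses NumPy and checks that the matrix form matches the first row of the element-wise expansion. It assumes $\sigma$ is the logistic sigmoid; the feature values in `X_i` are hypothetical (and assumed to be already scaled), and the weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, assumed here to be the activation sigma."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
W1 = rng.standard_normal((4, 3))  # weights between Input layer and Layer 1
b1 = rng.standard_normal((3, 1))  # Layer 1 biases

# Hypothetical, already-scaled features: cgpa, iq, 10th marks, 12th marks
X_i = np.array([[0.81], [0.70], [0.88], [0.92]])

# Matrix form: (3 x 4) @ (4 x 1) + (3 x 1) -> (3 x 1)
O1 = sigmoid(W1.T @ X_i + b1)

# First row written out by hand, as in the expanded equation
z11 = (W1[0, 0] * X_i[0, 0] + W1[1, 0] * X_i[1, 0]
       + W1[2, 0] * X_i[2, 0] + W1[3, 0] * X_i[3, 0] + b1[0, 0])
assert np.isclose(O1[0, 0], sigmoid(z11))

print(O1.shape)  # (3, 1)
```

Note that the transpose is what makes the shapes line up: $W^1$ is stored as $(4 \times 3)$, so $(W^1)^\top$ is $(3 \times 4)$.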


Output of Layer 2 is given by

$O^2 = \sigma ((W^2)^\top \cdot O^1 + b_2)
= \sigma (
\left[
\begin{array}{ccc}
W^2_{11} & W^2_{12} \\
W^2_{21} & W^2_{22} \\
W^2_{31} & W^2_{32}
\end{array}
\right]^\top
\cdot
\left[
\begin{array}{ccc}
O_{11} \\
O_{12} \\
O_{13}
\end{array}
\right]
+
\left[
\begin{array}{ccc}
b_{21} \\
b_{22}
\end{array}
\right])
= \sigma (
\left[
\begin{array}{ccc}
W^2_{11}O_{11} + W^2_{21}O_{12} + W^2_{31}O_{13} + b_{21} \\
W^2_{12}O_{11} + W^2_{22}O_{12} + W^2_{32}O_{13} + b_{22}
\end{array}
\right]
)
=
\left[
\begin{array}{ccc}
O_{21} \\
O_{22}
\end{array}
\right]$

where:
- $W^2$ is the weights matrix between Layer 1 and Layer 2 and has a shape $(3 \times 2)$,
- $O^1$ is the Layer 2 input matrix of shape $(3 \times 1)$,
- $b_2$ is the Layer 2 bias matrix of shape $(2 \times 1)$,
- $O^2$ is the Layer 2 output matrix of shape $(2 \times 1)$.


Output of Layer 3 is given by

$O^3 = \sigma ((W^3)^\top \cdot O^2 + b_3)
= \sigma (
\left[
\begin{array}{ccc}
W^3_{11} \\
W^3_{21}
\end{array}
\right]^\top
\cdot
\left[
\begin{array}{ccc}
O_{21} \\
O_{22}
\end{array}
\right]
+
\left[
\begin{array}{ccc}
b_{31}
\end{array}
\right])
= \sigma (
\left[
\begin{array}{ccc}
W^3_{11}O_{21} + W^3_{21}O_{22} + b_{31}
\end{array}
\right]
)
=
\left[
\begin{array}{ccc}
O_{31}
\end{array}
\right]
= \hat y_i$

where:
- $W^3$ is the weights matrix between Layer 2 and Layer 3 and has a shape $(2 \times 1)$,
- $O^2$ is the Layer 3 input matrix of shape $(2 \times 1)$,
- $b_3$ is the Layer 3 bias matrix of shape $(1 \times 1)$,
- $O^3$ is the Layer 3 output matrix of shape $(1 \times 1)$.


Since Layer 3 is also the final Output layer, for a given input $X_i$, prediction $\hat y_i = O^3$.
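The three layer computations can be strung together as in the sketch below. It assumes $\sigma$ is the logistic sigmoid, uses random placeholder parameters and hypothetical scaled feature values, and applies a 0.5 decision threshold that is an assumption rather than something stated above.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, assumed here to be the activation sigma."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Randomly initialized parameters with the shapes listed above
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((3, 1))
W2, b2 = rng.standard_normal((3, 2)), rng.standard_normal((2, 1))
W3, b3 = rng.standard_normal((2, 1)), rng.standard_normal((1, 1))

# Hypothetical, already-scaled features: cgpa, iq, 10th marks, 12th marks
X_i = np.array([[0.81], [0.70], [0.88], [0.92]])

O1 = sigmoid(W1.T @ X_i + b1)  # Layer 1 output, shape (3, 1)
O2 = sigmoid(W2.T @ O1 + b2)   # Layer 2 output, shape (2, 1)
O3 = sigmoid(W3.T @ O2 + b3)   # Layer 3 output, shape (1, 1)

y_hat = O3.item()       # scalar prediction in (0, 1)
placed = y_hat >= 0.5   # 0.5 threshold is an assumption

print(O1.shape, O2.shape, O3.shape)  # (3, 1) (2, 1) (1, 1)
print(f"y_hat = {y_hat:.4f}, placed = {placed}")
```

With untrained random weights the prediction itself is meaningless; the point is only how the shapes flow from $(4 \times 1)$ down to $(1 \times 1)$.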

## 3. Simplified Notation to understand Forward Propagation

Let us use the following notation:

- Weight matrix feeding into layer $i \implies W^{[i]}$
- Bias matrix of layer $i \implies b^{[i]}$
- Input matrix to layer $i$, i.e. the activation output of the previous layer $i-1$ $\implies a^{[i-1]}$
- Input $X_i \implies a^{[0]}$, $O^1 \implies a^{[1]}$, $O^2 \implies a^{[2]}$, $\hat y_i = O^3 \implies a^{[3]}$
<br>
<br>

Then, we have:

- $a^{[1]} = \sigma ((W^{[1]})^\top \cdot a^{[0]} + b^{[1]})$
- $a^{[2]} = \sigma ((W^{[2]})^\top \cdot a^{[1]} + b^{[2]})$
- $a^{[3]} = \sigma ((W^{[3]})^\top \cdot a^{[2]} + b^{[3]})$
<br>
<br>

We can chain all the above equations to get the following result:

$\hat y_i = a^{[3]} = \sigma ((W^{[3]})^\top \cdot \sigma ((W^{[2]})^\top \cdot \sigma ((W^{[1]})^\top \cdot a^{[0]} + b^{[1]}) + b^{[2]}) + b^{[3]})$
<br>
<br>

However large or complex the architecture, Linear Algebra lets us organize the computation into the compact expression above. For deeper neural networks, the expression extends naturally: each additional layer adds one more nested $\sigma(\cdot)$ term to the chain.
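One way to see how the chained expression generalizes is to write it as a loop over layers. The sketch below is illustrative only: `init_params` and `forward` are hypothetical helper names, $\sigma$ is assumed to be the logistic sigmoid, and activations are kept as column vectors as in section 2, so each weight matrix is transposed before the product.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, assumed here to be the activation sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_sizes, seed=0):
    """Return a list of (W, b) pairs with random values (standard normal assumed)."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_in, n_out)), rng.standard_normal((n_out, 1)))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(a0, params):
    """Apply a^[i] = sigma(W^[i].T @ a^[i-1] + b^[i]) for every layer in order."""
    a = a0
    for W, b in params:
        a = sigmoid(W.T @ a + b)
    return a

# Hypothetical, already-scaled input a^[0] of shape (4, 1)
a0 = np.array([[0.81], [0.70], [0.88], [0.92]])

# The 4 -> 3 -> 2 -> 1 network from this section
params = init_params([4, 3, 2, 1])
y_hat = forward(a0, params)

# The loop agrees with the fully nested (chained) expression
(W1, b1), (W2, b2), (W3, b3) = params
nested = sigmoid(W3.T @ sigmoid(W2.T @ sigmoid(W1.T @ a0 + b1) + b2) + b3)
assert np.allclose(y_hat, nested)

# A deeper (hypothetical) architecture needs no change to forward()
deeper = forward(a0, init_params([4, 8, 8, 4, 1]))
print(y_hat.shape, deeper.shape)  # (1, 1) (1, 1)
```

Keeping the parameters as a list of `(W, b)` pairs means depth is just the length of the list, which mirrors how the chained formula grows by one nested term per layer.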

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)