### 🔥 **Forward Propagation in a Multi-Layer Perceptron (MLP) – A Full Breakdown** 🔥  

Forward propagation (also called the **forward pass**) is the process where input data moves **layer by layer** through the network until it reaches the output layer, producing a prediction. It is the first step of every training iteration: the prediction is compared with the target, and backpropagation then adjusts the weights to reduce the error.



## 🏗 **1. Components Involved in Forward Propagation**  

Before diving into the process, let’s define key components:

- **Input Layer ($ X $)**: Receives the raw data as feature values.
- **Weights ($ W $)**: Learnable parameters that scale how strongly each input contributes to a neuron.
- **Bias ($ b $)**: A learnable offset added to the weighted sum, which shifts the point at which the activation function responds.
- **Activation Function ($ \sigma $)**: Introduces non-linearity to the model, allowing it to learn complex patterns.
- **Output Layer ($ Y $)**: Produces the final prediction.



## 🔄 **2. Step-by-Step Breakdown of Forward Propagation**  

Let’s assume an **MLP with one hidden layer**, meaning the network structure is:

- **Input Layer**: 3 neurons
- **Hidden Layer**: 2 neurons
- **Output Layer**: 1 neuron (Binary classification)

### 🔹 **Step 1: Input Layer to Hidden Layer**  

Each neuron in the hidden layer receives input from **all input neurons**, applies weights and bias, and then passes the result through an **activation function**.

Mathematically, the operation for the whole hidden layer (all neurons at once, in matrix form) is:

$$
Z^{(1)} = W^{(1)} X + b^{(1)}
$$

Where:
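- $ W^{(1)} $ = Weight matrix from input to hidden layer (shape $ 2 \times 3 $ here: one row per hidden neuron)  
- $ X $ = Input features, treated as a column vector (shape $ 3 \times 1 $)  
- $ b^{(1)} $ = Bias term (one value per hidden neuron)  
- $ Z^{(1)} $ = Weighted sum before activation (one value per hidden neuron)  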

Each neuron in the hidden layer applies an **activation function** (e.g., ReLU, Sigmoid):

$$
A^{(1)} = \sigma(Z^{(1)})
$$

Where $ A^{(1)} $ is the **activated output** of the hidden layer.
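
To make the link between "each neuron" and the matrix formula concrete, here is a minimal NumPy sketch. The weights, biases, and input are random placeholders (assumptions for illustration, not values from this walkthrough). It computes the weighted sum one neuron at a time and then for the whole layer at once, so the two can be compared.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hypothetical layer: 3 inputs feeding 2 hidden neurons (random placeholder values)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # one row of weights per hidden neuron
b1 = rng.normal(size=(2,))     # one bias per hidden neuron
x = rng.normal(size=(3,))      # a single input sample

# Neuron by neuron: each neuron takes a dot product with its own weight row
z_per_neuron = np.array([np.dot(W1[j], x) + b1[j] for j in range(2)])

# Whole layer at once: Z = W X + b
z_layer = W1 @ x + b1
a_layer = relu(z_layer)

print(np.allclose(z_per_neuron, z_layer))
```

The matrix form is just the per-neuron dot products stacked together, which is why libraries compute a layer with a single matrix multiplication.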

📌 **Example Calculation**:

Suppose:
- Input $ X = [x_1, x_2, x_3] = [0.5, 0.2, 0.8] $
- Weights $ W^{(1)} = \begin{bmatrix} 0.3 & -0.2 & 0.5 \\ 0.7 & 0.1 & -0.6 \end{bmatrix} $  
- Bias $ b^{(1)} = \begin{bmatrix} 0.1 \\ -0.3 \end{bmatrix} $  
- Activation function = **ReLU** $ \sigma(x) = \max(0, x) $

The weighted sum $ Z^{(1)} $ is calculated as:

$$
Z^{(1)} = \begin{bmatrix} 0.3 & -0.2 & 0.5 \\ 0.7 & 0.1 & -0.6 \end{bmatrix} \times \begin{bmatrix} 0.5 \\ 0.2 \\ 0.8 \end{bmatrix} + \begin{bmatrix} 0.1 \\ -0.3 \end{bmatrix}
$$

Breaking it down:

$$
Z_1 = (0.3 \times 0.5) + (-0.2 \times 0.2) + (0.5 \times 0.8) + 0.1 = 0.15 - 0.04 + 0.40 + 0.1 = 0.61
$$
$$
Z_2 = (0.7 \times 0.5) + (0.1 \times 0.2) + (-0.6 \times 0.8) - 0.3 = 0.35 + 0.02 - 0.48 - 0.3 = -0.41
$$

Applying **ReLU Activation**:

$$
A^{(1)}_1 = \max(0, 0.61) = 0.61
$$
$$
A^{(1)}_2 = \max(0, -0.41) = 0
$$

Thus, the activated outputs of the hidden layer are:

$$
A^{(1)} = \begin{bmatrix} 0.61 \\ 0 \end{bmatrix}
$$



### 🔹 **Step 2: Hidden Layer to Output Layer**  

The process is repeated:  
1. Compute weighted sum **$ Z^{(2)} $**
2. Apply activation function **$ \sigma $**

$$
Z^{(2)} = W^{(2)} A^{(1)} + b^{(2)}
$$

$$
A^{(2)} = \sigma(Z^{(2)})
$$

📌 **Example Calculation**:

Suppose:
- Weights $ W^{(2)} = \begin{bmatrix} 0.4 & -0.7 \end{bmatrix} $
- Bias $ b^{(2)} = \begin{bmatrix} 0.2 \end{bmatrix} $
- Activation function = **Sigmoid** $ \sigma(x) = \frac{1}{1 + e^{-x}} $

The weighted sum:

$$
Z^{(2)} = (0.4 \times 0.61) + (-0.7 \times 0) + 0.2
$$

$$
Z^{(2)} = 0.244 + 0 + 0.2 = 0.444
$$

Applying **Sigmoid Activation**:

$$
A^{(2)} = \frac{1}{1 + e^{-0.444}} \approx 0.609
$$

This is the **final output (prediction)**. If this is for **binary classification**, we interpret:

- $ A^{(2)} \geq 0.5 $ → Class 1
- $ A^{(2)} < 0.5 $ → Class 0
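
The thresholding rule above can be sketched as a small helper. The function name `to_class` and the choice to send an output of exactly 0.5 to Class 1 are assumptions made for this illustration; the 0.5 cutoff itself is only the usual default and can be tuned per problem.

```python
import numpy as np

def to_class(prob, threshold=0.5):
    """Map sigmoid outputs to 0/1 labels (assumption: ties at the threshold go to class 1)."""
    return (np.asarray(prob) >= threshold).astype(int)

# Hypothetical sigmoid outputs for three samples
labels = to_class([0.2, 0.5, 0.9])
print(labels)
```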



## 🔥 **3. Key Takeaways on Forward Propagation**
✅ **Weight Matrix Multiplication**: Determines how signals pass through the network.  
✅ **Activation Functions**: Add non-linearity for better learning.  
✅ **Output Layer Interpretation**: Uses Softmax (multi-class) or Sigmoid (binary).  
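
For reference, here is a minimal sketch of the two output activations mentioned above. The logits are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability trick that this sketch adds on its own; it does not change the result.

```python
import numpy as np

def sigmoid(z):
    # Binary case: squashes one score into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def softmax(z):
    # Multi-class case: turns a vector of scores into probabilities that sum to 1
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
probs = softmax(logits)
p_binary = sigmoid(0.0)

print(probs, probs.sum())
print(p_binary)
```

Softmax keeps the ordering of the scores, so the largest logit always gets the largest probability.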



### **📌 Summary of Forward Propagation Steps**
1️⃣ Compute weighted sum $ Z^{(l)} = W^{(l)} A^{(l-1)} + b^{(l)} $ for each layer, where $ A^{(0)} = X $.  
2️⃣ Apply **activation function** $ A = \sigma(Z) $.  
3️⃣ Repeat until reaching the **output layer**.  
4️⃣ Output is interpreted based on **activation function** (e.g., Sigmoid for probability).  
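
The steps above can be sketched as a single loop that works for any number of layers. The `forward` helper and the `(W, b, activation)` tuple layout are assumptions chosen for this illustration; the weights reuse the values from the worked example.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Run one forward pass; `layers` is a list of (W, b, activation) tuples."""
    a = x                      # A^(0) is the input itself
    for W, b, activation in layers:
        z = W @ a + b          # step 1: weighted sum
        a = activation(z)      # step 2: activation
    return a                   # step 3: output of the last layer

layers = [
    (np.array([[0.3, -0.2, 0.5], [0.7, 0.1, -0.6]]), np.array([0.1, -0.3]), relu),
    (np.array([[0.4, -0.7]]), np.array([0.2]), sigmoid),
]

y = forward(np.array([0.5, 0.2, 0.8]), layers)
print(y)
```

Adding a deeper network only means appending more tuples to `layers`; the loop itself does not change.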



## 🚀 **Python Code for Forward Propagation in NumPy**
Here’s a simple implementation:

```python
import numpy as np

# Activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

# Input (3 neurons)
X = np.array([[0.5, 0.2, 0.8]])

# Weights and Bias for Hidden Layer
W1 = np.array([[0.3, -0.2, 0.5], [0.7, 0.1, -0.6]])
b1 = np.array([[0.1, -0.3]])

# Forward propagation to Hidden Layer
Z1 = np.dot(X, W1.T) + b1
A1 = relu(Z1)

# Weights and Bias for Output Layer
W2 = np.array([[0.4, -0.7]])
b2 = np.array([[0.2]])

# Forward propagation to Output Layer
Z2 = np.dot(A1, W2.T) + b2
A2 = sigmoid(Z2)

print("Final Output:", A2)
```

This simulates **one forward pass** through an MLP! 🚀
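
Because the code above stores the input as a row (`X` has shape `(1, 3)`), the same expressions also work on a batch of samples. The sketch below assumes two extra made-up input rows to show this; the weights and biases are unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

W1 = np.array([[0.3, -0.2, 0.5], [0.7, 0.1, -0.6]])
b1 = np.array([[0.1, -0.3]])
W2 = np.array([[0.4, -0.7]])
b2 = np.array([[0.2]])

# One row per sample; rows 2 and 3 are hypothetical inputs added for illustration
X_batch = np.array([
    [0.5, 0.2, 0.8],
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
])

A1 = relu(X_batch @ W1.T + b1)    # shape (3, 2): b1 is broadcast across rows
A2 = sigmoid(A1 @ W2.T + b2)      # shape (3, 1): one prediction per sample

print(A2.shape)
```

Each row of `A2` is the prediction for the matching row of `X_batch`, computed in one pass instead of a Python loop over samples.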

---

In a framework like Keras, the idea is the same: **forward propagation** is the process where inputs pass through the network **layer by layer** until we get an output prediction.  



## **🔄 Where is Forward Propagation in Code?**  

Forward propagation happens **whenever we call the model on data**.  
This occurs during:  
- **Training (`model.fit()`)** → Forward pass + Backpropagation  
- **Prediction (`model.predict()`)** → Only Forward pass  

Let's explicitly separate **forward propagation** in the code!



### **🚀 Full Code Highlighting Forward Propagation**
```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize the images (scale pixel values to 0-1)
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the neural network
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),  # Converts 2D image to 1D
    keras.layers.Dense(128, activation='relu'),  # Hidden layer
    keras.layers.Dense(10, activation='softmax')  # Output layer (10 classes)
])

# Compile the model (choosing optimizer & loss function)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# - 🚀 Forward Propagation Happens Here! - #
# When calling model.fit(), the input data (x_train) goes through:
# 1. Flatten layer → Converts image into 1D array
# 2. Dense (128 neurons, ReLU) → Extracts important features
# 3. Dense (10 neurons, Softmax) → Outputs probabilities for each digit (0-9)
# The model generates predictions, which are compared with y_train.

# Train the model
history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```
### **🔹 What Happens in Forward Propagation?**
1️⃣ The **Flatten** layer reshapes input images.  
2️⃣ The first **Dense (128 neurons, ReLU)** transforms data with learned weights.  
3️⃣ The second **Dense (10 neurons, Softmax)** gives probability scores for each digit.  
4️⃣ The model **outputs predictions** → Compared with actual labels (`y_train`).  

During `model.fit()`, TensorFlow does both **forward propagation** (to get predictions) and **backpropagation** (to adjust weights).  
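
To see what those layers compute, here is a rough NumPy sketch of the same Flatten → Dense(128, ReLU) → Dense(10, Softmax) forward pass. The weights are random stand-ins (an assumption; they are not the trained Keras weights) and the images are random arrays rather than MNIST digits, so the probabilities are meaningless. Only the shapes and the order of operations are the point. A Keras `Dense` layer stores its kernel with shape `(inputs, units)`, which is why the multiplication is written `x @ W`.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, untrained stand-ins for the two Dense layers' parameters
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    exp = np.exp(z - z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def forward(images):
    x = images.reshape(len(images), -1)   # Flatten: (N, 28, 28) -> (N, 784)
    h = relu(x @ W1 + b1)                 # Dense(128, activation='relu')
    return softmax(h @ W2 + b2)           # Dense(10, activation='softmax')

fake_images = rng.random((5, 28, 28))     # placeholder "images" with values in [0, 1)
probs = forward(fake_images)

print(probs.shape)        # (5, 10)
print(probs.sum(axis=1))  # each row sums to 1
```

With a trained model, the real parameters could be read from each layer with `get_weights()` and substituted for the random arrays.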



## **📌 Explicit Forward Propagation for Prediction**
If you only want **forward propagation** (without backpropagation), you can use `model.predict()`:

```python
# Perform forward propagation on test images
sample_image = x_test[:5]  # Take 5 sample images
predictions = model.predict(sample_image)  # 🚀 Only Forward Propagation Here!

# Print predicted classes
predicted_classes = np.argmax(predictions, axis=1)
print("Predicted Labels:", predicted_classes)
```
✅ Here, **only forward propagation** is performed.  
❌ No backpropagation, since we're not updating weights.  



## **🌟 Summary**
- **Forward Propagation:** Happens in `model.fit()` (during training) and `model.predict()` (during inference).  
- **Backpropagation:** Only happens during `model.fit()`, where the computed gradients let the optimizer (Adam in this example) adjust the weights.  


---

![](images/multi.png)