# Day-59: Forward and Backpropagation

Today, we’re diving into the heart of how neural networks actually learn: Forward Propagation and Backpropagation.

Think of this as the learning cycle of a neural network: it makes a prediction, measures how wrong it was, and then corrects itself step by step until the error is small.

## Topics Covered:

- Forward Propagation

- Loss Function

- Gradient Descent

- Backpropagation

- Weight Update

## Forward Propagation (The Guess)

**Forward Propagation** is the process of feeding the input data ($\vec x$) through the network, layer by layer, to compute the final output prediction ($\hat y$).

**Process**:

1. Input: Features ($\vec x$) are fed into the first layer.

2. Weighted Sum: Each neuron calculates the linear combination: $z = (\vec x \cdot \vec w) + b$.

3. Activation: The output $z$ is passed through an Activation Function (like $\text{ReLU}$) to introduce non-linearity: $a = \text{ReLU}(z)$.

4. Repeat: This activated output ($a$) becomes the input for the next layer.

5. Prediction ($\hat y$): The final output layer produces the network's prediction.

- `Analogy`:
    - Imagine you’re a teacher grading a student’s math answer. The student (neural network) takes the question (input), does some calculations (weighted sum + activation), and gives an answer (output).

    - Forward propagation is just the student solving the problem — passing inputs through layers and generating an output.
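
The five forward-propagation steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a full implementation: the 2-3-1 layer sizes, the hand-picked weights, and the choice to leave the output layer without an activation are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def forward(x, layers):
    """Pass x through a list of (W, b) layers.

    Assumption for this sketch: ReLU on hidden layers, no activation
    on the final layer (as is common for regression outputs).
    """
    a = x
    for i, (W, b) in enumerate(layers):
        z = a @ W + b                               # weighted sum
        is_last = (i == len(layers) - 1)
        a = z if is_last else relu(z)               # activation
    return a

# Hypothetical 2-3-1 network with hand-picked weights
W1 = np.array([[1.0, -1.0,  0.5],
               [0.5,  1.0, -1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0], [2.0], [3.0]])
b2 = np.array([0.5])

x = np.array([1.0, 2.0])
y_hat = forward(x, [(W1, b1), (W2, b2)])
print(y_hat)  # [4.5]
```

Tracing it by hand: the hidden weighted sums are `[2, 1, -1.5]`, ReLU turns them into `[2, 1, 0]`, and the output layer computes `2*1 + 1*2 + 0*3 + 0.5 = 4.5`.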

## Loss Functions (Measuring the Error)

Once the network makes a prediction $( \hat y)$, we need a metric to measure how wrong it was compared to the true label $(y)$. This metric is the Loss Function (or Cost Function).

In other words, it is a function that quantifies the difference between the network's predicted output and the actual target value. A lower loss means the predictions are closer to the targets.

### Examples:

### Mean Squared Error (MSE) (For Regression)
$$ \text{Loss} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
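
As a quick check of the formula, here is a small NumPy sketch. The function name `mse` and the three sample values are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative values: errors are 1, 0 and -2
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

loss = mse(y_true, y_pred)
print(loss)  # (1 + 0 + 4) / 3, roughly 1.667
```

Because the errors are squared, the single prediction that is off by 2 contributes four times as much as the one that is off by 1.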

### Categorical Cross-Entropy (For Classification - Most Common)
$$ \text{Loss} = - \sum_{i} y_i \log(\hat{y}_i) $$
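Here $i$ runs over the classes, and $y_i$ is 1 for the correct class and 0 otherwise (one-hot encoding). This penalizes the model heavily when it predicts a low probability for the correct class.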
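
The penalty is easy to see with numbers. The sketch below assumes a one-hot label and a three-class problem; the `eps` clipping is an extra numerical guard against `log(0)` and is not part of the formula itself.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one sample; y_true is one-hot, y_pred holds probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)   # assumption: guard against log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])       # the correct class is index 1

confident = np.array([0.05, 0.90, 0.05]) # high probability on the correct class
wrong = np.array([0.70, 0.10, 0.20])     # low probability on the correct class

loss_confident = categorical_cross_entropy(y_true, confident)
loss_wrong = categorical_cross_entropy(y_true, wrong)

print(round(loss_confident, 3))  # 0.105  (-log 0.9)
print(round(loss_wrong, 3))      # 2.303  (-log 0.1)
```

Only the probability assigned to the correct class matters here, since every other term is multiplied by zero.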

## Gradient Descent (The Optimization)

The Gradient tells us the direction of the steepest increase in loss. To minimize the loss, we must move in the opposite direction. This is Gradient Descent.

- `Analogy`: Walking Down a Hill
    - Imagine you are blindfolded on a foggy hill (the loss surface) and want to find the lowest point (minimum loss). You can't see the bottom, but you can feel the slope (the gradient). You take small, careful steps in the direction of the steepest descent.

## Backpropagation (The Correction)

This is the core learning mechanism. It uses the calculated Loss to figure out exactly how much each weight $(w)$ and bias $(b)$ contributed to that error, and then adjusts them slightly to reduce the error next time.

### The Chain Rule:

Backpropagation relies entirely on the Chain Rule from calculus to distribute the total error backward through all the layers. It calculates the gradient (slope) of the loss function with respect to every single weight in the network.
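
To make the chain rule concrete, here is a sketch for the smallest possible case: a single neuron with one weight. The sigmoid activation, the squared-error loss, and the numbers are assumptions picked for the example. The gradient splits into three local pieces, $\frac{\partial \text{Loss}}{\partial w} = \frac{\partial \text{Loss}}{\partial \hat y} \cdot \frac{\partial \hat y}{\partial z} \cdot \frac{\partial z}{\partial w}$, and the result is compared against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, b, x, y):
    """Squared error of a single sigmoid neuron."""
    y_hat = sigmoid(w * x + b)
    return (y - y_hat) ** 2

# Illustrative values (assumptions for this sketch)
x, y = 1.5, 1.0
w, b = 0.4, -0.2

# Forward pass
z = w * x + b
y_hat = sigmoid(z)

# Backward pass: multiply the local derivatives together
dL_dyhat = -2.0 * (y - y_hat)         # d(y - y_hat)^2 / d y_hat
dyhat_dz = y_hat * (1.0 - y_hat)      # derivative of sigmoid at z
dz_dw = x                             # d(w*x + b) / dw
dL_dw = dL_dyhat * dyhat_dz * dz_dw

# Sanity check with a central finite difference
h = 1e-6
dL_dw_numeric = (loss_fn(w + h, b, x, y) - loss_fn(w - h, b, x, y)) / (2 * h)

print(dL_dw, dL_dw_numeric)  # the two values should agree closely
```

In a multi-layer network the same idea repeats: each layer multiplies the gradient arriving from the layer after it by its own local derivative.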

### The Update Rule (Weight Updates):


We update every weight ($w$) in the network using this fundamental formula:

$$ w_{\text{new}} = w_{\text{old}} - \eta \cdot \frac{\partial \text{Loss}}{\partial w_{\text{old}}} $$

$\frac{\partial \text{Loss}}{\partial w_{\text{old}}}$: The Gradient calculated by Backpropagation.

η (eta): The Learning Rate (a hyperparameter). This controls the size of the steps you take down the hill.

If η is too large, you might overshoot the minimum (the optimal weights).

If η is too small, training will take too long.
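
The effect of the step size can be seen on the toy loss $f(x) = x^2$ used in this section's simulation. The helper `run_gd` and the three learning rates below are illustrative choices, not recommended values.

```python
def run_gd(learning_rate, start=4.0, steps=10):
    """Run gradient descent on f(x) = x^2, whose gradient is 2x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # x_new = x_old - eta * gradient
    return x

x_small = run_gd(0.001)  # too small: barely moves from 4.0
x_good = run_gd(0.1)     # reasonable: steadily approaches 0
x_large = run_gd(1.1)    # too large: overshoots and moves away from 0

print(round(x_small, 4))  # 3.9207
print(round(x_good, 4))   # 0.4295
print(round(x_large, 4))  # 24.7669
```

With `learning_rate = 1.1`, every step multiplies `x` by `-1.2`, so the iterate jumps across the minimum and lands farther away each time.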

```python
import numpy as np
from tabulate import tabulate

# --- 1. Define the Loss Function and its Derivative (Gradient) ---

def loss_function(x):
    """The function we want to minimize: f(x) = x^2."""
    return x**2

def gradient(x):
    """The derivative (slope) of the loss function: f'(x) = 2x."""
    return 2 * x

# --- 2. Gradient Descent Algorithm ---

def gradient_descent(starting_point, learning_rate, n_iterations):
    """
    Applies the gradient descent algorithm to find the minimum of the loss function.
    In a NN, x would be the weights (w).
    """
    current_x = starting_point
    history = []

    for i in range(n_iterations):
        # Calculate the gradient (slope) at the current point
        grad = gradient(current_x)

        # Update Rule: Move opposite to the slope
        # x_new = x_old - learning_rate * gradient
        current_x = current_x - learning_rate * grad

        # Record the state: the gradient is the one used for this step,
        # while x and the loss are the values after the step
        history.append([
            i + 1,
            np.round(current_x, 5),
            np.round(grad, 5),
            np.round(loss_function(current_x), 5)
        ])

    return current_x, history

# --- 3. Run the Simulation ---
# Parameters
starting_point = 4.0      # Start high on the parabola
learning_rate = 0.1       # How big are our steps?
n_iterations = 10         # How many steps to take?

# Run the descent
final_x, descent_history = gradient_descent(starting_point, learning_rate, n_iterations)

# Display Results
print("--- Gradient Descent Simulation (Minimizing f(x) = x^2) ---")
print(f"Starting Point (x): {starting_point}")
print(f"Learning Rate (eta): {learning_rate}")
print("Minimum is at x = 0, Loss = 0.")

print("\n--- Iteration History ---")
print(tabulate(descent_history,
               headers=["Iteration", "Current x", "Gradient", "Current Loss"],
               tablefmt="github"))

print(f"\nFinal x value after {n_iterations} iterations: {np.round(final_x, 5)}")
print(f"Final Loss: {np.round(loss_function(final_x), 5)}")
```


```text
--- Gradient Descent Simulation (Minimizing f(x) = x^2) ---
Starting Point (x): 4.0
Learning Rate (eta): 0.1
Minimum is at x = 0, Loss = 0.

--- Iteration History ---
|   Iteration |   Current x |   Gradient |   Current Loss |
|-------------|-------------|------------|----------------|
|           1 |     3.2     |    8       |       10.24    |
|           2 |     2.56    |    6.4     |        6.5536  |
|           3 |     2.048   |    5.12    |        4.1943  |
|           4 |     1.6384  |    4.096   |        2.68435 |
|           5 |     1.31072 |    3.2768  |        1.71799 |
|           6 |     1.04858 |    2.62144 |        1.09951 |
|           7 |     0.83886 |    2.09715 |        0.70369 |
|           8 |     0.67109 |    1.67772 |        0.45036 |
|           9 |     0.53687 |    1.34218 |        0.28823 |
|          10 |     0.4295  |    1.07374 |        0.18447 |

Final x value after 10 iterations: 0.4295
Final Loss: 0.18447
```


```python
import numpy as np

# Sigmoid activation
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid, written in terms of the sigmoid OUTPUT a = sigmoid(z).
# It must be called on activations, not on the raw weighted sums.
def sigmoid_derivative(a):
    return a * (1 - a)

# Training data
X = np.array([[0,0], [0,1], [1,0], [1,1]])  # Inputs
y = np.array([[0], [1], [1], [0]])          # XOR Output

# Initialize weights and learning rate
# (this minimal network has 2 hidden units and no bias terms)
np.random.seed(42)
weights_input_hidden = np.random.rand(2, 2)
weights_hidden_output = np.random.rand(2, 1)
lr = 0.1

for epoch in range(10000):
    # Forward propagation
    hidden_input = np.dot(X, weights_input_hidden)
    hidden_output = sigmoid(hidden_input)

    final_input = np.dot(hidden_output, weights_hidden_output)
    final_output = sigmoid(final_input)

    # Loss calculation (Mean Squared Error)
    loss = np.mean((y - final_output) ** 2)

    # Backpropagation (chain rule, layer by layer from the output backward).
    # Because the error is written as (y - final_output), these deltas are the
    # negative gradient of the squared error, up to a constant factor.
    d_loss_output = (y - final_output) * sigmoid_derivative(final_output)
    d_loss_hidden = d_loss_output.dot(weights_hidden_output.T) * sigmoid_derivative(hidden_output)

    # Weight updates: adding the negative gradient is a gradient descent step
    weights_hidden_output += hidden_output.T.dot(d_loss_output) * lr
    weights_input_hidden += X.T.dot(d_loss_hidden) * lr

    if epoch % 2000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

print("Final Output:\n", final_output)
```


```text
Epoch 0, Loss: 0.2521
Epoch 2000, Loss: 0.2467
Epoch 4000, Loss: 0.1920
Epoch 6000, Loss: 0.1324
Epoch 8000, Loss: 0.0969
Final Output:
 [[0.20369158]
 [0.73603066]
 [0.73604444]
 [0.34370702]]
```
