# **3 Forward and Backward Propagation (ReLU)**

Instruction: Perform one forward and one backward propagation pass in Python, using the inputs from Laboratory Task 2:

`x = np.array([1, 0, 1])
y = np.array([1])`

Use ReLU as the activation function.

`# learning rate
lr = 0.001`

In [1]:
import numpy as np

# Inputs and expected output
x = np.array([1, 0, 1])
y = np.array([1])

In [2]:
# Learning rate
lr = 0.001

In [3]:
# Initialize weights and bias
np.random.seed(42)
W = np.random.randn(3, 1)   # 3 inputs → 1 output
b = np.random.randn(1)

In [4]:
# ReLU activation
def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)

# Forward propagation
z = np.dot(x, W) + b
a = relu(z)

# Loss (Mean Squared Error)
loss = np.square(y - a).mean()

print("Forward Propagation")
print("Input:", x)
print("Weights:", W.flatten())
print("Bias:", b)
print("Z:", z)
print("Activated Output (a):", a)
print("Loss:", loss)

Forward Propagation
Input: [1 0 1]
Weights: [ 0.49671415 -0.1382643   0.64768854]
Bias: [1.52302986]
Z: [2.66743255]
Activated Output (a): [2.66743255]
Loss: 2.780331300528872
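
The printed values can be checked by hand. The following is a worked sketch that uses the weights and bias as rounded in the output above; the second weight drops out because $x_2 = 0$.

$$
\begin{aligned}
z &= x \cdot W + b = (1)(0.49671415) + (0)(-0.1382643) + (1)(0.64768854) + 1.52302986 = 2.66743255 \\
a &= \max(0, z) = 2.66743255 \\
L &= (y - a)^2 = (1 - 2.66743255)^2 \approx 2.78033
\end{aligned}
$$

Because $z > 0$, ReLU passes the value through unchanged, so $a = z$.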


In [5]:
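# Backward propagation
# Note: the exact gradient of the squared-error loss above is 2 * (a - y) * relu'(z).
# The factor of 2 is dropped here (equivalent to using 0.5 * (a - y)**2 as the loss),
# which only rescales the effective learning rate.
dz = (a - y) * relu_derivative(z)  # derivative of 0.5 * (a - y)**2 w.r.t. z
dW = x.reshape(-1, 1) * dz         # shape (3, 1), matches W
db = dz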

# Update parameters
W -= lr * dW
b -= lr * db

print("\nBackward Propagation")
print("dW:", dW.flatten())
print("db:", db)
print("Updated Weights:", W.flatten())
print("Updated Bias:", b)


Backward Propagation
dW: [1.66743255 0.         1.66743255]
db: [1.66743255]
Updated Weights: [ 0.49504672 -0.1382643   0.64602111]
Updated Bias: [1.52136242]
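
The analytic gradients can be sanity-checked numerically. The following is a minimal sketch and not part of the original notebook. It hard-codes the pre-update weights and bias printed in the forward-propagation output (rounded to eight decimals, an assumption made to keep the block self-contained) and compares the backprop formulas with a central finite-difference estimate. The helper names `half_squared_error` and `numerical_grad` are hypothetical. The last line shows that the gradient of the plain squared-error loss is twice the value used in the update above.

```python
import numpy as np

x = np.array([1.0, 0.0, 1.0])
y = np.array([1.0])

# Pre-update parameters copied (rounded) from the forward-propagation output
W = np.array([[0.49671415], [-0.1382643], [0.64768854]])
b = np.array([1.52302986])

def relu(z):
    return np.maximum(0, z)

def half_squared_error(W, b):
    """0.5 * (a - y)^2, the loss whose gradient w.r.t. z is (a - y) * relu'(z)."""
    a = relu(np.dot(x, W) + b)
    return 0.5 * np.square(a - y).sum()

def numerical_grad(f, param, eps=1e-6):
    """Central finite-difference estimate of df/dparam, perturbing param in place."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        f_plus = f()
        param[idx] = original - eps
        f_minus = f()
        param[idx] = original  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Analytic gradients, same formulas as the backward-propagation cell
z = np.dot(x, W) + b
a = relu(z)
dz = (a - y) * (z > 0).astype(float)
dW = x.reshape(-1, 1) * dz
db = dz

num_dW = numerical_grad(lambda: half_squared_error(W, b), W)
num_db = numerical_grad(lambda: half_squared_error(W, b), b)

print("analytic dW:", dW.flatten())
print("numeric  dW:", num_dW.flatten())
print("analytic db:", db, "numeric db:", num_db)
print("gradient of full squared error w.r.t. b:", 2 * db)
```

With these inputs the analytic and numeric values should agree to within floating-point tolerance. The second component of `dW` is zero because `x[1] = 0`, so that weight has no influence on `z`.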


**Observations & Insights**

- Training loss shows **large fluctuations** (e.g., ~18 at epoch 300 but >6000 at epoch 500), indicating instability with SGD on small batches.  
- Despite fluctuations, the model can occasionally reach **very low training losses (~15)**, meaning it has the capacity to fit the dataset.  
- The **final training loss (~14.9)** is much lower than the early epochs, showing the model does learn patterns over time.  
- The **final test loss (~2920.8)** is high compared to training loss, suggesting **overfitting** and poor generalization.  
- Using the Diabetes dataset (small sample size, noisy target) contributes to unstable convergence.  
- Two fully connected layers give the model more flexibility, but with **SGD (no momentum)** and **batch size of 8**, the optimization remains noisy.  
- Overall: the model fits training data but struggles to generalize, highlighting dataset limitations and optimizer instability.