## Neural Network

Backpropagation (short for "backward propagation of errors") is the algorithm that lets a neural network learn from its mistakes. It computes how the prediction error changes with each weight, and those gradients are then used to update the weights so the network gets better at making predictions.

### Step 1: Implementing a Simple Neural Network (Without Backpropagation Yet)

Before we introduce backprop, let’s first build a basic neural network with forward propagation.

This will help us see how data flows forward before we start calculating gradients.
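In matrix form, the forward pass for a network with one hidden layer and no bias terms can be sketched as below. $X$, $W_1$ and $W_2$ correspond to the inputs and the two weight matrices in the code for this step; $H$ and $\hat{y}$ are notation introduced here for the hidden activations and the predictions, and $\sigma$ is the sigmoid applied element-wise.

$$
\begin{aligned}
H &= \sigma(X W_1), \\
\hat{y} &= \sigma(H W_2), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
\end{aligned}
$$

With the XOR data in this step, $X$ is $4 \times 2$, $W_1$ is $2 \times 2$ and $W_2$ is $2 \times 1$, so $\hat{y}$ is $4 \times 1$: one prediction per input row.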

#### Python Code: Forward Pass in a Simple Neural Network
We'll implement a 2-layer neural network with one hidden layer.

In [4]:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid (needed later for backpropagation).
# Note: x must already be a sigmoid output s = sigmoid(z), because sigmoid'(z) = s * (1 - s).
def sigmoid_derivative(x):
    return x * (1 - x)

# Sample input (2 features)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # Input data
y = np.array([[0], [1], [1], [0]])  # Expected output (XOR problem)

# Fix the seed so the random weights are reproducible
np.random.seed(42)
input_layer_neurons = 2
hidden_layer_neurons = 2
output_neurons = 1

# Initialize weights uniformly at random in [0, 1); this network has no bias terms
W1 = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
W2 = np.random.uniform(size=(hidden_layer_neurons, output_neurons))

# Forward Propagation
hidden_layer_input = np.dot(X, W1)  # Input to hidden layer
hidden_layer_output = sigmoid(hidden_layer_input)  # Activation function

output_layer_input = np.dot(hidden_layer_output, W2)  # Input to output layer
output = sigmoid(output_layer_input)  # Activation function

print("Predicted Output (Before Training):")
print(output)

Predicted Output (Before Training):
[[0.53892274]
 [0.55132394]
 [0.5510619 ]
 [0.56117033]]
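One detail of `sigmoid_derivative` is easy to get wrong: it expects a value that has already been passed through `sigmoid`, not the raw input. The following is a minimal sketch that checks this against a finite-difference estimate. The test points `z`, the step `eps`, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Expects s = sigmoid(z), i.e. the activation, not the raw input z
    return s * (1 - s)

z = np.linspace(-3.0, 3.0, 7)  # a few raw pre-activation values
eps = 1e-6

# Central finite-difference estimate of d(sigmoid)/dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

correct_use = sigmoid_derivative(sigmoid(z))  # pass the activation
wrong_use = sigmoid_derivative(z)             # pass the raw input by mistake

print("max error, activation passed:", np.max(np.abs(numeric - correct_use)))
print("max error, raw input passed: ", np.max(np.abs(numeric - wrong_use)))
```

The first error should be close to zero and the second should not, which is why the training code later calls `sigmoid_derivative` on layer outputs.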


### Step 2: Loss Function & Understanding Gradient Descent

Now that we’ve implemented forward propagation, it’s time to measure how wrong our neural network’s predictions are. This is where the loss function comes in.

A loss function calculates the difference between the predicted output and the actual target. Our goal is to minimize this loss so that the neural network makes better predictions.

For a binary classification task, Binary Cross-Entropy (Log Loss) is the usual choice, and Mean Squared Error (MSE) is a simpler alternative.
For now, we’ll use MSE, which is easy to differentiate and is enough for a small network like this one.
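To make the two options concrete, here is a small sketch that evaluates both losses on the untrained predictions printed in Step 1. `bce_loss` and its `eps` clipping value are hypothetical helpers written for this comparison; they are not used elsewhere in this walkthrough.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def bce_loss(y_true, y_pred, eps=1e-12):
    # Clip so that log() never receives exactly 0 or 1
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([[0], [1], [1], [0]])
# Untrained predictions copied from the Step 1 output
y_pred = np.array([[0.53892274], [0.55132394], [0.5510619], [0.56117033]])

mse = mse_loss(y_true, y_pred)
bce = bce_loss(y_true, y_pred)
print("MSE:", mse)
print("BCE:", bce)
```

Both numbers measure the same mismatch on different scales, so they should not be compared with each other, only tracked over training.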

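Gradient Descent is the algorithm that adjusts the weights of our network to reduce the loss: at each step it moves every weight a small amount, set by the learning rate, in the direction opposite to the gradient of the loss with respect to that weight.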
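The update rule is easiest to see on a single weight. The sketch below runs gradient descent on a toy loss, `(w - 3)^2`, which is an assumption chosen only because its minimum (at `w = 3`) is known in advance. The starting point, learning rate, and step count are also illustrative.

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = [loss(w)]

for step in range(50):
    w -= learning_rate * grad(w)  # step against the gradient
    history.append(loss(w))

print("final w:", w)
print("final loss:", history[-1])
```

A neural network applies the same rule to every entry of its weight matrices; backpropagation is what supplies the gradients.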

#### Python Code: Adding Loss Calculation

Now, let’s modify our forward propagation code to include loss calculation.

In [5]:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Mean Squared Error (MSE) Loss Function
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Sample input (2 features)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # Input data
y = np.array([[0], [1], [1], [0]])  # Expected output (XOR problem)

# Initialize weights randomly
np.random.seed(42)
W1 = np.random.uniform(size=(2, 2))  # Weights for input to hidden layer
W2 = np.random.uniform(size=(2, 1))  # Weights for hidden to output layer

# Forward Propagation
hidden_layer_input = np.dot(X, W1)  # Input to hidden layer
hidden_layer_output = sigmoid(hidden_layer_input)  # Activation function

output_layer_input = np.dot(hidden_layer_output, W2)  # Input to output layer
output = sigmoid(output_layer_input)  # Activation function

# Calculate loss
loss = mse_loss(y, output)

print("Predicted Output (Before Training):")
print(output)
print("\nLoss (Before Training):", loss)


Predicted Output (Before Training):
[[0.53892274]
 [0.55132394]
 [0.5510619 ]
 [0.56117033]]

Loss (Before Training): 0.2520513692725072


### Step 3: Understanding the Math Behind Backpropagation

Backpropagation computes the gradients of the loss function with respect to the network's parameters: the weights and, when present, the biases (the network in this section has weights only).
Each gradient tells us how much a weight contributed to the error, so we can update it accordingly.
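For the two-layer, bias-free sigmoid network used in this section, the chain rule gives the gradients below. This is a sketch under one assumption: the error is written as the half sum of squared errors $E$, which differs from the mean squared error only by a constant factor and has the same minimizer. $H$, $\hat{y}$, $\delta_1$, $\delta_2$ and $\eta$ (the learning rate) are notation introduced here, and $\odot$ denotes element-wise multiplication.

$$
\begin{aligned}
H &= \sigma(X W_1), \qquad \hat{y} = \sigma(H W_2), \qquad E = \tfrac{1}{2} \sum_i (y_i - \hat{y}_i)^2 \\
\delta_2 &= (y - \hat{y}) \odot \hat{y} \odot (1 - \hat{y}) \\
\delta_1 &= (\delta_2 W_2^{\top}) \odot H \odot (1 - H) \\
\frac{\partial E}{\partial W_2} &= -H^{\top} \delta_2, \qquad \frac{\partial E}{\partial W_1} = -X^{\top} \delta_1 \\
W_2 &\leftarrow W_2 + \eta\, H^{\top} \delta_2, \qquad W_1 \leftarrow W_1 + \eta\, X^{\top} \delta_1
\end{aligned}
$$

The plus signs in the updates come from defining the deltas with $(y - \hat{y})$: each delta is already the negative of the gradient with respect to that layer's pre-activation, so adding it moves the weights downhill.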

Each training iteration consists of two main steps:

1) Forward Pass → Compute predictions & loss.

2) Backward Pass (Backpropagation) → Compute gradients & update weights.
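The two steps above can be sketched as a pair of functions, together with a numerical check that the backward pass really returns the gradient of the loss. The names `forward`, `backward`, `half_sse` and `numerical_grad` are hypothetical helpers, and the half-sum-of-squares loss is an assumption: it is the loss whose exact gradient matches the usual delta formulas, and it differs from MSE only by a constant factor.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, W1, W2):
    H = sigmoid(X @ W1)      # hidden activations
    y_hat = sigmoid(H @ W2)  # predictions
    return H, y_hat

def half_sse(y, y_hat):
    # Half sum of squared errors
    return 0.5 * np.sum((y - y_hat) ** 2)

def backward(X, y, W1, W2):
    H, y_hat = forward(X, W1, W2)
    delta2 = (y - y_hat) * y_hat * (1 - y_hat)
    delta1 = (delta2 @ W2.T) * H * (1 - H)
    # The deltas are negative gradients, hence the minus signs
    grad_W2 = -H.T @ delta2
    grad_W1 = -X.T @ delta1
    return grad_W1, grad_W2

def numerical_grad(f, W, eps=1e-6):
    # Central differences, perturbing one weight at a time (in place)
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
rng = np.random.default_rng(0)
W1 = rng.uniform(size=(2, 2))
W2 = rng.uniform(size=(2, 1))

def loss_fn():
    return half_sse(y, forward(X, W1, W2)[1])

grad_W1, grad_W2 = backward(X, y, W1, W2)
num_W1 = numerical_grad(loss_fn, W1)
num_W2 = numerical_grad(loss_fn, W2)

print("max |analytic - numeric| for W1:", np.max(np.abs(grad_W1 - num_W1)))
print("max |analytic - numeric| for W2:", np.max(np.abs(grad_W2 - num_W2)))
```

A gradient check like this is a common way to catch sign or transpose mistakes in a hand-written backward pass before running a long training loop.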

In [None]:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid, written in terms of the sigmoid output:
# pass in x = sigmoid(z), not the raw input z
def sigmoid_derivative(x):
    return x * (1 - x)

# Mean Squared Error (MSE) Loss Function
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Sample input (2 features)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # Input data
y = np.array([[0], [1], [1], [0]])  # Expected output (XOR problem)

# Initialize weights randomly
np.random.seed(42)
W1 = np.random.uniform(size=(2, 2))  # Weights for input to hidden layer
W2 = np.random.uniform(size=(2, 1))  # Weights for hidden to output layer

# Learning rate
learning_rate = 0.5

# Training loop
epochs = 10000  # Number of iterations
for epoch in range(epochs):
    # **Forward Propagation**
    hidden_layer_input = np.dot(X, W1)  # Input to hidden layer
    hidden_layer_output = sigmoid(hidden_layer_input)  # Activation function

    output_layer_input = np.dot(hidden_layer_output, W2)  # Input to output layer
    output = sigmoid(output_layer_input)  # Activation function

    # **Compute Loss**
    loss = mse_loss(y, output)

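    # **Backward Propagation**
    # Output layer: the error is (target - prediction), so output_delta is the
    # negative gradient of the squared error w.r.t. output_layer_input
    # (up to a constant factor)
    output_error = y - output
    output_delta = output_error * sigmoid_derivative(output)

    # Hidden layer: propagate the output delta back through W2
    hidden_layer_error = output_delta.dot(W2.T)
    hidden_layer_delta = hidden_layer_error * sigmoid_derivative(hidden_layer_output)

    # **Update Weights**
    # The deltas are negative gradients, so adding them performs gradient descent
    W2 += hidden_layer_output.T.dot(output_delta) * learning_rate
    W1 += X.T.dot(hidden_layer_delta) * learning_rate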

    # Print loss every 1000 epochs
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

print("\nFinal Predicted Output After Training:")
print(output)


Epoch 0, Loss: 0.2520513692725072
Epoch 1000, Loss: 0.24998152415102137
Epoch 2000, Loss: 0.24993098044436718
Epoch 3000, Loss: 0.2499055627179052
Epoch 4000, Loss: 0.2498778040725502
Epoch 5000, Loss: 0.24984627319395852
Epoch 6000, Loss: 0.24980969847108805
Epoch 7000, Loss: 0.24976658130666474
Epoch 8000, Loss: 0.24971512007938584
Epoch 9000, Loss: 0.24965312082888597
Epoch 10000, Loss: 0.2495778854717237
Epoch 11000, Loss: 0.24948607106936638
Epoch 12000, Loss: 0.24937351247123185
Epoch 13000, Loss: 0.2492349996731948
Epoch 14000, Loss: 0.24906400110084947
Epoch 15000, Loss: 0.2488523258508311
Epoch 16000, Loss: 0.2485897235600782
Epoch 17000, Loss: 0.24826343282966618
Epoch 18000, Loss: 0.24785771163736825
Epoch 19000, Loss: 0.24735341936799463
Epoch 20000, Loss: 0.2467277704548673
Epoch 21000, Loss: 0.245954436538602
Epoch 22000, Loss: 0.24500421534747002
Epoch 23000, Loss: 0.24384647103738688
Epoch 24000, Loss: 0.2424514379628092
Epoch 25000, Loss: 0.24079325070327962
Epoch 2600