Backpropagation Implementation in Python for XOR Problem

This code demonstrates how backpropagation is used to train a neural network to solve the XOR problem.

1. Defining the Neural Network

The neural network consists of:

Input layer with 2 inputs

Hidden layer with 4 neurons

Output layer with 1 output neuron

Sigmoid activation function in both the hidden and output layers
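The sigmoid squashes any real input into the range (0, 1), and its derivative can be written in terms of its own output: σ'(z) = σ(z)(1 - σ(z)). The class below relies on this identity, so its sigmoid_derivative method expects a value that has already passed through the sigmoid. As a quick sketch, the standalone helpers here (sigmoid and the hypothetical sigmoid_grad_from_output, which are not part of the class) check the identity against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad_from_output(s):
    """Derivative of the sigmoid expressed in terms of its output s = sigmoid(z)."""
    return s * (1.0 - s)

z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6

# Central finite-difference estimate of d sigmoid / dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# Identity-based derivative, fed with the sigmoid *output*
analytic = sigmoid_grad_from_output(sigmoid(z))

print(np.max(np.abs(numeric - analytic)))      # small: finite-difference error only
print(sigmoid_grad_from_output(sigmoid(0.0)))  # 0.25, the derivative's maximum
```

Calling the derivative helper on the pre-activation z instead of on σ(z) would silently produce wrong gradients, which is why the class passes the already-activated predicted_output and hidden_output to sigmoid_derivative.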

In [None]:
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize weights and biases
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)
        self.bias_hidden = np.zeros((1, self.hidden_size))
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

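    def sigmoid_derivative(self, x):
        # x is expected to be a sigmoid *output* s = sigmoid(z), so the
        # derivative with respect to z is s * (1 - s)
        return x * (1 - x)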

    def feedforward(self, X):
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)

        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)

        return self.predicted_output

    def backward(self, X, y, learning_rate):
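        # Output layer: error (target minus prediction) scaled by the sigmoid's slope
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)

        # Hidden layer: send the output delta back through the hidden-to-output weights
        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)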

        # Update weights and biases
        self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 1000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss: {loss:.4f}")

def __init__(self, input_size, hidden_size, output_size): the constructor, which initializes the neural network

self.input_size = input_size: stores the size of the input layer

self.hidden_size = hidden_size: stores the size of the hidden layer

self.output_size = output_size: stores the size of the output layer

self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size): initializes the input-to-hidden weights with values drawn from a standard normal distribution

self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size): initializes the hidden-to-output weights the same way

self.bias_hidden = np.zeros((1, self.hidden_size)): initializes the hidden-layer biases to zero

self.bias_output = np.zeros((1, self.output_size)): initializes the output-layer bias to zero
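For the 2-4-1 network used here, the constructor creates four arrays. The sketch below mirrors it with a hypothetical init_params helper and prints each shape and the total parameter count. It uses NumPy's seeded Generator (np.random.default_rng) instead of np.random.randn, an assumption made only for reproducibility; both draw from the standard normal distribution. The biases have shape (1, n) so that they broadcast across every row (training example) of the input batch.

```python
import numpy as np

def init_params(input_size, hidden_size, output_size, rng):
    """Hypothetical mirror of the constructor: standard-normal weights, zero biases."""
    return {
        "weights_input_hidden": rng.standard_normal((input_size, hidden_size)),
        "weights_hidden_output": rng.standard_normal((hidden_size, output_size)),
        "bias_hidden": np.zeros((1, hidden_size)),
        "bias_output": np.zeros((1, output_size)),
    }

params = init_params(2, 4, 1, np.random.default_rng(0))
for name, p in params.items():
    print(f"{name:22s} shape={p.shape}")

# 2*4 weights + 4*1 weights + 4 hidden biases + 1 output bias
n_params = sum(p.size for p in params.values())
print("trainable parameters:", n_params)  # trainable parameters: 17
```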

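2. Defining the Feedforward Pass

In the forward pass, the inputs are propagated through the network: each layer computes a weighted sum of its inputs plus a bias and passes the result through the sigmoid function, first for the hidden layer and then for the output layer.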

In [None]:
    def feedforward(self, X):
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)

        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)

        return self.predicted_output

self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden: computes the hidden layer's weighted input (the weighted sum of the inputs plus the bias, before the activation function)

self.hidden_output = self.sigmoid(self.hidden_activation): applies the sigmoid function to get the hidden layer's output

self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output: computes the output layer's weighted input from the hidden layer's output

self.predicted_output = self.sigmoid(self.output_activation): applies the sigmoid function to get the network's prediction
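Because X holds one training example per row, these two matrix products process all four XOR inputs at once, and the (1, n) bias rows are broadcast to every example. A small sketch (with a hypothetical forward helper and random weights from an assumed seed) confirms that the vectorized pass gives the same result as looping over the examples one at a time:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Vectorized forward pass: one row of X per example."""
    hidden_output = sigmoid(X @ W1 + b1)     # (n, 4); b1 of shape (1, 4) broadcasts over rows
    return sigmoid(hidden_output @ W2 + b2)  # (n, 1)

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((2, 4)), np.zeros((1, 4))
W2, b2 = rng.standard_normal((4, 1)), np.zeros((1, 1))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

batch = forward(X, W1, b1, W2, b2)

# Same computation, one example at a time
per_sample = []
for x in X:
    h = sigmoid(x @ W1 + b1[0])                  # (4,)
    per_sample.append(sigmoid(h @ W2 + b2[0]))   # (1,)
per_sample = np.array(per_sample)                # (4, 1)

print(batch.shape)                     # (4, 1)
print(np.allclose(batch, per_sample))  # True
```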

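3. Defining the Backward Pass

In the backward pass (backpropagation), the error between the predicted and actual outputs is computed at the output layer and propagated back to the hidden layer. The gradients are computed with the chain rule using the derivative of the sigmoid function, and the weights and biases are updated accordingly. Because the error is defined as y - predicted_output, the updates are added to the weights, which is equivalent to subtracting the gradient of the squared-error loss.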
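Written out as matrix equations, the backward method computes the following (a sketch of the math; the symbols are introduced here). X is the input batch and Y the targets y; W_1, b_1, W_2, b_2 stand for weights_input_hidden, bias_hidden, weights_hidden_output and bias_output; H and \hat{Y} are hidden_output and predicted_output; δ_2 and δ_1 are output_delta and hidden_delta; η is learning_rate and ⊙ is element-wise multiplication. With the error defined as Y - \hat{Y}, adding these updates is plain gradient descent on L = ½ Σ (Y - \hat{Y})², which differs from the MSE printed during training only by a constant factor.

```latex
\begin{aligned}
Z_1 &= X W_1 + b_1, & H &= \sigma(Z_1), \\
Z_2 &= H W_2 + b_2, & \hat{Y} &= \sigma(Z_2), \\
\delta_2 &= (Y - \hat{Y}) \odot \hat{Y} \odot (1 - \hat{Y}), &
\delta_1 &= (\delta_2 W_2^{\top}) \odot H \odot (1 - H), \\
W_2 &\leftarrow W_2 + \eta\, H^{\top} \delta_2, &
b_2 &\leftarrow b_2 + \eta \textstyle\sum_{\text{rows}} \delta_2, \\
W_1 &\leftarrow W_1 + \eta\, X^{\top} \delta_1, &
b_1 &\leftarrow b_1 + \eta \textstyle\sum_{\text{rows}} \delta_1.
\end{aligned}
```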

In [None]:
    def backward(self, X, y, learning_rate):
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)

        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)

        self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

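output_error = y - self.predicted_output: calculates the error at the output layer

output_delta = output_error * self.sigmoid_derivative(self.predicted_output): scales the output error by the sigmoid's slope to get the output layer's delta

hidden_error = np.dot(output_delta, self.weights_hidden_output.T): propagates the output delta back through the hidden-to-output weights to get the error at the hidden layer

hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output): scales the hidden error by the sigmoid's slope to get the hidden layer's delta

self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate: updates the weights between the hidden and output layers

self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate: updates the output bias using the delta summed over all training examples

self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate: updates the weights between the input and hidden layers

self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate: updates the hidden biases in the same way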
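A common way to confirm that backpropagation code is correct is a gradient check: compare the analytic gradients with central finite differences of the loss. The sketch below assumes the loss is the summed squared error 0.5 * sum((y - y_hat)^2), the loss whose negative gradient the updates above follow. The helpers loss, backprop_grads and numeric_grads are hypothetical, and the random starting values (including non-zero biases) are an arbitrary choice for the check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(params, X, y):
    """Summed squared error 0.5 * sum((y - y_hat)^2) for the 2-4-1 network."""
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(H @ W2 + b2)
    return 0.5 * np.sum((y - y_hat) ** 2)

def backprop_grads(params, X, y):
    """Loss gradients built from the same deltas as NeuralNetwork.backward."""
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(H @ W2 + b2)
    output_delta = (y - y_hat) * y_hat * (1 - y_hat)
    hidden_delta = (output_delta @ W2.T) * H * (1 - H)
    # backward() adds learning_rate * (these terms); the loss gradient is their negative
    return [
        -(X.T @ hidden_delta),
        -np.sum(hidden_delta, axis=0, keepdims=True),
        -(H.T @ output_delta),
        -np.sum(output_delta, axis=0, keepdims=True),
    ]

def numeric_grads(params, X, y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss(params, X, y)
            p[idx] = old - eps
            minus = loss(params, X, y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params = [rng.standard_normal((2, 4)), rng.standard_normal((1, 4)),
          rng.standard_normal((4, 1)), rng.standard_normal((1, 1))]

analytic = backprop_grads(params, X, y)
numeric = numeric_grads(params, X, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

The finite-difference version needs two forward passes per parameter, whereas backpropagation obtains all 17 gradients from one forward and one backward pass; that gap is why numeric gradients are normally used only for checking.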

4. Training the Network

The network is trained for 10,000 epochs with a learning rate of 0.1. In each epoch, a forward pass over all four XOR examples is followed by a backward pass, progressively reducing the error.

In [None]:
    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 1000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss: {loss:.4f}")

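output = self.feedforward(X): computes the output for the current inputs

self.backward(X, y, learning_rate): updates the weights and biases using backpropagation

loss = np.mean(np.square(y - output)): calculates the mean squared error (MSE) loss, which is printed every 1,000 epochs; since output is computed before the update, the printed value reflects the weights at the start of that epoch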
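The train method only prints the loss every 1,000 epochs. If you want the full learning curve, for example to plot it, one option is to return the loss history instead. The sketch below is a standalone, hypothetical train_xor function with the same update rule and hyperparameters (10,000 epochs, learning rate 0.1, 4 hidden neurons); the seed is an assumption, and the exact loss values depend on it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(epochs=10000, learning_rate=0.1, hidden_size=4, seed=0):
    """Same updates as NeuralNetwork.train/backward, but returns the MSE of every epoch."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.standard_normal((2, hidden_size))
    b1 = np.zeros((1, hidden_size))
    W2 = rng.standard_normal((hidden_size, 1))
    b2 = np.zeros((1, 1))

    history = []
    for _ in range(epochs):
        # Forward pass
        H = sigmoid(X @ W1 + b1)
        y_hat = sigmoid(H @ W2 + b2)
        history.append(np.mean((y - y_hat) ** 2))  # MSE before this epoch's update
        # Backward pass
        output_delta = (y - y_hat) * y_hat * (1 - y_hat)
        hidden_delta = (output_delta @ W2.T) * H * (1 - H)
        W2 += learning_rate * H.T @ output_delta
        b2 += learning_rate * output_delta.sum(axis=0, keepdims=True)
        W1 += learning_rate * X.T @ hidden_delta
        b1 += learning_rate * hidden_delta.sum(axis=0, keepdims=True)
    return history

history = train_xor()
for epoch in range(0, len(history), 1000):
    print(f"Epoch {epoch}, Loss: {history[epoch]:.4f}")
```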

5. Testing the Neural Network

In [None]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

output = nn.feedforward(X)
print("Predictions after training:")
print(output)

Epoch 0, Loss: 0.2670
Epoch 1000, Loss: 0.2185
Epoch 2000, Loss: 0.1334
Epoch 3000, Loss: 0.0325
Epoch 4000, Loss: 0.0125
Epoch 5000, Loss: 0.0071
Epoch 6000, Loss: 0.0048
Epoch 7000, Loss: 0.0036
Epoch 8000, Loss: 0.0028
Epoch 9000, Loss: 0.0023
Predictions after training:
[[0.03847731]
 [0.95554079]
 [0.95549253]
 [0.04826057]]
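The network outputs values between 0 and 1 rather than hard labels. A minimal sketch of turning them into XOR predictions, assuming the usual 0.5 threshold (the to_labels helper is hypothetical) and using the predictions printed above:

```python
import numpy as np

def to_labels(predictions, threshold=0.5):
    """Convert sigmoid outputs in (0, 1) to hard 0/1 class labels."""
    return (predictions >= threshold).astype(int)

# Predictions copied from the run above
predictions = np.array([[0.03847731], [0.95554079], [0.95549253], [0.04826057]])
y = np.array([[0], [1], [1], [0]])

labels = to_labels(predictions)
print(labels.ravel())                     # [0 1 1 0]
print("accuracy:", np.mean(labels == y))  # accuracy: 1.0
```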


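Because the weights are initialized randomly and no seed is set, running the same code again (here as a script with a __main__ guard) produces a different loss curve and slightly different predictions:

In [None]: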
# Example usage:
if __name__ == "__main__":
    # XOR problem
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])

    nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
    nn.train(X, y, epochs=10000, learning_rate=0.1)

    predictions = nn.feedforward(X)
    print("\nPredictions after training:")
    print(predictions)

Epoch 0, Loss: 0.3053
Epoch 1000, Loss: 0.2442
Epoch 2000, Loss: 0.2104
Epoch 3000, Loss: 0.1508
Epoch 4000, Loss: 0.0497
Epoch 5000, Loss: 0.0194
Epoch 6000, Loss: 0.0107
Epoch 7000, Loss: 0.0071
Epoch 8000, Loss: 0.0052
Epoch 9000, Loss: 0.0040

Predictions after training:
[[0.03559979]
 [0.94999532]
 [0.93538288]
 [0.07148302]]
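The two runs report different losses and predictions because each NeuralNetwork starts from new random weights. To make runs repeatable with the original class, you can call np.random.seed(...) before creating the network, since np.random.randn draws from NumPy's global generator. The sketch below shows the same idea with a seeded Generator and a hypothetical init_weights helper:

```python
import numpy as np

def init_weights(seed, input_size=2, hidden_size=4, output_size=1):
    """Hypothetical helper: draw initial weights from a seeded generator."""
    rng = np.random.default_rng(seed)
    return (rng.standard_normal((input_size, hidden_size)),
            rng.standard_normal((hidden_size, output_size)))

a = init_weights(seed=123)
b = init_weights(seed=123)
c = init_weights(seed=456)

print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True: same seed, same starting weights
print(all(np.array_equal(p, q) for p, q in zip(a, c)))  # False: different seed
```

With identical starting weights, the rest of training is deterministic, so the loss curve and predictions repeat exactly from run to run.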


Advantages of Backpropagation for Neural Network Training

The key benefits of using the backpropagation algorithm are:

Ease of Implementation: Backpropagation is straightforward to implement: it only needs the chain rule and the derivative of each layer's activation function, as the XOR example above shows in a few lines of NumPy.

Simplicity and Flexibility: The same procedure applies to a wide range of architectures, from basic feedforward networks to complex convolutional or recurrent networks.

Efficiency: Backpropagation computes the gradient of the loss with respect to every weight in a single backward pass, at a cost of roughly a small constant multiple of the forward pass, which makes training deep networks practical.

Generalization: Combined with sufficient training data and suitable regularization, networks trained with backpropagation can generalize well, improving prediction accuracy on unseen examples.

Scalability: The computation consists largely of matrix multiplications, so it scales to larger datasets and more complex networks and runs efficiently on hardware such as GPUs.

Challenges with Backpropagation

While backpropagation is powerful, it does face some challenges:

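Vanishing Gradient Problem: In deep networks, the gradients can become very small as they are propagated backward, making it difficult for the earlier layers to learn. This is common with saturating activation functions such as sigmoid or tanh; the sigmoid's derivative s(1 - s) is at most 0.25, so the gradient shrinks at every sigmoid layer it passes through unless the weights compensate.

Exploding Gradients: The gradients can also become excessively large, causing very large weight updates that make training diverge.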

Overfitting: If the network is too complex for the amount of training data, it might memorize the training examples instead of learning general patterns.
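The vanishing-gradient effect can be seen in a toy model: a chain of scalar layers h <- act(w * h), where the chain rule multiplies one local derivative per layer. The sketch below is a simplified illustration (unit weights, a single input of 0.5, and a hypothetical chain_grad helper), not the XOR network itself; it compares the sigmoid with ReLU, whose derivative is exactly 1 for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def chain_grad(depth, activation, w=1.0, x=0.5):
    """d(output)/d(input) through `depth` scalar layers h <- act(w * h), via the chain rule."""
    h, grad = x, 1.0
    for _ in range(depth):
        z = w * h
        if activation == "sigmoid":
            s = sigmoid(z)
            grad *= w * s * (1.0 - s)  # local derivative is at most w * 0.25
            h = s
        else:  # "relu"
            grad *= w * (1.0 if z > 0 else 0.0)
            h = max(z, 0.0)
    return grad

for depth in (1, 5, 10, 20):
    print(depth, chain_grad(depth, "sigmoid"), chain_grad(depth, "relu"))
```

With the sigmoid, every factor here is below 0.25, so the input gradient falls below 0.25^depth; with ReLU it stays at 1. ReLU has its own failure mode, though: units whose input stays negative pass back a gradient of zero.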

Backpropagation is the technique that enables neural networks to learn. By propagating errors backward and adjusting the weights and biases, neural networks gradually improve their predictions. Although it has limitations such as vanishing gradients, techniques like ReLU activations and carefully tuned learning rates have been developed to address these issues.