Objective: Write a program to implement a multi-layer perceptron (MLP) network with one hidden layer using NumPy in Python. Demonstrate that it can learn the XOR Boolean function.

Below is an implementation of a simple multi-layer perceptron (MLP) with one hidden layer, trained on the XOR Boolean function. It uses the step function (which outputs 0 or 1) as the activation function and a backpropagation-style update rule to train the network.

The network will have:

2 input neurons (to represent the two inputs for XOR).
4 neurons in the hidden layer.
1 output neuron to predict the XOR result.
The step function as the activation function.


In [1]:
import numpy as np

Step Activation Function:

We use the step function for both the hidden and output layers. It outputs 1 if the input is >= 0, otherwise, it outputs 0.

In [2]:
# Step function (activation function)
def step_function(x):
    return np.where(x >= 0, 1, 0)



step_derivative(x): This is a stand-in for the derivative of the step function.
Because the step function is piecewise constant, its true derivative is 0 everywhere except at x = 0, where it is undefined. A zero derivative would cancel every weight update, so this implementation returns 1 for every input as a surrogate. The training loop multiplies the layer errors by this value, which leaves them unchanged.


In [3]:
# Surrogate derivative of the step function: returns 1 everywhere so the error passes through unchanged in backpropagation
def step_derivative(x):
    return np.ones_like(x)
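
To make the point about the derivative concrete, the following minimal sketch estimates the slope of the step function numerically at a few points away from 0. The sample points `xs` and the step size `h` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def step_function(x):
    return np.where(x >= 0, 1, 0)

# Sample points on both sides of the jump at x = 0 (illustrative values)
xs = np.array([-2.0, -0.5, 0.5, 2.0])
h = 1e-3  # hypothetical finite-difference step size

# Central difference approximates the slope at each sample point
numeric_slope = (step_function(xs + h) - step_function(xs - h)) / (2 * h)
print(numeric_slope)  # every entry is 0: the function is flat away from the jump
```

The estimated slope is 0 at every sample point, which is why a constant surrogate such as `np.ones_like(x)` is substituted when a nonzero update signal is needed.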

In [4]:
# XOR inputs and outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # 4 XOR input pairs
y = np.array([[0], [1], [1], [0]])  # XOR outputs

# Initialize weights and biases
np.random.seed(42)  # For reproducibility

# Parameters for the neural network
input_size = 2  # 2 inputs
hidden_size = 4  # 4 neurons in the hidden layer
output_size = 1  # 1 output

# Random initialization of weights and biases
w1 = np.random.rand(input_size, hidden_size)  # Weights for input to hidden layer
b1 = np.zeros((1, hidden_size))  # Biases for hidden layer
w2 = np.random.rand(hidden_size, output_size)  # Weights for hidden to output layer
b2 = np.zeros((1, output_size))  # Biases for output layer

# Learning rate
learning_rate = 0.1
epochs = 10000  # Number of training iterations



input_size: The number of inputs (2 in the XOR problem).

hidden_size: The number of neurons in the hidden layer (4 neurons).

output_size: The number of outputs (1 output for XOR).

w1: Weights connecting the input layer to the hidden layer, drawn uniformly from [0, 1) by np.random.rand. The shape is (2, 4) because there are 2 inputs and 4 neurons in the hidden layer.

b1: Bias values for the hidden layer. Initialized to zeros with shape (1, 4).

w2: Randomly initialized weights connecting the hidden layer to the output layer. The shape is (4, 1) because there are 4 neurons in the hidden layer and 1 output neuron.

b2: Bias values for the output layer. Initialized to zeros with shape (1, 1).

learning_rate: Controls how much the weights and biases are adjusted in each training iteration. It is set to 0.1 here; values between 0.01 and 0.1 are common starting points.

epochs: The number of times the entire dataset is passed through the network during training. It is set to 10,000 here. Because all four samples are processed together, each epoch performs exactly one weight update.
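
The shapes listed above can be checked with a short sketch that rebuilds the parameters exactly as in the setup cell and prints each shape. The dictionary name `shapes` is introduced here only for illustration.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
input_size, hidden_size, output_size = 2, 4, 1

np.random.seed(42)
w1 = np.random.rand(input_size, hidden_size)   # uniform values in [0, 1)
b1 = np.zeros((1, hidden_size))
w2 = np.random.rand(hidden_size, output_size)  # uniform values in [0, 1)
b2 = np.zeros((1, output_size))

# (4, 2) @ (2, 4) -> (4, 4); b1 with shape (1, 4) broadcasts across the 4 rows
hidden_input = X @ w1 + b1

shapes = {"w1": w1.shape, "b1": b1.shape, "w2": w2.shape, "b2": b2.shape,
          "hidden_input": hidden_input.shape}
print(shapes)

# With non-negative inputs, non-negative weights, and zero biases,
# no hidden pre-activation can be negative at initialization
print(np.all(hidden_input >= 0))
```

One consequence of this initialization is worth noting: since every hidden pre-activation is non-negative, the step function outputs 1 for all hidden neurons on all four inputs before any training has happened.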


In [5]:
# Training the MLP network
for epoch in range(epochs):
    # Forward propagation
    hidden_input = np.dot(X, w1) + b1  # Input to the hidden layer
    hidden_output = step_function(hidden_input)  # Output of hidden layer (after activation)

    output_input = np.dot(hidden_output, w2) + b2  # Input to output layer
    output = step_function(output_input)  # Final output (after activation)

    # Backpropagation
    # Compute the error at the output layer
    output_error = y - output

    # Gradients for output layer
    output_delta = output_error * step_derivative(output_input)

    # Compute the error at the hidden layer
    hidden_error = output_delta.dot(w2.T)
    hidden_delta = hidden_error * step_derivative(hidden_input)

    # Update weights and biases using the gradients
    w2 += hidden_output.T.dot(output_delta) * learning_rate
    b2 += np.sum(output_delta, axis=0, keepdims=True) * learning_rate

    w1 += X.T.dot(hidden_delta) * learning_rate
    b1 += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

    # Optionally print the error every 1000 epochs
    if epoch % 1000 == 0:
        error = np.mean(np.abs(output_error))
        print(f'Epoch {epoch}, Error: {error}')


Epoch 0, Error: 0.5
Epoch 1000, Error: 0.5
Epoch 2000, Error: 0.5
Epoch 3000, Error: 0.5
Epoch 4000, Error: 0.5
Epoch 5000, Error: 0.5
Epoch 6000, Error: 0.5
Epoch 7000, Error: 0.5
Epoch 8000, Error: 0.5
Epoch 9000, Error: 0.5


Forward Propagation:

hidden_input: The weighted sum of the inputs for the hidden layer. This is calculated by performing matrix multiplication (np.dot(X, w1)) and adding the bias (b1).

The shape of hidden_input is (4, 4) since there are 4 data points and 4 hidden neurons.

hidden_output: The output of the hidden layer, which is obtained by applying the step_function (activation function) to hidden_input.

output_input: The weighted sum of the inputs to the output layer. This is calculated by performing matrix multiplication (np.dot(hidden_output, w2)) and adding the bias (b2).

The shape of output_input is (4, 1) since there are 4 data points and 1 output neuron.

output: The final output of the network, which is obtained by applying the step_function to output_input.
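
As an illustration of the forward pass alone, the sketch below uses hand-picked parameters instead of learned ones. It assumes a smaller hidden layer of 2 neurons (one acting as OR, one as AND), and the names `w1_manual`, `b1_manual`, `w2_manual`, and `b2_manual` are hypothetical. It shows that a step-activation network with one hidden layer can represent XOR, independent of how the weights are found.

```python
import numpy as np

def step_function(x):
    return np.where(x >= 0, 1, 0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (not learned) parameters, chosen for illustration
w1_manual = np.array([[1.0, 1.0],
                      [1.0, 1.0]])      # both hidden units sum the two inputs
b1_manual = np.array([[-0.5, -1.5]])    # unit 0 fires for OR, unit 1 fires for AND
w2_manual = np.array([[1.0], [-1.0]])   # output = OR minus AND
b2_manual = np.array([[-0.5]])

# Same two-stage forward pass as in the training loop
hidden_output = step_function(X @ w1_manual + b1_manual)
output = step_function(hidden_output @ w2_manual + b2_manual)
print(output.ravel())  # [0 1 1 0]
```

The output is 1 only when OR fires and AND does not, which is exactly the XOR truth table.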


Backpropagation (Weight Updates):

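Compute the deltas (output_delta and hidden_delta) used to adjust the weights. output_delta is the output error (y - output) multiplied by step_derivative(output_input). hidden_delta is output_delta propagated back through w2 and multiplied by step_derivative(hidden_input).

Update w1, w2, b1, and b2 by adding learning_rate times the corresponding delta term. The updates are added rather than subtracted because the error is defined as y - output.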

In [6]:
# After training, check the predictions
print("\nFinal predictions after training:")
final_output = step_function(np.dot(step_function(np.dot(X, w1) + b1), w2) + b2)
print(final_output)


Final predictions after training:
[[0]
 [0]
 [0]
 [0]]
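
The logged error stays at 0.5 and the final predictions are all 0, so the run above does not reproduce the XOR targets [0, 1, 1, 0]. One common alternative, sketched here as an assumption and not as the original method, replaces the step function with a sigmoid so that the derivative carries information about each neuron's output. The seed, the normal initialization, the learning rate of 0.5, and the 5,000 epochs are illustrative choices, and convergence is not guaranteed for every setting.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)   # assumed seed
w1 = rng.normal(size=(2, 4))     # assumed: zero-mean init instead of rand
b1 = np.zeros((1, 4))
w2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))
learning_rate = 0.5              # assumed value
epochs = 5000                    # assumed value

losses = []
for epoch in range(epochs):
    # Forward propagation with a differentiable activation
    hidden_output = sigmoid(X @ w1 + b1)
    output = sigmoid(hidden_output @ w2 + b2)
    losses.append(float(np.mean((y - output) ** 2)))

    # Backpropagation: sigmoid'(z) = s * (1 - s), where s = sigmoid(z)
    output_delta = (y - output) * output * (1 - output)
    hidden_delta = (output_delta @ w2.T) * hidden_output * (1 - hidden_output)

    # Same update form as the original loop (error defined as y - output)
    w2 += hidden_output.T @ output_delta * learning_rate
    b2 += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
    w1 += X.T @ hidden_delta * learning_rate
    b1 += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

# Threshold the sigmoid outputs at 0.5 to obtain 0/1 predictions
predictions = (sigmoid(sigmoid(X @ w1 + b1) @ w2 + b2) >= 0.5).astype(int)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(predictions.ravel())
```

Comparing the first and last loss values shows whether training made progress. If the predictions do not match XOR for a given seed, a different seed, a larger learning rate, or more epochs may help.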
