#### Write a program to implement a multi-layer perceptron (MLP) network with one hidden layer using NumPy in Python. Demonstrate that it can learn the XOR Boolean function.

### Description of the Model:
A Multi-Layer Perceptron (MLP) is a class of feedforward artificial neural networks (ANN). It consists of an input layer, one or more hidden layers, and an output layer. For this implementation:

* XOR is not linearly separable, so a single-layer perceptron cannot learn it; we therefore use a multi-layer perceptron with a hidden layer.
* The input layer has 2 neurons.
* The hidden layer has 4 neurons with step activation functions.
* The output layer has 1 neuron with a step activation function.
* It does not use backpropagation; instead, a perceptron-style weight-update rule is applied whenever the output is incorrect.
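
To see why one hidden layer of step units is enough to represent XOR, here is a minimal sketch with hand-picked weights. The values are assumptions chosen for illustration (two hidden units instead of four), not the weights the training code learns: one hidden unit computes OR, the other NAND, and the output unit ANDs them.

```python
import numpy as np

def step_function(x):
    # Same step activation as in this notebook: 1 where x >= 0, else 0
    return np.where(x >= 0, 1, 0)

# Hand-picked weights (illustrative assumption, not learned values)
W1 = np.array([[1.0, 1.0],      # hidden unit 0: OR   (fires when x1 + x2 >= 0.5)
               [-1.0, -1.0]])   # hidden unit 1: NAND (fires when x1 + x2 <= 1.5)
b1 = np.array([[-0.5], [1.5]])
W2 = np.array([[1.0, 1.0]])     # output unit: AND of the two hidden units
b2 = np.array([[-1.5]])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

hidden = step_function(W1 @ X.T + b1)     # shape (2, 4), one column per sample
output = step_function(W2 @ hidden + b2)  # shape (1, 4)
xor_out = output.flatten()
print(xor_out)  # [0 1 1 0]
```

No single line can separate the XOR classes, but the OR and NAND units each draw one line, and their intersection is exactly the region where XOR is 1.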

import numpy as np

def step_function(x):
    # Step activation: 1 where x >= 0, else 0
    return np.where(x >= 0, 1, 0)

def train_mlp(X, y, hidden_neurons=4, epochs=10000, learning_rate=0.01):
    input_neurons = X.shape[1]
    output_neurons = 1
    
    # Initialize weights and biases randomly
    W1 = np.random.uniform(-1, 1, (hidden_neurons, input_neurons))
    b1 = np.random.uniform(-1, 1, (hidden_neurons, 1)) 
    W2 = np.random.uniform(-1, 1, (output_neurons, hidden_neurons))
    b2 = np.random.uniform(-1, 1, (output_neurons, 1))
    
    for epoch in range(epochs):
        for i in range(X.shape[0]):
            x_sample = X[i].reshape(-1, 1)
            y_sample = y[i]
            
            # Forward pass
            hidden_input = np.dot(W1, x_sample) + b1
            hidden_output = step_function(hidden_input)
            final_input = np.dot(W2, hidden_output) + b2
            final_output = step_function(final_input)
            
            # Perceptron-style update, applied only when the prediction is wrong
            if final_output != y_sample:
                error = y_sample - final_output  # +1 or -1, shape (1, 1)
                W2 += learning_rate * error * hidden_output.T
                b2 += learning_rate * error
                # The same output error is broadcast to every hidden unit
                W1 += learning_rate * error * x_sample.T
                b1 += learning_rate * error
    
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    hidden_input = np.dot(W1, X.T) + b1
    hidden_output = step_function(hidden_input)
    final_input = np.dot(W2, hidden_output) + b2
    final_output = step_function(final_input)
    return final_output.T

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred) * 100

# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]]) # output of XOR gate

# Train the MLP
W1, b1, W2, b2 = train_mlp(X, y, hidden_neurons=4)

# Make predictions
y_pred = predict(X, W1, b1, W2, b2)

# Compute accuracy
acc = accuracy(y, y_pred)

# Print results
print("Predictions:", y_pred.flatten())
print(f"Accuracy: {acc:.2f}%")

Predictions: [0 1 1 0]
Accuracy: 100.00%


### Description of the code

The step function produces a binary output (0 or 1):
 * 1 when x >= 0
 * 0 otherwise
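
As a quick check of the threshold behaviour, the sketch below applies the same `step_function` to a few sample values (the inputs are arbitrary, chosen only for illustration). Note that an input of exactly 0 maps to 1.

```python
import numpy as np

def step_function(x):
    # 1 where x >= 0, else 0 (same definition as in the notebook)
    return np.where(x >= 0, 1, 0)

samples = np.array([-2.0, -0.1, 0.0, 0.3, 5.0])
step_out = step_function(samples)
print(step_out)  # [0 0 1 1 1]
```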

Initialize weights and biases:
 * W1 and b1 are the weights and biases of the hidden layer.
 * W2 and b2 are the weights and biases of the output layer.
 * Weights and biases are initialized randomly, drawn uniformly from [-1, 1).

##### Error: y_sample - final_output
##### Weight-update rule: w = w + learning_rate * error * X.T (X is the layer's input: x_sample for W1, hidden_output for W2)
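
The rule can be traced by hand for a single misclassified sample. The sketch below uses made-up starting values (zero weights, a hypothetical hidden activation pattern, and a hypothetical input) purely to show the shapes and the size of one update; it is not taken from an actual training run.

```python
import numpy as np

learning_rate = 0.01  # same default as train_mlp

# Hypothetical state for one training sample (illustrative values)
x_sample = np.array([[0], [1]])                  # input, shape (2, 1)
hidden_output = np.array([[1], [0], [1], [0]])   # hidden activations, shape (4, 1)
y_sample = np.array([1])                         # target
final_output = np.array([[0]])                   # wrong prediction

W1 = np.zeros((4, 2)); b1 = np.zeros((4, 1))
W2 = np.zeros((1, 4)); b2 = np.zeros((1, 1))

error = y_sample - final_output                  # shape (1, 1), value +1

# Output layer: only weights fed by active hidden units change
W2 = W2 + learning_rate * error * hidden_output.T
b2 = b2 + learning_rate * error

# Hidden layer: the (1, 2) update row is broadcast to all four hidden units
W1 = W1 + learning_rate * error * x_sample.T
b1 = b1 + learning_rate * error

print(W2)  # [[0.01 0.   0.01 0.  ]]
print(W1)  # every row is [0.   0.01]
```

Because the hidden-layer update is the same for every hidden unit, the hidden units differ only through their random initial values, which is one reason the result depends on initialization.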

 
* Epochs: The model is trained for 10,000 epochs. In each epoch it iterates over all four samples and adjusts the weights and biases whenever a prediction is wrong.
* Learning rate: Controls how much the weights change on each update (0.01 by default).

Forward Pass:
 * It computes the values for the hidden layer and the output layer.
 * The step function is applied at each layer to get binary outputs.

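Weight Update Rule:
 * It adjusts the weights only when the output is wrong. The update itself is deterministic (learning_rate * error * input); only the initial weights are random.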

In the run shown above, the MLP reaches 100% accuracy on XOR after training.




### Performance Evaluation:

* Accuracy: The model achieved 100% accuracy on the four XOR input patterns.



### My Comments:

Limitations: 
* Reaching the desired accuracy may require increasing the number of epochs or the learning rate, which can lead to overfitting.
* Some training runs may not achieve the desired accuracy.
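
One way to gauge how often training succeeds is to repeat it with different random seeds and count the runs that reach 100%. The sketch below is an assumption-laden variant of `train_mlp`: the helper `train_and_score`, the use of `np.random.default_rng`, the seed range, and the reduced epoch count are all introduced here for illustration and are not part of the original code. The update rule itself is unchanged.

```python
import numpy as np

def step_function(x):
    return np.where(x >= 0, 1, 0)

def train_and_score(seed, epochs=1000, learning_rate=0.01, hidden_neurons=4):
    """Hypothetical helper: train with the notebook's update rule, return accuracy in %."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable runs
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])
    W1 = rng.uniform(-1, 1, (hidden_neurons, 2))
    b1 = rng.uniform(-1, 1, (hidden_neurons, 1))
    W2 = rng.uniform(-1, 1, (1, hidden_neurons))
    b2 = rng.uniform(-1, 1, (1, 1))
    for _ in range(epochs):
        for i in range(X.shape[0]):
            x_sample = X[i].reshape(-1, 1)
            h = step_function(W1 @ x_sample + b1)
            out = step_function(W2 @ h + b2)
            error = y[i] - out            # shape (1, 1); zero when correct
            if error.item() != 0:
                W2 += learning_rate * error * h.T
                b2 += learning_rate * error
                W1 += learning_rate * error * x_sample.T
                b1 += learning_rate * error
    pred = step_function(W2 @ step_function(W1 @ X.T + b1) + b2).T
    return np.mean(pred == y) * 100

accuracies = [train_and_score(seed) for seed in range(5)]
solved = sum(a == 100.0 for a in accuracies)
print(f"{solved} of {len(accuracies)} seeds reached 100% accuracy")
```

The count will vary with the seeds and epoch budget chosen, so it is best read as a rough indicator rather than a fixed success rate.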

Improvements: 
* Implement backpropagation with gradient descent instead of the perceptron-style weight-update rule.
* Use differentiable activation functions such as sigmoid or ReLU for better learning.
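
A minimal sketch of what these improvements could look like, assuming sigmoid activations, a mean-squared-error loss, and full-batch gradient descent. The seed, learning rate, and epoch count are illustrative choices, not tuned values, and samples are stored as rows here, so the weight shapes are transposed relative to `train_mlp`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)   # fixed seed, an assumption for repeatability
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.uniform(-1, 1, (2, 4))  # input -> hidden
b1 = np.zeros((1, 4))
W2 = rng.uniform(-1, 1, (4, 1))  # hidden -> output
b2 = np.zeros((1, 1))
learning_rate = 1.0              # illustrative value, not tuned
losses = []

for epoch in range(20000):
    # Forward pass on all four samples at once
    h = sigmoid(X @ W1 + b1)     # shape (4, 4)
    out = sigmoid(h @ W2 + b2)   # shape (4, 1)
    losses.append(np.mean((out - y) ** 2))

    # Backward pass: chain rule through both sigmoid layers
    d_out = 2 * (out - y) / len(X) * out * (1 - out)  # dLoss/d(output pre-activation)
    d_h = (d_out @ W2.T) * h * (1 - h)                # dLoss/d(hidden pre-activation)

    # Gradient-descent step
    W2 -= learning_rate * (h.T @ d_out)
    b2 -= learning_rate * d_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * (X.T @ d_h)
    b1 -= learning_rate * d_h.sum(axis=0, keepdims=True)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("rounded outputs:", np.round(out).flatten())
```

Unlike the step-function version, each hidden unit here receives its own gradient through `W2`, so the hidden units are updated individually rather than all by the same amount.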
