## **Objective:**
Write a program to implement a multi-layer perceptron (MLP) network with one hidden layer using NumPy in Python. Demonstrate that it can learn the XOR Boolean function.

## **Description of the Model:**
- This is a Multi-Layer Perceptron (MLP) with a single hidden layer that uses a ReLU activation, followed by a single output neuron with a sigmoid activation.
- It is trained using backpropagation and attempts to learn the XOR function.
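
As a sketch, the forward pass and weight updates described above can be written out as follows. The symbols mirror the attribute names in the implementation (`W1`, `b1`, `W2`, `b2`, `z1`, `a1`, `z2`, `a2`), and $\eta$ stands for `lr`. The code does not name a loss function; the squared-error loss $L = \tfrac{1}{2}\sum_n (y^{(n)} - a_2^{(n)})^2$ is an assumption inferred from the `y - output` error term in `backward`.

```latex
\begin{aligned}
z_1 &= X W_1 + b_1, & a_1 &= \max(0, z_1), \\
z_2 &= a_1 W_2 + b_2, & a_2 &= \sigma(z_2) = \frac{1}{1 + e^{-z_2}}, \\[4pt]
\delta_2 &= (y - a_2) \odot a_2 \odot (1 - a_2), &
\delta_1 &= \left(\delta_2 W_2^{\top}\right) \odot \mathbb{1}[z_1 > 0], \\[4pt]
W_2 &\leftarrow W_2 + \eta\, a_1^{\top} \delta_2, &
b_2 &\leftarrow b_2 + \eta \sum_n \delta_2^{(n)}, \\
W_1 &\leftarrow W_1 + \eta\, X^{\top} \delta_1, &
b_1 &\leftarrow b_1 + \eta \sum_n \delta_1^{(n)}.
\end{aligned}
```

Under that assumption, $\delta_2 = -\partial L / \partial z_2$ and $\delta_1 = -\partial L / \partial z_1$, so adding $\eta$ times these terms is ordinary gradient descent.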

## **Python Implementation**

In [10]:
import numpy as np
from sklearn.metrics import confusion_matrix

class MLP_XOR:
    def __init__(self, input_size, hidden_size, lr=0.1, epochs=1000):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = 1  # Single output neuron for binary classification
        self.lr = lr
        self.epochs = epochs

        # He initialization (std = sqrt(2 / fan_in)), suited to the ReLU hidden layer
        self.W1 = np.random.randn(self.input_size, self.hidden_size) * np.sqrt(2.0 / self.input_size)
        self.b1 = np.zeros((1, self.hidden_size))
        self.W2 = np.random.randn(self.hidden_size, self.output_size) * np.sqrt(2.0 / self.hidden_size)
        self.b2 = np.zeros((1, self.output_size))

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, x):
        return (x > 0).astype(float)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # x is the sigmoid output a = sigmoid(z), not the pre-activation z
        return x * (1 - x)

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.relu(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2

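    def backward(self, X, y, output):
        # (y - output) is the negative gradient of 0.5 * sum((y - output)**2),
        # so the "+=" updates below perform gradient descent on that loss
        error = y - output
        d_output = error * self.sigmoid_derivative(output)
        error_hidden = d_output.dot(self.W2.T)
        # a1 > 0 exactly when z1 > 0, so a1 can stand in for z1 here
        d_hidden = error_hidden * self.relu_derivative(self.a1)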

        # Update weights and biases using learning rate
        self.W2 += self.a1.T.dot(d_output) * self.lr
        self.b2 += np.sum(d_output, axis=0, keepdims=True) * self.lr
        self.W1 += X.T.dot(d_hidden) * self.lr
        self.b1 += np.sum(d_hidden, axis=0, keepdims=True) * self.lr

    def train(self, X, y):
        for _ in range(self.epochs):
            output = self.forward(X)
            self.backward(X, y, output)

    def predict(self, X):
        return (self.forward(X) > 0.5).astype(int)

    def evaluate(self, X, y):
        predictions = self.predict(X)
        correct = np.sum(predictions == y)
        accuracy = (correct / len(y)) * 100
        print(f'Accuracy: {accuracy:.2f}%')

        # Compute confusion matrix
        cm = confusion_matrix(y, predictions)
        print("Confusion Matrix:")
        print(cm)

# XOR Truth Table
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([[0], [1], [1], [0]])

# Train MLP on XOR
print("Training MLP for XOR")
mlp_xor = MLP_XOR(input_size=2, hidden_size=4, lr=0.1, epochs=1000)
mlp_xor.train(X_xor, y_xor)
mlp_xor.evaluate(X_xor, y_xor)


Training MLP for XOR
Accuracy: 100.00%
Confusion Matrix:
[[2 0]
 [0 2]]
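
Because the weights are drawn at random, a single run that reports 100% does not show how often training succeeds. The sketch below repeats training over several seeds and counts how many runs classify all four XOR rows correctly. It assumes a compact re-implementation of the same forward and backward steps; the helper `train_xor` and the seed range are illustrative and not part of the code above. It uses `np.random.default_rng`, so it will not reproduce the exact weights of the run shown.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def train_xor(seed, hidden_size=4, lr=0.1, epochs=1000):
    """Hypothetical helper: train a 2-hidden-1 network, return accuracy in [0, 1]."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((2, hidden_size)) * np.sqrt(2.0 / 2)
    b1 = np.zeros((1, hidden_size))
    W2 = rng.standard_normal((hidden_size, 1)) * np.sqrt(2.0 / hidden_size)
    b2 = np.zeros((1, 1))

    for _ in range(epochs):
        # Forward pass: ReLU hidden layer, sigmoid output
        z1 = X @ W1 + b1
        a1 = np.maximum(0, z1)
        a2 = 1 / (1 + np.exp(-(a1 @ W2 + b2)))
        # Backward pass: same update rule as MLP_XOR.backward
        d_out = (y - a2) * a2 * (1 - a2)
        d_hid = (d_out @ W2.T) * (z1 > 0)
        W2 += lr * a1.T @ d_out
        b2 += lr * d_out.sum(axis=0, keepdims=True)
        W1 += lr * X.T @ d_hid
        b1 += lr * d_hid.sum(axis=0, keepdims=True)

    # Final forward pass and thresholding at 0.5
    a1 = np.maximum(0, X @ W1 + b1)
    a2 = 1 / (1 + np.exp(-(a1 @ W2 + b2)))
    preds = (a2 > 0.5).astype(int)
    return float(np.mean(preds == y))

accuracies = [train_xor(seed) for seed in range(10)]
solved = sum(acc == 1.0 for acc in accuracies)
print(f"Runs reaching 100% accuracy: {solved}/{len(accuracies)}")
```

The count will vary with the seeds, `hidden_size`, `lr`, and `epochs`; no particular success rate is implied here.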


## **Code Explanation:**
### **Code Explanation (Short Points)**  

1. **`MLP_XOR` Class**  
   - Implements a **multi-layer perceptron** with one hidden layer and a single sigmoid output neuron.  
   - Uses **random weight initialization** scaled by `sqrt(2 / fan_in)`; biases start at zero.  
   - Hidden size, learning rate, and epochs are configurable.  

2. **Activation Functions**  
   - The hidden layer uses **ReLU**; the output layer uses **sigmoid**, and `predict` thresholds its output at `0.5`.  

3. **Training Process**  
   - Iterates through **epochs**; each epoch runs a forward pass over all four samples and updates the weights by **backpropagation**.  
   - Updates the **bias terms** of both layers along with the weights.  

4. **Prediction Function**  
   - Runs the **forward pass** (weighted sums followed by the activation functions) and returns `1` where the output exceeds `0.5`, otherwise `0`.  

5. **Evaluation with Confusion Matrix**  
   - **Accuracy** is calculated as: (Correct Predictions / Total Samples) × 100  
   - **Confusion Matrix** is printed for detailed performance analysis.  

6. **Limitations and Improvements**  
   - A **single-layer perceptron** **cannot** solve **non-linearly separable** problems (like XOR).  
   - **Solution:** Use a **multi-layer perceptron (MLP)** with **hidden layers** and **non-linear activations** (ReLU, Sigmoid), as this implementation does.
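
The limitation in the last point can be checked directly. The sketch below trains a single-layer perceptron with a step activation on XOR using the perceptron learning rule. The variable names, seed, and epoch count are illustrative assumptions. Since XOR is not linearly separable, no setting of `w` and `b` can classify more than three of the four rows correctly, so the best accuracy never exceeds 0.75.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

rng = np.random.default_rng(0)   # illustrative seed
w = rng.standard_normal(2)
b = 0.0
lr = 0.1

best_accuracy = 0.0
for _ in range(100):
    for xi, target in zip(X, y):
        pred = int(xi @ w + b >= 0)        # step activation
        w += lr * (target - pred) * xi     # perceptron learning rule
        b += lr * (target - pred)
    # Accuracy on all four rows after this epoch
    preds = (X @ w + b >= 0).astype(int)
    best_accuracy = max(best_accuracy, float(np.mean(preds == y)))

print(f"Best accuracy over all epochs: {best_accuracy:.2f}")
```

Adding a hidden layer with a non-linear activation removes this ceiling, which is the motivation for the MLP used in this experiment.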

## **My Comments:**
1. A step activation function is not suitable for backpropagation because its gradient is zero almost everywhere, which is why this implementation uses ReLU and sigmoid instead.
2. This MLP can learn XOR, but the result depends on the random weight initialization, so convergence is not guaranteed on every run.
3. The weight update could be improved with momentum or adaptive learning rates.
4. Sigmoid and ReLU activations, as used here, allow better gradient flow and training efficiency than a step function.
5. The training loop already updates on the full batch of four samples in each epoch; on larger datasets, mini-batch updates would be the natural alternative to single-sample updates.
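
To illustrate the momentum idea from comment 3, here is a minimal sketch of a classical momentum update on a toy one-dimensional problem. The helper `momentum_step` and the coefficient `beta=0.9` are assumptions chosen for illustration; nothing like it exists in the implementation above.

```python
import numpy as np

def momentum_step(param, grad, velocity, lr=0.1, beta=0.9):
    """Hypothetical helper: one classical momentum update.

    Returns the updated parameter and the updated velocity.
    """
    velocity = beta * velocity - lr * grad
    return param + velocity, velocity

# Toy example: minimise f(w) = w**2, whose gradient is 2 * w
w = np.array([1.0])
v = np.zeros_like(w)
for _ in range(50):
    grad = 2 * w
    w, v = momentum_step(w, grad, v)

print(w)  # moves toward the minimum at 0
```

To apply this to `MLP_XOR`, one would keep a separate velocity array per parameter and pass the loss gradient, for example `-self.a1.T.dot(d_output)` for `W2`, since `backward` computes the negative gradient.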