In [None]:
import numpy as np

# Define the sigmoid activation function
def sigmoid(x):
  return 1 / (1 + np.exp(-x))

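# Derivative of the sigmoid, written in terms of the sigmoid's output.
# Pass in s = sigmoid(z), not the raw pre-activation z: d/dz sigmoid(z) = s * (1 - s)
def sigmoid_derivative(x):
  return x * (1 - x)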

# Input dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Output dataset
y = np.array([[0], [1], [1], [0]])

# Define the number of neurons in each layer
input_layer_neurons = 2
hidden_layer_neurons = 2
output_layer_neurons = 1

# Initialize weights and biases with random values
hidden_weights = np.random.uniform(size=(input_layer_neurons, hidden_layer_neurons))
hidden_bias = np.random.uniform(size=(1, hidden_layer_neurons))
output_weights = np.random.uniform(size=(hidden_layer_neurons, output_layer_neurons))
output_bias = np.random.uniform(size=(1, output_layer_neurons))

# Training parameters
learning_rate = 0.1
epochs = 10000

# Training the model
for epoch in range(epochs):
  # Forward propagation
  hidden_layer_activation = np.dot(X, hidden_weights) + hidden_bias
  hidden_layer_output = sigmoid(hidden_layer_activation)
  output_layer_activation = np.dot(hidden_layer_output, output_weights) + output_bias
  predicted_output = sigmoid(output_layer_activation)

  # Backpropagation
  error = y - predicted_output
  d_predicted_output = error * sigmoid_derivative(predicted_output)
  error_hidden_layer = d_predicted_output.dot(output_weights.T)
  d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

  # Update weights and biases
  output_weights += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
  output_bias += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
  hidden_weights += X.T.dot(d_hidden_layer) * learning_rate
  hidden_bias += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate

# Show the predictions from the final training epoch next to the targets
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(predicted_output))


### Multilayer Perceptron (MLP) to Simulate XOR Gate

A **Multilayer Perceptron (MLP)** is a type of artificial neural network (ANN) composed of multiple layers of neurons: an input layer, one or more hidden layers, and an output layer. Each layer is fully connected to the next, and information flows in one direction, from input to output. Because the hidden layers apply non-linear activations, MLPs can solve problems that are not linearly separable, such as the XOR problem.

#### XOR Gate and Its Non-Linearity

The **XOR (Exclusive OR)** gate is a logic gate that outputs true (1) when its two inputs differ and false (0) when they are the same. The XOR function is **not linearly separable**: no single straight line in the 2D input plane puts the points (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other. This makes XOR a classic example of the power of multilayer networks, since a single-layer perceptron, which can only draw one such line, cannot solve it.

| Input A | Input B | XOR Output |
|---------|---------|------------|
|    0    |    0    |      0     |
|    0    |    1    |      1     |
|    1    |    0    |      1     |
|    1    |    1    |      0     |
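
To make the separability claim concrete, the sketch below searches a coarse grid of weights and biases for a single threshold unit that reproduces each gate. The grid range and step, and the helper name `separable_on_grid`, are assumptions chosen for illustration. A grid search is not a proof, but it agrees with the algebra: XOR would need `b <= 0`, `w1 + b > 0`, `w2 + b > 0` and `w1 + w2 + b <= 0` at once, and the middle two imply `w1 + w2 + b > -b >= 0`, a contradiction.

```python
import itertools
import numpy as np

# The four binary input pairs, in the same order as the truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

# Candidate values for w1, w2 and b: -2.0 to 2.0 in steps of 0.25 (an arbitrary choice)
grid = np.linspace(-2, 2, 17)

def separable_on_grid(target):
    """Return True if some unit 'w1*A + w2*B + b > 0' on the grid matches target."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(pred, target):
            return True
    return False

results = {name: separable_on_grid(t) for name, t in targets.items()}
print(results)  # {'AND': True, 'OR': True, 'XOR': False}
```

AND and OR are each matched by a single line, while no candidate on the grid matches XOR.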

#### Architecture of MLP for XOR
The MLP used to simulate the XOR gate consists of:
1. **Input Layer**: 2 neurons, corresponding to the two inputs (A, B).
2. **Hidden Layer**: 2 neurons to capture the non-linearity of the XOR function.
3. **Output Layer**: 1 neuron to produce the XOR result.
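
As a rough sketch of what this architecture means in terms of arrays, the snippet below builds weight and bias arrays for a 2-2-1 network and counts the trainable parameters. The names `layer_sizes`, `weights` and `biases` and the fixed seed are illustrative assumptions; the shapes follow the same convention as the code cell above (rows are inputs to a layer, columns are its neurons).

```python
import numpy as np

# Neurons per layer: input, hidden, output
layer_sizes = [2, 2, 1]

rng = np.random.default_rng(0)  # seed fixed only to make the sketch repeatable

# One weight matrix and one bias row per pair of adjacent layers
weights = [rng.uniform(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.uniform(size=(1, n_out)) for n_out in layer_sizes[1:]]

n_params = sum(w.size for w in weights) + sum(b.size for b in biases)

print([w.shape for w in weights])  # [(2, 2), (2, 1)]
print([b.shape for b in biases])   # [(1, 2), (1, 1)]
print(n_params)                    # 9
```

The network therefore has 4 + 2 weights and 2 + 1 biases, nine parameters in total.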

#### Steps in MLP for XOR Simulation

1. **Initialization**:
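   - The weights of both the hidden and output layers are initialized randomly (the code draws them uniformly from [0, 1)). Each neuron also gets a bias, which shifts its activation threshold so that its decision boundary does not have to pass through the origin.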

2. **Forward Propagation**:
   - Inputs are fed into the network, and the weighted sums of inputs plus biases are computed for each neuron in the hidden layer.
   - A non-linear activation function like **sigmoid** is applied to these sums, producing outputs for the hidden layer.
   - These hidden layer outputs are passed to the output layer, where another weighted sum and sigmoid activation produce the final output.

3. **Backpropagation**:
   - The network compares the predicted output with the actual XOR output and computes the error.
   - Backpropagation applies the chain rule to this error to obtain the gradient of the loss with respect to every weight and bias. **Gradient descent** then moves each parameter a small step against its gradient, which reduces the error.

4. **Training**:
   - The network repeats forward propagation and backpropagation over many **epochs** (full passes over the four training examples; 10,000 in the code) to reduce the error. The learning rate (0.1 in the code) controls how far the weights move in each step.

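After sufficient training, the network's outputs are typically close to the expected XOR values for all four input pairs. With only two hidden neurons, an unlucky random initialization can leave training stuck at a poor solution, in which case rerunning with fresh random weights usually helps.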
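
One way to sanity-check the backpropagation step is to compare its update directions with numerical gradients. The sketch below assumes the loss being minimized is the half sum of squared errors, `0.5 * sum((y - p)**2)`, which is the loss whose negative gradient matches the update formulas in the code cell above. The helper names (`forward`, `loss`, `numeric_grad`) and the seed are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)   # hidden layer output
    p = sigmoid(h @ W2 + b2)   # predicted output
    return h, p

def loss(X, y, W1, b1, W2, b2):
    _, p = forward(X, W1, b1, W2, b2)
    return 0.5 * np.sum((y - p) ** 2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.uniform(size=(2, 2)), rng.uniform(size=(1, 2))
W2, b2 = rng.uniform(size=(2, 1)), rng.uniform(size=(1, 1))

# Backpropagation, same formulas as the training loop
h, p = forward(X, W1, b1, W2, b2)
d_out = (y - p) * p * (1 - p)
d_hidden = (d_out @ W2.T) * h * (1 - h)
step_W2 = h.T @ d_out      # direction added to the output weights
step_W1 = X.T @ d_hidden   # direction added to the hidden weights

def numeric_grad(param, eps=1e-6):
    """Central-difference gradient of the loss with respect to one parameter array."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        loss_plus = loss(X, y, W1, b1, W2, b2)
        param[idx] = old - eps
        loss_minus = loss(X, y, W1, b1, W2, b2)
        param[idx] = old   # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# The update directions should equal the negative gradient of the loss
max_diff_W1 = np.max(np.abs(step_W1 + numeric_grad(W1)))
max_diff_W2 = np.max(np.abs(step_W2 + numeric_grad(W2)))
print(max_diff_W1, max_diff_W2)
```

Both differences should be tiny (limited by floating-point error), which indicates that adding `step * learning_rate` to the weights moves them downhill on this loss.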

#### Diagram of MLP for XOR Gate
Below is a simplified diagram of the MLP architecture used for the XOR gate:

```
   Input Layer       Hidden Layer      Output Layer
   (A, B)            (H1, H2)          (Output)
    A ----(w1)----> H1 ----(w5)---->  XOR
    A ----(w2)----> H2 ----(w6)---->  XOR
    B ----(w3)----> H1
    B ----(w4)----> H2
```

Here `w1, w2, w3, ...` are the weights between the layers (biases are not drawn). The hidden neurons H1 and H2 apply the sigmoid activation, which is what lets the network represent a non-linear decision boundary; the output neuron combines their outputs to produce the XOR result.
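
To illustrate how two hidden neurons are enough, the sketch below sets the weights by hand rather than training them. These particular values are an assumption chosen for illustration, not what training necessarily finds: H1 behaves roughly like OR, H2 roughly like AND, and the output neuron fires when H1 is on and H2 is off.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked (hypothetical) parameters; large magnitudes saturate the sigmoid
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])       # columns: H1, H2
b1 = np.array([[-10.0, -30.0]])     # H1 ~ OR(A, B), H2 ~ AND(A, B)
W2 = np.array([[20.0],
               [-20.0]])            # output ~ H1 AND NOT H2
b2 = np.array([[-10.0]])

hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

hidden_bits = np.round(hidden).astype(int)
predictions = np.round(output).astype(int).ravel()

print(hidden_bits.tolist())  # [[0, 0], [1, 0], [1, 0], [1, 1]]
print(predictions.tolist())  # [0, 1, 1, 0]
```

The hidden layer remaps the four inputs so that the two "different inputs" cases land on the same hidden point, after which a single output neuron can separate them.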

### Conclusion
The XOR problem showcases the strength of MLPs on problems that are not linearly separable. With a hidden layer and a non-linear activation, the MLP can learn to reproduce the behavior of an XOR gate.

https://chatgpt.com/share/6715bf48-48a4-8008-915e-c0b38ce59d6f