In this homework, you will implement a fully functioning feedforward neural network in Python using only the NumPy library. You will not use any deep learning frameworks such as TensorFlow or PyTorch. Your goal is to train your neural network to learn the XOR function.
Instructions
1. Generate the training data for the XOR function using the following code snippet:
----------------------------------------------------
import numpy as np

def generate_xor_data():
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    Y = np.array([[0], [1], [1], [0]])
    return X, Y

X, Y = generate_xor_data()
-------------------------------------------------------
2. Create a Python class called `NeuralNetwork` that will contain your neural network implementation. Your class should follow the template provided below:
------------------------------------------------------
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the weights and biases of the network
        pass

    def forward(self, X):
        # Perform the forward pass and compute the output of the network
        pass

    def backward(self, X, Y, output):
        # Perform the backward pass and compute the gradients of the loss function
        pass

    def update_weights(self, learning_rate):
        # Update the weights and biases using the computed gradients
        pass

    def train(self, X, Y, epochs, learning_rate):
        # Train the neural network using the provided training data
        pass

    def predict(self, X):
        # Make predictions using the trained neural network
        pass
--------------------------------------------------------
3. Implement the methods in the `NeuralNetwork` class following these guidelines:
   - For weight initialization, use either Xavier or He initialization.
   - Implement the forward pass, using the ReLU activation function for the hidden layer and the sigmoid activation function for the output layer.
   - Implement the backward pass using the chain rule and the gradients of the loss function with respect to the weights and biases.
   - Use gradient descent or a variant thereof (e.g., mini-batch gradient descent, Adam, etc.) to update the weights and biases.
   - Train your neural network on the XOR dataset, adjusting the number of epochs and learning rate as needed.
4. Evaluate your trained neural network using the `predict` method and check whether the predicted outputs match the actual XOR function outputs. You may also use plots to help you visualize your results.
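The two initialization schemes named in step 3 differ only in how the random weights are scaled. The following is a minimal sketch, assuming weight matrices of shape (fan_in, fan_out) and normally distributed weights. The helper names `he_init` and `xavier_init` and the fixed seed are illustrative assumptions, not part of the required template.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    # He initialization: standard deviation sqrt(2 / fan_in), commonly paired with ReLU
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

def xavier_init(fan_in, fan_out, rng):
    # Xavier (Glorot) initialization, normal variant:
    # standard deviation sqrt(2 / (fan_in + fan_out))
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / (fan_in + fan_out))

rng = np.random.default_rng(0)  # fixed seed (an assumption) so runs are repeatable
W1 = he_init(2, 4, rng)         # input -> hidden layer (ReLU)
W2 = xavier_init(4, 1, rng)     # hidden -> output layer (sigmoid)
print(W1.shape, W2.shape)
```

Either scheme satisfies the guideline. Pairing He with the ReLU layer and Xavier with the sigmoid layer is one common choice rather than a requirement.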

Submission
Submit a Python file containing your NeuralNetwork class implementation along with the code to generate the XOR data and train your neural network. Your submission should include comments to explain your implementation and any design choices you made. Additionally, include a brief report summarizing your results, including the network architecture, the number of epochs, the learning rate, and the final predictions.

1. XOR Data Generation

In [1]:
import numpy as np

def generate_xor_data():
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # XOR inputs
    Y = np.array([[0], [1], [1], [0]])  # XOR outputs
    return X, Y

X, Y = generate_xor_data()
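As a quick sanity check (an addition, not something the assignment asks for), the labels can be compared against NumPy's `np.logical_xor` applied to the two input columns. The arrays are repeated here so the snippet runs on its own; the names `expected` and `labels_ok` are illustrative.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])

# XOR of the two input columns, reshaped to a column vector like Y
expected = np.logical_xor(X[:, 0], X[:, 1]).astype(int).reshape(-1, 1)
labels_ok = np.array_equal(Y, expected)
print(labels_ok)
```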


2. Neural Network Class Definition

In [2]:
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the weights with He initialization and the biases with zeros
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Weights and biases for input to hidden layer
        self.W1 = np.random.randn(self.input_size, self.hidden_size) * np.sqrt(2. / self.input_size)  # He Initialization
        self.b1 = np.zeros((1, self.hidden_size))

        # Weights and biases for hidden to output layer
        self.W2 = np.random.randn(self.hidden_size, self.output_size) * np.sqrt(2. / self.hidden_size)  # He Initialization
        self.b2 = np.zeros((1, self.output_size))

    def forward(self, X):
        # Forward pass to compute output using ReLU for hidden layer and Sigmoid for output layer
        self.Z1 = np.dot(X, self.W1) + self.b1  # Weighted sum for hidden layer
        self.A1 = np.maximum(0, self.Z1)  # ReLU activation for hidden layer

        self.Z2 = np.dot(self.A1, self.W2) + self.b2  # Weighted sum for output layer
        self.A2 = 1 / (1 + np.exp(-self.Z2))  # Sigmoid activation for output layer

        return self.A2

    def backward(self, X, Y, output):
        # Backward pass to compute gradients using chain rule

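        # Error term: derivative of the squared-error loss 0.5 * sum((output - Y)**2)
        # with respect to the output (summed over the samples, not averaged)
        output_error = output - Y

        # Compute gradient for weights and biases between hidden and output layer
        dZ2 = output_error * output * (1 - output)  # Chain rule through the sigmoid: sigma'(z) = a * (1 - a)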
        dW2 = np.dot(self.A1.T, dZ2)  # Gradient for W2
        db2 = np.sum(dZ2, axis=0, keepdims=True)  # Gradient for b2

        # Compute gradient for weights and biases between input and hidden layer
        dA1 = np.dot(dZ2, self.W2.T)  # Backpropagated error to hidden layer
        dZ1 = dA1 * (self.Z1 > 0)  # Derivative of ReLU
        dW1 = np.dot(X.T, dZ1)  # Gradient for W1
        db1 = np.sum(dZ1, axis=0, keepdims=True)  # Gradient for b1

        return dW1, db1, dW2, db2

    def update_weights(self, dW1, db1, dW2, db2, learning_rate):
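        # Update the weights and biases using gradient descent.
        # Note: the gradients are passed in explicitly, which extends the
        # template's update_weights(self, learning_rate) signature.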
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2

    def train(self, X, Y, epochs, learning_rate):
        # Train the neural network on the XOR dataset
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)

            # Backward pass
            dW1, db1, dW2, db2 = self.backward(X, Y, output)

            # Update weights
            self.update_weights(dW1, db1, dW2, db2, learning_rate)

            if epoch % 1000 == 0:
                loss = np.mean(np.square(Y - output))  # Mean squared error, computed from the output before this epoch's update
                print(f"Epoch {epoch}, Loss: {loss}")

    def predict(self, X):
        # Make predictions with the trained neural network
        output = self.forward(X)
        return (output > 0.5).astype(int)  # Return 1 if output > 0.5, else 0
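The gradients computed in `backward` can be written out with the chain rule. The sketch below assumes the loss is half the summed squared error, which is the loss whose gradient matches the `output - Y` term in the code; the mean squared error printed during training differs from it only by a constant factor. Here $\sigma$ is the sigmoid, $\odot$ is the element-wise product, $\mathbf{1}$ is a column of ones over the samples, and $[Z_1 > 0]$ is 1 where the condition holds and 0 elsewhere.

```latex
\begin{aligned}
L &= \tfrac{1}{2}\sum_{i}\bigl(A_2^{(i)} - Y^{(i)}\bigr)^2, \qquad
A_2 = \sigma(Z_2),\quad Z_2 = A_1 W_2 + b_2,\quad
A_1 = \max(0, Z_1),\quad Z_1 = X W_1 + b_1 \\
\frac{\partial L}{\partial Z_2} &= (A_2 - Y)\odot A_2 \odot (1 - A_2) \\
\frac{\partial L}{\partial W_2} &= A_1^{\top}\,\frac{\partial L}{\partial Z_2}, \qquad
\frac{\partial L}{\partial b_2} = \mathbf{1}^{\top}\,\frac{\partial L}{\partial Z_2} \\
\frac{\partial L}{\partial A_1} &= \frac{\partial L}{\partial Z_2}\,W_2^{\top}, \qquad
\frac{\partial L}{\partial Z_1} = \frac{\partial L}{\partial A_1}\odot [Z_1 > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial Z_1}, \qquad
\frac{\partial L}{\partial b_1} = \mathbf{1}^{\top}\,\frac{\partial L}{\partial Z_1}
\end{aligned}
```

Each line corresponds to one assignment in `backward`: `dZ2`, then `dW2` and `db2`, then `dA1` and `dZ1`, then `dW1` and `db1`.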


3. Training and Testing the Model

In [3]:
# Generate XOR data
X, Y = generate_xor_data()

# Create the neural network instance
input_size = 2  # Number of input features (for XOR: two inputs)
hidden_size = 4  # Number of neurons in the hidden layer (can be adjusted)
output_size = 1  # Output size (one output for XOR)
learning_rate = 0.1
epochs = 10000

# Initialize the neural network
nn = NeuralNetwork(input_size, hidden_size, output_size)

# Train the neural network
nn.train(X, Y, epochs, learning_rate)

# Evaluate the trained network on the four XOR inputs
predictions = nn.predict(X)

# Print the results
print("Predictions:")
print(predictions)
print("Actual XOR Output:")
print(Y)


Epoch 0, Loss: 0.34787185098008183
Epoch 1000, Loss: 0.006868755653009804
Epoch 2000, Loss: 0.0022212607090746433
Epoch 3000, Loss: 0.0012232824413356677
Epoch 4000, Loss: 0.0008275265037609213
Epoch 5000, Loss: 0.0006195756618468821
Epoch 6000, Loss: 0.0004923920127441533
Epoch 7000, Loss: 0.00040713900528488103
Epoch 8000, Loss: 0.0003463006931797388
Epoch 9000, Loss: 0.0003007225173488562
Predictions:
[[0]
 [1]
 [1]
 [0]]
Actual XOR Output:
[[0]
 [1]
 [1]
 [0]]
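Matching predictions do not by themselves confirm that the backward pass is correct. One common way to check it is to compare the analytic gradients against central finite differences of the loss. The following is a minimal standalone sketch, not part of the original notebook: it re-implements the same architecture as plain functions (`forward`, `loss`, `analytic_grads`, `numerical_grads` are hypothetical names) and assumes the half summed squared-error loss that matches the `output - Y` term. The hidden biases are set to small nonzero values here, because with zero biases the input `[0, 0]` sits exactly at the ReLU kink, where finite differences are unreliable.

```python
import numpy as np

def forward(params, X):
    # Same architecture as the class: ReLU hidden layer, sigmoid output
    W1, b1, W2, b2 = params
    Z1 = X @ W1 + b1
    A1 = np.maximum(0, Z1)
    A2 = 1 / (1 + np.exp(-(A1 @ W2 + b2)))
    return Z1, A1, A2

def loss(params, X, Y):
    # Half the summed squared error (assumed loss, see the note above)
    return 0.5 * np.sum((forward(params, X)[2] - Y) ** 2)

def analytic_grads(params, X, Y):
    # Mirrors the backward pass of the NeuralNetwork class
    W1, b1, W2, b2 = params
    Z1, A1, A2 = forward(params, X)
    dZ2 = (A2 - Y) * A2 * (1 - A2)
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)
    return [X.T @ dZ1, dZ1.sum(axis=0, keepdims=True),
            A1.T @ dZ2, dZ2.sum(axis=0, keepdims=True)]

def numerical_grads(params, X, Y, eps=1e-6):
    # Central differences: perturb one parameter entry at a time
    grads = [np.zeros_like(p) for p in params]
    for p, g in zip(params, grads):
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss(params, X, Y)
            p[idx] = old - eps
            minus = loss(params, X, Y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
    return grads

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
params = [rng.standard_normal((2, 4)), 0.1 * rng.standard_normal((1, 4)),
          rng.standard_normal((4, 1)), np.zeros((1, 1))]

max_diff = max(np.max(np.abs(a - n))
               for a, n in zip(analytic_grads(params, X, Y),
                               numerical_grads(params, X, Y)))
print(f"Largest analytic vs. numerical difference: {max_diff:.2e}")
```

A difference many orders of magnitude smaller than the gradients themselves suggests the backward pass is consistent with the assumed loss; a large difference usually points to a sign, transpose, or activation-derivative mistake.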
