# Detailed Comments Explanation

Sigmoid and Derivative:

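sigmoid(x) computes the sigmoid activation σ(x) = 1 / (1 + e^(−x)), which maps any real-valued input into the open interval (0, 1). Inputs are clipped to [−500, 500] before exponentiation so that np.exp cannot overflow. sigmoid_derivative returns the derivative of the sigmoid, σ′(x) = σ(x)(1 − σ(x)). It expects the already-computed sigmoid output σ(x), not the raw input x, which is why backpropagation applies it to the stored activations.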
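As a sanity check of the identity σ′(x) = σ(x)(1 − σ(x)), and of why sigmoid_derivative must receive the sigmoid output rather than the raw input, the sketch below compares it with a central finite difference. The two helpers mirror the notebook's functions; the test grid and step size h are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    # Same clipped sigmoid as in the notebook code
    x = np.clip(x, -500, 500)
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Expects s = sigmoid(x), not x itself
    return s * (1 - s)

x = np.linspace(-5, 5, 11)   # -5, -4, ..., 5
h = 1e-5

# Central finite-difference approximation of d(sigmoid)/dx
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative(sigmoid(x))
max_err = np.max(np.abs(numeric - analytic))
print(f"max abs error: {max_err:.2e}")

# Passing the raw input instead of sigmoid(x) gives wrong values
wrong = sigmoid_derivative(x)
```

At x = 0 the correct derivative is 0.5 × 0.5 = 0.25, while feeding x directly returns 0 there and large negative numbers at the ends of the grid.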

Neural Network Class:

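The NeuralNetwork class stores one weight matrix and one bias row vector per layer (any number of hidden layers, plus the output layer) and provides the forward, backward, and train methods. Weights use Xavier/Glorot uniform initialization, drawn from U(−limit, limit) with limit = sqrt(6 / (fan_in + fan_out)); biases start at zero.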
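To make the parameter layout concrete, here is a minimal sketch of the same initialization written as a standalone function. The name xavier_init and the 2-4-3-1 architecture are hypothetical, chosen only so the resulting shapes can be inspected; a seeded numpy Generator replaces the class's global np.random calls.

```python
import numpy as np

def xavier_init(layer_sizes, rng):
    """Build weight/bias lists the same way NeuralNetwork.__init__ does."""
    weights, biases = [], []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = np.sqrt(6 / (fan_in + fan_out))       # Xavier/Glorot uniform bound
        weights.append(rng.uniform(-limit, limit, (fan_in, fan_out)))
        biases.append(np.zeros((1, fan_out)))          # one bias row per layer
    return weights, biases

rng = np.random.default_rng(0)
# Hypothetical architecture: 2 inputs, hidden layers of 4 and 3, 1 output
weights, biases = xavier_init([2, 4, 3, 1], rng)
for W, b in zip(weights, biases):
    print(W.shape, b.shape)
# (2, 4) (1, 4)
# (4, 3) (1, 3)
# (3, 1) (1, 1)
```

Each weight matrix has shape (fan_in, fan_out), so a batch of row vectors can be multiplied on the left with no transposes in the forward pass.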

Forward Pass:

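The forward pass starts from a_0 = X and, for each layer l, computes z_l = a_(l−1) W_l + b_l followed by a_l = sigmoid(z_l). Every activation, including the input itself, is stored in self.activations for reuse during backpropagation, and the last one is returned as the network's output. A 1-D input is first reshaped into a single-row batch.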
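The loop can be seen in isolation with a tiny 2-2-1 network. The weights below are hand-picked for illustration, not learned and not taken from the notebook: hidden unit 1 behaves like OR, hidden unit 2 like AND, and the output computes "OR and not AND", which is XOR. This shows that a network of this shape can represent XOR at all.

```python
import numpy as np

def sigmoid(x):
    x = np.clip(x, -500, 500)
    return 1 / (1 + np.exp(-x))

def forward(X, weights, biases):
    """Return the list of activations [a_0 = X, a_1, ..., a_L]."""
    activations = [X]
    for W, b in zip(weights, biases):
        z = activations[-1] @ W + b   # affine transform; b broadcasts over rows
        activations.append(sigmoid(z))
    return activations

# Hand-picked weights: column 0 of W1 ~ OR, column 1 ~ AND
W1 = np.array([[20.0, 20.0], [20.0, 20.0]])
b1 = np.array([[-10.0, -30.0]])
# Output ~ OR AND NOT(AND) = XOR
W2 = np.array([[20.0], [-20.0]])
b2 = np.array([[-10.0]])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
acts = forward(X, [W1, W2], [b1, b2])
print(np.round(acts[-1], 4).ravel())
```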

Backward Pass (Backpropagation):

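The backward pass first computes the output error y − ŷ and multiplies it elementwise by the sigmoid derivative of the output activations, giving the output delta. Each delta is then propagated to the previous layer by multiplying it by the transposed weight matrix and by that layer's sigmoid derivative. For every layer, the weight gradient is a_(l−1)ᵀ δ_l and the bias gradient is the column sum of δ_l. Both are clipped elementwise to [−1, 1] and added to the parameters, scaled by the learning rate. Because the error is defined as y − ŷ, these quantities are the negative gradient of the squared error, so adding them performs gradient descent.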
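One way to convince yourself that this update is a true gradient step is a finite-difference gradient check. The sketch below reimplements the same delta recursion without clipping and compares the first layer's update with the numerically estimated negative gradient of the loss 0.5 × Σ(y − ŷ)². The function names, the 2-3-1 network, and the random seed are assumptions made for the check; the notebook itself does not include one.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(weights, biases, X, y):
    """0.5 * sum of squared errors over the batch."""
    a = X
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return 0.5 * np.sum((y - a) ** 2)

def backprop(weights, biases, X, y):
    """Return the NEGATIVE gradients (the quantities backward() adds), unclipped."""
    acts = [X]
    for W, b in zip(weights, biases):
        acts.append(sigmoid(acts[-1] @ W + b))
    delta = (y - acts[-1]) * acts[-1] * (1 - acts[-1])
    w_grads, b_grads = [], []
    for i in range(len(weights) - 1, -1, -1):
        w_grads.insert(0, acts[i].T @ delta)
        b_grads.insert(0, delta.sum(axis=0, keepdims=True))
        if i > 0:
            delta = (delta @ weights[i].T) * acts[i] * (1 - acts[i])
    return w_grads, b_grads

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros((1, 3)), np.zeros((1, 1))]

w_grads, _ = backprop(weights, biases, X, y)

# Central finite differences on every entry of the first weight matrix
h = 1e-6
numeric = np.zeros_like(weights[0])
for idx in np.ndindex(weights[0].shape):
    saved = weights[0][idx]
    weights[0][idx] = saved + h
    plus = loss(weights, biases, X, y)
    weights[0][idx] = saved - h
    minus = loss(weights, biases, X, y)
    weights[0][idx] = saved                      # restore the entry
    numeric[idx] = -(plus - minus) / (2 * h)     # negative gradient

max_diff = np.max(np.abs(numeric - w_grads[0]))
print(f"max difference: {max_diff:.2e}")
```

The clipping in the notebook's backward method bounds the step size but can change its direction whenever a gradient entry exceeds 1 in magnitude, so a check like this is only meaningful on the unclipped values.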

Training Method:

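The network is trained for a fixed number of epochs. At the start of each epoch the samples are shuffled and split into mini-batches of batch_size rows (the whole dataset when batch_size is None), and each mini-batch gets a forward pass followed by a backward pass. When verbose is true, the mean squared error and the classification accuracy (outputs thresholded at 0.5) over the full dataset are printed at epoch 0 and every 1000 epochs after that.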
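The shuffling and slicing logic can be isolated as below. The helper name minibatch_indices and the sizes (10 samples, batch size 4) are hypothetical; the point is that range(0, n_samples, batch_size) leaves a smaller final batch when n_samples is not a multiple of batch_size.

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, rng):
    """Mirror the shuffle-then-slice loop in train(); return each batch's indices."""
    indices = rng.permutation(n_samples)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

rng = np.random.default_rng(42)
batches = minibatch_indices(10, 4, rng)
print([len(b) for b in batches])  # -> [4, 4, 2]
```

With the XOR setup (4 samples, batch_size=4) there is exactly one batch per epoch, so each epoch performs a single full-batch update.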

Main Program:

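The main program defines the XOR dataset: the four combinations of two binary inputs, each labeled with their XOR. It then prompts for the number of hidden layers and the number of neurons in each, rejecting values below 1. The network is built with 2 input neurons, the requested hidden layers, and 1 output neuron, and is trained on the XOR data for 10,000 epochs with a learning rate of 0.1 and a batch size of 4 (full-batch updates on the four samples). Non-integer or non-positive entries raise a ValueError, which is caught and printed as an error message.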
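Because input() blocks in non-interactive runs (scripts, CI, batch notebook execution), one option is to read the layer sizes from a single comma-separated string instead. The helper parse_hidden_sizes below is hypothetical, not part of the notebook; it applies the same checks and error messages as the interactive prompts.

```python
def parse_hidden_sizes(spec):
    """Hypothetical helper: turn "2,5,4" into [2, 5, 4] with the main program's checks."""
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    if not parts:
        raise ValueError("Number of hidden layers must be at least 1")
    sizes = []
    for i, p in enumerate(parts, start=1):
        neurons = int(p)  # raises ValueError for non-integers, like int(input(...))
        if neurons < 1:
            raise ValueError(f"Layer {i} must have at least 1 neuron")
        sizes.append(neurons)
    return sizes

print(parse_hidden_sizes("2, 5, 4"))  # -> [2, 5, 4]
```

The returned list can be passed directly as hidden_layers_sizes when constructing NeuralNetwork.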

Output:

After training, the network is tested on the same XOR inputs, and the predictions are printed.
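The accuracy metric used in train() thresholds each output at 0.5. Applying the same rule and the same MSE formula to the final predictions recorded at the end of this notebook's run (values copied from that output) gives the sketch below. Those predictions were taken after all 10,000 epochs, later than the last logged progress line at epoch 9000.

```python
import numpy as np

# Final predictions and targets copied from the recorded run in this notebook
predictions = np.array([[0.1528], [0.6900], [0.6779], [0.4535]])
y = np.array([[0], [1], [1], [0]])

labels = (predictions > 0.5).astype(int)     # same 0.5 threshold as train()
accuracy = np.mean(labels == y)
mse = np.mean(np.square(y - predictions))    # same loss formula as train()
print(labels.ravel(), f"accuracy={accuracy:.2%}", f"mse={mse:.4f}")
```

All four outputs fall on the correct side of the threshold, although the prediction for [1 1] (0.4535) is close to 0.5, so the margin is small.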

import numpy as np

def sigmoid(x):
    """
    Numerically stable sigmoid activation function.
    """
    # Clip values to avoid overflow
    x = np.clip(x, -500, 500)
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """
    Derivative of the sigmoid function, expressed in terms of its output.
    Expects x = sigmoid(z) (an already-computed activation), not the raw input z.
    """
    # For s = sigmoid(z): d(sigmoid)/dz = s * (1 - s)
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, input_size, hidden_layers_sizes, output_size):
        self.input_size = input_size
        self.hidden_layers_sizes = hidden_layers_sizes
        self.output_size = output_size

        # Use Xavier/Glorot initialization for better convergence
        self.weights = []
        self.biases = []

        prev_size = self.input_size
        for size in self.hidden_layers_sizes:
            # Xavier initialization
            limit = np.sqrt(6 / (prev_size + size))
            self.weights.append(np.random.uniform(-limit, limit, (prev_size, size)))
            self.biases.append(np.zeros((1, size)))  # Initialize biases to zero
            prev_size = size

        # Output layer initialization
        limit = np.sqrt(6 / (prev_size + self.output_size))
        self.weights.append(np.random.uniform(-limit, limit, (prev_size, self.output_size)))
        self.biases.append(np.zeros((1, self.output_size)))

    def forward(self, X):
        """
        Forward pass with input validation.
        """
        # Input validation
        if not isinstance(X, np.ndarray):
            X = np.array(X)
        if X.ndim == 1:
            X = X.reshape(1, -1)

        self.activations = [X]

        # Forward pass through all layers
        for i in range(len(self.weights)):
            z = np.dot(self.activations[i], self.weights[i]) + self.biases[i]
            a = sigmoid(z)
            self.activations.append(a)

        return self.activations[-1]

    def backward(self, X, y, learning_rate):
        """
        Backward pass with gradient clipping.
        """
        # Convert y to numpy array if needed
        if not isinstance(y, np.ndarray):
            y = np.array(y)
        if y.ndim == 1:
            y = y.reshape(-1, 1)

        # Initialize lists to store gradients
        weight_gradients = []
        bias_gradients = []

        # Output layer error
        error = y - self.activations[-1]
        delta = error * sigmoid_derivative(self.activations[-1])

        # Backpropagate through layers
        for i in range(len(self.weights) - 1, -1, -1):
            # Calculate gradients
            weight_grad = np.dot(self.activations[i].T, delta)
            bias_grad = np.sum(delta, axis=0, keepdims=True)

            # Gradient clipping
            weight_grad = np.clip(weight_grad, -1, 1)
            bias_grad = np.clip(bias_grad, -1, 1)

            # Store gradients
            weight_gradients.insert(0, weight_grad)
            bias_gradients.insert(0, bias_grad)

            # Calculate delta for next layer
            if i > 0:
                delta = np.dot(delta, self.weights[i].T) * sigmoid_derivative(self.activations[i])

        # Update weights and biases
        for i in range(len(self.weights)):
            self.weights[i] += learning_rate * weight_gradients[i]
            self.biases[i] += learning_rate * bias_gradients[i]

    def train(self, X, y, epochs, learning_rate, batch_size=None, verbose=True):
        """
        Training with mini-batch support and better monitoring.
        """
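        # Convert inputs to numpy arrays; make y a column vector so that
        # y - predictions has shape (n_samples, 1) instead of broadcasting
        X = np.array(X)
        y = np.array(y)
        if y.ndim == 1:
            y = y.reshape(-1, 1)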

        if batch_size is None:
            batch_size = len(X)

        n_samples = len(X)

        for epoch in range(epochs):
            # Shuffle data
            indices = np.random.permutation(n_samples)
            X_shuffled = X[indices]
            y_shuffled = y[indices]

            # Mini-batch training
            for i in range(0, n_samples, batch_size):
                batch_X = X_shuffled[i:i + batch_size]
                batch_y = y_shuffled[i:i + batch_size]

                self.forward(batch_X)
                self.backward(batch_X, batch_y, learning_rate)

            # Print progress
            if verbose and epoch % 1000 == 0:
                predictions = self.forward(X)
                mse = np.mean(np.square(y - predictions))
                accuracy = np.mean((predictions > 0.5) == y)
                print(f"Epoch {epoch} - Loss: {mse:.6f} - Accuracy: {accuracy:.2%}")

if __name__ == "__main__":
    # XOR problem setup
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])

    try:
        # Input validation
        num_hidden_layers = int(input("Enter number of hidden layers: "))
        if num_hidden_layers < 1:
            raise ValueError("Number of hidden layers must be at least 1")

        hidden_layers_sizes = []
        for i in range(num_hidden_layers):
            neurons = int(input(f"Enter number of neurons in hidden layer {i+1}: "))
            if neurons < 1:
                raise ValueError(f"Layer {i+1} must have at least 1 neuron")
            hidden_layers_sizes.append(neurons)

        # Create and train network
        nn = NeuralNetwork(input_size=2, hidden_layers_sizes=hidden_layers_sizes, output_size=1)
        nn.train(X, y, epochs=10000, learning_rate=0.1, batch_size=4)

        # Test network
        predictions = nn.forward(X)
        print("\nFinal Predictions:")
        for input_data, pred, target in zip(X, predictions, y):
            print(f"Input: {input_data}, Predicted: {pred[0]:.4f}, Target: {target[0]}")

    except ValueError as e:
        print(f"Error: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

Enter number of hidden layers: 3
Enter number of neurons in hidden layer 1: 2
Enter number of neurons in hidden layer 2: 5
Enter number of neurons in hidden layer 3: 4
Epoch 0 - Loss: 0.256275 - Accuracy: 50.00%
Epoch 1000 - Loss: 0.249893 - Accuracy: 50.00%
Epoch 2000 - Loss: 0.249842 - Accuracy: 50.00%
Epoch 3000 - Loss: 0.249762 - Accuracy: 75.00%
Epoch 4000 - Loss: 0.249624 - Accuracy: 75.00%
Epoch 5000 - Loss: 0.249362 - Accuracy: 75.00%
Epoch 6000 - Loss: 0.248779 - Accuracy: 75.00%
Epoch 7000 - Loss: 0.247155 - Accuracy: 75.00%
Epoch 8000 - Loss: 0.240347 - Accuracy: 75.00%
Epoch 9000 - Loss: 0.199924 - Accuracy: 75.00%

Final Predictions:
Input: [0 0], Predicted: 0.1528, Target: 0
Input: [0 1], Predicted: 0.6900, Target: 1
Input: [1 0], Predicted: 0.6779, Target: 1
Input: [1 1], Predicted: 0.4535, Target: 0
