Initialization:

The FeedForwardNN class initializes the weights and biases for each layer. Weights are drawn from a standard normal distribution and multiplied by 0.01 so they start small; biases start at zero. W1 and b1 belong to the hidden layer, while W2 and b2 belong to the output layer.
The learning_rate controls how much we adjust the weights in each training step.
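As a rough illustration of the shapes involved, the sketch below builds the four parameter arrays outside the class. The layer sizes and the seeded `default_rng` generator are assumptions made only to keep the sketch repeatable; the class itself uses `np.random.randn`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
input_size, hidden_size, output_size = 3, 5, 2  # illustrative sizes

# Weights: small random values; biases: zeros stored as column vectors
W1 = rng.standard_normal((hidden_size, input_size)) * 0.01
b1 = np.zeros((hidden_size, 1))
W2 = rng.standard_normal((output_size, hidden_size)) * 0.01
b2 = np.zeros((output_size, 1))

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (5, 3) (5, 1) (2, 5) (2, 1)
print(np.abs(W1).max())  # largest initial weight magnitude, well below 1
```

Each weight matrix has one row per unit in its own layer and one column per unit in the layer feeding it, which is why the forward pass can multiply W by a matrix whose columns are examples.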
Activation Functions:

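ReLU: The hidden layer uses the ReLU activation, max(0, x). Its derivative (1 where x > 0, 0 elsewhere) is also implemented for backpropagation.
Softmax: The output layer uses softmax, which suits multi-class classification because the outputs for each example are non-negative and sum to 1. The implementation subtracts the column-wise maximum before exponentiating for numerical stability.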
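The following standalone sketch shows why the maximum is subtracted inside softmax. The logits are made-up values, chosen so that a naive `np.exp` on the first column would overflow; the functions mirror the class methods but are written as free functions for illustration.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0, z)

def softmax(z):
    # Shift each column by its maximum so the largest exponent is exp(0) = 1
    shifted = z - np.max(z, axis=0, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=0, keepdims=True)

# Made-up logits; each column is one example
z = np.array([[1000.0, -1.0],
              [1001.0,  0.0],
              [1002.0,  2.0]])

probs = softmax(z)
print(probs)              # finite values despite the very large logits
print(probs.sum(axis=0))  # each column sums to 1 (up to rounding)
print(relu(np.array([-2.0, 0.0, 3.0])))  # negative entries become 0
```

Shifting by a constant does not change the softmax result, because the same factor cancels in the numerator and the denominator.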
Forward Pass:

forward(x): Computes the activations and final output by passing inputs through each layer. The input x is expected to have shape (input_size, batch_size), with one example per column.
Z1 is the linear transformation of the input to the hidden layer, and A1 is the result after applying ReLU.
Z2 and A2 represent the linear transformation and softmax output for the output layer.
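To make the shapes concrete, here is a minimal sketch of the same four steps written without the class. The sizes, batch of 4 examples, and seeded generator are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for repeatability
n_in, n_hidden, n_out, batch = 3, 5, 2, 4  # illustrative sizes

x = rng.standard_normal((n_in, batch))  # one example per column
W1 = rng.standard_normal((n_hidden, n_in)) * 0.01
b1 = np.zeros((n_hidden, 1))
W2 = rng.standard_normal((n_out, n_hidden)) * 0.01
b2 = np.zeros((n_out, 1))

Z1 = W1 @ x + b1        # (5, 4); b1 broadcasts across the batch columns
A1 = np.maximum(0, Z1)  # ReLU keeps the shape
Z2 = W2 @ A1 + b2       # (2, 4)
exps = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
A2 = exps / exps.sum(axis=0, keepdims=True)  # softmax per column

print(Z1.shape, A1.shape, Z2.shape, A2.shape)  # (5, 4) (5, 4) (2, 4) (2, 4)
print(A2.sum(axis=0))  # one probability distribution per example
```

With weights this small, every column of A2 starts close to the uniform distribution, which is what one would expect from an untrained network.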
Backward Pass:

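backward(x, y): Computes gradients for weights and biases using backpropagation, then applies one gradient descent step by subtracting learning_rate times each gradient from the corresponding parameter. It relies on the values cached by the most recent call to forward.
Output Layer Gradients:
dZ2 is the difference between the predicted probabilities and the one-hot labels (A2 - y). For a softmax output trained with cross-entropy, this is the gradient of each example's loss with respect to Z2.
dW2 and db2 are the gradients for the output layer's weights and biases, averaged over the batch.
Hidden Layer Gradients:
dA1 propagates the error back to the hidden layer, and dZ1 multiplies it by the ReLU derivative, which zeroes the gradient wherever Z1 was not positive.
dW1 and db1 are the gradients for the hidden layer’s weights and biases, averaged over the batch.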
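One way to gain confidence in these formulas is a finite-difference gradient check. The sketch below is not part of the class: the helper names (`forward`, `loss`, `analytic_grads`, `numeric_grad`), the sizes, and the random data are all assumptions made for illustration. It compares the backpropagation gradients with central-difference estimates of the batch-averaged cross-entropy loss.

```python
import numpy as np

def forward(params, x):
    W1, b1, W2, b2 = params
    Z1 = W1 @ x + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    exps = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    return Z1, A1, exps / exps.sum(axis=0, keepdims=True)

def loss(params, x, y):
    # Cross-entropy averaged over the batch
    A2 = forward(params, x)[2]
    return -np.mean(np.sum(y * np.log(A2), axis=0))

def analytic_grads(params, x, y):
    # Same formulas as the backward pass described above
    W2 = params[2]
    m = y.shape[1]
    Z1, A1, A2 = forward(params, x)
    dZ2 = A2 - y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ x.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numeric_grad(params, index, x, y, eps=1e-6):
    # Central differences, perturbing one entry of params[index] at a time
    target = params[index]
    grad = np.zeros_like(target)
    for i in np.ndindex(target.shape):
        old = target[i]
        target[i] = old + eps
        plus = loss(params, x, y)
        target[i] = old - eps
        minus = loss(params, x, y)
        target[i] = old  # restore the original value
        grad[i] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6))               # 6 made-up examples
y = np.eye(2)[:, rng.integers(0, 2, size=6)]  # random one-hot labels
params = [rng.standard_normal((5, 3)) * 0.5, rng.standard_normal((5, 1)) * 0.5,
          rng.standard_normal((2, 5)) * 0.5, rng.standard_normal((2, 1)) * 0.5]

grads = analytic_grads(params, x, y)
max_diffs = [np.max(np.abs(numeric_grad(params, k, x, y) - grads[k]))
             for k in range(4)]
for name, diff in zip(["dW1", "db1", "dW2", "db2"], max_diffs):
    print(f"{name}: max abs difference {diff:.2e}")
```

The weights here are larger than the 0.01 scale used by the class so that the gradients are not all close to zero, which would make the comparison less informative. A check like this can misreport if a perturbation happens to push a hidden pre-activation across zero, where ReLU is not differentiable.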
Training:

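train(x, y, epochs): Calls forward and backward once per epoch on the full dataset, updating weights and biases to reduce the loss, and prints the loss every 10 epochs.
Loss Calculation: We use cross-entropy loss, the negative log of the probability assigned to the correct class, averaged over the batch. The loss value is computed for reporting only; the gradients come from backward.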
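A small worked example may help make the loss concrete. The probabilities and labels below are made up, and the `eps` clipping is an assumption of this sketch, added so that a predicted probability of exactly zero cannot produce `log(0)`.

```python
import numpy as np

def cross_entropy(probs, y, eps=1e-12):
    # eps is a hypothetical safeguard: clip probabilities away from zero
    clipped = np.clip(probs, eps, 1.0)
    # Sum over classes (rows), then average over examples (columns)
    return -np.mean(np.sum(y * np.log(clipped), axis=0))

# Two examples (columns), two classes (rows)
y = np.array([[1.0, 0.0],
              [0.0, 1.0]])
probs = np.array([[0.9, 0.5],
                  [0.1, 0.5]])

# Example 1 contributes -log(0.9), example 2 contributes -log(0.5)
print(cross_entropy(probs, y))  # about 0.399
```

Because y is one-hot, only the predicted probability of the true class enters the sum, so a confident correct prediction contributes a loss near zero and a confident wrong one contributes a large loss.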

In [None]:
import numpy as np

class FeedForwardNN:
    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.01):
        # Initialize network architecture and learning rate
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.learning_rate = learning_rate

        # Initialize weights and biases
        self.W1 = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden layer weights
        self.b1 = np.zeros((hidden_size, 1))                       # Hidden layer bias
        self.W2 = np.random.randn(output_size, hidden_size) * 0.01 # Hidden to output layer weights
        self.b2 = np.zeros((output_size, 1))                       # Output layer bias

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, x):
        # 1 where x > 0, else 0 (the derivative at exactly 0 is taken as 0)
        return np.where(x > 0, 1, 0)

    def softmax(self, x):
        exps = np.exp(x - np.max(x, axis=0, keepdims=True))  # Stability improvement by subtracting max
        return exps / np.sum(exps, axis=0, keepdims=True)

    def forward(self, x):
        # Forward pass: compute activations and outputs
        self.Z1 = np.dot(self.W1, x) + self.b1      # Linear transformation for hidden layer
        self.A1 = self.relu(self.Z1)                # ReLU activation for hidden layer
        self.Z2 = np.dot(self.W2, self.A1) + self.b2 # Linear transformation for output layer
        self.A2 = self.softmax(self.Z2)             # Softmax for output probabilities

        return self.A2

    def backward(self, x, y):
        # Compute gradients using backpropagation
        m = y.shape[1]  # Batch size

        # Output layer gradient
        dZ2 = self.A2 - y
        dW2 = (1 / m) * np.dot(dZ2, self.A1.T)
        db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

        # Hidden layer gradient
        dA1 = np.dot(self.W2.T, dZ2)
        dZ1 = dA1 * self.relu_derivative(self.Z1)
        dW1 = (1 / m) * np.dot(dZ1, x.T)
        db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

        # Update weights and biases
        self.W1 -= self.learning_rate * dW1
        self.b1 -= self.learning_rate * db1
        self.W2 -= self.learning_rate * dW2
        self.b2 -= self.learning_rate * db2

    def train(self, x, y, epochs=100):
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(x)

            # Calculate loss (cross-entropy); the small constant avoids log(0)
            # if a predicted probability underflows to zero
            loss = -np.mean(np.sum(y * np.log(output + 1e-12), axis=0))

            # Backward pass
            self.backward(x, y)

            if epoch % 10 == 0:
                print(f'Epoch {epoch}, Loss: {loss:.4f}')


# Example usage:
# Suppose we have a dataset with input features of size 3 and two classes to classify
input_size = 3
hidden_size = 5
output_size = 2
learning_rate = 0.01

# Randomly generated inputs (features) and one-hot encoded target labels
x = np.random.randn(input_size, 10)  # 10 examples, each with 3 features
y = np.zeros((output_size, 10))
y[0, :5] = 1  # First 5 examples belong to class 0
y[1, 5:] = 1  # Last 5 examples belong to class 1

# Create and train the neural network
nn = FeedForwardNN(input_size, hidden_size, output_size, learning_rate)
nn.train(x, y, epochs=100)
