# Task
Implement a multi-layer perceptron (MLP) network step by step, including the forward and backward passes, and demonstrate the weight changes during backpropagation using a sample dataset.

## Define the structure of the MLP

### Subtask:
Determine the number of layers, the number of neurons in each layer, and the activation functions to be used.


**Reasoning**:
Define the network architecture based on the input data and the task. The input data `inputs` is a tuple with one element, so the input layer will have 1 neuron. This is a simple demonstration, so one hidden layer with a few neurons and one output neuron for a simple output (like binary classification or a single regression value) is sufficient. ReLU is a common choice for hidden layers and sigmoid for binary classification output.



In [6]:
# 1. Determine the number of input neurons
input_neurons = len(inputs)

# 2. Decide on the number of hidden layers and neurons.
# For this simple example, we'll use one hidden layer with 2 neurons.
hidden_layers = [2]

# 3. Determine the number of output neurons
# The expected output is a single integer, so we'll use 1 output neuron.
output_neurons = 1

# 4. Define a list to store the number of neurons in each layer
layer_sizes = [input_neurons] + hidden_layers + [output_neurons]

# 5. Choose activation function for hidden layers
hidden_activation = 'relu'

# 6. Choose activation function for the output layer
output_activation = 'sigmoid'

print(f"Layer sizes: {layer_sizes}")
print(f"Hidden activation: {hidden_activation}")
print(f"Output activation: {output_activation}")

Layer sizes: [1, 2, 1]
Hidden activation: relu
Output activation: sigmoid
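
As a rough illustration of what this architecture implies, the sketch below counts the trainable parameters of a fully connected network from its layer sizes. The helper `count_parameters` is a hypothetical name introduced here, not part of the notebook; the layer sizes are copied from the output above.

```python
# Layer sizes as printed by the cell above.
layer_sizes = [1, 2, 1]

def count_parameters(sizes):
    """Return (number of weights, number of biases) for a fully connected network."""
    # Each pair of consecutive layers contributes an n_in x n_out weight matrix.
    n_weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    # Every non-input neuron has one bias.
    n_biases = sum(sizes[1:])
    return n_weights, n_biases

n_weights, n_biases = count_parameters(layer_sizes)
print(n_weights, n_biases, n_weights + n_biases)  # 4 3 7
```

With only seven parameters, every individual weight and bias can be printed and followed by eye through the later steps.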


## Initialize weights and biases

### Subtask:
Initialize the weights and biases of the network with small random values.


**Reasoning**:
Initialize the weights and biases for the network with small random values based on the determined layer sizes.



In [7]:
import numpy as np

# Initialize lists to store weights and biases
weights = []
biases = []

# Iterate through the layers to initialize weights and biases
for i in range(len(layer_sizes) - 1):
    # Initialize weights with random values
    w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * 0.01
    weights.append(w)

    # Initialize biases with random values
    b = np.random.randn(1, layer_sizes[i+1]) * 0.01
    biases.append(b)

print("Weights initialized:")
for i, w in enumerate(weights):
    print(f"Layer {i+1} weights shape: {w.shape}")

print("\nBiases initialized:")
for i, b in enumerate(biases):
    print(f"Layer {i+1} biases shape: {b.shape}")

Weights initialized:
Layer 1 weights shape: (1, 2)
Layer 2 weights shape: (2, 1)

Biases initialized:
Layer 1 biases shape: (1, 2)
Layer 2 biases shape: (1, 1)
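
The cell above draws from NumPy's global random state without a seed, so the printed values differ between runs. A minimal sketch of a reproducible variant follows; the function name `init_params`, the seed value `0`, and the use of `np.random.default_rng` are assumptions made for illustration and do not appear in the notebook. The 0.01 scale matches the original cell.

```python
import numpy as np

def init_params(layer_sizes, scale=0.01, seed=0):
    """Draw small random weights and biases from a seeded generator."""
    rng = np.random.default_rng(seed)  # seed chosen arbitrarily for this sketch
    weights, biases = [], []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights.append(rng.standard_normal((n_in, n_out)) * scale)
        biases.append(rng.standard_normal((1, n_out)) * scale)
    return weights, biases

weights_a, biases_a = init_params([1, 2, 1])
weights_b, biases_b = init_params([1, 2, 1])
print([w.shape for w in weights_a])  # [(1, 2), (2, 1)]
print(all(np.array_equal(a, b) for a, b in zip(weights_a, weights_b)))  # True
```

Calling the function twice with the same seed gives identical parameters, which makes later outputs comparable from run to run.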


## Implement the forward pass

### Subtask:
Create a function to perform the forward pass, calculating the output of the network for a given input.


**Reasoning**:
Implement the ReLU and sigmoid activation functions, then a forward pass function that applies the hidden activation at every hidden layer and the output activation at the final layer.



In [8]:
import numpy as np

def relu(x):
    """Rectified Linear Unit activation function."""
    return np.maximum(0, x)

def sigmoid(x):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-x))

def forward_pass(inputs, weights, biases, hidden_activation, output_activation):
    """
    Performs the forward pass through the MLP network.

    Args:
        inputs: Input data.
        weights: List of weight matrices for each layer.
        biases: List of bias vectors for each layer.
        hidden_activation: Name of the activation function for hidden layers ('relu' or 'sigmoid').
        output_activation: Name of the activation function for the output layer ('relu' or 'sigmoid').

    Returns:
        The output of the network.
    """
    current_layer_output = inputs

    for i in range(len(weights) - 1):  # Iterate through hidden layers
        linear_transform = np.dot(current_layer_output, weights[i]) + biases[i]
        if hidden_activation == 'relu':
            current_layer_output = relu(linear_transform)
        elif hidden_activation == 'sigmoid':
            current_layer_output = sigmoid(linear_transform)
        else:
            raise ValueError("Unsupported hidden activation function")

    # Output layer
    linear_transform = np.dot(current_layer_output, weights[-1]) + biases[-1]
    if output_activation == 'relu':
        final_output = relu(linear_transform)
    elif output_activation == 'sigmoid':
        final_output = sigmoid(linear_transform)
    else:
        raise ValueError("Unsupported output activation function")

    return final_output

# Test the forward pass with the initialized weights and biases
sample_input = np.array([inputs]) # Reshape inputs to match expected shape (1, 1)
network_output = forward_pass(sample_input, weights, biases, hidden_activation, output_activation)
print("Network output for sample input:", network_output)

Network output for sample input: [[0.50678299]]
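
To make the forward pass easier to follow, here is a small worked example with hand-picked parameters for a 1-2-1 network. The values of `W1`, `b1`, `W2`, `b2`, and `x` are hypothetical and chosen so the arithmetic can be checked by hand; they are not the notebook's randomly initialized values.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical parameters for a 1-2-1 network.
W1 = np.array([[1.0, -1.0]])
b1 = np.zeros((1, 2))
W2 = np.array([[1.0], [1.0]])
b2 = np.zeros((1, 1))
x = np.array([[2.0]])

z1 = x @ W1 + b1   # hidden pre-activations: 2 and -2
h = relu(z1)       # ReLU keeps 2 and clips -2 to 0
z2 = h @ W2 + b2   # output pre-activation: 2
y = sigmoid(z2)    # sigmoid(2), about 0.8808
print(h, y)
```

With weights and biases on the order of 0.01, as in the notebook, the output pre-activation stays close to 0, which is consistent with the printed output of about 0.507.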


## Implement the loss function

### Subtask:
Define a loss function to measure the error between the predicted output and the actual output.


**Reasoning**:
The subtask requires defining a loss function and calculating the loss. This can be done in a single code block by defining the `mean_squared_error` function and then using it to calculate and print the loss.



In [9]:
def mean_squared_error(y_true, y_predicted):
    """Calculates the Mean Squared Error, averaged over the samples (rows) in y_true."""
    n = len(y_true)  # number of samples
    mse = np.sum((y_true - y_predicted) ** 2) / n
    return mse

# The actual_output variable is already available in the kernel.
# Calculate the loss using the mean_squared_error function
loss = mean_squared_error(np.array([actual_output]), network_output)

# Print the calculated loss
print("Calculated Loss:", loss)

Calculated Loss: 0.25682900049823126
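
The printed loss can be checked by hand. The sketch below assumes the target `actual_output` is 0; that value is not shown in this part of the notebook and is inferred from the fact that the loss equals the square of the network output.

```python
y_pred = 0.50678299  # network output printed by the forward-pass cell
y_true = 0.0         # assumption: inferred from the printed loss, not shown in the notebook

# One sample, so n = 1 and the MSE reduces to a single squared difference.
loss = (y_true - y_pred) ** 2 / 1
print(round(loss, 6))  # 0.256829
```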


## Implement the backward pass (backpropagation)

### Subtask:
Create a function to perform backpropagation, calculating the gradients of the loss with respect to the weights and biases.


**Reasoning**:
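Implement the backward pass function: rerun the forward pass to cache each layer's output, compute the error term at the output layer, and propagate it backward to obtain the gradient of every weight matrix and bias vector.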
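
For reference, the quantities involved can be written out as follows. This is a sketch in row-vector convention, assuming the loss $E = \tfrac{1}{2}\lVert a^{(L)} - t \rVert^2$, which is the choice for which the output error term is exactly `network_output - target_output`. The symbols $a^{(l)}$, $z^{(l)}$, $\delta^{(l)}$, and $f_l$ are notation introduced here, not names from the notebook.

$$
\begin{aligned}
a^{(0)} &= x, \qquad z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}, \qquad a^{(l)} = f_l\big(z^{(l)}\big), \\
\delta^{(L)} &= \big(a^{(L)} - t\big) \odot f_L'\big(z^{(L)}\big), \\
\frac{\partial E}{\partial W^{(l)}} &= \big(a^{(l-1)}\big)^{\top} \delta^{(l)}, \qquad
\frac{\partial E}{\partial b^{(l)}} = \sum_{\text{rows}} \delta^{(l)}, \\
\delta^{(l-1)} &= \Big(\delta^{(l)} \big(W^{(l)}\big)^{\top}\Big) \odot f_{l-1}'\big(z^{(l-1)}\big).
\end{aligned}
$$

For the sigmoid, $f'(z) = a(1 - a)$ with $a = f(z)$; for ReLU, $f'(z) = 1$ where $z > 0$ and $0$ elsewhere. Both derivatives can therefore be evaluated from the activation output alone.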



In [10]:
def sigmoid_derivative(x):
    """Derivative of the sigmoid, where x is the sigmoid's output (not its input)."""
    return x * (1 - x)

def relu_derivative(x):
    """Derivative of ReLU. Also valid on the ReLU output, since output > 0 exactly when input > 0."""
    return (x > 0).astype(float)

def backward_pass(inputs, target_output, network_output, weights, biases, hidden_activation, output_activation):
    """
    Performs the backward pass through the MLP network to calculate gradients.

    Args:
        inputs: Input data used in the forward pass.
        target_output: The actual target output.
        network_output: The output of the network from the forward pass.
        weights: List of weight matrices for each layer.
        biases: List of bias vectors for each layer.
        hidden_activation: Name of the activation function for hidden layers ('relu' or 'sigmoid').
        output_activation: Name of the activation function for the output layer ('relu' or 'sigmoid').

    Returns:
        A tuple containing lists of weight gradients and bias gradients.
    """
    num_layers = len(weights)
    weight_gradients = [np.zeros_like(w) for w in weights]
    bias_gradients = [np.zeros_like(b) for b in biases]
    layer_outputs = [inputs] # Store outputs of each layer (including input)

    # Forward pass to store intermediate outputs for backpropagation
    current_layer_output = inputs
    for i in range(num_layers):
        linear_transform = np.dot(current_layer_output, weights[i]) + biases[i]
        if i < num_layers - 1:  # Hidden layers
            if hidden_activation == 'relu':
                current_layer_output = relu(linear_transform)
            elif hidden_activation == 'sigmoid':
                current_layer_output = sigmoid(linear_transform)
            else:
                raise ValueError("Unsupported hidden activation function")
        else:  # Output layer
            if output_activation == 'relu':
                current_layer_output = relu(linear_transform)
            elif output_activation == 'sigmoid':
                current_layer_output = sigmoid(linear_transform)
            else:
                raise ValueError("Unsupported output activation function")
        layer_outputs.append(current_layer_output)

    # Calculate error at the output layer.
    # This is the gradient of 0.5 * (output - target)**2, i.e. the factor of 2 from
    # mean_squared_error is dropped, which only rescales the effective learning rate.
    error = network_output - target_output

    # Calculate derivative of the output activation function.
    # Both helpers are evaluated on the activation's output (layer_outputs[-1] is the
    # recomputed network output).
    if output_activation == 'sigmoid':
        output_activation_derivative = sigmoid_derivative(network_output)
    elif output_activation == 'relu':
        output_activation_derivative = relu_derivative(layer_outputs[-1])
    else:
        raise ValueError("Unsupported output activation function")

    # Calculate error derivative at the output layer
    error_derivative = error * output_activation_derivative

    # Backpropagate through the layers
    for i in range(num_layers - 1, -1, -1): # Iterate backward through layers
        # Calculate weight gradient
        # The input to the current layer is the output of the previous layer
        input_to_current_layer = layer_outputs[i]
        weight_gradients[i] = np.dot(input_to_current_layer.T, error_derivative)

        # Calculate bias gradient
        bias_gradients[i] = np.sum(error_derivative, axis=0, keepdims=True)

        if i > 0: # Not the input layer
            # Calculate error derivative for the previous layer
            error_derivative_previous_layer = np.dot(error_derivative, weights[i].T)

            # Calculate derivative of the hidden activation function for the previous layer.
            # layer_outputs[i] is that layer's activation output, which is what both helpers expect.
            if hidden_activation == 'sigmoid':
                hidden_activation_derivative = sigmoid_derivative(layer_outputs[i])
            elif hidden_activation == 'relu':
                hidden_activation_derivative = relu_derivative(layer_outputs[i])
            else:
                raise ValueError("Unsupported hidden activation function")

            # Multiply error derivative by the derivative of the activation function
            error_derivative = error_derivative_previous_layer * hidden_activation_derivative

    return weight_gradients, bias_gradients

# Test the backward pass
sample_input_bp = np.array([inputs]) # Reshape inputs to match expected shape (1, 1)
target_output_bp = np.array([actual_output]) # Reshape target output to match expected shape (1, 1)
weight_gradients, bias_gradients = backward_pass(sample_input_bp, target_output_bp, network_output, weights, biases, hidden_activation, output_activation)

print("\nWeight Gradients:")
for i, grad in enumerate(weight_gradients):
    print(f"Layer {i+1} weight gradient shape: {grad.shape}")
    print(f"Layer {i+1} weight gradient:\n{grad}")

print("\nBias Gradients:")
for i, grad in enumerate(bias_gradients):
    print(f"Layer {i+1} bias gradient shape: {grad.shape}")
    print(f"Layer {i+1} bias gradient:\n{grad}")


Weight Gradients:
Layer 1 weight gradient shape: (1, 2)
Layer 1 weight gradient:
[[0. 0.]]
Layer 2 weight gradient shape: (2, 1)
Layer 2 weight gradient:
[[0.]
 [0.]]

Bias Gradients:
Layer 1 bias gradient shape: (1, 2)
Layer 1 bias gradient:
[[0. 0.]]
Layer 2 bias gradient shape: (1, 1)
Layer 2 bias gradient:
[[0.12667243]]
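
One common way to gain confidence in a hand-written backward pass is to compare it with finite differences. The sketch below does this for a 1-2-1 ReLU/sigmoid network. It is self-contained rather than a call into the notebook's `backward_pass`; the parameter values and the names `loss_fn`, `analytic_grads`, and `numeric_grads` are hypothetical. It assumes the loss 0.5 * (output - target)**2, which matches an error term of `output - target`.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss_fn(params, x, t):
    """Half squared error of a one-hidden-layer ReLU/sigmoid network."""
    W1, b1, W2, b2 = params
    y = sigmoid(relu(x @ W1 + b1) @ W2 + b2)
    return 0.5 * np.sum((y - t) ** 2)

def analytic_grads(params, x, t):
    """Backpropagation using the error term (output - target)."""
    W1, b1, W2, b2 = params
    h = relu(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    delta2 = (y - t) * y * (1 - y)       # output-layer error term
    delta1 = (delta2 @ W2.T) * (h > 0)   # hidden-layer error term
    return [x.T @ delta1, delta1.sum(axis=0, keepdims=True),
            h.T @ delta2, delta2.sum(axis=0, keepdims=True)]

def numeric_grads(params, x, t, eps=1e-6):
    """Central finite differences, one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            original = p[idx]
            p[idx] = original + eps
            loss_plus = loss_fn(params, x, t)
            p[idx] = original - eps
            loss_minus = loss_fn(params, x, t)
            p[idx] = original  # restore the entry before moving on
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

# Hypothetical parameters: hidden pre-activations are 0.6 and -0.1 for x = 1.
params = [np.array([[0.5, -0.3]]), np.array([[0.1, 0.2]]),
          np.array([[0.4], [-0.6]]), np.array([[0.05]])]
x = np.array([[1.0]])
t = np.array([[0.0]])

analytic = analytic_grads(params, x, t)
numeric = numeric_grads(params, x, t)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"largest analytic/numeric difference: {max_diff:.2e}")
```

In this example the second hidden unit has a negative pre-activation, so every gradient that flows through it is zero. The all-zero weight gradients printed above are consistent with the same mechanism acting on both hidden units: the Layer 2 weight gradient is the hidden activation times a nonzero output error term, so it can only be zero if the hidden activations are zero.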


## Update weights and biases

### Subtask:
Update the weights and biases using an optimization algorithm (e.g., gradient descent) and the calculated gradients.


**Reasoning**:
Implement the weight and bias update step using the calculated gradients and a defined learning rate.



In [11]:
# 1. Define a learning rate
learning_rate = 0.1

# 2. Iterate through the weights and biases lists and their corresponding gradient lists.
# 3. For each weight matrix, update it by subtracting the product of the learning rate and its corresponding gradient.
# 4. For each bias vector, update it by subtracting the product of the learning rate and its corresponding gradient.
for i in range(len(weights)):
    weights[i] -= learning_rate * weight_gradients[i]
    biases[i] -= learning_rate * bias_gradients[i]

# 5. Print or inspect the updated weights and biases to verify the update process.
print("\nUpdated Weights:")
for i, w in enumerate(weights):
    print(f"Layer {i+1} updated weights shape: {w.shape}")
    print(f"Layer {i+1} updated weights:\n{w}")

print("\nUpdated Biases:")
for i, b in enumerate(biases):
    print(f"Layer {i+1} updated biases shape: {b.shape}")
    print(f"Layer {i+1} updated biases:\n{b}")


Updated Weights:
Layer 1 updated weights shape: (1, 2)
Layer 1 updated weights:
[[-0.00438769 -0.01276384]]
Layer 2 updated weights shape: (2, 1)
Layer 2 updated weights:
[[-0.00567785]
 [ 0.00966768]]

Updated Biases:
Layer 1 updated biases shape: (1, 2)
Layer 1 updated biases:
[[-0.0069388   0.00151578]]
Layer 2 updated biases shape: (1, 1)
Layer 2 updated biases:
[[0.01446639]]
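
The updated Layer 2 bias can be reproduced from the earlier printed values. The sketch below rests on one assumption, stated in the code: the hidden activations were zero for the sample input (suggested by the all-zero Layer 2 weight gradient), so the network output was simply the sigmoid of the Layer 2 bias.

```python
import math

p = 0.50678299        # network output before the update
grad_b2 = 0.12667243  # Layer 2 bias gradient printed by the backward-pass cell
learning_rate = 0.1

# Assumption: output = sigmoid(b2), so b2 can be recovered with the logit function.
b2_before = math.log(p / (1 - p))
b2_after = b2_before - learning_rate * grad_b2
print(round(b2_after, 6))  # 0.014466
```

The result agrees with the printed value 0.01446639 to the precision of the rounded inputs.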


## Train the MLP

### Subtask:
Train the network using a sample dataset, repeatedly performing forward and backward passes and updating the weights and biases.


**Reasoning**:
Training requires a dataset, an epoch count, and a loop. Define a small sample dataset and the number of epochs, then for each sample in each epoch run the forward pass, calculate the loss, run the backward pass, and update the weights and biases.



In [12]:
# 1. Define a sample dataset
# For this example, we'll use a small identity-like dataset (the target equals the input) with a single input and output.
# The previous steps already used a single input and output, so we'll stick to that structure.
# We'll define a few input-output pairs for demonstration.
training_data = [
    (np.array([0]), 0),
    (np.array([1]), 1),
    (np.array([0.5]), 0.5) # Adding a fractional input for more variation
]

# 2. Define the number of training epochs.
epochs = 1000

# 3. Loop for the specified number of epochs.
print(f"Starting training for {epochs} epochs...")
for epoch in range(epochs):
    total_loss = 0
    # 4. Inside the loop, iterate through the sample dataset.
    for inputs_train, target_output_train in training_data:
        # Reshape inputs and target output to match expected shapes (1, input_dim) and (1, output_dim)
        inputs_train_reshaped = inputs_train.reshape(1, -1)
        target_output_train_reshaped = np.array([target_output_train]).reshape(1, -1)

        # 5. For each input-target pair:
        # a. Perform the forward pass
        network_output = forward_pass(inputs_train_reshaped, weights, biases, hidden_activation, output_activation)

        # b. Calculate the loss
        loss = mean_squared_error(target_output_train_reshaped, network_output)
        total_loss += loss

        # c. Perform the backward pass
        weight_gradients, bias_gradients = backward_pass(inputs_train_reshaped, target_output_train_reshaped, network_output, weights, biases, hidden_activation, output_activation)

        # d. Update the weights and biases
        for i in range(len(weights)):
            weights[i] -= learning_rate * weight_gradients[i]
            biases[i] -= learning_rate * bias_gradients[i]

    # e. Optionally, print the average loss for monitoring training progress.
    average_loss = total_loss / len(training_data)
    if (epoch + 1) % 100 == 0: # Print loss every 100 epochs
        print(f"Epoch {epoch + 1}/{epochs}, Average Loss: {average_loss:.4f}")

print("Training finished.")

# Optionally, test the network after training
print("\nTesting network after training:")
for inputs_test, target_output_test in training_data:
    inputs_test_reshaped = inputs_test.reshape(1, -1)
    network_output_test = forward_pass(inputs_test_reshaped, weights, biases, hidden_activation, output_activation)
    print(f"Input: {inputs_test}, Target: {target_output_test}, Predicted: {network_output_test[0][0]:.4f}")

Starting training for 1000 epochs...
Epoch 100/1000, Average Loss: 0.1677
Epoch 200/1000, Average Loss: 0.1677
Epoch 300/1000, Average Loss: 0.1677
Epoch 400/1000, Average Loss: 0.1677
Epoch 500/1000, Average Loss: 0.1677
Epoch 600/1000, Average Loss: 0.1677
Epoch 700/1000, Average Loss: 0.1677
Epoch 800/1000, Average Loss: 0.1677
Epoch 900/1000, Average Loss: 0.1677
Epoch 1000/1000, Average Loss: 0.1677
Training finished.

Testing network after training:
Input: [0], Target: 0, Predicted: 0.5010
Input: [1], Target: 1, Predicted: 0.5010
Input: [0.5], Target: 0.5, Predicted: 0.5010


## Show weight changes

### Subtask:
During training, record and display the changes in weights and biases over epochs to demonstrate the backpropagation process.


**Reasoning**:
I need to store the weights and biases at different points during training to show how they change. I will create lists to store these values and append them inside the training loop. Then, I will display these stored values after training.



In [13]:
# 1. Create empty lists to store the weights and biases at different epochs.
stored_weights = []
stored_biases = []

# 2. Inside the training loop (which is already defined), before the weight and bias update step,
# append copies of the current weights and biases to the storage lists.
# I'll store them every 100 epochs to avoid storing too many values.
print(f"Starting training for {epochs} epochs and storing weights/biases every 100 epochs...")
for epoch in range(epochs):
    total_loss = 0
    if (epoch + 1) % 100 == 0 or epoch == 0: # Store initial and every 100 epochs
        # Append copies of current weights and biases
        stored_weights.append([w.copy() for w in weights])
        stored_biases.append([b.copy() for b in biases])

    for inputs_train, target_output_train in training_data:
        inputs_train_reshaped = inputs_train.reshape(1, -1)
        target_output_train_reshaped = np.array([target_output_train]).reshape(1, -1)

        network_output = forward_pass(inputs_train_reshaped, weights, biases, hidden_activation, output_activation)
        loss = mean_squared_error(target_output_train_reshaped, network_output)
        total_loss += loss

        weight_gradients, bias_gradients = backward_pass(inputs_train_reshaped, target_output_train_reshaped, network_output, weights, biases, hidden_activation, output_activation)

        for i in range(len(weights)):
            weights[i] -= learning_rate * weight_gradients[i]
            biases[i] -= learning_rate * bias_gradients[i]

    average_loss = total_loss / len(training_data)
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Average Loss: {average_loss:.4f}")

print("Training finished.")

# 3. After the training loop, iterate through the stored weights and biases and display how they changed.
print("\nChanges in Weights and Biases over Epochs:")
for i, epoch_weights in enumerate(stored_weights):
    epoch_num = i * 100  # snapshots are taken at epoch 0 and before the updates of epochs 100, 200, ...
    print(f"\n--- Epoch {epoch_num} ---")
    for j in range(len(epoch_weights)):
        print(f"Layer {j+1} Weights:\n{epoch_weights[j]}")
        print(f"Layer {j+1} Biases:\n{stored_biases[i][j]}")

# 4. Calculate and display the magnitude of change (optional but helpful).
print("\nMagnitude of Change in Weights and Biases (Initial vs Final):")
initial_weights = stored_weights[0]
initial_biases = stored_biases[0]
# Use the current parameters as the final values: the last stored snapshot was
# taken before the updates of the final epoch.
final_weights = weights
final_biases = biases

for i in range(len(initial_weights)):
    weight_change_magnitude = np.linalg.norm(final_weights[i] - initial_weights[i])
    bias_change_magnitude = np.linalg.norm(final_biases[i] - initial_biases[i])
    print(f"Layer {i+1} Weight Change Magnitude: {weight_change_magnitude:.6f}")
    print(f"Layer {i+1} Bias Change Magnitude: {bias_change_magnitude:.6f}")

# Test the network after training
print("\nTesting network after training:")
for inputs_test, target_output_test in training_data:
    inputs_test_reshaped = inputs_test.reshape(1, -1)
    network_output_test = forward_pass(inputs_test_reshaped, weights, biases, hidden_activation, output_activation)
    print(f"Input: {inputs_test}, Target: {target_output_test}, Predicted: {network_output_test[0][0]:.4f}")

Starting training for 1000 epochs and storing weights/biases every 100 epochs...
Epoch 100/1000, Average Loss: 0.1677
Epoch 200/1000, Average Loss: 0.1677
Epoch 300/1000, Average Loss: 0.1677
Epoch 400/1000, Average Loss: 0.1677
Epoch 500/1000, Average Loss: 0.1677
Epoch 600/1000, Average Loss: 0.1677
Epoch 700/1000, Average Loss: 0.1677
Epoch 800/1000, Average Loss: 0.1677
Epoch 900/1000, Average Loss: 0.1677
Epoch 1000/1000, Average Loss: 0.1677
Training finished.

Changes in Weights and Biases over Epochs:

--- Epoch 0 ---
Layer 1 Weights:
[[-0.00438769 -0.01276384]]
Layer 1 Biases:
[[-6.93880482e-03 -5.24375469e-05]]
Layer 2 Weights:
[[-0.00567785]
 [ 0.00953843]]
Layer 2 Biases:
[[0.00415776]]

--- Epoch 100 ---
Layer 1 Weights:
[[-0.00438769 -0.01276384]]
Layer 1 Biases:
[[-6.93880482e-03 -5.24375469e-05]]
Layer 2 Weights:
[[-0.00567785]
 [ 0.00953843]]
Layer 2 Biases:
[[0.00415776]]

--- Epoch 200 ---
Layer 1 Weights:
[[-0.00438769 -0.01276384]]
Layer 1 Biases:
[[-6.93880482e-03
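
The identical snapshots can be examined by replaying a single epoch by hand. The sketch below copies the parameters from the "Epoch 0" snapshot printed above and runs one pass over the three training pairs with the same per-sample update rule and learning rate as the notebook. It is a self-contained re-implementation for illustration, not the notebook's own `forward_pass` and `backward_pass`.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Parameters copied from the "Epoch 0" snapshot printed above.
W1 = np.array([[-0.00438769, -0.01276384]])
b1 = np.array([[-6.93880482e-03, -5.24375469e-05]])
W2 = np.array([[-0.00567785], [0.00953843]])
b2 = np.array([[0.00415776]])
start = [p.copy() for p in (W1, b1, W2, b2)]

training_data = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
learning_rate = 0.1
losses, hidden_activations = [], []

for x_val, t_val in training_data:
    x = np.array([[x_val]])
    t = np.array([[t_val]])
    # Forward pass
    h = relu(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    hidden_activations.append(h.copy())
    losses.append(float(np.sum((t - y) ** 2)))
    # Backward pass (error term: output - target)
    delta2 = (y - t) * y * (1 - y)
    delta1 = (delta2 @ W2.T) * (h > 0)
    # Gradient-descent updates
    W2 -= learning_rate * (h.T @ delta2)
    b2 -= learning_rate * delta2
    W1 -= learning_rate * (x.T @ delta1)
    b1 -= learning_rate * delta1

all_hidden_zero = all(not h.any() for h in hidden_activations)
unchanged = all(np.array_equal(a, b) for a, b in zip(start[:3], (W1, b1, W2)))
average_loss = sum(losses) / len(losses)
print("hidden activations all zero:", all_hidden_zero)   # True
print("W1, b1, W2 unchanged:", unchanged)                # True
print("average loss:", round(average_loss, 4))           # 0.1677
print("Layer 2 bias after one epoch:", b2[0, 0])
```

Under these values both hidden pre-activations are negative for all three inputs, so ReLU outputs zero and no gradient reaches the Layer 1 parameters or the Layer 2 weights. Only the Layer 2 bias moves during the epoch, and it ends up very close to its starting value of 0.00415776. This would explain both the identical snapshots and the constant average loss of 0.1677.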

## Evaluate the MLP

### Subtask:
Evaluate the performance of the trained MLP on the sample dataset.


**Reasoning**:
Evaluate the performance of the trained MLP on the sample dataset by iterating through the data, performing a forward pass, calculating the error, and printing the results, including the average error.



In [14]:
# 1. Iterate through the training_data list
print("\nEvaluating performance on the sample dataset:")
total_error = 0
for inputs_eval, target_output_eval in training_data:
    # 2. Reshape the input to match the expected shape (1, -1)
    inputs_eval_reshaped = inputs_eval.reshape(1, -1)

    # 3. Perform a forward pass using the trained weights and biases
    network_output_eval = forward_pass(inputs_eval_reshaped, weights, biases, hidden_activation, output_activation)

    # 4. Reshape the target output to match the expected shape (1, -1)
    target_output_eval_reshaped = np.array([target_output_eval]).reshape(1, -1)

    # 5. Calculate the mean squared error
    error = mean_squared_error(target_output_eval_reshaped, network_output_eval)
    total_error += error

    # 6. Print the input, target output, predicted output, and error
    print(f"Input: {inputs_eval}, Target: {target_output_eval}, Predicted: {network_output_eval[0][0]:.4f}, Error: {error:.6f}")

# 7. Calculate and print the average mean squared error
average_error = total_error / len(training_data)
print(f"\nAverage Mean Squared Error across the dataset: {average_error:.6f}")


Evaluating performance on the sample dataset:
Input: [0], Target: 0, Predicted: 0.5010, Error: 0.251041
Input: [1], Target: 1, Predicted: 0.5010, Error: 0.248962
Input: [0.5], Target: 0.5, Predicted: 0.5010, Error: 0.000001

Average Mean Squared Error across the dataset: 0.166668
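
For comparison, the short sketch below computes the error of a predictor that ignores its input and always returns the same value. The helper `constant_mse` is a hypothetical name introduced here; the targets and the prediction 0.5010 are taken from the output above.

```python
targets = [0.0, 1.0, 0.5]

def constant_mse(prediction, targets):
    """Average squared error of predicting the same value for every target."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)

average = constant_mse(0.5010, targets)  # 0.5010 is the prediction printed above
baseline = constant_mse(0.5, targets)    # 0.5 is the mean of the targets
print(round(average, 6))   # 0.166668
print(round(baseline, 6))  # 0.166667
```

The reported average error is essentially that of always predicting the mean of the targets, which is the best any input-independent predictor can achieve under squared error.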


## Summary:

### Data Analysis Key Findings

*   The MLP network was configured with one input neuron, one hidden layer with two neurons, and one output neuron. ReLU was chosen as the activation function for the hidden layer, and Sigmoid for the output layer.
*   Initial weights and biases were set to small random values.
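*   During training over 1000 epochs, the average Mean Squared Error (MSE) reported every 100 epochs stayed at approximately 0.1677 in both training runs.
*   In the weight-recording run, the snapshots at epochs 0 and 100 are identical, so the recorded weights and biases did not change (the printed output is cut off after that point). Compared with the values printed after the single manual update, only the second Layer 1 bias, the second Layer 2 weight, and the Layer 2 bias differ; these shifted during the first training run.
*   After training, the network predicted approximately 0.5010 for every input in the training dataset (0, 1, and 0.5), although the corresponding targets were 0, 1, and 0.5, so its output effectively no longer depended on the input.
*   The final evaluation on the sample dataset gave an average MSE of 0.166668, reflecting the network's failure to learn the data pattern.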


