Write a Python function that simulates a single neuron with a sigmoid activation and uses backpropagation to update the neuron's weights and bias. The function should take a list of feature vectors, the associated true binary labels, initial weights, an initial bias, a learning rate, and the number of epochs. It should update the weights and bias with gradient descent on the MSE loss and return the updated weights, the updated bias, and a list of MSE values (one per epoch), each rounded to four decimal places.

Example:

Input:

features = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]], labels = [1, 0, 0], initial_weights = [0.1, -0.2], initial_bias = 0.0, learning_rate = 0.1, epochs = 2

Output:

updated_weights = [0.1036, -0.1425], updated_bias = -0.0167, mse_values = [0.3033, 0.2942]

Reasoning:

The neuron receives the feature vectors and computes predictions with the sigmoid activation. From the predictions and the true labels, the gradients of the MSE loss with respect to the weights and bias are computed and used to update the parameters in each epoch.


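import numpy as np
def sigmoid(z):
  # Logistic activation: maps any real z into (0, 1)
  return 1/(1+np.exp(-z))
def sigmoid_grad(z):
  # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
  s=sigmoid(z)
  return s*(1-s)
def neuron_backprop(features, labels, weights, bias, learning_rate, epochs):
  # Accept plain lists as in the problem statement; np.array copies weights, so the caller's array is not modified in place
  features=np.asarray(features, dtype=float)  # shape (n_samples, n_features)
  labels=np.asarray(labels, dtype=float)      # shape (n_samples,)
  weights=np.array(weights, dtype=float)      # shape (n_features,)
  bias=float(bias)
  n=len(labels)
  mse_values=[]
  for _ in range(epochs):
    # Forward pass: predictions for all samples at once
    z=np.matmul(features,weights)+bias
    y_pred=sigmoid(z)
    # Error and MSE (element-wise squaring), recorded before this epoch's update
    diff=y_pred-labels
    mse_values.append(round(float(np.mean(diff**2)), 4))
    # Backward pass: diff*sigmoid_grad(z) is element-wise, shape (n_samples,)
    delta=diff*sigmoid_grad(z)
    # features.T has shape (n_features, n_samples), so w_grad has one entry per weight: shape (n_features,)
    w_grad=(2/n)*np.dot(features.T, delta)
    # The bias gradient sums the per-sample terms over all samples
    b_grad=(2/n)*np.sum(delta)
    # Gradient-descent update
    weights-=learning_rate*w_grad
    bias-=learning_rate*b_grad
  # Round the outputs to four decimal places, as the task requires
  return np.round(weights, 4).tolist(), round(float(bias), 4), mse_values

np.dot(A, B): Returns the inner product for two 1-D arrays and the matrix product for two 2-D arrays; it also accepts a scalar operand (plain multiplication). For arrays with more than two dimensions it takes a sum product over the last axis of A and the second-to-last axis of B.

np.matmul(A, B) (the @ operator): Also returns the inner product for two 1-D arrays and the matrix product for 2-D arrays, but it rejects scalar operands and treats arrays with more than two dimensions as stacks of matrices that are broadcast together. For the 2-D-by-1-D products in neuron_backprop the two functions give the same result.

print(neuron_backprop([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]], [1, 0, 0], [0.1, -0.2], 0.0, 0.1, 2))

([0.1036, -0.1425], -0.0167, [0.3033, 0.2942])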
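A small check of how `np.dot` and `np.matmul` compare, sketched with arbitrary illustrative arrays (`u`, `v`, `A3` and `B3` are not part of the task): both return the inner product for two 1-D arrays and agree on the 2-D-by-1-D product used to compute `z`, while they differ for scalar operands and for arrays with more than two dimensions.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
# Two 1-D arrays: both give the inner product 1*3 + 2*4
inner_dot = np.dot(u, v)
inner_matmul = np.matmul(u, v)

# 2-D by 1-D, as in np.matmul(features, weights): both give one value per row
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
w = np.array([0.1, -0.2])
same_rows = np.allclose(np.dot(X, w), np.matmul(X, w))

# Scalar operand: np.dot multiplies element-wise, np.matmul raises ValueError
scaled = np.dot(u, 2.0)
try:
    np.matmul(u, 2.0)
    matmul_accepts_scalar = True
except ValueError:
    matmul_accepts_scalar = False

# 3-D arrays: np.matmul treats them as a stack of matrices,
# np.dot sums over the last axis of A3 and the second-to-last axis of B3
A3 = np.ones((2, 3, 4))
B3 = np.ones((2, 4, 5))
print(np.matmul(A3, B3).shape)  # (2, 3, 5)
print(np.dot(A3, B3).shape)     # (2, 3, 2, 5)
print(inner_dot, inner_matmul, same_rows, scaled, matmul_accepts_scalar)
```

For the single-neuron code either function works; the distinction only matters once batches of matrices or scalars are involved.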


## Neural Network Learning with Backpropagation

This task involves implementing backpropagation for a single neuron in a neural network. The neuron processes inputs and updates parameters to minimize the Mean Squared Error (MSE) between predicted outputs and true labels.

## Mathematical Background

### Forward Pass
Compute the pre-activation \( z \) as the dot product of the weights and the input features plus the bias, then apply the sigmoid to obtain the neuron's output:

\[ z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b \]
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]
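A minimal sketch of the forward pass on the example inputs, assuming NumPy and stacking the samples as rows of a matrix so that all \( z_i \) are computed in one product; the names `X`, `w` and `b` are illustrative, not part of the task's interface.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

# Example data and initial parameters from the problem statement
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])  # (n_samples, n_features)
w = np.array([0.1, -0.2])
b = 0.0

# z_i = w . x_i + b for every sample at once
z = X @ w + b          # approximately [-0.3, 0.0, 0.3]
y_pred = sigmoid(z)    # approximately [0.4256, 0.5, 0.5744]
print(z, y_pred)
```

Because the first and third samples are negatives of each other and the bias is zero, their outputs are symmetric around 0.5.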

### Loss Calculation (MSE)
The Mean Squared Error quantifies the error between the neuron's predictions and the actual labels:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i)^2 \]
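Applied to the example with the initial parameters, the formula gives the first MSE value reported for the task. This is a sketch assuming NumPy, with illustrative variable names:

```python
import numpy as np

# Predictions with the initial parameters of the worked example
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 0.0, 0.0])
w = np.array([0.1, -0.2])
b = 0.0
y_pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))

# MSE = (1/n) * sum_i (sigma(z_i) - y_i)^2
mse = np.mean((y_pred - y) ** 2)
print(round(float(mse), 4))  # 0.3033, the first entry of mse_values in the example
```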

### Backward Pass (Gradient Calculation)
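Compute the gradient of the MSE with respect to each weight and the bias using the chain rule: the derivative of the loss with respect to the neuron's output, \( \frac{2}{n}(\sigma(z_i) - y_i) \), is multiplied by the derivative of the sigmoid, \( \sigma'(z_i) = \sigma(z_i)(1 - \sigma(z_i)) \), and by \( \partial z_i / \partial w_j = x_{ij} \) (the \( j \)-th feature of sample \( i \)), or by 1 for the bias, and the result is summed over the samples: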

\[ \frac{\partial MSE}{\partial w_j} = \frac{2}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i) \sigma'(z_i) x_{ij} \]

\[ \frac{\partial MSE}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (\sigma(z_i) - y_i) \sigma'(z_i) \]
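One way to sanity-check these formulas, sketched here as an optional step rather than part of the task, is to compare them with central finite differences of the MSE. The helper names (`mse_loss`, `analytic_grads`, `numeric_grads`) and the step size `eps=1e-6` are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(w, b, X, y):
    # MSE as a function of the parameters, used for the numerical check
    return np.mean((sigmoid(X @ w + b) - y) ** 2)

def analytic_grads(w, b, X, y):
    # dMSE/dw_j = (2/n) sum_i (sigma(z_i) - y_i) sigma'(z_i) x_ij; same for b without x_ij
    n = len(y)
    s = sigmoid(X @ w + b)
    delta = (s - y) * s * (1.0 - s)   # sigma'(z) = sigma(z)(1 - sigma(z))
    return (2.0 / n) * (X.T @ delta), (2.0 / n) * np.sum(delta)

def numeric_grads(w, b, X, y, eps=1e-6):
    # Central differences: (L(p + eps) - L(p - eps)) / (2 * eps), one parameter at a time
    gw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        gw[j] = (mse_loss(w + step, b, X, y) - mse_loss(w - step, b, X, y)) / (2 * eps)
    gb = (mse_loss(w, b + eps, X, y) - mse_loss(w, b - eps, X, y)) / (2 * eps)
    return gw, gb

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 0.0, 0.0])
w = np.array([0.1, -0.2])
b = 0.0

gw_a, gb_a = analytic_grads(w, b, X, y)
gw_n, gb_n = numeric_grads(w, b, X, y)
print(gw_a, gw_n)
print(gb_a, gb_n)
```

Close agreement between the two columns indicates that the factors \( \sigma'(z_i) \) and \( x_{ij} \) are in the right places; a missing factor typically shows up as a large mismatch.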

### Parameter Update
Update each weight and the bias by subtracting its gradient scaled by the learning rate \( \alpha \):

\[ w_j = w_j - \alpha \frac{\partial MSE}{\partial w_j} \]
\[ b = b - \alpha \frac{\partial MSE}{\partial b} \]
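A single update step on the example, sketched with NumPy under the task's settings (\( \alpha = 0.1 \)); variable names are illustrative. The MSE after one step is the second value in the example's `mse_values`, because each epoch records the loss before applying its own update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 0.0, 0.0])
w = np.array([0.1, -0.2])
b = 0.0
alpha = 0.1   # learning rate from the worked example
n = len(y)

def mse(w, b):
    return np.mean((sigmoid(X @ w + b) - y) ** 2)

mse_before = mse(w, b)

# Gradients at the current parameters
s = sigmoid(X @ w + b)
delta = (s - y) * s * (1.0 - s)
grad_w = (2.0 / n) * (X.T @ delta)
grad_b = (2.0 / n) * np.sum(delta)

# One gradient-descent step: w_j <- w_j - alpha * dMSE/dw_j, b <- b - alpha * dMSE/db
w = w - alpha * grad_w
b = b - alpha * grad_b

mse_after = mse(w, b)
print(round(float(mse_before), 4), round(float(mse_after), 4))  # 0.3033 0.2942
```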

### Practical Implementation
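Each epoch runs the forward pass, the loss computation, the backward pass and the parameter update over all samples at once (full-batch gradient descent). Repeating this for the given number of epochs steps the weights and bias against the gradient, which for a small enough learning rate lowers the MSE; in the example, the recorded MSE falls from 0.3033 in the first epoch to 0.2942 in the second.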
