# Backpropagation in Neural Networks

## Overview

Backpropagation is a fundamental algorithm used for training artificial neural networks. It computes the gradient of the loss function with respect to each weight by the chain rule, efficiently propagating errors backward through the network. This allows the weights to be adjusted to minimize the loss function, ultimately improving the performance of the neural network.

# How Backpropagation Works

## Forward propagation

- Input Layer: The input data is fed into the network.
- Hidden Layers: Each layer performs computations using weights and biases to transform the input data.
- Output Layer: The final transformation produces the output, which is compared to the actual target to calculate the loss.

### Mathematical Formulation

$$
a_i^l = f\left(z_i^l\right) = f\left(\sum_j w_{ij}^l a_j^{l-1} + b_i^l\right)
$$

where $f$ is the activation function, $z_i^l$ is the net input of neuron $i$ in layer $l$, $w_{ij}^l$ is the connection weight between neuron $j$ in layer $l-1$ and neuron $i$ in layer $l$, and $b_i^l$ is the bias of neuron $i$ in layer $l$.

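As a minimal sketch of this forward computation for one layer (the layer sizes, the particular numbers, and the choice of sigmoid for $f$ are all illustrative assumptions, not part of the text above):

```python
import numpy as np

def sigmoid(z):
    # One common choice for the activation function f
    return 1 / (1 + np.exp(-z))

# Illustrative layer: 3 inputs feeding 2 neurons (all values made up)
a_prev = np.array([0.5, -1.0, 2.0])        # activations a_j^{l-1} from the previous layer
W = np.array([[0.1, -0.2, 0.4],
              [0.3,  0.5, -0.1]])          # weights w_{ij}^l, one row per neuron i
b = np.array([0.01, -0.02])                # biases b_i^l

z = W @ a_prev + b                         # net inputs z_i^l
a = sigmoid(z)                             # activations a_i^l
print(a)
```
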
## Backward propagation

- Compute Loss: Calculate the error (loss) using a loss function (e.g., Mean Squared Error, Cross-Entropy Loss).
- Error Propagation: Propagate the error backward through the network, layer by layer.
- Gradient Calculation: Compute the gradient of the loss with respect to each weight using the chain rule.
- Weight Update: Adjust the weights by subtracting the gradient multiplied by the learning rate.

### Mathematical Formulation

- The loss function measures how well the neural network's output matches the target values. Common loss functions include:

1) **Mean Squared Error (MSE):**

$$
L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

2) **Cross-Entropy Loss:**

$$
L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
$$

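Both losses can be computed in a few lines of NumPy; the following sketch uses made-up targets and predictions purely for illustration:

```python
import numpy as np

# Illustrative targets and predictions (values made up)
y = np.array([1.0, 0.0, 1.0, 1.0])         # true labels y_i
y_hat = np.array([0.9, 0.2, 0.7, 0.6])     # predictions, assumed strictly between 0 and 1

# Mean Squared Error
mse = np.mean((y - y_hat) ** 2)

# (Binary) Cross-Entropy Loss
bce = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(f"MSE: {mse:.4f}  Cross-Entropy: {bce:.4f}")
```
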
- For each weight $w$ in the network, the gradient of the loss $L$ with respect to $w$ is computed as:

$$
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}
$$

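To make the chain rule concrete, consider (as an assumed setup, not stated above) a single sigmoid output neuron with net input $z = wx + b$, prediction $\hat{y} = \sigma(z)$, and squared-error loss $L = (y - \hat{y})^2$. The gradient then factors as:

$$
\frac{\partial L}{\partial w}
= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}
= -2\,(y - \hat{y}) \cdot \hat{y}\,(1 - \hat{y}) \cdot x
$$

The middle factor $\hat{y}(1 - \hat{y})$ is exactly what the `sigmoid_derivative` function computes in the implementation below.
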
- Weights are updated using the gradient descent algorithm:

$$
w \leftarrow w - \eta \frac{\partial L}{\partial w}
$$

where $\eta$ is the learning rate.

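Combining that gradient with the update rule, one gradient descent step for the single-neuron example above can be sketched as follows (the values of `x`, `y`, `w`, `b`, and the learning rate are all made up for illustration):

```python
import numpy as np

# Single sigmoid neuron with squared-error loss (same assumed setup as the
# worked chain-rule example above; all values are illustrative).
x, y = 1.5, 1.0      # input and target
w, b = 0.2, 0.0      # initial weight and bias
eta = 0.1            # learning rate

y_hat = 1 / (1 + np.exp(-(w * x + b)))                 # forward pass: y_hat = sigmoid(w*x + b)
grad_w = -2 * (y - y_hat) * y_hat * (1 - y_hat) * x    # chain rule: dL/dw
w = w - eta * grad_w                                   # gradient descent step

print(f"gradient: {grad_w:.4f}, updated w: {w:.4f}")
```
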
# Backpropagation from scratch

The following NumPy implementation puts the forward and backward passes together, training a small 2-2-1 network on the XOR problem.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output: sigma'(z) = sigma(z) * (1 - sigma(z))
    return x * (1 - x)

# Input data: the XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Initialize weights and biases for a 2-2-1 network
np.random.seed(42)
weights_input_hidden = np.random.rand(2, 2)
weights_hidden_output = np.random.rand(2, 1)
bias_hidden = np.random.rand(1, 2)
bias_output = np.random.rand(1, 1)
learning_rate = 0.1

# Training
for epoch in range(10000):

    # Forward pass
    hidden_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_output = sigmoid(hidden_input)
    final_input = np.dot(hidden_output, weights_hidden_output) + bias_output
    final_output = sigmoid(final_input)

    # Error at the output layer
    error = y - final_output
    d_output = error * sigmoid_derivative(final_output)

    # Backward propagation (gradient descent)
    error_hidden = d_output.dot(weights_hidden_output.T)
    d_hidden = error_hidden * sigmoid_derivative(hidden_output)

    # Update weights and biases
    weights_hidden_output += hidden_output.T.dot(d_output) * learning_rate
    bias_output += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    weights_input_hidden += X.T.dot(d_hidden) * learning_rate
    bias_hidden += np.sum(d_hidden, axis=0) * learning_rate

print("Training complete")
print("Output after training:")
print(final_output)
```
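
Two details of this implementation are worth noting. `sigmoid_derivative` expects an activation value rather than a raw net input, since $\sigma'(z) = \sigma(z)(1 - \sigma(z))$; that is why it is applied to `final_output` and `hidden_output` directly. The updates use `+=` because `error` is defined as `y - final_output`, so the sign of the gradient is already folded in. After training, the four outputs should move toward the XOR targets `[0, 1, 1, 0]` (never exactly 0 or 1, since the sigmoid only saturates asymptotically); if they are still far off, increasing the learning rate or the number of epochs usually helps.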

## Conclusion

Backpropagation is a powerful technique for training artificial neural networks (ANNs), enabling them to learn complex patterns and make accurate predictions. Understanding the mechanics and the mathematics behind it is essential to understanding the inner workings of an ANN.