### Backpropagation Example: Single Neuron with a Single Input

#### Setup
- Imagine a single neuron with one input $ x $, a weight $ w $, and a bias $ b $.
- The neuron applies a linear transformation followed by an activation function. Here, we'll use the identity function (i.e., no activation function) for simplicity.
- The output of the neuron is $ y = wx + b $.

#### Initial Values
- Input $ x = 2 $
- Weight $ w = 0.5 $
- Bias $ b = 1 $
- Target output $ y_{\text{target}} = 4 $

#### Forward Pass
1. Compute the output $ y $:
   $$
   y = wx + b = 0.5 \times 2 + 1 = 2
   $$

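2. Compute the loss using the halved squared error, i.e. the Mean Squared Error (MSE) for a single sample scaled by $ \frac{1}{2} $ so that the factor of 2 cancels in the derivative: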
   $$
   \text{Loss} = \frac{1}{2} (y_{\text{target}} - y)^2 = \frac{1}{2} (4 - 2)^2 = 2
   $$
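
The forward pass above can be reproduced in a few lines of Python. This is a minimal sketch; the function names `forward` and `half_squared_error` are illustrative choices and do not come from the original text.

```python
# Values from the worked example.
x, w, b, y_target = 2.0, 0.5, 1.0, 4.0

def forward(x, w, b):
    """Single neuron with identity activation: y = w*x + b."""
    return w * x + b

def half_squared_error(y, y_target):
    """Loss = 1/2 * (y_target - y)^2."""
    return 0.5 * (y_target - y) ** 2

y = forward(x, w, b)
loss = half_squared_error(y, y_target)
print(y, loss)  # 2.0 2.0
```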

#### Backward Pass (Backpropagation)
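1. Compute the gradient of the loss with respect to the output $ y $. Differentiating $ \frac{1}{2} (y_{\text{target}} - y)^2 $ brings down the exponent, which cancels the $ \frac{1}{2} $, and the inner derivative contributes a factor of $ -1 $:
   $$
   \frac{\partial \text{Loss}}{\partial y} = -(y_{\text{target}} - y) = y - y_{\text{target}} = 2 - 4 = -2
   $$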

2. Compute the gradient of the output $ y $ with respect to the weight $ w $ and the bias $ b $:
   $$
   \frac{\partial y}{\partial w} = x = 2
   $$
   $$
   \frac{\partial y}{\partial b} = 1
   $$

3. Compute the gradient of the loss with respect to the weight $ w $ and the bias $ b $ using the chain rule:
   $$
   \frac{\partial \text{Loss}}{\partial w} = \frac{\partial \text{Loss}}{\partial y} \times \frac{\partial y}{\partial w} = -2 \times 2 = -4
   $$
   $$
   \frac{\partial \text{Loss}}{\partial b} = \frac{\partial \text{Loss}}{\partial y} \times \frac{\partial y}{\partial b} = -2 \times 1 = -2
   $$
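
One way to sanity-check the chain-rule gradients above is to compare them with central finite differences. The sketch below assumes the same values as the example; the helper `loss_fn` and the step size `eps` are illustrative choices.

```python
x, w, b, y_target = 2.0, 0.5, 1.0, 4.0

def loss_fn(w, b):
    """Half squared error of the identity neuron for the fixed x and target."""
    y = w * x + b
    return 0.5 * (y_target - y) ** 2

# Analytic gradients from the chain rule.
y = w * x + b
dloss_dy = y - y_target      # dLoss/dy
dloss_dw = dloss_dy * x      # dLoss/dy * dy/dw
dloss_db = dloss_dy * 1.0    # dLoss/dy * dy/db

# Numerical gradients from central finite differences.
eps = 1e-6
num_dw = (loss_fn(w + eps, b) - loss_fn(w - eps, b)) / (2 * eps)
num_db = (loss_fn(w, b + eps) - loss_fn(w, b - eps)) / (2 * eps)

print(dloss_dw, dloss_db)  # -4.0 -2.0
print(abs(dloss_dw - num_dw) < 1e-6, abs(dloss_db - num_db) < 1e-6)
```

If the analytic and numerical values disagree by more than rounding error, the derivation most likely has a sign or indexing mistake.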

#### Update Weights
- Use a learning rate $ \eta $ (let's say $ \eta = 0.1 $) to update the weight $ w $ and the bias $ b $:
  $$
  w_{\text{new}} = w - \eta \frac{\partial \text{Loss}}{\partial w} = 0.5 - 0.1 \times (-4) = 0.5 + 0.4 = 0.9
  $$
  $$
  b_{\text{new}} = b - \eta \frac{\partial \text{Loss}}{\partial b} = 1 - 0.1 \times (-2) = 1 + 0.2 = 1.2
  $$
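
As a quick check that the update moves in the right direction, the sketch below applies the step and recomputes the loss with the new parameters. The gradient values are copied from the backward pass of this example; the variable names are illustrative.

```python
x, y_target = 2.0, 4.0
w, b = 0.5, 1.0
eta = 0.1
dloss_dw, dloss_db = -4.0, -2.0  # gradients from the backward pass

# Gradient-descent step: move against the gradient.
w_new = w - eta * dloss_dw
b_new = b - eta * dloss_db

loss_before = 0.5 * (y_target - (w * x + b)) ** 2
loss_after = 0.5 * (y_target - (w_new * x + b_new)) ** 2

print(round(w_new, 6), round(b_new, 6))  # 0.9 1.2
print(loss_before, round(loss_after, 6))  # 2.0 0.5
```

With the updated parameters the output becomes $ 0.9 \times 2 + 1.2 = 3 $, so the loss falls from 2 to about 0.5 after a single step.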

#### Repeat
- The forward pass, loss computation, backpropagation, and weight update are repeated for many iterations, until the loss stops decreasing or is small enough for the task.
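
The repeated cycle can be sketched as a short training loop. The number of steps (20) is an arbitrary choice for illustration, and the variable names are not from the original text.

```python
x, y_target = 2.0, 4.0
w, b = 0.5, 1.0
eta = 0.1

losses = []
for step in range(20):
    # Forward pass
    y = w * x + b
    losses.append(0.5 * (y_target - y) ** 2)

    # Backward pass (chain rule)
    dloss_dy = y - y_target
    dloss_dw = dloss_dy * x
    dloss_db = dloss_dy

    # Update
    w -= eta * dloss_dw
    b -= eta * dloss_db

print(losses[0], losses[1], losses[-1])
```

For these particular values each step multiplies the error $ y - y_{\text{target}} $ by $ 1 - \eta (x^2 + 1) = 0.5 $, so the loss shrinks by a factor of 4 per iteration.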

### Summary
1. **Forward Pass**: Compute the output and loss.
2. **Backward Pass**: Calculate gradients of the loss with respect to the weights and biases.
3. **Update Weights**: Adjust the weights and biases using the gradients and a learning rate.

The same procedure extends to a network with a single input, a hidden layer with one neuron, and a single output. The steps are identical, but there are now two layers of parameters, so the chain rule is applied once more to reach the first layer.

### Setup
- Input: $ x $
- Weights: $ w_1 $ for input to hidden layer, $ w_2 $ for hidden layer to output
- Biases: $ b_1 $ for hidden layer, $ b_2 $ for output layer
- Target output: $ y_{\text{target}} $

### Initial Values
- $ x = 2 $
- $ w_1 = 0.5 $
- $ b_1 = 1 $
- $ w_2 = 0.5 $
- $ b_2 = 1 $
- $ y_{\text{target}} = 4 $

### Forward Pass
1. Compute the output of the hidden layer (using the identity function):
   $$
   h = w_1 x + b_1 = 0.5 \times 2 + 1 = 2
   $$

2. Compute the final output $ y $ (using the identity function):
   $$
   y = w_2 h + b_2 = 0.5 \times 2 + 1 = 2
   $$

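3. Compute the loss using the halved squared error, i.e. the Mean Squared Error (MSE) for a single sample scaled by $ \frac{1}{2} $ so that the factor of 2 cancels in the derivative: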
   $$
   \text{Loss} = \frac{1}{2} (y_{\text{target}} - y)^2 = \frac{1}{2} (4 - 2)^2 = 2
   $$
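
The two-layer forward pass can be written the same way as the single-neuron case, with the hidden value `h` kept around because the backward pass needs it. This is a minimal sketch with illustrative names.

```python
x, y_target = 2.0, 4.0
w1, b1 = 0.5, 1.0   # input -> hidden
w2, b2 = 0.5, 1.0   # hidden -> output

def forward(x, w1, b1, w2, b2):
    """Two identity-activation layers; returns the hidden value and the output."""
    h = w1 * x + b1
    y = w2 * h + b2
    return h, y

h, y = forward(x, w1, b1, w2, b2)
loss = 0.5 * (y_target - y) ** 2
print(h, y, loss)  # 2.0 2.0 2.0
```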

### Backward Pass (Backpropagation)
1. Compute the gradient of the loss with respect to the output $ y $:
   $$
   \frac{\partial \text{Loss}}{\partial y} = y - y_{\text{target}} = 2 - 4 = -2
   $$

2. Compute the gradient of the output $ y $ with respect to the hidden output $ h $, weight $ w_2 $, and bias $ b_2 $:
   $$
   \frac{\partial y}{\partial h} = w_2 = 0.5
   $$
   $$
   \frac{\partial y}{\partial w_2} = h = 2
   $$
   $$
   \frac{\partial y}{\partial b_2} = 1
   $$

3. Compute the gradient of the hidden output $ h $ with respect to the weight $ w_1 $ and bias $ b_1 $:
   $$
   \frac{\partial h}{\partial w_1} = x = 2
   $$
   $$
   \frac{\partial h}{\partial b_1} = 1
   $$

4. Compute the gradient of the loss with respect to the weights $ w_2 $ and $ w_1 $, and biases $ b_2 $ and $ b_1 $:
   $$
   \frac{\partial \text{Loss}}{\partial w_2} = \frac{\partial \text{Loss}}{\partial y} \times \frac{\partial y}{\partial w_2} = -2 \times 2 = -4
   $$
   $$
   \frac{\partial \text{Loss}}{\partial b_2} = \frac{\partial \text{Loss}}{\partial y} \times \frac{\partial y}{\partial b_2} = -2 \times 1 = -2
   $$
   $$
   \frac{\partial \text{Loss}}{\partial h} = \frac{\partial \text{Loss}}{\partial y} \times \frac{\partial y}{\partial h} = -2 \times 0.5 = -1
   $$
   $$
   \frac{\partial \text{Loss}}{\partial w_1} = \frac{\partial \text{Loss}}{\partial h} \times \frac{\partial h}{\partial w_1} = -1 \times 2 = -2
   $$
   $$
   \frac{\partial \text{Loss}}{\partial b_1} = \frac{\partial \text{Loss}}{\partial h} \times \frac{\partial h}{\partial b_1} = -1 \times 1 = -1
   $$
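
The four gradients above can be computed in code and checked against central finite differences. The sketch assumes the values of this example; `loss_fn`, `eps`, and the list layout are illustrative choices.

```python
x, y_target = 2.0, 4.0
w1, b1, w2, b2 = 0.5, 1.0, 0.5, 1.0

def loss_fn(w1, b1, w2, b2):
    h = w1 * x + b1
    y = w2 * h + b2
    return 0.5 * (y_target - y) ** 2

# Forward pass (the backward pass reuses h and y).
h = w1 * x + b1
y = w2 * h + b2

# Backward pass: output layer first, then the hidden layer.
dloss_dy = y - y_target
dloss_dw2 = dloss_dy * h
dloss_db2 = dloss_dy
dloss_dh = dloss_dy * w2      # gradient flowing back into the hidden neuron
dloss_dw1 = dloss_dh * x
dloss_db1 = dloss_dh

analytic = [dloss_dw1, dloss_db1, dloss_dw2, dloss_db2]
print(analytic)  # [-2.0, -1.0, -4.0, -2.0]

# Central finite differences, one parameter at a time.
params = [w1, b1, w2, b2]
eps = 1e-6
numeric = []
for i in range(len(params)):
    plus, minus = list(params), list(params)
    plus[i] += eps
    minus[i] -= eps
    numeric.append((loss_fn(*plus) - loss_fn(*minus)) / (2 * eps))

print(all(abs(a - n) < 1e-5 for a, n in zip(analytic, numeric)))
```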

### Update Weights
- Use a learning rate $ \eta $ (let's say $ \eta = 0.1 $) to update the weights $ w_1 $, $ w_2 $ and biases $ b_1 $, $ b_2 $:
  $$
  w_{1_{\text{new}}} = w_1 - \eta \frac{\partial \text{Loss}}{\partial w_1} = 0.5 - 0.1 \times (-2) = 0.5 + 0.2 = 0.7
  $$
  $$
  b_{1_{\text{new}}} = b_1 - \eta \frac{\partial \text{Loss}}{\partial b_1} = 1 - 0.1 \times (-1) = 1 + 0.1 = 1.1
  $$
  $$
  w_{2_{\text{new}}} = w_2 - \eta \frac{\partial \text{Loss}}{\partial w_2} = 0.5 - 0.1 \times (-4) = 0.5 + 0.4 = 0.9
  $$
  $$
  b_{2_{\text{new}}} = b_2 - \eta \frac{\partial \text{Loss}}{\partial b_2} = 1 - 0.1 \times (-2) = 1 + 0.2 = 1.2
  $$

### Summary
1. **Forward Pass**: Compute the output of the hidden layer, then the final output, and calculate the loss.
2. **Backward Pass**: Calculate gradients of the loss with respect to weights and biases in both the hidden and output layers.
3. **Update Weights**: Adjust weights and biases using the gradients and a learning rate.

Repeat this process for multiple iterations until the loss is minimized. This iterative process gradually adjusts the weights and biases to minimize the error between the predicted output and the target output.
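
A compact sketch of the full loop for the two-layer example follows. The dictionary layout, the 50 steps, and the `first_update` snapshot are illustrative choices rather than part of the original text.

```python
x, y_target = 2.0, 4.0
params = {"w1": 0.5, "b1": 1.0, "w2": 0.5, "b2": 1.0}
eta = 0.1

losses = []
first_update = None
for step in range(50):
    # Forward pass
    h = params["w1"] * x + params["b1"]
    y = params["w2"] * h + params["b2"]
    losses.append(0.5 * (y_target - y) ** 2)

    # Backward pass
    dloss_dy = y - y_target
    dloss_dh = dloss_dy * params["w2"]
    grads = {
        "w1": dloss_dh * x,
        "b1": dloss_dh,
        "w2": dloss_dy * h,
        "b2": dloss_dy,
    }

    # Update all parameters using gradients computed from the old values.
    for name in params:
        params[name] -= eta * grads[name]

    if step == 0:
        first_update = dict(params)  # snapshot after the first step

print({k: round(v, 6) for k, v in first_update.items()})
print(losses[0], losses[-1])
```

The snapshot after the first step should match the hand-computed values (0.7, 1.1, 0.9, 1.2). Unlike the single-neuron case, the output here can overshoot the target on some steps, because the gradient with respect to each weight depends on the other layer's current value.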

In [4]:
import numpy as np

np.random.seed(1)

def relu(x):
    return (x > 0) * x  # returns x where x > 0,
                        # 0 otherwise

def relu2deriv(output):
    return output > 0  # True (acts as 1) where the ReLU output > 0,
                       # False (acts as 0) otherwise

streetlights = np.array( [[ 1, 0, 1 ],
                          [ 0, 1, 1 ],
                          [ 0, 0, 1 ],
                          [ 1, 1, 1 ] ] )

walk_vs_stop = np.array([[ 1, 1, 0, 0]]).T

print(walk_vs_stop)
    
alpha = 0.2
hidden_size = 4

weights_0_1 = 2*np.random.random((3,hidden_size)) - 1
weights_1_2 = 2*np.random.random((hidden_size,1)) - 1

print(weights_0_1)
print(weights_1_2)

[[1]
 [1]
 [0]
 [0]]
[[-0.16595599  0.44064899 -0.99977125 -0.39533485]
 [-0.70648822 -0.81532281 -0.62747958 -0.30887855]
 [-0.20646505  0.07763347 -0.16161097  0.370439  ]]
[[-0.5910955 ]
 [ 0.75623487]
 [-0.94522481]
 [ 0.34093502]]
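
To make the role of the two helper functions concrete, the sketch below applies them to a small hand-picked array. The functions are copied from the cell above; the sample values `z` and `delta` are made up for illustration.

```python
import numpy as np

def relu(x):
    return (x > 0) * x

def relu2deriv(output):
    return output > 0

z = np.array([-1.5, 0.0, 2.0])        # hypothetical pre-activations
a = relu(z)                           # negative and zero entries become 0
mask = relu2deriv(a)                  # boolean mask: where the unit was active

delta = np.array([0.3, -0.2, 0.5])    # hypothetical incoming gradient
masked_delta = delta * mask           # gradient passes only through active units

print(a)
print(mask)          # [False False  True]
print(masked_delta)
```

Note that `relu2deriv` is called on the layer's output rather than its input. That works because the ReLU output is positive exactly where its input was positive.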


In [7]:

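for iteration in range(60):
    layer_2_error = 0  # squared error summed over the four examples in this pass
    for i in range(len(streetlights)):
        # Forward pass on one example (slicing i:i+1 keeps the arrays 2-D)
        layer_0 = streetlights[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)

        layer_2_error += np.sum((layer_2 - walk_vs_stop[i:i+1]) ** 2)

        # Backward pass: output delta, then hidden delta via the chain rule.
        # The hidden delta uses weights_1_2 before it is updated below.
        layer_2_delta = layer_2 - walk_vs_stop[i:i+1]
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)

        # Gradient-descent update of both weight matrices
        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)

    # Report the summed error on every 10th pass
    if iteration % 10 == 9:
        print("Error:" + str(layer_2_error))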

Error:4.522200282839368e-15
ff1 [[