# Backpropagation Tutorial with a Simple Neural Network

In this tutorial, we implement a simple neural network with one hidden layer and walk through backpropagation step by step using concrete numbers.


## Numpy example

In [None]:
import numpy as np

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects the sigmoid output a = sigmoid(z), not the pre-activation z
    return a * (1 - a)

# Input data
x = np.array([1.0])

# Target
t = np.array([0.0])

# Initialize weights and biases
W1 = np.array([0.5])
b1 = np.array([0.0])
W2 = np.array([-1.0])
b2 = np.array([0.0])

# Forward pass
z = W1 * x + b1
a = sigmoid(z)
y = W2 * a + b2

# Compute loss (squared error for a single sample, scaled by 1/2)
L = 0.5 * (y - t) ** 2

print("Forward Pass:")
print("z =", z)
print("a =", a)
print("y =", y)
print("Loss L =", L)


Forward Pass:
z = [0.5]
a = [0.62245933]
y = [-0.62245933]
Loss L = [0.19372781]


### Forward Pass

In the forward pass, we compute the intermediate values for the hidden layer and the final output:

$$
z = W_1 \cdot x + b_1
$$

$$
a = \sigma(z) \quad \text{where} \quad \sigma \text{ is the sigmoid function}
$$

$$
y = W_2 \cdot a + b_2
$$

We then compute the loss \( L \) as the squared error, scaled by \( \frac{1}{2} \) so that the factor of 2 cancels when differentiating:

$$
L = \frac{1}{2} (y - t)^2
$$


In [3]:
# Backward pass
# Gradient of loss w.r.t. output y
dL_dy = y - t

# Gradient of loss w.r.t. W2 and b2
dy_dW2 = a
dL_dW2 = dL_dy * dy_dW2
dL_db2 = dL_dy

# Gradient of loss w.r.t. activation a
dy_da = W2
dL_da = dL_dy * dy_da

# Gradient of loss w.r.t. W1 and b1 (chained through z)
da_dz = sigmoid_derivative(a)  # pass the activation a: sigma'(z) = a * (1 - a)
dz_dW1 = x
dL_dW1 = dL_da * da_dz * dz_dW1
dL_db1 = dL_da * da_dz

print("Backward Pass:")
print("dL_dy =", dL_dy)
print("dL_dW2 =", dL_dW2)
print("dL_db2 =", dL_db2)
print("dL_da =", dL_da)
print("dL_dW1 =", dL_dW1)
print("dL_db1 =", dL_db1)


Backward Pass:
dL_dy = [-0.62245933]
dL_dW2 = [-0.38745562]
dL_db2 = [-0.62245933]
dL_da = [0.62245933]
dL_dW1 = [0.14628025]
dL_db1 = [0.14628025]
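
These analytic gradients can be sanity-checked against central finite differences. The sketch below is not part of the original notebook: the `loss` helper, the `params` dictionary, and the step size `eps` are names and values assumed purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(W1, b1, W2, b2, x=1.0, t=0.0):
    """Loss of the one-hidden-unit network from this tutorial (hypothetical helper)."""
    a = sigmoid(W1 * x + b1)
    y = W2 * a + b2
    return 0.5 * (y - t) ** 2

# Same starting parameters as in the cells above
params = {"W1": 0.5, "b1": 0.0, "W2": -1.0, "b2": 0.0}
eps = 1e-6  # assumed finite-difference step size

numeric_grads = {}
for name in params:
    plus = dict(params)
    minus = dict(params)
    plus[name] += eps
    minus[name] -= eps
    # Central difference: (L(p + eps) - L(p - eps)) / (2 * eps)
    numeric_grads[name] = (loss(**plus) - loss(**minus)) / (2 * eps)

for name, g in numeric_grads.items():
    print(f"dL_d{name} ~ {g:.6f}")
```

If the backward pass is implemented correctly, each printed value should agree with the corresponding analytic gradient to several decimal places.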


### Backward Pass

In the backward pass, we use the chain rule to compute the gradients of the loss with respect to each weight and bias:

$$
\frac{\partial L}{\partial y} = y - t
$$

$$
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial W_2}
$$

$$
\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial y} \cdot 1
$$

$$
\frac{\partial L}{\partial a} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial a}
$$

$$
\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial W_1}
$$

$$
\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot 1
$$
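
The local derivatives that appear in these products follow directly from the forward-pass definitions. Written out, with \( \partial a / \partial z \) expressed in terms of the activation itself:

$$
\frac{\partial y}{\partial W_2} = a, \quad
\frac{\partial y}{\partial a} = W_2, \quad
\frac{\partial a}{\partial z} = \sigma(z)\,(1 - \sigma(z)) = a\,(1 - a), \quad
\frac{\partial z}{\partial W_1} = x
$$

Substituting the values from the forward pass (rounded to four decimal places) reproduces the gradient for \( W_1 \):

$$
\frac{\partial L}{\partial W_1} = (y - t) \cdot W_2 \cdot a\,(1 - a) \cdot x \approx (-0.6225) \cdot (-1) \cdot (0.6225 \cdot 0.3775) \cdot 1 \approx 0.1463
$$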


In [4]:
# Learning rate
eta = 0.01

# Update weights and biases
W2 -= eta * dL_dW2
b2 -= eta * dL_db2
W1 -= eta * dL_dW1
b1 -= eta * dL_db1

print("Updated Parameters:")
print("W2 =", W2)
print("b2 =", b2)
print("W1 =", W1)
print("b1 =", b1)


Updated Parameters:
W2 = [-0.99612544]
b2 = [0.00622459]
W1 = [0.4985372]
b1 = [-0.0014628]


### Update Parameters

With the computed gradients, we update the weights and biases by gradient descent:

$$
W_2 \leftarrow W_2 - \eta \cdot \frac{\partial L}{\partial W_2}
$$

$$
b_2 \leftarrow b_2 - \eta \cdot \frac{\partial L}{\partial b_2}
$$

$$
W_1 \leftarrow W_1 - \eta \cdot \frac{\partial L}{\partial W_1}
$$

$$
b_1 \leftarrow b_1 - \eta \cdot \frac{\partial L}{\partial b_1}
$$


where \( \eta \) is the learning rate.
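
The tutorial applies this update rule once. Repeating the forward pass, backward pass, and update gives a training loop. The following is a minimal sketch under stated assumptions: the number of steps `n_steps` and the `losses` list are illustrative choices that do not appear in the original cells.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Same data, initial parameters, and learning rate as above
x, t = np.array([1.0]), np.array([0.0])
W1, b1 = np.array([0.5]), np.array([0.0])
W2, b2 = np.array([-1.0]), np.array([0.0])
eta = 0.01
n_steps = 100  # assumed value, chosen only for illustration

losses = []
for step in range(n_steps):
    # Forward pass
    z = W1 * x + b1
    a = sigmoid(z)
    y = W2 * a + b2
    losses.append(float((0.5 * (y - t) ** 2)[0]))

    # Backward pass (all gradients use the pre-update parameters)
    dL_dy = y - t
    dL_dW2 = dL_dy * a
    dL_db2 = dL_dy
    dL_dz = dL_dy * W2 * a * (1 - a)
    dL_dW1 = dL_dz * x
    dL_db1 = dL_dz

    # Gradient-descent update
    W2 -= eta * dL_dW2
    b2 -= eta * dL_db2
    W1 -= eta * dL_dW1
    b1 -= eta * dL_db1

print(f"loss at step 0: {losses[0]:.6f}")
print(f"loss at step {n_steps - 1}: {losses[-1]:.6f}")
```

With a learning rate this small, the recorded loss should decrease from one step to the next; how many steps are needed in practice depends on the problem.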


### Conclusion
We have demonstrated one forward pass, one backward pass, and one parameter update of a simple neural network using concrete numbers. Backpropagation computes the gradient of the loss with respect to each weight and bias using the chain rule; gradient descent then uses those gradients to update the parameters. In training, these steps are repeated until the loss stops decreasing.

## PyTorch example

In [1]:
import torch
import torch.nn.functional as F

In [25]:
# Set random seed for reproducibility (optional here: all values below are set explicitly)
torch.manual_seed(0)

# Input data (requires_grad=True also tracks dL/dx, which training does not need)
x = torch.tensor([1.0], requires_grad=True)

# True target
t = torch.tensor([0.0])

# Initialize weights and biases
W1 = torch.tensor([0.5], requires_grad=True)
b1 = torch.tensor([0.0], requires_grad=True)
W2 = torch.tensor([-1.0], requires_grad=True)
b2 = torch.tensor([0.0], requires_grad=True)

# Forward pass
z = W1 * x + b1
a = torch.sigmoid(z)
y = W2 * a + b2

# Compute loss (squared error for a single sample, scaled by 1/2)
L = 0.5 * (y - t) ** 2

print("Forward Pass:")
print("z =", z.item())
print("a =", a.item())
print("y =", y.item())
print("Loss L =", L.item())

Forward Pass:
z = 0.5
a = 0.622459352016449
y = -0.622459352016449
Loss L = 0.19372782111167908


In [26]:
# Backward pass
L.backward()

# Gradients
# dL_dy = y.grad # None: y is not a leaf tensor (unless y.retain_grad() is called before backward())
dL_dW2 = W2.grad
dL_db2 = b2.grad
# dL_da = a.grad # None: a is not a leaf tensor (unless a.retain_grad() is called before backward())
dL_dW1 = W1.grad
dL_db1 = b1.grad


print("Backward Pass:")
# print("dL_dy =", dL_dy)
print("dL_dW2 =", dL_dW2.item())
print("dL_db2 =", dL_db2.item())
# print("dL_da =", dL_da) 
print("dL_dW1 =", dL_dW1.item())
print("dL_db1 =", dL_db1.item())


Backward Pass:
dL_dW2 = -0.38745564222335815
dL_db2 = -0.622459352016449
dL_dW1 = 0.14628025889396667
dL_db1 = 0.14628025889396667
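
The commented-out lines above note that `y.grad` and `a.grad` are `None` for non-leaf tensors. One way to recover them is to call `retain_grad()` on those tensors before `backward()`. The sketch below rebuilds the same graph to show this and then applies the gradient-descent update from the NumPy section inside `torch.no_grad()`. The update loop and the reuse of `eta = 0.01` are assumptions carried over from the NumPy example, not part of the original PyTorch cells.

```python
import torch

# Same data and initial parameters as above
x = torch.tensor([1.0])
t = torch.tensor([0.0])
W1 = torch.tensor([0.5], requires_grad=True)
b1 = torch.tensor([0.0], requires_grad=True)
W2 = torch.tensor([-1.0], requires_grad=True)
b2 = torch.tensor([0.0], requires_grad=True)

# Forward pass
z = W1 * x + b1
a = torch.sigmoid(z)
y = W2 * a + b2
L = 0.5 * (y - t) ** 2

# Ask autograd to keep gradients for these non-leaf tensors
a.retain_grad()
y.retain_grad()

L.backward()

print("dL_dy =", y.grad.item())
print("dL_da =", a.grad.item())

# One gradient-descent step; no_grad keeps the update out of the autograd graph
eta = 0.01  # assumed: same learning rate as the NumPy example
with torch.no_grad():
    for p in (W1, b1, W2, b2):
        p -= eta * p.grad
        p.grad.zero_()  # clear accumulated gradients before any further backward pass

print("W2 =", W2.item())
print("b2 =", b2.item())
print("W1 =", W1.item())
print("b1 =", b1.item())
```

The printed values should match the NumPy results up to float32 precision, since PyTorch tensors default to 32-bit floats while the NumPy arrays above use 64-bit floats.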
