# Backpropagation

In [20]:
import pandas as pd
import numpy as np

![Backpropagation](backpropagation1.png)

<img src="backpropagation1.png" alt="Backpropagation diagram" style="width:50%; height:50%;">

In [22]:
# Sigmoid Function: Q(x) = 1 / (1 + e^(-x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x)) 

sigmoid(3)


# Sigmoid Derivative: Q'(x) = Q(x) * (1 - Q(x)), where x is the pre-activation input
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

Sigmoid Function (Activation Function) 

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

The derivative of the sigmoid function is:

$$
\sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))
$$
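
As a quick sanity check, the closed-form derivative can be compared against a central finite difference. This is a minimal sketch: the helper name `numerical_derivative`, the step size `eps`, and the test point `x = 0.5` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is the pre-activation input, not an already-activated value
    s = sigmoid(x)
    return s * (1 - s)

def numerical_derivative(f, x, eps=1e-6):
    # Central difference approximation of f'(x) (hypothetical helper)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.5
analytic = sigmoid_derivative(x)
numeric = numerical_derivative(sigmoid, x)
print(abs(analytic - numeric) < 1e-8)
```

If the two values disagree, the usual cause is passing an activated value where a pre-activation value is expected, or the other way around.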

- np.exp(x) computes ${e^x}$, where e is Euler's number (approximately 2.71828).
- np.exp supports both scalar and array inputs.
- math.exp(x) also computes ${e^x}$, but only supports scalar values.
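
The difference between the two functions can be seen in a short sketch. The array `values` is made up for this example.

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 2.0])

# np.exp is applied element-wise to the whole array
array_result = np.exp(values)

# math.exp works on one number at a time
scalar_result = math.exp(1.0)

# Passing a multi-element array to math.exp raises a TypeError
try:
    math.exp(values)
    raised = False
except TypeError:
    raised = True

print(array_result, scalar_result, raised)
```

This is why the sigmoid in this notebook uses np.exp: the same function then works for a single value or for a whole batch of inputs.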

In [23]:
# inputs
x1 = 0.5
x2 = 0.3

In [25]:
# expected output
y_expected = 1

# learning rate
learning_rate = 0.1

## Forward Pass

In [26]:
# weights

w1 = 0.7010
w2 = 0.3009
w3 = 0.4011
w4 = 0.6005
w5 = 0.551
w6 = 0.4595

In [27]:
# Neurons and activation function step
def l1f1(x1,x2):
    l1f1_result = x1*w1 + x2*w3 # z1 step
    return sigmoid(l1f1_result) # h1 step, activation func applied

def l1f2(x1,x2):
    l1f2_result = x1*w2 + x2*w4 #z2 step
    return sigmoid(l1f2_result) # h2 step, activation func applied

In [28]:
# Output layer
def of1(h1,h2):
    of1_result = h1*w5 + h2*w6 # z3 step
    return sigmoid(of1_result) # h3 step, activation

In [29]:
h1 = l1f1(x1,x2)
h2 = l1f2(x1,x2)

print("h1: ", round(h1,3))
print("h2: ", round(h2, 3))

h1:  0.616
h2:  0.582


In [30]:
output1 = of1(h1, h2)
print("output1: ", round(output1, 3))

output1:  0.647
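
The same forward pass can also be written in matrix form. The sketch below assumes the weight layout implied by the functions above (`w1` and `w3` feed `h1`, `w2` and `w4` feed `h2`, `w5` and `w6` feed the output); the names `W_hidden` and `w_out` are introduced here for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.3])                 # inputs x1, x2
W_hidden = np.array([[0.7010, 0.4011],   # row for h1: w1, w3
                     [0.3009, 0.6005]])  # row for h2: w2, w4
w_out = np.array([0.551, 0.4595])        # w5, w6

h = sigmoid(W_hidden @ x)   # hidden activations [h1, h2]
o = sigmoid(w_out @ h)      # network output

print(np.round(h, 3), round(float(o), 3))
```

The rounded values should match the `h1`, `h2` and `output1` printed above.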


### Loss / Cost / Error Function

In [31]:
# Mean Squared Error
def mse(y_expected, y_predicted):
    return (y_expected - y_predicted) ** 2 # squared error between the expected and the predicted output

# MSE Derivative
def mse_derivative(y_expected, y_predicted):
    return -2 * (y_expected - y_predicted)

#### Mean Squared Error (MSE)

The Mean Squared Error (MSE) is a commonly used loss function in machine learning. 
The formula for MSE is:


$$
\text{MSE} = \frac{1}{N} \sum_{i=1}^N (y_{\text{expected}, i} - y_{\text{predicted}, i})^2
$$

Where:
- $N$ is the total number of samples.

The derivative of MSE with respect to the $i$-th prediction is:

$$
\frac{\partial \text{MSE}}{\partial y_{\text{predicted}, i}} = -\frac{2}{N} (y_{\text{expected}, i} - y_{\text{predicted}, i})
$$


For a single training example, $N = 1$, so the $\frac{1}{N}$ factor equals 1 and the derivative becomes:

$$
\frac{\partial \text{MSE}}{\partial y_{\text{predicted}}} = -2 (y_{\text{expected}} - y_{\text{predicted}})
$$


##### In our case: Simplification in backpropagation
- In backpropagation, the key goal is to compute the gradient of the loss with respect to the weights.

- We did not include the $\frac{1}{N}$ factor because our example uses a single training example ($N = 1$), so the factor equals 1 and does not change the result.

- In larger neural networks trained on multiple examples, the $\frac{1}{N}$ factor averages the gradient over the examples, which helps make the weight updates more stable.
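
To illustrate the role of the $\frac{1}{N}$ factor, here is a sketch of a batched version. The function names `mse_batch` and `mse_batch_derivative` and the sample values are assumptions made for this example.

```python
import numpy as np

def mse_batch(y_expected, y_predicted):
    # Mean of the squared errors over N samples
    return np.mean((y_expected - y_predicted) ** 2)

def mse_batch_derivative(y_expected, y_predicted):
    # Gradient w.r.t. each prediction: -(2 / N) * (y_expected_i - y_predicted_i)
    n = y_expected.size
    return -2.0 / n * (y_expected - y_predicted)

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # made-up targets
y_pred = np.array([0.6, 0.2, 0.9, 0.4])   # made-up predictions

grad = mse_batch_derivative(y_true, y_pred)

# With a single sample, N = 1 and the factor disappears
single = mse_batch_derivative(y_true[:1], y_pred[:1])
print(grad, single)
```

With four samples each per-sample gradient is scaled by 1/4, so the size of the update does not grow with the batch size.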

In [32]:
first_mse = mse(y_expected, output1)
print("MSE: ", round(first_mse, 3))
print("Expected: ", y_expected, "|", "Predicted: ", round(output1, 3))

MSE:  0.124
Expected:  1 | Predicted:  0.647


## Backward Pass

#### Output

In [33]:
# Derivative of the loss function with respect to the output layer (y_predicted)
dL_o1 = mse_derivative(y_expected, output1) 
print("dL_o1: ", round(dL_o1, 3))

dL_o1:  -0.706


Gradient of the Loss w.r.t. Output $o_1$

Derivative of the loss with respect to $o_1$:

$$
\frac{\partial L}{\partial o_1} = -2 \cdot (y_{\text{expected}} - o_1)
$$

Where:
- $y_{\text{expected}}$ is the true value.
- $o_1$ is the predicted output.
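
The formula can be checked numerically. This sketch uses the rounded output 0.647 from the forward pass; the step size `eps` is an arbitrary small value chosen for the example.

```python
def mse(y_expected, y_predicted):
    return (y_expected - y_predicted) ** 2

def mse_derivative(y_expected, y_predicted):
    return -2 * (y_expected - y_predicted)

y_expected = 1
o1 = 0.647   # rounded forward-pass output
eps = 1e-6

analytic = mse_derivative(y_expected, o1)
# Central difference approximation of dL/do1
numeric = (mse(y_expected, o1 + eps) - mse(y_expected, o1 - eps)) / (2 * eps)
print(round(analytic, 3), round(numeric, 3))
```

The gradient is negative, which means that increasing $o_1$ toward the expected value of 1 decreases the loss.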


In [34]:
# Local gradient of the output activation: do1/dz3 = o1 * (1 - o1).
# output1 is already sigmoid(z3), so the sigmoid is not applied again.
do1_z3 = output1 * (1 - output1)

# Chain rule: dL/dz3 = dL/do1 * do1/dz3
dL_z3 = dL_o1 * do1_z3
print("do1_z3: ", round(do1_z3, 3))
print("dL_z3: ", round(dL_z3, 3))

do1_z3:  0.228
dL_z3:  -0.161


- Gradient of Output w.r.t. ${z_3}$ (Pre-activation Output)

Using the sigmoid derivative, and noting that $o_1 = \sigma(z_3)$:

$$
\frac{\partial o_1}{\partial z_3} = \sigma(z_3) \cdot (1 - \sigma(z_3)) = o_1 \cdot (1 - o_1)
$$

- Chain Rule: Loss Gradient w.r.t. ${z_3}$

Using the chain rule:

$$
\frac{\partial L}{\partial z_3} = \frac{\partial L}{\partial o_1} \cdot \frac{\partial o_1}{\partial z_3}
$$


#### Output Layer Weights

In [35]:
# z3 = h1*w5 + h2*w6, so dz3/dw5 = h1 and dz3/dw6 = h2
dw5 = dL_z3 * h1
dw6 = dL_z3 * h2
print("dw5: ", round(dw5, 3))
print("dw6: ", round(dw6, 3))

dw5:  -0.099
dw6:  -0.094


#### ${H_2}$

In [36]:
# dL/dz2 = dL/dz3 * dz3/dh2 * dh2/dz2, where dh2/dz2 = h2 * (1 - h2)
dL_z2 = dL_z3 * w6 * h2 * (1 - h2)
dw2 = dL_z2 * x1
dw4 = dL_z2 * x2

#### ${H_1}$

In [37]:
# dL/dz1 = dL/dz3 * dz3/dh1 * dh1/dz1, where dh1/dz1 = h1 * (1 - h1)
dL_z1 = dL_z3 * w5 * h1 * (1 - h1)
dw1 = dL_z1 * x1
dw3 = dL_z1 * x2

#### Update weights

In [38]:
# Gradient descent step: move each weight against its gradient
w1 -= learning_rate * dw1
w2 -= learning_rate * dw2
w3 -= learning_rate * dw3
w4 -= learning_rate * dw4
w5 -= learning_rate * dw5
w6 -= learning_rate * dw6

In [41]:
print(f"Updated Weights:\nw1={round(w1, 4)}, w2={round(w2, 4)}, w3={round(w3, 4)}, \nw4={round(w4, 4)}, w5={round(w5, 4)}, w6={round(w6, 4)}")


Updated Weights:
w1=0.7021, w2=0.3018, w3=0.4017, 
w4=0.601, w5=0.5609, w6=0.4689


In [42]:
# Re-run forward pass to verify improvements.
# l1f1, l1f2 and of1 read the updated module-level weights, so only the
# inputs are passed, and each function returns just the activated value.
h1 = l1f1(x1, x2)
h2 = l1f2(x1, x2)
output2 = of1(h1, h2)
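
A compact way to verify a training step is to pass the weights explicitly instead of reading them from module-level variables. The following is a self-contained sketch, not the notebook's own code: the `forward` and `backward` helpers and the weight dictionary are hypothetical names introduced for illustration. It performs one gradient-descent step with the values used above and checks that the loss decreases.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x1, x2, w):
    # w is a dict with keys "w1".."w6" (layout assumed from the cells above)
    h1 = sigmoid(x1 * w["w1"] + x2 * w["w3"])
    h2 = sigmoid(x1 * w["w2"] + x2 * w["w4"])
    o1 = sigmoid(h1 * w["w5"] + h2 * w["w6"])
    return h1, h2, o1

def backward(x1, x2, y, w):
    h1, h2, o1 = forward(x1, x2, w)
    dL_z3 = -2 * (y - o1) * o1 * (1 - o1)     # dL/do1 * do1/dz3
    dL_z1 = dL_z3 * w["w5"] * h1 * (1 - h1)   # back through h1
    dL_z2 = dL_z3 * w["w6"] * h2 * (1 - h2)   # back through h2
    return {"w1": dL_z1 * x1, "w2": dL_z2 * x1,
            "w3": dL_z1 * x2, "w4": dL_z2 * x2,
            "w5": dL_z3 * h1, "w6": dL_z3 * h2}

x1, x2, y, lr = 0.5, 0.3, 1, 0.1
w = {"w1": 0.7010, "w2": 0.3009, "w3": 0.4011,
     "w4": 0.6005, "w5": 0.551, "w6": 0.4595}

loss_before = (y - forward(x1, x2, w)[2]) ** 2
grads = backward(x1, x2, y, w)
w_new = {k: w[k] - lr * grads[k] for k in w}   # gradient descent step
loss_after = (y - forward(x1, x2, w_new)[2]) ** 2

print(loss_after < loss_before)
```

Because every gradient here is negative and the weights move against the gradient, all six weights increase, which pushes the output closer to the expected value of 1.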