# **Part 1: Manual training of a multi-layer feedforward network (example-by-example training)**

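In this part, I manually train a small multi-layer feedforward network (one input, one sigmoid hidden neuron and one sigmoid output neuron) using **example-by-example training**. In this scheme, the weights are updated after each training example rather than once after the whole training set. The training process consists of three main steps: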

1. **Training the network with the first input $x_1 = 0.7853$ and target $t_1 = 0.707$**.
2. **Training the network with the second input $x_2 = 1.57$ and target $t_2 = 1.0$**, using the updated weights from the first input.
3. **Calculating the Mean Square Error (MSE)** for both inputs after training.

## **Initial Setup**

- Starting weights for this epoch (the weights obtained after epoch 1):
  - $a_0 = 0.301444$
  - $a_1 = 0.201954$
  - $b_0 = -0.0844103$
  - $b_1 = 0.409993$
- Learning rate: $\beta = 0.1$

The network will be trained using two inputs and their corresponding target values:

- Input 1: $x_1 = 0.7853$, Target: $t_1 = 0.707$
- Input 2: $x_2 = 1.57$, Target: $t_2 = 1.0$


In [None]:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Given initial weights after epoch 1
a_0 = 0.301444
a_1 = 0.201954
b_0 = -0.0844103
b_1 = 0.409993

# Inputs and targets
x1 = 0.7853
t1 = 0.707
x2 = 1.57
t2 = 1.0

# Learning rate
beta = 0.1

# Print the initial conditions and setup
print("=== Initial Setup ===")
print(f"Initial weights: a_0 = {a_0}, a_1 = {a_1}, b_0 = {b_0}, b_1 = {b_1}")
print(f"Input 1: x1 = {x1}, Target 1: t1 = {t1}")
print(f"Input 2: x2 = {x2}, Target 2: t2 = {t2}")
print(f"Learning rate: beta = {beta}\n")


## **Step 1**

### **Part 1(i): Weight Adjustment for the First Input $x_1$**

#### **Forward Pass**

1. Compute the weighted sum for the hidden layer:
   $$
   u_1 = a_0 + a_1 \cdot x_1 = 0.301444 + 0.201954 \cdot 0.7853 = 0.460038
   $$
2. Apply the sigmoid activation to get the hidden layer output:
   $$
   y_1 = \frac{1}{1 + e^{-u_1}} = \frac{1}{1 + e^{-0.460038}} = 0.613023
   $$
3. Compute the weighted sum for the output layer:
   $$
   v_1 = b_0 + b_1 \cdot y_1 = -0.0844103 + 0.409993 \cdot 0.613023 = 0.166925
   $$
4. Apply the sigmoid activation to get the network output:
   $$
   z_1 = \frac{1}{1 + e^{-v_1}} = \frac{1}{1 + e^{-0.166925}} = 0.541635
   $$
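
The four forward-pass equations can be bundled into a single helper, so the same computation is reused for every input. This is a minimal sketch that assumes the weights are passed as plain floats. The helper name `forward` is hypothetical and does not appear in the original notebook.

```python
import math

def sigmoid(s: float) -> float:
    """Logistic activation 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, a0, a1, b0, b1):
    """Forward pass of the 1-1-1 network (hypothetical helper); returns all intermediates."""
    u = a0 + a1 * x      # hidden-layer weighted sum
    y = sigmoid(u)       # hidden-layer output
    v = b0 + b1 * y      # output-layer weighted sum
    z = sigmoid(v)       # network output
    return u, y, v, z

# Reproduce the hand calculation for x_1 using the weights after epoch 1
u1, y1, v1, z1 = forward(0.7853, 0.301444, 0.201954, -0.0844103, 0.409993)
print(f"u1={u1:.6f} y1={y1:.6f} v1={v1:.6f} z1={z1:.6f}")
```

Returning every intermediate value, not just `z`, keeps `y` and `z` available for the backpropagation step.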

#### **Error Calculation**

Calculate the error for the first input:
$$
E_1 = (z_1 - t_1)^2 = (0.541635 - 0.707)^2 = 0.0273457
$$

#### **Backpropagation and Weight Update**

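1. Compute the error signal (delta) for the output neuron, where $z_1(1 - z_1)$ is the sigmoid derivative at $v_1$:
   $$
   p_1 = (z_1 - t_1) \cdot z_1 \cdot (1 - z_1) = (0.541635 - 0.707) \cdot 0.541635 \cdot (1 - 0.541635) = -0.041055
   $$
   This uses $(z_1 - t_1)$ rather than $2(z_1 - t_1)$, so it is the derivative of $\tfrac{1}{2}(z_1 - t_1)^2$. Relative to $E_1 = (z_1 - t_1)^2$, the factor 2 is absorbed into $\beta$.
2. Back-propagate the error signal to the hidden neuron through $b_1$, where $y_1(1 - y_1)$ is the sigmoid derivative at $u_1$:
   $$
   q_1 = p_1 \cdot b_1 \cdot y_1 \cdot (1 - y_1) = -0.041055 \cdot 0.409993 \cdot 0.613023 \cdot (1 - 0.613023) = -0.003993
   $$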
3. Update the weights:
   - $\Delta a_0 = -\beta \cdot q_1 = 0.000399$, so $a_0' = 0.301843$
   - $\Delta a_1 = -\beta \cdot q_1 \cdot x_1 = 0.000314$, so $a_1' = 0.202268$
   - $\Delta b_0 = -\beta \cdot p_1 = 0.004105$, so $b_0' = -0.080305$
   - $\Delta b_1 = -\beta \cdot p_1 \cdot y_1 = 0.002517$, so $b_1' = 0.412510$
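
One way to sanity-check the backpropagation formulas is to compare them with a central finite-difference estimate of the derivatives. This sketch assumes the loss $\tfrac{1}{2}(z - t)^2$, whose derivative with respect to $z$ is exactly $(z - t)$. With $E = (z - t)^2$, every derivative would simply be twice as large. The helper names `loss` and `numeric_grad` are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def loss(w, x, t):
    """Half squared error 0.5 * (z - t)^2 for weights w = [a0, a1, b0, b1]."""
    a0, a1, b0, b1 = w
    z = sigmoid(b0 + b1 * sigmoid(a0 + a1 * x))
    return 0.5 * (z - t) ** 2

def numeric_grad(w, x, t, h=1e-6):
    """Central-difference estimate of dLoss/dw for each weight."""
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        grads.append((loss(w_plus, x, t) - loss(w_minus, x, t)) / (2 * h))
    return grads

w = [0.301444, 0.201954, -0.0844103, 0.409993]
x, t = 0.7853, 0.707

# Analytic gradients built from the deltas p and q defined above
y = sigmoid(w[0] + w[1] * x)
z = sigmoid(w[2] + w[3] * y)
p = (z - t) * z * (1 - z)
q = p * w[3] * y * (1 - y)
analytic = [q, q * x, p, p * y]   # dLoss/da0, dLoss/da1, dLoss/db0, dLoss/db1

numeric = numeric_grad(w, x, t)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(f"analytic = {analytic}")
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A near-zero difference indicates that $p_1$, $p_1 y_1$, $q_1$ and $q_1 x_1$ are the partial derivatives that the update rules rely on.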


In [None]:
# Step 1: Training with the first input (x1)
print("=== Step 1: Training with First Input (x1) ===")

# Step 1.1: Forward pass for the first input
print("=== Forward Pass ===")
u1 = a_0 + a_1 * x1
print(f"u1 = a_0 + a_1 * x1 = {a_0} + {a_1} * {x1} = {u1}")
y1 = sigmoid(u1)
print(f"y1 = sigmoid(u1) = sigmoid({u1}) = {y1}")
v1 = b_0 + b_1 * y1
print(f"v1 = b_0 + b_1 * y1 = {b_0} + {b_1} * {y1} = {v1}")
z1 = sigmoid(v1)
print(f"z1 = sigmoid(v1) = sigmoid({v1}) = {z1}\n")

# Step 1.2: Error calculation for the first input
print("=== Error Calculation ===")
error_1 = (z1 - t1) ** 2
print(f"Error for x1: E1 = (z1 - t1)^2 = ({z1} - {t1})^2 = {error_1}\n")

# Step 1.3: Backpropagation - Compute gradients for the first input
print("=== Backpropagation - Gradient Calculation ===")
p1 = (z1 - t1) * sigmoid_derivative(v1)
print(f"p1 (output layer gradient) = (z1 - t1) * sigmoid_derivative(v1) = ({z1} - {t1}) * sigmoid_derivative({v1}) = {p1}")
q1 = p1 * b_1 * sigmoid_derivative(u1)
print(f"q1 (hidden layer gradient) = p1 * b_1 * sigmoid_derivative(u1) = {p1} * {b_1} * sigmoid_derivative({u1}) = {q1}\n")

# Step 1.4: Update weights based on the first input
print("=== Weight Updates ===")
delta_b0_1 = -beta * p1
delta_b1_1 = -beta * p1 * y1
delta_a0_1 = -beta * q1
delta_a1_1 = -beta * q1 * x1

# New weights after the first input
a_0_new = a_0 + delta_a0_1
a_1_new = a_1 + delta_a1_1
b_0_new = b_0 + delta_b0_1
b_1_new = b_1 + delta_b1_1

print(f"Updated a_0: a_0_new = a_0 + delta_a0 = {a_0} + {delta_a0_1} = {a_0_new}")
print(f"Updated a_1: a_1_new = a_1 + delta_a1 = {a_1} + {delta_a1_1} = {a_1_new}")
print(f"Updated b_0: b_0_new = b_0 + delta_b0 = {b_0} + {delta_b0_1} = {b_0_new}")
print(f"Updated b_1: b_1_new = b_1 + delta_b1 = {b_1} + {delta_b1_1} = {b_1_new}\n")


## **Step 2**

### **Training with Second Input $x_2$**

#### **Forward Pass**

1. Compute the weighted sum for the hidden layer:
   $$
   u_2 = a_0' + a_1' \cdot x_2 = 0.301843 + 0.202268 \cdot 1.57 = 0.619403
   $$
2. Apply the sigmoid activation to get the hidden layer output:
   $$
   y_2 = \frac{1}{1 + e^{-u_2}} = \frac{1}{1 + e^{-0.619403}} = 0.650083
   $$
3. Compute the weighted sum for the output layer:
   $$
   v_2 = b_0' + b_1' \cdot y_2 = -0.080305 + 0.412510 \cdot 0.650083 = 0.187861
   $$
4. Apply the sigmoid activation to get the network output:
   $$
   z_2 = \frac{1}{1 + e^{-v_2}} = \frac{1}{1 + e^{-0.187861}} = 0.546828
   $$

#### **Error Calculation**

Calculate the error for the second input:
$$
E_2 = (z_2 - t_2)^2 = (0.546828 - 1.0)^2 = 0.205365
$$

#### **Backpropagation and Weight Update**

1. Compute the error signal (delta) for the output neuron:
   $$
   p_2 = (z_2 - t_2) \cdot z_2 \cdot (1 - z_2) = (0.546828 - 1.0) \cdot 0.546828 \cdot (1 - 0.546828) = -0.112299
   $$
2. Back-propagate the error signal to the hidden neuron through the updated weight $b_1'$:
   $$
   q_2 = p_2 \cdot b_1' \cdot y_2 \cdot (1 - y_2) = -0.112299 \cdot 0.412510 \cdot 0.650083 \cdot (1 - 0.650083) = -0.010538
   $$
3. Update the weights:
   - $\Delta a_0 = -\beta \cdot q_2 = 0.001054$, so $a_0'' = 0.302897$
   - $\Delta a_1 = -\beta \cdot q_2 \cdot x_2 = 0.001654$, so $a_1'' = 0.203922$
   - $\Delta b_0 = -\beta \cdot p_2 = 0.011230$, so $b_0'' = -0.069075$
   - $\Delta b_1 = -\beta \cdot p_2 \cdot y_2 = 0.007300$, so $b_1'' = 0.419810$
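
Steps 1 and 2 apply the same update rule to one example at a time, carrying the weights forward from one example to the next. That loop can be sketched as follows. `train_example` and `train_epoch` are hypothetical helper names, and the deltas follow this document's convention $p = (z - t)\,z\,(1 - z)$.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_example(w, x, t, beta):
    """One example-by-example update; returns new weights [a0, a1, b0, b1]."""
    a0, a1, b0, b1 = w
    y = sigmoid(a0 + a1 * x)            # hidden output
    z = sigmoid(b0 + b1 * y)            # network output
    p = (z - t) * z * (1 - z)           # output-layer delta
    q = p * b1 * y * (1 - y)            # hidden-layer delta (uses b1 before this update)
    return [a0 - beta * q,
            a1 - beta * q * x,
            b0 - beta * p,
            b1 - beta * p * y]

def train_epoch(w, samples, beta):
    """Present each (x, t) pair in order, updating the weights after every one."""
    for x, t in samples:
        w = train_example(w, x, t, beta)
    return w

w_final = train_epoch([0.301444, 0.201954, -0.0844103, 0.409993],
                      [(0.7853, 0.707), (1.57, 1.0)], beta=0.1)
print([round(v, 6) for v in w_final])
```

Because the second example is processed with the weights produced by the first, the order of the samples affects the result, which is the defining property of example-by-example training.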

In [None]:
# Step 2: Training with the second input (x2)
print("=== Step 2: Training with Second Input (x2) ===")

# Step 2.1: Forward pass for the second input with updated weights
print("=== Forward Pass ===")
u2 = a_0_new + a_1_new * x2
print(f"u2 = a_0_new + a_1_new * x2 = {a_0_new} + {a_1_new} * {x2} = {u2}")
y2 = sigmoid(u2)
print(f"y2 = sigmoid(u2) = sigmoid({u2}) = {y2}")
v2 = b_0_new + b_1_new * y2
print(f"v2 = b_0_new + b_1_new * y2 = {b_0_new} + {b_1_new} * {y2} = {v2}")
z2 = sigmoid(v2)
print(f"z2 = sigmoid(v2) = sigmoid({v2}) = {z2}\n")

# Step 2.2: Error calculation for the second input
print("=== Error Calculation ===")
error_2 = (z2 - t2) ** 2
print(f"Error for x2: E2 = (z2 - t2)^2 = ({z2} - {t2})^2 = {error_2}\n")

# Step 2.3: Backpropagation - Compute gradients for the second input
print("=== Backpropagation - Gradient Calculation ===")
p2 = (z2 - t2) * sigmoid_derivative(v2)
print(f"p2 (output layer gradient) = (z2 - t2) * sigmoid_derivative(v2) = ({z2} - {t2}) * sigmoid_derivative({v2}) = {p2}")
q2 = p2 * b_1_new * sigmoid_derivative(u2)
print(f"q2 (hidden layer gradient) = p2 * b_1_new * sigmoid_derivative(u2) = {p2} * {b_1_new} * sigmoid_derivative({u2}) = {q2}\n")

# Step 2.4: Update weights based on the second input
print("=== Weight Updates ===")
delta_b0_2 = -beta * p2
delta_b1_2 = -beta * p2 * y2
delta_a0_2 = -beta * q2
delta_a1_2 = -beta * q2 * x2

# Final updated weights after the second input
a_0_final = a_0_new + delta_a0_2
a_1_final = a_1_new + delta_a1_2
b_0_final = b_0_new + delta_b0_2
b_1_final = b_1_new + delta_b1_2

print(f"Updated a_0: a_0_final = a_0_new + delta_a0_2 = {a_0_new} + {delta_a0_2} = {a_0_final}")
print(f"Updated a_1: a_1_final = a_1_new + delta_a1_2 = {a_1_new} + {delta_a1_2} = {a_1_final}")
print(f"Updated b_0: b_0_final = b_0_new + delta_b0_2 = {b_0_new} + {delta_b0_2} = {b_0_final}")
print(f"Updated b_1: b_1_final = b_1_new + delta_b1_2 = {b_1_new} + {delta_b1_2} = {b_1_final}\n")


## **Step 3**

### **Part 1(ii): Network Output and MSE Calculation for Both Inputs**

After updating the weights for both inputs, we calculate the final network outputs and the Mean Square Error (MSE).

#### **Final Forward Pass for $x_1$ (Recalculating $z_1'$)**

1. **Compute the weighted sum for the hidden layer**:
   $$
   u_1' = a_0'' + a_1'' \cdot x_1 = 0.302897 + 0.203922 \cdot 0.7853 = 0.463037
   $$
2. **Compute the hidden layer output**:
   $$
   y_1' = \frac{1}{1 + e^{-u_1'}} = \frac{1}{1 + e^{-0.463037}} = 0.613734
   $$
3. **Compute the weighted sum for the output layer**:
   $$
   v_1' = b_0'' + b_1'' \cdot y_1' = -0.069075 + 0.419810 \cdot 0.613734 = 0.188577
   $$
4. **Compute the output for $x_1$**:
   $$
   z_1' = \frac{1}{1 + e^{-v_1'}} = \frac{1}{1 + e^{-0.188577}} = 0.547005
   $$
5. **Final error for $x_1$**:
   $$
   E_1' = (z_1' - t_1)^2 = (0.547005 - 0.707)^2 = 0.025598
   $$

#### **Final Forward Pass for $x_2$ (Recalculating $z_2'$)**

1. **Compute the weighted sum for the hidden layer**:
   $$
   u_2' = a_0'' + a_1'' \cdot x_2 = 0.302897 + 0.203922 \cdot 1.57 = 0.623055
   $$
2. **Compute the hidden layer output**:
   $$
   y_2' = \frac{1}{1 + e^{-u_2'}} = \frac{1}{1 + e^{-0.623055}} = 0.650913
   $$
3. **Compute the weighted sum for the output layer**:
   $$
   v_2' = b_0'' + b_1'' \cdot y_2' = -0.069075 + 0.419810 \cdot 0.650913 = 0.204185
   $$
4. **Compute the output for $x_2$**:
   $$
   z_2' = \frac{1}{1 + e^{-v_2'}} = \frac{1}{1 + e^{-0.204185}} = 0.550870
   $$
5. **Final error for $x_2$**:
   $$
   E_2' = (z_2' - t_2)^2 = (0.550870 - 1.0)^2 = 0.201718
   $$
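
Both final forward passes use the same weights, so they can be evaluated together in one vectorized computation with NumPy arrays. This is a minimal sketch that assumes the rounded final weights from Step 2. The array names are illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Final weights after both updates (rounded values from Step 2)
a0, a1, b0, b1 = 0.302897, 0.203922, -0.069075, 0.419810

x = np.array([0.7853, 1.57])   # inputs x_1, x_2
t = np.array([0.707, 1.0])     # targets t_1, t_2

u = a0 + a1 * x                # hidden weighted sums, one per input
y = sigmoid(u)                 # hidden outputs
v = b0 + b1 * y                # output weighted sums
z = sigmoid(v)                 # network outputs z_1', z_2'
E = (z - t) ** 2               # squared error per input

print("u =", np.round(u, 6))
print("z =", np.round(z, 6))
print("E =", np.round(E, 6))
```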

#### **Mean Square Error (MSE)**

Finally, we calculate the Mean Square Error (MSE) as the average of the final errors for both inputs:
$$
\text{MSE} = \frac{E_1' + E_2'}{2} = \frac{0.025598 + 0.201718}{2} = 0.113658
$$
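
To see what this epoch achieved, the MSE can be evaluated with the weights from before and after the two updates. This comparison is an addition to the original exercise. The helper `mse` is hypothetical, and the "before" weights are the ones listed in the Initial Setup.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def mse(w, x, t):
    """Mean of (z - t)^2 over all inputs for weights w = (a0, a1, b0, b1)."""
    a0, a1, b0, b1 = w
    z = sigmoid(b0 + b1 * sigmoid(a0 + a1 * x))
    return float(np.mean((z - t) ** 2))

x = np.array([0.7853, 1.57])
t = np.array([0.707, 1.0])

w_before = (0.301444, 0.201954, -0.0844103, 0.409993)   # weights after epoch 1
w_after = (0.302897, 0.203922, -0.069075, 0.419810)     # weights after this epoch

mse_before = mse(w_before, x, t)
mse_after = mse(w_after, x, t)
print(f"MSE before: {mse_before:.6f}, after: {mse_after:.6f}")
```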

In [None]:
# Step 3: MSE Calculation for both inputs using final weights
print("=== Step 3: MSE Calculation ===")

# Final forward pass for the first input
u1_final = a_0_final + a_1_final * x1
y1_final = sigmoid(u1_final)
v1_final = b_0_final + b_1_final * y1_final
z1_final = sigmoid(v1_final)
error_1_final = (z1_final - t1) ** 2
print(f"Final output for x1: z1_final = {z1_final}, Final error for x1: E1_final = {error_1_final}")

# Final forward pass for the second input
u2_final = a_0_final + a_1_final * x2
y2_final = sigmoid(u2_final)
v2_final = b_0_final + b_1_final * y2_final
z2_final = sigmoid(v2_final)
error_2_final = (z2_final - t2) ** 2
print(f"Final output for x2: z2_final = {z2_final}, Final error for x2: E2_final = {error_2_final}\n")

# MSE calculation
mse = (error_1_final + error_2_final) / 2
print(f"Mean Square Error (MSE) = ({error_1_final} + {error_2_final}) / 2 = {mse}")


## **Conclusion**

### **Summary of Results:**

1. **Part 1(i)**:
   - Initial output for $x_1 = 0.7853$: $z_1 = 0.541635$
   - Error for $x_1$: $E_1 = 0.0273457$
   - Updated weights after training on $x_1$:
     - $a_0' = 0.301843$
     - $a_1' = 0.202268$
     - $b_0' = -0.080305$
     - $b_1' = 0.412510$

2. **Part 1(ii)**:
   - Initial output for $x_2 = 1.57$: $z_2 = 0.546828$
   - Error for $x_2$: $E_2 = 0.205365$
   - Final output for $x_1$ after weight updates: $z_1' = 0.547005$
   - Final output for $x_2$ after weight updates: $z_2' = 0.550870$
   - Mean Square Error (MSE): $0.1137$

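After one epoch of **example-by-example training** on both inputs, the final Mean Square Error (MSE) is **0.1137** (0.113658). Most of this error comes from $x_2$: its output $z_2' = 0.550870$ is still far from the target $t_2 = 1.0$, whereas $z_1' = 0.547005$ is much closer to $t_1 = 0.707$.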