# Neural Networks: From Perceptron to MLP

	• Linear Threshold Units
	• Backpropagation Algorithm (Gradient Chains)
	• Activation Functions (Sigmoid, ReLU, Tanh)
	• Python: Manual Backprop on 2-Layer Network


⸻

1. Linear Threshold Units (LTUs)

A perceptron is the simplest neural unit:

$$
y = \begin{cases}
1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \\
0 & \text{otherwise}
\end{cases}
$$
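	•	Draws a single linear decision boundary, so it works only for linearly separable problems (XOR is the classic counterexample).
	•	No hidden layers → limited expressiveness; stacking units into a multi-layer perceptron (MLP) removes this limit.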
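
As a minimal sketch of the threshold rule above, the snippet below implements one unit in NumPy. The function name `ltu` and the hand-picked AND weights are illustrative assumptions, not part of the original module. The coarse grid search at the end only illustrates, and does not prove, that no single unit reproduces XOR.

```python
import numpy as np

def ltu(x, w, b):
    """Linear threshold unit: outputs 1 where w.x + b > 0, else 0."""
    return (x @ w + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked weights that realize logical AND (illustrative assumption).
w_and, b_and = np.array([1.0, 1.0]), -1.5
and_out = ltu(X, w_and, b_and)
print(and_out)  # [0 0 0 1]

# Search a coarse grid of (w1, w2, b) for a unit that reproduces XOR.
xor_target = np.array([0, 1, 1, 0])
grid = np.linspace(-2, 2, 9)
xor_found = any(
    np.array_equal(ltu(X, np.array([w1, w2]), b), xor_target)
    for w1 in grid for w2 in grid for b in grid
)
print(xor_found)  # False: XOR is not linearly separable
```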

⸻

2. Activation Functions

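| Function | Formula | Notes |
|---|---|---|
| Sigmoid | $\sigma(z) = \frac{1}{1 + e^{-z}}$ | Smooth, bounded in (0, 1), but saturates for large-magnitude inputs |
| Tanh | $\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$ | Zero-centered, bounded in (-1, 1) |
| ReLU | $\text{ReLU}(z) = \max(0, z)$ | Sparse activations, fast to compute, unbounded above |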
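
The three functions and their derivatives can be sketched in NumPy as follows. The helper names (`d_sigmoid`, `d_tanh`, `relu`, `d_relu`) are assumptions made for this example, and taking the ReLU derivative to be 0 at exactly z = 0 is a common convention rather than something the table specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def d_tanh(z):
    # np.tanh is the activation itself; its derivative is 1 - tanh(z)^2.
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    # Convention assumed here: derivative is 0 at z == 0.
    return (z > 0).astype(float)

z = np.array([-10.0, 0.0, 10.0])
print(d_sigmoid(z))  # tiny at both extremes (saturation), 0.25 at z = 0
print(d_tanh(z))     # tiny at both extremes, 1.0 at z = 0
print(d_relu(z))     # [0. 0. 1.]
```

The near-zero derivatives at the extremes are what "saturates" means in practice: gradients flowing through a saturated sigmoid or tanh unit shrink toward zero.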



⸻

3. Backpropagation Algorithm

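Goal: Minimize the loss by adjusting the weights with gradient descent.
	1.	Forward Pass: Compute outputs layer by layer.
	2.	Loss: Measure the error between predictions and targets, typically with MSE or cross-entropy.
	3.	Backward Pass:
	•	Use the chain rule to propagate gradients from the output layer back toward the input.
	•	Update each weight against its gradient, where $\eta$ is the learning rate:

$$
w \leftarrow w - \eta \frac{\partial L}{\partial w}
$$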
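
For a concrete case, the chain rule can be unrolled for a 2-layer sigmoid network with MSE loss, the setup used in Section 4. The notation below ($\delta_1$, $\delta_2$, $N$ for the number of target entries, $\odot$ for the elementwise product) is introduced here for illustration. The forward pass is:

$$
\begin{aligned}
z_1 &= X W_1 + b_1, & a_1 &= \sigma(z_1), \\
z_2 &= a_1 W_2 + b_2, & \hat{y} &= \sigma(z_2), \\
L &= \frac{1}{N} \sum_{i} (y_i - \hat{y}_i)^2 .
\end{aligned}
$$

Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the backward pass gives:

$$
\begin{aligned}
\delta_2 &= \frac{\partial L}{\partial z_2} = \frac{2}{N} (\hat{y} - y) \odot \sigma'(z_2), \\
\frac{\partial L}{\partial W_2} &= a_1^\top \delta_2, \qquad \frac{\partial L}{\partial b_2} = \sum_{\text{rows}} \delta_2, \\
\delta_1 &= \frac{\partial L}{\partial z_1} = (\delta_2 W_2^\top) \odot \sigma'(z_1), \\
\frac{\partial L}{\partial W_1} &= X^\top \delta_1, \qquad \frac{\partial L}{\partial b_1} = \sum_{\text{rows}} \delta_1 .
\end{aligned}
$$

Each $\delta$ reuses the one from the layer after it, which is why the gradients are computed from the output backward.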

⸻

4. Python: Manual Backprop on a 2-Layer MLP

```python
import numpy as np

# Sigmoid activation and its derivative with respect to the pre-activation x
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)

# Data: XOR problem (4 samples, 2 features, 1 target each)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Initialize weights randomly and biases at zero
np.random.seed(42)
input_size, hidden_size, output_size = 2, 2, 1
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))

# Training with full-batch gradient descent
lr = 0.1
for epoch in range(10000):
    # Forward pass
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    y_pred = sigmoid(z2)

    # Loss (MSE)
    loss = np.mean((y - y_pred) ** 2)
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

    # Backpropagation: output layer first
    d_loss = 2 * (y_pred - y) / y.size      # dL/dy_pred
    d_z2 = d_loss * d_sigmoid(z2)           # dL/dz2
    d_W2 = a1.T @ d_z2
    d_b2 = np.sum(d_z2, axis=0, keepdims=True)

    # Then the hidden layer, reusing d_z2
    d_a1 = d_z2 @ W2.T
    d_z1 = d_a1 * d_sigmoid(z1)             # dL/dz1
    d_W1 = X.T @ d_z1
    d_b1 = np.sum(d_z1, axis=0, keepdims=True)

    # Update weights
    W2 -= lr * d_W2
    b2 -= lr * d_b2
    W1 -= lr * d_W1
    b1 -= lr * d_b1

# Prediction: rerun the forward pass so the output reflects the final weights
y_pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("\nFinal Predictions:")
print(np.round(y_pred, 3))
```
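
One way to gain confidence in hand-written gradients like these is a finite-difference check. The sketch below is an addition, not part of the original module: it re-implements the same forward and backward formulas in hypothetical helpers (`forward_loss`, `backprop`, `numerical_grads`) and compares them with central differences at a random point. The step size `eps=1e-6` and the seed are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_loss(params, X, y):
    """MSE loss of the 2-layer sigmoid network."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    y_pred = sigmoid(a1 @ W2 + b2)
    return np.mean((y - y_pred) ** 2)

def backprop(params, X, y):
    """Analytic gradients, in the same order as params."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    y_pred = sigmoid(a1 @ W2 + b2)
    d_z2 = 2 * (y_pred - y) / y.size * y_pred * (1 - y_pred)
    d_z1 = (d_z2 @ W2.T) * a1 * (1 - a1)
    return [X.T @ d_z1, d_z1.sum(axis=0, keepdims=True),
            a1.T @ d_z2, d_z2.sum(axis=0, keepdims=True)]

def numerical_grads(params, X, y, eps=1e-6):
    """Central-difference estimate of each partial derivative."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus = forward_loss(params, X, y)
            p[idx] = old - eps
            loss_minus = forward_loss(params, X, y)
            p[idx] = old  # restore the parameter
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params = [rng.standard_normal((2, 2)), np.zeros((1, 2)),
          rng.standard_normal((2, 1)), np.zeros((1, 1))]

analytic = backprop(params, X, y)
numeric = numerical_grads(params, X, y)
max_err = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"max abs difference: {max_err:.2e}")
```

If the analytic and numerical gradients agree to many decimal places, the backward pass is very likely correct; a large gap usually points to a missing factor or a transposed matrix.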


