
#  Assignment: Binary Classification with Forward and Backward Propagation (MSE Loss)  based on the lecture **01 - ANN from scratch**(ANN.ipynb - Backward Propagation section).

## Objectives
- Implement **forward propagation** for a binary neural network.
- Implement **backward propagation** using the **full chain rule** (with sigmoid derivatives kept explicitly).
- Train on MNIST (digits 0 vs 1) using only **NumPy**.
- Use **sigmoid activations** and **Mean Squared Error (MSE) loss**.

---

##  Definitions

- **Sigmoid function (σ):**
  $$
  \sigma(z) = \frac{1}{1 + e^{-z}}
  $$

- **Mean Squared Error (MSE):**
  $$
  L = \frac{1}{2m} \sum_{i=1}^m \Big( y^{(i)} - a^{(out)(i)} \Big)^2
  $$



## 1. Feedforward Equations

- Hidden layer:
  $$
  z^{(h)} = X W^{(h)T} + b^{(h)}, \quad a^{(h)} = \sigma(z^{(h)})
  $$

- Output layer:
  $$
  z^{(out)} = a^{(h)} W^{(out)T} + b^{(out)}, \quad a^{(out)} = \sigma(z^{(out)})
  $$

 **Task:** Implement `forward()`.


In [None]:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, W_h, b_h, W_out, b_out):
    z_h = np.dot(X, W_h) + b_h
    a_h = sigmoid(z_h)
    z_out = np.dot(a_h, W_out) + b_out
    a_out = sigmoid(z_out)
    return z_h, a_h, z_out, a_out



## 2. Backward Equations (with MSE + sigmoid)

We compute gradients using the chain rule, **without cancelling the sigmoid derivative**.

---

### Step 1: Loss derivative
- Loss derivative w.r.t. output activation:
$$
\frac{\partial L}{\partial a^{(out)}} = (a^{(out)} - y)
$$

### Step 2: Sigmoid derivative
$$
\frac{\partial a^{(out)}}{\partial z^{(out)}} = a^{(out)} (1 - a^{(out)})
$$

### Step 3: Output error term
$$
\delta^{(out)} = (a^{(out)} - y) \cdot a^{(out)} (1 - a^{(out)})
$$

### Step 4: Gradients for output layer
$$
\frac{\partial L}{\partial W^{(out)}} = (a^{(h)})^T \delta^{(out)},
\quad
\frac{\partial L}{\partial b^{(out)}} = \sum \delta^{(out)}
$$

### Step 5: Hidden error term
$$
\delta^{(h)} = (\delta^{(out)} W^{(out)}) \odot a^{(h)} (1 - a^{(h)})
$$

### Step 6: Gradients for hidden layer
$$
\frac{\partial L}{\partial W^{(h)}} = X^T \delta^{(h)},
\quad
\frac{\partial L}{\partial b^{(h)}} = \sum \delta^{(h)}
$$

---

 **Task:** Implement `backward()` accordingly.


In [2]:

def sigmoid_derivative(a):
    return a * (1 - a)

def backward(X, y, z_h, a_h, z_out, a_out, W_out):
    # Output error with sigmoid derivative explicit
    delta_out = (a_out - y) * sigmoid_derivative(a_out)
    dW_out = np.dot(a_h.T, delta_out)
    db_out = np.sum(delta_out, axis=0, keepdims=True)
    
    # Hidden error
    delta_h = np.dot(delta_out, W_out.T) * sigmoid_derivative(a_h)
    dW_h = np.dot(X.T, delta_h)
    db_h = np.sum(delta_h, axis=0, keepdims=True)
    
    return dW_h, db_h, dW_out, db_out



## 3. Training Function

We will train using stochastic gradient descent (SGD).


In [3]:

def train(X, y, hidden_dim=64, epochs=50, lr=0.01):
    n_samples, n_features = X.shape
    n_outputs = 1  # binary output

    # Initialize parameters
    W_h = np.random.randn(n_features, hidden_dim) * 0.01
    b_h = np.zeros((1, hidden_dim))
    W_out = np.random.randn(hidden_dim, n_outputs) * 0.01
    b_out = np.zeros((1, n_outputs))

    for epoch in range(epochs):
        # Forward
        z_h, a_h, z_out, a_out = forward(X, W_h, b_h, W_out, b_out)

        # Backward
        dW_h, db_h, dW_out, db_out = backward(X, y, z_h, a_h, z_out, a_out, W_out)

        # Parameter updates
        W_h -= lr * dW_h
        b_h -= lr * db_h
        W_out -= lr * dW_out
        b_out -= lr * db_out

        if epoch % 10 == 0:
            loss = np.mean(0.5 * (y - a_out)**2)
            print(f"Epoch {epoch}, Loss: {loss:.4f}")

    return W_h, b_h, W_out, b_out



## 4. Test on MNIST (Digits 0 and 1 Only)

 Use the block below to:
1. Load MNIST using `sklearn.datasets.fetch_openml`
2. Keep only digits 0 and 1
3. Normalize inputs to [0,1]
4. Train the model on a subset


In [4]:

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load MNIST
mnist = fetch_openml("mnist_784", version=1)
X, y = mnist["data"], mnist["target"].astype(int)

# Select only digits 0 and 1
mask = (y == 0) | (y == 1)
X, y = X[mask], y[mask]

# Normalize
X = X / 255.0
y = y.to_numpy().reshape(-1, 1)  

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model on subset
W_h, b_h, W_out, b_out = train(X_train[:2000], y_train[:2000], hidden_dim=32, epochs=30, lr=0.5)

# Evaluate
_, _, _, a_out_test = forward(X_test, W_h, b_h, W_out, b_out)
y_pred = (a_out_test >= 0.5).astype(int)
accuracy = np.mean(y_pred == y_test)
print("Test Accuracy:", accuracy)


Epoch 0, Loss: 0.1251
Epoch 10, Loss: 0.2427
Epoch 20, Loss: 0.2427
Test Accuracy: 0.5257104194857916
