# Perceptron Activity 

This notebook solves the *Perceptron* activity end‑to‑end, following the assignment requirements and grading rubric. It includes:

- Clear data generation for both exercises (1000 samples per class) using multivariate normal distributions.
- A **from-scratch single-layer perceptron** trained with the classic online update rule (*NumPy-only for linear algebra*).
- Visualizations: data scatter plots, decision boundaries overlaid on data, misclassified points, and **training accuracy vs. epoch**.
- Discussion of results and how class separability affects convergence.

> **Tooling restriction:** Only basic NumPy operations (e.g., dot products, additions) and Matplotlib for plotting are used. No ML libraries are used anywhere.

In [9]:
import os
import numpy as np
import matplotlib.pyplot as plt

## Reproducibility & Output Paths

All figures are saved to a path required by the assignment’s publishing workflow:

```python
imgs = "../docs/exercises/perceptron/images"
```

In [10]:
imgs = "../docs/exercises/perceptron/images"
os.makedirs(imgs, exist_ok=True)
print("Images will be saved to:", imgs)

Images will be saved to: ../docs/exercises/perceptron/images


## 1. Helper Utilities

Small utility functions for plotting and saving figures. We keep them intentionally simple and transparent.

In [11]:
def _savefig(path):
    """Save the current Matplotlib figure to `path` with tight layout."""
    plt.tight_layout()
    plt.savefig(path, dpi=160, bbox_inches="tight")
    plt.close()

def plot_scatter(X, y, title, path):
    """Scatter plot for two classes (0 as dots, 1 as x-marks)."""
    plt.figure(figsize=(6,5))
    m0, m1 = (y == 0), (y == 1)
    plt.scatter(X[m0,0], X[m0,1], s=12, label="Class 0")
    plt.scatter(X[m1,0], X[m1,1], s=12, label="Class 1", marker="x")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title(title)
    plt.legend()
    _savefig(path)

def decision_boundary_line(w, b):
    """Plot the line w·x + b = 0 on the current axes."""
    ax = plt.gca()
    x_min, x_max = ax.get_xlim()
    xs = np.linspace(x_min, x_max, 200)
    if abs(w[1]) < 1e-9:
        ax.axvline(-b/(w[0] + 1e-12), linestyle="--", linewidth=2, label="w·x+b=0")
    else:
        ys = -(w[0]*xs + b) / (w[1] + 1e-12)
        ax.plot(xs, ys, linestyle="--", linewidth=2, label="w·x+b=0")

def plot_with_boundary_and_errors(X, y, w, b, title, path):
    """Scatter data, draw decision boundary, and circle misclassified points."""
    plt.figure(figsize=(6,5))
    m0, m1 = (y == 0), (y == 1)
    plt.scatter(X[m0,0], X[m0,1], s=12, label="Class 0")
    plt.scatter(X[m1,0], X[m1,1], s=12, label="Class 1", marker="x")
    # draw boundary
    decision_boundary_line(w, b)
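    # circle misclassified points, using the same decision rule as Perceptron.predict
    mis = (X @ w + b > 0).astype(int) != y
    if mis.any():
        plt.scatter(X[mis,0], X[mis,1], facecolors="none", edgecolors="k", s=30, label="Misclassified")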
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.legend()
    plt.title(title)
    _savefig(path)
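`decision_boundary_line` draws the set of points where $w_1 x_1 + w_2 x_2 + b = 0$. When $w_2 \neq 0$ this is the line $x_2 = -(w_1 x_1 + b)/w_2$; when $w_2 = 0$ it is the vertical line $x_1 = -b/w_1$. The sketch below (the helper name `boundary_x2` is hypothetical, not part of the notebook) performs the same computation without plotting, which makes it easy to check that the returned points actually satisfy the boundary equation:

```python
import numpy as np

def boundary_x2(w, b, x1):
    """Solve w[0]*x1 + w[1]*x2 + b = 0 for x2 (requires w[1] != 0)."""
    if abs(w[1]) < 1e-9:
        raise ValueError("boundary is (nearly) vertical; use x1 = -b / w[0] instead")
    return -(w[0] * np.asarray(x1, dtype=float) + b) / w[1]

# Example: w = (1, 1), b = -6.5 gives the line x1 + x2 = 6.5, the perpendicular
# bisector of the segment joining (1.5, 1.5) and (5, 5).
w, b = np.array([1.0, 1.0]), -6.5
xs = np.linspace(0.0, 7.0, 5)
ys = boundary_x2(w, b, xs)
print(np.allclose(w[0] * xs + w[1] * ys + b, 0.0))  # True
```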

## 2. Data Generation (as specified)

We generate two 2D Gaussian classes (1000 samples/class) with **fixed means** and **diagonal covariances** given in the activity.
- Exercise 1: means $(1.5, 1.5)$ and $(5, 5)$, variance $0.5$ per axis ⇒ well separated, almost always linearly separable.
- Exercise 2: means $(3, 3)$ and $(5, 5)$, variance $1.5$ per axis ⇒ substantial overlap (not linearly separable).
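The degree of overlap can be quantified before training. For two equally likely classes $\mathcal{N}(\boldsymbol\mu_0, \sigma^2 I)$ and $\mathcal{N}(\boldsymbol\mu_1, \sigma^2 I)$, the optimal (Bayes) classifier is linear, namely the perpendicular bisector of the two means, and its accuracy is $\Phi\big(\lVert\boldsymbol\mu_1-\boldsymbol\mu_0\rVert/(2\sigma)\big)$, where $\Phi$ is the standard normal CDF. The short sketch below (the function name `bayes_accuracy` is our own) evaluates this formula for the means and per-axis variances used in `gen_ex1` and `gen_ex2`:

```python
import math

def bayes_accuracy(mean0, mean1, var):
    """Optimal accuracy for two equiprobable Gaussians with covariance var * I."""
    dist = math.dist(mean0, mean1)                      # Euclidean distance between the means
    z = dist / (2.0 * math.sqrt(var))                   # half the Mahalanobis distance
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF at z

acc_ex1 = bayes_accuracy((1.5, 1.5), (5.0, 5.0), 0.5)
acc_ex2 = bayes_accuracy((3.0, 3.0), (5.0, 5.0), 1.5)
print(f"Exercise 1: {acc_ex1:.4f}")   # about 0.9998
print(f"Exercise 2: {acc_ex2:.4f}")   # about 0.876
```

These are population values, so any particular 2000-sample draw can deviate slightly from them. For Exercise 2, roughly 0.88 is therefore the level a good linear boundary should approach, not 1.0.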

In [12]:
def gen_ex1(rng):
    """Exercise 1 data: two Gaussians with low variance and distant means."""
    mean0 = np.array([1.5, 1.5])
    cov0 = np.array([[0.5, 0.0],[0.0, 0.5]])
    mean1 = np.array([5.0, 5.0])
    cov1 = np.array([[0.5, 0.0],[0.0, 0.5]])
    X0 = rng.multivariate_normal(mean0, cov0, size=1000)
    X1 = rng.multivariate_normal(mean1, cov1, size=1000)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(1000, int), np.ones(1000, int)])
    return X, y

def gen_ex2(rng):
    """Exercise 2 data: two Gaussians with higher variance and closer means."""
    mean0 = np.array([3.0, 3.0])
    cov0 = np.array([[1.5, 0.0],[0.0, 1.5]])
    mean1 = np.array([5.0, 5.0])
    cov1 = np.array([[1.5, 0.0],[0.0, 1.5]])
    X0 = rng.multivariate_normal(mean0, cov0, size=1000)
    X1 = rng.multivariate_normal(mean1, cov1, size=1000)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(1000, int), np.ones(1000, int)])
    return X, y

rng = np.random.default_rng(42)  # single seed for reproducibility across both exercises
X1, y1 = gen_ex1(rng)
X2, y2 = gen_ex2(rng)

# Save the initial scatter plots (_savefig closes each figure, so nothing is shown inline)
plot_scatter(X1, y1, "Exercise 1: Data Distribution", f"{imgs}/ex1_scatter.png")
plot_scatter(X2, y2, "Exercise 2: Data Distribution", f"{imgs}/ex2_scatter.png")

X1.shape, X2.shape

((2000, 2), (2000, 2))
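Both generators stack all class-0 rows before all class-1 rows. As a result, iterating over `X` in row order shows an online learner one class and then the other. The following is a minimal sketch of a paired shuffle, in which a single permutation is applied to both `X` and `y` so that each sample keeps its label. The name `shuffle_pairs` is hypothetical, and the notebook's training cells do not call it:

```python
import numpy as np

def shuffle_pairs(X, y, rng):
    """Return X and y reordered by the same random permutation."""
    perm = rng.permutation(len(y))
    return X[perm], y[perm]

# Tiny class-sorted example: labels 0, 0, 0 then 1, 1, 1.
X_demo = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
Xs, ys = shuffle_pairs(X_demo, y_demo, np.random.default_rng(0))

# Every shuffled row still carries its original label.
original = {tuple(row): label for row, label in zip(X_demo, y_demo)}
print(all(original[tuple(row)] == label for row, label in zip(Xs, ys)))  # True
```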

## 3. Perceptron (from scratch)

We implement the **classic online perceptron** with a bias term. Labels are mapped to $\{-1, +1\}$ internally.  
Update rule for a misclassified sample $(x, y)$ with $y \in \{-1,+1\}$:

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta\, y\, \mathbf{x}, \qquad
b \leftarrow b + \eta\, y.
$$

- Learning rate: $\eta = 0.01$  
- Stopping criteria: stop early if an epoch completes with **zero mistakes** or after **100 epochs**.
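Here is one update step worked through by hand, using illustrative values that are not taken from the exercises. Start from $\mathbf{w} = (0, 0)$ and $b = 0$. The sample $\mathbf{x} = (2, 1)$ with $y = +1$ has activation $0$, so $y\,(\mathbf{w}\cdot\mathbf{x} + b) \le 0$ and it counts as a mistake (this is the same test used in `fit`). After one update, the activation becomes positive:

```python
import numpy as np

eta = 0.01
w, b = np.zeros(2), 0.0
x, y = np.array([2.0, 1.0]), 1.0   # illustrative sample with signed label +1

activation = w @ x + b             # 0.0: on the boundary, so it counts as a mistake
if y * activation <= 0:
    w = w + eta * y * x            # w becomes (0.02, 0.01)
    b = b + eta * y                # b becomes 0.01

new_activation = w @ x + b         # 0.02*2 + 0.01*1 + 0.01 = 0.06 > 0
print(w, b, f"{new_activation:.2f}")
```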


In [13]:
class Perceptron:
    """A simple online perceptron with bias (NumPy-only)."""
    def __init__(self, eta=0.01, max_epochs=100, random_state=None):
        self.eta = float(eta)
        self.max_epochs = int(max_epochs)
        self.random_state = random_state
        self.w = None
        self.b = 0.0
        self.history = {"acc": []}
        self.converged_epoch = None

    def _init_params(self, d):
        rng = np.random.default_rng(self.random_state)
        self.w = rng.normal(scale=0.01, size=d)
        self.b = 0.0

    def fit(self, X, y):
        """Train with online updates; record accuracy after each epoch."""
        y_signed = np.where(y == 1, 1, -1).astype(float)
        n, d = X.shape
        if self.w is None:
            self._init_params(d)

        self.history = {"acc": []}
        self.converged_epoch = None

        for epoch in range(self.max_epochs):
            errors = 0
            for i in range(n):
                xi = X[i]
                yi = y_signed[i]
                activation = float(np.dot(self.w, xi) + self.b)
                if yi * activation <= 0:  # misclassified or on the boundary
                    self.w = self.w + self.eta * yi * xi
                    self.b = self.b + self.eta * yi
                    errors += 1

            # track accuracy on the training set
            preds = np.where(X @ self.w + self.b > 0, 1.0, -1.0)  # same decision rule as predict()
            acc = float((preds == y_signed).mean())
            self.history["acc"].append(acc)

            if errors == 0:
                self.converged_epoch = epoch + 1  # 1-based index for readability
                break
        return self

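    def predict(self, X):
        """Return 0/1 class predictions (class 1 where w·x + b > 0)."""
        return (X @ self.w + self.b > 0).astype(int)

    def accuracy(self, X, y):
        """Fraction of samples whose 0/1 prediction matches y."""
        return float((self.predict(X) == y).mean())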
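`fit` visits the samples in the fixed row order produced by the generators (all of class 0, then all of class 1), and `random_state` only affects the tiny initial weights. A common variant reshuffles the samples at the start of every epoch. The standalone sketch below implements that variant with the same update rule, learning rate, and stopping criteria. The function name `train_perceptron_shuffled` and its `seed` parameter are hypothetical and not part of the notebook:

```python
import numpy as np

def train_perceptron_shuffled(X, y, eta=0.01, max_epochs=100, seed=0):
    """Online perceptron that visits samples in a fresh random order each epoch.

    Labels y are 0/1 and are mapped to -1/+1 internally, as in Perceptron.fit.
    Returns (w, b, per-epoch training accuracy, 1-based convergence epoch or None).
    """
    rng = np.random.default_rng(seed)
    y_signed = np.where(y == 1, 1.0, -1.0)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)   # small random start, as in _init_params
    b = 0.0
    history, converged = [], None
    for epoch in range(max_epochs):
        errors = 0
        for i in rng.permutation(n):     # new visiting order every epoch
            if y_signed[i] * (X[i] @ w + b) <= 0:
                w = w + eta * y_signed[i] * X[i]
                b = b + eta * y_signed[i]
                errors += 1
        preds = np.where(X @ w + b > 0, 1.0, -1.0)
        history.append(float((preds == y_signed).mean()))
        if errors == 0:                  # mistake-free epoch: the data are separated
            converged = epoch + 1
            break
    return w, b, history, converged
```

For example, `train_perceptron_shuffled(X2, y2)` could be compared against `p2`. We have not run that comparison here, so no numbers are reported.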

## 4. Exercise 1 — Training, Decision Boundary, and Accuracy

Given the strong linear separability, we expect the perceptron to complete a mistake-free epoch well within the 100-epoch budget and to reach near-perfect training accuracy.

In [14]:
p1 = Perceptron(eta=0.01, max_epochs=100, random_state=123).fit(X1, y1)
acc1 = p1.accuracy(X1, y1)

# Decision boundary with misclassified points circled
plt.figure(figsize=(6,5))
m0, m1 = (y1 == 0), (y1 == 1)
plt.scatter(X1[m0,0], X1[m0,1], s=12, label="Class 0")
plt.scatter(X1[m1,0], X1[m1,1], s=12, label="Class 1", marker="x")
decision_boundary_line(p1.w, p1.b)

y1_hat = p1.predict(X1)
mis = y1_hat != y1
if mis.any():
    plt.scatter(X1[mis,0], X1[mis,1], facecolors="none", edgecolors="k", s=30, label="Misclassified")

plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.title(f"Exercise 1: Decision Boundary (acc={acc1:.3f}, conv_epoch={p1.converged_epoch})")
_savefig(f"{imgs}/ex1_boundary_miscl.png")

# Accuracy over epochs
plt.figure(figsize=(6,4))
plt.plot(range(1, len(p1.history["acc"])+1), p1.history["acc"], linewidth=2)
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Exercise 1: Training Accuracy over Epochs")
_savefig(f"{imgs}/ex1_accuracy.png")

acc1, p1.converged_epoch

(1.0, 34)

**Exercise 1 Discussion.**  
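The classes are well separated and have low variance, so the perceptron finds a separating hyperplane (a line in 2D). In this run it completed its first mistake-free epoch at epoch 34 and reached 100% training accuracy. That is more passes than the wide gap between the means might suggest. Two plausible contributors are that the inputs are not centered (the bias moves by only $\eta$ per update, while the weights move by $\eta\,\mathbf{x}$, whose norm is several times larger) and that the samples are visited in class-sorted order.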
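One way to quantify how well a trained model separates the data is the geometric margin: the smallest signed distance $y_i(\mathbf{w}\cdot\mathbf{x}_i + b)/\lVert\mathbf{w}\rVert$ over the training set, with labels in $\{-1,+1\}$. It is positive exactly when every training point lies strictly on the correct side of the boundary. The sketch below uses a hypothetical helper, `geometric_margin`. Applied to `p1.w` and `p1.b` on `X1`, `y1`, it should return a positive value given the mistake-free final epoch reported above, but we do not report a number here:

```python
import numpy as np

def geometric_margin(X, y, w, b):
    """Smallest signed distance to the boundary w·x + b = 0, for 0/1 labels y."""
    y_signed = np.where(y == 1, 1.0, -1.0)
    return float(np.min(y_signed * (X @ w + b)) / np.linalg.norm(w))

X = np.array([[1.0, 1.0], [2.0, 1.0], [5.0, 5.0], [4.0, 6.0]])
y = np.array([0, 0, 1, 1])
w, b = np.array([1.0, 1.0]), -6.5
print(geometric_margin(X, y, w, b))   # 3.5 / sqrt(2), about 2.475
```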

## 5. Exercise 2 — Training, Decision Boundary, and Accuracy

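Here the classes overlap (closer means, higher variance). The dataset is **not linearly separable**, so 100% training accuracy is out of reach. For the same reason, the perceptron cannot complete a mistake-free epoch: such an epoch would mean the current weights separate the data. Training therefore runs for all 100 epochs.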

In [15]:
p2 = Perceptron(eta=0.01, max_epochs=100, random_state=321).fit(X2, y2)
acc2 = p2.accuracy(X2, y2)

# Decision boundary with misclassified points circled
plt.figure(figsize=(6,5))
m0, m1 = (y2 == 0), (y2 == 1)
plt.scatter(X2[m0,0], X2[m0,1], s=12, label="Class 0")
plt.scatter(X2[m1,0], X2[m1,1], s=12, label="Class 1", marker="x")
decision_boundary_line(p2.w, p2.b)

y2_hat = p2.predict(X2)
mis2 = y2_hat != y2
if mis2.any():
    plt.scatter(X2[mis2,0], X2[mis2,1], facecolors="none", edgecolors="k", s=30, label="Misclassified")

plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.title(f"Exercise 2: Decision Boundary (acc={acc2:.3f}, conv_epoch={p2.converged_epoch})")
_savefig(f"{imgs}/ex2_boundary_miscl.png")

# Accuracy over epochs
plt.figure(figsize=(6,4))
plt.plot(range(1, len(p2.history['acc'])+1), p2.history['acc'], linewidth=2)
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Exercise 2: Training Accuracy over Epochs")
_savefig(f"{imgs}/ex2_accuracy.png")

acc2, p2.converged_epoch

(0.503, None)
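On a balanced two-class dataset, an accuracy of about 0.50 is also what a classifier that assigns (almost) every point to one class would score, so it is worth checking whether that is happening. A quick diagnostic is to count how many points fall on each side of the learned boundary. The sketch below uses a hypothetical helper, `prediction_counts`. In the notebook it would be called as `prediction_counts(X2, p2.w, p2.b)`; we have not recorded its output:

```python
import numpy as np

def prediction_counts(X, w, b):
    """Return (#points predicted class 0, #points predicted class 1)."""
    preds = (X @ w + b > 0).astype(int)       # same rule as Perceptron.predict
    counts = np.bincount(preds, minlength=2)
    return int(counts[0]), int(counts[1])

X = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0], [6.0, 6.0]])
print(prediction_counts(X, np.array([1.0, 1.0]), -1.0))  # (0, 4): every point on the class-1 side
```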

### 5.1 Multiple Random Initializations (Convergence Behavior)

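Because the data are not separable, the weights never settle, so the final weights can depend on both the initialization and the order in which samples are visited. Only the initialization changes between these runs: the seed controls the small random starting weights, while the samples are always visited in the same (class-sorted) order. We run the perceptron **5 times** with seeds 100 to 104 to visualize this effect.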
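On non-separable data the perceptron's weights keep changing, so the weights at the end of the last epoch need not be the best ones visited. Gallant's pocket algorithm addresses this by keeping a copy of the best weights seen so far. The sketch below is a simplified pocket-style variant: it scores training accuracy only at the end of each epoch and also reshuffles the samples every epoch. Both choices are our assumptions, not something the notebook does. The function name `train_pocket` is hypothetical:

```python
import numpy as np

def train_pocket(X, y, eta=0.01, max_epochs=100, seed=0):
    """Perceptron that returns the epoch-end weights with the highest training accuracy.

    Labels y are 0/1. Samples are visited in a new random order each epoch.
    Returns (best_w, best_b, best_accuracy).
    """
    rng = np.random.default_rng(seed)
    y_signed = np.where(y == 1, 1.0, -1.0)
    n, d = X.shape
    w, b = rng.normal(scale=0.01, size=d), 0.0
    best_w, best_b, best_acc = w.copy(), b, -1.0
    for _ in range(max_epochs):
        for i in rng.permutation(n):
            if y_signed[i] * (X[i] @ w + b) <= 0:
                w = w + eta * y_signed[i] * X[i]
                b = b + eta * y_signed[i]
        acc = float(((X @ w + b > 0) == (y == 1)).mean())
        if acc > best_acc:               # keep the best weights "in the pocket"
            best_w, best_b, best_acc = w.copy(), b, acc
        if acc == 1.0:                   # nothing left to improve
            break
    return best_w, best_b, best_acc
```

The returned accuracy is never lower than that of the final epoch's weights, which is the point of the design. The price is one full pass over the data per epoch to score the weights.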

In [16]:
runs = 5
accs, histories, params = [], [], []
for seed in range(100, 100 + runs):
    p = Perceptron(eta=0.01, max_epochs=100, random_state=seed).fit(X2, y2)
    accs.append(p.accuracy(X2, y2))
    histories.append(p.history["acc"][:])
    params.append((p.w.copy(), p.b, p.converged_epoch))

# Plot accuracy curves from all runs
plt.figure(figsize=(7,5))
for i, h in enumerate(histories):
    plt.plot(range(1, len(h)+1), h, linewidth=1.5, label=f"run {i+1}")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Exercise 2: Accuracy across 5 random initializations")
plt.legend()
_savefig(f"{imgs}/ex2_accuracy_multiruns.png")

# Boundary for the best run
best_idx = int(np.argmax(accs))
bw, bb, bconv = params[best_idx]
plt.figure(figsize=(6,5))
plt.scatter(X2[y2==0,0], X2[y2==0,1], s=12, label="Class 0")
plt.scatter(X2[y2==1,0], X2[y2==1,1], s=12, label="Class 1", marker="x")

# draw best boundary
ax = plt.gca()
x_min, x_max = ax.get_xlim()
xs = np.linspace(x_min, x_max, 200)
if abs(bw[1]) < 1e-9:
    ax.axvline(-bb/(bw[0] + 1e-12), linestyle="--", linewidth=2, label="w·x+b=0 (best)")
else:
    ys = -(bw[0]*xs + bb) / (bw[1] + 1e-12)
    ax.plot(xs, ys, linestyle="--", linewidth=2, label="w·x+b=0 (best)")

yb = (X2 @ bw + bb > 0).astype(int)
mb = yb != y2
if mb.any():
    plt.scatter(X2[mb,0], X2[mb,1], facecolors="none", edgecolors="k", s=30, label="Misclassified")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.title(f"Exercise 2 (Best of 5): Decision Boundary (acc={accs[best_idx]:.3f}, conv_epoch={bconv})")
_savefig(f"{imgs}/ex2_best_boundary_miscl.png")

accs, bconv

([0.508, 0.508, 0.508, 0.5115, 0.5095], None)

## 6. Analysis & Discussion

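- **Exercise 1 (Linearly separable):** The perceptron reaches a mistake-free epoch (epoch 34 in the run above) with 100% training accuracy. The decision boundary cleanly separates the classes because the means are far apart relative to the small variance.
- **Exercise 2 (Overlapping classes):** With closer means and higher variance, the classes are **not linearly separable**. The perceptron therefore cannot complete a mistake-free epoch and runs all 100 epochs (`converged_epoch` is `None`). The final training accuracy was 0.503 for the main run and between 0.508 and 0.5115 across the five seeds. That is close to chance, and well below what a linear boundary can achieve on data like this. The small spread across seeds suggests the result is driven by the training procedure rather than the initialization. In particular, the samples are visited in class-sorted order without shuffling, and the returned weights are simply those at the end of the last epoch.
- **Takeaways:** The perceptron is guaranteed to converge only on linearly separable datasets. Without separability the weights keep changing, and the last-epoch weights can be poor, as in Exercise 2. Shuffling the samples each epoch, or keeping the best weights seen so far (the pocket algorithm), are standard remedies.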
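For reference, the convergence guarantee in the last bullet is the perceptron convergence theorem (Novikoff). Write $\tilde{\mathbf{x}} = (\mathbf{x}, 1)$ and $\tilde{\mathbf{w}} = (\mathbf{w}, b)$; the update from Section 3 then becomes $\tilde{\mathbf{w}} \leftarrow \tilde{\mathbf{w}} + \eta\, y\, \tilde{\mathbf{x}}$. Suppose $\lVert\tilde{\mathbf{x}}_i\rVert \le R$ for every sample, and some unit vector $\mathbf{u}$ satisfies $y_i\,\mathbf{u}\cdot\tilde{\mathbf{x}}_i \ge \gamma > 0$ for all $i$. Then, starting from $\tilde{\mathbf{w}} = \mathbf{0}$, the number of updates $k$ obeys

$$
k \le \left(\frac{R}{\gamma}\right)^{2}.
$$

The bound does not depend on $\eta$: from a zero start, $\eta$ only rescales $\tilde{\mathbf{w}}$, which never changes the outcome of a sign test. The small random initialization used in this notebook still yields a finite bound, only with different constants. When no such $\gamma$ exists, as in Exercise 2, the theorem gives no guarantee.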
