# Lesson 05 - Perceptron, Exponential Family, GLMs


## Objectives
- Implement the perceptron algorithm.
- Summarize exponential family distributions and GLM structure.
- Connect link functions to mean parameters.


## From the notes

**Perceptron update**
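- With labels $y^{(i)} \in \{-1, +1\}$: if $y^{(i)} (\theta^T x^{(i)}) \le 0$ (example $i$ is misclassified or lies exactly on the boundary), update $\theta := \theta + \alpha y^{(i)} x^{(i)}$.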
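
To see why this update helps, here is a short worked step, assuming labels $y^{(i)} \in \{-1, +1\}$ and a step size $\alpha > 0$. Evaluating the updated parameters on the same example gives

$$
y^{(i)} \left(\theta + \alpha y^{(i)} x^{(i)}\right)^T x^{(i)}
= y^{(i)} \theta^T x^{(i)} + \alpha \left(y^{(i)}\right)^2 \|x^{(i)}\|^2
= y^{(i)} \theta^T x^{(i)} + \alpha \|x^{(i)}\|^2 .
$$

So each update raises the signed score on the offending example by $\alpha \|x^{(i)}\|^2$. One update may not be enough to classify that example correctly, but it always moves the score in the right direction.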

**GLM template**
- Choose an exponential family distribution for $y \mid x$.
- Define the linear predictor $\eta = \theta^T x$ and a link function $g$ with $g(\mu) = \eta$, where $\mu = E[y \mid x]$.
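
As a minimal sketch of the template, the snippet below maps the linear predictor $\eta = \theta^T x$ to the mean $\mu = g^{-1}(\eta)$ for three common families, using their canonical links. The names `RESPONSE`, `glm_mean`, `theta_demo`, and `x_demo` are illustrative assumptions, not notation from the notes.

```python
import numpy as np

# Canonical response functions (inverse links): mu = g^{-1}(eta).
# Illustrative lookup table; the families and links are the standard canonical ones.
RESPONSE = {
    "gaussian": lambda eta: eta,                          # identity link
    "bernoulli": lambda eta: 1.0 / (1.0 + np.exp(-eta)),  # logit link
    "poisson": lambda eta: np.exp(eta),                   # log link
}

def glm_mean(theta, x, family):
    """Return mu = g^{-1}(theta^T x) for the chosen family."""
    eta = theta @ x  # linear predictor
    return RESPONSE[family](eta)

theta_demo = np.array([0.5, -1.0])  # hypothetical parameters
x_demo = np.array([1.0, 0.5])       # hypothetical input, chosen so that eta = 0

for family in RESPONSE:
    print(family, glm_mean(theta_demo, x_demo, family))
```

The same $\eta$ yields a different mean under each family, which is the sense in which the link function connects the linear model to the mean parameter.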

_TODO: Confirm notation with the official CS229 main notes PDF._


## Intuition
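The perceptron is a simple linear classifier whose parameters change only when it misclassifies an example. GLMs generalize linear regression by letting the response follow any exponential family distribution, with a link function relating its mean to the linear predictor $\theta^T x$.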


## Data
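We use a synthetic dataset of two well-separated Gaussian clusters, labeled $+1$ and $-1$, to demonstrate the perceptron updates. The clusters are far enough apart that the sample is very likely linearly separable.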


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

X_pos = np.random.multivariate_normal([2, 2], np.eye(2), 50)
X_neg = np.random.multivariate_normal([-2, -2], np.eye(2), 50)
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(50), -np.ones(50)])
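Xb = np.c_[np.ones(len(X)), X]  # prepend a bias (intercept) column

def perceptron(X, y, epochs=10, alpha=0.1):
    """Run `epochs` passes of the perceptron update; labels must be in {-1, +1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only on a mistake (or a point exactly on the boundary).
            if yi * (theta @ xi) <= 0:
                theta += alpha * yi * xi
    return theta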

theta = perceptron(Xb, y)
theta


## Experiments


In [None]:
# Training accuracy of the perceptron: fraction of points with sign(theta^T x) == y
preds = np.sign(Xb @ theta)
acc = (preds == y).mean()
acc


## Visualizations


In [None]:
plt.figure(figsize=(6,4))
plt.scatter(X_pos[:,0], X_pos[:,1], label="+1")
plt.scatter(X_neg[:,0], X_neg[:,1], label="-1")
x1 = np.linspace(-4, 4, 100)
# Boundary: theta[0] + theta[1]*x1 + theta[2]*x2 = 0, solved for x2 (assumes theta[2] != 0)
x2 = -(theta[0] + theta[1]*x1) / theta[2]
plt.plot(x1, x2, color="black", label="perceptron")
plt.title("Perceptron decision boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()

plt.figure(figsize=(6,4))
margins = y * (Xb @ theta)
plt.hist(margins, bins=15, alpha=0.7)
plt.title("Perceptron margins")
plt.xlabel("y*(θ^T x)")
plt.ylabel("count")
plt.show()


## Takeaways
- The perceptron is mistake-driven; if the data are linearly separable, it converges to some separating hyperplane after finitely many updates (not necessarily the maximum-margin one).
- GLMs unify linear prediction with different output distributions through link functions.
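
The convergence claim can be checked empirically. The sketch below is a hypothetical variant of the perceptron loop that counts mistakes per epoch and stops once a full pass makes none. The helper `perceptron_until_converged`, its `max_epochs` cap, and the toy data are assumptions for illustration only.

```python
import numpy as np

def perceptron_until_converged(X, y, alpha=0.1, max_epochs=100):
    """Run perceptron passes until one epoch makes no mistakes.

    Returns (theta, mistakes_per_epoch). Hypothetical helper, not from the notes.
    """
    theta = np.zeros(X.shape[1])
    mistakes_per_epoch = []
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (theta @ xi) <= 0:  # mistake or boundary point
                theta += alpha * yi * xi
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # a clean pass means every point is strictly separated
            break
    return theta, mistakes_per_epoch

# Tiny separable toy set; the first column is the bias feature.
X_toy = np.array([[1.0, 2.0, 1.0],
                  [1.0, 1.0, 3.0],
                  [1.0, -1.0, -2.0],
                  [1.0, -2.0, -1.0]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])

theta_toy, history = perceptron_until_converged(X_toy, y_toy)
print("mistakes per epoch:", history)
print("all margins positive:", np.all(y_toy * (X_toy @ theta_toy) > 0))
```

On non-separable data the mistake count never reaches zero, so the loop would simply run until `max_epochs`.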


## Explain it in an interview
- Explain the perceptron update rule and when it converges.
- Describe how you would choose a link function in a GLM.


## Exercises
- Compare perceptron to logistic regression on a non-separable dataset.
- Write down the exponential family form for Poisson.
- Show that logistic regression is a GLM with Bernoulli output.
