# Lesson 08: Linear Discriminant Analysis (LDA)

## Objectives

- Implement LDA with a shared covariance matrix.
- Visualize linear decision boundaries.
- Compare LDA's class-conditional Gaussian assumption with alternatives such as QDA.

## From the notes

LDA assumes that \(p(x\mid y=k)\) is Gaussian with class mean \(\mu_k\) and a covariance \(\Sigma\) shared by all classes. As a result, the decision boundary between any two classes is linear (a hyperplane).
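Why the boundary is linear follows from taking the log of Bayes' rule. The sketch below uses standard Gaussian algebra and the symmetry of \(\Sigma^{-1}\); the symbol \(\pi_k = p(y=k)\) for the class prior is notation introduced here (the code stores it as `priors[k]`):

\[
\begin{aligned}
\log p(y=k\mid x) &= \log \pi_k - \tfrac{1}{2}(x-\mu_k)^\top\Sigma^{-1}(x-\mu_k) + c(x) \\
&= \underbrace{x^\top\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^\top\Sigma^{-1}\mu_k + \log\pi_k}_{\delta_k(x)} \;-\; \tfrac{1}{2}x^\top\Sigma^{-1}x + c(x),
\end{aligned}
\]

where \(c(x)\) collects the Gaussian normalizing constant and \(-\log p(x)\), neither of which depends on \(k\). Because \(\Sigma\) is shared, \(-\tfrac{1}{2}x^\top\Sigma^{-1}x\) is also the same for every class, so classification only needs the linear discriminant \(\delta_k(x)\), which is the `score` computed in `lda_predict`. Setting \(\delta_j(x) = \delta_k(x)\) gives

\[
x^\top\Sigma^{-1}(\mu_k-\mu_j) - \tfrac{1}{2}\left(\mu_k^\top\Sigma^{-1}\mu_k - \mu_j^\top\Sigma^{-1}\mu_j\right) + \log\frac{\pi_k}{\pi_j} = 0,
\]

an equation that is linear in \(x\).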

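## Intuition

Because every class shares the same covariance, the quadratic term \(x^\top\Sigma^{-1}x\) is identical across classes and cancels when classes are compared, which leaves linear decision boundaries. The resulting classifier has the same functional form as logistic regression, but its parameters are derived generatively (by modeling \(p(x\mid y)\) and \(p(y)\)) rather than fit directly to \(p(y\mid x)\).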
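To make the link to logistic regression concrete, here is a small numerical check, assuming hypothetical parameters (a non-diagonal shared covariance and priors 0.4/0.6, chosen only for illustration). It computes \(p(y=1\mid x)\) two ways: through Bayes' rule with full Gaussian densities, and through a sigmoid of a linear score \(w^\top x + b\). The helper `gaussian_pdf` and all `*_demo` names are hypothetical.

```python
import numpy as np

# Hypothetical two-class parameters sharing one covariance matrix.
mu0_demo = np.array([-1.0, -0.5])
mu1_demo = np.array([1.5, 1.0])
Sigma_demo = np.array([[0.7, 0.2], [0.2, 0.5]])
pi0, pi1 = 0.4, 0.6


def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density evaluated on each row of x, shape (n, d)."""
    d = mu.shape[0]
    diff = x - mu
    Sigma_inv = np.linalg.inv(Sigma)
    quad = np.einsum("ni,ij,nj->n", diff, Sigma_inv, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))

# Generative route: Bayes' rule with the full class-conditional densities.
joint0 = pi0 * gaussian_pdf(x, mu0_demo, Sigma_demo)
joint1 = pi1 * gaussian_pdf(x, mu1_demo, Sigma_demo)
post1_bayes = joint1 / (joint0 + joint1)

# Linear route: with a shared covariance the log-odds reduce to w^T x + b.
Sigma_inv_demo = np.linalg.inv(Sigma_demo)
w_demo = Sigma_inv_demo @ (mu1_demo - mu0_demo)
b_demo = (-0.5 * (mu1_demo @ Sigma_inv_demo @ mu1_demo - mu0_demo @ Sigma_inv_demo @ mu0_demo)
          + np.log(pi1 / pi0))
post1_linear = 1.0 / (1.0 + np.exp(-(x @ w_demo + b_demo)))

print(np.allclose(post1_bayes, post1_linear))
```

Both routes agree, so LDA's posterior is a logistic function of a linear score. Logistic regression would fit \(w\) and \(b\) directly from labels instead of computing them from \(\mu_k\), \(\Sigma\), and the priors.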

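## Data

We reuse a 2D dataset with two overlapping Gaussian classes. Both classes are drawn with the same covariance \(0.7\,I\), so the data satisfy LDA's shared-covariance assumption.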

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

In [None]:
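# Synthetic data: two Gaussian classes with the same covariance 0.7 * I
m = 120  # total number of samples, split evenly between the classes
mean0 = np.array([-1.0, -0.5])
mean1 = np.array([1.5, 1.0])
X0 = np.random.multivariate_normal(mean0, np.eye(2) * 0.7, size=m // 2)
X1 = np.random.multivariate_normal(mean1, np.eye(2) * 0.7, size=m // 2)
X = np.vstack([X0, X1])
y = np.array([0] * (m // 2) + [1] * (m // 2))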

## Implementation: LDA

In [None]:
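def lda_fit(X, y):
    """Estimate class means, class priors, and the pooled (shared) covariance."""
    classes = np.unique(y)
    means = {}
    priors = {}
    for c in classes:
        Xc = X[y == c]
        means[c] = Xc.mean(axis=0)
        priors[c] = len(Xc) / len(X)
    # Shared covariance: center each sample on its own class mean, then pool.
    # Dividing by len(X) gives the maximum-likelihood estimate.
    X_centered = np.vstack([X[y == c] - means[c] for c in classes])
    Sigma = (X_centered.T @ X_centered) / len(X)
    return means, priors, Sigma


def lda_predict(X, means, priors, Sigma):
    """Assign each row of X to the class with the largest linear discriminant."""
    Sigma_inv = np.linalg.pinv(Sigma)
    classes = np.array(sorted(means.keys()))
    scores = []
    for c in classes:
        mu = means[c]
        # delta_c(x) = x^T Sigma^{-1} mu_c - 0.5 * mu_c^T Sigma^{-1} mu_c + log(prior_c)
        score = X @ Sigma_inv @ mu - 0.5 * mu @ Sigma_inv @ mu + np.log(priors[c])
        scores.append(score)
    scores = np.column_stack(scores)
    # Map the winning column index back to the actual class label.
    return classes[np.argmax(scores, axis=1)]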
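Since every class shares the term dropped from \(\delta_k(x)\), the posterior \(p(y=k\mid x)\) is a softmax over the discriminants. A minimal vectorized sketch that returns these probabilities instead of only the argmax, assuming stacked arrays rather than the dictionaries used above; `lda_fit_arrays` and `lda_predict_proba` are hypothetical names. It uses `np.linalg.solve` in place of an explicit inverse, a common choice for numerical stability when \(\Sigma\) is well conditioned.

```python
import numpy as np


def lda_fit_arrays(X, y):
    """Pooled-covariance LDA fit returning stacked arrays."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])  # (K, d)
    priors = np.array([np.mean(y == c) for c in classes])         # (K,)
    centered = X - means[np.searchsorted(classes, y)]             # subtract each row's class mean
    Sigma = centered.T @ centered / len(X)                        # MLE pooled covariance
    return classes, means, priors, Sigma


def lda_predict_proba(X, means, priors, Sigma):
    """Posterior p(y=k | x) as a softmax over the linear discriminants."""
    W = np.linalg.solve(Sigma, means.T)                           # columns are Sigma^{-1} mu_k, (d, K)
    scores = X @ W - 0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)                   # stabilize the exponentials
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


# Usage on small synthetic data
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal([-1, 0], 0.8, size=(50, 2)),
                    rng.normal([1, 1], 0.8, size=(50, 2))])
y_demo = np.repeat([0, 1], 50)
classes_demo, means_arr, priors_arr, Sigma_arr = lda_fit_arrays(X_demo, y_demo)
probs = lda_predict_proba(X_demo, means_arr, priors_arr, Sigma_arr)
labels = classes_demo[probs.argmax(axis=1)]
```

Each row of `probs` sums to 1, and taking the argmax reproduces the hard predictions of the dictionary-based version.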

## Experiments

In [None]:
means, priors, Sigma = lda_fit(X, y)
preds = lda_predict(X, means, priors, Sigma)
acc = (preds == y).mean()
# Note: this is training accuracy, measured on the same data used for fitting.
print(f"Training accuracy: {acc:.2f}")
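A useful reference point for the printed accuracy: with two equally likely Gaussian classes sharing a covariance, the lowest achievable error is \(\Phi(-\Delta/2)\), where \(\Delta\) is the Mahalanobis distance between the means and \(\Phi\) is the standard normal CDF. The sketch below plugs in the true generating parameters from the data cell; `bayes_error_two_gaussians` is a hypothetical helper, and the formula assumes equal priors, which holds here because each class has `m // 2` samples.

```python
import math

import numpy as np


def bayes_error_two_gaussians(mu0, mu1, Sigma):
    """Bayes error for two equally likely Gaussians sharing covariance Sigma."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    delta = math.sqrt(diff @ np.linalg.solve(Sigma, diff))  # Mahalanobis distance
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), evaluated at z = -delta / 2
    return 0.5 * (1.0 + math.erf(-delta / 2 / math.sqrt(2)))


err = bayes_error_two_gaussians([-1.0, -0.5], [1.5, 1.0], 0.7 * np.eye(2))
print(f"Bayes-optimal accuracy: {1 - err:.3f}")
```

Training accuracy on 120 points can land somewhat above or below this value by chance, so a held-out split gives a more honest comparison.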

## Visualizations

In [None]:
# Scatter plot of the raw data
plt.figure(figsize=(6, 4))
plt.scatter(X0[:, 0], X0[:, 1], label="class 0", alpha=0.7)
plt.scatter(X1[:, 0], X1[:, 1], label="class 1", alpha=0.7)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("LDA data")
plt.legend()
plt.show()

# Decision regions: classify every point on a dense grid
x1_vals = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200)
x2_vals = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200)
xx1, xx2 = np.meshgrid(x1_vals, x2_vals)
X_grid = np.c_[xx1.ravel(), xx2.ravel()]
Z = lda_predict(X_grid, means, priors, Sigma).reshape(xx1.shape)

plt.figure(figsize=(6, 4))
# Explicit level edges so labels 0 and 1 each get their own filled region
plt.contourf(xx1, xx2, Z, alpha=0.3, levels=[-0.5, 0.5, 1.5])
plt.scatter(X0[:, 0], X0[:, 1], label="class 0", alpha=0.7)
plt.scatter(X1[:, 0], X1[:, 1], label="class 1", alpha=0.7)
plt.title("LDA decision regions")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()
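For two classes the boundary that `contourf` shades can also be written in closed form as \(w^\top x + b = 0\), with \(w = \Sigma^{-1}(\mu_1-\mu_0)\) and \(b = -\tfrac{1}{2}(\mu_1^\top\Sigma^{-1}\mu_1 - \mu_0^\top\Sigma^{-1}\mu_0) + \log(\pi_1/\pi_0)\). A minimal sketch, assuming the true generating parameters from the data cell and equal priors; `lda_boundary_line` is a hypothetical helper and requires \(w_2 \neq 0\) so the line can be solved for \(x_2\).

```python
import numpy as np


def lda_boundary_line(mu0, mu1, Sigma, pi0, pi1, x1_vals):
    """Return x2 values on the two-class LDA boundary w^T x + b = 0, plus w and b."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu0)
    b = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu0 @ Sigma_inv @ mu0) + np.log(pi1 / pi0)
    x2_vals = -(w[0] * x1_vals + b) / w[1]  # solve w0*x1 + w1*x2 + b = 0 for x2
    return x2_vals, w, b


mu0_true, mu1_true = np.array([-1.0, -0.5]), np.array([1.5, 1.0])
Sigma_true = 0.7 * np.eye(2)
x1_line = np.linspace(-4, 4, 50)
x2_line, w, b = lda_boundary_line(mu0_true, mu1_true, Sigma_true, 0.5, 0.5, x1_line)

# Sanity check: both class discriminants are equal at every point on the line.
S_inv = np.linalg.inv(Sigma_true)
pts = np.c_[x1_line, x2_line]
delta0 = pts @ S_inv @ mu0_true - 0.5 * mu0_true @ S_inv @ mu0_true + np.log(0.5)
delta1 = pts @ S_inv @ mu1_true - 0.5 * mu1_true @ S_inv @ mu1_true + np.log(0.5)
print(np.allclose(delta0, delta1))
# To overlay on the decision-region plot: plt.plot(x1_line, x2_line, "k--")
```

Passing the fitted `means[0]`, `means[1]`, `Sigma`, `priors[0]`, and `priors[1]` instead of the true parameters would give the line the lesson's classifier actually uses.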

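## Takeaways

- LDA is a generative classifier whose decision boundaries are linear.
- It can work well on small datasets: it estimates only class means, class priors, and a single shared covariance, so it has fewer parameters than QDA, which estimates one covariance per class. This helps most when the Gaussian assumption is roughly correct.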
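The parameter savings can be counted directly. For \(d\) features and \(K\) classes, each mean has \(d\) entries, a symmetric covariance has \(d(d+1)/2\) free entries, and the priors have \(K-1\) free values. A small sketch; the function names are hypothetical.

```python
def n_params_lda(d, K):
    """K class means, one shared symmetric covariance, K - 1 free priors."""
    return K * d + d * (d + 1) // 2 + (K - 1)


def n_params_qda(d, K):
    """K class means, K symmetric covariances, K - 1 free priors."""
    return K * d + K * d * (d + 1) // 2 + (K - 1)


for d in (2, 10, 50):
    print(f"d={d}: LDA {n_params_lda(d, K=3)}, QDA {n_params_qda(d, K=3)}")
```

The covariance term dominates as \(d\) grows, which is why sharing it matters most in higher dimensions.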

## Explain it in an interview

- Explain the shared-covariance assumption and how Bayes' rule turns class-conditional densities into a classifier.
- Compare LDA with logistic regression (generative vs. discriminative).

## Exercises

1. Add a third class and visualize the decision boundaries.
2. Compare LDA and QDA on the same dataset.
3. Derive the equation of the linear boundary between two classes.
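As a starting point for exercise 2, here is a minimal QDA sketch. QDA keeps one covariance per class, so the quadratic term no longer cancels and the discriminant becomes \(-\tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{1}{2}(x-\mu_k)^\top\Sigma_k^{-1}(x-\mu_k) + \log\pi_k\). The names `qda_fit` and `qda_predict` are hypothetical, and the covariances are maximum-likelihood estimates. The demo data (same mean, different spread) is chosen only to show a case LDA cannot separate.

```python
import numpy as np


def qda_fit(X, y):
    """Per-class means, MLE covariances, and priors."""
    classes = np.unique(y)
    params = []
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sigma_c = (Xc - mu).T @ (Xc - mu) / len(Xc)
        params.append((mu, Sigma_c, len(Xc) / len(X)))
    return classes, params


def qda_predict(X, classes, params):
    """Pick the class with the largest quadratic discriminant."""
    scores = []
    for mu, Sigma_c, prior in params:
        diff = X - mu
        _, logdet = np.linalg.slogdet(Sigma_c)
        quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma_c), diff)
        scores.append(-0.5 * logdet - 0.5 * quad + np.log(prior))
    return classes[np.argmax(np.column_stack(scores), axis=1)]


# Demo: concentric classes that differ only in spread
rng = np.random.default_rng(3)
X_q = np.vstack([rng.multivariate_normal([0, 0], 0.3 * np.eye(2), 100),
                 rng.multivariate_normal([0, 0], 3.0 * np.eye(2), 100)])
y_q = np.repeat([0, 1], 100)
classes_q, params_q = qda_fit(X_q, y_q)
preds_q = qda_predict(X_q, classes_q, params_q)
print(f"QDA training accuracy: {(preds_q == y_q).mean():.2f}")
```

On data like this, a linear boundary cannot do much better than chance, while the quadratic boundary roughly recovers the inner and outer regions.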