# Lesson 07: Generative Learning + Gaussian Naive Bayes

## Objectives

- Implement Gaussian Naive Bayes from scratch.
- Compare generative likelihoods across classes.
- Visualize class-conditional densities.

## From the notes

Generative models estimate \(p(x\mid y)\) and \(p(y)\), then use Bayes' rule to obtain \(p(y\mid x)\).

Naive Bayes assumes the \(n\) features are conditionally independent given the class:

\[p(x\mid y) = \prod_{j=1}^n p(x_j\mid y).\]
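
To make the "then use Bayes' rule" step concrete, the posterior and the resulting decision rule can be written out as follows. The symbols \(\mu_{jy}\) and \(\sigma_{jy}^2\) (mean and variance of feature \(j\) in class \(y\)) are notation assumed here for the Gaussian case; they do not appear in the notes.

\[p(y\mid x) = \frac{p(x\mid y)\,p(y)}{\sum_{y'} p(x\mid y')\,p(y')} \;\propto\; p(y)\prod_{j=1}^n p(x_j\mid y)\]

\[\hat{y} = \arg\max_y \left[\log p(y) + \sum_{j=1}^n \log p(x_j\mid y)\right], \qquad \log p(x_j\mid y) = -\tfrac{1}{2}\left[\log\!\left(2\pi\sigma_{jy}^2\right) + \frac{(x_j-\mu_{jy})^2}{\sigma_{jy}^2}\right]\]

The denominator does not depend on \(y\), so it can be dropped when only the predicted class is needed. Working with logs turns the product into a sum, which avoids numerical underflow when there are many features.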

## Intuition

Naive Bayes learns a separate per-feature distribution for each class. With Gaussian features, that means tracking one mean and one variance per feature per class.
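
As a small illustration of "one mean and one variance per feature", the sketch below fits a single class on hypothetical toy data (the array `Xc` and the query point `x_new` are made up for this example) and scores a new point by summing univariate Gaussian log-densities.

```python
import numpy as np

# Hypothetical toy data for ONE class: 4 samples, 2 features.
Xc = np.array([[1.0, 2.0],
               [2.0, 4.0],
               [3.0, 6.0],
               [2.0, 4.0]])

mu = Xc.mean(axis=0)   # per-feature means
var = Xc.var(axis=0)   # per-feature maximum-likelihood variances (ddof=0)

def log_gauss_1d(x, m, v):
    """Log-density of a univariate Gaussian, applied elementwise."""
    return -0.5 * (np.log(2 * np.pi * v) + (x - m) ** 2 / v)

x_new = np.array([2.0, 5.0])                 # hypothetical query point
per_feature = log_gauss_1d(x_new, mu, var)   # one log-density per feature
log_lik = per_feature.sum()                  # independence: log of a product is a sum

print("means:", mu)
print("variances:", var)
print("log p(x_new | class):", log_lik)
```

Each class gets its own `mu` and `var`; classification then compares `log_lik` plus the log prior across classes.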

## Data

We create a 2D dataset whose class-conditional covariances are diagonal, so the features are independent within each class and the Naive Bayes assumption holds exactly.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

In [None]:
# Synthetic data: two Gaussian classes with diagonal covariance
m = 120
mean0 = np.array([-1.0, 0.0])
mean1 = np.array([1.0, 1.5])
X0 = np.random.multivariate_normal(mean0, np.diag([0.4, 0.7]), size=m // 2)
X1 = np.random.multivariate_normal(mean1, np.diag([0.5, 0.6]), size=m // 2)
X_raw = np.vstack([X0, X1])
y = np.array([0] * (m // 2) + [1] * (m // 2))

## Implementation: Gaussian Naive Bayes

In [None]:
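def gaussian_nb_fit(X, y):
    """Estimate per-class feature means, variances, and class priors."""
    classes = np.unique(y)
    means = {}
    vars_ = {}
    priors = {}
    for c in classes:
        Xc = X[y == c]
        means[c] = Xc.mean(axis=0)
        vars_[c] = Xc.var(axis=0) + 1e-6  # small constant avoids division by zero
        priors[c] = len(Xc) / len(X)
    return means, vars_, priors


def gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal Gaussian: sum of per-feature log-densities."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + ((x - mean) ** 2) / var)


def gaussian_nb_predict(X, means, vars_, priors):
    """Return the class label with the highest log joint score for each row of X."""
    classes = sorted(priors.keys())
    scores = []
    for c in classes:
        logps = np.array([gaussian_logpdf(x, means[c], vars_[c]) for x in X])
        scores.append(logps + np.log(priors[c]))
    scores = np.vstack(scores).T  # shape: (n_samples, n_classes)
    # argmax gives a column index; map it back to the actual class label
    preds = np.array(classes)[np.argmax(scores, axis=1)]
    return preds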
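
The scores computed inside `gaussian_nb_predict` are unnormalized log posteriors. If class probabilities are wanted rather than hard labels, one option is to normalize them with a numerically stable softmax. The sketch below is one way to do that, not part of the lesson's code: the function name `gaussian_nb_predict_proba` is hypothetical, and the demo parameters are simply the true means and variances used to generate the data, plugged in by hand.

```python
import numpy as np

def gaussian_nb_predict_proba(X, means, vars_, priors):
    """Hypothetical helper: posterior p(y | x) for each row of X."""
    classes = sorted(priors.keys())
    # Column c holds log p(x | y=c) + log p(y=c); shape (n_samples, n_classes).
    log_joint = np.column_stack([
        -0.5 * np.sum(np.log(2 * np.pi * vars_[c])
                      + (X - means[c]) ** 2 / vars_[c], axis=1)
        + np.log(priors[c])
        for c in classes
    ])
    # Subtract the row-wise max before exponentiating to avoid underflow.
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    probs = np.exp(log_joint)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array(classes), probs

# Illustrative parameters (assumed, not fitted).
means = {0: np.array([-1.0, 0.0]), 1: np.array([1.0, 1.5])}
vars_ = {0: np.array([0.4, 0.7]), 1: np.array([0.5, 0.6])}
priors = {0: 0.5, 1: 0.5}

X_demo = np.array([[-1.0, 0.0], [1.0, 1.5], [0.0, 0.75]])
classes, probs = gaussian_nb_predict_proba(X_demo, means, vars_, priors)
labels = classes[np.argmax(probs, axis=1)]

print(np.round(probs, 3))
print(labels)
```

This version also replaces the per-row Python loop with NumPy broadcasting, which matters for inputs as large as the 200 x 200 decision-boundary grid.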

## Experiments

In [None]:
# Fit on the full dataset and score on the same points (training accuracy)
means, vars_, priors = gaussian_nb_fit(X_raw, y)
preds = gaussian_nb_predict(X_raw, means, vars_, priors)
acc = (preds == y).mean()
print(f"Accuracy: {acc:.2f}")
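
The accuracy above is measured on the same points used for fitting, so it may be optimistic. A minimal sketch of a held-out check follows. It is self-contained, so it regenerates data with the same means and covariances as the lesson; the 75/25 split, the seed, and the compact `fit`/`predict` helpers are choices made for this example rather than part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for this sketch

# Same generating parameters as the lesson's synthetic data.
m = 120
X = np.vstack([
    rng.multivariate_normal([-1.0, 0.0], np.diag([0.4, 0.7]), size=m // 2),
    rng.multivariate_normal([1.0, 1.5], np.diag([0.5, 0.6]), size=m // 2),
])
y = np.array([0] * (m // 2) + [1] * (m // 2))

# Shuffle, then hold out 25% of the points for testing.
idx = rng.permutation(m)
n_train = int(0.75 * m)
train_idx, test_idx = idx[:n_train], idx[n_train:]

def fit(X, y):
    """Per-class means, variances, and priors as stacked arrays."""
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])
    var = np.array([X[y == c].var(axis=0) + 1e-6 for c in classes])
    prior = np.array([np.mean(y == c) for c in classes])
    return classes, mu, var, prior

def predict(X, classes, mu, var, prior):
    """Pick the class with the largest log p(x | y) + log p(y)."""
    diff = X[:, None, :] - mu[None, :, :]  # (n_samples, n_classes, n_features)
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff ** 2 / var[None], axis=2)
    return classes[np.argmax(log_lik + np.log(prior), axis=1)]

params = fit(X[train_idx], y[train_idx])
train_acc = np.mean(predict(X[train_idx], *params) == y[train_idx])
test_acc = np.mean(predict(X[test_idx], *params) == y[test_idx])

print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy:  {test_acc:.2f}")
```

With only 30 held-out points the test estimate is noisy; repeating the split with several seeds gives a better sense of the spread.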

## Visualizations

In [None]:
# Scatter plot of the two classes
plt.figure(figsize=(6, 4))
plt.scatter(X0[:, 0], X0[:, 1], label="class 0", alpha=0.7)
plt.scatter(X1[:, 0], X1[:, 1], label="class 1", alpha=0.7)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Gaussian Naive Bayes data")
plt.legend()
plt.show()

# Decision boundary grid
x1_vals = np.linspace(X_raw[:, 0].min() - 1, X_raw[:, 0].max() + 1, 200)
x2_vals = np.linspace(X_raw[:, 1].min() - 1, X_raw[:, 1].max() + 1, 200)
xx1, xx2 = np.meshgrid(x1_vals, x2_vals)
X_grid = np.c_[xx1.ravel(), xx2.ravel()]
Z = gaussian_nb_predict(X_grid, means, vars_, priors).reshape(xx1.shape)

# Predicted class regions with the data overlaid
plt.figure(figsize=(6, 4))
plt.contourf(xx1, xx2, Z, alpha=0.3, levels=2)
plt.scatter(X0[:, 0], X0[:, 1], label="class 0", alpha=0.7)
plt.scatter(X1[:, 0], X1[:, 1], label="class 1", alpha=0.7)
plt.title("Naive Bayes decision regions")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()

## Takeaways

- Generative models estimate class-conditional densities and priors.
- Naive Bayes is fast and works well when conditional independence is reasonable.

## Explain it in an interview

- Outline Bayes' rule and the independence assumption.
- Describe how means/variances are estimated per class.

## Exercises

1. Add a third class and evaluate accuracy.
2. Compare Gaussian NB vs logistic regression on the same data.
3. Replace Gaussians with Bernoulli features (e.g., binary vectors).