# Lesson 05: Softmax Regression + Calibration

## Objectives

- Implement softmax regression for multi-class classification.
- Visualize decision regions.
- Build a reliability diagram to assess calibration.

## From the notes

Softmax hypothesis:

\[
P(y=k \mid x; \Theta) = \frac{\exp(\theta_k^T x)}{\sum_{j=1}^K \exp(\theta_j^T x)}.
\]

Negative log-likelihood over \(m\) examples is minimized to fit \(\Theta\).
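
Written out, the objective and its gradient can be sketched as follows. The superscript \((i)\) for the \(i\)-th example and the indicator \(\mathbf{1}\{\cdot\}\) are notation assumed here rather than taken from the notes, and the \(1/m\) averaging follows the convention used in this lesson's code:

\[
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \mathbf{1}\{y^{(i)} = k\} \, \log P(y = k \mid x^{(i)}; \Theta)
\]

\[
\nabla_{\theta_k} J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \left( P(y = k \mid x^{(i)}; \Theta) - \mathbf{1}\{y^{(i)} = k\} \right) x^{(i)}
\]

The gradient is the average of "predicted probability minus one-hot label" times the input, which is the quantity a gradient-descent implementation computes for each class \(k\).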

## Intuition

Softmax turns the \(K\) class scores into probabilities that are non-negative and sum to 1; calibration checks whether those predicted probabilities match the observed frequencies.
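
A minimal sketch of the normalization, using made-up scores for two examples and three classes (the values and the name `softmax_rows` are illustrative assumptions). It checks two properties: each row of probabilities sums to 1, and adding the same constant to every score in a row leaves the probabilities unchanged.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Hypothetical scores (illustrative values only): 2 examples, 3 classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])

p = softmax_rows(scores)
row_sums = p.sum(axis=1)                  # every row sums to 1
shifted = softmax_rows(scores + 100.0)    # same constant added to each score

print(p.round(3))
print(row_sums)
print(np.allclose(p, shifted))
```

Equal scores, as in the second row, give a uniform distribution over the classes.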

## Data

We create three Gaussian clusters for a 3-class problem.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

In [None]:
# Synthetic 3-class data
m_per = 60  # examples per class
means = [(-2, 0), (2, 0), (0, 2.5)]
X_list = []
y_list = []
for k, mean in enumerate(means):
    Xk = np.random.multivariate_normal(mean, np.eye(2) * 0.6, size=m_per)
    X_list.append(Xk)
    y_list.append(np.full(m_per, k))
X_raw = np.vstack(X_list)
y = np.concatenate(y_list)
X = np.c_[np.ones(len(y)), X_raw]  # prepend a bias column
K = 3

## Implementation: softmax regression

In [None]:
def softmax(z):
    # Subtract the row-wise max for numerical stability
    z = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)


def one_hot(y, K):
    out = np.zeros((len(y), K))
    out[np.arange(len(y)), y] = 1
    return out


def softmax_gd(X, y, K, alpha=0.1, num_iters=300):
    Theta = np.zeros((K, X.shape[1]))
    Y = one_hot(y, K)
    history = []
    for _ in range(num_iters):
        probs = softmax(X @ Theta.T)
        grad = (probs - Y).T @ X / len(y)
        Theta -= alpha * grad
        # Loss of the parameters used to compute probs (before this step's update)
        loss = -np.mean(np.sum(Y * np.log(probs + 1e-9), axis=1))
        history.append(loss)
    return Theta, np.array(history)
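
One way to gain confidence in the gradient expression `(probs - Y).T @ X / len(y)` is a finite-difference check. The sketch below is not part of the lesson's code: the helper names (`nll`, `analytic_grad`, `numeric_grad`) and the small random test problem are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def nll(Theta, X, Y):
    """Mean negative log-likelihood, same form as the training loss."""
    probs = softmax(X @ Theta.T)
    return -np.mean(np.sum(Y * np.log(probs + 1e-9), axis=1))

def analytic_grad(Theta, X, Y):
    """Closed-form gradient used by gradient descent."""
    probs = softmax(X @ Theta.T)
    return (probs - Y).T @ X / X.shape[0]

def numeric_grad(Theta, X, Y, eps=1e-5):
    """Central finite differences, one parameter at a time."""
    grad = np.zeros_like(Theta)
    for idx in np.ndindex(*Theta.shape):
        step = np.zeros_like(Theta)
        step[idx] = eps
        grad[idx] = (nll(Theta + step, X, Y) - nll(Theta - step, X, Y)) / (2 * eps)
    return grad

# Small hypothetical problem: 8 examples, bias + 2 features, 3 classes.
rng = np.random.default_rng(0)
X_chk = np.c_[np.ones(8), rng.normal(size=(8, 2))]
y_chk = rng.integers(0, 3, size=8)
Y_chk = np.eye(3)[y_chk]                       # one-hot labels
Theta_chk = rng.normal(scale=0.1, size=(3, 3))

max_diff = np.max(np.abs(analytic_grad(Theta_chk, X_chk, Y_chk)
                         - numeric_grad(Theta_chk, X_chk, Y_chk)))
print(max_diff)
```

A very small `max_diff` suggests the analytic gradient matches the loss; a large one usually points to a sign or transpose error.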

## Experiments

In [None]:
Theta, history = softmax_gd(X, y, K)
probs = softmax(X @ Theta.T)

## Visualizations

In [None]:
# Scatter plot of the three classes
plt.figure(figsize=(6, 4))
for k in range(K):
    plt.scatter(X_raw[y == k, 0], X_raw[y == k, 1], label=f"class {k}", alpha=0.7)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Softmax data")
plt.legend()
plt.show()

# Training curve
plt.figure(figsize=(6, 4))
plt.plot(history)
plt.xlabel("iteration")
plt.ylabel("negative log-likelihood")
plt.title("Training loss")
plt.show()
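
The objectives mention decision regions. One possible sketch is to predict the most likely class on a grid and shade it with `plt.contourf`. The function names `grid_predictions` and `plot_decision_regions`, the `pad` parameter, and the hand-picked `Theta_demo` are assumptions for illustration. With the notebook's variables, the call would be `plot_decision_regions(Theta, X_raw, y)`.

```python
import numpy as np

def grid_predictions(Theta, x1_lim, x2_lim, n=200):
    """Return an n-by-n grid and the most likely class at each grid point."""
    xx, yy = np.meshgrid(np.linspace(x1_lim[0], x1_lim[1], n),
                         np.linspace(x2_lim[0], x2_lim[1], n))
    # Same layout as X: a bias column followed by the two features.
    grid = np.c_[np.ones(xx.size), xx.ravel(), yy.ravel()]
    # Softmax preserves the ordering of the scores, so argmax over the
    # linear scores gives the same class as argmax over the probabilities.
    labels = np.argmax(grid @ Theta.T, axis=1).reshape(xx.shape)
    return xx, yy, labels

def plot_decision_regions(Theta, X_raw, y, pad=1.0):
    """Shade the predicted class regions and overlay the data points."""
    import matplotlib.pyplot as plt  # local import: only needed for plotting
    x1_lim = (X_raw[:, 0].min() - pad, X_raw[:, 0].max() + pad)
    x2_lim = (X_raw[:, 1].min() - pad, X_raw[:, 1].max() + pad)
    xx, yy, labels = grid_predictions(Theta, x1_lim, x2_lim)
    n_classes = Theta.shape[0]
    plt.figure(figsize=(6, 4))
    # Level edges at -0.5, 0.5, ... put each integer label in its own band.
    plt.contourf(xx, yy, labels, levels=np.arange(n_classes + 1) - 0.5, alpha=0.25)
    for k in range(n_classes):
        plt.scatter(X_raw[y == k, 0], X_raw[y == k, 1], label=f"class {k}", alpha=0.7)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title("Decision regions")
    plt.legend()
    plt.show()

# Hypothetical parameters (not fitted): class 0 favors negative x1,
# class 1 favors positive x1, class 2 favors large x2.
Theta_demo = np.array([[0.0, -1.0, 0.0],
                       [0.0,  1.0, 0.0],
                       [0.0,  0.0, 1.0]])
xx_demo, yy_demo, labels_demo = grid_predictions(Theta_demo, (-3, 3), (-1, 4), n=50)
```

Because each class score is linear in the features, the boundaries between regions are straight lines.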

### Calibration (reliability diagram)

In [None]:
# Use max probability as confidence
conf = probs.max(axis=1)
preds = probs.argmax(axis=1)
correct = (preds == y).astype(float)

bins = np.linspace(0, 1, 6)
# Clip so that conf == 1.0 falls in the last bin instead of out of range
bin_ids = np.clip(np.digitize(conf, bins) - 1, 0, len(bins) - 2)

accs = []
confs = []
for b in range(len(bins) - 1):
    mask = bin_ids == b
    if mask.any():
        accs.append(correct[mask].mean())
        confs.append(conf[mask].mean())
    else:
        # Empty bin: zero-height bar at the bin center
        accs.append(0.0)
        confs.append((bins[b] + bins[b + 1]) / 2)

plt.figure(figsize=(5, 5))
plt.plot([0, 1], [0, 1], "k--", label="perfect")
plt.bar(confs, accs, width=0.15, alpha=0.7, label="empirical")
plt.xlabel("confidence")
plt.ylabel("accuracy")
plt.title("Reliability diagram")
plt.legend()
plt.show()
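
A reliability diagram can be summarized in one number by the expected calibration error (ECE): the bin-size-weighted average gap between accuracy and mean confidence. The lesson does not define ECE, so the sketch below is an assumption. It uses the same five equal-width bins as the diagram, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=5):
    """Weighted average of |accuracy - mean confidence| over non-empty bins."""
    bins = np.linspace(0, 1, n_bins + 1)
    # Clip so that a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(conf, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap   # mask.mean() is the fraction of points in the bin
    return ece

# Hypothetical toy inputs (illustrative values only): four predictions at
# confidence 0.9 with three correct, two at confidence 0.5 with one correct.
conf_demo = np.array([0.9, 0.9, 0.9, 0.9, 0.5, 0.5])
correct_demo = np.array([1.0, 1.0, 1.0, 0.0, 1.0, 0.0])
ece_demo = expected_calibration_error(conf_demo, correct_demo)
print(ece_demo)
```

With the notebook's variables, `expected_calibration_error(conf, correct)` gives the value for the fitted model; 0 would mean accuracy equals mean confidence in every bin.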

## Takeaways

- Softmax generalizes logistic regression to multiple classes.
- Calibration diagnostics reveal whether predicted probabilities are reliable.

## Explain it in an interview

- Walk through the softmax normalization and cross-entropy loss.
- Mention calibration and reliability diagrams.

## Exercises

1. Add L2 regularization and see how calibration changes.
2. Implement temperature scaling as a post-processing step.
3. Compare one-vs-rest to softmax on the same dataset.