# Lesson 04 - Logistic Regression, Newton's Method, Softmax, Calibration


## Objectives
- Implement logistic regression with Newton's method.
- Extend to softmax regression for multiclass classification.
- Visualize calibration with reliability diagrams.


## From the notes

**Logistic regression**
- Hypothesis: $h_\theta(x) = \sigma(\theta^T x)$, where $\sigma(z) = 1/(1+e^{-z})$.
- Cost (negative log-likelihood): $J(\theta) = -\sum_i \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\left(1-h_\theta(x^{(i)})\right) \right]$.
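
Differentiating the cost above gives the two quantities Newton's method needs. The following is a sketch in matrix form, assuming $X$ stacks the inputs $x^{(i)}$ as rows, $y$ stacks the labels, and $h$ stacks the predictions $h_\theta(x^{(i)})$. This matrix notation is introduced here for convenience and is not taken from the notes.

$$
\nabla_\theta J(\theta) = X^T (h - y), \qquad
H = X^T S X, \quad S = \operatorname{diag}\big(h_i (1 - h_i)\big)
$$

$$
\theta := \theta - H^{-1} \nabla_\theta J(\theta)
$$

Each diagonal entry of $S$ lies in $(0, 1/4]$, so $H$ is positive semidefinite and the cost is convex in $\theta$.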

**Softmax**
- $p(y=k|x) = \frac{\exp(\theta_k^T x)}{\sum_j \exp(\theta_j^T x)}$.
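
As a quick sanity check of the claim that softmax generalizes logistic regression, the sketch below compares a two-class softmax, with one class's parameters fixed at zero, against the sigmoid. The names `X_demo`, `theta_demo`, and `logits` are hypothetical and the inputs are random, so this only illustrates the identity and says nothing about the lesson's dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))   # hypothetical inputs
theta_demo = rng.normal(size=3)    # hypothetical parameters for class 1

# Class 0 uses theta = 0 (logit 0); class 1 uses theta_demo
logits = np.column_stack([np.zeros(5), X_demo @ theta_demo])
p_softmax = softmax(logits)[:, 1]
p_sigmoid = sigmoid(X_demo @ theta_demo)

print(np.allclose(p_softmax, p_sigmoid))
```

The identity holds because $e^{a} / (1 + e^{a}) = 1 / (1 + e^{-a})$, so only the difference between the two classes' logits matters.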

_TODO: Validate equations against the official CS229 main notes PDF._


## Intuition
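Logistic regression models the class probability $p(y=1 \mid x)$ directly and fits $\theta$ by maximum likelihood. Newton's method uses the Hessian (second-order information) in addition to the gradient, so it typically needs far fewer iterations than gradient descent, at the cost of solving a linear system at every step. Softmax regression generalizes logistic regression from two classes to K classes.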
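
To make the effect of curvature concrete, here is a minimal sketch of Newton's method on a one-dimensional analogue of the logistic cost, $f(t) = \log(1 + e^{t}) - q\,t$, whose minimizer is $t^* = \log\frac{q}{1-q}$. The objective, the target `q = 0.3`, and the starting point are assumptions chosen for illustration and are not part of the lesson's dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

q = 0.3                        # hypothetical target probability
t_star = np.log(q / (1 - q))   # closed-form minimizer of f
t = 0.0                        # starting point
errors = []

for _ in range(6):
    s = sigmoid(t)
    grad = s - q               # f'(t)
    hess = s * (1 - s)         # f''(t), always positive
    t -= grad / hess           # Newton step
    errors.append(abs(t - t_star))

print(errors)
```

Dividing the gradient by the curvature rescales each step automatically, which is why no learning rate appears in the update.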


## Data
We use a 2D synthetic classification dataset for visualization and calibration plots.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# Binary classification data
m = 120
X_pos = np.random.multivariate_normal([2, 2], [[1, 0.3], [0.3, 1]], m//2)
X_neg = np.random.multivariate_normal([-2, -2], [[1, -0.2], [-0.2, 1]], m//2)
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(m//2), np.zeros(m//2)])
Xb = np.c_[np.ones(m), X]

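def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1 / (1 + np.exp(-z))

def newton_logistic(X, y, iters=10):
    """Fit logistic regression with a fixed number of Newton steps."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        w = p * (1 - p)                # diagonal of the weight matrix
        grad = X.T @ (p - y)           # gradient of the negative log-likelihood
        hess = X.T @ (w[:, None] * X)  # X^T diag(w) X without building an m x m matrix
        # pinv tolerates a near-singular Hessian, which can occur when the
        # classes are (almost) linearly separable and the weights shrink toward 0.
        theta -= np.linalg.pinv(hess) @ grad
    return theta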

theta = newton_logistic(Xb, y)
probs = sigmoid(Xb @ theta)


## Experiments


In [None]:
# Softmax on a 3-class toy dataset
K = 3
Xc = np.vstack([
    np.random.multivariate_normal([2, 0], np.eye(2), 60),
    np.random.multivariate_normal([-2, 0], np.eye(2), 60),
    np.random.multivariate_normal([0, 2], np.eye(2), 60),
])
yc = np.array([0]*60 + [1]*60 + [2]*60)
Xc_b = np.c_[np.ones(Xc.shape[0]), Xc]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)

def softmax_gd(X, y, K, alpha=0.1, iters=200):
    theta = np.zeros((K, X.shape[1]))
    y_one = np.eye(K)[y]
    for _ in range(iters):
        probs = softmax(X @ theta.T)
        grad = (probs - y_one).T @ X / len(y)
        theta -= alpha * grad
    return theta

theta_sm = softmax_gd(Xc_b, yc, K)
probs_sm = softmax(Xc_b @ theta_sm.T)


## Visualizations


In [None]:
# Decision boundary for logistic regression
plt.figure(figsize=(6,4))
plt.scatter(X_pos[:,0], X_pos[:,1], label="class 1")
plt.scatter(X_neg[:,0], X_neg[:,1], label="class 0")
x1 = np.linspace(-4, 4, 100)
x2 = -(theta[0] + theta[1]*x1) / theta[2]
plt.plot(x1, x2, color="black", label="boundary")
plt.title("Logistic regression boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()

# Calibration plot (reliability diagram) with 5 equal-width bins
bins = np.linspace(0, 1, 6)
n_bins = len(bins) - 1
# Clip so that a predicted probability of exactly 1.0 falls in the last bin
bin_ids = np.clip(np.digitize(probs, bins) - 1, 0, n_bins - 1)
# Skip empty bins instead of plotting them at (0, 0)
used = [i for i in range(n_bins) if np.any(bin_ids == i)]
acc = [y[bin_ids == i].mean() for i in used]
conf = [probs[bin_ids == i].mean() for i in used]
plt.figure(figsize=(6,4))
plt.plot(conf, acc, marker="o")
plt.plot([0,1],[0,1], linestyle="--", color="gray")
plt.title("Reliability diagram")
plt.xlabel("Predicted probability")
plt.ylabel("Fraction of positives")
plt.show()


## Takeaways
- Newton's method can converge quickly for logistic regression by using curvature information.
- Softmax provides a normalized probability distribution over K classes.


## Explain it in an interview
- Explain why logistic regression outputs probabilities rather than hard labels.
- Describe how you would assess calibration in a classifier.
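
One way to answer the calibration question with a single number is an expected calibration error (ECE) computed over the same kind of bins a reliability diagram uses. The sketch below is a simple binary variant that compares the mean predicted probability of the positive class with the observed fraction of positives in each bin. The function name, the equal-width binning, and the default of 5 bins are assumptions for illustration, and other definitions of ECE exist.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=5):
    """Weighted average gap between predicted probability and observed frequency."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Clip so that a probability of exactly 1.0 lands in the last bin
    bin_ids = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(labels[mask].mean() - probs[mask].mean())
            ece += mask.mean() * gap   # weight by the fraction of samples in the bin
    return ece

# Confident and correct predictions give zero error
ece_good = expected_calibration_error([0.0, 1.0], [0, 1])
# Confident and wrong predictions give a large error
ece_bad = expected_calibration_error([0.9, 0.9], [0, 0])
print(ece_good, ece_bad)
```

A value near zero means predicted probabilities match observed frequencies within each bin. The number depends on the bin count, so it is best reported alongside the reliability diagram rather than in place of it.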


## Exercises
- Implement logistic regression with gradient descent and compare to Newton's method.
- Try temperature scaling to improve calibration.
- Extend softmax to include L2 regularization.
