# Lesson 12 - Decision Trees and Ensemble Methods


## Objectives
- Implement a simple decision stump.
- Use bagging to combine stumps.
- Visualize how ensembles improve stability.


## From the notes

**Decision trees**
- Choose each split to maximize information gain. Equivalently, choose the split that minimizes the weighted impurity (e.g. entropy or Gini) of the child nodes.
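Information gain can be computed directly. The sketch below is minimal and assumes Shannon entropy in bits as the impurity measure and a split of the form `x <= t` / `x > t`. The names `entropy`, `information_gain`, `x_demo` and `y_demo` are illustrative and do not come from the notes.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an integer label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()  # class proportions; never zero, so log2 is safe
    return float(-np.sum(p * np.log2(p)))

def information_gain(x, y, threshold):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    left, right = y[x <= threshold], y[x > threshold]
    n = len(y)
    # An empty child contributes nothing to the weighted average.
    children = sum(len(part) / n * entropy(part) for part in (left, right) if len(part) > 0)
    return entropy(y) - children

x_demo = np.array([0.1, 0.4, 0.5, 0.9, 1.2, 1.5])
y_demo = np.array([0, 0, 0, 1, 1, 1])
print(entropy(y_demo))                         # 1.0: perfectly balanced classes
print(information_gain(x_demo, y_demo, 0.5))   # 1.0: this split separates the classes
print(information_gain(x_demo, y_demo, 0.1))   # ~0.191: a much weaker split
```

A tree-building routine would evaluate every candidate threshold this way and keep the one with the largest gain.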

**Bagging**
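- Train each model on a bootstrap sample (n points drawn with replacement from the n training points), then average the models' predictions. For classification, this is usually a majority vote.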
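The sketch below shows what a bootstrap sample contains. It draws n indices with replacement and records the fraction of distinct points that appear. In expectation this fraction is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 as n grows, so each sample leaves out roughly 37% of the data. The values of `n` and `n_draws` are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000       # size of the (hypothetical) training set
n_draws = 200  # number of bootstrap samples to inspect

# Fraction of distinct original points that appear in each bootstrap sample.
fractions = []
for _ in range(n_draws):
    idx = rng.choice(n, size=n, replace=True)  # n indices drawn with replacement
    fractions.append(len(np.unique(idx)) / n)

mean_fraction = float(np.mean(fractions))
print(f"average fraction of unique points: {mean_fraction:.3f}")
print(f"theory, 1 - (1 - 1/n)^n:           {1 - (1 - 1/n) ** n:.3f}")  # ≈ 0.632
```

Because different samples omit different points, models trained on them disagree somewhat. Averaging the models cancels part of that disagreement.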

_TODO: Confirm definitions in the CS229 main notes PDF._


## Intuition
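Deep decision trees are flexible but high-variance. Small changes in the training data can change which splits are chosen, and with them the predictions. Bagging reduces variance by averaging many such noisy learners, each trained on a different bootstrap sample.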
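A short simulation shows how averaging reduces variance. Suppose B learners each produce outputs with variance σ², and every pair of learners has correlation ρ. Then the variance of their average is ρσ² + (1 - ρ)σ²/B. This sketch models each learner's output as Gaussian noise, which is a stand-in assumption: real bagged trees are not Gaussian. Bootstrap samples overlap, so bagged trees are positively correlated (ρ > 0), and the ρσ² term does not shrink as B grows.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, rho, B = 1.0, 0.3, 25  # per-learner variance, pairwise correlation, ensemble size
n_trials = 20_000

# Each learner's output = shared component + private noise, scaled so every
# learner has variance sigma2 and each pair has correlation rho.
shared = rng.standard_normal((n_trials, 1))
private = rng.standard_normal((n_trials, B))
outputs = np.sqrt(sigma2) * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * private)

single_var = outputs[:, 0].var()            # variance of one learner
averaged_var = outputs.mean(axis=1).var()   # variance of the B-learner average
theory = rho * sigma2 + (1 - rho) * sigma2 / B

print(f"one learner:    {single_var:.3f}")
print(f"average of {B}: {averaged_var:.3f}")
print(f"theory:         {theory:.3f}")
```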


## Data
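We create a 2D binary classification dataset with a circular, and therefore nonlinear, boundary. Points with x1² + x2² > 1.5 are class 1, and points inside that circle are class 0.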


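```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# 200 points drawn uniformly from the square [-2, 2] x [-2, 2].
X = np.random.uniform(-2, 2, size=(200, 2))
# Class 1 lies outside the circle x1^2 + x2^2 = 1.5; class 0 lies inside.
y = (X[:, 0]**2 + X[:, 1]**2 > 1.5).astype(int)

def best_stump(X, y):
    """Exhaustive search for the stump with the lowest training error.

    The stump predicts class 1 where X[:, feature] > t and class 0 otherwise
    (one polarity only). Returns (error, feature, threshold).
    """
    best = None
    for feature in range(X.shape[1]):
        # Every observed value of the feature is a candidate threshold.
        thresholds = np.unique(X[:, feature])
        for t in thresholds:
            pred = (X[:, feature] > t).astype(int)
            err = np.mean(pred != y)
            if best is None or err < best[0]:
                best = (err, feature, t)
    return best

err, feat, thresh = best_stump(X, y)
err  # training error of the best single stump
```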
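`best_stump` only ever predicts class 1 on the `> t` side of the threshold. A common variant also tries the flipped polarity, which predicts class 1 on the `<= t` side. This matches the minimum error that a depth-1 tree achieves when it labels each leaf with its majority class. The functions `best_stump_two_sided` and `stump_predict_two_sided`, and the toy data, are hypothetical names chosen for this sketch.

```python
import numpy as np

def best_stump_two_sided(X, y):
    """Search feature, threshold, and polarity; return (err, feature, threshold, polarity).

    polarity=+1 predicts class 1 where x > t; polarity=-1 predicts class 1 where x <= t.
    """
    best = None
    for feature in range(X.shape[1]):
        for t in np.unique(X[:, feature]):
            right = X[:, feature] > t
            for polarity in (1, -1):
                pred = (right if polarity == 1 else ~right).astype(int)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, feature, t, polarity)
    return best

def stump_predict_two_sided(X, feature, t, polarity):
    """Apply a two-sided stump found by best_stump_two_sided."""
    right = X[:, feature] > t
    return (right if polarity == 1 else ~right).astype(int)

# Toy data where class 1 sits on the LEFT of feature 0, so only the
# flipped polarity can separate it with a single cut.
X_toy = np.array([[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]])
y_toy = np.array([1, 1, 0, 0])
toy_err, toy_feat, toy_thresh, toy_pol = best_stump_two_sided(X_toy, y_toy)
print(toy_err, toy_feat, toy_thresh, toy_pol)  # error 0.0 on feature 0, threshold 1.0, polarity -1
```

On this toy data, a one-sided search bottoms out at 0.25 error. The extra polarity loop doubles the work per threshold but does not change the overall complexity.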


## Experiments


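```python
def stump_predict(X, feat, thresh):
    """Apply a stump: class 1 where X[:, feat] > thresh, else class 0."""
    return (X[:, feat] > thresh).astype(int)

def bagging_predict(X, y, n_estimators=25):
    """Fit stumps on bootstrap resamples of (X, y) and majority-vote on X."""
    preds = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(X) row indices with replacement.
        idx = np.random.choice(len(X), len(X), replace=True)
        _, feat, thresh = best_stump(X[idx], y[idx])
        preds.append(stump_predict(X, feat, thresh))
    # Fraction of stumps voting for class 1; ties (possible with an even
    # n_estimators) go to class 0.
    return (np.mean(preds, axis=0) > 0.5).astype(int)

bagged = bagging_predict(X, y)
# Accuracy on the training points themselves (no held-out data).
(bagged == y).mean()
```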
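The accuracy above is measured on the same points the stumps were trained on, which tends to flatter the ensemble. Bagging offers a cheap alternative called out-of-bag (OOB) evaluation. Each bootstrap sample omits roughly 37% of the points, so each point can be scored using only the stumps that never saw it. This is a sketch under stated assumptions. The data are regenerated with `np.random.default_rng(42)`, so they share the lesson's circular boundary but are not the same points. The helper names `fit_one_sided_stump` and `oob_accuracy` are hypothetical.

```python
import numpy as np

def fit_one_sided_stump(X, y):
    """Same one-sided search as best_stump: class 1 where X[:, feature] > t."""
    best = None
    for feature in range(X.shape[1]):
        for t in np.unique(X[:, feature]):
            err = np.mean((X[:, feature] > t).astype(int) != y)
            if best is None or err < best[0]:
                best = (err, feature, t)
    return best

def oob_accuracy(X, y, n_estimators=25, seed=0):
    """Bagged-stump accuracy where each point is scored only by stumps that never saw it."""
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = np.zeros(n)   # class-1 votes from stumps for which the point was out-of-bag
    counts = np.zeros(n)  # how many stumps had the point out-of-bag
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, replace=True)
        oob = np.ones(n, dtype=bool)
        oob[idx] = False  # rows never drawn into this bootstrap sample
        _, feat, thresh = fit_one_sided_stump(X[idx], y[idx])
        votes[oob] += X[oob, feat] > thresh
        counts[oob] += 1
    covered = counts > 0  # drop points that happened to be in every bootstrap sample
    # Majority vote among the OOB stumps; a 0.5 tie goes to class 0.
    oob_pred = (votes[covered] / counts[covered] > 0.5).astype(int)
    return float(np.mean(oob_pred == y[covered])), int(covered.sum())

# Illustrative data with the same circular boundary as the lesson.
rng_data = np.random.default_rng(42)
X_circle = rng_data.uniform(-2, 2, size=(200, 2))
y_circle = (X_circle[:, 0] ** 2 + X_circle[:, 1] ** 2 > 1.5).astype(int)

acc, n_covered = oob_accuracy(X_circle, y_circle)
print(f"OOB accuracy {acc:.3f} on {n_covered} of {len(X_circle)} points")
```

The OOB estimate needs no separate validation split, because the omitted points already play that role for each stump.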


## Visualizations


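```python
# Ground-truth labels.
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", alpha=0.7)
plt.title("Synthetic classification data")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

# Predictions of the single best stump: one axis-aligned cut.
preds = stump_predict(X, feat, thresh)
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], c=preds, cmap="coolwarm", alpha=0.7)
plt.title("Decision stump predictions")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

# Majority vote of the bagged stumps (`bagged` from the Experiments cell).
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], c=bagged, cmap="coolwarm", alpha=0.7)
plt.title("Bagged stump predictions")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```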


## Takeaways
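- Decision stumps are weak, high-bias learners. Bagging reduces variance but not bias, so bagging stumps helps less than bagging deeper, high-variance trees.
- Ensembles of trees often generalize better than a single tree, because averaging smooths out the instability of individual fits.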


## Explain it in an interview
- Explain why bagging helps unstable learners like trees.
- Describe how a decision tree chooses a split.


## Exercises
- Implement Gini impurity for splits.
- Increase the number of bagged stumps and observe accuracy.
- Try boosting a set of stumps.
