# Random Forest 


> In previous lectures, we have seen 


## 📖 What Is Random Forest Regression?

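**Random Forest Regression** is an ensemble learning method that fits many decision trees and averages their predictions. The same idea carries over to classification, where the trees' outputs are combined by voting or by averaging predicted class probabilities (the demo further down uses `RandomForestClassifier`).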

It combines:
- **Bagging (Bootstrap Aggregation)**: Fit each tree on a bootstrap sample of the data, i.e. rows drawn at random with replacement
- **Random Feature Subsets**: At each split, consider only a random subset of the features as split candidates
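
The two ingredients above can be sketched by hand. This is a minimal illustration, not how scikit-learn implements its forests: the helper names `fit_mini_forest` and `predict_mini_forest` and the toy data are made up for this note, and the per-split feature sampling is delegated to the `max_features` argument of `DecisionTreeRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_mini_forest(X, y, n_trees=50, k=1, seed=0):
    """Fit n_trees regression trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bagging: draw n row indices with replacement (a bootstrap sample).
        rows = rng.integers(0, n, size=n)
        # Random feature subsets: max_features=k makes the tree look at
        # only k randomly chosen features at each split.
        tree = DecisionTreeRegressor(
            max_features=k,
            random_state=int(rng.integers(1 << 31)),
        )
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def predict_mini_forest(trees, X):
    """Average the individual tree predictions."""
    return np.mean([t.predict(X) for t in trees], axis=0)


# Toy data (hypothetical): a noisy nonlinear target on three features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(300, 3))
y_demo = np.sin(3 * X_demo[:, 0]) + X_demo[:, 1] ** 2 + rng.normal(0, 0.1, 300)

trees = fit_mini_forest(X_demo, y_demo, n_trees=50, k=1)
pred = predict_mini_forest(trees, X_demo)
print("training MSE:", np.mean((pred - y_demo) ** 2))
```

In practice `RandomForestRegressor` does both steps internally; the sketch only makes the two sources of randomness visible.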

---

### ⚙️ Key Features

- Reduces the overfitting that a single deep decision tree is prone to
- Handles nonlinear relationships well
- Provides built-in **feature importance**


### 📊 Key Hyperparameters

| Parameter | Meaning |
|-----------|---------|
| `n_estimators` | Number of trees in the forest |
| `max_depth` | Maximum depth of each tree |
| `max_features` | Number (int) or fraction (float) of features considered at each split |
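
As a rough illustration of how these hyperparameters are passed in scikit-learn, the sketch below fits a `RandomForestRegressor` on made-up data and reads the built-in feature importances. The data, the variable names (`X_reg`, `y_reg`, `forest`), and the specific parameter values are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: only the first two of four features affect the target.
rng = np.random.default_rng(0)
X_reg = rng.uniform(-1, 1, size=(400, 4))
y_reg = 5 * X_reg[:, 0] + np.sin(3 * X_reg[:, 1]) + rng.normal(0, 0.1, 400)

forest = RandomForestRegressor(
    n_estimators=200,   # number of trees in the forest
    max_depth=6,        # each tree is at most 6 levels deep
    max_features=0.5,   # a float is a fraction: 2 of the 4 features per split
    random_state=0,
).fit(X_reg, y_reg)

# Impurity-based importances, normalized to sum to 1.
importances = forest.feature_importances_
for i, imp in enumerate(importances):
    print(f"feature {i}: {imp:.3f}")
```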

> In this note, we will re-examine random forest to understand how it works.

Now that we have discussed bagging, we can write down the following equation:

Random Forest = **Bagging** + **Random Subspace Method**

At *every split* inside each bootstrapped tree, choose only $k$ of the $p$ features ($k < p$) as candidate split columns. This decorrelates trees even if a few predictors are overwhelmingly strong.
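
The decorrelation effect can be checked empirically. The sketch below is an assumption-laden toy experiment: the data (one dominant predictor in column 0), the helper `mean_tree_correlation`, and the choice of comparing $k = p$ against the extreme case $k = 1$ are all made up for this note. It measures the average pairwise correlation between individual trees' predictions on held-out points.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 500, 6
X_tr = rng.normal(size=(n, p))
X_te = rng.normal(size=(200, p))
# Column 0 is an overwhelmingly strong predictor; the others are weak.
y_tr = 5 * X_tr[:, 0] + X_tr[:, 1:].sum(axis=1) + rng.normal(0, 1, n)


def mean_tree_correlation(k):
    """Average pairwise correlation between the trees' test predictions."""
    forest = RandomForestRegressor(
        n_estimators=50, max_features=k, random_state=0
    ).fit(X_tr, y_tr)
    # One row of predictions per tree: shape (n_trees, n_test_points).
    preds = np.array([t.predict(X_te) for t in forest.estimators_])
    corr = np.corrcoef(preds)
    # Keep only the off-diagonal entries (pairs of distinct trees).
    off_diag = corr[~np.eye(len(corr), dtype=bool)]
    return off_diag.mean()


corr_all = mean_tree_correlation(p)  # k = p: plain bagging
corr_sub = mean_tree_correlation(1)  # k = 1: one random candidate per split
print(f"k = p: {corr_all:.3f}   k = 1: {corr_sub:.3f}")
```

With all $p$ features available, every tree tends to split on column 0 first, so the trees look alike; restricting the candidates forces different trees to start from different features.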

### Why It Works

* As in bagging, averaging many trees reduces variance.
* Extra feature randomness prevents the “same strong feature first” syndrome, yielding **even lower correlation** between trees.
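
One way to make the variance argument concrete is the standard result for an average of identically distributed, correlated estimators. The symbols $B$ (number of trees), $T_b(x)$ (prediction of tree $b$), $\sigma^2$ (variance of a single tree's prediction), and $\rho$ (pairwise correlation between trees) are notation introduced here, not taken from the lecture:

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

Growing more trees shrinks only the second term. The first term falls only if $\rho$ falls, which is what restricting each split to $k$ of the $p$ features aims at.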

In [1]:
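import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Synthetic dataset: 2 features, 3 classes separated by two straight lines
sample_size = 200
X = rng.uniform(0.1, 0.9, size=(sample_size, 2))
y = np.zeros(sample_size, dtype=int)
mask1 = X[:, 0] + X[:, 1] > 1.1                # above the line x0 + x1 = 1.1
mask2 = (~mask1) & (X[:, 0] - X[:, 1] > 0.3)   # below it, with x0 - x1 > 0.3
y[mask1] = 1
y[mask2] = 0                                   # already 0; kept for clarity
y[~(mask1 | mask2)] = 2                        # all remaining points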


In [2]:
# ----- imports -------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

import sklearn                                   # version check
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

import ipywidgets as widgets
from ipywidgets import interact


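# 1 -- color map shared by the decision regions and the scatter points
label_cmap = ListedColormap(["#1f77b4", "#ff7f0e", "#2ca02c"])

# 2 -- helper to fit & plot a forest for the current slider values
def plot_rf(max_features=1.0, n_estimators=100, max_depth=0):
    # the slider uses 0 as a stand-in for "no depth limit" (max_depth=None)
    depth = None if max_depth == 0 else int(max_depth)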

    rf = RandomForestClassifier(
        n_estimators=int(n_estimators),
        max_features=max_features,
        max_depth=depth,
        oob_score=True,
        random_state=42
    ).fit(X, y)

    # mesh grid
    x_min, x_max = X[:, 0].min() - 0.05, X[:, 0].max() + 0.05
    y_min, y_max = X[:, 1].min() - 0.05, X[:, 1].max() + 0.05
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 400),
        np.linspace(y_min, y_max, 400)
    )
    Z = rf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    # plot
    plt.figure(figsize=(6, 5))
    plt.contourf(xx, yy, Z, alpha=0.25, cmap=label_cmap)
    plt.scatter(X[:, 0], X[:, 1],
                c=y, cmap=label_cmap,
                edgecolors="k", s=80)

    depth_label = "None" if depth is None else depth
    plt.title(f"Random Forest: max_features={max_features:.2f}, "
              f"n_estimators={n_estimators}, max_depth={depth_label}\n"
              f"OOB accuracy ≈ {rf.oob_score_:.3f}")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.tight_layout()
    plt.show()

# 3 -- build the widgets
interact(
    plot_rf,
    max_features = widgets.FloatSlider(
        value=1.0, min=0.2, max=1.0, step=0.1,
        description="max_features"
    ),
    n_estimators = widgets.IntSlider(
        value=100, min=10, max=300, step=10,
        description="n_estimators"
    ),
    max_depth = widgets.IntSlider(
        value=0, min=0, max=6, step=1,
        description="max_depth (0=None)"
    )
);



> Check out the (slightly overly complicated but fun) visualization by MLU: https://mlu-explain.github.io/random-forest/