# Machine Learning Class 2: Decision Trees & Random Forests

Welcome back! In our first class, you learned how computers find patterns using linear regression and gradient descent. Today we'll explore a completely different approach that thinks more like humans do - making decisions step by step.

### 🎯 What You'll Learn
1. 🌲 **Decision Trees** - How computers make decisions like humans
2. 🧠 **Interactive Tree Building** - Build your own decision tree
3. 🌲🌲🌲 **Random Forests** - Why many trees are better than one
4. 🚀 **Gradient Boosting** - Trees that learn from each other's mistakes

### 🌳 **The Big Idea**

Linear regression tries to draw a single line through data. Decision trees take a more human approach: they ask yes/no questions like:
- "Is the house bigger than 1,500 sq ft?"
- "Does the email contain the word 'FREE'?"
- "Is the patient's temperature above 38°C?"

This method is:
- ✅ Interpretable (you can follow every decision),
- ✅ Assumption-light (no need to assume linearity),
- 🔀 Flexible (works with numbers and categories, and in many implementations with missing data),
- 🧠 Intuitive (mimics how we reason about choices).

### 🔗 **Building on Class 1**
- 📈 **Linear Regression**: Found the best line through data
- 🌳 **Decision Trees**: Ask the best questions about data
- 🤝 **Both**: Make predictions, but in very different ways

Let’s explore how a tree of decisions can help a machine learn!

In [None]:
# Quick Setup - Import Our Tools

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [None]:
# Set random seed for reproducible results
np.random.seed(42)


## Part 1: Decision Trees - Thinking Like Humans

### 🤔 **How Do You Make Decisions?**

Imagine you're deciding whether to go outside:
1. **Is it raining?** → If YES: Stay inside ☔
2. **If NO: Is it sunny?** → If YES: Go outside 😎
3. **If NO: Is it too cold?** → If YES: Stay inside ❄️, If NO: Go outside 🌤️

That’s basically how a decision tree works: a series of yes/no questions that lead to a final decision.
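
The going-outside example can be written directly as nested `if` statements. This is only a hand-written sketch (the function name `should_go_outside` is made up for illustration); a decision tree learns rules of this shape from data instead of having them typed in.

```python
def should_go_outside(raining: bool, sunny: bool, too_cold: bool) -> bool:
    """Hand-coded version of the three questions above."""
    if raining:
        return False      # Question 1: rain -> stay inside
    if sunny:
        return True       # Question 2: no rain and sunny -> go outside
    return not too_cold   # Question 3: otherwise it depends on the cold


print(should_go_outside(raining=False, sunny=False, too_cold=True))  # False
```

Each `if` corresponds to one internal node of a tree, and each `return` corresponds to a leaf.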

### 🌳 **From Human Logic to Machine Learning**

**Decision trees** are machine learning algorithms that:
- **Ask questions** about the data (like "Is age > 30?")
- **Split the data** based on answers
- **Repeat** until they can make good predictions
- **Create a tree structure** of decisions
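
How does a tree decide which question is "best"? One common answer is to pick the split that leaves the purest groups, measured by Gini impurity (the default criterion of scikit-learn's `DecisionTreeClassifier`). The sketch below is a simplified, pure-Python illustration on made-up toy data; the helper names `gini` and `best_threshold` are hypothetical, and real implementations are far more optimized.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for a 50/50 binary mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Try every midpoint between sorted unique values.

    Returns (threshold, weighted impurity of the two resulting groups).
    """
    best = (None, gini(labels))
    unique = sorted(set(values))
    for lo, hi in zip(unique, unique[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up toy data: the three youngest people like pizza, the rest do not
toy_ages = [8, 12, 15, 35, 52, 60]
toy_likes = [1, 1, 1, 0, 0, 0]

threshold, impurity = best_threshold(toy_ages, toy_likes)
print(f"Best question: is age <= {threshold}?  (weighted Gini = {impurity:.2f})")
```

A full tree repeats this search over every feature, splits the data on the winner, and then recurses into each half.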

### 🎯 **Why Decision Trees Are Special**
- **Interpretable**: You can see exactly how decisions are made
- **No assumptions**: Don't assume linear relationships like regression
- **Handle mixed data**: Numbers and categories (in scikit-learn, categories are encoded as numbers first), and in many implementations missing values
- **Natural**: Mirror human decision-making processes
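
To make the "interpretable" point concrete, scikit-learn can print a fitted tree as plain-text rules with `export_text`. The tiny dataset below is invented for this sketch and is not the pizza dataset used later.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [age, likes_cheese] -> likes_pizza
toy_X = [[8, 1], [12, 1], [15, 0], [35, 1], [52, 0], [60, 0]]
toy_y = [1, 1, 1, 0, 0, 0]

toy_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(toy_X, toy_y)

# Each indented line is one question; "class:" lines are the leaves
rules = export_text(toy_tree, feature_names=["age", "likes_cheese"])
print(rules)
```

Because age alone separates this toy data, the printed tree needs only a single question.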


### 🍕 Let's Start Simple: Predicting if Someone Likes Pizza

### Generating a Realistic Dataset

Let's create a simple dataset to predict whether someone likes pizza based on their age, whether they like cheese, whether they are vegetarian, whether they have a pet, how many siblings they have, and their favorite topping.

In [None]:
def generate_data(n_samples):
    np.random.seed(42)  # for reproducibility

    # --- Core features ---
    ages = np.random.randint(5, 70, size=n_samples)
    likes_cheese = np.random.binomial(1, 0.75, size=n_samples)      # Most like cheese
    vegetarian = np.random.binomial(1, 0.3, size=n_samples)         # Minority vegetarian
    has_pet = np.random.binomial(1, 0.5, size=n_samples)

    # --- Extra features ---
    num_siblings = np.random.poisson(1.5, size=n_samples)           # Adds noise & pattern
    favorite_topping = np.random.choice(['pepperoni', 'mushroom', 'pineapple'], size=n_samples)

    # One-hot encode topping
    topping_dummies = pd.get_dummies(favorite_topping, prefix='topping')

    # --- Calculate probability of liking pizza ---
    prob = np.zeros(n_samples)

    # Age-based base probability
    prob += np.select(
        [
            ages < 18,
            (ages >= 18) & (ages < 30),
            (ages >= 30) & (ages < 50),
            ages >= 50
        ],
        [0.70, 0.60, 0.50, 0.40]
    )

    # Add/subtract effects
    prob += 0.15 * likes_cheese
    prob -= 0.12 * vegetarian
    prob += 0.10 * (likes_cheese & (favorite_topping == 'mushroom'))   # mushroom-lovers
    prob -= 0.08 * (favorite_topping == 'pineapple')                  # 🍍 controversy!
    prob += 0.05 * (num_siblings >= 2)
    prob -= 0.05 * ((ages > 45) & (vegetarian == 1))                  # older vegetarians

    # Add interaction bonus
    interaction = ((ages < 25) & (vegetarian == 1) & (likes_cheese == 1)) * 0.15
    prob += interaction

    # Add some random noise
    prob += np.random.normal(0, 0.05, size=n_samples)

    # Clip to valid probability range
    prob = np.clip(prob, 0, 1)

    # Final target variable
    likes_pizza = np.random.binomial(1, prob)

    # --- Build DataFrame ---
    pizza_data = pd.DataFrame({
        'age': ages,
        'likes_cheese': likes_cheese,
        'vegetarian': vegetarian,
        'has_pet': has_pet,
        'num_siblings': num_siblings,
        'topping': favorite_topping,
    })

    # Add one-hot toppings
    pizza_data = pd.concat([pizza_data, topping_dummies], axis=1)
    feature_cols = ['age', 'likes_cheese', 'vegetarian', 'has_pet', 'num_siblings'] + \
                    [col for col in pizza_data.columns if col.startswith('topping_')]

    # Add target variable
    pizza_data['likes_pizza'] = likes_pizza

    return pizza_data, feature_cols

# Generate the dataset
pizza_data, feature_cols = generate_data(n_samples=300)

This gives us a reasonably realistic dataset — perfect for learning how decision trees behave.

### 🔍 Visualizing the Dataset

Let’s forget how we generated it and just look at the data.

(Values: 1 = yes, 0 = no)

In [None]:
print(pizza_data[:10])

Now let's visualize the relationships between features and preferences:

In [None]:
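# Cast the 0/1 target to strings so Plotly treats it as a category;
# color_discrete_map is ignored for numeric (continuous) color columns.
plot_data = pizza_data.assign(likes_pizza=pizza_data['likes_pizza'].astype(str))
fig = px.scatter_3d(plot_data,
                    x='age', y='likes_cheese', z='vegetarian',
                    color='likes_pizza',
                    color_discrete_map={'0': 'red', '1': 'green'},
                    title="🍕 Pizza Preferences in 3D",
                    labels={'likes_pizza': 'Likes Pizza'})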

fig.update_layout(height=500)
fig.show()

See any patterns? A decision tree will find the best questions to ask!

### 🧪 Building Our First Decision Tree

Let’s build a model to predict if someone likes pizza based on what we know:
- 🧓 Age
- 🧀 Likes cheese
- 🥗 Vegetarian
- 🐶 Has a pet
- 👨‍👩‍👧‍👦 Number of siblings
- 🍄 Favorite topping (one-hot encoded)

We’ll use `scikit-learn` to train a Decision Tree Classifier that automatically finds the best questions to ask.

In [None]:
# Prepare the data
X = pizza_data[feature_cols]
y = pizza_data['likes_pizza']

# Create and train a decision tree
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# Make predictions
predictions = tree.predict(X)
accuracy = accuracy_score(y, predictions)

print("🌳 Decision Tree Results:")
print(f"   Training Accuracy: {accuracy:.1%}")
print(f"   Tree Depth: {tree.get_depth()}")
print(f"   Number of Leaves: {tree.get_n_leaves()}")

🔍 **What do these numbers mean?**

- **Training Accuracy**: How well the model predicts on the data it was trained on. A high number might look good, but beware of overfitting!
- **Tree Depth**: How many levels of questions the model asks before making a decision.
- **Leaves**: The final decisions (like “Yes, likes pizza”) — each leaf is a possible outcome.

### 🔮 Making a Prediction

Let’s use the trained tree to predict whether a 25-year-old, who:
- 🧀 Likes cheese: ✅ Yes
- 🥦 Vegetarian: ✅ Yes
- 🐶 Has a pet: ✅ Yes
- 👨‍👩‍👧‍👦 Number of siblings: 2
- 🍄 Favourite topping: Mushroom ✅

likes pizza:

In [None]:
new_data = {
    'age': [25],
    'likes_cheese': [1],
    'vegetarian': [1],
    'has_pet': [1],
    'num_siblings': [2],
    'topping_mushroom': [1],
    'topping_pepperoni': [0],
    'topping_pineapple': [0]
}
new_person = pd.DataFrame(new_data)
prediction = tree.predict(new_person)[0]
probability = tree.predict_proba(new_person)[0]

label = "Likes Pizza! 🍕" if prediction == 1 else "Doesn't like pizza 😞"
print("\n🎯 Prediction for new person:")
print(f"   Prediction: {label}")
print(f"   Confidence: {max(probability):.1%}")

✅ This lets us peek inside the decision tree’s brain!
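
Where does the "confidence" come from? For a single decision tree, `predict_proba` reports the share of each class among the training samples that ended up in the same leaf as the new person. The sketch below checks this by hand on invented toy data (the variable names are illustrative only), using the tree's `apply` method to look up leaf ids.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: a single feature (age) and a noisy 0/1 target
toy_ages = np.array([[8], [12], [15], [20], [35], [52], [60], [65]])
toy_likes = np.array([1, 1, 0, 1, 0, 0, 1, 0])

toy_clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(toy_ages, toy_likes)

person = np.array([[10]])
leaf_id = toy_clf.apply(person)[0]                 # which leaf does this person land in?
same_leaf = toy_clf.apply(toy_ages) == leaf_id     # training samples in that same leaf
manual_share = toy_likes[same_leaf].mean()         # fraction of pizza-likers in the leaf

reported = toy_clf.predict_proba(person)[0, 1]     # probability of class 1
print(f"Manual leaf share: {manual_share:.2f}, predict_proba: {reported:.2f}")
```

A leaf with only a handful of training samples can therefore report a very high "confidence" that rests on very little evidence.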

### 📊 Which Features Matter?

Some features have more influence than others. Let’s measure **feature importance** — this tells us which features helped the tree make its decisions.

In [None]:
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': tree.feature_importances_
}).sort_values('importance', ascending=False)

print(f"\n📊 Most Important Features:")
for _, row in feature_importance.iterrows():
    print(f"   {row['feature']}: {row['importance']:.3f}")


You may notice something surprising: Some features you didn't expect to be important actually are, and others you thought would matter don’t show up at all. What's going on?

🧠 **Why This Happens**

Two key effects are at play:

1. **Spurious Patterns**

- With small or noisy datasets, a few random coincidences can look important.
- Example: Maybe a few pet owners in the training data liked pizza — the tree picks up on this, even if it’s not meaningful.

2. **Regularization Effects**

- When we limit the tree’s depth or require a minimum number of samples to split, the tree may stop early and skip less useful features.
- This is good! It prevents the model from overfitting, but it also means that some genuinely relevant features might not show up unless they're clearly better than others.

🔍 **What You Can Do**

To handle this gracefully:

- 🧪 **Train/Test Split or Cross-Validation**

    Always evaluate your model on unseen data — this tells you if a feature is truly helpful.

- ✂️ **Use Regularization Intentionally**

    Adjust max_depth or min_samples_split to prevent the tree from chasing random patterns.

- 🔁 **Try Ensembles**

    Random forests reduce this kind of variance by averaging over many trees (coming up next!).

- 🔎 **Feature Importance ≠ Causality**

    Just because a feature is used doesn't mean it causes the outcome. Be skeptical!
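
As a rough illustration of the first two suggestions, the sketch below uses 5-fold cross-validation to compare a depth-limited tree with an unrestricted one. It runs on synthetic stand-in data from `make_classification` (an assumption made so the snippet is self-contained), not on the pizza dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with noisy labels (flip_y randomizes 20% of them)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=3, flip_y=0.2, random_state=0
)

for depth in (3, None):  # None lets the tree grow until its leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Each of the 5 scores is the accuracy on a fold the model never saw
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    print(f"max_depth={depth}: {scores.mean():.1%} ± {scores.std():.1%}")
```

The same pattern works with `X` and `y` from this notebook in place of the synthetic arrays.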

### 🎮 Interactive: Build Your Own Decision Tree

In [None]:
def build_custom_tree(max_depth, min_samples_leaf):
    """Interactive tree building tool with accuracy gauge and feature importance"""

    # Create tree
    custom_tree = DecisionTreeClassifier(
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )
    custom_tree.fit(X, y)
    predictions = custom_tree.predict(X)
    accuracy = accuracy_score(y, predictions)

    importance = custom_tree.feature_importances_

    fig = make_subplots(
        rows=1, cols=2,
        specs=[[{"type": "indicator"}, {"type": "bar"}]],
        subplot_titles=["🎯 Accuracy Gauge", "📊 Feature Importance"]
    )

    fig.add_trace(go.Indicator(
        mode="gauge+number",
        value=accuracy * 100,
        number={'suffix': "%"},
        gauge={
            'axis': {'range': [0, 100]},
            'bar': {'color': "green"},
            'steps': [
                {'range': [0, 60], 'color': "#ff4d4d"},
                {'range': [60, 80], 'color': "#ffa64d"},
                {'range': [80, 95], 'color': "#d4f542"},
                {'range': [95, 100], 'color': "#4dff88"}
            ],
        }
    ), row=1, col=1)

    # Bar chart for feature importance
    fig.add_trace(go.Bar(
        x=X.columns,
        y=importance,
        marker_color='teal',
        name='Feature Importance'
    ), row=1, col=2)

    fig.update_layout(
        height=450,
        title_text=f"🌳 Custom Decision Tree (Depth: {custom_tree.get_depth()}, Leaves: {custom_tree.get_n_leaves()})"
    )

    fig.show()

    if accuracy > 0.9:
        print("\n🎉 Excellent accuracy! But be careful of overfitting...")
    elif accuracy > 0.75:
        print("\n✅ Good performance!")
    else:
        print("\n⚠️ Try adjusting parameters for better performance")

    return custom_tree

tree = build_custom_tree(max_depth=2, min_samples_leaf=10)

Which decisions has the tree made? How does it decide if someone likes pizza? Let's visualize the decision tree and see how it splits the data based on features.

In [None]:
plt.figure(figsize=(12, 6))
plot_tree(tree,
          feature_names=feature_cols,
          class_names=['Dislike', 'Like'],
          filled=True, rounded=True)
plt.show()

### ✂️ Splitting the Data: Train vs. Test

Before we build even more powerful models, let’s take a step back.

Until now, we’ve been evaluating models on the same data they were trained on. That’s like studying the answers to a test and then using the same test to prove you’re a genius. Not very convincing. 🤓

🧪 **The Idea: Train-Test Split**

To test if a model generalizes to new, unseen data, we split our dataset:
- Training set – Used to train the model
- Test set – Used to evaluate the model’s performance

This lets us simulate how the model will behave in the real world!

Let's first generate a slightly larger dataset to ensure we have enough data for training and testing.

In [None]:
pizza_data_large, _ = generate_data(n_samples=10_000)

In [None]:
# Define features and target
X_large = pizza_data_large[feature_cols]
y_large = pizza_data_large['likes_pizza']

# Split into 80% training, 20% test data
X_train, X_test, y_train, y_test = train_test_split(
    X_large, y_large, test_size=0.2, random_state=42
)

print(f"Training size: {len(X_train)} samples")
print(f"Test size:     {len(X_test)} samples")

🌳 **Training a Tree on the Training Set**

Let’s retrain our decision tree, but now only on the training data, and then check how well it performs on the test set.

In [None]:
tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=10, random_state=42)
tree.fit(X_train, y_train)

train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))

print("🌳 Decision Tree Performance:")
print(f"   Training Accuracy: {train_acc:.1%}")
print(f"   Test Accuracy:     {test_acc:.1%}")

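Compare the two numbers. If the test accuracy is close to the training accuracy, the model is generalizing well to new data; a large gap would point to overfitting. Keep in mind that the labels were generated with a lot of randomness, so even a perfect model cannot reach 100% here. As an exercise, try relaxing the regularization settings (for example, a larger `max_depth` or a smaller `min_samples_leaf`) and watch how the gap changes.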

⚠️ **Why This Matters**

- A very high training accuracy but low test accuracy means overfitting.
- A model that performs well on the test set is more likely to work in the real world.
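
The gap between training and test accuracy can be measured directly. The following sketch compares a regularized tree with an unrestricted one on synthetic, deliberately noisy data (a stand-in chosen for this example; all names ending in `_demo` or starting with `Xd_`/`yd_` are illustrative).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; flip_y=0.3 injects label noise the tree can memorize
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, n_informative=3, flip_y=0.3, random_state=0
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

settings = {
    "regularized": dict(max_depth=4, min_samples_leaf=20),
    "unrestricted": dict(),  # scikit-learn defaults: grow until leaves are pure
}

gaps = {}
for label, params in settings.items():
    model = DecisionTreeClassifier(random_state=0, **params).fit(Xd_train, yd_train)
    train_acc_demo = accuracy_score(yd_train, model.predict(Xd_train))
    test_acc_demo = accuracy_score(yd_test, model.predict(Xd_test))
    gaps[label] = train_acc_demo - test_acc_demo
    print(f"{label:>12}: train {train_acc_demo:.1%}, test {test_acc_demo:.1%}")
```

The unrestricted tree memorizes the noisy training labels, so its train/test gap is the larger of the two.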


## Part 2: Random Forests - 🌲🌳🌲 The Power of Many Trees

🤔 **Why Isn’t One Tree Enough?**

Imagine asking one person for directions vs. asking 100 people:
- One person might lead you astray (hello, overfitting 👀)
- But 100 people voting? You’re much more likely to find the right path 🚶‍♂️➡️🗺️

That’s the idea behind **Random Forests**:
- Build lots of decision trees (often 100+)
- Each tree sees a different slice of the data
- They vote together on the final prediction
- The result? A **more accurate** and **more robust** model than any single tree

🧪 **The Random Forest Recipe**

1. Bootstrap Sampling: Each tree trains on a random sample of the training data, drawn with replacement
2. Feature Randomness: At every split, each tree considers only a random subset of features
3. Majority Vote: For classification, the most common answer wins
4. Outcome: A strong, stable predictor that generalizes well and resists overfitting
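
The recipe can be sketched by hand in a few lines. This is a simplified illustration on synthetic data, not how `RandomForestClassifier` is implemented internally; the names `demo_trees`, `votes`, and `majority` are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary vote can never tie

demo_trees = []
for i in range(n_trees):
    # Step 1: bootstrap sample (draw row indices with replacement)
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # Step 2: max_features="sqrt" makes every split consider a random feature subset
    member = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    demo_trees.append(member.fit(X_demo[idx], y_demo[idx]))

# Step 3: majority vote across all trees
votes = np.stack([t.predict(X_demo) for t in demo_trees])  # shape: (n_trees, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)

unique_share = len(np.unique(idx)) / len(idx)
print(f"Unique rows in the last bootstrap sample: {unique_share:.0%}")
print(f"Ensemble accuracy on the data it was built from: {(majority == y_demo).mean():.1%}")
```

Because rows are drawn with replacement, each tree sees only roughly two thirds of the distinct rows. Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, which usually gives very similar answers.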

🎲 **Why Is It Called Random?**

Because randomness is the secret sauce:
- ✅ Random data for each tree
- ✅ Random features for each split
- ✅ Random mistakes, which get averaged out

🎯 This randomness helps reduce overfitting and boosts performance on new data

In [None]:
# Train Random Forest
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
forest.fit(X_train, y_train)

forest_train_acc = accuracy_score(y_train, forest.predict(X_train))
forest_test_acc = accuracy_score(y_test, forest.predict(X_test))

print("🔍 Model Comparison:")
print("─────────────────────────────")
print("🌳 Decision Tree:")
print(f"   ✅ Training Accuracy: {train_acc:.1%}")
print(f"   🧪 Test Accuracy:     {test_acc:.1%}")
print("─────────────────────────────")
print("🌲 Random Forest:")
print(f"   ✅ Training Accuracy: {forest_train_acc:.1%}")
print(f"   🧪 Test Accuracy:     {forest_test_acc:.1%}")
print(f"   🌿 Number of Trees:   {len(forest.estimators_)}")

# Compare Feature Importances
tree_importance = pd.DataFrame({
    'Feature': feature_cols,
    'Tree Importance': tree.feature_importances_,
    'Forest Importance': forest.feature_importances_
}).sort_values(by='Forest Importance', ascending=False)

print("\n📊 Feature Importances Comparison:")
for _, row in tree_importance.iterrows():
    print(f"   {row['Feature']:<18} | Tree: {row['Tree Importance']:.3f} | Forest: {row['Forest Importance']:.3f}")

### 🧠 Why is the Random Forest More Powerful?

A single decision tree is prone to **overfitting** — it tries to perfectly split the training data, which can make it sensitive to noise or quirks in the dataset. This often leads to high training accuracy but lower test performance.

A **random forest**, on the other hand, builds many decision trees on **random subsets** of the data and features. This randomness helps:
- 🔁 Reduce correlation between the trees  
- 🛡️ Prevent any single feature or sample from dominating  

By **averaging the predictions** of many diverse trees, the forest creates a more **stable and generalizable model**. That’s why:
- The forest may fit the training set slightly less closely  
- But it often **performs better on unseen data**; compare the test accuracies printed above to check whether that holds here

📌 **Key Insight**: Random forests trade a bit of bias for a big reduction in variance — leading to better generalization.
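
A small simulation shows why averaging reduces variance. It models each "tree" as the true answer plus independent random noise, which is an idealized assumption: real trees trained on the same data are correlated, so the actual reduction is smaller.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experiments, n_models = 10_000, 100

# One noisy predictor per experiment (noise with standard deviation 1)
single = rng.normal(0.0, 1.0, size=n_experiments)

# Average of 100 independent noisy predictors per experiment
averaged = rng.normal(0.0, 1.0, size=(n_experiments, n_models)).mean(axis=1)

print(f"Variance of one predictor:        {single.var():.3f}")
print(f"Variance of the 100-model average: {averaged.var():.3f}")
```

With fully independent errors the variance shrinks by a factor equal to the number of models. The random data and feature subsets in a forest exist to push the trees closer to that ideal by making them less correlated.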

### 📦 Feature Importance: One Tree vs Many Trees

In a single decision tree, the **feature importance** scores reflect how much each feature contributed to splitting the data. But in a **random forest**, each tree might make different decisions — especially if randomness is involved in both data and feature selection.

To understand the **stability and variability** of these decisions, let’s look at the feature importances **across all trees** in the forest.


In [None]:
# One row per tree, one column per feature
all_importances = np.array([
    estimator.feature_importances_ for estimator in forest.estimators_
])

# Create box plots for each feature
fig = go.Figure()

for i, feature in enumerate(feature_cols):
    fig.add_trace(go.Box(
        y=all_importances[:, i],
        name=feature,
        boxmean='sd',
        marker_color='teal'
    ))

fig.update_layout(
    title="📊 Feature Importance Across Trees (Random Forest)",
    yaxis_title="Feature Importance",
    height=400
)

fig.show()

### 🔍 What This Tells Us

Each box shows the **distribution of importance values** for a given feature across all trees in the random forest. 

- 📈 Some features (like `age`) are consistently important — they show up in many trees with high influence.  
- 🌀 Others vary more or are barely used — they may only matter in a few trees.

This highlights one of the strengths of random forests: by combining diverse trees, the model captures a **broader range of signals** without relying too heavily on any single decision.

### 🔁 Back to the Tree: Why Does It Overfit?

We just saw how random forests stabilize predictions by combining many shallow, varied trees. But what about a **single decision tree**?

Let’s investigate how the **tree depth** — the number of decision levels — affects performance. Deeper trees can make more specific decisions, but at what cost?

In [None]:
depths = range(1, 15)
train_accs, test_accs = [], []

for d in depths:
    model = DecisionTreeClassifier(max_depth=d, random_state=42)
    model.fit(X_train, y_train)
    train_accs.append(accuracy_score(y_train, model.predict(X_train)))
    test_accs.append(accuracy_score(y_test, model.predict(X_test)))

fig = go.Figure()
fig.add_trace(go.Scatter(x=list(depths), y=train_accs, mode='lines+markers', name='Train Accuracy'))
fig.add_trace(go.Scatter(x=list(depths), y=test_accs, mode='lines+markers', name='Test Accuracy'))
fig.update_layout(title="Effect of Tree Depth on Accuracy", xaxis_title="Max Depth", yaxis_title="Accuracy")
fig.show()

### 📉 What We See

- 🌳 As depth increases, the tree becomes better at fitting the training data — even memorizing it.  
- 🧪 But the test accuracy suffers beyond a certain point — a clear sign of **overfitting**.
- ⚖️ The best depth balances learning useful patterns without chasing every quirk in the data.

📌 **Key Insight**: Individual trees can easily overfit — that’s why ensembles like random forests are so effective.
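
Rather than reading the best depth off a plot of test accuracy, a common approach is to choose it with cross-validation on the training data and keep the test set for a final check. Below is a minimal sketch with `GridSearchCV` on synthetic stand-in data (an assumption to keep the snippet self-contained).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, n_informative=3, flip_y=0.2, random_state=0
)

# Try every depth from 1 to 14, scoring each with 5-fold cross-validation
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(1, 15))},
    cv=5,
)
search.fit(X_demo, y_demo)

print(f"Best depth: {search.best_params_['max_depth']}")
print(f"Cross-validated accuracy at that depth: {search.best_score_:.1%}")
```

In this notebook you would pass `X_train` and `y_train` to `search.fit` and then evaluate `search.best_estimator_` once on `X_test`.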

## 🚀 Part 3: Enter Gradient Boosting: A New Strategy

We’ve seen how individual decision trees can overfit, and how random forests reduce variance by averaging many trees. But there’s another powerful idea: **Gradient Boosting**.

Instead of training all trees independently (like in a forest), boosting builds them **sequentially**:
- Each tree tries to **fix the mistakes** of the one before it.
- The model gradually **improves**, learning from its own errors.
- The result: a strong learner made from many weak learners. 💪
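
The "fix the mistakes" idea is easiest to see in a regression setting, where the mistakes are simply the residuals (target minus current prediction). The sketch below is a hand-rolled illustration under that assumption, on made-up data; `GradientBoostingClassifier` follows the same loop but works with the gradient of a classification loss instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up 1D regression problem: a noisy sine wave
rng = np.random.default_rng(0)
x_demo = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
target = np.sin(x_demo).ravel() + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
boost_prediction = np.full_like(target, target.mean())  # start from a constant guess
mse_per_round = [np.mean((target - boost_prediction) ** 2)]

for _ in range(50):
    residuals = target - boost_prediction                 # current mistakes
    weak_tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    weak_tree.fit(x_demo, residuals)                      # next tree learns the mistakes
    boost_prediction += learning_rate * weak_tree.predict(x_demo)  # small corrective step
    mse_per_round.append(np.mean((target - boost_prediction) ** 2))

print(f"Training MSE before boosting: {mse_per_round[0]:.3f}")
print(f"Training MSE after 50 rounds: {mse_per_round[-1]:.3f}")
```

The `learning_rate` shrinks each correction, which is why many small trees are needed and why adding too many can eventually fit noise.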

Let’s compare all three approaches:
- 🌳 A single decision tree  
- 🌲 A random forest  
- 🚀 Gradient-boosted decision trees (BDT)

Which one performs best on our pizza prediction task?

In [None]:
models = {
    "Decision Tree 🌳": DecisionTreeClassifier(max_depth=3, random_state=42),
    "Random Forest 🌲": RandomForestClassifier(max_depth=3, n_estimators=100, random_state=42),
    "Gradient Boosting 🚀": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
}

# Store results
results = {}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    results[name] = acc
    print(f"\n{name} Accuracy: {acc:.1%}")
    print(classification_report(y_test, preds))

# Visualization
fig = go.Figure()
fig.add_trace(go.Bar(x=list(results.keys()), y=list(results.values()),
                     text=[f"{v:.1%}" for v in results.values()],
                     textposition='auto', marker_color=["green", "blue", "orange"]))

fig.update_layout(title="📊 Model Comparison on Pizza Preference",
                  yaxis_title="Accuracy", xaxis_title="Model",
                  height=400)
fig.show()

### 🏁 Results & Reflections

All three models use decision trees at their core — but their strategies differ:

- 🌳 A single tree can overfit, especially if it’s deep.
- 🌲 A random forest is more stable and generalizes better by averaging many shallow trees.
- 🚀 Gradient boosting (BDT) focuses on **learning from errors**, often achieving the best accuracy — especially on **structured tabular data**.

📌 **Takeaway**: When accuracy matters and training time is acceptable, gradient boosting is a top choice.  
But when speed and interpretability are more important, simpler trees or random forests still shine!

### 🔄 Boosting Step-by-Step: How Does Performance Evolve?

Gradient boosting builds the model gradually, **one tree at a time**, each trying to correct the mistakes of the previous ones.

But how do the **training and testing errors change** as we add more trees?

Let’s plot the error rate after each boosting round to see how the model improves — or possibly starts to overfit.

In [None]:
# Calculate staged errors
bdt = models["Gradient Boosting 🚀"]

train_errors = []
test_errors = []

for y_train_pred, y_test_pred in zip(
        bdt.staged_predict(X_train),
        bdt.staged_predict(X_test)):
    train_errors.append(1 - accuracy_score(y_train, y_train_pred))
    test_errors.append(1 - accuracy_score(y_test, y_test_pred))

# Plot using Plotly
fig = go.Figure()

fig.add_trace(go.Scatter(
    y=train_errors,
    mode='lines+markers',
    name='Train Error',
    line=dict(color='blue')
))

fig.add_trace(go.Scatter(
    y=test_errors,
    mode='lines+markers',
    name='Test Error',
    line=dict(color='red')
))

fig.update_layout(
    title="📉 BDT Performance over Boosting Rounds",
    xaxis_title="Boosting Round",
    yaxis_title="Error Rate",
    height=400,
    legend=dict(x=0.7, y=0.95)
)

fig.show()

### 📈 What We Learn

- 🟦 The **training error** keeps dropping — the model fits the data better and better.
- 🟥 The **testing error** improves at first, but then may level off or even rise — a sign of **overfitting** if we go too far.
- ⚖️ The sweet spot is often reached **before the last round**, at the point where the test error is lowest.

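📌 **Tip**: You can control this with **early stopping**, which halts training when performance on a held-out validation set no longer improves. Use a validation set for this rather than the test set, so the test set stays untouched for the final evaluation.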
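
In scikit-learn, `GradientBoostingClassifier` supports early stopping through its `n_iter_no_change` and `validation_fraction` parameters. The sketch below uses synthetic stand-in data and arbitrarily chosen settings, so treat the numbers as illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

early_gb = GradientBoostingClassifier(
    n_estimators=500,          # upper limit on boosting rounds
    learning_rate=0.1,
    validation_fraction=0.2,   # share of the training data held out internally
    n_iter_no_change=10,       # stop after 10 rounds without validation improvement
    random_state=0,
)
early_gb.fit(X_demo, y_demo)

print(f"Boosting rounds actually used: {early_gb.n_estimators_} of 500")
```

The fitted attribute `n_estimators_` reports how many trees were built before training stopped.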

## 🎉 What You've Accomplished Today!

In under an hour, you've explored a powerful family of machine learning algorithms built on trees — and compared their strengths head-to-head:

### ✅ **Core Concepts Learned:**
1. **Decision Trees** 🌳 – Ask questions to make predictions
2. **Random Forests** 🌲 – Combine many trees to reduce overfitting  
3. **Gradient Boosting** 🚀 – Learn from mistakes step-by-step
4. **Feature Importance** 🔍 – Understand what your model really uses
5. **Model Comparison** 📊 – Evaluate accuracy and generalization

### 🎯 **Key Insights:**
- **Trees think like humans** – breaking decisions into simple questions
- **Forests generalize better** – by averaging many imperfect models
- **Boosting learns iteratively** – fixing errors as it goes
- **Overfitting is real** – but you can control it with depth and ensembling
- **Machine learning is experimental** – you compare, tweak, and iterate

### 🔗 **Connecting the Classes:**
- **Class 1 (Linear Regression)**: Parametric and interpretable  
- **Class 2 (Trees & Boosting)**: Flexible, powerful, and (for single trees) still interpretable  
- **Next Class**: Neural networks and deep learning

### 🌟 **The Big Picture:**
You now understand **two core pillars** of machine learning:
- **Linear Models** – fast, elegant, and mathematically grounded  
- **Tree-Based Models** – intuitive, powerful, and great for tabular data

Next up: We’ll dive into **neural networks**, the engine behind modern AI. But every step you’ve taken so far gives you the right tools — and mindset — to keep climbing! 🧠🔥

**Well done today!**