**Chapter 7 --- Ensemble Learning and Random Forests (Revision Summary)**
=================================================

*Simple theory + formulas + minimal runnable snippets*

* * * * *

**1 --- What Is Ensemble Learning?**
==================================

Ensemble learning = **combining predictions from multiple models** to get a better overall prediction.

Key idea:

`Many weak models  →  together become a strong model.`

Types of ensembles:

-   Averaging methods (Bagging, Random Forests)

-   Boosting (AdaBoost, Gradient Boosting)

-   Stacking (meta-model on top of base models)

Why ensembles work:

-   Reduce variance

-   Reduce bias (boosting)

-   Reduce overfitting (bagging)

-   Increase robustness

* * * * *

**2 --- Voting Classifiers**
==========================

Combine predictions of multiple classifiers.

Types:

**2.1 Hard Voting**
-------------------

Choose class with **most votes**.

$$\hat{y} = \operatorname{mode}(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_k)$$

where $\hat{y}_i$ is the class predicted by the $i$-th of $k$ classifiers.

**2.2 Soft Voting**
-------------------

Average predicted probabilities and choose class with highest mean.

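$$\hat{y} = \arg\max_j \frac{1}{k} \sum_{i=1}^{k} p_j^{(i)}$$

where $p_j^{(i)}$ is the probability that classifier $i$ (out of $k$) assigns to class $j$.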

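Soft voting often performs better than hard voting because confident predictions carry more weight, but it requires every classifier to output class probabilities.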
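The two rules can disagree. A minimal sketch with made-up probabilities (the numbers are hypothetical, chosen only to show the effect): two classifiers lean weakly towards class 0, one is very confident in class 1.

```python
import numpy as np

# Hypothetical class probabilities from k = 3 classifiers for one sample
# (rows: classifiers, columns: class 0 and class 1).
proba = np.array([
    [0.55, 0.45],
    [0.55, 0.45],
    [0.05, 0.95],
])

# Hard voting: each classifier votes for its most likely class; take the mode.
votes = proba.argmax(axis=1)
hard_pred = np.bincount(votes, minlength=proba.shape[1]).argmax()

# Soft voting: average the probabilities per class, then take the argmax.
mean_proba = proba.mean(axis=0)
soft_pred = mean_proba.argmax()

print("hard vote:", hard_pred)
print("soft vote:", soft_pred, "mean probabilities:", np.round(mean_proba, 3))
```

Hard voting picks class 0 (two votes to one), while soft voting picks class 1 because the single confident classifier pulls the average up.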

### Code Example:

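```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (an assumption, so the snippets run); substitute your own X, y.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('svc', SVC(probability=True)),  # probability=True enables predict_proba, needed for soft voting
        ('dt', DecisionTreeClassifier())
    ],
    voting='soft'  # average class probabilities; 'hard' would take the majority vote
)

voting.fit(X_train, y_train)
```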

**3 --- Bagging & Pasting**
=========================

Train multiple copies of the same model on **different random subsets** of the training set.

### Bagging (Bootstrap Aggregating):

-   Sample **with replacement**

-   Each model sees a different bootstrap sample

-   Reduces variance

### Pasting:

-   Sample **without** replacement

Final prediction:

-   Classification → majority vote

-   Regression → average

### Code Example (Bagging):

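```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,  # number of trees in the ensemble
    max_samples=1.0,   # each bootstrap sample is as large as the training set
    bootstrap=True     # sample with replacement (False would give pasting)
)

bag.fit(X_train, y_train)
```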
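Pasting uses the same class with `bootstrap=False`. A minimal sketch, assuming a synthetic `make_moons` dataset (an illustrative choice, not part of the original notes). `max_samples` is set below 1.0 here, because drawing the whole training set without replacement would give every model identical data.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed only for illustration.
X_demo, y_demo = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

paste = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.5,   # each model sees a random half of the training set
    bootstrap=False,   # sample without replacement -> pasting
    random_state=0,
)
paste.fit(X_tr, y_tr)

pasting_accuracy = paste.score(X_te, y_te)
print(f"pasting test accuracy: {pasting_accuracy:.3f}")
```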

**4 --- Out-of-Bag (OOB) Evaluation**
===================================

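With bootstrap sampling (and `max_samples` equal to the training-set size), each model's sample leaves out about 37% of the training instances on average. These are that model's **out-of-bag (OOB)** instances.

OOB score = accuracy obtained when each training instance is predicted using only the models that never saw it during training, so no separate validation set is needed.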
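Where the ~37% comes from: an instance is missed by one draw with probability 1 - 1/m, so it is missed by all m draws with probability (1 - 1/m)^m, which tends to 1/e ≈ 0.368. A quick simulation sketch (the training-set size `m` is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000  # training-set size, chosen arbitrarily for the illustration

# One bootstrap sample: m draws with replacement from m instance indices.
sample = rng.integers(0, m, size=m)

# Fraction of instances that never appear in the sample (out-of-bag).
oob_fraction = 1 - np.unique(sample).size / m

# Probability that a given instance is missed by all m draws.
theory = (1 - 1 / m) ** m

print(f"simulated OOB fraction: {oob_fraction:.3f}")
print(f"(1 - 1/m)^m:            {theory:.3f}")
print(f"1/e:                    {np.exp(-1):.3f}")
```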

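```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    bootstrap=True,   # OOB samples only exist with bootstrap sampling
    oob_score=True    # compute the OOB accuracy estimate during fit
)
bag.fit(X, y)  # no held-out split is needed for the OOB estimate
print(bag.oob_score_)
```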

**5 --- Random Forests**
======================

A Random Forest = bagging + decision trees + randomness in feature selection.

Each split chooses:

`random subset of features → choose best split among them`

Why this helps:

-   De-correlates trees

-   Reduces variance further

-   More stable than plain bagging
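A minimal sketch of the definition above: a `RandomForestClassifier` next to a roughly equivalent `BaggingClassifier` whose base trees draw a random feature subset at each split. The synthetic dataset and all parameter values are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed only for illustration.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Random Forest: bootstrap samples + a random feature subset at every split.
forest = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", random_state=0
)
forest.fit(X_tr, y_tr)

# Roughly the same idea spelled out with bagging: the base tree itself
# restricts every split to a random subset of sqrt(n_features) features.
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),
    n_estimators=200,
    bootstrap=True,
    random_state=0,
)
bagged_trees.fit(X_tr, y_tr)

forest_acc = forest.score(X_te, y_te)
bagged_acc = bagged_trees.score(X_te, y_te)
print(f"random forest: {forest_acc:.3f}")
print(f"bagged random-feature trees: {bagged_acc:.3f}")
```

The two ensembles should score similarly; the dedicated class is simply more convenient and better optimized for trees.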

* * * * *

**6 --- Random Forest Hyperparameters**
=====================================

| Parameter | Meaning |
| --- | --- |
| n_estimators | number of trees |
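| max_features | number (or fraction) of features considered at each split |
| max_depth | maximum depth of each tree |
| min_samples_split | minimum number of samples a node needs before it can be split (higher values curb overfitting) |
| bootstrap | True = each tree is trained on a bootstrap sample (bagging) |
| oob_score | True = estimate generalization performance on out-of-bag samples, without a validation split |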
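A sketch that sets every parameter from the table at once. The values are illustrative assumptions, not tuned recommendations, and the dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, assumed only for illustration.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=1
)

rf = RandomForestClassifier(
    n_estimators=300,       # number of trees
    max_features="sqrt",    # features considered at each split
    max_depth=8,            # depth limit per tree
    min_samples_split=10,   # a node needs at least 10 samples to be split
    bootstrap=True,         # train each tree on a bootstrap sample
    oob_score=True,         # estimate accuracy on out-of-bag samples
    random_state=1,
)
rf.fit(X_demo, y_demo)

print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
print("deepest tree:", max(tree.get_depth() for tree in rf.estimators_))
```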

* * * * *

**7 --- Extra-Trees (Extremely Randomized Trees)**
================================================

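Like Random Forests, but each candidate feature gets a **randomly** drawn split threshold instead of the threshold that maximizes impurity reduction; the best of these random candidates is used.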

Benefits:

-   Often even lower variance than a Random Forest

-   Faster training (no search for the optimal threshold)

Drawback:

-   Slight increase in bias

Example:

```python
from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier(n_estimators=200)
model.fit(X_train, y_train)
```

* * * * *

**8 --- Feature Importance**
==========================

Random Forests score each feature by how much splits on it reduce impurity, averaged over all trees; in scikit-learn the scores are normalized to sum to 1.

`model.feature_importances_`

Interpretation:

-   Higher value → splits on that feature reduced impurity more, on average across the trees
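A short sketch of reading the importances, assuming the Iris dataset bundled with scikit-learn as example data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(iris.data, iris.target)

# Pair each feature name with its importance, most important first.
ranked = sorted(
    zip(iris.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<20} {score:.3f}")
```

Impurity-based importances are computed on the training data and can favour features with many distinct values, so treat them as a rough ranking rather than an exact measurement.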

* * * * *

**9 --- Boosting**
================

Boosting = sequentially train weak learners, each correcting the errors of the previous ones.

Types:

-   AdaBoost

-   Gradient Boosting

-   XGBoost (an optimized gradient boosting library)


**10 --- AdaBoost**
=================

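Each new model focuses on the instances its predecessors misclassified: their sample weights are increased before the next model is trained.

Weight update rule:

`wᵢ := wᵢ * exp(α * I(yᵢ ≠ ŷᵢ))`

I(·) is 1 when the prediction is wrong and 0 otherwise, so only misclassified samples are up-weighted; the weights are then renormalized to sum to 1.

α = model weight in the ensemble (larger for more accurate models).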
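One round of the update, worked by hand. This sketch assumes one common formulation for the model weight, α = η · log((1 − r) / r), where r is the weighted error rate and η the learning rate; that formula, the labels, and the predictions below are assumptions for the illustration, not taken from these notes.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])   # hypothetical labels
y_pred = np.array([1, 0, 0, 1, 0])   # hypothetical weak-learner predictions

m = len(y_true)
w = np.full(m, 1 / m)                # start from uniform sample weights

miss = (y_true != y_pred).astype(float)      # I(y_i != y_hat_i)
r = np.sum(w * miss) / np.sum(w)             # weighted error rate
learning_rate = 1.0                          # assumed value of eta
alpha = learning_rate * np.log((1 - r) / r)  # model weight (assumed formula)

w = w * np.exp(alpha * miss)   # up-weight only the misclassified samples
w = w / w.sum()                # renormalize so the weights sum to 1

print("error rate:", r)
print("alpha:", round(float(alpha), 3))
print("new weights:", np.round(w, 3))
```

With one mistake out of five, r = 0.2 and α = log 4, so the misclassified sample ends up with weight 0.5 and the other four with 0.125 each.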

Example:

```python
from sklearn.ensemble import AdaBoostClassifier
model = AdaBoostClassifier(n_estimators=100)  # default base learner: a depth-1 decision tree
model.fit(X_train, y_train)
```

**11 --- Gradient Boosting**
==========================

Train trees sequentially to fit the **residual errors**:

`residual = y - predictions_so_far`

Each new tree tries to predict the residual left by the ensemble built so far.

```python
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor()
```
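The residual-fitting loop can be sketched by hand with plain regression trees. This is a simplified illustration for squared error on made-up data, not scikit-learn's implementation: it starts from a zero prediction (libraries typically start from a constant such as the target mean), and the learning rate and tree depth are assumed values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy quadratic data, assumed only for illustration.
rng = np.random.default_rng(42)
X_demo = rng.uniform(-0.5, 0.5, size=(200, 1))
y_demo = 3 * X_demo[:, 0] ** 2 + 0.05 * rng.normal(size=200)

learning_rate = 0.5                        # shrinks each tree's contribution
predictions_so_far = np.zeros_like(y_demo)
trees = []
mse_history = []

for _ in range(3):
    residual = y_demo - predictions_so_far        # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_demo, residual)                    # the new tree predicts the residual
    predictions_so_far += learning_rate * tree.predict(X_demo)
    trees.append(tree)
    mse_history.append(np.mean((y_demo - predictions_so_far) ** 2))

print("training MSE after each tree:", [round(float(v), 4) for v in mse_history])
```

The training error shrinks with every added tree; a prediction for new data is the sum of `learning_rate * tree.predict(X_new)` over all trees.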

**12 --- Stochastic Gradient Boosting**
=====================================

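Adds randomness to each boosting step:

-   row sampling: each tree is trained on a random fraction of the training rows (`subsample` < 1)

-   column sampling: each tree or split considers only a random subset of the features

Reduces overfitting (lower variance at the cost of slightly higher bias) and speeds up training.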
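In scikit-learn both kinds of sampling are plain constructor arguments. A minimal sketch, assuming a synthetic regression dataset and illustrative (untuned) values:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data, assumed only for illustration.
X_demo, y_demo = make_regression(
    n_samples=500, n_features=10, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

sgb = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,
    subsample=0.8,      # row sampling: each tree sees a random 80% of the rows
    max_features=0.5,   # column sampling: each split considers half of the features
    random_state=0,
)
sgb.fit(X_tr, y_tr)

r2 = sgb.score(X_te, y_te)
print(f"test R^2: {r2:.3f}")
```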

* * * * *

**13 --- XGBoost, LightGBM, CatBoost**
====================================

Advanced gradient boosting libraries:

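-   fast training (parallelized, with approximate or histogram-based split finding)

-   strong accuracy, often the best on tabular data

-   scale to large datasets

XGBoost implements:

-   regularization (L1/L2 penalties on leaf weights, plus a penalty on the number of leaves)

-   shrinkage (a learning rate that scales each new tree's contribution)

-   weighted quantile sketch (approximate search for split points on large, weighted data)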

* * * * *

**14 --- Stacking (Stacked Generalization)**
==========================================

Train base learners, then feed their predictions (ideally out-of-fold, to avoid leakage) into a **meta-model** (e.g., logistic regression) that learns how to combine them.

Example:

-   Level 0 (base learners): Random Forest, SVM, Logistic Regression

-   Level 1 (meta-learner): Logistic Regression
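The two-level setup above maps onto scikit-learn's `StackingClassifier`. A minimal sketch, assuming a synthetic `make_moons` dataset and default-ish settings; `cv=5` makes the meta-learner train on cross-validated predictions of the base learners.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data, assumed only for illustration.
X_demo, y_demo = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

stack = StackingClassifier(
    estimators=[                               # level 0: base learners
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression()),
    ],
    final_estimator=LogisticRegression(),      # level 1: meta-learner
    cv=5,                                      # out-of-fold predictions feed the meta-learner
)
stack.fit(X_tr, y_tr)

stack_acc = stack.score(X_te, y_te)
print(f"stacking test accuracy: {stack_acc:.3f}")
```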

* * * * *

**15 --- Summary Table (One-Glance)**
===================================

| Method | Core Idea | Strength | Weakness |
| --- | --- | --- | --- |
| Voting | combine models | simple | needs diverse models |
| Bagging | bootstrap + many models | reduces variance | can still overfit |
| Random Forest | bagging + random features | strong generalization | slower to train and predict than a single tree |
| Extra Trees | random splits | very fast | slightly higher bias |
| AdaBoost | reweight errors | low bias | sensitive to noise |
| Gradient Boosting | fit residuals | high accuracy | overfits with deep trees or too many estimators |
| XGBoost | optimized GBM | often state of the art on tabular data | many hyperparameters to tune |
| Stacking | meta-model | flexible | slow to train |