#### **Ensemble Methods**

Ensemble techniques are machine learning methods that **combine predictions from multiple models (base learners)** to produce a single prediction that is usually more robust and accurate than that of any individual base learner.

#### **Why Ensemble Methods Work**
**Core Idea** : A group of diverse, reasonably accurate models can outperform a single strong model.

Single models often suffer from one or more of the following issues:

**1) High variance**
- Model is very sensitive to training data (e.g., decision trees).
- Small changes in data lead to large changes in predictions.

**2) High bias**
- Model is too simplistic and underfits the data (e.g., linear models for complex problems).

**3) Instability**
- Some algorithms produce inconsistent results depending on sampling.

**4) Limited generalization**
- A single hypothesis may not capture all patterns in the data.

**Key insight**: Each model makes DIFFERENT errors!

- Model A: Correct on samples [1,2,3,5,7,9] - 60% accuracy

- Model B: Correct on samples [2,4,5,6,8,10] - 60% accuracy

- Model C: Correct on samples [1,3,4,6,7,8] - 60% accuracy

Majority vote: Correct on [1,2,3,4,5,6,7,8] - 80% accuracy!

**Diversity reduces variance!**
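
The 80% figure can be checked directly. The sketch below assumes a binary classification task with 10 samples (implied by the 60% accuracies above), so the majority vote is right whenever at least two of the three models are right.

```python
# Samples each model classifies correctly, copied from the example above.
correct = {
    "A": {1, 2, 3, 5, 7, 9},
    "B": {2, 4, 5, 6, 8, 10},
    "C": {1, 3, 4, 6, 7, 8},
}
samples = range(1, 11)  # assumed: 10 samples in total


def majority_correct(sample):
    # Binary task assumption: the vote is right when >= 2 of 3 models are right.
    votes = sum(sample in hits for hits in correct.values())
    return votes >= 2


ensemble_hits = [s for s in samples if majority_correct(s)]
ensemble_accuracy = len(ensemble_hits) / len(samples)
print(ensemble_hits, ensemble_accuracy)  # [1, 2, 3, 4, 5, 6, 7, 8] 0.8
```

Samples 9 and 10 are still missed because only one model gets each of them right, so the vote cannot recover them.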

**Key Intuition (Bias - Variance Trade-Off)**

Ensemble methods improve performance by:

- Reducing variance (e.g., Bagging)

- Reducing bias (e.g., Boosting)

- Improving overall generalization by averaging or combining predictions, so that the errors of individual models partly cancel out

#### **Main Types of Ensemble methods:**
- Bagging

- Boosting

**Bagging (Bootstrap Aggregating)** : Mainly reduces variance

**How it works:**
We train several models independently and aggregate their predictions:

- Create multiple training datasets using bootstrap sampling (sampling with replacement).

- Train the same algorithm on each dataset.

- Aggregate predictions (majority vote for classification, mean for regression).

**Best suited for:**
- High variance models like decision trees

**Example:** 
- Random Forest



**Boosting** : Mainly reduces bias

**How it works:**
We build models in sequence. Each new model is trained to correct the errors made by the models before it, so the combined model improves step by step.

- Models are trained sequentially.

- Each new model focuses more on the samples the previous models got wrong (AdaBoost increases their weights; gradient boosting fits the remaining errors).

- Final prediction is a weighted combination of all models.
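
The three steps above can be sketched as a small AdaBoost-style loop. This is only an illustration under stated assumptions: the base learner is a one-threshold decision stump, labels are in {-1, +1}, and the toy data and all function names are hypothetical, not taken from any library.

```python
import math


def stump_predict(x, threshold, polarity):
    # A decision stump: one threshold, predicting +polarity on the left side.
    return polarity if x <= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds):
    weights = [1.0 / len(xs)] * len(xs)  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # this stump's vote weight
        model.append((alpha, threshold, polarity))
        # Misclassified samples (y * prediction == -1) gain weight, others lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    # Final prediction: sign of the alpha-weighted sum of stump votes.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in model)
    return 1 if score >= 0 else -1


# Hypothetical 1-D toy data; no single stump classifies all of it correctly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost(xs, ys, n_rounds=3)
predictions = [predict(model, x) for x in xs]
print(predictions == ys)  # True
```

On this toy data the best single stump gets 7 of 10 points right, while the weighted vote of three stumps gets all 10 right. That is the sense in which boosting turns weak learners into a stronger one.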

**Best suited for:** 
- Weak learners that perform slightly better than random guessing

**Examples:**
- AdaBoost
- Gradient Boosting
- XGBoost : often among the strongest performers on tabular data
- LightGBM
- CatBoost

#### **Bootstrap Dataset**

A bootstrap dataset is a same-sized dataset created by sampling the original data with replacement, allowing duplicates and omissions, and is used to improve model stability and generalization.

It is a foundational concept behind bagging-based ensemble methods such as Random Forest.

A bootstrap dataset is constructed by:
- Randomly selecting *n* samples with replacement from the original dataset *D* (where *n* is the size of *D*)

Because sampling is with replacement:

- Some observations appear multiple times
- Some observations are not selected at all
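
A minimal sketch of this construction, using only Python's standard library. The row labels and the `bootstrap_sample` helper are placeholders for illustration, and the fixed seed is there only so that reruns give the same draw.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows, with replacement."""
    return rng.choices(rows, k=len(rows))


rows = ["A", "B", "C", "D", "E", "F"]  # placeholder row labels
rng = random.Random(0)  # fixed seed, only for repeatability

sample = bootstrap_sample(rows, rng)
duplicates = sorted({r for r in sample if sample.count(r) > 1})
omitted = [r for r in rows if r not in sample]

print("sample:    ", sample)
print("duplicates:", duplicates)
print("omitted:   ", omitted)
```

Because the sample has the same size as the original data, any duplicate forces at least one omission, and vice versa.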

| Row | Square Feet | Price (Lakhs) |
|-----|-------------|---------------|
| A | 900 | 70 |
| B | 1000 | 80 |
| C | 900 | 70 |
| D | 1500 | 90 |
| E | 1600 | 95 |
| F | 1700 | 100 |

STEP 1: Create Bootstrap Samples (WITH Replacement)

                            Bootstrap Sample 1:
                            Randomly pick 6 samples WITH replacement:
                            ├─ Row A: 900 sq ft, ₹70L (sampled TWICE!)
                            ├─ Row A: 900 sq ft, ₹70L (duplicate)
                            ├─ Row B: 1000 sq ft, ₹80L
                            ├─ Row D: 1500 sq ft, ₹90L
                            ├─ Row E: 1600 sq ft, ₹95L
                            └─ Row F: 1700 sq ft, ₹100L

                            Bootstrap Sample 2:
                            ├─ Row C: 900 sq ft, ₹70L
                            ├─ Row D: 1500 sq ft, ₹90L
                            ├─ Row E: 1600 sq ft, ₹95L
                            ├─ Row E: 1600 sq ft, ₹95L (sampled TWICE!)
                            ├─ Row F: 1700 sq ft, ₹100L
                            └─ Row B: 1000 sq ft, ₹80L

                            Bootstrap Sample 3:
                            ├─ Row F: 1700 sq ft, ₹100L
                            ├─ Row C: 900 sq ft, ₹70L
                            ├─ Row E: 1600 sq ft, ₹95L
                            ├─ Row A: 900 sq ft, ₹70L
                            ├─ Row B: 1000 sq ft, ₹80L
                            └─ Row D: 1500 sq ft, ₹90L
                        

                        
**Each bootstrap sample is used to train its own decision tree, so every tree sees a slightly different version of the data and makes its own prediction.**

STEP 2: Train Separate Model on Each Sample

                            Tree 1: Trained on Sample 1
                            • Learns splits based on its data
                            • For 950 sq ft → Predicts: ₹75L

                            Tree 2: Trained on Sample 2
                            • Different data → Different splits!
                            • For 950 sq ft → Predicts: ₹72L

                            Tree 3: Trained on Sample 3
                            • Yet another perspective
                            • For 950 sq ft → Predicts: ₹78L

STEP 3: Aggregate Predictions (Average)

                            For test property with 950 sq ft:

                            Prediction₁ = ₹75L
                            Prediction₂ = ₹72L
                            Prediction₃ = ₹78L

                            Final Bagging Prediction:
                            Average = (75 + 72 + 78) / 3
                            = 225 / 3
                            = ₹75 Lakhs ✓

                            Why it works:
                            • Each tree makes slightly different errors
                            • Averaging reduces overall variance
                            • More stable than single tree!
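
Steps 2 and 3 can be sketched in a few lines of Python. Two assumptions to note: the code reuses the three bootstrap samples listed in Step 1 instead of drawing new ones, and the base learner is a simple nearest-rows lookup standing in for a decision tree, so its per-model predictions differ from the illustrative ₹75L, ₹72L and ₹78L figures above. All names here are hypothetical.

```python
from statistics import mean

# (square feet, price in lakhs) for rows A-F, copied from the table above.
DATA = {"A": (900, 70), "B": (1000, 80), "C": (900, 70),
        "D": (1500, 90), "E": (1600, 95), "F": (1700, 100)}

# The three bootstrap samples from Step 1 (row labels, duplicates included).
SAMPLES = ["AABDEF", "CDEEFB", "FCEABD"]


def fit_nearest(rows):
    """Stand-in base learner: predict the mean price of the closest rows."""
    def predict(sqft):
        nearest = min(abs(x - sqft) for x, _ in rows)
        return mean(y for x, y in rows if abs(x - sqft) == nearest)
    return predict


def bagging_predict(samples, sqft):
    # Step 2: fit one model per bootstrap sample.
    models = [fit_nearest([DATA[label] for label in sample]) for sample in samples]
    # Step 3: average the individual predictions.
    per_model = [model(sqft) for model in models]
    return per_model, mean(per_model)


per_model, final = bagging_predict(SAMPLES, 950)
print([round(p, 2) for p in per_model], round(final, 2))
# [73.33, 75, 73.33] 73.89
```

The per-model predictions differ only because the samples contain different duplicates, and the average smooths those differences out.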

**Out-of-Bag (OOB)** data refers to the **subset of original training samples that are not selected in a particular bootstrap dataset** during ensemble training.

It is a direct consequence of **bootstrap sampling with replacement** and is primarily used in bagging-based models, especially Random Forest.

**Random Forest is often described as self-validating: each tree is trained on its own bootstrap dataset and can then be tested on its OOB rows, which it never saw during training. Training and validation therefore happen together, without setting aside a separate validation set.**
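
How many rows end up out-of-bag follows from the sampling scheme. As a short derivation, assuming each of the $n$ draws is uniform and independent, a given row is missed by one draw with probability $1 - 1/n$, so the chance that it is missed by all $n$ draws is:

$$
P(\text{row is OOB}) = \left(1 - \frac{1}{n}\right)^{n} \;\longrightarrow\; e^{-1} \approx 0.368 \quad (n \to \infty)
$$

For the 6-row table above this gives $(5/6)^6 \approx 0.335$, so on average about 2 of the 6 rows are out-of-bag for each tree.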

#### **Random Forest**

Random Forest is a **supervised ensemble learning algorithm** used for **Classification as well as Regression** that builds **multiple decision trees** and combines their predictions to produce a more accurate, stable, and generalizable model.

It is based on:

- Bagging (Bootstrap Aggregation)
- Random feature selection

Intuitive definition of RANDOM FOREST
- Random = RANDOM FEATURE SELECTION + RANDOM DATASET SAMPLING
- Forest = Multiple Decision Trees

**Why Random Forest Was Needed**

**Problem with Decision Trees**

Decision Trees:
- Have low bias
- Have very high variance
- Tend to overfit the training data

**Bagging Alone Is Not Enough**

Bagging reduces variance by training trees on different bootstrap datasets, but:

- Trees can still be highly correlated
- Correlated errors reduce ensemble effectiveness

**Random Forest Solution**

Random Forest:

- Uses bootstrap sampling (data randomness)
- Uses feature subsampling (model randomness)

This decorrelates the trees, which typically improves the performance of the ensemble.
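
One common way to quantify why correlation matters, sketched here under simplifying assumptions that are not part of the notes above: suppose the forest averages $B$ trees $T_1, \dots, T_B$, each with prediction variance $\sigma^2$ and pairwise correlation $\rho$ (all symbols are introduced here for illustration). Then:

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

Adding more trees shrinks only the second term. The first term, $\rho\,\sigma^2$, stays however many trees are added, so lowering $\rho$ through feature subsampling is what lets the ensemble variance keep falling.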

**Core Ideas Behind Random Forest**

Random Forest reduces variance by:

- Training many decision trees on different bootstrap datasets
- Restricting each tree to consider only a random subset of features at each split
- Aggregating predictions across all trees

**Steps**

1) Bootstrap sampling : For each tree, sample *n* rows with replacement. The rows that were not drawn become that tree's Out-of-Bag (OOB) samples, so every tree has its own bootstrap dataset and its own OOB dataset.

2) Random feature selection & creating trees : 
- At each split, select at random a subset of the features (fewer than the total number of features). For a regression tree:
    - Calculate the parent variance
    - For each selected feature, sort the data and find the midpoints between consecutive values
    - For each midpoint
        - Split the data at the midpoint 
        - Calculate the variance of the left and right sides
        - Calculate the weighted variance
        - Calculate the variance reduction 
    - Choose the midpoint with the highest variance reduction as the split. 

3) Aggregation : Each tree predicts a value, and the average of all those predictions becomes the final answer (for classification, the majority vote is used instead).
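
The split search in step 2 can be sketched as follows. The prices and square-feet values come from the table earlier in these notes; the `bedrooms` column is a made-up second feature, added only so that there is something to subsample, and the function names are hypothetical.

```python
import random
from statistics import pvariance

# Rows A-F from the table above. "bedrooms" is a made-up second feature,
# included only to illustrate feature subsampling.
ROWS = [
    {"sqft": 900, "bedrooms": 2, "price": 70},
    {"sqft": 1000, "bedrooms": 2, "price": 80},
    {"sqft": 900, "bedrooms": 2, "price": 70},
    {"sqft": 1500, "bedrooms": 3, "price": 90},
    {"sqft": 1600, "bedrooms": 3, "price": 95},
    {"sqft": 1700, "bedrooms": 4, "price": 100},
]


def best_split(rows, feature):
    """Return (variance_reduction, midpoint) of the best split on one feature."""
    parent_var = pvariance([r["price"] for r in rows])
    values = sorted({r[feature] for r in rows})
    best = (0.0, None)
    for low, high in zip(values, values[1:]):
        mid = (low + high) / 2
        left = [r["price"] for r in rows if r[feature] <= mid]
        right = [r["price"] for r in rows if r[feature] > mid]
        # Weighted variance of the two children, then the reduction vs. the parent.
        weighted_var = (len(left) * pvariance(left)
                        + len(right) * pvariance(right)) / len(rows)
        reduction = parent_var - weighted_var
        if reduction > best[0]:
            best = (reduction, mid)
    return best


def random_feature_split(rows, features, k, rng):
    """Pick k features at random, then take the best split among them."""
    candidates = rng.sample(features, k)
    scored = [(best_split(rows, f), f) for f in candidates]
    (reduction, mid), feature = max(scored, key=lambda item: item[0][0])
    return feature, mid, reduction


reduction, mid = best_split(ROWS, "sqft")
print(round(reduction, 2), mid)  # 117.36 1250.0

chosen = random_feature_split(ROWS, ["sqft", "bedrooms"], k=1, rng=random.Random(0))
print(chosen)
```

On the full table the best square-feet split is at 1250, which separates rows A, B, C from rows D, E, F. A real tree would repeat this search recursively inside each side, drawing a fresh feature subset at every split.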