## Bagging (Bootstrap Aggregating) in Machine Learning

This notebook covers Bagging from the basics through to practical use:
- Introduction
- Core Concepts
- Algorithm
- Mathematical Background
- Intuition
- Bagging with Decision Trees
- Out-of-Bag Error
- Advantages and Disadvantages
- Implementation with Scikit-learn
- Real-World Use Cases
- Comparison with Other Ensemble Methods


## 1. What is Bagging?

Bagging stands for **Bootstrap Aggregating**.  
It is an ensemble learning method that combines predictions from multiple models to improve accuracy, stability, and robustness.

- For classification → majority voting (most common class wins).  
- For regression → average of predictions.  

The main purpose of Bagging is to reduce **variance** (the sensitivity to the particular training sample that leads to overfitting) while leaving **bias** roughly unchanged.


## 2. Core Concepts

### a) Ensemble Learning
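The idea that several models combined can predict better than any single one of them. In Bagging the base models are usually low-bias, high-variance learners (such as fully grown decision trees) rather than the weak learners used in Boosting.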

### b) Bootstrap Sampling
- From a dataset of size N, create new datasets by sampling N points **with replacement**.
- Each new dataset is called a *bootstrap sample*.
- On average, about 63.2% of the distinct original samples appear in each bootstrap sample (the fraction approaches \( 1 - 1/e \) as N grows); the remaining ~36.8% are *out-of-bag* samples for that model.
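
The in-bag fraction can be checked empirically. The following is a minimal sketch using NumPy; the dataset size `N` and the seed are arbitrary values chosen for illustration, and only row indices are sampled rather than real data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 10_000                      # illustrative dataset size (assumption)

# One bootstrap sample: N row indices drawn with replacement from 0..N-1
bootstrap_idx = rng.integers(0, N, size=N)

# Distinct original rows that appear at least once in the sample
in_bag = np.unique(bootstrap_idx)
# Rows that were never drawn: the out-of-bag set for this sample
out_of_bag = np.setdiff1d(np.arange(N), in_bag)

frac_in_bag = in_bag.size / N
frac_oob = out_of_bag.size / N
theory = 1 - (1 - 1 / N) ** N   # probability a given row is drawn at least once

print(f"in-bag fraction:     {frac_in_bag:.3f}")
print(f"out-of-bag fraction: {frac_oob:.3f}")
print(f"theoretical in-bag:  {theory:.3f}")
```

The observed fractions should land close to the theoretical value, with small run-to-run differences if the seed is changed.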

### c) Aggregation
- Each bootstrap dataset trains a separate base model.
- Predictions from models are combined:
  - Classification: majority vote
  - Regression: average
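
The two aggregation rules can be sketched on a handful of made-up predictions. The arrays below are hypothetical values, not the output of trained models, and the helper `majority_vote` is an illustrative name that assumes integer class labels starting at 0.

```python
import numpy as np

# Hypothetical class predictions: 5 models (rows) x 4 test points (columns)
class_preds = np.array([
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 1, 2],
    [0, 0, 1, 2],
    [0, 1, 1, 0],
])

# Hypothetical regression predictions: 3 models (rows) x 2 test points (columns)
reg_preds = np.array([
    [2.0, 3.5],
    [2.4, 3.1],
    [1.6, 3.9],
])

def majority_vote(preds):
    """Most frequent label per column; ties go to the smallest label."""
    n_classes = preds.max() + 1
    # counts[c, j] = number of models that voted class c for test point j
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

vote = majority_vote(class_preds)   # classification: majority vote
mean = reg_preds.mean(axis=0)       # regression: average over models

print(vote)  # [0 1 1 2]
print(mean)
```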


## 3. Bagging Algorithm (Step by Step)

1. Start with training dataset \( D = \{(x_1,y_1), (x_2,y_2), …, (x_N,y_N)\} \).
2. For \( b = 1 \) to \( B \) (number of models):
   - Create bootstrap sample \( D_b \) by sampling \( N \) points with replacement.
   - Train base model \( f_b(x) \) on \( D_b \).
3. For a new test point \( x \):
   - Get predictions from each model \( f_b(x) \).
   - Combine results:
     - Regression → \( \hat{y} = \frac{1}{B} \sum_{b=1}^B f_b(x) \)
     - Classification → \( \hat{y} = \text{mode}\{f_1(x), f_2(x), …, f_B(x)\} \)
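
The steps above can be sketched directly in code. This is one possible from-scratch version, not a reference implementation: the synthetic dataset, the choice of `DecisionTreeClassifier` as base model, and the helper names `fit_bagging` and `predict_bagging` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1: a training set D (synthetic data, used only for illustration)
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def fit_bagging(X, y, B=50, seed=0):
    """Step 2: train B trees, each on its own bootstrap sample D_b."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for b in range(B):
        idx = rng.integers(0, N, size=N)          # N draws with replacement
        tree = DecisionTreeClassifier(random_state=b)
        models.append(tree.fit(X[idx], y[idx]))
    return models

def predict_bagging(models, X):
    """Step 3: collect every model's prediction, then take the mode."""
    preds = np.stack([m.predict(X) for m in models])   # shape (B, n_points)
    # Labels are non-negative integers here, so bincount gives vote counts
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

models = fit_bagging(X_train, y_train, B=50)
y_pred = predict_bagging(models, X_test)
bagged_acc = (y_pred == y_test).mean()
print(f"bagged accuracy: {bagged_acc:.3f}")
```

For regression, the same structure would apply with a regression tree as base model and `preds.mean(axis=0)` in place of the vote.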


## 4. Mathematical Understanding

### Variance Reduction
- A single model may have high variance.
- Bagging reduces variance by averaging.

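If the \( B \) models' predictions are uncorrelated and each has variance \( \sigma^2 \), the variance of their average is:

$$
\text{Var}\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right) = \frac{\sigma^2}{B}
$$

In practice, bootstrap samples overlap heavily, so the models are positively correlated. With pairwise correlation \( \rho \), the variance of the average becomes:

$$
\rho\,\sigma^2 + \frac{1 - \rho}{B}\,\sigma^2
$$

Only the second term shrinks as \( B \) grows, so the correlation between models sets a floor on the achievable variance reduction.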
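
A small simulation can illustrate both situations. The sketch below does not train any models: it stands in for model predictions with Gaussian noise, and the values of `sigma`, `B`, `rho`, and the number of trials are arbitrary assumptions. For the correlated case it compares against the standard result for equally correlated variables, \( \rho\sigma^2 + (1-\rho)\sigma^2/B \).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, B, trials = 2.0, 25, 50_000   # illustrative values (assumptions)

# Independent case: B predictions per trial, each with variance sigma^2
indep = rng.normal(0.0, sigma, size=(trials, B))
var_indep = indep.mean(axis=1).var()

# Correlated case: every prediction shares a common component, which
# gives pairwise correlation rho while keeping variance sigma^2 each
rho = 0.5
shared = rng.normal(0.0, sigma, size=(trials, 1))
own = rng.normal(0.0, sigma, size=(trials, B))
corr = np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
var_corr = corr.mean(axis=1).var()

expected_indep = sigma**2 / B
expected_corr = rho * sigma**2 + (1 - rho) * sigma**2 / B

print(f"independent: simulated {var_indep:.3f}, formula {expected_indep:.3f}")
print(f"correlated:  simulated {var_corr:.3f}, formula {expected_corr:.3f}")
```

With correlated predictions the variance of the average stays well above \( \sigma^2 / B \), which is why methods that decorrelate the base models can improve on plain bagging.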


## 5. Intuition (Coin Toss Analogy)

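- Imagine one person guessing coin tosses at random: 50% accuracy. Combining many such guessers does not help, because a majority vote of pure guesses is still right only half the time.
- Now suppose each person is right 60% of the time. If 101 people guess independently and we take the majority vote, the probability that the vote is correct rises to roughly 98%.
- Bagging works the same way: many unstable models, each better than chance and not too strongly correlated, are more reliable combined than any one alone.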
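
The effect of majority voting can be computed exactly from the binomial distribution. The sketch below assumes fully independent voters who are each correct with the same probability `p`; the function name `majority_correct_prob` and the values of `p` and the voter counts are illustrative. An odd number of voters is used so that ties cannot occur.

```python
from math import comb

def majority_correct_prob(n_voters, p):
    """P(more than half of n independent voters are correct), n odd."""
    k_min = n_voters // 2 + 1   # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(k_min, n_voters + 1)
    )

for p in (0.5, 0.6):
    for n in (1, 11, 101):
        prob = majority_correct_prob(n, p)
        print(f"p={p}, voters={n:3d}: majority correct with prob {prob:.3f}")
```

At `p = 0.5` the vote stays at chance level regardless of how many voters there are, while at `p = 0.6` the probability climbs steadily with the number of voters. Real bagged models are correlated, so the gain in practice is smaller than this idealized calculation suggests.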


## 6. Bagging with Decision Trees

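- Bagging is most often applied to **Decision Trees**.
- Fully grown Decision Trees are high-variance models: small changes in the training data can produce a very different tree, so averaging many of them stabilizes the prediction.
- Bagged decision trees are also the foundation for **Random Forests**, which additionally consider only a random subset of features at each split to make the trees less correlated.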
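
As a sketch of how this might look with scikit-learn, the example below compares a single decision tree with a bagged ensemble using `BaggingClassifier`. The dataset (`load_breast_cancer`), the number of estimators, and the 5-fold cross-validation are arbitrary choices for illustration. The base model is passed positionally because the keyword for it has changed name between scikit-learn versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used only as an example
X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),  # base model, passed positionally
    n_estimators=100,                        # number of bootstrap models B
    random_state=0,
)

# 5-fold cross-validated accuracy for each model
tree_scores = cross_val_score(single_tree, X, y, cv=5)
bag_scores = cross_val_score(bagged_trees, X, y, cv=5)

print(f"single tree:  {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"bagged trees: {bag_scores.mean():.3f} +/- {bag_scores.std():.3f}")
```

Because the base models are trained independently, `BaggingClassifier` also accepts `n_jobs=-1` to fit them in parallel across available cores.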


## 7. Out-of-Bag (OOB) Error

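- On average, about 36.8% of the training samples (roughly \( 1/e \)) are not included in a given bootstrap sample.
- These left-out samples are called **out-of-bag (OOB)** samples for that model.
- Each training point can be predicted using only the models that never saw it, which gives an error estimate without needing a separate validation set.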
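
In scikit-learn, an OOB estimate is available by setting `oob_score=True` on `BaggingClassifier`, which exposes the result as `oob_score_` after fitting. The sketch below compares it with accuracy on a held-out test set; the dataset, split size, and number of estimators are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# The held-out test set exists here only to compare against the OOB estimate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),  # base model, passed positionally
    n_estimators=200,
    oob_score=True,      # needs bootstrap sampling, which is the default
    random_state=0,
)
bag.fit(X_train, y_train)

oob_accuracy = bag.oob_score_              # accuracy on out-of-bag predictions
test_accuracy = bag.score(X_test, y_test)  # accuracy on unseen test data

print(f"OOB accuracy:  {oob_accuracy:.3f}")
print(f"test accuracy: {test_accuracy:.3f}")
```

The two numbers are usually close, though not identical, since each OOB prediction uses only a subset of the models.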


## 8. Advantages of Bagging

- Reduces variance, which helps limit overfitting
- Works well with unstable learners (e.g., decision trees)
- OOB error provides internal validation
- Easy to parallelize

## 9. Disadvantages of Bagging

- Does little to reduce bias: if the base model underfits, the ensemble will too
- Requires training many models (higher computation)
- Models may still be correlated, which limits how much variance the averaging can remove


## 10. Real-World Use Cases

- Finance: credit risk analysis
- Medicine: disease prediction
- Marketing: customer churn prediction
- Machine Learning Competitions: improving accuracy with ensembles


## 11. Bagging vs Other Ensemble Methods

- **Bagging**: Trains models independently on bootstrapped samples; reduces variance.
- **Boosting**: Trains models sequentially, with each new model focusing on the errors of the previous ones; mainly reduces bias.
- **Stacking**: Combines predictions of multiple models using a meta-learner.
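
The three approaches can be tried side by side in scikit-learn. The sketch below is only a rough illustration, not a benchmark: gradient boosting stands in for Boosting in general, the stacked base models and the logistic-regression meta-learner are arbitrary choices, and a single train/test split on one small dataset says little about which method is better overall.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    # Bagging: independent trees on bootstrap samples
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0),
    # Boosting: trees added sequentially (gradient boosting as one example)
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-learner combines the base models' predictions
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
            ("logreg", make_pipeline(StandardScaler(),
                                     LogisticRegression(max_iter=1000))),
        ],
        final_estimator=LogisticRegression(),
    ),
}

scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
for name, acc in scores.items():
    print(f"{name:9s} test accuracy: {acc:.3f}")
```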
