# Bagging and Boosting: Detailed Explanation

Bagging and boosting are two popular ensemble learning techniques in machine learning that combine the predictions from multiple models to improve overall performance. While both methods aim to enhance the accuracy and robustness of the predictions, they do so in fundamentally different ways.

## Bagging (Bootstrap Aggregating)

### How It Works

1. **Data Sampling**:
   - From the original dataset, multiple subsets are created through bootstrapping. Bootstrapping involves random sampling with replacement, meaning some instances may appear multiple times in a subset while others may not appear at all.
   
2. **Model Training**:
   - A separate model (often the same type of model) is trained on each of these bootstrapped subsets.

3. **Aggregation**:
   - For regression tasks, the predictions from all models are averaged.
   - For classification tasks, a majority vote is taken from the predictions of all models.
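
The three steps above can be sketched in plain Python for a regression task. This is a minimal illustration rather than a reference implementation: the base learner `fit_stump` (a one-split regression stump), the helper names, the toy dataset, and the choice of 25 models are all assumptions made for the example.

```python
import random
from statistics import mean

def fit_stump(points):
    """Fit a one-split regression stump to (x, y) pairs (hypothetical base learner)."""
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two distinct x values
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # every x in the sample is identical: predict a constant
        m = mean(y for _, y in points)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagging_fit(points, n_models, rng):
    """Steps 1 and 2: draw a bootstrap sample, then train one model on it."""
    models = []
    for _ in range(n_models):
        sample = rng.choices(points, k=len(points))  # sampling with replacement
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Step 3 (regression): average the predictions of all models."""
    return mean(m(x) for m in models)

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
# Toy data: y = x^2 plus Gaussian noise, for x in [0, 3.9]
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.1)) for x in range(40)]

models = bagging_fit(data, n_models=25, rng=rng)
print(bagging_predict(models, 0.5), bagging_predict(models, 3.5))
```

A single stump is a coarse two-level step function. Averaging stumps fitted to different bootstrap samples places the steps at slightly different thresholds, which tends to give a smoother prediction.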

### Mathematical Background

Let $D$ denote the original dataset with $n$ samples. Each bootstrap sample $D_i$, for $i = 1, \ldots, B$, is obtained by sampling $n$ instances from $D$ with replacement.
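
As a rough illustration of what sampling with replacement does, the sketch below draws one bootstrap sample and measures how many distinct instances it contains. The helper name `bootstrap_sample`, the stand-in dataset of 1000 integers, and the seed are assumptions made for the example. The probability that a given instance appears at least once is $1 - (1 - 1/n)^n$, which approaches $1 - 1/e \approx 0.632$ as $n$ grows.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)   # fixed seed, chosen only for reproducibility
D = list(range(1000))    # stand-in dataset with n = 1000 distinct instances
D_i = bootstrap_sample(D, rng)

# Fraction of the original instances that appear at least once in D_i
unique_fraction = len(set(D_i)) / len(D)
# Theoretical expectation of that fraction for a dataset of this size
expected_fraction = 1 - (1 - 1 / len(D)) ** len(D)

print(f"observed: {unique_fraction:.3f}, expected: {expected_fraction:.3f}")
```

The instances left out of a given sample are often called out-of-bag instances for that model.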

- **Training**:
  Each model $M_i$ is trained on a subset $D_i$.

- **Prediction**:
  For regression:
  $$
  \hat{y} = \frac{1}{B} \sum_{i=1}^B M_i(x)
  $$
  For classification:
  $$
  \hat{y} = \text{mode}(M_1(x), M_2(x), \ldots, M_B(x))
  $$
  where $B$ is the number of models.
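
The two aggregation rules can be written as small helper functions. This is a sketch using only the standard library; the function names and the sample predictions are hypothetical, and the tie-breaking behavior of the vote (first label encountered wins) is a choice made here, not something the formulas prescribe.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(predictions):
    """Average the B model outputs M_1(x), ..., M_B(x)."""
    return mean(predictions)

def aggregate_classification(predictions):
    """Return the most frequent predicted label (the mode)."""
    # most_common(1) gives [(label, count)] for the label with the highest count;
    # among tied labels, the one seen first is returned.
    return Counter(predictions).most_common(1)[0][0]

print(aggregate_regression([2.0, 3.0, 4.0]))            # 3.0
print(aggregate_classification(["cat", "dog", "cat"]))  # cat
```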

### Illustrations

<center><img src="fig/Bagging.png"/></center>

### Advantages and Disadvantages

- **Advantages**:
  - Reduces variance, which helps to prevent overfitting when the base models are unstable.
  - Can handle high-dimensional data well.
  - Easy to parallelize as each model is independent.

- **Disadvantages**:
  - Does little to reduce bias: if the individual models underfit, the ensemble will too.
  - Requires more computational resources, since multiple models must be trained and stored.
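
The variance claim can be checked with a small simulation. The sketch below is an idealized assumption, not a model of real bagging: it treats each model's prediction as an independent draw with variance 1, in which case averaging $B$ of them gives variance $1/B$. Bagged models are trained on overlapping data and are therefore correlated, so the reduction seen in practice is smaller.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
B = 25                  # number of models being averaged (assumed value)
trials = 2000           # number of simulated predictions

# One noisy prediction per trial: standard normal noise around a true value of 0
single = [rng.gauss(0, 1) for _ in range(trials)]

# Average of B independent noisy predictions per trial
averaged = [mean(rng.gauss(0, 1) for _ in range(B)) for _ in range(trials)]

var_single = pvariance(single)
var_avg = pvariance(averaged)
print(f"single model: {var_single:.3f}, average of {B}: {var_avg:.3f}")
```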

## Boosting

### How It Works

1. **Sequential Training**:
   - Models are trained sequentially, each trying to correct the errors of the previous one.

2. **Weight Adjustment**:
   - Instances that are incorrectly predicted by a model are given higher weights so that the next model focuses more on these difficult cases.

3. **Final Prediction**:
   - The final prediction is a weighted sum (for regression) or a weighted vote (for classification) of all individual model predictions, with more accurate models given more say.
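
The three steps above can be sketched end to end with AdaBoost, one well-known boosting algorithm, using one-dimensional decision stumps as weak learners. Everything named here is an assumption for illustration: the helper functions, the toy dataset, the clamping of the error, and the choice of three rounds. Labels are in $\{-1, +1\}$, and instance weights are multiplied by $\exp(-\alpha\, y\, M(x))$ and then renormalized.

```python
import math

def stump_predict(t, polarity, x):
    """Predict `polarity` if x > t, otherwise `-polarity`."""
    return polarity if x > t else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(t, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    weights = [1 / n] * n                       # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        # 1. Sequential training: fit a weak learner to the weighted data
        eps, t, polarity = fit_stump(xs, ys, weights)
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, t, polarity))
        # 2. Weight adjustment: raise weights of mistakes, lower the rest, renormalize
        weights = [w * math.exp(-alpha * y * stump_predict(t, polarity, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    # 3. Final prediction: sign of the alpha-weighted vote (a score of 0 is mapped to +1)
    score = sum(alpha * stump_predict(t, polarity, x) for alpha, t, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate: + + + - - - + + + +
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]

ensemble = adaboost_fit(xs, ys, n_rounds=3)
train_accuracy = sum(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
print(train_accuracy)
```

No single stump classifies this dataset correctly, since each one makes at least three mistakes, but the weighted vote of three stumps fits all ten training points.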

### Mathematical Background

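The formulas below follow AdaBoost for binary classification, with labels $y_j \in \{-1, +1\}$ and model outputs $M_i(x) \in \{-1, +1\}$. Each of the $n$ instances starts with weight $w_j = \frac{1}{n}$.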

- **Training**:
  Each model $M_i$ is trained on the weighted dataset.

- **Error Calculation**:
  The error of model $M_i$ is calculated as:
  $$
  \epsilon_i = \sum_{j=1}^n w_j I(y_j \neq M_i(x_j))
  $$
  where $w_j$ is the weight of instance $j$ (with the weights summing to 1), $y_j$ is the true label, and $I$ is the indicator function.

- **Model Weight**:
  The weight of model $M_i$, which is positive when $\epsilon_i < 0.5$ and grows as the error shrinks, is:
  $$
  \alpha_i = \frac{1}{2} \ln\left(\frac{1 - \epsilon_i}{\epsilon_i}\right)
  $$

- **Weight Update**:
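  With labels and predictions in $\{-1, +1\}$, the weights of instances are updated as:
  $$
  w_j \leftarrow w_j \exp(-\alpha_i y_j M_i(x_j))
  $$
  so misclassified instances are scaled up by $e^{\alpha_i}$ and correctly classified ones are scaled down by $e^{-\alpha_i}$.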

- **Normalization**:
  The weights are then normalized so that they sum to 1: $w_j \leftarrow w_j / \sum_{k=1}^n w_k$.

- **Final Prediction**:
  For classification:
  $$
  \hat{y} = \text{sign}\left(\sum_{i=1}^B \alpha_i M_i(x)\right)
  $$
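
A single round of these formulas can be worked through numerically. The sketch below assumes five instances with uniform starting weights and a hypothetical model that misclassifies exactly one of them. The weight update is written in the common AdaBoost form $w_j \exp(-\alpha_i y_j M_i(x_j))$ for labels in $\{-1, +1\}$.

```python
import math

y_true = [1, 1, -1, -1, 1]    # true labels in {-1, +1}
y_pred = [1, -1, -1, -1, 1]   # hypothetical predictions; only the second one is wrong
n = len(y_true)
weights = [1 / n] * n         # initial weights, 1/n each

# Error calculation: total weight of the misclassified instances
eps = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)

# Model weight
alpha = 0.5 * math.log((1 - eps) / eps)

# Weight update: y * p is -1 for a mistake and +1 for a correct prediction
updated = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]

# Normalization so the weights sum to 1 again
total = sum(updated)
new_weights = [w / total for w in updated]

print(f"eps = {eps:.3f}, alpha = {alpha:.3f}")
print([round(w, 3) for w in new_weights])
```

Here $\epsilon = 0.2$ and $\alpha = \frac{1}{2}\ln 4 \approx 0.693$. After normalization the misclassified instance carries weight 0.5 and each of the other four carries 0.125, so the next model sees the previous model's mistakes and its correct predictions as equally heavy.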

### Illustrations

<center><img src="fig/Boosting.png"/></center>

### Advantages and Disadvantages

- **Advantages**:
  - Primarily reduces bias, and often reduces variance as well.
  - Can achieve high accuracy with relatively simple models.
  - Focuses on difficult instances, improving model performance on challenging data.

- **Disadvantages**:
  - Can be sensitive to noisy data and outliers, because mislabeled or atypical instances keep receiving higher weights.
  - More difficult to parallelize due to sequential nature.
  - Requires careful tuning of hyperparameters to prevent overfitting.

## When to Use Bagging and Boosting

- **Bagging**:
  - When the model has high variance (e.g., deep, unpruned decision trees).
  - When computational resources are available for parallel processing.
  - When simplicity and ease of implementation are preferred.

- **Boosting**:
  - When the model has high bias (e.g., simple models like decision stumps).
  - When predictive accuracy matters more than training time or ease of tuning.
  - When dealing with imbalanced datasets, since reweighting can shift focus toward frequently misclassified minority-class instances, although this is not guaranteed.

## Summary

- **Bagging**: Reduces variance, works well with high-variance models, involves parallel model training and averaging or voting.
- **Boosting**: Primarily reduces bias (and often variance), works well with weak learners, involves sequential model training with weight adjustment and weighted predictions.

Both techniques are powerful tools in the ensemble learning arsenal, each suited to different types of modeling challenges and datasets.
