# 🎯 Bagging vs Boosting

## 📊 Quick Comparison

| Aspect | Bagging | Boosting |
|--------|---------|----------|
| **Training** | Parallel | Sequential |
| **Sampling** | Bootstrap with replacement | Reweighted examples (AdaBoost) or fitting residuals (Gradient Boosting) |
| **Focus** | Reduce variance | Reduce bias |
| **Combining** | Simple averaging/voting | Weighted combination |
| **Examples** | Random Forest, Extra Trees | AdaBoost, Gradient Boosting, XGBoost |

---

## 🎒 Bagging (Bootstrap Aggregating)

### Core Concept
- Train multiple models **independently**, each on a different bootstrap sample of the training data
- Combine predictions through **averaging** (regression) or **voting** (classification)

### Key Characteristics
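- ✅ **Reduces variance** → Less overfitting
- ✅ **Parallelizable** → Models can be trained at the same time, so training is faster
- ✅ **Robust** → Less sensitive to outliers, since each model sees a different resample
- ❌ Does little to reduce bias: if the base model underfits, the ensemble will too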
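
One way to see why averaging helps, under the simplifying assumption (not part of the notes above) that the $B$ base models are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$:

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

The second term shrinks as $B$ grows but the first does not, which is why methods such as Random Forest also try to decorrelate the trees.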

### Process
```python
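# Pseudo-code: the helper functions are placeholders, not a real API
models = []
for i in range(n_estimators):
    bootstrap_sample = random_sample_with_replacement(data)
    models.append(train_model(bootstrap_sample))

# Average for regression; use a majority vote for classification
final_prediction = average(model.predict(x) for model in models)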
```
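
The process above can be made concrete with a minimal, standard-library-only sketch. Everything here is an illustrative assumption rather than a reference implementation: the toy data, the choice of a 1-nearest-neighbour regressor as a deliberately high-variance base model, and the helper names `fit_1nn`, `bagging_fit`, and `bagging_predict`.

```python
import random
from statistics import mean


def fit_1nn(sample):
    """Return a 1-nearest-neighbour regressor (a high-variance base model)."""
    def predict(x):
        # Predict the target of the closest training point
        nearest = min(sample, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict


def bagging_fit(data, n_estimators, rng):
    """Train n_estimators base models, each on its own bootstrap sample."""
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(data) points with replacement
        bootstrap_sample = rng.choices(data, k=len(data))
        models.append(fit_1nn(bootstrap_sample))
    return models


def bagging_predict(models, x):
    """Regression: average the individual predictions with equal weight."""
    return mean(model(x) for model in models)


rng = random.Random(0)
# Toy data (hypothetical): y = 2x plus Gaussian noise
data = [(i / 10, 2 * (i / 10) + rng.gauss(0, 0.5)) for i in range(50)]

models = bagging_fit(data, n_estimators=25, rng=rng)
prediction = bagging_predict(models, 2.5)
print(f"bagged prediction at x=2.5: {prediction:.3f}")
```

Because each model is built independently of the others, the loop in `bagging_fit` could be distributed across workers without changing the result's meaning; for classification, `mean` would be replaced by a majority vote.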

---

## 🚀 Boosting

### Core Concept
- Train models **sequentially**, each correcting previous errors
- Focus on **hard-to-predict** examples with higher weights

### Key Characteristics
- ✅ **Reduces bias** → Better accuracy
- ✅ **Adaptive** → Learns from mistakes
- ❌ **Sequential** → Each model depends on the previous one, so training cannot be parallelized across models
- ❌ **Prone to overfitting** with noisy data

### Process
```python
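# Pseudo-code (AdaBoost-style): the helper functions are placeholders, not a real API
weights = initialize_uniform_weights(data)
models, alphas = [], []
for i in range(n_estimators):
    model_i = train_model(data, weights)
    errors = calculate_errors(model_i, data)
    alphas.append(model_weight(errors, weights))  # Lower weighted error -> larger say
    weights = update_weights(weights, errors)  # Increase weights for misclassified
    models.append(model_i)

final_prediction = weighted_combination(models, alphas)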
```
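
As a hedged illustration of the loop above, here is a small AdaBoost-style sketch using only the standard library. The one-dimensional toy dataset, the decision-stump base learner, and the helper names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`) are assumptions made for this example; labels are assumed to be in {-1, +1}.

```python
import math


def stump_predict(threshold, polarity, x):
    """Decision stump: predict `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in xs:
        for polarity in (1, -1):
            err = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(threshold, polarity, x) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_estimators):
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(n_estimators):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # model weight: lower error -> larger alpha
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples (y * prediction = -1) are scaled up, correct ones down
        weights = [
            w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to sum to 1
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single stump can separate these labels
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, n_estimators=3)
train_accuracy = sum(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
print([round(alpha, 3) for alpha, _, _ in ensemble], train_accuracy)
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted vote of three stumps fits all of them. Each stump also receives a different `alpha`, in contrast to the equal weighting used in bagging.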

---

## 🎤 Interview Answer

**"Bagging and Boosting are both ensemble methods but work differently:**

**Bagging** trains multiple models independently on bootstrap samples and averages their predictions (or takes a majority vote for classification). It reduces variance, which helps curb overfitting. Random Forest is a popular example.

**Boosting** trains models sequentially, where each model learns from the previous model's mistakes by focusing on misclassified examples. It reduces bias and improves accuracy. XGBoost and AdaBoost are common examples.

**Key difference**: Bagging reduces variance by averaging independently trained models, while Boosting reduces bias through sequential error correction."

---

## 🔧 When to Use

**Choose Bagging when:**
- The base model has high variance (like deep Decision Trees)
- You need to parallelize training
- The data is noisy or contains outliers

**Choose Boosting when:**
- The base model has high bias (like shallow trees)
- The data is relatively clean
- You want to squeeze out maximum accuracy and can afford careful tuning

![image.png](attachment:image.png)

In bagging, all base models carry equal weight in the final prediction. In boosting, each model gets its own weight based on its performance: in AdaBoost, for example, models with a lower weighted error get a larger say.