Ensemble learning is a family of methods that train multiple models and combine their outputs, aiming for a stronger predictor than any single constituent model. Depending on the method, the main gain is reduced variance, reduced bias, or both.

### Core ideas: bagging vs boosting

- **Bagging (Bootstrap Aggregating)**:  
  - Train many copies of the same base model (often decision trees) on different bootstrap samples of the training data (sampling with replacement).  
  - Combine predictions by averaging (regression) or majority vote (classification).  
  - Main effect: reduces **variance** and overfitting by smoothing out unstable models like deep trees (e.g., Random Forest).

- **Boosting**:  
  - Train models **sequentially**; each new model focuses on the mistakes of the previous ones (by reweighting examples or fitting to residuals).  
  - Combine predictions with a **weighted** sum or vote.  
  - Main effect: reduces **bias** and can achieve very high accuracy with shallow trees (weak learners) but is more sensitive to noise and hyperparameters.
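To make the contrast concrete, here is a minimal from-scratch sketch using one-split regression stumps on 1-D data. Everything in it is illustrative rather than taken from a library: the helper names (`fit_stump`, `bagging`, `boosting`), the synthetic sine data, and the learning rate of 0.3 are all assumptions. The boosting variant shown is the residual-fitting form with squared error, not example reweighting.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump (threshold, left mean, right mean) by least squares."""
    best_sse, best_stump = None, None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_stump = sse, (t, lm, rm)
    if best_stump is None:  # all x values identical: fall back to a constant
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best_stump


def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def bagging(xs, ys, n_models, rng):
    """Train independent stumps, each on its own bootstrap sample."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    return sum(predict_stump(m, x) for m in models) / len(models)  # plain average


def boosting(xs, ys, n_models, learning_rate):
    """Train stumps sequentially, each one on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    models = []
    for _ in range(n_models):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        models.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, models


def predict_boosting(base, models, learning_rate, x):
    return base + learning_rate * sum(predict_stump(m, x) for m in models)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(80)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]  # synthetic data, assumed for the demo

single = fit_stump(xs, ys)
bagged = bagging(xs, ys, n_models=25, rng=rng)
base, boosted = boosting(xs, ys, n_models=50, learning_rate=0.3)

mse_single = mse(lambda x: predict_stump(single, x), xs, ys)
mse_bagged = mse(lambda x: predict_bagging(bagged, x), xs, ys)
mse_boosted = mse(lambda x: predict_boosting(base, boosted, 0.3, x), xs, ys)
print(f"training MSE  single={mse_single:.3f}  bagged={mse_bagged:.3f}  boosted={mse_boosted:.3f}")
```

The structural difference is visible in the two training loops: the bagged stumps never see each other's output and could be trained in parallel, while each boosted stump depends on the predictions accumulated so far. The printed numbers are training errors only, so they say nothing about generalization.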

### Main ensemble families

1. **Bagging-type methods**
   - Random Forest  
     - Many decision trees trained on bootstrap samples.  
     - At each split, only a random subset of features is considered, which decorrelates trees and improves generalization.  
     - Good default ensemble for tabular data; relatively robust, parallelizable, and easy to tune (few main hyperparameters like number of trees, max depth, max features).
   - Bagging meta-estimator  
     - General wrapper that can bag **any** base estimator (not just trees) by training it on bootstrap samples and then averaging the predictions (regression) or voting over them (classification).

2. **Boosting-type methods**
   - AdaBoost  
     - Sequentially trains weak learners (often decision stumps) and increases the weight of misclassified examples.  
     - Final prediction is a weighted vote of all learners.  
     - Works well on simpler, cleaner datasets; can be sensitive to outliers and label noise, because repeatedly misclassified examples keep gaining weight.
   - Gradient Boosting Machine (GBM)  
     - Views boosting as **gradient descent** in function space: each new tree fits the negative gradient of a chosen loss (e.g., squared error, logistic loss).  
     - Very flexible, but classical implementations can be slower to train and require careful tuning (learning rate, number of trees, depth).
   - XGBoost (XGB)  
     - Optimized, regularized gradient boosting framework with features like tree pruning, built-in handling of missing values, efficient handling of sparse data, and good parallelization.  
     - Strong baseline for tabular problems; heavily used in Kaggle competitions.
   - LightGBM  
     - Gradient boosting with a different tree-growing strategy (leaf-wise with depth constraints) and histogram-based splits.  
     - Very fast on large datasets with many features; can split on integer-coded categorical features directly, without one-hot encoding.  
   - CatBoost  
     - Gradient boosting with native, sophisticated handling of **categorical** features (ordered target statistics, permutations) and strong default settings.  
     - Often performs well with minimal feature engineering on mixed/tabular data.

3. **Other ensemble concepts (good to keep in mind)**
   - Voting and averaging: combine independently trained models (possibly of different types) by simple averaging or majority vote.  
   - Stacking: train diverse base models and then fit a **meta-model** on their predictions (typically out-of-fold predictions, to avoid leakage) so that it learns how best to combine them.
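The families above that ship with scikit-learn can be set up in a few lines each. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic dataset, the hyperparameter values, and the dictionary keys are illustrative choices, not recommendations. XGBoost, LightGBM, and CatBoost live in separate packages and are left out here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, assumed only for the demonstration.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    # Bagging-type: bootstrap samples plus a random feature subset at each split.
    "random_forest": RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0),
    # Bagging meta-estimator wrapped around a non-tree base learner.
    "bagged_knn": BaggingClassifier(KNeighborsClassifier(), n_estimators=20, random_state=0),
    # Boosting-type: the default AdaBoost base learner is a depth-1 tree (a stump).
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "gbm": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

# Heterogeneous base models shared by the voting and stacking ensembles
# (both ensembles fit their own clones of these estimators).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbm", GradientBoostingClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
]
# Soft voting averages the predicted class probabilities.
models["soft_voting"] = VotingClassifier(estimators=base_learners, voting="soft")
# Stacking fits the meta-model on cross-validated (out-of-fold) base predictions.
models["stacking"] = StackingClassifier(
    estimators=base_learners, final_estimator=LogisticRegression(), cv=5
)

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name:>14}: test accuracy = {scores[name]:.3f}")
```

The accuracies from a single split on synthetic data are only a smoke test; which family wins depends on the dataset and on tuning.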

### When to use which (intuition)

- Start with **Random Forest** when you want a strong, robust baseline on tabular data with minimal tuning.  
- Use **gradient boosting** (XGBoost, LightGBM, CatBoost) when you need top performance and can spend time tuning; especially effective on structured/tabular data.  
- Use **bagging meta-estimator** when you have an unstable base learner (like a high-variance model) and want to stabilize it.  
- Consider **AdaBoost** for simpler problems or when you want a classical boosting method that is easy to reason about (with decision stumps, each learner is a single-feature rule).  
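In practice, these rules of thumb are best checked on your own data. One way to do that is a small cross-validated comparison like the sketch below, which assumes scikit-learn is installed. The dataset is synthetic with some label noise added, and the candidate names and settings are illustrative assumptions; XGBoost, LightGBM, or CatBoost models could be added to the same dictionary if those packages are available.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 5% flipped labels, assumed only for the demonstration.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=6, flip_y=0.05, random_state=42
)

candidates = {
    # An unpruned tree is the classic unstable, high-variance learner.
    "single deep tree": DecisionTreeClassifier(random_state=42),
    # The same learner stabilized by the bagging meta-estimator.
    "bagged deep trees": BaggingClassifier(
        DecisionTreeClassifier(random_state=42), n_estimators=50, random_state=42
    ),
    # Robust baseline with near-default settings.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Boosting with shallow trees; learning rate and tree count usually need tuning.
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=200, learning_rate=0.05, max_depth=3, random_state=42
    ),
}

results = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name:>18}: {results[name][0]:.3f} +/- {results[name][1]:.3f}")
```

Reporting the spread across folds alongside the mean helps distinguish a real difference between methods from fold-to-fold noise.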