### What is Boosting?

Boosting is a **machine learning technique** that improves prediction accuracy by combining many **weak learners** (simple models) into one **strong learner** (a powerful, accurate model).

A **weak learner** is a simple model that performs only slightly better than random guessing, such as a very shallow decision tree. On their own, these models aren’t great, but when they are combined so that each new one corrects the mistakes of the ones before it, the ensemble can achieve very high accuracy.
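
To make this concrete, here is a minimal sketch of a decision stump (a decision tree with a single split) in plain Python. It assumes labels in {-1, +1} and one weight per sample, since boosting needs weak learners that respect sample weights. The names `Stump` and `fit_stump` are hypothetical helpers, not part of any library. The fitting routine simply tries every feature, threshold, and direction and keeps the split with the lowest weighted error.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Hypothetical weak learner: a decision tree with a single split."""
    feature: int      # index of the feature to test
    threshold: float  # split point
    polarity: int     # +1: predict +1 above the threshold; -1: predict +1 at or below it

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def fit_stump(X, y, weights):
    """Return (best_stump, weighted_error) over all features, thresholds, and polarities.

    X is a list of feature vectors, y holds labels in {-1, +1}, weights are non-negative.
    """
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate splits: one below every value, plus midpoints between neighbors
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                stump = Stump(f, t, polarity)
                # Weighted error: total weight of the samples this stump gets wrong
                err = sum(w for xi, yi, w in zip(X, y, weights) if stump.predict(xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


# Usage: an alternating pattern that no single split can fit perfectly
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, 1, -1, 1]
stump, err = fit_stump(X, y, [0.25] * 4)
print(stump, err)  # Stump(feature=0, threshold=1.5, polarity=1) 0.25
```

The best stump still misclassifies one of the four points: better than chance, but far from perfect, which is what "weak" means here.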

The key idea of boosting is **sequential learning**:
- Models are trained one after another.
- Each new model **focuses on the examples** that the previous models got wrong.
- Over many rounds, the ensemble “boosts” its performance by gradually fixing its weaknesses.

Boosting is often used to reduce **bias**, meaning it helps models that underfit (too simple) learn more complex patterns.



### How Boosting Differs from Bagging

It helps to compare boosting with another ensemble method called **bagging**:
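- **Bagging** trains models **independently** (so they can be built in parallel), each on a random bootstrap sample of the data, and combines them by averaging or voting. This mainly reduces **variance**, which helps with models that overfit.
- **Boosting** builds models **sequentially** (one after another), with each model depending on the errors of the previous ones. This mainly reduces **bias**, which helps when the model is too simple.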

So, if you think of bagging as having many independent opinions averaged together, boosting is more like a conversation where each new speaker learns from the previous one’s mistakes.



### The AdaBoost Algorithm (Adaptive Boosting)

**AdaBoost** is one of the earliest and most popular boosting methods. It’s mainly used for **classification** tasks.

Here’s how it works step by step:

1. **Start with equal weights:**
   Every training example starts with the same weight. These weights tell the algorithm how much attention to pay to each sample.

2. **Train a weak model:**
   A very simple model, usually a decision tree with a single split (called a “stump”), is trained on the weighted data to classify the examples.

3. **Evaluate errors:**
   The algorithm looks at which samples were incorrectly classified. It calculates the total *error rate*, which is the sum of the weights of the misclassified examples.  
   This value is often denoted as $ \varepsilon_s $.

4. **Compute model influence:**
   Each weak model gets a score called its **influence**, denoted $ \alpha_s $, based on how accurate it was:
   $$
   \alpha_s = \frac{1}{2} \log \left( \frac{1 - \varepsilon_s}{\varepsilon_s} \right)
   $$
   - If the model made fewer mistakes (small $ \varepsilon_s $), its influence $ \alpha_s $ will be high.
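   - If the model performed poorly ($ \varepsilon_s $ close to 0.5, i.e., barely better than random guessing), $ \alpha_s $ will be close to 0.
   - For example, with the natural logarithm, $ \varepsilon_s = 0.1 $ gives $ \alpha_s = \tfrac{1}{2}\log 9 \approx 1.10 $, while $ \varepsilon_s = 0.4 $ gives $ \alpha_s = \tfrac{1}{2}\log 1.5 \approx 0.20 $.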

5. **Update sample weights:**
   The weights of the misclassified samples are **increased** — the algorithm will pay more attention to them next time.
   The weights of correctly classified samples are **decreased**.  
   Then all weights are **normalized** so they still sum to 1.

6. **Repeat the process:**
   The next weak learner is trained using these new weights. It tries harder on the previously misclassified samples.  
   This loop continues for many rounds.

7. **Final prediction:**
   When making predictions, AdaBoost takes a **weighted vote** of all weak models’ outputs — models with higher influence ($ \alpha_s $) get more say.
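
The seven steps can be sketched end to end in plain Python. This is a minimal, unoptimized illustration of binary AdaBoost, assuming labels in {-1, +1} and one-split stumps as the weak learners; the helper names (`fit_stump`, `stump_predict`, `adaboost`, `predict`) are hypothetical. Writing $ h_s(x) $ for the prediction of weak model $ s $, step 5 uses the common multiplicative update $ w_i \leftarrow w_i \exp(-\alpha_s y_i h_s(x_i)) $, which multiplies a misclassified sample's weight by $ e^{\alpha_s} $ and a correctly classified one's by $ e^{-\alpha_s} $, and step 7 returns $ \operatorname{sign}\left(\sum_s \alpha_s h_s(x)\right) $.

```python
import math


def stump_predict(stump, x):
    """A stump is (feature, threshold, polarity); it votes +1 or -1."""
    f, t, pol = stump
    return pol if x[f] > t else -pol


def fit_stump(X, y, w):
    """Step 2: exhaustively pick the stump with the lowest weighted error (step 3)."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (f, t, pol), err
    return best, best_err


def adaboost(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n                  # Step 1: equal weights that sum to 1
    ensemble = []                      # (alpha_s, stump_s) pairs
    for _ in range(n_rounds):          # Step 6: repeat for a fixed number of rounds
        stump, eps = fit_stump(X, y, w)
        if eps >= 0.5:                 # no better than chance: stop adding models
            break
        eps = max(eps, 1e-10)          # avoid dividing by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)   # Step 4: influence
        ensemble.append((alpha, stump))
        if eps <= 1e-10:               # training data already fit perfectly
            break
        # Step 5: raise weights of mistakes (y*h = -1), lower the rest (y*h = +1)
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]   # renormalize so the weights sum to 1
    return ensemble


def predict(ensemble, x):
    """Step 7: sign of the influence-weighted vote (ties go to +1)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Usage: positives sit in the middle, which no single split can separate
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [-1, -1, 1, 1, -1, -1]
one = adaboost(X, y, n_rounds=1)
three = adaboost(X, y, n_rounds=3)
print([predict(one, x) for x in X])    # [-1, -1, -1, -1, -1, -1]
print([predict(three, x) for x in X])  # [-1, -1, 1, 1, -1, -1]
```

With one round the ensemble is just a constant "always -1" stump, but after three rounds the weighted vote of three stumps carves out the middle interval, a small illustration of how boosting reduces bias. Library implementations such as scikit-learn's `AdaBoostClassifier` add many refinements; this sketch only mirrors the steps above.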



### Behavior and Stopping

As AdaBoost continues training:
- It gradually builds more complex boundaries between classes.
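- The **influence coefficients ($ \alpha_s $)** tend to get smaller over time: as the weights concentrate on the hardest examples, each new weak model's weighted error $ \varepsilon_s $ tends to move toward 0.5, so its $ \alpha_s $ moves toward 0 and it adds a smaller improvement.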
- Eventually, adding more models doesn’t change the predictions much. This signals it’s a good time to stop training.
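
One simple way to act on this is to watch the staged (round-by-round) predictions and stop once they have not changed for a few rounds. The sketch below assumes you already have each round's influence $ \alpha_s $ and each weak model's ±1 votes on some held-out samples; the function names and the `patience` parameter are hypothetical, and the votes in the usage example are made up for illustration. In practice it is also common to monitor held-out error directly.

```python
def staged_predictions(alphas, weak_outputs):
    """Yield the ensemble's predictions after each round.

    alphas[s] is the influence of weak model s; weak_outputs[s][i] is its +/-1 vote on sample i.
    """
    scores = [0.0] * len(weak_outputs[0])
    for alpha, votes in zip(alphas, weak_outputs):
        scores = [sc + alpha * v for sc, v in zip(scores, votes)]
        yield [1 if sc >= 0 else -1 for sc in scores]


def stopping_round(alphas, weak_outputs, patience=3):
    """Return how many rounds to keep: the first round after which predictions
    stayed the same for `patience` further rounds, or all rounds if they never settle."""
    prev, unchanged = None, 0
    for s, preds in enumerate(staged_predictions(alphas, weak_outputs), start=1):
        unchanged = unchanged + 1 if preds == prev else 0
        if unchanged >= patience:
            return s - patience
        prev = preds
    return len(alphas)


# Usage with made-up votes on three samples; the influences shrink over the rounds
alphas = [0.9, 0.6, 0.4, 0.2, 0.1, 0.05]
weak_outputs = [
    [1, -1, 1],
    [-1, 1, 1],
    [-1, 1, 1],
    [-1, 1, 1],
    [1, -1, -1],
    [1, 1, 1],
]
print(stopping_round(alphas, weak_outputs))  # 3
```

Here the predictions flip once at round 3 and then stay fixed, so the rule keeps the first three models. This is only a heuristic: a small `patience` risks stopping on a temporary plateau.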

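Boosting tends to improve **gradually**:
- In practice it often overfits slowly, although it can overfit noisy data: mislabeled examples keep getting misclassified, so their weights keep growing.
- Because the ensemble grows one model at a time, you can stop after any round (for example, once accuracy on held-out data stops improving) before it becomes overly complex.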

These properties help explain why AdaBoost and later boosting methods such as **Gradient Boosting** (and popular implementations of it like **XGBoost**) are widely used and often perform strongly on many types of data.