- Bias
    - Bias is the error introduced by approximating a real-world problem with an overly simple model.
    - A high-bias model pays little attention to the training data and oversimplifies the problem. It fails to capture the underlying pattern, which leads to underfitting.
- Variance
    - Variance is the error introduced by a model's sensitivity to small fluctuations in the training data. A high-variance model is strongly affected by noise and outliers, which leads to overfitting.
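
One common way to make these two terms precise is the bias-variance decomposition of the expected squared error. The notation is an assumption introduced here, not part of the notes above: the data is taken to follow $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Under these assumptions, underfitting corresponds to a large first term and overfitting to a large second term.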

# Bagging vs Boosting: A Comprehensive Comparison

## Introduction
Bagging and Boosting are both powerful **ensemble learning** techniques used to enhance the performance of machine learning models. While both combine multiple base models to create a stronger learner, they do so in very different ways. Below, we explore the key differences, advantages, and use cases for **Bagging** and **Boosting**.

---

## Key Concepts

### **1. Bagging (Bootstrap Aggregating)**
- **Goal**: Reduce variance and thereby limit overfitting.
- **Method**: 
  - **Parallel training** of multiple models on **random subsets** of the data.
  - Each model is trained on a **bootstrap sample** (randomly drawn subset with replacement) of the training data.
  - After training, predictions from all models are combined by **averaging** (for regression) or **voting** (for classification).
- **Examples**: Random Forest, Bagged Decision Trees.
- **Impact**: Primarily reduces **variance** but does not directly address bias.

#### Key Characteristics:
- **Independence**: Models are trained independently, in parallel.
- **Final Prediction**: Averaging or voting.
- **Focus**: Reduces overfitting by smoothing predictions across multiple models.
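
The bagging procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the 1-nearest-neighbour base learner, the helper names (`bootstrap_sample`, `fit_1nn`, `bagging_fit`, `bagging_predict`) and the toy data are all assumptions chosen for brevity.

```python
import random
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def fit_1nn(rows):
    """Hypothetical high-variance base learner: 1-nearest-neighbour regression."""
    def predict(x):
        # Return the target of the training row whose x is closest.
        return min(rows, key=lambda r: abs(r[0] - x))[1]
    return predict


def bagging_fit(rows, n_models=25, seed=0):
    """Train each model on its own bootstrap sample, independently of the others."""
    rng = random.Random(seed)
    return [fit_1nn(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Regression: average the individual predictions."""
    return mean(m(x) for m in models)


# Toy (x, y) pairs, made up for illustration.
data = [(1, 1.2), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 6.3)]
models = bagging_fit(data)
prediction = bagging_predict(models, 3.5)
```

Because no model depends on another, the loop in `bagging_fit` could be run in parallel. For classification, the averaging step would be replaced by a majority vote.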

---

### **2. Boosting**
- **Goal**: Reduce both bias and variance.
- **Method**: 
  - **Sequential training** of models, where each subsequent model corrects the errors made by the previous one.
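  - Each new model focuses on what the previous models got wrong, either by giving more weight to the **misclassified instances** (AdaBoost) or by fitting the remaining residuals (gradient boosting).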
  - The final model is a weighted combination of all the models in the ensemble.
- **Examples**: AdaBoost, Gradient Boosting (XGBoost, LightGBM, CatBoost).
- **Impact**: Reduces both **bias** and **variance**, creating a strong model by correcting weaknesses iteratively.

#### Key Characteristics:
- **Sequential Learning**: Models are trained one after the other, each correcting errors.
- **Final Prediction**: Weighted sum (for regression) or weighted vote (for classification).
- **Focus**: Corrects errors made by prior models, reduces bias, and can increase accuracy.

---

## **Comparison Table**

| **Aspect**               | **Bagging**                                    | **Boosting**                                      |
|--------------------------|------------------------------------------------|--------------------------------------------------|
| **Training Process**      | Parallel (independent models)                  | Sequential (models build on each other)          |
| **Focus**                 | Reduces variance by averaging/voting           | Reduces both bias and variance by correcting errors|
| **Model Weighting**       | Equal weights for all models                   | Models are weighted based on performance         |
| **Combination of Models** | Averaging (regression) or Voting (classification) | Weighted sum (regression) or weighted vote (classification) |
| **Examples**              | Random Forest                                  | AdaBoost, Gradient Boosting (XGBoost, LightGBM)  |
| **Risk of Overfitting**   | Less prone (reduces variance)                  | More prone (especially with noisy data)          |
| **Computational Cost**    | Faster to train (independent models)           | Slower to train (sequential nature)              |
| **Parallelization**       | Easily parallelizable                          | Difficult to parallelize due to sequential nature|
| **Handling Noise**        | More robust to noise and outliers              | Can be sensitive to noisy data                   |

---

## **When to Use Each?**

### **Use Bagging if:**
- Your model is prone to overfitting due to high variance.
- You have noisy data or outliers that might negatively affect your model.
- You want a stable, less complex model (e.g., Random Forest).
- You want to combine models independently to reduce overfitting.

### **Use Boosting if:**
- You want to improve predictive accuracy and reduce both bias and variance.
- Your base learner is weak (e.g., shallow decision trees) and you want to correct errors.
- You are willing to take on additional computational complexity for potentially higher accuracy.
- You need to focus on difficult, misclassified instances in your dataset.

---

## **Pros and Cons**

### **Bagging**
- **Pros**:
  - Reduces variance, preventing overfitting.
  - Handles noisy data and outliers well.
  - Simple and intuitive approach.
- **Cons**:
  - Does not reduce bias (can still have high bias if base learner is weak).
  - May not outperform boosting on certain problems.

### **Boosting**
- **Pros**:
  - Often results in higher accuracy by reducing both bias and variance.
  - Can be used with weak learners, improving their performance significantly.
  - Works well for complex datasets.
- **Cons**:
  - Prone to overfitting, especially with noisy data.
  - Computationally expensive and harder to parallelize.
  - Sensitive to outliers, as misclassified instances are given more weight.

---

## **Summary**

- **Bagging** is a technique for reducing **variance** by training models independently on different subsets of the data and combining their predictions. It is most useful for models that are highly complex and prone to overfitting (e.g., decision trees).
- **Boosting**, on the other hand, reduces **bias** and **variance** by focusing sequentially on the mistakes of previous models. It is highly effective in improving the accuracy of weak learners but can be prone to overfitting and is computationally more expensive.

Choose **Bagging** when stability and reducing variance are your priorities, and opt for **Boosting** when improving accuracy by focusing on difficult instances and reducing both bias and variance is your goal.

---


#### Gradient Boosting
![grad_data](images/grad_data.png)

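- Start with a base model that predicts the average of y for every row.
- Suppose that average is 75k.
- Compute the residual for each row: actual value minus predicted value.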
![](images/grad2.png)


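- These differences are the first residuals R1, here -25, -5, ...
- Fit a decision tree on the original inputs, using R1 as the target.
- Use the tree's predictions to update the model, then compute the new residuals R2.
- For input 1, the updated prediction is 75 + lr × (the tree's predicted residual for that row), where lr is the learning rate.
- Add one more decision tree, this time with R2 as the target, and repeat.
- The final output F(x) is the base prediction plus lr × the sum of all tree predictions.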
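
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each tree is a depth-1 regression stump on a single feature, the loss is squared error, and the toy values in `xs` and `ys` are made up so that the average is 75 and the first residuals start with -25, -5. The helper names `fit_stump` and `gradient_boost` are hypothetical.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Depth-1 regression tree: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right leaf is never empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_trees=3, lr=0.1):
    base = mean(ys)                      # base model: the average of y
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]   # actual minus predicted
        tree = fit_stump(xs, residuals)                  # the tree's target is the residual
        trees.append(tree)
        preds = [p + lr * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        # F(x) = base + lr * (sum of all tree outputs)
        return base + lr * sum(tree(x) for tree in trees)

    return base, predict


# Made-up toy data: one feature and a target in thousands.
xs = [2, 3, 5, 6, 8, 10]
ys = [50, 70, 80, 100, 60, 90]

base, predict = gradient_boost(xs, ys)
mse_before = mean([(y - base) ** 2 for y in ys])
mse_after = mean([(y - predict(x)) ** 2 for x, y in zip(xs, ys)])
```

With a learning rate between 0 and 1, each added stump lowers the training error a little, which is why `mse_after` ends up below `mse_before` on this data.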

___

![ada](images/ada1.png)
![ada](images/ada2.png)

- Boosting
    - Base learners are created sequentially.
    - The records that one base learner misclassifies get more emphasis when the next base learner is trained, and so on.
    - This continues until the chosen number of base learners has been built.



### AdaBoost
- AdaBoost keeps a weight for every training row and updates these weights after each base learner.


#### Step 1
- Let f1, f2, f3 be the features, and suppose we have 7 rows.
- Assign every row the same initial sample weight: 1/n = 1/7.
#### Step 2
- The first base learner is a decision tree.
    - In AdaBoost these trees have depth 1 and are called stumps.
    - Create one stump for each feature and choose the one with the lowest entropy as the first base learner.
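
Choosing the stump with the lowest entropy can be sketched like this. The binary feature values and labels below are hypothetical (7 rows, features f1 to f3 as in the text), and `entropy` and `stump_entropy` are illustrative helper names.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result


def stump_entropy(feature, labels):
    """Weighted average entropy of the two leaves after splitting on a binary feature."""
    n = len(labels)
    score = 0.0
    for value in (0, 1):
        leaf = [y for f, y in zip(feature, labels) if f == value]
        score += len(leaf) / n * entropy(leaf)
    return score


# Hypothetical data: 7 rows, three binary features, binary labels.
features = {
    "f1": [1, 1, 0, 0, 1, 0, 1],
    "f2": [1, 1, 1, 0, 0, 0, 0],
    "f3": [0, 1, 0, 1, 0, 1, 0],
}
labels = [1, 1, 1, 0, 0, 0, 1]

# The feature whose stump leaves are purest becomes the first base learner.
best = min(features, key=lambda name: stump_entropy(features[name], labels))
```

On this made-up data the stump on `f2` has the purest leaves and gets exactly one of the 7 rows wrong.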
#### Step 3
- Suppose the stump classifies 6 rows correctly and 1 row incorrectly.
- Total error (TE) = sum of the sample weights of the misclassified rows = 1/7.
- Performance of the stump = 1/2 × ln((1 - TE) / TE) = 1/2 × ln(6) ≈ 0.896.
- The next base learner should concentrate on the misclassified row.
- To achieve that, increase the weight of the misclassified row and decrease the weights of the others:
  - New weight (misclassified row) = weight × e^(performance), here e^(0.896).
  - New weight (correctly classified row) = weight × e^(-performance).


#### Step 4
- The updated weights no longer sum to 1, so normalize them: divide each weight by the sum of all updated weights.
- Create a new dataset:
    - Use the normalized weights to build buckets (cumulative ranges).
      - Let (0.07, 0.51, 0.07, 0.07, 0.07, 0.07, 0.07) be the updated weights.
      - 0 to 0.07 is the first bucket.
      - 0.07 to 0.58 (0.07 + 0.51) is the next bucket.
      - 0.58 to 0.65 (0.58 + 0.07) is the next bucket, and so on.
    - Repeatedly draw a random number between 0 and 1. Suppose the first draw is 0.43: it falls in the second bucket, so the corresponding row is copied into the new dataset.
    - Because the misclassified row owns the widest bucket, it is the most likely to be chosen.
    - Train a new stump on the new dataset and continue.
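
The bucket-based resampling can be sketched as below, using the example weights from this step. The helper name `pick_row` and the fixed random seed are assumptions for illustration. Since the example weights do not add up to exactly 1, the sketch normalizes them first, so its bucket edges differ slightly from the rounded ones listed above.

```python
import bisect
import random
from itertools import accumulate

weights = [0.07, 0.51, 0.07, 0.07, 0.07, 0.07, 0.07]  # example weights from the text

# Normalize so the weights sum to 1.
total = sum(weights)
normalized = [w / total for w in weights]

# Upper edge of each bucket; bucket i covers [upper_bounds[i-1], upper_bounds[i]).
upper_bounds = list(accumulate(normalized))


def pick_row(u):
    """Return the index of the bucket that a number u in [0, 1) falls into."""
    # min() guards against floating-point rounding at the very top edge.
    return min(bisect.bisect_right(upper_bounds, u), len(weights) - 1)


# Draw as many rows as the original dataset has, with replacement.
rng = random.Random(0)
new_indices = [pick_row(rng.random()) for _ in weights]
```

A draw of 0.43 gives `pick_row(0.43) == 1`, the heavily weighted row, so that row tends to appear several times in the resampled dataset.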
