# Boosting

## Boosting

Boosting trains models sequentially, with each new model focusing on the weaknesses of the ones before it.

It gives more weight to misclassified instances, allowing the model to learn from its mistakes.


### Boosting with Decision Trees - AdaBoost

AdaBoost (Adaptive Boosting) is a popular boosting algorithm that is typically used with decision trees.

Weak learners (usually shallow trees) are trained sequentially.

Each tree corrects the mistakes of the previous ones.

Initially, all data points have equal weights.

Misclassified points receive higher weights in subsequent iterations.

The final prediction is a weighted sum (vote) of the individual tree predictions.


#### Tuning Parameters (i.e., hyperparameters)

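The primary ones (scikit-learn defaults in parentheses) include:
- **base_estimator:** The base model used for boosting (e.g., DecisionTreeClassifier). Renamed to `estimator` in scikit-learn 1.2; the old name was removed in 1.4.
- **n_estimators:** The maximum number of weak learners (base models) to train sequentially. (default: 50)
- **learning_rate:** A factor that scales the contribution of each weak learner to the final prediction. (default: 1.0)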
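As a minimal sketch of how these hyperparameters are passed, assuming scikit-learn is installed: the synthetic dataset, the split and the variable names below are illustrative assumptions, not part of the original material. The base estimator is left unset, which falls back to the default decision stump.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Leaving the base estimator unset uses the default weak learner,
# a DecisionTreeClassifier with max_depth=1 (a decision stump).
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"stumps trained: {len(model.estimators_)}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

Lowering `learning_rate` shrinks each learner's contribution, so it is usually tuned together with `n_estimators`.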


### Benefits of Boosting

Why use boosting?
- Improved accuracy: Boosting focuses on difficult-to-learn instances, improving overall model performance.
- Versatility: Works well with various base learners.

## Bagging vs Boosting

#### Remember our bias-variance tradeoff?

Bagging aims to reduce variance (overfitting).

Boosting focuses on reducing bias (underfitting).
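One hedged way to see this contrast, assuming scikit-learn is installed: boost a high-bias model (a stump) and bag a high-variance model (an unpruned tree) on the same data. The `make_moons` dataset and every setting below are illustrative assumptions. The base estimator of `BaggingClassifier` is passed positionally because its keyword is `estimator` in recent scikit-learn releases and `base_estimator` in older ones.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class data with a curved boundary (illustrative assumption).
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# High-bias base model: a single stump cannot follow the curved boundary.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)
# Boosting many stumps (the AdaBoost default base model) targets that bias.
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# High-variance base model: an unpruned tree memorizes the noisy training set.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Bagging aggregates many such trees, each fit on a bootstrap sample.
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=100, random_state=0).fit(X_train, y_train)

for name, model in [("single stump", stump), ("boosted stumps", boosted),
                    ("deep tree", deep_tree), ("bagged trees", bagged)]:
    print(f"{name:15s} train={model.score(X_train, y_train):.3f} "
          f"test={model.score(X_test, y_test):.3f}")
```

The numbers depend on the dataset and the seed, so read the comparison as an illustration rather than a general result.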

## Boosting 

|   |   |
|:--|:--|
| <img src="http://github.com/david-biron/DATA221imgs/blob/main/icon_definition.png?raw=true" width="50" height=""> | **Boosting** combines weak learners (decision trees) **sequentially**. <br> The outputs of each weak learner are assigned **weights**. <br> The incorrect classifications of the current weak learner <br> are given higher weights so that they have more <br> representation/influence in the input to the next weak learner. |


#### Note

Boosting can be a powerful alternative to bagging. Instead of an 'egalitarian' aggregation of predictions, boosting trains weak learners sequentially, with each one prioritizing the correction of previous errors.

In Gradient Boosting, each model is trained on the **residuals** left by its predecessors, so that it learns from previous errors.
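The residual-fitting idea can be sketched in plain Python. This is a toy version under stated assumptions: one input feature, squared-error loss (for which the residuals are exactly what each new model should fit), and one-split regression stumps. The helper names `fit_stump`, `gradient_boost` and `mse` are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump: predict the mean target on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Return a model built by repeatedly fitting stumps to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # Residuals are the errors left by everything trained so far.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy regression problem (illustrative): y = x^2 on a small grid.
xs = [i / 10 for i in range(30)]
ys = [x ** 2 for x in xs]

mse_mean = mse(gradient_boost(xs, ys, n_rounds=0), xs, ys)
mse_after_1 = mse(gradient_boost(xs, ys, n_rounds=1), xs, ys)
mse_after_20 = mse(gradient_boost(xs, ys, n_rounds=20), xs, ys)
print(mse_mean, mse_after_1, mse_after_20)
```

Each round shrinks the training error because the new stump is fit to what the current ensemble still gets wrong.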

### An outline of the AdaBoost algorithm

* Each record in the training set is assigned a weight. These weights change at every step <br/> and influence the representation of the corresponding records in the next step. <br/> Initially, all weights are equal, so the training subset is selected uniformly at random.

<br/> 

* Select a training subset according to the weights.
* Train the current stump using the current training subset. 
* Assign higher weights to wrongly classified records. <br/> They are then more likely to be represented in the input for the next iteration.
* Assign an overall weight to the stump based on its accuracy (for instance). 
* Iterate until the training data is fit without errors or the specified maximum number of stumps is reached.

<br/>

* Take a weighted vote across all of the stumps, using the overall weight assigned to each of them.
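The outline can be sketched in plain Python for a single feature and labels $0$/$1$. Two assumptions to flag: instead of resampling a training subset according to the weights, this sketch uses the weights directly in each stump's error (a common variant, swapped in here to keep the example deterministic), and the stump vote weight uses the classic $\frac{1}{2}\ln\frac{1-error}{error}$ form. All function names and the toy data are hypothetical.

```python
import math


def stump_predict(stump, x):
    """A stump is (threshold, label_above): predict label_above when x > threshold."""
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above


def train_stump(xs, ys, weights):
    """Return (weighted_error, stump) for the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for label_above in (0, 1):
            stump = (threshold, label_above)
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(stump, x) != y)
            if best is None or error < best[0]:
                best = (error, stump)
    return best


def ensemble_predict(ensemble, x):
    """Weighted vote: sum the vote weights behind each label, pick the larger sum."""
    votes = [0.0, 0.0]
    for vote_weight, stump in ensemble:
        votes[stump_predict(stump, x)] += vote_weight
    return 1 if votes[1] > votes[0] else 0


def adaboost(xs, ys, max_stumps=10):
    n = len(xs)
    weights = [1.0 / n] * n  # initially, all records have equal weight
    ensemble = []            # list of (vote_weight, stump)
    for _ in range(max_stumps):
        error, stump = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        vote_weight = 0.5 * math.log((1 - error) / error)
        ensemble.append((vote_weight, stump))
        # Scale up the weights of misclassified records, scale down the rest.
        weights = [w * math.exp(vote_weight if stump_predict(stump, x) != y
                                else -vote_weight)
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Stop early once the training data is fit without errors.
        if all(ensemble_predict(ensemble, x) == y for x, y in zip(xs, ys)):
            break
    return ensemble


# Toy data that no single stump can classify perfectly (illustrative).
xs = list(range(10))
ys = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]
ensemble = adaboost(xs, ys)
training_errors = sum(ensemble_predict(ensemble, x) != y for x, y in zip(xs, ys))
print(len(ensemble), training_errors)
```

On this toy data a single stump always misclassifies some records, while the weighted vote of several stumps fits the training set.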

**Note**: 

* The more successful the stump, the more the sample weights of its errors are scaled up (emphasized). <br/> They are scaled down (de-emphasized) when the previous stump had a negative vote weight.
* The more successful the stump, the more the sample weights for correct classifications are scaled down (de-emphasized). <br/> They are scaled up (emphasized) when the previous stump had a negative vote weight.
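
The scaling behaviour in this note follows from how the vote weight is commonly defined. As a sketch, assuming the classic discrete AdaBoost formulation for two classes (the symbols $\epsilon_t$, $\alpha_t$, $w_i$ and $Z_t$ are introduced here for illustration, and library implementations can differ in the details):

$$
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
w_i \leftarrow \frac{w_i}{Z_t}\times
\begin{cases}
e^{\alpha_t} & \text{if record } i \text{ is misclassified by stump } t\\
e^{-\alpha_t} & \text{if record } i \text{ is classified correctly}
\end{cases}
$$

Here $\epsilon_t$ is the weighted error of stump $t$, $\alpha_t$ is its vote weight, and $Z_t$ rescales the weights so that they sum to $1$. When $\epsilon_t < 1/2$ the vote weight is positive, so errors are scaled up and correct classifications are scaled down, more strongly the smaller $\epsilon_t$ is. When $\epsilon_t > 1/2$ the vote weight is negative and both directions reverse.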

### Finally, after generating a forest of stumps: 

To classify a given record:
* Sum the $Stump\_Vote\_Weight$s for the stumps that give $0$.
* Sum the $Stump\_Vote\_Weight$s for the stumps that give $1$. 
* The larger of the two sums determines the classification decision. 
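
A minimal sketch of this voting rule, assuming each stump's output is available as a (predicted label, vote weight) pair. The function name, the example numbers and the tie-break in favor of $0$ are illustrative assumptions.

```python
def classify(stump_votes):
    """stump_votes: list of (predicted_label, stump_vote_weight) pairs."""
    total_for_0 = sum(weight for label, weight in stump_votes if label == 0)
    total_for_1 = sum(weight for label, weight in stump_votes if label == 1)
    # The larger sum wins; ties go to 0 here (an arbitrary choice).
    return 1 if total_for_1 > total_for_0 else 0


# Hypothetical forest of three stumps voting on one record.
votes = [(1, 0.42), (0, 0.65), (1, 0.75)]
decision = classify(votes)  # label 1 collects 1.17, label 0 collects 0.65
print(decision)
```

Two weaker stumps can outvote one stronger stump, as they do here, because the sums of the vote weights decide, not the stump count.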

#### The base estimator can be changed 

For instance, instead of the (default) decision stumps, AdaBoost can use [Support Vector Machines](https://scikit-learn.org/stable/modules/svm.html).

This may or may not improve things, depending on the dataset... 
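
A hedged sketch of swapping in an SVM, assuming scikit-learn is installed. The base estimator is passed positionally because its keyword is `estimator` in recent releases and `base_estimator` in older ones. `probability=True` is set because some scikit-learn versions need class probabilities from the base estimator. The dataset and settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Small synthetic dataset (illustrative); SVMs make each boosting round slower.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# First positional argument: the base estimator that replaces the default stump.
svm_boost = AdaBoostClassifier(SVC(probability=True, random_state=0),
                               n_estimators=10, random_state=0)
svm_boost.fit(X, y)

train_accuracy = svm_boost.score(X, y)
print(f"SVMs trained: {len(svm_boost.estimators_)}, "
      f"training accuracy: {train_accuracy:.3f}")
```

An SVM is already a fairly strong learner, so boosting it may add little beyond extra training time. Comparing it against the default stumps on held-out data is the safer way to decide.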


#### Pros 

AdaBoost is not (particularly) prone to overfitting. 

#### Cons 

AdaBoost is sensitive to noise/outliers and can be slow. 