In sequential methods, the combined weak models are no longer fitted independently of one another. The idea is to fit models iteratively, so that the training of the model at a given step depends on the models fitted at the previous steps. “Boosting” is the most famous of these approaches, and it produces an ensemble model that is in general less biased than the weak learners that compose it.

### Boosting:

Boosting methods work in the same spirit as bagging methods: we build a family of models that are aggregated to obtain a strong learner that performs better. However, unlike bagging, which mainly aims at reducing variance, boosting fits multiple weak learners sequentially and adaptively: each model in the sequence is fitted giving more importance to the observations in the dataset that were badly handled by the previous models in the sequence. Intuitively, each new model focuses its efforts on the observations that have been the most difficult to fit so far, so that we obtain, at the end of the process, a strong learner with lower bias (although boosting can also have the effect of reducing variance). Boosting, like bagging, can be used for regression as well as for classification problems.
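The reweighting idea described above can be sketched in a few lines of plain Python. This is only an illustrative sketch in the style of AdaBoost, not a reference implementation: the toy dataset, the function names (`fit_stump`, `adaboost`, `adaboost_predict`) and the choice of one-dimensional threshold stumps as weak learners are all assumptions made for the example. Labels are assumed to be in {-1, +1}.

```python
import math

def stump_predict(x, threshold, polarity):
    # A decision stump: predicts `polarity` on one side of the threshold,
    # and the opposite label on the other side.
    return polarity if x <= threshold else -polarity

def fit_stump(xs, ys, weights):
    # Exhaustively pick the stump with the lowest *weighted* error.
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform observation weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Misclassified observations get a larger weight, the others a smaller one.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]     # renormalise to sum to 1
    return ensemble

def adaboost_predict(ensemble, x):
    # Aggregate the weak learners as a weighted vote.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single stump can separate these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions)
```

Each stump on its own is a weak, high-bias model, but because every round concentrates on the observations the previous rounds got wrong, the weighted vote of the stumps can fit a pattern that no single stump can.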

Because boosting is mainly focused on reducing bias, the base models most often considered for it are models with low variance but high bias. For example, if we want to use trees as our base models, we will most of the time choose shallow decision trees that are only a few levels deep. Another important reason that motivates the use of low variance but high bias models as weak learners for boosting is that these models are in general less computationally expensive to fit (they have few degrees of freedom when parametrised). Indeed, as the computations to fit the different models can’t be done in parallel (unlike bagging), it could become too expensive to fit several complex models sequentially.

Once the weak learners have been chosen, we still need to define how they will be sequentially fitted (what information from the previous models do we take into account when fitting the current model?) and how they will be aggregated (how do we aggregate the current model with the previous ones?). We will discuss these questions in the two following subsections, focusing on two important boosting algorithms: AdaBoost (adaptive boosting) and gradient boosting.

**In a nutshell, these two meta-algorithms differ in how they create and aggregate the weak learners during the sequential process. Adaptive boosting (AdaBoost) updates the weights attached to each observation of the training dataset, whereas gradient boosting updates the target values of these observations: each new weak learner is fitted to what the current ensemble still gets wrong (the residuals, in the squared-error case). This main difference comes from the way each method tries to solve the optimisation problem of finding the best model that can be written as a weighted sum of weak learners.**
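To illustrate the gradient boosting side of this contrast, here is a minimal sketch for regression with squared-error loss, where the quantity each new weak learner is fitted to is simply the residual of the current ensemble. The helper names (`fit_regression_stump`, `gradient_boost`, `gb_predict`), the toy data and the `learning_rate` shrinkage parameter are assumptions introduced for the example, and the weak learner is a deliberately simple one-split regression stump.

```python
def fit_regression_stump(xs, targets):
    # Best single-split regressor under squared error.
    # Returns (threshold, left_mean, right_mean).
    best = None
    for threshold in sorted(set(xs))[:-1]:   # keep both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)                 # initial model: the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        # The observation weights never change; only the targets do.
        threshold, left_mean, right_mean = fit_regression_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate

def gb_predict(model, x):
    # The final model is the initial constant plus a weighted sum of weak learners.
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical toy data: a curve that a single stump fits poorly.
xs = list(range(10))
ys = [float(x * x) for x in xs]

model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [gb_predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Note that every observation keeps the same weight throughout; what changes from one round to the next is the target the stump is asked to predict. For other loss functions, the residual would be replaced by the negative gradient of that loss with respect to the current prediction.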
