## What is bagging
Bagging is short for "bootstrap aggregating".
It aims to produce an ensemble model that is more robust than the individual models composing it, mainly by reducing variance.

Because a training dataset is only one observed sample from a true, unknown underlying distribution, the fitted model is itself subject to variability: if another dataset had been observed, we would have obtained a different model.

The idea of bagging is then simple: we want to **fit several independent models and “average” their predictions in order to obtain a model with a lower variance**.

## Steps for bagging

![](./bagging.png)
1. Create multiple bootstrap samples (drawn with replacement from the training set), so that each one acts as another (almost) independent dataset drawn from the true distribution.
2. Fit a weak learner on each of these samples, then aggregate the learners so that their outputs are, in some sense, “averaged”.
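
The two steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the weak learner here is assumed to be a one-split regression stump on 1-D inputs, and the names `bootstrap_sample`, `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical helpers introduced for this example.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) points from data, with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Hypothetical weak learner: a one-split regression stump.

    `sample` is a list of (x, y) pairs with scalar x. The stump picks the
    threshold that minimises the squared error and predicts the mean of y
    on each side of it.
    """
    xs = sorted({x for x, _ in sample})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # Only one distinct x in the sample: fall back to a constant model.
        m = mean(y for _, y in sample)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagging_fit(data, n_models, seed=0):
    """Step 1 + step 2: one bootstrap sample and one weak learner per model."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate by simple averaging (regression case)."""
    return mean(m(x) for m in models)


# Toy usage on made-up data: y = 2x on twenty points.
data = [(float(x), 2.0 * x) for x in range(20)]
models = bagging_fit(data, n_models=25, seed=1)
print(bagging_predict(models, 5.0))
```

Each stump on its own is a crude two-level step function; the averaged prediction is a smoother staircase, because the individual stumps place their thresholds at different positions.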

## Ways to average the final output

1. For a regression problem, the outputs of individual models can literally be averaged to obtain the output of the ensemble model. 
2. For a classification problem, the class predicted by each model can be seen as a vote, and the class that receives the most votes is returned by the ensemble model (this is called **hard-voting**).
3. Still for a classification problem, we can also consider the probabilities of each class returned by all the models, average these probabilities and keep the class with the highest average probability (this is called **soft-voting**).

Averages or votes can be either simple or weighted, if relevant weights are available.
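
The difference between hard and soft voting is easiest to see on a small example. The sketch below is illustrative only: `hard_vote` and `soft_vote` are hypothetical helper names, and the three per-model probability dictionaries are made-up numbers chosen so that the two rules disagree.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class predicted by the most models.

    Ties are broken in favour of the class seen first in `labels`.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, weights=None):
    """Average per-class probabilities and return (best_class, averages).

    `probas` is a list of {class: probability} dicts, one per model.
    `weights` optionally gives one non-negative weight per model;
    by default every model counts equally.
    """
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    avg = {
        c: sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in probas[0]
    }
    return max(avg, key=avg.get), avg


# Made-up outputs of three models for a single input.
probas = [
    {"a": 0.55, "b": 0.45},
    {"a": 0.55, "b": 0.45},
    {"a": 0.10, "b": 0.90},
]
labels = [max(p, key=p.get) for p in probas]  # each model's predicted class

print(hard_vote(labels))     # two weak votes for "a" beat one vote for "b"
print(soft_vote(probas)[0])  # the confident third model tips the average to "b"
```

Hard voting ignores how confident each model is, whereas soft voting lets one very confident model outweigh two hesitant ones. Passing `weights` gives the weighted variant of the soft vote.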

## Why does bagging work?

Roughly speaking, since the bootstrap samples are approximately independent and identically distributed (i.i.d.), so are the learned base models. “Averaging” the weak learners' outputs then does not change the expected answer but reduces its variance, just as averaging i.i.d. random variables preserves the expected value but reduces the variance.
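
The analogy with i.i.d. random variables can be written out explicitly. The notation is an assumption made for this sketch: $L$ is the number of base models, $\hat{f}_l(x)$ is the prediction of the $l$-th model at a fixed input $x$, and each prediction has mean $\mu$ and variance $\sigma^2$. If the predictions were exactly i.i.d., then

$$
\mathbb{E}\left[\frac{1}{L}\sum_{l=1}^{L}\hat{f}_l(x)\right] = \mu,
\qquad
\operatorname{Var}\left[\frac{1}{L}\sum_{l=1}^{L}\hat{f}_l(x)\right] = \frac{\sigma^2}{L}.
$$

In practice the bootstrap samples overlap, so the models are only approximately independent. If we assume instead that every pair of predictions has the same correlation $\rho$, the variance of the average becomes

$$
\operatorname{Var}\left[\frac{1}{L}\sum_{l=1}^{L}\hat{f}_l(x)\right] = \rho\,\sigma^2 + \frac{1-\rho}{L}\,\sigma^2,
$$

which still decreases as $L$ grows but cannot go below $\rho\,\sigma^2$.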