# Ensemble Models

- An ensemble method combines the predictions of many individual classifiers, for example by majority voting.
- An ensemble of *weakly correlated* classifiers, each with accuracy slightly above 50%, can outperform every one of its members.
- Condorcet's jury theorem: 
    - If each member of a jury of size $N$ makes an *independent* judgement and each juror is correct with probability $p > 0.5$, then the probability $P_N$ that the majority decision is correct tends to one as $N$ grows. Conversely, if $p < 0.5$ for each juror, $P_N$ tends to zero.
    <center><img width=350 src="images/Screen Shot 2019-06-06 at 20.29.13.png"/></center>
    - where $m$ is the minimal number of jurors that makes a majority, $m = \lfloor N/2 \rfloor + 1$.
    - In practice, votes are rarely independent, and jurors do not all share the same probability $p$.
- Uncorrelated submissions clearly do better when ensembled than correlated submissions.
- Majority votes make most sense when the evaluation metric requires hard predictions.
- [KAGGLE ENSEMBLING GUIDE](https://mlwave.com/kaggle-ensembling-guide/)
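
To make Condorcet's jury theorem concrete, the sketch below evaluates the binomial sum $P_N = \sum_{i=m}^{N} \binom{N}{i} p^i (1-p)^{N-i}$ for a few jury sizes. The function name `majority_correct_probability` and the chosen values of $N$ and $p$ are illustrative assumptions, and the calculation relies on the independence assumption that real models rarely satisfy.

```python
from math import comb

def majority_correct_probability(n_jurors: int, p: float) -> float:
    """Probability that a strict majority of n independent jurors is correct."""
    m = n_jurors // 2 + 1  # minimal number of jurors that makes a majority
    return sum(
        comb(n_jurors, i) * p**i * (1 - p) ** (n_jurors - i)
        for i in range(m, n_jurors + 1)
    )

# Jurors only slightly better than chance: the majority improves as N grows.
for n in (1, 11, 101, 501):
    print(n, round(majority_correct_probability(n, 0.55), 4))

# Jurors slightly worse than chance: the majority gets worse as N grows.
for n in (1, 11, 101, 501):
    print(n, round(majority_correct_probability(n, 0.45), 4))
```

With odd $N$ there are no ties, so the two cases mirror each other: the value for $p = 0.45$ is one minus the value for $p = 0.55$.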

## Averaging

- Averaging is taking the mean of individual model predictions.
- Averaging predictions often reduces variance (as bagging does).
- It is a simple technique that often yields easy, sizeable performance improvements.
- Averaging identical copies of the same linear regression changes nothing: it neither hurts nor helps, because the gain comes from diversity between models.
- An often heard shorthand for this on Kaggle is "bagging submissions".
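
The variance-reduction claim can be checked with a small simulation. This is a sketch under a strong assumption: each hypothetical "model" predicts the true value plus independent Gaussian noise, so the errors are fully uncorrelated. The names `simulate_predictions` and `mse` are made up for the illustration.

```python
import math
import random
import statistics

def simulate_predictions(n_models, n_points, noise_sd, seed=0):
    """Toy setup: every model predicts the truth plus its own independent noise."""
    rng = random.Random(seed)
    truth = [rng.uniform(0, 10) for _ in range(n_points)]
    preds = [[t + rng.gauss(0, noise_sd) for t in truth] for _ in range(n_models)]
    return truth, preds

def mse(truth, pred):
    """Mean squared error between two equally long sequences."""
    return statistics.fmean((t - p) ** 2 for t, p in zip(truth, pred))

truth, preds = simulate_predictions(n_models=5, n_points=2000, noise_sd=1.0)

# Averaging: take the mean of the individual predictions for each data point.
averaged = [statistics.fmean(column) for column in zip(*preds)]

single_mse = [mse(truth, p) for p in preds]
avg_mse = mse(truth, averaged)
print("individual MSEs:", [round(v, 3) for v in single_mse])
print("averaged MSE:   ", round(avg_mse, 3))

# Averaging two identical models returns the same predictions: no gain, no loss.
identical = [statistics.fmean(pair) for pair in zip(preds[0], preds[0])]
print(all(math.isclose(a, b) for a, b in zip(identical, preds[0])))
```

With uncorrelated errors the averaged error variance shrinks roughly in proportion to the number of models; correlated models gain much less.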

#### Weighted averaging:
- Use weighted averaging to give a better model more weight in a vote.

#### Conditional averaging:
- Use conditional averaging to cancel out erroneous ranges of individual estimators.
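
A minimal sketch of both variants follows. The weights, the threshold, and the choice of a single feature `x` to switch between models are assumptions made for the example, not a prescribed recipe; in practice they would be tuned on validation data.

```python
import math

def weighted_average(predictions, weights):
    """Blend per-model prediction lists, giving better models a larger weight."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]

def conditional_average(x_values, pred_a, pred_b, threshold):
    """Use model A where x is below the threshold and model B elsewhere."""
    return [a if x < threshold else b for x, a, b in zip(x_values, pred_a, pred_b)]

# Weighted averaging: model A is trusted three times as much as model B.
model_a = [0.9, 0.8, 0.2]
model_b = [0.6, 0.4, 0.4]
blended = weighted_average([model_a, model_b], weights=[3, 1])
print([round(v, 3) for v in blended])

# Conditional averaging: assume model A is unreliable for x >= 50.
x = [10, 60]
print(conditional_average(x, pred_a=[1.0, 5.0], pred_b=[2.0, 6.0], threshold=50))
```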

## Bagging

- Bagging (bootstrap aggregating) means averaging slightly different versions of the same model to improve accuracy.
- Bagging combines *strong learners* in order to "smooth out" their predictions and reduce overfitting.
- Training each model on a randomly resampled version of the training data (bootstrapping) adds further diversity among the individual models, which is what makes the averaging effective.

<center><img width=350 src="images/Ozone.png"/></center>
<center><a href="https://en.wikipedia.org/wiki/Bootstrap_aggregating" style="color: lightgrey">Credit</a></center>


- Bagging can be used with any type of method as a base model.
- Bagging is effective on small datasets.
- The out-of-bag (OOB) estimate evaluates each base model on the roughly 37% ($\approx 1/e$) of inputs left out of its bootstrap sample and averages the results.
    - Helps avoid the need for an independent validation dataset.
- Parameters that control bagging:
    - Random seed
    - Row sampling or bootstrapping
    - Column sampling or bootstrapping
    - Size of sample (use a much smaller sample size on a larger dataset)
    - Shuffling
    - Number of bags
    - Parallelism
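
The sketch below puts several of these pieces together in plain Python: row bootstrapping, a number of bags, a random seed, and an out-of-bag error estimate. The base model (a 1-nearest-neighbour regressor), the synthetic sine data, and all function names are assumptions chosen to keep the example short; column sampling and parallelism are left out.

```python
import math
import random
import statistics

def predict_1nn(train, x):
    """Base model: return the target of the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def make_bags(data, n_bags, rng):
    """Draw n_bags bootstrap samples; keep in-bag row indices for OOB scoring."""
    n = len(data)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # rows sampled with replacement
        bags.append(([data[i] for i in idx], set(idx)))
    return bags

def predict_bagged(bags, x):
    """Average the base-model predictions over all bags."""
    return statistics.fmean(predict_1nn(sample, x) for sample, _ in bags)

def oob_mse(data, bags):
    """Score each row using only the bags that did not contain it."""
    errors = []
    for i, (x, y) in enumerate(data):
        preds = [predict_1nn(sample, x) for sample, in_bag in bags if i not in in_bag]
        if preds:  # a row that landed in every bag has no OOB prediction
            errors.append((statistics.fmean(preds) - y) ** 2)
    return statistics.fmean(errors)

rng = random.Random(42)  # random seed
xs = [rng.uniform(0, 2 * math.pi) for _ in range(150)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]  # noisy sine curve
bags = make_bags(data, n_bags=30, rng=rng)

# Compare against the noise-free curve on an evenly spaced grid.
grid = [2 * math.pi * k / 200 for k in range(200)]
mse_single = statistics.fmean((predict_1nn(data, x) - math.sin(x)) ** 2 for x in grid)
mse_bagged = statistics.fmean((predict_bagged(bags, x) - math.sin(x)) ** 2 for x in grid)
print("single 1-NN MSE:", round(mse_single, 4))
print("bagged 1-NN MSE:", round(mse_bagged, 4))
print("OOB MSE (vs noisy targets):", round(oob_mse(data, bags), 4))
```

The OOB error is measured against the noisy training targets, so it is not directly comparable to the two grid errors; its use is as a validation-style estimate obtained without holding out data.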

#### Bootstrapping:
- Bootstrapping is random sampling with replacement.
- Sampling with replacement is a convenient way to treat a sample like it is a population.
- This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
- It is a straightforward way to derive estimates of standard errors and confidence intervals for complex estimators.
- For example:
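    - Draw a random element from the original sample of size $N$, with replacement, and do this $N$ times to form one bootstrap sample.
    - Repeat to obtain many bootstrap samples and calculate the mean of each.
    - Use the distribution of these means (for example its 2.5th and 97.5th percentiles) to obtain a 95% confidence interval around the mean estimate for the original sample.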
- Each bootstrap sample of size $N$ contains on average about $0.632 \cdot N$ *distinct* observations, since $1 - (1 - 1/N)^N \to 1 - 1/e \approx 0.632$.
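
The bootstrap procedure for a confidence interval of the mean can be sketched as follows. The percentile method used here is one common choice among several, and the Gaussian toy sample, the 2000 resamples, and the function names are assumptions for the example. The last function checks the fraction of distinct observations in one bootstrap sample empirically.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples, rng):
    """Resample with replacement n_resamples times and record each mean."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, confidence=0.95):
    """Percentile-method interval: cut equal tails off the sorted values."""
    ordered = sorted(values)
    tail = (1 - confidence) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper

def distinct_fraction(n, rng):
    """Share of the n original rows that appear in one bootstrap sample."""
    return len({rng.randrange(n) for _ in range(n)}) / n

rng = random.Random(0)
sample = [rng.gauss(10, 2) for _ in range(200)]  # toy "original sample"

means = bootstrap_means(sample, n_resamples=2000, rng=rng)
low, high = percentile_interval(means)
print("sample mean:", round(statistics.fmean(sample), 3))
print("95% bootstrap CI:", (round(low, 3), round(high, 3)))

frac = distinct_fraction(10_000, rng)
print("distinct fraction:", round(frac, 3))  # expected to be close to 1 - 1/e
```

The same pattern works for other statistics: replace `statistics.fmean` inside `bootstrap_means` with the median, a quantile, or a model metric.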

## Boosting

## Stacking