## Ensemble Method

Train multiple learners for the same task; also known as <i>committee-based learning</i> or <i>multiple classifier systems</i>

#### Homogeneous/Heterogeneous

Homogeneous: all base learners are of the same type (e.g. all decision trees)\
Heterogeneous: base learners of a variety of types\
Note: base learners are also known as <i>weak</i> learners because each one may perform only slightly better than random guessing

#### Bagging

<i>Bootstrap aggregating</i>\
Uses bootstrap sampling, i.e. sampling with replacement



```python
from sklearn.utils import resample

data = [1, 2, 5, 7, 9, 11, 13, 15, 18]
# Draw 5 values with replacement, so the same value can appear more than once
sample1 = resample(data, replace=True, n_samples=5, random_state=11)
print('Bootstrap sample: {}'.format(sample1))
```

Output:

```
Bootstrap sample: [1, 2, 15, 2, 15]
```


Given m training examples and a bootstrap sample of m draws with replacement, the number of times the ith example is selected is approximately Poisson distributed with $\lambda = 1$. The probability that it is selected k times is

$p(k,\lambda) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$
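
The approximation can be checked numerically. A minimal sketch, assuming the bootstrap sample has the same size m as the training set, so the exact count for one example is Binomial(m, 1/m). The function names and the value of m are illustrative, not from the notes.

```python
import math


def binomial_pmf(k, m):
    """Exact probability that one given example is drawn k times in m draws with replacement."""
    p = 1.0 / m
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def poisson_pmf(k, lam=1.0):
    """Poisson approximation p(k, lambda) = e^(-lambda) * lambda^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


m = 1000  # assumed training-set size, for illustration only
for k in range(4):
    print(k, round(binomial_pmf(k, m), 4), round(poisson_pmf(k), 4))
```

For k = 0 both values are close to $e^{-1} \approx 0.368$, so roughly 36.8% of the training examples are left out of any given bootstrap sample.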

Learners that are sensitive to the addition or deletion of training examples are called <i>unstable learners</i>.\
Learners such as k-NN are hardly affected by a few additions or deletions and are therefore known as <i>stable learners</i>.\
Bagging works best with <strong><i>unstable learners</i></strong> (decision trees, etc.).
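
A minimal sketch of bagging an unstable learner with scikit-learn, assuming a synthetic dataset from `make_classification`. The dataset, the number of trees and the variable names are illustrative choices, not from the notes. The base tree is passed positionally because the keyword name for it differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single fully grown decision tree: an unstable learner
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 25 trees, each fit on its own bootstrap sample of the training set
bagged = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=25,
    bootstrap=True,
    random_state=0,
).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
bagged_acc = bagged.score(X_test, y_test)
print(f"single tree: {tree_acc:.3f}, bagged trees: {bagged_acc:.3f}")
```

Comparing the two printed accuracies on held-out data gives a rough feel for how much averaging over bootstrap samples helps on a particular dataset.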

#### Random Forest

An extension of bagging with decision trees. In addition to bootstrap sampling,
a random subset of features is considered when selecting the split at each tree node.\
sklearn.ensemble: <a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html">RandomForestClassifier</a>
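
A short usage sketch of `RandomForestClassifier`, assuming synthetic data. The dataset and the parameter values are illustrative. `max_features` controls how many randomly chosen features are considered at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 16 features, used only for illustration
X, y = make_classification(n_samples=300, n_features=16, random_state=0)

forest = RandomForestClassifier(
    n_estimators=50,      # number of trees in the forest
    max_features="sqrt",  # consider sqrt(16) = 4 random features at each split
    bootstrap=True,       # each tree is fit on a bootstrap sample
    random_state=0,
).fit(X, y)

print(len(forest.estimators_))            # number of fitted trees
print(forest.feature_importances_.shape)  # one importance value per feature
```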

#### Boosting

Any procedure that combines many weak learners into a learner with much higher performance. Base learners in boosting are
generated sequentially, each focusing on the examples misclassified by earlier weak learners in the chain.

Each training example is assigned a weight, which sets its probability of being chosen for the next base learner.\
All weights are identical initially; when an example is misclassified, its weight is increased.\
The outputs of the learners are combined using weights to produce the final response (in AdaBoost, for example, more accurate learners receive larger weights).
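
The weight update can be sketched with a small AdaBoost-style example in plain Python. This is one boosting variant, assumed here for illustration: it uses the weights directly in a weighted error (reweighting) instead of resampling examples by weight, the weak learner is a one-threshold decision stump, and the toy data and function names are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict +polarity for x <= threshold, -polarity otherwise."""
    return polarity if x <= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the stump with lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # identical weights initially
    ensemble = []            # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this learner
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples (y * prediction = -1) get larger weights
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalize to sum to 1
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single stump can separate the pattern + + - - + +
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data the first stump misclassifies the last two examples, so their weights rise from 1/6 to 1/4 after the first round, and the next stump is pushed to get them right.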

#### Combining Weak Learners

Averaging, voting, stacking\
Stacking: the outputs of the weak learners (the <i>first-level</i> learners) are combined by another learner, the <i>second-level learner</i>
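
A minimal stacking sketch with scikit-learn's `StackingClassifier`, assuming synthetic data. The choice of first-level learners (a shallow tree and k-NN) and of logistic regression as the second-level learner is illustrative, not from the notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# First-level learners: their predictions become the inputs of the second level
first_level = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Second-level learner: trained on cross-validated first-level predictions
stack = StackingClassifier(
    estimators=first_level,
    final_estimator=LogisticRegression(),
    cv=5,
).fit(X, y)

print(stack.predict(X[:5]))
```

Averaging and voting need no second-level training: the first-level outputs are simply averaged or counted.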

#### Gradient Boosted Trees (GBT)

<ol>
    <li>Construct a base tree consisting of only a root node: the initial guess for all samples</li>
    <li>Build a tree from the errors of the previous tree</li>
    <li>Scale the tree by the learning rate, which determines the contribution of the tree to the prediction</li>
    <li>Combine the new tree with all previous trees to predict the result, and repeat from step 2 until the maximum number of trees is
        reached or new trees no longer improve the fit</li>
    <li>The final prediction model is the combination of all the trees</li>
</ol>
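
The steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: regression with squared-error loss (so the "errors" are the residuals), one-split trees (stumps) as the base learner, and a fixed number of trees as the stopping rule. The toy data and function names are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (stump) to the targets by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)           # step 1: initial guess for all samples
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):           # step 4: repeat until the maximum number of trees
        residuals = [y - p for y, p in zip(ys, preds)]  # step 2: errors of the model so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # step 3: add the new tree, scaled by the learning rate
        preds = [p + learning_rate * stump_predict(stump, x)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Step 5: the final model is the base guess plus all scaled trees."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy regression data, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 4.8]

baseline = (sum(ys) / len(ys), 0.1, [])  # step 1 only: predict the mean everywhere
model = gradient_boost(xs, ys, n_trees=50, learning_rate=0.1)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

A smaller learning rate makes each tree contribute less, so more trees are needed to reach the same training error.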

Documentation on XGBoost <a href="https://xgboost.ai">here</a>\
XGBoost is an optimized variation of GBT, known for being fast and highly accurate.