## Bagging aka Bootstrap Aggregating

### Bagging algorithm

```pseudocode
let n be the total number of bootstrap samples (rounds)

for i = 1 to n do
    Draw a bootstrap sample D_i of size m
    Train base classifier h_i on D_i
y_pred = mode{h_1(X), ..., h_n(X)}
```
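The loop above can be sketched in Python. This is a minimal illustration, not how scikit-learn implements bagging internally: the helper names `fit_bagging` and `predict_bagging` are hypothetical, a decision tree is used as the base classifier purely for convenience, and class labels are assumed to be the integers `0..K-1`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train n_estimators trees, each on its own bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Draw m row indices with replacement: one bootstrap sample D_i.
        idx = rng.integers(0, m, size=m)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models


def predict_bagging(models, X):
    """Majority vote (mode) over the predictions of all base classifiers."""
    # votes has shape (n_estimators, n_rows); labels are assumed to be 0..K-1.
    votes = np.stack([model.predict(X) for model in models]).astype(int)
    n_classes = votes.max() + 1
    # Count the votes per class for every row, then pick the most frequent class.
    counts = np.stack([(votes == k).sum(axis=0) for k in range(n_classes)])
    return counts.argmax(axis=0)


X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
models = fit_bagging(X, y)
y_pred = predict_bagging(models, X)
print("training accuracy:", (y_pred == y).mean())
```

The accuracy printed here is measured on the training data, so it is optimistic; it only confirms that the vote is wired up correctly.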

## How Bootstrapping Is Done

In bootstrapping, we randomly select samples from the original training data with replacement. In standard bagging, each bootstrap sample has the same size as the original training set: if the original training set has ``m`` data points, every bootstrap round also draws ``m`` points. Sampling with replacement means that some data points can appear more than once in the same sample.

Because some points are drawn repeatedly and the sample size is fixed at ``m``, not every data point from the original data ends up in a given bootstrap sample.

The points that do not occur in a bootstrap sample are called **Out-Of-Bag** samples, or simply **OOB**. These are just the data points that are not included in a given **round**.
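One bootstrap round and its OOB points can be illustrated on a toy dataset. This is a small sketch using NumPy; the dataset of ten integers and the seed are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10
data = np.arange(m)  # toy "training set": the points 0..9

# One bootstrap round: m draws with replacement from the m original points.
sample_idx = rng.integers(0, m, size=m)
bootstrap_sample = data[sample_idx]

# Points that were never drawn in this round are the out-of-bag (OOB) points.
oob_points = np.setdiff1d(data, bootstrap_sample)

print("bootstrap sample:", bootstrap_sample)
print("OOB points:      ", oob_points)
```

Every original point ends up either in the bootstrap sample (possibly several times) or in the OOB set, never in both.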

![bootstrap1](bootstrap1.png)
[source](https://www.youtube.com/watch?v=pWSULhaZlQM&list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3&index=41)

## The Math Behind It

### What is the probability of not choosing a given sample?

The probability of choosing a given sample in a single draw is:

### $$ P(\text{chosen}) = \frac{1}{m} $$

where ``m`` is the total number of samples.

Hence the probability of not choosing that sample in a single draw is:

### $$ P(\text{not chosen}) = \Big(1 - \frac{1}{m} \Big) $$

A bootstrap sample consists of ``m`` independent draws, so the probability that the sample is not chosen in any of them is this value raised to the power ``m``:

### $$ P(\text{not chosen}) = \Big(1 - \frac{1}{m} \Big)^m $$

### For a very large dataset:

### $$ \lim_{m \to \infty} \Big(1 - \frac{1}{m} \Big)^m = \frac{1}{e} \approx 0.368 = 36.8\% $$

This means that, for a very large dataset, about 36.8% of the data points will not be chosen in a given bootstrap sample.
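The convergence towards `1/e` can be checked numerically. The sketch below simply evaluates the formula for a few dataset sizes; the particular values of `m` are arbitrary.

```python
import math

for m in (10, 100, 1_000, 100_000):
    # Probability that a given point is missed by all m draws.
    p_not_chosen = (1 - 1 / m) ** m
    print(f"m = {m:>7}: P(not chosen) = {p_not_chosen:.4f}")

print(f"limit 1/e    : {1 / math.e:.4f}")
```

Even for modest values of `m` the probability is already close to the limit, which is why the 36.8% figure is a useful rule of thumb.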

![bootstrap2](bootstrap2.png)
[source](https://www.youtube.com/watch?v=pWSULhaZlQM&list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3&index=41)

From the graph above, once the dataset has 200 or more data points, each bootstrap sample contains around 0.63 (63%) of the original data points as unique points; the remaining draws are duplicates.
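The same behaviour can be reproduced with a short simulation. This is only a sketch: the dataset sizes and the number of repetitions (`n_rounds`) are arbitrary choices, not values taken from the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rounds = 200  # bootstrap rounds averaged per dataset size (arbitrary)

for m in (10, 50, 200, 1000):
    # Fraction of distinct original points in each bootstrap sample of size m.
    fractions = [np.unique(rng.integers(0, m, size=m)).size / m
                 for _ in range(n_rounds)]
    print(f"m = {m:>4}: mean unique fraction = {np.mean(fractions):.3f}")
```

For larger `m` the mean fraction should settle near `1 - 1/e`, roughly 0.632.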

## Overview Of Bagging in General

NOTE: Majority voting is the same as taking the **mode** of the base classifiers' predictions.

![bootstrap3](bootstrap3.png)
[source](https://www.youtube.com/watch?v=pWSULhaZlQM&list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3&index=41)


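```python
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification

# Toy binary classification problem: 100 rows, 4 features (2 informative).
X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# The base estimator is passed positionally because its keyword was renamed
# from `base_estimator` to `estimator` in scikit-learn 1.2.
clf = BaggingClassifier(SVC(),
                        n_estimators=500,
                        random_state=0,
                        oob_score=True  # also evaluate on the out-of-bag rows
                       ).fit(X, y)

clf.predict([[0, 0, 0, 0]])
```

```
array([1])
```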

The **OOB score** is the fraction of training rows that are predicted correctly when each row is predicted only by the base estimators whose bootstrap sample did not contain it. The **OOB error** is the complementary fraction, that is, the share of rows that are misclassified in this way ([source](https://www.analyticsvidhya.com/blog/2020/12/out-of-bag-oob-score-in-the-random-forest-algorithm/)).

```python
print(f"OOB score: {clf.oob_score_}")
```

```
OOB score: 0.89
```
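To make the relationship between the OOB score and the OOB error concrete, the sketch below refits a smaller ensemble on the same toy data and recomputes the score by hand from the fitted `oob_decision_function_` attribute. The choice of 100 estimators is arbitrary and only keeps the example quick, so the printed numbers need not match the score above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = BaggingClassifier(SVC(), n_estimators=100,
                        random_state=0, oob_score=True).fit(X, y)

# OOB error is simply the complement of the OOB score (an accuracy).
oob_error = 1 - clf.oob_score_

# Each row of oob_decision_function_ aggregates only the estimators for which
# that training row was out-of-bag; the most voted class is the OOB prediction.
oob_pred = clf.classes_[clf.oob_decision_function_.argmax(axis=1)]
manual_oob_score = (oob_pred == y).mean()

print(f"OOB score (sklearn): {clf.oob_score_:.3f}")
print(f"OOB score (manual):  {manual_oob_score:.3f}")
print(f"OOB error:           {oob_error:.3f}")
```

Because every row is scored only by estimators that never saw it during training, the OOB score behaves like a validation estimate that comes without setting aside a separate hold-out set.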


The more the ``n_estimators`` the greater the 

## Why Bagging Works

Bagging works well because it averages (or takes a majority vote over) many base learners, which reduces the variance of the combined model compared with a single base learner. For more, watch the ending section of this [video](https://www.youtube.com/watch?v=pWSULhaZlQM&list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3&index=41).
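The variance-reduction effect can be illustrated with a simplified simulation. The sketch assumes that each base learner's error is independent noise with unit variance, which is an idealisation: in real bagging the learners are trained on overlapping bootstrap samples and are therefore correlated, so the reduction is smaller than shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_learners = 10_000, 25  # arbitrary sizes for the illustration

# Each column mimics one base learner's noisy estimate of the same true value 0.
# Assumption: the learners' errors are independent with unit variance.
single = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_learners))

# Averaging across learners plays the role of the aggregation step in bagging.
averaged = single.mean(axis=1)

print("variance of one learner:      ", single[:, 0].var())
print("variance of the averaged pred:", averaged.var())  # close to 1 / n_learners
```

Under the independence assumption the variance of the average shrinks by a factor of `n_learners`, while the mean of the predictions is unchanged.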

## Resources

[Official documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html)

[Video resource](https://www.youtube.com/watch?v=pWSULhaZlQM&list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3&index=41)