## Ensemble Models: Bagging and Boosting
#### The idea: combine the predictions of multiple models to improve prediction performance

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Bagging
* Short for Bootstrap Aggregation
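* We sample a subset of the training set with replacement (a bootstrap sample)
* Train a classifier on that subset
* Repeat with a new random subset for each additional classifier
* Take the majority vote of the classifiers' predictions as the final prediction (for regression, average the predictions)
* Can be parallelized, because the classifiers are trained independently of each other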
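
The steps above can be sketched by hand. This is a minimal illustration, not a reference implementation: the synthetic dataset, the choice of 25 trees, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25  # odd, so a binary majority vote cannot tie
models = []
for _ in range(n_estimators):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Majority vote: labels are 0 or 1, so a mean vote above 0.5 means class 1 wins.
votes = np.stack([m.predict(X_test) for m in models])  # shape (n_estimators, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_accuracy = (bagged_pred == y_test).mean()
print(f"bagged accuracy: {bagged_accuracy:.3f}")
```

In practice, scikit-learn's `BaggingClassifier` packages the same loop, so a hand-written version like this is mainly useful for seeing where the randomness enters.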


### Random Forest: Bagging applied to Decision Trees
* Bagging with a Decision Tree as the base classifier
* We train several decision trees in parallel
* Get a majority vote from their predictions to make the final prediction
* How do the base decision trees that make up our forest differ?
  * Each tree is trained on a random sample (called the bootstrap sample) drawn from the dataset with replacement
  * Each tree considers a random subset of features instead of all features (in the standard algorithm, a fresh subset is drawn at every split)
  * Combining these two techniques results in more randomness and better generalization
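
A minimal sketch of a random forest using scikit-learn's `RandomForestClassifier`. The synthetic dataset and the parameter values are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,       # each tree is fit on a bootstrap sample of the rows
    max_features="sqrt",  # each split considers a random subset of the features
    n_jobs=-1,            # trees are independent, so they can be fit in parallel
    random_state=0,
)
forest.fit(X_train, y_train)
rf_accuracy = forest.score(X_test, y_test)
print(f"random forest accuracy: {rf_accuracy:.3f}")
```

The two sources of randomness described above map onto the `bootstrap` and `max_features` parameters.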

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### Out of Bag Score (OOB)
* Because each tree is trained on a bootstrap sample, the samples left out of that sample can be used to validate it
* The score computed this way is called the Out of Bag score
* Most Scikit-learn Bagging methods, e.g. Random Forests, support OOB scoring
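
As a hedged example, OOB scoring can be switched on in scikit-learn with `oob_score=True`, after which the fitted model exposes `oob_score_`. The dataset and the number of trees below are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,   # OOB scoring only makes sense with bootstrap sampling
    oob_score=True,   # score each row using only trees that did not train on it
    random_state=0,
)
forest.fit(X, y)  # no separate validation split is needed for the OOB estimate
oob_accuracy = forest.oob_score_
print(f"OOB accuracy: {oob_accuracy:.3f}")
```

Note that the whole dataset is passed to `fit`: the validation rows for each tree are simply the ones its bootstrap sample happened to leave out.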


![image.png](attachment:image.png)

![image.png](attachment:image.png)

**In Random Forest**

* We use a random subset of features for each tree
* The number of randomly selected features influences the generalization error in two ways:
  * selecting many features increases the strength of the individual trees
  * selecting fewer features lowers the correlation among the trees, which increases the strength of the forest as a whole
  
An empirical rule for choosing the number of features, found through experimentation:

![image.png](attachment:image.png)

Where m is the number of features
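
One way to explore this trade-off is to vary `max_features` and compare cross-validated accuracy. This is only a sketch: the dataset, the candidate values, and the fold count are assumptions, and the rule shown in the figure above is not reproduced here. In scikit-learn, `"sqrt"` and `"log2"` are accepted shorthand values, and `None` means all features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)

cv_scores = {}
for max_features in (1, "sqrt", "log2", None):
    forest = RandomForestClassifier(
        n_estimators=50, max_features=max_features, random_state=0
    )
    # Mean accuracy over 3 cross-validation folds.
    cv_scores[max_features] = cross_val_score(forest, X, y, cv=3).mean()

for setting, score in cv_scores.items():
    print(f"max_features={setting!s:>5}: mean CV accuracy = {score:.3f}")
```

Which setting wins depends on the data, so the numbers from a run like this are a starting point for tuning rather than a general result.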

**Sometimes additional randomness is desired beyond Random Forest**
* Select the candidate features randomly and also draw the split thresholds randomly, instead of searching for the best threshold
* Called “Extremely Randomized Trees” (Extra-Trees)
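
scikit-learn provides this variant as `ExtraTreesClassifier`. The comparison below is a minimal sketch on assumed synthetic data; it makes no claim about which model is better in general.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Extra-Trees draws split thresholds at random instead of optimizing them.
    "extra trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}
accuracies = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```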

### Boosting
* Train multiple classifiers sequentially
* Use the results of the previous classifiers to boost the performance of the next
* Combine the outputs of all the classifiers, typically as a weighted sum or vote, to make the final prediction

![image.png](attachment:image.png)

![image.png](attachment:image.png)

**Using a Learning Rate < 1.0 helps reduce overfitting (regularization)**
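
To see the effect of the learning rate, one can fit the same boosted model with two different values and compare training and test accuracy. The sketch below uses scikit-learn's `GradientBoostingClassifier`; the dataset and the two learning rates are assumptions for illustration, and the size of the gap will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for lr in (1.0, 0.1):
    gb = GradientBoostingClassifier(n_estimators=100, learning_rate=lr, random_state=0)
    gb.fit(X_train, y_train)
    # Store (train accuracy, test accuracy); a large gap suggests overfitting.
    results[lr] = (gb.score(X_train, y_train), gb.score(X_test, y_test))

for lr, (train_acc, test_acc) in results.items():
    print(f"learning_rate={lr}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A smaller learning rate usually needs more trees to reach the same training fit, so the two parameters are normally tuned together.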

#### Boosting uses different loss functions

![image.png](attachment:image.png)

#### Adaptive Boosting
* Sample a subset of the training set with replacement
* Train a classifier on that subset
* Make predictions using the classifier
* Increase the weight of misclassified examples
* Repeat the process, so that misclassified examples have a higher chance of being picked in the next round
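
The weight update can be sketched by hand. Note one swapped-in detail: instead of resampling rows according to their weights, this sketch passes the weights directly to the tree through `sample_weight`, which is a common reweighting variant of AdaBoost. The dataset, the use of depth-1 trees (stumps), and the number of rounds are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption); labels are mapped from {0, 1} to {-1, +1}.
X, y01 = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
y = 2 * y01 - 1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_rounds = 30
weights = np.full(len(X_train), 1.0 / len(X_train))  # start with uniform weights
stumps, alphas = [], []
for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    miss = stump.predict(X_train) != y_train
    # Weighted error, clipped away from 0 and 1 to keep the log finite.
    err = np.clip(weights[miss].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # this classifier's say in the final vote
    # Misclassified examples get heavier, correctly classified ones lighter.
    weights *= np.exp(alpha * np.where(miss, 1.0, -1.0))
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of all stump predictions.
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
ada_pred = np.where(scores >= 0, 1, -1)
ada_accuracy = (ada_pred == y_test).mean()
print(f"AdaBoost (hand-rolled) accuracy: {ada_accuracy:.3f}")
```

scikit-learn's `AdaBoostClassifier` offers a maintained implementation; the loop above is only meant to make the weight update visible.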

![image.png](attachment:image.png)

### Gradient Boosting
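* Generalized Boosting method that can use different loss functions
* Optionally sample a subset of the training set for each round (stochastic gradient boosting); plain gradient boosting uses the full training set
* Train a weak learner, usually a shallow decision tree, on that data
* Make predictions using the learner
* Calculate the residual errors made by the ensemble so far (for a general loss function, the negative gradient of the loss)
* Repeat the process, training the next learner to predict the residual errors left by the previous ones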
* Learning rate parameter controls how strongly each tree tries to correct the mistakes of the previous one
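
For squared-error loss, the residual-fitting loop can be written out directly. The following is a minimal sketch on an assumed toy regression problem (a noisy sine curve); the tree depth, learning rate, and number of rounds are illustrative choices, and each round uses the full training set rather than a subsample.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an assumption for illustration): noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
n_rounds = 100
prediction = np.full_like(y, y.mean())  # initial model: predict the mean
trees = []
mse_history = [np.mean((y - prediction) ** 2)]
for _ in range(n_rounds):
    # For squared-error loss, the negative gradient is just the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    # The learning rate shrinks each tree's correction.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

To predict on new data, one would add `y.mean()` to the sum of `learning_rate * tree.predict(X_new)` over all stored trees. With other loss functions, the `residuals` line is replaced by the negative gradient of that loss.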

![image.png](attachment:image.png)

### Random Forest Vs Gradient Boosting
* Random Forest can be easily parallelized, whereas Gradient Boosting trains its trees sequentially
* Gradient Boosting is relatively more sensitive to parameter settings than Random Forest
* As the number of trees increases, Gradient Boosting generally gives better results than Random Forest
* Gradient Boosting is very popular in ML competitions (e.g. Kaggle)
* Random Forest is more frequently used in practical and commercial settings
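
A quick way to check these points on your own data is to fit both models with an increasing number of trees. The sketch below uses assumed synthetic data and default hyperparameters, so its numbers should not be read as a general ranking of the two methods.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

comparison = {}
for n_trees in (10, 100):
    # Random Forest trees are independent, so n_jobs=-1 can fit them in parallel.
    rf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1, random_state=0)
    # Gradient Boosting fits its trees one after another, each depending on the last.
    gb = GradientBoostingClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_train, y_train)
    gb.fit(X_train, y_train)
    comparison[n_trees] = {
        "random_forest": rf.score(X_test, y_test),
        "gradient_boosting": gb.score(X_test, y_test),
    }

for n_trees, scores in comparison.items():
    print(n_trees, {name: round(acc, 3) for name, acc in scores.items()})
```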

![image.png](attachment:image.png)