# Why Ensemble?

> "With our powers combined, it's Captain Planet!!!"


<img src='https://data.junkee.com/wp-content/uploads/2016/10/captain.jpg' width=50%/>

Combining many models decreases variance, and with it overfitting (common in Decision Trees)

# Bagging - Bootstrap AGGregation

Train weak learners on bootstrap samples of the data (rows drawn at random with replacement), then combine them into one model via voting

![](images/bagging.png)

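Essentially, pick rows 'randomly' (with replacement) to make several different training sets, create a DT on each, and let them vote! Picking columns 'randomly' as well is what Random Forest adds (see Subspace Sampling Method).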
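A minimal sketch of bagging next to a single tree, assuming scikit-learn is installed. The synthetic dataset and the variable names are made up for illustration; `BaggingClassifier` uses a decision tree as its weak learner by default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data (an assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One decision tree on its own
single_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# 50 trees, each fit on its own bootstrap sample of the rows,
# then combined into one prediction
bagged_trees = BaggingClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

print("single tree :", single_tree.score(X_test, y_test))
print("bagged trees:", bagged_trees.score(X_test, y_test))
```

How much the bagged version helps depends on the data, so compare the two scores on your own dataset.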

# Random Forest

## The Good?

- Super friend! 
- High performance 
    + low variance
- Transparent
    + inherited from Decision Trees

## The Bads?

- We got so many trees to plant...
- Computationally expensive
- Memory
    + all trees stored in memory
    + think back to k-Nearest Neighbors

## Subspace Sampling Method

### Don't be like a banana tree

Banana trees can be susceptible to [Panama disease](https://en.wikipedia.org/wiki/Panama_disease)

![Many individual yellow bananas](images/bananas.jpeg)
They're all clones!

All Decision Trees will be the same if given the same data! (A clone!!!)

### Breed variety of trees

Take part of the data to create different trees

Steps:

1. Save a portion of the data for validation (**out-of-bag**) and use the rest for training (**bag**)
2. Randomly select a subset of predictors (columns) from the training data (**bag**)
3. Grow/train your tree on the training data using just those features
4. Take the validation set (**out-of-bag**), keep only the columns used by the tree from the previous step, and predict on this *out-of-bag* data with the tree
5. Measure how well the tree did: this is the *out-of-bag error*
6. Repeat to make new trees and let the results "vote" for the final decision
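The steps above can be sketched by hand. This is a rough illustration, not how any library does it: it assumes the bag is drawn by bootstrapping rows (so the out-of-bag rows are whatever was never drawn), a binary 0/1 target, and made-up settings (25 trees, 3 features per tree, toy data).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)  # toy data

forest = []      # list of (tree, feature_indices) pairs
oob_scores = []  # one out-of-bag accuracy per tree
for _ in range(25):
    # Step 1: draw the bag; rows never drawn form the out-of-bag set
    bag = rng.integers(0, len(X), size=len(X))
    oob = np.setdiff1d(np.arange(len(X)), bag)
    # Step 2: randomly select a subset of predictors
    cols = rng.choice(X.shape[1], size=3, replace=False)
    # Step 3: grow the tree on the bag using just those features
    tree = DecisionTreeClassifier(random_state=0).fit(X[bag][:, cols], y[bag])
    # Steps 4-5: score the tree on the out-of-bag rows, same columns
    oob_scores.append(tree.score(X[oob][:, cols], y[oob]))
    forest.append((tree, cols))

# Step 6: every tree votes; the majority class (0 or 1) wins
votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("mean out-of-bag accuracy per tree:", np.mean(oob_scores))
```

Each tree sees different rows and different columns, so no two trees are clones.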

### Q: Benefit?

Less overfitting! Variety is the spice of life!

## Code

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
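
A short usage sketch, assuming scikit-learn is installed; the toy data and hyperparameter values are placeholders. Note that scikit-learn picks a random subset of features at every split rather than once per tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)  # toy data

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_features="sqrt",  # features considered at each split
    oob_score=True,       # score each sample using only trees that did not train on it
    random_state=1,
)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)
```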

# AdaBoost

## Algorithm

- Train model
- Inflate errors
    + Misclassified points get their weight increased until errors and correct points carry equal total weight (a 50-50 split)
- Retrain model on the reweighted data (repeat)

![](images/adaboost.png)
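
A tiny worked example of the "inflate errors" step, with made-up numbers: 10 points of weight 1, of which 3 are misclassified. Library implementations usually express the update differently (with an exponential factor), so treat this as an illustration of the 50-50 idea only.

```python
import numpy as np

weights = np.ones(10)  # every point starts with weight 1
# Hypothetical mistakes made by the first weak learner
misclassified = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1], dtype=bool)

correct_total = weights[~misclassified].sum()  # 7 points classified correctly
error_total = weights[misclassified].sum()     # 3 points misclassified

# Inflate the errors so both groups carry the same total weight (50-50)
weights[misclassified] *= correct_total / error_total

print("weight of each error:", weights[misclassified][0])
print("totals (errors, correct):", weights[misclassified].sum(), weights[~misclassified].sum())
```

Each misclassified point now weighs 7/3 instead of 1, so the next weak learner is pushed to get those points right.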

## Voting

Combine weak learners

+ Truthful, Guesser, Liar ==> 1, 0, -1 weights
+ $y = \ln\left(\frac{x}{1-x}\right)$ where $x$ is accuracy
+ $y = \ln\left(\frac{\text{correct}}{\text{incorrect}}\right)$, i.e. the log-odds
+ Vote by combining these calculated $y$ for each crossed-area (negative for "not blue" or whatever)
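
A quick sketch of the weight formula and the weighted vote. The accuracies and votes are invented, and `learner_weight` is a hypothetical helper name. The formula is undefined at an accuracy of exactly 0 or 1, so the example uses 0.9 and 0.1 for "mostly truthful" and "mostly liar".

```python
import math

def learner_weight(accuracy):
    """Log-odds of being correct: ln(accuracy / (1 - accuracy))."""
    return math.log(accuracy / (1 - accuracy))

# Hypothetical accuracies: positive, zero, and negative weights
for name, accuracy in [("mostly truthful", 0.9), ("guesser", 0.5), ("mostly liar", 0.1)]:
    print(name, round(learner_weight(accuracy), 3))

# Each learner votes +1 ("blue") or -1 ("not blue"), scaled by its weight
votes = [(+1, 0.7), (-1, 0.6), (+1, 0.8)]  # (prediction, accuracy) pairs, made up
score = sum(pred * learner_weight(acc) for pred, acc in votes)
prediction = "blue" if score > 0 else "not blue"
print(round(score, 3), prediction)
```

A guesser contributes nothing to the vote, and a consistent liar is as useful as a truthful learner once its vote is flipped by the negative weight.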

## Code

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

```python
from sklearn.ensemble import AdaBoostClassifier

# x_train, y_train and x_test are assumed to come from an earlier train/test split
model = AdaBoostClassifier()
model.fit(x_train, y_train)
model.predict(x_test)
```

### Hyperparameters

```python
from sklearn.tree import DecisionTreeClassifier
# scikit-learn >= 1.2 calls this parameter `estimator`; older versions use `base_estimator`
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=4), n_estimators=5)
```

- `estimator` (named `base_estimator` before scikit-learn 1.2): The model utilized for the weak learners (Warning: Don't forget to import the model that you decide to use for the weak learner).
- `n_estimators`: The maximum number of weak learners used.

# Gradient Boosting

Use gradient descent to improve the model

![](images/gradient_boosting_residuals.png)

## Algorithm

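Use mean squared error (MSE) as the loss and minimize it with gradient descent

For squared error, the negative gradient with respect to the current predictions is proportional to the residuals, so fitting the residuals is the gradient descent step

Use the residuals (the pattern in the residuals) to create an even better model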

1. Fit a model to the data, $F_1(x) \approx y$
2. Fit a model to the residuals, $h_1(x) \approx y - F_1(x)$
3. Create a new model, $F_2(x) = F_1(x) + h_1(x)$
4. Repeat: fit $h_2$ to the residuals of $F_2$, and so on
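
The loop above can be sketched by hand with small regression trees. This is an illustration under assumptions: toy sine data, depth-2 trees as the weak learners, 10 rounds, and no learning rate (each $h$ is added in full, exactly as in step 3).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # toy regression target

def mse(pred):
    """Mean squared error of the predictions against y."""
    return np.mean((y - pred) ** 2)

# Step 1: fit a first (weak) model to the data
F = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y).predict(X)
errors = [mse(F)]

for _ in range(10):
    residuals = y - F  # what the current model still gets wrong
    # Step 2: fit a new weak model to the residuals
    h = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    # Step 3: add it to the running model, F_{m+1} = F_m + h_m
    F = F + h.predict(X)
    errors.append(mse(F))  # Step 4: repeat

print([round(e, 4) for e in errors])
```

The printed training MSE should shrink round after round. Real implementations also scale each $h$ by a learning rate to slow this down and reduce overfitting.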


## Code

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
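
A short usage sketch, assuming scikit-learn is installed; the toy data and hyperparameter values are placeholders to tune on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=7)  # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

model = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages (trees)
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=3,        # depth of each weak learner
    random_state=7,
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```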

## XGBoost - eXtreme Gradient Boosting
