<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#AdaBoost-(Adaptive-Boosting)" data-toc-modified-id="AdaBoost-(Adaptive-Boosting)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>AdaBoost (Adaptive Boosting)</a></span><ul class="toc-item"><li><span><a href="#Algorithm" data-toc-modified-id="Algorithm-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Algorithm</a></span></li><li><span><a href="#Voting" data-toc-modified-id="Voting-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Voting</a></span></li><li><span><a href="#Code" data-toc-modified-id="Code-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Code</a></span><ul class="toc-item"><li><span><a href="#Hyperparameters" data-toc-modified-id="Hyperparameters-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Hyperparameters</a></span></li></ul></li></ul></li><li><span><a href="#Gradient-Boosting" data-toc-modified-id="Gradient-Boosting-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Gradient Boosting</a></span><ul class="toc-item"><li><span><a href="#Algorithm" data-toc-modified-id="Algorithm-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Algorithm</a></span></li><li><span><a href="#Code" data-toc-modified-id="Code-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Code</a></span><ul class="toc-item"><li><span><a href="#Create-some-noisy-data" data-toc-modified-id="Create-some-noisy-data-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Create some noisy data</a></span></li></ul></li><li><span><a href="#Train-iteratively-on-the-residuals-of-its-predecessor" data-toc-modified-id="Train-iteratively-on-the-residuals-of-its-predecessor-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Train iteratively on the residuals of its predecessor</a></span><ul class="toc-item"><li><span><a href="#Observe-how-the-regressor-gets-better" data-toc-modified-id="Observe-how-the-regressor-gets-better-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>Observe how the 
regressor gets better</a></span></li><li><span><a href="#Using-SciKit-learn's-Gradient-Boosting" data-toc-modified-id="Using-SciKit-learn's-Gradient-Boosting-2.3.2"><span class="toc-item-num">2.3.2&nbsp;&nbsp;</span>Using SciKit-learn's Gradient Boosting</a></span><ul class="toc-item"><li><span><a href="#Comparing-gradient-boosting-with-many-estimators" data-toc-modified-id="Comparing-gradient-boosting-with-many-estimators-2.3.2.1"><span class="toc-item-num">2.3.2.1&nbsp;&nbsp;</span>Comparing gradient boosting with many estimators</a></span></li></ul></li><li><span><a href="#Early-Stopping" data-toc-modified-id="Early-Stopping-2.3.3"><span class="toc-item-num">2.3.3&nbsp;&nbsp;</span>Early Stopping</a></span></li></ul></li></ul></li></ul></div>

# AdaBoost (Adaptive Boosting)

## Algorithm

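- Train a weak learner on the data
- Inflate the errors
    + Increase the weight of the misclassified points until they carry the same total weight as the correctly classified points (a 50-50 split)
- Train the next weak learner on the reweighted data (repeat)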
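
The reweighting step can be sketched as follows. This is a minimal illustration, assuming a depth-1 decision tree (a "stump") as the weak learner and a toy dataset from `make_classification`; both are choices made for the example and the `_demo` names are hypothetical. After one round, the misclassified points are scaled up so that their total weight matches that of the correctly classified points.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical; any binary classification set would do)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

# Start with equal weights and train a weak learner (a stump)
weights = np.ones(len(y_demo))
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_demo, y_demo, sample_weight=weights)

# Find the points the stump got wrong
wrong = stump.predict(X_demo) != y_demo
correct_total = weights[~wrong].sum()
wrong_total = weights[wrong].sum()

# Inflate the errors so both groups carry the same total weight (50-50)
if wrong_total > 0:  # guard against a perfect stump
    weights[wrong] *= correct_total / wrong_total

# The next weak learner is trained on the reweighted data
stump2 = DecisionTreeClassifier(max_depth=1, random_state=0)
stump2.fit(X_demo, y_demo, sample_weight=weights)
```

Because the second stump sees the earlier mistakes at a much higher weight, it is pushed toward a split that gets those points right.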

![](images/adaboost.png)

## Voting

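Combine the weak learners into one strong learner.

+ Each weak learner splits the plane into areas. Where the areas of several learners cross, add up the weights $y$ calculated for those learners, counting a weight as negative where its learner says "not blue" (or whatever the negative class is). The sign of the sum is the vote for that crossed area.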
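
A weighted vote can be sketched like this. The notes do not spell out how each learner's weight is computed, so the sketch assumes one common choice, $\ln(\text{accuracy} / (1 - \text{accuracy}))$. The three learners, their accuracies, and their predictions are made up for illustration.

```python
import numpy as np

# Hypothetical accuracies of three weak learners (made-up numbers)
accuracies = np.array([0.7, 0.6, 0.8])

# Assumed weight formula: ln(accuracy / (1 - accuracy))
learner_weights = np.log(accuracies / (1 - accuracies))

# Predictions of each learner for four points: +1 = "blue", -1 = "not blue"
predictions = np.array([
    [+1, +1, -1, -1],   # learner 1
    [+1, -1, -1, +1],   # learner 2
    [-1, +1, -1, +1],   # learner 3
])

# Each learner adds its weight where it says "blue" and subtracts it otherwise
scores = learner_weights @ predictions

# The sign of the combined score is the ensemble's vote
ensemble_vote = np.sign(scores).astype(int)
print(ensemble_vote)
```

For the first point, two learners say "blue" but the most accurate learner says "not blue", and under this weighting its single vote outweighs the other two.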

## Code

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

```python
from sklearn.ensemble import AdaBoostClassifier

# x_train, y_train and x_test are assumed to be defined already
model = AdaBoostClassifier()
model.fit(x_train, y_train)
model.predict(x_test)
```

### Hyperparameters

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# scikit-learn >= 1.2 calls this argument `estimator`; older releases use `base_estimator`
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=4), n_estimators=5)
```

- `estimator` (`base_estimator` before scikit-learn 1.2): The model used for the weak learners (Warning: don't forget to import the model you choose as the weak learner).
- `n_estimators`: The maximum number of weak learners used.
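
To try these hyperparameters end to end, here is a small self-contained sketch. The dataset comes from `make_classification` and the ensemble sizes are arbitrary choices for illustration. The weak learner is passed positionally because the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2, and the positional form works with either name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical), split into train and test sets
X_toy, y_toy = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X_toy, y_toy, random_state=0)

# Compare a few ensemble sizes with the same depth-4 weak learner
scores = {}
for n in (1, 5, 50):
    model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=4), n_estimators=n, random_state=0)
    model.fit(x_train, y_train)
    scores[n] = model.score(x_test, y_test)   # accuracy on the held-out set
    print(n, scores[n])
```

More weak learners do not always mean a better test score, so it is worth checking a few values rather than assuming the largest wins.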

# Gradient Boosting

Build the model in stages, using gradient descent on a loss function to decide what each new stage should correct.

![](images/gradient_boosting_residuals.png)

## Algorithm

The goal is to minimize the mean squared error (MSE); the minimization is done by gradient descent.

Any pattern left in the residuals is something the current model has missed, so fit a new model to the residuals and use it to build an even better model.

1. Fit a model to the data, $F_1(x) \approx y$
2. Fit a model to the residuals, $h_1(x) \approx y - F_1(x)$
3. Create a new model, $F_2(x) = F_1(x) + h_1(x)$
4. Repeat with the residuals of the new model, $y - F_2(x)$
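
The link between the residuals and gradient descent can be made explicit. Taking the squared-error loss with a factor of $\tfrac{1}{2}$ (a convention assumed here only to keep the derivative clean), the residual is the negative gradient of the loss with respect to the current prediction:

$$
L\big(y, F_1(x)\big) = \tfrac{1}{2}\,\big(y - F_1(x)\big)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F_1(x)} = y - F_1(x)
$$

So fitting $h_1$ to the residuals and adding it to $F_1$ amounts to one gradient descent step, taken on the predictions rather than on model parameters.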


## Code

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

> Parts adapted from https://github.com/ageron/handson-ml/blob/master/07_ensemble_learning_and_random_forests.ipynb

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

seed = 42
np.random.seed(seed)

### Create some noisy data

In [None]:
n = 200
X = np.random.rand(n, 1) - 0.5
y = 3*X[:, 0]**2 + 0.05 * np.random.randn(n)

plt.scatter(X,y,alpha=0.3)

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)

## Train iteratively on the residuals of its predecessor

In [None]:
from sklearn.tree import DecisionTreeRegressor

In [None]:
# First iteration
tree_reg1 = DecisionTreeRegressor(max_depth=2, random_state=seed)
tree_reg1.fit(X, y)

In [None]:
# Second iteration: the new target is the residual error of the first tree
y2 = y - tree_reg1.predict(X)
y2

In [None]:
tree_reg2 = DecisionTreeRegressor(max_depth=2, random_state=seed)
tree_reg2.fit(X, y2)

In [None]:
# Third iteration: the target is what the first two trees still leave unexplained
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2, random_state=seed)
tree_reg3.fit(X, y3);
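
A quick numeric check of the three stages, as a self-contained sketch. It regenerates noisy quadratic data of the same form (the exact values may differ from the cells above), and the loop is a compact rewrite of the three manual iterations. The `_demo` names and `mse_per_stage` are introduced here only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X_demo = rng.rand(200, 1) - 0.5
y_demo = 3 * X_demo[:, 0] ** 2 + 0.05 * rng.randn(200)

trees = []
residual = y_demo.copy()               # the first tree is fit on y itself
ensemble_pred = np.zeros_like(y_demo)
mse_per_stage = []

for stage in range(3):
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_demo, residual)
    trees.append(tree)
    stage_pred = tree.predict(X_demo)
    ensemble_pred += stage_pred        # ensemble = sum of all trees so far
    residual -= stage_pred             # what is still unexplained
    mse_per_stage.append(mean_squared_error(y_demo, ensemble_pred))

print(mse_per_stage)
```

The training MSE cannot go up from one stage to the next, because each tree predicts the mean residual in each of its leaves.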

### Observe how the regressor gets better

In [None]:
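def plot_predictions(regressors, X, y, axes, label=None, style="r-", data_style="b.", data_label=None):
    # axes = [xmin, xmax, ymin, ymax]; evaluate the ensemble on a dense grid over the x-range
    x1 = np.linspace(axes[0], axes[1], 500)
    # The ensemble prediction is the sum of the individual regressors' predictions
    y_pred = sum(regressor.predict(x1.reshape(-1, 1)) for regressor in regressors)
    plt.plot(X[:, 0], y, data_style, label=data_label)     # data points (or residuals)
    plt.plot(x1, y_pred, style, linewidth=2, label=label)  # prediction curve
    if label or data_label:
        plt.legend(loc="upper center", fontsize=16)
    plt.axis(axes)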

In [None]:
plt.figure(figsize=(11,11))

# First iteration
plt.subplot(321)
plot_predictions([tree_reg1], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="$h_1(x_1)$", style="g-", data_label="Training set")
plt.ylabel("$y$", fontsize=16, rotation=0)
plt.title("Residuals and tree predictions", fontsize=16)

plt.subplot(322)
plot_predictions([tree_reg1], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="$h(x_1) = h_1(x_1)$", data_label="Training set")
plt.ylabel("$y$", fontsize=16, rotation=0)
plt.title("Ensemble predictions", fontsize=16)


# Second iteration
plt.subplot(323)
plot_predictions([tree_reg2], X, y2, axes=[-0.5, 0.5, -0.5, 0.5], label="$h_2(x_1)$", style="g-", data_style="k+", data_label="Residuals")
plt.ylabel("$y - h_1(x_1)$", fontsize=16)

plt.subplot(324)
plot_predictions([tree_reg1, tree_reg2], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="$h(x_1) = h_1(x_1) + h_2(x_1)$")
plt.ylabel("$y$", fontsize=16, rotation=0)


# Third iteration
plt.subplot(325)
plot_predictions([tree_reg3], X, y3, axes=[-0.5, 0.5, -0.5, 0.5], label="$h_3(x_1)$", style="g-", data_style="k+")
plt.ylabel("$y - h_1(x_1) - h_2(x_1)$", fontsize=16)
plt.xlabel("$x_1$", fontsize=16)

plt.subplot(326)
plot_predictions([tree_reg1, tree_reg2, tree_reg3], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="$h(x_1) = h_1(x_1) + h_2(x_1) + h_3(x_1)$")
plt.xlabel("$x_1$", fontsize=16)
plt.ylabel("$y$", fontsize=16, rotation=0)

plt.show()

### Using SciKit-learn's Gradient Boosting

In [None]:
from sklearn.ensemble import GradientBoostingRegressor

In [None]:
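# Same setup as the three manual trees: depth 2, three stages, no shrinkage (learning_rate=1.0)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0, random_state=seed)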
gbrt.fit(X, y)
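
With `learning_rate=1.0` and three depth-2 trees, this model should reproduce the manual three-tree ensemble. The sketch below checks that on regenerated data of the same form. It assumes the default squared-error setup, where scikit-learn starts from the mean of `y` and then fits each tree to the current residuals. The `_demo` names and `max_gap` are introduced here only for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X_demo = rng.rand(200, 1) - 0.5
y_demo = 3 * X_demo[:, 0] ** 2 + 0.05 * rng.randn(200)

# Manual ensemble: three depth-2 trees, each fit on the previous residuals
manual_pred = np.zeros_like(y_demo)
residual = y_demo.copy()
for _ in range(3):
    tree = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X_demo, residual)
    manual_pred += tree.predict(X_demo)
    residual = y_demo - manual_pred

# Same settings in scikit-learn; learning_rate=1.0 means no shrinkage
gbrt_demo = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0, random_state=42)
gbrt_demo.fit(X_demo, y_demo)

# Largest disagreement between the two sets of predictions
max_gap = np.max(np.abs(manual_pred - gbrt_demo.predict(X_demo)))
print(max_gap)
```

If the two procedures match, the gap should be at the level of floating-point noise.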

#### Comparing gradient boosting with many estimators 

In [None]:
gbrt_slow = GradientBoostingRegressor(max_depth=2, n_estimators=200, learning_rate=0.1, random_state=seed)
gbrt_slow.fit(X, y)

In [None]:
plt.figure(figsize=(11,4))

plt.subplot(121)
plot_predictions([gbrt], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="Ensemble predictions")
plt.title("learning_rate={}, n_estimators={}".format(gbrt.learning_rate, gbrt.n_estimators), fontsize=14)

plt.subplot(122)
plot_predictions([gbrt_slow], X, y, axes=[-0.5, 0.5, -0.1, 0.8])
plt.title("learning_rate={}, n_estimators={}".format(gbrt_slow.learning_rate, gbrt_slow.n_estimators), fontsize=14)

plt.show()
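
The plots compare the two models by eye; a held-out error gives a number to go with them. A sketch, assuming data of the same form and a default `train_test_split`. The `settings` labels and the `_demo`, `_tr`, `_te` names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X_demo = rng.rand(200, 1) - 0.5
y_demo = 3 * X_demo[:, 0] ** 2 + 0.05 * rng.randn(200)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

settings = {
    "few trees, no shrinkage": dict(n_estimators=3, learning_rate=1.0),
    "many trees, shrinkage": dict(n_estimators=200, learning_rate=0.1),
}
results = {}
for name, params in settings.items():
    model = GradientBoostingRegressor(max_depth=2, random_state=42, **params)
    model.fit(X_tr, y_tr)
    results[name] = (
        mean_squared_error(y_tr, model.predict(X_tr)),   # training error
        mean_squared_error(y_te, model.predict(X_te)),   # held-out error
    )
    print(name, results[name])
```

A held-out error well above the training error for the 200-tree model would be a sign that it has started to fit the noise.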

### Early Stopping

> https://nbviewer.jupyter.org/github/ageron/handson-ml/blob/master/07_ensemble_learning_and_random_forests.ipynb#Gradient-Boosting-with-Early-stopping
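
One way to do early stopping is sketched below: train a generous number of trees once, measure the validation error after each stage with `staged_predict`, and refit with the best count. The data, the 120-tree budget, and the variable names are assumptions for illustration; the linked notebook may do it differently.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X_demo = rng.rand(200, 1) - 0.5
y_demo = 3 * X_demo[:, 0] ** 2 + 0.05 * rng.randn(200)
X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, random_state=42)

# Train a generous number of trees once
gbrt_full = GradientBoostingRegressor(max_depth=2, n_estimators=120, random_state=42)
gbrt_full.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ... trees
val_errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt_full.staged_predict(X_val)]
best_n = int(np.argmin(val_errors)) + 1

# Refit with only as many trees as the validation set supports
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_n, random_state=42)
gbrt_best.fit(X_tr, y_tr)
print(best_n, val_errors[best_n - 1])
```

scikit-learn also has built-in early stopping for this estimator through the `n_iter_no_change` and `validation_fraction` parameters, which avoids training the full budget of trees first.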