# AdaBoost

- <b>Boosting</b>: ensemble method combining several weak learners to form a strong learner
- <b>Weak Learner</b>: a model that does only slightly better than random guessing, e.g. a decision stump (a decision tree with maximum depth = 1)

### AdaBoost and GradientBoosting
- train an ensemble of predictors sequentially
- each predictor tries to correct its predecessor
- In AdaBoost (Adaptive Boosting), each predictor pays more attention to the instances wrongly predicted by its predecessor. It achieves this by increasing the weights of those training instances
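
The reweighting step can be sketched in a few lines of NumPy. This is only an illustration of the classic discrete AdaBoost update for binary labels, not scikit-learn's internal code; the function name `adaboost_round` and the toy arrays are made up for this example, and the weighted error is assumed to lie strictly between 0 and 0.5.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred, learning_rate=1.0):
    """One round of the classic discrete AdaBoost reweighting (binary labels).

    Hypothetical helper for illustration; assumes 0 < weighted error < 0.5.
    """
    miss = (y_true != y_pred)                        # instances the predictor got wrong
    err = np.sum(weights[miss]) / np.sum(weights)    # weighted error rate
    alpha = learning_rate * np.log((1 - err) / err)  # say of this predictor in the ensemble
    new_weights = weights * np.exp(alpha * miss)     # boost only the misclassified instances
    return new_weights / new_weights.sum(), alpha    # renormalise so the weights sum to 1

# toy data: 5 equally weighted instances, the last one is misclassified
w = np.full(5, 0.2)
y_true = np.array([1, 1, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0])

new_w, alpha = adaboost_round(w, y_true, y_pred)
print(new_w)   # the misclassified instance now carries half of the total weight
print(alpha)
```

The next predictor is then trained with `new_w` as sample weights, so it concentrates on the instance its predecessor missed. A smaller `learning_rate` shrinks `alpha` and makes the reweighting gentler.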


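AdaBoost has a learning rate (0 < learning_rate <= 1) that shrinks the contribution of each predictor; a smaller learning rate usually has to be compensated by a larger number of estimators.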


In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.ensemble import AdaBoostClassifier, GradientBoostingRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import roc_auc_score, mean_squared_error as MSE
from sklearn.model_selection import train_test_split

sns.set()

In [3]:
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, stratify=y)

In [4]:
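# weak learner: a decision stump
dt = DecisionTreeClassifier(max_depth=1)
# `estimator` replaces `base_estimator` (deprecated in scikit-learn 1.2, removed in 1.4)
adaboost = AdaBoostClassifier(estimator=dt, n_estimators=100)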

adaboost.fit(X_train, y_train)

y_pred_proba = adaboost.predict_proba(X_test)[:, 1]

# evaluate test set roc_auc_score
adaboost_roc_auc = roc_auc_score(y_test, y_pred_proba)

print(f'ROC AUC score: {adaboost_roc_auc}')

ROC AUC score: 0.9811623831775701


# GradientBoosting
- sequential correction of predecessor's errors
- doesn't tweak the weights of training instances (unlike AdaBoost)
- each predictor is trained using its predecessor's residual errors as labels
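
To make the "residuals as labels" idea concrete, here is a minimal hand-rolled sketch of gradient boosting for squared-error loss. It is an assumption-laden illustration, not how `GradientBoostingRegressor` is implemented internally: the synthetic data (`X_demo`, `y_demo`), the learning rate of 0.1 and the 100 rounds are all chosen only for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# synthetic 1-D regression problem (illustrative data, not auto_mpg)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
# start from a constant prediction: the mean of the targets
prediction = np.full_like(y_demo, y_demo.mean())
initial_mse = np.mean((y_demo - prediction) ** 2)

stumps = []
for _ in range(100):
    residuals = y_demo - prediction                   # errors of the ensemble so far
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X_demo, residuals)                      # residuals are the new labels
    prediction += learning_rate * stump.predict(X_demo)  # shrunken correction
    stumps.append(stump)

final_mse = np.mean((y_demo - prediction) ** 2)
print(f'training MSE before boosting: {initial_mse}')
print(f'training MSE after boosting:  {final_mse}')
```

Note that the sample weights never change here; only the targets do, which is the key difference from AdaBoost.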


In [5]:
auto_mpg = pd.read_csv('datasets/auto_mpg.csv')
auto_mpg = pd.get_dummies(auto_mpg, columns=['origin'])
X, y = auto_mpg.drop('mpg', axis=1), auto_mpg['mpg']

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

# 300 decision stumps, each one fit to the residuals of the ensemble so far
gradient = GradientBoostingRegressor(n_estimators=300, max_depth=1)
gradient.fit(X_train, y_train)
y_pred = gradient.predict(X_test)
mse = MSE(y_test, y_pred)
print(f'MSE: {mse}')
print(f'RMSE: {np.sqrt(mse)}')
# score() returns the coefficient of determination R^2 on the test set
print(f'Test score: {gradient.score(X_test, y_test)}')

MSE: 16.866683838572914
RMSE: 4.106906845616652
Test score: 0.7189088575104001


# Stochastic Gradient Boosting (SGB) 
- each tree is trained on a random subset of rows of the training data
- the subset (typically 40-80% of the training set) is sampled without replacement
- features are sampled without replacement when choosing split points
- result: further ensemble diversity
- effect: adding further variance to the ensemble of trees
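
The two sampling steps can be illustrated with plain NumPy. This is a sketch under assumed sizes (100 rows, 10 features) and assumed fractions (`subsample = 0.8`, `max_features = 0.2`), meant only to show what "without replacement" means here; in scikit-learn the rows are drawn once per tree, while the candidate features are redrawn at every split.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 100, 10     # assumed sizes, for illustration only
subsample, max_features = 0.8, 0.2  # fractions of rows and of features

# rows used to train one tree: 80% of the training set, no row drawn twice
row_idx = rng.choice(n_samples, size=int(subsample * n_samples), replace=False)

# features considered at one split: 20% of the features, no feature drawn twice
feat_idx = rng.choice(n_features, size=max(1, int(max_features * n_features)), replace=False)

print(len(row_idx), len(np.unique(row_idx)))  # same number: every row is unique
print(feat_idx)
```

Because each tree sees a different subset of rows and each split sees a different subset of features, the trees differ more from one another than in plain gradient boosting.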


Now define a stochastic gradient boosting regressor named `sgbt` consisting of 200 decision stumps. This is done by setting the parameter `max_depth` to 1 and `n_estimators` to 200. The parameter `subsample` is set to 0.8 so that each tree is trained on a random 80% of the training rows. The parameter `max_features` is set to 0.2 so that only 20% of the available features are considered when searching for the best split. Once done, fit `sgbt` to the training set and predict the test set labels.

In [7]:
# instantiate a stochastic gradient boosting regressor
sgbt = GradientBoostingRegressor(max_depth=1, subsample=.8, max_features=.2, n_estimators=200)
sgbt.fit(X_train, y_train)

y_pred = sgbt.predict(X_test)
mse = MSE(y_test, y_pred)
print(f'MSE: {mse}')
print(f'RMSE: {np.sqrt(mse)}')
# score() returns the coefficient of determination R^2 on the test set
print(f'Test score: {sgbt.score(X_test, y_test)}')

MSE: 18.210396357928435
RMSE: 4.267364099526596
Test score: 0.6965152624885123
