### Boosting

- Trains an ensemble of predictors sequentially, with each predictor trying to correct the errors of its predecessor.

### AdaBoost - "Adaptive Boosting"

- Each predictor pays more attention to the instances its predecessor got wrong, by changing the weights of the training instances

Classification
- weighted majority voting via **AdaBoostClassifier**

Regression
- weighted average via **AdaBoostRegressor**
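
The weight change can be sketched for a single boosting round. This is a minimal illustration of textbook discrete AdaBoost with labels in {-1, +1}, not scikit-learn's exact implementation; the function name `adaboost_round` and the toy arrays are hypothetical.

```python
import numpy as np

def adaboost_round(y_true, y_pred, weights, learning_rate=1.0):
    """One discrete-AdaBoost round for labels in {-1, +1} (illustrative sketch).

    Returns the predictor weight (alpha) and the renormalized instance weights.
    """
    miss = y_true != y_pred
    # Weighted error rate of the current predictor
    err = np.sum(weights[miss]) / np.sum(weights)
    # Clip so the logarithm below stays finite
    err = np.clip(err, 1e-10, 1 - 1e-10)
    # Predictors with a lower error get a larger say in the final vote
    alpha = learning_rate * 0.5 * np.log((1 - err) / err)
    # Misclassified instances gain weight, correctly classified ones lose weight
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return alpha, new_weights / new_weights.sum()

# Toy data (assumed): five instances, the second one is misclassified
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])
weights = np.full(5, 1 / 5)

alpha, new_weights = adaboost_round(y_true, y_pred, weights)
print(alpha, new_weights)
```

The next predictor would then be trained with `new_weights`, so the misclassified instance counts for more than the others.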

In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

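SEED = 1
# X and y are assumed to be a feature matrix and binary labels loaded earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y,
                                                    random_state=SEED)

# A depth-1 decision tree (decision stump) serves as the base estimator
dt = DecisionTreeClassifier(max_depth=1, random_state=SEED)
# `estimator` replaces `base_estimator`, which was deprecated in scikit-learn 1.2
adb_clf = AdaBoostClassifier(estimator=dt, n_estimators=100, random_state=SEED)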

adb_clf.fit(X_train, y_train)

y_pred_proba = adb_clf.predict_proba(X_test)[:,1]

adb_clf_roc_auc_score = roc_auc_score(y_test, y_pred_proba)

print('ROC AUC Score: {:.2f}'.format(adb_clf_roc_auc_score))

### Gradient Boosting

- unlike AdaBoost, the weights of training instances are not tweaked.
- sequential correction of predecessor's errors
- each predictor is trained using its predecessor's residual errors as labels
- Gradient Boosted Trees: a CART is used as the base learner

- Classification: **GradientBoostingClassifier**
- Regression: **GradientBoostingRegressor**
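
The residual-fitting idea above can be sketched by hand for squared-error regression. This is a simplified illustration, not how `GradientBoostingRegressor` is implemented internally; the synthetic data (`X_demo`, `y_demo`), the learning rate, and the number of trees are assumptions chosen for the demo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumed for illustration)
rng = np.random.default_rng(1)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_trees = 50

# Start from a constant prediction: the mean of the targets
prediction = np.full_like(y_demo, y_demo.mean())
mse_start = np.mean((y_demo - prediction) ** 2)
trees = []

for _ in range(n_trees):
    # Residual errors of the current ensemble become the next tree's labels
    residuals = y_demo - prediction
    tree = DecisionTreeRegressor(max_depth=1, random_state=1)
    tree.fit(X_demo, residuals)
    # Add a shrunken version of the new tree's output to the ensemble
    prediction += learning_rate * tree.predict(X_demo)
    trees.append(tree)

mse_end = np.mean((y_demo - prediction) ** 2)
print('Training MSE before: {:.3f}, after: {:.3f}'.format(mse_start, mse_end))
```

Note that the instance weights are never touched: each new tree only sees the residuals left by the trees before it.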

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

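SEED = 1
# X and y are assumed to be a feature matrix and a continuous target loaded earlier.
# No stratify here: stratified splitting needs class labels, not a continuous target.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=SEED)

# The seed parameter is random_state (random_seed is not a valid argument)
gbt = GradientBoostingRegressor(n_estimators=300,
                                max_depth=1,
                                random_state=SEED)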

gbt.fit(X_train, y_train)
y_pred = gbt.predict(X_test)

rmse_test = MSE(y_test, y_pred)**(1/2)
print('Test set RMSE of gbt: {:.2f}'.format(rmse_test))


### Stochastic Gradient Boosting

- each tree is trained on a random subset of rows of the training data
- each subset contains 40-80% of the training instances, sampled without replacement
- features are sampled without replacement when choosing split points
- **this adds further diversity among the trees of the ensemble**
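
The two sampling steps above can be sketched with NumPy. This is a minimal illustration under assumed sizes (`n_rows`, `n_features`) and assumed fractions; in scikit-learn the corresponding knobs are the `subsample` and `max_features` parameters of the gradient boosting estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy dimensions and sampling fractions
n_rows, n_features = 10, 5
subsample = 0.8      # fraction of rows each tree is trained on
max_features = 0.4   # fraction of features considered at each split

n_sampled_rows = int(subsample * n_rows)
n_sampled_features = max(1, int(max_features * n_features))

# replace=False means no row or feature is drawn twice
row_idx = rng.choice(n_rows, size=n_sampled_rows, replace=False)
feature_idx = rng.choice(n_features, size=n_sampled_features, replace=False)

print('Rows used by this tree:', np.sort(row_idx))
print('Features considered at this split:', np.sort(feature_idx))
```

Rows are drawn once per tree, while features are drawn again at every split, so two trees rarely see exactly the same data.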

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

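SEED = 1
# X and y are assumed to be a feature matrix and a continuous target loaded earlier.
# No stratify here: stratified splitting needs class labels, not a continuous target.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=SEED)

# subsample: fraction of rows drawn (without replacement) for each tree
# max_features: fraction of features considered at each split; the original
# left the value out, so 0.2 here is only an illustrative assumption
sgbt = GradientBoostingRegressor(max_depth=1,
                                 subsample=0.8,
                                 max_features=0.2,
                                 n_estimators=300,
                                 random_state=SEED)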

sgbt.fit(X_train, y_train)
y_pred = sgbt.predict(X_test)

rmse_test = MSE(y_test, y_pred)**(1/2)
print('Test set RMSE: {:.2f}'.format(rmse_test))