## Boosting 

#### Table of Contents

- [Preliminaries](#Preliminaries)
- [AdaBoost](#AdaBoost)
- [Null Model](#Null-Model)
- [Manual Gradient Boosting](#Manual-Gradient-Boosting)
- [Gradient Boosting](#Gradient-Boosting)
- [Extreme Gradient Boosting](#Extreme-Gradient-Boosting)


Today, we shall predict `pct_d_rgdp`.
We will fit 

1. an AdaBoost model
2. a manual gradient boosting model
3. `skearn`'s gradient boosting model
4. and finally `xgboost`'s EXTREME!!!!!!!!!! gradient boosting model


We are performing a regression problem.

```
conda install -c conda-forge xgboost
```
We have been using RMSE, MSE, and MAE to evaluate the performance of regression problems. 
Now we are going to use $R^2$ to take advantage of `sklearn`'s `.score()` method.

HOWEVER! Let's grab our metric functions from our helper script so I can make a point!

In [9]:
%run metrics.py

In [4]:
# utitlties
import pandas as pd
import numpy as np

#processing
from sklearn.metrics import plot_confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# algorithms
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
import xgboost as xgb

# plotting
import matplotlib.pyplot as plt
import seaborn as sns

In [6]:
df = pd.read_pickle('C:/Users/hubst/Econ490_group/class_data.pkl')
df = df.drop(columns = ['urate_bin', 'GeoName']).join([
    pd.get_dummies(df['urate_bin'], drop_first = True)
])

In [7]:
y = df['pct_d_rgdp']
x = df.drop(columns = 'pct_d_rgdp')
x_train, x_test, y_train, y_test = train_test_split(x, y, 
                                                    train_size = 2/3, 
                                                    random_state = 490)

*****
# Null Model
[TOP](#Boosting)

In [10]:
r2_null = r2(np.mean(y_train), y_test)

In [11]:
rmse_null = rmse(np.mean(y_train), y_test)

***************
# AdaBoost
[TOP](#Boosting)

In [18]:
reg_ada = AdaBoostRegressor(base_estimator = DecisionTreeRegressor(max_depth = 1),
                           n_estimators = 200,
                           learning_rate = 0.5)
reg_ada.fit(x_train, y_train)

AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=1),
                  learning_rate=0.5, n_estimators=200)

Let's see how we did!

In [19]:
r2_ada = reg_ada.score(x_test, y_test)
r2_ada

-0.7789815954703778

What? A negative $R^2$?!

In [20]:
rmse_ada = rmse(reg_ada.predict(x_test), y_test)
rmse_ada

12.397708716312868

In [21]:
rmse_null

9.295172646932564

In [22]:
r2_null

-8.104385480267595e-06

We have overfit our training data. 
Perhaps we should take a look at some good ol' cross-validation.

In [24]:
%%time
param_grid = {
    'n_estimators': [15, 25, 50, 75],
    'learning_rate': 10.**np.arange(-6, -2)
}

ada_cv = AdaBoostRegressor(base_estimator = DecisionTreeRegressor(max_depth = 1),
                           random_state = 490)
grid_search = GridSearchCV(ada_cv, param_grid,
                          cv = 5,
                          scoring = 'r2',
                          n_jobs = 2).fit(x_train, y_train)

best = grid_search.best_params_
best

Wall time: 1min 1s


{'learning_rate': 0.001, 'n_estimators': 25}

In [26]:
reg_ada = AdaBoostRegressor(base_estimator = DecisionTreeRegressor(max_depth = 1),
                           random_state = 490,
                           n_estimators = best['n_estimators'],
                           learning_rate = best['learning_rate'])
reg_ada.fit(x_train, y_train)

r2_ada = reg_ada.score(x_test, y_test)
r2_ada

0.0006505586486580395

In [7]:
for i in range(0, 200, 25):
    print (i)

0
25
50
75
100
125
150
175


We explained 1% of the variation in the test data... Yikes...

*****
# Manual Gradient Boosting
[TOP](#Boosting)

Our textbook lays out how to manually fit a gradient descent problem. 
Since the mid-semester feedback told me most students are not reading the textbook, let's demonstrate how to do it.

In [27]:
reg1 = DecisionTreeRegressor(max_depth = 2).fit(x_train, y_train)
y_train2 = y_train - reg1.predict(x_train)

reg2 = DecisionTreeRegressor(max_depth = 2).fit(x_train, y_train2)
y_train3 = y_train - reg2.predict(x_train)

reg3 = DecisionTreeRegressor(max_depth = 2).fit(x_train, y_train3)

yhat = sum(reg.predict(x_test) for reg in (reg1, reg2, reg3))
r2_manual =  r2(yhat, y_test)
r2_manual

-0.03345984800204671

Better than AdaBoost...

********
# Gradient Boosting
[TOP](#Boosting)

Fortunately, unlike `AdaBoostRegressor()`, `GradientBoostingRegressor()` has early stopping. 

This means cross-validation is not necessary!! WOOO!!!!!

In [29]:
reg_gb = GradientBoostingRegressor(n_estimators = 200,
                                  max_depth = 2,
                                  learning_rate = 0.1,
                                  validation_fraction = 1/8,
                                  n_iter_no_change = 4,
                                  verbose = 2)
reg_gb.fit(x_train, y_train)

      Iter       Train Loss   Remaining Time 
         1          83.4084           11.94s
         2          83.0673           11.49s
         3          82.7788           11.23s
         4          82.4451           10.98s
         5          82.1454           10.92s
         6          81.9084           10.80s
         7          81.7115           10.67s
         8          81.4859           10.59s
         9          81.2805           10.53s
        10          81.1250           10.45s
        11          80.9172           10.36s
        12          80.7497           10.28s
        13          80.6237           10.22s
        14          80.4921           10.17s
        15          80.3611           10.10s
        16          80.2315           10.05s
        17          80.1085           10.00s
        18          79.9921            9.95s
        19          79.8975            9.89s
        20          79.7921            9.84s
        21          79.7015            9.80s
        2

GradientBoostingRegressor(max_depth=2, n_estimators=200, n_iter_no_change=4,
                          validation_fraction=0.125, verbose=2)

Let's see how we did!

In [30]:
r2_gb = reg_gb.score(x_test, y_test)
r2_gb

0.026609031992107846

Relatively speaking, not to shabby.
Not super exciting either.

*****
# Extreme Gradient Boosting
[TOP](#Boosting)

Now we get to test to see if extreme gradient boosting is all that it is made out to be!

In [31]:
x_train_train, x_train_test, y_train_train, y_train_test = train_test_split(x_train, y_train,
                                                                           train_size = 4/5,
                                                                           random_state = 490)

In [33]:
reg_xgb = xgb.XGBRegressor(n_estimators = 200,
                          max_depth = 2,
                          learning = 0.1,
                          random_state = 490)

reg_xgb.fit(x_train_train, y_train_train,
           eval_set = [(x_train_test, y_train_test)],
           early_stopping_rounds = 4)

Parameters: { learning } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[0]	validation_0-rmse:8.99019
[1]	validation_0-rmse:8.94539
[2]	validation_0-rmse:8.92961
[3]	validation_0-rmse:8.92726
[4]	validation_0-rmse:8.89911
[5]	validation_0-rmse:8.91130
[6]	validation_0-rmse:8.91488
[7]	validation_0-rmse:8.91046
[8]	validation_0-rmse:8.92315


XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='', learning=0.1,
             learning_rate=0.300000012, max_delta_step=0, max_depth=2,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=200, n_jobs=12, num_parallel_tree=1, random_state=490,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

In [34]:
r2_xgb = reg_xgb.score(x_test, y_test)
r2_xgb

0.017345876922395753

Well, there you have it.

Let us sloppily print out these values for a conclusion.

In [35]:
print('R2 Null:', r2_null)
print('R2 AdaBoost:', r2_ada)
print('R2 Manual:', r2_manual)
print('R2 GradientBoosting:', r2_gb)
print('R2 XGBoost:', r2_xgb)

R2 Null: -8.104385480267595e-06
R2 AdaBoost: 0.0006505586486580395
R2 Manual: -0.03345984800204671
R2 GradientBoosting: 0.026609031992107846
R2 XGBoost: 0.017345876922395753
