### Building a gradient boosting model from scratch

In [8]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [9]:
df_bikes = pd.read_csv('../data/bike_rentals_cleaned.csv')
df_bikes.head()

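| Unnamed: 0 | instant | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 0.0 | 1 | 0.0 | 6.0 | 0.0 | 2 | 0.344167 | 0.363625 | 0.805833 | 0.160446 | 985 |
| 1 | 2 | 1.0 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 2 | 0.363478 | 0.353739 | 0.696087 | 0.248539 | 801 |
| 2 | 3 | 1.0 | 0.0 | 1 | 0.0 | 1.0 | 1.0 | 1 | 0.196364 | 0.189405 | 0.437273 | 0.248309 | 1349 |
| 3 | 4 | 1.0 | 0.0 | 1 | 0.0 | 2.0 | 1.0 | 1 | 0.2 | 0.212122 | 0.590435 | 0.160296 | 1562 |
| 4 | 5 | 1.0 | 0.0 | 1 | 0.0 | 3.0 | 1.0 | 1 | 0.226957 | 0.22927 | 0.436957 | 0.1869 | 1600 |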


Split the data into features X (every column except the last) and target y (the last column, cnt), then split both into training and test sets.

In [10]:
X_bikes = df_bikes.iloc[:,:-1]
y_bikes = df_bikes.iloc[:, -1]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_bikes, y_bikes, random_state=2)
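
As a small illustration of what the split above does, here is a sketch on a hypothetical toy frame (the column values and the `_toy` names are made up for this example). It shows that `iloc[:, :-1]` keeps every column but the last, and that `train_test_split` holds out 25% of the rows when no `test_size` is given.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame; the last column plays the role of `cnt`.
df_toy = pd.DataFrame({
    'temp': [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    'hum':  [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
    'cnt':  [100, 150, 200, 250, 300, 350, 400, 450],
})

X_toy = df_toy.iloc[:, :-1]   # every column except the last
y_toy = df_toy.iloc[:, -1]    # the last column only

# With no test_size given, train_test_split holds out 25% of the rows.
X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, random_state=2)

print(list(X_toy.columns))                  # ['temp', 'hum']
print(X_toy_train.shape, X_toy_test.shape)  # (6, 2) (2, 2)
```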

#### Steps to build a gradient boosting model:
* Fit a decision tree to the data. It can be a stump (max_depth=1) or a shallow tree of depth 2 or 3. This initial decision tree is the base learner.
* Initialize a decision tree with max_depth=2 and fit it on the training set as tree_1.

In [11]:
from sklearn.tree import DecisionTreeRegressor

tree_1 = DecisionTreeRegressor(max_depth=2, random_state=2)
tree_1.fit(X_train, y_train)

* Make predictions on the training set rather than the test set, because the residuals are computed while still in the training phase.

In [12]:
y_train_pred = tree_1.predict(X_train)

* Compute the residuals (the target column minus the predictions).


In [13]:
# The residuals are named y2_train because they are the training target for the next tree
y2_train = y_train - y_train_pred

* Fit a new tree on the residuals.

In [16]:
tree_2 = DecisionTreeRegressor(max_depth=2, random_state=2)
tree_2.fit(X_train, y2_train)

* Repeat the steps. As the process continues, the training residuals should gradually approach 0.

In [17]:
y2_train_pred = tree_2.predict(X_train)
y3_train = y2_train - y2_train_pred
tree_3 = DecisionTreeRegressor(max_depth=2, random_state=2)
tree_3.fit(X_train, y3_train)

Sum the predictions of all three trees on the test set.

In [19]:
y1_pred = tree_1.predict(X_test)
y2_pred = tree_2.predict(X_test)
y3_pred = tree_3.predict(X_test)

y_pred = y1_pred + y2_pred + y3_pred
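
The fit-on-residuals and sum-the-predictions steps above can be written as a loop for any number of rounds. The following is a minimal sketch, not the notebook's own code: it runs on synthetic data from `make_regression` (an assumption, so it does not touch the bike rentals variables), and the helper names `fit_boosted_trees` and `predict_boosted_trees` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the bike rentals set).
X_syn, y_syn = make_regression(n_samples=400, n_features=5, noise=10.0,
                               random_state=2)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X_syn, y_syn,
                                                        random_state=2)

def fit_boosted_trees(X, y, n_rounds=3, max_depth=2):
    """Fit each new tree on the residuals left by the trees before it."""
    trees, train_rmse = [], []
    residuals = np.asarray(y, dtype=float).copy()
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=2)
        tree.fit(X, residuals)
        residuals = residuals - tree.predict(X)   # what is still unexplained
        trees.append(tree)
        train_rmse.append(float(np.sqrt(np.mean(residuals ** 2))))
    return trees, train_rmse

def predict_boosted_trees(trees, X):
    """Sum the predictions of all trees, as in y1_pred + y2_pred + y3_pred."""
    return np.sum([tree.predict(X) for tree in trees], axis=0)

trees, train_rmse = fit_boosted_trees(Xs_train, ys_train, n_rounds=10)
ys_pred = predict_boosted_trees(trees, Xs_test)
print(train_rmse[0], train_rmse[-1])   # training RMSE after round 1 and round 10
```

With `n_rounds=3` this mirrors tree_1, tree_2 and tree_3. The training RMSE cannot increase from one round to the next, because each tree predicts the mean residual in each of its leaves.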

Compute the root mean squared error (RMSE), which is the square root of the mean squared error (MSE).

In [21]:
from sklearn.metrics import mean_squared_error as MSE
MSE(y_test, y_pred)**0.5

911.0479538776444
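
Because the cell above raises the MSE to the power 0.5, the reported number is an RMSE, expressed in the same units as the target. A small worked example with made-up numbers shows the arithmetic:

```python
import numpy as np

# Made-up values for illustration only.
y_true_toy = np.array([100.0, 200.0, 300.0, 400.0])
y_hat_toy = np.array([99.0, 201.0, 293.0, 407.0])   # errors: 1, -1, 7, -7

mse_toy = np.mean((y_true_toy - y_hat_toy) ** 2)    # (1 + 1 + 49 + 49) / 4 = 25.0
rmse_toy = mse_toy ** 0.5                           # 5.0, same units as the target
mae_toy = np.mean(np.abs(y_true_toy - y_hat_toy))   # 4.0, for comparison

print(mse_toy, rmse_toy, mae_toy)                   # 25.0 5.0 4.0
```

The RMSE (5.0) is larger than the mean absolute error (4.0) here because squaring gives the two large errors more weight.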

### Use scikit-learn GradientBoostingRegressor

In [22]:
from sklearn.ensemble import GradientBoostingRegressor

In [23]:
gbr = GradientBoostingRegressor(max_depth=2, n_estimators=3, random_state=2, learning_rate=1.0)

gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)
MSE(y_test, y_pred)**0.5

911.0479538776439
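
The two scores agree to about twelve significant digits. A likely explanation is that GradientBoostingRegressor, with its default squared-error loss, starts from a constant prediction (the mean of the training target) and then adds `learning_rate` times each tree's prediction. A tree fit to `y` and a tree fit to `y` minus its mean should choose the same splits and differ only by that constant in their leaf values, so at `learning_rate=1.0` the sums coincide. The sketch below rebuilds a fitted model's prediction by hand on synthetic data (an assumption; `gbr_demo` and `manual` are names introduced here).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (an assumption; not the bike rentals set).
X_syn, y_syn = make_regression(n_samples=300, n_features=4, noise=5.0,
                               random_state=2)

gbr_demo = GradientBoostingRegressor(max_depth=2, n_estimators=3,
                                     learning_rate=1.0, random_state=2)
gbr_demo.fit(X_syn, y_syn)

# Stage 0: a constant prediction, the mean of the training target.
manual = gbr_demo.init_.predict(X_syn)

# Stages 1..3: each fitted tree adds learning_rate * its prediction.
for stage in gbr_demo.estimators_:       # array of shape (n_estimators, 1)
    manual = manual + gbr_demo.learning_rate * stage[0].predict(X_syn)

print(np.allclose(manual, gbr_demo.predict(X_syn)))
```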

Changing n_estimators changes the number of boosting iterations, that is, the number of trees in the ensemble.

In [24]:
gbr = GradientBoostingRegressor(max_depth=2, n_estimators=30, random_state=2, learning_rate=1.0)

gbr.fit(X_train, y_train)

y_pred = gbr.predict(X_test)
MSE(y_test, y_pred)**0.5

857.1072323426944

What about 300 estimators?

In [25]:
gbr = GradientBoostingRegressor(max_depth=2, n_estimators=300, random_state=2, learning_rate=1.0)

gbr.fit(X_train, y_train)

y_pred = gbr.predict(X_test)
MSE(y_test, y_pred)**0.5

936.3617413678853

The score got worse! What if we remove learning_rate=1.0 and leave it at the default (0.1)?

In [26]:
gbr = GradientBoostingRegressor(max_depth=2, n_estimators=300, random_state=2)

gbr.fit(X_train, y_train)

y_pred = gbr.predict(X_test)
MSE(y_test, y_pred) ** 0.5

653.7456840231495

Much better
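
Rather than refitting the model for each value of n_estimators, one option is to fit once and score every intermediate ensemble with `staged_predict`. This is a sketch on synthetic data (an assumption; the `_demo` and `Xs_`/`ys_` names are introduced here), so its numbers say nothing about the bike rentals set.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the bike rentals set).
X_syn, y_syn = make_regression(n_samples=400, n_features=5, noise=20.0,
                               random_state=2)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X_syn, y_syn,
                                                        random_state=2)

gbr_demo = GradientBoostingRegressor(max_depth=2, n_estimators=300,
                                     random_state=2)
gbr_demo.fit(Xs_train, ys_train)

# staged_predict yields the ensemble's prediction after 1, 2, ..., 300 trees.
staged_rmse = [mean_squared_error(ys_test, y_stage) ** 0.5
               for y_stage in gbr_demo.staged_predict(Xs_test)]

best_n = int(np.argmin(staged_rmse)) + 1   # +1 because stages start at one tree
print('Lowest test RMSE:', min(staged_rmse), 'with', best_n, 'trees')
```

Plotting `staged_rmse` against the number of trees makes it easy to see where adding more trees stops helping.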

### Hyperparameters of Gradient Boosters
#### Learning Rate
* Also known as shrinkage, the learning rate scales the contribution of each individual tree so that no single tree has too much influence on the model. As a rule of thumb, as n_estimators goes up, learning_rate should go down.
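
In symbols, shrinkage can be sketched as follows. The notation is introduced here and is not in the original notebook: $F_m$ is the ensemble after $m$ trees, $h_m$ is the $m$-th tree, $\eta$ is the learning rate, $M$ is n_estimators, and $\bar{y}$ is the mean of the training target (the starting point for squared-error loss).

$$
F_0(x) = \bar{y}, \qquad F_m(x) = F_{m-1}(x) + \eta \, h_m(x), \qquad m = 1, \dots, M
$$

Each tree $h_m$ is fit to the current residuals $y - F_{m-1}(x)$. With $\eta = 1.0$ every tree's correction is added in full. With $\eta = 0.1$ only a tenth of it is added, so more trees are needed to reach the same fit, but each step is less likely to overcorrect.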

In [27]:
learning_rate_values = [0.001, 0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 1.0]

for value in learning_rate_values:
    gbr = GradientBoostingRegressor(max_depth=2, n_estimators=300, random_state=2, learning_rate=value)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    rmse = MSE(y_test, y_pred)**0.5
    print('Learning Rate:', value, ', Score:', rmse)

Learning Rate: 0.001 , Score: 1633.0261400367258
Learning Rate: 0.01 , Score: 831.5430182728547
Learning Rate: 0.05 , Score: 685.0192988749717
Learning Rate: 0.1 , Score: 653.7456840231495
Learning Rate: 0.15 , Score: 687.666134269379
Learning Rate: 0.2 , Score: 664.312804425697
Learning Rate: 0.3 , Score: 689.4190385930236
Learning Rate: 0.5 , Score: 693.8856905068778
Learning Rate: 1.0 , Score: 936.3617413678853


In [28]:
for value in learning_rate_values:
    gbr = GradientBoostingRegressor(max_depth=2, n_estimators=3000, random_state=2, learning_rate=value)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    rmse = MSE(y_test, y_pred)**0.5
    print('Learning Rate:', value, ', Score:', rmse)

Learning Rate: 0.001 , Score: 833.3969271105901
Learning Rate: 0.01 , Score: 657.6148014941521
Learning Rate: 0.05 , Score: 682.9065694015884
Learning Rate: 0.1 , Score: 672.9003273253688
Learning Rate: 0.15 , Score: 702.6671711067352
Learning Rate: 0.2 , Score: 673.13671999295
Learning Rate: 0.3 , Score: 705.3628705117246
Learning Rate: 0.5 , Score: 704.3625015524514
Learning Rate: 1.0 , Score: 941.7714655879082


In [29]:
depths = [None, 1, 2, 3, 4]
for value in depths:
    gbr = GradientBoostingRegressor(max_depth=value, n_estimators=300, random_state=2)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    rmse = MSE(y_test, y_pred)**0.5
    print('Depth:', value, ', Score:', rmse)

Depth: None , Score: 869.2788645118395
Depth: 1 , Score: 707.8261886858736
Depth: 2 , Score: 653.7456840231495
Depth: 3 , Score: 646.4045923317708
Depth: 4 , Score: 663.048387855927


Subsample is the fraction of samples used to fit each tree. Since samples are the rows, a subsample below 1 means that not every row is included when building each tree. When subsample is less than 1, the method is known as stochastic gradient boosting.

In [31]:
samples = [1, 0.9, 0.8, 0.7, 0.6, 0.5]
for sample in samples:
    gbr = GradientBoostingRegressor(max_depth=3, n_estimators=300, subsample=sample, random_state=2)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    rmse = MSE(y_test, y_pred)**0.5

    print('Subsample:', sample, ', Score:', rmse)

Subsample: 1 , Score: 646.4045923317708
Subsample: 0.9 , Score: 620.1819001443569
Subsample: 0.8 , Score: 617.2355650565677
Subsample: 0.7 , Score: 612.9879156983139
Subsample: 0.6 , Score: 622.6385116402317
Subsample: 0.5 , Score: 626.9974073227554
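
A side effect of setting subsample below 1 is that every tree has rows it never saw, which scikit-learn uses to report an out-of-bag estimate of how much each stage improved the loss. The sketch below reads that estimate from the `oob_improvement_` attribute on synthetic data (an assumption; `gbr_demo` and `oob` are names introduced here).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (an assumption; not the bike rentals set).
X_syn, y_syn = make_regression(n_samples=400, n_features=5, noise=20.0,
                               random_state=2)

# subsample=0.7: each tree is fit on a random 70% of the rows.
gbr_demo = GradientBoostingRegressor(max_depth=3, n_estimators=100,
                                     subsample=0.7, random_state=2)
gbr_demo.fit(X_syn, y_syn)

# oob_improvement_[i] is the improvement in loss on the rows left out of
# stage i, relative to the previous stage. It is only available when
# subsample < 1.0.
oob = gbr_demo.oob_improvement_
print(oob.shape)             # (100,)
print(np.cumsum(oob)[-1])    # total out-of-bag improvement over all stages
```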


### RandomizedSearchCV

In [32]:
# here is a possible starting point:
params = {'subsample': [0.65, 0.7, 0.75],
          'n_estimators': [300, 500, 1000],
          'learning_rate': [0.05, 0.075, 0.1]}
from sklearn.model_selection import RandomizedSearchCV
gbr = GradientBoostingRegressor(max_depth=3, random_state=2)

rand_reg = RandomizedSearchCV(gbr, params, n_iter=10, scoring='neg_mean_squared_error', cv=5, n_jobs=-1, random_state=2)

rand_reg.fit(X_train, y_train)
best_model = rand_reg.best_estimator_
best_params = rand_reg.best_params_

print("Best params:", best_params)

best_score = np.sqrt(-rand_reg.best_score_)

print("Training score: {:.3f}".format(best_score))

y_pred = best_model.predict(X_test)

rmse_test = MSE(y_test, y_pred)**0.5

print('Test set score: {:.3f}'.format(rmse_test))

Best params: {'subsample': 0.65, 'n_estimators': 300, 'learning_rate': 0.05}
Training score: 636.200
Test set score: 625.985
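
Note that `best_score_` is the mean cross-validated score of the best candidate, so the number printed as "Training score" above is a cross-validated RMSE rather than an error on the full training set. To see how every sampled candidate fared, not just the winner, the `cv_results_` attribute can be loaded into a DataFrame. This is a sketch with a deliberately small, hypothetical grid on synthetic data (both assumptions, chosen so that it runs quickly).

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (an assumption; not the bike rentals set).
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=20.0,
                               random_state=2)

# A deliberately small, hypothetical grid so the sketch runs quickly.
params_demo = {'subsample': [0.7, 1.0],
               'n_estimators': [20, 50],
               'learning_rate': [0.05, 0.1]}

search = RandomizedSearchCV(
    GradientBoostingRegressor(max_depth=3, random_state=2),
    params_demo, n_iter=4, scoring='neg_mean_squared_error',
    cv=3, random_state=2)
search.fit(X_syn, y_syn)

# One row per sampled candidate; scores are negated MSE, so convert to RMSE.
results = pd.DataFrame(search.cv_results_)
results['mean_cv_rmse'] = (-results['mean_test_score']) ** 0.5
print(results.sort_values('rank_test_score')[['params', 'mean_cv_rmse']])
```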


The search above is only a starting point. The best parameters we found overall, which lie outside that grid, were:
n_estimators=1600, learning_rate=0.02, subsample=0.75

### XGBoost

In [33]:
from xgboost import XGBRegressor

xg_reg = XGBRegressor(max_depth=3, n_estimators=1600, eta=0.02, subsample=0.75, random_state=2)
xg_reg.fit(X_train, y_train)

y_pred = xg_reg.predict(X_test)
MSE(y_test, y_pred)**0.5

584.3395337495713