# Gradient Boosting regression

Gradient boosting is an ensemble method that combines many weak learners, typically shallow decision trees, into a single strong learner. The ensemble is built sequentially: each new model is trained to reduce the loss (for example mean squared error for regression or cross-entropy for classification) left by the models before it. In each iteration, the algorithm computes the gradient of the loss function with respect to the predictions of the current ensemble and then fits a new weak model to the negative gradient, also called the pseudo-residuals. The predictions of the new model, scaled by a learning rate, are then added to the ensemble, and the process is repeated until a stopping criterion is met, such as a fixed number of estimators.
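
The loop described above can be sketched from scratch. This is a minimal illustration, not scikit-learn's implementation: it assumes squared-error loss (where the negative gradient is just the residual `y - prediction`), a single input feature, and one-split decision stumps as weak learners. The helper names `fit_stump`, `predict_stump`, `mse` and `gradient_boost`, as well as the toy data, are hypothetical.

```python
# From-scratch sketch of gradient boosting for squared-error loss.
# Weak learner: a one-split decision stump on a single feature.

def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv = sum(left) / len(left)      # leaf value = mean target on the left
        rv = sum(right) / len(right)    # leaf value = mean target on the right
        err = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    thr, lv, rv = stump
    return lv if x <= thr else rv

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def gradient_boost(xs, ys, n_estimators=50, learning_rate=0.1):
    init = sum(ys) / len(ys)            # start from a constant prediction
    preds = [init] * len(ys)
    stumps, history = [], [mse(ys, preds)]
    for _ in range(n_estimators):
        # Negative gradient of 1/2 * (y - F)^2 with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # Add the new learner's predictions, shrunk by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
        stumps.append(stump)
        history.append(mse(ys, preds))
    return init, stumps, history

# Toy data (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

init, stumps, history = gradient_boost(xs, ys)
print('MSE before boosting:', round(history[0], 3))
print('MSE after boosting: ', round(history[-1], 3))
```

Here the stopping criterion is simply a fixed `n_estimators`. With other losses, such as the absolute error used later in this notebook, only the residual line changes: the stump is fitted to the negative gradient of that loss instead.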

## Importing and loading data

In [12]:
# Importing libraries
import pandas as pd 
import numpy as np

# For plotting graphs
import matplotlib.pyplot as plt

# Importing the train_test_split function
from sklearn.model_selection import train_test_split
from sklearn import datasets

from sklearn.metrics import r2_score, mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor

import warnings
warnings.filterwarnings('ignore')

In [13]:
from sklearn.datasets import load_diabetes

# Load the diabetes data
X, y = load_diabetes(return_X_y = True)

print(X[:10], end = '\n')
print('---')
print(y[:10])

[[ 0.03807591  0.05068012  0.06169621  0.02187239 -0.0442235  -0.03482076
  -0.04340085 -0.00259226  0.01990749 -0.01764613]
 [-0.00188202 -0.04464164 -0.05147406 -0.02632753 -0.00844872 -0.01916334
   0.07441156 -0.03949338 -0.06833155 -0.09220405]
 [ 0.08529891  0.05068012  0.04445121 -0.00567042 -0.04559945 -0.03419447
  -0.03235593 -0.00259226  0.00286131 -0.02593034]
 [-0.08906294 -0.04464164 -0.01159501 -0.03665608  0.01219057  0.02499059
  -0.03603757  0.03430886  0.02268774 -0.00936191]
 [ 0.00538306 -0.04464164 -0.03638469  0.02187239  0.00393485  0.01559614
   0.00814208 -0.00259226 -0.03198764 -0.04664087]
 [-0.09269548 -0.04464164 -0.04069594 -0.01944183 -0.06899065 -0.07928784
   0.04127682 -0.0763945  -0.04117617 -0.09634616]
 [-0.04547248  0.05068012 -0.04716281 -0.01599898 -0.04009564 -0.02480001
   0.00077881 -0.03949338 -0.06291688 -0.03835666]
 [ 0.06350368  0.05068012 -0.00189471  0.06662945  0.09061988  0.10891438
   0.02286863  0.01770335 -0.03581619  0.00306441]


## Model building

### Creating the training and testing sets

In [14]:
# Divide into train and test sets (train_test_split holds out 25% for testing by default)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state = 23)

print(train_x.shape, train_y.shape)
print(test_x.shape, test_y.shape)

(331, 10) (331,)
(111, 10) (111,)


### Build the Gradient Boosting model

In [15]:
# Create a gradient boosting instance: 300 depth-1 trees (stumps) fitted with absolute-error loss
gbr = GradientBoostingRegressor(
    loss = 'absolute_error',
    learning_rate = 0.1,
    n_estimators = 300,
    max_depth = 1,
    random_state = 23,
    max_features = 5,
)

# Train the model
gbr.fit(train_x, train_y)

In [16]:
# Calculate scores (for regressors, score() returns the R2 score)
print('Training Score:', round(gbr.score(train_x, train_y), 3))
print('Testing Score:', round(gbr.score(test_x, test_y), 3))

Training Score: 0.579
Testing Score: 0.437


In [23]:
# Calculate R2 score and RMSE
y_train_pred = gbr.predict(train_x)
y_pred = gbr.predict(test_x)

print('Training R2-Score:', round(r2_score(train_y, y_train_pred), 3))
print('Testing R2-Score:', round(r2_score(test_y, y_pred), 3))
print('Root Mean Square Error:', round(np.sqrt(mean_squared_error(test_y, y_pred)), 3))

Training R2-Score: 0.579
Testing R2-Score: 0.437
Root Mean Square Error: 56.392
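
The R2 values match the earlier `score()` output because `score()` reports R2 for regressors. As a reminder of what the two metrics measure, here is a small hand-rolled sketch on made-up numbers; the helper names `r2` and `rmse` are hypothetical and are not part of scikit-learn.

```python
import math

def r2(y_true, y_pred):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root of the mean squared difference, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy values (made up for illustration)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print('R2:', round(r2(y_true, y_pred), 3))      # R2: 0.9
print('RMSE:', round(rmse(y_true, y_pred), 3))  # RMSE: 0.707
```

R2 is unitless and compares the model against always predicting the mean, while RMSE is expressed in the same units as the target, so the two are best read together.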


### Build the AdaBoost model

In [20]:
# Create an AdaBoost instance
ada = AdaBoostRegressor(loss = 'linear', learning_rate = 0.1, n_estimators = 300, random_state = 23)

# Train the model
ada.fit(train_x, train_y)

In [21]:
# Calculate scores (for regressors, score() returns the R2 score)
print('Training Score:', round(ada.score(train_x, train_y), 3))
print('Testing Score:', round(ada.score(test_x, test_y), 3))

Training Score: 0.63
Testing Score: 0.425


In [24]:
# Calculate R2 score and RMSE
y_train_pred = ada.predict(train_x)
y_pred = ada.predict(test_x)

print('Training R2-Score:', round(r2_score(train_y, y_train_pred), 3))
print('Testing R2-Score:', round(r2_score(test_y, y_pred), 3))
print('Root Mean Square Error:', round(np.sqrt(mean_squared_error(test_y, y_pred)), 3))

Training R2-Score: 0.63
Testing R2-Score: 0.425
Root Mean Square Error: 56.969
