# Implementation of Gradient Boosting Regressor

<div class='alert alert-success'>
    <p>Reference article on gradient boosting: <a href="https://www.analyticsvidhya.com/blog/2021/04/how-the-gradient-boosting-algorithm-works/">link to article (Analytics Vidhya)</a></p>
</div>

In [2]:
# importing modules

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import LabelEncoder

In [3]:
# Create the feature DataFrame (X) and the target Series (Y)

X = pd.DataFrame({'LikesExercising': [False, False, False, True, False, True, True, True, True],
                  'GotoGym': [True, True, True, True, True, False, True, False, False],
                  'DrivesCar': [True, False, False, True, True, False, True, False, True]})
Y = pd.Series(name='Age', data=[14, 15, 16, 26, 36, 50, 69, 72, 74])

In [4]:
X.head()

   LikesExercising  GotoGym  DrivesCar
0            False     True       True
1            False     True      False
2            False     True      False
3             True     True       True
4            False     True       True


In [6]:
Y[:5]

0    14
1    15
2    16
3    26
4    36
Name: Age, dtype: int64

In [8]:
# Encode the boolean features as integers (False -> 0, True -> 1).
# LabelEncoder is designed for target labels; it works here because every
# column contains both values. X.astype(int) would give the same result.
LE = LabelEncoder()
for col in X.columns:
    X[col] = LE.fit_transform(X[col])

X.head()

   LikesExercising  GotoGym  DrivesCar
0                0        1          1
1                0        1          0
2                0        1          0
3                1        1          1
4                0        1          1


In [10]:
# 1) Train GradientBoostingRegressor with 2 estimators and
# predict the age for the same inputs it was trained on.

GB = GradientBoostingRegressor(n_estimators=2)
GB.fit(X, Y)
Y_pred = GB.predict(X)  # ages predicted by the model with 2 estimators

Y_pred

array([38.23 , 36.425, 36.425, 42.505, 38.23 , 45.07 , 42.505, 45.07 ,
       47.54 ])
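
To see where these numbers come from, here is a minimal from-scratch sketch of the boosting loop in plain Python. It is not scikit-learn's implementation. It assumes the defaults of squared-error loss and `learning_rate=0.1`, and it assumes each tree is deep enough to give every distinct feature combination its own leaf, which the default `max_depth=3` allows for three binary features. The helper name `boost` and the variable names are hypothetical.

```python
from collections import defaultdict

# Same toy data as above, with the booleans already encoded as 0/1.
# Each tuple is (LikesExercising, GotoGym, DrivesCar).
X_rows = [(0, 1, 1), (0, 1, 0), (0, 1, 0), (1, 1, 1), (0, 1, 1),
          (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
ages = [14, 15, 16, 26, 36, 50, 69, 72, 74]


def boost(rows, targets, n_estimators, learning_rate=0.1):
    """Hypothetical helper: boosting with one leaf per distinct feature row."""
    # Stage 0: every prediction starts at the mean of the targets.
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    for _ in range(n_estimators):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [t - p for t, p in zip(targets, preds)]
        # "Fit a tree": average the residuals within each feature combination.
        groups = defaultdict(list)
        for row, r in zip(rows, residuals):
            groups[row].append(r)
        leaf_value = {row: sum(rs) / len(rs) for row, rs in groups.items()}
        # Move each prediction a small step toward its leaf value.
        preds = [p + learning_rate * leaf_value[row]
                 for row, p in zip(rows, preds)]
    return preds


preds_2 = boost(X_rows, ages, n_estimators=2)
print([round(p, 3) for p in preds_2])
```

Under these assumptions the sketch reproduces the array above: after two stages each prediction has moved only 19% of the way (1 - 0.9²) from the overall mean age toward the mean age of its feature group, which is why all nine values still sit close to the overall mean of about 41.3.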

In [12]:
# Finding Mean Squared Error of prediction with Gradient Boosting having 2 estimators

MSE_2 = (sum((Y-Y_pred)**2))/len(Y)
print('Mean Squared Error for 2 estimators: ', MSE_2)

Mean Squared Error for 2 estimators:  432.48205555555546


## Repeating the experiment with 3 and 50 estimators

In [13]:
# 2) Train GradientBoostingRegressor with 3 estimators and
# predict the age for the same inputs it was trained on.

GB = GradientBoostingRegressor(n_estimators=3)
GB.fit(X, Y)
Y_predict = GB.predict(X)  # ages predicted by the model with 3 estimators
Y_predict

array([36.907 , 34.3325, 34.3325, 43.0045, 36.907 , 46.663 , 43.0045,
       46.663 , 50.186 ])

In [14]:
MSE_3=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for three estimators :',MSE_3)

MSE for three estimators : 380.05602055555556


<hr>

In [15]:
# 3) Train GradientBoostingRegressor with 50 estimators and
# predict the age for the same inputs it was trained on.

GB = GradientBoostingRegressor(n_estimators=50)
GB.fit(X, Y)
Y_predict = GB.predict(X)  # ages predicted by the model with 50 estimators
Y_predict

array([25.08417833, 15.63313919, 15.63313919, 47.46821839, 25.08417833,
       60.89864242, 47.46821839, 60.89864242, 73.83164334])

In [16]:
MSE_50=(sum((Y-Y_predict)**2))/len(Y)
print('MSE for fifty estimators :',MSE_50)

MSE for fifty estimators : 156.5667260994211


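## Observation:
The training MSE falls as the number of estimators increases: 432.48 with 2 estimators, 380.06 with 3, and 156.57 with 50. The improvement eventually saturates: beyond some point, adding more estimators no longer reduces the MSE noticeably. Note that these errors are measured on the same data the model was trained on, so they do not show how well the model generalizes to new data.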
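
One way to see why the training MSE levels off on this dataset: several rows have identical features but different ages (rows 0 and 4, for example), so no model built on these three features can fit both. The plain-Python sketch below computes the lowest training MSE such a model can reach, which is obtained by predicting the mean age of each feature combination. The variable names are illustrative.

```python
from collections import defaultdict

# Same toy data as above, with the booleans already encoded as 0/1.
# Each tuple is (LikesExercising, GotoGym, DrivesCar).
X_rows = [(0, 1, 1), (0, 1, 0), (0, 1, 0), (1, 1, 1), (0, 1, 1),
          (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1)]
ages = [14, 15, 16, 26, 36, 50, 69, 72, 74]

# Collect the ages that share each feature combination.
groups = defaultdict(list)
for row, age in zip(X_rows, ages):
    groups[row].append(age)

# The best constant prediction per group (for squared error) is the group mean.
group_mean = {row: sum(a) / len(a) for row, a in groups.items()}

# Training MSE when every row is predicted by its group mean.
floor_mse = sum((age - group_mean[row]) ** 2
                for row, age in zip(X_rows, ages)) / len(ages)
print(round(floor_mse, 4))
```

The result is 1409 / 9, roughly 156.56. The MSE of 156.57 reported for 50 estimators is already within about 0.01 of this floor, so further estimators have almost nothing left to gain on the training data.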

# <font color="purple">Finding the best number of estimators with GridSearchCV:</font>


In [17]:
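from sklearn.model_selection import GridSearchCV
# IPython help syntax: displays the GridSearchCV docstring (works only in a notebook or IPython shell)
?GridSearchCV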

In [18]:
model = GradientBoostingRegressor()
params = {'n_estimators': range(1,200)}
grid = GridSearchCV(estimator = model,
                   cv=2,
                   param_grid = params,
                   scoring = 'neg_mean_squared_error')

grid.fit(X,Y)
print("The best fit estimator returned by GridSearch CV is:", grid.best_estimator_)


The best fit estimator returned by GridSearch CV is: GradientBoostingRegressor(n_estimators=19)
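
To check how GridSearchCV arrived at its choice, it can help to look at the cross-validated score of every candidate instead of only the winner. The sketch below is a self-contained illustration, not a rerun of the cell above: the short `n_estimators` list and `random_state=0` are assumptions made here for brevity and repeatability, so its winner need not match the one printed above.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Same toy data as above, with the booleans already encoded as 0/1.
X = pd.DataFrame({'LikesExercising': [0, 0, 0, 1, 0, 1, 1, 1, 1],
                  'GotoGym': [1, 1, 1, 1, 1, 0, 1, 0, 0],
                  'DrivesCar': [1, 0, 0, 1, 1, 0, 1, 0, 1]})
Y = pd.Series(name='Age', data=[14, 15, 16, 26, 36, 50, 69, 72, 74])

# Assumption: a handful of candidate values instead of range(1, 200).
candidates = [1, 5, 10, 19, 50, 100]
grid = GridSearchCV(estimator=GradientBoostingRegressor(random_state=0),
                    param_grid={'n_estimators': candidates},
                    cv=2,
                    scoring='neg_mean_squared_error')
grid.fit(X, Y)

# cv_results_ holds one row per candidate; flip the sign to get a plain MSE.
results = pd.DataFrame(grid.cv_results_)
results['mean_cv_mse'] = -results['mean_test_score']
print(results[['param_n_estimators', 'mean_cv_mse', 'rank_test_score']])
print('Best parameters:', grid.best_params_)
print('Best mean CV MSE:', -grid.best_score_)
```

The `mean_cv_mse` column is computed on held-out folds, so it is not comparable to the training MSE values calculated earlier. With `cv=2` and no shuffling, the default split puts the first five rows in one fold and the last four in the other; because the rows are sorted by age, each model is validated on an age range it did not see during training.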


In [19]:
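# best_estimator_ has already been refit on the full data (refit=True is the
# GridSearchCV default), so the extra fit call is redundant but harmless.
GB = grid.best_estimator_
GB.fit(X, Y)

Y_pred = GB.predict(X)
Y_pred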

array([27.20639114, 18.98970027, 18.98970027, 46.66697477, 27.20639114,
       58.34332496, 46.66697477, 58.34332496, 69.58721772])

In [20]:
# Compute the training MSE for the model with the n_estimators value chosen by GridSearchCV.
# Use Y_pred (this model's predictions), not Y_predict, which still holds
# the predictions of the 50-estimator model.

MSE_best = (sum((Y - Y_pred)**2)) / len(Y)
print('MSE for best estimators =', round(MSE_best, 2))

MSE for best estimators = 164.23


<div class = 'alert alert-info'><p>
<strong>Observation:</strong>
The training MSE for n_estimators=50 is lower than the one for n_estimators=19, yet GridSearchCV returns 19, not 50. The reason is that GridSearchCV does not compare training errors. It picks the value with the best mean cross-validated score, here the negative MSE averaged over the held-out folds (cv=2). Training error keeps shrinking as estimators are added, but the error on held-out data stops improving once the extra trees only fit the training rows more closely. On this grid, n_estimators=19 gave the best held-out score, so that is the value GridSearchCV returned.</p></div>