Q1. What is Gradient Boosting Regression?

Gradient boosting is a machine learning technique used for regression tasks, among others. It produces a prediction model in the form of an ensemble of weak prediction models, i.e., models that make very few assumptions about the data, typically simple decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it often outperforms random forest. A gradient-boosted trees model is built stage-wise, as in other boosting methods, but it generalizes them by allowing optimization of an arbitrary differentiable loss function.

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.

In [2]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

In [3]:
dataset=load_iris()

In [4]:
df=pd.DataFrame(data=dataset.data,columns=dataset.feature_names)

In [5]:
df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2


In [7]:
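X=df
# Note: the iris target is a class label (0, 1, 2); it is treated here as a numeric regression target
y=dataset.target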

In [11]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=42,test_size=0.25)

In [12]:
from sklearn.ensemble import GradientBoostingRegressor

In [13]:
gradient=GradientBoostingRegressor()

In [14]:
gradient.fit(X_train,y_train)

In [16]:
y_pred=gradient.predict(X_test)

In [17]:
from sklearn.metrics import mean_squared_error,r2_score

In [19]:
print(mean_squared_error(y_test,y_pred))
print(r2_score(y_test,y_pred))

0.0035692524268991187
0.9949321528963202
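
The cells above rely on scikit-learn's GradientBoostingRegressor, whereas the question asks for a from-scratch version. The following is a minimal NumPy-only sketch of the idea, not a reference implementation. It assumes squared-error loss, uses depth-1 trees (stumps) as weak learners, and trains on a small synthetic sine dataset that is not part of the original notebook. The helper names (`fit_stump`, `predict_stump`, `fit_gradient_boosting`, `predict_gradient_boosting`, `mse`, `r2`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that minimizes squared error."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive unique values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            left_mean, right_mean = residuals[left].mean(), residuals[~left].mean()
            sse = (((residuals[left] - left_mean) ** 2).sum()
                   + ((residuals[~left] - right_mean) ** 2).sum())
            if sse < best_sse:
                best, best_sse = (j, t, left_mean, right_mean), sse
    return best

def predict_stump(stump, X):
    j, t, left_mean, right_mean = stump
    return np.where(X[:, j] <= t, left_mean, right_mean)

def fit_gradient_boosting(X, y, n_estimators=200, learning_rate=0.1):
    initial = y.mean()                      # start from the mean of the target
    pred = np.full(len(y), initial)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred                # negative gradient of squared error
        stump = fit_stump(X, residuals)     # weak learner fitted to the residuals
        if stump is None:                   # no valid split left
            break
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return initial, learning_rate, stumps

def predict_gradient_boosting(model, X):
    initial, learning_rate, stumps = model
    pred = np.full(X.shape[0], initial)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, X)
    return pred

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Small synthetic regression problem (an assumption, not the notebook's data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = fit_gradient_boosting(X_train, y_train)
y_pred = predict_gradient_boosting(model, X_test)
mse_test = mse(y_test, y_pred)
r2_test = r2(y_test, y_pred)
print(mse_test)
print(r2_test)
```

Stumps keep the split search short enough to read in one screen; deeper trees would follow the same loop with a recursive split function.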


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.

In [20]:
from sklearn.model_selection import GridSearchCV

In [21]:
parameters={
    "learning_rate":[0.01,0.1,1,0.001],
    "n_estimators":[100,200,300],
    "max_depth":[5,10,15]
}

In [22]:
grid=GridSearchCV(estimator=gradient,param_grid=parameters,cv=3,verbose=3)

In [23]:
grid.fit(X_train,y_train)

Fitting 3 folds for each of 36 candidates, totalling 108 fits
[CV 1/3] END learning_rate=0.01, max_depth=5, n_estimators=100;, score=0.796 total time=   0.1s
[CV 2/3] END learning_rate=0.01, max_depth=5, n_estimators=100;, score=0.780 total time=   0.1s
[CV 3/3] END learning_rate=0.01, max_depth=5, n_estimators=100;, score=0.770 total time=   0.1s
[CV 1/3] END learning_rate=0.01, max_depth=5, n_estimators=200;, score=0.900 total time=   0.1s
[CV 2/3] END learning_rate=0.01, max_depth=5, n_estimators=200;, score=0.851 total time=   0.1s
[CV 3/3] END learning_rate=0.01, max_depth=5, n_estimators=200;, score=0.888 total time=   0.1s
[CV 1/3] END learning_rate=0.01, max_depth=5, n_estimators=300;, score=0.922 total time=   0.2s
[CV 2/3] END learning_rate=0.01, max_depth=5, n_estimators=300;, score=0.849 total time=   0.2s
[CV 3/3] END learning_rate=0.01, max_depth=5, n_estimators=300;, score=0.905 total time=   0.2s
[CV 1/3] END learning_rate=0.01, max_depth=10, n_estimators=100;, score=0.

In [24]:
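# GridSearchCV refits the best parameter combination on the full training set (refit=True by default), so predict uses that model
y_pred=grid.predict(X_test)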

In [25]:
print(mean_squared_error(y_test,y_pred))
print(r2_score(y_test,y_pred))

1.6812422787480622e-16
0.9999999999999998
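
The question also mentions random search. A hedged sketch of that alternative is given here, reusing the same parameter values as the grid above but sampling only some of the combinations with scikit-learn's RandomizedSearchCV. The choices `n_iter=8` and `random_state=0` are assumptions made for illustration; the printed values will depend on them and are not taken from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, test_size=0.25
)

# Same candidate values as the grid search; only n_iter of them are tried.
param_distributions = {
    "learning_rate": [0.001, 0.01, 0.1, 1],
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, 15],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,          # assumption: sample 8 of the 36 combinations
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# best_params_ and best_score_ are also available on a fitted GridSearchCV.
print(search.best_params_)
print(search.best_score_)
```

Random search trades exhaustiveness for speed, which matters more as the number of hyperparameters grows.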


Q4. What is a weak learner in Gradient Boosting?

All the models we've learned so far are strong learners: models whose goal is to do as well as possible on the classification or regression task they are given. The term weak learner refers to a simple model that does only slightly better than random chance. Boosting algorithms start with a single weak learner; tree methods are overwhelmingly used here, but technically any model will do. Boosting works as follows:

1. Train a single weak learner.
2. Figure out which examples the weak learner got wrong.
3. Build another weak learner that focuses on the areas the first weak learner got wrong.
4. Continue this process until a predetermined stopping condition is met, such as a set number of weak learners having been created or the model's performance having plateaued.
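
To make the weak-versus-boosted contrast concrete, the sketch below compares a single depth-1 decision tree with a gradient-boosted ensemble of depth-1 trees. The synthetic sine data, the 200/100 split, and the hyperparameter values are assumptions for illustration only. In gradient boosting specifically, "focusing on what was wrong" means fitting each new tree to the residuals of the current ensemble.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

# One weak learner: a tree with a single split.
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)

# Many weak learners combined by gradient boosting.
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=200, learning_rate=0.1, random_state=0
).fit(X_train, y_train)

stump_r2 = stump.score(X_test, y_test)      # R-squared of the single stump
boosted_r2 = boosted.score(X_test, y_test)  # R-squared of the ensemble
print(stump_r2)
print(boosted_r2)
```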

Q5. What is the intuition behind the Gradient Boosting algorithm?

The main intuition behind the algorithm is that the best possible next model, when combined with the previous models, minimizes the overall prediction error. The key idea is to choose the target outcomes for this next model so that adding it reduces the error. For squared-error regression, each sample's target is its current residual (the actual value minus the current prediction), which is the negative gradient of the loss with respect to the prediction.
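
The link between residuals and gradients can be checked numerically. The sketch below assumes the squared-error loss 0.5 * (y - prediction)^2 and three made-up target values; it estimates the gradient of the loss with respect to the prediction by central differences and compares its negative with the residuals.

```python
import numpy as np

# Toy targets (made up for illustration) and a constant starting prediction.
y = np.array([3.0, 5.0, 8.0])
pred = np.full_like(y, y.mean())

def squared_loss(y_true, y_hat):
    return 0.5 * (y_true - y_hat) ** 2

# Central-difference estimate of d(loss)/d(prediction) for each sample.
eps = 1e-6
grad = (squared_loss(y, pred + eps) - squared_loss(y, pred - eps)) / (2 * eps)

residuals = y - pred
print(-grad)       # negative gradient
print(residuals)   # matches the negative gradient up to rounding
```

With a different differentiable loss, the same recipe applies, but the targets for the next tree are no longer plain residuals.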

Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

The gradient-boosting regressor works by iteratively building an ensemble of weak learners, where each subsequent weak learner is trained to correct the mistakes made by the previous ones. The final prediction is the initial prediction plus the sum of all weak learners' outputs, each scaled by the learning rate.
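
One way to watch the ensemble being built is scikit-learn's `staged_predict`, which yields the prediction after each added tree. The sketch below is illustrative only: the choice of 50 trees and `random_state=0` is an assumption, and the error is measured on the training data to show the stage-wise correction rather than generalization.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = load_iris(return_X_y=True)

model = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, random_state=0
).fit(X, y)

# Training MSE after 1 tree, 2 trees, ..., 50 trees.
stage_mse = [mean_squared_error(y, pred) for pred in model.staged_predict(X)]

print(stage_mse[0])    # error with only the first tree added
print(stage_mse[-1])   # error with the full ensemble
```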

Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?

Step 1: Calculate the average/mean of the target variable.
Step 2: Calculate the residuals for each sample.
Step 3: Construct a decision tree. We build a tree with the goal of predicting the residuals.
Step 4: Predict the target value using all the trees in the ensemble so far: the mean from Step 1 plus the learning rate times each tree's predicted residual.
Step 5: Compute the new residuals.
Step 6: Repeat steps 3 to 5 until the number of iterations matches the number specified by the hyperparameter (number of estimators).
Step 7: Once trained, use all of the trees in the ensemble to make a final prediction of the value of the target variable. The final prediction equals the mean computed in Step 1 plus the learning rate times the sum of the residuals predicted by all the trees in the ensemble.
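
The steps above can be written compactly as follows. This is a sketch for the squared-error case only, and the notation is an assumption introduced here rather than taken from the original: $F_m$ is the ensemble after $m$ trees, $h_m$ is the $m$-th tree, $\nu$ is the learning rate, $M$ is the number of estimators, and $n$ is the number of training samples.

```latex
\begin{aligned}
F_0(x) &= \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i && \text{(Step 1)} \\
r_i^{(m)} &= y_i - F_{m-1}(x_i) && \text{(Steps 2 and 5)} \\
h_m &= \text{tree fitted to } \{(x_i,\, r_i^{(m)})\}_{i=1}^{n} && \text{(Step 3)} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x) && \text{(Step 4)} \\
F_M(x) &= \bar{y} + \nu \sum_{m=1}^{M} h_m(x) && \text{(Step 7)}
\end{aligned}
```

Step 6 corresponds to repeating the middle three lines for $m = 1, \dots, M$.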