Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is a machine learning technique that builds a strong predictive model by combining many weak learners (typically shallow decision trees) sequentially, with each new learner fitted to the errors of the ensemble built so far. It is used for regression tasks, where the goal is to predict a continuous numerical value.

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.

In [1]:
from sklearn.datasets import make_regression
X, y = make_regression(n_features=4, n_informative=2,
                       shuffle=False, random_state=0)

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

In [3]:
from sklearn.ensemble import GradientBoostingRegressor

In [4]:
reg = GradientBoostingRegressor()

In [5]:
reg.fit(X_train, y_train)

In [6]:
y_pred = reg.predict(X_test)

In [7]:
from sklearn.metrics import mean_squared_error, r2_score

In [8]:
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))

75.8584242184952
0.9389871479185301
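
The cells above use scikit-learn's `GradientBoostingRegressor`. For the "from scratch" part of the question, the following is a minimal sketch using only NumPy. It is not scikit-learn's implementation: it assumes squared-error loss, uses depth-1 trees (stumps) as weak learners, and trains on a small synthetic dataset. The helper names (`fit_stump`, `predict_stump`, `fit_gbr`, `predict_gbr`) and the dataset are illustrative choices, not part of the original notebook.

```python
import numpy as np


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimising squared error on r."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        # Candidate thresholds: every unique value except the largest,
        # so that both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def fit_gbr(X, y, n_estimators=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fitted to the current residuals."""
    f0 = y.mean()                      # initial constant prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred           # negative gradient of 0.5 * (y - pred)**2
        stump = fit_stump(X, residuals)
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict_gbr(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, X)
    return pred


# Small synthetic regression problem (an assumption made for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

model = fit_gbr(X_train, y_train)
y_pred = predict_gbr(model, X_test)

# Mean squared error and R-squared computed directly with NumPy.
mse = np.mean((y_test - y_pred) ** 2)
r2 = 1.0 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"MSE: {mse:.4f}  R^2: {r2:.4f}")
```

Stumps keep the sketch short; replacing `fit_stump` with a deeper tree would let the model capture interactions between features.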


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.

In [9]:
param = {'n_estimators': [25, 50, 100, 150, 200],
         'learning_rate': [0.1, 0.03, 0.05, 0.08, 0.01],
         'max_depth': [3, 5, 7, 10, 12, 15]}

In [10]:
from sklearn.model_selection import GridSearchCV

In [11]:
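# 'accuracy' is a classification metric and yields NaN scores on continuous
# targets (the outputs recorded below came from that run); use a regression scorer.
grid_r = GridSearchCV(reg, param_grid=param, cv=5,
                      scoring='neg_mean_squared_error')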

In [12]:
import warnings
warnings.filterwarnings('ignore')

In [13]:
grid_r.fit(X_train, y_train)

In [14]:
grid_r.best_params_

{'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 25}

In [15]:
y_pred = grid_r.predict(X_test)

In [16]:
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))

109.60512123000933
0.9118447144417787
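
The question also mentions random search. As a hedged sketch, the same tuning can be done with `RandomizedSearchCV`, which samples a fixed number of candidates from distributions instead of trying every grid point. The ranges, `n_iter=10`, and the `random_state` values below are illustrative assumptions, and a regression scorer (`neg_mean_squared_error`) is used.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Same data and split as in the cells above.
X, y = make_regression(n_features=4, n_informative=2,
                       shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Distributions to sample from (illustrative ranges).
param_distributions = {
    'n_estimators': randint(25, 201),       # integers in [25, 200]
    'learning_rate': uniform(0.01, 0.09),   # uniform(loc, scale): floats in [0.01, 0.10]
    'max_depth': randint(2, 8),             # integers in [2, 7]
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,                              # number of sampled candidates
    cv=5,
    scoring='neg_mean_squared_error',       # higher (closer to 0) is better
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
y_pred_rs = search.predict(X_test)
rs_mse = mean_squared_error(y_test, y_pred_rs)
rs_r2 = r2_score(y_test, y_pred_rs)
print(rs_mse)
print(rs_r2)
```

Random search fits only `n_iter` candidates per fold, so it is usually much cheaper than an exhaustive grid over the same ranges.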


Q4. What is a weak learner in Gradient Boosting?

A weak learner in Gradient Boosting is a simple model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). Typically, decision trees with a shallow depth (e.g., 1-2 levels) are used as weak learners in Gradient Boosting. These simple models are easy to train and can capture basic patterns in the data.
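
To make the idea concrete, the sketch below compares a single depth-1 tree with a Gradient Boosting model built from many depth-1 trees, on the same synthetic data used earlier. The settings (`n_estimators=200`, `learning_rate=0.1`, `random_state=0`) are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_features=4, n_informative=2,
                       shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# A single weak learner: one split, two possible predictions.
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_train, y_train)

# Many weak learners of the same kind, combined by boosting.
boosted = GradientBoostingRegressor(max_depth=1, n_estimators=200,
                                    learning_rate=0.1, random_state=0)
boosted.fit(X_train, y_train)

stump_r2 = r2_score(y_test, stump.predict(X_test))
boosted_r2 = r2_score(y_test, boosted.predict(X_test))
print("single stump R^2:  ", stump_r2)
print("boosted stumps R^2:", boosted_r2)
```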

Q5. What is the intuition behind the Gradient Boosting algorithm?

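Imagine you're trying to paint a portrait. You start with a rough sketch (the first weak learner), and then gradually refine it with each stroke (each subsequent weak learner). You focus on the areas that need the most improvement, adding more detail and correcting mistakes. In Gradient Boosting terms, the sketch is the initial prediction, and each stroke is a small model fitted to the errors that the current ensemble still makes.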

Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

Gradient Boosting builds an ensemble of weak learners, typically decision trees, sequentially, with each new learner correcting the errors of the ensemble built so far. Here's how it works:
- An initial model, often just a constant such as the mean of the target, is fitted to the training data.
- This initial model makes predictions.
- The residuals (actual values minus predicted values) are calculated.
- A new model is trained to predict these residuals, so it focuses on the areas where the current ensemble makes errors.
- The new model's predictions, scaled by the learning rate, are added to the ensemble's predictions.
- The residual and fitting steps are repeated for a fixed number of rounds, and the final prediction is the sum of the initial prediction and all the scaled corrections.
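
One way to observe this sequential build-up is scikit-learn's `staged_predict`, which yields the ensemble's prediction after each added tree. The sketch below is illustrative only; the settings and the chosen checkpoints (1, 10, 50, 100 trees) are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_features=4, n_informative=2,
                       shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
reg.fit(X_train, y_train)

# staged_predict yields the test predictions after 1, 2, ..., n_estimators trees.
stage_mse = [mean_squared_error(y_test, y_stage)
             for y_stage in reg.staged_predict(X_test)]

for n in (1, 10, 50, 100):
    print(f"{n:>3} trees: test MSE = {stage_mse[n - 1]:.2f}")
```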

Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?

Gradient Boosting can be viewed as gradient descent in function space: instead of updating a fixed set of parameters, each iteration adds a new weak learner that moves the predictions in the direction that reduces the loss. The steps are:
- Define a differentiable loss function that measures the discrepancy between the predicted values and the true values (e.g., squared error for regression).
- Initialize the model with a constant prediction that minimizes the loss (the mean of the target for squared error).
- At each iteration, compute the negative gradient of the loss with respect to the current predictions (the pseudo-residuals). For squared error these are, up to a constant factor, the ordinary residuals: the true values minus the current predictions.
- Fit a weak learner, often a decision tree, to these pseudo-residuals.
- Add the weak learner's predictions, scaled by the learning rate, to the current model.
- After all iterations, the final prediction is the initial prediction plus the weighted sum of the predictions from all the weak learners.
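
As a compact summary, the procedure can be written as follows. The notation is an assumption introduced here for illustration: $L$ is the loss, $F_m$ the model after $m$ rounds, $h_m$ the weak learner fitted in round $m$, $r_{im}$ the pseudo-residual of sample $i$, $\nu$ the learning rate, and $M$ the number of rounds. The third line shows the special case of squared-error loss.

```latex
\begin{align*}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}},
          \quad i = 1, \dots, n \\
r_{im} &= y_i - F_{m-1}(x_i)
          \quad \text{for } L(y, F) = \tfrac{1}{2}(y - F)^2 \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \bigl( r_{im} - h(x_i) \bigr)^2 \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x) \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{align*}
```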