Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is a machine learning algorithm from the boosting family. It builds a sequence of decision trees iteratively: each new tree is fit to the residual errors of the current ensemble, so it corrects the mistakes of the trees before it. In other words, the model is built stepwise, and each new tree improves the accuracy of the predictions made so far.

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.

In [8]:
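import numpy as np
# The base learner comes from scikit-learn; only the boosting loop is written from scratch
from sklearn.tree import DecisionTreeRegressor

class GradientBoostingRegressor:
    
    def __init__(self, n_trees=100, max_depth=3, learning_rate=0.1):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.trees = []
        self.init_pred = None
        
    def fit(self, X, y):
        # start from a constant prediction: the mean of the training targets
        self.init_pred = np.mean(y)
        self.trees = []
        y_pred = np.full(np.shape(y), self.init_pred, dtype=float)
        for _ in range(self.n_trees):
            # fit the next tree to what the current ensemble still gets wrong
            residuals = y - y_pred
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            y_pred += tree.predict(X) * self.learning_rate
            self.trees.append(tree)
        return self
            
    def predict(self, X):
        # begin with the same constant used in fit, then add each tree's scaled correction
        y_pred = np.full(len(X), self.init_pred, dtype=float)
        for tree in self.trees:
            y_pred += tree.predict(X) * self.learning_rate
        return y_pred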


In [10]:
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# generate sample data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1)

# split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# fit the gradient boosting model
gbr = GradientBoostingRegressor(n_trees=100, max_depth=3, learning_rate=0.1)
gbr.fit(X_train, y_train)

# make predictions on the test set
y_pred = gbr.predict(X_test)

# evaluate the performance of the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean squared error:", mse)
print("R-squared:", r2)


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.

Hyperparameters such as learning rate, number of trees, and tree depth can significantly affect the performance of a gradient boosting model. Grid search evaluates every combination in a predefined grid, while random search samples a fixed number of combinations; both score each candidate with cross-validation. Here's an example of how you can use grid search:

In [12]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

# load_boston was removed in scikit-learn 1.2, so the bundled diabetes
# regression dataset is used here as a stand-in
data = load_diabetes()
X, y = data.data, data.target

# Hold out a test set so the tuned model is evaluated on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter space
param_grid = {
    'learning_rate': [0.01, 0.1, 1],
    'n_estimators': [50, 100, 200],
    'max_depth': [1, 2, 3, 4],
}

# Define the model
model = GradientBoostingRegressor()

# Define the grid search object
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the grid search object to the training data only
grid_search.fit(X_train, y_train)

# Print the best hyperparameters and corresponding cross-validated mean squared error
print(f"Best hyperparameters: {grid_search.best_params_}")
print(f"Best CV MSE: {-grid_search.best_score_}")

# Evaluate the model on the test set using the best hyperparameters
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse}")
print(f"Test R^2: {r2}")
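
The question also mentions random search. The following is a minimal sketch of that alternative using scikit-learn's RandomizedSearchCV, which samples a fixed number of candidates from distributions instead of trying every grid point. The diabetes dataset, the sampling ranges, and the values of n_iter and cv are illustrative assumptions, not tuned recommendations.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Assumption: the bundled diabetes dataset serves as example data
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Distributions to sample from (illustrative ranges)
param_distributions = {
    "learning_rate": uniform(0.01, 0.29),  # uniform(loc, scale): values in [0.01, 0.30]
    "n_estimators": randint(50, 201),      # integers from 50 to 200
    "max_depth": randint(1, 5),            # integers from 1 to 4
}

# Try 10 random candidates, each scored with 3-fold cross-validation
random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(X_train, y_train)

print(f"Best hyperparameters: {random_search.best_params_}")
print(f"Best CV MSE: {-random_search.best_score_}")
print(f"Test R^2: {random_search.best_estimator_.score(X_test, y_test)}")
```

Random search tends to be the cheaper choice when the grid would be large, because its cost is set by n_iter rather than by the number of grid combinations.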


Q4. What is a weak learner in Gradient Boosting?

In Gradient Boosting, a weak learner is a simple machine learning model with low predictive power on its own, such as a shallow decision tree with only a few splits. The idea is to build an ensemble of such weak learners that, when combined, performs much better than any single one of them.
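
As a rough illustration, the sketch below compares one decision stump (a tree with a single split) against a boosted ensemble of 200 stumps. The synthetic dataset and all parameter values are assumptions chosen only for the demonstration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A single weak learner: a decision stump with one split
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_train, y_train)

# An ensemble of 200 stumps combined by gradient boosting
boosted = GradientBoostingRegressor(
    n_estimators=200, max_depth=1, learning_rate=0.1, random_state=0
)
boosted.fit(X_train, y_train)

stump_mse = mean_squared_error(y_test, stump.predict(X_test))
boosted_mse = mean_squared_error(y_test, boosted.predict(X_test))
print(f"Single stump test MSE:   {stump_mse:.1f}")
print(f"Boosted stumps test MSE: {boosted_mse:.1f}")
```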

Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind Gradient Boosting is to iteratively fit a sequence of weak learners to the data, each one correcting the mistakes of the previous one. The idea is to build an ensemble of models that can capture complex patterns in the data by combining the individual strengths of each model.

Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?

The Gradient Boosting algorithm builds an ensemble of weak learners by sequentially fitting each new model to the residuals of the current ensemble. At each iteration, the algorithm calculates the difference between the true target values and the ensemble's current predictions. This difference, known as the residual, becomes the target for the next model. The new model is trained on the residuals, and its predictions are added to those of the previous models, scaled by a learning rate parameter that controls the contribution of each model. This process is repeated until the desired number of models is reached or until the residuals stop shrinking meaningfully.
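
The loop described above can be sketched on a toy one-dimensional problem, recording the training error after each round to show the residuals shrinking. The noisy sine data, the tree depth, the learning rate, and the number of rounds are assumptions picked for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy sine curve (illustrative)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())       # round 0: a constant model
train_mse = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residuals = y - prediction               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # add the scaled correction
    train_mse.append(np.mean((y - prediction) ** 2))

print(f"Training MSE after round 0:  {train_mse[0]:.4f}")
print(f"Training MSE after round 50: {train_mse[-1]:.4f}")
```

The training error can only tell you that the residuals are shrinking; deciding when to stop in practice is usually done against held-out data.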

Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting
algorithm?

The steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm are:

1. Initialize the model with a constant value, usually the mean of the target variable (the constant that minimizes squared-error loss).

2. For each iteration:

a. Calculate the negative gradient of the loss function with respect to the current predictions. For squared-error loss, this is simply the residual.

b. Fit a weak learner, such as a decision tree, to the negative gradients (also called pseudo-residuals).

c. Multiply the predictions of the new model by a learning rate parameter and add them to the predictions of the previous models.

3. Repeat step 2 until the desired number of models is reached.

4. Make predictions by combining the predictions of all models in the ensemble.

5. Calculate the loss function on the predictions and the true targets to evaluate the performance of the model.
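
To make the negative-gradient step concrete, the sketch below checks numerically that for the loss 0.5 * (y - F)^2 the negative gradient with respect to the prediction F equals the residual y - F. It also shows the negative gradient for absolute-error loss, which is only the sign of the residual. The function names and the sample values are hypothetical, introduced here for illustration.

```python
import numpy as np

def squared_error_loss(y, f):
    # per-sample loss: 0.5 * (y - f)^2
    return 0.5 * (y - f) ** 2

def negative_gradient_squared_error(y, f):
    # -d/df [0.5 * (y - f)^2] = y - f, i.e. the ordinary residual
    return y - f

def negative_gradient_absolute_error(y, f):
    # -d/df |y - f| = sign(y - f)
    return np.sign(y - f)

y = np.array([3.0, -1.0, 2.5, 0.0])
f = np.full_like(y, y.mean())   # initial constant prediction: the mean of y

# Central finite difference as an independent check of the analytic gradient
eps = 1e-6
numeric = -(squared_error_loss(y, f + eps) - squared_error_loss(y, f - eps)) / (2 * eps)
analytic = negative_gradient_squared_error(y, f)

print("Analytic negative gradient:", analytic)
print("Numeric negative gradient: ", numeric)
print("Absolute-error version:    ", negative_gradient_absolute_error(y, f))
```

Swapping the loss only changes the targets the weak learner is fit to in step 2; the rest of the procedure stays the same.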