In [1]:
#1. What is Gradient Boosting Regression?

#Ans

#Gradient Boosting Regression is a machine learning algorithm that applies gradient boosting to regression tasks. It builds a predictive model by iteratively adding weak learners (usually shallow decision trees) to an ensemble, with each new learner trained to correct the errors of the ensemble built so far. Formally, each new learner is fitted to the negative gradient of a loss function with respect to the current predictions, so the ensemble as a whole performs gradient descent on the loss.
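
#To make the role of the gradient concrete, here is a small numerical check (a sketch, not part of the original answer). It assumes the squared-error loss L = 0.5 * (y - pred)^2 and uses made-up values for y and pred; for that loss, the negative gradient with respect to the prediction is exactly the residual y - pred.

```python
import numpy as np

def squared_error_loss(y, pred):
    """Per-sample squared-error loss L = 0.5 * (y - pred)^2 (illustrative choice)."""
    return 0.5 * (y - pred) ** 2

# Made-up targets and current predictions, for illustration only
y = np.array([3.0, -1.0, 2.5])
pred = np.array([2.0, 0.5, 2.5])

# Central finite difference: derivative of the loss with respect to each prediction
eps = 1e-6
grad = (squared_error_loss(y, pred + eps) - squared_error_loss(y, pred - eps)) / (2 * eps)

negative_gradient = -grad
residuals = y - pred

print("Negative gradient:", negative_gradient)
print("Residuals:        ", residuals)
```

#This is why, for squared error, fitting each new tree to the residuals and fitting it to the negative gradient are the same thing.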

In [2]:
#2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

#Ans

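import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor  # weak learner used inside fit()

class GradientBoostingRegressor:
    def __init__(self, n_estimators, learning_rate):
        self.n_estimators = n_estimators    # number of boosting rounds (trees)
        self.learning_rate = learning_rate  # shrinkage applied to every tree
        self.models = []
        self.residuals = []

    def fit(self, X, y):
        self.models = []
        # This simple version starts from a zero prediction (not the mean of y),
        # so the first residuals are y itself. The float copy keeps the in-place
        # update below from modifying y or failing on integer targets.
        self.residuals = np.array(y, dtype=float)

        for _ in range(self.n_estimators):
            # Fit a shallow tree to whatever the ensemble has not explained yet
            tree = DecisionTreeRegressor(max_depth=3)
            tree.fit(X, self.residuals)
            self.models.append(tree)

            # Update residuals
            predictions = tree.predict(X)
            self.residuals -= self.learning_rate * predictions

        return self

    def predict(self, X):
        # Sum the shrunken predictions of all trees
        predictions = np.zeros(len(X))
        for model in self.models:
            predictions += self.learning_rate * model.predict(X)
        return predictions

# Example usage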

# Generate a small regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the gradient boosting regressor
n_estimators = 100
learning_rate = 0.1
gb_regressor = GradientBoostingRegressor(n_estimators=n_estimators, learning_rate=learning_rate)
gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gb_regressor.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 1.4726355328014227
R-squared: 0.9989452132122804


In [8]:
#3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

#Ans

import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.base import BaseEstimator

class GradientBoostingRegressor(BaseEstimator):
    def __init__(self, n_estimators, learning_rate, max_depth):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.models = []
        self.residuals = []

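    def fit(self, X, y):
        self.models = []
        # Float copy of y: the in-place update below then never modifies y
        # and also works when y has an integer dtype.
        self.residuals = np.array(y, dtype=float)

        for _ in range(self.n_estimators):
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, self.residuals)
            self.models.append(tree)

            # Update residuals
            predictions = tree.predict(X)
            self.residuals -= self.learning_rate * predictions

        # Return self, following the scikit-learn estimator convention
        return self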

    def predict(self, X):
        predictions = np.zeros(len(X))
        for model in self.models:
            predictions += self.learning_rate * model.predict(X)
        return predictions
    
    def score(self, X, y):
        predictions = self.predict(X)
        return r2_score(y, predictions)
    
    def get_params(self, deep=True):
        return {
            'n_estimators': self.n_estimators,
            'learning_rate': self.learning_rate,
            'max_depth': self.max_depth
        }

# Generate a small regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the parameter grid for grid search
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [3, 5, None]
}

# Create the gradient boosting regressor
gb_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)

# Perform grid search to find the best hyperparameters
grid_search = GridSearchCV(gb_regressor, param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Train a new model with the best hyperparameters
best_gb_regressor = GradientBoostingRegressor(**best_params)
best_gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = best_gb_regressor.predict(X_test)

# Evaluate the performance of the best model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (Best Model):", mse)
print("R-squared (Best Model):", r2)

Best Hyperparameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
Mean Squared Error (Best Model): 1.4726355328014227
R-squared (Best Model): 0.9989452132122804
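
#The question also mentions random search. The sketch below shows one way to do it with scikit-learn's RandomizedSearchCV. To stay self-contained it swaps in scikit-learn's own GradientBoostingRegressor (imported as SklearnGBR to avoid clashing with the class defined above) instead of the from-scratch class. The candidate values and n_iter=5 are illustrative assumptions, not tuned choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SklearnGBR
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Same synthetic dataset and split as in the cell above
X, y = make_regression(n_samples=100, n_features=1, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate values to sample from (assumed for illustration)
param_distributions = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [2, 3, 5],
}

# Try 5 randomly sampled combinations instead of all 27
random_search = RandomizedSearchCV(
    SklearnGBR(random_state=42),
    param_distributions,
    n_iter=5,
    cv=3,
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best Hyperparameters:", random_search.best_params_)
print("R-squared on test set:", random_search.score(X_test, y_test))
```

#Random search evaluates only n_iter combinations, so it is cheaper than a full grid but may miss the best combination in the grid.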


In [9]:
#4. What is a weak learner in Gradient Boosting?

#Ans

#In Gradient Boosting, a weak learner is a model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). It is typically a simple, low-complexity model, such as a decision tree with limited depth. Because it is so simple, a weak learner captures only the most obvious patterns in the data and therefore has high bias. However, when many weak learners are combined in an ensemble, each one correcting what the others missed, the result can be a powerful predictive model.
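
#As a rough illustration (a sketch on made-up data, not part of the original answer), the code below compares one depth-1 tree, often called a stump, with 50 boosted stumps on a noiseless sine curve. The dataset, the learning rate of 0.5 and the 50 rounds are assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

# One weak learner: a depth-1 tree can only output two distinct values
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_mse = np.mean((y - stump.predict(X)) ** 2)

# Fifty stumps, each fitted to what the previous ones left unexplained
learning_rate = 0.5
prediction = np.zeros(len(y))
for _ in range(50):
    weak = DecisionTreeRegressor(max_depth=1).fit(X, y - prediction)
    prediction += learning_rate * weak.predict(X)
ensemble_mse = np.mean((y - prediction) ** 2)

print("Distinct outputs of one stump:", len(np.unique(stump.predict(X))))
print("Training MSE, single stump:   ", stump_mse)
print("Training MSE, 50 boosted stumps:", ensemble_mse)
```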

In [10]:
#5. What is the intuition behind the Gradient Boosting algorithm?

#Ans

#The intuition behind the Gradient Boosting algorithm is to add models (weak learners) to an ensemble one at a time, so that each new model focuses on correcting the mistakes made by the models before it. At each iteration, the algorithm trains a new weak learner to predict the residuals (the differences between the target values and the current ensemble's predictions). By repeatedly fitting models to the residuals, the ensemble gradually improves its ability to capture complex patterns and reduces the overall error. The final prediction is the sum of all weak learners' predictions, each scaled by the learning rate, which controls how much any single model contributes.
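
#The gradual improvement can be observed directly. The sketch below (made-up data; depth-2 trees, 30 rounds and a learning rate of 0.1 are assumptions) records the training MSE after every round of fitting a tree to the residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy parabola (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full(len(y), y.mean())        # start from the mean of the targets
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(30):
    residuals = y - prediction                # mistakes of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

for round_number in (0, 1, 5, 10, 30):
    print(f"after {round_number:2d} trees: training MSE = {mse_history[round_number]:.4f}")
```

#On the training data the error never increases from one round to the next, because each tree is a least-squares fit to the current residuals and only a fraction of it is added.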

In [11]:
#6. How does Gradient Boosting algorithm build an ensemble of weak learners?

#Ans

#The Gradient Boosting algorithm builds an ensemble of weak learners in the following steps:

#1 - Initialize the ensemble by setting the initial predictions to the average of the target values.

#2 - For a fixed number of iterations (or until a stopping criterion is met):
#a. Compute the residuals by subtracting the current ensemble's predictions from the target values.
#b. Train a weak learner (e.g., decision tree) to predict the residuals.
#c. Update the ensemble by adding the weak learner's predictions, weighted by a learning rate.

#3 - Stop when the fixed number of iterations is reached or the stopping criterion is met. Each pass through steps 2a to 2c refines the ensemble's predictions by focusing on the remaining errors.

#4 - The final ensemble's prediction is the initial average plus the sum of all weak learners' predictions, each scaled by the learning rate.
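
#The steps above can be sketched as two small functions. This is a minimal illustration under stated assumptions, not a reference implementation: fit_ensemble and predict_ensemble are hypothetical helper names, squared error is assumed, and the default hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def fit_ensemble(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Hypothetical helper: build the ensemble following steps 1 to 2c."""
    initial = float(np.mean(y))                  # step 1: start from the average
    prediction = np.full(len(y), initial)
    trees = []
    for _ in range(n_estimators):                # step 2: fixed number of iterations
        residuals = y - prediction               # 2a: current errors
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)  # 2b
        prediction += learning_rate * tree.predict(X)                        # 2c
        trees.append(tree)
    return {"initial": initial, "trees": trees, "learning_rate": learning_rate}

def predict_ensemble(model, X):
    """Step 4: initial average plus the scaled sum of all trees."""
    prediction = np.full(len(X), model["initial"])
    for tree in model["trees"]:
        prediction += model["learning_rate"] * tree.predict(X)
    return prediction

# Usage on the same kind of synthetic data as in the earlier cells
X, y = make_regression(n_samples=100, n_features=1, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = fit_ensemble(X_train, y_train)
r2 = r2_score(y_test, predict_ensemble(model, X_test))
print("R-squared on test set:", r2)
```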

In [12]:
#7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

#Ans

#The steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm are as follows:

#1 - Given a dataset with input features X and target values y, initialize the ensemble's predictions with the constant that minimizes the loss. For squared error, this constant is the average of y.

#2 - For each iteration (t = 1 to T, where T is the total number of iterations):
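#a. Compute the negative gradient of the loss function with respect to the current ensemble's predictions. These values are called pseudo-residuals; for squared error they equal the ordinary residuals (y minus the current predictions), which are the errors to be corrected in the next iteration.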
#b. Train a weak learner (e.g., decision tree) to predict the negative gradient, using X as input and the negative gradient as the target.
#c. Determine the step size for the weak learner's contribution, either by a line search that minimizes the loss along the new learner's predictions or by using a fixed step size. The learning rate (η) then shrinks this contribution.
#d. Update the ensemble's predictions by adding the weak learner's predictions, scaled by the learning rate. The ensemble's predictions are now improved and reflect the corrected errors.

#3 - Repeat steps 2a to 2d until the desired number of iterations is reached.

#4 - The final ensemble is the sum of the initial predictions and the weighted contributions of all weak learners, forming a more accurate prediction model than any of the weak learners individually.
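
#To show that the steps do not depend on squared error, the sketch below runs them with the absolute-error loss L = |y - F|, chosen here only as an example. For this loss the minimizing constant in step 1 is the median of y and the negative gradient in step 2a is the sign of the residual. A fixed step size is assumed in step 2c (no line search), and the data and hyperparameters are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def negative_gradient_absolute_error(y, prediction):
    """-dL/dF for L = |y - F|: the sign of the residual (+1, 0 or -1)."""
    return np.sign(y - prediction)

# Synthetic data: a noisy line (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=200)

learning_rate = 0.1                                   # fixed step size, scaled by eta
prediction = np.full(len(y), np.median(y))            # step 1: loss-minimizing constant
initial_mae = np.mean(np.abs(y - prediction))

for _ in range(200):                                  # step 2: t = 1 to T
    pseudo_residuals = negative_gradient_absolute_error(y, prediction)    # 2a
    tree = DecisionTreeRegressor(max_depth=2).fit(X, pseudo_residuals)    # 2b
    prediction += learning_rate * tree.predict(X)                         # 2c and 2d

final_mae = np.mean(np.abs(y - prediction))
print("Training MAE with the constant only:", initial_mae)
print("Training MAE after boosting:        ", final_mae)
```

#Note that the pseudo-residuals here carry only the direction of each error, not its size, which is what makes this loss less sensitive to outliers than squared error.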