#### `Q1`. What is Gradient Boosting Regression?


Gradient Boosting Regression is a machine learning algorithm that builds an ensemble of decision trees to predict a numerical target variable. It adds trees to the model one at a time, each one fit to the errors left by the trees before it. More precisely, each new tree is fit to the negative gradient of the loss function with respect to the current predictions (for squared-error loss this is simply the residual), so training amounts to gradient descent in the space of predictions. The final model is an initial constant prediction plus the sum of the individual trees' outputs, each scaled by the learning rate. The result is a powerful predictive model that can handle complex non-linear relationships between the input features and the target variable.
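
The description above can be summarised as an additive model. The notation below ($F_0$, $h_m$, $\nu$, $M$) is introduced here for illustration only and assumes a single constant learning rate:

$$
F_M(x) = F_0 + \nu \sum_{m=1}^{M} h_m(x)
$$

Here $F_0$ is the initial constant prediction (for example the mean of the target), $h_m$ is the $m$-th tree, $\nu$ is the learning rate, and $M$ is the number of trees.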

#### `Q2`. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.


In [1]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

import warnings
warnings.filterwarnings('ignore')

class GradientBoostingRegressor:
    """Minimal gradient boosting for squared-error loss."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.init_prediction = None

    def fit(self, X, y):
        # Start from a constant model: the mean of the training targets
        self.init_prediction = np.mean(y)
        self.trees = []
        # Residuals of the current ensemble (the negative gradient of squared error)
        residuals = y - self.init_prediction
        for _ in range(self.n_estimators):
            # Fit a shallow tree to the current residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            # Shrink the tree's contribution and update the residuals
            residuals = residuals - self.learning_rate * tree.predict(X)
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Sum the corrections predicted by all trees
        correction = np.sum([tree.predict(X) for tree in self.trees], axis=0)
        # Add the shrunken corrections to the mean stored during fit
        return self.init_prediction + self.learning_rate * correction

# Generate some random data
X = np.random.rand(100, 3)
y = np.sum(X, axis=1) + np.random.normal(scale=0.1, size=100)

# Split the data into training and testing sets
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Train the gradient boosting regressor on the training set
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbr.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = gbr.predict(X_test)

# Evaluate the model using mean squared error and R-squared
mse = np.mean((y_pred - y_test)**2)
r2 = 1 - mse / np.var(y_test)

print("Mean squared error: {:.3f}".format(mse))
print("R-squared: {:.3f}".format(r2))

Mean squared error: 0.011
R-squared: 0.949


#### `Q3`. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.


In [2]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor

In [3]:
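cancer = load_breast_cancer()
# Note: the target in this dataset is a binary class label (0 or 1),
# so the regressor below is fit to class labels
X, y = cancer.data, cancer.target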

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [3, 4, 5]
}

In [6]:
gbr = GradientBoostingRegressor(random_state=42)

grid_search = GridSearchCV(gbr, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid_search.fit(X_train, y_train)

In [7]:
print("Best hyperparameters: ", grid_search.best_params_)

Best hyperparameters:  {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50}


In [8]:
y_pred = grid_search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error: {:.3f}".format(mse))
print("R-squared: {:.3f}".format(r2))

Mean squared error: 0.031
R-squared: 0.866
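
The question also mentions random search. A minimal sketch of that alternative with scikit-learn's `RandomizedSearchCV` follows. The synthetic dataset, the sampling ranges, and the names `X_syn`, `y_syn`, and `param_distributions` are assumptions chosen for illustration, not part of the experiment above.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data, used only so the sketch runs quickly
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# Distributions to sample from (the ranges are illustrative assumptions)
param_distributions = {
    "n_estimators": randint(50, 151),      # integers in [50, 150]
    "learning_rate": uniform(0.01, 0.49),  # floats in [0.01, 0.50]
    "max_depth": randint(2, 6),            # integers in [2, 5]
}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=8,  # number of sampled configurations
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(X_tr, y_tr)

best_params = random_search.best_params_
test_r2 = r2_score(y_te, random_search.predict(X_te))
print("Best hyperparameters:", best_params)
print("R-squared: {:.3f}".format(test_r2))
```

Unlike grid search, random search evaluates a fixed number of sampled configurations (`n_iter`), so its cost does not grow with the number of hyperparameters being tuned.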


#### `Q4`. What is a weak learner in Gradient Boosting?


In Gradient Boosting, a weak learner is a model with limited predictive power on its own, typically a shallow decision tree with only a few splits. The goal of the Gradient Boosting algorithm is to iteratively improve upon the weak learner by adding additional trees that correct the errors of the previous trees. The final model is the sum of the individual trees' predictions, each scaled by the learning rate. Because each new tree is fit to the current errors, the algorithm concentrates on the parts of the data that are still predicted poorly, and keeping each tree small limits how much any single step can overfit the training data.
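
To make the idea concrete, the sketch below compares a single depth-1 tree (a "stump") with a boosted ensemble of such stumps on a toy sine curve. The data and the names `X_demo`, `y_demo`, `stump`, and `boosted` are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

# Toy one-dimensional data: a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 6, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=300)

# A single stump: one split, so at most two distinct output values
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_demo, y_demo)

# 200 stumps combined by gradient boosting
boosted = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=1, random_state=0
).fit(X_demo, y_demo)

# R-squared on the training data, to show what each model can represent
stump_r2 = stump.score(X_demo, y_demo)
boosted_r2 = boosted.score(X_demo, y_demo)
print("Single stump R-squared:   {:.3f}".format(stump_r2))
print("Boosted stumps R-squared: {:.3f}".format(boosted_r2))
```

The scores here are computed on the training data, so they show how much more the ensemble can represent than a single stump rather than how well it generalizes.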

#### `Q5`. What is the intuition behind the Gradient Boosting algorithm?


The intuition behind the Gradient Boosting algorithm is to create a strong predictive model by iteratively adding simple, weak models to the ensemble, with each new model trained to correct the errors of the previous ones.

At each step, the algorithm calculates the gradient of the loss function with respect to the current predicted values and fits a new weak learner to the negative of this gradient. The new learner's output, scaled by the learning rate, is then added to the ensemble, which moves the predictions a small step in the direction that reduces the loss.
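
For squared-error loss, the negative gradient with respect to the predictions is exactly the residual, which is why "fitting the gradient" and "fitting the errors" describe the same step. The sketch below checks this numerically with NumPy; the toy arrays and the helper `squared_error_loss` are assumptions introduced for illustration.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    # The 1/2 factor is a convention that removes the factor of 2 from the gradient
    return 0.5 * np.sum((y_true - y_pred) ** 2)

y_true = np.array([3.0, -1.0, 2.5, 0.0])
y_pred = np.array([2.0, 0.5, 2.5, -1.0])

# Numerical gradient of the loss with respect to each prediction (central differences)
eps = 1e-6
grad = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    step = np.zeros_like(y_pred)
    step[i] = eps
    grad[i] = (
        squared_error_loss(y_true, y_pred + step)
        - squared_error_loss(y_true, y_pred - step)
    ) / (2 * eps)

residuals = y_true - y_pred
print("Negative gradient:", -grad)
print("Residuals:        ", residuals)
```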

This process continues until a stopping criterion is met, such as when the error on the validation set stops improving or when a maximum number of iterations is reached. The result is a powerful and flexible machine learning model that can handle complex relationships between the input features and the target variable, and that can generalize well to new data.
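
One way to apply the validation-based stopping criterion is to train a generous number of trees and then check the validation error after each one. The sketch below does this with scikit-learn's `staged_predict`; the synthetic data and the names `X_fit`, `X_val`, `val_errors`, and `best_n_trees` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data split into a fitting set and a validation set
X_syn, y_syn = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_syn, y_syn, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X_fit, y_fit)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators trees
val_errors = [mean_squared_error(y_val, pred) for pred in model.staged_predict(X_val)]

# Number of trees with the lowest validation error
best_n_trees = int(np.argmin(val_errors)) + 1
print("Best number of trees:", best_n_trees)
print("Validation MSE at that point: {:.3f}".format(val_errors[best_n_trees - 1]))
```

scikit-learn's `GradientBoostingRegressor` also accepts `n_iter_no_change` and `validation_fraction`, which stop training automatically instead of inspecting the stages afterwards.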

#### `Q6`. How does Gradient Boosting algorithm build an ensemble of weak learners?


The Gradient Boosting algorithm builds an ensemble of weak learners in a sequential manner. At each step of the algorithm, a new weak learner is added to the ensemble and its predictions are combined with the predictions of the previous weak learners to improve the overall prediction accuracy.

The algorithm works as follows:

1. Initialize the ensemble with a constant value, such as the mean or median of the target variable.

2. Calculate the residuals between the actual values and the current ensemble's predictions. These residuals represent the errors made by the current ensemble.

3. Train a new weak learner, such as a shallow decision tree, on the residuals, so that it learns to correct the errors made by the current ensemble.

4. Add the new learner's predictions to the ensemble's predictions, scaled by a weighting factor (the learning rate) that controls how much each learner contributes to the final prediction.

5. Repeat steps 2 to 4 for a predetermined number of iterations or until a certain level of accuracy is achieved.
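
The loop described above can be traced on a toy problem by recording the training error after each added tree. This is a minimal sketch assuming squared-error loss and a fixed learning rate; the data and the names `X_toy`, `y_toy`, and `train_mse` are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_toy = rng.uniform(size=(100, 3))
y_toy = X_toy.sum(axis=1) + rng.normal(scale=0.1, size=100)

learning_rate = 0.1

# Constant initial model: predict the mean of the target everywhere
prediction = np.full_like(y_toy, y_toy.mean())
train_mse = [np.mean((y_toy - prediction) ** 2)]

for _ in range(20):
    # Errors of the current ensemble
    residuals = y_toy - prediction
    # New weak learner trained to predict those errors
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_toy, residuals)
    # Add its shrunken predictions to the ensemble's predictions
    prediction = prediction + learning_rate * tree.predict(X_toy)
    train_mse.append(np.mean((y_toy - prediction) ** 2))

print("Training MSE after 0, 5, 10, 20 trees:",
      ["{:.4f}".format(train_mse[k]) for k in (0, 5, 10, 20)])
```

With squared-error loss and a learning rate between 0 and 2, each added tree cannot increase the training error, so `train_mse` is non-increasing; the validation error, by contrast, can start to rise once the ensemble overfits.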

#### `Q7`. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

The mathematical intuition behind the Gradient Boosting algorithm can be broken down into the following steps:

1. Define a differentiable loss function that measures the difference between the predicted values and the true values of the target variable.

2. Initialize the model with a constant prediction that minimizes the loss, such as the mean of the target for squared-error loss.

3. Calculate the negative gradient of the loss function with respect to the current predicted values, which represents the direction and magnitude of the correction each sample needs.

4. Fit a new weak learner, such as a decision tree with few levels, to the negative gradient, so that it corrects the errors made by the previous model.

5. Choose the step size for the new learner, either as a fixed learning rate or by a line search that minimizes the loss function. This is the gradient descent step.

6. Add the scaled learner to the ensemble, so that the model is the initial prediction plus a weighted sum of the individual trees' predictions.

7. Repeat steps 3-6 for a fixed number of iterations or until the error on the validation set stops improving.
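
In symbols, the steps above can be written as follows. The notation ($L$, $F_m$, $r_{im}$, $h_m$, $\nu$) is an assumption introduced here for illustration, and the update shown uses a fixed learning rate rather than a line search:

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}} \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \left( r_{im} - h(x_i) \right)^2 \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x)
\end{aligned}
$$

For the squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the pseudo-residual reduces to $r_{im} = y_i - F_{m-1}(x_i)$, the ordinary residual.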