### Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is a machine learning technique that builds a predictive model as an ensemble of weak predictive models, typically shallow decision trees. It is used for regression problems, where the goal is to predict a continuous target variable. The ensemble is built sequentially: each new model is fit to the errors left by the models before it, so that the ensemble as a whole moves toward the minimum of a specified loss function.
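As a quick illustration of the definition, the sketch below fits scikit-learn's built-in `GradientBoostingRegressor` to a synthetic noisy sine curve. The dataset, the hyperparameter values, and the alias `SkGBR` are assumptions chosen for illustration; they are not part of the original exercise.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SkGBR
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 5, size=(200, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, size=200)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# An ensemble of shallow trees, added one after another
sk_model = SkGBR(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
sk_model.fit(Xd_train, yd_train)

demo_r2 = r2_score(yd_test, sk_model.predict(Xd_test))
print(f"Test R-squared: {demo_r2:.3f}")
```

The alias avoids a name clash with any hand-written class that is also called `GradientBoostingRegressor`.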

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Generate a small, noisy sine-wave dataset for regression
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gradient Boosting Regression class.
# BaseEstimator supplies get_params/set_params, which scikit-learn utilities
# such as GridSearchCV need in order to clone the model.
class GradientBoostingRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.models = []

    def fit(self, X, y):
        # Reset the ensemble so that refitting does not stack trees on old ones
        self.models = []
        # The running prediction of the ensemble starts at zero
        predictions = np.zeros_like(y, dtype=float)
        for _ in range(self.n_estimators):
            # For squared error, the negative gradient is the residual
            residuals = y - predictions
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            # Shrink each tree's contribution by the learning rate
            predictions += self.learning_rate * tree.predict(X)
            self.models.append(tree)
        return self

    def predict(self, X):
        # Sum the scaled predictions of all trees, starting from zero as in fit
        predictions = np.zeros(X.shape[0])
        for tree in self.models:
            predictions += self.learning_rate * tree.predict(X)
        return predictions

# Train the Gradient Boosting model
gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb_model.fit(X_train, y_train)

# Evaluate the model on the held-out test set
y_pred = gb_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
```

```
Mean Squared Error: 0.007864677608425885
R-squared: 0.9828172820205644
```


### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

```python
from sklearn.model_selection import GridSearchCV

# Create the Gradient Boosting regressor.
# Note: GridSearchCV clones the estimator, so the class must implement
# get_params/set_params (for example by inheriting from sklearn.base.BaseEstimator).
gb_model = GradientBoostingRegressor()

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.05, 0.1, 0.2],
    'max_depth': [2, 3, 4]
}

# Perform grid search with 5-fold cross-validation on the training set
grid_search = GridSearchCV(estimator=gb_model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5)
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_

# Train the model with the best parameters
best_gb_model = GradientBoostingRegressor(**best_params)
best_gb_model.fit(X_train, y_train)

# Evaluate the model on the held-out test set
y_pred_best = best_gb_model.predict(X_test)
mse_best = mean_squared_error(y_test, y_pred_best)
r2_best = r2_score(y_test, y_pred_best)

print(f"Best hyperparameters: {best_params}")
print(f"Best Mean Squared Error: {mse_best}")
print(f"Best R-squared: {r2_best}")
```

```
Best hyperparameters: {'learning_rate': 0.1, 'max_depth': 2, 'n_estimators': 50}
Best Mean Squared Error: 0.011806382097885416
Best R-squared: 0.97420546091704
```


### Q4. What is a weak learner in Gradient Boosting?

In the context of Gradient Boosting, a weak learner is a model that performs only slightly better than a trivial baseline. For regression, that baseline is typically a constant prediction such as the mean of the target, which plays the role that random guessing plays in classification. Decision trees with a shallow depth are the usual choice of weak learner. These weak learners are also often referred to as "base learners" or "base models."
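A minimal sketch of what "weak" means in practice: a decision stump (a tree with a single split) is compared against the trivial baseline of always predicting the mean. The synthetic dataset and all variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(1)
X_weak = rng.uniform(0, 5, size=(100, 1))
y_weak = np.sin(X_weak).ravel() + rng.normal(0, 0.1, size=100)

# Trivial baseline: always predict the mean of the targets
baseline_pred = np.full_like(y_weak, y_weak.mean())
baseline_mse = mean_squared_error(y_weak, baseline_pred)

# Weak learner: a decision stump (a tree with a single split)
stump = DecisionTreeRegressor(max_depth=1).fit(X_weak, y_weak)
stump_mse = mean_squared_error(y_weak, stump.predict(X_weak))

print(f"Baseline MSE: {baseline_mse:.3f}")
print(f"Stump MSE:    {stump_mse:.3f}")
```

The stump beats the baseline on the training data but remains far from a good fit, which is the kind of model boosting is designed to combine.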

### Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind the Gradient Boosting algorithm is to sequentially improve the performance of a model by focusing on the mistakes made by the previous models. It combines the predictions of weak learners to create a strong learner. At each iteration, a new weak learner is trained to correct the errors of the combined model. The learning process is guided by the gradient of the loss function with respect to the current predictions: each new learner approximates the negative gradient, so adding it moves the predictions in the direction that reduces the loss. For squared-error loss, the negative gradient is simply the residual (actual value minus predicted value), which is why the new learner is trained on the current errors.
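To make the link between gradients and residuals concrete, the sketch below checks it numerically for a half squared-error loss. The toy values and the helper `squared_error_loss` are assumptions introduced only for this illustration.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    # Half squared error, summed over samples
    return 0.5 * np.sum((y_true - y_pred) ** 2)

# Toy targets and current ensemble predictions (illustrative values)
y_true = np.array([3.0, -1.0, 2.5, 0.0])
current_pred = np.array([2.0, 0.5, 2.5, -1.0])

# Numerical gradient of the loss with respect to each prediction
eps = 1e-6
numeric_grad = np.zeros_like(current_pred)
for i in range(len(current_pred)):
    shifted_up = current_pred.copy()
    shifted_down = current_pred.copy()
    shifted_up[i] += eps
    shifted_down[i] -= eps
    numeric_grad[i] = (
        squared_error_loss(y_true, shifted_up) - squared_error_loss(y_true, shifted_down)
    ) / (2 * eps)

residuals = y_true - current_pred
print("Negative gradient:", -numeric_grad)
print("Residuals:        ", residuals)
```

The two printed vectors should agree up to floating-point error, so fitting a tree to the residuals amounts to fitting it to the negative gradient of this loss.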

### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

The Gradient Boosting algorithm builds an ensemble of weak learners through an iterative process. At each iteration, a new weak learner is trained on the residuals (the differences between the actual and predicted values) of the combined model from the previous iterations. The new learner's predictions are then scaled by the learning rate and added to the ensemble, and the process repeats. In the implementation above, every learner receives the same weight (the learning rate); some variants additionally choose a per-learner step size with a line search on the loss.
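The iterative process can be traced stage by stage. The sketch below, which assumes a synthetic dataset, a fixed learning rate of 0.1, and depth-2 trees, records the training error after each learner is added.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(2)
X_stage = rng.uniform(0, 5, size=(120, 1))
y_stage = np.sin(X_stage).ravel() + rng.normal(0, 0.1, size=120)

learning_rate = 0.1
ensemble = []                             # fitted weak learners, in order
ensemble_pred = np.zeros(len(y_stage))    # running prediction of the ensemble
train_mse_per_stage = []

for stage in range(50):
    # Each new tree is trained on what the current ensemble still gets wrong
    residuals = y_stage - ensemble_pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X_stage, residuals)
    ensemble.append(tree)
    # Add the tree's scaled contribution to the running prediction
    ensemble_pred += learning_rate * tree.predict(X_stage)
    train_mse_per_stage.append(np.mean((y_stage - ensemble_pred) ** 2))

print(f"Training MSE after stage 1:  {train_mse_per_stage[0]:.4f}")
print(f"Training MSE after stage 50: {train_mse_per_stage[-1]:.4f}")
```

On the training set the error does not increase from one stage to the next; test error is not guaranteed to behave the same way, which is why the number of stages is tuned as a hyperparameter.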

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

The mathematical intuition of the Gradient Boosting algorithm can be explained through the following steps:

1. Initialize the model: Start with an initial prediction, often the mean of the target variable (the from-scratch implementation above starts from zero instead).

2. Compute residuals: Calculate the residuals by subtracting the predicted values from the actual target values.

3. Train a weak learner: Train a weak learner (e.g., a decision tree) on the residuals. Because its targets are the residuals rather than the original values, the learner concentrates on the instances where the current model's errors are largest.

4. Compute the learning rate weighted update: Multiply the predictions of the weak learner by a small learning rate (a hyperparameter that controls the contribution of each weak learner). This scaled prediction is then added to the ensemble.

5. Update the residuals: Subtract the scaled predictions from the residuals to create new residuals.

6. Repeat: Repeat steps 3-5 for a specified number of iterations or until a convergence criterion is met.

7. Combine weak learners: The final model is the initial prediction plus the sum of the predictions of all the weak learners, each multiplied by the learning rate.

The algorithm aims to minimize a loss function (e.g., mean squared error) by iteratively adjusting the predictions using the gradients of the loss with respect to the predictions. For mean squared error, the negative gradient is proportional to the residual, which is why the steps above operate on residuals. The process results in a powerful ensemble model that can capture complex relationships in the data.
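The seven steps above can be sketched as two small functions. This is a minimal illustration for squared-error loss, assuming the mean as the initial prediction; the function names `fit_gradient_boosting` and `predict_gradient_boosting`, the default hyperparameters, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # Step 1: initialize with the mean of the target
    initial_prediction = float(np.mean(y))
    # Step 2: residuals of the initial model
    residuals = y - initial_prediction
    trees = []
    for _ in range(n_estimators):  # Step 6: repeat steps 3-5
        # Step 3: train a weak learner on the current residuals
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        # Step 4: scale the learner's predictions by the learning rate
        update = learning_rate * tree.predict(X)
        # Step 5: subtract the scaled predictions to get the new residuals
        residuals = residuals - update
        trees.append(tree)
    return initial_prediction, trees

def predict_gradient_boosting(X, initial_prediction, trees, learning_rate=0.1):
    # Step 7: initial prediction plus the scaled sum of all weak learners
    prediction = np.full(X.shape[0], initial_prediction)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(3)
X_steps = rng.uniform(0, 5, size=(150, 1))
y_steps = np.sin(X_steps).ravel() + rng.normal(0, 0.1, size=150)

init, fitted_trees = fit_gradient_boosting(X_steps, y_steps)
train_pred = predict_gradient_boosting(X_steps, init, fitted_trees)
steps_train_mse = np.mean((y_steps - train_pred) ** 2)
print(f"Training MSE: {steps_train_mse:.4f}")
```

The same `learning_rate` must be passed to both functions, since the trees are stored unscaled and the shrinkage is applied again at prediction time.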