## Q1. What is Gradient Boosting Regression?


Gradient Boosting Regression is a popular machine learning algorithm for regression problems. It applies the general gradient boosting framework, which also covers classification, to continuous targets. In Gradient Boosting Regression, an ensemble of weak regression models is built sequentially, where each model is trained on the residuals of the ensemble built so far. The final prediction is obtained by summing the predictions of all the models in the ensemble, typically with each one scaled by a learning rate.
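
For comparison with a hand-written version, scikit-learn ships its own implementation as `sklearn.ensemble.GradientBoostingRegressor`. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic dataset, the fixed seeds, and the alias `SklearnGBR` (chosen to avoid clashing with any class of the same name defined elsewhere) are illustrative choices, not part of the original exercise.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SklearnGBR
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; the seeds are fixed only to make the run repeatable.
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 100 shallow trees, each added to the ensemble with a learning rate of 0.1.
reference_model = SklearnGBR(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
reference_model.fit(X_train, y_train)

y_pred = reference_model.predict(X_test)
train_r2 = reference_model.score(X_train, y_train)  # score() returns R^2 for regressors
test_mse = mean_squared_error(y_test, y_pred)
test_r2 = r2_score(y_test, y_pred)

print('Train R^2:', train_r2)
print('Test MSE:', test_mse)
print('Test R^2:', test_r2)
```

The gap between the training and test scores gives a rough indication of how much the ensemble overfits this small dataset.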

## Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.


In [2]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

class GradientBoostingRegressor:
    
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.estimators = []
        
    def fit(self, X, y):
        # Start from a fresh ensemble so that calling fit twice does not stack trees.
        self.estimators = []
        # The model starts from a zero prediction, so the first tree fits y itself.
        residuals = y
        for i in range(self.n_estimators):
            # Fit a shallow tree to whatever the ensemble has not explained yet.
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            predictions = tree.predict(X)
            # Remove the shrunken contribution of this tree from the residuals.
            residuals = residuals - self.learning_rate * predictions
            self.estimators.append(tree)
        return self
            
    def predict(self, X):
        # Sum the shrunken predictions of every tree in the ensemble.
        predictions = np.zeros(len(X))
        for tree in self.estimators:
            predictions += self.learning_rate * tree.predict(X)
        return predictions

# Example usage
# No random_state is set, so the data, the split, and the metrics differ on every run.
X, y = make_regression(n_samples=100, n_features=10, noise=0.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print('MSE:', mse)
print('R^2:', r2)


MSE: 10909.328331339491
R^2: 0.6476776030863363


## Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.


In [None]:
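from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV

# GridSearchCV clones and scores its estimator, so the estimator needs
# get_params/set_params (from BaseEstimator) and score (from RegressorMixin).
# The from-scratch class defines neither, so wrap it in a thin subclass.
# The subclass name is introduced here purely for this purpose.
class TunableGradientBoostingRegressor(RegressorMixin, BaseEstimator, GradientBoostingRegressor):
    pass

param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.05, 0.1, 0.2],
    'max_depth': [2, 3, 4]
}

model = TunableGradientBoostingRegressor()

# scoring is left unset, so candidates are ranked by RegressorMixin.score (R^2).
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
print('Best params:', best_params)

# Refit the from-scratch model with the best settings and evaluate on the held-out data.
model = GradientBoostingRegressor(**best_params)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print('MSE:', mse)
print('R^2:', r2)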
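
The question also mentions random search. The following is a minimal sketch of that alternative, assuming scikit-learn's built-in `GradientBoostingRegressor` (imported under the illustrative alias `SklearnGBR`) so that the block runs on its own; the dataset, seeds, `n_iter=5`, and `cv=3` are arbitrary choices made for a quick run, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SklearnGBR
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Illustrative data with fixed seeds so the search is repeatable.
X_rs, y_rs = make_regression(n_samples=100, n_features=10, noise=0.5, random_state=0)
X_rs_train, X_rs_test, y_rs_train, y_rs_test = train_test_split(
    X_rs, y_rs, test_size=0.2, random_state=0
)

# Same candidate values as the grid; random search samples 5 of the 27 combinations.
param_distributions = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.05, 0.1, 0.2],
    'max_depth': [2, 3, 4],
}

random_search = RandomizedSearchCV(
    SklearnGBR(random_state=0),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring='neg_mean_squared_error',
    random_state=0,
)
random_search.fit(X_rs_train, y_rs_train)

print('Best params:', random_search.best_params_)
# best_estimator_ is refit on the full training split; score() returns R^2.
print('Test R^2:', random_search.best_estimator_.score(X_rs_test, y_rs_test))
```

Random search trades exhaustiveness for speed: it evaluates a fixed number of sampled combinations, which matters more as the number of hyperparameters grows.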

## Q4. What is a weak learner in Gradient Boosting?


A weak learner in Gradient Boosting is a simple model that performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean of the targets in regression. In Gradient Boosting, the weak learners are typically decision trees with a small depth. These weak learners are trained sequentially, with each subsequent learner trying to correct the errors of the previous ones. The final prediction is obtained by summing the predictions of all the weak learners in the ensemble, each weighted by a scalar value (the learning rate).
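
To make "weak" concrete, the sketch below compares a depth-1 tree (a stump) against the mean-only baseline and against a fully grown tree, using training error only. The sine-shaped toy data and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: a noisy sine curve on a single feature.
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

# Trivial baseline: always predict the mean of the targets.
baseline_mse = mean_squared_error(y_toy, np.full_like(y_toy, y_toy.mean()))

# Weak learner: a stump makes a single split, so it has only two leaves.
stump = DecisionTreeRegressor(max_depth=1).fit(X_toy, y_toy)
stump_mse = mean_squared_error(y_toy, stump.predict(X_toy))

# Strong (and overfit) learner: an unrestricted tree memorizes the training set.
full_tree = DecisionTreeRegressor().fit(X_toy, y_toy)
full_tree_mse = mean_squared_error(y_toy, full_tree.predict(X_toy))

print('Baseline MSE: ', baseline_mse)
print('Stump MSE:    ', stump_mse)
print('Full tree MSE:', full_tree_mse)
```

The stump improves on the baseline but remains far from the fully grown tree, which is the kind of modest, low-variance learner that boosting combines.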

## Q5. What is the intuition behind the Gradient Boosting algorithm?


The intuition behind the Gradient Boosting algorithm is to create an ensemble of weak learners that can together make strong predictions. Each weak learner is trained on the residual errors left by the previous ones, where the residual errors are the differences between the true targets and the predictions of the ensemble built so far. Every new weak learner therefore corrects part of the remaining error, and the residuals shrink iteratively. The final prediction is the sum of the predictions of all the weak learners in the ensemble.
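
The shrinking of the residuals can be observed directly. The sketch below records the training MSE after each boosting round; the toy data, the depth-2 trees, the 50 rounds, and the learning rate of 0.1 are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: a noisy sine curve on a single feature.
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
# Start from the mean of the targets, then let each tree correct what is left.
ensemble_prediction = np.full(len(y_toy), y_toy.mean())
training_mse = [np.mean((y_toy - ensemble_prediction) ** 2)]

for _ in range(50):
    residuals = y_toy - ensemble_prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X_toy, residuals)
    ensemble_prediction += learning_rate * tree.predict(X_toy)
    training_mse.append(np.mean((y_toy - ensemble_prediction) ** 2))

for round_index in (0, 1, 5, 10, 25, 50):
    print(f'after {round_index:2d} trees: training MSE = {training_mse[round_index]:.4f}')
```

On the training set the error never increases from one round to the next, because each tree predicts the mean residual of every leaf and only a fraction of that correction is applied. Error on unseen data can behave differently.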

## Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?


The Gradient Boosting algorithm builds an ensemble of weak learners in the following steps:

1. Initialize the ensemble with a weak learner, such as a decision tree with a small depth.
2. Train the weak learner on the training data and make predictions.
3. Calculate the residual errors as the differences between the true targets and the predictions of the current ensemble (after the first round, these are the first weak learner's predictions).
4. Train a new weak learner on the residual errors and make predictions.
5. Add the new weak learner's predictions, weighted by a scalar value (the learning rate), to the predictions of the ensemble.
6. Repeat steps 3 to 5 until the desired number of weak learners is reached.
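
The first pass through these steps can be written out without a loop. The following is a sketch with exactly two stumps, assuming a toy linear dataset and a hypothetical scalar weight of 0.5 for the second learner; none of these values come from the text above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: a noisy linear trend on a single feature.
rng = np.random.default_rng(1)
X_toy = rng.uniform(0, 10, size=(100, 1))
y_toy = 2.0 * X_toy[:, 0] + rng.normal(scale=1.0, size=100)

weight = 0.5  # hypothetical scalar weight applied to the second learner

# Steps 1-2: initialize the ensemble with one weak learner and predict.
first_learner = DecisionTreeRegressor(max_depth=1).fit(X_toy, y_toy)
first_prediction = first_learner.predict(X_toy)

# Step 3: residuals are what the current ensemble fails to explain.
residuals = y_toy - first_prediction

# Step 4: train a second weak learner on those residuals.
second_learner = DecisionTreeRegressor(max_depth=1).fit(X_toy, residuals)

# Step 5: combine the two learners by adding the weighted correction.
combined_prediction = first_prediction + weight * second_learner.predict(X_toy)

mse_one = np.mean((y_toy - first_prediction) ** 2)
mse_two = np.mean((y_toy - combined_prediction) ** 2)
print('MSE with one learner: ', mse_one)
print('MSE with two learners:', mse_two)
```

Step 6 amounts to wrapping steps 3 to 5 in a loop, recomputing the residuals from `combined_prediction` each time.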

## Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

The mathematical intuition of the Gradient Boosting algorithm can be constructed in the following steps:

1. Let X be the input features, y be the true labels, and F be the model we want to learn.
2. Initialize the model with a constant value, such as the mean of y.
3. Compute the negative gradient of the loss function with respect to the current model's predictions. For the squared-error loss, this negative gradient equals the residual errors.
4. Train a weak learner, such as a decision tree, to fit the residual errors.
5. Add the weak learner to the model by multiplying its predictions by a small scalar value, called the learning rate, and adding them to the current model.
6. Repeat steps 3 to 5 until the desired number of weak learners is reached.
7. The final model is the initial constant plus the sum of all the weak learners, each one weighted by the learning rate.
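
For the squared-error loss, the steps can be written out explicitly. The following is a sketch of the derivation, assuming the loss is scaled by 1/2 (a common convention that only simplifies the gradient) and introducing notation not used above: $\nu$ for the learning rate, $h_m$ for the $m$-th weak learner, $M$ for the number of weak learners, and $n$ for the number of training examples.

```latex
\begin{align}
F_0(x) &= \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(step 2: constant initial model)} \\
L\bigl(y, F(x)\bigr) &= \tfrac{1}{2}\bigl(y - F(x)\bigr)^2
  && \text{(assumed loss)} \\
r_{i,m} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}
         = y_i - F_{m-1}(x_i)
  && \text{(step 3: negative gradient = residual)} \\
h_m &\approx \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{i,m} - h(x_i)\bigr)^2
  && \text{(step 4: fit a weak learner to the residuals)} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
  && \text{(step 5: shrunken update)} \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
  && \text{(step 7: final model)}
\end{align}
```

With a different loss, only the third line changes: the residual $r_{i,m}$ is replaced by the negative gradient of that loss.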


The steps above describe Gradient Boosting for regression. Classification follows the same steps with two modifications: a different loss function (for example, log loss), and a final transformation (such as the sigmoid or softmax function) that turns the summed outputs of the weak learners into class probabilities.
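
To illustrate the classification variant, the sketch below boosts regression trees on the negative gradient of the log loss for a binary problem. The toy data, the learning rate of 0.5, the 50 rounds, and the helper names `sigmoid` and `log_loss` are assumptions made for illustration. This simplified version fits trees directly to the negative gradients and skips the per-leaf value refinement that full implementations typically apply.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    """Map raw scores (log-odds) to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, raw_score):
    """Mean binary log loss, computed from raw scores."""
    p = np.clip(sigmoid(raw_score), 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical toy data: two features, classes separated by a diagonal line.
rng = np.random.default_rng(0)
X_cls = rng.normal(size=(200, 2))
y_cls = (X_cls[:, 0] + X_cls[:, 1] > 0).astype(float)

learning_rate = 0.5
# Initialize with a constant: the log-odds of the positive class.
positive_rate = y_cls.mean()
raw_score = np.full(len(y_cls), np.log(positive_rate / (1 - positive_rate)))
initial_loss = log_loss(y_cls, raw_score)

for _ in range(50):
    # Negative gradient of the log loss with respect to the raw score.
    pseudo_residuals = y_cls - sigmoid(raw_score)
    tree = DecisionTreeRegressor(max_depth=2).fit(X_cls, pseudo_residuals)
    raw_score += learning_rate * tree.predict(X_cls)

final_loss = log_loss(y_cls, raw_score)
train_accuracy = np.mean((sigmoid(raw_score) > 0.5) == (y_cls == 1))
print('Initial log loss: ', initial_loss)
print('Final log loss:   ', final_loss)
print('Training accuracy:', train_accuracy)
```

The weak learners are still regression trees; only the quantity they fit and the final sigmoid differ from the regression case.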