**Q1. What is Gradient Boosting Regression?**

Gradient Boosting Regression is a machine learning technique that builds an ensemble of weak regression models, typically shallow decision trees, one after another. Whereas AdaBoost-style boosting reweights the training samples so that later models concentrate on the points earlier models got wrong, gradient boosting fits each new model to the negative gradient of the loss with respect to the current ensemble's predictions. For squared-error loss this negative gradient is simply the residual, y - F(x). Each new model's predictions, scaled by a learning rate, are added to the ensemble, so every iteration reduces the errors left by the models before it.
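For squared-error loss, the fitted model can be written compactly. The notation below is introduced here for illustration and mirrors the code in Q2: a constant base value $\bar{y}$ (the mean of the targets), a learning rate $\nu$, and a tree $h_m$ fit to the current residuals $r_i^{(m)}$.

$$
F_M(x) = \bar{y} + \nu \sum_{m=1}^{M} h_m(x), \qquad
h_m \approx \arg\min_{h} \sum_{i=1}^{n} \bigl(r_i^{(m)} - h(x_i)\bigr)^2, \qquad
r_i^{(m)} = y_i - F_{m-1}(x_i), \qquad F_0(x) = \bar{y}
$$

The "approximately" reflects that a tree is grown greedily, so it only approximates the least-squares minimizer over all trees of the chosen depth.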

---

**Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.**


In [6]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 3, 4, 5, 6])

class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.estimators = []

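    def fit(self, X, y):
        # Base prediction: the mean of y minimizes squared error for a constant model
        self.init_prediction = np.mean(y)
        predictions = np.full_like(y, self.init_prediction, dtype=np.float64)
        self.estimators = []  # reset so that calling fit again does not reuse old trees

        for _ in range(self.n_estimators):
            # Calculate residuals (the negative gradient of the squared-error loss)
            residuals = y - predictions

            # Fit a shallow tree to what the ensemble still gets wrong
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)

            # Take a small step toward the residuals, scaled by the learning rate
            predictions += self.learning_rate * tree.predict(X)

            self.estimators.append(tree)

        return self

    def predict(self, X):
        # Start from the same base value used in fit, then add every scaled tree
        predictions = np.full(len(X), self.init_prediction, dtype=np.float64)

        for tree in self.estimators:
            predictions += self.learning_rate * tree.predict(X)

        return predictions


gb_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb_regressor.fit(X, y)

y_pred = gb_regressor.predict(X)

# Evaluate on the training data
mse = np.mean((y - y_pred) ** 2)
r_squared = 1 - (np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2))

print("Mean Squared Error:", mse)
print("R-squared:", r_squared)


Because `predict` starts from the same base value (the mean, 4) as `fit`, the training MSE should come out at roughly 1.4e-09 and R-squared at roughly 0.9999999993. Each depth-3 tree fits the five residuals exactly, so every round shrinks them by a factor of (1 - 0.1); after 100 rounds they are about 0.9^100 ≈ 2.7e-05 times their starting values. These are training-set scores on five points, so they measure fit, not generalization.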


---

**Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimize the performance of the model. Use grid search or random search to find the best hyperparameters.**

In [7]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 3, 4, 5, 6])

# Mixins go to the left of BaseEstimator, as scikit-learn's developer guide recommends
class GradientBoostingRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=1):
        # Only store hyperparameters here; BaseEstimator's inherited get_params/set_params
        # read them back by argument name, which is what GridSearchCV relies on
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        # Base prediction: the mean of y
        self.init_prediction = np.mean(y)
        predictions = np.full_like(y, self.init_prediction, dtype=np.float64)
        self.estimators = []

        for _ in range(self.n_estimators):
            residuals = y - predictions

            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            predictions += self.learning_rate * tree.predict(X)

            self.estimators.append(tree)

        return self

    def predict(self, X):
        # Start from the base value, then add every scaled tree
        predictions = np.full(len(X), self.init_prediction, dtype=np.float64)

        for tree in self.estimators:
            predictions += self.learning_rate * tree.predict(X)

        return predictions

param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [1, 2, 3]
}

gb_regressor = GradientBoostingRegressor()

grid_search = GridSearchCV(gb_regressor, param_grid, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

best_model = grid_search.best_estimator_
y_pred = best_model.predict(X)
mse = np.mean((y - y_pred) ** 2)
r_squared = 1 - (np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2))

print("Best Model Mean Squared Error:", mse)
print("Best Model R-squared:", r_squared)


The selected hyperparameters and the final scores depend heavily on how the five samples are split. With `cv=3`, scikit-learn's default unshuffled `KFold` produces validation folds of two, two, and one contiguous points, and in two of the three folds the trees must predict outside the range of x they were trained on. The best model's MSE and R-squared are then computed on the same data it was trained on, so they measure fit rather than generalization.
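The question also allows random search. As a hedged, NumPy-only sketch (not the method used above), the code below samples `learning_rate` log-uniformly and `n_estimators` uniformly, trains a stump-based booster on half of a synthetic dataset, and keeps the configuration with the lowest validation MSE. The dataset, the search ranges, the 20-trial budget, and the helper names (`fit_stump`, `fit_boosting`, and so on) are all assumptions for illustration; tree depth is fixed at 1 because the hand-written weak learner is a stump.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares decision stump on a 1-D feature: returns (threshold, left, right)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (np.inf, r.mean(), r.mean())  # fallback: constant model
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates equal feature values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_boosting(x, y, n_estimators, learning_rate):
    """Squared-error gradient boosting with stumps; returns (base, stumps, learning_rate)."""
    base = y.mean()
    f = np.full_like(y, base)
    stumps = []
    for _ in range(n_estimators):
        stump = fit_stump(x, y - f)  # fit the current residuals
        f = f + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    out = np.full(len(x), base)
    for s in stumps:
        out += learning_rate * predict_stump(s, x)
    return out


rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 60)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

# Interleaved split so validation points lie inside the training range
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

trials = []
for _ in range(20):
    params = {
        "learning_rate": float(10 ** rng.uniform(-2, np.log10(0.5))),  # 0.01 to 0.5
        "n_estimators": int(rng.integers(10, 201)),                     # 10 to 200
    }
    model = fit_boosting(x_train, y_train, **params)
    val_mse = float(np.mean((y_val - predict_boosting(model, x_val)) ** 2))
    trials.append((val_mse, params))

best_mse, best_params = min(trials, key=lambda t: t[0])
print("Best hyperparameters:", best_params)
print("Validation MSE:", best_mse)
```

Random search spends the same budget as a small grid but tries a different value of every hyperparameter on each trial, which helps when only one or two of them matter much. With scikit-learn estimators, `RandomizedSearchCV` plays the same role as `GridSearchCV` above.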


---

**Q4. What is a weak learner in Gradient Boosting?**

A weak learner in Gradient Boosting is a simple predictive model that, on its own, performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean of the target in regression. In regression, weak learners are typically shallow decision trees, such as depth-1 stumps. Boosting combines these weak learners iteratively into a strong predictive model, with each new weak learner trained to correct the errors of the ensemble built so far.
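To make "weak" concrete, here is a minimal NumPy sketch that fits a single decision stump to a one-feature problem and compares it with the mean baseline. The helper names `fit_stump` and `predict_stump`, the `sin` target, and the 1-D input are assumptions chosen for the illustration.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression tree (decision stump) to targets r on a 1-D feature x.

    Returns (threshold, left_value, right_value) minimizing the squared error.
    """
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (np.inf, r.mean(), r.mean())  # fallback: constant model
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal feature values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


x = np.linspace(0.0, 6.0, 50)
y = np.sin(x)

baseline_mse = np.mean((y - y.mean()) ** 2)  # trivial model: always predict the mean
stump = fit_stump(x, y)
stump_mse = np.mean((y - predict_stump(stump, x)) ** 2)

print("Mean-baseline MSE:", baseline_mse)
print("Single-stump MSE:", stump_mse)
```

The stump beats the baseline but, with only two output values, still leaves a sizeable error on a curved target; that gap is what boosting closes by adding many such stumps.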

---

**Q5. What is the intuition behind the Gradient Boosting algorithm?**

The intuition behind the Gradient Boosting algorithm is to improve the model step by step by focusing on the mistakes the current ensemble still makes. Each new weak learner is fit to the residuals (errors) of the current ensemble's predictions, and a scaled version of its output is added to the ensemble. Because the residuals are the negative gradient of the squared-error loss with respect to the predictions, each such step is a small gradient-descent step in the space of prediction functions. Repeated many times, these small corrections combine into a strong predictive model.
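The link between residuals and gradients can be checked numerically. This small NumPy sketch assumes the common halved loss L(y, F) = ½(y - F)² (the ½ makes the gradient exactly -(y - F)) and compares a central finite-difference estimate of -∂L/∂F with the residual y - F at the constant starting prediction. The helper names are hypothetical.

```python
import numpy as np


def squared_loss(y, f):
    # Halved squared error, evaluated elementwise
    return 0.5 * (y - f) ** 2


def negative_gradient_fd(loss, y, f, eps=1e-6):
    """Central finite-difference estimate of -dL/df, evaluated elementwise."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)


y = np.array([2.0, 3.0, 4.0, 5.0, 6.0])  # the toy targets from Q2
f = np.full_like(y, y.mean())            # constant initial prediction
residuals = y - f
pseudo_residuals = negative_gradient_fd(squared_loss, y, f)

print("Residuals:", residuals)
print("Match negative gradient:", np.allclose(residuals, pseudo_residuals))  # True
```

For other losses the negative gradient is no longer the plain residual (for absolute error it is the sign of y - F), which is why the general algorithm fits trees to these "pseudo-residuals".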

---

**Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?**

The Gradient Boosting algorithm builds its ensemble of weak learners sequentially. It starts from a constant initial prediction (for squared-error loss, the mean of the target), then repeatedly fits a new weak learner to the residuals of the current ensemble's predictions and adds that learner's output, scaled by the learning rate, to the ensemble. Each weak learner is fit by least squares to the residuals, so it corrects the errors that the previous learners left behind.
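The stage-by-stage construction can be sketched in NumPy as below. It is an illustration under assumptions: the weak learner is a hand-written decision stump (`fit_stump`, a hypothetical helper rather than a library function), the input has a single feature, and the learning rate is fixed at 0.1. The training MSE is recorded after every stage so you can watch each learner correct what the previous ones missed.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares decision stump on a 1-D feature: returns (threshold, left, right)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (np.inf, r.mean(), r.mean())  # fallback: constant model
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


x = np.linspace(0.0, 6.0, 50)
y = np.sin(x)
learning_rate = 0.1

# Stage 0: a constant prediction (the mean minimizes squared error)
base = y.mean()
f = np.full_like(y, base)
stumps = []
train_mse = [np.mean((y - f) ** 2)]

for _ in range(100):
    residuals = y - f                                # what the ensemble still gets wrong
    stump = fit_stump(x, residuals)                  # weak learner fit to the residuals
    f = f + learning_rate * predict_stump(stump, x)  # shrunken correction
    stumps.append(stump)
    train_mse.append(np.mean((y - f) ** 2))


def ensemble_predict(x_new):
    """Sum the base value and every scaled stump, in the same order as training."""
    out = np.full(len(x_new), base)
    for s in stumps:
        out += learning_rate * predict_stump(s, x_new)
    return out


print("MSE after stages 0, 1, 10, 100:",
      [round(float(train_mse[k]), 4) for k in (0, 1, 10, 100)])
```

Because each stump is the least-squares fit to the current residuals, adding a copy shrunk by a learning rate between 0 and 2 cannot increase the training error, so the recorded MSE is non-increasing from stage to stage.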

---

**Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?**

The mathematical intuition of the Gradient Boosting algorithm involves the following steps:

1. Initialize the ensemble's predictions with a constant value (e.g., the mean of the target variable).
2. Calculate the residuals (errors) between the actual target values and the ensemble's current predictions.
3. Fit a weak learner (e.g., a shallow decision tree) to the residuals. For squared-error loss the residuals are the negative gradient of the loss, so this fit approximates a gradient-descent step.
4. Update the ensemble's predictions by adding the weak learner's predictions, scaled by the learning rate.
5. Repeat steps 2-4 until a stopping criterion is met (e.g., the maximum number of iterations is reached or validation performance stops improving).
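For a general differentiable loss $L$, the same steps are commonly written as below, for $m = 1, \dots, M$. This is a sketch in the spirit of Friedman's gradient boosting; $\nu$ (learning rate) and $h_m$ (weak learner) are notation assumed here.

$$
\begin{aligned}
&\text{1. } F_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
&\text{2. } r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
&\text{3. } h_m = \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
&\text{4. } F_m(x) = F_{m-1}(x) + \nu \, h_m(x)
\end{aligned}
$$

With $L(y, F) = \tfrac{1}{2}(y - F)^2$, step 1 gives $F_0 = \bar{y}$ and step 2 gives $r_{im} = y_i - F_{m-1}(x_i)$, which is exactly the residual-fitting procedure listed above. Friedman's full algorithm also optimizes a step size for each tree (per leaf, for regression trees); for squared error the least-squares leaf means already achieve that optimum, so a fixed $\nu$ is enough.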

---