### Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is a machine learning algorithm that belongs to the family of boosting algorithms. It is used for solving regression problems, i.e., problems where the goal is to predict a continuous numerical value, such as the price of a house or the temperature of a city.

Gradient Boosting Regression combines many weak regression models into one stronger model. It starts from an initial prediction, often just a constant such as the mean of the target, although a simple model such as a shallow decision tree can also be used. It then adds weak models to the ensemble one at a time, where each new model is trained to correct the errors of the ensemble built so far.

In each iteration, the algorithm calculates the gradient of the loss function with respect to the current predictions of the ensemble. A new regression model is then fit to the negative gradient, often called the pseudo-residuals. For squared-error loss these are the ordinary residuals, i.e., the true values minus the current predicted values. The new model is then added to the ensemble after its predictions are scaled by a learning rate hyperparameter.
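For the squared-error case, the link between the gradient and the residuals can be written out explicitly. The notation is introduced here for illustration and does not come from the text above: $F_{m-1}$ is the current ensemble, $h_m$ the new weak model, and $\nu$ the learning rate. The factor $\tfrac{1}{2}$ is a common convenience that only rescales the loss.

$$
L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2
\quad\Longrightarrow\quad
-\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}} = y_i - F_{m-1}(x_i)
$$

$$
F_m(x) = F_{m-1}(x) + \nu\, h_m(x)
$$

So fitting $h_m$ to the negative gradient is, for this loss, the same as fitting it to the residuals.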

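The final prediction of the Gradient Boosting Regression algorithm is the initial prediction plus the sum of the predictions of all weak models in the ensemble, each scaled by the learning rate. Unlike AdaBoost, the weak models are not weighted by their individual performance on the training data; in the simple form used in this notebook, every model shares the same learning rate.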

The key benefit of Gradient Boosting Regression is its ability to model non-linear and complex relationships between the features and the target variable. With a robust loss function such as Huber or absolute error it can also tolerate outliers, although the commonly used squared-error loss is sensitive to them. The algorithm is prone to overfitting and usually requires careful tuning of the hyperparameters, such as the learning rate, the number of weak models, and their complexity.
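One way to see the overfitting risk is to track the error as trees are added. The sketch below is illustrative only: it uses scikit-learn's built-in `GradientBoostingRegressor` (aliased to `SklearnGBR` to avoid clashing with any class of the same name) and a synthetic dataset, both of which are assumptions rather than part of the text above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SklearnGBR
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data, assumed purely for illustration
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

n_trees = 300
gbr = SklearnGBR(n_estimators=n_trees, learning_rate=0.1, max_depth=3, random_state=0)
gbr.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_trees trees
train_mse = [mean_squared_error(y_train, p) for p in gbr.staged_predict(X_train)]
val_mse = [mean_squared_error(y_val, p) for p in gbr.staged_predict(X_val)]

# Number of trees at which the validation error is lowest
best_n = int(np.argmin(val_mse)) + 1
print("Trees with lowest validation MSE:", best_n)
print("Training MSE at that point:", train_mse[best_n - 1])
print("Validation MSE at that point:", val_mse[best_n - 1])
print("Training MSE with all trees:", train_mse[-1])
```

The training error keeps shrinking as trees are added, while the validation error typically flattens or rises again; the gap between the two is a rough indicator of overfitting.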

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [35]:
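import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

class GradientBoostingRegressor(BaseEstimator, RegressorMixin):
    """Gradient boosting for squared-error loss with regression trees as weak learners."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.intercept = None

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        # Reset state so that calling fit() twice does not accumulate trees
        self.trees = []
        # Initial prediction: the mean of the target
        self.intercept = np.mean(y)
        f = np.full(y.shape, self.intercept)
        for _ in range(self.n_estimators):
            # Residuals are the negative gradient of the squared-error loss
            r = y - f
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, r)
            self.trees.append(tree)
            # Shrink each tree's contribution by the learning rate
            f += self.learning_rate * tree.predict(X)
        return self

    def predict(self, X):
        # Start from the initial prediction and add each tree's scaled output
        f = np.full(X.shape[0], self.intercept)
        for tree in self.trees:
            f += self.learning_rate * tree.predict(X)
        return f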

# Generate a simple regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.2, random_state=42)

# Split the dataset into training and testing sets
train_X, train_y = X[:80], y[:80]
test_X, test_y = X[80:], y[80:]

# Train the gradient boosting model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(train_X, train_y)

# Evaluate the model's performance on the testing set
y_pred = model.predict(test_X)
mse = mean_squared_error(test_y, y_pred)
r2 = r2_score(test_y, y_pred)
print("Mean Squared Error:", mse)
print("R-Squared Score:", r2)


Mean Squared Error: 7191.08013677331
R-Squared Score: 0.5590719366306647
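To make the two metrics concrete, here is a minimal NumPy sketch of how they are computed, checked against scikit-learn. The helper names `mse_numpy` and `r2_numpy` and the four-point example arrays are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def mse_numpy(y_true, y_pred):
    # Mean of the squared differences between targets and predictions
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r2_numpy(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares around the mean)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Tiny hand-checkable example (illustrative values)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

print("MSE (NumPy):", mse_numpy(y_true, y_hat))
print("MSE (sklearn):", mean_squared_error(y_true, y_hat))
print("R2 (NumPy):", r2_numpy(y_true, y_hat))
print("R2 (sklearn):", r2_score(y_true, y_hat))
```

An R-squared of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the evaluation targets, and negative values mean it does worse.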


### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

In [36]:
from sklearn.model_selection import GridSearchCV

# Define a parameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [3, 4, 5]
}

# Create a grid search object with MSE as the scoring metric
grid_search = GridSearchCV(GradientBoostingRegressor(), param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the grid search object to the training data
grid_search.fit(train_X, train_y)

# Print the best hyperparameters and corresponding mean test score
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Mean Test Score:", -grid_search.best_score_)


Best Hyperparameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 150}
Best Mean Test Score: 9944.879791365898
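The question also mentions random search. The sketch below shows one way to set it up with `RandomizedSearchCV`. To keep the block self-contained it uses scikit-learn's built-in `GradientBoostingRegressor` (aliased to `SklearnGBR`) instead of the class defined above, and it samples the learning rate from a log-uniform distribution; both choices, along with `n_iter=10`, are assumptions made for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SklearnGBR
from sklearn.model_selection import RandomizedSearchCV

# Same kind of dataset and split as in the grid search above
X, y = make_regression(n_samples=100, n_features=10, noise=0.2, random_state=42)
train_X, train_y = X[:80], y[:80]

# Lists are sampled uniformly; the learning rate is drawn on a log scale
param_distributions = {
    "n_estimators": [50, 100, 150],
    "learning_rate": loguniform(0.01, 0.5),
    "max_depth": [3, 4, 5],
}

random_search = RandomizedSearchCV(
    SklearnGBR(random_state=0),
    param_distributions,
    n_iter=10,  # number of sampled configurations (illustrative choice)
    cv=5,
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(train_X, train_y)

print("Best Hyperparameters:", random_search.best_params_)
print("Best Mean CV MSE:", -random_search.best_score_)
```

Random search evaluates a fixed number of sampled configurations rather than every grid point, which usually makes it cheaper when the grid is large or when a continuous parameter such as the learning rate matters most.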


### Q4. What is a weak learner in Gradient Boosting?

In Gradient Boosting, a weak learner is a simple and relatively weak predictive model that is used as the base model in an ensemble of models. Specifically, a weak learner is a model that is only slightly better than random guessing on the training data. In Gradient Boosting, the weak learner is typically a decision tree with a very small depth or number of leaf nodes.

The idea behind Gradient Boosting is to iteratively improve the predictions of the ensemble by fitting each successive weak learner to the negative gradient of the loss function with respect to the current predictions. By doing this, the subsequent models focus on the areas where the ensemble so far performs poorly, and thus the ensemble gradually improves its overall predictive accuracy.
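As a small illustration of how weak such a learner can be, the sketch below fits a decision stump (a tree of depth 1) to a noisy sine curve. The data and variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine curve, assumed purely for illustration
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# A depth-1 tree makes a single split, so it has at most two leaves
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_pred = stump.predict(X)

stump_mse = np.mean((y - stump_pred) ** 2)
baseline_mse = np.mean((y - y.mean()) ** 2)

print("Distinct predicted values:", np.unique(stump_pred).size)
print("Stump training MSE:", stump_mse)
print("Constant-mean training MSE:", baseline_mse)
```

The stump can only output two different values, so on its own it is a poor model of the curve, yet it still beats the constant-mean baseline on the training data. That small but consistent gain is what boosting accumulates.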

### Q5. What is the intuition behind the Gradient Boosting algorithm?

The Gradient Boosting algorithm is a machine learning technique for building powerful predictive models. The intuition behind the algorithm is to combine several weak learners (often decision trees) into a strong learner that makes accurate predictions.

The algorithm fits the weak learners sequentially, each one to the residuals of the ensemble built so far. In other words, it first fits an initial model to the data and then calculates the residuals, i.e., the true values minus the predicted values. The next weak learner is then fit to these residuals, its scaled predictions are added to the ensemble, and the process continues until a specified stopping criterion is reached (such as a maximum number of trees or a minimum improvement in performance).

The key idea behind this approach is that the subsequent weak learners are trained to correct the errors made by the previous weak learners, leading to a strong ensemble model that is better than any of the individual weak learners.

The name "Gradient Boosting" comes from the fact that the algorithm performs gradient descent in function space to minimize the loss function of the model. In each iteration, the algorithm calculates the negative gradient of the loss function with respect to the current predicted values and fits a weak learner to it. Adding that learner moves the predictions a small step in the direction that reduces the loss, so the model learns from its mistakes and improves with each iteration.
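The negative gradient depends on the chosen loss. The sketch below compares two common cases on a tiny made-up example; the function names and the numbers are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def negative_gradient_squared(y, f):
    # Loss 0.5 * (y - f)^2  ->  negative gradient is the residual y - f
    return y - f

def negative_gradient_absolute(y, f):
    # Loss |y - f|  ->  negative gradient is the sign of the residual
    return np.sign(y - f)

# Three targets, the last one an outlier, and a constant current prediction
y = np.array([10.0, 12.0, 50.0])
f = np.array([11.0, 11.0, 11.0])

grad_sq = negative_gradient_squared(y, f)    # values: -1, 1, 39
grad_abs = negative_gradient_absolute(y, f)  # values: -1, 1, 1

print("Squared-error targets for the next learner:", grad_sq)
print("Absolute-error targets for the next learner:", grad_abs)
```

With squared error the outlier dominates what the next weak learner is asked to fit, whereas with absolute error every point contributes only a direction, which is one reason robust losses are less sensitive to outliers.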

### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

The Gradient Boosting algorithm builds an ensemble of weak learners by iteratively adding new weak learners to the model. At each iteration, the algorithm fits a new weak learner to the residual errors of the current model, instead of fitting the weak learner to the original target variable. This means that each new weak learner focuses on the errors made by the current model, and tries to correct these errors.

The new weak learner is then added to the current model by multiplying its predictions by a learning rate, and adding the resulting scaled predictions to the predictions of the current model. The learning rate controls the contribution of each new weak learner to the final model, and is typically a small value between 0.01 and 0.1.

The process is repeated until a predetermined number of weak learners has been added to the model, or until the training error stops improving. The final model is then the initial prediction plus the sum of all the individual weak learners' predictions, each scaled by the learning rate. The result is a model that can capture complex non-linear relationships between the input features and the target variable.
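The loop described above, including the "stop when the training error stops improving" rule, can be sketched as follows. This is a minimal illustration for squared-error loss; the data, the tree depth, and the tolerance `tol` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine curve, assumed purely for illustration
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
max_learners = 500
tol = 1e-6  # hypothetical threshold for "training error stopped improving"

# Start from a constant prediction (the mean of the target)
current = np.full(y.shape, y.mean())
learners = []
train_mse = [np.mean((y - current) ** 2)]

for _ in range(max_learners):
    # Fit the new weak learner to the residuals of the current model
    residuals = y - current
    learner = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    # Add its predictions, scaled by the learning rate
    current = current + learning_rate * learner.predict(X)
    learners.append(learner)
    train_mse.append(np.mean((y - current) ** 2))
    # Stop once the improvement in training MSE falls below the tolerance
    if train_mse[-2] - train_mse[-1] < tol:
        break

print("Weak learners added:", len(learners))
print("Initial training MSE:", train_mse[0])
print("Final training MSE:", train_mse[-1])
```

Stopping on training error, as described in the text, only detects convergence on the training set. In practice a held-out validation set is often monitored instead, because training error keeps falling even when the model has started to overfit.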

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

The mathematical intuition of Gradient Boosting algorithm can be broken down into the following steps:

1. Define a loss function: The first step in building a Gradient Boosting model is to define a loss function that measures the difference between the predicted values and the actual values. In regression problems, the commonly used loss function is mean squared error, while for classification problems, cross-entropy loss or exponential loss is often used.

2. Fit the first model: The next step is to fit an initial model to the training data. This is often just a constant, such as the mean of the target, although a simple model like a shallow decision tree can also be used. The predicted values from this model are used as the starting point for the Gradient Boosting algorithm.

3. Compute the residuals: The residuals are computed as the actual values minus the current predicted values; for squared-error loss they equal the negative gradient of the loss. The objective of Gradient Boosting is to fit a model to these residuals that can correct the errors made so far.

4. Fit subsequent models to the residuals: A new model is fit to the residuals, and its predicted values, scaled by the learning rate, are added to the current predictions. The residuals are then recomputed, and the process repeats until a predefined stopping criterion is met.

5. Tune the hyperparameters: The hyperparameters of the Gradient Boosting algorithm, such as the learning rate, the number of trees, and the depth of each tree, are tuned to achieve the best performance on the validation data.

6. Make predictions: The final step is to make predictions using the ensemble of models. The predictions are made by summing the initial prediction and the learning-rate-scaled predictions of all the models in the ensemble.
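The steps above can be summarized in a few formulas for a general differentiable loss $L$. The notation is an assumption introduced for illustration: $F_m$ is the ensemble after $m$ models, $h_m$ the $m$-th weak learner, $r_{im}$ the pseudo-residual of sample $i$ at step $m$, $\nu$ the learning rate, and $M$ the number of models. This is the simple shrinkage form, without the per-iteration line search that some formulations add.

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \qquad i = 1, \dots, n \\
h_m &\approx \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{im} - h(x_i)\bigr)^2 \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad m = 1, \dots, M \\
\hat{y}(x) &= F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
$$

For squared-error loss, $F_0$ is the mean of the targets and $r_{im}$ reduces to the ordinary residual $y_i - F_{m-1}(x_i)$.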