## Ques 1:

### Ans: Gradient Boosting Regression is a machine learning algorithm for regression problems, that is, for predicting a continuous output value. It is a boosting algorithm that combines many weak regression models into one strong model. The algorithm adds weak models to the ensemble one at a time, and each new model corrects the errors of the models before it.
### The key idea is to fit each new model to the residual errors of the current ensemble, which are the differences between the actual and the predicted output values. In each iteration, the algorithm trains a new weak regression model to predict those residuals and adds it to the ensemble. The final prediction is the initial prediction plus the sum of the predictions of all the weak models, each scaled by the learning rate, which controls how much each model contributes.
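
As a minimal sketch of a single boosting iteration, the snippet below starts from the mean, fits one weak model to the residuals, and compares the error before and after. The toy data and the learning rate of 0.5 are made up for illustration, and a depth-1 `DecisionTreeRegressor` is assumed as the weak model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, made up for illustration: one feature, five points
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([1.0, 2.0, 3.0, 8.0, 9.0])

learning_rate = 0.5  # illustrative value

# Stage 0: every prediction is the mean of the target
pred = np.full(len(y_toy), y_toy.mean())
mse_before = np.mean((y_toy - pred) ** 2)

# Stage 1: fit a depth-1 tree to the residuals and add its scaled output
residuals = y_toy - pred
stump = DecisionTreeRegressor(max_depth=1).fit(X_toy, residuals)
pred = pred + learning_rate * stump.predict(X_toy)
mse_after = np.mean((y_toy - pred) ** 2)

print("MSE with the mean only:", mse_before)
print("MSE after one boosting step:", mse_after)
```

Repeating the second stage with fresh residuals each time gives the full algorithm.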

## Ques 2:

### Ans:

In [3]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Note: sklearn.ensemble.GradientBoostingRegressor is not imported here,
# because the class defined below uses the same name and would shadow it.

# Generate a small regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# Split the dataset into training and testing sets
split = int(len(X)*0.8)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.models = []
        self.weights = []

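    def fit(self, X, y):
        # Reset the ensemble so that calling fit() again does not stack two models
        self.models = []
        self.weights = []

        # Initialize the predictions to the mean of the target variable
        self.mean = np.mean(y)
        self.predictions = np.full(len(y), self.mean)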

        for i in range(self.n_estimators):
            # Calculate the residual errors
            residuals = y - self.predictions

            # Fit a decision tree to the residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)

            # Add the tree to the ensemble
            self.models.append(tree)
            self.weights.append(self.learning_rate)

            # Update the predictions by adding the weighted predictions of the new tree
            self.predictions += self.learning_rate * tree.predict(X)

        # Return the fitted model, following the scikit-learn convention
        return self

    def predict(self, X):
        # Start from the training mean, then add the weighted prediction of each tree
        y_pred = np.full(len(X), self.mean)
        for weight, tree in zip(self.weights, self.models):
            y_pred += weight * tree.predict(X)
        return y_pred

# Train a gradient boosting regression model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 1017.9179834907184
R-squared: 0.9074439095441444


## Ques 3:

### Ans: 

In [None]:
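from sklearn.ensemble import GradientBoostingRegressor as SklearnGBR
from sklearn.model_selection import GridSearchCV

# Use scikit-learn's estimator under an alias: the custom class defined above
# shadows the name GradientBoostingRegressor and lacks the get_params/set_params
# methods that GridSearchCV needs in order to clone and configure an estimator.
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [2, 3, 4]
}

grid_search = GridSearchCV(
    estimator=SklearnGBR(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

print("Best Hyperparameters:", grid_search.best_params_)
# best_score_ holds the negated MSE (higher is better), so flip the sign to report the MSE
print("Best Mean Squared Error:", -grid_search.best_score_)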

## Ques 4:

### Ans: In Gradient Boosting, a weak learner is a model that performs only slightly better than a trivial baseline, such as random guessing in classification or always predicting the mean in regression. Typically, shallow decision trees (trees with only a few nodes) are used as weak learners in Gradient Boosting. The idea is to train an ensemble of weak learners sequentially, where each weak learner is trained to correct the errors made by the weak learners already in the ensemble.
### Each weak learner is fit on the residuals of the current predictions, so it concentrates on the examples with the largest remaining errors. By iteratively adding new weak learners to the ensemble, the model gradually learns to make better predictions on the training data.
### The strength of Gradient Boosting comes from the ability to combine many weak learners into a powerful ensemble model that can capture complex relationships between the features and the target variable.
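
To make the gap between one weak learner and an ensemble of them concrete, here is a small sketch that compares a single depth-1 tree with 200 boosted depth-1 trees. The dataset sizes, seeds, and hyperparameters are arbitrary choices for this illustration, and scikit-learn's estimator is imported under the alias `SkGBR` so that it does not clash with the custom class from Ques 2.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SkGBR
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data; sizes and seeds are arbitrary choices for this illustration
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# A single depth-1 tree (a "stump"): one split, so only two possible outputs
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_tr, y_tr)
stump_r2 = r2_score(y_te, stump.predict(X_te))

# 200 stumps combined by gradient boosting
boosted = SkGBR(n_estimators=200, learning_rate=0.1, max_depth=1, random_state=0)
boosted.fit(X_tr, y_tr)
boosted_r2 = r2_score(y_te, boosted.predict(X_te))

print(f"single stump R^2:   {stump_r2:.3f}")
print(f"boosted stumps R^2: {boosted_r2:.3f}")
```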

## Ques 5:

### Ans: The intuition behind the Gradient Boosting algorithm is to iteratively add new weak learners to an ensemble in a way that focuses on the examples that were poorly predicted by the previous learners.
### At each iteration, the algorithm trains a new weak learner to fit the residuals of the current ensemble (i.e., the differences between the actual values and the predicted values). Adding the new weak learner to the ensemble improves the model's fit to the training data.
### The name "Gradient Boosting" comes from the fact that the algorithm performs gradient descent on the loss function, treating the model's predictions as the quantities being optimized. Specifically, the algorithm calculates the negative gradient of the loss function with respect to the predicted values, and then fits a weak learner to these negative gradients. For squared-error loss, the negative gradient equals the residual up to a constant factor, which is why fitting residuals and fitting negative gradients describe the same procedure.
### By iteratively adding new weak learners and moving the predictions a small step in the direction of the negative gradient, the algorithm gradually reduces the loss on the training data. The final model is the sum of all the weak learners, each scaled by the learning rate, which determines the contribution of each weak learner to the final prediction.
### Overall, the intuition behind Gradient Boosting is to build an ensemble of weak learners that can capture complex relationships between the features and the target variable by focusing on the examples that are difficult to predict with the previous learners.
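
The link between residuals and gradients can be checked numerically. The sketch below assumes the squared-error loss written with a factor of 1/2, estimates its gradient with respect to the predictions by finite differences, and compares the negative gradient with the residuals. The example values are illustrative and not taken from any dataset.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    # The factor 1/2 makes the gradient come out without a factor of 2
    return 0.5 * (y_true - y_pred) ** 2

# Illustrative values, not taken from any dataset
y_true = np.array([3.0, -1.0, 2.5])
y_pred = np.array([2.0, 0.5, 2.5])

# Central finite-difference estimate of dL/d(y_pred), one entry per example
eps = 1e-6
grad = (squared_error_loss(y_true, y_pred + eps)
        - squared_error_loss(y_true, y_pred - eps)) / (2 * eps)

negative_gradient = -grad
residuals = y_true - y_pred

print("negative gradient:", negative_gradient)
print("residuals:        ", residuals)
```

With a different loss, such as absolute error, the negative gradient is no longer the plain residual, which is why the general algorithm is stated in terms of gradients.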

## Ques 6:

### Ans: Gradient Boosting builds an ensemble of weak learners sequentially, where each weak learner is trained to correct the errors made by the weak learners already in the ensemble. The algorithm works as follows:
### 1. Initialize the ensemble with a simple model, such as a constant prediction (equivalent to a decision tree with only one node).
### 2. For each iteration:
### a. Calculate the residuals of the current predictions by subtracting the predicted values from the actual values.
### b. Train a new weak learner on the residuals, using the full training set (or, in stochastic gradient boosting, a random subset of it).
### c. Scale the new weak learner's predictions by the learning rate, a hyperparameter (typically between 0 and 1) that determines how much each weak learner contributes.
### d. Add the scaled predictions to the ensemble's current predictions.
### 3. Stop after a fixed number of iterations or when a convergence criterion is met.
### Unlike AdaBoost, Gradient Boosting does not reweight the training examples, and unlike bagging, it does not rely on resampling them with replacement. Each new weak learner focuses on the hard examples simply because those examples have the largest residuals. Subsampling is optional: stochastic gradient boosting fits each learner on a random fraction of the rows, drawn without replacement, which can reduce overfitting.
### The final model is the initial prediction plus the sum of all the weak learners, each scaled by the learning rate. The intuition behind this ensemble is that each weak learner contributes a small amount of information to the final prediction, but together they can capture complex relationships between the features and the target variable.
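
One way to watch the ensemble being built up is scikit-learn's `staged_predict`, which yields the prediction after each added tree. The sketch below is an illustration under assumed settings (synthetic data, 50 trees of depth 2); it again imports scikit-learn's estimator under the alias `SkGBR` to avoid clashing with the custom class from Ques 2.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SkGBR
from sklearn.metrics import mean_squared_error

# Synthetic data; sizes and seeds are arbitrary choices for this illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# subsample=1.0 (the default) fits every tree on the full training set;
# a value below 1.0 would fit each tree on a random fraction of the rows instead.
gbr = SkGBR(n_estimators=50, learning_rate=0.1, max_depth=2, subsample=1.0, random_state=0)
gbr.fit(X_demo, y_demo)

# staged_predict yields the ensemble's prediction after 1, 2, ..., n_estimators trees
train_mse = [mean_squared_error(y_demo, y_stage) for y_stage in gbr.staged_predict(X_demo)]

for n_trees in (1, 10, 50):
    print(f"{n_trees:>2} trees: training MSE = {train_mse[n_trees - 1]:.1f}")
```

The training error shrinks as trees are added, which matches the sequential picture described above. Training error alone says nothing about overfitting, so a held-out set is still needed to choose the number of trees.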

## Ques 7:

### Ans: The mathematical intuition behind Gradient Boosting can be summarized in the following steps:

1. Define an initial model that makes a constant prediction, such as the mean of the target variable (the constant that minimizes squared-error loss).
2. Calculate the negative gradient of the loss function with respect to the current model's predictions. These values are called the "pseudo-residuals".
3. Train a new weak learner to fit the pseudo-residuals, using the features as input and the pseudo-residuals as the target variable.
4. Add the new weak learner's predictions, scaled by the learning rate, to the current predictions to get the updated predictions.
5. Repeat steps 2-4 for a fixed number of iterations, or until a convergence criterion is met.
6. The final model is the initial model plus the sum of all the weak learners, each scaled by the learning rate.
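
The steps above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the original answer: $L$ is the loss, $F_m$ is the ensemble after $m$ iterations, $h_m$ is the $m$-th weak learner, $r_{im}$ is the pseudo-residual of example $i$ at iteration $m$, $\nu$ is the learning rate, and $M$ is the number of iterations.

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}, \qquad i = 1, \dots, n \\
h_m &= \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M
\end{aligned}
$$

For the squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the first line gives $F_0(x) = \bar{y}$ and the second gives $r_{im} = y_i - F_{m-1}(x_i)$, so the pseudo-residuals are the ordinary residuals.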