#### Q1. What is Gradient Boosting Regression?

In [None]:
Ans-

Gradient Boosting Regression (GBR) is a machine learning algorithm used for regression tasks. 
It is a type of boosting algorithm that combines multiple weak predictive models to create a strong predictive model.

The idea behind GBR is to iteratively add weak predictive models to the ensemble in a way that corrects the errors made by the previous models.
At each iteration, a new model is trained on the residuals (i.e., the differences between the actual values and the current ensemble's predictions).

The name "gradient boosting" comes from the fact that the algorithm minimizes the loss function by following the negative gradient of the loss with respect to the predicted values.
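The link between residuals and gradients can be checked numerically. The sketch below assumes the squared-error loss L = 0.5 * (y - F)^2 (the 0.5 factor and the sample values are illustrative choices, not taken from the text above) and compares a finite-difference estimate of the negative gradient with the plain residuals.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    # Per-example loss L = 0.5 * (y - F)^2 (assumed form, for illustration)
    return 0.5 * (y_true - y_pred) ** 2

# Illustrative values only
y_true = np.array([3.0, -1.0, 2.5])
y_pred = np.array([2.0, 0.5, 2.5])

# Central finite-difference approximation of dL/dF for each example
eps = 1e-6
grad = (squared_error_loss(y_true, y_pred + eps)
        - squared_error_loss(y_true, y_pred - eps)) / (2 * eps)

negative_gradient = -grad
residuals = y_true - y_pred

print("Negative gradient:", negative_gradient)
print("Residuals:        ", residuals)
```

For this particular loss the two arrays agree up to floating-point error, which is why fitting to residuals and following the negative gradient describe the same step.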

GBR is a powerful algorithm that can handle complex non-linear relationships between the features and the target variable. 
It is often used in problems where the number of features is large; on noisy data it can overfit unless the learning rate, tree depth, and number of iterations are kept in check.

GBR has many hyperparameters that can be tuned to optimize the performance of the model. 
Some of the important hyperparameters include the number of iterations, the learning rate, the depth of the tree models, and the regularization parameters.
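As a rough illustration, these hyperparameters map onto constructor arguments of scikit-learn's `GradientBoostingRegressor`. The values below are arbitrary placeholders rather than recommendations, and treating `subsample` and `min_samples_leaf` as the "regularization parameters" is an assumption made for this sketch.

```python
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=100,     # number of boosting iterations (trees)
    learning_rate=0.1,    # shrinkage applied to each tree's contribution
    max_depth=3,          # depth of each individual tree
    subsample=0.8,        # fraction of rows used per tree (stochastic boosting)
    min_samples_leaf=5,   # minimum samples per leaf, limits tree complexity
    random_state=0,
)

params = model.get_params()
print(params["n_estimators"], params["learning_rate"], params["max_depth"])
```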

#### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [8]:
#Ans-
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample dataset
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2*np.pi*X).ravel() + np.random.randn(100) * 0.1

def gradient_boosting(X, y, n_estimators, learning_rate):
    # Initialize the predictions with the mean of the target variable
    y_pred = np.full(len(y), np.mean(y))
    for i in range(n_estimators):
        # Calculate the residuals between the true and predicted values
        residuals = y - y_pred

        # Fit a decision stump (depth-1 tree) to the residuals
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, residuals)

        # Update the predictions using the output of the decision tree
        y_pred += learning_rate * tree.predict(X)
    # Note: the trees are not stored, so this returns training-set predictions only
    return y_pred

# Train the model
y_pred = gradient_boosting(X, y, n_estimators=50, learning_rate=0.1)

# Evaluate the model (on the training data, so these scores are optimistic)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f"Mean Squared Error: {mse:.3f}")
print(f"R-squared: {r2:.3f}")


Mean Squared Error: 0.040
R-squared: 0.919
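The scores above are computed on the same data the model was trained on. A minimal sketch of a held-out evaluation follows; it assumes a variant of the function that keeps the fitted trees so it can predict on unseen rows. The helper names `fit_gradient_boosting` and `predict_gradient_boosting` and the 70/30 split are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_estimators, learning_rate):
    """Return the initial prediction and the list of fitted trees."""
    init = np.mean(y)
    y_pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, y - y_pred)                    # fit to the current residuals
        y_pred += learning_rate * tree.predict(X)  # shrink and add to the ensemble
        trees.append(tree)
    return init, trees

def predict_gradient_boosting(X, init, trees, learning_rate):
    """Initial prediction plus the scaled sum of all tree outputs."""
    y_pred = np.full(X.shape[0], init)
    for tree in trees:
        y_pred += learning_rate * tree.predict(X)
    return y_pred

# Same synthetic data as in the cell above
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.randn(100) * 0.1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
init, trees = fit_gradient_boosting(X_train, y_train, n_estimators=50, learning_rate=0.1)
y_test_pred = predict_gradient_boosting(X_test, init, trees, learning_rate=0.1)

test_mse = mean_squared_error(y_test, y_test_pred)
test_r2 = r2_score(y_test, y_test_pred)
print(f"Test MSE: {test_mse:.3f}")
print(f"Test R-squared: {test_r2:.3f}")
```

Storing the trees is the only structural change; the boosting loop itself is the same as in the original function.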


#### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

In [9]:
#Ans

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 150, 200],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [1, 2, 3, 4, 5]
}

# Create the model and perform random search
model = GradientBoostingRegressor()
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=20, cv=5, random_state=0)
random_search.fit(X, y)

# Print the best parameters and the best score (mean cross-validated R-squared)
print("Best parameters: ", random_search.best_params_)
print("Best score: ", random_search.best_score_)

Best parameters:  {'n_estimators': 150, 'max_depth': 1, 'learning_rate': 0.1}
Best score:  0.9600861804922933
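The question also mentions grid search. A minimal sketch with `GridSearchCV` is shown below; it tries every combination instead of sampling, so the grid here is deliberately smaller than the one above. The grid values and the variable name `small_grid` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Same synthetic data as in Q2
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.randn(100) * 0.1

# Reduced grid (illustrative values): 2 * 2 * 2 = 8 combinations
small_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [1, 2],
}

grid_search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid=small_grid,
    cv=5,
    scoring="r2",
)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best mean CV R-squared:", grid_search.best_score_)
```

Grid search is exhaustive and therefore grows multiplicatively with each added value, which is the usual reason to prefer random search on larger spaces.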


#### Q4. What is a weak learner in Gradient Boosting?

In [None]:
Ans-

Gradient Boosting is an ensemble method that combines multiple weak learners to create a strong predictive model.
A weak learner is a simple model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression).
In the context of Gradient Boosting, a weak learner is a decision tree with a limited number of splits, also known as a "shallow tree".

In Gradient Boosting, the idea is to sequentially add weak learners to the ensemble, with each new learner attempting to correct the errors made by the previous learners.
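Unlike AdaBoost, Gradient Boosting does not reweight the training examples; instead, each new learner is fit to the residuals (more generally, the negative gradient of the loss) of the current ensemble, so the examples with the largest errors have the most influence on the next learner.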

The process of adding new weak learners to the ensemble is repeated until the desired level of performance is achieved, or until the algorithm reaches a predefined number of iterations.
The final prediction is the initial prediction plus the sum of the predictions of all the weak learners in the ensemble, each scaled by the learning rate.

In summary, a weak learner in Gradient Boosting is a decision tree with a limited number of splits, and it is used in combination with other weak learners to create a strong predictive model.
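To make the contrast concrete, the sketch below compares a single depth-1 tree (a "stump") with an ensemble of 100 such stumps on the synthetic data from Q2. The scores are computed on the training data, and the settings are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data as in Q2
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.randn(100) * 0.1

# One weak learner on its own: a tree with a single split
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Many weak learners combined by boosting
boosted = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0
).fit(X, y)

stump_r2 = r2_score(y, stump.predict(X))
boosted_r2 = r2_score(y, boosted.predict(X))
print(f"Single stump R-squared:   {stump_r2:.3f}")
print(f"Boosted stumps R-squared: {boosted_r2:.3f}")
```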

#### Q5. What is the intuition behind the Gradient Boosting algorithm?

In [None]:
Ans-

The Gradient Boosting algorithm is an ensemble method that combines multiple weak learners to create a strong predictive model. 
The intuition behind Gradient Boosting is to iteratively add weak learners to the ensemble, with each new learner attempting to correct the errors made by the previous learners.

The algorithm works by minimizing a cost function, which measures the difference between the predicted and actual values of the target variable.
In each iteration, the algorithm fits a new weak learner to the residuals, which are the differences between the actual values of the target variable and the current predictions.
The idea is that the new learner will learn to predict the residual errors made by the previous learners, thus reducing the overall error of the ensemble.

Gradient Boosting does not assign weights to the training examples (that is how AdaBoost works); the residuals themselves carry the information about where the model is wrong.
Examples with large residuals dominate the fit of the next learner, which lets the algorithm focus on the examples that are most important for improving the performance of the model.

The final prediction is the initial prediction plus the sum of the predictions of all the weak learners in the ensemble.
Each learner's contribution is scaled by the same learning rate rather than by a weight based on its individual performance.

In summary, the intuition behind Gradient Boosting is to iteratively add weak learners to the ensemble, with each new learner attempting to correct the errors made by the previous learners.
The algorithm minimizes a cost function by fitting each new learner to the residual errors of the current ensemble.
The final prediction is the initial prediction plus the sum of the predictions of all the weak learners in the ensemble, each scaled by the learning rate.
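One way to see this error-correcting behaviour is to track the training error as learners are added. The sketch below uses scikit-learn's `staged_predict`, which yields the ensemble's predictions after each boosting stage; the data and settings mirror Q2 and are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Same synthetic data as in Q2
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.randn(100) * 0.1

model = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=1, random_state=0
).fit(X, y)

# Training MSE after 1, 2, ..., 50 trees
train_mse = [mean_squared_error(y, y_stage) for y_stage in model.staged_predict(X)]

for n in (1, 10, 50):
    print(f"{n:>2} trees: training MSE = {train_mse[n - 1]:.4f}")
```

Training error keeps shrinking as learners are added, which is also why the number of iterations needs to be validated on held-out data.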

#### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

In [None]:
Ans-

The Gradient Boosting algorithm builds an ensemble of weak learners by iteratively adding new learners to the ensemble, with each new learner attempting to correct the errors made by the previous learners.
The steps involved in building an ensemble of weak learners using Gradient Boosting are as follows:

1. Initialize the ensemble with a simple model:
The algorithm starts by initializing the ensemble with a simple model, typically a constant prediction such as the mean of the target variable.
This model is often called the "initial model."

2. Calculate the residuals:
The algorithm calculates the difference between the actual values of the target variable and the current ensemble's predictions for each training example.
These differences are called "residuals."

3. Fit a new weak learner to the residuals:
The algorithm fits a new weak learner to the residuals, with the goal of predicting the errors made by the current ensemble.
The new learner is usually a decision tree with a limited number of splits, also known as a "shallow tree."

4. Update the ensemble:
The algorithm adds the new learner to the ensemble and combines it with the previous learners.
This is done by adding the new learner's predictions, scaled by the learning rate, to the current predictions of the ensemble.

5. Repeat steps 2-4:
The algorithm repeats steps 2-4 until a predefined stopping criterion is met, such as a maximum number of iterations or a minimum level of performance.

6. Make predictions:
The final prediction is the initial prediction plus the sum of the predictions of all the learners in the ensemble, each scaled by the learning rate.
Every learner is scaled by the same learning rate; learners are not weighted by their individual performance.

In summary, the Gradient Boosting algorithm builds an ensemble of weak learners by iteratively adding new learners to the ensemble, with each new learner attempting to correct the errors made by the previous learners.
The algorithm calculates the residuals, fits a new weak learner to the residuals, updates the ensemble, and repeats the process until a stopping criterion is met. 
The final prediction is the initial prediction plus the sum of the predictions of all the learners in the ensemble, each scaled by the learning rate.
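The additive structure of the ensemble can be checked against scikit-learn's implementation. The sketch below rebuilds the prediction by hand from the fitted trees in `estimators_`; it assumes the default squared-error loss, for which the initial model is the mean of the training targets.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Same synthetic data as in Q2
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.randn(100) * 0.1

model = GradientBoostingRegressor(
    n_estimators=20, learning_rate=0.1, max_depth=2, random_state=0
).fit(X, y)

# Step 1: start from the constant initial prediction
# (assumption: default squared-error loss, so the constant is mean(y))
manual_pred = np.full(len(y), y.mean())

# Steps 2-5: add each tree's output, scaled by the learning rate
for tree in model.estimators_[:, 0]:
    manual_pred += model.learning_rate * tree.predict(X)

# Step 6: the manual sum should match the library's prediction
matches = np.allclose(manual_pred, model.predict(X))
print("Manual reconstruction matches model.predict:", matches)
```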

#### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

In [None]:
Ans-

The mathematical intuition behind Gradient Boosting algorithm involves the following steps:

1. Define a loss function:
The first step is to define a loss function that measures the difference between the predicted and actual values of the target variable.
The loss must be differentiable with respect to the predictions, because the algorithm relies on its gradient.

2. Initialize the model:
The algorithm starts by initializing the model with a simple function, such as a constant value or the mean of the target variable.

3. Calculate the negative gradient of the loss function:
The negative gradient of the loss function is calculated with respect to the predictions of the current model.
This negative gradient represents the direction in which the loss function decreases the most.
For the squared-error loss, it is simply the residual.

4. Fit a weak learner to the negative gradient:
The algorithm fits a weak learner to the negative gradient, with the goal of predicting the direction in which the loss function decreases the most.
The weak learner is usually a decision tree with a limited number of splits, also known as a "shallow tree."

5. Update the model:
The algorithm updates the model by adding the predictions of the weak learner, multiplied by a learning rate, to the current model.
The learning rate determines the contribution of the weak learner to the final model and is usually set to a small value, such as 0.1.

6. Repeat steps 3-5:
The algorithm repeats steps 3-5 until a predefined stopping criterion is met, such as a maximum number of iterations or a minimum level of performance.

7. Make predictions:
The final prediction is the initial prediction plus the sum of the predictions of all the weak learners in the ensemble, each multiplied by the learning rate.

In summary, the mathematical intuition behind Gradient Boosting algorithm involves defining a loss function, initializing the model, calculating the negative gradient of the loss function, 
fitting a weak learner to the negative gradient, updating the model, and repeating the process until a stopping criterion is met. 
The final prediction is the initial prediction plus the sum of the predictions of all the weak learners in the ensemble, each multiplied by the learning rate.
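The steps above apply to any differentiable loss, not only squared error. The sketch below follows them for the absolute-error loss, whose negative gradient is the sign of the residual. It is a simplified illustration: full implementations usually also re-optimize the leaf values for the chosen loss, which is omitted here, and the function name `negative_gradient_absolute_error` is a hypothetical helper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def negative_gradient_absolute_error(y, pred):
    # Step 3: for L = |y - F|, the negative gradient -dL/dF is sign(y - F)
    return np.sign(y - pred)

# Same synthetic data as in Q2
np.random.seed(0)
X = np.random.rand(100, 1)
y = np.sin(2 * np.pi * X).ravel() + np.random.randn(100) * 0.1

learning_rate = 0.1

# Step 2: the constant that minimizes absolute error is the median
pred = np.full(len(y), np.median(y))
initial_mae = np.mean(np.abs(y - pred))

for _ in range(100):
    # Step 3: negative gradient at the current predictions
    pseudo_residuals = negative_gradient_absolute_error(y, pred)
    # Step 4: fit a shallow tree to the negative gradient
    tree = DecisionTreeRegressor(max_depth=2).fit(X, pseudo_residuals)
    # Step 5: take a small step in the predicted descent direction
    pred += learning_rate * tree.predict(X)

final_mae = np.mean(np.abs(y - pred))
print(f"Initial MAE: {initial_mae:.3f}")
print(f"Final MAE:   {final_mae:.3f}")
```

Only the gradient function changes between losses; the loop is otherwise identical to the squared-error case.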