# **`Boosting-2`**

`Q1. What is Gradient Boosting Regression?`

Gradient Boosting Regression (GBR) is a boosting algorithm for regression problems. It builds an ensemble of decision trees sequentially: each new tree is trained on the residuals (i.e., the differences between the actual values and the current ensemble's predictions) left by all the trees built so far. The final prediction of a GBR model is an initial constant prediction plus the sum of the predictions of all the trees, each scaled by a learning rate.

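GBR minimizes the loss function with a form of gradient descent performed in function space: rather than updating a fixed set of parameters, each iteration adds a tree that approximates the negative gradient of the loss. The most common choice is the squared-error loss, whose average over the samples is the mean squared error (MSE), i.e., the average squared difference between the actual and predicted values. scikit-learn's `GradientBoostingRegressor` also supports absolute-error, Huber, and quantile losses.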

GBR works by iteratively adding decision trees to the model, where each new tree is trained to correct the errors made by the previous trees. In each iteration, the algorithm computes the negative gradient of the loss function with respect to the current predictions (for squared error, this is simply the residual) and fits a new tree to it. The new tree's predictions, scaled by the learning rate, are added to the predictions of the previous trees, and the process is repeated for a specified number of iterations or until a convergence criterion is met.
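To make one iteration concrete, here is a hand-worked step on four hypothetical target values, assuming squared-error loss, a learning rate of 0.1, and that the four points are ordered by a single feature (so a depth-1 tree can only split them into a left and a right group). The split used below is the one a depth-1 tree would pick here: of the three possible splits, it leaves the smallest squared error.

```python
import numpy as np

# Hypothetical targets, chosen so the arithmetic is easy to follow
y = np.array([1.0, 2.0, 3.0, 10.0])
learning_rate = 0.1  # assumed shrinkage value

# Starting model: for squared error the best constant is the mean (4.0)
F0 = np.full_like(y, y.mean())

# Negative gradient of 0.5 * (y - F)^2 w.r.t. F is the residual y - F
residuals = y - F0  # [-3, -2, -1, 6]

# Depth-1 tree: split the last point from the rest and predict the
# mean residual in each leaf
left = np.array([True, True, True, False])
tree_pred = np.where(left, residuals[left].mean(), residuals[~left].mean())  # [-2, -2, -2, 6]

# Add the shrunken tree output to the previous predictions
F1 = F0 + learning_rate * tree_pred  # [3.8, 3.8, 3.8, 4.6]

mse_before = np.mean((y - F0) ** 2)  # 12.5
mse_after = np.mean((y - F1) ** 2)   # about 10.22
print(F1, mse_before, mse_after)
```

Repeating this step with the new residuals `y - F1` is exactly the loop described above; the small learning rate is why the MSE drops only partway in one step.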

GBR has been shown to be an effective and powerful technique for regression problems, and is widely used in many applications, such as finance, healthcare, and e-commerce.

`Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.`

In [3]:
# making a synthetic regression dataset (100 samples, 4 features, 2 informative)
from sklearn.datasets import make_regression
X , y = make_regression(n_features=4,n_informative=2,
                        random_state=0,shuffle=False)

In [7]:
# splitting the data into train and test sets
from sklearn.model_selection import train_test_split
X_train , X_test , y_train , y_test = train_test_split(X,y, test_size=0.33, random_state=42)

In [8]:
# importing libraries
from sklearn.ensemble import GradientBoostingRegressor

In [9]:
#training the model
regressor = GradientBoostingRegressor()
regressor.fit(X_train, y_train)

In [10]:
y_pred = regressor.predict(X_test)

In [12]:
from sklearn.metrics import mean_squared_error , r2_score
print(mean_squared_error(y_test,y_pred))
print(r2_score(y_test,y_pred))

77.15098908220385
0.937947539336522
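The cells above use scikit-learn's `GradientBoostingRegressor` rather than a from-scratch model. Below is a minimal NumPy-only sketch of what the question asks for, assuming squared-error loss and decision stumps (depth-1 trees) as weak learners; the class and helper names (`SimpleGradientBoostingRegressor`, `fit_stump`, `predict_stump`, `mse`, `r_squared`) and the synthetic dataset are hypothetical, introduced only for illustration.

```python
import numpy as np


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) for the single split
    that minimises the squared error of predicting r by the mean on each side."""
    n, n_features = X.shape
    best, best_gain = None, -np.inf
    for j in range(n_features):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        left_sum = np.cumsum(rs)[:-1]  # sum of r left of each candidate split
        n_left = np.arange(1, n)
        n_right = n - n_left
        right_sum = rs.sum() - left_sum
        # SSE = sum(r^2) - (left_sum^2/n_left + right_sum^2/n_right): maximise the bracket
        gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right
        gain = np.where(xs[1:] > xs[:-1], gain, -np.inf)  # split only between distinct values
        i = int(np.argmax(gain))
        if gain[i] > best_gain:
            best_gain = gain[i]
            best = (j, (xs[i] + xs[i + 1]) / 2, left_sum[i] / n_left[i], right_sum[i] / n_right[i])
    return best


def predict_stump(stump, X):
    j, threshold, left_value, right_value = stump
    return np.where(X[:, j] <= threshold, left_value, right_value)


class SimpleGradientBoostingRegressor:
    """Gradient boosting for squared-error loss with decision stumps (a sketch)."""

    def __init__(self, n_estimators=300, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.init_ = y.mean()  # F0: best constant under squared error
        F = np.full(len(y), self.init_)
        self.stumps_ = []
        for _ in range(self.n_estimators):
            residuals = y - F  # negative gradient of 0.5 * (y - F)^2
            stump = fit_stump(X, residuals)
            if stump is None:  # every feature is constant: nothing left to split
                break
            self.stumps_.append(stump)
            F += self.learning_rate * predict_stump(stump, X)
        return self

    def predict(self, X):
        F = np.full(len(X), self.init_)
        for stump in self.stumps_:
            F += self.learning_rate * predict_stump(stump, X)
        return F


def mse(y, y_pred):
    return np.mean((y - y_pred) ** 2)


def r_squared(y, y_pred):
    return 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


# Hypothetical small dataset: an additive non-linear target plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

model = SimpleGradientBoostingRegressor(n_estimators=300, learning_rate=0.1).fit(X_train, y_train)
y_pred = model.predict(X_test)
print("MSE:", mse(y_test, y_pred))
print("R^2:", r_squared(y_test, y_pred))
```

Stumps keep the split search to a few vectorised lines; supporting deeper trees would mean applying `fit_stump` recursively to each side. For comparison, scikit-learn's `GradientBoostingRegressor` defaults to `max_depth=3`, `learning_rate=0.1`, and `n_estimators=100`.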


`Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters`

In [76]:
params = {
    "learning_rate" : [0.15,0.25,0.3],
    "n_estimators" : [300,400,1000],
    "max_depth" : [1,2,3,4]
}

In [77]:
from sklearn.model_selection import GridSearchCV

In [78]:
regressor = GradientBoostingRegressor()

In [79]:
reg = GridSearchCV(regressor, param_grid=params, scoring="r2", cv=5, verbose=3)

In [80]:
reg.fit(X_train,y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV 1/5] END learning_rate=0.15, max_depth=1, n_estimators=300;, score=0.960 total time=   0.2s
[CV 2/5] END learning_rate=0.15, max_depth=1, n_estimators=300;, score=0.973 total time=   0.3s
[CV 3/5] END learning_rate=0.15, max_depth=1, n_estimators=300;, score=0.942 total time=   0.2s
[CV 4/5] END learning_rate=0.15, max_depth=1, n_estimators=300;, score=0.988 total time=   0.2s
[CV 5/5] END learning_rate=0.15, max_depth=1, n_estimators=300;, score=0.856 total time=   0.2s
[CV 1/5] END learning_rate=0.15, max_depth=1, n_estimators=400;, score=0.960 total time=   0.3s
[CV 2/5] END learning_rate=0.15, max_depth=1, n_estimators=400;, score=0.974 total time=   0.3s
[CV 3/5] END learning_rate=0.15, max_depth=1, n_estimators=400;, score=0.944 total time=   0.3s
[CV 4/5] END learning_rate=0.15, max_depth=1, n_estimators=400;, score=0.987 total time=   0.3s
[CV 5/5] END learning_rate=0.15, max_depth=1, n_estimators=400;, score=0.8

In [83]:
#best r2 score
reg.best_score_

0.9473892996547371

In [84]:
#best hyperparameters
reg.best_params_

{'learning_rate': 0.15, 'max_depth': 1, 'n_estimators': 1000}
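The question also allows random search. A minimal sketch with scikit-learn's `RandomizedSearchCV` follows, reusing the Q2 data and split; the sampling ranges, `n_iter=10`, and `cv=3` are illustrative assumptions chosen to keep the run short, not tuned values.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Same synthetic data and split as in Q2
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Distributions to sample from (ranges are assumptions for illustration)
param_distributions = {
    "learning_rate": uniform(0.01, 0.29),  # continuous, uniform on [0.01, 0.30]
    "n_estimators": randint(100, 501),     # integers 100..500
    "max_depth": randint(1, 5),            # integers 1..4
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,        # number of sampled configurations
    scoring="r2",
    cv=3,
    random_state=0,   # makes the sampled configurations reproducible
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
print(search.score(X_test, y_test))  # R^2 of the refit best model on the test set
```

Unlike the grid above, whose cost grows with every value added to each list, random search costs a fixed `n_iter` fits per fold and can sample the learning rate from a continuous range.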

`Q4. What is a weak learner in Gradient Boosting?`

In Gradient Boosting, a weak learner is a simple model that is not very expressive and has low predictive power on its own, but can be combined with other weak learners to form a stronger model. Typically, the weak learner is a shallow decision tree (as small as a decision stump with a single split), although other simple models, such as a linear regression model with few features, can also be used.

In Gradient Boosting, the weak learner is trained to fit the negative gradient of the loss function at each iteration. The idea is that the weak learner will learn to correct the errors made by the previous trees, and improve the overall performance of the model.

The key to the success of Gradient Boosting is that it is able to combine many weak learners to form a strong model. By iteratively adding weak learners and updating the predictions of the model, Gradient Boosting is able to reduce the bias of the model and increase its accuracy.

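Using weak learners in Gradient Boosting also helps control overfitting: a shallow tree can only make a few splits, so each learner has limited capacity to memorize the training data. Boosting can still overfit as the number of iterations grows, which is why the learning rate, number of trees, and tree depth are tuned together (as in Q3).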
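The gap between one weak learner and many boosted ones can be seen directly. This sketch reuses the Q2 data and compares a single decision stump (`DecisionTreeRegressor(max_depth=1)`) with 300 boosted stumps; the hyperparameters are borrowed from the Q3 grid for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# A single weak learner: one split, two leaf values
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)

# 300 weak learners of the same size, combined by gradient boosting
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=300, learning_rate=0.15, random_state=0
).fit(X_train, y_train)

stump_r2 = r2_score(y_test, stump.predict(X_test))
boosted_r2 = r2_score(y_test, boosted.predict(X_test))
print(f"single stump R^2: {stump_r2:.3f}, boosted stumps R^2: {boosted_r2:.3f}")
```

A stump can only output two distinct values, so on its own it underfits badly; the boosted sum of stumps is what turns that bias into a strong model.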

`Q5. What is the intuition behind the Gradient Boosting algorithm?`

The intuition behind the Gradient Boosting algorithm is to iteratively add simple models (weak learners) to the ensemble, where each new model tries to correct the errors made by the previous models. The algorithm works by minimizing the loss function of the model, which measures the difference between the predicted and actual values of the target variable.

The basic idea of Gradient Boosting is to fit the negative gradient of the loss function at each iteration. For the squared-error loss, this negative gradient is exactly the residual between the actual and predicted values; for other losses it is a "pseudo-residual" that plays the same role. By adding a new weak learner that fits the negative gradient, the algorithm corrects the errors made by the previous models and improves the overall performance of the model.
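For the squared-error loss, the statement that the negative gradient equals the residual follows from a one-line derivation; the absolute-error loss is shown for contrast, where the negative gradient (for $y_i \neq F(x_i)$) is only the sign of the residual:

$$
L\big(y_i, F(x_i)\big) = \tfrac{1}{2}\big(y_i - F(x_i)\big)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F(x_i)} = y_i - F(x_i)
$$

$$
L\big(y_i, F(x_i)\big) = \big|y_i - F(x_i)\big|
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F(x_i)} = \operatorname{sign}\big(y_i - F(x_i)\big)
$$

The factor $\tfrac{1}{2}$ is a convention that cancels the 2 from differentiation; without it the negative gradient is $2\big(y_i - F(x_i)\big)$, which points in the same direction.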

The algorithm combines the predictions of all the weak learners by adding them together, with each new weak learner contributing a small correction to the running prediction. As a result, the final prediction is the sum of many weak learners, each of which targets the part of the error that the earlier learners left unexplained.

The Gradient Boosting algorithm has several practical advantages: it captures non-linear relationships between the features and the target variable (and, with deeper trees, feature interactions), and with a robust loss such as Huber or absolute error it is less sensitive to outliers. Some implementations, such as scikit-learn's `HistGradientBoostingRegressor`, XGBoost, and LightGBM, also handle missing values natively and scale to large datasets and high-dimensional feature spaces. The same framework applies to a wide range of machine learning problems, including classification, regression, and ranking.

`Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?`

The Gradient Boosting algorithm builds an ensemble of weak learners in an iterative fashion. The general process of building an ensemble of weak learners in Gradient Boosting can be summarized as follows:

1. Initialize the model: The first step in Gradient Boosting is to initialize the model with a constant prediction, which is equivalent to a decision tree with a single leaf. For squared-error loss this constant is the mean of the target variable. This model is called the "starting model".

2. Calculate the residual errors: Once the starting model is trained, the algorithm calculates the residual errors by subtracting the predictions of the starting model from the actual values of the target variable.

3. Train a weak learner: The next step is to train a new weak learner on the residual errors calculated in the previous step. The weak learner is typically a simple model, such as a decision tree with a small number of nodes or a linear regression model with a small number of features.

4. Update the model: The weak learner is added to the model by multiplying its predictions by a small learning rate (typically between 0.01 and 0.1) and adding them to the predictions of the previous model. This process updates the predictions of the model and reduces the residual errors.

5. Repeat: Steps 2-4 are repeated for a fixed number of iterations or until the residual errors converge to a small value.
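The five steps can be sketched as a short loop. This assumes squared-error loss, scikit-learn's `DecisionTreeRegressor` as the weak learner, and illustrative values for the learning rate, tree depth, and number of rounds; the `predict` helper is hypothetical. It also records the training MSE after each round, which for squared error and least-squares trees never increases when the learning rate is between 0 and 2.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

learning_rate = 0.1  # assumed shrinkage
n_iterations = 50    # assumed fixed number of rounds
max_depth = 2        # assumed weak-learner depth

# Step 1: starting model = constant prediction (the mean, optimal for squared error)
F = np.full(y.shape, y.mean())
trees = []
train_mse = [np.mean((y - F) ** 2)]

for _ in range(n_iterations):
    residuals = y - F                                  # Step 2: residual errors
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X, residuals)                             # Step 3: fit a weak learner
    F = F + learning_rate * tree.predict(X)            # Step 4: shrunken update
    trees.append(tree)
    train_mse.append(np.mean((y - F) ** 2))            # Step 5: repeat, tracking the error


def predict(X_new):
    """Starting constant plus every tree's prediction, each scaled by the learning rate."""
    return y.mean() + learning_rate * sum(t.predict(X_new) for t in trees)


print(train_mse[0], train_mse[-1])
```

scikit-learn's `GradientBoostingRegressor` follows this scheme for squared error and adds options such as row subsampling (`subsample`) and validation-based early stopping (`n_iter_no_change`), which replaces a fixed round count with the convergence check in step 5.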

By iteratively adding weak learners and updating the predictions of the model, the Gradient Boosting algorithm is able to build a powerful ensemble of weak learners that work together to improve the overall performance of the model. The algorithm is able to handle a wide range of machine learning problems, including classification, regression, and ranking, and has been shown to be effective in many applications, such as finance, healthcare, and e-commerce.

`Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?`

The mathematical intuition behind the Gradient Boosting algorithm involves the following steps:

1. Define the loss function: The first step is to define a differentiable loss function that measures the difference between the predicted and actual values of the target variable. The loss function is usually a function of the residuals, which are the differences between the predicted and actual values.

2. Initialize the model: The starting model is a constant, chosen as the value that minimizes the loss over the training data. For squared-error loss this is the mean of the target variable, which serves as the baseline prediction for all the data points.

3. Compute the negative gradient of the loss function: The negative gradient of the loss function is computed with respect to the predictions of the model. This gradient represents the direction in which the predictions of the model need to be adjusted in order to minimize the loss function.

4. Fit a weak learner to the negative gradient: A new weak learner is fitted to the negative gradient of the loss function. The weak learner is typically a simple model, such as a decision tree with a small number of nodes or a linear regression model with a small number of features.

5. Update the model: The predictions of the weak learner are multiplied by a small learning rate and added to the predictions of the previous model. This process updates the predictions of the model and reduces the residual errors.

6. Repeat steps 3-5: Steps 3-5 are repeated for a fixed number of iterations or until the residual errors converge to a small value.

7. Combine the weak learners: The final model is the starting constant plus the sum of all the weak learners' predictions, each scaled by the learning rate.
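The seven steps can be written compactly as follows, with $\nu$ denoting the learning rate, $M$ the number of iterations, and $h_m$ the weak learner fitted at iteration $m$ (notation chosen here for illustration). This is a simplified form of Friedman's algorithm that uses a fixed learning rate in place of the per-iteration line search in his original formulation.

$$
\begin{aligned}
&\text{1. Loss: } L\big(y, F(x)\big), \text{ differentiable in } F \\
&\text{2. } F_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \qquad (= \bar{y} \text{ for squared error}) \\
&\text{3. } r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
&\text{4. Fit } h_m \text{ to the pairs } \{(x_i, r_{im})\}_{i=1}^{n} \\
&\text{5. } F_m(x) = F_{m-1}(x) + \nu\, h_m(x) \\
&\text{6. Repeat steps 3 to 5 for } m = 1, \dots, M \\
&\text{7. } F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
$$

Step 7 is just step 5 unrolled over all $M$ iterations, which is why every weak learner carries the same weight $\nu$.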

The intuition behind these steps is that by iteratively fitting a weak learner to the negative gradient of the loss function and combining the weak learners to form a strong ensemble model, the Gradient Boosting algorithm is able to improve the performance of the model and minimize the loss function. By combining the predictions of the weak learners, the algorithm is able to capture complex non-linear relationships between the features and the target variable, and handle a wide range of machine learning problems, including classification, regression, and ranking.