#### Q1. What is Gradient Boosting Regression?

Ans: **Gradient Boosting Regression** is a machine learning technique that combines multiple weak learners, typically decision trees, to create a strong predictive model. 

The algorithm works by iteratively adding weak learners to the model, with each learner trying to improve upon the mistakes of the previous learner. 

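In Gradient Boosting Regression, each new learner is fit to the negative gradient of the loss function with respect to the current predictions; for squared-error loss this is simply the residual between the actual and predicted values. This amounts to gradient descent in function space: each added learner is a step that reduces the loss, rather than an update to a fixed set of model parameters.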

#### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [8]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV

gb = GradientBoostingRegressor()

In [4]:
X, y = make_regression(n_samples=100, n_features=5, noise=0.5)

In [5]:
param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [1, 2, 3]
}

In [9]:
grid_search = GridSearchCV(gb, param_grid, cv=5)
grid_search.fit(X, y)

print("Best Hyperparameters:", grid_search.best_params_)

Best Hyperparameters: {'learning_rate': 0.2, 'max_depth': 1, 'n_estimators': 200}


In [10]:
best_gb = GradientBoostingRegressor(**grid_search.best_params_)

best_gb.fit(X, y)

In [11]:
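# Note: these predictions are made on the same data used for tuning and training,
# so the metrics below are in-sample and overstate performance on unseen data.
predictions = best_gb.predict(X)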

In [13]:
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)

print("Mean Squared Error:", mse, '\n')
print("R-squared:", r2)

Mean Squared Error: 71.2400032876942 

R-squared: 0.9946903279725771
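
Q2 asks for a from-scratch implementation, while the cells above rely on scikit-learn's `GradientBoostingRegressor`. The following is a minimal sketch of how the same idea could be written with NumPy alone, assuming squared-error loss and depth-1 trees (stumps) as weak learners. The class `SimpleGradientBoosting`, the helpers `fit_stump`, `predict_stump`, `mse`, and `r2`, and the synthetic dataset are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimizing squared error on r."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive unique feature values
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # skip degenerate splits with an empty side
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


class SimpleGradientBoosting:
    """Hypothetical minimal gradient boosting regressor (squared-error loss, stumps)."""

    def __init__(self, n_estimators=50, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.init_ = y.mean()  # constant initial prediction
        self.stumps_ = []
        pred = np.full(len(y), self.init_)
        for _ in range(self.n_estimators):
            residuals = y - pred  # negative gradient of 1/2 * (y - pred)^2
            stump = fit_stump(X, residuals)
            if stump is None:  # no valid split left
                break
            self.stumps_.append(stump)
            pred += self.learning_rate * predict_stump(stump, X)
        return self

    def predict(self, X):
        pred = np.full(X.shape[0], self.init_)
        for stump in self.stumps_:
            pred += self.learning_rate * predict_stump(stump, X)
        return pred


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Small synthetic regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

model = SimpleGradientBoosting(n_estimators=50, learning_rate=0.1).fit(X_train, y_train)
print("Train MSE:", mse(y_train, model.predict(X_train)))
print("Test MSE:", mse(y_test, model.predict(X_test)))
print("Test R-squared:", r2(y_test, model.predict(X_test)))
```

Stumps keep the split search short enough to read in one pass; deeper trees would need a recursive version of `fit_stump`. The last 20 rows are held out so that the reported test metrics are not computed on the training data.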


#### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

In [18]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=5, noise=0.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


param_dist = {
    'n_estimators': sp_randint(10, 100),
    'max_depth': sp_randint(2, 10),
    'learning_rate': [0.01, 0.05, 0.1, 0.5, 1.0],
    'subsample': [0.5, 0.75, 1.0],
    'min_samples_split': sp_randint(2, 20),
    'min_samples_leaf': sp_randint(1, 10)
}

gb = GradientBoostingRegressor()

random_search = RandomizedSearchCV(
    estimator=gb,
    param_distributions=param_dist,
    n_iter=50,
    cv=5,
    scoring='neg_mean_squared_error',
    random_state=42
)

random_search.fit(X_train, y_train)

print("Best hyperparameters: ", random_search.best_params_, '\n \n')
print("Best score: ", -random_search.best_score_)

Best hyperparameters:  {'learning_rate': 0.1, 'max_depth': 2, 'min_samples_leaf': 3, 'min_samples_split': 8, 'n_estimators': 61, 'subsample': 0.5} 
 

Best score:  1041.2648048385934
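
The best score reported by the search is the mean cross-validated MSE on the training split; the held-out `X_test` and `y_test` are never used in the cell above. A possible follow-up, sketched here under assumed seeds and a reduced search space (`random_state=0`, `n_iter=10`, three tuned parameters), is to score the refitted best estimator on the test split.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Seeds are fixed here only so the sketch is reproducible (an assumption)
X, y = make_regression(n_samples=100, n_features=5, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(10, 100),
        "max_depth": randint(2, 10),
        "learning_rate": [0.01, 0.05, 0.1, 0.5, 1.0],
    },
    n_iter=10,
    cv=5,
    scoring="neg_mean_squared_error",
    random_state=42,
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ has been retrained on all of X_train
test_pred = search.best_estimator_.predict(X_test)
test_mse = mean_squared_error(y_test, test_pred)
test_r2 = r2_score(y_test, test_pred)

print("Cross-validated MSE:", -search.best_score_)
print("Held-out test MSE:", test_mse)
print("Held-out test R-squared:", test_r2)
```

Comparing the cross-validated MSE with the held-out MSE gives a rough check on whether the tuned model generalizes beyond the folds it was selected on.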


#### Q4. What is a weak learner in Gradient Boosting?

Ans: A weak learner in Gradient Boosting is a simple model whose predictions are only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean of the target in regression. In Gradient Boosting Regression, the weak learner is typically a shallow decision tree with a small number of splits or levels.
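
As a rough illustration, a depth-1 decision tree (a "stump") can be compared with the mean-prediction baseline. The dataset and seed below are assumptions chosen for the sketch, and the comparison is made on the training data only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=0.5, random_state=0)

# Trivial baseline: always predict the mean of the target
baseline_pred = np.full_like(y, y.mean())
baseline_mse = mean_squared_error(y, baseline_pred)

# Weak learner: a single split, so at most two distinct predicted values
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_mse = mean_squared_error(y, stump.predict(X))

print("Baseline MSE:", baseline_mse)
print("Stump MSE:", stump_mse)
print("Number of leaves:", stump.get_n_leaves())
```

On its own the stump is a poor model, but it captures a little structure the baseline misses, which is all boosting needs from each individual learner.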

#### Q5. What is the intuition behind the Gradient Boosting algorithm?

Ans:
- The intuition behind the Gradient Boosting algorithm is to iteratively add simple models to the ensemble, each model trying to correct the mistakes of the previous model. 

- The algorithm starts with a very simple model, typically a constant prediction such as the mean of the target, and then adds weak learners (usually shallow decision trees) to the ensemble in a way that reduces the remaining error between the actual and predicted values. 

- At each iteration, the algorithm calculates the gradient of the loss function with respect to the current predictions, and then fits a weak learner to the negative gradient. For squared-error loss, the negative gradient is exactly the residual error of the current ensemble.
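
The link between the negative gradient and the residual can be made explicit for squared-error loss. The notation here is an assumption introduced for illustration: $F(x_i)$ is the current ensemble prediction for sample $i$, and the loss carries a factor of $\tfrac{1}{2}$ so that the derivative comes out clean.

```latex
L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\,\bigl(y_i - F(x_i)\bigr)^2
\quad\Longrightarrow\quad
-\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} = y_i - F(x_i)
```

For other losses (for example absolute error) the negative gradient is no longer the plain residual, which is why the general algorithm is stated in terms of gradients.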

#### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

Ans: The Gradient Boosting algorithm builds an ensemble of weak learners by adding them to the model sequentially, with each new learner trying to improve upon the mistakes of the previous learner. 

- The algorithm starts from a simple initial model, typically a constant prediction such as the mean of the target. 

- Then, it calculates the residuals between the actual values and the current predictions and fits a new weak learner, such as a shallow decision tree, to those residuals. 

- This process is repeated for a predetermined number of iterations or until a stopping criterion is met. 

- The final prediction is the initial prediction plus the sum of the predictions of all the weak learners in the ensemble, each scaled by the learning rate.
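
The sequential build-up can be observed with scikit-learn's `staged_predict`, which yields the ensemble's prediction after each added tree. This is a sketch on an assumed synthetic dataset with assumed hyperparameters; it also rebuilds the final prediction by hand from the fitted `init_` model and the individual trees in `estimators_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=100, n_features=5, noise=0.5, random_state=0)

gb = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
)
gb.fit(X, y)

# Training MSE after 1, 2, ..., 50 trees
train_mse = [mean_squared_error(y, pred) for pred in gb.staged_predict(X)]
for stage in (1, 10, 50):
    print(f"Training MSE after {stage} trees:", train_mse[stage - 1])

# Manual reconstruction: initial prediction + learning_rate * sum of tree outputs
manual = gb.init_.predict(X).ravel()
for tree in gb.estimators_[:, 0]:
    manual = manual + gb.learning_rate * tree.predict(X)

print("Matches gb.predict:", np.allclose(manual, gb.predict(X)))
```

Every tree receives the same scaling factor, the learning rate, rather than a weight based on its individual accuracy.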

#### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

Ans: The mathematical intuition behind the Gradient Boosting algorithm involves the following steps:

1. Initialize the model with a constant prediction, such as the mean of the target values (the constant that minimizes squared-error loss).
2. Calculate the negative gradient of the loss with respect to the current predictions. For squared-error loss, this is the residual between the actual and predicted values.
3. Fit a new weak learner to these residuals, essentially trying to predict the errors of the current model.
4. Add the new learner's predictions, scaled by the learning rate, to the current predictions to form the updated predictions.
5. Repeat steps 2-4 for a predetermined number of iterations or until a stopping criterion is met.
6. Calculate the final prediction as the initial prediction plus the learning-rate-scaled sum of the predictions of all the weak learners. Unlike AdaBoost, the learners are not weighted by their individual performance; each one is shrunk by the same learning rate.
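
The steps above can be summarized in a compact form. The symbols are assumptions introduced for illustration: $L$ is the loss, $F_m$ the ensemble after $m$ iterations, $h_m$ the $m$-th weak learner, $\nu$ the learning rate, $M$ the number of iterations, and $n$ the number of training samples. This is the simplified shrinkage-only variant; some formulations add a per-step or per-leaf line search.

```latex
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}},
  \quad i = 1, \dots, n \\
h_m &= \text{weak learner fit to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \quad m = 1, \dots, M \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{aligned}
```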