#### Q1. What is Gradient Boosting Regression?

Gradient Boosting Regression is an ensemble technique that builds predictive models, typically shallow decision trees, one after another to minimize a loss function. Each new model is fit to the errors that the ensemble built so far still makes. The final prediction is the sum of an initial estimate and the scaled outputs of these weak models. The same approach is used not only for regression but also for classification and other predictive tasks.

#### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [4]:
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Small synthetic one-feature regression problem (no seed set, so results vary per run)
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)

# Hold out a third of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.33)

# Note: this uses scikit-learn's implementation with default settings, not a from-scratch one
gbr = GradientBoostingRegressor()
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)

# Evaluate on the held-out set
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("MSE:", mse)
print("R2:", r2)

MSE: 1.9847493141763362
R2: 0.9980443215884244
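
The cell above relies on scikit-learn's `GradientBoostingRegressor`. For the from-scratch part of the question, a minimal NumPy sketch might look like the following. It assumes squared-error loss, a single feature, and depth-1 trees (stumps) as weak learners. The helper names (`fit_stump`, `predict_stump`, `fit_gradient_boosting`, `predict_gradient_boosting`), the synthetic data, and the settings of 200 rounds with a learning rate of 0.1 are illustrative choices, not part of the original notebook.

```python
import numpy as np


def fit_stump(x, residuals):
    """Least-squares depth-1 regression tree (stump) on one feature."""
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2.0  # midpoints between sorted values
    best = None
    for t in thresholds:
        left = x <= t
        left_value = residuals[left].mean()
        right_value = residuals[~left].mean()
        sse = (np.sum((residuals[left] - left_value) ** 2)
               + np.sum((residuals[~left] - right_value) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_gradient_boosting(x, y, n_estimators=200, learning_rate=0.1):
    base = y.mean()                       # initial prediction: mean of the target
    prediction = np.full(y.shape, base)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - prediction        # errors of the current ensemble
        stump = fit_stump(x, residuals)   # weak learner fit to those errors
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_gradient_boosting(model, x):
    base, learning_rate, stumps = model
    prediction = np.full(x.shape, base)
    for stump in stumps:
        prediction = prediction + learning_rate * predict_stump(stump, x)
    return prediction


# Illustrative synthetic data (assumption): a noisy linear relationship
rng = np.random.default_rng(42)
x = rng.uniform(-3.0, 3.0, size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

# Simple train/test split with NumPy only
indices = rng.permutation(100)
train_idx, test_idx = indices[:67], indices[67:]
model = fit_gradient_boosting(x[train_idx], y[train_idx])
y_test = y[test_idx]
y_pred = predict_gradient_boosting(model, x[test_idx])

# Mean squared error and R-squared on the held-out points
mse = np.mean((y_test - y_pred) ** 2)
r2 = 1.0 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print("MSE:", mse)
print("R2:", r2)
```

Stumps keep the sketch short. A deeper tree as the weak learner would usually need fewer rounds to reach the same fit.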


#### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.


In [5]:
from sklearn.model_selection import GridSearchCV

# Define the model and the parameter grid; the grid values override these starting settings
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [1, 2, 3]
}

# Grid search
grid_search = GridSearchCV(model, param_grid, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Parameters:", best_params)
# The scorer returns negated MSE, so flip the sign to report the mean cross-validated MSE
print("Best Score:", -best_score)


Best Parameters: {'learning_rate': 0.5, 'max_depth': 3, 'n_estimators': 200}
Best Score: 19.883364714804376


#### Q4. What is a weak learner in Gradient Boosting?


In Gradient Boosting, a "weak learner" is typically a simple model like a shallow decision tree. These trees are not very accurate on their own but are effective when combined into an ensemble. The weakness comes from the model's simplicity and limited depth, restricting its expressive power.

#### Q5. What is the intuition behind the Gradient Boosting algorithm?


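The intuition is to improve a model step by step rather than fit one complex model at once. Training starts from a crude prediction, such as the mean of the target. Each new weak learner is then fit to the errors (residuals) that the current ensemble still makes, and a fraction of its output, set by the learning rate, is added to the ensemble. For squared-error loss the residuals equal the negative gradient of the loss with respect to the current predictions (up to a constant factor), so each round resembles a gradient descent step taken on the predictions themselves, which is where the name comes from.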
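
A small numeric illustration of this step-by-step idea, under an idealized assumption: each weak learner predicts the current residual exactly, so only the learning rate limits each correction. The target value, learning rate, and number of rounds are arbitrary choices for the example.

```python
# Idealized illustration (assumption): every weak learner predicts the
# current residual exactly, so the learning rate alone sets the step size.
target = 10.0
prediction = 0.0
learning_rate = 0.1

residuals = []
for step in range(5):
    residual = target - prediction          # what the ensemble still gets wrong
    prediction += learning_rate * residual  # take a small step toward the target
    residuals.append(target - prediction)   # error remaining after this round

print(residuals)
```

Under this assumption the remaining error shrinks by a factor of `1 - learning_rate` every round. Real weak learners only approximate the residuals, so progress is slower and less regular.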

#### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?


Gradient Boosting builds an ensemble of weak learners (typically decision trees) sequentially. Each tree in the sequence is trained to predict the residuals, or errors, of the ensemble built so far. After a tree is trained, its prediction is scaled by the learning rate and added to the ensemble's running prediction.
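
Written as a formula, the ensemble after $M$ trees can be sketched as below. The notation is an assumption introduced here: $F_0$ is the initial prediction, $h_m$ is the $m$-th tree, and $\nu$ is the learning rate.

```latex
F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x), \qquad 0 < \nu \le 1
```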

#### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

#### Steps:
1. Create a Base Model: The first step is to create a base model, which provides an initial prediction for the target variable. In Gradient Boosting Regression, this is typically the average value of the target variable. This initial prediction acts as the starting point for the ensemble.

2. Compute Residuals: The residuals are the differences between the actual target values and the predictions of the current ensemble, which at the start is just the base model. They represent the errors that the next tree should correct.

3. Construct a Decision Tree: A decision tree, usually a shallow one, is trained to predict these residuals. In this way it captures patterns in the data that the current ensemble fails to capture.

4. Update Predictions: The predictions of the decision tree are added to the predictions of the current ensemble. The size of this contribution is controlled by a hyperparameter called the learning rate: the tree's predictions are multiplied by the learning rate, typically a value less than 1, before being added.

5. Repeat: Steps 2-4 are repeated iteratively, with each new decision tree trained to predict the residuals of the ensemble formed by the base model and all previous trees. Each new tree's predictions are scaled by the learning rate and added to the ensemble.

6. Stopping Criterion: The iterative process continues until a stopping criterion is met, such as reaching a maximum number of trees or achieving satisfactory performance on a validation set.
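
The steps above can be written compactly for squared-error loss. The symbols are assumptions introduced for this sketch: $F_m$ is the ensemble after $m$ trees, $h_m$ is the $m$-th tree, $r_{im}$ is the residual of sample $i$ at round $m$, $\nu$ is the learning rate, and $n$ is the number of training samples.

```latex
\begin{aligned}
L(y, F) &= \tfrac{1}{2}\,(y - F)^2 \\
F_0(x) &= \frac{1}{n} \sum_{i=1}^{n} y_i \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}} = y_i - F_{m-1}(x_i) \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \bigl( r_{im} - h(x_i) \bigr)^2 \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
\end{aligned}
```

With this choice of loss, the residual is exactly the negative gradient of the loss with respect to the current prediction. For other losses the same recipe applies with $r_{im}$ replaced by the corresponding negative gradient.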