Q1. What is Gradient Boosting Regression?

Answer--> Gradient Boosting Regression is a boosting technique that trains weak models sequentially to build a strong model. The main idea is that each new model corrects the mistakes made by the previous ones by reducing the loss function.

Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
housing = fetch_california_housing()

# Create a DataFrame from the dataset
X = pd.DataFrame(data=housing.data, columns=housing.feature_names)
y = pd.Series(housing.target)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the model with default hyperparameters
gb_reg = GradientBoostingRegressor()
gb_reg.fit(X_train, y_train)

# Predict on the test data
y_pred = gb_reg.predict(X_test)

# Evaluation

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2*100:.4f}%")

Mean Squared Error: 0.2941
R-squared: 77.5582%
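
The cell above relies on scikit-learn's `GradientBoostingRegressor`, while the question asks for a from-scratch version. The following is a minimal sketch of one possible NumPy-only implementation, under several assumptions that are not in the original notebook: a synthetic one-feature dataset (a noisy sine curve), depth-1 trees (stumps) as weak learners, squared-error loss, and hypothetical helper names such as `fit_stump` and `fit_gradient_boosting`. It is meant as an illustration of the idea, not a replacement for the library implementation.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree on one feature.

    Returns (threshold, left_value, right_value); threshold is None
    when no split is possible (all x values identical).
    """
    best_sse, best = np.inf, (None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:  # skip the largest value so the right side is never empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best

def predict_stump(stump, x):
    t, lv, rv = stump
    if t is None:
        return np.full(len(x), lv)
    return np.where(x <= t, lv, rv)

def fit_gradient_boosting(x, y, n_estimators=200, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    f0 = y.mean()                      # initial prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred           # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residuals)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_gradient_boosting(model, x):
    f0, learning_rate, stumps = model
    pred = np.full(len(x), f0)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, x)
    return pred

# Synthetic data (assumption): noisy sine curve, 160 training and 40 test points
rng = np.random.default_rng(42)
x_all = rng.uniform(0, 6, 200)
y_all = np.sin(x_all) + rng.normal(0, 0.1, 200)
x_tr, x_te = x_all[:160], x_all[160:]
y_tr, y_te = y_all[:160], y_all[160:]

model = fit_gradient_boosting(x_tr, y_tr)
y_hat = predict_gradient_boosting(model, x_te)

# Evaluation with hand-written metrics
mse_scratch = np.mean((y_te - y_hat) ** 2)
r2_scratch = 1 - np.sum((y_te - y_hat) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print(f"Mean Squared Error: {mse_scratch:.4f}")
print(f"R-squared: {r2_scratch:.4f}")
```

Stumps keep the sketch short. A deeper tree or multiple features would need a fuller tree-building routine, which is where a library implementation becomes the more practical choice.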


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

In [3]:
from sklearn.model_selection import GridSearchCV

param = {'learning_rate':[0.1,0.2,0.3],
          'n_estimators':[100,200,300],
          'max_depth':[2,3]}

gcv = GridSearchCV(gb_reg, param_grid=param, cv = 3, scoring='r2', verbose = 3)
gcv.fit(X_train, y_train)

Fitting 3 folds for each of 18 candidates, totalling 54 fits
[CV 1/3] END learning_rate=0.1, max_depth=2, n_estimators=100;, score=0.756 total time=   2.4s
[CV 2/3] END learning_rate=0.1, max_depth=2, n_estimators=100;, score=0.757 total time=   2.4s
[CV 3/3] END learning_rate=0.1, max_depth=2, n_estimators=100;, score=0.751 total time=   2.4s
[CV 1/3] END learning_rate=0.1, max_depth=2, n_estimators=200;, score=0.784 total time=   4.7s
[CV 2/3] END learning_rate=0.1, max_depth=2, n_estimators=200;, score=0.786 total time=   4.7s
[CV 3/3] END learning_rate=0.1, max_depth=2, n_estimators=200;, score=0.777 total time=   4.7s
[CV 1/3] END learning_rate=0.1, max_depth=2, n_estimators=300;, score=0.797 total time=   7.0s
[CV 2/3] END learning_rate=0.1, max_depth=2, n_estimators=300;, score=0.797 total time=   7.0s
[CV 3/3] END learning_rate=0.1, max_depth=2, n_estimators=300;, score=0.787 total time=   7.0s
[CV 1/3] END learning_rate=0.1, max_depth=3, n_estimators=100;, score=0.790 total ti

In [5]:
gcv.best_params_

{'learning_rate': 0.2, 'max_depth': 3, 'n_estimators': 300}

In [7]:
# Predict on the test data with the tuned model
y_pred = gcv.predict(X_test)

# Evaluation

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2*100:.4f}%")

Mean Squared Error: 0.2331
R-squared: 82.2121%
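
The question also mentions random search. As a hedged sketch of how that alternative might look, the example below uses `RandomizedSearchCV`, which samples a fixed number of parameter combinations instead of trying all of them. The small synthetic dataset from `make_regression`, the candidate values, and the variable names (`X_small`, `rcv`, and so on) are assumptions chosen to keep the run short; they are not taken from the notebook above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Small synthetic regression problem (assumption, to keep the search fast)
X_small, y_small = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_small_tr, X_small_te, y_small_tr, y_small_te = train_test_split(
    X_small, y_small, test_size=0.2, random_state=42
)

# Candidate values to sample from (illustrative choices)
param_dist = {
    "learning_rate": [0.05, 0.1, 0.2, 0.3],
    "n_estimators": [50, 100, 150],
    "max_depth": [2, 3, 4],
}

# Try only 5 of the 36 possible combinations, each with 3-fold cross-validation
rcv = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=42,
)
rcv.fit(X_small_tr, y_small_tr)

print(rcv.best_params_)
print(f"Test R-squared: {rcv.score(X_small_te, y_small_te):.4f}")
```

Random search trades exhaustiveness for speed: `n_iter` caps the number of fits, which matters when the grid is large or each fit is slow.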


Q4. What is a weak learner in Gradient Boosting?

Answer--> In Gradient Boosting, "weak learner" refers to a machine learning model that is relatively simple and, on its own, performs only slightly better than a trivial baseline (such as random guessing in classification or predicting the mean in regression). Weak learners are the building blocks of ensemble methods like Gradient Boosting, where they are usually shallow decision trees.

Q5. What is the intuition behind the Gradient Boosting algorithm?

Answer--> The intuition behind Gradient Boosting is to iteratively refine the predictions: each new model is added to reduce the errors that remain, so the ensemble gradually minimizes the loss function and generalizes better than any single weak model.

Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

Answer--> Here's how the Gradient Boosting algorithm builds an ensemble of weak learners:

1. **Initialization**: It starts with an initial prediction, often just the mean of the target values for regression or the log-odds for classification.

2. **First Weak Learner**: It trains a weak learner (usually a decision tree with limited depth) to predict the errors (residuals) between the initial prediction and the actual target values.

3. **Updating Predictions**: The predictions of the first weak learner, scaled by the learning rate, are added to the initial prediction, adjusting the model's predictions. This reduces the errors, but some errors often remain.

4. **Iterative Process**: The algorithm repeats steps 2 and 3 multiple times (a specified number of iterations or until a stopping criterion is met). In each iteration, a new weak learner is trained to predict the remaining errors (residuals).

5. **Weighted Combination**: The final ensemble prediction is the initial prediction plus the sum of all weak learners' predictions, each scaled by a learning rate (shrinkage factor). This factor is typically the same constant for every learner, rather than a weight that is re-tuned according to each learner's performance as in AdaBoost.

6. **Optimization**: Gradient Boosting performs gradient descent in function space. At each iteration, the new weak learner is fitted to the negative gradient of the loss function (e.g., Mean Squared Error for regression or log loss for classification) with respect to the current predictions, so adding it moves the ensemble in the direction that decreases the loss. For squared error, this negative gradient is the residual (up to a constant factor).

7. **Strong Ensemble**: Over iterations, the ensemble becomes stronger and more accurate as each weak learner focuses on the errors made by the previous ones. The final ensemble model combines the predictive power of all the weak learners to make highly accurate predictions on new, unseen data.
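
To make the sequential build-up concrete, the sketch below tracks how the training error changes as weak learners are added. It uses `staged_predict`, which yields the ensemble's predictions after each boosting stage. The toy sine dataset and the names `X_toy`, `y_toy`, and `stage_mse` are assumptions introduced for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Toy data (assumption): noisy sine curve with a single feature
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 6, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(0, 0.1, 200)

# 50 shallow trees, each added on top of the previous ones
staged_model = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
)
staged_model.fit(X_toy, y_toy)

# Training MSE of the ensemble after 1, 2, ..., 50 weak learners
stage_mse = [mean_squared_error(y_toy, p) for p in staged_model.staged_predict(X_toy)]

for k in (1, 10, 50):
    print(f"{k:>2} trees: training MSE = {stage_mse[k - 1]:.4f}")
```

The same loop over a held-out set, rather than the training set, is a common way to pick the number of trees before overfitting sets in.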

Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
algorithm?

Answer--> Here are the key steps involved in constructing the mathematical intuition of Gradient Boosting:

1. **Initialization**:
   - Initialize the ensemble's prediction as the simplest model, often just the mean of the target values for regression or the log-odds for classification.

2. **Residual Calculation**:
   - Calculate the residuals (errors) by subtracting the ensemble's prediction from the actual target values.

3. **Weak Learner Training**:
   - Train a weak learner (typically a decision tree with limited depth) on the residuals. The goal is to create a model that predicts the errors made by the current ensemble.

4. **Prediction Update**:
   - Update the ensemble's prediction by adding the predictions of the newly trained weak learner. This adjusts the model's predictions and reduces some of the errors.

5. **Weighted Combination**:
   - The ensemble's prediction is the initial prediction plus the sum of the predictions of all weak learners constructed so far, each scaled by a learning rate (shrinkage factor). This factor is typically the same constant for every learner, rather than a weight that is re-tuned according to each learner's contribution as in AdaBoost.

6. **Gradient Descent**:
   - Gradient Boosting performs gradient descent in function space. This involves calculating the gradient of a loss function (e.g., Mean Squared Error for regression or log loss for classification) with respect to the ensemble's predictions and fitting the next weak learner to the negative gradient. For squared error, the negative gradient is the residual from step 2 (up to a constant factor).

7. **Iterative Process**:
   - Repeat steps 2 to 4 for a specified number of iterations or until a stopping criterion is met. In each iteration, the residuals are recomputed and a new weak learner is trained to predict the residuals of the current ensemble.

8. **Strong Ensemble**:
   - Over iterations, the ensemble becomes stronger and more accurate as each weak learner focuses on the errors made by the previous ones. The final ensemble model combines the predictive power of all the weak learners to make highly accurate predictions on new, unseen data.
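
The steps above can be summarized in a few formulas. The notation here is an assumption introduced for illustration: $F_m$ is the ensemble after $m$ iterations, $h_m$ is the $m$-th weak learner, $\nu$ is the learning rate, and $L$ is the loss. This is a sketch of the standard formulation, specialized to squared error in the last two lines.

```latex
\begin{align*}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)
  && \text{(initialization)} \\
r_{im} &= -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}}
  && \text{(pseudo-residuals = negative gradient)} \\
h_m &\approx \text{weak learner fitted to } \{(x_i, r_{im})\}_{i=1}^{n}
  && \text{(weak learner training)} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x)
  && \text{(prediction update)} \\[6pt]
\text{For } L(y, F) &= \tfrac{1}{2}(y - F)^2: \quad
  r_{im} = y_i - F_{m-1}(x_i)
  && \text{(ordinary residuals)} \\
F_0(x) &= \frac{1}{n} \sum_{i=1}^{n} y_i
  && \text{(mean of the targets)}
\end{align*}
```

With squared error, the pseudo-residuals coincide with the ordinary residuals, which is why the regression case is often described simply as "fitting the errors of the previous model".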