#Q1. What is Gradient Boosting Regression? Give a practical example with code:

Gradient Boosting Regression is an ensemble learning technique that combines the predictions from multiple weak learners (typically decision trees) to create a strong predictive model. It builds the model sequentially, with each new tree trying to correct the errors of the previous ones.

In [1]:
#1
# Import necessary libraries
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate some example data
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Gradient Boosting Regressor
gb_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Train the model
gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
predictions = gb_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 0.7340600217411903


In [6]:
#Q2
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

def mean_squared_error(y_true, y_pred):
    # Average squared difference between targets and predictions
    return np.mean((y_true - y_pred) ** 2)

def gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # Initialize predictions with the mean of the target variable
    predictions = np.full_like(y, np.mean(y), dtype=float)

    for _ in range(n_estimators):
        # Compute residuals (for squared-error loss, proportional to the negative gradient)
        residuals = y - predictions

        # Fit a decision tree to the residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)

        # Update predictions with the learning rate and the new tree's predictions
        predictions += learning_rate * tree.predict(X)

    # Only in-sample predictions are returned; the fitted trees are not kept
    return predictions

# Example usage
# Generate some example data
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)

# Use the gradient boosting function (in-sample predictions for X)
scratch_predictions = gradient_boosting(X, y)

# The function above cannot predict on unseen data, so the held-out evaluation
# below uses scikit-learn's gb_regressor and the train/test split from the earlier cells
gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
predictions = gb_regressor.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
r_squared = r2_score(y_test, predictions)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r_squared}")

Mean Squared Error: 0.7340600217411903
R-squared: 0.044479095570065574
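
The `gradient_boosting` function above discards its trees, so it can only return predictions for the data it was fitted on. A minimal sketch of one way around this, assuming a hypothetical `SimpleGradientBoosting` class (an illustrative name, not part of scikit-learn) that stores the fitted trees so it can score a held-out test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


class SimpleGradientBoosting:
    """Hypothetical helper for illustration; not a scikit-learn class."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        # Start from the mean of the training targets
        self.init_ = float(np.mean(y))
        self.trees_ = []
        current = np.full(len(y), self.init_)
        for _ in range(self.n_estimators):
            # Fit the next tree to the current residuals and keep it
            residuals = y - current
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)
            current += self.learning_rate * tree.predict(X)
            self.trees_.append(tree)
        return self

    def predict(self, X):
        # Initial mean plus the learning-rate-scaled output of every stored tree
        pred = np.full(X.shape[0], self.init_)
        for tree in self.trees_:
            pred += self.learning_rate * tree.predict(X)
        return pred


# Same synthetic data and split as in the cells above
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scratch_model = SimpleGradientBoosting().fit(X_train, y_train)
scratch_test_predictions = scratch_model.predict(X_test)
scratch_test_mse = np.mean((y_test - scratch_test_predictions) ** 2)
print(f"From-scratch test MSE: {scratch_test_mse}")
```

The test MSE printed here need not match scikit-learn's exactly, since the library's trees may differ in split criterion and tie-breaking.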


In [4]:
#Q3
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5]
}

# Create a Gradient Boosting Regressor
gb_regressor = GradientBoostingRegressor(random_state=42)

# Perform grid search
grid_search = GridSearchCV(gb_regressor, param_grid, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print(f"Best Hyperparameters: {best_params}")

# Train the model with the best hyperparameters
best_gb_regressor = GradientBoostingRegressor(**best_params, random_state=42)
best_gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
best_predictions = best_gb_regressor.predict(X_test)

# Evaluate the model
best_mse = mean_squared_error(y_test, best_predictions)
print(f"Mean Squared Error with Best Hyperparameters: {best_mse}")

Best Hyperparameters: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 100}
Mean Squared Error with Best Hyperparameters: 0.5973702258607219
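
As a side note on the retraining step above: with its default `refit=True`, `GridSearchCV` already refits the best configuration on the full training set and exposes it as `best_estimator_`. The sketch below checks that this gives the same test predictions as retraining by hand. The smaller grid (`small_grid`) is an assumption made only to keep the example quick.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Same synthetic data and split as in the cells above
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Smaller grid than the one above, chosen only to keep the sketch quick
small_grid = {"n_estimators": [50, 100], "learning_rate": [0.01, 0.1]}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    small_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# refit=True is the default, so best_estimator_ is already trained on all of X_train
refit_predictions = search.best_estimator_.predict(X_test)

# Manual retraining with the best parameters, as done in the cell above
manual_model = GradientBoostingRegressor(**search.best_params_, random_state=42)
manual_model.fit(X_train, y_train)
manual_predictions = manual_model.predict(X_test)

# best_score_ is a negated MSE, so flip the sign to report it
print(f"Best cross-validated MSE: {-search.best_score_}")
print(f"Same predictions: {np.allclose(refit_predictions, manual_predictions)}")
```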


#Q4. What is a weak learner in Gradient Boosting? Give a practical example with code:

A weak learner is a model that performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean in regression. In gradient boosting, shallow decision trees are commonly used as weak learners.

In [7]:
#4
# Example of a weak learner using a shallow decision tree
weak_learner = DecisionTreeRegressor(max_depth=2)
weak_learner.fit(X_train, y_train)
weak_predictions = weak_learner.predict(X_test)

weak_mse = mean_squared_error(y_test, weak_predictions)
print(f"Mean Squared Error of Weak Learner: {weak_mse}")

Mean Squared Error of Weak Learner: 0.6054414730725821
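
One way to make "weak" concrete for regression is to compare a very shallow tree with a model that always predicts the training mean. The sketch below assumes a depth-1 tree (a "stump") as the weak learner and scikit-learn's `DummyRegressor` as the baseline. On the training data the stump can never do worse than the mean; on the test data it may or may not help.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data and split as in the cells above
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Trivial baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Weak learner: a tree allowed a single split
stump = DecisionTreeRegressor(max_depth=1, random_state=42).fit(X_train, y_train)

baseline_train_mse = np.mean((y_train - baseline.predict(X_train)) ** 2)
stump_train_mse = np.mean((y_train - stump.predict(X_train)) ** 2)
baseline_test_mse = np.mean((y_test - baseline.predict(X_test)) ** 2)
stump_test_mse = np.mean((y_test - stump.predict(X_test)) ** 2)

print(f"Train MSE  baseline: {baseline_train_mse:.4f}  stump: {stump_train_mse:.4f}")
print(f"Test MSE   baseline: {baseline_test_mse:.4f}  stump: {stump_test_mse:.4f}")
```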


#Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind Gradient Boosting is to add weak learners (typically shallow decision trees) one at a time, each correcting the errors of the ensemble built so far. Each new tree is trained to predict the current residuals (the differences between the actual values and the ensemble's predictions), and its output, scaled by the learning rate, is added to the running prediction. For squared-error loss, the residuals are (up to a constant factor) the negative gradient of the loss with respect to the predictions, so each added tree acts as a gradient descent step on the loss function. Trees that have already been fitted are left unchanged.
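
The sequential error correction described above can be observed with `staged_predict`, which yields the ensemble's predictions after each added tree. This is a small sketch on the same synthetic data. The training error shrinks as trees are added, while the test error is not guaranteed to follow it.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the cells above
np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(X_train, y_train)

# staged_predict yields one prediction array per boosting stage (1 tree, 2 trees, ...)
train_mse_per_stage = [
    np.mean((y_train - stage_pred) ** 2) for stage_pred in model.staged_predict(X_train)
]
test_mse_per_stage = [
    np.mean((y_test - stage_pred) ** 2) for stage_pred in model.staged_predict(X_test)
]

for n_trees in (1, 10, 50, 100):
    print(
        f"{n_trees:3d} trees  "
        f"train MSE: {train_mse_per_stage[n_trees - 1]:.4f}  "
        f"test MSE: {test_mse_per_stage[n_trees - 1]:.4f}"
    )
```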

In [8]:
#Q6
# Build an ensemble of weak learners using scikit-learn's GradientBoostingRegressor
ensemble_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
ensemble_regressor.fit(X_train, y_train)

# Make predictions on the test set
ensemble_predictions = ensemble_regressor.predict(X_test)

# Evaluate the ensemble model
ensemble_mse = mean_squared_error(y_test, ensemble_predictions)
print(f"Mean Squared Error of Gradient Boosting Ensemble: {ensemble_mse}")

Mean Squared Error of Gradient Boosting Ensemble: 0.7340600217411903


#Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?

1. Initialize the Model: Start with a simple model, often the mean of the target variable for regression problems.

2. Compute Residuals: Calculate the difference between the actual values and the current predictions (the residuals).

3. Fit a Weak Learner to Residuals: Train a weak learner (e.g., a decision tree) to predict the residuals.

4. Update Predictions: Update the model's predictions by adding the predictions of the weak learner, scaled by a learning rate.

5. Repeat: Repeat steps 2-4 for a specified number of iterations or until a convergence criterion is met.

6. Final Prediction: The final prediction is the initial prediction plus the sum of all weak learner predictions, each scaled by the learning rate.

The mathematical intuition involves minimizing a loss function by iteratively improving the model's predictions through the addition of weak learners. The learning rate controls the step size in each iteration, and the ensemble aims to capture complex patterns in the data.
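
The steps above can be written compactly for squared-error loss. The notation here is an assumption introduced for illustration: $F_m$ is the ensemble after $m$ trees, $h_m$ is the $m$-th weak learner, $\nu$ is the learning rate, $n$ is the number of training samples, and $M$ is the number of trees.

```latex
\begin{align*}
  % Step 1: initialize with the mean of the targets
  F_0(x) &= \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \\
  % Step 2: residuals, i.e. the negative gradient of L = (1/2)(y - F)^2
  r_i^{(m)} &= y_i - F_{m-1}(x_i)
            = -\left.\frac{\partial L(y_i, F)}{\partial F}\right|_{F = F_{m-1}(x_i)},
            \qquad L(y, F) = \tfrac{1}{2}(y - F)^2 \\
  % Step 3: fit the weak learner to the residuals
  h_m &= \arg\min_{h} \sum_{i=1}^{n} \left( r_i^{(m)} - h(x_i) \right)^2 \\
  % Step 4: update the ensemble with a step of size nu
  F_m(x) &= F_{m-1}(x) + \nu \, h_m(x) \\
  % Step 6: final prediction after M iterations
  F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{align*}
```

With $\nu = 0.1$ and $M = 100$, this corresponds to the `learning_rate` and `n_estimators` values used in the cells above.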