In [None]:
# Q1. What is Gradient Boosting Regression?
Ans.
Gradient Boosting Regression is an ensemble learning technique for predicting a continuous target variable.
It builds the model additively: weak learners, typically shallow decision trees, are fitted one after another,
each to the errors left by the current ensemble, and their scaled predictions are summed to form a strong
predictive model.

In [43]:
# Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
# simple regression problem as an example and train the model on a small dataset. Evaluate the model's
# performance using metrics such as mean squared error and R-squared.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate a small dataset for regression
X, y = make_regression(n_samples=1000, n_features=4, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

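# Define the weak learner. Despite its name, this class is not a real decision tree:
# it ignores X and always predicts the mean of the targets it was fitted on, so it
# cannot capture any relationship between the features and the target. An ensemble of
# such learners can do no better than a constant prediction, which is why the
# R-squared reported for this cell is about zero.
class DecisionTreeRegressor:
    def fit(self, X, y):
        self.prediction = np.mean(y)

    def predict(self, X):
        return np.full(len(X), self.prediction)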

# Define the Gradient Boosting Regressor
class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.models = []

    def fit(self, X, y):
        prediction = np.mean(y)
        for _ in range(self.n_estimators):
            residual = y - prediction
            tree = DecisionTreeRegressor()
            tree.fit(X, residual)
            self.models.append(tree)
            prediction += self.learning_rate * tree.predict(X)

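    def predict(self, X):
        # Note: fit() does not store its starting value (the training mean), so the
        # sum here starts from zero rather than from that initial prediction.
        predictions = np.zeros(len(X))
        for tree in self.models:
            predictions += self.learning_rate * tree.predict(X)
        return predictions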

    def r_squared(self, y_true, y_pred):
        ss_residual = np.sum((y_true - y_pred) ** 2)
        ss_total = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1 - (ss_residual / ss_total)

# Train the model
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1)
gb.fit(X_train, y_train)

# Predict on the test data
y_pred = gb.predict(X_test)

# Evaluate the performance
mse = np.mean((y_test - y_pred) ** 2)
r_squared = gb.r_squared(y_test, y_pred)

print(f"Mean squared error: {mse:.2f}")
print(f"R-squared: {r_squared:.2f}")

Mean squared error: 12691.43
R-squared: -0.00
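
The R-squared of about zero above follows from the weak learner being a constant predictor. As a hedged sketch of what a working variant could look like, the block below swaps in a depth-1 tree (a decision stump) and keeps the initial mean for prediction. The names `DecisionStump` and `StumpBoostingRegressor`, the quantile-based candidate thresholds, and the NumPy-generated toy data are all assumptions made for illustration; they are not part of the original cell.

```python
import numpy as np


class DecisionStump:
    """Depth-1 regression tree: one feature, one threshold, two leaf means."""

    def fit(self, X, y):
        # Fallback when no valid split is found: predict the mean everywhere.
        self.feature, self.threshold = 0, np.inf
        self.left_value = self.right_value = float(np.mean(y))
        best_sse = np.inf
        for j in range(X.shape[1]):
            # Candidate thresholds (an assumption): 10th to 90th percentiles of feature j.
            for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
                left = X[:, j] <= t
                if left.all() or not left.any():
                    continue
                lv, rv = y[left].mean(), y[~left].mean()
                sse = ((y[left] - lv) ** 2).sum() + ((y[~left] - rv) ** 2).sum()
                if sse < best_sse:
                    best_sse = sse
                    self.feature, self.threshold = j, t
                    self.left_value, self.right_value = lv, rv
        return self

    def predict(self, X):
        return np.where(X[:, self.feature] <= self.threshold,
                        self.left_value, self.right_value)


class StumpBoostingRegressor:
    """Gradient boosting for squared error with decision stumps as weak learners."""

    def __init__(self, n_estimators=200, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.init_ = float(np.mean(y))           # initial constant model
        self.models = []
        pred = np.full(len(y), self.init_)
        for _ in range(self.n_estimators):
            stump = DecisionStump().fit(X, y - pred)   # fit to current residuals
            pred += self.learning_rate * stump.predict(X)
            self.models.append(stump)
        return self

    def predict(self, X):
        pred = np.full(X.shape[0], self.init_)   # start from the stored mean
        for stump in self.models:
            pred += self.learning_rate * stump.predict(X)
        return pred


# Toy data generated with NumPy only (hypothetical, not the notebook's dataset).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=300)
X_tr, X_te, y_tr, y_te = X_demo[:210], X_demo[210:], y_demo[:210], y_demo[210:]

model = StumpBoostingRegressor().fit(X_tr, y_tr)
pred_te = model.predict(X_te)

mse_demo = np.mean((y_te - pred_te) ** 2)
baseline_mse = np.mean((y_te - y_tr.mean()) ** 2)   # constant-prediction baseline
r2_demo = 1 - np.sum((y_te - pred_te) ** 2) / np.sum((y_te - y_te.mean()) ** 2)

print(f"Mean squared error: {mse_demo:.2f} (baseline: {baseline_mse:.2f})")
print(f"R-squared: {r2_demo:.2f}")
```

Because each stump depends on a feature value, the residuals shrink from round to round instead of staying fixed, so the test error should fall well below the constant baseline.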


In [47]:
# Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
# optimise the performance of the model. Use grid search or random search to find the best
# hyperparameters

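from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor  # replaces the from-scratch class defined above
from sklearn.metrics import mean_squared_error  # used to evaluate the tuned models below
gb = GradientBoostingRegressor()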

In [45]:
parameter = {
    'learning_rate': [0.1, 0.01, 0.001],
    'n_estimators': [100, 200, 300, 400],
    'max_depth': [3, 4, 5, 10]
}

In [49]:
gscv = GridSearchCV(GradientBoostingRegressor(), param_grid=parameter, cv=5, scoring='neg_mean_squared_error')
gscv.fit(X_train, y_train)
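
To get a feel for the cost of this search, the sketch below counts the candidates in the grid using only the standard library. The variable names `n_candidates` and `n_fits` are introduced here for illustration; the grid values and `cv=5` are copied from the cells above.

```python
from itertools import product

# Same grid as in the notebook cell above.
parameter = {
    'learning_rate': [0.1, 0.01, 0.001],
    'n_estimators': [100, 200, 300, 400],
    'max_depth': [3, 4, 5, 10],
}
cv = 5

# Grid search tries every combination of the listed values.
combinations = list(product(*parameter.values()))
n_candidates = len(combinations)   # 3 * 4 * 4
n_fits = n_candidates * cv         # one fit per candidate per fold

print(f"{n_candidates} candidates x {cv} folds = {n_fits} cross-validation fits")
```

With the default `refit=True`, GridSearchCV adds one more fit of the best candidate on the full training set. A random search with `n_iter=10` and the same `cv=5` needs only 50 cross-validation fits, at the price of possibly missing the best combination.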

In [54]:
# Get the best parameters and best estimator from grid search
best_params = gscv.best_params_
best_gb = gscv.best_estimator_

In [55]:
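# Train the best model on the entire training data.
# Note: with the default refit=True, GridSearchCV has already refit best_estimator_
# on all of X_train, so this call is redundant but harmless.
best_gb.fit(X_train, y_train)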

In [56]:
# Evaluate the best model on the test data
y_pred = best_gb.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Best Parameters:", best_params)
print("Best Mean Squared Error:", mse)

Best Parameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 400}
Best Mean Squared Error: 158.8031711853206


In [59]:
# Alternatively, you can use RandomizedSearchCV for random search
random_search = RandomizedSearchCV(estimator=gb, param_distributions=parameter, scoring='neg_mean_squared_error', cv=5, n_iter=10)
random_search.fit(X_train, y_train)

In [60]:
best_params_random = random_search.best_params_
best_gb_random = random_search.best_estimator_

# best_estimator_ is already refit on X_train (refit=True by default); refitting is optional
best_gb_random.fit(X_train, y_train)
y_pred_random = best_gb_random.predict(X_test)
mse_random = mean_squared_error(y_test, y_pred_random)
print("\nRandom Search Best Parameters:", best_params_random)
print("Random Search Best Mean Squared Error:", mse_random)


Random Search Best Parameters: {'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.1}
Random Search Best Mean Squared Error: 249.52349096429325


In [None]:
# Q4. What is a weak learner in Gradient Boosting?
Ans.

In Gradient Boosting, a weak learner is a simple model, usually a shallow decision tree, that performs only
slightly better than a trivial baseline (random guessing in classification, predicting the mean in regression).
Weak learners are added sequentially, each one trained to correct the errors made by the ensemble built in the
previous iterations.

In [None]:
# Q5. What is the intuition behind the Gradient Boosting algorithm?
Ans.
The intuition behind the Gradient Boosting algorithm is to build a strong predictive model iteratively by
combining many weak learners. In each iteration, the algorithm corrects the errors made by the ensemble so far
by fitting a new weak learner to the residuals (more generally, to the negative gradient of the loss function).
Each added learner moves the predictions a small step in the direction that reduces the training loss, and
because any differentiable loss can be used, the method applies to both regression and classification tasks.

In [None]:
# Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?
Ans.
The Gradient Boosting algorithm builds an ensemble of weak learners in an iterative fashion. Here's a simplified 
overview of the process:
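1. Initialize Model: Start with a constant prediction, such as the mean of the target variable for regression or
the log-odds of the class prior for classification.
2. Calculate Residuals: Compute the difference between the actual and predicted values (residuals). For loss
functions other than squared error, the negative gradient of the loss (the pseudo-residuals) takes their place.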
3. Train Weak Learner: Fit a weak learner (typically a shallow decision tree) to the residuals. This weak learner
aims to capture the patterns in the data that were not well-represented by the current ensemble.
4. Update Model: Add the weak learner's predictions, scaled by a learning rate, to the overall ensemble.
5. Repeat: Iterate the process by calculating new residuals based on the updated ensemble and training additional
weak learners until a predefined number of iterations is reached or until a stopping criterion is met.
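
The five steps above can be traced on a tiny example. This is a minimal pure-Python sketch under stated assumptions: the six data points, the learning rate of 0.5, the five rounds, and the helper name `fit_stump` are all made up for illustration, and the weak learner is a single-split tree on one feature.

```python
def fit_stump(xs, targets):
    """Best single split on a 1-D feature: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def mean_squared_error_of(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]                    # hypothetical feature values
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]        # hypothetical targets
learning_rate = 0.5

# Step 1: initialise with the mean of the target.
preds = [sum(ys) / len(ys)] * len(ys)
mse_trace = [mean_squared_error_of(ys, preds)]

for _ in range(5):                          # Step 5: repeat for a fixed number of rounds
    residuals = [y - p for y, p in zip(ys, preds)]        # Step 2: residuals
    t, lm, rm = fit_stump(xs, residuals)                  # Step 3: fit the weak learner
    preds = [p + learning_rate * (lm if x <= t else rm)   # Step 4: scaled update
             for x, p in zip(xs, preds)]
    mse_trace.append(mean_squared_error_of(ys, preds))

print([round(m, 4) for m in mse_trace])
```

The printed list holds the training error after the initial model and after each round; it should never increase, because every update moves the predictions part of the way toward the best stump fit of the current residuals.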

In [None]:
# Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting
# algorithm?
Ans.
Gradient Boosting is a popular machine learning algorithm used for both regression and classification tasks.
Here are the steps involved in constructing the mathematical intuition of Gradient Boosting:
1. Define the problem: Define the problem you want to solve using Gradient Boosting, whether it's a regression or 
classification task.
2. Define the loss function: The loss function is a measure of how well the algorithm is doing in fitting the training
data. In Gradient Boosting, we typically use a differentiable loss function such as mean squared error for regression 
or log loss for classification.
3. Create an initial model: Create an initial model to make predictions. This model can be as simple as the mean of the
target variable or a linear regression model.
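4. Calculate the residual errors: Calculate the residual errors by subtracting the predictions of the current model from
the actual values of the target variable. For squared error, these residuals equal the negative gradient of the loss with
respect to the current predictions, which is where the "gradient" in the name comes from.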
5. Train a new model on the residual errors: Train a new model on the residual errors from the previous step. This model
is usually a decision tree with a fixed depth.
6. Add the predictions of the new model to the previous predictions: Add the predictions of the new model, scaled by a
learning rate, to the previous predictions to update the model. This process is called boosting because we are boosting
the performance of the model by adding new models to it.
7. Repeat steps 4 to 6 until convergence: Repeat steps 4 to 6 until the model converges or until a stopping criterion is
met. The stopping criterion can be a maximum number of models, a threshold for the improvement of the loss function, or
no further improvement on a held-out validation set (early stopping). The maximum depth of the decision trees is a
hyperparameter of each weak learner, not a stopping criterion.
8. Make predictions: Use the final model to make predictions on new data.
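
For the squared-error case, the steps above can be written compactly as follows. This is a sketch of the standard formulation; the symbols $F_m$ (ensemble after $m$ rounds), $h_m$ (weak learner of round $m$), $r_{im}$ (residual of sample $i$ in round $m$) and $\nu$ (learning rate) are notation introduced here, not taken from the text above.

```latex
\begin{align*}
L(y, F) &= \tfrac{1}{2}\,(y - F)^2
  && \text{(step 2: loss function)} \\
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) = \bar{y}
  && \text{(step 3: initial constant model)} \\
r_{im} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}
        = y_i - F_{m-1}(x_i)
  && \text{(step 4: residuals)} \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{im} - h(x_i)\bigr)^2
  && \text{(step 5: fit the weak learner)} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad 0 < \nu \le 1
  && \text{(step 6: update)}
\end{align*}
```

The third line shows why fitting to residuals is a gradient step: with the factor $\tfrac{1}{2}$ in the loss, the negative gradient with respect to the current prediction is exactly $y_i - F_{m-1}(x_i)$. For other differentiable losses the same recipe applies with $r_{im}$ replaced by the corresponding negative gradient.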