In [1]:
#assignment 71

ans 1: Gradient Boosting Regression is a machine learning technique for regression problems. It is an ensemble method that combines multiple weak learners, typically shallow decision trees, into a stronger prediction model.

In Gradient Boosting Regression, the algorithm starts by building an initial model and then iteratively adds more models, with each new model being trained to correct the errors of the previous model. The models are trained sequentially, with each subsequent model attempting to minimize the residual error of the previous model.

Each new model is fitted using the gradient of the loss function with respect to the current ensemble's predictions. The negative of this gradient becomes the training target for the next model. For squared-error loss the negative gradient equals the residuals, so fitting the next model to it is the same as fitting it to the errors the ensemble still makes.

The final prediction of the ensemble is the initial prediction plus the sum of the predictions of all the individual models, each usually scaled by a learning rate. Gradient Boosting Regression is effective in a wide range of applications and is particularly useful when the relationship between the input features and the target variable is non-linear.
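
As a quick illustration of the description above, here is a minimal sketch that uses scikit-learn's own implementation (imported under the alias `SkGBR` so it does not clash with the class defined later in this notebook). The synthetic sine data, the variable names, and the hyperparameter values are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SkGBR
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic non-linear data (an assumption for illustration): y = sin(x) + noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 6, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Boosted ensemble of shallow trees
gbr = SkGBR(n_estimators=100, learning_rate=0.1, max_depth=2, random_state=0)
gbr.fit(X_tr, y_tr)

# Baseline for comparison: always predict the training mean
baseline_mse = mean_squared_error(y_te, np.full_like(y_te, y_tr.mean()))
gbr_mse = mean_squared_error(y_te, gbr.predict(X_te))

print("Baseline test MSE:", baseline_mse)
print("Boosted test MSE:", gbr_mse)
```

On data like this the boosted model should beat the constant baseline by a wide margin, because the shallow trees jointly approximate the sine curve.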

In [2]:
#ans 2:
import numpy as np

# Define input features and target variable
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
class DecisionTree:
    def __init__(self, max_depth=2):
        self.max_depth = max_depth
        self.tree = None
        
    def fit(self, X, y):
        self.tree = self._build_tree(X, y, depth=0)
        
    def _build_tree(self, X, y, depth):
        # Stop building tree if max depth is reached or all samples have the same target value
        if depth == self.max_depth or len(np.unique(y)) == 1:
            return np.mean(y)
        
        # Find the split with the lowest weighted mean squared error
        best_mse = float('inf')
        best_feature, best_threshold = None, None
        for feature in range(X.shape[1]):
            for threshold in np.unique(X[:, feature]):
                left_y = y[X[:, feature] < threshold]
                right_y = y[X[:, feature] >= threshold]

                if len(left_y) == 0 or len(right_y) == 0:
                    continue

                # Weight each child's MSE by its share of the samples
                left_mse = np.mean(np.square(left_y - np.mean(left_y)))
                right_mse = np.mean(np.square(right_y - np.mean(right_y)))
                total_mse = (len(left_y) * left_mse + len(right_y) * right_mse) / len(y)

                if total_mse < best_mse:
                    best_mse = total_mse
                    best_feature = feature
                    best_threshold = threshold

        # No valid split exists (e.g. all feature values are identical): return a leaf
        if best_feature is None:
            return np.mean(y)

        # Split the dataset based on the best feature and threshold
        left_X = X[X[:, best_feature] < best_threshold]
        left_y = y[X[:, best_feature] < best_threshold]
        right_X = X[X[:, best_feature] >= best_threshold]
        right_y = y[X[:, best_feature] >= best_threshold]
        
        # Recursively build left and right subtrees
        left_tree = self._build_tree(left_X, left_y, depth+1)
        right_tree = self._build_tree(right_X, right_y, depth+1)
        
        # Combine the subtrees into a single tree
        return {'feature': best_feature, 'threshold': best_threshold, 'left': left_tree, 'right': right_tree}
        
    def predict(self, X):
        return np.array([self._predict_tree(x, self.tree) for x in X])
        
    def _predict_tree(self, x, tree):
        # Leaves are stored as plain numbers; internal nodes are dicts
        if not isinstance(tree, dict):
            return tree
        
        if x[tree['feature']] < tree['threshold']:
            return self._predict_tree(x, tree['left'])
        else:
            return self._predict_tree(x, tree['right'])
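class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=2):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.init_prediction = None
        self.trees = []

    # The original cell breaks off after __init__. The fit/predict methods below
    # are a minimal sketch for squared-error loss, following the algorithm
    # described in the answers of this notebook.
    def fit(self, X, y):
        # Start from a constant model: the mean of the targets
        self.init_prediction = np.mean(y)
        self.trees = []
        prediction = np.full(len(y), self.init_prediction, dtype=float)
        for _ in range(self.n_estimators):
            # For squared-error loss the negative gradient is the residual
            residuals = y - prediction
            tree = DecisionTree(max_depth=self.max_depth)
            tree.fit(X, residuals)
            # Add the new tree's contribution, shrunk by the learning rate
            prediction += self.learning_rate * tree.predict(X)
            self.trees.append(tree)
        return self

    def predict(self, X):
        prediction = np.full(len(X), self.init_prediction, dtype=float)
        for tree in self.trees:
            prediction += self.learning_rate * tree.predict(X)
        return prediction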


In [3]:
#ans 3: 
import numpy as np
from itertools import product
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

def grid_search(X, y, estimator, param_grid, cv=5):
    """Try every combination in param_grid and return the one with the
    highest mean R-squared across the cv validation folds."""
    best_params = None
    best_score = float('-inf')

    for params in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), params))
        model = estimator(**params)

        scores = []
        for train_idx, val_idx in KFold(n_splits=cv).split(X):
            X_train, y_train = X[train_idx], y[train_idx]
            X_val, y_val = X[val_idx], y[val_idx]

            # Refit on the training folds, score on the held-out fold
            model.fit(X_train, y_train)
            y_pred = model.predict(X_val)
            score = r2_score(y_val, y_pred)
            scores.append(score)

        avg_score = np.mean(scores)
        if avg_score > best_score:
            best_score = avg_score
            best_params = params

    return best_params, best_score

# Define input features and target variable
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

# Define hyperparameters to search over
param_grid = {
    'n_estimators': [10, 50, 100],
    'learning_rate': [0.1, 0.01],
    'max_depth': [2, 3, 4]
}

# Perform grid search over hyperparameters
best_params, best_score = grid_search(X, y, GradientBoostingRegressor, param_grid)

# Train final model with best hyperparameters
model = GradientBoostingRegressor(**best_params)
model.fit(X, y)

# Evaluate the final model on the training data
# (an optimistic estimate, since there is no held-out test set here)
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print("Best hyperparameters:", best_params)
print("Best cross-validated R-squared score:", best_score)
print("Training MSE:", mse)
print("Training R-squared:", r2)
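
For comparison, the same kind of search can be sketched with scikit-learn's built-in `GridSearchCV`. This is an alternative to the hand-written `grid_search` above, not a replacement for it. The alias `SkGBR`, the variable names, the shuffled folds, and the choice of negative MSE as the score are assumptions made for this example (with only two samples per validation fold, MSE is a steadier score than R-squared).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SkGBR
from sklearn.model_selection import GridSearchCV, KFold

# Same toy data as above, under separate names
X_gs = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_gs = 2.0 * X_gs[:, 0]

search = GridSearchCV(
    estimator=SkGBR(random_state=0),
    param_grid={
        "n_estimators": [10, 50, 100],
        "learning_rate": [0.1, 0.01],
        "max_depth": [2, 3, 4],
    },
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_gs, y_gs)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refitted on all of the data with the best parameter combination.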


ans 4: In Gradient Boosting, a weak learner is a model that performs only slightly better than a trivial baseline on a given dataset: random guessing in classification, or always predicting the mean in regression. Weak learners are typically simple, low-complexity models, such as shallow decision trees (in the extreme, a decision stump with a single split) or simple linear models.

In the context of Gradient Boosting, weak learners are combined to form a strong ensemble model. Each weak learner is trained on the residuals (i.e., the true values minus the predicted values) of the current ensemble. By iteratively adding weak learners and refining the ensemble's predictions, the Gradient Boosting algorithm builds a strong predictive model that can generalize well to new data.

The key advantage of using weak learners is that the ensemble can adapt to complex, non-linear relationships in the data while each individual step stays small. Because every new learner makes only a modest correction to the residuals, rather than fitting the training data perfectly in one go, the model is less prone to memorizing the training data. Boosting can still overfit if too many learners are added, which is why the number of estimators, the learning rate, and the tree depth are tuned.
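
To make the idea of a weak learner concrete, the sketch below compares a single decision stump with a boosted ensemble of stumps. It relies on scikit-learn (`DecisionTreeRegressor`, and `GradientBoostingRegressor` under the alias `SkGBR`). The quadratic toy data, the helper `mse_of`, and all settings are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SkGBR
from sklearn.tree import DecisionTreeRegressor

# Toy non-linear data (an assumption): y = x^2 + noise
rng = np.random.default_rng(1)
X_w = rng.uniform(-3, 3, size=(300, 1))
y_w = X_w[:, 0] ** 2 + rng.normal(scale=0.2, size=300)

# Simple holdout split
X_w_train, X_w_test = X_w[:200], X_w[200:]
y_w_train, y_w_test = y_w[:200], y_w[200:]

def mse_of(y_true, y_hat):
    """Mean squared error as a plain float (hypothetical helper)."""
    return float(np.mean((y_true - y_hat) ** 2))

# One weak learner: a tree with a single split
stump = DecisionTreeRegressor(max_depth=1).fit(X_w_train, y_w_train)

# Many weak learners combined by boosting
boosted = SkGBR(
    n_estimators=200, learning_rate=0.1, max_depth=1, random_state=0
).fit(X_w_train, y_w_train)

const_mse = mse_of(y_w_test, np.full(len(y_w_test), y_w_train.mean()))
stump_mse = mse_of(y_w_test, stump.predict(X_w_test))
boosted_mse = mse_of(y_w_test, boosted.predict(X_w_test))

print("Constant baseline MSE:", const_mse)
print("Single stump MSE:", stump_mse)
print("Boosted stumps MSE:", boosted_mse)
```

A single stump can only output two distinct values, so it captures little of the curve, while the sum of many stumps can follow it closely.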

ans 5: The intuition behind the Gradient Boosting algorithm is to iteratively improve the predictions of a weak learner by adding new models to the ensemble, with each model trained to correct the errors of the previous models.

At a high level, Gradient Boosting works as follows:

1. A simple model (i.e., a weak learner) is trained on the training data.
2. The model's predictions are compared to the true values, and the differences between the two are calculated.
3. A new model is then trained to predict the residuals (i.e., the true values minus the predicted values) of the current ensemble.
4. The predictions of the new model are added to the predictions of the current ensemble, usually scaled by a learning rate, creating an improved set of predictions.
5. Steps 2-4 are repeated, with each new model trained on the residuals of the current ensemble, until the algorithm reaches a predetermined stopping criterion (e.g., a maximum number of iterations or a minimum improvement in performance).

The key intuition behind Gradient Boosting is that by iteratively correcting the errors of the previous models, the algorithm builds a strong and accurate predictive model. The weak learners used in Gradient Boosting are typically simple, low-complexity models, which helps limit overfitting and lets the ensemble generalize well to new data.
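
The loop described above can be sketched in a few lines. This is a minimal illustration for squared-error loss, using scikit-learn's `DecisionTreeRegressor` as the weak learner. The variable names, the learning rate of 0.5, and the five rounds are assumptions chosen to keep the output short.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same toy data as in the earlier cells, under separate names
X_b = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_b = 2.0 * X_b[:, 0]

learning_rate = 0.5  # assumed value for the demo
n_rounds = 5         # assumed value for the demo

# Start from a constant prediction: the mean of the targets
prediction = np.full_like(y_b, y_b.mean())
history = [float(np.mean((y_b - prediction) ** 2))]

for _ in range(n_rounds):
    # Errors the current ensemble still makes
    residuals = y_b - prediction
    # Weak learner (a stump) trained to predict those errors
    stump = DecisionTreeRegressor(max_depth=1).fit(X_b, residuals)
    # Add its scaled prediction to the ensemble
    prediction = prediction + learning_rate * stump.predict(X_b)
    history.append(float(np.mean((y_b - prediction) ** 2)))

print("Training MSE after each round:", history)
```

The training error shrinks round by round because each stump removes part of whatever error is left.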

ans 6: The Gradient Boosting algorithm builds an ensemble of weak learners by iteratively adding new models to the ensemble, with each model trained to correct the errors of the previous models. The general steps for building an ensemble in Gradient Boosting are as follows:

1. Initialize the ensemble: The algorithm starts by initializing the ensemble with a very simple model, typically a constant prediction such as the mean of the target variable.

2. Compute the residuals: The predictions of the current ensemble are compared to the true values, and the differences between the two (i.e., the residuals, true minus predicted) are computed.

3. Train a new model: A new weak learner is trained to predict the residuals of the current ensemble. The goal is to find a model that accurately predicts the errors the ensemble still makes, and thus improves its overall predictions.

4. Add the new model to the ensemble: The predictions of the new model, usually scaled by a learning rate, are added to the predictions of the current ensemble, creating a new set of predictions that better fit the true values.

5. Iterate: Steps 2-4 are repeated, with each new model trained on the residuals of the current ensemble, until a stopping criterion is met (e.g., a maximum number of iterations or a minimum improvement in performance).
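
The fitted ensemble can also be inspected piece by piece. The sketch below uses attributes of scikit-learn's `GradientBoostingRegressor` (imported as `SkGBR`): `init_` for the initial model, `estimators_` for the individual trees, and `staged_predict` for the prediction after each stage. The data, the names, and the settings are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SkGBR

X_e = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_e = 2.0 * X_e[:, 0]

ens = SkGBR(n_estimators=50, learning_rate=0.1, max_depth=1, random_state=0)
ens.fit(X_e, y_e)

# Initial model: with the default squared-error loss it predicts the training mean
init_pred = np.ravel(ens.init_.predict(X_e))
print("Initial prediction:", init_pred[0])

# One regression tree per boosting stage
print("Shape of estimators_:", ens.estimators_.shape)

# Rebuild the ensemble prediction by hand: init + learning_rate * sum of trees
manual = init_pred + ens.learning_rate * sum(
    stage[0].predict(X_e) for stage in ens.estimators_
)
print("Matches ens.predict:", np.allclose(manual, ens.predict(X_e)))

# Training MSE after each stage
staged_mse = [float(np.mean((y_e - p) ** 2)) for p in ens.staged_predict(X_e)]
print("MSE after first and last stage:", staged_mse[0], staged_mse[-1])
```

Rebuilding the prediction by hand shows that the ensemble is simply the initial constant plus the scaled sum of the trees' outputs.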

ans 7: The mathematical intuition behind the Gradient Boosting algorithm can be constructed using the following steps:

1. Define the objective function: In Gradient Boosting, the objective function is typically a loss function that measures the difference between the predicted and true values of the target variable. For regression problems, the mean squared error (MSE) and the mean absolute error (MAE) are commonly used as loss functions.

2. Initialize the ensemble: The algorithm starts by initializing the ensemble with a simple model, typically a constant value such as the mean or median of the target variable.

3. Compute the negative gradient of the objective function: The negative gradient of the objective function with respect to the current predictions of the ensemble is computed. This negative gradient is the direction of steepest descent for the objective function and indicates how the predictions need to be adjusted to reduce the loss. For squared-error loss it equals the residuals.

4. Train a new model: A new weak learner is trained to predict the negative gradient computed in step 3. This new model is typically a decision tree with a small number of splits, and is trained to approximate the negative gradient as closely as possible.

5. Compute the step size: A step size for the new model is found by a line search, i.e., by choosing the multiplier that minimizes the objective function when the new model's predictions are added to the current ensemble. In practice this step is then scaled down by a separate, fixed hyperparameter, the learning rate (or shrinkage), which controls how much each new model contributes.

6. Add the new model to the ensemble: The predictions of the new model, scaled by the step size and the learning rate, are added to the predictions of the previous models in the ensemble. This creates a new set of predictions that better fit the true values of the target variable.

7. Iterate: Steps 3-6 are repeated, with each new model trained on the negative gradient at the current predictions, until a stopping criterion is met (e.g., a maximum number of iterations or a minimum improvement in performance).
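
One pass through these steps can be worked out numerically. The sketch below assumes the half squared-error loss L = 0.5 * (y - f)^2, for which the negative gradient is the residual y - f and the line search has the closed form gamma = <g, h> / <h, h>. The function names, the toy data, and the shrinkage value are assumptions for illustration. The absolute-error gradient is included only for contrast.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def neg_gradient_squared(y, f):
    # L = 0.5 * (y - f)^2  ->  -dL/df = y - f (the residual)
    return y - f

def neg_gradient_absolute(y, f):
    # L = |y - f|  ->  -dL/df = sign(y - f)
    return np.sign(y - f)

def half_squared_loss(y, f):
    return float(np.mean(0.5 * (y - f) ** 2))

X_m = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_m = 2.0 * X_m[:, 0]

# Initialize the ensemble with a constant: the mean of the targets
f_current = np.full_like(y_m, y_m.mean())

# Negative gradient of the loss at the current predictions
g = neg_gradient_squared(y_m, f_current)

# Weak learner (a stump) fitted to the negative gradient
h = DecisionTreeRegressor(max_depth=1).fit(X_m, g).predict(X_m)

# Line search: for squared error the minimizing multiplier is <g, h> / <h, h>
gamma = float(g @ h / (h @ h))

# Update, shrunk by a fixed learning rate (assumed value)
shrinkage = 0.1
f_next = f_current + shrinkage * gamma * h

print("Line-search step size gamma:", gamma)
print("Loss before:", half_squared_loss(y_m, f_current))
print("Loss after: ", half_squared_loss(y_m, f_next))
print("Absolute-loss gradient values:", np.unique(neg_gradient_absolute(y_m, f_current)))
```

For squared-error loss with a regression tree, the line search comes out at 1, because the tree's leaf values are already the mean residuals in each leaf. That is why many implementations apply only the learning rate in this case.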