# Q.1

Gradient Boosting Regression (GBR) is a supervised machine learning algorithm from the boosting family. It applies gradient boosting, a technique that also handles classification, to regression problems.

Gradient Boosting Regression is an ensemble learning technique that combines weak regression models, typically shallow decision trees, to make accurate predictions. It trains models sequentially: each new model is fit to the errors (residuals) left by the models before it, so predictions improve step by step. By combining many such models, it captures complex relationships in the data and produces robust regression predictions.

# Q.2

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor
import numpy as np


class GradientBoostingRegressor:
    """Gradient boosting for squared-error loss with regression trees as weak learners."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.intercept = None

    def fit(self, X, y):
        # Reset any trees left over from a previous call to fit
        self.trees = []
        # Start from a constant prediction: the mean of the targets
        self.intercept = np.mean(y)
        residual = y - self.intercept

        for _ in range(self.n_estimators):
            # Fit the next tree to the current residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residual)
            self.trees.append(tree)
            # Shrink the tree's contribution and update the residuals
            residual -= self.learning_rate * tree.predict(X)
        return self

    def predict(self, X):
        # Initial constant plus the shrunken sum of all tree predictions
        preds = np.array([tree.predict(X) for tree in self.trees])
        return self.intercept + self.learning_rate * np.sum(preds, axis=0)


# Generate a random regression problem
X, y = make_regression(n_samples=100, n_features=5, noise=0.5)

# Split data into training and testing sets
n_train = int(0.8 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Train a gradient boosting regressor on the training set
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X_train, y_train)

# Evaluate the model on the testing set
y_pred = gb.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean squared error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
```

```
Mean squared error: 1216.38
R-squared: 0.91
```


# Q.3

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate dummy regression data
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gradient Boosting Regression model
model = GradientBoostingRegressor()

# Define the parameter grid for grid search
param_grid = {
    'learning_rate': [0.1, 0.01, 0.001],
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5]
}

# Perform grid search (tries all 27 combinations with 5-fold cross-validation)
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Print the best hyperparameters and the corresponding cross-validated MSE
print("Best Hyperparameters (Grid Search): ", grid_search.best_params_)
print("Best MSE (Grid Search): ", -grid_search.best_score_)

# Define the candidate values for random search (same values as the grid)
param_dist = {
    'learning_rate': [0.1, 0.01, 0.001],
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5]
}

# Perform random search (samples 10 of the 27 combinations)
random_search = RandomizedSearchCV(model, param_dist, cv=5, scoring='neg_mean_squared_error', n_iter=10)
random_search.fit(X_train, y_train)

# Print the best hyperparameters and the corresponding cross-validated MSE
print("Best Hyperparameters (Random Search): ", random_search.best_params_)
print("Best MSE (Random Search): ", -random_search.best_score_)

# Evaluate the best model found by random search on the test set
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MSE on Test Set (Best Model): ", mse)
```

```
Best Hyperparameters (Grid Search):  {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300}
Best MSE (Grid Search):  876.6888496274581
Best Hyperparameters (Random Search):  {'n_estimators': 300, 'max_depth': 4, 'learning_rate': 0.1}
Best MSE (Random Search):  1148.8147424966003
MSE on Test Set (Best Model):  1189.2080757166354
```


# Q.4

In Gradient Boosting, a weak learner is a model that performs only slightly better than a trivial baseline (random guessing in classification, predicting the mean in regression) but still captures some signal in the training data. Gradient Boosting combines many such weak learners to create a strong learner.

Each weak learner is typically a shallow decision tree; a tree of depth 1 is called a decision stump. During training, the algorithm fits a weak learner to the residuals of the current ensemble (more generally, to the negative gradient of the loss), so each new learner concentrates on what the ensemble still predicts poorly. Unlike AdaBoost, Gradient Boosting does not reweight the training examples. The new weak learner is added to the ensemble, scaled by a learning rate, and the process repeats many times to gradually improve the model's accuracy.

The strength of the Gradient Boosting algorithm lies in this combination. Because every added weak learner corrects part of the error that remains, the ensemble is able to learn complex relationships in the data and create a highly accurate model.
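To make "weak learner" concrete, here is a minimal sketch of a decision stump for one-dimensional regression, written with the standard library only. The helper names `fit_stump` and `predict_stump` and the toy data are assumptions made for illustration; they are not part of scikit-learn. The sketch compares the stump with a constant baseline that always predicts the mean.

```python
# Minimal sketch: a depth-1 regression tree (a "stump") as a weak learner.
# fit_stump and predict_stump are hypothetical helper names, not library APIs.

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(x, y):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    xs = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - lv) ** 2 for yi in left) + sum((yi - rv) ** 2 for yi in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    thr, lv, rv = stump
    return [lv if xi <= thr else rv for xi in x]

# Toy data (assumed for illustration): y = x^2 on a small grid.
x = [i / 10 for i in range(-20, 21)]
y = [xi ** 2 for xi in x]

baseline = [sum(y) / len(y)] * len(y)   # constant prediction: the mean
stump = fit_stump(x, y)
stump_pred = predict_stump(stump, x)

print(f"MSE of constant baseline: {mse(y, baseline):.3f}")
print(f"MSE of a single stump:    {mse(y, stump_pred):.3f}")
```

On this toy data the stump should beat the constant baseline while still leaving a clearly nonzero error, which is the sense in which it is "weak".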

# Q.5

The intuition behind the Gradient Boosting algorithm is to iteratively improve the accuracy of a model by combining many weak learners to create a strong learner.

The algorithm starts from a very simple model, often a constant prediction such as the mean of the training targets. This initial model is not very accurate, but it provides a starting point for the algorithm to build on. The algorithm then evaluates the errors made by this model. The next weak learner is trained to predict those errors, in an attempt to correct the mistakes made by the previous model.

This process of computing the remaining errors and adding a new weak learner is repeated many times, with each new weak learner reducing the errors left by the ensemble so far. The final model is the combination of the initial prediction and all of the weak learners.

The key idea behind Gradient Boosting is that each weak learner targets what the current ensemble still gets wrong, and by combining many of these weak learners, the algorithm is able to learn complex relationships in the data and create a highly accurate model. The "gradient" in the name comes from the fact that the algorithm uses the gradient of the loss function with respect to the current predictions: each new weak learner is fit to the negative gradient, which for squared-error loss is exactly the residual. Adding the learner is therefore a gradient descent step on the loss.
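The link between the "gradient" and the errors can be checked numerically. The sketch below assumes the loss L = 0.5 * (y - F)^2, where F is the current prediction, and uses a finite difference to approximate the negative gradient with respect to F. The function names and the toy values are assumptions for illustration only.

```python
# Check numerically that, for the loss L = 0.5 * (y - F)^2, the negative
# gradient with respect to the prediction F equals the residual y - F.

def half_squared_loss(y, f):
    return 0.5 * (y - f) ** 2

def numeric_neg_gradient(loss, y, f, eps=1e-6):
    # Central finite difference of the loss with respect to the prediction f.
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)

targets = [3.0, -1.0, 0.5]       # assumed toy targets
predictions = [2.0, 0.0, 0.5]    # assumed current model outputs

for y_i, f_i in zip(targets, predictions):
    residual = y_i - f_i
    neg_grad = numeric_neg_gradient(half_squared_loss, y_i, f_i)
    print(f"residual={residual:+.4f}  negative gradient={neg_grad:+.4f}")
```

For squared error the two columns should agree, which is why fitting a learner to the residuals amounts to a gradient step. For other losses the negative gradient is no longer the plain residual.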

# Q.6

The Gradient Boosting algorithm builds an ensemble of weak learners by iteratively fitting each new weak learner to the errors of the current ensemble and adding it to the ensemble. This algorithm works as follows:

**1. Initialize the model:** The algorithm starts with a constant prediction, such as the mean of the training targets for squared-error loss.  
**2. Make predictions:** The current ensemble makes predictions on the training set.  
**3. Compute the residuals:** The algorithm computes the difference between the true values and the predicted values on the training set. These differences are called the residuals.  
**4. Set the training targets:** The residuals (for a general loss, the negative gradient of the loss, also called pseudo-residuals) become the targets for the next weak learner. Examples that were poorly predicted have large residuals and therefore influence the next learner most; no per-example weights are adjusted, unlike in AdaBoost.  
**5. Fit a new weak learner:** A new weak learner, such as a decision tree with a shallow depth, is fit to the training inputs with the residuals as targets.  
**6. Add the new weak learner to the ensemble:** The predictions of the new weak learner are scaled by the learning rate and added to the current predictions.  
**7. Repeat:** Steps 2-6 are repeated many times, with each new weak learner correcting part of the error that remains.  
**8. Combine the weak learners:** The final prediction is the initial constant plus the sum of the predictions of all weak learners, each scaled by the same learning rate rather than weighted by its individual performance.
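The loop can be sketched end to end with decision stumps on one-dimensional data. This is a minimal illustration under squared-error loss, using only the standard library; `fit_stump`, `predict_stump`, `boost`, and the toy data are hypothetical names and values chosen for this sketch, and the scikit-learn implementation is considerably more elaborate.

```python
# Sketch of the boosting loop with depth-1 trees (stumps) on 1-D data.
# All helper names here are hypothetical; only the standard library is used.

def fit_stump(x, r):
    """Best single split of x for predicting r under squared error."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    thr, lv, rv = stump
    return [lv if xi <= thr else rv for xi in x]

def boost(x, y, n_rounds=50, learning_rate=0.1):
    intercept = sum(y) / len(y)              # constant initial model
    current = [intercept] * len(y)           # current ensemble predictions
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [yi - ci for yi, ci in zip(y, current)]
        stump = fit_stump(x, residuals)      # weak learner fit to residuals
        update = predict_stump(stump, x)
        # Add the shrunken stump predictions to the ensemble predictions.
        current = [ci + learning_rate * ui for ci, ui in zip(current, update)]
        stumps.append(stump)
        history.append(sum((yi - ci) ** 2 for yi, ci in zip(y, current)) / len(y))
    return intercept, stumps, history

# Toy data (assumed for illustration): y = x^2 on a small grid.
x = [i / 10 for i in range(-20, 21)]
y = [xi ** 2 for xi in x]
intercept, stumps, history = boost(x, y)
for rounds in (1, 10, 50):
    print(f"training MSE after {rounds:2d} rounds: {history[rounds - 1]:.4f}")
```

The training error should not increase from one round to the next here, because each stump predicts the mean residual in its two regions and only a fraction of that prediction is added.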

# Q.7

The mathematical intuition behind the Gradient Boosting algorithm can be broken down into several key steps:

1. Define the Loss Function: The first step is to define a loss function that measures the difference between the predicted values and the true values. Common loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). For classification problems, common loss functions include the binary cross-entropy loss and the multi-class cross-entropy loss.
2. Fit the Initial Model: The algorithm starts from an initial model, usually the constant that minimizes the loss on the training set (the mean of the targets for MSE, the median for MAE). This initial model may not be very accurate, but it provides a starting point for the later steps.
3. Compute the Residuals: The algorithm computes the negative gradient of the loss with respect to the current predictions on the training set. For squared-error loss this is the difference between the true values and the predicted values. These quantities are called the residuals (or pseudo-residuals for a general loss).
4. Fit a New Weak Learner: The next step is to fit a new weak learner to the residuals. Because poorly predicted examples have the largest residuals, the new weak learner concentrates on correcting the errors made by the previous model. The most common choice for a weak learner in Gradient Boosting is a decision tree with a shallow depth.
5. Update the Model: The predictions of the new weak learner are scaled by a learning rate and added to the predictions of the previous model. This is a gradient descent step in function space: the gradient of the loss function gives the direction of the update, and the learning rate controls its magnitude.
6. Repeat: Steps 3-5 are repeated many times, with each new weak learner fit to the current residuals and the model updated to reduce the loss function.
7. Combine the Weak Learners: The final model is the initial model plus the sum of all of the weak learners in the ensemble, each scaled by the learning rate. The weak learners are not weighted by their individual performance on the training set.
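The role of the loss function in steps 1-3 can be illustrated with a small sketch. It assumes two losses, 0.5 * (y - F)^2 and |y - F|, and shows the constant starting model and the negative gradients that each one implies. The function names and the toy targets are hypothetical, chosen only for this example.

```python
import statistics

def neg_gradient_squared(y, f):
    # L = 0.5 * (y - f)^2  ->  -dL/df = y - f  (the plain residual)
    return [yi - fi for yi, fi in zip(y, f)]

def neg_gradient_absolute(y, f):
    # L = |y - f|  ->  -dL/df = sign(y - f)  (taken as 0 where y == f)
    return [(yi > fi) - (yi < fi) for yi, fi in zip(y, f)]

y = [1.0, 2.0, 3.0, 10.0]   # assumed toy targets with one outlier

# A constant that minimizes each loss serves as the starting model.
f0_squared = statistics.mean(y)      # minimizes squared error
f0_absolute = statistics.median(y)   # a minimizer of absolute error

print("squared loss : start =", f0_squared,
      "targets =", neg_gradient_squared(y, [f0_squared] * len(y)))
print("absolute loss: start =", f0_absolute,
      "targets =", neg_gradient_absolute(y, [f0_absolute] * len(y)))
```

With absolute error the training targets are only the signs of the errors, so the outlier pulls on the next weak learner no harder than any other point. That is one reason to choose a loss other than MSE.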