## Assignment Questions

__Q1. What is Gradient Boosting Regression?__

__Ans)__ Gradient Boosting Regression is an ensemble technique that combines many weak regression models into a single strong one. It applies the general Gradient Boosting algorithm to regression: weak learners are added one at a time, and each new learner is trained to reduce the error left by the ensemble built so far.

In Gradient Boosting Regression, the weak learners are typically decision trees. The algorithm starts from an initial prediction, often a constant such as the mean of the target values. It then calculates the residuals (the differences between the actual target values and the current predictions) and fits a decision tree to them. The tree's predictions, scaled by a learning rate, are added to the current predictions, and the process repeats. In general, each new tree is trained on the negative gradient of the loss function with respect to the current predictions; for the squared-error loss (with the conventional factor of one half), this negative gradient equals the residual.
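The link between residuals and negative gradients can be checked numerically. The following is a minimal sketch, assuming the half squared-error loss; the function name `squared_error_loss` and the toy arrays are made up for illustration.

```python
import numpy as np


def squared_error_loss(y, pred):
    # Half squared error per sample; the 1/2 makes the gradient exactly (pred - y)
    return 0.5 * (y - pred) ** 2


y = np.array([3.0, -1.0, 2.5, 0.0])      # toy targets (illustrative values)
pred = np.array([2.0, 0.5, 2.5, -1.0])   # toy current predictions

# Central finite-difference estimate of dL/dpred for each sample
eps = 1e-6
grad = (squared_error_loss(y, pred + eps) - squared_error_loss(y, pred - eps)) / (2 * eps)

residuals = y - pred
negative_gradient = -grad

print("residuals:        ", residuals)
print("negative gradient:", np.round(negative_gradient, 6))
```

Both printed arrays should agree up to floating-point error, which is why fitting a tree to the residuals is a special case of fitting it to the negative gradient.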

__Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
simple regression problem as an example and train the model on a small dataset. Evaluate the model's
performance using metrics such as mean squared error and R-squared.__

__Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
optimise the performance of the model. Use grid search or random search to find the best
hyperparameters.__

In [None]:
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor


# Define the Gradient Boosting Regressor class.
# Inheriting from RegressorMixin and BaseEstimator provides score(), get_params()
# and set_params(), which scikit-learn tools such as GridSearchCV rely on.
class GradientBoostingRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators  # Number of trees
        self.learning_rate = learning_rate  # Learning rate or shrinkage parameter
        self.max_depth = max_depth  # Maximum depth of each tree

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.trees_ = []  # List to store the individual decision trees

        # Start from a constant prediction (the mean of the targets);
        # the initial residuals are the deviations from that constant
        self.initial_prediction_ = y.mean()
        residuals = y - self.initial_prediction_

        # Build each tree in the ensemble
        for _ in range(self.n_estimators):
            # Train a decision tree on the current residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)

            # Update the residuals by subtracting the scaled predictions of the new tree
            residuals -= self.learning_rate * tree.predict(X)

            # Store the tree in the ensemble
            self.trees_.append(tree)
        return self

    def predict(self, X):
        # Start from the initial prediction and add each tree's contribution,
        # scaled by the same learning rate that was used during fitting
        y_pred = np.full(np.shape(X)[0], self.initial_prediction_)
        for tree in self.trees_:
            y_pred += self.learning_rate * tree.predict(X)
        return y_pred


# Load a small regression dataset.
# Note: load_boston was removed in scikit-learn 1.2, so the built-in
# diabetes dataset is used here as a stand-in.
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and fit the gradient boosting regressor
gb_regressor = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gb_regressor.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("R-squared (R2):", r2)

In [None]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for grid search
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.1, 0.01, 0.001],
    'max_depth': [3, 4, 5]
}

# Instantiate the gradient boosting regressor.
# GridSearchCV clones the estimator for every parameter combination, so the class
# must provide get_params() and set_params(), for example by inheriting from
# sklearn.base.BaseEstimator.
gb_regressor = GradientBoostingRegressor()

# Perform grid search, selecting the combination with the lowest cross-validated MSE
grid_search = GridSearchCV(gb_regressor, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Print the best hyperparameters
print("Best Hyperparameters:", grid_search.best_params_)

# Make predictions using the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Evaluate the best model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Best Model - Mean Squared Error (MSE):", mse)
print("Best Model - R-squared (R2):", r2)
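Q3 also mentions random search. The following is a minimal sketch of that alternative, assuming scikit-learn's built-in `sklearn.ensemble.GradientBoostingRegressor` (imported under the alias `SkGradientBoostingRegressor` to avoid clashing with the class above) and a synthetic dataset from `make_regression`; the parameter ranges are illustrative, not tuned.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor as SkGradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data (an assumption, so the example is self-contained)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Candidate values to sample from; illustrative ranges only
param_distributions = {
    "n_estimators": [25, 50, 100],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [1, 2, 3],
}

random_search = RandomizedSearchCV(
    SkGradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # number of randomly sampled combinations to try
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(X_tr, y_tr)

print("Best hyperparameters:", random_search.best_params_)
# best_estimator_.score() reports R-squared for regressors
print("Test R-squared:", random_search.best_estimator_.score(X_te, y_te))
```

Random search evaluates only `n_iter` sampled combinations instead of the full grid, which can be cheaper when the grid is large.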

__Q4. What is a weak learner in Gradient Boosting?__

A weak learner is a base model that performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean of the target in regression. In Gradient Boosting, weak learners are typically shallow decision trees; a tree with only a single split is called a "decision stump". A weak learner is of little use on its own, but because each one captures some signal, many of them can be combined through boosting into an accurate ensemble.
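To make the idea concrete, the sketch below compares a decision stump with a constant mean predictor on a toy problem. The data (a noisy sine curve) and the names `X_toy`, `y_toy` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
x = np.linspace(0, 6, 200)
X_toy = x.reshape(-1, 1)
y_toy = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# Trivial baseline: always predict the mean of the targets
baseline_pred = np.full_like(y_toy, y_toy.mean())

# Weak learner: a decision stump, i.e. a tree with a single split
stump = DecisionTreeRegressor(max_depth=1).fit(X_toy, y_toy)
stump_pred = stump.predict(X_toy)

baseline_mse = mean_squared_error(y_toy, baseline_pred)
stump_mse = mean_squared_error(y_toy, stump_pred)

print(f"Mean baseline training MSE: {baseline_mse:.3f}")
print(f"Decision stump training MSE: {stump_mse:.3f}")
```

The stump should beat the baseline on the training data, but with only two leaf values it remains far from capturing the full curve, which is what makes it "weak".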

__Q5. What is the intuition behind the Gradient Boosting algorithm?__

The intuition behind Gradient Boosting is to build a strong model step by step, with each weak learner correcting the mistakes of the ensemble built so far. Rather than assigning higher weights to poorly predicted samples (which is what AdaBoost does), Gradient Boosting fits each new weak learner to the negative gradient of the loss function with respect to the current predictions; for squared-error regression, these are simply the residuals. Adding the new learner's scaled predictions to the ensemble amounts to a step of gradient descent in function space, so the overall error gradually decreases as learners are added and the result is a strong learner with improved predictive performance.

__Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?__

The Gradient Boosting algorithm builds an ensemble of weak learners iteratively. It starts with an initial model, often a constant such as the mean of the target variable, that provides rough predictions. In each subsequent iteration, it computes the residuals (more generally, the negative gradients of the loss) of the current predictions and trains a new weak learner to predict them. The new learner's predictions, scaled by the learning rate, are added to the current predictions. The final ensemble prediction is therefore the initial prediction plus the learning-rate-scaled sum of all weak learners' predictions. Unlike AdaBoost, the learners are not weighted by their individual accuracy; in the general formulation, a step size for each stage can be chosen by a line search on the loss.
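The stage-by-stage construction can be traced on a toy problem. This is a minimal sketch, assuming squared-error loss, depth-2 trees from scikit-learn, and a made-up quadratic dataset; it records the training error after every added tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy quadratic (illustrative assumption)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = X_toy[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

learning_rate = 0.1
n_stages = 50

# Stage 0: the ensemble is just a constant prediction
current_pred = np.full(y_toy.shape, y_toy.mean())
stage_mse = [np.mean((y_toy - current_pred) ** 2)]

for _ in range(n_stages):
    # Fit a new weak learner to what the ensemble still gets wrong
    residuals = y_toy - current_pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_toy, residuals)

    # Add its scaled predictions to the running ensemble prediction
    current_pred = current_pred + learning_rate * tree.predict(X_toy)
    stage_mse.append(np.mean((y_toy - current_pred) ** 2))

print(f"Training MSE after 0 trees:  {stage_mse[0]:.3f}")
print(f"Training MSE after 10 trees: {stage_mse[10]:.3f}")
print(f"Training MSE after {n_stages} trees: {stage_mse[-1]:.3f}")
```

With a learning rate below 1 and trees that predict leaf means, each stage cannot increase the training squared error, so the recorded values form a non-increasing sequence. Training error alone does not indicate generalization; a held-out set is still needed for that.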

__Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?__

The mathematical intuition of the Gradient Boosting algorithm involves the following steps:

1. Initialize the ensemble model with a constant prediction for all samples in the dataset, for example the mean of the target values.

2. Calculate the residuals between the actual target values and the predictions made by the current ensemble (more generally, the negative gradient of the loss with respect to those predictions).

3. Train a weak learner, such as a shallow decision tree, on these residuals.

4. Update the ensemble model by adding the predictions of the newly trained weak learner, multiplied by a learning rate, to the previous predictions.

5. Repeat steps 2 to 4 for a predetermined number of iterations or until a stopping criterion is met.

6. Obtain the final prediction as the initial prediction plus the learning-rate-scaled sum of the predictions of all weak learners.

7. Optionally, apply regularization techniques, such as shrinkage (a smaller learning rate) or subsampling, to improve generalization and reduce overfitting.
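The steps above can be written compactly as follows. The notation is an assumption introduced here, not taken from the assignment: $L$ is the loss, $F_m$ the ensemble after $m$ stages, $h_m$ the $m$-th weak learner, $r_{im}$ the pseudo-residual of sample $i$ at stage $m$, $\nu$ the learning rate, $M$ the number of stages, and $n$ the number of training samples. The optional per-stage line search is omitted for simplicity.

```latex
\begin{align*}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}},
          \quad i = 1, \dots, n \\
h_m &= \text{weak learner fitted to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \quad m = 1, \dots, M \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
\end{align*}
```

For the squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the first line gives the mean of the targets and the second line reduces to the ordinary residual $r_{im} = y_i - F_{m-1}(x_i)$.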

__By iteratively adding weak learners that are fitted to the errors of the current ensemble, the Gradient Boosting algorithm progressively improves the model's predictive accuracy and builds a strong learner capable of capturing complex patterns in the data.__