# Assignment

### Q1. What is Gradient Boosting Regression?
Gradient Boosting Regression is an ensemble technique that builds a strong regression model by sequentially adding weak learners (typically shallow decision trees). Each new learner corrects the errors of the current ensemble by reducing a loss function (usually the mean squared error for regression). Training is a form of gradient descent in function space: at each iteration the new learner is fit to the negative gradient of the loss with respect to the current predictions, which for squared error is simply the residuals, so the predictions move in the direction that reduces the loss the most.
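
To make the link between gradient descent and fitting residuals concrete, here is a short derivation for the squared-error loss. The notation is introduced only for illustration and is not part of the original answer: $F_m$ is the ensemble after $m$ stages, $h_m$ is the $m$-th weak learner, and $\nu$ is the learning rate.

```latex
L\bigl(y, F(x)\bigr) = \tfrac{1}{2}\,\bigl(y - F(x)\bigr)^2
\quad\Longrightarrow\quad
-\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}}
= y_i - F_{m-1}(x_i)

F_m(x) = F_{m-1}(x) + \nu\, h_m(x),
\qquad
h_m \text{ fit to } \bigl\{\bigl(x_i,\; y_i - F_{m-1}(x_i)\bigr)\bigr\}_{i=1}^{n}
```

Under this loss the negative gradient at each training point is exactly the residual, which is why the weak learner is trained on residuals.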

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy
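Here is a simple implementation of Gradient Boosting Regression from scratch. To keep it short, each weak learner is a decision stump that splits on the first feature only, so `max_depth` is stored but not used: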

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

class SimpleGradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=2):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.init_pred = None

    def _fit_tree(self, X, residuals):
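        # Simple decision stump (tree with max depth 1) that splits on the first feature only
        # Candidate thresholds: midpoints between consecutive distinct feature values
        values = np.unique(X[:, 0])
        thresholds = (values[:-1] + values[1:]) / 2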
        best_threshold = None
        best_loss = float('inf')
        best_split = None

        for threshold in thresholds:
            left_split = X[:, 0] <= threshold
            right_split = X[:, 0] > threshold
            # Sum of squared errors of each side around its own mean residual
            loss = np.sum((residuals[left_split] - np.mean(residuals[left_split])) ** 2) + \
                   np.sum((residuals[right_split] - np.mean(residuals[right_split])) ** 2)

            if loss < best_loss:
                best_loss = loss
                best_threshold = threshold
                best_split = (left_split, right_split)

        return best_threshold, best_split

    def fit(self, X, y):
        self.init_pred = np.mean(y)  # Initial prediction (mean of the target values)
        residuals = y - self.init_pred  # Compute residuals
        
        for _ in range(self.n_estimators):
            threshold, (left_split, right_split) = self._fit_tree(X, residuals)
            # Leaf values are the mean residuals on each side, taken before the update
            left_value = np.mean(residuals[left_split])
            right_value = np.mean(residuals[right_split])
            # Shrink the residuals by the scaled contribution of this stump
            residuals[left_split] -= self.learning_rate * left_value
            residuals[right_split] -= self.learning_rate * right_value
            self.trees.append((threshold, left_value, right_value))

    def predict(self, X):
        preds = np.full(X.shape[0], self.init_pred)
        for threshold, left_pred, right_pred in self.trees:
            preds[X[:, 0] <= threshold] += self.learning_rate * left_pred
            preds[X[:, 0] > threshold] += self.learning_rate * right_pred
        return preds

# Generate simple data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

# Fit the model
model = SimpleGradientBoostingRegressor(n_estimators=10, learning_rate=0.1)
model.fit(X, y)

# Predict and evaluate the model
y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-Squared: {r2}")


### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth
You can experiment with different hyperparameters using grid search or random search to find the best-performing configuration. For example, with scikit-learn's `GridSearchCV`:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression

# Generate a synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)

# Define a grid of hyperparameters
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [1, 2, 3]
}

# Initialize the model
gbr = GradientBoostingRegressor()

# Perform grid search. GridSearchCV maximizes the score, so use the negated MSE;
# a plain make_scorer(mean_squared_error) would select the highest-error configuration.
grid_search = GridSearchCV(gbr, param_grid, scoring='neg_mean_squared_error', cv=3)
grid_search.fit(X, y)

# Best hyperparameters (negate the score to report a positive MSE)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best score (MSE): {-grid_search.best_score_}")

The selected parameters and score vary from run to run because `make_regression` is called without a fixed `random_state`.
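
The text above mentions random search as an alternative to grid search. The following is a minimal sketch of that option using scikit-learn's `RandomizedSearchCV`. The sampling ranges, `n_iter=10`, and the fixed `random_state` values are assumptions chosen for illustration; the ranges mirror the grid used above. The `neg_mean_squared_error` scorer is used because scikit-learn always maximizes the score.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data; random_state is fixed here only to make the sketch reproducible
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=0)

# Distributions to sample from (assumed ranges, matching the grid above)
param_distributions = {
    "n_estimators": randint(50, 151),      # integers in [50, 150]
    "learning_rate": uniform(0.01, 0.49),  # floats in [0.01, 0.50]
    "max_depth": randint(1, 4),            # integers in [1, 3]
}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=10,                         # number of sampled configurations
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

best_mse = -random_search.best_score_  # flip the sign back to a positive MSE
print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV MSE: {best_mse}")
```

Random search evaluates a fixed number of sampled configurations rather than every grid point, which can be cheaper when the grid is large.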


### Q4. What is a weak learner in Gradient Boosting?
A weak learner in Gradient Boosting is a model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). In gradient boosting, shallow decision trees are typically used as weak learners; a tree of depth 1 is called a decision stump. These models are weak individually but can be combined through boosting to form a strong model.
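
As a rough illustration of "weak individually, strong together", the sketch below compares a single depth-1 tree with a boosted ensemble of depth-1 trees. The noisy sine data set, the train/test split, and the hyperparameter values are assumptions made for this example only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: a noisy sine curve, which a single split cannot capture
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One weak learner: a decision stump (depth-1 tree)
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X_train, y_train)

# Many weak learners combined by gradient boosting
boosted = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=1, random_state=0
).fit(X_train, y_train)

stump_r2 = stump.score(X_test, y_test)      # R^2 of the single stump
boosted_r2 = boosted.score(X_test, y_test)  # R^2 of the boosted stumps
print(f"Single stump R^2:   {stump_r2:.3f}")
print(f"Boosted stumps R^2: {boosted_r2:.3f}")
```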

### Q5. What is the intuition behind the Gradient Boosting algorithm?
The intuition behind Gradient Boosting is to sequentially add weak learners that correct the mistakes made by the previous learners. At each step, the residuals (the differences between the actual values and the current predictions) are computed and a new weak learner is fit to them. Over multiple iterations the combined prediction improves and the error shrinks.
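
A single boosting step can be worked through by hand. The sketch below reuses the five-point data set from Q2; the split threshold of 2.5 and the learning rate of 0.1 are picked by hand for illustration rather than searched for.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.5, 3.5, 4.5, 5.5])
learning_rate = 0.1

# Start from the mean of the targets (3.5 for every point)
prediction = np.full_like(y, y.mean())
residuals = y - prediction              # [-2, -1, 0, 1, 2]
mse_before = np.mean(residuals ** 2)    # 2.0

# One hand-picked stump: split at 2.5 and predict the mean residual on each side
threshold = 2.5
left = X <= threshold
correction = np.where(left, residuals[left].mean(), residuals[~left].mean())

# Move the predictions a small step toward the residuals
prediction = prediction + learning_rate * correction
mse_after = np.mean((y - prediction) ** 2)   # about 1.715

print(f"MSE before: {mse_before:.3f}, MSE after one step: {mse_after:.3f}")
```

The left side has mean residual -1.5 and the right side 1.0, so the predictions become 3.35 and 3.6 respectively, and the error drops after just one small step.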

### Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?
Gradient Boosting builds an ensemble by training weak learners sequentially. Each new learner is trained to correct the errors made by the ensemble built so far, and its contribution is scaled by a learning rate before it is added. The ensemble grows until the desired number of estimators is reached or the error is sufficiently small.
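
The additive structure described above can be checked against scikit-learn's `GradientBoostingRegressor`. The sketch below assumes the default squared-error loss and default initial estimator, and uses the fitted attributes `init_` and `estimators_` to rebuild the prediction by hand; the data set and hyperparameters are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical data: a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

gbr = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
).fit(X, y)

# Rebuild the prediction: initial guess plus learning_rate times each tree's output
manual = gbr.init_.predict(X).astype(float)
for stage in gbr.estimators_:          # array of shape (n_estimators, 1)
    tree = stage[0]
    manual += gbr.learning_rate * tree.predict(X)

matches = np.allclose(manual, gbr.predict(X))

# Training MSE after each stage, to watch the ensemble grow
stage_mse = [np.mean((y - p) ** 2) for p in gbr.staged_predict(X)]
print(f"Manual sum matches predict(): {matches}")
print(f"Training MSE after stage 1: {stage_mse[0]:.4f}, after stage 50: {stage_mse[-1]:.4f}")
```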

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?
1. Initialize the model: Start with an initial prediction, typically the mean of the target variable for regression.
2. Calculate residuals: Compute the difference between the actual values and the current predictions (the residuals).
3. Fit a weak learner: Train a weak learner on the residuals.
4. Update the model: Add the weak learner's predictions, scaled by the learning rate, to the current model.
5. Iterate: Repeat steps 2 to 4, fitting new learners to the updated residuals, until a stopping criterion is reached (e.g., the number of estimators or an error threshold).
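
The steps above can be sketched as a short loop. This is a minimal illustration for squared-error loss, not a full implementation: it uses scikit-learn's `DecisionTreeRegressor` as the weak learner, and the helper names `fit_boosting` and `predict_boosting` are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosting(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Fit a boosted ensemble; returns the initial prediction and the trees."""
    init = float(np.mean(y))                    # Step 1: initialize with the mean
    prediction = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):               # Step 5: iterate
        residuals = y - prediction              # Step 2: calculate residuals
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                  # Step 3: fit a weak learner
        prediction += learning_rate * tree.predict(X)  # Step 4: update the model
        trees.append(tree)
    return init, trees


def predict_boosting(X, init, trees, learning_rate=0.1):
    """Sum the initial prediction and the scaled output of every tree."""
    prediction = np.full(X.shape[0], init)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction


# Usage on the small data set from Q2
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

init, trees = fit_boosting(X, y)
y_pred = predict_boosting(X, init, trees)
final_mse = np.mean((y - y_pred) ** 2)
print(f"Initial prediction: {init}, training MSE: {final_mse:.6f}")
```

The same `learning_rate` must be used for fitting and prediction, since each tree was trained against residuals that assumed that step size.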