### Q1. What is Gradient Boosting Regression?

**Gradient Boosting Regression** is a machine learning technique for regression in which models are built sequentially, each one trained to correct the errors of the ensemble built so far. The goal is to combine many weak learners, typically shallow decision trees, into a strong predictive model. Each new tree is fitted to the negative gradient of a chosen loss function with respect to the current predictions; for mean squared error this negative gradient is simply the residual (true value minus current prediction). Adding the trees one at a time therefore amounts to gradient descent on the loss, carried out in the space of predictions rather than in a parameter space.

---

### Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy


In [1]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

class SimpleGradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.models = []
        
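    def fit(self, X, y):
        # Work on a float array and reset state so that refitting starts fresh
        y = np.asarray(y, dtype=float)
        self.models = []

        # Initialize the predictions as the mean of the target variable
        self.initial_prediction = np.mean(y)
        predictions = np.full(y.shape, self.initial_prediction)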
        
        # Train multiple decision trees (weak learners)
        for _ in range(self.n_estimators):
            # Compute the residuals (errors)
            residuals = y - predictions
            
            # Train a decision tree on the residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            
            # Update predictions
            predictions += self.learning_rate * tree.predict(X)
            
            # Store the tree model
            self.models.append(tree)

        # Return the fitted estimator, following the scikit-learn convention
        return self

    def predict(self, X):
        # Initialize predictions with the mean value
        predictions = np.full(X.shape[0], self.initial_prediction)
        
        # Sum the predictions of each tree in the ensemble
        for tree in self.models:
            predictions += self.learning_rate * tree.predict(X)
            
        return predictions


# Example dataset (simple synthetic dataset)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

# Instantiate the model
gbr = SimpleGradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2)
gbr.fit(X, y)

# Make predictions
predictions = gbr.predict(X)

# Evaluate performance on the training data (this measures fit, not generalization)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)

print("Mean Squared Error:", mse)
print("R-squared:", r2)


Mean Squared Error: 4.0408644744005826e-09
R-squared: 0.9999999979795677


### Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters

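To optimize performance, you can tune the hyperparameters using **grid search**, which evaluates every combination in a predefined grid, or **random search**, which evaluates a fixed number of randomly sampled combinations. Here's an example using `GridSearchCV`: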


In [2]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# Hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [2, 3, 4]
}

# Instantiate the model
gbr = GradientBoostingRegressor()

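# Perform grid search with 3-fold cross-validation
# (with only 5 samples the folds are tiny, so treat the scores as illustrative)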
grid_search = GridSearchCV(estimator=gbr, param_grid=param_grid, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(X, y)

# Best hyperparameters
print("Best Hyperparameters:", grid_search.best_params_)

# Evaluate the best model on the training data (not a measure of generalization)
best_model = grid_search.best_estimator_
predictions = best_model.predict(X)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)

print("Best Model Mean Squared Error:", mse)
print("Best Model R-squared:", r2)


Best Hyperparameters: {'learning_rate': 0.2, 'max_depth': 2, 'n_estimators': 100}
Best Model Mean Squared Error: 1.9923772363239538e-16
Best Model R-squared: 0.9999999999999999
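
The question also mentions random search, so here is a minimal sketch using `RandomizedSearchCV`. The synthetic dataset (`X_demo`, `y_demo`), the sampling ranges, and `n_iter=10` are assumptions chosen for illustration; they are not taken from the experiment above, and no results are claimed for them.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (an assumption for illustration): a noisy linear target
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(60, 1))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(0, 0.5, size=60)

# Distributions to sample from, instead of a fixed grid
param_distributions = {
    "n_estimators": randint(50, 201),      # integers in [50, 200]
    "learning_rate": uniform(0.01, 0.19),  # floats in [0.01, 0.20]
    "max_depth": randint(2, 5),            # integers in [2, 4]
}

random_search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,                 # number of sampled combinations
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(X_demo, y_demo)

print("Best hyperparameters:", random_search.best_params_)
print("Best cross-validated MSE:", -random_search.best_score_)
```

Random search tries only `n_iter` combinations, so its cost does not grow with the size of the search space the way a full grid does. The trade-off is that it may miss the best grid point.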


### Q4. What is a weak learner in Gradient Boosting?

A **weak learner** in Gradient Boosting is a simple model that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). In Gradient Boosting, weak learners are typically shallow decision trees, often only a few levels deep; a tree with a single split is called a decision stump. Each weak learner is poor on its own, but the ensemble becomes strong because every new learner is trained to correct the errors left by the learners before it.
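
To make "slightly better than a baseline" concrete, the sketch below fits a single decision stump by hand and compares it with always predicting the mean. The helper names `fit_stump` and `predict_stump` are hypothetical, written only for this illustration, and the code assumes a one-dimensional feature with sorted, distinct values.

```python
import numpy as np

def fit_stump(x, y):
    """Find the single threshold split on sorted 1-D x that minimizes squared error."""
    best = None
    for threshold in (x[:-1] + x[1:]) / 2.0:  # midpoints between neighbouring points
        left, right = y[x <= threshold], y[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    """Predict the left or right leaf value depending on the threshold."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.5, 3.5, 4.5, 5.5])

baseline_mse = np.mean((y - y.mean()) ** 2)              # always predict the mean
stump = fit_stump(x, y)
stump_mse = np.mean((y - predict_stump(stump, x)) ** 2)  # one split only

print("Baseline (mean) MSE:", baseline_mse)  # 2.0
print("Single stump MSE:", stump_mse)        # 0.5
```

The stump beats the baseline but is still far from a perfect fit, which is the sense in which it is "weak".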

---

### Q5. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind **Gradient Boosting** is that by training models sequentially and focusing on the mistakes made by previous models, we can progressively improve the model's accuracy. Each model corrects the residuals (errors) of the previous models, and gradient descent is used to minimize a loss function (e.g., mean squared error) iteratively.

Instead of fitting a single large, complex model, gradient boosting builds a series of small, simple models (weak learners) that gradually reduce the error.
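
The link between "correcting residuals" and gradient descent can be sketched in a few lines: the target each new learner is fitted to is the negative gradient of the loss with respect to the current predictions. The two function names below are hypothetical helpers for this illustration, and the absolute-error case is included only to show that the target changes with the loss.

```python
import numpy as np

def negative_gradient_squared_error(y, pred):
    # Loss: 0.5 * (y - pred)**2, so -dL/dpred = y - pred (the ordinary residual)
    return y - pred

def negative_gradient_absolute_error(y, pred):
    # Loss: |y - pred|, so -dL/dpred = sign(y - pred)
    return np.sign(y - pred)

y = np.array([1.5, 2.5, 3.5, 4.5, 5.5])
pred = np.full_like(y, y.mean())  # constant starting prediction of 3.5

# Values -2, -1, 0, 1, 2: the plain residuals
print(negative_gradient_squared_error(y, pred))
# Values -1, -1, 0, 1, 1: only the direction of each error
print(negative_gradient_absolute_error(y, pred))
```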

---

### Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?

Gradient Boosting builds an ensemble by training weak learners (usually shallow decision trees) sequentially. Each weak learner is trained to correct the errors (residuals) of the previous learners. The process is as follows:

1. **Initial model**: The ensemble starts from a very simple model of the target values, commonly a constant such as the mean of the targets (as in the implementation in Q2).
2. **Residuals**: The residuals (errors) between the actual values and the current predictions are computed.
3. **Next learners**: A weak learner (decision tree) is trained to predict these residuals and its scaled output is added to the current predictions. The residuals are then recomputed, and the process repeats for each subsequent learner.
4. **Aggregation**: The predictions of all weak learners are aggregated (summed) to produce the final prediction. A learning rate controls how much each learner's prediction contributes to the final model.
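
The steps above can be sketched as a short loop that records the training error after each round. This is a minimal illustration, assuming a constant initial model (the mean), stumps as weak learners, squared-error loss, and an arbitrary `learning_rate` of 0.5; the variable names are chosen for this example only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X_demo = np.array([[1], [2], [3], [4], [5]], dtype=float)
y_demo = np.array([1.5, 2.5, 3.5, 4.5, 5.5])
learning_rate = 0.5  # illustrative value

# Step 1: constant initial model
current = np.full(y_demo.shape, y_demo.mean())
contributions = []
mse_per_round = [np.mean((y_demo - current) ** 2)]

for _ in range(5):
    # Step 2: residuals of the current ensemble
    residuals = y_demo - current
    # Step 3: fit a weak learner (a stump) to those residuals
    tree = DecisionTreeRegressor(max_depth=1, random_state=0)
    tree.fit(X_demo, residuals)
    step = learning_rate * tree.predict(X_demo)
    contributions.append(step)
    current = current + step
    mse_per_round.append(np.mean((y_demo - current) ** 2))

# Step 4: the final prediction is the initial value plus the sum of scaled contributions
aggregated = y_demo.mean() + np.sum(contributions, axis=0)

print("Training MSE per round:", mse_per_round)
print("Aggregation matches iterative updates:", np.allclose(aggregated, current))
```

The training error never increases from one round to the next in this squared-error setting, which is the behaviour the learning rate is meant to control: smaller values take smaller steps and need more learners.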

---

### Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting?

1. **Initialization**: Start by predicting a constant value, typically the mean of the target values.
   
2. **Residual Calculation**: Compute the residuals (errors) between the true values and the predicted values. These residuals become the target for the next learner.

3. **Model Training**: Fit a weak learner (e.g., a decision tree) to the residuals, so that it approximates the current errors as a function of the input features. Adding its scaled output to the current predictions reduces the error.

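4. **Gradient Descent**: Gradient Boosting uses gradient descent to minimize a predefined loss function. Each new learner is fitted to the negative gradient of the loss function with respect to the current model's predictions. For squared-error loss this negative gradient is exactly the residual, so steps 2 and 3 together are one gradient descent step.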

5. **Iteration**: Repeat this process for a set number of iterations (trees), with each new model trained to predict the residuals (errors) of the current ensemble, not just of the most recent tree.

6. **Final Prediction**: The final prediction is a weighted sum of the initial prediction and all residual-correcting models. The learning rate determines the contribution of each learner to the final prediction.
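
The steps above can be written compactly for squared-error loss. The symbols below are notation introduced here for illustration: $F_m$ is the ensemble after $m$ learners, $h_m$ is the $m$-th weak learner, $\nu$ is the learning rate, and $M$ is the number of learners.

$$
F_0(x) = \bar{y}, \qquad
r_i^{(m)} = -\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}}, \qquad
F_m(x) = F_{m-1}(x) + \nu\, h_m(x)
$$

With $L(y, F) = \tfrac{1}{2}(y - F)^2$, the derivative is $\partial L / \partial F = -(y - F)$, so the fitting target reduces to the ordinary residual:

$$
r_i^{(m)} = y_i - F_{m-1}(x_i)
$$

Unrolling the update over all $M$ rounds gives the final prediction as the initial constant plus the scaled sum of the weak learners:

$$
F_M(x) = \bar{y} + \nu \sum_{m=1}^{M} h_m(x)
$$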