In [None]:
#Q1. What is Gradient Boosting Regression?

In [None]:
'''
Gradient Boosting Regression is a machine learning algorithm that belongs to the ensemble family. 
It iteratively trains a series of weak models (often decision trees) and combines their predictions to create a strong predictive model.
Unlike Random Forest, which trains its trees independently, Gradient Boosting trains its models sequentially: each new model is fit to the errors (residuals) left by the models before it. (Reweighting training instances is the AdaBoost approach; Gradient Boosting fits residuals instead.)

Key components of Gradient Boosting Regression:

Weak Models: Typically decision trees.
Iterative Training: Models are trained sequentially, with each model focusing on the errors made by previous models.
Gradient Descent: Each new model is fit to the negative gradient of the loss function with respect to the current predictions, so the ensemble performs gradient descent in function space.
Ensemble: The final prediction is the initial prediction plus the sum of the predictions from all the weak models, each scaled by the learning rate.

How it works:

Initialize: Start from a simple initial prediction, typically a constant such as the mean of the target.
Calculate Residuals: The residuals (differences between actual and predicted values) are calculated.
Train New Model: A new weak model is trained to predict the residuals.
Update Predictions: The predictions of the new model are scaled and added to the predictions of previous models.
Repeat: Steps 2-4 are repeated for a specified number of iterations.

Advantages of Gradient Boosting Regression:

Accurate: Often achieves high accuracy, even though each base model is only a weak learner.
Handles Complex Patterns: Can handle complex relationships in the data.
Flexible: Can be applied to various tasks, including regression, classification, and ranking.
Interpretable: Feature importances can be computed from how much each feature's splits reduce the error across all trees, although the ensemble as a whole is harder to read than a single tree.

Disadvantages of Gradient Boosting Regression:

Computational Cost: Can be computationally expensive, especially for large datasets or deep models.
Overfitting: Prone to overfitting if not carefully tuned.
Sensitive to Hyperparameters: The performance is sensitive to the choice of hyperparameters.

Gradient Boosting Regression is a powerful and versatile algorithm that can be used for various regression tasks.
Its ability to handle complex patterns and achieve high accuracy makes it a popular choice in machine learning.  '''
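As a minimal sketch of the description above, the following uses scikit-learn's GradientBoostingRegressor on synthetic data. The dataset, the hyperparameter values, and the comparison against a predict-the-mean baseline are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): the target depends
# nonlinearly on feature 0 only; feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Shallow trees (weak learners) combined sequentially.
gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=2, random_state=0
)
gbr.fit(X_train, y_train)

# Compare against a trivial baseline that always predicts the training mean.
test_mse = mean_squared_error(y_test, gbr.predict(X_test))
baseline_mse = mean_squared_error(y_test, np.full_like(y_test, y_train.mean()))
importances = gbr.feature_importances_

print("Test MSE:", test_mse)
print("Baseline MSE (predict the mean):", baseline_mse)
print("Feature importances:", importances)
```

The feature importances should concentrate on feature 0 here, since that is the only feature the synthetic target depends on.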

In [None]:
#Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a
# simple regression problem as an example and train the model on a small dataset. Evaluate the model's
# performance using metrics such as mean squared error and R-squared.

In [None]:
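import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_regression(X, y, n_estimators=100, learning_rate=0.1):
    """
    Implements a simple gradient boosting regression algorithm (squared-error loss).

    Args:
        X: Input features, shape (n_samples, n_features)
        y: Target variable, shape (n_samples,)
        n_estimators: Number of boosting iterations
        learning_rate: Learning rate for updating predictions

    Returns:
        (init_pred, models): the constant initial prediction and the fitted trees
    """

    init_pred = np.mean(y)  # Start from the mean of the target
    y_pred = np.full(len(y), init_pred, dtype=float)
    models = []

    for _ in range(n_estimators):
        residuals = y - y_pred  # Negative gradient of the squared-error loss
        tree = DecisionTreeRegressor(max_depth=1)  # Decision stump as weak learner
        tree.fit(X, residuals)
        models.append(tree)
        y_pred += learning_rate * tree.predict(X)

    return init_pred, models

def predict(X, init_pred, models, learning_rate=0.1):
    """Adds the learning-rate-scaled tree outputs to the initial prediction."""
    return init_pred + learning_rate * np.sum([tree.predict(X) for tree in models], axis=0)

# Example usage
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

init_pred, models = gradient_boosting_regression(X, y)

# Make predictions (on the training data, so the metrics below are optimistic)
y_pred = predict(X, init_pred, models)

# Evaluate performance
mse = np.mean((y - y_pred) ** 2)
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)

print("Mean Squared Error:", mse)
print("R-squared:", r2)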

In [None]:
#Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to
# optimise the performance of the model. Use grid search or random search to find the best
# hyperparameters.

In [None]:
'''Hyperparameter Tuning for Gradient Boosting Regression
Understanding the Problem:
We want to optimize the performance of our Gradient Boosting Regression model by tuning its hyperparameters. 
We'll use grid search to explore different combinations of hyperparameters and find the best configuration.

Hyperparameters to Tune:

n_estimators: Number of boosting stages
learning_rate: Learning rate for updates
max_depth: Maximum depth of individual trees

Using GridSearchCV:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Create a gradient boosting regressor
gbm = GradientBoostingRegressor()

# Define the hyperparameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 4, 5]
}

# Create a grid search object
grid_search = GridSearchCV(gbm, param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the grid search to the data
# (X_train and y_train are assumed to be an existing training split)
grid_search.fit(X_train, y_train)

# Best parameters and score
best_params = grid_search.best_params_
best_score = -grid_search.best_score_

print("Best Parameters:", best_params)
print("Best Score (Mean Squared Error):", best_score)

'''
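The question also mentions random search. A minimal sketch using scikit-learn's RandomizedSearchCV follows; the synthetic dataset, the candidate values, and the budget of 6 sampled configurations are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate values to sample from (illustrative choices).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [1, 2, 3, 4],
}

# Random search evaluates only n_iter sampled combinations instead of the
# full grid, which is cheaper when the grid is large.
random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=6,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(X_train, y_train)

best_params = random_search.best_params_
cv_mse = -random_search.best_score_  # scoring is negated MSE, so flip the sign
test_mse = np.mean((y_test - random_search.predict(X_test)) ** 2)

print("Best Parameters:", best_params)
print("Cross-validated MSE:", cv_mse)
print("Held-out test MSE:", test_mse)
```

Reporting the held-out test MSE alongside the cross-validated score gives a check that the selected configuration was not just lucky on the validation folds.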

In [None]:
#Q4. What is a weak learner in Gradient Boosting?

In [None]:
'''
In Gradient Boosting, a weak learner is a simple model that, on its own, performs only somewhat better than a trivial baseline such as always predicting the mean.
Such models serve as the base learners in the boosting process.

Common examples of weak learners in Gradient Boosting include:

Decision Stumps: These are decision trees with a maximum depth of 1, meaning they make a single split on a single feature.
Shallow Decision Trees: While not as simple as decision stumps, shallow decision trees with limited depth are still considered weak learners compared to deep trees.
Linear Models: Simple linear models like linear regression or logistic regression can also be used as weak learners.

The key idea behind using weak learners in Gradient Boosting is that their combined predictions can be more accurate than the predictions of a 
single, more complex model. By iteratively training weak models and focusing on the errors made by previous models, 
Gradient Boosting can create a strong ensemble that effectively captures complex patterns in the data.            '''

In [None]:
#Q5. What is the intuition behind the Gradient Boosting algorithm?

In [None]:
'''The intuition behind Gradient Boosting is to iteratively train weak models and focus on the errors made by previous models.

Here's a breakdown of the underlying concepts:

Residuals: Gradient Boosting starts from a simple initial prediction, typically a constant such as the mean of the target. The residuals, which are the differences between the actual and predicted values, are then calculated.
Focus on Errors: In subsequent iterations, new models are trained to predict these residuals. This means that the algorithm focuses on the parts of the data that the previous models struggled to predict accurately.
Ensemble: The predictions of all the weak models are added together, each scaled by the learning rate.

Key ideas:

Iterative Improvement: By repeatedly focusing on the errors of previous models, Gradient Boosting can gradually improve the overall accuracy.
Ensemble: The final prediction is an ensemble of the predictions from all the weak models, leveraging the collective wisdom of the ensemble.
Gradient Descent: For a general loss function, each new model is fit to the negative gradient of the loss with respect to the current predictions, so every added model moves the ensemble in a direction that reduces the overall error. '''
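The "iterative improvement" idea can be observed directly. The sketch below uses only NumPy with a hand-written decision stump (the helper names fit_stump and stump_predict are hypothetical, introduced for this example) and records the training MSE after every boosting round. The data and the learning rate of 0.3 are assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Find the threshold on a 1-D feature x that minimises the squared
    error when predicting targets r with one constant per side.
    Returns (threshold, left_value, right_value)."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max so the right side is non-empty
        left, right = r[x <= t], r[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

def stump_predict(stump, x):
    """Predict with a stump returned by fit_stump."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

# Synthetic 1-D data (an assumption for illustration).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, size=100))
y = np.sin(x) + 0.1 * rng.normal(size=100)

learning_rate = 0.3
y_pred = np.full_like(y, y.mean())           # initial constant prediction
mse_history = [np.mean((y - y_pred) ** 2)]

for _ in range(20):
    residuals = y - y_pred                   # what the ensemble still gets wrong
    stump = fit_stump(x, residuals)          # weak model fit to the residuals
    y_pred = y_pred + learning_rate * stump_predict(stump, x)
    mse_history.append(np.mean((y - y_pred) ** 2))

print("Training MSE per round:", np.round(mse_history, 4))
```

With squared-error loss and a learning rate between 0 and 1, each round can only lower (or keep) the training MSE, which is why the recorded history is non-increasing. Training error alone says nothing about overfitting, so a held-out set is still needed in practice.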

In [None]:
#Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

In [None]:
'''Gradient Boosting builds an ensemble of weak learners by iteratively training new models to predict the residuals from previous models.

Here's a breakdown of the process:

Initialize: Start from a simple initial prediction, typically a constant such as the mean of the target.
Calculate Residuals: The residuals, which are the differences between the actual and predicted values, are calculated.
Train New Model: A new weak model is trained to predict these residuals.
Update Predictions: The predictions of the new model are scaled and added to the predictions of previous models.
Repeat: Steps 2-4 are repeated for a specified number of iterations.

Key points:

Iterative Process: Gradient Boosting is an iterative process where each new model focuses on the errors made by previous models.
Residuals: The algorithm focuses on predicting the residuals, which represent the parts of the data that the previous models struggled to capture.
Ensemble: The final prediction is the initial prediction plus the sum of the predictions from all the weak models, each scaled by the learning rate. '''
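One way to see how the ensemble is assembled is to rebuild a fitted model's prediction by hand. The sketch below assumes scikit-learn's GradientBoostingRegressor with its default squared-error loss, where the prediction is the initial estimator's output plus the learning-rate-scaled sum of the stage trees. The synthetic data and hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

gbr = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0
)
gbr.fit(X, y)

# Step 1: the initial prediction (a constant: the mean of y for squared error).
manual_pred = gbr.init_.predict(X).astype(float)

# Steps 2-4, repeated: add each stage's tree output, scaled by the learning rate.
# estimators_ has shape (n_estimators, 1) for regression.
for stage in gbr.estimators_:
    tree = stage[0]
    manual_pred += gbr.learning_rate * tree.predict(X)

max_diff = np.max(np.abs(manual_pred - gbr.predict(X)))
print("Largest difference from gbr.predict:", max_diff)
```

The difference should be at the level of floating-point rounding, confirming that the ensemble is an additive model rather than a vote or an average.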

In [None]:
#Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

In [None]:
'''Constructing the Mathematical Intuition of Gradient Boosting

Gradient Boosting is a powerful ensemble learning technique that iteratively trains weak models to create a strong predictive model. To understand its mathematical intuition, we can break it down into the following steps:

Loss Function: The first step is to define a loss function that quantifies the error between the predicted and actual values. Common loss functions include squared error for regression and log loss for classification.
Initialization: Start from a simple initial prediction, typically the constant that minimizes the loss (the mean of the target for squared error).
Pseudo-Residual Calculation: The negative gradient of the loss function with respect to the current predictions is computed. It indicates the direction in which the predictions should move to reduce the loss. For squared error, it equals the ordinary residuals (actual minus predicted values).
Weak Model Fit: A new weak model is trained to predict these pseudo-residuals.
Update Predictions: The predictions are updated by adding a scaled version of the new model's output to the previous predictions. The scaling factor, known as the learning rate, controls the step size in the optimization process.
Iterative Process: Steps 3-5 are repeated for a specified number of iterations, with each iteration focusing on reducing the loss.

Mathematical Formulation:

Let:

y: The true target variable
y_pred: The predicted values
L: The loss function
h_m: The m-th weak model
α_m: The step size (learning rate) applied to the m-th model; often a single constant shared by all models

The gradient boosting algorithm can be expressed as follows:

Initialize: y_pred = h_0(X)
For m = 1 to M:
    Calculate pseudo-residuals: r_m = -∂L(y, y_pred) / ∂y_pred
    Train weak model: h_m = fit_model(X, r_m)
    Update predictions: y_pred = y_pred + α_m * h_m(X)   '''
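The pseudo-residual formula r_m = -∂L(y, y_pred) / ∂y_pred can be checked numerically. The sketch below writes out the negative gradient for two common losses and compares each with a central finite-difference estimate. The function names, the example targets, and the constant current prediction of 3.5 are assumptions for illustration.

```python
import numpy as np

def squared_error(y, f):
    """L(y, f) = 0.5 * (y - f)^2"""
    return 0.5 * (y - f) ** 2

def squared_error_neg_grad(y, f):
    """-dL/df = y - f, i.e. the ordinary residual."""
    return y - f

def absolute_error(y, f):
    """L(y, f) = |y - f|"""
    return np.abs(y - f)

def absolute_error_neg_grad(y, f):
    """-dL/df = sign(y - f): only the direction of the error matters."""
    return np.sign(y - f)

def numerical_neg_grad(loss, y, f, eps=1e-6):
    """Central finite-difference estimate of -dL/df."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)

y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
f = np.full_like(y, 3.5)  # hypothetical current predictions

r_squared = squared_error_neg_grad(y, f)
r_absolute = absolute_error_neg_grad(y, f)

print("Pseudo-residuals (squared error): ", r_squared)
print("Pseudo-residuals (absolute error):", r_absolute)
print("Finite-difference check (squared): ", numerical_neg_grad(squared_error, y, f))
print("Finite-difference check (absolute):", numerical_neg_grad(absolute_error, y, f))
```

For squared error the pseudo-residuals are exactly y - y_pred, which is why the simpler descriptions of the algorithm speak of "fitting the residuals". Swapping in a different loss changes only the pseudo-residual computation; the rest of the loop stays the same.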