In [1]:
# Ans 1

# Gradient Boosting Regression is a powerful machine learning technique used for regression tasks.
# It produces a prediction model in the form of an ensemble of weak prediction models.

# Here’s how it works:

# Sequential Learning: The algorithm builds a model in a stage-wise fashion and each new model tries to correct the previous model.
# It combines several weak learners into strong learners.
# Minimize Loss Function: Each new model is trained to reduce the loss of the current ensemble (for regression, typically mean squared
# error) by taking a gradient descent step in prediction space. In each iteration, the algorithm computes the gradient of the loss function
# with respect to the predictions of the current ensemble and then fits a new weak model to the negative of this gradient.
# Residual Errors: In contrast to AdaBoost, the weights of the training instances are not tweaked, instead, each predictor is trained using
# the residual errors of the predecessor as labels.
# Gradient Boosted Trees: There is a technique called the Gradient Boosted Trees whose base learner is CART (Classification and Regression Trees).

# Shrinkage: An important parameter of this technique is shrinkage. The prediction of each tree in the ensemble is scaled down by
# multiplying it by the learning rate (eta), which lies between 0 and 1.
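# A minimal sketch of the two ideas above, assuming the loss 0.5 * (y - F)^2. It checks numerically that the negative gradient
# of this loss equals the residual, then applies three shrunken updates. The function names, the example numbers, and the
# idealised weak learner that reproduces the residual exactly are assumptions for illustration only.

```python
def squared_error(y, f):
    # Loss for a single example; the 0.5 factor makes the gradient exactly (f - y)
    return 0.5 * (y - f) ** 2

def negative_gradient(y, f, eps=1e-6):
    # Central finite difference of the loss with respect to the prediction f, negated
    return -(squared_error(y, f + eps) - squared_error(y, f - eps)) / (2 * eps)

y_true, f_current = 3.0, 1.0
residual = y_true - f_current
grad = negative_gradient(y_true, f_current)
print(residual, round(grad, 6))

# Shrinkage: move only a fraction eta of the way along the residual each round
eta = 0.1
f = f_current
for step in range(3):
    f = f + eta * (y_true - f)   # idealised learner: predicts the residual exactly
    print(step, round(f, 4))
```

# With eta = 0.1 the remaining error shrinks by a factor of 0.9 per round, which is why a small learning rate needs more estimators.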

In [None]:
# Ans 2

import numpy as np

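# Define the base learner: a straight line fitted by least squares
def base_learner(X, targets):
    return np.polyfit(X, targets, 1)

# Define the boosting algorithm (squared-error loss)
def gradient_boosting(X, y, M, learning_rate=0.5):
    # Start from a constant model that predicts the mean of y,
    # stored as polynomial coefficients [slope, intercept]
    models = [np.array([0.0, np.mean(y)])]
    predictions = np.polyval(models[0], X)
    for m in range(M):
        # For squared error, the negative gradient is the residual
        residuals = y - predictions
        # Fit the next learner to the residuals and shrink it by the learning rate
        learner = learning_rate * base_learner(X, residuals)
        models.append(learner)
        predictions = predictions + np.polyval(learner, X)
    return models

# Define the prediction function: the ensemble output is the sum of all stages
# Note: a sum of straight lines is still a straight line, so this ensemble can only
# approach the ordinary least-squares line; trees are used as base learners in practice.
def predict(models, X):
    return sum(np.polyval(model, X) for model in models)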

# Define the mean squared error function
def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Define the R-squared function
def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.var(y_true) * len(y_true)
    return 1 - ss_res / ss_tot

# Generate a small dataset
np.random.seed(0)
X = np.random.rand(100)
y = X ** 2 + np.random.randn(100) * 0.1

# Train the model
models = gradient_boosting(X, y, 10)

# Make predictions
y_pred = predict(models, X)

# Evaluate the model
mse = mean_squared_error(y, y_pred)
r2 = r_squared(y, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')


In [None]:
# Ans 3

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

# Generate a small dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model
model = GradientBoostingRegressor()

# Define the grid of hyperparameters
param_grid = {
    'learning_rate': [0.01, 0.1, 1],
    'n_estimators': [10, 100, 1000],
    'max_depth': [1, 2, 3]
}

# Define the grid search
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the grid search
grid_search.fit(X_train, y_train)

# Print the best parameters
print(f'Best parameters: {grid_search.best_params_}')

# Make predictions with the best model
y_pred = grid_search.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')


In [2]:
# Ans 4

# In the context of Gradient Boosting, a weak learner is a simple model that performs only slightly better than a trivial baseline
# (random guessing in classification, or predicting the mean in regression).
# The term "weak" does not imply that these learners are bad, but rather that they are not complex.

# In Gradient Boosting, decision trees are commonly used as weak learners. Specifically, regression trees are used that output real values 
# for splits and whose output can be added together, allowing subsequent models' outputs to be added and "correct" the residuals in the predictions.

# The algorithm starts with a single weak learner and works as follows:
# 1. Train a single weak learner.
# 2. Compute the residuals, i.e. how far the current predictions are from the targets for each example.
# 3. Build another weak learner that is fitted to those residuals, so it focuses on what the ensemble so far gets wrong.
# 4. Continue this process until a predetermined stopping condition is met, such as until a set number of weak learners have been created, 
# or the model’s performance has plateaued.

# In this way, each new weak learner is specifically tuned to the weak points of the previous weak learner(s).
# The larger the remaining error on an example, the more that example influences the next weak learner.
# Together, the weak learners add up to a single strong learner.
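# The steps above can be sketched in plain Python with a decision stump (a one-split regression tree) as the weak learner.
# This is an illustrative sketch, not a library implementation: fit_stump, stump_predict, the toy data and the
# learning rate of 0.5 are all assumptions made for the example.

```python
def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring x values
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Toy data: y = x^2 on an evenly spaced grid
xs = [i / 10 for i in range(11)]
ys = [x ** 2 for x in xs]

# Step 1: start from the simplest model, the mean of the targets
preds = [sum(ys) / len(ys)] * len(xs)
history = [mse(ys, preds)]
learning_rate = 0.5
for _ in range(5):
    # Step 2: residuals show where the current ensemble is wrong
    residuals = [y - p for y, p in zip(ys, preds)]
    # Step 3: the next weak learner is fitted to those residuals
    stump = fit_stump(xs, residuals)
    preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    history.append(mse(ys, preds))

# Training error after each round; it should go down as stumps are added
print([round(h, 4) for h in history])
```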


In [3]:
# Ans 5

# The intuition behind the Gradient Boosting algorithm is that it repetitively leverages the patterns in residuals and strengthens a model
# with weak predictions to make it better.
# Here’s how it works:

# Initial Model: The algorithm starts by fitting an initial model (e.g., a tree or linear regression) to the data.

# Sequential Learning: Then a second model is built that focuses on accurately predicting the cases where the first model performs poorly.
# The combination of these two models is expected to be better than either model alone.
# Minimize Error: Each successive model attempts to correct for the shortcomings of the combined boosted ensemble of all previous models.
# The main intuition behind the algorithm is that the best possible next model, when combined with previous models, minimizes the overall 
# prediction error.
# Target Outcomes: The key idea is to set the target outcomes for this next model to minimize the error.
# Residuals: Each new model is therefore fitted to the residuals of the current ensemble. Once we reach a stage where the residuals
# do not have any pattern that could be modeled, we can stop modeling residuals (otherwise it might lead to overfitting).

# Negative Gradient: The principal idea behind this algorithm is to construct the new base-learners to be maximally correlated 
# with the negative gradient of the loss function, associated with the whole ensemble.
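# To make the negative-gradient idea concrete, the sketch below computes the targets for the next learner (often called
# pseudo-residuals) under two losses: 0.5 * (y - F)^2 and |y - F|. The function names and the example numbers are
# illustrative assumptions.

```python
def pseudo_residuals_squared(ys, preds):
    # Negative gradient of 0.5 * (y - F)^2 with respect to F is (y - F)
    return [y - p for y, p in zip(ys, preds)]

def pseudo_residuals_absolute(ys, preds):
    # Negative gradient of |y - F| with respect to F is sign(y - F)
    return [(y > p) - (y < p) for y, p in zip(ys, preds)]

ys = [1.0, 2.0, 10.0]
preds = [2.0, 2.0, 2.0]
print(pseudo_residuals_squared(ys, preds))   # [-1.0, 0.0, 8.0]
print(pseudo_residuals_absolute(ys, preds))  # [-1, 0, 1]
```

# Under absolute error the outlier (10.0) contributes only its sign, so it pulls the next learner less than under squared error.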

In [4]:
# Ans 6

# Gradient Boosting builds an ensemble of weak learners in a sequential manner. Here’s how it works:

# Initial Model: The algorithm starts by fitting an initial model (e.g., a tree or linear regression) to the data.

# Sequential Learning: Then a second model is built that focuses on accurately predicting the cases where the first model performs poorly.
# The combination of these two models is expected to be better than either model alone.

# Minimize Error: Each successive model attempts to correct for the shortcomings of the combined boosted ensemble of all previous models.
# The main intuition behind the algorithm is that the best possible next model, when combined with previous models, minimizes the overall prediction error.

# Target Outcomes: The key idea is to set the target outcomes for this next model to minimize the error.

# Residuals: So, the intuition behind the gradient boosting algorithm is to repetitively leverage the patterns in residuals and strengthen a model 
# with weak predictions to make it better. Once we reach a stage where residuals do not have any pattern that could be modeled, we can stop 
# modeling residuals (otherwise it might lead to overfitting).

# Negative Gradient: The principal idea behind this algorithm is to construct the new base-learners to be maximally correlated with the negative 
# gradient of the loss function, associated with the whole ensemble.

In [None]:
# Ans 7

# Initial Model: The algorithm starts by fitting an initial model F0 (e.g., a tree or linear regression) to the data. This model will be associated with a residual (y - F0).

# Sequential Learning: Then a second model h1 is built that focuses on accurately predicting the cases where the first model performs poorly. The combination of these two models is expected to be better than either model alone.

# Minimize Error: Each successive model attempts to correct for the shortcomings of the combined boosted ensemble of all previous models. The main intuition behind the algorithm is that the best possible next model, when combined with previous models, minimizes the overall prediction error.

# Target Outcomes: The key idea is to set the target outcomes for this next model to minimize the error.

# Residuals: So, the intuition behind the gradient boosting algorithm is to repetitively leverage the patterns in residuals and strengthen a model with weak predictions to make it better. Once we reach a stage where residuals do not have any pattern that could be modeled, we can stop modeling residuals (otherwise it might lead to overfitting).

# Negative Gradient: The principal idea behind this algorithm is to construct the new base-learners to be maximally correlated with the negative gradient of the loss function, associated with the whole ensemble.

# Combine Models: Now, F0 and h1 are combined to give F1, the boosted version of F0. The mean squared error from F1 will be lower than that from F0.
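
# A small worked example of the F0 -> h1 -> F1 step, assuming squared-error loss, a constant F0 equal to the mean, and a
# hypothetical binary feature so that h1 is just the mean residual in each group. The data values are made up for illustration.

```python
ys = [1.0, 2.0, 3.0, 6.0]
xs = [0, 0, 1, 1]   # hypothetical binary feature

# F0: constant model predicting the mean of the targets
f0 = sum(ys) / len(ys)
residuals = [y - f0 for y in ys]   # y - F0

# h1: one split on x, predicting the mean residual on each side
h1 = {}
for group in set(xs):
    members = [r for x, r in zip(xs, residuals) if x == group]
    h1[group] = sum(members) / len(members)

# F1 = F0 + h1 (no shrinkage here, to keep the arithmetic easy to follow)
f1 = [f0 + h1[x] for x in xs]

mse_f0 = sum((y - f0) ** 2 for y in ys) / len(ys)
mse_f1 = sum((y - p) ** 2 for y, p in zip(ys, f1)) / len(ys)
print(f0, h1, f1)
print(mse_f0, mse_f1)   # 3.5 1.25
```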