# ANSWER 1
Gradient Boosting Regression is a machine learning technique that applies the gradient boosting framework to regression problems, where the target variable is continuous. It builds an additive ensemble of weak learners, typically shallow decision trees, in which each new learner is fitted to the errors of the current ensemble so that the combined prediction improves step by step.

# ANSWER 2

In [13]:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [14]:
class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.estimators = []
        self.init_pred = None

    def fit(self, X, y):
        # Start from a constant prediction: the mean of the target values
        self.init_pred = np.mean(y)
        self.estimators = []
        y_pred = np.full(len(y), self.init_pred)

        for _ in range(self.n_estimators):
            # For squared error, the negative gradient equals the residuals
            residuals = y - y_pred

            # Fit a weak learner (a depth-1 tree, i.e. a decision stump) to the residuals
            stump = DecisionTreeRegressor(max_depth=1)
            stump.fit(X, residuals)

            # Add a shrunken copy of the stump's predictions to limit overfitting
            y_pred += self.learning_rate * stump.predict(X)
            self.estimators.append(stump)

        return self

    def predict(self, X):
        # Initial constant plus the shrunken contribution of every stump
        y_pred = np.full(len(X), self.init_pred)
        for stump in self.estimators:
            y_pred += self.learning_rate * stump.predict(X)
        return y_pred

In [15]:
model = GradientBoostingRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

In [16]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)


# ANSWER 3

In [17]:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from scipy.stats import uniform, randint

In [18]:
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [19]:
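# Sampling distributions for the random search.
# scipy's randint(low, high) excludes high; uniform(loc, scale) samples from [loc, loc + scale].
param_dist = {
    'n_estimators': randint(50, 200),      # integers 50..199
    'learning_rate': uniform(0.01, 0.2),   # floats in [0.01, 0.21]
    'max_depth': randint(2, 8),            # integers 2..7
}

gbr = GradientBoostingRegressor()

# Try 10 random parameter combinations, each scored by 3-fold cross-validated negative MSE
random_search = RandomizedSearchCV(
    gbr,
    param_distributions=param_dist,
    n_iter=10,
    scoring='neg_mean_squared_error',
    cv=3,
    random_state=42,
)

random_search.fit(X_train, y_train)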

In [20]:
best_model = random_search.best_estimator_
best_params = random_search.best_params_

y_pred = best_model.predict(X_test)

In [21]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Best Hyperparameters:", best_params)
print("Mean Squared Error:", mse)
print("R-squared:", r2)

Best Hyperparameters: {'learning_rate': 0.11495493205167782, 'max_depth': 3, 'n_estimators': 64}
Mean Squared Error: 160.0893826801497
R-squared: 0.9038488081093845


# ANSWER 4
In the context of Gradient Boosting, a weak learner is a simple model that performs only slightly better than a trivial baseline on the learning task: random guessing for classification, or always predicting the mean for regression. Weak learners are deliberately low-complexity models. The most common choice is a decision tree with limited depth (a tree with a single split is called a decision stump), although simple linear models can also be used.
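
As a small illustration of "slightly better than a trivial baseline", the sketch below compares a decision stump with a mean-only predictor on the same synthetic data used in the earlier answers. The choice of `DummyRegressor` as the baseline and of training MSE as the yardstick are assumptions made for this example, not part of the original answer.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data as in the answers above
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Trivial baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X, y)
baseline_mse = mean_squared_error(y, baseline.predict(X))

# Weak learner: a decision stump (a tree with a single split)
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
stump_mse = mean_squared_error(y, stump.predict(X))

print("Baseline (mean) training MSE:", baseline_mse)
print("Decision stump training MSE:", stump_mse)
```

On its own the stump is a crude model with only two possible output values, which is exactly what makes it suitable as a building block for boosting.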

# ANSWER 5
The intuition behind the Gradient Boosting algorithm can be understood as follows:
1. Ensemble Learning: Gradient Boosting is an ensemble learning technique, which means it combines the predictions of multiple weak learners (e.g., decision trees) to create a more accurate and robust model. Each weak learner contributes a small correction, and the corrections together form the final prediction.
2. Sequential Improvement: Gradient Boosting builds the ensemble of weak learners sequentially. It starts with an initial prediction (usually a simple estimate like the mean for regression or the log-odds of the class frequencies for classification) and then improves it in each iteration.
3. Correcting Errors: The core idea of Gradient Boosting is to correct the errors made by the previous weak learners. In each iteration, a new weak learner is trained on the remaining errors, or residuals, of the current ensemble.
4. Gradient Descent-like Optimization: In the training process, each weak learner is fitted to the negative gradient of the loss function with respect to the current ensemble's predictions. This is gradient descent in function space: instead of updating a parameter vector, each step adds a new learner that approximates the direction in which the loss decreases fastest.
5. Emphasis on Difficult Examples: By focusing on the residuals (errors) in the training data, Gradient Boosting pays more attention to the difficult-to-predict examples. The subsequent weak learners learn to capture the patterns that are challenging for the current ensemble, leading to improved model performance.
6. Additive Combination: The final prediction is the initial estimate plus the sum of all weak learners' predictions, each scaled by the learning rate. The learners are not weighted by their individual accuracy; each one simply adds a small correction, and together they form a strong model capable of capturing complex relationships in the data.
7. Regularization: Gradient Boosting can be prone to overfitting, especially if the number of iterations (weak learners) is too high. To address this, techniques like shrinkage (learning rate) and subsampling can be used to regularize the boosting process.
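
The sequential-improvement idea can be observed directly. The following sketch uses scikit-learn's `GradientBoostingRegressor` and its `staged_predict` method, which yields the ensemble's prediction after each added learner; the settings (50 stumps, learning rate 0.1) are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

model = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=1, random_state=42
)
model.fit(X, y)

# staged_predict yields predictions after 1, 2, ..., n_estimators learners
train_mse = [mean_squared_error(y, y_stage) for y_stage in model.staged_predict(X)]

for n_learners in (1, 10, 50):
    print(f"{n_learners:>2} learners: training MSE = {train_mse[n_learners - 1]:.2f}")
```

With squared-error loss the training error cannot increase from one stage to the next, which is why a held-out set, rather than training error, is needed to detect the overfitting mentioned in the last point.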

# ANSWER 6
The Gradient Boosting algorithm builds an ensemble of weak learners through the following steps:
1. Initialization: The process starts with an initial prediction, which can be a simple estimate like the mean of the target values for regression tasks or the log-odds of the class frequencies for classification tasks. This initial prediction serves as the starting point for the iterative process.
2. Residual Calculation: The differences between the actual target values and the current ensemble's predictions are computed. These differences, known as residuals, represent the errors that need to be corrected in the subsequent iterations.
3. Training of Weak Learners: A weak learner (e.g., decision tree with limited depth) is trained to predict the residuals. More generally, the weak learner is fitted to the negative gradient of the loss function with respect to the current ensemble's predictions; for squared-error loss this negative gradient is exactly the residual. This is the gradient descent-like part of the algorithm: the new learner approximates the direction in which the loss decreases fastest.
4. Shrinking: The predictions of the weak learner are then multiplied by a learning rate (also known as the shrinkage rate), which is a value less than 1. This step helps control the contribution of each weak learner to the final ensemble. Smaller learning rates lead to a more cautious learning process and can prevent overfitting.
5. Updating Ensemble Predictions: The predictions of the weak learner (multiplied by the learning rate) are added to the current ensemble's predictions. This step updates the ensemble's predictions to incorporate the new information from the latest weak learner. The ensemble now provides an improved estimate of the target variable.
6. Iterative Process: Steps 2 to 5 are repeated for a predefined number of iterations or until a stopping criterion (e.g., a minimum improvement in the loss function) is met. In each iteration, the weak learner focuses on the remaining errors (residuals) and tries to minimize them.
7. Ensemble Prediction: The final prediction of the Gradient Boosting algorithm is the sum of the initial prediction and the predictions from all the weak learners, each scaled by the learning rate. The same learning rate is applied to every weak learner, and the learners together achieve an accuracy that none of them reaches alone.
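
The final step can be checked against a fitted scikit-learn model. The sketch below rebuilds the prediction by hand from the fitted attributes `init_` (the initial estimator) and `estimators_` (the array of fitted trees), assuming the default squared-error loss; the hyperparameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

model = GradientBoostingRegressor(
    n_estimators=20, learning_rate=0.1, max_depth=2, random_state=42
)
model.fit(X, y)

# Step 1: the initial prediction (for squared error, the mean of the training targets)
manual_pred = np.ravel(model.init_.predict(X)).astype(float)

# Steps 4-5, repeated: add each tree's prediction, scaled by the shared learning rate
for tree in model.estimators_[:, 0]:
    manual_pred += model.learning_rate * tree.predict(X)

print("Matches model.predict:", np.allclose(manual_pred, model.predict(X)))
```

The loop makes the additive structure explicit: no tree is weighted differently from the others, and the only scaling factor is the learning rate.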

# ANSWER 7
1. Objective Function: The first step is to define the objective function to be optimized during the training process. For regression tasks, the common objective function is the mean squared error (MSE), while for classification tasks, the cross-entropy loss (log loss) is commonly used. These objective functions quantify the difference between the actual target values and the model's predictions.
2. Initialization: The process starts with an initial prediction (often a simple estimate like the mean of target values for regression or the log-odds for classification). The initial prediction serves as the starting point for the iterative process.
3. Residual Calculation: The differences between the actual target values and the current ensemble's predictions are computed. These differences, known as residuals, represent the errors that need to be corrected in the subsequent iterations. For regression, the residuals are the true target values minus the current predictions (in the first iteration, the initial prediction). For classification, the residuals are the true class labels (encoded as 0s and 1s) minus the current predicted probabilities.
4. Training of Weak Learners: A weak learner (e.g., decision tree with limited depth) is trained to predict the residuals. The weak learner is fitted to the negative gradient of the objective function with respect to the current ensemble's predictions; for MSE and log loss this negative gradient coincides with the residuals defined in step 3. This is the gradient descent-like part of the algorithm: the new learner approximates the direction in which the loss decreases fastest.
5. Learning Rate: The predictions of the weak learner are multiplied by a learning rate (also known as the shrinkage rate). The learning rate controls the contribution of each weak learner to the final ensemble. Smaller learning rates lead to a more cautious learning process and can prevent overfitting.
6. Updating Ensemble Predictions: The predictions of the weak learner (multiplied by the learning rate) are added to the current ensemble's predictions. This step updates the ensemble's predictions to incorporate the new information from the latest weak learner. The ensemble now provides an improved estimate of the target variable.
7. Iterative Process: Steps 3 to 6 are repeated for a predefined number of iterations or until a stopping criterion (e.g., a minimum improvement in the objective function) is met. In each iteration, the weak learner focuses on the remaining errors (residuals) and tries to minimize them.
8. Ensemble Prediction: The final prediction of the Gradient Boosting algorithm is the sum of the initial prediction and the predictions from all the weak learners, each scaled by the learning rate. The same learning rate is applied to every weak learner, and the learners together achieve an accuracy that none of them reaches alone.
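
The steps above can be written compactly. The notation is an assumption introduced here for illustration: $L$ is the loss, $F_m$ the ensemble after $m$ iterations, $h_m$ the weak learner fitted at iteration $m$, $r_{im}$ the pseudo-residual of sample $i$, $\nu$ the learning rate, and $M$ the number of iterations.

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \qquad i = 1, \dots, n \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{im} - h(x_i)\bigr)^2 \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M
\end{aligned}
$$

For the squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the first line gives $F_0(x) = \bar{y}$ and the second gives $r_{im} = y_i - F_{m-1}(x_i)$, which is the ordinary residual described in step 3.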