###[Q1.] What is Gradient Boosting Regression?
#####[Ans]
Gradient Boosting Regression is a machine learning technique that builds an ensemble of weak learners (typically shallow decision trees) to predict a continuous target variable. The model is built iteratively: each new learner is trained on the residual errors of the ensemble built so far, and its output, scaled by a learning rate, is added to the running prediction. This amounts to gradient descent on a loss function in function space. The loss is usually mean squared error, but other regression losses such as absolute error or Huber loss can be used.

###[Q2.] Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.
#####[Ans]

In [6]:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.2, 1.9, 3.1, 3.9, 5.1])

n_trees = 10
learning_rate = 0.1

prediction = np.full_like(y, y.mean(), dtype='float')

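# Note: the weak learner here is a least-squares line, not a decision tree.
x = X.flatten()

for _ in range(n_trees):
    # For squared error, the negative gradient is the plain residual.
    residuals = y - prediction

    # Fit the weak learner to the residuals by ordinary least squares.
    slope = np.sum((x - x.mean()) * residuals) / np.sum((x - x.mean()) ** 2)
    # The intercept must account for the slope term (previously omitted).
    intercept = residuals.mean() - slope * x.mean()

    weak_prediction = slope * x + intercept

    # Add the weak learner's output, shrunk by the learning rate.
    prediction += learning_rate * weak_prediction

mse = mean_squared_error(y, prediction)
r2 = r2_score(y, prediction)

print(f"MEAN SQUARED ERROR : {mse:.4f}")
print(f"R2 SCORE : {r2:.4f}")


MEAN SQUARED ERROR : 0.2471
R2 SCORE : 0.8722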


###[Q3.] Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.
#####[Ans]

In [8]:
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

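# Synthetic data: y = 3x plus Gaussian noise.
# No seed is set, so the data (and the scores below) vary between runs.
X = np.random.rand(100, 1) * 10
y = 3 * X.flatten() + np.random.randn(100) * 2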

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

gbr = GradientBoostingRegressor(random_state=42)

param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [50, 100, 150],
    'max_depth': [1, 2, 3]
}

grid_search = GridSearchCV(estimator=gbr, param_grid=param_grid, scoring='neg_mean_squared_error', cv=3)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Best Parameters: {best_params}")
print(f"Mean Squared Error: {mse:.4f}")
print(f"R2 Score: {r2:.4f}")


Best Parameters: {'learning_rate': 0.1, 'max_depth': 1, 'n_estimators': 50}
Mean Squared Error: 4.2173
R2 Score: 0.9508
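
The question also mentions random search. A minimal sketch with `RandomizedSearchCV` might look like the following. The sampling ranges, the `n_iter` budget, and the variable names (`X_rs`, `y_rs`, `random_search`) are illustrative assumptions, not tuned values.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Seeded synthetic data with the same shape as the grid search example.
rng = np.random.default_rng(42)
X_rs = rng.uniform(0, 10, size=(100, 1))
y_rs = 3 * X_rs.ravel() + rng.normal(0, 2, size=100)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_rs, y_rs, test_size=0.3, random_state=42
)

# Distributions to sample from instead of a fixed grid.
param_distributions = {
    "learning_rate": uniform(loc=0.01, scale=0.19),  # continuous on [0.01, 0.20]
    "n_estimators": randint(50, 151),                # integers 50..150
    "max_depth": randint(1, 4),                      # integers 1..3
}

random_search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,  # number of sampled configurations
    scoring="neg_mean_squared_error",
    cv=3,
    random_state=42,
)
random_search.fit(X_tr, y_tr)

print(f"Best Parameters: {random_search.best_params_}")
print(f"Best CV MSE: {-random_search.best_score_:.4f}")
```

Random search evaluates a fixed number of sampled configurations, so its cost does not grow with the number of hyperparameters the way a full grid does.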


###[Q4.] What is a Weak Learner in Gradient Boosting?
#####[Ans]
A weak learner is a simple model, typically a shallow decision tree, that performs only slightly better than a trivial baseline (random guessing in classification, or always predicting the mean in regression). In Gradient Boosting, weak learners are added one at a time, each correcting the errors left by the learners before it.
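
As a rough illustration, the sketch below compares a depth-1 tree (a "stump") with the predict-the-mean baseline. The sine-shaped data and the names `X_wl`, `y_wl`, and `stump` are made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data (illustrative only).
rng = np.random.default_rng(0)
X_wl = np.sort(rng.uniform(0, 6, size=(80, 1)), axis=0)
y_wl = np.sin(X_wl.ravel()) + rng.normal(0, 0.1, size=80)

# Trivial baseline: always predict the mean of the targets.
baseline_pred = np.full_like(y_wl, y_wl.mean())
baseline_mse = np.mean((y_wl - baseline_pred) ** 2)

# Weak learner: a tree with a single split, so it can output only two values.
stump = DecisionTreeRegressor(max_depth=1).fit(X_wl, y_wl)
stump_mse = np.mean((y_wl - stump.predict(X_wl)) ** 2)

print(f"Baseline (mean) MSE: {baseline_mse:.4f}")
print(f"Stump MSE:           {stump_mse:.4f}")
```

On its own the stump is a poor model of a sine curve, but it beats the baseline, which is all boosting needs from each learner.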

###[Q5.] What is the Intuition Behind Gradient Boosting?
#####[Ans]
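The core idea is to build models sequentially, where each new model aims to correct the residual errors of the combined model so far. Each new model is fitted to the negative gradient of the loss with respect to the current predictions, so adding it to the ensemble is one step of gradient descent on the loss. For squared error, that negative gradient is exactly the residual, which is why "fit the residuals" and "follow the gradient" describe the same procedure.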
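
The link between residuals and gradients can be checked numerically. This sketch assumes the loss is half the sum of squared errors and uses finite differences; the names `half_squared_error` and `numeric_grad` are illustrative.

```python
import numpy as np

y_true = np.array([1.2, 1.9, 3.1, 3.9, 5.1])
current = np.full_like(y_true, y_true.mean())  # constant starting prediction


def half_squared_error(pred):
    """Loss L = 0.5 * sum((y - pred)^2)."""
    return 0.5 * np.sum((y_true - pred) ** 2)


# Central finite differences: dL/dpred_i for each prediction.
eps = 1e-6
numeric_grad = np.zeros_like(current)
for i in range(len(current)):
    up = current.copy()
    down = current.copy()
    up[i] += eps
    down[i] -= eps
    numeric_grad[i] = (half_squared_error(up) - half_squared_error(down)) / (2 * eps)

residuals = y_true - current

# The negative gradient and the residuals should agree.
print("negative gradient:", np.round(-numeric_grad, 4))
print("residuals:        ", np.round(residuals, 4))
```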

###[Q6.] How Does Gradient Boosting Build an Ensemble of Weak Learners?
#####[Ans]
1. Initialize predictions with a constant value (e.g., the mean for regression).
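2. Compute the pseudo-residuals: the negative gradient of the loss with respect to the current predictions (for squared error, simply the true value minus the current prediction).
3. Train a weak learner to predict these pseudo-residuals.
4. Update the predictions using the output of the weak learner, scaled by the learning rate.
5. Repeat steps 2 to 4 for a specified number of iterations.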
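
The steps above can be sketched with decision stumps as the weak learners. This is a minimal illustration for squared error only, not a full implementation; the data and the names `X_gb`, `y_gb`, `trees`, and `boosted_predict` are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X_gb = rng.uniform(0, 10, size=(100, 1))
y_gb = 3 * X_gb.ravel() + rng.normal(0, 2, size=100)

n_rounds = 50
learning_rate = 0.1

# Step 1: start from a constant prediction (the mean).
init = y_gb.mean()
pred = np.full(len(y_gb), init)
trees = []
train_mse = [np.mean((y_gb - pred) ** 2)]

for _ in range(n_rounds):
    # Step 2: residuals, i.e. the negative gradient for squared error.
    residuals = y_gb - pred
    # Step 3: fit a weak learner (depth-1 tree) to the residuals.
    tree = DecisionTreeRegressor(max_depth=1).fit(X_gb, residuals)
    # Step 4: update predictions, scaled by the learning rate.
    pred += learning_rate * tree.predict(X_gb)
    trees.append(tree)
    train_mse.append(np.mean((y_gb - pred) ** 2))


def boosted_predict(X_new):
    """Predict by summing the constant and every scaled tree output."""
    out = np.full(X_new.shape[0], init)
    for t in trees:
        out += learning_rate * t.predict(X_new)
    return out


print(f"Training MSE before boosting: {train_mse[0]:.4f}")
print(f"Training MSE after {n_rounds} rounds: {train_mse[-1]:.4f}")
```

Keeping the fitted trees in a list is what lets `boosted_predict` score new inputs, since the final model is the initial constant plus the sum of all scaled tree outputs.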

###[Q7.] Steps to Build the Mathematical Intuition of Gradient Boosting:
#####[Ans]

1. Initialize: Start with a constant prediction that minimizes the loss (e.g., the mean for squared error or the median for absolute error).
2. Calculate Residuals: Measure the difference between the true values and current predictions. In general this is the negative gradient of the loss with respect to the current predictions.
3. Fit Weak Learner: Train a weak model to predict the residuals.
4. Update Predictions: Add the weak learner’s predictions to the current predictions, scaled by the learning rate.
5. Optimize: Repeat steps 2 to 4. Each round is one gradient-descent step on the loss, so the loss shrinks iteratively.
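
The same steps can be written as equations. The notation here is an assumption introduced for illustration: $F_m$ is the ensemble after $m$ rounds, $h_m$ the weak learner fitted in round $m$, $\nu$ the learning rate, $M$ the number of rounds, and $L$ the loss.

$$
\begin{aligned}
F_0(x) &= \arg\min_{c} \sum_{i=1}^{n} L(y_i, c) \\
r_{im} &= -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}}, \qquad i = 1, \dots, n \\
h_m &\approx \text{weak learner fitted to } \{(x_i, r_{im})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu \, h_m(x), \qquad m = 1, \dots, M
\end{aligned}
$$

For squared error, $L(y, F) = \tfrac{1}{2}(y - F)^2$, these reduce to $F_0(x) = \bar{y}$ and $r_{im} = y_i - F_{m-1}(x_i)$, so the pseudo-residuals are the ordinary residuals.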