In [None]:
Q1. What is Gradient Boosting Regression?

In [None]:
Answer :
    
Gradient Boosting Regression is a machine learning technique that belongs to the ensemble learning family. It is used for both
classification and regression tasks. The basic idea behind gradient boosting is to combine multiple weak learners (usually decision 
trees) to create a strong predictive model.

Here's a brief overview of how gradient boosting regression works:
1. Base Learners (Weak Models): Gradient boosting starts with an initial simple model, often a shallow decision tree. This model is
called a weak learner because it might not perform well on its own.

2. Training Iteratively: The algorithm is trained iteratively. In each iteration, a new weak learner is added to the ensemble to 
correct the errors made by the existing ensemble. The new learner is trained on the residuals (the differences between the actual
and predicted values) of the previous ensemble.

3. Gradient Descent Optimization: The term "gradient" in gradient boosting refers to the use of gradient descent optimization to 
minimize the loss function. The algorithm minimizes the loss by moving in the direction of steepest decrease of the loss function 
with respect to the predictions.

4. Combining Weak Models: The predictions from all weak learners are combined to form the final prediction. The final model is a 
weighted sum of the predictions from all the weak learners.

In [None]:
Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an 
example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and
R-squared.

In [3]:
from sklearn.datasets import make_regression

In [4]:
X, Y = make_regression(n_samples = 1000, n_features = 4, n_informative = 2, random_state = 42, shuffle = True)

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
x_train, x_test, y_train, y_test = train_test_split(X,Y, test_size = 0.25, random_state = 42)

In [8]:
from sklearn.ensemble import GradientBoostingRegressor

In [9]:
classifier1 = GradientBoostingRegressor()

In [10]:
classifier1.fit(x_train, y_train)

In [11]:
y_pred = classifier1.predict(x_test)

In [12]:
from sklearn.metrics import r2_score

In [13]:
print("Accuracy : " , r2_score(y_pred, y_test))

Accuracy :  0.9935588768315411


In [None]:
Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of 
the model. Use grid search or random search to find the best hyperparameters.

In [14]:
from sklearn.model_selection import GridSearchCV

In [15]:
param_grid = { 
               'learning_rate' : [0.01, 0.1, 1],
                'max_depth' : [1,2,3],
                'min_weight_fraction_leaf' : [0,0.1,0.2,0.3] }
               

In [16]:
classifier2 = GradientBoostingRegressor()

In [17]:
grid = GridSearchCV(classifier2, param_grid = param_grid, refit = True, cv=5, verbose = 3)

In [18]:
grid.fit(x_train, y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV 1/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0;, score=0.627 total time=   0.1s
[CV 2/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0;, score=0.621 total time=   0.1s
[CV 3/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0;, score=0.592 total time=   0.1s
[CV 4/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0;, score=0.605 total time=   0.1s
[CV 5/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0;, score=0.663 total time=   0.1s
[CV 1/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0.1;, score=0.627 total time=   0.1s
[CV 2/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0.1;, score=0.621 total time=   0.1s
[CV 3/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0.1;, score=0.592 total time=   0.1s
[CV 4/5] END learning_rate=0.01, max_depth=1, min_weight_fraction_leaf=0.1;, score=0

In [19]:
grid.best_params_

{'learning_rate': 0.1, 'max_depth': 3, 'min_weight_fraction_leaf': 0}

In [20]:
y_pred2 = grid.predict(x_test)

In [21]:
print("Accuracy : " , r2_score(y_test, y_pred2))

Accuracy :  0.9938086841547614


In [None]:
Q4. What is a weak learner in Gradient Boosting?

In [None]:
Answer : In the context of gradient boosting, a weak learner refers to a model that performs slightly better than random chance on
a given task. It is a model that is better than random guessing but may still make considerable errors. Weak learners are typically
simple models, often shallow decision trees or linear models.

The concept of using weak learners in ensemble methods like gradient boosting is based on the idea that by combining multiple weak
learners, their individual weaknesses can be compensated for, leading to a strong and robust predictive model. Each weak learner is 
trained to focus on the mistakes or residuals of the combined model built in the previous iteration.

In the context of gradient boosting regression:

1. First Iteration: The initial weak learner is fit to the data, and its predictions are combined to form the initial ensemble.

2. Subsequent Iterations: In each subsequent iteration, a new weak learner is added to the ensemble. This new learner is trained on
the residuals (the differences between the actual and predicted values) of the current ensemble. The idea is to correct the errors 
made by the existing model.

3. Cumulative Improvement: As more weak learners are added, the overall model becomes a strong learner capable of capturing complex 
patterns in the data.

In [None]:
Q5. What is the intuition behind the Gradient Boosting algorithm?

In [None]:
Answer : 
    The intuition behind the Gradient Boosting algorithm lies in the iterative improvement of a model's predictive performance by
    sequentially combining weak learners. Here's a step-by-step intuition for understanding how Gradient Boosting works:

Start with a Weak Model:

Begin with a simple model, often a shallow decision tree, as the first weak learner.
This initial model might perform poorly on the task but is better than random chance.

Focus on Errors (Residuals):
Identify the differences (residuals) between the actual target values and the predictions of the weak learner.
These residuals represent the errors made by the current model.

Train a New Model to Correct Errors:

Train a new weak learner on the residuals of the previous model.
The new model is tasked with capturing and correcting the errors made by the existing ensemble.
Combine Models Additively:

Combine the predictions of all weak learners additively.
Each new model contributes to the overall prediction, with a weight determined by its performance in reducing the residuals.
Gradient Descent Optimization:

Use gradient descent optimization to find the direction in the feature space that minimizes the loss function.
The algorithm iteratively moves towards the optimal direction, improving the model's performance.
Iterative Refinement:

Repeat the process iteratively, adding new weak learners to the ensemble in each iteration.
Each new learner corrects the errors of the existing ensemble, gradually improving the model's accuracy.
Stop at a Predetermined Number of Iterations or Criteria:

The process continues until a specified number of weak learners are added or a certain criterion is met.
Common criteria include achieving a satisfactory level of performance or avoiding overfitting.

In [None]:
Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

In [None]:
Answer : 
    The Gradient Boosting algorithm builds an ensemble of weak learners in an iterative fashion. Here's a step-by-step explanation
    of how the ensemble is constructed:

1. Initialize with a Weak Learner:
- The process starts with an initial weak learner, often a shallow decision tree.
- The initial model is a simple approximation of the true relationship in the data.

2. Calculate Residuals:
- Calculate the residuals by finding the differences between the actual target values and the predictions of the current ensemble 
(which consists only of the initial weak learner in the first iteration).

3. Train a New Weak Learner on Residuals:
- Train a new weak learner (usually another decision tree) on the residuals.
- The new learner's task is to capture and correct the errors made by the current ensemble.

4. Compute Weighted Sum of Predictions:
- Combine the predictions of all weak learners additively, with each learner's contribution weighted by a factor determined during
training.
- The weights are determined by an optimization algorithm, often using gradient descent.

5. Update Residuals:
- Calculate the residuals again, this time considering the combined predictions of the current ensemble.
- These residuals represent the errors that the next weak learner will try to correct.

6. Iterative Process:
- Repeat steps 3 to 5 for a predetermined number of iterations or until a stopping criterion is met.
- In each iteration, a new weak learner is trained to correct the errors of the current ensemble.

7. Combine All Weak Learners:
- The final prediction is the sum of the predictions from all weak learners, each weighted by its contribution to the overall model.
- The combination of these weak learners results in a strong predictive model.

8. Regularization (Optional):
- Gradient Boosting algorithms may include regularization techniques to prevent overfitting. Regularization terms, such as 
shrinkage or depth constraints on trees, can be incorporated.

In [None]:
Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

In [None]:
Answer :
Constructing the mathematical intuition of the Gradient Boosting algorithm involves understanding the underlying principles of how 
the algorithm minimizes a loss function using gradient descent. Here are the steps involved in the mathematical intuition of Gradient
Boosting:

1. Define the Objective Function (Loss Function):
Start with a loss function that measures the difference between the actual target values and the predicted values of the model.
Common loss functions include mean squared error (for regression) or log loss (for classification).

2. Initialize with a Simple Model:
- Begin with an initial weak learner (often a shallow decision tree).
- The initial model is a simple approximation of the true relationship, and its predictions are denoted as Fo(x) for the i-th 
obeservation.

3. Compute Residuals:
- Calculate the residuals by finding the differences between the actual target values(yi) and the predictions of the current model 
(Fm(x), where m is the iteration.)

4. Train a New Weak Learner on Residuals:
- Train a new weak learner to predict the residuals. The goal is to find a model, hm(x), that minimizes the loss when applied to the
residuals (yi - Fm(x)).

5. Update Model:
- Update the model by adding the new weak learner's predictions to the current model:  Fm+1(x) = Fm(x) + α⋅hm(x).
- The parameter α is the learning rate, controlling the step size in the gradient descent process.

6. Compute New Residuals:
- Calculate the new residuals based on the updated model: r_m+1 = yi - Fm+1(x). 

7. Iterative Process:
- Repeat steps 4 to 6 for a predetermined number of iterations or until a stopping criterion is met.
- In each iteration, a new weak learner is trained to predict the residuals, and its predictions are added to the current model.

8. Final Model:
- The final model is the sum of all weak learners : F(x) = Fo(x) + α⋅h1(x) + α⋅h2(x) + .... + α⋅hm(x).

9. Regularization (Optional):
- Regularization terms can be added to the objective function to prevent overfitting. For example, a regularization term might
penalize complex models.

10. Optimization (Gradient Descent):
- The optimization process involves finding the best parameters (e.g., tree structure and leaf values) for each weak learner to
minimize the overall loss function.
- Gradient descent is used to update the parameters in the direction that minimizes the loss.