## Q1. What is Gradient Boosting Regression?
"Gradient Boosting Regression is an ensemble technique that builds a strong regressor by sequentially adding weak models, usually decision trees, where each new model tries to minimize the residual errors of the combined previous models."

🔶 Core Idea (Go Deep Here):
“At each step, the algorithm doesn’t fit the raw targets; it fits the negative gradient of the loss function with respect to the current predictions. That is, it learns in the direction in which the loss (such as Mean Squared Error) decreases fastest.”

✅ Why it's called "Gradient":
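For squared-error loss, the residuals (actual minus predicted) are, up to a constant factor, the negative gradients of the loss function with respect to the current predictions. For other losses the negative gradient plays the same role and is often called a pseudo-residual.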

So, we’re using gradient descent in function space — each model is a step toward minimizing the loss.

So instead of minimizing error by brute force, Gradient Boosting follows the negative gradient of the loss function, which points in the direction of steepest descent and tells us how to change the predictions to reduce the loss fastest. That is the core idea: combining weak learners using gradient-based optimization.
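The claim that residuals are negative gradients can be checked numerically. The sketch below assumes the loss is written as ½(y − F)² and uses made-up values for `y` and `pred`; it compares the residuals with a finite-difference estimate of the negative gradient.

```python
import numpy as np

def half_squared_error(y, pred):
    """Per-sample loss L(y, F) = 0.5 * (y - F)^2."""
    return 0.5 * (y - pred) ** 2

# Illustrative values only (not taken from a real dataset).
y = np.array([55.0, 48.0, 50.0])
pred = np.array([50.0, 50.0, 50.0])

# Central finite-difference estimate of dL/dF for each sample.
eps = 1e-6
grad = (half_squared_error(y, pred + eps) - half_squared_error(y, pred - eps)) / (2 * eps)

residuals = y - pred
print("residuals        :", residuals)
print("negative gradient:", -grad)
```

The two printed arrays should agree up to floating-point error, which is why fitting residuals and fitting negative gradients are the same thing for this loss.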

## Q2. What is a weak learner in Gradient Boosting? 

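A weak learner is a model that performs only slightly better than a trivial baseline: random guessing in classification, or always predicting the mean in regression.

In Gradient Boosting, the most commonly used weak learner is a shallow decision tree. The extreme case, a tree with a single split, is called a decision stump; in practice slightly deeper trees are common (scikit-learn's `GradientBoostingRegressor` defaults to `max_depth=3`).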


 Example:
Suppose you have a regression task.

The first tree predicts the average (say ₹50L), but there's a sample with true value ₹55L.

The next tree (weak learner) learns to predict a part of this 5L error — say 1L.

Over time, combining many such weak predictions gets you very close to the actual target.

🔄 Summary:
✅ Weak learner = simple model (e.g., shallow tree)

✅ Used to learn from residuals or gradients

✅ When added together = strong learner
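To make "weak learner" concrete, here is a small sketch of a decision stump fitted with scikit-learn. The sine-shaped data is an assumption chosen only for illustration; the point is that one split gives two leaves and a fit that beats the mean but is far from perfect.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

# max_depth=1 allows exactly one split, i.e. a decision stump.
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

n_leaves = stump.get_n_leaves()
r2 = stump.score(X, y)  # R^2 on the training data; 0 would match "always predict the mean"
print("leaves:", n_leaves)
print("training R^2:", r2)
```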

## Q3. What is the intuition behind the Gradient Boosting algorithm?

The intuition behind gradient boosting is to combine the boosting technique with gradient-based optimization. In this technique the weak learners are not fitted on the raw target values; they are fitted on the negative gradient of the loss function, so that the next weak learner learns the direction and magnitude of the current error. Adding learners one after another is then a form of gradient descent that moves the ensemble toward a minimum of the loss (the global minimum when the loss is convex, as with squared error).
Every next learner in the sequence is fitted on the gradient of the ensemble built so far, so that the remaining errors/residuals are reduced. When such weak learners are combined, the result is a stable, robust model.

Gradient Boosting is about fitting weak learners not on the data directly but on the negative gradients of a loss function, guiding each model to reduce the error in the most efficient direction.

Gradient Boosting combines the idea of boosting — where models are built sequentially to correct errors of previous ones — with gradient descent optimization.

Instead of training weak learners on the raw target values, we train each learner on the negative gradient of the loss function, which represents the direction and magnitude of the error.

This tells the next model how to adjust the prediction to reduce the error most effectively.

Each subsequent learner corrects the mistakes of the combined ensemble so far. Over many iterations, the model gradually improves and converges toward a strong, stable predictor that minimizes the overall loss.
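The phrase "direction and magnitude of the error" depends on the loss. As a hedged illustration (the function names and the sample values are hypothetical, not from the text), the sketch below computes the negative gradient for squared error and, for contrast, for absolute error, where only the direction survives.

```python
import numpy as np

def negative_gradient_squared(y, pred):
    # L = 0.5 * (y - F)^2  ->  -dL/dF = y - F (direction and magnitude)
    return y - pred

def negative_gradient_absolute(y, pred):
    # L = |y - F|  ->  -dL/dF = sign(y - F) (direction only)
    return np.sign(y - pred)

# Illustrative values only.
y = np.array([55.0, 48.0, 50.0])
pred = np.array([50.0, 50.0, 50.0])

g_sq = negative_gradient_squared(y, pred)    # values 5, -2, 0
g_abs = negative_gradient_absolute(y, pred)  # values 1, -1, 0
print(g_sq)
print(g_abs)
```

In both cases the next weak learner is trained on these values rather than on `y` itself.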

## Q4. How does Gradient Boosting algorithm build an ensemble of weak learners?
Core Idea:
Gradient Boosting builds an ensemble sequentially, where each new weak learner (usually a decision tree) is trained to correct the errors (residuals or gradients) made by the current model.

🔁 Step-by-Step Intuition:
Start with a base model:
Typically, we initialize with a constant prediction (like the mean of the target values for regression).

Compute the loss (error):
For each data point, calculate the difference between the actual and predicted values (actual minus predicted). This is the residual.

Fit a weak learner to the residuals:
Train a shallow decision tree to predict these residuals (or, more precisely, the negative gradient of the loss function).

Update the model:
Add the new learner's predictions to the overall model, scaled by a learning rate $\eta$:

$$F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)$$

Repeat:
Continue this process for a fixed number of iterations or until the error stops improving.
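The steps above can be sketched from scratch in a few lines. This is a minimal illustration for squared-error loss only, not how scikit-learn implements it; the function names `fit_gradient_boosting` and `predict_gradient_boosting`, the synthetic data, and the default settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Minimal squared-error boosting; returns (initial constant, list of trees)."""
    f0 = float(np.mean(y))                 # start with a constant prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred               # negative gradient for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)             # weak learner fits the residuals
        pred = pred + learning_rate * tree.predict(X)  # F_m = F_{m-1} + eta * h_m
        trees.append(tree)
    return f0, trees

def predict_gradient_boosting(f0, trees, X, learning_rate=0.1):
    """Sum the constant and the scaled tree outputs (same learning rate as in fitting)."""
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred

# Synthetic data, assumed for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

f0, trees = fit_gradient_boosting(X, y)
mse_start = np.mean((y - f0) ** 2)
mse_end = np.mean((y - predict_gradient_boosting(f0, trees, X)) ** 2)
print("training MSE, constant model:", mse_start)
print("training MSE, after boosting:", mse_end)
```

A fixed number of rounds is used here for simplicity; stopping when a validation error stops improving would be the other option mentioned in the last step.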

📦 Example:
Let's say you are predicting house prices.

The first model predicts ₹50L for every house (mean value).

You calculate errors: one house’s true price is ₹55L → residual = ₹5L.

The next tree learns to predict this ₹5L gap.

Add a fraction of it (say 10%): new prediction = ₹50L + 0.1 × ₹5L = ₹50.5L.

This continues for all data points and for many rounds.
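The arithmetic in this example can be replayed for a few rounds. The sketch assumes, for simplicity, that each tree predicts the current residual of this one house exactly, which a real tree would only approximate.

```python
true_price = 55.0      # in lakh rupees
prediction = 50.0      # the initial mean prediction
learning_rate = 0.1

history = []
for round_number in range(1, 4):
    residual = true_price - prediction
    # Idealized assumption: the tree's output equals the residual.
    prediction += learning_rate * residual
    history.append(round(prediction, 3))

print(history)  # approximately [50.5, 50.95, 51.355]
```

Each round closes 10% of the remaining gap, so the prediction approaches ₹55L gradually rather than in one jump.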

🔑 Key Points to Say in an Interview:
"Each weak learner is trained to fit the negative gradient of the loss function, so adding it to the ensemble reduces the loss."

"The ensemble is built stage-by-stage, and each stage improves upon the last."

"Eventually, the model converges to a strong predictor by minimizing the overall loss."



## Q5. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

1. Build a base model. For squared-error loss it usually outputs the mean of the target.
2. Calculate the negative gradient of the loss function with respect to the current predictions, i.e. the error together with its direction (sign).
3. Fit the next model on this negative gradient. The next model in the sequence (a decision tree regressor) does not predict the actual value but the gradient/error, which lets the ensemble reduce the error by a gradient-descent step.
    To get back the value rather than the error:
        Value = base model prediction + Alpha * (predicted error), where Alpha is the learning rate
        
4. Repeat the process for a fixed number of rounds or until the error stops improving.

Gradient Boosting fits each new weak learner on the negative gradient of the loss function, updating predictions gradually to minimize the overall error using gradient descent in function space.
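Written out as equations, the steps look roughly as follows. This is a simplified sketch: $L$ is a generic differentiable loss, $n$ is the number of training samples, $\eta$ is the learning rate (called Alpha in the step list), and the per-leaf line search used in some formulations is omitted.

$$F_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$

$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n$$

$$h_m = \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{im} - h(x_i)\bigr)^2$$

$$F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)$$

For the squared-error loss $L(y, F) = \tfrac{1}{2}(y - F)^2$, the first line gives the mean of the targets and the second gives $r_{im} = y_i - F_{m-1}(x_i)$, the ordinary residual.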

## Q6. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.

In [1]:
from sklearn.datasets import make_regression

In [3]:
X , y = make_regression(n_samples=1000 , n_features=4 ,n_informative=3 , random_state=42 )

In [6]:
from sklearn.model_selection import train_test_split

X_train , X_test , y_train , y_test = train_test_split(X , y , test_size=0.33 , random_state=42)

In [8]:
from sklearn.ensemble import GradientBoostingRegressor

GBR=GradientBoostingRegressor()

In [9]:
param_grid = {
    'learning_rate': [0.01, 0.05, 0.1, 0.2],     # How fast the model learns (lower = more robust)
    'n_estimators': [100, 200, 300, 500],        # Number of boosting rounds (trees)
    'max_depth': [3, 5, 7, 10]                   # Depth of each tree (controls model complexity)
}

In [10]:
from sklearn.model_selection import GridSearchCV

gscv=GridSearchCV(GBR , param_grid=param_grid , refit=True , cv=3)

In [12]:
gscv.fit(X_train , y_train)

In [15]:
gscv.best_params_

{'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 500}

In [16]:
gscv.best_score_

0.991503589129462

In [17]:
model=GradientBoostingRegressor(learning_rate=0.1 , max_depth=3, n_estimators=500)

In [18]:
model.fit(X_train , y_train)

In [20]:
y_pred=model.predict(X_test)

In [23]:
from sklearn.metrics import r2_score , mean_absolute_error , mean_squared_error
import numpy as np

In [None]:
print(r2_score(y_test, y_pred))                      # R^2 on the held-out test set
print(mean_absolute_error(y_test, y_pred))           # MAE
print(mean_squared_error(y_test, y_pred))            # MSE
print(np.sqrt(mean_squared_error(y_test, y_pred)))   # RMSE = sqrt(MSE)
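The question also mentions random search, which the cells above do not cover. A minimal sketch with `RandomizedSearchCV` might look like this; the sampling ranges, `n_iter=5`, and the fixed `random_state` values are assumptions chosen to keep the run short, not tuned recommendations.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Same synthetic data and split as in the grid-search cells.
X, y = make_regression(n_samples=1000, n_features=4, n_informative=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Distributions to sample from (ranges are illustrative assumptions).
param_distributions = {
    "learning_rate": uniform(0.01, 0.19),  # uniform on [0.01, 0.20]
    "n_estimators": randint(100, 301),     # integers 100..300
    "max_depth": randint(2, 6),            # integers 2..5
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled configurations
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

test_r2 = search.best_estimator_.score(X_test, y_test)  # R^2 on the held-out test set
print(search.best_params_)
print(test_r2)
```

Random search tries a fixed number of sampled configurations instead of every combination, so its cost is set by `n_iter` rather than by the size of the grid.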