Let's walk step-by-step through the **iterative process of Gradient Boosting for Regression**.

---

## Iterative Process of Gradient Boosting (Regression with MSE)

---

### **Step 0: Initialize with a constant prediction**

We start by predicting a constant for all records, usually the **mean** of the target variable $y$:

$$
\hat{y}^{(0)} = \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
$$

This is our **initial model** — the “zeroth” iteration.
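
As a minimal sketch of Step 0, assuming a small made-up list of targets (the values in `y` are hypothetical and only there for illustration), the initial model is just the mean repeated for every record. The mean is used because it is the constant that minimizes the squared error.

```python
# Hypothetical toy targets, used only for illustration.
y = [3.0, 5.0, 7.0, 9.0]

# Step 0: the initial prediction is the mean of the targets.
y_hat_0 = sum(y) / len(y)

# Every record starts with the same constant prediction.
predictions = [y_hat_0] * len(y)
print(predictions)  # [6.0, 6.0, 6.0, 6.0]
```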

---

### 🔁 **Now, for each boosting round $m = 1, 2, \dots, M$:**

---

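### **Step 1: Compute the pseudo-residuals**

The pseudo-residual is the negative gradient of the loss with respect to the current prediction. Since we’re using **MSE** (written as $\tfrac{1}{2}(y_i - \hat{y}_i)^2$ per point, so the factor of 2 cancels), it is simply the ordinary residual:

$$
r_i^{(m)} = y_i - \hat{y}_i^{(m-1)}
$$

- $r_i^{(m)}$ tells us how far off the previous model’s prediction was for point $i$, and in which direction.
- This is the **target** for the next regression tree.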
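
Here is a small sketch of the residual computation. The helper name `pseudo_residuals` and the toy numbers are assumptions for illustration; the previous predictions are taken to be a constant 6.0, the mean of these targets.

```python
def pseudo_residuals(y, y_hat_prev):
    """Negative gradient of the squared-error loss: actual minus predicted."""
    return [yi - pi for yi, pi in zip(y, y_hat_prev)]

# Hypothetical targets and the previous round's predictions.
y = [3.0, 5.0, 7.0, 9.0]
y_hat_prev = [6.0, 6.0, 6.0, 6.0]

residuals = pseudo_residuals(y, y_hat_prev)
print(residuals)  # [-3.0, -1.0, 1.0, 3.0]
```

A negative residual means the previous model predicted too high for that point; a positive one means it predicted too low.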

---

### **Step 2: Fit a regression tree**

Train a small regression tree $h_m(x)$ on the residuals $r_i^{(m)}$ using the input features $x$.

- Each leaf outputs a **correction value**: how much to adjust the prediction in that region of the feature space. Under squared error, this value is the mean of the residuals that fall in the leaf.
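
To make this concrete, here is a rough sketch of the smallest possible tree: a depth-1 "stump" on a single numeric feature. The functions `fit_stump` and `predict_stump` are hypothetical helpers written for this illustration, not a library API, and real implementations handle many features and deeper trees.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r on a 1-D feature x.

    Returns (threshold, left_value, right_value); points with
    x <= threshold fall in the left leaf.
    """
    xs = sorted(set(x))
    mean_r = sum(r) / len(r)
    # Fallback if no split is possible: one leaf holding the overall mean.
    best = (float("inf"), xs[0], mean_r, mean_r)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        # Each leaf predicts the mean residual of the points it contains.
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((ri - left_value) ** 2 for ri in left)
               + sum((ri - right_value) ** 2 for ri in right))
        if sse < best[0]:
            best = (sse, t, left_value, right_value)
    return best[1:]


def predict_stump(stump, xi):
    """Return the leaf value for a single feature value xi."""
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


# Hypothetical feature values and residuals.
x = [1.0, 2.0, 3.0, 4.0]
r = [-3.0, -1.0, 1.0, 3.0]

stump = fit_stump(x, r)
print(stump)  # (2.5, -2.0, 2.0)
```

The stump splits at 2.5 and each leaf stores the mean residual of its points: -2.0 on the left and 2.0 on the right.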

---

### **Step 3: Update the model**

Now update your prediction as:

$$
\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \eta \cdot h_m(x_i)
$$

- $\eta$ is the **learning rate** (typically between 0.01 and 0.3)
- This makes each update **conservative**: every tree applies only a fraction of its correction, which helps reduce overfitting
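
A quick sketch of one update, assuming the tree's outputs for four points are already known. The numbers are hypothetical, and `eta = 0.25` is picked only so the arithmetic is exact in floating point.

```python
def mse(y, preds):
    """Mean squared error between targets and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

eta = 0.25                             # assumed learning rate
y = [3.0, 5.0, 7.0, 9.0]               # hypothetical targets
y_hat_prev = [6.0, 6.0, 6.0, 6.0]      # predictions from the previous round
tree_output = [-2.0, -2.0, 2.0, 2.0]   # hypothetical h_m(x_i) for each point

# Step 3: move each prediction a fraction eta of the way the tree suggests.
y_hat_new = [p + eta * h for p, h in zip(y_hat_prev, tree_output)]

print(y_hat_new)           # [5.5, 5.5, 6.5, 6.5]
print(mse(y, y_hat_prev))  # 5.0
print(mse(y, y_hat_new))   # 3.25
```

The error drops, but only partway, because the update applies a quarter of the tree's suggested correction.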

---

### Repeat Steps 1–3 for $M$ rounds

Each time:
- Compute new residuals
- Fit new tree
- Add the new tree's output to the prediction
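
Putting the loop together, here is a minimal from-scratch sketch. It assumes a single numeric feature and uses depth-1 stumps as the weak learner. The names `fit_stump`, `predict_stump`, and `gradient_boost` are hypothetical helpers for this illustration, and the data is made up.

```python
def mse(y, preds):
    """Mean squared error between targets and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def fit_stump(x, r):
    """Depth-1 regression tree on a 1-D feature: (threshold, left, right)."""
    xs = sorted(set(x))
    mean_r = sum(r) / len(r)
    best = (float("inf"), xs[0], mean_r, mean_r)  # fallback: a single leaf
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lv) ** 2 for ri in left)
               + sum((ri - rv) ** 2 for ri in right))
        if sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def gradient_boost(x, y, n_rounds, eta):
    """Run Steps 0-3 and return the initial value, the stumps, and MSE per round."""
    f0 = sum(y) / len(y)                      # Step 0: constant prediction
    preds = [f0] * len(y)
    stumps, mse_history = [], [mse(y, preds)]
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]   # Step 1
        stump = fit_stump(x, residuals)                     # Step 2
        preds = [pi + eta * predict_stump(stump, xi)        # Step 3
                 for xi, pi in zip(x, preds)]
        stumps.append(stump)
        mse_history.append(mse(y, preds))
    return f0, stumps, mse_history


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.5, 3.0, 3.5, 5.0, 6.5]

f0, stumps, mse_history = gradient_boost(x, y, n_rounds=20, eta=0.25)
print(mse_history[0], mse_history[-1])  # training error before and after boosting
```

On the training data the error should shrink from round to round, since each stump is fit to whatever the previous rounds left unexplained.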

---

### Final model:

After $M$ rounds, your final prediction is:

$$
\hat{y}(x) = \hat{y}^{(0)} + \eta \cdot \sum_{m=1}^{M} h_m(x)
$$

Each $h_m(x)$ is a tiny regression tree learning to fix the mistakes of the previous trees.
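
As a small sketch of how the final formula is evaluated, assume each tree is a one-split stump stored as a `(threshold, left_value, right_value)` tuple. That representation, the `predict` helper, and the two hand-written stumps are all assumptions for illustration.

```python
def predict_stump(stump, xi):
    """Leaf value of a one-split tree for a single feature value xi."""
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def predict(f0, stumps, eta, xi):
    """Final model: initial constant plus the scaled sum of all tree outputs."""
    return f0 + eta * sum(predict_stump(s, xi) for s in stumps)


f0 = 6.0     # hypothetical initial prediction (mean of the targets)
eta = 0.25   # assumed learning rate
# Two hypothetical stumps, as if learned in rounds 1 and 2.
stumps = [(2.5, -2.0, 2.0), (2.5, -1.5, 1.5)]

print(predict(f0, stumps, eta, 1.0))  # 5.125
print(predict(f0, stumps, eta, 4.0))  # 6.875
```

For `x = 1.0` both stumps route to their left leaf, so the prediction is 6.0 + 0.25 × (-2.0 - 1.5) = 5.125. This matches applying the Step 3 update once per round.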

---


