Below is the **iterative process of Gradient Boosting for Binary Classification**, where the loss function is **log-loss** (also called logistic loss).

---

## Iterative Process of Gradient Boosting (Classification with Log-Loss)

---

###  **Step 0: Initialize with a constant prediction (log-odds)**

We start by initializing the prediction with the same value for all records:

\[
\hat{y}^{(0)} = \log\left(\frac{\bar{p}}{1 - \bar{p}}\right)
\quad \text{where} \quad \bar{p} = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad y_i \in \{0, 1\}
\]

This is the log-odds of the positive class, and it is our initial prediction \( F_0(x) \) (the raw score, not a probability).
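As a minimal sketch of this initialization, the snippet below computes the constant starting score from a list of 0/1 labels. The function name `initial_log_odds` and the sample labels are illustrative assumptions, and the code assumes both classes appear in the data (otherwise the log-odds is infinite).

```python
import math

def initial_log_odds(y):
    """Return log(p / (1 - p)), where p is the fraction of positive labels."""
    p_bar = sum(y) / len(y)
    # Assumption: both classes are present; otherwise the log-odds is undefined.
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("labels must contain both classes")
    return math.log(p_bar / (1.0 - p_bar))

y = [1, 0, 1, 1, 0]        # illustrative labels: 3 of 5 are positive
f0 = initial_log_odds(y)   # log(0.6 / 0.4) = log(1.5)
```

Every sample starts from the same score `f0`, so the first round of boosting sees residuals that depend only on the label.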

---

###  **Now, for each boosting round \( m = 1, 2, ..., M \):**

---

###  **Step 1: Convert log-odds to probabilities**

Convert current raw predictions to probabilities using the **sigmoid function**:

\[
p_i^{(m-1)} = \frac{1}{1 + e^{-\hat{y}_i^{(m-1)}}}
\]

This gives us the model’s predicted probability for each sample.
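The conversion can be sketched with the standard library alone. The branch for negative inputs is an implementation choice made here to avoid overflow in `math.exp` for very negative scores; it is algebraically the same sigmoid.

```python
import math

def sigmoid(z):
    """Map a raw log-odds score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so that
    # exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

f0 = math.log(0.6 / 0.4)   # log-odds of a 60% base rate
p0 = sigmoid(f0)           # recovers the base rate of 0.6
```

Applying the sigmoid to the initial log-odds returns the base rate, which is a quick sanity check that Step 0 and Step 1 are inverses of each other.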

---

###  **Step 2: Compute the pseudo-residuals**

Now calculate the pseudo-residuals as the **negative gradient of log-loss** with respect to the current raw score:

\[
r_i^{(m)} = y_i - p_i^{(m-1)}
\]

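- If \( y_i = 1 \) and \( p_i = 0.6 \), then \( r_i = 0.4 \)
- If \( y_i = 0 \) and \( p_i = 0.6 \), then \( r_i = -0.6 \)
- The sign of the residual gives the direction in which the raw score should move, and its magnitude shrinks as the predicted probability approaches the true label.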
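For reference, the identity behind the pseudo-residual formula can be checked in a few lines. Write \( F_i \) for the raw score \( \hat{y}_i^{(m-1)} \) and \( p_i = \sigma(F_i) \) for its sigmoid; the shorthand \( L_i \), \( F_i \), and \( \sigma \) is introduced here only for this derivation. The per-sample log-loss and its gradient are:

\[
\begin{aligned}
L_i &= -\left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \qquad \frac{\partial p_i}{\partial F_i} = p_i (1 - p_i) \\
\frac{\partial L_i}{\partial F_i} &= -\left[ \frac{y_i}{p_i} - \frac{1 - y_i}{1 - p_i} \right] p_i (1 - p_i) = -\left[ y_i (1 - p_i) - (1 - y_i) p_i \right] = p_i - y_i \\
-\frac{\partial L_i}{\partial F_i} &= y_i - p_i
\end{aligned}
\]

So the negative gradient is the difference between the label and the current predicted probability.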

---

###  **Step 3: Fit a regression tree to residuals**

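Train a regression tree \( h_m(x) \) on the pseudo-residuals \( r_i^{(m)} \), using the input features \( x_i \).

- The tree partitions the input space into regions and predicts the mean pseudo-residual in each one; that value is the correction applied to the **log-odds** in Step 4.
- Many implementations instead recompute each leaf value with a Newton step, \( \sum r_i / \sum p_i (1 - p_i) \) over the samples in the leaf, so that the leaf output is scaled for log-odds space. The version described here keeps the plain tree output for simplicity.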
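To make the tree-fitting step concrete, here is a minimal sketch that uses a depth-1 tree (a stump) on a single feature, chosen by squared error. The helper names `fit_stump` and `predict_stump` and the toy data are assumptions for illustration; a real implementation would use a full regression tree over many features.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree on one feature by minimizing squared error.

    Returns (threshold, left_value, right_value): samples with x <= threshold
    receive left_value, all others receive right_value.
    """
    values = sorted(set(x))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv = sum(left) / len(left)     # leaf value = mean residual in the leaf
        rv = sum(right) / len(right)
        sse = (sum((ri - lv) ** 2 for ri in left)
               + sum((ri - rv) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:
        # All feature values are identical: fall back to a constant prediction.
        mean = sum(r) / len(r)
        return (values[0], mean, mean)
    return best[1:]

def predict_stump(stump, xi):
    t, lv, rv = stump
    return lv if xi <= t else rv

x = [1.0, 2.0, 3.0, 4.0]          # illustrative feature values
r = [-0.5, -0.5, 0.5, 0.5]        # illustrative pseudo-residuals
stump = fit_stump(x, r)           # splits between 2.0 and 3.0
```

On this toy data the stump separates the negative residuals from the positive ones exactly, so each leaf outputs the residual shared by its samples.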

---

###  **Step 4: Update the model**

Update predictions (still in log-odds space):

\[
\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \eta \cdot h_m(x_i)
\]

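- \( \eta \) is the **learning rate** (shrinkage), typically a small value in \( (0, 1] \)
- It scales down each tree's contribution: smaller values need more rounds to fit the data but make the model less prone to overfitting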
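A single update can be worked through numerically. The sketch below continues the earlier example of a positive sample predicted at 0.6, and assumes a learning rate of 0.1 and a tree whose output for this sample equals its residual of 0.4; both values are illustrative.

```python
import math

eta = 0.1                      # assumed learning rate
f_prev = math.log(0.6 / 0.4)   # raw score whose sigmoid is 0.6
tree_output = 0.4              # assumption: the tree reproduces y - p = 1 - 0.6

f_new = f_prev + eta * tree_output        # update in log-odds space
p_new = 1.0 / (1.0 + math.exp(-f_new))    # back to a probability
```

The raw score moves by only 0.04, so the probability rises from 0.6 to roughly 0.61. This shows how shrinkage spreads the correction over many rounds.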

---

###  Repeat Steps 1–4 for \( M \) rounds

Each tree pushes the prediction a little closer to the correct class.

---

###  Final prediction:

After M rounds, the final model is:

\[
F(x) = \hat{y}^{(0)} + \eta \cdot \sum_{m=1}^{M} h_m(x)
\]

Then, convert this raw score into probability using sigmoid:

\[
p(x) = \frac{1}{1 + e^{-F(x)}}
\]

You can then threshold this probability (e.g., predict the positive class when \( p(x) > 0.5 \)) for classification.
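Putting Steps 0 through 4 and the final prediction together, the following is a minimal end-to-end sketch rather than a production implementation. It assumes a single numeric feature with at least two distinct values, uses a depth-1 tree as the weak learner, and keeps the plain mean-residual leaf values. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, r):
    """Depth-1 regression tree on one feature: (threshold, left_mean, right_mean)."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lv) ** 2 for ri in left)
               + sum((ri - rv) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def fit_gradient_boosting(x, y, n_rounds=50, eta=0.1):
    p_bar = sum(y) / len(y)
    f0 = math.log(p_bar / (1.0 - p_bar))                    # Step 0: log-odds
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        probs = [sigmoid(s) for s in scores]                # Step 1: sigmoid
        residuals = [yi - pi for yi, pi in zip(y, probs)]   # Step 2: y - p
        t, lv, rv = fit_stump(x, residuals)                 # Step 3: fit tree
        scores = [s + eta * (lv if xi <= t else rv)         # Step 4: update
                  for s, xi in zip(scores, x)]
        stumps.append((t, lv, rv))
    return f0, eta, stumps

def predict_proba(model, xi):
    f0, eta, stumps = model
    # F(x) = f0 + eta * (sum of tree outputs), then sigmoid.
    score = f0 + eta * sum(lv if xi <= t else rv for t, lv, rv in stumps)
    return sigmoid(score)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # illustrative feature values
y = [0, 0, 0, 1, 1, 1]                # illustrative labels
model = fit_gradient_boosting(x, y, n_rounds=50, eta=0.1)
labels = [int(predict_proba(model, xi) > 0.5) for xi in x]
```

The model stores the learning rate alongside the trees so that prediction applies the same scaling used during training. On this separable toy data the thresholded predictions reproduce the training labels.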

---

###  Summary Table

| Step | Description |
|------|-------------|
| 0 | Initialize with log-odds |
| 1 | Convert log-odds to probabilities (sigmoid) |
| 2 | Compute pseudo-residuals = \( y - p \) |
| 3 | Train regression tree on residuals |
| 4 | Update prediction (add scaled tree output) |

---
