
# 📘 XGBoost Classification
---

# 1️⃣ Goal
Understand how **XGBoost solves classification problems step-by-step**.

Toy Problem:
Predict **placement (0/1)** using **CGPA**.

---

# 2️⃣ Prerequisites
You should know:
- Gradient Boosting
- Logistic Regression basics
- Log-Odds concept

---

# 3️⃣ Big Idea

XGBoost classification workflow = Gradient Boosting workflow

### Core Loop
1. Start with base prediction (log-odds)
2. Compute residuals
3. Train tree on residuals
4. Add scaled tree output
5. Repeat

Key difference:
> Trees are built using a **similarity score** computed from the residuals, not the entropy/Gini criteria of ordinary classification trees

---

# 4️⃣ Stage 1: Base Model

In regression → mean  
In classification → **log-odds**

### Log-Odds Formula
```
log(p / (1 - p))
```

Where:
p = probability of the positive class, and log is the natural logarithm

---

# 5️⃣ Compute Base Log-Odds

Assume:
- 3 positives
- 2 negatives

```
p = 3/5
log-odds = log(3/2) ≈ 0.405
```

So base prediction for all samples:
```
F₀(x) = 0.405
```
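
As a quick check of the arithmetic above, here is a minimal Python sketch. The variable names (`n_pos`, `n_neg`, `base_log_odds`) are illustrative only, and `math.log` is the natural logarithm.

```python
import math

# Class counts from the toy example: 3 placed, 2 not placed.
n_pos, n_neg = 3, 2

p = n_pos / (n_pos + n_neg)             # base rate of the positive class
base_log_odds = math.log(p / (1 - p))   # same value as log(n_pos / n_neg)

print(round(p, 3), round(base_log_odds, 3))  # 0.6 0.405
```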

---

# 6️⃣ Convert Log-Odds → Probability

Using sigmoid:

```
p = e^F / (1 + e^F)
```

For 0.405:
```
p ≈ 0.60
```

So the model predicts:
> 60% placement probability for everyone

Clearly a weak model, since it ignores CGPA entirely ❌
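
The conversion can be sketched in a few lines of Python. The helper name `sigmoid` is an illustrative choice; it uses the form 1 / (1 + e^(−F)), which is algebraically the same as e^F / (1 + e^F).

```python
import math

def sigmoid(f: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-f))

base_log_odds = math.log(3 / 2)   # base prediction from the toy example
base_prob = sigmoid(base_log_odds)

print(round(base_prob, 2))  # 0.6
```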

---

# 7️⃣ Residuals (Pseudo-Residuals)

Residual formula:
```
Residual = Actual − Predicted Probability
```

Example residuals:
- 1 − 0.6 = +0.4
- 0 − 0.6 = −0.6
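
A small sketch of the same computation over a whole dataset. The label list `y` is hypothetical: the notes only state that there are 3 positives and 2 negatives, so the ordering here is an assumption made for illustration.

```python
# Hypothetical labels (1 = placed, 0 = not placed): 3 positives, 2 negatives.
y = [0, 1, 0, 1, 1]
p0 = 0.6  # base probability predicted for every sample

# Pseudo-residual = actual label minus predicted probability.
residuals = [yi - p0 for yi in y]

print([round(r, 2) for r in residuals])  # [-0.6, 0.4, -0.6, 0.4, 0.4]
```

Because the base probability equals the share of positives, these residuals sum to zero (up to floating-point error).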

Residuals capture:
> The direction and size of the model's error

---

# 8️⃣ Build First XGBoost Tree

Train tree on:
```
Input: CGPA
Target: Residuals
```

But the tree is built differently from an ordinary decision tree.

---

# 9️⃣ Similarity Score (Classification Version)

### Formula
```
Similarity = (Σ residual)² / (Σ(p * (1 - p)) + λ)
```

Where:
- p = previous predicted probability
- λ = regularization (assume 0)
- the residuals are summed before squaring, so residuals of opposite sign cancel

This replaces:
- Gini
- Entropy
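
A minimal sketch of the score as a function, assuming the form that squares the sum of the residuals and adds λ to the denominator. The function name `similarity`, its parameter `lam`, and the residual values are illustrative assumptions.

```python
def similarity(residuals, probs, lam=0.0):
    """(sum of residuals)^2 / (sum of p * (1 - p) + lambda)."""
    numerator = sum(residuals) ** 2
    denominator = sum(p * (1 - p) for p in probs) + lam
    return numerator / denominator

# Hypothetical root node: 3 positives and 2 negatives, all predicted at p = 0.6.
root_residuals = [-0.6, 0.4, -0.6, 0.4, 0.4]
root_probs = [0.6] * 5
root_score = similarity(root_residuals, root_probs)

# A node whose residuals all point the same way scores much higher,
# and a larger lambda shrinks the score.
pure_score = similarity([0.4, 0.4], [0.6, 0.6])
pure_score_reg = similarity([0.4, 0.4], [0.6, 0.6], lam=1.0)

print(round(root_score, 6), round(pure_score, 3), round(pure_score_reg, 3))
```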

---

# 🔟 Root Node Similarity

At the root the residuals cancel out (3 × 0.4 + 2 × (−0.6) = 0) → score ≈ 0

This becomes the baseline.

Goal now:
> Split data to increase similarity score

---

# 1️⃣1️⃣ Find Split Points

Sort feature (CGPA).

Candidate splits = averages between adjacent values.

Example:
```
5.97, 6.67, 7.62, 8.87
```

We test ALL splits.
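
The candidate generation can be sketched as follows. The CGPA values are hypothetical: they are an assumption chosen so that the midpoints match the example values above when cut to two decimals.

```python
# Hypothetical CGPA values (not given in the notes), deliberately unsorted.
cgpa = [7.10, 5.70, 9.60, 6.25, 8.15]

# Sort the unique values, then take the midpoint of each adjacent pair.
values = sorted(set(cgpa))
candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

print([round(c, 3) for c in candidates])
```

With n distinct feature values there are n − 1 candidate thresholds, so this toy set yields four.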

---

# 1️⃣2️⃣ Evaluate Each Split

For each split:

1. Divide residuals
2. Compute similarity for left and right
3. Compute gain

---

# Gain Formula
```
Gain = LeftScore + RightScore − ParentScore
```

Choose split with **max gain**.
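
The search over splits might look like the sketch below. The dataset (`cgpa`, `placed`) is hypothetical, λ is taken as 0, and `similarity` is an illustrative helper, not a library function.

```python
def similarity(residuals, probs, lam=0.0):
    """(sum of residuals)^2 / (sum of p * (1 - p) + lambda)."""
    return sum(residuals) ** 2 / (sum(p * (1 - p) for p in probs) + lam)

# Hypothetical toy data, sorted by CGPA.
cgpa = [5.70, 6.25, 7.10, 8.15, 9.60]
placed = [0, 1, 0, 1, 1]
p0 = 0.6
residuals = [y - p0 for y in placed]
probs = [p0] * len(placed)

parent_score = similarity(residuals, probs)
gains = {}
for a, b in zip(cgpa, cgpa[1:]):
    threshold = (a + b) / 2
    left = [i for i, x in enumerate(cgpa) if x < threshold]
    right = [i for i, x in enumerate(cgpa) if x >= threshold]
    left_score = similarity([residuals[i] for i in left], [probs[i] for i in left])
    right_score = similarity([residuals[i] for i in right], [probs[i] for i in right])
    gains[threshold] = left_score + right_score - parent_score

best_threshold = max(gains, key=gains.get)
for threshold, gain in gains.items():
    print(round(threshold, 3), round(gain, 3))
print("best:", round(best_threshold, 3))
```

On this assumed data the largest gain falls at the threshold near 7.62.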

---

# 1️⃣3️⃣ Best Split Found

Example best split:
```
CGPA < 7.62
```

Tree structure:
```
        CGPA < 7.62
        /         \
  Residual A   Residual B
```

---

# 1️⃣4️⃣ Leaf Output Formula

The leaf output determines the tree's contribution.

### Formula
```
Leaf Output = Σ(residuals) / (Σ(p * (1 - p)) + λ)
```

Difference from similarity score:
- The sum of residuals in the numerator is not squared

---

# 1️⃣5️⃣ Example Leaf Outputs

Left leaf:
```
≈ -1.11
```

Right leaf:
```
≈ +1.67
```

These are in **log-odds space**.
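
A sketch of how such values can arise. It assumes the left leaf holds two non-placed students and one placed student, and the right leaf two placed students, all at the base probability 0.6. That composition is an assumption, but it is consistent with the numbers quoted above. The second value is 1.666..., so it reads as 1.66 or 1.67 depending on truncation or rounding.

```python
def leaf_output(residuals, probs, lam=0.0):
    """sum of residuals / (sum of p * (1 - p) + lambda)."""
    return sum(residuals) / (sum(p * (1 - p) for p in probs) + lam)

# Assumed leaf contents after splitting at CGPA < 7.62.
left_out = leaf_output([-0.6, 0.4, -0.6], [0.6] * 3)   # -0.8 / 0.72
right_out = leaf_output([0.4, 0.4], [0.6] * 2)         #  0.8 / 0.48

print(round(left_out, 2), round(right_out, 2))  # -1.11 1.67
```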

---

# 1️⃣6️⃣ Stage 2 Model

Combined model:
```
F(x) = Base Log-Odds + η × Tree Output
```

Where:
η = learning rate (e.g., 0.3)

---

# 1️⃣7️⃣ Example Prediction

If CGPA = 6:
- Goes to the left leaf (6 < 7.62)
- Output = -1.11

New log-odds:
```
0.405 + 0.3 × (-1.11) = 0.072
```

---

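# 1️⃣8️⃣ Convert Back to Probability

Apply sigmoid again:

```
p = e^0.072 / (1 + e^0.072) ≈ 0.518
```

So the probability dropped from:
```
0.60 → 0.52
```

The negative leaf output means the base model was over-predicting in this region, so the lower probability is an improvement ✅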
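
The update and the conversion back to a probability can be checked with a short sketch. The numbers are the rounded values used in this walkthrough, and the variable names are illustrative.

```python
import math

def sigmoid(f: float) -> float:
    return 1.0 / (1.0 + math.exp(-f))

base_log_odds = 0.405   # base prediction
eta = 0.3               # learning rate
left_leaf = -1.11       # output of the leaf that CGPA = 6 falls into

new_log_odds = base_log_odds + eta * left_leaf
new_prob = sigmoid(new_log_odds)

print(round(new_log_odds, 3), round(new_prob, 3))  # 0.072 0.518
```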

---

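# 1️⃣9️⃣ Compute New Residuals

```
New Residual = Actual − New Probability
```

Overall, residuals shrink toward 0 as trees are added, although an individual sample in a mixed leaf can temporarily move the other way.

This indicates:
> The model is fitting the training data better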

---

# 2️⃣0️⃣ Stage 3+

Repeat process:

1. Train new tree on new residuals
2. Add to model
3. Convert to probability
4. Compute new residuals

---

# Final Model Form

```
F(x) = Base + ηT₁ + ηT₂ + ηT₃ + ...
```

Prediction:
```
Probability = sigmoid(F(x))
```
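
Putting the stages together, here is a from-scratch sketch of the whole loop. It is not how the XGBoost library is implemented. It assumes depth-1 trees, λ = 0, an exhaustive split search, and a hypothetical five-student dataset; `fit_stump` and `predict_proba` are illustrative names.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def fit_stump(x, res, hess, lam=0.0):
    """Pick the max-gain threshold; return (threshold, left_out, right_out)."""
    def sim(idx):   # similarity score of the samples in idx
        return sum(res[i] for i in idx) ** 2 / (sum(hess[i] for i in idx) + lam)

    def out(idx):   # leaf output (log-odds) of the samples in idx
        return sum(res[i] for i in idx) / (sum(hess[i] for i in idx) + lam)

    everyone = list(range(len(x)))
    values = sorted(set(x))
    best_gain, best_tree = -math.inf, None
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2
        left = [i for i in everyone if x[i] < thr]
        right = [i for i in everyone if x[i] >= thr]
        gain = sim(left) + sim(right) - sim(everyone)
        if gain > best_gain:
            best_gain, best_tree = gain, (thr, out(left), out(right))
    return best_tree

# Hypothetical toy data (an assumption, not given in the notes).
cgpa = [5.70, 6.25, 7.10, 8.15, 9.60]
placed = [0, 1, 0, 1, 1]
eta, n_rounds = 0.3, 3

n_pos = sum(placed)
base = math.log(n_pos / (len(placed) - n_pos))   # base log-odds
F = [base] * len(cgpa)                           # current log-odds per sample
trees = []

for _ in range(n_rounds):
    p = [sigmoid(f) for f in F]
    res = [y - pi for y, pi in zip(placed, p)]   # pseudo-residuals
    hess = [pi * (1 - pi) for pi in p]           # p * (1 - p) terms
    thr, out_l, out_r = fit_stump(cgpa, res, hess)
    trees.append((thr, out_l, out_r))
    F = [f + eta * (out_l if xi < thr else out_r) for f, xi in zip(F, cgpa)]

def predict_proba(x_new):
    f = base + sum(eta * (l if x_new < t else r) for t, l, r in trees)
    return sigmoid(f)

print([(round(t, 3), round(l, 2), round(r, 2)) for t, l, r in trees])
print(round(predict_proba(6.0), 3), round(predict_proba(9.0), 3))
```

With λ = 0 a leaf whose probabilities approach 0 or 1 makes the denominator tiny, so a positive `lam` is the safer choice for longer runs.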

---

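# 2️⃣1️⃣ Key Differences vs Gradient Boosting

| Aspect | Gradient Boosting | XGBoost |
|--------|-------------------|---------|
| Tree split metric | Squared-error reduction on residuals | Similarity score and gain |
| Regularization | Mainly shrinkage and tree-size limits | Adds explicit λ (L2) and γ penalties |
| Speed | Moderate | Optimized (parallel split search) |
| Math | First-order (gradients) | Second-order (gradients and Hessians) |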
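
The "second-order" entry can be made concrete with a short derivation. This is a sketch in standard notation, where the symbols g, h and w are introduced here for illustration: g and h are the first and second derivatives of the log-loss with respect to the log-odds F, and w is the output of one leaf.

```latex
% Per-sample log-loss, with p the sigmoid of the log-odds F
\ell(y, F) = -\bigl[\, y \log p + (1 - y)\log(1 - p) \,\bigr],
\qquad p = \frac{e^{F}}{1 + e^{F}}

% Gradient and Hessian with respect to F
g = \frac{\partial \ell}{\partial F} = p - y = -\text{residual},
\qquad
h = \frac{\partial^{2} \ell}{\partial F^{2}} = p\,(1 - p)

% Second-order approximation of the loss change for a leaf with output w
\tilde{L}(w) = \sum_{i \in \text{leaf}} \Bigl( g_i\, w + \tfrac{1}{2}\, h_i\, w^{2} \Bigr)
             + \tfrac{1}{2}\, \lambda\, w^{2}

% Minimizing over w gives the leaf output
w^{*} = -\frac{\sum_i g_i}{\sum_i h_i + \lambda}
      = \frac{\sum_i \text{residual}_i}{\sum_i p_i (1 - p_i) + \lambda}

% The loss at the minimizer is minus one half of the similarity score
\tilde{L}(w^{*}) = -\frac{1}{2}\,
  \frac{\bigl(\sum_i g_i\bigr)^{2}}{\sum_i h_i + \lambda}
```

XGBoost's own gain carries the extra factor of ½ and subtracts a per-split penalty γ. These notes drop both, which leaves the ranking of candidate splits unchanged.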

---

# 2️⃣2️⃣ Important Formulas

### Log-Odds
```
log(p / (1 - p))
```

### Sigmoid
```
e^F / (1 + e^F)
```

### Similarity Score
```
(Σ residual)² / (Σ p(1 - p) + λ)
```

### Gain
```
Left + Right − Parent
```

### Leaf Output
```
Σ(residual) / (Σ p(1 - p) + λ)
```

---

# 2️⃣3️⃣ Intuition

- Base model predicts global probability
- Trees fix mistakes
- Each tree nudges predictions
- Residuals shrink toward 0

When residuals ≈ 0 on the training data:
> The model fits the training set closely (validate on held-out data to rule out overfitting)

---

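# 2️⃣4️⃣ Why Log-Odds?

Because:
- Works well with sigmoid: log-odds can take any real value, so tree outputs can simply be added and then mapped back to (0, 1)
- Stable gradients: the gradient of log-loss with respect to the log-odds is just p − y
- Smooth optimization: log-loss is smooth and convex in the log-odds

This is the same idea used in: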
- Logistic regression
- Gradient boosting classifier

---

# 2️⃣5️⃣ What Makes XGBoost Powerful

- Log-odds optimization
- Regularized tree building
- Similarity-based splits
- Learning rate shrinkage
- Additive modeling

---
