<a href="https://colab.research.google.com/github/adnanagbaria/MLcourse/blob/main/Lec8_boosingAlgorithms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Boosting algorithms
Agenda
* Gradient Boosting: Regression concept
* Gradient Boosting: Regression Calculation
* Gradient Boosting: Classification concept
* Gradient Boosting: Classification Calculation

# Gradient Boosting for Regression
Gradient Boosting Regression is an ensemble technique that builds a strong regressor by combining multiple weak regressors (typically decision trees), each trained to correct the errors of its predecessor.

Instead of predicting the target directly, each new model is fitted to the residuals (errors) of the combined previous models.

**Mathematical Form:**
Suppose you want to minimize a loss function $L(y, \hat{y})$:
1. Start with an initial model:
$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$

2. For each iteration $m = 1, 2, ..., M$:
  * Compute pseudo-residuals:
  $r_i^m = - \left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F=F_{m-1}}$
  * Fit a weak model $h_m(x)$ to $r_i^m$
  * Compute step size $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$
  * Update: $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$
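
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not scikit-learn's implementation. It assumes squared error loss, so the pseudo-residuals are ordinary residuals up to a constant factor. It also assumes a fixed shrinkage factor in place of the per-iteration step size $\gamma_m$. The function names, the synthetic sine data, and the hyperparameter values are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Illustrative gradient boosting for squared error (hypothetical helper)."""
    f0 = float(np.mean(y))                  # F_0: constant that minimizes squared error
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                # negative gradient of 1/2 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)              # weak learner h_m fitted to the residuals
        pred = pred + learning_rate * tree.predict(X)   # F_m = F_{m-1} + rate * h_m
        trees.append(tree)
    return f0, trees, learning_rate


def predict_gradient_boosting(model, X):
    """Sum the initial constant and the shrunken contribution of every tree."""
    f0, trees, learning_rate = model
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_gradient_boosting(X, y)
mse_start = np.mean((y - model[0]) ** 2)    # training error of F_0 alone
mse_end = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
print(f"training MSE: {mse_start:.3f} -> {mse_end:.3f}")
```

A fixed learning rate is used here because, for squared error, a tree fitted to the residuals already has leaf values on the right scale; shrinking each step is a regularization choice rather than part of the derivation.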

**Common Loss Functions:**
* Squared Error: $L = (y - \hat{y})^2$
* Absolute Error:  $L = |y - \hat{y}|$
* Huber Loss: quadratic for small errors and linear for large ones, so it is less sensitive to outliers than squared error
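
To see how the choice of loss changes what the next tree is asked to fit, the sketch below computes the negative gradient (the pseudo-residual) for each loss. The function names, the Huber threshold `delta=1.0`, and the sample values are assumptions made for illustration. The Huber loss is taken in its usual form, $\frac{1}{2}r^2$ for $|r| \le \delta$ and linear beyond.

```python
import numpy as np


def neg_grad_squared(y, pred):
    # L = (y - pred)^2  ->  -dL/dpred = 2 * (y - pred)
    return 2.0 * (y - pred)


def neg_grad_absolute(y, pred):
    # L = |y - pred|  ->  -dL/dpred = sign(y - pred)
    return np.sign(y - pred)


def neg_grad_huber(y, pred, delta=1.0):
    # Quadratic region: gradient is the residual; linear region: it is clipped to +/- delta
    r = y - pred
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))


y = np.array([2.0, 3.0, 2.5, 10.0])    # the last value plays the role of an outlier
pred = np.full_like(y, 2.5)            # current model predicts 2.5 everywhere

g_sq = neg_grad_squared(y, pred)
g_abs = neg_grad_absolute(y, pred)
g_hub = neg_grad_huber(y, pred)
print("squared :", g_sq)
print("absolute:", g_abs)
print("huber   :", g_hub)
```

The outlier dominates the squared error gradient, while the absolute and Huber gradients cap its influence.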

(Lec8: Slides 3 -- 23)

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Create a synthetic regression dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
# No random_state is passed to the split, so the score varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit 100 shallow trees, each scaled by the learning rate
reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
reg.fit(X_train, y_train)

# Evaluate: score() returns the coefficient of determination (R^2) on the test set
print("R2 Score:", reg.score(X_test, y_test))
```


R2 Score: 0.8863388281067902


**Advantages:**
* Handles non-linear relationships
* Customizable with loss functions
* Can resist overfitting when regularized properly (for example a small learning rate, shallow trees, or a limited number of estimators)
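
One way to do the tuning mentioned in the last point is to track validation error as trees are added. The sketch below uses `staged_predict`, which yields the ensemble's predictions after each boosting stage. The validation split, the seeds, and the choice of 300 estimators are assumptions for illustration, not values from the lecture.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

reg = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.1, max_depth=3, random_state=0
)
reg.fit(X_train, y_train)

# Validation MSE after 1, 2, ..., 300 trees
val_mse = [mean_squared_error(y_val, pred) for pred in reg.staged_predict(X_val)]
best_n = int(np.argmin(val_mse)) + 1    # stages are 1-indexed

print("best number of trees:", best_n)
print("validation MSE at best_n:", val_mse[best_n - 1])
print("validation MSE at 300 trees:", val_mse[-1])
```

If the error at `best_n` is clearly lower than at the final stage, the extra trees are fitting noise and the ensemble can be truncated.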


# Regression Calculation
Let's walk through a step-by-step numerical example of Gradient Boosting Regression using the squared error loss function.

**Dataset:**

| $x$ | $y$ |
| --- | --- |
| 1   | 2   |
| 2   | 3   |
| 3   | 2.5 |

**Step 1: Initial Prediction $F_0(x)$**
For squared error, the constant that minimizes the loss is the mean of $y$, so the initial prediction is:
$F_0(x) = \bar{y} = \frac{2 + 3 + 2.5}{3} = 2.5$
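
As a quick numerical check that the mean is the best constant, the sketch below compares the sum of squared errors over a grid of candidate constants. The grid range and spacing are arbitrary choices for illustration.

```python
import numpy as np

y = np.array([2.0, 3.0, 2.5])
f0 = y.mean()                                   # closed-form minimizer

# Brute-force check: sum of squared errors for candidate constants 2.00, 2.01, ..., 3.00
candidates = np.linspace(2.0, 3.0, 101)
sse = ((y[None, :] - candidates[:, None]) ** 2).sum(axis=1)
best = candidates[np.argmin(sse)]

print("mean of y:", f0)
print("best constant on the grid:", round(float(best), 2))
```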

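**Step 2: Compute Residuals (Negative Gradient)**
For squared error, the negative gradient is proportional to the ordinary residual (and equal to it when the loss is written as $\frac{1}{2}(y - \hat{y})^2$), so the pseudo-residuals are:

$r_i = y_i - F_0(x_i)$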

| $x$ | $y$ | $F_0(x)$   | Residual $r$ |
| --- | --- | ---------- | ------------ |
| 1   | 2   |   2.5      | -0.5         |
| 2   | 3   |   2.5      | 0.5          |
| 3   | 2.5 |   2.5      | 0.0          |
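
The residual column can be reproduced with a couple of NumPy operations. This is a small sketch of the table above; the variable names are illustrative.

```python
import numpy as np

y = np.array([2.0, 3.0, 2.5])
f0 = np.full_like(y, y.mean())     # F_0(x) = 2.5 for every sample
residuals = y - f0                 # r_i = y_i - F_0(x_i)

print(residuals)                   # [-0.5  0.5  0. ]
```

Because $F_0$ is the mean of $y$, the residuals always sum to zero at this stage.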

**Step 3: Fit a Simple Tree to Residuals**
Suppose the weak learner is a decision stump (a tree with a single split) fitted to the residuals:
* $x < 1.5 \Rightarrow -0.5$
* $x \ge 1.5 \Rightarrow 0.25$ (average of 0.5 and 0)

The stump's output is the prediction $h_1(x)$.
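
The same stump can be obtained by fitting a depth-1 regression tree to the residuals. The sketch below uses scikit-learn's `DecisionTreeRegressor` as a stand-in for the hand-built stump. That choice is an assumption for illustration, and the variable names are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0]])
residuals = np.array([-0.5, 0.5, 0.0])

# max_depth=1 restricts the tree to a single split, i.e. a decision stump
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X, residuals)

threshold = stump.tree_.threshold[0]   # split point chosen at the root
h1 = stump.predict(X)                  # leaf means: the stump's prediction h_1(x)

print("split at x <=", threshold)
print("h1:", h1)
```

The split at 1.5 wins because the alternative split at 2.5 leaves both sides with a mean residual of 0 and does not reduce the squared error at all.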

**Step 4: Update Model**
With learning rate $\phi = 1.0$ (no shrinkage), the update is $F_1(x) = F_0(x) + \phi h_1(x)$:

| $x$ | $F_0(x)$ | $h_1(x)$ | $F_1(x)$ |
| --- | -------- | -------- | -------- |
| 1   | 2.5      | -0.5     | 2.0      |
| 2   | 2.5      | 0.25     | 2.75     |
| 3   | 2.5      | 0.25     | 2.75     |
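
The update can be checked numerically, and the whole worked example can be compared against scikit-learn. The sketch below assumes that a `GradientBoostingRegressor` with one estimator, a depth-1 tree, and a learning rate of 1.0 mirrors the hand calculation. This matches the default squared error loss with a mean initial prediction, but treat the comparison as a sanity check rather than a guarantee across library versions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 3.0, 2.5])

# Hand calculation from the tables above
f0 = np.full(3, y.mean())              # initial prediction, 2.5 everywhere
h1 = np.array([-0.5, 0.25, 0.25])      # stump output from Step 3
phi = 1.0                              # learning rate
f1 = f0 + phi * h1                     # F_1(x) = F_0(x) + phi * h_1(x)

mse_before = np.mean((y - f0) ** 2)
mse_after = np.mean((y - f1) ** 2)
print("F1:", f1)
print(f"MSE: {mse_before:.4f} -> {mse_after:.4f}")

# Same setup with scikit-learn: one boosting round with a stump
gbr = GradientBoostingRegressor(
    n_estimators=1, learning_rate=1.0, max_depth=1, random_state=0
)
gbr.fit(X, y)
sk_pred = gbr.predict(X)
print("sklearn:", sk_pred)
```

One round already reduces the training error; further rounds would repeat Steps 2 to 4 on the new residuals $y_i - F_1(x_i)$.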

