***Extreme Gradient Boosting (XGBoost)***

XGBoost is an optimized, distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately.

XGBoost adds L1 (Lasso-style) and L2 (Ridge-style) regularization to its objective, whereas classic gradient boosting uses little or no explicit regularization.

It’s an improved implementation of gradient boosting, focused on:

1. Performance (fast)

2. Regularization (to avoid overfitting)

3. Handling missing values

4. Parallelization (CPU & GPU support)

***Key Concepts Behind XGBoost***

1. Same foundation as Gradient Boosting

    - Builds trees sequentially.

    - Each tree corrects the errors of the previous ones.

    - Uses gradient descent to minimize the loss.

2. But adds several improvements:

| Feature             | Gradient Boosting           | XGBoost                                                                          |
| ------------------- | --------------------------- | -------------------------------------------------------------------------------- |
| Regularization      | None or minimal             | ✅ L1 & L2 regularization on leaf weights (like Lasso/Ridge)                     |
| Parallelization     | ❌ Sequential               | ✅ Parallel split finding within each tree (trees are still added sequentially) |
| Missing values      | ❌ Manual handling          | ✅ Auto-learns best direction for missing values                                |
| Overfitting control | Shrinkage + row subsampling | ✅ Shrinkage + row and column subsampling + regularized objective               |
| Tree growth         | Implementation-dependent    | ✅ Depth-wise by default (`grow_policy`); splits with gain below `gamma` are pruned |
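
The "auto-learns best direction for missing values" row can be illustrated with a small sketch. It is a simplified, pure-Python illustration of the idea, not XGBoost's actual code: for one fixed threshold it tries sending the missing rows left and then right, and keeps the direction with the higher gain. The function names, the toy data, and the use of `None` to mark missing values are all assumptions made for this example.

```python
def leaf_score(G, H, lam):
    """Structure score G^2 / (H + lambda) for a leaf with gradient sum G and hessian sum H."""
    return G * G / (H + lam)


def best_default_direction(x, g, h, threshold, lam=1.0):
    """For one fixed threshold, decide whether missing values should go left or right."""
    GL = HL = GR = HR = Gm = Hm = 0.0
    for xi, gi, hi in zip(x, g, h):
        if xi is None:            # missing value: direction not decided yet
            Gm += gi
            Hm += hi
        elif xi < threshold:      # goes to the left child
            GL += gi
            HL += hi
        else:                     # goes to the right child
            GR += gi
            HR += hi

    parent = leaf_score(GL + GR + Gm, HL + HR + Hm, lam)
    # Gain if all missing rows are sent left
    gain_left = 0.5 * (leaf_score(GL + Gm, HL + Hm, lam) + leaf_score(GR, HR, lam) - parent)
    # Gain if all missing rows are sent right
    gain_right = 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR + Gm, HR + Hm, lam) - parent)

    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


# Toy data (hypothetical): the rows with a missing feature have targets similar
# to the rows with large feature values.
x = [1.0, 2.0, None, 8.0, 9.0, None]
y = [1.0, 1.0, 5.0, 5.0, 5.0, 5.0]

# Squared-error loss with a current prediction of 0: gradient = pred - y, hessian = 1
g = [0.0 - yi for yi in y]
h = [1.0] * len(y)

direction, gain = best_default_direction(x, g, h, threshold=5.0)
print(direction, round(gain, 4))
```

In this toy case the missing rows resemble the right-hand rows, so sending them right gives the larger gain. The real library repeats this comparison for every candidate split on every feature.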

***How XGBoost Works (Step-by-Step)***

1. Initialize model with a constant prediction (e.g., mean of target).

2. For each iteration:

    - Compute the gradient (first derivative of loss) and hessian (second derivative).

    - Fit a new tree using the gradients and hessians (for squared-error loss, the negative gradients are exactly the residuals).

    - Compute the leaf weights using both gradients and hessians.

    - Update predictions with a learning rate (η).

3. Final prediction = initial prediction + the sum of all trees' outputs, each scaled by the learning rate.
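
The steps above can be sketched from scratch in a few lines. This is a deliberately minimal illustration under stated assumptions, not how the library is implemented: it uses a single feature, depth-1 trees (stumps), log loss for binary labels, and no `gamma` or L1 term. All names (`fit_stump`, `boost`, `predict_proba`) and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, p):
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)) / len(y)


def fit_stump(x, g, h, lam):
    """Find the threshold with the highest gain; return (threshold, w_left, w_right)."""
    def score(G, H):
        return G * G / (H + lam)

    G_total, H_total = sum(g), sum(h)
    best = None
    for t in sorted(set(x))[1:]:  # candidate splits of the form x < t
        GL = sum(gi for xi, gi in zip(x, g) if xi < t)
        HL = sum(hi for xi, hi in zip(x, h) if xi < t)
        GR, HR = G_total - GL, H_total - HL
        gain = 0.5 * (score(GL, HL) + score(GR, HR) - score(G_total, H_total))
        if best is None or gain > best[0]:
            # Leaf weights use both gradients and hessians: w = -G / (H + lambda)
            best = (gain, t, -GL / (HL + lam), -GR / (HR + lam))
    _, t, w_left, w_right = best
    return t, w_left, w_right


def boost(x, y, n_rounds=20, eta=0.3, lam=1.0):
    # Step 1: constant initial prediction (log-odds of the positive rate)
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1.0 - p0))
    margin = [base] * len(x)
    stumps = []
    for _ in range(n_rounds):
        # Step 2a: gradient and hessian of the log loss w.r.t. the raw margin
        p = [sigmoid(m) for m in margin]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1.0 - pi) for pi in p]
        # Steps 2b and 2c: fit a tree and compute its leaf weights
        t, w_left, w_right = fit_stump(x, g, h, lam)
        # Step 2d: shrink the tree's output by the learning rate and update
        stumps.append((t, eta * w_left, eta * w_right))
        margin = [m + (eta * w_left if xi < t else eta * w_right) for m, xi in zip(margin, x)]
    return base, stumps


def predict_proba(base, stumps, xi):
    # Step 3: initial prediction plus the sum of all (already shrunk) tree outputs
    margin = base + sum(wl if xi < t else wr for t, wl, wr in stumps)
    return sigmoid(margin)


# Toy data (hypothetical): larger x values are mostly labelled 1
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

base, stumps = boost(x, y)
initial_loss = log_loss(y, [sigmoid(base)] * len(y))
final_loss = log_loss(y, [predict_proba(base, stumps, xi) for xi in x])
print(initial_loss, final_loss)
```

Each round lowers the training loss a little, and the learning rate controls how large each of those steps is.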

In [None]:
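# XGBoost classifier via the scikit-learn-compatible API of the xgboost package

from xgboost import XGBClassifier

# Model
model = XGBClassifier(
    n_estimators=100,       # number of boosting rounds (trees)
    learning_rate=0.1,      # shrinkage applied to each tree's output
    max_depth=3,            # maximum depth of each tree
    subsample=0.8,          # fraction of rows sampled per tree
    colsample_bytree=0.8,   # fraction of features sampled per tree
    reg_lambda=1,           # L2 regularization on leaf weights
    random_state=42         # seed for reproducible sampling
)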

***Important Hyperparameters***

| Parameter          | Description                                                           |
| ------------------ | --------------------------------------------------------------------- |
| `n_estimators`     | Number of boosting rounds (trees)                                     |
| `learning_rate`    | Shrinks the contribution of each tree (`eta` in the native API)       |
| `max_depth`        | Maximum depth of each tree                                            |
| `subsample`        | Fraction of training rows sampled for each tree                       |
| `colsample_bytree` | Fraction of features sampled for each tree                            |
| `reg_lambda`       | L2 regularization on leaf weights (`lambda` in the native API)        |
| `reg_alpha`        | L1 regularization on leaf weights (`alpha` in the native API)         |
| `gamma`            | Minimum loss reduction required to make a further partition on a leaf |
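
As a rough guide to what the L2, L1, and `gamma` parameters in the table do, they enter the penalty that XGBoost adds to the loss for each tree. The notation here is introduced only for this note: $T$ is the number of leaves, $w_j$ the weight of leaf $j$, and $G$, $H$ the sums of gradients and hessians over the rows in a node (subscripts $L$ and $R$ for the left and right child).

$$
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|
$$

With $\alpha = 0$, the optimal leaf weight and the gain of a candidate split are:

$$
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
$$

A split is kept only when its gain is positive, which is why `gamma` acts as a minimum loss reduction. A larger $\lambda$ shrinks leaf weights toward zero.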
