# XGBoost

Extreme Gradient Boosting (XGBoost) is an **ensemble learning model** (it combines multiple models to improve overall performance) from the *boosting* family (each 'weak' model learns from the mistakes of the previous ones).

Unlike *adaptive boosting* (AdaBoost), which increases the weights of misclassified points so the next model focuses on them, **gradient boosting** fits each new model to the *residual errors* of the current ensemble and improves the predictions step by step. The name comes from the fact that the algorithm uses gradients (derivatives) of the loss function to minimize it. XGBoost minimizes a differentiable loss function using both first-order (gradient) and second-order (Hessian) derivatives.
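To make the second-order idea concrete: in the XGBoost formulation, once a tree's structure is fixed, each leaf's output is set to $w^* = -G / (H + \lambda)$, where $G$ and $H$ are the sums of the first and second derivatives of the loss over the samples in that leaf and $\lambda$ is the L2 regularization term (`lambda`, alias `reg_lambda`, in XGBoost). The NumPy sketch below applies this to the squared error $\frac{1}{2}(y - \hat{y})^2$, whose gradient is $\hat{y} - y$ and whose Hessian is 1. The helper names and data are hypothetical, and this is an illustration of the formula, not XGBoost's internal code.

```python
import numpy as np


def squared_error_grad_hess(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y_pred - y_true)**2 with respect to y_pred."""
    grad = y_pred - y_true
    hess = np.ones_like(y_true, dtype=float)
    return grad, hess


def optimal_leaf_weight(grad, hess, reg_lambda=1.0):
    """Second-order optimal output for one leaf: -G / (H + lambda)."""
    return -grad.sum() / (hess.sum() + reg_lambda)


# Hypothetical samples that all fall into the same leaf
y_true = np.array([3.0, 2.5, 4.0, 3.5])
y_pred = np.full_like(y_true, 2.0)  # current ensemble prediction for these samples

g, h = squared_error_grad_hess(y_true, y_pred)
w_no_reg = optimal_leaf_weight(g, h, reg_lambda=0.0)  # 1.25: the mean residual
w_reg = optimal_leaf_weight(g, h, reg_lambda=1.0)     # 1.0: shrunk toward zero
print(w_no_reg, w_reg)
```

With $\lambda = 0$ the leaf weight is simply the mean residual of its samples; a positive $\lambda$ shrinks it toward zero, which is one way XGBoost regularizes individual trees.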

### Quick note on the relationship between Residual Errors and Loss Function

The **residual error** is nothing more than the difference between the *true value* of the target variable and the *prediction* made by the model:

$$
r_{i} = y_{i} - \hat{y}_{i}
$$

A **loss function** is a *mathematical function* that **aggregates the residual errors** into a single number, typically used to measure how well the model fits the data. One example is the mean squared error (MSE) for regression:

$$
L = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^2
$$
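A small worked example of both definitions: the residuals are element-wise differences, and the MSE squares and averages them. The values below are made up for illustration; scikit-learn's `mean_squared_error` (already used later in this notebook) is included only as a cross-check of the manual computation.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true targets and model predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residuals: r_i = y_i - y_hat_i
residuals = y_true - y_pred  # 0.5, -0.5, 0.0, -1.0

# MSE aggregates the residuals into one number: (0.25 + 0.25 + 0 + 1) / 4
mse_manual = np.mean(residuals ** 2)        # 0.375
mse_sklearn = mean_squared_error(y_true, y_pred)

print(residuals)
print(mse_manual, mse_sklearn)
```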

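Gradient boosting fits each new model to the *residuals* of the current ensemble rather than to the target variable directly. Strictly speaking, the targets are the *negative gradients* of the loss with respect to the current predictions (often called pseudo-residuals); for squared error these are proportional to the ordinary residuals.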

#### Example: Mean Squared Error (MSE) and Residuals

If the loss function is MSE:

$$
L = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^2
$$

The gradient (derivative) with respect to the prediction $\hat{y}_{i}$ is:

$$
\frac{\partial L}{\partial \hat{y}_{i}} = -\frac{2}{n}(y_{i} - \hat{y}_{i})
$$
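The derivative can be checked numerically. The sketch below compares the analytic gradient of the averaged MSE (which carries a factor of $\frac{2}{n}$ because of the $\frac{1}{n}$ in front of the sum) against central finite differences, and confirms that the negative gradient is just a rescaled copy of the residuals. The helper names and random data are illustrative only.

```python
import numpy as np


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def mse_gradient(y_true, y_pred):
    # Analytic gradient of the averaged MSE with respect to each prediction
    n = len(y_true)
    return -2.0 / n * (y_true - y_pred)


def numerical_gradient(loss, y_true, y_pred, eps=1e-6):
    # Central finite differences, perturbing one prediction at a time
    grad = np.zeros_like(y_pred)
    for i in range(len(y_pred)):
        step = np.zeros_like(y_pred)
        step[i] = eps
        grad[i] = (loss(y_true, y_pred + step) - loss(y_true, y_pred - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
y_true = rng.normal(size=5)
y_pred = rng.normal(size=5)
n = len(y_true)

analytic = mse_gradient(y_true, y_pred)
numeric = numerical_gradient(mse, y_true, y_pred)

print(np.allclose(analytic, numeric, atol=1e-6))         # analytic formula matches
print(np.allclose(-analytic * n / 2, y_true - y_pred))   # -gradient is proportional to the residuals
```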

This shows that the gradient of the loss function is proportional to the negative residual (the constant factor does not depend on $i$). Gradient boosting therefore trains the next model on the residual errors, which is equivalent to taking a step in the direction of the negative gradient of the loss function.
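The loop below sketches plain gradient boosting for squared error, using scikit-learn decision trees as the weak learners: start from a constant prediction, fit a shallow tree to the current residuals, add a shrunken copy of its predictions, and repeat. It is a teaching sketch on synthetic data (`make_regression`) with hypothetical helper names, not how XGBoost is implemented; XGBoost additionally uses second-order information, regularization, and its own tree-building algorithm.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Gradient boosting for squared error: every tree is fit to the current residuals."""
    base = float(np.mean(y))               # initial model: a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred               # negative gradient of 0.5 * (y - pred) ** 2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrunken step toward the residuals
        trees.append(tree)
    return base, learning_rate, trees


def predict_gradient_boosting(model, X):
    base, learning_rate, trees = model
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
model = fit_gradient_boosting(X, y)

mse_start = mean_squared_error(y, np.full(len(y), np.mean(y)))
mse_end = mean_squared_error(y, predict_gradient_boosting(model, X))
print(f"Training MSE: {mse_start:.1f} -> {mse_end:.1f}")
```

The learning rate here plays the same role as `eta` in the XGBoost parameters further down: each tree only corrects part of the remaining error, which usually generalizes better than taking full steps.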

```python
# Import the necessary libraries
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```


```python
# Load the California housing dataset (features and median house value)
data = fetch_california_housing()
X, y = data.data, data.target
```

### Split data for training and testing

```python
# Split into training and testing sets (20% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Data format conversion

```python
# Convert to XGBoost's DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
```

DMatrix is XGBoost's own **data structure** for training and prediction, designed to optimize *training speed* and *memory usage*. Raw NumPy arrays and pandas DataFrames are converted into it before training. It stores the features together with the labels (and optional sample weights) in XGBoost's internal format, supports sparse input, and handles missing values (`NaN` by default) natively, so they do not have to be imputed beforehand.

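```python
# Set hyperparameters
params = {
    'objective': 'reg:squarederror',  # Regression with squared-error loss
    'eval_metric': 'rmse',            # Root mean squared error, reported during evaluation
    'eta': 0.1,                       # Learning rate (shrinks each new tree's contribution)
    'max_depth': 6,                   # Maximum depth of each tree
    'subsample': 0.8                  # Fraction of training rows sampled for each tree
}
```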

### Model training

```python
# Train the model for a fixed number of boosting rounds
num_rounds = 100
model = xgb.train(params, dtrain, num_rounds)
```

### Model prediction and evaluation

```python
# Make predictions on the test set
y_pred = model.predict(dtest)

# Evaluate performance
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.4f}')
```

```
Mean Squared Error: 0.2260
```
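An MSE on its own is hard to interpret. Taking the square root brings it back to the target's units: in this dataset the target is the median house value in units of $100,000, so an MSE of 0.2260 corresponds to an RMSE of about 0.475, roughly $47,500. Comparing against a trivial baseline that always predicts the training mean also helps. The helper below is a hypothetical sketch of such a report; in the notebook it would be called as `regression_report(y_test, y_pred, y_train)`, and the small arrays in the usage lines are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred, y_train):
    """Compare a model's test error with a predict-the-training-mean baseline."""
    baseline = np.full(len(y_true), np.mean(y_train))
    mse_model = mean_squared_error(y_true, y_pred)
    return {
        "mse": mse_model,
        "rmse": float(np.sqrt(mse_model)),                        # same units as the target
        "baseline_mse": mean_squared_error(y_true, baseline),
        "r2": r2_score(y_true, y_pred),                           # 1.0 = perfect, 0.0 = mean-level
    }


# Hypothetical usage with small made-up arrays
report = regression_report(
    y_true=np.array([1.0, 2.0, 3.0, 4.0]),
    y_pred=np.array([1.1, 1.9, 3.2, 3.8]),
    y_train=np.array([2.0, 3.0]),
)
print(report)
```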
