# XGBoost (Extreme Gradient Boosting)

**XGBoost** stands for *Extreme Gradient Boosting*. It is a scalable, distributed gradient-boosted decision tree (GBDT) machine learning library.

**Purpose:** Primarily used for **Regression**, **Classification**, and **Ranking** tasks.
**Fame:** It dominates Kaggle competitions and industry applications due to its superior execution speed and model performance on structured/tabular data.

**Core Concept:**
XGBoost builds trees **sequentially**. Each new tree is trained to correct the prediction errors (residuals) made by all the previous trees combined, using a gradient descent approach to minimize the loss.

---

## 1. Mathematical Intuition

### A. Prediction (Additive Model)
The final prediction $\hat{y}_i$ for a given instance $i$ is the sum of predictions from $K$ sequential trees ($f_k$):

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

* $f_k$: The function represented by the $k$-th tree.
* $K$: Total number of trees.

### B. Regularized Objective Function (The XGBoost Genius)
Unlike standard Gradient Boosting, XGBoost explicitly includes a regularization term in its objective function to control complexity.

$$\text{Obj}(\theta) = \underbrace{\sum_{i=1}^{n} L(y_i, \hat{y}_i)}_{\text{Loss Term}} + \underbrace{\sum_{k=1}^{K} \Omega(f_k)}_{\text{Regularization Term}}$$

1.  **Loss Term ($L$):** Measures how well the model fits the data (e.g., MSE for regression, Log Loss for classification).
2.  **Regularization Term ($\Omega$):** Penalizes complex models to prevent overfitting.
    $$\Omega(f) = \gamma T + \frac{1}{2} \lambda ||w||^2$$
    * $T$: Number of leaves in the tree.
    * $w$: Vector of leaf weights (scores).
    * $\gamma$ (gamma): Penalty per leaf; in effect, the minimum loss reduction a split must achieve to be kept.
    * $\lambda$ (lambda): L2 regularization term on weights.

**Note:** XGBoost uses **second-order approximations** (Taylor Expansion using both Gradient and Hessian) of the loss function, making optimization faster and more precise than traditional GBDT (which uses only first-order gradients).
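For a fixed tree structure, minimizing this second-order objective gives closed-form leaf weights and a split-gain score (both derived in the original XGBoost paper). The sketch below evaluates them by hand. It assumes the loss $L = \tfrac{1}{2}(y_i - \hat{y}_i)^2$, so that $g_i = \hat{y}_i - y_i$ and $h_i = 1$. `leaf_weight` and `split_gain` are illustrative helpers, not part of the XGBoost API.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Loss reduction from splitting a leaf into left/right children.

    G_* and H_* are the sums of gradients and Hessians of the rows that land
    in each child. A split is kept only if the result is positive, which is
    why gamma acts as the minimum loss reduction needed to split.
    """
    def score(G, H):
        return G * G / (H + lam)

    parent = score(G_L + G_R, H_L + H_R)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - parent) - gamma


# Squared error with current prediction 0: g_i = 0 - y_i, h_i = 1.
# Left child targets [1, 2]    -> G_L = -3,  H_L = 2
# Right child targets [10, 12] -> G_R = -22, H_R = 2
print(split_gain(-3, 2, -22, 2, lam=1.0, gamma=0.0))   # positive: keep the split
print(split_gain(-3, 2, -22, 2, lam=1.0, gamma=20.0))  # negative: gamma prunes it
print(leaf_weight(-3, 2, lam=1.0), leaf_weight(-22, 2, lam=1.0))
```

The raw mean residuals of the two children are 1.5 and 11. With $\lambda = 1$, the leaf weights shrink to 1.0 and about 7.33. Raising $\gamma$ to 20 makes the gain negative, so this split would be pruned.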

---

## 2. Why is XGBoost So Powerful?

| Feature | Description |
| :--- | :--- |
| **Built-in Regularization** | Controls overfitting via L1 ($\alpha$) and L2 ($\lambda$) penalties on leaf weights. |
| **Sparse Data Handling** | Automatically learns the best "default direction" for missing values during training. |
| **Parallel Processing** | Builds trees sequentially, but parallelizes **split finding** within each tree across features (using pre-sorted column blocks). |
| **Tree Pruning** | Grows trees up to `max_depth`, then prunes backwards any split whose gain falls below the $\gamma$ threshold. |
| **Hardware Optimization** | Out-of-core computing for datasets larger than RAM; Cache-aware access. |

---

## 3. Key Hyperparameters

### Boosting Process
* **`n_estimators`**: Number of boosting rounds (trees). More rounds improve the training fit but increase the risk of overfitting.
* **`learning_rate` ($\eta$):** Shrinks the contribution of each new tree. Low values ($0.01 - 0.1$) combined with a high `n_estimators` usually yield the best results.

### Tree Structure
* **`max_depth`**: Maximum depth of a tree. Controls model complexity (Typical: $3-10$).
* **`min_child_weight`**: Minimum sum of instance weight (Hessian) needed in a child node. Higher values $\rightarrow$ More conservative (prevents overfitting).
* **`gamma` ($\gamma$):** Minimum loss reduction required to make a further partition. Acts as a pseudo-regularizer.

### Randomization (Stochastic Boosting)
* **`subsample`**: Fraction of training rows sampled for each tree. (Typical: $0.5 - 0.9$).
* **`colsample_bytree`**: Fraction of columns (features) sampled for each tree. Reduces correlation between trees.

### Regularization
* **`reg_alpha` ($\alpha$):** L1 regularization term on leaf weights. Can drive small leaf weights to exactly zero; sometimes helpful with very high-dimensional data.
* **`reg_lambda` ($\lambda$):** L2 regularization term on weights. (Default: 1). Makes predictions smoother.

---

## 4. XGBoost Workflow (Step-by-Step)

1.  **Initialize:** Start with a simple prediction (e.g., the mean of the target $y$).
2.  **Iterate ($m = 1$ to $M$):**
    * **a.** Compute the **Gradient** ($g_i$) and **Hessian** ($h_i$) of the loss with respect to the current prediction $\hat{y}_i$ for every data point.
    * **b.** Grow a new decision tree whose splits are chosen to maximize the gain computed from these gradients and Hessians.
    * **c.** Calculate leaf scores (weights) using the regularization formula:
        $$w^* = -\frac{\sum g_i}{\sum h_i + \lambda}$$
    * **d.** Update the model:
        $$\hat{y}_{new} = \hat{y}_{old} + \eta \cdot \text{Tree}_{prediction}$$
3.  **Output:** Final sum of all weighted tree predictions.
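The workflow above can be sketched from scratch in NumPy. This is a teaching sketch under stated assumptions, not XGBoost's implementation:

* It uses squared-error loss, so $g_i = \hat{y}_i - y_i$ and $h_i = 1$.
* It uses depth-1 trees (stumps) found by exact greedy search.
* The helper names `fit_stump` and `boost` are hypothetical.

```python
import numpy as np


def fit_stump(X, g, h, lam, gamma):
    """Depth-1 tree: exact greedy search for the split with the highest gain."""
    G, H = g.sum(), h.sum()
    best = None                                        # (gain, feature, threshold)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, gs, hs = X[order, j], g[order], h[order]
        G_L, H_L = np.cumsum(gs)[:-1], np.cumsum(hs)[:-1]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam)) - gamma
        gain[xs[:-1] == xs[1:]] = -np.inf              # no split between equal values
        k = int(np.argmax(gain))
        if np.isfinite(gain[k]) and (best is None or gain[k] > best[0]):
            best = (gain[k], j, (xs[k] + xs[k + 1]) / 2)
    if best is None or best[0] <= 0:                   # no split pays for gamma
        w = -G / (H + lam)
        return lambda X_new: np.full(len(X_new), w)
    _, j, thr = best
    left = X[:, j] < thr
    w_L = -g[left].sum() / (h[left].sum() + lam)       # c. w* = -sum(g) / (sum(h) + lambda)
    w_R = -g[~left].sum() / (h[~left].sum() + lam)
    return lambda X_new: np.where(X_new[:, j] < thr, w_L, w_R)


def boost(X, y, n_rounds=50, eta=0.3, lam=1.0, gamma=0.0):
    """Squared-error boosting with stumps, following steps 1-3 above."""
    base = y.mean()                                    # 1. initialize with the mean
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):                          # 2. iterate
        g = pred - y                                   # a. gradient of 0.5 * (pred - y)^2
        h = np.ones_like(y)                            #    Hessian is 1 for squared error
        tree = fit_stump(X, g, h, lam, gamma)          # b./c. split + leaf weights
        pred = pred + eta * tree(X)                    # d. shrunken additive update
        trees.append(tree)
    return lambda X_new: base + eta * sum(t(X_new) for t in trees)  # 3. output


# Usage on toy data: a step function in feature 0
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.where(X[:, 0] > 0, 2.0, -1.0) + 0.1 * rng.normal(size=200)
model = boost(X, y, n_rounds=50)
print("train MSE:", np.mean((model(X) - y) ** 2))
```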



---

## 5. When to Use XGBoost?

 **Use When:**
* You have **Structured / Tabular data** (Excel-like data).
* **Predictive Accuracy** is the #1 priority (Kaggle style).
* The relationship between features and target is complex and non-linear.
* You have missing values (XGBoost handles them natively).

 **Avoid When:**
* **Unstructured data:** Images (use CNNs) or Text (use Transformers/LLMs).
* **Interpretability:** You need a strictly explainable formula (use Linear/Logistic Regression).
* **Tiny Datasets:** Might be overkill and prone to overfitting; standard Random Forest or Linear Models might suffice.

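```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic regression data (stand-in for your own X, y)
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train
model = XGBRegressor(
    n_estimators=100,
    max_depth=5,
    learning_rate=0.1,
    random_state=42
)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
```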

**Q: Why is XGBoost often better than standard Gradient Boosting Machines (GBM)?**

A: Regularization, efficient handling of missing data, use of second-order derivatives (Hessian) for faster convergence, and advanced tree pruning.

**Q: How does XGBoost handle missing values?**

A: During training, it learns the default direction (left or right child) for missing values at each split that minimizes loss.

**Q: How can you prevent overfitting in XGBoost?**

A: Use a combination of:
1) Lower `max_depth`.
2) Increase `min_child_weight` and `gamma`.
3) Use `subsample` and `colsample_bytree`.
4) Apply stronger L1/L2 regularization (`reg_alpha`, `reg_lambda`).
5) Reduce `learning_rate` while increasing `n_estimators`.

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load data
data = load_iris()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the classifier
# Binary targets default to objective='binary:logistic'; multi-class targets to
# 'multi:softprob'. The scikit-learn wrapper infers num_class from y.
model = xgb.XGBClassifier(
    objective='multi:softprob',  # Per-class probabilities; predict() returns the argmax label
    n_estimators=100,
    max_depth=5,
    learning_rate=0.1,
    random_state=42
)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)  # Predicts class labels
# y_pred_proba = model.predict_proba(X_test)  # Predicts class probabilities

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

**Important Parameters for Classification**

While you already know many parameters (like max_depth, eta) from the regressor, a few are particularly important for classification:

**`scale_pos_weight`:** Crucial for imbalanced binary datasets. A common value is (number of negative class samples) / (number of positive class samples). It scales up the weight of the positive (usually minority) class so the model pays more attention to it.

**`eval_metric`:** While training, it's helpful to monitor metrics like `'logloss'`, `'error'` (classification error), or `'auc'` (for binary classification).

**`max_delta_step`:** Caps how far each leaf output can move in one step. This can help stabilize logistic-loss training when classes are extremely imbalanced.
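The `scale_pos_weight` heuristic is easy to compute from the training labels. This is a minimal sketch assuming binary 0/1 labels. `compute_scale_pos_weight` is a hypothetical helper.

```python
import numpy as np


def compute_scale_pos_weight(y):
    """Heuristic scale_pos_weight = (#negative samples) / (#positive samples)."""
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0:
        raise ValueError("y contains no positive samples")
    return n_neg / n_pos


y_train = np.array([0] * 950 + [1] * 50)   # toy labels: 5% positives
spw = compute_scale_pos_weight(y_train)
print(spw)  # 19.0
```

The value would then be passed as `XGBClassifier(scale_pos_weight=spw)`. Treat it as a starting point for tuning rather than a fixed rule.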

**Q: How does XGBoost handle a multi-class classification problem?**

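A: It does **not** train independent one-vs-all binary classifiers. For $K$ classes, each boosting round grows $K$ trees, one per class. The summed per-class scores are passed through a softmax and optimized jointly with multi-class log loss. `multi:softprob` returns the $K$ class probabilities, while `multi:softmax` returns the class with the highest probability.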
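For multi-class objectives, each boosting round adds one tree per class. A sample's raw score for class $k$ is the sum of the class-$k$ trees' outputs, and a softmax turns the scores into probabilities. The NumPy sketch below uses made-up tree outputs (hypothetical numbers, with the base score omitted). It only shows how the `multi:softprob` and `multi:softmax` outputs relate.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax over per-class raw scores (numerically stabilized)."""
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical outputs for 2 boosting rounds x 3 classes x 2 samples:
# tree_outputs[r][k] holds the round-r tree for class k, evaluated on both samples.
tree_outputs = np.array([
    [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.6]],   # round 1: classes 0, 1, 2
    [[0.1, -0.2], [0.4, 0.1], [-0.3, 0.5]],   # round 2: classes 0, 1, 2
])
raw = tree_outputs.sum(axis=0).T   # shape (n_samples, n_classes)
proba = softmax(raw)               # what multi:softprob returns
labels = proba.argmax(axis=1)      # what multi:softmax returns
print(labels)  # [1 2]
```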

**Q: When would you choose XGBoost Classifier over a Random Forest?**

A: When your dataset is large, you need the highest possible accuracy and have the time/resources for careful tuning. Random Forest is excellent and more robust to overfitting with less tuning, but XGBoost's gradient boosting often achieves a slightly higher performance ceiling at the cost of complexity.

**Q: The classifier is overfitting. Which parameters would you adjust first?**

A:
1) Increase `reg_alpha` (L1) and `reg_lambda` (L2) for stronger regularization.
2) Reduce `max_depth` to make trees simpler.
3) Lower `learning_rate` and increase `n_estimators`.
4) Use `subsample` and `colsample_bytree` to introduce more randomness.

### XGBoost: Regressor vs. Classifier

| Aspect | XGBoost Regressor | XGBoost Classifier |
| :--- | :--- | :--- |
| **Primary Task** | Predicts continuous numeric values. | Predicts discrete class labels. |
| **Core Objective** | Minimizes residuals (e.g., Squared Error). | Minimizes Log Loss (cross-entropy) on predicted class probabilities. |
| **Default Objective**| `reg:squarederror` | `binary:logistic` or `multi:softprob`. |
| **Output Type** | Real numbers ($y \in \mathbb{R}$). | Probabilities or Class labels. |
| **Unique Params** | Standard tuning. | `num_class` (required by the native `xgb.train` API for multi-class; inferred automatically by `XGBClassifier`). |
| **Metrics** | RMSE, MAE, $R^2$. | Accuracy, F1, Log Loss, AUC. |


**XGBoost = Boosted Decision Trees → sequentially reduce errors.**

**Regressor → continuous predictions, Classifier → class labels.**

**Regularization + tree parameters = key to performance.**

**Handles missing/sparse data; fast, accurate, and widely used in ML competitions.**


# LightGBM (Light Gradient Boosting Machine)

**LightGBM** is a gradient boosting framework developed by **Microsoft**. It is designed to be distributed and efficient with the following advantages:
* Faster training speed and higher efficiency.
* Lower memory usage.
* Better accuracy.
* Support of parallel and GPU learning.
* Capable of handling large-scale data.

**Core Philosophy:**
While XGBoost focuses on exactness and regularization, LightGBM focuses on **Speed** and **Scalability** by approximating the split-finding process using histograms.

---

## 1. Key Innovations (The "Secret Sauce")

LightGBM introduces architectural changes that make it significantly different from traditional boosting algorithms.

### A. Leaf-wise vs. Level-wise Growth (Crucial Difference)
Many boosting implementations (including XGBoost with its default `depthwise` growth policy) grow trees **Level-wise** (horizontally). They maintain a balanced tree.
LightGBM grows trees **Leaf-wise** (vertically/asymmetrically).

* **Leaf-wise (LightGBM):** It chooses the leaf with the **max delta loss** to grow. It expands the tree deeper in promising areas rather than wasting time on non-informative branches.
    * *Pros:* Lower loss, better accuracy on complex patterns.
    * *Cons:* Can grow very deep and overfit on small datasets. (Control it with `num_leaves`, `max_depth`, and `min_data_in_leaf`.)
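Here is a minimal sketch of leaf-wise (best-first) growth. It assumes a single numeric feature and squared-error gain. `best_split` and `grow_leaf_wise` are hypothetical helpers, not LightGBM's API. A heap keyed on each leaf's best gain makes the tree deepen wherever the loss drops most, and `num_leaves` acts as the stopping budget.

```python
import heapq
import numpy as np


def best_split(x, y, min_data_in_leaf):
    """Best threshold on a 1-D feature by squared-error reduction."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n, total = len(ys), ys.sum()
    best_gain, best_thr, left_sum = 0.0, None, 0.0
    for i in range(1, n):                          # left = xs[:i], right = xs[i:]
        left_sum += ys[i - 1]
        if min(i, n - i) < min_data_in_leaf or xs[i - 1] == xs[i]:
            continue
        right_sum = total - left_sum
        gain = left_sum**2 / i + right_sum**2 / (n - i) - total**2 / n
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i - 1] + xs[i]) / 2
    return best_gain, best_thr


def grow_leaf_wise(x, y, num_leaves, min_data_in_leaf=5):
    """Best-first growth: always split the leaf whose best split has the max gain.

    Returns one array of row indices per final leaf.
    """
    heap, done, counter = [], [], 0

    def push(idx):
        nonlocal counter
        gain, thr = best_split(x[idx], y[idx], min_data_in_leaf)
        if thr is None:
            done.append(idx)                        # no useful split: final leaf
        else:
            heapq.heappush(heap, (-gain, counter, idx, thr))  # max-gain first
            counter += 1

    push(np.arange(len(y)))
    while heap and len(heap) + len(done) < num_leaves:
        _, _, idx, thr = heapq.heappop(heap)        # the single most promising leaf
        push(idx[x[idx] < thr])
        push(idx[x[idx] >= thr])
    return done + [entry[2] for entry in heap]


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = np.where(x > 0.8, 5.0 * x, 0.0) + 0.1 * rng.normal(size=300)
leaves = grow_leaf_wise(x, y, num_leaves=4)
print(sorted(len(leaf) for leaf in leaves))
```

When the signal sits in a narrow range of `x`, the leaf sizes typically come out unbalanced. That is the asymmetric shape described above.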



### B. Histogram-based Learning
Instead of sorting every feature's values and scanning every possible threshold (pre-sorting costs $O(N \log N)$ per feature), LightGBM buckets continuous feature values into discrete **bins** (histograms).

* **Efficiency:** Building the histograms still costs $O(\text{data} \times \text{features})$. Scanning for the best split, however, drops from $O(\text{data} \times \text{features})$ to $O(\text{bins} \times \text{features})$.
* **Memory:** Significantly reduces memory usage because it stores small integer bin indices instead of raw floats.
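To see why histograms are cheap, here is a NumPy sketch of histogram-based split search on one feature. It makes three assumptions:

* Bins are equal-frequency quantile bins. LightGBM's own bin construction, controlled by `max_bin`, differs in details.
* Gains use gradient/Hessian sums with an L2 term `lam`.
* `histogram_best_split` is a hypothetical helper.

One pass accumulates the per-bin sums. After that, the scan only touches `n_bins - 1` candidate thresholds.

```python
import numpy as np


def histogram_best_split(x, g, h, n_bins=16, lam=1.0):
    """Histogram-based split search on a single feature.

    1. Bucket x into n_bins quantile bins (small integer codes).
    2. Accumulate gradient/Hessian sums per bin in one O(#data) pass.
    3. Scan the n_bins - 1 bin boundaries instead of every data point.
    """
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])  # inner bin edges
    codes = np.searchsorted(edges, x, side="right")              # bin index per row
    G_bin = np.bincount(codes, weights=g, minlength=n_bins)
    H_bin = np.bincount(codes, weights=h, minlength=n_bins)
    G_L, H_L = np.cumsum(G_bin)[:-1], np.cumsum(H_bin)[:-1]      # left = bins 0..b
    G, H = G_bin.sum(), H_bin.sum()
    # Gain up to the constant factor 1/2 (does not change the argmax)
    gain = G_L**2 / (H_L + lam) + (G - G_L)**2 / (H - H_L + lam) - G**2 / (H + lam)
    b = int(np.argmax(gain))
    return b, edges[b], gain[b]          # rows with x < edges[b] go left


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
y = (x > 0.5).astype(float)
g, h = -y, np.ones_like(y)               # squared error around a prediction of 0
b, threshold, gain = histogram_best_split(x, g, h)
print(b, round(float(threshold), 3))
```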

### C. GOSS (Gradient-based One-Side Sampling)
This deals with the **number of data samples**.
* **Logic:** Data points with large gradients (large errors) are "hard" to learn. Points with small gradients are "easy" (already well-learned).
* **The Trick:** GOSS keeps all instances with **large gradients** and performs random sampling on instances with **small gradients**.
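* **Result:** Focuses computation on the under-trained data. The randomly sampled small-gradient instances are up-weighted by $(1-a)/b$, which keeps the gradient statistics approximately unbiased. Here $a$ is the fraction of the data kept as large-gradient instances and $b$ is the fraction of the data sampled from the rest, so the data distribution is not changed much.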
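Here is a NumPy sketch of GOSS sampling. It uses the paper's notation `a` (fraction of the data kept as large-gradient rows) and `b` (fraction of the data sampled from the rest). LightGBM exposes these as `top_rate` and `other_rate`. `goss_sample` itself is a hypothetical helper. The returned weights would multiply the sampled rows' gradients and Hessians when building histograms.

```python
import numpy as np


def goss_sample(grad, a=0.2, b=0.1, rng=None):
    """Gradient-based One-Side Sampling (GOSS).

    Keeps the a * n rows with the largest |gradient|, randomly samples b * n
    of the remaining rows, and gives those sampled rows weight (1 - a) / b so
    that gradient sums over the sample stay approximately unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(grad)
    order = np.argsort(-np.abs(grad))              # largest |gradient| first
    top_k, rand_k = int(a * n), int(b * n)
    top = order[:top_k]                            # "hard" rows: always kept
    rest = rng.choice(order[top_k:], size=rand_k, replace=False)
    idx = np.concatenate([top, rest])
    weights = np.ones(len(idx))
    weights[top_k:] = (1 - a) / b                  # amplify the small-gradient sample
    return idx, weights


rng = np.random.default_rng(0)
grad = rng.normal(size=10_000)
idx, w = goss_sample(grad, a=0.2, b=0.1, rng=rng)
print(len(idx))  # 3000 rows used for this tree instead of 10000
```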

### D. EFB (Exclusive Feature Bundling)
This deals with the **number of features**.
* **Logic:** In high-dimensional sparse data (like One-Hot encoded features), many features are mutually exclusive (they are rarely or never non-zero at the same time).
* **The Trick:** EFB bundles these features into a single feature. It shifts each original feature's values by an offset so they occupy separate value ranges within the bundle.
* **Result:** Reduces dimensionality with little or no loss of information.
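Here is a simplified sketch of the bundling idea. It assumes non-negative features where 0 means "absent". It greedily merges columns that have zero conflicts and shifts each column by an offset. LightGBM's actual EFB tolerates a small number of conflicts and offsets histogram bins rather than raw values. `bundle_exclusive` is a hypothetical helper.

```python
import numpy as np


def bundle_exclusive(X):
    """Greedy zero-conflict feature bundling for non-negative sparse columns.

    Columns that are never non-zero on the same row share one bundle; each
    column's non-zero values are shifted by an offset so they occupy a
    distinct value range inside the merged feature.
    """
    nonzero = X != 0
    bundles = []                                   # lists of column indices
    for j in range(X.shape[1]):
        for bundle in bundles:
            if not (nonzero[:, bundle].any(axis=1) & nonzero[:, j]).any():
                bundle.append(j)                   # no row conflicts: join bundle
                break
        else:
            bundles.append([j])                    # conflicts everywhere: new bundle
    merged = np.zeros((X.shape[0], len(bundles)))
    for b, cols in enumerate(bundles):
        offset = 0.0
        for j in cols:
            col = X[:, j]
            merged[:, b] += np.where(col != 0, col + offset, 0.0)
            offset += col.max()                    # next column starts above this range
    return merged, bundles


onehot = np.eye(3)[[0, 1, 2, 1, 0]]                # 3 mutually exclusive columns
dense = np.array([0.5, 1.5, 2.5, 3.5, 4.5])        # non-zero on every row
X = np.column_stack([onehot, dense])
merged, bundles = bundle_exclusive(X)
print(bundles)       # [[0, 1, 2], [3]]
print(merged[:, 0])  # [1. 2. 3. 2. 1.]
```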

---

## 2. Comparison: LightGBM vs. XGBoost

| Feature | **XGBoost** | **LightGBM** |
| :--- | :--- | :--- |
| **Tree Growth** | **Level-wise** (Horizontal/Balanced) | **Leaf-wise** (Vertical/Asymmetrical) |
| **Split Finding** | Pre-sorted (Exact) or Histogram (Approx) | **Histogram-based** (Fast & Low Memory) |
| **Categorical Data** | One-Hot/Label Encoding (recent versions add native support via `enable_categorical=True`) | **Native Support** (Auto-handles categories) |
| **Missing Values** | Auto-learned direction | Auto-learned direction |
| **Memory Usage** | Higher | **Very Low** |
| **Speed** | Fast | **Very Fast** (often faster than XGBoost, especially on large data) |
| **Best For** | Accuracy on medium data | Large datasets, High-dimensional data |

---

## 3. Handling Categorical Features
One of LightGBM's biggest advantages is **Native Categorical Support**.
* You do **not** need to One-Hot Encode your data.
* You declare the columns as categorical (pandas `category` dtype, or the `categorical_feature` parameter).
* LightGBM partitions categorical features by using a "Many-vs-Many" split strategy, which is often more accurate than One-Hot encoding for high-cardinality features.
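Here is a sketch of the sorting trick behind this kind of split. Sort the categories by their gradient statistics, then scan only the $k-1$ prefixes of that order instead of all $2^{k-1}-1$ two-way partitions. `best_categorical_split` is a hypothetical helper. LightGBM adds further safeguards (for example `max_cat_to_onehot` and `cat_smooth`).

```python
import numpy as np


def best_categorical_split(codes, g, h, lam=1.0):
    """Partition categories into two groups for a gradient-boosted split.

    Categories are sorted by sum_gradient / (sum_hessian + lam); the best
    split is then searched among the k - 1 prefixes of that ordering, so the
    search is linear in the number of categories instead of exponential.
    """
    cats = np.unique(codes)
    G = np.array([g[codes == c].sum() for c in cats])
    H = np.array([h[codes == c].sum() for c in cats])
    order = np.argsort(G / (H + lam))
    G_L, H_L = np.cumsum(G[order])[:-1], np.cumsum(H[order])[:-1]
    G_all, H_all = G.sum(), H.sum()
    gain = (G_L**2 / (H_L + lam) + (G_all - G_L)**2 / (H_all - H_L + lam)
            - G_all**2 / (H_all + lam))
    k = int(np.argmax(gain))
    return set(cats[order[: k + 1]].tolist()), gain[k]


codes = np.tile([0, 1, 2, 3], 25)                  # 4 categories, 25 rows each
y = np.where(np.isin(codes, [1, 3]), 5.0, 0.0)
left, gain = best_categorical_split(codes, g=-y, h=np.ones_like(y))
print(left)  # {1, 3}
```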

---

## 4. Key Hyperparameters

### Control Overfitting
Because Leaf-wise growth is aggressive, these parameters are vital:

1.  **`num_leaves`**: The main parameter to control complexity. Theoretical limit is $2^{\text{max\_depth}}$, but usually set smaller to prevent overfitting.
2.  **`max_depth`**: Explicitly limits how deep the tree can grow.
3.  **`min_data_in_leaf`**: Minimum samples required in a leaf. Setting this high prevents the tree from picking up noise.

### Tuning Speed
1.  **`feature_fraction`**: Randomly select a subset of features for each iteration (like `colsample_bytree` in XGB).
2.  **`bagging_fraction`**: Randomly select a subset of data (like `subsample` in XGB). It only takes effect when `bagging_freq` is greater than 0.

---

## 5. Summary: When to Use LightGBM?

**Use When:**
* You have **Huge Datasets** ($100k+$ rows) and speed is a concern.
* You have **High-dimensional** sparse data.
* You have many **Categorical Features** and don't want to deal with encoding.
* You have limited RAM/Memory.

 **Be Careful When:**
* **Small Datasets:** Leaf-wise growth can overfit easily. If used, limit `max_depth`.
* **Noise:** It is sensitive to noise in the data due to its aggressive splitting.

| Parameter          | scikit-learn API name               | Type   | Description                                              |
| ------------------ | ----------------------------------- | ------ | -------------------------------------------------------- |
| `num_leaves`       | same                                | int    | Max number of leaves in one tree.                        |
| `max_depth`        | same                                | int    | Max depth of each tree (`-1` = no limit).                |
| `learning_rate`    | same                                | float  | Shrinks weight of new trees.                             |
| `n_estimators`     | same (native API: `num_iterations`) | int    | Number of boosting rounds.                               |
| `min_data_in_leaf` | `min_child_samples`                 | int    | Minimum number of samples per leaf.                      |
| `feature_fraction` | `colsample_bytree`                  | float  | Fraction of features used per tree.                      |
| `bagging_fraction` | `subsample`                         | float  | Fraction of data sampled for each tree.                  |
| `bagging_freq`     | `subsample_freq`                    | int    | Frequency for bagging (0 = disabled).                    |
| `lambda_l1`        | `reg_alpha`                         | float  | L1 regularization.                                       |
| `lambda_l2`        | `reg_lambda`                        | float  | L2 regularization.                                       |
| `objective`        | same                                | string | Learning task, e.g. `regression`, `binary`, `multiclass`. |



## Common objectives:

* **Regression:** `regression`, `huber`, `fair`

* **Binary classification:** `binary`

* **Multi-class classification:** `multiclass`




| Feature           | LGBMRegressor                | LGBMClassifier                   |
| ----------------- | ---------------------------- | -------------------------------- |
| Task              | Regression (predict numbers) | Classification (predict classes) |
| Objective         | `regression` (default)       | `binary`, `multiclass`           |
| Output            | Continuous values            | Class labels / probabilities     |
| Evaluation Metric | RMSE, MAE, R²                | Accuracy, AUC, Log Loss          |


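```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data (stand-in for your own X, y)
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LGBMRegressor(n_estimators=100, learning_rate=0.1, num_leaves=31)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# RMSE = sqrt(MSE); the squared=False argument is deprecated/removed in recent scikit-learn
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
```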


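```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Classification needs discrete labels, so create a separate dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LGBMClassifier(n_estimators=100, learning_rate=0.1, num_leaves=31)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```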


# LightGBM Strategy Guide

**LightGBM** (Light Gradient Boosting Machine) is the "Speedster" of the gradient boosting family. It is optimized for high efficiency, low memory usage, and handling large-scale data.

---

### 1. When to Choose LightGBM?
You should prefer LightGBM over Random Forest or standard XGBoost when:
* **Large Datasets:** You have $>10,000$ samples (often millions).
* **High Dimensions:** You have many features (wide data).
* **Speed is Critical:** You need fast training times (iterating quickly).
* **Categorical Features:** Your data contains many categories (it handles them natively without one-hot encoding).

---

### 2. Key Advantages

| Advantage | Powered By... |
| :--- | :--- |
| **Extreme Speed** | **Histogram Algorithm** (buckets continuous values) + **GOSS** (Gradient-based One-Side Sampling). |
| **Low Memory** | **EFB** (Exclusive Feature Bundling) + Histogram binning (uses integers instead of floats). |
| **High Accuracy** | **Leaf-wise Growth** (Can model complex, non-linear patterns better than level-wise). |
| **Convenience** | Native handling of **Categorical Features** and **Missing Values**. |

---

### 3. Deep Dive: Leaf-wise Growth

LightGBM uses a different tree-growing strategy than XGBoost.

* **XGBoost (Level-wise, its default):** Grows the tree horizontally. It splits all nodes at the same depth. It is balanced and "safe" but slower.
* **LightGBM (Leaf-wise):** It picks the **single leaf** with the highest loss reduction (split gain) and splits it. It creates deeper, asymmetrical trees.
    * *Benefit:* More efficient; focuses on the "hard" parts of the data.
    * *Risk:* Can overfit easily on small datasets if not controlled.



---

### 4. Hyperparameter Tuning Priority
Tuning LightGBM requires a specific order to get the best results without over-complicating.

**Priority 1: The Core (Structure)**
* **`num_leaves`**: The most important parameter. Controls complexity. (Theoretical max $\approx 2^{max\_depth}$).
* **`max_depth`**: Limit this to prevent the tree from growing too deep and overfitting.

**Priority 2: The Learning (Optimization)**
* **`learning_rate`** & **`n_estimators`**: Lower learning rate + Higher estimators usually = Better accuracy (but slower). Use **Early Stopping**.

**Priority 3: Regularization (Prevent Overfitting)**
* **`min_data_in_leaf`**: Very important for leaf-wise growth. Prevents the model from isolating noise in a leaf.
* **`lambda_l1` / `lambda_l2`**: Standard regularization.

**Priority 4: Sampling (Speed & Diversity)**
* **`feature_fraction`**: Randomly select subsets of features (like `colsample_bytree`).
* **`bagging_fraction`**: Randomly select subsets of data rows.

---

### 5. Missing Value Handling
**How it works:**
LightGBM does **not** need imputation (filling with mean/median).
* During training, it learns the "best direction" (left or right) to send missing values for every single split.
* It calculates which direction reduces the loss the most and assigns `NaN` to that path.
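The default-direction rule can be sketched in a few lines. Score the split once with missing rows sent left and once with them sent right, then keep the better side. The sketch assumes squared-error-style gradient/Hessian sums and a fixed candidate threshold. `split_with_default_direction` is a hypothetical helper; XGBoost's sparsity-aware split finding uses the same idea.

```python
import numpy as np


def split_with_default_direction(x, g, h, threshold, lam=1.0):
    """Score a split twice, sending NaN rows left or right, and keep the better one."""
    missing = np.isnan(x)
    left = np.zeros(len(x), dtype=bool)
    left[~missing] = x[~missing] < threshold       # compare only non-missing values

    def score(mask):
        return g[mask].sum() ** 2 / (h[mask].sum() + lam)

    gains = {}
    for direction in ("left", "right"):
        go_left = left | missing if direction == "left" else left
        gains[direction] = score(go_left) + score(~go_left) - score(np.ones_like(go_left))
    best = max(gains, key=gains.get)
    return best, gains


x = np.array([1.0, 2.0, np.nan, 8.0, 9.0, np.nan])
y = np.array([0.0, 0.0, 10.0, 10.0, 10.0, 10.0])
best, gains = split_with_default_direction(x, g=-y, h=np.ones_like(y), threshold=5.0)
print(best)  # right: the missing rows behave like the large-x rows
```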

---

### Summary
> **LightGBM** = Fast Gradient Boosting using **leaf-wise trees**.
> * **Regressor** $\rightarrow$ Predicts numeric values.
> * **Classifier** $\rightarrow$ Predicts classes.
> * **Winning Edge:** Handles large data, categories, and missing values efficiently.
> * **Watch Out:** Always tune `num_leaves` and `max_depth` to prevent overfitting.

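```python
import lightgbm as lgb
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Prepare data (synthetic stand-in for your own X, y)
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LGBMClassifier(
    n_estimators=1000,          # Large number; early stopping picks the best round
    learning_rate=0.05,
    num_leaves=31,
    max_depth=-1,               # -1 = no depth limit (num_leaves controls complexity)
    min_child_samples=20,       # Alias of min_data_in_leaf
    subsample=0.8,              # Bagging fraction
    subsample_freq=1,           # Bagging only takes effect when this is > 0
    colsample_bytree=0.8,       # Feature fraction
    reg_alpha=0.1,              # L1
    reg_lambda=0.1,             # L2
    random_state=42,
    n_jobs=-1,
    importance_type='gain'
)

# Train with early stopping (LightGBM >= 4.0 configures it via callbacks)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    eval_metric='logloss',
    callbacks=[lgb.early_stopping(stopping_rounds=50), lgb.log_evaluation(period=10)]
)

print(f"Best iteration: {model.best_iteration_}")
print(f"Best score: {model.best_score_}")
```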

# CatBoost (Categorical Boosting)

**CatBoost** (developed by **Yandex**) stands for **"Category Boosting"**.
It is a high-performance open-source library for gradient boosting on decision trees.

**The "Killer Feature":**
It handles **Categorical Features** natively. You only declare which columns are categorical (`cat_features`); there is no need to preprocess them with One-Hot Encoding or Label Encoding.

---

### 1. Key Innovations

#### A. Native Categorical Handling (Target Statistics)
Most algorithms require you to convert text categories into numbers before training.
* **Standard approach:** One-Hot Encoding (Explodes dimensionality) or Label Encoding (Imposes false order).
* **CatBoost approach:** It converts categories into numbers using **Target Statistics** (the average value of the target for that category).
    * *Twist:* To prevent overfitting (Target Leakage), it uses **Ordered Target Statistics**. It effectively shuffles the data and calculates the average target for a category based only on the rows *before* the current one in that random permutation.
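Here is a minimal sketch of ordered target statistics for one permutation. It assumes a mean-target encoding smoothed toward a `prior` (commonly the global target mean) with strength `a`. `ordered_target_statistics` is a hypothetical helper; CatBoost itself combines several permutations and further options.

```python
import numpy as np


def ordered_target_statistics(categories, y, prior, a=1.0, rng=None):
    """Encode one categorical column with ordered target statistics.

    Rows are visited in a random permutation. Each row is encoded using only
    the targets of *earlier* rows of the same category, smoothed toward
    `prior` with strength `a`, so no row ever sees its own label.
    """
    rng = np.random.default_rng() if rng is None else rng
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for i in rng.permutation(len(y)):
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)     # uses earlier rows only
        sums[c], counts[c] = s + y[i], n + 1       # then add this row's target
    return encoded


cats = ["A", "B", "A", "A", "B"]
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
print(ordered_target_statistics(cats, y, prior=y.mean(), rng=np.random.default_rng(0)))
```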

#### B. Ordered Boosting (Solving Prediction Shift)
Standard Gradient Boosting suffers from **Prediction Shift** (a type of target leakage). The model calculates residuals using the same data points it trains on, leading to biased gradients.
* **CatBoost Solution:** It uses a permutation-driven approach. It maintains multiple random permutations of the dataset to calculate residuals for a data point using a model trained *only* on other data points.
* **Result:** Reduces overfitting significantly, especially on **Small Datasets**.

#### C. Symmetric Trees (Oblivious Trees)
XGBoost and LightGBM build flexible trees (Level-wise or Leaf-wise).
CatBoost builds **Symmetric (Oblivious) Trees**.
* **Concept:** In a symmetric tree, the same split condition is applied to **all nodes** at the same depth.
* **Example:** If Depth 1 splits on "Age > 30", *every* node at that level splits on "Age > 30".
* **Benefits:**
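    1.  **Extremely Fast Prediction:** Each level applies a single test, so evaluating a tree of depth $d$ reduces to computing a leaf index from $d$ comparisons. This simple structure is cache- and vectorization-friendly.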
    2.  **Less Overfitting:** The structure is constrained and regularized.
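Because every node at depth $k$ shares one test, an oblivious tree can be evaluated without walking branches: the $d$ test results become the bits of the leaf index. Here is a NumPy sketch in which `predict_oblivious`, the feature columns (0 = age, 1 = income) and the leaf values are all hypothetical.

```python
import numpy as np


def predict_oblivious(X, features, thresholds, leaf_values):
    """Evaluate a symmetric (oblivious) tree of depth d = len(features).

    Level k applies the same test X[:, features[k]] > thresholds[k] to every
    node, so the d boolean results form the bits of a leaf index in [0, 2**d).
    """
    index = np.zeros(len(X), dtype=np.int64)
    for k, (f, t) in enumerate(zip(features, thresholds)):
        index |= (X[:, f] > t).astype(np.int64) << k   # bit k = outcome at depth k
    return leaf_values[index]


# Depth-2 tree: level 0 tests "age > 30", level 1 tests "income > 50000"
X = np.array([[25, 40_000], [35, 40_000], [25, 60_000], [35, 60_000]])
leaf_values = np.array([10.0, 20.0, 30.0, 40.0])
print(predict_oblivious(X, features=[0, 1], thresholds=[30, 50_000], leaf_values=leaf_values))
# [10. 20. 30. 40.]
```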



---

### 2. Comparison: The "Big Three"

| Feature | **XGBoost** | **LightGBM** | **CatBoost** |
| :--- | :--- | :--- | :--- |
| **Categorical Data** | One-Hot/Label (native `enable_categorical` in recent versions) | Native (Good) | **Native (Best)** |
| **Tree Structure** | Level-wise | Leaf-wise (Asymmetric) | **Symmetric (Balanced)** |
| **Overfitting** | Good regularization | Prone on small data | **Very Robust** (Ordered Boosting) |
| **Speed** | Fast | **Fastest Training** | **Fastest Prediction** |
| **Tuning** | Needs Tuning | Needs Tuning | **Strong Defaults** (often little tuning needed) |

---

### 3. Key Features

1.  **Robust to Overfitting:** Due to Ordered Boosting, it works exceptionally well on small datasets where other boosting models might memorize the noise.
2.  **Missing Values:** Like XGBoost/LightGBM, it handles missing values (`NaN`) in numeric features automatically.
3.  **GPU Support:** Efficient GPU implementation for faster training.
4.  **Feature Importance:** Provides built-in methods (`model.get_feature_importance()`) to understand which features drive predictions.

---

### 4. How to Use It? (Workflow)

CatBoost is famous for providing great results with default hyperparameters.

**Python Example:**
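```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

# Toy data (illustrative only): columns 0 and 2 hold raw string categories
rng = np.random.default_rng(42)
n = 500
X_train = pd.DataFrame({
    "city": rng.choice(["Paris", "Berlin", "Tokyo"], size=n),
    "age": rng.integers(18, 70, size=n),
    "plan": rng.choice(["free", "pro", "team"], size=n),
    "income": rng.normal(50_000, 15_000, size=n),
})
y_train = ((X_train["plan"] != "free") & (X_train["age"] > 30)).astype(int)

# Indices of the categorical columns (column names also work for DataFrames)
cat_features = [0, 2]

model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.1,
    depth=6,
    cat_features=cat_features,  # Pass indices directly; no manual encoding needed
    verbose=False
)

# Fit the model on the raw (unencoded) features
model.fit(X_train, y_train)
```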

| Parameter             | Type   | Description                               |
| --------------------- | ------ | ----------------------------------------- |
| `iterations`          | int    | Number of trees / boosting rounds.        |
| `learning_rate`       | float  | Shrinks weight of new trees.              |
| `depth`               | int    | Depth of each tree.                       |
| `l2_leaf_reg`         | float  | L2 regularization coefficient.            |
| `border_count`        | int    | Number of splits for numeric features.    |
| `bagging_temperature` | float  | Intensity of the Bayesian bootstrap that randomly weights samples (0 = all weights equal). |
| `random_seed`         | int    | Seed for reproducibility.                 |
| `task_type`           | string | `'CPU'` or `'GPU'`.                       |
| `loss_function`       | string | Objective function for task.              |
| `eval_metric`         | string | Metric for validation.                    |


## Common loss_function values:

* **Regression:** `RMSE`, `MAE`, `Quantile`

* **Binary classification:** `Logloss`, `CrossEntropy`

* **Multi-class classification:** `MultiClass`, `MultiClassOneVsAll`



| Feature              | CatBoostRegressor            | CatBoostClassifier               |
| -------------------- | ---------------------------- | -------------------------------- |
| Task                 | Regression (predict numbers) | Classification (predict classes) |
| Loss function        | RMSE, MAE                    | Logloss, CrossEntropy            |
| Output               | Continuous values            | Class labels / probabilities     |
| Categorical Features | Native (declare via `cat_features`) | Native (declare via `cat_features`) |







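```python
import numpy as np
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Separate validation split: with eval_set, CatBoost keeps the best iteration on it by default
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

model = CatBoostRegressor(iterations=1000, learning_rate=0.1, depth=6, verbose=0)
model.fit(X_tr, y_tr, eval_set=(X_val, y_val))
y_pred = model.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
```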


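```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Classification needs discrete labels, so create a separate dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

model = CatBoostClassifier(iterations=1000, learning_rate=0.1, depth=6, verbose=0)
model.fit(X_tr, y_tr, eval_set=(X_val, y_val))  # keep the test set unseen
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```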


**Mention categorical feature handling without one-hot encoding.**

**Ordered boosting → reduces overfitting on small datasets.**

**Symmetric trees → faster prediction.**

**Hyperparameter tuning: iterations, learning_rate, depth, l2_leaf_reg.**



CatBoost = Gradient Boosting with native categorical handling & ordered boosting.

Regressor → numeric output, Classifier → class labels.

Reduces overfitting → good for small & medium datasets.

Efficient, accurate, GPU-compatible.



| Feature                           | **XGBoost**                                                                                                                                                         | **LightGBM**                                                                                                                                                                                            | **CatBoost**                                                                                                                                                                        |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Definition**                    | eXtreme Gradient Boosting; gradient boosting using decision trees with regularization                                                                               | Light Gradient Boosting Machine; faster gradient boosting using **leaf-wise growth**                                                                                                                    | Categorical Boosting; gradient boosting with **native categorical handling** and **ordered boosting**                                                                               |
| **Key Idea**                      | Sequentially builds trees to reduce residual error | Leaf-wise tree growth → splits the leaf with the largest loss reduction → faster & often more accurate | Ordered boosting → reduces overfitting, handles categorical features automatically |
| **Why Use / Advantages**          | - Accurate, robust<br>- Handles missing values<br>- Regularization to reduce overfitting<br>- Widely used in competitions                                           | - Very fast & memory efficient<br>- Handles large datasets<br>- Native categorical support<br>- High accuracy due to leaf-wise trees                                                                    | - Handles categorical features **without encoding**<br>- Reduces overfitting on small datasets<br>- Symmetric trees → faster prediction<br>- GPU support                            |
| **Why Not / Disadvantages**       | - Slower than LightGBM on large datasets<br>- More memory usage<br>- Sensitive to hyperparameters                                                                   | - Can overfit on small datasets (leaf-wise growth)<br>- Slightly complex hyperparameter tuning                                                                                                          | - Slower than LightGBM on very large datasets<br>- Slightly higher memory usage<br>- Less flexible for some advanced tasks                                                          |
| **When to Use**                   | - Small/medium datasets<br>- Need highly accurate model<br>- Want **regularization control**                                                                        | - Very large datasets<br>- Need **fast training** & prediction<br>- Want high accuracy and memory efficiency                                                                                            | - Datasets with **categorical features**<br>- Small to medium datasets<br>- Want low overfitting on small samples                                                                   |
| **When Not to Use**               | - Extremely large datasets where speed is critical                                                                                                                  | - Small datasets prone to overfitting                                                                                                                                                                   | - Extremely large datasets where memory & speed matter more than categorical handling                                                                                               |
| **Key Hyperparameters**           | - `n_estimators` (trees)<br>- `learning_rate`<br>- `max_depth`<br>- `min_child_weight`<br>- `subsample`, `colsample_bytree`<br>- `gamma`, `reg_alpha`, `reg_lambda` | - `num_leaves` (leaf nodes)<br>- `max_depth`<br>- `learning_rate`<br>- `n_estimators`<br>- `min_data_in_leaf`<br>- `feature_fraction`, `bagging_fraction`, `bagging_freq`<br>- `lambda_l1`, `lambda_l2` | - `iterations`<br>- `depth`<br>- `learning_rate`<br>- `l2_leaf_reg`<br>- `border_count` (numeric splits)<br>- `bagging_temperature`<br>- `loss_function`<br>- `task_type` (CPU/GPU) |
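| **Handling Categorical Features** | Traditionally encoded manually (one-hot, label encoding); recent versions offer `enable_categorical=True` | Native support (pandas `category` dtype or `categorical_feature`); manual encoding also works | Fully native, no encoding needed (declare columns via `cat_features`) |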
| **Training Speed**                | Moderate                                                                                                                                                            | Very fast                                                                                                                                                                                               | Moderate                                                                                                                                                                            |
| **Prediction Speed**              | Fast | Fast | Fastest (symmetric trees) |
| **Memory Usage**                  | High                                                                                                                                                                | Low                                                                                                                                                                                                     | Moderate                                                                                                                                                                            |
