# LightGBM
LightGBM stands for Light Gradient Boosting Machine. It is a gradient boosting framework developed by Microsoft that is designed to be:

- Faster than other GBM implementations

- More memory efficient

- Able to handle large-scale data

Like XGBoost, it's an ensemble method that uses decision trees, but it employs several novel techniques to achieve its performance advantages.

### Key Philosophy: Why be "Light"?
Traditional boosting implementations (including XGBoost's default) grow trees level-wise (horizontally). LightGBM grows trees leaf-wise (vertically): it always splits the leaf with the largest loss reduction. This usually reaches a lower loss for the same number of leaves, but it can overfit small datasets if not properly regularized.

### How it Works
LightGBM shares the same fundamental gradient boosting framework with XGBoost. The objective function is similar:
Obj(θ) = Σ_i L(y_i, ŷ_i) + Σ_k Ω(f_k)

Where the regularization term Ω(f_k) also penalizes the number of leaves and the L2 norm of leaf weights.

However, its key differentiators lie in HOW it builds the trees.

## Gradient-Based One-Side Sampling (GOSS)
Idea: Not all data points are equally important for boosting. Data points with larger gradients (i.e., those that are poorly predicted) contribute more to the information gain.

How it works:

1. Sort the training instances by the absolute value of their gradients.

2. Keep the top a% of instances with the largest gradients.

3. Randomly sample b% of the full dataset from the remaining instances (those with small gradients).

4. When computing the gain for splits, amplify the weight of the sampled small-gradient instances by the constant factor (1-a)/b, with a and b expressed as fractions.

This allows LightGBM to focus computational resources on the "difficult" examples while approximately preserving the original data distribution, leading to faster training with minimal accuracy loss.
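
The sampling steps above can be sketched with NumPy. This is a minimal illustration, not LightGBM's internal implementation: the function name `goss_sample`, the default values of `a` and `b` (given as fractions), and the seed are assumptions made for the example.

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, seed=0):
    """Return (indices, weights) picked by a GOSS-style sampler.

    a and b are fractions of the full dataset (illustrative defaults).
    """
    rng = np.random.default_rng(seed)
    n = len(grad)
    n_top = round(a * n)
    n_rand = round(b * n)

    # Sort by absolute gradient, largest first.
    order = np.argsort(-np.abs(grad))
    top_idx = order[:n_top]
    rest_idx = order[n_top:]

    # Uniformly sample b * n instances from the small-gradient remainder.
    rand_idx = rng.choice(rest_idx, size=n_rand, replace=False)

    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    # Up-weight the sampled small-gradient rows by (1 - a) / b.
    weights[n_top:] = (1.0 - a) / b
    return idx, weights

grad = np.random.default_rng(42).normal(size=1000)
idx, w = goss_sample(grad)
print(len(idx), w.sum())
```

With a = 0.2 and b = 0.1, only 30% of the rows are used, yet the weights sum back to the original row count, which is why the gain estimates stay close to those computed on the full data.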

## Exclusive Feature Bundling (EFB)
Idea: In high-dimensional data, many features are sparse and often mutually exclusive (they never take non-zero values simultaneously). EFB bundles these features together to reduce the dimensionality.

How it works:

1. Identify exclusive features: find features that rarely take non-zero values simultaneously.

2. Bundle them: merge these features into a single "bundled" feature.

Result: The number of features is significantly reduced, speeding up the training process without losing much information.

This is particularly effective for one-hot encoded categorical features.
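
The bundling idea can be sketched as follows. This is a simplified illustration that assumes strictly exclusive, non-negative integer features and merges all columns into one bundle; the function name `bundle_exclusive` is hypothetical, and the real algorithm also tolerates a small rate of conflicts and decides which features go into which bundle.

```python
import numpy as np

def bundle_exclusive(X):
    """Merge columns that are never non-zero together into one column.

    Assumes non-negative integer-valued (e.g. already binned) features.
    Returns the bundled column and the offset added to each feature.
    """
    X = np.asarray(X)
    nonzero = X != 0
    # Exclusivity check: no row may have more than one non-zero entry.
    if (nonzero.sum(axis=1) > 1).any():
        raise ValueError("features conflict; they are not mutually exclusive")

    # Shift each feature into its own value range so values stay distinguishable.
    max_vals = X.max(axis=0)
    offsets = np.concatenate([[0], np.cumsum(max_vals[:-1])])
    bundled = np.where(nonzero, X + offsets, 0).sum(axis=1)
    return bundled, offsets

# Three one-hot columns collapse into a single feature with values 1, 2, 3.
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])
bundled, offsets = bundle_exclusive(X)
print(bundled)  # [1 2 3 2]
```

Because each original feature occupies its own value range inside the bundle, a split on the bundled column can still separate the original features from one another.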

## Leaf-Wise (Best-First) Tree Growth
This is the most significant difference from XGBoost's default approach.

XGBoost (Level-Wise): Grows the tree level by level. At each level, all leaves are split simultaneously. This can be inefficient as it may split even leaves that contribute little to the loss reduction.

LightGBM (Leaf-Wise): At each step, it identifies the leaf that will yield the largest reduction in the loss and splits only that leaf. This results in much deeper trees for the same number of leaves and often achieves lower loss.

Trade-off: Leaf-wise growth is more prone to overfitting, especially on small datasets, which is why LightGBM has important regularization parameters like num_leaves and min_data_in_leaf.
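
Best-first growth can be sketched with a priority queue. This is a toy model under stated assumptions: the split gains are precomputed and hypothetical, node `i` has children `2*i + 1` and `2*i + 2`, and the function name `grow_leaf_wise` is made up for the example. A real implementation computes each gain from the data in that leaf.

```python
import heapq

def grow_leaf_wise(gain, num_leaves):
    """Always split the current leaf with the largest gain.

    `gain` maps a node id to the (precomputed, hypothetical) loss reduction
    of splitting it. Returns the node ids in the order they were split.
    """
    heap = [(-gain[0], 0)]  # max-heap via negated gains
    splits, leaves = [], 1
    while heap and leaves < num_leaves:
        neg_gain, node = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no remaining leaf is worth splitting
        splits.append(node)
        leaves += 1  # one leaf becomes two
        for child in (2 * node + 1, 2 * node + 2):
            if child in gain:
                heapq.heappush(heap, (-gain[child], child))
    return splits

# Toy gains: the useful splits are all in the left subtree.
gain = {0: 10.0, 1: 8.0, 2: 0.5, 3: 6.0, 4: 0.2, 5: 0.1, 6: 0.1}
print(grow_leaf_wise(gain, num_leaves=4))  # [0, 1, 3]
```

With the same budget of four leaves, level-wise growth would split nodes 0, 1, and 2, spending a split on node 2 (gain 0.5) instead of node 3 (gain 6.0).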


## LightGBM vs XGBoost

| Aspect | LightGBM | XGBoost |
|---|---|---|
| Tree growth | Leaf-wise | Level-wise (default) |
| Speed | Faster | Slower |
| Memory usage | Lower | Higher |
| Handling large data | Excellent | Good |
| Categorical features | Native support (no need for one-hot encoding) | Traditionally requires one-hot encoding or preprocessing |



# When to Choose LightGBM vs XGBoost

## Choose LightGBM when:

- You have very large datasets (millions of rows)

- Training time is critical

- You have limited computational resources (memory)

- Your data has many categorical features

- You're dealing with high-dimensional data

## Choose XGBoost when:

- Your dataset is small to medium-sized

- You want maximum accuracy and can afford longer training times

- You need robust results (less prone to overfitting on small data)

- You want fine-grained control over the training process

## Leaf-Wise Tree Growth (Important)

- XGBoost grows level-wise

- LightGBM grows leaf-wise (chooses the best leaf to split)

```text
         Root
       /      \
   Leaf A     Leaf B  <- LightGBM splits the leaf with the maximum gain
```

This gives:

- Often lower loss for the same number of leaves

- Deeper trees where the extra depth is useful

The risk is overfitting, which is mitigated by limiting max_depth, num_leaves, and min_data_in_leaf.

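
As a rough sketch, these limits are usually set together in the parameter dictionary. The values below are hypothetical starting points, not recommendations; the keys are standard LightGBM parameter names.

```python
# Hypothetical starting values; tune them for your data.
params = {
    "objective": "regression",
    "num_leaves": 31,        # main control on tree complexity
    "max_depth": 6,          # caps how deep leaf-wise growth can go
    "min_data_in_leaf": 20,  # avoids leaves fitted to a handful of rows
    "lambda_l2": 1.0,        # L2 penalty on leaf weights
}

# A tree of depth d has at most 2**d leaves, so a larger num_leaves is never reached.
max_possible_leaves = 2 ** params["max_depth"]
assert params["num_leaves"] <= max_possible_leaves
print(max_possible_leaves)  # 64
```

Keeping num_leaves well below 2**max_depth is a common way to keep leaf-wise trees from growing as complex as a full level-wise tree of the same depth.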
- CatBoost Regressor → numeric output

- CatBoost Classifier → class probability output

- Core boosting logic is almost identical

- Only loss function and output mapping changes

## Key Difference

In regression, CatBoost predicts numbers.

In classification, CatBoost predicts probabilities (0 to 1 for binary, or softmax for multi-class).

Internally, the tree structure, ordered boosting, and categorical handling are almost identical.

Only the loss function and gradient calculation change between Regressor and Classifier.


# LightGBM Regressor vs Classifier 
## Problem Type & Output
| LightGBM Classifier | LightGBM Regressor |
|---|---|
| Classification problems | Regression problems |
| Predicts class labels or probabilities | Predicts continuous values |
| Output: 0/1, or [0.3, 0.7] | Output: 125.7, -2.45 |

## Objectives & Metrics
| Classifier | Regressor |
|---|---|
| binary (binary classification) | regression (L2 loss) |
| multiclass (softmax) | regression_l1 (MAE) |
| multiclassova (one-vs-all) | huber (Huber loss) |
| cross_entropy | fair (Fair loss) |
| eval_metric: auc, binary_logloss, binary_error | eval_metric: l2, l1, rmse, mape |


## Use LightGBM Classifier when:
- Predicting categories (yes/no, spam/not spam, multiple classes)

- Need probability outputs

- Working with classification metrics (accuracy, precision, AUC)

## Use LightGBM Regressor when:
- Predicting continuous numerical values

- Forecasting quantities (sales, prices, temperatures)

- Working with regression metrics (RMSE, MAE, R²)


## Key Difference

In regression, LightGBM predicts numbers directly.

In classification, LightGBM predicts probabilities using sigmoid (binary) or softmax (multi-class).

Tree-building, leaf-wise growth, histogram binning, and categorical handling remain the same.

Only loss function and output mapping differ.
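
The two pieces that differ can be sketched in a few lines. This is an illustrative sketch, not LightGBM's code: the helper names are hypothetical, and the formulas are the standard gradients and hessians of squared error and binary log loss with respect to the raw score.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def l2_grad_hess(y, raw):
    """Squared error 0.5 * (raw - y)**2: the raw score is the prediction."""
    return raw - y, np.ones_like(raw)

def logloss_grad_hess(y, raw):
    """Binary log loss on the raw score, with labels y in {0, 1}."""
    p = sigmoid(raw)
    return p - y, p * (1.0 - p)

raw = np.array([-2.0, 0.0, 3.0])
print(raw)           # regression: raw scores are returned as-is
print(sigmoid(raw))  # binary classification: probabilities in (0, 1)
print(softmax(np.array([[1.0, 2.0, 3.0]])))  # multi-class: each row sums to 1
```

The tree learner only ever sees the gradient and hessian arrays, which is why the same tree-building code serves both tasks.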




- LightGBM Regressor → numeric output

- LightGBM Classifier → probability output

- Core tree-building logic is identical

- Only loss function and output transformation differ



```python
import numpy as np


class LightGBMTree:
    """Histogram-based regression tree fitted on gradients and hessians.

    Simplification: nodes are split recursively up to max_depth rather than
    best-first (leaf-wise), so this shows binning and the split gain only.
    """

    def __init__(self, max_depth=3, min_samples_leaf=20, num_bins=16):
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.num_bins = num_bins
        self.tree = None

    def _to_bins(self, values, edges):
        # Map raw values to bin indices 0..num_bins-1; the column maximum and
        # unseen out-of-range values are clipped into the first or last bin.
        return np.clip(np.digitize(values, edges) - 1, 0, self.num_bins - 1)

    def _bin_data(self, X):
        bins = []
        X_binned = np.zeros_like(X, dtype=int)

        for j in range(X.shape[1]):
            col = X[:, j]
            # Equal-width bin edges over the observed range of the feature.
            edges = np.linspace(col.min(), col.max(), self.num_bins + 1)
            X_binned[:, j] = self._to_bins(col, edges)
            bins.append(edges)

        return X_binned, bins

    def _best_split(self, X, grad, hess):
        best_gain = 0.0  # a split must improve on the unsplit node
        best_feature = None
        best_bin = None
        parent_score = grad.sum() ** 2 / (hess.sum() + 1e-6)

        for j in range(X.shape[1]):
            # Splitting at the last bin would leave the right side empty.
            for b in range(self.num_bins - 1):
                left = (X[:, j] <= b)
                right = ~left

                if left.sum() < self.min_samples_leaf or right.sum() < self.min_samples_leaf:
                    continue

                G_left, H_left = grad[left].sum(), hess[left].sum()
                G_right, H_right = grad[right].sum(), hess[right].sum()

                # Second-order gain: children scores minus the parent score.
                gain = (G_left**2 / (H_left + 1e-6)) + (G_right**2 / (H_right + 1e-6)) - parent_score

                if gain > best_gain:
                    best_gain = gain
                    best_feature = j
                    best_bin = b

        return best_feature, best_bin

    def _build(self, X, grad, hess, depth):
        # Optimal leaf value under a second-order approximation: -G / H.
        if depth == self.max_depth:
            return {"leaf": -grad.sum() / (hess.sum() + 1e-6)}

        feature, split_bin = self._best_split(X, grad, hess)

        if feature is None:
            return {"leaf": -grad.sum() / (hess.sum() + 1e-6)}

        left = (X[:, feature] <= split_bin)
        right = ~left

        return {
            "feature": feature,
            "bin": split_bin,
            "left": self._build(X[left], grad[left], hess[left], depth + 1),
            "right": self._build(X[right], grad[right], hess[right], depth + 1),
        }

    def fit(self, X, grad, hess):
        X_binned, self.bins = self._bin_data(X)
        self.tree = self._build(X_binned, grad, hess, depth=0)

    def _predict_row(self, row, node):
        if "leaf" in node:
            return node["leaf"]

        feature = node["feature"]
        bin_value = self._to_bins(row[feature], self.bins[feature])

        if bin_value <= node["bin"]:
            return self._predict_row(row, node["left"])
        else:
            return self._predict_row(row, node["right"])

    def predict(self, X):
        return np.array([self._predict_row(x, self.tree) for x in X])


# Gradient Boosting using the custom LightGBM tree


class LightGBMRegressorCore:
    def __init__(self, n_estimators=10, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []

    def fit(self, X, y):
        pred = np.zeros_like(y, dtype=float)

        for _ in range(self.n_estimators):
            # Squared-error loss: gradient is (pred - y), hessian is 1.
            grad = pred - y
            hess = np.ones_like(y, dtype=float)

            tree = LightGBMTree(max_depth=self.max_depth)
            tree.fit(X, grad, hess)

            # Leaf values are already -G / H, so they are added, not subtracted.
            update = tree.predict(X)
            pred += self.learning_rate * update
            self.trees.append(tree)

    def predict(self, X):
        pred = np.zeros(X.shape[0])

        for tree in self.trees:
            pred += self.learning_rate * tree.predict(X)

        return pred
```


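```python
# If we can do this (Classifier):
from lightgbm import LGBMClassifier
model_clf = LGBMClassifier(
    n_estimators=1000,
    num_leaves=31,
    learning_rate=0.05,
    metric='auc'
)

# Then you automatically know this (Regressor):
from lightgbm import LGBMRegressor
model_reg = LGBMRegressor(
    n_estimators=1000,           # Same
    num_leaves=31,               # Same
    learning_rate=0.05,          # Same
    metric='rmse'                # Only this changes!
)

# In the scikit-learn API, categorical columns are usually declared at fit time
# (or by using the pandas 'category' dtype) rather than in the constructor.
# X_train and y_train are placeholders for your own data:
# model_clf.fit(X_train, y_train, categorical_feature=['category_col'])
# model_reg.fit(X_train, y_train, categorical_feature=['category_col'])  # Same
```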