# How to Beat the Heck Out of XGBoost with LightGBM: Comprehensive Tutorial
## Not anymore, XGBoost, not anymore
![](images/unsplash.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://unsplash.com/@grstocks?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>GR Stocks</a>
        on 
        <a href='https://unsplash.com/s/photos/win?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Unsplash.</a> All images are by the author unless specified otherwise.
    </strong>
</figcaption>

# Setup

```python
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import rcParams

rcParams["figure.figsize"] = [12, 9]
rcParams["xtick.labelsize"] = 15
rcParams["ytick.labelsize"] = 15

warnings.filterwarnings("ignore")
```

# Introduction

I am confused.

So many people are drawn to XGBoost like a moth to a flame. Yes, it has seen some glorious days in prestigious competitions, and it is still one of the most widely used ML libraries.

But it has been 4 years since XGBoost lost its top spot in terms of performance. Specifically, in 2017, Microsoft open-sourced LightGBM (Light Gradient Boosting Machine), which yields equally high accuracy while training 2-10 times faster.

This is a game-changing advantage considering the ubiquity of massive, million-row datasets. There are other distinctions that tip the scales towards LightGBM and (fill in...)

By the end of this post, you will learn:
- how to develop LightGBM models for classification and regression tasks
- how to implement a successful cross-validation strategy in LightGBM
- the most critical hyperparameters of LightGBM and how to tune them using Optuna

# XGBoost vs. LightGBM

When LightGBM (LGBM) was first released, it came with groundbreaking changes to the way it grows decision trees.

> Both XGBoost and LightGBM are ensemble algorithms. They combine many decision trees, which act as weak learners, to capture complex, non-linear patterns.
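
To make the idea of an ensemble of weak learners concrete, here is a minimal sketch of gradient boosting for squared-error regression. It assumes scikit-learn's `DecisionTreeRegressor` as the weak learner, and the helper names `fit_boosted_trees` and `predict_boosted` are hypothetical. This is not how XGBoost or LightGBM work internally (both use second-order gradient statistics, regularization, and much faster split finding); it only shows how many shallow trees, each fitted to the current residuals, add up to a non-linear model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    """Hypothetical minimal gradient boosting: each shallow tree fits the current residuals."""
    base = y.mean()                    # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred           # negative gradient of 0.5 * (y - pred) ** 2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrink each weak learner's contribution
        trees.append(tree)
    return base, trees


def predict_boosted(base, trees, X, learning_rate=0.1):
    """Sum the shrunken tree outputs on top of the constant base prediction."""
    return base + learning_rate * sum(tree.predict(X) for tree in trees)


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)  # a non-linear target

base, trees = fit_boosted_trees(X, y)
pred = predict_boosted(base, trees, X)

baseline_mse = np.mean((y - y.mean()) ** 2)
boosted_mse = np.mean((y - pred) ** 2)
print(f"constant baseline MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

Each depth-2 tree on its own is a poor model of the sine curve, but their shrunken sum is not; the learning rate here plays the same role as `learning_rate` in XGBoost and LightGBM.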

In XGBoost (and many other libraries), decision trees were built one level at a time:

![](https://lightgbm.readthedocs.io/en/latest/_images/level-wise.png)

This structure tended to produce unnecessary nodes and leaves because the trees kept growing until `max_depth` was reached. This led to higher model complexity and longer training runtime.

In contrast, LightGBM takes a leaf-wise approach:

![](https://lightgbm.readthedocs.io/en/latest/_images/leaf-wise.png)

The tree keeps growing from its most promising leaves (those whose split gives the largest loss reduction, or delta loss), while the total number of leaves is capped at a fixed limit. (If this doesn't make sense to you, don't sweat. It won't prevent you from using LGBM effectively.)
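
The difference between the two growth strategies can be sketched on a single feature with squared-error loss. The helpers `best_split`, `grow_level_wise`, `grow_leaf_wise`, and `total_sse` are hypothetical and pure NumPy; the real libraries use gradient statistics and histogram bins instead of scanning raw values. The leaf-wise version keeps a max-heap of leaves ordered by the loss reduction their best split would give and stops at a leaf budget, which is the role the `num_leaves` parameter plays in LightGBM. The level-wise version splits every node on each level, similar to XGBoost's default `grow_policy="depthwise"`.

```python
import heapq

import numpy as np


def best_split(x, y, idx):
    """Best squared-error split of the samples in `idx` on one feature: (gain, threshold)."""
    xs, ys = x[idx], y[idx]
    parent_sse = ((ys - ys.mean()) ** 2).sum()
    best_gain, best_thr = 0.0, None
    for thr in np.unique(xs)[:-1]:  # every threshold that leaves both sides non-empty
        left, right = ys[xs <= thr], ys[xs > thr]
        child_sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if parent_sse - child_sse > best_gain:
            best_gain, best_thr = parent_sse - child_sse, thr
    return best_gain, best_thr


def grow_level_wise(x, y, max_depth):
    """Split every node on each level until max_depth, even when the gain is tiny."""
    level, leaves = [np.arange(len(x))], []
    for _ in range(max_depth):
        next_level = []
        for idx in level:
            _, thr = best_split(x, y, idx)
            if thr is None:
                leaves.append(idx)  # nothing left to split in this node
            else:
                next_level += [idx[x[idx] <= thr], idx[x[idx] > thr]]
        level = next_level
    return leaves + level


def grow_leaf_wise(x, y, num_leaves):
    """Always split the leaf with the largest loss reduction until num_leaves is reached."""
    root = np.arange(len(x))
    gain, thr = best_split(x, y, root)
    heap = [(-gain, 0, root, thr)]  # negated gain makes heapq a max-heap; counter breaks ties
    leaves, counter = [], 1
    while heap and len(heap) + len(leaves) < num_leaves:
        _, _, idx, thr = heapq.heappop(heap)
        if thr is None:
            leaves.append(idx)  # the best remaining leaf cannot be split further
            continue
        for child in (idx[x[idx] <= thr], idx[x[idx] > thr]):
            g, t = best_split(x, y, child)
            heapq.heappush(heap, (-g, counter, child, t))
            counter += 1
    return leaves + [item[2] for item in heap]


def total_sse(y, leaves):
    """Sum of squared errors when each leaf predicts the mean target of its samples."""
    return sum(((y[idx] - y[idx].mean()) ** 2).sum() for idx in leaves)


rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 400)
# Detailed step structure only where x < 0.25; elsewhere the target is flat noise
y = np.where(x < 0.25, np.floor(x * 32), 0.0) + rng.normal(scale=0.05, size=400)

depthwise_leaves = grow_level_wise(x, y, max_depth=3)  # at most 2**3 = 8 leaves
leafwise_leaves = grow_leaf_wise(x, y, num_leaves=8)   # 8 leaves, depth unconstrained
print("level-wise:", len(depthwise_leaves), "leaves, SSE", round(total_sse(y, depthwise_leaves), 2))
print("leaf-wise: ", len(leafwise_leaves), "leaves, SSE", round(total_sse(y, leafwise_leaves), 2))
```

With the same budget of 8 leaves, the level-wise tree is likely to spend some splits on the flat, noisy part of the target, while the leaf-wise tree can keep splitting the region with the most structure, growing deeper there.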

This is one of the main reasons why LGBM crushed XGBoost in terms of speed when it first came out.

![image.png](attachment:31471688-061b-493c-84d5-aca91e5228f6.png)

Above is a benchmark comparison of XGBoost with traditional level-wise trees and LGBM with leaf-wise trees (first and last columns) on datasets with ~500k-13M samples. It shows that LGBM is many times faster than XGBoost.

LGBM also bins continuous features into histograms, which speeds up training even further compared with traditional gradient boosting. Binning numeric values drastically reduces the number of candidate split points the trees have to evaluate, and it removes the need to sort feature values, which is computationally expensive.
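
The speed-up from binning can be sketched in NumPy. Below, a feature is cut into at most 255 quantile bins (255 is LightGBM's default `max_bin`, although its actual bin construction is more involved than plain quantiles), and the best split is found by accumulating gradient and hessian sums per bin in a single pass, then scanning only the bin boundaries. The functions `quantile_bin` and `best_histogram_split` are hypothetical, and the gain formula is the unregularized second-order gain.

```python
import numpy as np


def quantile_bin(feature, max_bin=255):
    """Map a continuous feature to integer bin ids using simplified quantile edges."""
    inner_quantiles = np.linspace(0, 1, max_bin + 1)[1:-1]  # max_bin - 1 interior cut points
    edges = np.unique(np.quantile(feature, inner_quantiles))
    bins = np.searchsorted(edges, feature, side="right")   # bin id in 0..len(edges)
    return bins, edges


def best_histogram_split(bins, grad, hess, n_bins):
    """Scan bin boundaries using per-bin gradient/hessian sums; raw values are never sorted."""
    g_hist = np.bincount(bins, weights=grad, minlength=n_bins)  # one O(n) pass over the data
    h_hist = np.bincount(bins, weights=hess, minlength=n_bins)
    G, H = g_hist.sum(), h_hist.sum()
    gl, hl = np.cumsum(g_hist)[:-1], np.cumsum(h_hist)[:-1]    # stats left of each boundary
    gr, hr = G - gl, H - hl
    gain = np.full(n_bins - 1, -np.inf)
    ok = (hl > 0) & (hr > 0)
    # Simplified second-order gain without regularization terms
    gain[ok] = gl[ok] ** 2 / hl[ok] + gr[ok] ** 2 / hr[ok] - G ** 2 / H
    best = int(np.argmax(gain))
    return best, gain[best]


rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = (x > 0.5).astype(float) + rng.normal(scale=0.1, size=x.size)

# Squared error starting from a prediction of 0: grad = pred - y = -y, hess = 1
grad, hess = -y, np.ones_like(y)

bins, edges = quantile_bin(x)
n_bins = edges.size + 1
best_bin, best_gain = best_histogram_split(bins, grad, hess, n_bins)

print("candidate splits without binning:", np.unique(x).size - 1)
print("candidate splits with binning:   ", n_bins - 1)
print("chosen threshold (x < edge goes left):", round(float(edges[best_bin]), 3))
```

Instead of roughly 100,000 candidate thresholds, the split search only looks at 254 bin boundaries, and the histogram itself is built without sorting.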

Inspired by LGBM, XGBoost also introduced histogram binning, which gave it a massive speed-up, but still not enough to match LGBM:

![image.png](attachment:472c7f66-048b-4fd3-84e4-ad65419bdfa3.png)
<figcaption style="text-align: center;">
    <strong>
        Histogram-binning comparison - second and third columns.
    </strong>
</figcaption>

We will continue exploring the differences by using LGBM for various tasks.

# Model initialization, objectives and metrics

Like XGBoost, LGBM has two APIs: the core learning API and a Sklearn-compatible one. You know I am a big fan of Sklearn, so this tutorial will focus on the Sklearn version.

> The Sklearn-compatible APIs of XGBoost and LGBM allow you to integrate their models into the Sklearn ecosystem, so you can use them inside pipelines along with other transformers (TODO....)



# Categorical and missing values support

# Cross-validation with LightGBM

# Most important LightGBM hyperparameters

# LightGBM hyperparameter tuning with Optuna