# <p style='text-align: center;'> Light Gradient Boosting Classifier (LGBM) </p>

## What is Gradient Boosting?

Gradient boosting is a method in which weak learners are trained one after another and combined into a strong learner. Unlike a Random Forest, in which all trees are built independently, boosted trees often reach higher accuracy because each new tree learns from the errors of the previous ones.

One of the most popular implementations is XGBoost. It is popular on Kaggle and known for its speed and reliable performance on multiclass classification projects. XGBoost is only one of several types of Gradient Boosting Decision Trees (GBDT).


## The problem with Gradient Boosting Decision Trees

The trees in GBDTs are trained in sequence: each iteration evaluates the residual errors of the current model, and the next tree is fitted to reduce them. To find a split, a GBDT needs to compute the information gain across all instances and consider all possible split points. This is very time-consuming! As a result, with the emergence of big data, GBDTs face challenges, mainly because they train too slowly.
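To make the sequential training concrete, here is a minimal sketch of gradient boosting for squared error, using one-split trees (stumps) in plain Python. The toy data, the function names, and the learning rate are illustrative assumptions; this is not how XGBoost or LightGBM are implemented. Note how `fit_stump` scans every candidate threshold over all instances, which is exactly the expensive step described above.

```python
# Minimal gradient boosting for squared error with depth-1 trees (stumps).
# Toy data and all names are illustrative assumptions.

def fit_stump(xs, residuals):
    """Scan every candidate threshold and keep the split with the lowest squared error."""
    best = None
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit stumps one after another, each on the residuals of the current model."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        mse = sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)
        mse_history.append(mse)
    return base, stumps, mse_history


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0]
base, stumps, mse_history = boost(xs, ys)
print("MSE after first round:", mse_history[0])
print("MSE after last round: ", mse_history[-1])
```

The training error shrinks round by round because every new stump is fitted to what the previous ones still get wrong.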

They encountered this problem when running a grid search for an XGBoost model. Training the model took ages! So, when they found out about LightGBM, they were intrigued.

## What is Light Gradient Boosting?

LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:

- Faster training speed and higher efficiency.


- Lower memory usage.


- Better accuracy.


- Support of parallel and GPU learning.


- Capable of handling large-scale data.


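A couple of years ago, Microsoft announced its gradient boosting framework LightGBM. Nowadays, it steals the spotlight among gradient boosting machines, and Kagglers have started to use LightGBM more than XGBoost. In some comparisons, LightGBM was reported to train up to 6 times faster than XGBoost, although the actual speedup depends on the dataset and settings.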

LightGBM is a relatively new algorithm and has a long list of parameters, which are described in the LightGBM documentation.

Datasets are growing rapidly, and it has become very difficult for traditional data science algorithms to give accurate results at this scale. LightGBM is prefixed with "Light" because of its high speed. It can handle large datasets and needs less memory to run.

Another reason why LightGBM is so popular is that it focuses on the accuracy of results. It also supports GPU learning, which is why data scientists widely use LightGBM for data science application development.

It is not advisable to use LGBM on small datasets. Light GBM is sensitive to overfitting and can easily overfit small data.


**LightGBM** is one of the newer types of GBDT. It was developed by a team of researchers at Microsoft in 2016. To put it simply, LightGBM introduces two novel techniques that are not present in XGBoost. Their purpose is to help the algorithm cope with a large number of variables and data instances.

#### What are those novel features?
<b>1. Gradient-based One-Side Sampling (GOSS)</b>

This is a sampling method that reduces the amount of data that a decision tree uses for learning.

This sampling method considers the size of each instance's gradient, which reflects how large its training error still is. It keeps all instances where the gradient is still large. The instances with a small gradient are randomly sampled before being passed to the tree. As a result, each tree has to crunch through less data!
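The sampling idea can be sketched in a few lines of plain Python. This is a simplified illustration, not LightGBM's implementation: `goss_sample` is a hypothetical helper, the gradients are random toy values, and the names `top_rate` and `other_rate` are chosen to mirror LightGBM's GOSS parameters. The up-weighting factor `(1 - top_rate) / other_rate` follows the description in the LightGBM paper, where it compensates for the small-gradient instances that were dropped.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) of the instances kept by a GOSS-style sampler."""
    rng = random.Random(seed)
    n = len(gradients)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)

    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top = order[:n_top]                            # large gradients: always kept
    sampled = rng.sample(order[n_top:], n_other)   # small gradients: random subset

    # Up-weight the sampled small-gradient instances so they stand in for
    # the whole small-gradient group when split gains are computed.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return top + sampled, weights


# Toy gradients (assumption): 1000 values drawn from a standard normal.
rng = random.Random(42)
gradients = [rng.gauss(0.0, 1.0) for _ in range(1000)]
indices, weights = goss_sample(gradients)
print(len(indices), "of", len(gradients), "instances kept")
```

With these rates, the tree sees 30% of the rows while every large-gradient instance is still present.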

<b>2. Exclusive Feature Bundling (EFB)</b>

As you can guess from the name, this method reduces the number of features or variables. Very often, a large number of the features in your dataset are sparse (mostly zeros), especially if you work with many categorical variables. Many of these sparse features are also mutually exclusive, meaning they are rarely nonzero in the same row. This method bundles such mutually exclusive features together into a single feature.
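A minimal sketch of the bundling idea, assuming two non-negative integer features that are never nonzero in the same row. The helper names are hypothetical, and the sketch leaves out the parts of the real algorithm that decide which features to bundle and that tolerate a small number of conflicts. The trick is to shift the second feature's values by an offset so both value ranges fit into one column without overlapping.

```python
def are_exclusive(col_a, col_b):
    """Two features are mutually exclusive if they are never nonzero in the same row."""
    return all(a == 0 or b == 0 for a, b in zip(col_a, col_b))


def bundle(col_a, col_b):
    """Merge two exclusive, non-negative integer features into one column.

    Values of col_b are shifted by max(col_a) so the two ranges do not overlap.
    """
    offset = max(col_a)
    merged = []
    for a, b in zip(col_a, col_b):
        if a != 0:
            merged.append(a)
        elif b != 0:
            merged.append(b + offset)
        else:
            merged.append(0)
    return merged, offset


def unbundle(merged, offset):
    """Recover the two original columns, showing that no information was lost."""
    col_a = [v if 0 < v <= offset else 0 for v in merged]
    col_b = [v - offset if v > offset else 0 for v in merged]
    return col_a, col_b


col_a = [0, 3, 0, 1, 0, 0]
col_b = [2, 0, 0, 0, 4, 0]
print(are_exclusive(col_a, col_b))   # True
bundled, offset = bundle(col_a, col_b)
print(bundled)                       # [5, 3, 0, 1, 7, 0]
```

Two sparse columns become one, so the tree learner has one feature fewer to scan when searching for splits.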

Both techniques lead to the general advantage of LightGBM over XGBoost and other GBDTs: it is less computationally expensive and thus faster.

## LightGBM intuition 
LightGBM is a gradient boosting framework that uses tree-based learning algorithms.

The LightGBM documentation states that -


- LightGBM grows trees vertically while other tree-based learning algorithms grow trees horizontally.


- This means that LightGBM grows trees leaf-wise while other algorithms grow them level-wise. It chooses the leaf with the maximum delta loss to grow. For the same number of leaves, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.


- So, we need to understand the distinction between leaf-wise tree growth and level-wise tree growth.

### Leaf-wise tree growth 
    
Leaf-wise tree growth can best be explained with the following visual -

![image.png](attachment:image.png)


### Level-wise tree growth 

Most decision tree learning algorithms grow trees level-wise (depth-wise). Level-wise tree growth can best be explained with the following visual -

![image-2.png](attachment:image-2.png)





### Important points about tree-growth
- If we grow the full tree, best-first (leaf-wise) and depth-first (level-wise) will result in the same tree. The difference is in the order in which the tree is expanded. Since we don't normally grow trees to their full depth, order matters.


- Application of early stopping criteria and pruning methods can result in very different trees. Because leaf-wise chooses splits based on their contribution to the global loss and not just the loss along a particular branch, it often (not always) will learn lower-error trees "faster" than level-wise.


- For a small number of nodes, leaf-wise will probably out-perform level-wise. As we add more nodes, without stopping or pruning they will converge to the same performance because they will literally build the same tree eventually.
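The points above can be illustrated with a toy example. The sketch below assumes a hypothetical table of candidate splits, each with a made-up loss reduction, and compares the two growth orders under a fixed budget of splits. It simplifies level-wise growth to a breadth-first traversal and is not taken from any library.

```python
import heapq
from collections import deque

# Hypothetical candidate splits: node -> (loss reduction if split, child nodes).
# Nodes that do not appear as keys cannot be split further.
SPLITS = {
    "root": (10.0, ["L", "R"]),
    "L": (1.0, ["LL", "LR"]),
    "R": (6.0, ["RL", "RR"]),
    "RL": (4.0, ["RLL", "RLR"]),
    "RR": (0.5, ["RRL", "RRR"]),
    "LL": (0.2, ["LLL", "LLR"]),
}


def grow_leaf_wise(max_splits):
    """Best-first: always split the current leaf with the largest loss reduction."""
    heap = [(-SPLITS["root"][0], "root")]  # negate gains: heapq is a min-heap
    order = []
    while heap and len(order) < max_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in SPLITS[node][1]:
            if child in SPLITS:
                heapq.heappush(heap, (-SPLITS[child][0], child))
    return order


def grow_level_wise(max_splits):
    """Level by level: split nodes in breadth-first order, ignoring their gains."""
    queue = deque(["root"])
    order = []
    while queue and len(order) < max_splits:
        node = queue.popleft()
        order.append(node)
        queue.extend(child for child in SPLITS[node][1] if child in SPLITS)
    return order


def total_gain(order):
    return sum(SPLITS[node][0] for node in order)


for budget in (3, len(SPLITS)):
    leaf_order = grow_leaf_wise(budget)
    level_order = grow_level_wise(budget)
    print(budget, "splits | leaf-wise:", leaf_order, total_gain(leaf_order),
          "| level-wise:", level_order, total_gain(level_order))
```

With a budget of three splits, the leaf-wise order reaches a larger total loss reduction; once every candidate is split, both orders have built the same tree.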

### Some parameters to tune for LGBM

If you have used GBDTs before, you will be familiar with most of these. Here is a list of parameters you can tune or feed into a grid search to find your optimal combination.


- **max_depth**: limits tree depth to reduce complexity and prevent overfitting.


- **num_leaves**: limits the number of leaves per tree to reduce complexity and prevent overfitting; should be smaller than 2^(max_depth).


- **bagging_fraction**: specifies the fraction of data to be used for each iteration, which increases speed. It only takes effect when **bagging_freq** is set to a non-zero value.


- **learning_rate**: a smaller value often improves accuracy but usually requires more boosting iterations.


- **num_iterations**: number of boosting iterations; the default is 100, and a higher value can improve accuracy at the cost of longer training.


- **device**: 'gpu' or 'cpu'; choose 'gpu' (graphics processing unit) for faster computation on supported hardware.
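As a rough sketch of how these parameters could be fed into a grid search, the snippet below builds the parameter combinations with the standard library only and drops any combination that violates the `num_leaves < 2^(max_depth)` guideline. The candidate values are illustrative assumptions, not recommendations, and `valid_combinations` is a hypothetical helper.

```python
from itertools import product

# Candidate values are illustrative assumptions, not recommendations.
grid = {
    "max_depth": [4, 6],
    "num_leaves": [15, 31, 63],
    "learning_rate": [0.05, 0.1],
    "bagging_fraction": [0.8],
    "num_iterations": [100, 300],
}


def valid_combinations(grid):
    """Yield one parameter dict per grid point that respects the leaf/depth guideline."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        # Keep only trees whose leaf count fits within the depth limit.
        if params["num_leaves"] < 2 ** params["max_depth"]:
            # bagging_fraction only takes effect when bagging_freq is non-zero.
            params["bagging_freq"] = 1
            yield params


combinations = list(valid_combinations(grid))
print(len(combinations), "parameter sets to evaluate")
print(combinations[0])
```

Each resulting dictionary could then be passed as the `params` argument of `lightgbm.train` and scored on a validation set; that step is left out here so the sketch runs without LightGBM installed.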

## XGBoost Vs LightGBM 

- XGBoost is a very fast and accurate ML algorithm. But now it's been challenged by LightGBM — which runs even faster with comparable model accuracy and more hyperparameters for users to tune.


- The key difference in speed arises because XGBoost splits the tree nodes one level at a time, while LightGBM splits them one node at a time.


- XGBoost developers later improved their algorithm to catch up with LightGBM, allowing users to also run XGBoost in split-by-leaf mode (grow_policy = 'lossguide'). XGBoost is much faster with this improvement, but LightGBM is reportedly still about 1.3 to 1.5 times as fast.


- Older comparisons list the **monotonic** constraint as a feature that XGBoost has and LightGBM lacks. Current LightGBM versions also provide a `monotone_constraints` parameter, so this is no longer a real difference. A monotonic constraint may sacrifice some model accuracy and increase training time, but it may improve model interpretability.

## What is the difference between LightGBM and CatBoost?
In **CatBoost**, trees are symmetric (balanced): the splitting condition is the same across all nodes at the same depth of the tree. **LightGBM and XGBoost**, on the other hand, produce asymmetric trees, meaning the splitting condition can differ for each node at the same depth.
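One consequence of a symmetric tree is that a prediction needs only one test per depth level, and the test results can be read as the bits of the leaf index. The sketch below illustrates that idea with hypothetical feature names, thresholds, and leaf values; it is not CatBoost's actual code.

```python
def symmetric_leaf_index(row, level_splits):
    """Leaf index in a symmetric tree: one (feature, threshold) test per depth level."""
    index = 0
    for feature, threshold in level_splits:
        bit = 1 if row[feature] > threshold else 0
        index = (index << 1) | bit  # append this level's answer as the next bit
    return index


# Hypothetical tree of depth 2: every node at a given depth uses the same test.
level_splits = [("age", 30), ("income", 50_000)]
leaf_values = [0.1, 0.4, 0.3, 0.9]  # 2 ** depth leaves

row = {"age": 42, "income": 38_000}
leaf = symmetric_leaf_index(row, level_splits)
prediction = leaf_values[leaf]
print(leaf, prediction)  # 2 0.3
```

An asymmetric tree cannot be evaluated this way, because the test at each node depends on the path taken to reach it.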

## Why is LightGBM better?
**Faster training speed and higher efficiency:** LightGBM uses a histogram-based algorithm, i.e. it buckets continuous feature values into discrete bins, which speeds up the training procedure.

**Lower memory usage:** Replacing continuous values with discrete bins also results in lower memory usage.
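The bucketing step can be sketched as follows. This is a simplified illustration that assumes equal-width bins and hypothetical helper names; LightGBM's own binning is more elaborate, and the number of bins there is controlled by the `max_bin` parameter.

```python
from bisect import bisect_right


def make_bin_edges(values, n_bins):
    """Equal-width bin edges (a simplification assumed here for brevity)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def to_bins(values, edges):
    """Replace each continuous value with the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


values = [0.5, 1.7, 2.2, 3.9, 4.1, 7.8, 9.6, 10.0]
edges = make_bin_edges(values, n_bins=4)
bins = to_bins(values, edges)

print(edges)  # [2.875, 5.25, 7.625]
print(bins)   # [0, 0, 0, 1, 1, 3, 3, 3]
# Only len(edges) candidate thresholds remain, instead of one per distinct value.
print(len(edges), "candidate splits instead of", len(set(values)) - 1)
```

The split search now only has to consider the bin boundaries, and the binned column can be stored as small integers rather than floats.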