# LightGBM

LightGBM is another GBM variant, developed to improve on XGBoost's training time.

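* More performant (faster to train)
* Leaf-wise growth strategy instead of level-wise growth
* Depth-first style expansion instead of breadth-first search (BFS); strictly speaking it is best-first, since the leaf with the largest loss reduction is split next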

XGBoost, by contrast, uses the level-wise growth strategy by default and expands the tree breadth-first (BFS).
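The difference between the two growth strategies can be sketched on a toy tree. This is only an illustration, not how either library is implemented: the tree, the node names, and the gain values are made up, and `level_wise_order` and `leaf_wise_order` are hypothetical helpers that report the order in which nodes would be split under a fixed budget of splits.

```python
import heapq
from collections import deque

# Hypothetical toy tree: node -> (split gain, children). All values are made up.
# Children below depth 2 are not modeled.
TREE = {
    "root": (10.0, ["A", "B"]),
    "A": (8.0, ["A1", "A2"]),
    "B": (2.0, ["B1", "B2"]),
    "A1": (5.0, []),
    "A2": (1.0, []),
    "B1": (0.5, []),
    "B2": (0.3, []),
}


def level_wise_order(tree, root, max_splits):
    """Split nodes breadth-first, finishing one level before going deeper."""
    order, queue = [], deque([root])
    while queue and len(order) < max_splits:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node][1])
    return order


def leaf_wise_order(tree, root, max_splits):
    """Always split the current leaf with the largest gain (best-first)."""
    order = []
    heap = [(-tree[root][0], root)]  # max-heap via negated gains
    while heap and len(order) < max_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in tree[node][1]:
            heapq.heappush(heap, (-tree[child][0], child))
    return order


level_order = level_wise_order(TREE, "root", max_splits=3)
leaf_order = leaf_wise_order(TREE, "root", max_splits=3)
print("level-wise:", level_order)
print("leaf-wise: ", leaf_order)
```

With the same budget of three splits, the level-wise order finishes depth 1 (`root`, `A`, `B`), while the leaf-wise order follows the high-gain branch downward (`root`, `A`, `A1`). This is why leaf-wise trees tend to be deeper and less balanced.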

# LightGBM - Model

In [38]:
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV,cross_val_score
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale
from sklearn import model_selection
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import BaggingRegressor

# suppress warnings
from warnings import filterwarnings
filterwarnings('ignore')

# print all estimator parameters, not only the changed ones
from sklearn import set_config
set_config(print_changed_only=False)

In [39]:
import pandas as pd
hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
y = df["Salary"]
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=42)

In [40]:
!pip install lightgbm



In [None]:
# conda install -c conda-forge lightgbm

In [41]:
from lightgbm import LGBMRegressor

In [42]:
lgbm = LGBMRegressor()

In [43]:
lgbm_model = lgbm.fit(X_train, y_train)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000099 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 831
[LightGBM] [Info] Number of data points in the train set: 197, number of used features: 19
[LightGBM] [Info] Start training from score 543.483442


# LightGBM - Prediction

In [44]:
y_pred = lgbm_model.predict(X_test,
                            num_iteration = lgbm_model.best_iteration_)

In [45]:
np.sqrt(mean_squared_error(y_test, y_pred))

363.8712087611089
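The value above is the root mean squared error (RMSE) on the test set, in the same units as `Salary`. As a minimal sketch of what `np.sqrt(mean_squared_error(...))` computes, the plain-Python version below uses a hypothetical helper `rmse` and made-up numbers; it is not the notebook's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only
toy_true = [500.0, 750.0, 300.0]
toy_pred = [450.0, 800.0, 400.0]

# Residuals are 50, -50, -100, so the mean squared error is 15000 / 3 = 5000
toy_rmse = rmse(toy_true, toy_pred)
print(round(toy_rmse, 2))
```

Because the residuals are squared before averaging, a single large miss (here 100) weighs more heavily on the result than several small ones.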

# LightGBM - Model Tuning

In [46]:
lgbm_model

In [47]:
lgbm_grid = {
    "colsample_bytree":[0.4,0.5,0.6,0.9,1],
    "learning_rate":[0.01,0.1,0.5,1],
    "n_estimators":[20,40,100,200,500,1000],
    "max_depth":[1,2,3,4,5,6,7,8]
}

This grid amounts to 9600 fits (960 candidates × 10 folds), about 3 times as many as in the XGBoost search. If LightGBM finishes them in under 3 minutes, we can say it is faster. In the event, LightGBM completed more than 3 times as many fits as XGBoost in 1.5 minutes.
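The number of fits follows directly from the grid. As a quick check, the sketch below enumerates every combination in `lgbm_grid` with `itertools.product` and multiplies by the 10 folds passed as `cv=10`; the variable names `n_candidates`, `n_folds`, and `n_fits` are introduced here for illustration.

```python
from itertools import product

# Same grid as in the notebook cell above
lgbm_grid = {
    "colsample_bytree": [0.4, 0.5, 0.6, 0.9, 1],
    "learning_rate": [0.01, 0.1, 0.5, 1],
    "n_estimators": [20, 40, 100, 200, 500, 1000],
    "max_depth": [1, 2, 3, 4, 5, 6, 7, 8],
}

# Every combination of one value per hyperparameter is one candidate
candidates = list(product(*lgbm_grid.values()))
n_candidates = len(candidates)  # 5 * 4 * 6 * 8

n_folds = 10  # cv=10 in GridSearchCV
n_fits = n_candidates * n_folds  # cross-validation fits, excluding the final refit

print(n_candidates, "candidates,", n_fits, "fits")
```

Each extra value in any list multiplies the total, so the grid grows quickly; adding one more `learning_rate` value here would add 2400 fits.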

In [48]:
lgbm = LGBMRegressor()

In [49]:
lgbm_cv_model = GridSearchCV(lgbm, lgbm_grid, cv=10, n_jobs=-1,verbose=2)

In [50]:
lgbm_cv_model.fit(X_train, y_train)

Fitting 10 folds for each of 960 candidates, totalling 9600 fits
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000100 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 831
[LightGBM] [Info] Number of data points in the train set: 197, number of used features: 19
[LightGBM] [Info] Start training from score 543.483442
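With `cv=10`, each candidate is trained 10 times, each time holding out a different tenth of the 197 training rows reported in the log. The sketch below shows how such folds can be laid out, assuming contiguous, unshuffled folds (which is what scikit-learn's `KFold` does by default); `kfold_sizes` and `kfold_indices` are hypothetical helpers, not library functions.

```python
def kfold_sizes(n_samples, n_splits):
    """Fold sizes when the remainder is spread over the first folds."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) for contiguous, unshuffled folds."""
    start = 0
    for size in kfold_sizes(n_samples, n_splits):
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# 197 is the training-set size shown in the LightGBM log above
fold_sizes = kfold_sizes(197, 10)
splits = list(kfold_indices(197, 10))

print(fold_sizes)
print(len(splits[0][0]), "training rows and", len(splits[0][1]), "validation rows in fold 1")
```

Each fit inside the search therefore sees only about 177 rows, which is worth keeping in mind when comparing cross-validated scores with the score of a model refit on all 197 rows.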


In [51]:
# optimal hyperparameter values found by the grid search
lgbm_cv_model.best_params_

{'colsample_bytree': 0.5,
 'learning_rate': 0.1,
 'max_depth': 6,
 'n_estimators': 20}

In [52]:
# build the final model
lgbm_tuned = LGBMRegressor(learning_rate = 0.1,
                          max_depth = 6,
                          n_estimators = 20,
                          colsample_bytree = 0.5)

In [53]:
lgbm_tuned = lgbm_tuned.fit(X_train,y_train)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000112 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 831
[LightGBM] [Info] Number of data points in the train set: 197, number of used features: 19
[LightGBM] [Info] Start training from score 543.483442


In [None]:
# test error of the final model

In [54]:
y_pred = lgbm_tuned.predict(X_test)



In [55]:
np.sqrt(mean_squared_error(y_test,y_pred))

375.6085209015434
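The two test RMSE values reported in this notebook can be compared directly. The short calculation below uses only those two numbers; the variable names are introduced here for illustration.

```python
# Test RMSE values copied from the notebook outputs above
rmse_default = 363.8712087611089  # LGBMRegressor() with default parameters
rmse_tuned = 375.6085209015434    # model refit with the grid-search parameters

abs_diff = rmse_tuned - rmse_default
rel_diff = abs_diff / rmse_default

print(f"difference: {abs_diff:.2f} ({rel_diff:.1%} of the default model's RMSE)")
```

On this particular split the tuned model's test RMSE is slightly higher than the default model's. That can happen because the grid search selects parameters by cross-validated score on the training data (by default the estimator's own `score`, which is R² for regressors), and this gives no guarantee of a lower RMSE on a small held-out test set.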

Compared to XGBoost, LightGBM...