# LightGBM

* **Another GBM variant, developed to improve on XGBoost's training-time performance.**
* **In XGBoost, training time rises as hyperparameters such as the number of trees and tree depth are increased; LightGBM was developed to address this.**
* **It grows trees leaf-wise, whereas XGBoost grows them level-wise (depth-wise) by default.**
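The difference between the two growth strategies can be illustrated with a toy sketch. The split gains in `GAINS` are hypothetical numbers chosen for illustration, not values computed from data, and the helper names `level_wise` and `leaf_wise` are ours. Real implementations also stop on limits such as `num_leaves`, `max_depth`, and a minimum gain. With the same budget of four splits, level-wise growth splits every node of a level before going deeper, while leaf-wise growth always splits the current leaf with the largest gain.

```python
import heapq

# Hypothetical split gains per node; a node is its path from the root
# ("" = root, "L"/"R" = left/right child). Nodes not listed cannot be split.
GAINS = {
    "": 10.0,
    "L": 8.0, "R": 1.0,
    "LL": 6.0, "LR": 0.5,
    "RL": 0.3, "RR": 0.2,
    "LLL": 4.0, "LLR": 0.1,
}


def level_wise(gains, n_splits):
    """Split nodes depth by depth (breadth-first), regardless of their gain."""
    order, frontier = [], [""]
    while frontier and len(order) < n_splits:
        next_frontier = []
        for node in frontier:
            if len(order) == n_splits:
                break
            if node in gains:
                order.append(node)
                next_frontier += [node + "L", node + "R"]
        frontier = next_frontier
    return order


def leaf_wise(gains, n_splits):
    """Always split the current leaf with the largest gain (max-heap via negation)."""
    order, heap = [], [(-gains[""], "")]
    while heap and len(order) < n_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child in (node + "L", node + "R"):
            if child in gains:
                heapq.heappush(heap, (-gains[child], child))
    return order


if __name__ == "__main__":
    for name, grow in [("level-wise", level_wise), ("leaf-wise", leaf_wise)]:
        order = grow(GAINS, 4)
        print(name, order, sum(GAINS[n] for n in order))
```

With these numbers, level-wise growth splits `["", "L", "R", "LL"]` for a total gain of 25, while leaf-wise growth splits `["", "L", "LL", "LLL"]` for 28: it spends its budget on the promising left branch instead of the low-gain node `R`. The same behavior is why leaf-wise trees can overfit small datasets unless `num_leaves` or `max_depth` is constrained.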

### Model Prediction

```python
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
import numpy as np
```

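```python
# Load the baseball data and drop rows with missing values
df = pd.read_csv("beyzbol_veri.csv")
df = df.dropna()
# One-hot encode the three categorical columns
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
y = df["Salary"]
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
# Keep one dummy per binary category; the other one is redundant
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)
# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```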


```python
# !pip install lightgbm   # uncomment to install LightGBM if it is missing
from lightgbm import LGBMRegressor
```

```python
# Fit LightGBM with its default hyperparameters
lgbm_model = LGBMRegressor().fit(X_train, y_train)
lgbm_model
```

```
LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
```

```python
# Root mean squared error on the held-out test set
y_pred = lgbm_model.predict(X_test)
RMSE = np.sqrt(mean_squared_error(y_test, y_pred))
RMSE
```

```
363.8712087611089
```
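RMSE is the square root of the mean squared difference between actual and predicted values, sqrt((1/n) * Σ (y_i - ŷ_i)²), so it is expressed in the same units as the target (here `Salary`). A minimal NumPy sketch of the same computation as `np.sqrt(mean_squared_error(...))`, using made-up toy numbers rather than the baseball data; the function name `rmse` is ours:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values: the errors are 1, 0 and -2, so the mean squared error is 5/3
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # sqrt(5/3) ≈ 1.291
```

Because the errors are squared before averaging, a few large misses (the -2 above) dominate the result more than in the mean absolute error.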

### Model Tuning

```python
# Base estimator for the grid search. GridSearchCV clones and refits it for
# every candidate, so it does not need to be fitted here.
lgbm_model = LGBMRegressor()
```

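The original run misspelled the tree-count key as `n_estimartor`. LightGBM's scikit-learn wrapper accepts unknown keyword arguments instead of rejecting them, so that key was most likely ignored and every candidate kept the default `n_estimators=100`; the outputs recorded below come from that run. The grid with the correct parameter name:

```python
params = {"learning_rate": [0.01, 0.001, 0.5, 1, 0.05],
          "max_depth": [5, 10, -10, 2],  # in LightGBM, max_depth <= 0 means no depth limit
          "n_estimators": [500, 1000, 2000, 5000, 20]}
```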
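The grid is the Cartesian product of the value lists, which is where the log's "100 candidates, totalling 1000 fits" comes from: 5 learning rates × 4 depths × 5 tree counts = 100 combinations, each fitted once per fold with `cv=10`. A standard-library sketch of that enumeration; the helper `expand_grid` is ours, and scikit-learn's `ParameterGrid` performs an equivalent enumeration:

```python
from itertools import product

params = {"learning_rate": [0.01, 0.001, 0.5, 1, 0.05],
          "max_depth": [5, 10, -10, 2],
          "n_estimators": [500, 1000, 2000, 5000, 20]}


def expand_grid(grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(params)
n_folds = 10
print(len(candidates), len(candidates) * n_folds)  # 100 1000
print(candidates[0])  # {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 500}
```

Every value added to any list multiplies the number of fits, which is the cost LightGBM's faster training is meant to offset.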

```python
# 10-fold cross-validation over all 5 x 4 x 5 = 100 combinations.
# With `scoring` left unset, candidates are ranked by the estimator's own
# score method (R^2 for regressors), not by RMSE.
lgbm_cv = GridSearchCV(lgbm_model, params, cv=10, n_jobs=-1, verbose=2).fit(X_train, y_train)
```

```
Fitting 10 folds for each of 100 candidates, totalling 1000 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    3.8s
[Parallel(n_jobs=-1)]: Done 272 tasks      | elapsed:    7.9s
[Parallel(n_jobs=-1)]: Done 678 tasks      | elapsed:   15.0s
[Parallel(n_jobs=-1)]: Done 993 out of 1000 | elapsed:   21.2s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done 1000 out of 1000 | elapsed:   21.3s finished
```
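With an integer `cv` and a regressor, `GridSearchCV` splits the training data with unshuffled `KFold`: the rows are cut into 10 consecutive folds, and each fold serves once as the validation set while the other nine are used for fitting. The sketch below reproduces only the index bookkeeping in plain Python; the function name `kfold_indices` is ours, not scikit-learn's. Like `KFold`, it gives the first `n % k` folds one extra row.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs like an unshuffled k-fold split."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # the first n % k folds get one extra sample
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in kfold_indices(10, 3):
    print(len(train_idx), val_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Each candidate's score is the mean of its validation scores across the folds; `best_params_` is the candidate with the highest mean, and with the default `refit=True` the search then refits that candidate on the whole training set.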


```python
lgbm_cv.best_params_
```

```
{'learning_rate': 0.05, 'max_depth': 10, 'n_estimartor': 500}
```

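```python
# Refit on the full training set with the best parameters found by the search.
# The keyword must be `n_estimators`. The RMSE recorded below was produced with
# the keyword misspelled as `n_estimator`, which was most likely ignored (default
# 100 trees), so rerunning with `n_estimators=500` will give a different value.
lgbm_tuned = LGBMRegressor(learning_rate=0.05,
                           max_depth=10,
                           n_estimators=500).fit(X_train, y_train)
```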

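```python
# Test-set RMSE of the tuned model
y_pred = lgbm_tuned.predict(X_test)
RMSE = np.sqrt(mean_squared_error(y_test, y_pred))
RMSE
```

```
368.2039173029082
```

On this split the tuned model's test RMSE (368.20) is slightly higher than the default model's (363.87), so the grid search did not improve test error here.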