# LightGBM
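* LightGBM grows trees leaf-wise (best-first): at each step it splits the leaf with the largest loss reduction, while many other popular tools grow trees depth-wise (level by level).
* For the same number of leaves, leaf-wise growth tends to reach a lower loss, so it can converge much faster than depth-wise growth. However, it can also overfit, especially on small datasets, unless parameters such as `num_leaves`, `max_depth` and `min_data_in_leaf` are set appropriately.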
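The toy sketch below illustrates the two growth strategies. It is written in plain NumPy and is not LightGBM's implementation: the hypothetical helpers grow regression trees on a single feature, using the reduction in squared error as the split gain. `grow_leaf_wise` always splits the one leaf with the largest gain, while `grow_depth_wise` splits every leaf of the current level.

```python
import numpy as np


def sse(y):
    """Squared error of predicting the mean of y (0 for an empty leaf)."""
    return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0


def best_split(x, y):
    """Best threshold on x by squared-error reduction; returns (gain, threshold or None)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    parent = sse(ys)
    best_gain, best_thr = 0.0, None
    for i in range(1, len(xs)):
        if xs[i - 1] == xs[i]:
            continue  # equal feature values cannot be separated
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i - 1] + xs[i]) / 2
    return best_gain, best_thr


def grow_leaf_wise(x, y, num_leaves):
    """Best-first growth: repeatedly split the single leaf with the largest gain."""
    leaves = [np.ones(len(x), dtype=bool)]  # each leaf is a boolean row mask
    while len(leaves) < num_leaves:
        gains = [best_split(x[m], y[m]) for m in leaves]
        i = max(range(len(leaves)), key=lambda k: gains[k][0])
        gain, thr = gains[i]
        if thr is None:
            break  # no leaf can be improved any further
        m = leaves.pop(i)
        leaves += [m & (x <= thr), m & (x > thr)]
    return leaves


def grow_depth_wise(x, y, max_depth):
    """Level-wise growth: split every splittable leaf of the current level."""
    leaves = [np.ones(len(x), dtype=bool)]
    for _ in range(max_depth):
        next_level = []
        for m in leaves:
            gain, thr = best_split(x[m], y[m])
            next_level += [m] if thr is None else [m & (x <= thr), m & (x > thr)]
        leaves = next_level
    return leaves


def total_sse(y, leaves):
    return sum(sse(y[m]) for m in leaves)


x = np.arange(16, dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 10, 10, 20, 20, 30, 30, 40, 40], dtype=float)
leaf_wise = grow_leaf_wise(x, y, num_leaves=4)
depth_wise = grow_depth_wise(x, y, max_depth=2)  # also at most 4 leaves
print(len(leaf_wise), total_sse(y, leaf_wise))
print(len(depth_wise), total_sse(y, depth_wise))
```

With a budget of four leaves, the leaf-wise tree's total squared error in this setup is never higher than the depth-wise tree's. After the shared root split, the leaf-wise strategy chooses from a superset of the depth-wise candidates. That same freedom to keep splitting one region is why depth limits and minimum leaf sizes matter.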

In [1]:
import sys, os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import pandas as pd
import numpy as np
from datetime import datetime
import pandas_profiling
from plots import *
from eda import *
from scipy import stats
from sklearn import metrics
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
import re
import plotly.graph_objects as go
from plotly.graph_objs import *
from plotly.offline import plot
import matplotlib.pyplot as plt
import random
from sklearn.model_selection import train_test_split 
import lightgbm as lgb
%reload_ext autoreload
%autoreload 2

In [2]:
df_raw = pd.read_csv('../credits.csv', index_col='ID', low_memory=False, parse_dates=True)
categorical_cols = ['PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'MARRIAGE', 'EDUCATION', 'SEX']

for col in categorical_cols:
    df_raw[col] = df_raw[col].astype('category')
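The cast to pandas `category` dtype matters because LightGBM's `Dataset` uses `categorical_feature='auto'` by default. With that setting, unordered pandas categorical columns are treated as categorical features rather than as ordinal numbers. The sketch below uses a hypothetical toy frame, not `credits.csv`. It shows how such columns can be listed with `select_dtypes` and which integer codes pandas stores for them. It does not call LightGBM.

```python
import pandas as pd

# Hypothetical toy frame standing in for a few credits.csv columns.
toy = pd.DataFrame({
    "SEX": [1, 2, 2, 1],
    "EDUCATION": [2, 1, 3, 2],
    "LIMIT_BAL": [20000.0, 120000.0, 90000.0, 50000.0],
})
for col in ["SEX", "EDUCATION"]:
    toy[col] = toy[col].astype("category")

# Columns that 'auto' detection would treat as categorical (unordered pandas categoricals).
cat_cols = toy.select_dtypes(include="category").columns.tolist()
print(cat_cols)

# Integer codes pandas stores for each category (categories are sorted).
print(toy["EDUCATION"].cat.categories.tolist(), toy["EDUCATION"].cat.codes.tolist())
```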

In [4]:
data = df_raw.drop(columns=['default payment next month'])
X_train, X_test, Y_train, Y_test = train_test_split(data, df_raw['default payment next month'], test_size=0.3)
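The split above is neither seeded nor stratified, so each run draws a different test set with a slightly different default rate. scikit-learn's `train_test_split` accepts `stratify=` and `random_state=` to address both issues. The NumPy sketch below uses a hypothetical helper, `stratified_split`, to show what stratification does. It samples the test fraction separately within each class, so both parts keep roughly the original class balance.

```python
import numpy as np


def stratified_split(y, test_size=0.3, seed=0):
    """Return (train_idx, test_idx), sampling test_size of each class separately."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible split
    test_parts = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n_test = int(round(test_size * len(idx)))
        test_parts.append(rng.choice(idx, size=n_test, replace=False))
    test_idx = np.sort(np.concatenate(test_parts))
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return train_idx, test_idx


# Toy labels with a 22% positive rate (illustrative, not the credits data).
y = np.array([1] * 22 + [0] * 78)
train_idx, test_idx = stratified_split(y, test_size=0.3, seed=42)
print(len(train_idx), len(test_idx), y[train_idx].mean(), y[test_idx].mean())
```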

In [5]:
param = {'num_leaves': 30, 'objective': 'cross_entropy'}
param['metric'] = ['binary_error', 'auc', 'rmse']
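The introduction notes that leaf-wise growth can overfit without appropriate parameters. The dict below is an illustrative extension of `param` that uses parameters documented by LightGBM for limiting over-fitting. The values are assumptions to be tuned, for example by cross-validation, and are not settings taken from this notebook. LightGBM's tuning guide suggests keeping `num_leaves` below `2**max_depth` when both are set, and the final check verifies this.

```python
# Illustrative LightGBM parameters for limiting over-fitting; the values are
# assumptions to tune, not recommendations derived from the credits data.
param_regularized = {
    "objective": "cross_entropy",
    "metric": ["binary_error", "auc", "rmse"],
    "num_leaves": 15,          # fewer leaves -> simpler trees (LightGBM default: 31)
    "max_depth": 6,            # caps how deep leaf-wise growth may go (-1 = no limit)
    "min_data_in_leaf": 50,    # every leaf must cover at least this many rows
    "learning_rate": 0.05,     # smaller steps, usually paired with more boosting rounds
    "feature_fraction": 0.8,   # use a random 80% of the features per tree
    "bagging_fraction": 0.8,   # use a random 80% of the rows ...
    "bagging_freq": 1,         # ... re-drawn at every iteration
    "lambda_l2": 1.0,          # L2 penalty on leaf values
}

# A full depth-wise tree of depth max_depth has 2**max_depth leaves; the tuning
# guide recommends staying below that when using leaf-wise growth.
assert param_regularized["num_leaves"] < 2 ** param_regularized["max_depth"]
```

Such a dict would be passed to `lgb.train` in place of `param`, typically with more boosting rounds to compensate for the smaller learning rate.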

In [6]:
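# Random per-row weights, copied from LightGBM's quick-start example; they carry no information about the data
w = np.random.rand(len(Y_train))
train_data = lgb.Dataset(X_train, label=np.array(Y_train), weight=w)
# Validation data must reuse the training set's bin boundaries; reference=train_data makes this explicit
validation_data = lgb.Dataset(X_test, label=np.array(Y_test), reference=train_data)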

num_round = 10
bst = lgb.train(param, train_data, num_round, valid_sets=[validation_data])

[1]	valid_0's auc: 0.760584	valid_0's rmse: 0.407564	valid_0's binary_error: 0.223111
[2]	valid_0's auc: 0.770144	valid_0's rmse: 0.400639	valid_0's binary_error: 0.223111
[3]	valid_0's auc: 0.772425	valid_0's rmse: 0.394876	valid_0's binary_error: 0.223111
[4]	valid_0's auc: 0.774204	valid_0's rmse: 0.390249	valid_0's binary_error: 0.223111
[5]	valid_0's auc: 0.776081	valid_0's rmse: 0.386468	valid_0's binary_error: 0.223111
[6]	valid_0's auc: 0.776264	valid_0's rmse: 0.38354	valid_0's binary_error: 0.223111
[7]	valid_0's auc: 0.77676	valid_0's rmse: 0.380904	valid_0's binary_error: 0.223111
[8]	valid_0's auc: 0.777679	valid_0's rmse: 0.378765	valid_0's binary_error: 0.219
[9]	valid_0's auc: 0.778791	valid_0's rmse: 0.376931	valid_0's binary_error: 0.186556
[10]	valid_0's auc: 0.779025	valid_0's rmse: 0.375497	valid_0's binary_error: 0.181556
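Each line of the log reports, on the unweighted validation set, the three metrics requested in `param['metric']`. The NumPy sketch below computes comparable quantities from predicted probabilities to show what each one measures. `binary_error` is the share of misclassified rows, assuming a 0.5 threshold. `rmse` is computed between the 0/1 labels and the probabilities. AUC uses the pairwise Mann-Whitney form, which is quadratic in memory and only suitable for small arrays. The function name and the toy arrays are hypothetical, and this is not LightGBM's code.

```python
import numpy as np


def eval_binary(y_true, p):
    """binary_error, rmse and AUC for 0/1 labels y_true and probabilities p."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    # Misclassification rate when predicting "positive" for p > 0.5
    binary_error = float(np.mean((p > 0.5) != (y_true == 1)))
    # Root mean squared difference between probabilities and 0/1 labels
    rmse = float(np.sqrt(np.mean((p - y_true) ** 2)))
    # AUC = probability that a random positive is scored above a random negative
    # (ties count one half), averaged over all positive/negative pairs.
    pos, neg = p[y_true == 1], p[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"binary_error": binary_error, "rmse": rmse, "auc": auc}


y_toy = np.array([0, 0, 1, 1, 0])
p_toy = np.array([0.1, 0.4, 0.35, 0.8, 0.2])
print(eval_binary(y_toy, p_toy))
```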



categorical_feature in param dict is overridden.
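`train_data` is built with `weight=w`, so each training row's contribution to the training loss is scaled by its weight. Here `w` is drawn uniformly at random, so it mostly adds noise rather than information. The NumPy sketch below uses hypothetical helper names. It computes a weighted binary cross-entropy as a weighted average of per-row losses. It also shows one common alternative to random weights: "balanced" class weights, which give each class the same total weight using the formula scikit-learn applies for `class_weight='balanced'`.

```python
import numpy as np


def weighted_log_loss(y_true, p, w):
    """Weighted average binary cross-entropy; rows with larger w count more."""
    y_true, p, w = (np.asarray(a, dtype=float) for a in (y_true, p, w))
    p = np.clip(p, 1e-15, 1 - 1e-15)  # avoid log(0)
    per_row = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.sum(w * per_row) / np.sum(w))


def balanced_weights(y_true):
    """Per-row weight n / (n_classes * count(class)), so every class sums to n / n_classes."""
    y_true = np.asarray(y_true, dtype=int)
    counts = np.bincount(y_true)
    return len(y_true) / (len(counts) * counts[y_true])


y = np.array([0, 0, 0, 1])
p = np.array([0.2, 0.1, 0.3, 0.4])
w_bal = balanced_weights(y)
print(w_bal)  # the single minority-class row gets the largest weight
print(weighted_log_loss(y, p, np.ones(4)), weighted_log_loss(y, p, w_bal))
```

If class balancing is the goal, `balanced_weights(Y_train)` could be passed as `weight=` in place of the random `w`. Whether that improves the validation metrics here is untested.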



In [7]:
model = lgb.LGBMClassifier()  # scikit-learn-style estimator; created here but not fitted in this notebook

In [11]:
scores = bst.best_score['valid_0']  # metrics of the last round (no early stopping is used)
scores, 'accuracy: ' + str(1 - scores['binary_error'])

({'auc': 0.7790250966955975,
  'rmse': 0.3754970543527726,
  'binary_error': 0.18155555555555555},
 'accuracy: 0.8184444444444444')
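An accuracy of about 0.818 is best judged against the majority-class baseline. In the log, `binary_error` stays at 0.223111 for the first seven rounds. This is consistent with the model predicting "no default" for every validation row, in which case the error equals the share of defaults in the test set. The sketch below uses a hypothetical helper and toy labels to compute that baseline from labels alone. Calling `majority_baseline(Y_test)` would give the accuracy of always predicting the more frequent class.

```python
import numpy as np


def majority_baseline(y_true):
    """Error and accuracy of always predicting the more frequent 0/1 label."""
    positive_rate = float(np.mean(np.asarray(y_true) == 1))
    error = min(positive_rate, 1 - positive_rate)
    return {"positive_rate": positive_rate, "error": error, "accuracy": 1 - error}


# Toy labels with a 22% positive rate (illustrative, not the credits test set).
y_toy = np.array([1] * 22 + [0] * 78)
print(majority_baseline(y_toy))
```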

In [9]:
bst.save_model('lgb_model.txt')