- Porto Kernel : https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/kernels
- 2nd Place Lightgbm Solution : https://www.kaggle.com/xiaozhouwang/2nd-place-lightgbm-solution
- Porto Kernel Study 5 : https://www.youtube.com/watch?v=mZSh9_Lh_5g&list=PLC_wC_PMBL5NLKkMooi-n4iK4gv3VqXyo&index=5


----

# 2nd Place Lightgbm Solution

Part of the 2nd place solution: a LightGBM model with a private score of 0.29124 and a public LB score of 0.28555.

## Contents

1. [Import Library](#001)
1. [Define Gini](#002)
1. [Define Global Variables](#003)
1. [Define Evalerror](#004)
1. [Setting Train & Test](#005)
1. [Feature Engineering](#feature_engineering)
1. [Categorical Features](#006)
1. [One Hot Encoding](#007)
1. [Ind Variable](#008)
1. [Model Development](#009)



<a class="anchor" id="001"></a>

## Import Library

In [1]:
import lightgbm as lgbm
from scipy import sparse as ssp
from sklearn.model_selection import StratifiedKFold
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

import warnings
warnings.filterwarnings('ignore')

Note for macOS: when LightGBM is installed from PyPI with `pip install lightgbm`, the gcc compiler is no longer needed.
Instead, you need the OpenMP library, which LightGBM requires on systems that use the Apple Clang compiler.
Install it with `brew install libomp`.


<a class='anchor' id='002'></a>

## Define Gini

In [2]:
def Gini(y_true, y_pred):
    
    # Check and get number of sample
    assert y_true.shape == y_pred.shape
    n_samples = y_true.shape[0]
    
    # sort rows on prediction column
    # from largest to smallest
    arr = np.array([y_true, y_pred]).transpose()
    true_order = arr[arr[:, 0].argsort()][::-1, 0]
    pred_order = arr[arr[:, 1].argsort()][::-1, 0]
    
    #get Lorenz curves
    L_true = np.cumsum(true_order) * 1. / np.sum(true_order)
    L_pred = np.cumsum(pred_order) * 1. / np.sum(pred_order)
    L_ones = np.linspace(1 / n_samples, 1, n_samples)
    
    # get Gini coefficients (area between curves)
    G_true = np.sum(L_ones - L_true)
    G_pred = np.sum(L_ones - L_pred)
    
    # normalize to true Gini coefficient
    return G_pred * 1. / G_true
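
As a sanity check on the function above: for binary labels and predictions without ties, the normalized Gini coefficient equals 2·AUC − 1. The sketch below restates the same computation under the hypothetical name `gini_normalized` so that it runs on its own, and compares it with a brute-force pairwise AUC (`auc_pairwise`, also a helper introduced only for this example) on synthetic data.

```python
import numpy as np

def gini_normalized(y_true, y_pred):
    # Same steps as Gini() above: sort labels by prediction (descending),
    # build the Lorenz curve, and normalize by the curve of a perfect ordering.
    n_samples = y_true.shape[0]
    arr = np.array([y_true, y_pred]).transpose()
    true_order = arr[arr[:, 0].argsort()][::-1, 0]
    pred_order = arr[arr[:, 1].argsort()][::-1, 0]
    L_true = np.cumsum(true_order) / np.sum(true_order)
    L_pred = np.cumsum(pred_order) / np.sum(pred_order)
    L_ones = np.linspace(1 / n_samples, 1, n_samples)
    return np.sum(L_ones - L_pred) / np.sum(L_ones - L_true)

def auc_pairwise(y_true, y_pred):
    # Fraction of (positive, negative) pairs where the positive is scored higher.
    pos = y_pred[y_true == 1]
    neg = y_pred[y_true == 0]
    return np.mean(pos[:, None] > neg[None, :])

rng = np.random.default_rng(0)
y_true = (rng.random(200) < 0.2).astype(float)  # synthetic, imbalanced labels
y_pred = rng.random(200)                        # synthetic scores, no ties

gini = gini_normalized(y_true, y_pred)
auc = auc_pairwise(y_true, y_pred)
print(gini, 2 * auc - 1)  # the two values should agree up to rounding
```

A perfect ordering (for example `gini_normalized(y_true, y_true)`) gives 1.0, and a random ordering gives a value near 0.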

<a class="anchor" id="003"></a>

## Define Global Variables

In [3]:
cv_only = True
save_cv = True
full_train = False

<a class="anchor" id="004"></a>

## Define Evalerror

In [4]:
def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    return 'gini', Gini(labels, preds), True
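
LightGBM calls a custom `feval` with the predictions and the evaluation `Dataset`, and expects a tuple `(eval_name, eval_result, is_higher_better)` back. The `True` returned above marks Gini as a metric to maximize, which early stopping relies on. A minimal sketch of that contract, using a hypothetical `StubDataset` in place of `lightgbm.Dataset` and a hypothetical `accuracy_eval` metric (neither is part of the original kernel):

```python
import numpy as np

class StubDataset:
    """Hypothetical stand-in for lightgbm.Dataset; only get_label() is mimicked."""
    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)

    def get_label(self):
        return self._label

def accuracy_eval(preds, dtrain):
    # Same shape as evalerror(): read the labels, compute a score,
    # and return (name, value, is_higher_better).
    labels = dtrain.get_label()
    acc = np.mean((preds > 0.5) == (labels == 1))
    return 'accuracy', acc, True

result = accuracy_eval(np.array([0.9, 0.2, 0.6, 0.1]), StubDataset([1, 0, 0, 0]))
print(result)
```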

<a class="anchor" id="005"></a>

## Setting Train & Test

Set `path` to the directory that contains the train and test data.

In [5]:
%%time
path = "../../input/porto/"
train = pd.read_csv(path+'train.csv', nrows=10000)
train_label = train['target']
train_id = train['id']
test = pd.read_csv(path+'test.csv', nrows=10000)
test_id = test['id']

CPU times: user 138 ms, sys: 34 ms, total: 172 ms
Wall time: 188 ms


In [6]:
# Check the loaded data.
print(train.shape)
train.head()

(10000, 59)


Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
0,7,0,2,2,5,1,0,0,1,0,...,9,1,5,8,0,1,1,0,0,1
1,9,0,1,1,7,0,0,0,0,1,...,3,1,1,9,0,1,1,0,1,0
2,13,0,5,4,9,1,0,0,0,1,...,4,2,7,7,0,1,1,0,1,0
3,16,0,0,1,2,0,0,1,0,0,...,2,2,4,9,0,0,0,0,0,0
4,17,0,0,2,0,1,0,1,0,0,...,3,1,1,3,0,0,0,1,1,0


In [7]:
# StratifiedKFold splits the data so that every fold keeps the same class ratio.
# The validation indices of the folds do not overlap.
NFOLDS = 5
kfold = StratifiedKFold(n_splits=NFOLDS, shuffle=True, random_state=218)
kfold

StratifiedKFold(n_splits=5, random_state=218, shuffle=True)
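
To make the two comments above concrete, here is a small sketch on toy labels (the arrays `X_toy` and `y_toy` are made up for illustration). It records the positive rate of each validation fold and checks that the validation indices cover every row exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y_toy = np.array([0] * 90 + [1] * 10)   # toy labels, 10% positives
X_toy = np.zeros((len(y_toy), 1))       # features do not affect the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=218)
fold_rates = []
seen = []
for train_idx, valid_idx in skf.split(X_toy, y_toy):
    fold_rates.append(y_toy[valid_idx].mean())  # positive rate in this validation fold
    seen.extend(valid_idx.tolist())

print(fold_rates)                                # each fold keeps the 10% ratio
print(sorted(seen) == list(range(len(y_toy))))   # every row is validated exactly once
```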

In [8]:
# Most of the target values are 0.
y = train['target'].values
y

array([0, 0, 0, ..., 0, 0, 0])

In [9]:
# Drop the columns that are not used as features.
drop_feature = ['id', 'target']
X = train.drop(drop_feature, axis=1)
# Collect the remaining column names in a list.
feature_names = X.columns.tolist()

<a class="anchor" id="feature_engineering"></a>

## Feature Engineering

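After loading the data, adjust the variables so that they are suitable for training. Preprocessing handles N/A values, whereas feature engineering creates new features that can improve predictive power.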

In [10]:
cat_features = [c for c in feature_names if ('cat' in c and 'count' not in c)]
num_features = [c for c in feature_names if ('cat' not in c and 'calc' not in c)]

In [11]:
# The categorical variables are listed below; there are 14 in total.
cat_feature = [c for c in feature_names if ('cat' in c and 'count' not in c)]
print(len(cat_feature))
print('First 5 only : ', cat_feature[:5])

14
First 5 only :  ['ps_ind_02_cat', 'ps_ind_04_cat', 'ps_ind_05_cat', 'ps_car_01_cat', 'ps_car_02_cat']


In [12]:
# train == -1 is an element-wise comparison that yields True / False.
(train==-1).head()

Unnamed: 0,id,target,ps_ind_01,ps_ind_02_cat,ps_ind_03,ps_ind_04_cat,ps_ind_05_cat,ps_ind_06_bin,ps_ind_07_bin,ps_ind_08_bin,...,ps_calc_11,ps_calc_12,ps_calc_13,ps_calc_14,ps_calc_15_bin,ps_calc_16_bin,ps_calc_17_bin,ps_calc_18_bin,ps_calc_19_bin,ps_calc_20_bin
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [13]:
# Calling sum() on the boolean frame counts the True values.
# Set axis=1 to count per row.
# Rows with two -1 values are the most common.
(train==-1).sum(axis=1).value_counts()

2    4314
1    2600
0    2123
3     928
4      34
7       1
dtype: int64

In [14]:
# The same result can be viewed as a table.
# There are 4314 rows with two -1 values, and their mean target is about 0.03.
pd.concat( [(train==-1).sum(axis=1), train['target']], axis=1).groupby(0).agg(['count', 'mean']).reset_index()

Unnamed: 0,0,target count,target mean
0,0,2123,0.045219
1,1,2600,0.043077
2,2,4314,0.033148
3,3,928,0.028017
4,4,34,0.058824
5,7,1,0.0


In [15]:
train['missing'] = (train==-1).sum(axis=1).astype(float)
train['missing'].head()

0    1.0
1    2.0
2    3.0
3    0.0
4    2.0
Name: missing, dtype: float64

In [16]:
test['missing'] = (test==-1).sum(axis=1).astype(float)
test['missing'].head()

0    2.0
1    1.0
2    2.0
3    3.0
4    2.0
Name: missing, dtype: float64
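
In this dataset a value of -1 marks a missing entry, so the `missing` column is simply the number of missing fields per row. A toy sketch of the same expression (the frame `toy` and its columns are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    'a': [1, -1, 3],
    'b': [-1, -1, 0],
    'c': [5, 2, -1],
})

# (toy == -1) is a boolean frame; summing along axis=1 counts the -1 values per row.
toy['missing'] = (toy == -1).sum(axis=1).astype(float)
print(toy)
```

Note that the count has to be computed before the `missing` column exists, otherwise the new column itself would take part in the comparison.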

In [17]:
num_features.append('missing')
num_features

['ps_ind_01',
 'ps_ind_03',
 'ps_ind_06_bin',
 'ps_ind_07_bin',
 'ps_ind_08_bin',
 'ps_ind_09_bin',
 'ps_ind_10_bin',
 'ps_ind_11_bin',
 'ps_ind_12_bin',
 'ps_ind_13_bin',
 'ps_ind_14',
 'ps_ind_15',
 'ps_ind_16_bin',
 'ps_ind_17_bin',
 'ps_ind_18_bin',
 'ps_reg_01',
 'ps_reg_02',
 'ps_reg_03',
 'ps_car_11',
 'ps_car_12',
 'ps_car_13',
 'ps_car_14',
 'ps_car_15',
 'missing']

<a class="anchor" id="006"></a>

## Categorical Features

In [19]:
for c in cat_features:
    le = LabelEncoder()
    le.fit(train[c])
    train[c] = le.transform(train[c])
    test[c] = le.transform(test[c])
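
One caveat with the loop above: `LabelEncoder.transform` raises a `ValueError` when the test column contains a value that was not seen during `fit`, which becomes more likely when only a sample of rows is loaded. The sketch below reproduces this on toy data and shows one possible workaround, fitting on train and test together. This workaround is an assumption for illustration, not what the original kernel does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

train_toy = pd.DataFrame({'c': [-1, 0, 3]})
test_toy = pd.DataFrame({'c': [0, 3, 7]})   # 7 never appears in train

le = LabelEncoder()
le.fit(train_toy['c'])
try:
    le.transform(test_toy['c'])
    failed = False
except ValueError:
    failed = True                            # raised because of the unseen label 7

# Possible workaround (assumption): fit on the union of train and test values.
le_all = LabelEncoder()
le_all.fit(pd.concat([train_toy['c'], test_toy['c']]))
train_enc = le_all.transform(train_toy['c'])
test_enc = le_all.transform(test_toy['c'])
print(failed, train_enc, test_enc)
```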


<a class="anchor" id="007"></a>

## One Hot Encoding

In [20]:
enc = OneHotEncoder()
enc.fit(train[cat_features])

OneHotEncoder(categorical_features=None, categories=None,
       dtype=<class 'numpy.float64'>, handle_unknown='error',
       n_values=None, sparse=True)

In [21]:
X_cat = enc.transform(train[cat_features])
X_t_cat = enc.transform(test[cat_features])
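
A toy sketch of what `OneHotEncoder` produces (the array `cats` is made up for illustration). Each input column expands into one output column per distinct category, and every row stores exactly one non-zero value per input column, which is why the result is returned as a sparse matrix.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

cats = np.array([[0, 1],
                 [1, 0],
                 [2, 1]])   # column 0 has 3 categories, column 1 has 2

enc_toy = OneHotEncoder(handle_unknown='ignore')  # unseen categories encode as all zeros
onehot = enc_toy.fit_transform(cats)

print(onehot.shape)      # 3 rows, 3 + 2 = 5 columns
print(onehot.nnz)        # one stored element per row and input column
print(onehot.toarray())
```

The same reasoning explains the matrix shown later in the notebook: 14 categorical columns over 10000 rows give 140000 stored elements.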

<a class="anchor" id="008"></a>

## Ind Variable

In [22]:
# 18 variables match in total.
ind_features = [c for c in feature_names if 'ind' in c]
count = 0
print(len(ind_features))
ind_features

18


['ps_ind_01',
 'ps_ind_02_cat',
 'ps_ind_03',
 'ps_ind_04_cat',
 'ps_ind_05_cat',
 'ps_ind_06_bin',
 'ps_ind_07_bin',
 'ps_ind_08_bin',
 'ps_ind_09_bin',
 'ps_ind_10_bin',
 'ps_ind_11_bin',
 'ps_ind_12_bin',
 'ps_ind_13_bin',
 'ps_ind_14',
 'ps_ind_15',
 'ps_ind_16_bin',
 'ps_ind_17_bin',
 'ps_ind_18_bin']

In [23]:
# The first feature initializes the column; every later feature is appended to it.
# This builds a new column: all ind features are joined into a single new categorical value.
for c in ind_features:
    #break
    if count==0:
        train['new_ind'] = train[c].astype(str)+"_"  # dtype changes from int64 to object (str).
        test['new_ind'] = test[c].astype(str)+'_'
        count+=1
    else:
        train['new_ind'] += train[c].astype(str)+'_'
        test['new_ind'] += test[c].astype(str)+'_'


In [24]:
# Result of the loop above:
# 10,000 rows in total, with 7,010 distinct values.
print(train['new_ind'].shape)
print(train['new_ind'].value_counts().shape[0])
train['new_ind'].head()

(10000,)
7010


0    2_2_5_2_1_0_1_0_0_0_0_0_0_0_11_0_1_0_
1     1_1_7_1_1_0_0_1_0_0_0_0_0_0_3_0_0_1_
2    5_4_9_2_1_0_0_1_0_0_0_0_0_0_12_1_0_0_
3     0_1_2_1_1_1_0_0_0_0_0_0_0_0_8_1_0_0_
4     0_2_0_2_1_1_0_0_0_0_0_0_0_0_9_1_0_0_
Name: new_ind, dtype: object
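
The same combined key can be built without the explicit loop and counter. The sketch below is one possible vectorized variant on a toy frame (the frame `toy` is made up; the trailing underscore is kept so the strings match the loop's output format):

```python
import pandas as pd

toy = pd.DataFrame({
    'ps_ind_01': [2, 1],
    'ps_ind_02_cat': [2, 1],
    'ps_ind_03': [5, 7],
})
ind_cols = [c for c in toy.columns if 'ind' in c]

# Cast every ind column to str, then join the values of each row with '_'.
toy['new_ind'] = toy[ind_cols].astype(str).apply(lambda row: '_'.join(row), axis=1) + '_'
print(toy['new_ind'].tolist())
```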

In [25]:
cat_count_features = []

# cat_features holds only the categorical variables.
# cat_features + ['new_ind'] also includes the new combined variable.
for c in cat_features+['new_ind']:
    d = pd.concat([train[c], test[c]]).value_counts().to_dict()
    train['%s_count'%c] = train[c].apply(lambda x:d.get(x,0))
    test['%s_count'%c] = test[c].apply(lambda x:d.get(x,0))
    cat_count_features.append('%s_count'%c)
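
The loop above is a count (frequency) encoding: each category is replaced by how often it occurs in train and test combined. A toy sketch of the same idea, using `Series.map` instead of `apply` with a dict lookup (the frames and the column `c` are made up for illustration):

```python
import pandas as pd

train_toy = pd.DataFrame({'c': ['a', 'b', 'a']})
test_toy = pd.DataFrame({'c': ['a', 'c']})

# Count every category over train and test together.
counts = pd.concat([train_toy['c'], test_toy['c']]).value_counts()

# Map each value to its combined count; fillna(0) mirrors d.get(x, 0).
train_toy['c_count'] = train_toy['c'].map(counts).fillna(0).astype(int)
test_toy['c_count'] = test_toy['c'].map(counts).fillna(0).astype(int)
print(train_toy)
print(test_toy)
```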


In [26]:
X_cat

<10000x183 sparse matrix of type '<class 'numpy.float64'>'
	with 140000 stored elements in Compressed Sparse Row format>

In [27]:
train_list = [train[num_features+cat_count_features].values, X_cat,]
train_list

[array([[2.0000e+00, 5.0000e+00, 0.0000e+00, ..., 1.9833e+04, 2.2200e+02,
         1.0000e+00],
        [1.0000e+00, 7.0000e+00, 0.0000e+00, ..., 1.9833e+04, 1.5600e+02,
         1.0000e+00],
        [5.0000e+00, 9.0000e+00, 0.0000e+00, ..., 1.9833e+04, 2.7300e+02,
         2.0000e+00],
        ...,
        [1.0000e+00, 5.0000e+00, 1.0000e+00, ..., 1.9833e+04, 1.5500e+02,
         4.0000e+00],
        [1.0000e+00, 6.0000e+00, 1.0000e+00, ..., 1.9833e+04, 2.7300e+02,
         1.1000e+01],
        [0.0000e+00, 4.0000e+00, 1.0000e+00, ..., 1.9833e+04, 8.4000e+01,
         2.0000e+00]]),
 <10000x183 sparse matrix of type '<class 'numpy.float64'>'
 	with 140000 stored elements in Compressed Sparse Row format>]

In [28]:
test_list = [test[num_features+cat_count_features].values, X_t_cat,]  # use the test one-hot matrix, not X_cat
test_list

[array([[0.0000e+00, 8.0000e+00, 0.0000e+00, ..., 1.9833e+04, 3.5400e+02,
         8.0000e+00],
        [4.0000e+00, 5.0000e+00, 0.0000e+00, ..., 1.9833e+04, 7.8700e+02,
         1.0000e+00],
        [5.0000e+00, 3.0000e+00, 0.0000e+00, ..., 1.9833e+04, 1.5200e+02,
         1.0000e+00],
        ...,
        [4.0000e+00, 1.0000e+01, 0.0000e+00, ..., 1.9833e+04, 4.5800e+02,
         1.0000e+00],
        [1.0000e+00, 5.0000e+00, 0.0000e+00, ..., 1.9833e+04, 1.5200e+02,
         2.0000e+00],
        [1.0000e+00, 6.0000e+00, 0.0000e+00, ..., 1.9833e+04, 2.9120e+03,
         2.0000e+00]]),
 <10000x183 sparse matrix of type '<class 'numpy.float64'>'
 	with 140000 stored elements in Compressed Sparse Row format>]

In [29]:
X = ssp.hstack(train_list).tocsr()
X_test = ssp.hstack(test_list).tocsr()
X

<10000x222 sparse matrix of type '<class 'numpy.float64'>'
	with 420920 stored elements in Compressed Sparse Row format>

In [30]:
ssp.hstack(train_list)

<10000x222 sparse matrix of type '<class 'numpy.float64'>'
	with 420920 stored elements in COOrdinate format>

In [31]:
# tocsr() converts the matrix to Compressed Sparse Row (CSR) format.
ssp.hstack(train_list).tocsr()

<10000x222 sparse matrix of type '<class 'numpy.float64'>'
	with 420920 stored elements in Compressed Sparse Row format>
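
A toy sketch of the stacking step (the arrays `dense_part` and `sparse_part` are made up for illustration). `ssp.hstack` accepts a mix of dense and sparse blocks and keeps only the non-zero entries; converting the result to CSR gives the row indexing that the fold loop relies on when it selects `X[train_fold, :]`.

```python
import numpy as np
from scipy import sparse as ssp

dense_part = np.array([[1.0, 0.0],
                       [0.0, 2.0]])
sparse_part = ssp.csr_matrix(np.array([[0.0, 3.0, 0.0],
                                       [4.0, 0.0, 0.0]]))

stacked = ssp.hstack([dense_part, sparse_part])  # columns are placed side by side
stacked_csr = stacked.tocsr()                    # CSR supports efficient row slicing

row0 = stacked_csr[0, :].toarray()
print(stacked_csr.shape, stacked_csr.nnz)
print(row0)
```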

<a class="anchor" id="009"></a>

## Model Development

In [32]:
learning_rate=0.1
num_leaves = 15
min_data_in_leaf = 2000
feature_fraction = 0.6
num_boost_round = 10000

In [33]:
params = {
    "objective": "binary",
    "boosting_type" : "gbdt",
    "learning_rate" : learning_rate,
    "num_leaves" : num_leaves,
    "max_bin" : 256,
    "feature_fraction" : feature_fraction,
    "verbosity": 0,
    "drop_rate": 0.1,
    "is_unbalance": False,
    "max_drop": 50,
    "min_child_samples": 10,
    "min_child_weight": 150,
    "min_split_gain": 0,
    "subsample":0.9
}

In [34]:
x_score= []
final_cv_train = np.zeros(len(train_label))
final_cv_pred = np.zeros(len(test_id))

In [35]:
for i in kfold.split(X, train_label):
    print(i)

(array([   1,    2,    3, ..., 9997, 9998, 9999]), array([   0,    8,   10, ..., 9990, 9992, 9994]))
(array([   0,    1,    2, ..., 9997, 9998, 9999]), array([   5,    7,   13, ..., 9980, 9989, 9991]))
(array([   0,    1,    3, ..., 9994, 9996, 9998]), array([   2,    6,   15, ..., 9995, 9997, 9999]))
(array([   0,    2,    5, ..., 9995, 9997, 9999]), array([   1,    3,    4, ..., 9993, 9996, 9998]))
(array([   0,    1,    2, ..., 9997, 9998, 9999]), array([   9,   12,   24, ..., 9959, 9971, 9987]))


In [36]:
for s in range(16):
    cv_train = np.zeros(len(train_label))
    cv_pred = np.zeros(len(test_id))
    
    params['seed'] = s
    
    if cv_only:
        kf = kfold.split(X, train_label)
        
        best_trees = []
        fold_scores = []
        
        for i, (train_fold, validate) in enumerate(kf):
            X_train, X_validate, label_train, label_validate = \
                X[train_fold, :], X[validate, :], train_label[train_fold], train_label[validate]
            dtrain = lgbm.Dataset(X_train, label_train)
            dvalid = lgbm.Dataset(X_validate, label_validate, reference=dtrain)
            bst = lgbm.train(params, dtrain, num_boost_round, valid_sets=dvalid, feval=evalerror,
                            verbose_eval=100, early_stopping_rounds=100)
            best_trees.append(bst.best_iteration)
            cv_pred += bst.predict(X_test, num_iteration=bst.best_iteration)
            cv_train[validate] += bst.predict(X_validate)
            
            score = Gini(label_validate, cv_train[validate])
            print(score)
            fold_scores.append(score)
        
        cv_pred /= NFOLDS
        final_cv_train += cv_train
        final_cv_pred += cv_pred
        
        print('cv scroe :')
        print(Gini(train_label, cv_train))
        print('current score : ', Gini(train_label, final_cv_train / (s + 1.)), s+1)
        print(fold_scores)
        print(best_trees, np.mean(best_trees))
        
        x_score.append(Gini(train_label, cv_train))

print(x_score)
pd.DataFrame({'id':test_id, 'target':final_cv_pred / 16.}).to_csv('../../input/porto/lgbm3_pred_avg.csv', index=False)
pd.DataFrame({'id':train_id, 'target':final_cv_train / 16.}).to_csv('../../input/porto/lgbm3_cv_avg.csv', index=False)
            
            

Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247265	valid_0's gini: -0.0761586
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247265	valid_0's gini: -0.0761586
-0.07615857826384138
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.00436317
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.00436317
-0.00436316883685302
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0709596
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0709596
-0.07095962359120252
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0313492
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0313492
-0.0313

Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.244047	valid_0's gini: 0.00433818
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.244047	valid_0's gini: 0.00433818
0.00433818433818444
cv scroe :
-6.609332761807546e-05
current score :  -6.609332761807546e-05 6
[-0.07615857826384138, -0.00436316883685302, -0.07095962359120252, -0.03134916292811026, 0.00433818433818444]
[1, 1, 1, 1, 1] 1.0
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247265	valid_0's gini: -0.0761586
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247265	valid_0's gini: -0.0761586
-0.07615857826384138
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.00436317
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.00436317
-0.00436316883685302
Training until validation scores d

[100]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0709596
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0709596
-0.07095962359120252
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0313492
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.247397	valid_0's gini: -0.0313492
-0.03134916292811026
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.244047	valid_0's gini: 0.00433818
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.244047	valid_0's gini: 0.00433818
0.00433818433818444
cv scroe :
-6.609332761807546e-05
current score :  -6.609332761807546e-05 12
[-0.07615857826384138, -0.00436316883685302, -0.07095962359120252, -0.03134916292811026, 0.00433818433818444]
[1, 1, 1, 1, 1] 1.0
Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.2

====== Let's take the loop above apart ========

In [118]:
for s in range(16):
    break

In [125]:
cv_train = np.zeros(len(train_label))
cv_pred = np.zeros(len(test_id))
# Ensembles work because of diversity; varying the random seed is one way to introduce it.
params['seed'] = s

10000
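
The effect of averaging over seeds can be illustrated without training any model. The sketch below is a simulation under a strong assumption: each "model" is the same hypothetical signal plus independent seed-dependent noise. Real LightGBM runs with different seeds are correlated, so the gain in practice is smaller than what this toy setup shows.

```python
import numpy as np

signal = np.linspace(0.0, 1.0, 1000)   # hypothetical "true" scores
n_seeds = 16

per_seed_mse = []
avg_pred = np.zeros_like(signal)
for s in range(n_seeds):
    rng = np.random.default_rng(s)      # the seed is the only thing that changes
    pred = signal + rng.normal(0.0, 0.1, size=signal.shape)  # noisy stand-in for one model
    per_seed_mse.append(np.mean((pred - signal) ** 2))
    avg_pred += pred
avg_pred /= n_seeds                     # same averaging as final_cv_pred / 16.

avg_mse = np.mean((avg_pred - signal) ** 2)
print(min(per_seed_mse), avg_mse)       # the averaged prediction has the lower error
```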

In [43]:
kf = kfold.split(X, train_label)
# This is what the kfold splits look like.
for i in kf:
    print(i)

(array([   1,    2,    3, ..., 9997, 9998, 9999]), array([   0,    8,   10, ..., 9990, 9992, 9994]))
(array([   0,    1,    2, ..., 9997, 9998, 9999]), array([   5,    7,   13, ..., 9980, 9989, 9991]))
(array([   0,    1,    3, ..., 9994, 9996, 9998]), array([   2,    6,   15, ..., 9995, 9997, 9999]))
(array([   0,    2,    5, ..., 9995, 9997, 9999]), array([   1,    3,    4, ..., 9993, 9996, 9998]))
(array([   0,    1,    2, ..., 9997, 9998, 9999]), array([   9,   12,   24, ..., 9959, 9971, 9987]))


In [45]:
best_trees = []; fold_scores = []

In [54]:
train_fold

array([   0,    1,    2, ..., 9997, 9998, 9999])

In [63]:
for i,(train_fold, validate) in enumerate(kf):
    break

In [67]:
print(len(train_fold), train_fold.shape, train_fold)

8001 (8001,) [   0    1    2 ... 9997 9998 9999]


In [68]:
print(len(validate), validate.shape, validate)

1999 (1999,) [   9   12   24 ... 9959 9971 9987]


In [60]:
X_train, X_validate, label_train, label_validate = \
    X[train_fold, :], X[validate, :], train_label[train_fold], train_label[validate]

In [72]:
dtrain = lgbm.Dataset(X_train, label_train)
dvalid = lgbm.Dataset(X_validate, label_validate, reference=dtrain)
print(dtrain, dvalid)

<lightgbm.basic.Dataset object at 0x11aef3390> <lightgbm.basic.Dataset object at 0x11aef3588>


In [76]:
lgbm.train(params, dtrain, 
           num_boost_round, 
           valid_sets=dvalid, 
           feval=evalerror, 
           verbose_eval=100, 
           early_stopping_rounds=100)

Training until validation scores don't improve for 100 rounds.
[100]	valid_0's binary_logloss: 0.244047	valid_0's gini: 0.00433818
Early stopping, best iteration is:
[1]	valid_0's binary_logloss: 0.244047	valid_0's gini: 0.00433818


<lightgbm.basic.Booster at 0x11af15da0>

In [None]:
bst = lgbm.train(params, dtrain, num_boost_round, valid_sets=dvalid, feval=evalerror,
                 verbose_eval=100, early_stopping_rounds=100)

====== End of the walkthrough ========

==============================

----