Callback functions (passed in through the callbacks parameter)
   * record_evaluation ===> xgboost's evals_result
   * early_stopping ===> xgboost's early_stopping_rounds
   * log_evaluation ===> xgboost's verbose_eval

In [5]:
import lightgbm as lgb
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

In [6]:
covtype = datasets.fetch_covtype()  # load the dataset once instead of twice
X = covtype.data[:3000]
y = covtype.target[:3000]
X_train, X_test, y_train, y_test = train_test_split(X, y)

print(X_train.shape)
print(y_train.shape)
print(np.unique(y_train))  # 7-class classification task

(2250, 54)
(2250,)
[1 2 3 4 5 6 7]


In [7]:
enc = OrdinalEncoder()
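# LightGBM's multiclass objective expects labels in [0, num_class), so map 1..7 to 0..6;
# ravel() turns the (n, 1) column returned by OrdinalEncoder into a 1-D label vector
y_train_enc = enc.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_enc = enc.transform(y_test.reshape(-1, 1)).ravel()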
print(np.unique(y_train_enc))

[0. 1. 2. 3. 4. 5. 6.]
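The encoding step exists because LightGBM's multiclass objective requires integer class labels in the range [0, num_class). A minimal sketch on a few hypothetical labels (not the covtype data) shows that, when all classes 1..7 are present, OrdinalEncoder amounts to y - 1, and that inverse_transform maps predicted class indices back to the original labels.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical labels in the same 1..7 range as the covtype targets
y = np.array([1, 2, 2, 7, 5, 3, 1, 4, 6])

enc = OrdinalEncoder()
# fit_transform returns a 2-D float column; ravel() gives a 1-D label vector
y_enc = enc.fit_transform(y.reshape(-1, 1)).ravel()
print(y_enc)                # encoded labels 0..6
print(enc.categories_[0])   # original labels; position i is encoded as i

# Map (predicted) class indices back to the original 1..7 labels
y_back = enc.inverse_transform(y_enc.reshape(-1, 1)).ravel()
print(y_back)
```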


In [8]:
train_dataset = lgb.Dataset(data=X_train, label=y_train_enc)

In [9]:
evals_result = {}  # stores the evaluation metric results (built-in and custom)
# Create a callback that records the evaluation history into eval_result.
re_func = lgb.record_evaluation(eval_result=evals_result)

val_dataset = lgb.Dataset(data=X_test, label=y_test_enc)
eval_set = [train_dataset, val_dataset]

params = {"objective": "multiclass",
          "num_class": 7,
          "metric": "multi_error",
          "verbosity": -1}
lgb.train(params=params,
          train_set=train_dataset,
          valid_sets=eval_set,
          # List of callback functions that are applied at each iteration.
          callbacks=[re_func])
# After training finishes, evals_result has the following structure
# (dataset name -> metric name -> one value per boosting iteration; values are illustrative):
# {
#  'training':
#      {
#       'multi_error': [0.48253, 0.35953, ...]
#      },
#  'valid_1':
#      {
#       'multi_error': [0.480385, 0.357756, ...]
#      }
# }
evals_result



{'training': OrderedDict([('multi_error',
               [0.4408888888888889,
                0.2604444444444444,
                0.17733333333333334,
                0.14355555555555555,
                0.12044444444444445,
                0.10666666666666667,
                0.10044444444444445,
                0.0951111111111111,
                0.09022222222222222,
                0.08444444444444445,
                0.07511111111111111,
                0.068,
                0.06355555555555556,
                0.057777777777777775,
                0.051111111111111114,
                0.04711111111111111,
                0.04577777777777778,
                0.04,
                0.036444444444444446,
                0.035111111111111114,
                0.03288888888888889,
                0.03288888888888889,
                0.028444444444444446,
                0.024888888888888887,
                0.023555555555555555,
                0.01911111111111111,
                0.016

In [10]:
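# Create a callback that activates early stopping.
# Note: lgb.train runs num_boost_round=100 rounds by default, so stopping_rounds=200 can never
# be reached here; all 100 rounds run and only the best iteration is reported.
es_func = lgb.early_stopping(stopping_rounds=200)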

val_dataset = lgb.Dataset(data=X_test, label=y_test_enc)
eval_set = [train_dataset, val_dataset]

params = {"objective": "multiclass",
          "num_class": 7,
          "metric": "multi_error",
          "verbosity": -1}
lgb.train(params=params,
          train_set=train_dataset,
          valid_sets=eval_set,
          callbacks=[es_func])

Training until validation scores don't improve for 200 rounds
Did not meet early stopping. Best iteration is:
[52]	training's multi_error: 0	valid_1's multi_error: 0.154667


<lightgbm.basic.Booster at 0x1b8ae26ffa0>

In [11]:
# Create a callback that logs the evaluation results.
le_func = lgb.log_evaluation(
    # period (int, optional (default=1)) –
    # The period to log the evaluation results.
    # The last boosting stage or the boosting stage found by using early_stopping callback is also logged.
    period=10)

val_dataset = lgb.Dataset(data=X_test, label=y_test_enc)
eval_set = [train_dataset, val_dataset]

# "verbosity": -1 is omitted here, so LightGBM's [Info]/[Warning] messages are printed below
params = {"objective": "multiclass",
          "num_class": 7,
          "metric": "multi_error"}

lgb.train(params=params,
          train_set=train_dataset,
          valid_sets=eval_set,
          callbacks=[le_func])

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1863
[LightGBM] [Info] Number of data points in the train set: 2250, number of used features: 33
[LightGBM] [Info] Start training from score -1.770651
[LightGBM] [Info] Start training from score -1.187808
[LightGBM] [Info] Start training from score -2.531300
[LightGBM] [Info] Start training from score -3.143975
[LightGBM] [Info] Start training from score -1.360843
[LightGBM] [Info] Start training from score -2.307039
[LightGBM] [Info] Start training from score -3.074295
[10]	training's multi_error: 0.0844444	valid_1's multi_error: 0.186667
[20]	training's multi_error: 0.0351111	valid_1's multi_error: 0.166667
[30]	training's multi_error: 0.0146667	valid_1's multi_error: 0.168
[40]	training's multi_error: 0.00266667	valid_1's multi_error: 0.157333
[50]	training's multi_error: 0.000444444	valid_1's multi_error: 0.152
[60]	training's multi_error: 0	valid_1's multi_error: 0.149333
[70]	training's multi_e

<lightgbm.basic.Booster at 0x1b8ae26fb80>