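#### Task 1: Model training and prediction

* Step 1: Import the LightGBM library
* Step 2: Train an LGBMClassifier on the iris dataset
* Step 3: Use the trained model to predict on iris

##### Classification with the sklearn interface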

In [1]:
import pandas as pd 
import numpy as np 
import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import json
import pickle
from sklearn.metrics import accuracy_score

iris = load_iris()
data = iris.data
target = iris.target

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3)

gbm = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.02, n_estimators=20)
gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=5)

y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration_)
print('The accuracy of prediction is:', accuracy_score(y_test, y_pred))

[1]	valid_0's multi_logloss: 1.06863
[2]	valid_0's multi_logloss: 1.03872
[3]	valid_0's multi_logloss: 1.0103
[4]	valid_0's multi_logloss: 0.983268
[5]	valid_0's multi_logloss: 0.957531
[6]	valid_0's multi_logloss: 0.933007
[7]	valid_0's multi_logloss: 0.90962
[8]	valid_0's multi_logloss: 0.887301
[9]	valid_0's multi_logloss: 0.864325
[10]	valid_0's multi_logloss: 0.842311
[11]	valid_0's multi_logloss: 0.821206
[12]	valid_0's multi_logloss: 0.80096
[13]	valid_0's multi_logloss: 0.781126
[14]	valid_0's multi_logloss: 0.762399
[15]	valid_0's multi_logloss: 0.74476
[16]	valid_0's multi_logloss: 0.727259
[17]	valid_0's multi_logloss: 0.712293
[18]	valid_0's multi_logloss: 0.696184
[19]	valid_0's multi_logloss: 0.680285
[20]	valid_0's multi_logloss: 0.665313
The accuracy of prediction is: 0.8888888888888888




##### Classification with the native interface

In [2]:
import pandas as pd 
import numpy as np
import lightgbm as lgb 
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.metrics import accuracy_score

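# Load the dataset
iris = datasets.load_iris()
# iris is a sklearn Bunch, not a DataFrame; inspect it with iris.data[:5] and iris.data.shape

# Split the dataset and convert it to LightGBM's Dataset format
# By default LightGBM frees the raw data once the Dataset has been constructed for training,
# so the Dataset cannot be reused to train again; free_raw_data=False keeps the raw data
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)
train_data = lgb.Dataset(X_train, label=y_train, free_raw_data=False)
test_data = lgb.Dataset(X_test, label=y_test, free_raw_data=False)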

# Parameter settings
params = {
    'learning_rate':0.02,
    'lambda_l1':0.1,
    'lambda_l2':0.2,
    'max_depth':4,
    'objective':'multiclass',
    'num_class':3,
    'boosting_type':'gbdt',
    'metric':'multi_logloss',
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5
}

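# Train the model
gbm = lgb.train(params, train_data, valid_sets=[test_data])
print('Training finished')

# Predict: for a multiclass objective, predict returns one probability per class for each row
y_pred = gbm.predict(X_test)
# Take the index of the largest probability as the predicted class
y_pred = [list(x).index(max(x)) for x in y_pred]
print(y_pred)

# Evaluate
print(accuracy_score(y_test, y_pred))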

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 84
[LightGBM] [Info] Number of data points in the train set: 105, number of used features: 4
[LightGBM] [Info] Start training from score -1.043042
[LightGBM] [Info] Start training from score -1.070441
[LightGBM] [Info] Start training from score -1.188224
[1]	valid_0's multi_logloss: 1.07561
[2]	valid_0's multi_logloss: 1.04363
[3]	valid_0's multi_logloss: 1.0131
[4]	valid_0's multi_logloss: 0.983873
[5]	valid_0's multi_logloss: 0.955893
[6]	valid_0's multi_logloss: 0.929441
[7]	valid_0's multi_logloss: 0.904349
[8]	valid_0's multi_logloss: 0.879271
[9]	valid_0's multi_logloss: 0.854882
[10]	valid_0's multi_logloss: 0.831807
[11]	valid_0's multi_logloss: 0.809079
[12]	valid_0's multi_logloss: 0.788309
[13]	valid_0's multi_logloss: 0.767301
[14]	valid_0's multi_logloss: 0.747031
[15]	valid_0's multi_logloss: 0.72748
[16]	valid_0's multi_logloss: 0.710089
[17]	valid_0's multi_logloss: 0.693365
[18]	vali
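
With a multiclass objective, the native `predict` returns a matrix of class probabilities rather than labels, which is why the cell above converts each row to the index of its largest value. A short sketch of the same conversion with `np.argmax` follows. The three probability rows are illustrative values of the kind such a model returns, not the output of this particular run.

```python
import numpy as np

# Illustrative class probabilities for three samples of a 3-class model
proba = np.array([
    [0.03882975, 0.05836981, 0.90280044],
    [0.93308595, 0.03892203, 0.02799201],
    [0.048595, 0.11079538, 0.84060962],
])

# Vectorized version: index of the largest probability in each row
labels = np.argmax(proba, axis=1)

# Equivalent list-comprehension version, as used in the cell above
loop_labels = [list(x).index(max(x)) for x in proba]

print(labels.tolist())  # [2, 0, 2]
print(loop_labels)      # [2, 0, 2]
```

Both forms give the same labels; `np.argmax` avoids the Python-level loop and returns an array that `accuracy_score` accepts directly.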

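#### Task 2: Saving and loading models

https://github.com/microsoft/LightGBM/blob/master/examples/python-guide/advanced_example.py

* Step 1: Save the model trained in Task 1 with pickle
* Step 2: Save the model trained in Task 1 as a text file
* Step 3: Load the models from Step 1 and Step 2 and use them for prediction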

In [3]:
import pickle 

def pkl_save(filename, obj):
    # Serialize obj to filename; the with block closes the file even if dump fails
    with open(filename, 'wb') as output:
        pickle.dump(obj, output)

def pkl_load(filename):
    # Load and return the object pickled in filename
    with open(filename, 'rb') as pkl_file:
        return pickle.load(pkl_file)

pkl_save('./model.pkl', gbm)
print('Model saved')

model_load = pkl_load('./model.pkl')
print('Model loaded')

model_load.predict(X_test)


Model saved
Model loaded


array([[0.03882975, 0.05836981, 0.90280044],
       [0.93308595, 0.03892203, 0.02799201],
       [0.048595  , 0.11079538, 0.84060962],
       [0.92767235, 0.04437726, 0.02795039],
       [0.93046886, 0.0406326 , 0.02889854],
       [0.04846423, 0.90338492, 0.04815086],
       [0.03882975, 0.05836981, 0.90280044],
       [0.04403358, 0.08362581, 0.87234061],
       [0.03894869, 0.05548542, 0.90556589],
       [0.03882975, 0.05836981, 0.90280044],
       [0.9326008 , 0.03890179, 0.02849741],
       [0.04175387, 0.91523877, 0.04300736],
       [0.04214741, 0.90818242, 0.04967018],
       [0.03894869, 0.05548542, 0.90556589],
       [0.93216516, 0.03888362, 0.02895122],
       [0.0366785 , 0.05732066, 0.90600084],
       [0.93046886, 0.0406326 , 0.02889854],
       [0.93046886, 0.0406326 , 0.02889854],
       [0.92683207, 0.04438235, 0.02878558],
       [0.04398452, 0.90721682, 0.04879866],
       [0.04343858, 0.91142798, 0.04513344],
       [0.0389186 , 0.0562151 , 0.90486629],
       [0.

In [4]:
gbm.save_model('model.txt')
bst = lgb.Booster(model_file='model.txt')
y_pred = bst.predict(X_test)
y_pred

array([[0.03882975, 0.05836981, 0.90280044],
       [0.93308595, 0.03892203, 0.02799201],
       [0.048595  , 0.11079538, 0.84060962],
       [0.92767235, 0.04437726, 0.02795039],
       [0.93046886, 0.0406326 , 0.02889854],
       [0.04846423, 0.90338492, 0.04815086],
       [0.03882975, 0.05836981, 0.90280044],
       [0.04403358, 0.08362581, 0.87234061],
       [0.03894869, 0.05548542, 0.90556589],
       [0.03882975, 0.05836981, 0.90280044],
       [0.9326008 , 0.03890179, 0.02849741],
       [0.04175387, 0.91523877, 0.04300736],
       [0.04214741, 0.90818242, 0.04967018],
       [0.03894869, 0.05548542, 0.90556589],
       [0.93216516, 0.03888362, 0.02895122],
       [0.0366785 , 0.05732066, 0.90600084],
       [0.93046886, 0.0406326 , 0.02889854],
       [0.93046886, 0.0406326 , 0.02889854],
       [0.92683207, 0.04438235, 0.02878558],
       [0.04398452, 0.90721682, 0.04879866],
       [0.04343858, 0.91142798, 0.04513344],
       [0.0389186 , 0.0562151 , 0.90486629],
       [0.

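#### Task 3: Classification, regression, and ranking

https://github.com/microsoft/LightGBM/blob/master/examples/python-guide/sklearn_example.py
https://github.com/microsoft/LightGBM/blob/master/examples/python-guide/simple_example.py

* Step 1: Learn the LightGBM sklearn interface and import the classification, regression, and ranking estimators.
* Step 2: Learn the native LightGBM train interface.
* Step 3: Binary classification
  * Use make_classification to create a binary classification dataset.
  * Train and predict with the sklearn interface.
  * Train and predict with the native train interface.
* Step 4: Multiclass classification
  * Use make_classification to create a multiclass dataset.
  * Train and predict with the sklearn interface.
  * Train and predict with the native train interface.
* Step 5: Regression
  * Use make_regression to create a regression dataset.
  * Train and predict with the sklearn interface.
  * Train and predict with the native train interface.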

In [6]:
from lightgbm import LGBMClassifier, LGBMRegressor, LGBMRanker

gbm = lgb.LGBMClassifier(
    num_leaves=31,
    objective='multiclass',
    metric='multi_logloss',
    learning_rate=0.02,
    n_estimators=20,
)

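##### Using the native interface

**1. Training parameters**

*  params: dictionary of training parameters
*  train_set: training data, an lgb.Dataset built from X_train and y_train
*  valid_sets: validation data, a list of lgb.Dataset objects, for example one built from X_test and y_test
*  num_boost_round: maximum number of boosting iterations
*  early_stopping_rounds: stop training if the validation metric has not improved for N iterations (in LightGBM 4.0 and later, pass the lgb.early_stopping(N) callback instead)
*  verbose_eval: print evaluation results every verbose_eval iterations (in LightGBM 4.0 and later, pass the lgb.log_evaluation callback instead)

**2. Prediction**

predict(data, num_iteration=None)

num_iteration: how many iterations to use for prediction; when it is None and early stopping was used, the best iteration is used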

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification, make_regression
from matplotlib import pyplot as plt

##### Binary classification