<a href="https://colab.research.google.com/github/YaCpotato/python/blob/master/LightGBM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## ドライブのマウントとデータセットを保存してあるディレクトリへ移動

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
%cd drive/'My Drive'/'Colab Notebooks'/

## データセットのロードとプレビュー

In [3]:
import pandas as pd
import io
train = pd.read_csv('./poker-hand-training-true.csv', header=None)
test = pd.read_csv('./poker-hand-testing.csv', header=None)
train.head()
test.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1,1,1,13,2,4,2,3,1,12,0
1,3,12,3,2,3,11,4,5,2,5,1
2,1,9,4,6,1,4,3,2,3,9,1
3,1,4,3,13,2,13,2,1,3,6,1
4,3,10,2,7,1,2,2,11,4,9,0


## データ整形：ラベルの抽出と学習、教師データの分割

In [4]:
from sklearn import preprocessing
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import normalize

train_Y = train[:][10]
test_Y = test[:][10]

X_train, X_test, = train_test_split(train,train_size=0.7)
Y_train,Y_test = train_test_split(train_Y,train_size=0.7)
print(X_train.shape,Y_train.shape,X_test.shape,Y_test.shape)

X_train.drop(10,axis=1,inplace=True)
X_test.drop(10,axis=1,inplace=True)
print(X_train.shape,Y_train.shape,X_test.shape,Y_test.shape)

Using TensorFlow backend.


(17507, 11) (17507,) (7503, 11) (7503,)
(17507, 10) (17507,) (7503, 10) (7503,)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


## 学習データからValidation用にデータを取り出す

In [5]:
import numpy as np
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
Y_train = Y_train.astype('float32')
Y_test = Y_test.astype('float32')

X_train, X_val,= train_test_split(X_train,train_size=0.7)
Y_train, Y_val = train_test_split(Y_train,train_size=0.7)
print(X_train.shape,Y_train.shape,X_val.shape,Y_val.shape)

(12254, 10) (12254,) (5253, 10) (5253,)


## LightGBMモデルを定義

In [0]:
import lightgbm as lgb

params = {
    'n_estimators': 10000,  # 大きめにとっておく
}

lgb_model = lgb.LGBMClassifier(**params)

## 学習

In [7]:
%%time
lgb_model.fit(X_train, Y_train,eval_set=[(X_val, Y_val)], early_stopping_rounds=100)

[1]	valid_0's multi_logloss: 1.01439
Training until validation scores don't improve for 100 rounds.
[2]	valid_0's multi_logloss: 0.991101
[3]	valid_0's multi_logloss: 1.01081
[4]	valid_0's multi_logloss: 0.991012
[5]	valid_0's multi_logloss: 1.02348
[6]	valid_0's multi_logloss: 1.01084
[7]	valid_0's multi_logloss: 0.998051
[8]	valid_0's multi_logloss: 1.03
[9]	valid_0's multi_logloss: 1.0043
[10]	valid_0's multi_logloss: 1.01097
[11]	valid_0's multi_logloss: 1.08132
[12]	valid_0's multi_logloss: 1.0111
[13]	valid_0's multi_logloss: 1.02425
[14]	valid_0's multi_logloss: 0.998558
[15]	valid_0's multi_logloss: 1.06192
[16]	valid_0's multi_logloss: 0.998928
[17]	valid_0's multi_logloss: 1.08618
[18]	valid_0's multi_logloss: 0.999575
[19]	valid_0's multi_logloss: 1.0384
[20]	valid_0's multi_logloss: 1.0001
[21]	valid_0's multi_logloss: 1.05715
[22]	valid_0's multi_logloss: 1.03258
[23]	valid_0's multi_logloss: 1.06898
[24]	valid_0's multi_logloss: 1.00762
[25]	valid_0's multi_logloss: 1.122

LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_depth=-1,
               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
               n_estimators=10000, n_jobs=-1, num_leaves=31, objective=None,
               random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

## 評価

In [8]:
# テストデータを予測する
Y_pred = lgb_model.predict_proba(X_test)
Y_pred_max = np.argmax(Y_pred, axis=1)  # 最尤と判断したクラスの値にする

# 精度 (Accuracy) を計算する
from sklearn.metrics import confusion_matrix,classification_report
print (confusion_matrix(Y_test, Y_pred_max))
print (classification_report(Y_test, Y_pred_max))


[[3693    5    0    0    0    0    0    5    0    4]
 [3218    5    0    0    0    0    0    2    0    5]
 [ 363    0    0    0    0    0    0    1    0    1]
 [ 139    0    0    0    0    0    0    0    0    0]
 [  30    0    0    0    0    0    0    0    0    0]
 [  17    0    0    0    0    0    0    0    0    0]
 [  12    0    0    0    0    0    0    0    0    0]
 [   1    0    0    0    0    0    0    0    0    0]
 [   1    0    0    0    0    0    0    0    0    0]
 [   1    0    0    0    0    0    0    0    0    0]]
              precision    recall  f1-score   support

         0.0       0.49      1.00      0.66      3707
         1.0       0.50      0.00      0.00      3230
         2.0       0.00      0.00      0.00       365
         3.0       0.00      0.00      0.00       139
         4.0       0.00      0.00      0.00        30
         5.0       0.00      0.00      0.00        17
         6.0       0.00      0.00      0.00        12
         7.0       0.00      0.00   

  'precision', 'predicted', average, warn_for)
