<a href="https://colab.research.google.com/github/YaCpotato/python/blob/master/LightGBM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## ドライブのマウントとデータセットを保存してあるディレクトリへ移動

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
%cd drive/'My Drive'/'Colab Notebooks'/

## データセットのロードとプレビュー

In [22]:
import pandas as pd
import io
train = pd.read_csv('./poker-hand-training-true.csv', header=None)
test = pd.read_csv('./poker-hand-testing.csv', header=None)
train.head()
test.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1,1,1,13,2,4,2,3,1,12,0
1,3,12,3,2,3,11,4,5,2,5,1
2,1,9,4,6,1,4,3,2,3,9,1
3,1,4,3,13,2,13,2,1,3,6,1
4,3,10,2,7,1,2,2,11,4,9,0


## データ整形：ラベルの抽出と学習、教師データの分割

In [23]:
from sklearn import preprocessing
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import normalize

train_Y = train[:][10]
test_Y = test[:][10]

X_train, X_test, = train_test_split(train,train_size=0.7)
Y_train,Y_test = train_test_split(train_Y,train_size=0.7)
print(X_train.shape,Y_train.shape,X_test.shape,Y_test.shape)

X_train.drop(10,axis=1,inplace=True)
X_test.drop(10,axis=1,inplace=True)
print(X_train.shape,Y_train.shape,X_test.shape,Y_test.shape)

(17507, 11) (17507,) (7503, 11) (7503,)
(17507, 10) (17507,) (7503, 10) (7503,)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


## 学習データからValidation用にデータを取り出す

In [24]:
import numpy as np
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
Y_train = Y_train.astype('float32')
Y_test = Y_test.astype('float32')

X_train, X_val,= train_test_split(X_train,train_size=0.7)
Y_train, Y_val = train_test_split(Y_train,train_size=0.7)
print(X_train.shape,Y_train.shape,X_val.shape,Y_val.shape)

(12254, 10) (12254,) (5253, 10) (5253,)


## LightGBMモデルを定義

In [0]:
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV

params = {
    'n_estimators': 10000,  # 大きめにとっておく
}

lgb = lgb.LGBMClassifier(**params)


## 学習

In [26]:
%%time
lgb.fit(X_train, Y_train,
        early_stopping_rounds=100,
        eval_set=[[X_val, Y_val]],
        verbose=1)

[1]	valid_0's multi_logloss: 1.01505
Training until validation scores don't improve for 100 rounds.
[2]	valid_0's multi_logloss: 1.01095
[3]	valid_0's multi_logloss: 1.00914
[4]	valid_0's multi_logloss: 1.00775
[5]	valid_0's multi_logloss: 1.00714
[6]	valid_0's multi_logloss: 1.00639
[7]	valid_0's multi_logloss: 1.00601
[8]	valid_0's multi_logloss: 1.00553
[9]	valid_0's multi_logloss: 1.00513
[10]	valid_0's multi_logloss: 1.00473
[11]	valid_0's multi_logloss: 1.00464
[12]	valid_0's multi_logloss: 1.00443
[13]	valid_0's multi_logloss: 1.00426
[14]	valid_0's multi_logloss: 1.00429
[15]	valid_0's multi_logloss: 1.00413
[16]	valid_0's multi_logloss: 1.00398
[17]	valid_0's multi_logloss: 1.00385
[18]	valid_0's multi_logloss: 1.00374
[19]	valid_0's multi_logloss: 1.00356
[20]	valid_0's multi_logloss: 1.00368
[21]	valid_0's multi_logloss: 1.00377
[22]	valid_0's multi_logloss: 1.00385
[23]	valid_0's multi_logloss: 1.00405
[24]	valid_0's multi_logloss: 1.00422
[25]	valid_0's multi_logloss: 1.00

LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_depth=-1,
               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
               n_estimators=10000, n_jobs=-1, num_leaves=31, objective=None,
               random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

## 評価

In [27]:
# テストデータを予測する
Y_pred = lgb.predict_proba(X_test)
Y_pred_max = np.argmax(Y_pred, axis=1)  # 最尤と判断したクラスの値にする

# 精度 (Accuracy) を計算する
from sklearn.metrics import confusion_matrix,classification_report
print (confusion_matrix(Y_test, Y_pred_max))
print (classification_report(Y_test, Y_pred_max))


[[3483  269    0    0    0    0    0    3   13    0]
 [2940  217    0    0    0    0    0    3   11    0]
 [ 343   27    0    0    0    0    0    0    0    0]
 [ 133   10    0    0    0    0    0    0    1    0]
 [  24    2    0    0    0    0    0    0    0    0]
 [   9    1    0    0    0    0    0    0    0    0]
 [  11    0    0    0    0    0    0    0    0    0]
 [   1    0    0    0    0    0    0    0    0    0]
 [   0    0    0    0    0    0    0    0    0    0]
 [   2    0    0    0    0    0    0    0    0    0]]
              precision    recall  f1-score   support

         0.0       0.50      0.92      0.65      3768
         1.0       0.41      0.07      0.12      3171
         2.0       0.00      0.00      0.00       370
         3.0       0.00      0.00      0.00       144
         4.0       0.00      0.00      0.00        26
         5.0       0.00      0.00      0.00        10
         6.0       0.00      0.00      0.00        11
         7.0       0.00      0.00   

  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)
