### [Recognizing Tic-Tac-Toe Winners with Neural Networks](https://www.kdnuggets.com/2017/09/neural-networks-tic-tac-toe-keras.html)
The [dataset](https://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame) consists of 958 possible tic-tac-toe endgame boards. Each record has 9 variables giving the state of the 9 squares of the board, and a 10th variable that says whether the ending is a win for player X.

A game of tic-tac-toe can play out in 255,168 different ways, so it is hard to cover every board with hand-written rules.
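Both the 255,168 figure and the 958 endgame boards can be checked by brute force. The sketch below is not part of the original tutorial, and its function names are illustrative. It plays out every legal game with X moving first, counts the games, and collects the distinct final boards.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'x' or 'o' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "b" and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board, player, terminal_boards):
    """Count complete games reachable from `board` with `player` to move."""
    if winner(board) is not None or "b" not in board:
        terminal_boards.add("".join(board))  # remember the final position
        return 1
    total = 0
    for i in range(9):
        if board[i] == "b":
            board[i] = player
            total += count_games(board, "o" if player == "x" else "x", terminal_boards)
            board[i] = "b"  # undo the move
    return total


terminal_boards = set()
n_games = count_games(["b"] * 9, "x", terminal_boards)
x_wins = sum(winner(b) == "x" for b in terminal_boards)

print(n_games, len(terminal_boards), x_wins)  # 255168 958 626
```

The second number is the size of the dataset, and the third is the number of final boards that would be labelled positive.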

The attributes are described as follows.
>1. top-left-square: {x,o,b} 
>2. top-middle-square: {x,o,b} 
>3. top-right-square: {x,o,b} 
>4. middle-left-square: {x,o,b} 
>5. middle-middle-square: {x,o,b} 
>6. middle-right-square: {x,o,b} 
>7. bottom-left-square: {x,o,b} 
>8. bottom-middle-square: {x,o,b} 
>9. bottom-right-square: {x,o,b} 
>10. Class: {positive,negative}


Each square is marked x, o, or b (blank). The class is positive when player X wins and negative otherwise.
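To make the record layout concrete, here is a minimal sketch that parses a few records and prints each as a 3x3 grid. The three sample rows are decoded from the encoded rows printed later in this notebook; the `render` helper and the column names are illustrative, not part of the tutorial.

```python
import csv
import io

# Attribute names, following the order in the dataset description.
COLUMNS = [
    "top-left", "top-middle", "top-right",
    "middle-left", "middle-middle", "middle-right",
    "bottom-left", "bottom-middle", "bottom-right",
    "class",
]

# Three sample records in the comma-separated format of the data file.
SAMPLE = """x,x,x,x,o,o,x,o,o,positive
x,x,x,x,o,o,o,x,o,positive
x,x,x,x,o,o,o,o,x,positive
"""


def render(row):
    """Return the 9 squares of one record as a 3x3 text grid ('.' = blank)."""
    squares = [s if s != "b" else "." for s in row[:9]]
    return "\n".join(" ".join(squares[i:i + 3]) for i in range(0, 9, 3))


rows = list(csv.reader(io.StringIO(SAMPLE)))
for row in rows:
    print(render(row), "->", row[9])
    print()
```

In all three boards X fills the top row, which is why each is labelled positive.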

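#### Preparation
Before building the neural network, we preprocess the data as follows.

- **Encode the categorical variables as numbers.** We map {b, o, x} to {0, 1, 2} (`LabelEncoder` numbers the classes in sorted order). The [`LabelEncoder` class](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) in sklearn's `preprocessing` module does this.
- **One-hot encode all independent categorical variables.** One-hot encoding turns each independent variable into a vector, because integer codes alone do not keep the categories pairwise equidistant in Euclidean distance (0 and 2 are farther apart than 0 and 1). Here a 2-element vector is enough to represent the 3 categories. The [`OneHotEncoder` class](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) in sklearn does this.
- **Avoid the dummy variable trap.**
(I do not fully understand this concept.) The original tutorial explains it as follows.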
>Remove every third column to avoid dummy variable trap - As the one-hot encoding process in Scikit-learn creates as many columns for each variables as there are possible options (as per the dataset), one column needs to be removed in order to avoid what is referred to as the dummy variable trap. This is so to avoid redundant data which could bias results. In our case, each square has 3 possible options (27 columns), which can be expressed with 2 'bit' columns (I leave this to you to confirm), and so we remove every third column from the newly formed dataset (leaving us with 18).

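  PS Dummy variable trap: if a categorical variable has m categories, only m-1 dummy variables should enter the model. Introducing all m makes the explanatory variables perfectly collinear, because the m columns always sum to 1 and so duplicate the intercept.

  For example, to represent the state of each square, the following encoding is unsuitable because it leads to perfect collinearity.
  >X: [1, 0, 0]

  >O: [0, 1, 0]

  >b: [0, 0, 1]

  So, following the original, one of every three columns is removed, and the encoding becomes (which category ends up as [0, 0] depends on which column is dropped)
  >X: [1, 0]

  >O: [0, 1]

  >b: [0, 0]

- **Encode the target categorical variable**, e.g. {negative, positive} → {0, 1}.

- **Train/test split.** We hold out 20% of the data as the test set.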
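The collinearity behind the dummy variable trap can be seen numerically. The sketch below is an illustration with made-up codes for a single square, not the tutorial's code. It compares the rank of a design matrix that keeps all three indicator columns with one that drops a column.

```python
import numpy as np

# Integer codes for one square across four example boards (0 = b, 1 = o, 2 = x).
codes = np.array([0, 1, 2, 1])

full_onehot = np.eye(3)[codes]   # three indicator columns: b, o, x
reduced = full_onehot[:, 1:]     # drop one column (here the "b" indicator)

intercept = np.ones((len(codes), 1))
with_all = np.hstack([intercept, full_onehot])   # 4 columns
with_dropped = np.hstack([intercept, reduced])   # 3 columns

# The three indicators sum to the intercept column, so one column is redundant.
print(with_all.shape[1], np.linalg.matrix_rank(with_all))          # 4 3
print(with_dropped.shape[1], np.linalg.matrix_rank(with_dropped))  # 3 3
```

With all three indicators the matrix has 4 columns but rank 3; after dropping one, the rank equals the number of columns.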

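#### Neural network
We have 18 input variables and a binary classification task.
- **Input units.** There are 18 independent variables, so we need 18 input neurons.

- **Hidden layers.** A simple rule of thumb for the number of neurons in each hidden layer: add the number of independent variables to the number of output variables and divide by 2. Here that is 19 variables in total, so rounding down gives 9 units per layer. For the number of hidden layers, the best approach is to start small and keep adding layers until the network's performance stops improving. We start with 2 hidden layers.

- **Activation functions.** By convention, hidden layers use `ReLU` by default and the output layer of a binary classifier uses `sigmoid`. We follow this convention.

- **Optimizer.** We use the Adam optimizer.

- **Loss function.** We use binary cross-entropy.

- **Weight initialization.** We initialize the weights with random numbers.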

The network is as follows:
1. dense_1_input: InputLayer
   - input: (None, 18)
   - output: (None, 18)
2. dense_1: Dense
   - input: (None, 18)
   - output: (None, 9)
3. dense_2: Dense
   - input: (None, 9)
   - output: (None, 9)
4. dense_3: Dense
   - input: (None, 9)
   - output: (None, 1)
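As a quick check on the sizes above, the following sketch applies the rule of thumb for the hidden width and counts the trainable parameters of the 18-9-9-1 network. The helper names are illustrative.

```python
def hidden_units(n_inputs, n_outputs):
    """Rule of thumb from the text: (inputs + outputs) / 2, rounded down."""
    return (n_inputs + n_outputs) // 2


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


n_hidden = hidden_units(18, 1)
layer_sizes = [18, n_hidden, n_hidden, 1]  # input, two hidden layers, output
params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(n_hidden)     # 9
print(params)       # [171, 90, 10]
print(sum(params))  # 271
```

The network is small: 271 parameters for 958 examples.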

In [314]:
import numpy as np
import pandas as pd
import random
from sklearn import preprocessing as ppc
from keras.models import Sequential
from keras.layers import Dense,Activation
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from keras.utils import np_utils

In [315]:
def load_data(file_name):
    data=[]
    with open(file_name) as txt_data:
        lines=txt_data.readlines()
        for line in lines:
            # .strip() removes leading and trailing whitespace by default
            line=line.strip().split(',') 
            data.append(line)
        return np.array(data)

In [316]:
def split_data(dataset):
    feature=[]
    label=[]
    for i in range(len(dataset)):
        feature.append([data for data in dataset[i][:-1]])
        label.append(dataset[i][-1])
    return np.array(feature),np.array(label)

In [317]:
path = 'data.txt'
data = load_data(path)
feature,label=data[:,:-1],data[:,-1]

#### Preparation
Before building the neural network, we preprocess the data as follows.

**Encode the categorical variables as numbers.** We map {b, o, x} to {0, 1, 2} with sklearn's [`LabelEncoder` class](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).

In [318]:
# Encode the categorical variables as numbers
# (LabelEncoder sorts the classes, so b -> 0, o -> 1, x -> 2)
label_encoder = ppc.LabelEncoder()
label_encoder.fit(['x','o','b'])

for i in range(len(feature)):
    feature[i]=label_encoder.transform(feature[i])
    
for i in range(len(label)):
    label[i]=1 if label[i]=='positive' else 0

print(feature[:3])
print(label[:3])


[['2' '2' '2' '2' '1' '1' '2' '1' '1']
 ['2' '2' '2' '2' '1' '1' '1' '2' '1']
 ['2' '2' '2' '2' '1' '1' '1' '1' '2']]
['1' '1' '1']
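The output above shows x encoded as 2 and o as 1, because `LabelEncoder` numbers the classes in sorted order regardless of the order passed to `fit`. A minimal NumPy sketch of the same sorted-order mapping (an illustration, not the encoder's actual implementation):

```python
import numpy as np

squares = np.array(["x", "o", "b", "x"])

# np.unique returns the sorted distinct values and, for each element,
# its index in that sorted list.
classes, codes = np.unique(squares, return_inverse=True)

print(classes)  # ['b' 'o' 'x']
print(codes)    # [2 1 0 2]
```

The values print as strings in the notebook because `feature` is a string array, so the integer codes are cast back to text on assignment.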


**One-hot encode all independent categorical variables.** sklearn's [`OneHotEncoder` class](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) turns each integer-coded square into a 3-element one-hot vector, giving 27 columns. The loop in the cell then keeps 2 of every 3 columns.

In [319]:
onehot_enc=ppc.OneHotEncoder(handle_unknown='ignore')
onehot_enc.fit(feature)
feature=onehot_enc.transform(feature).toarray()

new_feature=[]

print(feature.shape[1])
columns=feature.shape[1]
print(feature[0])

for i in range(columns):
    if (i+1)%3 != 0:
        new_feature.append(feature.T[i])
        
new_feature=np.array(new_feature)
feature=new_feature.T


27
[0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
 0. 1. 0.]
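The column-dropping loop can also be written with a boolean mask. The sketch below uses plain NumPy with a hand-rolled one-hot step instead of `OneHotEncoder`; the second board is made up for illustration.

```python
import numpy as np

# Integer-coded boards (0 = b, 1 = o, 2 = x), one row per board.
boards = np.array([
    [2, 2, 2, 2, 1, 1, 2, 1, 1],  # first row of the dataset
    [2, 1, 0, 0, 2, 1, 0, 0, 2],  # illustrative board
])

# One-hot encode to shape (n_boards, 9, 3), then flatten to (n_boards, 27),
# so columns 3k, 3k+1, 3k+2 belong to square k.
onehot = np.eye(3)[boards].reshape(len(boards), 27)

# Keep the columns whose 1-based index is not a multiple of 3,
# i.e. drop columns 2, 5, 8, ... (the "x" indicator of each square).
keep = (np.arange(27) + 1) % 3 != 0
reduced = onehot[:, keep]

print(onehot.shape, reduced.shape)  # (2, 27) (2, 18)
print(reduced[0, :6])               # [0. 0. 0. 0. 0. 0.]
```

With this choice of dropped column, x is the category represented by [0, 0], so the three x squares in the top row of the first board become six zeros.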


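**Avoid the dummy variable trap.** The loop in the cell above drops every third column, so each square keeps 2 of its 3 indicator columns and 18 columns remain.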

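**Encode the target categorical variable**, e.g. {negative, positive} → {0, 1}. This notebook then expands the label into two one-hot columns with `to_categorical`.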

In [320]:
label=np_utils.to_categorical(label,2)

#feature=feature.astype('float32')
#label=label.astype('float32')
print(feature.shape)
#label=label[:,np.newaxis]
print(label[:3])

(958, 18)
[[0. 1.]
 [0. 1.]
 [0. 1.]]


**Train/test split.** We hold out 20% of the data as the test set.

In [321]:
X_train,X_test,y_train,y_test=train_test_split(feature,label,test_size=0.2,random_state=42)
print(X_train[:3])
print(y_train[:3])


[[0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0.]
 [0. 1. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0.]]
[[0. 1.]
 [0. 1.]
 [0. 1.]]


We have 18 input variables and a binary classification task.

In [322]:
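# Build the model
# (this version uses hidden layers of 18, 10 and 10 units and a 2-unit output
# for the one-hot labels, not the 9-9-1 layout listed earlier)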
model = Sequential([
    Dense(18,input_dim=18),
    Activation('relu'),
    Dense(10),
    Activation('relu'),
    Dense(10),
    Activation('relu'),
    Dense(2),
    Activation('sigmoid'),
])


In [323]:
model.compile(optimizer='adam',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
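The tutorial's network uses one sigmoid unit with binary cross-entropy, while the cell above uses two output units with categorical cross-entropy on one-hot labels. The two formulations give the same loss when the two-unit output is a softmax, as this NumPy sketch with made-up logits suggests. With two independent sigmoid units the outputs are not constrained to sum to 1, so softmax is the more usual pairing for this loss.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


z = np.array([2.0, -1.0, 0.5])  # illustrative logits for the "positive" class
y = np.array([1.0, 0.0, 1.0])   # 1 = positive, 0 = negative

# One sigmoid unit with binary cross-entropy.
p = sigmoid(z)
binary_ce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Two softmax units (logits [0, z]) with categorical cross-entropy.
probs = softmax(np.stack([np.zeros_like(z), z], axis=1))
onehot = np.stack([1 - y, y], axis=1)
categorical_ce = -(onehot * np.log(probs)).sum(axis=1)

print(np.allclose(binary_ce, categorical_ce))  # True
```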

In [324]:
model.fit(X_train,y_train,epochs=100,batch_size=10)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1a96a9b8198>

In [325]:
loss, accuracy = model.evaluate(X_test,y_test)
print('loss=',loss)
print('accuracy=',accuracy)

loss= 0.22566105937585235
accuracy= 0.9427083333333334
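The accuracy above can be translated back into a count of test boards. The sketch assumes that `train_test_split` rounds a fractional test size up, which is consistent with the printed value.

```python
import math

n_samples = 958
n_test = math.ceil(0.2 * n_samples)  # test share rounded up
n_train = n_samples - n_test

accuracy = 0.9427083333333334        # value printed above
n_correct = round(accuracy * n_test)

print(n_train, n_test)  # 766 192
print(n_correct)        # 181
```

So this run classifies 181 of the 192 held-out boards correctly and misses 11.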


# Original author's source code

In [326]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

from keras.models import Sequential
from keras.layers import Dense

# Import dataset
dataset = pd.read_csv('tic_tac_toe.csv')
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, 9:10].values

# Encode categorical variables as numeric
labelencoder_X = LabelEncoder()
for _ in range(9):
    X[:, _] = labelencoder_X.fit_transform(X[:, _])

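# One-hot encode all independent categorical variables
# (`categorical_features` is deprecated since scikit-learn 0.20 and absent
# from current releases, which encode every column by default)
onehotencoder = OneHotEncoder(categorical_features = [0,1,2,3,4,5,6,7,8])
X = onehotencoder.fit_transform(X).toarray()

# Remove one column per square (the first of each group of three)
# to avoid the dummy variable trap
# Only need 2 bits to represent 3 possibilities
X = np.delete(X, [0,3,6,9,12,15,18,21,24], axis=1)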

# Encode target categorical variable
labelencoder_y = LabelEncoder()
y[:, 0] = labelencoder_y.fit_transform(y[:, 0])

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize neural network
nnet = Sequential()

# Add first hidden layer (and input layer)
nnet.add(Dense(units=9, kernel_initializer='uniform', activation='relu', input_dim=18))

# Add second hidden layer
nnet.add(Dense(units=9, kernel_initializer='uniform', activation='relu'))

# Add output layer
nnet.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# Compile network
nnet.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train network
nnet.fit(X_train, y_train, batch_size=10, epochs=100)

# Predicting the test set results
y_pred = nnet.predict(X_test)


Epoch 1/100
...
Epoch 100/100


In [327]:
loss, accuracy = nnet.evaluate(X_test,y_test)
print('loss=',loss)
print('accuracy=',accuracy)

loss= 0.09561582972916464
accuracy= 0.96875


In [328]:
print(y[:3]) 

[[1]
 [1]
 [1]]
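The author's script imports `confusion_matrix` and computes `y_pred` but never uses either. One way the sigmoid outputs could be turned into class predictions and summarized is sketched below in plain NumPy. The probabilities and labels are hypothetical, not results from the network.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels (1 = positive, 0 = negative).
y_prob = np.array([0.97, 0.08, 0.61, 0.40, 0.93])
y_true = np.array([1, 0, 1, 1, 0])

y_hat = (y_prob > 0.5).astype(int)  # threshold the probabilities at 0.5

# Rows are true classes, columns are predicted classes.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_hat):
    cm[t, p] += 1

print(cm)
# [[1 1]
#  [1 2]]
print(np.trace(cm) / cm.sum())  # 0.6
```

The diagonal holds the correct predictions, so accuracy is the trace divided by the total count.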
