## Tic-Tac-Toe Endgame Data Set 

Binary classification of tic-tac-toe endgame boards: the label is `positive` when x (who is assumed to have moved first) has three in a row, and `negative` otherwise.

Dataset: https://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the (commented-out) plt.style call below
import seaborn as sns
%matplotlib inline
#plt.style.use('seaborn')

In [2]:
# The UCI file has no header row: name the nine squares (row-major order) and the label
df = pd.read_csv('./tic-tac-toe.data', header=None)
df.columns = ['top_left_sqr', 'top_middle_sqr', 'top_right_sqr',
             'mid_left_sqr', 'mid_mid_sqr', 'mid_right_sqr', 
             'btm_left_sqr', 'btm_mid_sqr', 'btm_right_sqr',
             'class']
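The nine square columns are listed row by row, from top-left to bottom-right, so each row of `df` can be folded back into a 3×3 grid. A small sketch, using a hypothetical helper `board_from_row` that is not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Same column names as in the notebook, in row-major board order
SQUARES = ['top_left_sqr', 'top_middle_sqr', 'top_right_sqr',
           'mid_left_sqr', 'mid_mid_sqr', 'mid_right_sqr',
           'btm_left_sqr', 'btm_mid_sqr', 'btm_right_sqr']

def board_from_row(row: pd.Series) -> np.ndarray:
    """Hypothetical helper: reshape the nine square values into a 3x3 grid."""
    return row[SQUARES].to_numpy().reshape(3, 3)

# First board from df.head(): x holds the top row and the left column
example = pd.Series(['x', 'x', 'x', 'x', 'o', 'o', 'x', 'o', 'o'], index=SQUARES)
print(board_from_row(example))
```

Because the helper selects `SQUARES` by name, it also works on a full `df` row that still contains the `class` column, e.g. `board_from_row(df.iloc[0])`.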

In [3]:
df.head()

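| | top_left_sqr | top_middle_sqr | top_right_sqr | mid_left_sqr | mid_mid_sqr | mid_right_sqr | btm_left_sqr | btm_mid_sqr | btm_right_sqr | class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | x | x | x | x | o | o | x | o | o | positive |
| 1 | x | x | x | x | o | o | o | x | o | positive |
| 2 | x | x | x | x | o | o | o | o | x | positive |
| 3 | x | x | x | x | o | o | o | b | b | positive |
| 4 | x | x | x | x | o | o | b | o | b | positive |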
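According to the UCI description, the target concept is "win for x": a board is `positive` when x occupies one of the eight possible three-in-a-row lines. The sketch below recomputes that label from the nine squares as a sanity check on the encoding; `WIN_LINES`, `x_wins` and `label` are illustrative names, not part of the original notebook:

```python
# Indices into the nine squares in row-major order (top_left=0 ... btm_right=8)
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def x_wins(squares) -> bool:
    """Return True if 'x' occupies all three cells of any winning line."""
    return any(all(squares[i] == 'x' for i in line) for line in WIN_LINES)

def label(squares) -> str:
    """Map a board to the dataset's label strings."""
    return 'positive' if x_wins(squares) else 'negative'

# The five boards shown by df.head() above
head_rows = [
    ['x', 'x', 'x', 'x', 'o', 'o', 'x', 'o', 'o'],
    ['x', 'x', 'x', 'x', 'o', 'o', 'o', 'x', 'o'],
    ['x', 'x', 'x', 'x', 'o', 'o', 'o', 'o', 'x'],
    ['x', 'x', 'x', 'x', 'o', 'o', 'o', 'b', 'b'],
    ['x', 'x', 'x', 'x', 'o', 'o', 'b', 'o', 'b'],
]
print([label(r) for r in head_rows])  # all five are 'positive' (x holds the top row)
```

On the real frame, before `class` is recoded to 0/1, something like `df.iloc[:, :9].apply(lambda r: label(r.tolist()), axis=1).eq(df['class']).mean()` would measure how often this rule agrees with the stored labels.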


In [4]:
df.isnull().sum()

top_left_sqr      0
top_middle_sqr    0
top_right_sqr     0
mid_left_sqr      0
mid_mid_sqr       0
mid_right_sqr     0
btm_left_sqr      0
btm_mid_sqr       0
btm_right_sqr     0
class             0
dtype: int64

In [5]:
dummy_X = pd.get_dummies(df.iloc[:, :-1], drop_first=True)  # one-hot encode the nine squares; drop the first level ('b' = blank) of each

In [6]:
dummy_X.head()

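| | top_left_sqr_o | top_left_sqr_x | top_middle_sqr_o | top_middle_sqr_x | top_right_sqr_o | top_right_sqr_x | mid_left_sqr_o | mid_left_sqr_x | mid_mid_sqr_o | mid_mid_sqr_x | mid_right_sqr_o | mid_right_sqr_x | btm_left_sqr_o | btm_left_sqr_x | btm_mid_sqr_o | btm_mid_sqr_x | btm_right_sqr_o | btm_right_sqr_x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |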
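Each square takes one of three values (`b`, `o`, `x`), and `get_dummies` orders the levels alphabetically, so `drop_first=True` removes the `b` indicator and keeps 18 columns instead of 27. A blank square is then encoded as both indicators being 0, as in `btm_right_sqr` for rows 3 and 4 above. A minimal sketch on a single hypothetical column `sq`:

```python
import pandas as pd

# One hypothetical square column containing all three possible values
sq = pd.DataFrame({'sq': ['x', 'o', 'b']})

full = pd.get_dummies(sq, dtype=int)                      # columns: sq_b, sq_o, sq_x
reduced = pd.get_dummies(sq, drop_first=True, dtype=int)  # columns: sq_o, sq_x ('b' dropped)

print(full)
print(reduced)  # the 'b' row becomes sq_o=0, sq_x=0
```

No information is lost: for every square the three full indicators sum to 1, so the dropped `b` column can always be recovered as `1 - sq_o - sq_x`.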


In [7]:
for i in range(dummy_X.shape[1]):
    print(dummy_X.iloc[:, i].dtype)

uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8
uint8


In [8]:
# Encode the target: 'positive' (win for x) -> 1, 'negative' -> 0
df.loc[df['class'] == 'positive', 'class'] = 1
df.loc[df['class'] == 'negative', 'class'] = 0

In [9]:
df['class'].unique()

array([1, 0], dtype=object)
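The `.loc` assignments leave `class` as an `object` column (note `dtype=object`), which is why the target is cast with `.astype('int')` before fitting. An alternative sketch that maps both labels in one step and yields an integer column directly, shown on a small hypothetical `labels` series:

```python
import pandas as pd

# Hypothetical stand-in for df['class']
labels = pd.Series(['positive', 'negative', 'positive'], name='class')

# Map both labels at once; any label missing from the dict would become NaN
encoded = labels.map({'positive': 1, 'negative': 0})
print(encoded.tolist(), encoded.dtype)
```

Because unmapped values turn into `NaN` (and the column into floats), a quick `encoded.isna().any()` check also catches unexpected label spellings.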

In [10]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

In [11]:
X = dummy_X
y = df['class'].astype('int')  # 'class' is still object dtype after the .loc assignments

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Logistic Regression

In [13]:
lr = LogisticRegression()
lr.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
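The repr above shows the defaults of the scikit-learn version used here (`solver='liblinear'`, `multi_class='ovr'`). Since scikit-learn 0.22 the default solver is `lbfgs`, so pinning the solver is one way to keep this fit comparable when the notebook is rerun on a newer release. A sketch on toy data standing in for `X_train`/`y_train`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data (label equals the second feature), standing in for X_train / y_train
X_toy = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y_toy = np.array([1, 0, 1, 0])

# Pin the solver shown in the notebook's output instead of relying on the default
lr = LogisticRegression(solver='liblinear', C=1.0, random_state=0)
lr.fit(X_toy, y_toy)
print(lr.get_params()['solver'])
```

For a binary target like this one, `multi_class` does not change the model, so only the solver needs pinning.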

In [14]:
pred = lr.predict(X_train)
print("\nLogistic Regression - Train accuracy", round(accuracy_score(y_train, pred), 5))

pred = lr.predict(X_test)
print("\nLogistic Regression - Test accuracy", round(accuracy_score(y_test, pred), 5))


Logistic Regression - Train accuracy 0.98507

Logistic Regression - Test accuracy 0.97917


### Decision Tree

In [15]:
model_dtc = DecisionTreeClassifier(random_state=42)
model_dtc.fit(X_train, y_train)
prediction_dtc = model_dtc.predict(X_test)
print('Decision Tree - Test accuracy: ', accuracy_score(y_test, prediction_dtc))

Decision Tree - Test accuracy:  0.9513888888888888


Accuracy for the two one-hot encodings (same split: `test_size=0.3`, `random_state=42`):

| Encoding | Logistic Regression (train) | Logistic Regression (test) | Decision Tree (test) |
|---|---|---|---|
| `drop_first=True` | 0.98507 | 0.97917 | 0.95139 |
| `drop_first=False` | 0.98507 | 0.97569 | 0.93403 |

On this split, dropping the first dummy level gives slightly higher test accuracy. With 288 test boards, the gap amounts to 1 board for logistic regression and 5 boards for the decision tree, so it may reflect this particular split rather than a general advantage of either encoding.
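A single 70/30 split is a thin basis for ranking the two encodings. The sketch below repeats the comparison with stratified k-fold cross-validation; `compare_encodings` is a hypothetical helper, and the demo runs on randomly generated boards labelled with the win-for-x rule rather than on the UCI file. On the real data it would be called as `compare_encodings(df.iloc[:, :-1], y)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def compare_encodings(squares: pd.DataFrame, y, n_splits: int = 5, seed: int = 42) -> dict:
    """Hypothetical helper: mean CV accuracy per (model, drop_first) pair."""
    # Fixed random_state -> identical folds for both encodings
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        'logistic_regression': LogisticRegression(solver='liblinear'),
        'decision_tree': DecisionTreeClassifier(random_state=seed),
    }
    results = {}
    for drop_first in (True, False):
        X = pd.get_dummies(squares, drop_first=drop_first, dtype=int)
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
            results[(name, drop_first)] = scores.mean()
    return results

# Demo on synthetic boards (NOT the UCI data): label = x holds any winning line
rng = np.random.default_rng(0)
boards = pd.DataFrame(rng.choice(['x', 'o', 'b'], size=(300, 9)),
                      columns=[f'sq{i}' for i in range(9)])
y_demo = np.array([int(any(all(row[i] == 'x' for i in line) for line in WIN_LINES))
                   for row in boards.to_numpy()])

results_demo = compare_encodings(boards, y_demo)
for key, acc in results_demo.items():
    print(key, round(acc, 3))
```

Stratification keeps the positive/negative ratio similar in every fold, and reusing one seeded splitter means both encodings are scored on exactly the same folds, so any remaining gap is not due to a different train/test partition.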