# Multi-Layer Perceptron Neural Network Modeling

In this notebook I want to demonstrate the value of multi-layer perceptron (MLP) neural network modeling.

The data I will model is a set of poker hands. Each hand has five cards, and each card is described by two features, its suit and its rank (numeric or royal value), for 10 features in total. The last column is the outcome variable, the type of hand a person has (nothing, one pair, full house, etc.), encoded as a number from 0 to 9.
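A small sketch for reading the raw numbers, assuming the encoding given in the UCI dataset description: suits 1-4 stand for Hearts, Spades, Diamonds, Clubs; ranks run 1-13 with 1 as the Ace and 11-13 as Jack, Queen, King; classes 0-9 run from "Nothing in hand" to "Royal flush". `decode_row` and the lookup tables are hypothetical helpers, not part of any library. The sample record is the first row of the training file (the one that ends up as the column names below).

```python
# Encoding assumed from the UCI Poker Hand dataset description
SUITS = {1: "Hearts", 2: "Spades", 3: "Diamonds", 4: "Clubs"}
RANKS = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}  # 2-10 are printed as numbers
HANDS = {
    0: "Nothing in hand", 1: "One pair", 2: "Two pairs", 3: "Three of a kind",
    4: "Straight", 5: "Flush", 6: "Full house", 7: "Four of a kind",
    8: "Straight flush", 9: "Royal flush",
}


def decode_row(row):
    """Turn one record (Suit1, Rank1, ..., Suit5, Rank5, Hand) into readable text."""
    if len(row) != 11:
        raise ValueError("expected 10 card features plus the Hand label")
    # Even positions hold suits, odd positions hold ranks
    cards = [f"{RANKS.get(rank, str(rank))} of {SUITS[suit]}"
             for suit, rank in zip(row[0:10:2], row[1:10:2])]
    return cards, HANDS[row[10]]


cards, hand = decode_row([1, 10, 1, 11, 1, 13, 1, 12, 1, 1, 9])
print(cards)
print(hand)  # Royal flush
```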

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
import timeit
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils import resample

# Importing the Data

I am taking my data set from the UCI Machine Learning Repository.

The training and test sets have already been split.

In [2]:
train = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/poker/poker-hand-training-true.data')
test = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/poker/poker-hand-testing.data')

In [3]:
print('Train data shape: ',train.shape)
print('Test data shape: ',test.shape)

Train data shape:  (25009, 11)
Test data shape:  (999999, 11)


In [4]:
#drop null values
train = train.dropna()
test = test.dropna()

In [5]:
train.head()

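|   | 1 | 10 | 1.1 | 11 | 1.2 | 13 | 1.3 | 12 | 1.4 | 1.5 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 11 | 2 | 13 | 2 | 10 | 2 | 12 | 2 | 1 | 9 |
| 1 | 3 | 12 | 3 | 11 | 3 | 13 | 3 | 10 | 3 | 1 | 9 |
| 2 | 4 | 10 | 4 | 11 | 4 | 1 | 4 | 13 | 4 | 12 | 9 |
| 3 | 4 | 1 | 4 | 13 | 4 | 12 | 4 | 11 | 4 | 10 | 9 |
| 4 | 1 | 2 | 1 | 4 | 1 | 5 | 1 | 3 | 1 | 6 | 8 |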


The data files have no header row, so pandas treated the first row of data as the column names. I will need to fix this.
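Rather than patching the header afterwards, the files can be read without one. A minimal sketch, assuming the UCI files are plain comma-separated values with no header row (which the misplaced first row suggests). Two sample rows in an in-memory buffer stand in for the file; in the notebook the URL would be passed instead.

```python
import io

import pandas as pd

true_columns = ['Suit1', 'Rank1', 'Suit2', 'Rank2', 'Suit3', 'Rank3',
                'Suit4', 'Rank4', 'Suit5', 'Rank5', 'Hand']

# Stand-in for the UCI file: its first two data rows
raw = io.StringIO("1,10,1,11,1,13,1,12,1,1,9\n2,11,2,13,2,10,2,12,2,1,9\n")

# header=None keeps the first data row as data; names= supplies the real labels
train = pd.read_csv(raw, header=None, names=true_columns)
print(train.shape)  # (2, 11)
```

Read this way, no row ends up in the column names, so there is nothing to move back into the data.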

In [6]:
true_columns = ['Suit1','Rank1','Suit2','Rank2','Suit3','Rank3','Suit4','Rank4','Suit5','Rank5','Hand']

In [7]:
misplaced = pd.Series(train.columns,index=true_columns)
misplaced = misplaced.astype(float).astype(int)

In [8]:
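train.columns = true_columns
# DataFrame.append was removed in pandas 2.0; concatenate the recovered row instead
train = pd.concat([train, misplaced.to_frame().T], ignore_index=True)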

In [9]:
train.head()

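|   | Suit1 | Rank1 | Suit2 | Rank2 | Suit3 | Rank3 | Suit4 | Rank4 | Suit5 | Rank5 | Hand |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 11 | 2 | 13 | 2 | 10 | 2 | 12 | 2 | 1 | 9 |
| 1 | 3 | 12 | 3 | 11 | 3 | 13 | 3 | 10 | 3 | 1 | 9 |
| 2 | 4 | 10 | 4 | 11 | 4 | 1 | 4 | 13 | 4 | 12 | 9 |
| 3 | 4 | 1 | 4 | 13 | 4 | 12 | 4 | 11 | 4 | 10 | 9 |
| 4 | 1 | 2 | 1 | 4 | 1 | 5 | 1 | 3 | 1 | 6 | 8 |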


In [10]:
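# recover the test file's own first row before renaming (it differs from train's)
test_misplaced = pd.Series(test.columns, index=true_columns).astype(float).astype(int)
test.columns = true_columns
test = pd.concat([test, test_misplaced.to_frame().T], ignore_index=True)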

In [11]:
test.head()

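|   | Suit1 | Rank1 | Suit2 | Rank2 | Suit3 | Rank3 | Suit4 | Rank4 | Suit5 | Rank5 | Hand |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 12 | 3 | 2 | 3 | 11 | 4 | 5 | 2 | 5 | 1 |
| 1 | 1 | 9 | 4 | 6 | 1 | 4 | 3 | 2 | 3 | 9 | 1 |
| 2 | 1 | 4 | 3 | 13 | 2 | 13 | 2 | 1 | 3 | 6 | 1 |
| 3 | 3 | 10 | 2 | 7 | 1 | 2 | 2 | 11 | 4 | 9 | 0 |
| 4 | 1 | 3 | 4 | 5 | 3 | 4 | 1 | 12 | 4 | 6 | 0 |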


In [12]:
train.Hand.value_counts()/len(train.Hand)

0    0.499520
1    0.423790
2    0.048221
3    0.020512
4    0.003719
5    0.002159
6    0.001439
7    0.000240
9    0.000200
8    0.000200
Name: Hand, dtype: float64

Though the outcome is imbalanced in the training data, I do not think I should balance it, due to the nature of this data. The imbalance reflects how often each type of hand actually occurs, and the test data shows almost the same class frequencies. I also do not have the processing power to model on a data set much larger than the training data, so balancing would mean downsampling the common classes, leading to loss of information.

For curiosity's sake, I will resample the data to balance the outcome and show that it is not ideal.

In [13]:
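# resample each class to 3,000 rows with replacement: this downsamples classes 0 and 1
# and duplicates rows of the rarer classes
x = []
for target in range(9):  # covers classes 0-8 only; class 9 (royal flush) is left out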
    outcome = train[train.Hand==target]
    sampled = resample(outcome,replace=True,n_samples=3000,random_state=42)
    x.append(sampled)

train_sampled = pd.concat(x)
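Because `resample` draws `n_samples` rows with replacement, this loop shrinks every class larger than 3,000 rows and pads smaller classes with duplicates. One way to see both effects is to compare total rows with distinct rows per class. The sketch below uses a hypothetical toy frame (one class of 100 rows, one of 5, each resampled to 20) rather than the poker data, and loops over the classes actually present instead of a fixed range.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy data: a common class 0 and a rare class 9
toy = pd.DataFrame({"x": range(105), "Hand": [0] * 100 + [9] * 5})

parts = []
for target in sorted(toy.Hand.unique()):  # every class present, whatever its label
    rows = toy[toy.Hand == target]
    parts.append(resample(rows, replace=True, n_samples=20, random_state=42))
balanced = pd.concat(parts)

counts = balanced.Hand.value_counts().sort_index()   # rows per class after resampling
distinct = balanced.groupby("Hand")["x"].nunique()   # distinct original rows per class
print(counts.to_dict())
print(distinct.to_dict())
```

Class 0 keeps at most 20 of its 100 original rows, while class 9's 20 rows are drawn from only 5 distinct ones.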

In [14]:
# downsize the test set, as it is much too big to work with
leftover, test_sampled = train_test_split(test,test_size=50000,random_state=42)

# Modeling

I will model primarily with scikit-learn's multi-layer perceptron classifier (MLPClassifier), trying it in four configurations, including one trained on the balanced training set.

In [15]:
# drop() no longer accepts the axis as a positional argument (pandas 2.0)
Xtrain = train.drop(columns='Hand')
Ytrain = train.Hand

Xsamp = train_sampled.drop(columns='Hand')
Ysamp = train_sampled.Hand

Xtest = test_sampled.drop(columns='Hand')
Ytest = test_sampled.Hand

In [16]:
Ytest.value_counts()/len(Ytest)

0    0.50474
1    0.42010
2    0.04614
3    0.02126
4    0.00406
5    0.00194
6    0.00144
7    0.00032
Name: Hand, dtype: float64
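This 50,000-row sample contains no hands of class 8 or 9, although both appear in the training data. One option is to pass `stratify=` to `train_test_split` so the sample keeps each class's share of the full test set. The sketch below uses a hypothetical toy frame. For classes as rare as royal flushes, even a stratified sample may hold only one or two rows, or none.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the test set, with a rare class 8
test = pd.DataFrame({"x": range(1000), "Hand": [0] * 500 + [1] * 480 + [8] * 20})

# stratify= splits each class in proportion, so rare classes are less likely to vanish
leftover, test_sampled = train_test_split(
    test, test_size=100, stratify=test.Hand, random_state=42
)
print(test_sampled.Hand.value_counts().sort_index().to_dict())
```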

In [17]:
scores = pd.DataFrame()

## MLP Config 1

This configuration will be rather basic: a single hidden layer 100 nodes wide. It will use an alpha value (the strength of the L2 regularization penalty) of 0.5, which will be standard across all the MLP configurations.
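scikit-learn's documentation notes that the multi-layer perceptron is sensitive to feature scaling. Here the ranks (1 to 13) and suits (1 to 4) sit on different scales. Below is a minimal sketch of Config 1's settings with standardization added as a pipeline step. It runs on hypothetical random data rather than the poker files, and whether scaling helps on this dataset is untested here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical random data laid out like the poker features:
# even columns are suits (1-4), odd columns are ranks (1-13)
rng = np.random.default_rng(42)
X = np.empty((300, 10))
X[:, 0::2] = rng.integers(1, 5, size=(300, 5))
X[:, 1::2] = rng.integers(1, 14, size=(300, 5))
y = rng.integers(0, 3, size=300)

# Config 1's settings; StandardScaler is fitted on the training data inside the pipeline.
# max_iter is raised from the default 200 for the toy fit.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100,), alpha=0.5, random_state=42, max_iter=500),
)
model.fit(X, y)
print("Training score:", model.score(X, y))
```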

In [18]:
start = timeit.default_timer()

mlp1 = MLPClassifier(hidden_layer_sizes=(100,),random_state=42,alpha=0.5)
mlp1.fit(Xtrain, Ytrain)

stop = timeit.default_timer()

runtime1 = stop - start
print('Runtime: ',round(runtime1,3),' seconds')

Runtime:  7.378  seconds
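`timeit.default_timer` is `time.perf_counter` on Python 3, and the same start/stop/print pattern is repeated for every model. A small context manager can keep it in one place. `timed` is a hypothetical helper, not part of the standard library.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Runtime"):
    """Measure the wall-clock time of the with-block and print it."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.3f} seconds")


with timed() as t:
    total = sum(range(100_000))
```

In the notebook this would read, for example, `with timed(): mlp1.fit(Xtrain, Ytrain)`.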


In [19]:
print('Training Score: ',mlp1.score(Xtrain, Ytrain))
print('Testing Score: ',mlp1.score(Xtest, Ytest))

Training Score:  0.6143942423030788
Testing Score:  0.603


In [20]:
# cross_val_score refits fresh copies of this configuration on folds of the test sample
cross_val_score(mlp1, Xtest, Ytest, cv=5)

array([0.64484206, 0.66550035, 0.58385839, 0.58607582, 0.62088627])
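`cross_val_score` does not score the fitted `mlp1`. Instead, it clones the configuration and refits it on four-fifths of `Xtest` for each fold, so these numbers describe models trained on pieces of the test sample. A common alternative is to cross-validate on the training data and keep the test sample as an untouched holdout. The sketch below shows this on hypothetical toy data. For classifiers, an integer `cv` already uses stratified folds; passing `StratifiedKFold` with `shuffle=True` additionally guards against any ordering in the file.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Hypothetical toy stand-in for (Xtrain, Ytrain)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(250, 10))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Shuffled, stratified 5-fold split of the training data only
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    MLPClassifier(hidden_layer_sizes=(100,), alpha=0.5, random_state=42, max_iter=500),
    X_train, y_train, cv=cv,
)
print(scores.round(3), round(scores.mean(), 3))
```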

## MLP Config 2

This configuration is identical to the previous one, but it is trained on the resampled data with a balanced outcome, at the cost of a huge information loss: most rows of the two most common classes are discarded, while rows of the rarest classes are duplicated.

In [21]:
start = timeit.default_timer()

mlp2 = MLPClassifier(hidden_layer_sizes=(100,),random_state=42,alpha=0.5)
mlp2.fit(Xsamp, Ysamp)

stop = timeit.default_timer()

runtime2 = stop - start
print('Runtime: ',round(runtime2,3),' seconds')

Runtime:  18.12  seconds


In [22]:
print('Training Score: ',mlp2.score(Xsamp, Ysamp))
print('Testing Score: ',mlp2.score(Xtest, Ytest))

Training Score:  0.7395555555555555
Testing Score:  0.36692


In [23]:
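# the balanced training set plays no part here: a clone with mlp1's hyperparameters is refit
# on the same test folds, so the scores match Config 1 exactly
cross_val_score(mlp2, Xtest, Ytest, cv=5)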

array([0.64484206, 0.66550035, 0.58385839, 0.58607582, 0.62088627])
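This array is identical to Config 1's. Both estimators share the same hyperparameters, and `cross_val_score` clones each one (copying parameters, not fitted weights) and refits it on the same folds of `Xtest`. As a result, the balanced training set never enters the calculation. The sketch below reproduces the effect on hypothetical toy data. A `DecisionTreeClassifier` stands in for `MLPClassifier` only to keep it fast; the cloning behaviour is the same for any scikit-learn estimator.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X_a, X_b = rng.normal(size=(200, 4)), rng.normal(size=(200, 4))
y_a = (X_a[:, 0] > 0).astype(int)
y_b = (X_b[:, 1] > 0).astype(int)

unfitted = DecisionTreeClassifier(random_state=0)
fitted_elsewhere = DecisionTreeClassifier(random_state=0).fit(X_a, y_a)

# Both calls train identical fresh models on folds of (X_b, y_b)
s1 = cross_val_score(unfitted, X_b, y_b, cv=5)
s2 = cross_val_score(fitted_elsewhere, X_b, y_b, cv=5)
print(np.array_equal(s1, s2))                   # True
print(hasattr(clone(fitted_elsewhere), "tree_"))  # False: the fitted tree is not copied
```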

## MLP Config 3

This configuration differs from the first one by adding two additional hidden layers, each 10 nodes wide.

In [24]:
start = timeit.default_timer()

mlp3 = MLPClassifier(hidden_layer_sizes=(100,10,10,),random_state=42,alpha=0.5)
mlp3.fit(Xtrain, Ytrain)

stop = timeit.default_timer()

runtime3 = stop - start
print('Runtime: ',round(runtime3,3),' seconds')

Runtime:  12.467  seconds


In [25]:
print('Training Score: ',mlp3.score(Xtrain, Ytrain))
print('Testing Score: ',mlp3.score(Xtest, Ytest))

Training Score:  0.6716913234706118
Testing Score:  0.65668


In [26]:
cross_val_score(mlp3, Xtest, Ytest, cv=5)

array([0.94402239, 0.88813356, 0.6029603 , 0.77353206, 0.7463239 ])

## MLP Config 4

This configuration is very different from the previous ones, with a first hidden layer 50 nodes wide followed by four additional hidden layers, each 10 nodes wide.
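To compare model size across configurations, the number of trainable weights and biases can be worked out by hand. Each fully connected layer from `a` units to `b` units adds `a * b` weights plus `b` biases. Below is a minimal sketch, assuming 10 input features and 10 output units (one per hand class 0-9 in the training data). `count_parameters` is a hypothetical helper.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Weights plus biases in a fully connected MLP."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


configs = {
    "Config 1": (100,),
    "Config 3": (100, 10, 10),
    "Config 4": (50, 10, 10, 10, 10),
}
for name, hidden in configs.items():
    print(name, count_parameters(10, hidden, 10))
# Config 1 2110
# Config 3 2330
# Config 4 1500
```

For a fitted model, `sum(w.size for w in mlp4.coefs_) + sum(b.size for b in mlp4.intercepts_)` should give the same count. Parameter count alone does not set the runtime; `mlp4.n_iter_` reports how many epochs a fit actually ran.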

In [27]:
start = timeit.default_timer()

mlp4 = MLPClassifier(hidden_layer_sizes=(50,10,10,10,10,),random_state=42,alpha=0.5)
mlp4.fit(Xtrain, Ytrain)

stop = timeit.default_timer()

runtime4 = stop - start
print('Runtime: ',round(runtime4,3),' seconds')

Runtime:  13.955  seconds


Note from another run with alpha=0.5: runtime 7.408 seconds.

In [28]:
print('Training Score: ',mlp4.score(Xtrain, Ytrain))
print('Testing Score: ',mlp4.score(Xtest, Ytest))

Training Score:  0.6207117153138745
Testing Score:  0.61088


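Note from another run with alpha=0.5:
Training Score:  0.4995201919232307
Testing Score:  0.4955
This training score equals the share of class 0 in the training data, which suggests that run predicted only the majority class.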

In [29]:
cross_val_score(mlp4, Xtest, Ytest, cv=5)

array([0.68502599, 0.66280116, 0.5759576 , 0.64709413, 0.63979194])
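Comparing these arrays by eye is error-prone, so a short summary helps. The sketch below only restates the cross-validation scores printed above for Configs 1, 3 and 4 as a mean, standard deviation and range, using the standard library. Config 2's array is identical to Config 1's, so it is left out.

```python
from statistics import mean, stdev

# Cross-validation scores copied from the outputs above
cv_scores = {
    "MLP Config 1": [0.64484206, 0.66550035, 0.58385839, 0.58607582, 0.62088627],
    "MLP Config 3": [0.94402239, 0.88813356, 0.6029603, 0.77353206, 0.7463239],
    "MLP Config 4": [0.68502599, 0.66280116, 0.5759576, 0.64709413, 0.63979194],
}
for name, s in cv_scores.items():
    print(f"{name}: mean={mean(s):.3f} sd={stdev(s):.3f} range={max(s) - min(s):.3f}")
```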

## MLP Conclusion

The single-layer MLP performed decently, with a short runtime (7.4 seconds), a testing accuracy of 0.603, and reasonably consistent cross-validation scores (0.584 to 0.666).

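As expected, the MLP trained on the resampled data was disastrous. It scored much higher on its own balanced training set than the unbalanced single-layer model did on its training data (0.740 vs 0.614), but it scored only 0.367 on the test sample, by far the worst accuracy. Resampling to 3,000 rows per class discarded most rows of the two most common classes while duplicating the handful of examples of the rarest hands, and the balanced class frequencies no longer match the test data, where classes 0 and 1 make up over 90% of hands. If I had more computing power, I would try upsampling the smaller classes without discarding any rows from the larger ones, only increasing the training size.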

The MLP with two additional 10-node layers performed better than the initial configuration in terms of accuracy (0.657 vs 0.603 on the test sample, about 5.4 percentage points higher) and had a higher mean cross-validation score (about 0.79), but its cross-validation scores were much less consistent, ranging from 0.60 to 0.94. Even though it took about 1.7 times as long to train (12.5 vs 7.4 seconds), the gain in test accuracy seems worth it.

The MLP with a 50-node first layer and four additional 10-node hidden layers performed slightly better than the initial configuration in terms of accuracy (0.611 vs 0.603) and had a slightly higher mean cross-validation score (about 0.64 vs 0.62), with a similar spread across folds (0.576 to 0.685).

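I consider the initial configuration, a single 100-node-wide layer, the best one to use: it is the fastest to train and has the most consistent cross-validation scores, although Config 3 reached a higher test accuracy.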

# Testing MLP Against Other Models

## Random Forest Classifier

In [30]:
start = timeit.default_timer()

rfc = RandomForestClassifier(n_estimators=200,random_state=42)
rfc.fit(Xtrain,Ytrain)

stop = timeit.default_timer()

runtimerfc = stop - start
print('Runtime: ',round(runtimerfc,3),' seconds')

Runtime:  6.634  seconds


In [31]:
print('Training Score: ',rfc.score(Xtrain, Ytrain))
print('Testing Score: ',rfc.score(Xtest, Ytest))

Training Score:  1.0
Testing Score:  0.61876
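A perfect training score is typical for fully grown trees. Each tree is fit on a bootstrap sample, so the rows it did not see give an out-of-bag (OOB) estimate of accuracy without touching the test set. scikit-learn exposes this through `oob_score=True` and the `oob_score_` attribute. Here is a minimal sketch on hypothetical toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data, not the poker files
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# oob_score=True scores each row using only the trees that did not train on it
rfc = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rfc.fit(X, y)
print("Training score:", rfc.score(X, y))
print("OOB score:", round(rfc.oob_score_, 3))
```

A large gap between the training score and the OOB score points to overfitting in the same way the gap to the test score does.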


In [32]:
cross_val_score(rfc, Xtest, Ytest, cv=5)

array([0.64114354, 0.63630911, 0.65066507, 0.63349005, 0.6449935 ])

## Gradient Boosted Classifier

In [33]:
start = timeit.default_timer()

gbc = GradientBoostingClassifier(random_state=42)
gbc.fit(Xtrain,Ytrain)

stop = timeit.default_timer()

runtimegbc = stop - start
print('Runtime: ',round(runtimegbc,3),' seconds')

Runtime:  23.396  seconds


In [34]:
print('Training Score: ',gbc.score(Xtrain, Ytrain))
print('Testing Score: ',gbc.score(Xtest, Ytest))

Training Score:  0.6287085165933627
Testing Score:  0.61468


In [35]:
cross_val_score(gbc, Xtest, Ytest, cv=5)

array([0.60845662, 0.60491852, 0.61116112, 0.60098029, 0.59147744])
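Cell 17 creates an empty `scores` DataFrame that is never filled. One way to use it is to collect the printed results in one table, as sketched below. The numbers are copied from the outputs above, and Config 2's training score refers to its own balanced training set.

```python
import pandas as pd

# Results copied from the outputs above (runtimes in seconds)
scores = pd.DataFrame(
    {
        "runtime_s": [7.378, 18.12, 12.467, 13.955, 6.634, 23.396],
        "train": [0.6143942423030788, 0.7395555555555555, 0.6716913234706118,
                  0.6207117153138745, 1.0, 0.6287085165933627],
        "test": [0.603, 0.36692, 0.65668, 0.61088, 0.61876, 0.61468],
    },
    index=["MLP Config 1", "MLP Config 2", "MLP Config 3", "MLP Config 4", "RFC", "GBC"],
)
# Difference between training and testing accuracy, a rough overfitting signal
scores["gap"] = scores["train"] - scores["test"]
print(scores.sort_values("test", ascending=False).round(3))
```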

## Cross Model Conclusion

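The RFC model seemed to overfit the data, with a perfect training score (1.0) and a much lower testing score (0.619). It cross-validated well (0.633 to 0.651, very consistent across folds), though because cross_val_score refits on folds of the test sample, this shows stability rather than ruling out overfitting. It tested better than MLP Configs 1 and 4, but not as well as Config 3 (0.657).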

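The GBC model performed well, with closely matched training and testing scores (0.629 and 0.615) and consistent cross-validation scores (0.591 to 0.611). It overfit far less than the RFC, which is why I consider it the best performing model of the poker data overall, although MLP Config 3 reached a higher test accuracy.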

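Both the RFC and the GBC models performed better than the most stable MLP configuration (Config 1), with higher training and testing accuracy and much more consistent cross-validation scores, although the GBC's mean cross-validation score (about 0.60) was slightly below Config 1's (about 0.62).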