Poker-Rule-Induction

Course project for Introduction to Data Science.

Overview

The goal of this project is to develop an algorithm that can learn poker without being given specific information about the hands that can be played. The algorithm is given only information about the playing cards themselves (their suits and ranks) together with a list of outcomes from poker hands already played; using this alone, it must learn the rules and determine the outcome of future hands.

The training data has 10 features and 1 label. The features are the rank and suit of 5 cards, and the label gives the best hand that can be made from those cards. Suits are coded 1-4, representing Hearts, Spades, Diamonds, and Clubs respectively. Ranks are coded 1-13, representing Ace, 2 through 10, Jack, Queen, and King. The labels run from 0 to 9:

- 0: Nothing in hand (not a recognized poker hand)
- 1: One pair (one pair of equal ranks within five cards)
- 2: Two pairs (two pairs of equal ranks within five cards)
- 3: Three of a kind (three equal ranks within five cards)
- 4: Straight (five cards sequentially ranked with no gaps)
- 5: Flush (five cards with the same suit)
- 6: Full house (pair + three of a kind of a different rank)
- 7: Four of a kind (four equal ranks within five cards)
- 8: Straight flush (straight + flush)
- 9: Royal flush ({Ace, King, Queen, Jack, Ten} + flush)
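
For quick reference, these encodings can be written down as lookup tables. This is just an illustrative sketch; the dictionary names below are our own and not part of the dataset:

```python
# Illustrative lookup tables for the dataset's integer encodings.
SUITS = {1: "Hearts", 2: "Spades", 3: "Diamonds", 4: "Clubs"}

HANDS = {
    0: "Nothing in hand",
    1: "One pair",
    2: "Two pairs",
    3: "Three of a kind",
    4: "Straight",
    5: "Flush",
    6: "Full house",
    7: "Four of a kind",
    8: "Straight flush",
    9: "Royal flush",
}

print(SUITS[1], HANDS[9])  # Hearts Royal flush
```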

Implementation

We tested the data with the k-Nearest Neighbors (KNN) algorithm, a Decision Tree, and Logistic Regression. We applied cross-validation to each algorithm to verify that the accuracy scores we were getting were reliable. We also applied one-hot encoding to the features, because the dataset stores suits and ranks as increasing integer codes, and treating those codes as ordered magnitudes would not give us the output we want. In addition, we used a Random Forest to handle the large number of features. A sketch of this baseline setup follows.

We noticed that none of these methods worked well, because the models had difficulty recognizing patterns and making connections in the data. To solve this we turned to feature engineering. We started by combining the suits of all the cards: we made 4 features, one for each suit, holding the count of how many cards of that suit are present in the current hand. Then, to make the ranks of the cards more meaningful, we sorted the cards and took the difference between each card and its neighbor. The combination of these two new groups of features made the data much clearer and made it easier to separate the different hands (see the second sketch below).
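
A minimal sketch of the baseline setup, assuming the data sits in a pandas DataFrame with suit columns S1-S5, rank columns C1-C5, and a "hand" label column (the column names and file path are assumptions, not taken from this repository):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

train = pd.read_csv("train.csv")  # path is an assumption
X = train.drop(columns=["hand"])
y = train["hand"]

models = {
    "KNN (k = 3)": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
}

for name, model in models.items():
    # One-hot encode the categorical suit/rank codes so the integer
    # values are not treated as ordered magnitudes.
    pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```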

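The feature-engineering step can be sketched as follows, under the same assumed column names: four per-suit count features plus the gaps between neighboring sorted ranks.

```python
import numpy as np
import pandas as pd

def engineer(df: pd.DataFrame) -> pd.DataFrame:
    suits = df[["S1", "S2", "S3", "S4", "S5"]].to_numpy()
    ranks = df[["C1", "C2", "C3", "C4", "C5"]].to_numpy()

    # One count feature per suit: how many cards of that suit are in the hand.
    suit_counts = np.stack([(suits == s).sum(axis=1) for s in range(1, 5)], axis=1)

    # Sort the ranks, then take the difference between neighboring cards;
    # e.g. a straight yields gaps of all 1s and a pair yields a gap of 0.
    sorted_ranks = np.sort(ranks, axis=1)
    gaps = np.diff(sorted_ranks, axis=1)

    cols = [f"suit{s}_count" for s in range(1, 5)] + [f"gap{i}" for i in range(1, 5)]
    return pd.DataFrame(np.hstack([suit_counts, gaps]), columns=cols, index=df.index)
```

The output of engineer(train) can then replace the raw columns before retraining the models from the first sketch.
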
Results

Before feature engineering, the accuracies were:

  Algorithm              Accuracy   Cross-validated accuracy
  KNN (k = 3)            53.7%      52.9%
  Decision Tree          49.1%      49.3%
  Logistic Regression    53.7%      50.0%
  Random Forest          61.5%      57.0%

Since Random Forest was the best-performing algorithm before feature engineering, we calculated its area under the curve (AUC), which came out to 50 percent. That result means the model is just randomly assigning results instead of finding any correlation between the data and the output. When feature engineering was applied to the original dataset and evaluated on the test file, K-Nearest Neighbors got 60 percent accuracy for K = 3 and 62 percent for K = 5, Random Forest achieved 99.889 percent, and the Decision Tree got 99.995 percent.
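
The AUC check can be sketched as below, again under the assumed column names; out-of-fold probability estimates keep the score honest, and multi_class="ovr" averages one-vs-rest AUCs over the ten hand labels:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

train = pd.read_csv("train.csv")  # path is an assumption
X, y = train.drop(columns=["hand"]), train["hand"]

# Out-of-fold class probabilities, then one-vs-rest AUC averaged
# over the ten hand labels; ~0.5 means chance-level performance.
proba = cross_val_predict(RandomForestClassifier(), X, y, cv=5,
                          method="predict_proba")
print(roc_auc_score(y, proba, multi_class="ovr"))
```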
