## Section 01: Data Generation ##

**Abstract:**
This notebook generates random (but reproducible) data representing Texas Hold'em poker hands and their evaluations at each street (flop, turn, and river). The raw data is purely numerical and is generated and evaluated efficiently using the Python library `deuces`. The data can be inspected in its raw form or transformed into a human-friendly string dataframe, and the raw form is finally saved as a `.csv` file.



In [2]:
# Imports
import pandas as pd
import numpy as np
from deuces import Card, Deck, Evaluator
import random

**Hand Simulation**: Generate many hands using the `deuces` `Deck` object. The 7 cards drawn for each hand cover the different stages of a round, also known as "streets": two hole cards, three flop cards, a turn card, and a river card. Each card is stored in its own dataframe column, and each row is one hand. Individual cells contain integer values that map to specific cards. A random seed is set so that the results are reproducible.
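
The effect of the seed can be illustrated without `deuces`. The sketch below is not part of the notebook: `simulate_hands` is a hypothetical helper, and it uses plain string labels and a local `random.Random` generator rather than the `deuces` deck. It only shows that the same seed yields the same sequence of 7-card hands.

```python
import random

def simulate_hands(n_hands, seed):
    """Draw 7 cards per hand from a freshly shuffled 52-card deck."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    deck = [rank + suit for rank in "23456789TJQKA" for suit in "shdc"]
    hands = []
    for _ in range(n_hands):
        rng.shuffle(deck)        # reshuffle the full deck before every hand
        hands.append(deck[:7])   # slicing copies, so later shuffles do not alter stored hands
    return hands

first = simulate_hands(1000, seed=42)
second = simulate_hands(1000, seed=42)
other = simulate_hands(1000, seed=7)

print(first == second)  # same seed, same hands
print(first == other)   # a different seed gives different hands
```

Because every hand is drawn from a full, reshuffled deck, the hands are independent of each other, and no card repeats within a hand.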

In [3]:
# set the random seed for reproducibility
random.seed(42)

n_hands = 100000

deck = Deck()
hands = []
for _ in range(n_hands):
    deck.shuffle()
    cards = deck.draw(7)
    hands.append(cards)
hands_df = pd.DataFrame(hands, columns=['hole1', 'hole2', 'flop1', 'flop2', 'flop3', 'turn', 'river'])
hands_df.head()

| | hole1 | hole2 | flop1 | flop2 | flop3 | turn | river |
|---|---|---|---|---|---|---|---|
| 0 | 533255 | 67144223 | 557831 | 270853 | 4204049 | 134236965 | 73730 |
| 1 | 4212241 | 164099 | 67144223 | 295429 | 1065995 | 2106637 | 147715 |
| 2 | 268454953 | 8394515 | 139523 | 533255 | 33564957 | 4199953 | 2114829 |
| 3 | 8406803 | 8398611 | 67119647 | 2114829 | 33589533 | 529159 | 4228625 |
| 4 | 16787479 | 279045 | 67115551 | 1082379 | 81922 | 270853 | 2102541 |
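
The integers in the first row of this output can be unpacked with a few bit operations. The sketch below is an illustration, not the notebook's code: it assumes the 32-bit card layout described in the `deuces` documentation (rank index in bits 8 to 11, a one-hot suit flag in bits 12 to 15), and `card_to_str` is a hypothetical helper name.

```python
# Assumed deuces bit layout: xxxbbbbb bbbbbbbb cdhsrrrr xxpppppp
#   b = one bit per rank, cdhs = suit flag, r = rank index, p = prime for the rank
RANKS = "23456789TJQKA"
SUITS = {1: "s", 2: "h", 4: "d", 8: "c"}

def card_to_str(card_int):
    """Decode an integer-encoded card into a two-character label such as '5h'."""
    rank_index = (card_int >> 8) & 0xF   # bits 8-11: 0 (deuce) through 12 (ace)
    suit_flag = (card_int >> 12) & 0xF   # bits 12-15: exactly one bit is set
    return RANKS[rank_index] + SUITS[suit_flag]

# First row of the raw dataframe (hand 0).
first_hand = [533255, 67144223, 557831, 270853, 4204049, 134236965, 73730]
labels = [card_to_str(card) for card in first_hand]
print(labels)  # ['5h', 'Qc', '5c', '4h', '8h', 'Kd', '2h']
```

If `deuces` is installed, its own `Card.int_to_str` should give the same labels and is the safer choice in real code.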


**Hand Evaluations:** The `deuces` library provides a relatively efficient method for evaluating hands. It accepts 5 to 7 cards and returns an evaluation score from a lookup table, where a lower score means a stronger hand. Here is a quote from the library's documentation that describes this system:

*Hand strength is valued on a scale of 1 to 7462, where 1 is a Royal Flush and 7462 is unsuited 7-5-4-3-2, as there are only 7462 distinctly ranked hands in poker.*

This method can be applied to the current dataframe in order to get columns representing the evaluation of the hand at each street.


In [4]:
evaluator = Evaluator()

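# Columns 0-1 hold the hole cards; the board is columns 2-4 on the flop, 2-5 on the turn, and 2-6 on the river.
hands_df['flop_eval'] = hands_df.apply(lambda row: evaluator.evaluate(list(row.values[:2]), list(row.values[2:5])), axis=1)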
hands_df['turn_eval'] = hands_df.apply(lambda row: evaluator.evaluate(list(row.values[:2]), list(row.values[2:6])), axis=1)
hands_df['river_eval'] = hands_df.apply(lambda row: evaluator.evaluate(list(row.values[:2]), list(row.values[2:7])), axis=1)

hands_df.head()

| | hole1 | hole2 | flop1 | flop2 | flop3 | turn | river | flop_eval | turn_eval | river_eval |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 533255 | 67144223 | 557831 | 270853 | 4204049 | 134236965 | 73730 | 5429 | 5364 | 5364 |
| 1 | 4212241 | 164099 | 67144223 | 295429 | 1065995 | 2106637 | 147715 | 7195 | 7183 | 5867 |
| 2 | 268454953 | 8394515 | 139523 | 533255 | 33564957 | 4199953 | 2114829 | 6514 | 6500 | 6498 |
| 3 | 8406803 | 8398611 | 67119647 | 2114829 | 33589533 | 529159 | 4228625 | 4528 | 4528 | 4527 |
| 4 | 16787479 | 279045 | 67115551 | 1082379 | 81922 | 270853 | 2102541 | 7141 | 5637 | 5636 |
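
Each score on the 1 to 7462 scale falls into one hand class, so a score can be mapped to a class name with a sorted list of upper bounds. The sketch below is illustrative only: the bounds are assumed to match the constants in the `deuces` lookup table, and `score_to_class` is a hypothetical helper, not the notebook's `RowTransformer`.

```python
from bisect import bisect_left

# Highest (weakest) score in each class, assumed from the deuces lookup-table constants.
CLASS_BOUNDS = [10, 166, 322, 1599, 1609, 2467, 3325, 6185, 7462]
CLASS_NAMES = [
    "Straight Flush", "Four of a Kind", "Full House", "Flush", "Straight",
    "Three of a Kind", "Two Pair", "Pair", "High Card",
]

def score_to_class(score):
    """Return the hand class for an evaluation score (1 is strongest, 7462 weakest)."""
    if not 1 <= score <= CLASS_BOUNDS[-1]:
        raise ValueError(f"score out of range: {score}")
    # bisect_left finds the first class whose upper bound is >= score.
    return CLASS_NAMES[bisect_left(CLASS_BOUNDS, score)]

# Flop, turn, and river scores of hands 0 and 1 from the output above.
hand0 = [score_to_class(s) for s in (5429, 5364, 5364)]
hand1 = [score_to_class(s) for s in (7195, 7183, 5867)]
print(hand0)  # ['Pair', 'Pair', 'Pair']
print(hand1)  # ['High Card', 'High Card', 'Pair']
```

With `deuces` available, `evaluator.get_rank_class(score)` followed by `evaluator.class_to_string(...)` is likely the more robust route, since it avoids hard-coding the bounds.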


**Final Steps:** The data so far is in its raw integer form, which keeps generation and evaluation efficient. However, this form is not human-friendly, since few people can recognize specific cards or hand classifications from their integer representations. The `RowTransformer` class from the `row_transformer` module converts a dataframe from this form into a more readable one, denoting cards with labels such as "3s" (Three of Spades) and translating evaluation scores into hand classifications such as "Full House". Finally, the raw data is saved locally for use in subsequent sections.

*Note that the data file is not included in the repo in order to save space.*


In [5]:
from scripts.row_transformer import RowTransformer

transformer = RowTransformer()
pretty_hands_df = hands_df.transform(transformer.prettify_row, axis=1)

pretty_hands_df.head()

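| | hole1 | hole2 | flop1 | flop2 | flop3 | turn | river | flop_eval | turn_eval | river_eval |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5h | Qc | 5c | 4h | 8h | Kd | 2h | Pair | Pair | Pair |
| 1 | 8d | 3c | Qc | 4c | 6d | 7h | 3d | High Card | High Card | Pair |
| 2 | Ad | 9s | 3h | 5h | Jh | 8s | 7d | High Card | High Card | High Card |
| 3 | 9d | 9h | Qh | 7d | Jc | 5s | 8c | Pair | Pair | Pair |
| 4 | Th | 4d | Qs | 6c | 2d | 4h | 7s | High Card | Pair | Pair |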


In [6]:
# save to the data directory
hands_df.to_csv("../data/hands.csv", index=False)