## Section 01: Data Generation
**Abstract**: This notebook generates a synthetic dataset of Texas Hold'em poker hands by simulating 100,000 hands with 9 players each. It evaluates the strength of each player's hand at each street (flop, turn, river) and saves the data in both wide and long formats for further analysis. The dataset is generated with the `deuces` library, a fast poker hand evaluation library for Python. `deuces` represents cards as unique integers and scores 5- to 7-card poker hands on a scale from 1 (best) to 7462 (worst).
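The upper bound of 7462 is the number of distinct five-card hand strengths once interchangeable suits are ignored. As a quick check, the standard-library sketch below recounts them category by category; the variable names are illustrative and not part of `deuces`.

```python
from math import comb

# Count distinct 5-card hand strengths (equivalence classes), category by category.
straights = 10  # ace-high down to five-high (the wheel)
classes = {
    "straight flush": straights,
    "four of a kind": 13 * 12,             # quad rank, kicker rank
    "full house": 13 * 12,                 # trips rank, pair rank
    "flush": comb(13, 5) - straights,      # 5 distinct ranks that are not a straight
    "straight": straights,
    "three of a kind": 13 * comb(12, 2),   # trips rank, two kickers
    "two pair": comb(13, 2) * 11,          # two pair ranks, one kicker
    "one pair": 13 * comb(12, 3),          # pair rank, three kickers
    "high card": comb(13, 5) - straights,  # same rank sets as a flush, but unsuited
}
total = sum(classes.values())
print(total)  # 7462
```

Each score therefore identifies one strength class, and two hands with the same score tie.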

In [1]:
# Imports
import pandas as pd
import random
from deuces import Deck, Evaluator

**Hand Simulation**: We simulate 100,000 hands of Texas Hold'em poker with 9 players each. For each hand, we deal two hole cards to each player and five community cards (a three-card flop, then the turn and the river). We then evaluate the strength of each player's hand at each street using the `Evaluator` object provided by the `deuces` library, and rank the players' river hands to obtain a showdown order. Finally, we save the wide-form dataset locally for future use.

In [2]:
# set random seed for reproducibility
random.seed(42)

n = 100000 # number of hands to simulate
hands = []
evaluator = Evaluator()
deck = Deck()
for i in range(n):
    # shuffle() rebuilds the full 52-card deck before shuffling, so one Deck can be reused
    deck.shuffle()
    # create a row that represents 1 hand of texas holdem with 9 players
    hand = {'hand_id': i}
    # deal two hole cards to each of the 9 players, in seat order
    for p in range(9):
        hand[f'hole_{p}'] = deck.draw(2)
    # deal the community cards: draw(3) returns a list, draw(1) returns a single card integer
    hand['flop'] = deck.draw(3)
    hand['turn'] = deck.draw(1)
    hand['river'] = deck.draw(1)
    hands.append(hand)
# save as wide-form dataframe
hands_wide = pd.DataFrame(hands)
# preview data
hands_wide.head()

Unnamed: 0,hand_id,hole_0,hole_1,hole_2,hole_3,hole_4,hole_5,hole_6,hole_7,hole_8,flop,turn,river
0,0,"[533255, 67144223]","[557831, 270853]","[4204049, 134236965]","[73730, 69634]","[1053707, 98306]","[4212241, 16787479]","[8423187, 67127839]","[8394515, 33564957]","[268446761, 164099]","[16795671, 33573149, 16783383]",1082379,67119647
1,1,"[4212241, 164099]","[67144223, 295429]","[1065995, 2106637]","[147715, 2114829]","[67127839, 16787479]","[279045, 69634]","[8394515, 16783383]","[33589533, 268446761]","[1057803, 529159]","[268454953, 1082379, 134253349]",139523,268442665
2,2,"[268454953, 8394515]","[139523, 533255]","[33564957, 4199953]","[2114829, 67127839]","[4228625, 1057803]","[541447, 266757]","[2106637, 16795671]","[134253349, 135427]","[8423187, 8406803]","[67119647, 81922, 1065995]",16812055,73730
3,3,"[8406803, 8398611]","[67119647, 2114829]","[33589533, 529159]","[4228625, 2102541]","[270853, 295429]","[139523, 2131213]","[279045, 16795671]","[67144223, 4199953]","[533255, 1082379]","[268446761, 4204049, 4212241]",1065995,268454953
4,4,"[16787479, 279045]","[67115551, 1082379]","[81922, 270853]","[2102541, 67119647]","[2131213, 2114829]","[33564957, 33560861]","[4204049, 1053707]","[541447, 67144223]","[1065995, 16783383]","[268454953, 4212241, 73730]",268471337,134228773
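The integers in the preview are `deuces` card encodings rather than simple card indices. The sketch below decodes one without importing the library. It assumes the bit layout used in the `deuces` source (rank index in bits 8 to 11, a one-hot suit flag in bits 12 to 15 with spades=1, hearts=2, diamonds=4, clubs=8). `decode_card` is a hypothetical helper written for illustration; `Card.int_to_str` in `deuces` should give the same result.

```python
RANKS = "23456789TJQKA"
SUITS = {1: "s", 2: "h", 4: "d", 8: "c"}  # assumed one-hot suit flags

def decode_card(card_int):
    """Decode a deuces-style card integer into a string such as 'Ah'."""
    rank = (card_int >> 8) & 0xF   # rank index, 0 (deuce) to 12 (ace)
    suit = (card_int >> 12) & 0xF  # one-hot suit flag
    return RANKS[rank] + SUITS[suit]

# Player 0's hole cards in hand 0 of the preview above
hole_0 = [533255, 67144223]
print([decode_card(c) for c in hole_0])  # ['5h', 'Qc']
```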


In [3]:
# add hand-evaluation columns for each player at each street (flop, turn, river);
# deuces scores run from 1 (best) to 7462 (worst), so a lower score means a stronger hand
for i in range(9):
    hands_wide[f'flop_eval_{i}'] = hands_wide.apply(
        lambda row: evaluator.evaluate(row[f'hole_{i}'], row['flop']),
        axis=1
    )
    hands_wide[f'turn_eval_{i}'] = hands_wide.apply(
        lambda row: evaluator.evaluate(row[f'hole_{i}'], row['flop'] + [row['turn']]),
        axis=1
    )
    hands_wide[f'river_eval_{i}'] = hands_wide.apply(
        lambda row: evaluator.evaluate(row[f'hole_{i}'], row['flop'] + [row['turn'], row['river']]),
        axis=1
    )

# preview data
hands_wide.head()

Unnamed: 0,hand_id,hole_0,hole_1,hole_2,hole_3,hole_4,hole_5,hole_6,hole_7,hole_8,...,river_eval_5,flop_eval_6,turn_eval_6,river_eval_6,flop_eval_7,turn_eval_7,river_eval_7,flop_eval_8,turn_eval_8,river_eval_8
0,0,"[533255, 67144223]","[557831, 270853]","[4204049, 134236965]","[73730, 69634]","[1053707, 98306]","[4212241, 16787479]","[8423187, 67127839]","[8394515, 33564957]","[268446761, 164099]",...,1895,4306,4306,2734,2834,2834,2833,4231,4228,4216
1,1,"[4212241, 164099]","[67144223, 295429]","[1065995, 2106637]","[147715, 2114829]","[67127839, 16787479]","[279045, 69634]","[8394515, 16783383]","[33589533, 268446761]","[1057803, 529159]",...,3372,6268,6268,3345,3340,3340,1611,5092,5092,2545
2,2,"[268454953, 8394515]","[139523, 533255]","[33564957, 4199953]","[2114829, 67127839]","[4228625, 1057803]","[541447, 266757]","[2106637, 16795671]","[134253349, 135427]","[8423187, 8406803]",...,6077,7130,4327,3009,6793,6734,6022,4555,4536,3086
3,3,"[8406803, 8398611]","[67119647, 2114829]","[33589533, 529159]","[4228625, 2102541]","[270853, 295429]","[139523, 2131213]","[279045, 16795671]","[67144223, 4199953]","[533255, 1082379]",...,2528,4677,4675,718,2007,2007,239,4691,3106,2529
4,4,"[16787479, 279045]","[67115551, 1082379]","[81922, 270853]","[2102541, 67119647]","[2131213, 2114829]","[33564957, 33560861]","[4204049, 1053707]","[541447, 67144223]","[1065995, 16783383]",...,2490,4694,2529,2523,6446,3407,3329,6583,3470,3346
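Lower scores are stronger, and the 1 to 7462 scale splits into contiguous bands, one per hand category. The sketch below maps a score to its category using cutoffs derived from the class counts (10 straight flushes, then 156 four-of-a-kind classes, and so on). `hand_category` is a hypothetical helper; `deuces` offers comparable functionality through `Evaluator.get_rank_class` and `Evaluator.class_to_string`.

```python
from bisect import bisect_left

# Inclusive upper score bound of each category, ordered best to worst.
CUTOFFS = [10, 166, 322, 1599, 1609, 2467, 3325, 6185, 7462]
NAMES = ["Straight Flush", "Four of a Kind", "Full House", "Flush", "Straight",
         "Three of a Kind", "Two Pair", "Pair", "High Card"]

def hand_category(score):
    """Return the hand category for an evaluation score in 1..7462."""
    if not 1 <= score <= 7462:
        raise ValueError(f"score out of range: {score}")
    # bisect_left finds the first band whose upper bound is >= score
    return NAMES[bisect_left(CUTOFFS, score)]

# Two scores from hand 0 of the preview above
print(hand_category(4306))  # flop_eval_6  -> Pair
print(hand_category(1895))  # river_eval_5 -> Three of a Kind
```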


In [4]:
from scipy.stats import rankdata

# Rank each player's river evaluation within the hand (1 = best hand at showdown).
# method='min' gives tied players the same, lowest rank, so split pots share a rank.
def showdown_order(row):
    evals = [row[f'river_eval_{i}'] for i in range(9)]
    ranks = rankdata(evals, method='min')
    return pd.Series(ranks, index=[f'showdown_order_{i}' for i in range(9)])

# concatenate the showdown orders to the original dataframe
showdown_orders = hands_wide.apply(showdown_order, axis=1)
hands_wide = pd.concat([hands_wide, showdown_orders], axis=1)
# preview data
hands_wide.head()

Unnamed: 0,hand_id,hole_0,hole_1,hole_2,hole_3,hole_4,hole_5,hole_6,hole_7,hole_8,...,river_eval_8,showdown_order_0,showdown_order_1,showdown_order_2,showdown_order_3,showdown_order_4,showdown_order_5,showdown_order_6,showdown_order_7,showdown_order_8
0,0,"[533255, 67144223]","[557831, 270853]","[4204049, 134236965]","[73730, 69634]","[1053707, 98306]","[4212241, 16787479]","[8423187, 67127839]","[8394515, 33564957]","[268446761, 164099]",...,4216,2,9,8,6,5,1,2,4,7
1,1,"[4212241, 164099]","[67144223, 295429]","[1065995, 2106637]","[147715, 2114829]","[67127839, 16787479]","[279045, 69634]","[8394515, 16783383]","[33589533, 268446761]","[1057803, 529159]",...,2545,4,7,2,4,6,9,8,1,2
2,2,"[268454953, 8394515]","[139523, 533255]","[33564957, 4199953]","[2114829, 67127839]","[4228625, 1057803]","[541447, 266757]","[2106637, 16795671]","[134253349, 135427]","[8423187, 8406803]",...,3086,5,8,7,1,4,8,2,6,3
3,3,"[8406803, 8398611]","[67119647, 2114829]","[33589533, 529159]","[4228625, 2102541]","[270853, 295429]","[139523, 2131213]","[279045, 16795671]","[67144223, 4199953]","[533255, 1082379]",...,2529,4,5,6,1,8,7,3,1,8
4,4,"[16787479, 279045]","[67115551, 1082379]","[81922, 270853]","[2102541, 67119647]","[2131213, 2114829]","[33564957, 33560861]","[4204049, 1053707]","[541447, 67144223]","[1065995, 16783383]",...,3346,8,5,4,5,3,1,2,5,8
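Ties are visible in the preview: in hand 0, two players share rank 2 and rank 3 is skipped. The standard-library sketch below reproduces that `method='min'` behaviour on made-up scores and shows one way to turn the result into pot shares. Both helper names are hypothetical, and the equal-split rule ignores side pots and betting.

```python
def min_ranks(scores):
    """Competition ranking: 1 + the number of strictly better (lower) scores."""
    return [1 + sum(other < s for other in scores) for s in scores]

def pot_shares(scores):
    """Split the pot equally among the players holding the best (lowest) score."""
    best = min(scores)
    n_winners = scores.count(best)
    return [1 / n_winners if s == best else 0.0 for s in scores]

river_evals = [2734, 4216, 1895, 1895]  # made-up scores for four players
print(min_ranks(river_evals))   # [3, 4, 1, 1]
print(pot_shares(river_evals))  # [0.0, 0.0, 0.5, 0.5]
```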


In [5]:
# save locally
hands_wide.to_pickle('../data/hands_wide.pkl')

**Long Format Conversion**: We convert the wide-form dataset to a long-form dataset with `pd.wide_to_long()`, producing one row per player per hand. This format is better suited to per-player analysis because it makes grouping and aggregation easier. Each row keeps its `hand_id` and `player_id`, so the relationship between players and their showdown orders in the wide dataframe is preserved. Because the stub names end in an underscore, the long-form columns are named `hole_`, `flop_eval_`, `turn_eval_`, `river_eval_`, and `showdown_order_`. We also save the long-form dataset locally for future use.

In [6]:
# Convert to long form; the numeric suffix after each stub name becomes player_id
hands_long = pd.wide_to_long(
    hands_wide,
    stubnames=['hole_', 'flop_eval_', 'turn_eval_', 'river_eval_', 'showdown_order_'],
    i='hand_id',
    j='player_id',
).reset_index()

# we should have 100,000 * 9 = 900,000 rows now
hands_long.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 900000 entries, 0 to 899999
Data columns (total 10 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   hand_id          900000 non-null  int64 
 1   player_id        900000 non-null  int64 
 2   flop             900000 non-null  object
 3   river            900000 non-null  int64 
 4   turn             900000 non-null  int64 
 5   hole_            900000 non-null  object
 6   flop_eval_       900000 non-null  int64 
 7   turn_eval_       900000 non-null  int64 
 8   river_eval_      900000 non-null  int64 
 9   showdown_order_  900000 non-null  int64 
dtypes: int64(8), object(2)
memory usage: 68.7+ MB
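The trailing underscores in the column names above come from including the underscore in each stub name. A possible alternative, sketched here on a small made-up frame rather than the real data, is to pass the bare stub names together with `sep='_'`, which yields column names without the trailing underscore. The toy values are illustrative only.

```python
import pandas as pd

# Toy wide frame: two hands, two players (values are made up)
toy_wide = pd.DataFrame({
    "hand_id": [0, 1],
    "turn": [1082379, 139523],
    "river_eval_0": [2734, 3345],
    "river_eval_1": [4216, 1611],
    "showdown_order_0": [1, 2],
    "showdown_order_1": [2, 1],
})

toy_long = (
    pd.wide_to_long(
        toy_wide,
        stubnames=["river_eval", "showdown_order"],
        i="hand_id",
        j="player_id",
        sep="_",  # separator between the stub name and the numeric suffix
    )
    .reset_index()
    .sort_values(["hand_id", "player_id"], ignore_index=True)
)
print(toy_long)
```

Columns that are not stubs, such as `turn`, are repeated on every row of the same hand, which matches how `flop`, `turn`, and `river` appear in the long-form output above.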


In [7]:
# save locally
hands_long.to_pickle('../data/hands_long.pkl')

**Conclusion**: This notebook generates a synthetic dataset of Texas Hold'em poker hands by simulating 100,000 hands with 9 players each. The dataset includes hand evaluations at each street (flop, turn, river) and each player's showdown order, and it is saved in both wide and long formats for further analysis. Because `deuces` evaluates hands quickly, a simulation of this size is practical, and the resulting dataset can be used, for example, to estimate equity.