## Section 02: Hand Frequency Distribution ##
**Abstract**: In this notebook, we validate the frequencies of the poker hand classes simulated in [Section 01](./01_hand_simulation.ipynb). This involves verifying that each class of seven-card hand (pair, flush, and so on) appears with a frequency close to its known likelihood. The RMSE between the simulated frequencies and the known likelihoods then serves as a rough measure of simulation accuracy, and a dataframe representing the hand frequencies can be saved locally.

In [12]:
# Imports
import numpy as np
import pandas as pd
from deuces import Evaluator
import altair as alt

In [13]:
# preview hands dataframe
hands_df = pd.read_pickle('../data/hands_long.pkl')
hands_df.head()

Unnamed: 0,hand_id,player_id,flop,river,turn,hole_,flop_eval_,turn_eval_,river_eval_,showdown_order_,hand_class_
0,0,0,"[16795671, 33573149, 16783383]",67119647,1082379,"[533255, 67144223]",4310,4309,2734,2,7
1,1,0,"[268454953, 1082379, 134253349]",268442665,139523,"[4212241, 164099]",6322,5750,2578,4,7
2,2,0,"[67119647, 81922, 1065995]",73730,16812055,"[268454953, 8394515]",6428,6388,5977,5,8
3,3,0,"[268446761, 4204049, 4212241]",268454953,1065995,"[8406803, 8398611]",3018,3018,2516,4,7
4,4,0,"[268454953, 4212241, 73730]",134228773,268471337,"[16787479, 279045]",6588,3472,3346,8,8


**Hand Frequency**: Each 7-card hand is classified at showdown, and its class is stored as an integer produced by the `deuces.Evaluator` object. The value counts of this integer series measure how often each `hand_class_` occurs in the synthetic data, and normalizing the counts converts them into relative frequencies. Each row of the resulting table corresponds to one hand class; its integer code can be converted to the class name as a string using the same `Evaluator` object (via `class_to_string`).

In [15]:
"""
Determine the frequency of each type of 7-card hand
"""

evaluator = Evaluator()

# frequency of each class, normalized
hand_dist = pd.DataFrame()
hand_dist['freq'] = hands_df['hand_class_'].value_counts(normalize=True).sort_index()
hand_dist['class_name'] = hand_dist.index.map(evaluator.class_to_string)
hand_dist['class_int'] = hand_dist.index
hand_dist.reset_index(drop=True, inplace=True)
hand_dist

Unnamed: 0,freq,class_name,class_int
0,0.000302,Straight Flush,1
1,0.001716,Four of a Kind,2
2,0.026248,Full House,3
3,0.030383,Flush,4
4,0.046916,Straight,5
5,0.048131,Three of a Kind,6
6,0.23407,Two Pair,7
7,0.438299,Pair,8
8,0.173936,High Card,9
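
The counting and normalizing step can be illustrated on a toy series. This is a minimal sketch, not the notebook's data: `toy_classes` and `toy_names` are hypothetical names, and the dictionary stands in for `Evaluator.class_to_string` so that the example runs without `deuces` installed.

```python
import pandas as pd

# Toy series of hand-class integers (illustrative values, not simulation output)
toy_classes = pd.Series([8, 8, 9, 7, 8, 9, 7, 8])

# Hypothetical lookup standing in for Evaluator.class_to_string
toy_names = {7: "Two Pair", 8: "Pair", 9: "High Card"}

# Relative frequency of each class, ordered by class integer
toy_dist = toy_classes.value_counts(normalize=True).sort_index().to_frame("freq")
toy_dist["class_name"] = toy_dist.index.map(toy_names)
toy_dist["class_int"] = toy_dist.index
toy_dist = toy_dist.reset_index(drop=True)

# Normalized frequencies should sum to 1 (up to floating-point error)
freq_total = toy_dist["freq"].sum()
print(toy_dist)
print(f"sum of frequencies: {freq_total}")
```

Checking that the frequencies sum to 1 is a cheap guard against dropped or duplicated rows before any comparison with theoretical values.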


**Comparing to Known Likelihoods**: To gauge the accuracy of the `hands_long` dataframe, the simulated frequency of each hand class is compared to its theoretical likelihood. Known likelihoods are taken from [Wikipedia](https://en.wikipedia.org/wiki/Poker_probability) and added as another column of the `hand_dist` dataframe. Two further columns, the absolute and squared error, are then computed for each class:
$$
|\text{error}| = | \text{frequency} - \text{likelihood} |
$$

$$
\text{error}^2 = ( \text{frequency} - \text{likelihood} )^2
$$
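
For reference, the RMSE reported later in this notebook is the square root of the mean squared error over the hand classes. Writing $n$ for the number of hand classes (nine in this notebook's table) and $i$ for the class index, both symbols introduced here only for illustration, it can be written as:

$$
\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} ( \text{frequency}_i - \text{likelihood}_i )^2 }
$$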

In [4]:
"""
Compute error of hand distribution by comparing frequency of hands to their known likelihoods.
"""
hand_dist['likelihood'] = pd.Series([
    0.000032 + 0.000279, # Royal Flush + other Straight Flushes
    0.00168, # Four of a Kind
    0.0260, # Full House
    0.0303, # Flush
    0.0462, # Straight
    0.0483, # Three of a Kind
    0.235, # Two Pair
    0.438, # Pair
    0.174 # High Card
])
hand_dist['error_abs'] = np.abs(hand_dist['freq'] - hand_dist['likelihood'])
hand_dist['error_sqr'] = np.square(hand_dist['freq'] - hand_dist['likelihood'])
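
The rounded likelihoods entered above can be cross-checked against exact combinatorial counts. The sketch below assumes the per-class counts of 7-card hands as recalled from the Wikipedia table; they are not part of this notebook and should be verified against the source. `class_counts` and `exact_likelihood` are hypothetical names.

```python
from math import comb

# Number of distinct 7-card hands drawn from a 52-card deck
total_hands = comb(52, 7)

# Counts of 7-card hands per class, as recalled from the Wikipedia table
# (assumption: verify against the source before relying on them)
class_counts = {
    "Straight Flush": 4_324 + 37_260,  # royal flush + other straight flushes
    "Four of a Kind": 224_848,
    "Full House": 3_473_184,
    "Flush": 4_047_644,
    "Straight": 6_180_020,
    "Three of a Kind": 6_461_620,
    "Two Pair": 31_433_400,
    "Pair": 58_627_800,
    "High Card": 23_294_460,
}

# The classes partition all 7-card hands, so the counts must sum to the total
counts_total = sum(class_counts.values())

# Exact likelihood of each class as count / total
exact_likelihood = {name: count / total_hands for name, count in class_counts.items()}
for name, p in exact_likelihood.items():
    print(f"{name:<16} {p:.6f}")
```

Because the counts sum to $\binom{52}{7}$, the resulting likelihoods sum to 1, which makes transcription mistakes in the table easy to spot.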

In [5]:
"""
Approximate the accuracy of simulated data by calculating the RMSE
"""
mse = hand_dist['error_sqr'].mean()
print(f'RMSE: {mse**0.5}')

RMSE: 0.00041748058939788333
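
As a cross-check, the RMSE can be recomputed from the values printed in this notebook. This is a sketch under the assumption that the six-decimal frequencies shown in the `hand_dist` table are close enough to the unrounded ones, so the result should agree with the value above only to about three significant figures. It also reports the mean absolute error, which summarizes the `error_abs` column that the notebook computes but does not aggregate.

```python
import numpy as np

# Frequencies as printed in the hand_dist table (rounded to six decimals)
freq = np.array([0.000302, 0.001716, 0.026248, 0.030383, 0.046916,
                 0.048131, 0.23407, 0.438299, 0.173936])

# Likelihoods as entered in the cell above
likelihood = np.array([0.000032 + 0.000279, 0.00168, 0.0260, 0.0303, 0.0462,
                       0.0483, 0.235, 0.438, 0.174])

# Root-mean-square error and mean absolute error across the nine classes
rmse = np.sqrt(np.mean((freq - likelihood) ** 2))
mae = np.mean(np.abs(freq - likelihood))
print(f"RMSE: {rmse:.3e}")
print(f"MAE: {mae:.3e}")
```

Both summaries weight all nine classes equally, so they are dominated by the absolute deviations of the common classes rather than by relative deviations in the rare ones.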


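**Conclusion**: This notebook measures the frequency of each 7-card hand class in the simulated data. Compared with the known likelihoods, the simulation has an $\text{RMSE} \approx 4.175 \times 10^{-4}$, which is small relative to the frequencies of the common classes (for example, about 0.438 for a pair) and suggests that the simulated distribution closely matches the theoretical one. This is a good sanity check before estimating equity.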
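
The abstract mentions saving the hand frequencies locally. A minimal sketch of that step follows, assuming pickle as the format (matching how `hands_long.pkl` is loaded). The filename `hand_dist.pkl` and the stand-in dataframe `hand_dist_demo` are hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in for hand_dist (three rows copied from the table in this notebook)
hand_dist_demo = pd.DataFrame({
    "freq": [0.23407, 0.438299, 0.173936],
    "class_name": ["Two Pair", "Pair", "High Card"],
    "class_int": [7, 8, 9],
})

# Hypothetical output location; a temporary directory keeps the sketch side-effect free
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "hand_dist.pkl"  # hypothetical filename
    hand_dist_demo.to_pickle(out_path)
    restored = pd.read_pickle(out_path)

# Confirm the dataframe survived the round trip unchanged
round_trip_ok = restored.equals(hand_dist_demo)
print(f"round trip preserved the dataframe: {round_trip_ok}")
```

In the notebook itself, the same `to_pickle` call would be applied to `hand_dist` with whatever path the project uses for its data files.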