## Two-Set Mahjong Probability Calculations

This notebook tabulates hand probabilities for a teaching Mahjong variant with a reduced tile set and a smaller hand:
- **Tiles**: 72 tiles covering the bamboo (1-9 numeric) and circles (1-9 numeric) suits; four copies of each tile.
- **Hand Size**: A complete hand has 8 tiles: 2 sets of three (sequence or triplet; no quads) and 1 pair.
- **No Calls**: All hands are concealed until completed / won.
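
As a minimal sketch of the setup above, the tile set and hand size can be written out directly. The names `SUITS`, `RANKS`, and `COPIES` are illustrative and are not taken from the notebook's data files.

```python
from itertools import product

# Illustrative constants (hypothetical names, not from the notebook's CSV)
SUITS = ("bamboo", "circles")
RANKS = range(1, 10)
COPIES = 4

# every physical tile as (suit, rank, copy index)
tiles = list(product(SUITS, RANKS, range(COPIES)))
hand_size = 2 * 3 + 2  # two sets of three plus one pair

print(len(tiles), hand_size)
```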

In traditional Mahjong, tile patterns in a completed hand earn point values based loosely on their elegance and rarity. How do those rarities change when both the tile types and the hand size are reduced?

In [1]:
import math
import numpy as np
import pandas as pd

from itertools import product

In [2]:
# load pre-computed tile combination properties
suited_df = pd.read_csv('./shanten_suuhai.csv', 
                        index_col='tile_int', 
                        dtype={'tile_vector': str})

# trim to only combinations with eight or fewer tiles
suited_df = suited_df[suited_df['n_tiles'] <= 8]

print(suited_df.shape)
suited_df.sample(10)

(22330, 11)


tile_int,tile_vector,n_tiles,n_sets,n_triplets,n_sequences,n_blocks,n_pairs,max_pairs,n_koritsu,n_terminals,n_ways
11234455,211220000,8,1,0,1,2,2,3,0,1,3456
22666,20003000,5,1,1,0,1,1,1,0,0,24
22236669,31003001,8,2,2,0,0,0,0,2,1,256
22555678,20031110,8,2,1,1,1,1,1,0,0,1536
4667788,102220,7,2,0,2,0,0,3,1,0,864
122249,130100001,6,1,1,0,0,0,0,3,2,256
1225,120010000,4,0,0,0,1,1,1,1,1,96
25589999,10020014,8,1,1,0,2,1,1,1,1,96
22255789,30020111,8,2,1,1,1,1,1,0,1,1536
12667799,110002202,8,0,0,0,4,3,3,0,2,3456
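
The `n_ways` values in the sampled rows are consistent with counting how many ways each combination can be drawn from four copies per tile, that is, the product of C(4, count) over the ranks present. The helper below (`ways_for_tiles`, a hypothetical name) is a sketch under that assumption, checked against rows from the sample above rather than against the full CSV.

```python
import math
from collections import Counter

def ways_for_tiles(tile_int, copies=4):
    """Number of ways to draw this combination, assuming `copies` of each tile."""
    counts = Counter(str(tile_int))  # how many times each rank digit appears
    return math.prod(math.comb(copies, c) for c in counts.values())

for tile_int in (22666, 22236669, 11234455, 1225):
    print(tile_int, ways_for_tiles(tile_int))
```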


In [3]:
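def vector_to_int(t_vector):
    """Encode a count vector as an integer listing each rank once per copy, e.g. [2,0,1,...] -> 113."""
    digits = ''.join(str(rank) * int(cnt) for rank, cnt in enumerate(t_vector, start=1))
    return int(digits) if digits else 0

def int_to_vector(t_int, n_types=9):
    """Decode a tile integer such as 113 into a count vector; 0 decodes to the empty vector."""
    t_vector = np.zeros(n_types, dtype=int)
    if t_int == 0:
        # without this guard the digit '0' would be counted as rank 9
        return t_vector
    for digit in str(t_int):
        t_vector[int(digit) - 1] += 1
    return t_vector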
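
A quick round trip illustrates the two encodings: the integer form lists each tile's rank once per copy, while the vector form holds one count per rank. The functions are restated in compact form so the snippet runs on its own.

```python
import numpy as np

def vector_to_int(t_vector):
    digits = ''.join(str(rank) * int(cnt) for rank, cnt in enumerate(t_vector, start=1))
    return int(digits) if digits else 0

def int_to_vector(t_int, n_types=9):
    t_vector = np.zeros(n_types, dtype=int)
    for digit in str(t_int):
        t_vector[int(digit) - 1] += 1
    return t_vector

vec = int_to_vector(22555678)
print(vec)                 # counts for ranks 1..9
print(vector_to_int(vec))  # back to the integer form
```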

## General Probabilities
- How many possible hands are there?
- How many of those hands form a winning combination? (Tenhou/Chiihou equivalent)

In [4]:
### How many possible hands are there, winning or otherwise?
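# per-suit number of ways to hold k tiles, indexed by k = 0..8
hands_by_tiles = suited_df.groupby('n_tiles')['n_ways'].sum()

# split the 8 tiles between the two suits: i in one suit, 8 - i in the other
total_hands = 0
for i in range(9):
    total_hands += int(hands_by_tiles.loc[i]) * int(hands_by_tiles.loc[8 - i])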

print(total_hands)

11969016345
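
This total can be cross-checked without the CSV. Assuming the per-suit `n_ways` sums equal C(36, k), the number of ways to draw k tiles from one suit's 36 tiles, the loop is a Vandermonde convolution and should equal C(72, 8). `TILES_PER_SUIT` is an illustrative name.

```python
import math

TILES_PER_SUIT = 9 * 4  # 36 physical tiles in each suit

# ways to take i tiles from one suit and the remaining 8 - i from the other
total_hands = sum(
    math.comb(TILES_PER_SUIT, i) * math.comb(TILES_PER_SUIT, 8 - i)
    for i in range(9)
)

print(total_hands, total_hands == math.comb(72, 8))
```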


In [5]:
### How many possible winning hands are there?
suited_complete = suited_df.query('(3 * n_sets + 2 * n_pairs == n_tiles) & (n_pairs <= 1)')
suited_complete_ways = suited_complete.groupby(['n_tiles', 'n_sets', 'n_pairs']).sum(numeric_only=True)['n_ways'].reset_index()
suited_complete_ways

,n_tiles,n_sets,n_pairs,n_ways
0,0,0,0,1
1,2,0,1,54
2,3,1,0,484
3,5,1,1,19200
4,6,2,0,65272
5,8,2,1,1748756


In [6]:
total_winning_hands = 0
for i in range(6):
    total_winning_hands += suited_complete_ways.loc[i, 'n_ways'] * suited_complete_ways.loc[5-i, 'n_ways']

print(total_winning_hands)
print(f"proportion: {total_winning_hands/total_hands:0.7f}; 1 in {total_hands/total_winning_hands:.0f}")

29132488
proportion: 0.0024340; 1 in 411
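
The same total can be reproduced from the table alone. The sketch below hard-codes the six `n_ways` values printed above and pairs each single-suit part with its complement in the other suit. The two smallest entries can also be checked by hand: 9 × C(4,2) = 54 pairs, and 9 × C(4,3) + 7 × 4³ = 484 single sets.

```python
import math

# (n_tiles, n_ways) for complete single-suit combinations, copied from the table above
complete_ways = [(0, 1), (2, 54), (3, 484), (5, 19200), (6, 65272), (8, 1748756)]

# hand checks of the two smallest non-trivial entries
n_pairs_one_suit = 9 * math.comb(4, 2)               # nine ranks, choose 2 of 4 copies
n_sets_one_suit = 9 * math.comb(4, 3) + 7 * 4 ** 3   # triplets plus sequences

# pair each single-suit part with a complementary part so the hand has 8 tiles
total_winning_hands = sum(
    ways_a * ways_b
    for tiles_a, ways_a in complete_ways
    for tiles_b, ways_b in complete_ways
    if tiles_a + tiles_b == 8
)

print(n_pairs_one_suit, n_sets_one_suit, total_winning_hands)
```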


## Specific Hand Type Proportions
- **All Simples** (_tanyao_): only numeric tiles from 2-8
- **Included Terminals** (_junchan_): each set and the pair includes a 1 or 9
- **All Sequences** (_pinfu_-like): two sequences and a pair
- **All Triplets** (_toitoi_; _sanankou_-like): two triplets and a pair
- **Full Flush** (_chinitsu_): all tiles are of a single numeric suit
- **Two Identical Sequences** (_iipeikou_-like): two identical sequences in the same suit

In [7]:
# defining sets for assembling winning combinations
sequences = [int_to_vector(123), int_to_vector(234), int_to_vector(345), int_to_vector(456),
             int_to_vector(567), int_to_vector(678), int_to_vector(789), np.zeros(9,dtype=int)]
triplets  = [int_to_vector(111), int_to_vector(222), int_to_vector(333), int_to_vector(444), int_to_vector(555),
             int_to_vector(666), int_to_vector(777), int_to_vector(888), int_to_vector(999), np.zeros(9,dtype=int)]

pairs = [int_to_vector(11), int_to_vector(22), int_to_vector(33), int_to_vector(44), int_to_vector(55),
         int_to_vector(66), int_to_vector(77), int_to_vector(88), int_to_vector(99), np.zeros(9,dtype=int)]

terminal_sets  = [int_to_vector(111), int_to_vector(999),
                  int_to_vector(123), int_to_vector(789), np.zeros(9,dtype=int)]
terminal_pairs = [int_to_vector(11), int_to_vector(99), np.zeros(9,dtype=int)]

In [8]:
def assemble_from_groups(*args):
    # try every way of picking one group from each argument list
    test_groups = product(*args)

    valid_groups = []
    for test_group in test_groups:
        test_vector = np.array(test_group).sum(axis=0)
        # keep only combinations that use at most four copies of any tile
        if (test_vector <= 4).all():
            valid_groups.append(vector_to_int(test_vector))
    valid_groups = np.unique(np.array(valid_groups))

    return valid_groups
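
A small, self-contained usage sketch shows how the four-copy limit prunes combinations. The helpers are restated compactly (returning a sorted list rather than a NumPy array), and the inputs `sets_` and `pairs_` are illustrative: combining 111 with 11 would need five copies of the 1 tile, so it is dropped.

```python
from itertools import product
import numpy as np

def int_to_vector(t_int, n_types=9):
    t_vector = np.zeros(n_types, dtype=int)
    for digit in str(t_int):
        t_vector[int(digit) - 1] += 1
    return t_vector

def vector_to_int(t_vector):
    digits = ''.join(str(rank) * int(cnt) for rank, cnt in enumerate(t_vector, start=1))
    return int(digits) if digits else 0

def assemble_from_groups(*args):
    valid = set()
    for group in product(*args):
        total = np.array(group).sum(axis=0)
        if (total <= 4).all():  # at most four copies of any tile
            valid.add(vector_to_int(total))
    return sorted(valid)

empty = np.zeros(9, dtype=int)  # "no group from this list"
sets_ = [int_to_vector(111), empty]
pairs_ = [int_to_vector(11), int_to_vector(99), empty]

result = assemble_from_groups(sets_, pairs_)
print(result)
```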

In [9]:
### All Simples
simple_complete = suited_complete.query('n_terminals == 0')
simple_complete_ways = simple_complete.groupby(['n_tiles', 'n_sets', 'n_pairs']).sum(numeric_only=True)['n_ways'].reset_index()

winning_hands = 0
for i in range(6):
    winning_hands += simple_complete_ways.loc[i, 'n_ways'] * simple_complete_ways.loc[5-i, 'n_ways']

print(winning_hands)
print(f"proportion: {winning_hands/total_winning_hands:0.7f}; 1 in {total_winning_hands/winning_hands:.2f}")

9722776
proportion: 0.3337434; 1 in 3.00


In [10]:
### Included Terminals
valid_groups = assemble_from_groups(terminal_sets, terminal_sets, terminal_pairs)

terminal_complete = suited_complete.loc[valid_groups,:]
terminal_complete_ways = terminal_complete.groupby(['n_tiles', 'n_sets', 'n_pairs']).sum(numeric_only=True)['n_ways'].reset_index()

winning_hands = 0
for i in range(6):
    winning_hands += terminal_complete_ways.loc[i, 'n_ways'] * terminal_complete_ways.loc[5-i, 'n_ways']

print(winning_hands)
print(f"proportion: {winning_hands/total_winning_hands:0.7f}; 1 in {total_winning_hands/winning_hands:.1f}")

402000
proportion: 0.0137990; 1 in 72.5


In [11]:
### All Sequences
valid_groups = assemble_from_groups(sequences, sequences, pairs)

sequences_complete = suited_complete.loc[valid_groups,:]
sequences_complete_ways = sequences_complete.groupby(['n_tiles', 'n_sets', 'n_pairs']).sum(numeric_only=True)['n_ways'].reset_index()

winning_hands = 0
for i in range(6):
    winning_hands += sequences_complete_ways.loc[i, 'n_ways'] * sequences_complete_ways.loc[5-i, 'n_ways']

print(winning_hands)
print(f"proportion: {winning_hands/total_winning_hands:0.7f}; 1 in {total_winning_hands/winning_hands:.2f}")

24161608
proportion: 0.8293699; 1 in 1.21


In [12]:
### All Triplets
valid_groups = assemble_from_groups(triplets, triplets, pairs)

triplets_complete = suited_complete.loc[valid_groups,:]
triplets_complete_ways = triplets_complete.groupby(['n_tiles', 'n_sets', 'n_pairs']).sum(numeric_only=True)['n_ways'].reset_index()

winning_hands = 0
for i in range(6):
    winning_hands += triplets_complete_ways.loc[i, 'n_ways'] * triplets_complete_ways.loc[5-i, 'n_ways']

print(winning_hands)
print(f"proportion: {winning_hands/total_winning_hands:0.7f}; 1 in {total_winning_hands/winning_hands:.0f}")

235008
proportion: 0.0080669; 1 in 124


In [13]:
### Full Flush
n_winning_hands = 2 * suited_complete_ways.loc[5, 'n_ways']

print(n_winning_hands)
print(f"proportion: {n_winning_hands/total_winning_hands:0.7f}; 1 in {total_winning_hands/n_winning_hands:.1f}")

3497512
proportion: 0.1200554; 1 in 8.3


In [14]:
### Two Identical Sequences
identical_sequences = [x+x for x in sequences]
valid_groups = assemble_from_groups(identical_sequences, pairs)

iipeikou_complete = suited_complete.loc[valid_groups,:]
iipeikou_complete_ways = iipeikou_complete.groupby(['n_tiles', 'n_sets', 'n_pairs']).sum(numeric_only=True)['n_ways'].reset_index()

winning_hands = 0
for i in range(4):
    winning_hands += iipeikou_complete_ways.loc[i, 'n_ways'] * iipeikou_complete_ways.loc[3-i, 'n_ways']

print(winning_hands)
print(f"proportion: {winning_hands/total_winning_hands:0.7f}; 1 in {total_winning_hands/winning_hands:.0f}")


258120
proportion: 0.0088602; 1 in 113


### Special Hands

- **Four Pairs** (_yontoitsu_): four pairs of four different tiles (no tile repeated across pairs).

In [15]:
n_winning_hands = math.comb(2*9,4) * 6 ** 4

print(n_winning_hands)
print(f"proportion of all hands: {n_winning_hands/total_hands:0.7f}; 1 in {total_hands/n_winning_hands:.0f}")
print(f"ratio vs standard hands: 1 to {total_winning_hands/n_winning_hands:.2f}")

3965760
proportion of all hands: 0.0003313; 1 in 3018
ratio vs standard hands: 1 to 7.35
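
The closed-form count can be checked by enumerating the tile-type choices explicitly. This is a sketch with illustrative names (`TILE_TYPES`, `WAYS_PER_PAIR`); it assumes, as the formula does, that the four pairs must use four different tiles.

```python
import math
from itertools import combinations

TILE_TYPES = 2 * 9               # 18 distinct tiles across both suits
WAYS_PER_PAIR = math.comb(4, 2)  # 6 ways to pick 2 of the 4 copies

# enumerate every choice of four distinct tile types
n_type_choices = sum(1 for _ in combinations(range(TILE_TYPES), 4))
n_four_pairs = n_type_choices * WAYS_PER_PAIR ** 4

print(n_type_choices, n_four_pairs)
```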


## Shanten Calculations

- What is the shanten distribution across all seven-tile hands, and what is the average shanten count?
- Excludes consideration of _yontoitsu_ shanten; standard hands only.

In [16]:
# filter first so the grouped frame gets a clean 0..k-1 index for the .loc lookups below
suited_ways = suited_df[suited_df['n_tiles'] <= 7]
suited_ways = suited_ways.groupby(['n_tiles', 'n_sets', 'n_blocks', 'n_pairs']).agg({'n_ways': 'sum'}).reset_index()

In [17]:
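# index = shanten number (0 = tenpai); value = number of seven-tile hands
shanten_ways = np.zeros(5,dtype=np.int64)

# pair every bamboo (sou) part with every circles (pin) part
combos = product(range(suited_ways.shape[0]),repeat=2)
for sou_idx,pin_idx in combos:
    sou_part = suited_ways.loc[sou_idx]
    pin_part = suited_ways.loc[pin_idx]
    hand = sou_part + pin_part
    
    # check if hand has the correct size
    if hand['n_tiles'] != 7:
        continue

    # calculate shanten: start at 4, subtract 2 per complete set, 1 for a pair,
    # and 1 per remaining block, capped at the number of sets still missing
    has_pair = min(hand['n_pairs'], 1)
    shanten = 4 - 2 * hand['n_sets'] - has_pair - min(hand['n_blocks']-has_pair, 2-hand['n_sets'])
    shanten_ways[int(shanten)] += sou_part['n_ways'] * pin_part['n_ways']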
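
The shanten formula in the loop can be isolated as a small function and tried on a few hand shapes. This is a sketch: `two_set_shanten` is a hypothetical name, and it assumes `n_blocks` counts pairs as well as partial sequences, which is what the sampled rows of the CSV suggest.

```python
def two_set_shanten(n_sets, n_blocks, n_pairs):
    """Shanten for a two-sets-plus-pair hand; n_blocks is assumed to include pairs."""
    has_pair = min(n_pairs, 1)
    # blocks beyond the pair only help up to the number of sets still missing
    useful_blocks = min(n_blocks - has_pair, 2 - n_sets)
    return 4 - 2 * n_sets - has_pair - useful_blocks

examples = {
    "two sets + one floating tile": (2, 0, 0),
    "one set + pair + partial sequence": (1, 2, 1),
    "three pairs + one floating tile": (0, 3, 3),
    "one pair + five floating tiles": (0, 1, 1),
}
for label, counts in examples.items():
    print(label, two_set_shanten(*counts))
```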

In [18]:
print(shanten_ways)
print(f'tenpai chance: {shanten_ways[0] / shanten_ways.sum():0.7f}; 1 in {shanten_ways.sum() / shanten_ways[0]:0.0f}')
print(f'average shanten: {(shanten_ways * np.arange(5)).sum() / shanten_ways.sum():0.2f}')

[ 49386696 676727040 708534528  38461440         0]
tenpai chance: 0.0335255; 1 in 30
average shanten: 1.50
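
As a sanity check on the distribution, the counts should add up to the number of seven-tile hands, C(72, 7). The sketch below hard-codes the printed array and recomputes the two summary figures from it.

```python
import math

# shanten counts for 0..4, copied from the output above
shanten_ways = [49386696, 676727040, 708534528, 38461440, 0]

total = sum(shanten_ways)
tenpai_chance = shanten_ways[0] / total
average = sum(s * w for s, w in enumerate(shanten_ways)) / total

print(total == math.comb(72, 7))  # every seven-tile hand is counted exactly once
print(f"{tenpai_chance:.7f}, {average:.2f}")
```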
