#### Smoothed superleaves
We calculate superleaves by having Macondo play itself millions of times, and then comparing how much plays that contain a given superleave score against the average of all other plays (the difference is the "leave value"). However, some of the lower-probability superleaves are observed very infrequently, and so end up with inaccurate values. For instance, if the one time you have DLPQX? you played QUADPLEX for 300+, you're going to incorrectly think that's a dream leave!

To compensate for this, we "smooth out" the value of any superleave that was observed fewer than a cutoff number of times (maybe 50 or 100). We pool the statistics of all neighboring leaves (all leaves that differ by only 1 tile and contain the same number of blanks). The proper way of doing this is really with a superior model like a neural net, but this gets pretty close and prevents "gravity wells" (when a superleave is valued way too high, so the fast player keeps trying to hold that superleave at all costs).
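
As a minimal sketch of the pooling idea, the toy example below shows how a single freak observation gets diluted by better-observed neighbors. Everything in it is an illustrative assumption: the leaves, the totals, and the helper name `pooled_mean_equity` are made up and do not come from a real run.

```python
# Toy statistics (made-up numbers, NOT from a real Macondo run):
# leave -> (total equity summed over all observations, number of observations)
toy_stats = {
    "QXZ": (90.0, 1),     # seen once, on an unusually good play
    "JQX": (1400.0, 40),  # neighbor: Z swapped for J
    "QVZ": (1020.0, 30),  # neighbor: X swapped for V
    "KXZ": (1080.0, 30),  # neighbor: Q swapped for K
}


def pooled_mean_equity(leaves, stats):
    """Count-weighted mean equity over the given leaves (hypothetical helper)."""
    total_equity = sum(stats[leave][0] for leave in leaves if leave in stats)
    total_count = sum(stats[leave][1] for leave in leaves if leave in stats)
    return total_equity / total_count


raw = toy_stats["QXZ"][0] / toy_stats["QXZ"][1]
smoothed = pooled_mean_equity(["QXZ", "JQX", "QVZ", "KXZ"], toy_stats)
print(raw, smoothed)
```

Because the average is weighted by observation count, the single 90-point sighting barely moves the pooled value away from the mid-30s that the neighbors agree on.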

This smoothing script is also now included at the end of generate_superleaves as an automatic next step once superleave calculation is done, since it now takes only about 2 minutes to run. I'm keeping it around as a stand-alone script to expedite further tinkering.

In [1]:
from itertools import combinations
import numpy as np
import pandas as pd
import pickle as pkl
import time

pd.options.display.max_rows = 200
run_date = '20200515'

In [2]:
summary_df = pd.read_csv('leave_summary_' + run_date +'.csv').rename(
    columns={'Unnamed: 0':'leave'}).set_index('leave')

In [3]:
count_dict = summary_df['count'].to_dict()
equity_dict = summary_df['equity'].to_dict()
mean_equity_dict = summary_df['mean_equity'].to_dict()
summary_df = summary_df.reset_index()
summary_df['leave_len'] = summary_df['leave'].apply(lambda x: len(x))

You only need to run the following code if you don't have an existing pickle file with all of the possible superleaves.

In [4]:
# t0 = time.time()

# leaves = {i:sorted(list(set(list(combinations(truncated_tilebag,i))))) for i in 
#           range(1,maximum_superleave_length+1)}

# # turn leaves from lists of letters into strings
# # algorithm runs faster if leaves non-alphabetical!
# for i in range(1,maximum_superleave_length+1):
#     leaves[i] = [''.join(sorted(leave, key=sort_func))
#                  for leave in leaves[i]]

# t1 = time.time()
# print('Calculated superleaves up to length {} in {} seconds'.format(
#     maximum_superleave_length,t1-t0))

# pkl.dump(leaves,open('all_leaves.p','wb'))

In [5]:
with open('all_leaves.p','rb') as f:
    leaves = pkl.load(f)
alphabetical_key = '?ABCDEFGHIJKLMNOPQRSTUVWXYZ'
sort_func = lambda x: alphabetical_key.index(x)

In [6]:
child_leaves = {leave:[''.join(sorted(leave+letter, key=sort_func)) for letter in alphabetical_key]
                for i in range(1,6) for leave in leaves[i]}
child_leaves[''] = [x for x in alphabetical_key]

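The list of neighbors returned by the following function will include some impossible racks. Those are not removed here; calculate_smoothed_superleave skips them implicitly, since they have no recorded statistics and so contribute nothing to the sums. The code is waaaay faster as a result.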

In [7]:
def get_neighboring_leaves(original_leave):
    """Return all leaves that differ from original_leave by at most one tile
    and contain the same number of blanks. The list can contain duplicates,
    the original leave itself, and impossible racks."""
    # every subleave obtained by dropping exactly one tile
    subleaves = [''.join(x) for x in combinations(original_leave, len(original_leave)-1)]
    
    # add each possible tile back onto every subleave
    neighbors = []
    for leave in subleaves:
        neighbors += child_leaves[leave]
    
    # filter neighbors to make sure they have the same number of blanks
    blank_count = original_leave.count('?')
    neighbors = [leave for leave in neighbors if leave.count('?')==blank_count]
    
    return neighbors


def calculate_smoothed_superleave(superleave):
    """Return the count-weighted mean equity over all neighbors of superleave."""
    neighbors = get_neighboring_leaves(superleave)
    
    neighboring_equity = 0
    neighboring_count = 0
    
    # impossible racks have no recorded statistics, so .get(..., 0) skips them
    for neighbor_leave in neighbors:
        neighboring_equity += equity_dict.get(neighbor_leave, 0)
        neighboring_count += count_dict.get(neighbor_leave, 0)
    
    # guard against a leave whose neighbors were never observed
    if neighboring_count == 0:
        return np.nan
    
    return neighboring_equity/neighboring_count
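
To make the neighbor definition concrete, here is a self-contained sketch that enumerates the neighbors of a tiny leave. It is not the notebook's function: the name `neighbors_of` is hypothetical, and unlike get_neighboring_leaves it de-duplicates the result and drops the original leave, purely so the output is easy to inspect.

```python
from itertools import combinations

ALPHABET = "?ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # blank sorts first, as in the notebook


def neighbors_of(leave):
    """Leaves reachable by swapping exactly one tile, blank count unchanged.

    Hypothetical helper for illustration only; impossible racks such as
    'QQ' are NOT filtered out, mirroring the notebook's approach.
    """
    blanks = leave.count("?")
    found = set()
    # drop one tile ...
    for sub in combinations(leave, len(leave) - 1):
        # ... then add any tile back
        for tile in ALPHABET:
            candidate = "".join(sorted(sub + (tile,), key=ALPHABET.index))
            if candidate.count("?") == blanks and candidate != leave:
                found.add(candidate)
    return sorted(found)


result = neighbors_of("QU")
print(len(result), result[:5])
```

Note that "QQ" shows up in the result even though a standard tile bag holds only one Q. Such racks never appear in the summary statistics, which is why the notebook can leave them in the neighbor list at no cost.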

Shows how many superleaves were never seen, and how many were seen fewer than 10 times.

In [24]:
print(pd.notnull(summary_df.loc[summary_df['leave_len']==5])['ev'].value_counts())
print((summary_df.loc[summary_df['leave_len']==5]['count']<10).value_counts())

print(pd.notnull(summary_df.loc[summary_df['leave_len']==6])['ev'].value_counts())
print((summary_df.loc[summary_df['leave_len']==6]['count']<10).value_counts())

True    148150
Name: ev, dtype: int64
False    148148
True          2
Name: count, dtype: int64
True     732814
False      4497
Name: ev, dtype: int64
False    665564
True      71747
Name: count, dtype: int64


Show the strongest superleaves in your lexicon. If your superleaves are unsmoothed, you'll likely see some weird superleaves at the top of this list with low count.

In [9]:
summary_df.loc[summary_df['leave_len']==5].sort_values('ev', ascending=False)[:10]

Unnamed: 0,leave,points,equity,count,bingo_count,mean_score,mean_equity,bingo pct,pct,adjusted_mean_score,ev,leave_len
123214,??ESZ,115307,129780.913,1235,860,93.365992,105.08576,69.635628,0.135028,54.164938,62.467565,5
118409,??EIZ,272424,304339.545,2941,2137,92.629718,103.481654,72.66236,0.321553,53.428664,60.86346,5
157270,??ISZ,93231,105365.189,1021,691,91.313418,103.19803,67.678746,0.111631,52.112365,60.579836,5
54189,??ASZ,77782,89282.271,879,568,88.489192,101.572549,64.618885,0.096105,49.288139,58.954355,5
43198,??AEZ,222029,255113.585,2517,1652,88.21176,101.356212,65.633691,0.275195,49.010706,58.738017,5
49384,??AIZ,177905,202670.494,2014,1358,88.334161,100.630831,67.428004,0.2202,49.133107,58.012637,5
175662,??QSU,72850,81577.116,816,555,89.276961,99.971956,68.014706,0.089217,50.075907,57.353761,5
122810,??EQU,195412,222553.969,2227,1554,87.746744,99.934427,69.779973,0.243488,48.545691,57.316232,5
122367,??EOZ,186869,214065.116,2150,1427,86.915814,99.56517,66.372093,0.235069,47.71476,56.946976,5
156866,??IQU,161499,184988.163,1871,1282,86.316943,98.871279,68.519508,0.204565,47.115889,56.253084,5


In [10]:
summary_df['smoothed_ev'] = summary_df['ev']
summary_df['point_equity_diff'] = (summary_df['points']-summary_df['equity'])/summary_df['count']

If the mean equity recorded for a given leave is far above its mean points (a large negative point_equity_diff), that can be a sign that your existing ev for that superleave is too high.

In [11]:
summary_df.loc[summary_df['leave_len']==5].sort_values('point_equity_diff')[:10]

Unnamed: 0,leave,points,equity,count,bingo_count,mean_score,mean_equity,bingo pct,pct,adjusted_mean_score,ev,leave_len,smoothed_ev,point_equity_diff
167808,??LQX,5123,12012.831,164,8,31.237805,73.24897,4.878049,0.017931,-7.963249,30.630775,5,30.630775,-42.011165
109471,??DQX,5501,12900.021,181,0,30.392265,71.270834,0.0,0.01979,-8.808788,28.65264,5,28.65264,-40.878569
177264,??VVZ,676,1917.022,31,0,21.806452,61.839419,0.0,0.003389,-17.394602,19.221225,5,19.221225,-40.032968
177239,??UXZ,2590,5288.186,68,0,38.088235,77.767441,0.0,0.007435,-1.112818,35.149247,5,35.149247,-39.679206
161003,??JQU,4213,9288.051,128,2,32.914062,72.562898,1.5625,0.013995,-6.286991,29.944704,5,29.944704,-39.648836
175380,??PWZ,3006,6557.393,91,0,33.032967,72.059264,0.0,0.009949,-6.168087,29.441069,5,29.441069,-39.026297
175951,??QXZ,908,1948.395,27,0,33.62963,72.162778,0.0,0.002952,-5.571424,29.544583,5,29.544583,-38.533148
175872,??QUW,9071,18839.257,254,11,35.712598,74.170303,4.330709,0.027771,-3.488455,31.552108,5,31.552108,-38.457705
170886,??MWZ,2441,5160.027,71,2,34.380282,72.676437,2.816901,0.007763,-4.820772,30.058242,5,30.058242,-38.296155
168896,??MMZ,2111,4014.222,50,10,42.22,80.28444,20.0,0.005467,3.018946,37.666245,5,37.666245,-38.06444
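
As a quick check on how point_equity_diff is computed, the snippet below redoes the arithmetic for a single row. The three input figures are copied from the ??LQX row of this notebook's output; nothing else is assumed.

```python
# Figures copied from the ??LQX row of this notebook's output.
points = 5123        # total points scored across all observations
equity = 12012.831   # total equity across all observations
count = 164          # number of observations

# Same formula as the point_equity_diff column: mean points minus mean equity.
point_equity_diff = (points - equity) / count
print(round(point_equity_diff, 6))
```

The result matches the point_equity_diff value shown for ??LQX in the table.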


In [12]:
# What's the minimum number of times you want to see a superleave before you'll take the
# value as is, without smoothing?
five_tile_superleave_cutoff = 100
six_tile_superleave_cutoff = 50

In [13]:
leaves_to_smooth = list(summary_df.loc[(summary_df['leave_len']==5) & 
    (summary_df['count']<five_tile_superleave_cutoff)]['leave'].values)
print(len(leaves_to_smooth))

leaves_to_smooth += list(summary_df.loc[(summary_df['leave_len']==6) &
    (summary_df['count']<six_tile_superleave_cutoff)]['leave'].values)
print(len(leaves_to_smooth))

4969
239415

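
"ev" is defined as the average equity of a superleave, minus the average equity over all plays in a run of simulated games. In this run that overall average is about 42.6 points (mean_equity minus ev, which is the same for every row in the output here).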

In [14]:
mean_equity = summary_df.loc[summary_df['leave']=='??']['mean_equity'].values[0] - \
    summary_df.loc[summary_df['leave']=='??']['ev'].values[0]
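
Since ev is just mean_equity shifted by a constant, the baseline can be recovered from any observed row, not only '??'. The sketch below checks this with (mean_equity, ev) pairs copied from this notebook's output; the dictionary names are illustrative.

```python
# (mean_equity, ev) pairs copied from rows of this notebook's output.
rows = {
    "?": (67.816898, 25.198703),
    "A": (43.900948, 1.282753),
    "??ESZ": (105.08576, 62.467565),
}

# ev = mean_equity - baseline, so baseline = mean_equity - ev for every row.
baselines = {leave: mean_eq - ev for leave, (mean_eq, ev) in rows.items()}
for leave, baseline in baselines.items():
    print(leave, round(baseline, 6))
```

All three rows give the same baseline, which is the value the cell above stores in mean_equity.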

In [15]:
summary_df = summary_df.set_index('leave')
smooth_ev_dict = summary_df['ev'].to_dict()

In [16]:
ev_dict = summary_df['ev'].to_dict()

In [17]:
t0 = time.time()

for i,leave in enumerate(leaves_to_smooth):
    if (i+1)%1000==0:
        print(i, time.time()-t0)
    
    smooth_ev_dict[leave] = calculate_smoothed_superleave(leave) - mean_equity

999 0.24864697456359863
1999 0.494337797164917
2999 0.7545878887176514
3999 1.0196728706359863
4999 1.282167673110962
5999 1.71657395362854
6999 2.065020799636841
7999 2.4082350730895996
8999 2.74455189704895
9999 3.094031810760498
10999 3.4341959953308105
11999 3.7790708541870117
12999 4.127179861068726
13999 4.460558891296387
14999 4.809697866439819
15999 5.136732816696167
16999 5.463489770889282
17999 5.790481805801392
18999 6.118754863739014
19999 6.455884695053101
20999 6.789628982543945
21999 7.156440734863281
22999 7.4989588260650635
23999 7.885048866271973
24999 8.218668699264526
25999 8.535114049911499
26999 8.846161842346191
27999 9.144171714782715
28999 9.447783946990967
29999 9.75968885421753
30999 10.080253839492798
31999 10.391347885131836
32999 10.73531699180603
33999 11.060918807983398
34999 11.398963928222656
35999 11.709194898605347
36999 12.04073977470398
37999 12.371803998947144
38999 12.708934783935547
39999 13.060337781906128
40999 13.392892837524414
41999 13.7135

In [28]:
pd.Series(smooth_ev_dict).to_csv('leave_values_' + run_date + '_smoothed.csv')


In [29]:
smoothed_ev = pd.Series(smooth_ev_dict,name='smoothed_ev')

In [30]:
summary_df = summary_df.drop('smoothed_ev', axis=1)
summary_df = pd.concat([summary_df,smoothed_ev],axis=1)

In [31]:
summary_df

leave,points,equity,count,bingo_count,mean_score,mean_equity,bingo pct,pct,adjusted_mean_score,ev,leave_len,point_equity_diff,smoothed_ev
?,808232438,9.814264e+08,14471709,7488398,55.849136,67.816898,51.745084,1582.257737,16.648082,25.198703,1,-11.967762,25.198703
A,1969983034,2.142377e+09,48800245,10282885,40.368302,43.900948,21.071380,5335.552642,1.167248,1.282753,1,-3.532646,1.282753
B,433317338,4.611732e+08,11644326,1593330,37.212745,39.604976,13.683317,1273.127099,-1.988308,-3.013219,1,-2.392230,-3.013219
C,567890854,6.164116e+08,14213437,2723586,39.954506,43.368229,19.162051,1554.019685,0.753452,0.750034,1,-3.413723,0.750034
D,972192302,1.044775e+09,24359309,4829780,39.910504,42.890165,19.827246,2663.313996,0.709450,0.271970,1,-2.979661,0.271970
...,...,...,...,...,...,...,...,...,...,...,...,...,...
??WXYY,43,8.311900e+01,1,0,43.000000,83.119000,0.000000,0.000109,3.798946,40.500805,6,-40.119000,32.039652
?WXYYZ,84,8.383600e+01,1,0,84.000000,83.836000,0.000000,0.000109,44.798946,41.217805,6,0.164000,18.028086
??WXYZ,50,8.135600e+01,1,0,50.000000,81.356000,0.000000,0.000109,10.798946,38.737805,6,-31.356000,34.208376
??WYYZ,0,0.000000e+00,0,0,,,,0.000000,,,6,,32.479705


In [34]:
summary_df['ev_delta'] = summary_df['smoothed_ev']-summary_df['ev']
summary_df['abs_ev_delta'] = summary_df['ev_delta'].abs()

In [36]:
summary_df.sort_values('abs_ev_delta', ascending=False)[:200]

leave,points,equity,count,bingo_count,mean_score,mean_equity,bingo pct,pct,adjusted_mean_score,ev,leave_len,point_equity_diff,smoothed_ev,ev_delta,abs_ev_delta
??FFGQ,284,284.0,1,1,284.0,284.0,100.0,0.000109,244.798946,241.381805,6,0.0,24.275402,-217.106403,217.106403
HKPQXZ,84,90.141,1,0,84.0,90.141,0.0,0.000109,44.798946,47.522805,6,-6.141,-6.01603,-53.538835,53.538835
??HHJS,135,135.0,1,1,135.0,135.0,100.0,0.000109,95.798946,92.381805,6,0.0,41.263334,-51.118472,51.118472
FFHJPZ,75,81.853,1,0,75.0,81.853,0.0,0.000109,35.798946,39.234805,6,-6.853,-10.137014,-49.371819,49.371819
HHQSSS,76,84.675,1,0,76.0,84.675,0.0,0.000109,36.798946,42.056805,6,-8.675,-7.054861,-49.111666,49.111666
BBHKVY,79,71.606,1,0,79.0,71.606,0.0,0.000109,39.798946,28.987805,6,7.394,-16.7722,-45.760006,45.760006
?FFQUX,80,103.654,1,0,80.0,103.654,0.0,0.000109,40.798946,61.035805,6,-23.654,16.808852,-44.226954,44.226954
JKQSYZ,78,83.572,1,0,78.0,83.572,0.0,0.000109,38.798946,40.953805,6,-5.572,-3.268969,-44.222775,44.222775
??ABBW,376,376.0,3,3,125.333333,125.333333,100.0,0.000328,86.13228,82.715139,6,0.0,38.541853,-44.173286,44.173286
JKQXYY,78,79.807,1,0,78.0,79.807,0.0,0.000109,38.798946,37.188805,6,-1.807,-5.631062,-42.819867,42.819867
