#### Background
This notebook quantifies the value of leaving a given number of tiles in the bag during the pre-endgame, using a repository of logged games. We will then implement these values as a pre-endgame heuristic in the Macondo speedy player to improve simulation quality.

Initial questions:
1. What is the probability that you will go out first if you make a play leaving N tiles in the bag?
2. What is the expected difference between your end-of-turn spread and end-of-game spread?
3. What is your win probability?

#### Implementation details
Similar

#### Assumptions
* We're only analyzing complete games

#### Next steps
* Standardize sign convention for spread.
* Start figuring out how to calculate pre-endgame spread.

#### Quackle values for reference
* 0,0.0
* 1,-8.0
* 2,0.0
* 3,-0.5
* 4,-2.0
* 5,-3.5
* 6,-2.0
* 7,2.0
* 8,10.0
* 9,7.0
* 10,4.0
* 11,-1.0
* 12,-2.0

#### Runtime
I was able to run this script on my local machine for ~20M rows in 2 minutes.

In [1]:
import csv
from datetime import date
import numpy as np
import pandas as pd
import time

log_folder = '../logs/'
log_file = log_folder + 'log_20200515_preendgames.csv'

todays_date = date.today().strftime("%Y%m%d")

In [2]:
final_spread_dict = {}
out_first_dict = {}
win_dict = {}

Store the final spread of each game for comparison. This assumes that the last row logged for a game is its final turn, so for each game ID we keep overwriting the dictionary entries until there are no more rows from that game.
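
As a minimal sketch of that overwrite pattern, the toy rows below show how the last row seen for each game ID determines what is stored, and how `(sign + 1) / 2` maps the spread onto a win flag of 0, 0.5, or 1. The rows are invented for illustration and use only four hypothetical fields, not the full log schema; the sign is computed in plain Python here, whereas the notebook cell uses `np.sign`.

```python
# Toy rows: (player, game_id, mover_total_score, opponent_total_score).
# These values are made up; the real log has more columns.
toy_rows = [
    ("p1", "gameA", 100, 90),
    ("p2", "gameA", 120, 100),   # last row for gameA: p2 leads 120-100
    ("p1", "gameB", 200, 200),   # last row for gameB: tied
]

final_spread = {}
last_mover = {}
win_flag = {}

for player, game_id, mover_score, opp_score in toy_rows:
    # Always store the spread from p1's point of view.
    spread = mover_score - opp_score
    if player != "p1":
        spread = -spread

    # Later rows overwrite earlier ones, so the last row per game wins.
    final_spread[game_id] = spread
    last_mover[game_id] = player

    # sign is -1, 0, or 1, which maps to a flag of 0.0, 0.5, or 1.0.
    sign = (spread > 0) - (spread < 0)
    win_flag[game_id] = (sign + 1) / 2

print(final_spread)   # {'gameA': -20, 'gameB': 0}
print(last_mover)     # {'gameA': 'p2', 'gameB': 'p1'}
print(win_flag)       # {'gameA': 0.0, 'gameB': 0.5}
```

Note that this only yields the true final spread if the rows of each game appear in turn order, which is the assumption stated above.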

In [3]:
t0 = time.time()

with open(log_file,'r') as f:
    moveReader = csv.reader(f)
    next(moveReader)
    
    for i,row in enumerate(moveReader):
        if (i+1)%1000000==0:
            print('Processed {} rows in {} seconds'.format(i+1, time.time()-t0))
            
        if i<10:
            print(row)
            
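        # row[6] is the mover's total score and row[11] the opponent's,
        # so both branches store the spread from p1's point of view.
        if row[0]=='p1':
            final_spread_dict[row[1]] = int(row[6])-int(row[11])
        else:
            final_spread_dict[row[1]] = int(row[11])-int(row[6])
            
        # The player on the last logged row is treated as the one who went out first.
        out_first_dict[row[1]] = row[0]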
        
        # This flag indicates whether p1 won or not, with 0.5 as the value if the game was tied.
        win_dict[row[1]] = (np.sign(final_spread_dict[row[1]])+1)/2

['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '1', 'CEIIORT', '8H TORIC', '20', '20', '5', 'EI', '21.853', '86', '0']
['p1', 'J3DH3KGZPuqMEDDcFXJauQ', '2', 'CDEIIJW', '9G JEW', '36', '36', '3', 'CDII', '33.265', '81', '20']
['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '3', 'EFIMOQT', '10F MOTE', '30', '50', '4', 'FIQ', '21.483', '78', '36']
['p1', 'J3DH3KGZPuqMEDDcFXJauQ', '4', 'ACDHIII', 'L7 A.IDIC', '22', '58', '5', 'HI', '22.994', '74', '50']
['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '5', 'BEFGIOQ', '11C BEFOG', '37', '87', '5', 'IQ', '32.050', '69', '58']
['p2', 'wvTRHqNYEEjCHJiuGDw3WM', '1', 'DEIOSTW', '8D WITED', '26', '26', '5', 'OS', '32.236', '86', '0']
['p1', 'J3DH3KGZPuqMEDDcFXJauQ', '6', 'AHIOORV', '12D HAO', '32', '90', '3', 'IORV', '29.935', '64', '87']
['p1', 'wvTRHqNYEEjCHJiuGDw3WM', '2', 'DENRTWZ', 'E7 W.ZEN', '34', '34', '4', 'DRT', '34.836', '81', '26']
['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '7', 'AINQRTV', 'J10 TRANQ', '45', '132', '5', 'IV', '40.426', '61', '90']
['p2', 'wvTRHqNYEEjCHJiuGDw3WM', '3', 'AB

In [4]:
preendgame_boundaries = [1,14] # inclusive [min, max] number of tiles in the bag at the start of a turn for it to count as pre-endgame
leftover_tile_range = range(preendgame_boundaries[0],preendgame_boundaries[1]+7)

end_of_turn_spread_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
final_spread_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
game_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
out_first_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
# start_of_turn_tiles_left_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
win_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
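
To make the counter keys concrete: the aggregation cell in this notebook computes each key as the bag count at the start of the turn (`row[10]`), minus the tiles played (`row[7]`), plus 7. The sketch below reproduces only that arithmetic. `bucket_key` is a hypothetical helper name, not something defined in the notebook.

```python
def bucket_key(bag_at_start: int, tiles_played: int) -> int:
    """Hypothetical helper mirroring the notebook's key arithmetic."""
    return bag_at_start - tiles_played + 7

# Smallest key: 1 tile in the bag, 7 tiles played.
print(bucket_key(1, 7))    # 1
# A play that uses exactly as many tiles as the bag holds maps to 7.
print(bucket_key(5, 5))    # 7
# 14 tiles in the bag, 1 tile played.
print(bucket_key(14, 1))   # 20

# Keys pre-allocated by the notebook: range(1, 14 + 7) covers 1..20.
leftover_tile_range = range(1, 14 + 7)
print(bucket_key(14, 1) in leftover_tile_range)   # True
print(bucket_key(14, 0) in leftover_tile_range)   # False, the key would be 21
```

One consequence of this arithmetic: if the log contained a turn that played zero tiles with 14 in the bag, the key would be 21, which is outside the pre-allocated range and would raise a `KeyError` on a plain dict.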

In [5]:
t0=time.time()
print('There are {} games'.format(len(final_spread_dict)))

with open(log_file,'r') as f:
    moveReader = csv.reader(f)
    next(moveReader)
    
    for i,row in enumerate(moveReader):
        if (i+1)%1000000==0:
            print('Processed {} rows in {} seconds'.format(i+1, time.time()-t0))
            
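        # row[10] is the bag size at the start of the turn; row[7] is the number of tiles played.
        if preendgame_boundaries[0] <= int(row[10]) <= preendgame_boundaries[1]:
            end_of_turn_tiles_left = int(row[10])-int(row[7])+7
            # Spread after this play, from the mover's point of view.
            end_of_turn_spread_counter[end_of_turn_tiles_left] += int(row[6])-int(row[11])
            game_counter[end_of_turn_tiles_left] += 1
            # Counts 1 when the mover is also the player on the game's last logged row.
            out_first_counter[end_of_turn_tiles_left] += out_first_dict[row[1]] == row[0]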
            
            if row[0]=='p1':
                final_spread_counter[end_of_turn_tiles_left] += final_spread_dict[row[1]]
                win_counter[end_of_turn_tiles_left] += win_dict[row[1]]
            else:
                final_spread_counter[end_of_turn_tiles_left] -= final_spread_dict[row[1]]
                win_counter[end_of_turn_tiles_left] += (1-win_dict[row[1]])

There are 817267 games
Processed 1000000 rows in 2.41697359085083 seconds
Processed 2000000 rows in 4.930365800857544 seconds
Processed 3000000 rows in 7.389007806777954 seconds
Processed 4000000 rows in 9.83132266998291 seconds
Processed 5000000 rows in 12.306525945663452 seconds
Processed 6000000 rows in 14.78326678276062 seconds
Processed 7000000 rows in 17.268573760986328 seconds
Processed 8000000 rows in 19.874756813049316 seconds
Processed 9000000 rows in 22.451932907104492 seconds
Processed 10000000 rows in 25.057535886764526 seconds
Processed 11000000 rows in 27.684056758880615 seconds
Processed 12000000 rows in 30.246875762939453 seconds
Processed 13000000 rows in 32.76583981513977 seconds
Processed 14000000 rows in 35.33682870864868 seconds
Processed 15000000 rows in 37.899295806884766 seconds
Processed 16000000 rows in 40.42449188232422 seconds
Processed 17000000 rows in 43.02092885971069 seconds
Processed 18000000 rows in 45.587676763534546 seconds


In [6]:
end_of_turn_spread_series = pd.Series(end_of_turn_spread_counter,name='end_of_turn_spread')
final_spread_series = pd.Series(final_spread_counter,name='final_spread')
game_series = pd.Series(game_counter,name='count')
out_first_series = pd.Series(out_first_counter, name='out_first_count')
win_series = pd.Series(win_counter, name='win_count')

In [7]:
df = pd.concat([end_of_turn_spread_series, final_spread_series, game_series,
                out_first_series, win_series],axis=1)

In [8]:
df['spread_delta'] = df['final_spread']-df['end_of_turn_spread']
df['avg_spread_delta'] = df['spread_delta']/df['count']
df['out_first_pct'] = 100*df['out_first_count']/df['count']
df['win_pct'] = 100*df['win_count']/df['count']
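
As a worked check of these formulas, the sketch below recomputes the derived columns for the first row of the results table shown later in this notebook, using plain Python arithmetic instead of pandas.

```python
# Totals copied from the first row of the results table in this notebook.
end_of_turn_spread = 1424482
final_spread = 815723
count = 26831
out_first_count = 23679
win_count = 16794.5

spread_delta = final_spread - end_of_turn_spread
avg_spread_delta = spread_delta / count
out_first_pct = 100 * out_first_count / count
win_pct = 100 * win_count / count

print(spread_delta)                 # -608759
print(round(avg_spread_delta, 2))   # -22.69
print(round(out_first_pct, 2))      # 88.25
print(round(win_pct, 2))            # 62.59
```

Because both spread totals are accumulated from the mover's point of view, a negative `avg_spread_delta` means the mover's spread at the end of the game was, on average, lower than it was right after the play.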

In [14]:
quackle_peg_dict = {
    1:-8.0,
    2:0.0,
    3:-0.5,
    4:-2.0,
    5:-3.5,
    6:-2.0,
    7:2.0,
    8:10.0,
    9:7.0,
    10:4.0,
    11:-1.0,
    12:-2.0
}

quackle_peg_series = pd.Series(quackle_peg_dict, name='quackle_values')

In [16]:
df = pd.concat([df,quackle_peg_series],axis=1)

In [18]:
df['quackle_macondo_delta'] = df['quackle_values']-df['avg_spread_delta']

In [19]:
df

| end_of_turn_tiles_left | end_of_turn_spread | final_spread | count | out_first_count | win_count | spread_delta | avg_spread_delta | out_first_pct | win_pct | quackle_values | quackle_macondo_delta |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1424482 | 815723 | 26831 | 23679 | 16794.5 | -608759 | -22.688644 | 88.252395 | 62.593642 | -8.0 | 14.688644 |
| 2 | 1562416 | 819930 | 35311 | 29503 | 21113.0 | -742486 | -21.027045 | 83.551868 | 59.791566 | 0.0 | 21.027045 |
| 3 | 2047038 | 659927 | 60696 | 43130 | 33153.0 | -1387111 | -22.853417 | 71.059048 | 54.621392 | -0.5 | 22.353417 |
| 4 | 2726961 | 133851 | 107692 | 62063 | 54591.5 | -2593110 | -24.078947 | 57.630093 | 50.692252 | -2.0 | 22.078947 |
| 5 | 3474924 | -640197 | 168474 | 77505 | 82125.0 | -4115121 | -24.425852 | 46.004131 | 48.746394 | -3.5 | 20.925852 |
| 6 | 3731121 | -1289327 | 206882 | 74351 | 98828.5 | -5020448 | -24.267205 | 35.938844 | 47.770468 | -2.0 | 22.267205 |
| 7 | 3714007 | -279585 | 211376 | 59454 | 104976.0 | -3993592 | -18.893309 | 28.127129 | 49.663159 | 2.0 | 20.893309 |
| 8 | 3670547 | 931117 | 210489 | 83786 | 108728.5 | -2739430 | -13.014599 | 39.805406 | 51.655193 | 10.0 | 23.014599 |
| 9 | 3798585 | 812144 | 209899 | 107310 | 107768.5 | -2986441 | -14.227991 | 51.124588 | 51.343027 | 7.0 | 21.227991 |
| 10 | 3788064 | 358082 | 209460 | 120511 | 105784.0 | -3429982 | -16.375356 | 57.534135 | 50.503199 | 4.0 | 20.375356 |


Save two versions of the pre-endgame heuristic values: the `avg_spread_delta` column on its own, and the full summary table.

In [10]:
df['avg_spread_delta'].to_csv('peg_heuristics_' + todays_date + '.csv')
df.to_csv('peg_summary_' + todays_date + '.csv')

