#### Background
This notebook seeks to quantify the value of leaving a certain number of tiles in the bag during the pre-endgame based on a repository of games. We will then implement these values as a pre-endgame heuristic in the Macondo speedy player to improve simulation quality.

Initial questions:
1. What is the probability that you will go out first if you make a play leaving N tiles in the bag?
2. What is the expected difference between your end-of-turn spread and end-of-game spread?
3. What's your win probability?
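Each question reduces to a ratio of per-bucket totals. As a rough sketch, the snippet below works them out by hand for a single bucket, using the totals from the first row of the results table at the end of this notebook. The variable names are illustrative only and are not part of the notebook's code.

```python
# Totals for one bucket, copied from the first row of the results table.
count = 14407                # plays that landed in this bucket
out_first_count = 12728      # plays whose mover also made the last logged move
win_count = 9017.0           # wins for the mover, with ties counted as 0.5
end_of_turn_spread = 764014  # summed spread right after the play
final_spread = 439546        # summed end-of-game spread

# Question 1: probability of going out first.
out_first_prob = out_first_count / count
# Question 2: expected change between end-of-turn and end-of-game spread.
avg_spread_delta = (final_spread - end_of_turn_spread) / count
# Question 3: win probability.
win_prob = win_count / count

print('{:.1%} out first, {:.1f} average spread change, {:.1%} wins'.format(
    out_first_prob, avg_spread_delta, win_prob))
```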

#### Implementation details
Similar

#### Assumptions
* We're only analyzing complete games

#### Next steps
* Standardize sign convention for spread.
* Start figuring out how to calculate pre-endgame spread
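
One possible way to standardize the sign convention is to route every spread calculation through a single helper. This is only a sketch: `spread_for_p1` is a hypothetical name, and it assumes the log layout seen in the sample rows further down (mover in column 0, mover's running score in column 6, opponent's score in column 11).

```python
def spread_for_p1(mover, mover_score, opponent_score):
    """Return the spread from p1's point of view (positive means p1 leads)."""
    mover_spread = mover_score - opponent_score
    return mover_spread if mover == 'p1' else -mover_spread

# Scores taken from the second and third sample rows of the log output.
print(spread_for_p1('p1', 36, 20))  # 16
print(spread_for_p1('p2', 50, 36))  # -14
```

Keeping the flip in one place would mean the per-mover branches in the loops only need to negate the result when the mover is p2.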

#### Quackle values for reference
* 0,0.0
* 1,-8.0
* 2,0.0
* 3,-0.5
* 4,-2.0
* 5,-3.5
* 6,-2.0
* 7,2.0
* 8,10.0
* 9,7.0
* 10,4.0
* 11,-1.0
* 12,-2.0

In [10]:
import csv
import numpy as np
import pandas as pd
import time

log_folder = '../logs/'
log_file = log_folder + 'log_10m_preendgames.csv'


In [15]:
final_spread_dict = {}
out_first_dict = {}
win_dict = {}

Store the final spread of each game for comparison. This assumes that the last row logged for a game is its final turn: for each game ID we keep overwriting the dictionary entries until that game has no more rows.
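
The overwrite pattern can be seen in isolation on a toy example. The rows below are made up for illustration and are not taken from the log. The pattern only needs each game's rows to appear in turn order, even when games are interleaved.

```python
# Toy (game_id, turn, spread) rows; the values are invented for illustration.
rows = [
    ('gameA', 1, 20),
    ('gameB', 1, 26),
    ('gameA', 2, 16),
    ('gameB', 2, 8),
    ('gameA', 3, 14),
]

last_spread = {}
for game_id, turn, spread in rows:
    # Later rows overwrite earlier ones, so only the last row per game survives.
    last_spread[game_id] = spread

print(last_spread)  # {'gameA': 14, 'gameB': 8}
```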

In [35]:
t0 = time.time()

with open(log_file,'r') as f:
    moveReader = csv.reader(f)
    next(moveReader)
    
    for i,row in enumerate(moveReader):
        if (i+1)%1000000==0:
            print('Processed {} rows in {} seconds'.format(i+1, time.time()-t0))
            
        if i<10:
            print(row)
            
        if row[0]=='p1':
            final_spread_dict[row[1]] = int(row[6])-int(row[11])
        else:
            final_spread_dict[row[1]] = int(row[11])-int(row[6])
            
        out_first_dict[row[1]] = row[0]
        
        # This flag indicates whether p1 won or not, with 0.5 as the value if the game was tied.
        win_dict[row[1]] = (np.sign(final_spread_dict[row[1]])+1)/2

['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '1', 'CEIIORT', '8H TORIC', '20', '20', '5', 'EI', '21.853', '86', '0']
['p1', 'J3DH3KGZPuqMEDDcFXJauQ', '2', 'CDEIIJW', '9G JEW', '36', '36', '3', 'CDII', '33.265', '81', '20']
['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '3', 'EFIMOQT', '10F MOTE', '30', '50', '4', 'FIQ', '21.483', '78', '36']
['p1', 'J3DH3KGZPuqMEDDcFXJauQ', '4', 'ACDHIII', 'L7 A.IDIC', '22', '58', '5', 'HI', '22.994', '74', '50']
['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '5', 'BEFGIOQ', '11C BEFOG', '37', '87', '5', 'IQ', '32.050', '69', '58']
['p2', 'wvTRHqNYEEjCHJiuGDw3WM', '1', 'DEIOSTW', '8D WITED', '26', '26', '5', 'OS', '32.236', '86', '0']
['p1', 'J3DH3KGZPuqMEDDcFXJauQ', '6', 'AHIOORV', '12D HAO', '32', '90', '3', 'IORV', '29.935', '64', '87']
['p1', 'wvTRHqNYEEjCHJiuGDw3WM', '2', 'DENRTWZ', 'E7 W.ZEN', '34', '34', '4', 'DRT', '34.836', '81', '26']
['p2', 'J3DH3KGZPuqMEDDcFXJauQ', '7', 'AINQRTV', 'J10 TRANQ', '45', '132', '5', 'IV', '40.426', '61', '90']
['p2', 'wvTRHqNYEEjCHJiuGDw3WM', '3', 'AB

In [36]:
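# Inclusive range of start-of-turn bag counts that we treat as pre-endgame.
preendgame_boundaries = [1,14]
# Counter keys are (bag count - tiles played + 7), so this range covers keys 1 through 20.
leftover_tile_range = range(preendgame_boundaries[0],preendgame_boundaries[1]+7)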

end_of_turn_spread_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
final_spread_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
game_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
out_first_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
# start_of_turn_tiles_left_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}
win_counter = {remaining_tile_count:0 for remaining_tile_count in leftover_tile_range}

In [41]:
t0=time.time()
print('There are {} games'.format(len(final_spread_dict)))

with open(log_file,'r') as f:
    moveReader = csv.reader(f)
    next(moveReader)
    
    for i,row in enumerate(moveReader):
        if (i+1)%1000000==0:
            print('Processed {} rows in {} seconds'.format(i+1, time.time()-t0))
            
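        # row[10] is the bag count at the start of the turn; row[7] is the number of tiles played.
        if preendgame_boundaries[0] <= int(row[10]) <= preendgame_boundaries[1]:
            # The +7 offset keeps the key positive when a play uses more tiles than the bag holds.
            end_of_turn_tiles_left = int(row[10])-int(row[7])+7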
            end_of_turn_spread_counter[end_of_turn_tiles_left] += int(row[6])-int(row[11])
            game_counter[end_of_turn_tiles_left] += 1        
            out_first_counter[end_of_turn_tiles_left] += out_first_dict[row[1]] == row[0]
            
            if row[0]=='p1':
                final_spread_counter[end_of_turn_tiles_left] += final_spread_dict[row[1]]
                win_counter[end_of_turn_tiles_left] += win_dict[row[1]]
            else:
                final_spread_counter[end_of_turn_tiles_left] -= final_spread_dict[row[1]]
                win_counter[end_of_turn_tiles_left] += (1-win_dict[row[1]])

There are 435135 games
Processed 1000000 rows in 2.4444141387939453 seconds
Processed 2000000 rows in 4.92453408241272 seconds
Processed 3000000 rows in 7.336021184921265 seconds
Processed 4000000 rows in 9.86043095588684 seconds
Processed 5000000 rows in 12.243863821029663 seconds
Processed 6000000 rows in 14.6528480052948 seconds
Processed 7000000 rows in 17.034505128860474 seconds
Processed 8000000 rows in 19.488646984100342 seconds
Processed 9000000 rows in 21.911154985427856 seconds
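
The counter keys are offset by 7, which makes them a little awkward to read. The sketch below shows one way to interpret a key. `bucket_key` and `describe` are hypothetical helpers, and reading the offset as the rack size (with the mover starting the turn on a full rack of 7) is an assumption rather than something the notebook states.

```python
RACK_SIZE = 7  # assumption: the +7 offset in the loop corresponds to the rack size

def bucket_key(bag_before, tiles_played):
    """Counter key: bag count before the play, minus tiles played, plus 7."""
    return bag_before - tiles_played + RACK_SIZE

def describe(key):
    """Read a key as tiles left in the bag, or as tiles left on the mover's rack."""
    diff = key - RACK_SIZE
    if diff >= 0:
        return '{} tile(s) left in the bag'.format(diff)
    # The play used more tiles than the bag held, so the rack cannot be refilled.
    return 'bag emptied, mover holds {} tile(s)'.format(RACK_SIZE + diff)

print(bucket_key(1, 7), describe(bucket_key(1, 7)))  # 1 bag emptied, mover holds 1 tile(s)
print(bucket_key(5, 5), describe(bucket_key(5, 5)))  # 7 0 tile(s) left in the bag
print(bucket_key(9, 1), describe(bucket_key(9, 1)))  # 15 8 tile(s) left in the bag
```

Under this reading, key 7 marks a play that exactly empties the bag, and keys below 7 mark plays after which the mover holds fewer than seven tiles.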


In [52]:
end_of_turn_spread_series = pd.Series(end_of_turn_spread_counter,name='end_of_turn_spread')
final_spread_series = pd.Series(final_spread_counter,name='final_spread')
game_series = pd.Series(game_counter,name='count')
out_first_series = pd.Series(out_first_counter, name='out_first_count')
win_series = pd.Series(win_counter, name='win_count')

In [54]:
df = pd.concat([end_of_turn_spread_series, final_spread_series, game_series,
                out_first_series, win_series],axis=1)

In [64]:
df['spread_delta'] = df['final_spread']-df['end_of_turn_spread']
df['avg_spread_delta'] = df['spread_delta']/df['count']
df['out_first_pct'] = 100*df['out_first_count']/df['count']
df['win_pct'] = 100*df['win_count']/df['count']

In [65]:
df

end_of_turn_tiles_left,end_of_turn_spread,final_spread,count,out_first_count,win_count,spread_delta,avg_spread_delta,win_pct,out_first_pct
1,764014,439546,14407,12728,9017.0,-324468,-22.521552,62.587631,88.345943
2,826904,431920,18802,15694,11255.0,-394984,-21.007552,59.860653,83.469844
3,1099034,358649,32286,22877,17654.5,-740385,-22.932076,54.681596,70.857338
4,1474131,87250,57808,33256,29386.0,-1386881,-23.99116,50.833795,57.52837
5,1835052,-340485,89338,41193,43554.5,-2175537,-24.351754,48.752491,46.109158
6,2049874,-628950,110027,39466,52682.0,-2678824,-24.346969,47.880975,35.869378
7,1938865,-171198,112459,31868,55654.0,-2110063,-18.762954,49.488258,28.337439
8,1931896,463866,112324,44887,57820.0,-1468030,-13.069602,51.476087,39.962074
9,1993838,400396,111614,57026,57287.5,-1593442,-14.276363,51.326447,51.092157
10,2016971,194976,111333,63959,56383.0,-1821995,-16.365274,50.643565,57.448376
