#### Background
This notebook estimates, from a repository of logged games, how valuable it is to leave a given number of tiles in the bag during the pre-endgame. The resulting values will then be implemented as a pre-endgame heuristic in the Macondo speedy player to improve simulation quality.

Initial questions:
1. What is the probability that you will go out first if you make a play leaving N tiles in the bag?
2. (slightly harder) What is the expected improvement in end-of-game spread after making a play that leaves N tiles in the bag?
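
Question 1 reduces to a conditional frequency: among plays that leave N tiles in the bag, the fraction made by the player who eventually went out first. A minimal sketch of that estimate, assuming a hypothetical list of `(tiles_left, went_out_first)` observations (the function name and the sample data are made up for illustration and are not taken from the log):

```python
from collections import defaultdict

def go_out_probability(observations):
    """Estimate P(go out first | N tiles left in the bag).

    `observations` is a hypothetical iterable of (tiles_left, went_out_first)
    pairs, one per pre-endgame play, where went_out_first is True or False.
    """
    counts = defaultdict(lambda: [0, 0])  # tiles_left -> [times went out first, total plays]
    for tiles_left, went_out_first in observations:
        counts[tiles_left][0] += int(went_out_first)
        counts[tiles_left][1] += 1
    return {n: out / total for n, (out, total) in sorted(counts.items())}

# Made-up observations, for illustration only
sample = [(1, True), (1, False), (1, True), (2, False), (2, True), (3, False)]
probabilities = go_out_probability(sample)
```

Question 2 could follow the same grouping, averaging a spread delta per value of N instead of counting outcomes.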

#### Implementation details
We'll need to make several passes through the log file to obtain the following:
* Final spread of each simulated game
* Delta between pre-endgame and final spread

#### Assumptions
* We're only analyzing complete games
* The last two rows in the log file for a given game ID are also the last two turns of the game
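
The second assumption means the final two turns of each game can be collected in one pass by keeping only the two most recent rows per game ID. A minimal sketch using `collections.deque` with `maxlen=2`; the function name, the three-column sample log, and its header names are hypothetical, and only the position of the game ID (column 1) follows the log used in this notebook:

```python
import csv
import io
from collections import defaultdict, deque

def last_two_rows_per_game(rows, game_id_col=1):
    """Keep the two most recent rows seen for each game ID."""
    history = defaultdict(lambda: deque(maxlen=2))
    for row in rows:
        history[row[game_id_col]].append(row)  # older rows fall off the left
    return history

# Tiny made-up log with interleaved games, for illustration only
sample_log = io.StringIO(
    "player,game_id,turn\n"
    "0,gameA,1\n"
    "1,gameA,2\n"
    "0,gameB,1\n"
    "0,gameA,3\n"
)
reader = csv.reader(sample_log)
next(reader)  # skip the header row
tails = last_two_rows_per_game(reader)

# Games with fewer than two logged rows cannot yield a final spread
complete = {game_id: tuple(rows) for game_id, rows in tails.items() if len(rows) == 2}
```

Filtering on `len(rows) == 2` drops games that have only one logged row, which would otherwise have no second-to-last move to compare against.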

#### Next steps
* Standardize the sign convention for spread
* Work out how to calculate pre-endgame spread
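
One possible starting point for the pre-endgame spread, sketched under the assumption that each row carries the mover's player ID and that player's cumulative score (which is how the code below uses `row[0]` and `row[6]`): track the latest total for each player and take the difference from the mover's point of view. The function name is hypothetical; the sample totals are the cumulative scores from the first four logged rows of game `pCdgpioJfST7J6AcwFPhwi`.

```python
def spread_after_each_move(moves):
    """Return the mover's spread after each move of a single game.

    `moves` is a list of (player, cumulative_score) pairs in turn order,
    with player being 0 or 1.
    """
    totals = {0: 0, 1: 0}
    spreads = []
    for player, cumulative_score in moves:
        totals[player] = cumulative_score
        # Mover's total minus the opponent's most recent total
        spreads.append(totals[player] - totals[1 - player])
    return spreads

sample_moves = [(0, 72), (1, 33), (0, 83), (1, 67)]
mover_spreads = spread_after_each_move(sample_moves)
```

Reporting the spread from the mover's perspective is only one convention; multiplying by a per-player sign would convert it to a fixed player 1 minus player 2 convention.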

In [1]:
import csv
import pandas as pd

log_folder = '../logs/'
log_file = log_folder + 'log_20200411_short.csv'

In [2]:
second_to_last_move_dict = {}
last_move_dict = {}
spread_dict = {}
p1_minus_p2_spread_dict = {}
preendgame_dict = {}

In [13]:
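n = 100000  # maximum number of log rows to read

with open(log_file, 'r') as f:
    moveReader = csv.reader(f)
    next(moveReader)  # skip the header row

    for i, row in enumerate(moveReader):
        # Debugging output: show the first 100 rows and their last column
        if i < 100:
            print(row)
            print(row[10])
            print(row[10])

        if i == n:
            break

        # row[1] is the game ID; keep the two most recent rows seen for each game
        if row[1] in last_move_dict:
            second_to_last_move_dict[row[1]] = last_move_dict[row[1]]

        last_move_dict[row[1]] = row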


['0', 'pCdgpioJfST7J6AcwFPhwi', '1', 'AUP?NAL', '8D PLANUlA', '72', '72', '7', '', '72.000', '86']
86
86
['1', 'pCdgpioJfST7J6AcwFPhwi', '2', 'BZONDDR', 'J8 .DZ', '33', '33', '2', 'BDNOR', '31.475', '79']
79
79
['0', 'pCdgpioJfST7J6AcwFPhwi', '3', 'EHEEIOI', 'G5 HEI.IE', '11', '83', '5', 'EO', '10.338', '77']
77
77
['1', 'pCdgpioJfST7J6AcwFPhwi', '4', 'GFBDNOR', 'F10 FROND', '34', '67', '5', 'BG', '29.018', '72']
72
72
['0', 'qnzjBBfJvUqBoehxRpX7SA', '1', 'ESRNLUI', '(exch LU)', '0', '0', '2', 'EINRS', '20.276', '86']
86
86
['1', 'qnzjBBfJvUqBoehxRpX7SA', '2', 'IEYKHPI', '8E PIKI', '20', '20', '4', 'EHY', '22.207', '86']
86
86
['0', 'qnzjBBfJvUqBoehxRpX7SA', '3', 'AEEINRS', 'E6 NA.ERIES', '70', '70', '7', '', '70.000', '82']
82
82
['0', 'pCdgpioJfST7J6AcwFPhwi', '5', 'NILMREO', '6F L.MONIER', '64', '147', '7', '', '64.000', '67']
67
67
['1', 'pCdgpioJfST7J6AcwFPhwi', '6', 'SAVBTBG', '15F STAB', '28', '95', '4', 'BGV', '17.250', '60']
60
60
['1', 'qnzjBBfJvUqBoehxRpX7SA', '4', 'TGPAEHY'

In [4]:
# Whoever made the final move went out first; row[0] is the player ID ('0' or '1')
went_out_first_dict = {game_id: row[0] for game_id, row in last_move_dict.items()}

# Sanity check: player 1 (ID '0') should go out a bit more often.
# The IDs are strings, so index the counts by the label '0' rather than by position.
print('Analyzing {} games, player 1 went out first {}% of the time'.format(
    len(went_out_first_dict),
    100*pd.Series(went_out_first_dict).value_counts(normalize=True)['0']))

Analyzing 4391 games, player 1 went out first 50.785698018674566% of the time


In [5]:
for game_id in last_move_dict.keys():
    last_move = last_move_dict[game_id]
    second_to_last_move = second_to_last_move_dict[game_id]

    # Spread from the point of view of whoever made the final move
    spread_dict[game_id] = int(last_move[6]) - int(second_to_last_move[6])

    # Flip the sign when player ID '1' made the final move,
    # so the result is always player 1 minus player 2
    p1_minus_p2_spread_dict[game_id] = spread_dict[game_id] * \
        -(int(last_move[0]) - int(second_to_last_move[0]))
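
The sign handling in the cell above can be checked in isolation. The sketch below wraps the same arithmetic in a hypothetical helper, assuming player ID `0` is player 1 (as the sign factor implies) and using made-up rows that fill in only the player ID (column 0) and cumulative score (column 6):

```python
def spreads_from_last_two(last_row, prev_row, player_col=0, score_col=6):
    """Return (spread for whoever moved last, player 1 minus player 2 spread)."""
    went_out_spread = int(last_row[score_col]) - int(prev_row[score_col])
    # +1 when player ID 0 made the last move, -1 when player ID 1 did
    sign = int(prev_row[player_col]) - int(last_row[player_col])
    return went_out_spread, went_out_spread * sign

# Made-up rows, for illustration only
row_p1_450 = ['0', 'gameX', '19', '', '', '30', '450']
row_p2_400 = ['1', 'gameX', '20', '', '', '10', '400']

# Player ID 1 moves last and trails by 50
example_a = spreads_from_last_two(last_row=row_p2_400, prev_row=row_p1_450)
# Player ID 0 moves last and leads by 50
example_b = spreads_from_last_two(last_row=row_p1_450, prev_row=row_p2_400)
```

In both cases the second value is the same, because player 1 is ahead by 50 regardless of who made the final move. If both rows came from the same player, the sign would be 0 and the second value would collapse to 0.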

In [6]:
print('The person who went out first won by an average of {} points'.format(pd.Series(spread_dict).mean()))
print('Player 1 won by an average of {} points'.format(pd.Series(p1_minus_p2_spread_dict).mean()))

The person who went out first won by an average of 16.495786836711456 points
Player 1 won by an average of 14.645638806649966 points


In [11]:
p1_minus_p2_spread_dict['pCdgpioJfST7J6AcwFPhwi']

58

In [8]:
second_to_last_move_dict['pCdgpioJfST7J6AcwFPhwi']

['0',
 'pCdgpioJfST7J6AcwFPhwi',
 '19',
 'QIECITU',
 'B10 QUIET',
 '79',
 '520',
 '5',
 'CI',
 '61.000',
 '0']