# Defining a value function via Monte Carlo

This notebook illustrates the mechanics of estimating a value function via Monte Carlo.
To do so, we create many instances of `ConnectFour` and, in each one, let both players make random moves until the game ends. The value is then obtained by averaging the outcomes of these games, where a win counts as 1 and a loss as 0.
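
As a compact way to state the averaging step, the estimate after $N$ playouts can be written as a sample mean. The symbols $\hat{V}_N$ and $v_i$ are notation introduced here for illustration and do not appear in the code:

$$
\hat{V}_N = \frac{1}{N} \sum_{i=1}^{N} v_i, \qquad v_i \in [0, 1],
$$

where $v_i$ is the score of the $i$-th random game.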

In [2]:
import sys
sys.path.append('..')

from connectfour import ConnectFour
import random
import numpy as np
from time import time

In [3]:
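def take_random_move(connect_four):
    # Pick one of the currently possible moves uniformly at random and play it.
    move = random.choice(connect_four.possible_moves())
    connect_four.place(move)

def playout(connect_four):
    # Keep making random moves for both sides until the game is over.
    while not connect_four.game_over():
        take_random_move(connect_four)

def determine_one_value():
    # Play one random game from a fresh board and score its outcome.
    connect_four = ConnectFour()
    playout(connect_four)

    # Assuming winner() returns 1 or -1 (and a falsy value otherwise),
    # this rescales the outcome to 1.0 or 0.0 (and 0.5 otherwise).
    return ((connect_four.winner() or 0) + 1) / 2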

[determine_one_value() for _ in range(10)]

[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
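
The return expression in `determine_one_value` can be checked in isolation. The sketch below assumes that `winner()` yields `1`, `-1`, or a falsy value such as `None` or `0` for a game without a winner; the `connectfour` module is not shown here, so this is an assumption. The helper name `outcome_to_value` is hypothetical.

```python
def outcome_to_value(winner):
    """Map a game outcome to a value in [0, 1].

    Assumption (not confirmed by the notebook): winner is 1, -1,
    or a falsy value (None or 0) when nobody won.
    """
    # `winner or 0` replaces a falsy outcome with 0 before rescaling.
    return ((winner or 0) + 1) / 2

print([outcome_to_value(w) for w in (1, -1, None, 0)])
# prints [1.0, 0.0, 0.5, 0.5]
```

Under this assumption, a game without a winner would contribute 0.5 to the average rather than being dropped.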

# Convergence
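We now illustrate how the estimate converges as the number of random playouts grows. The runtime grows roughly linearly with the number of playouts, at about 2 ms per game in the run shown here.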

In [4]:
def determine_value(steps):
    # Average `steps` independent random playouts; also return the elapsed wall-clock time in seconds.
    start_time = time()
    return np.mean([determine_one_value() for _ in range(steps)]), time() - start_time

for steps in [1, 10, 100, 1000, 10000]:
    v, t = determine_value(steps)
    print('Steps {:5} value {:2.3f} time {:2.3f}'.format(steps, v, t))

Steps     1 value 0.000 time 0.003
Steps    10 value 0.400 time 0.024
Steps   100 value 0.630 time 0.191
Steps  1000 value 0.580 time 1.925
Steps 10000 value 0.541 time 19.503
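
To judge how far such an estimate may still be from its limit, one option is to report the standard error of the mean alongside it, which shrinks roughly like $1/\sqrt{N}$ for independent playouts. The following is a minimal sketch, not part of the original notebook: `estimate_with_error` is a hypothetical helper, and `fake_playout` is a stand-in for `determine_one_value()` (a biased coin with an arbitrarily chosen win probability of 0.55) so that the block runs without the `connectfour` module.

```python
import numpy as np

def estimate_with_error(sample_fn, steps):
    """Return the Monte Carlo mean and its standard error over `steps` samples."""
    samples = np.array([sample_fn() for _ in range(steps)], dtype=float)
    mean = samples.mean()
    # Standard error of the mean: sample standard deviation (ddof=1) over sqrt(n).
    # It is undefined for a single sample, so report NaN in that case.
    stderr = samples.std(ddof=1) / np.sqrt(steps) if steps > 1 else float('nan')
    return mean, stderr

# Hypothetical stand-in for determine_one_value(): 1.0 with probability 0.55, else 0.0.
rng = np.random.default_rng(0)

def fake_playout():
    return float(rng.random() < 0.55)

for steps in [10, 100, 1000, 10000]:
    mean, stderr = estimate_with_error(fake_playout, steps)
    print('Steps {:5} value {:.3f} +/- {:.3f}'.format(steps, mean, stderr))
```

In the notebook itself, passing `determine_one_value` as `sample_fn` would give the same kind of error bar for the Connect Four estimate.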
