## Investigating two different approaches to playing tic-tac-toe:
### 1. Simple play: a greedy approach where information isn't shared between nodes
### 2. Stochastic play: a greedy approach where information is shared between nodes
### 3. Comparison of simple play vs stochastic play

#### Note: One thing we'll investigate is whether there is a first-mover advantage, i.e. whether the first mover can afford to be careless about their first move while the second player can't.

In [2]:
## load everything:

import numpy as np
from simple_play import simple_play
from stochastic_play import stochastic_play

### player A always moves first, which leaves four possible pairings;
### a two-element 0/1 vector (e.g. [0,1]) denotes the pairing of
### interest, where 0 = simple play and 1 = stochastic play:
## 1. simple(player A) vs simple(player B)          -> [0,0]
## 2. stochastic(player A) vs stochastic(player B)  -> [1,1]
## 3. simple(player A) vs stochastic(player B)      -> [0,1]
## 4. stochastic(player A) vs simple(player B)      -> [1,0]

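def game_simulation(player_combo,num_games,random_start,depth,gamma):
    """Play num_games games and record how each one ends for player A.

    player_combo : two-element 0/1 array, [player A, player B]
                   (0 = simple play, 1 = stochastic play)
    random_start : 1.0 -> player B's first move is random,
                   otherwise player B chooses it with its own policy
    depth, gamma : passed unchanged to simple_play / stochastic_play

    Returns the board after the two opening moves of every game and
    the outcomes array (1.0 = A wins, 0.5 = draw, -1.0 = A loses).
    """

    outcomes = np.zeros(num_games)

    initial_conditions = []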
    
    for i in range(num_games):
        
        game = 1.0
        
        Z = np.zeros((3,3))
        ## draw two distinct cells (flat indices 0-8) for the opening moves:
        X, O = np.random.choice(np.arange(9),2,replace=False)
        Z[int(X/3)][X % 3] = 1.0
        
        if random_start == 1.0:
            ## the second player plays randomly:
            Z[int(O/3)][O % 3] = -1.0
            
        else:
            ## the second player doesn't play randomly:
            if player_combo[1] == 1.0:
                P2 = stochastic_play(-1.0*Z,depth,gamma)
                Z += -1.0*P2.move()
            else:   
                P2 = simple_play(-1.0*Z,depth,gamma)
                Z += -1.0*P2.move()
        
        initial_conditions.append(np.copy(Z))
    
        while game == 1.0:
            ## player A move:
            if player_combo[0] == 1.0:
                P1 = stochastic_play(Z,depth,gamma)
                Z += P1.move()
            else:  
                P1 = simple_play(Z,depth,gamma)
                Z += P1.move()
            
            ## a non-zero score means the game is over:
            result = P1.score(Z)[1]
            if result != 0.0:
                outcomes[i] = result
                game = 0.0
                break
            
            ## player B move:
            if player_combo[1] == 1.0:
                P2 = stochastic_play(-1.0*Z,depth,gamma)
                Z += -1.0*P2.move()
            else:   
                P2 = simple_play(-1.0*Z,depth,gamma)
                Z += -1.0*P2.move()
            
    return initial_conditions, outcomes
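
The board handling in `game_simulation` can be illustrated in isolation. The sketch below is only an illustration: the helper name `place` is hypothetical, and the reading of `-1.0*Z` as "show the board from player B's perspective" is an assumption based on how the function calls the players, not something confirmed by `simple_play` or `stochastic_play`.

```python
import numpy as np

def place(board, cell, mark):
    """Return a copy of `board` with `mark` written at flat index `cell` (0-8)."""
    row, col = divmod(cell, 3)  # same as (cell // 3, cell % 3)
    out = board.copy()
    out[row, col] = mark
    return out

Z = np.zeros((3, 3))
Z = place(Z, 5, 1.0)    # player A (+1) takes row 1, column 2
Z = place(Z, 0, -1.0)   # player B (-1) takes row 0, column 0

# Assumed convention: negating the board swaps the two players' marks,
# so whoever is about to move always sees its own pieces as +1.
view_for_B = -1.0 * Z
```

Under this assumption, both players can share the same move-selection code, and player B's chosen move is negated again before being added back to `Z`.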

## 1. Analysing simple play via self-play: 
### a. Distribution of outcomes when second mover plays randomly
### b. Distribution of outcomes when the second player is non-random

In [2]:
player_combo = np.array([0.0,0.0])
num_games = 100
random_start = 1.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)



In [3]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.69, 0.09, 0.22)

In [4]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio

3.1363636363636362

### When player B makes their first move randomly, there appears to be a distinct first-mover advantage: player A wins about 3.1 times as often as it loses.
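
The three `np.mean` calls used throughout can be bundled into one helper. This is a minimal sketch, assuming the outcome encoding seen in the cells above (1.0 = win, 0.5 = draw, -1.0 = loss for player A); the name `summarize` and the example array are made up for illustration and are not simulation results.

```python
import numpy as np

def summarize(outcomes):
    """Return the fraction of games player A won, drew, and lost."""
    outcomes = np.asarray(outcomes, dtype=float)
    return {
        "win": float(np.mean(outcomes == 1.0)),
        "draw": float(np.mean(outcomes == 0.5)),
        "loss": float(np.mean(outcomes == -1.0)),
    }

# Made-up outcomes for four games: two wins, one draw, one loss.
example = np.array([1.0, 1.0, 0.5, -1.0])
summary = summarize(example)
print(summary)
```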

In [5]:
player_combo = np.array([0.0,0.0])
num_games = 100
random_start = 0.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)

In [6]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.36, 0.2, 0.44)

In [7]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio

0.8181818181818181

### The first-mover advantage disappears when the second player chooses their first move deliberately: the win-loss ratio falls to about 0.82.

## 2. Analysing stochastic play via self-play: 
### a. Distribution of outcomes when second mover plays randomly
### b. Distribution of outcomes when the second player is non-random

In [2]:
player_combo = np.array([1.0,1.0])
num_games = 100
random_start = 1.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)

In [3]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.82, 0.15, 0.03)

In [4]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio

27.333333333333332

### As we will see later, the stochastic player is a stronger player than the simple player, and here there is again empirical evidence of a first-mover advantage.

In [5]:
player_combo = np.array([1.0,1.0])
num_games = 100
random_start = 0.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)

In [6]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.8, 0.18, 0.02)

In [7]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio 

40.0

### It appears that for the stochastic player (which doesn't play perfectly) there is again a first-mover advantage, even when
### player B's first move is non-random. I suspect that this advantage would tend to zero as the strength of the second player
### approached perfection.
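
This suspicion is consistent with the known result that tic-tac-toe is a draw when both sides play perfectly. The sketch below checks that with a plain exhaustive minimax search. It is independent of `simple_play` and `stochastic_play`; the names `LINES`, `winner` and `value` are hypothetical, and a draw is scored as 0 here rather than the 0.5 used in the simulations above.

```python
from functools import lru_cache

# The eight winning lines, as flat indices into a 9-cell board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return +1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

@lru_cache(maxsize=None)
def value(board, player):
    """Value for player A (+1) when `player` is to move and both play perfectly."""
    w = winner(board)
    if w != 0:
        return w
    if 0 not in board:
        return 0  # board full, nobody won
    results = []
    for i, cell in enumerate(board):
        if cell == 0:
            child = board[:i] + (player,) + board[i + 1:]
            results.append(value(child, -player))
    # Player A maximises the value, player B minimises it.
    return max(results) if player == 1 else min(results)

empty = (0,) * 9
print(value(empty, 1))  # 0: a draw under perfect play
```

Against such an opponent no opening move by player A leads to a forced win, which is the limiting case the paragraph above has in mind.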

## 3. Stochastic player vs Simple player where stochastic player moves first:
### a. Simple player's first move is random
### b. Simple player's first move is non-random

In [3]:
player_combo = np.array([1.0,0.0])
num_games = 100
random_start = 1.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)

In [4]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.66, 0.27, 0.07)

In [5]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio

9.428571428571429

### There's a clear first-mover advantage, but the win-loss ratio (about 9.4) is well below the one for stochastic self-play (about 27.3), though above the one for simple self-play (about 3.1). They say that styles make fights, but neither of these players makes an effort to adapt to the other, so I suspect that the ratio would be similar if the setup were reversed, i.e. if the simple player moved first.

In [6]:
player_combo = np.array([1.0,0.0])
num_games = 100
random_start = 0.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)

In [7]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.65, 0.23, 0.12)

In [8]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio

5.416666666666667

### The stochastic player still performs well when the simple player's first move is non-random, but the margin is smaller (a win-loss ratio of about 5.4 instead of 9.4). Let's see what happens when the roles are reversed.

## 4. Simple player vs Stochastic player where simple player moves first:
### a. Stochastic player's first move is random
### b. Stochastic player's first move is non-random

In [9]:
player_combo = np.array([0.0,1.0])
num_games = 100
random_start = 1.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)



In [10]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.65, 0.31, 0.04)

In [11]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio

16.25

### Interestingly, the first-mover advantage here (a win-loss ratio of 16.25) is more evident than when the stochastic player moved first against the simple player (about 9.4). It might be that the stochastic player is somehow the weaker of the two.
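
A caveat when comparing these ratios: with 100 games each, the loss counts are small (7 losses in section 3a, 4 losses here), so the ratio moves a lot with one or two games. The sketch below is a rough check under a normal approximation to the binomial, which is itself crude for counts this small; the helper name `proportion_se` is hypothetical.

```python
import math

def proportion_se(p, n):
    """Standard error of a proportion p estimated from n independent games."""
    return math.sqrt(p * (1.0 - p) / n)

n = 100
loss_stochastic_first = 0.07  # section 3a: stochastic moves first, random reply
loss_simple_first = 0.04      # section 4a: simple moves first, random reply

diff = loss_stochastic_first - loss_simple_first
# Standard error of the difference between two independent proportions.
se_diff = math.sqrt(proportion_se(loss_stochastic_first, n) ** 2
                    + proportion_se(loss_simple_first, n) ** 2)

print(round(diff, 3), round(se_diff, 3))
```

The difference in loss fractions comes out smaller than its standard error, so under these assumptions the two setups are not clearly distinguishable at this sample size; more games would be needed to say which player is stronger.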

In [12]:
player_combo = np.array([0.0,1.0])
num_games = 100
random_start = 0.0
depth = 5
gamma = 0.5

_, outcomes = game_simulation(player_combo,num_games,random_start,depth,gamma)



In [13]:
np.mean(outcomes == 1.0), np.mean(outcomes == 0.5),np.mean(outcomes == -1.0) ## win, draw, loss fractions

(0.68, 0.32, 0.0)

In [14]:
np.mean(outcomes == 1.0)/np.mean(outcomes == -1.0) ## win-loss ratio



inf
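
The `inf` arises because player A never lost in this run, so the ratio divides by zero. A minimal sketch of a guarded version follows; the function name `win_loss_ratio` is hypothetical, and returning `inf` or `nan` for the degenerate cases is one possible convention, not the only one.

```python
import numpy as np

def win_loss_ratio(outcomes):
    """Wins divided by losses for player A, without a divide-by-zero warning."""
    outcomes = np.asarray(outcomes, dtype=float)
    wins = int(np.sum(outcomes == 1.0))
    losses = int(np.sum(outcomes == -1.0))
    if losses == 0:
        # No losses: the ratio is unbounded if there were wins, undefined otherwise.
        return float("inf") if wins > 0 else float("nan")
    return wins / losses

# Made-up outcome lists, not simulation results.
print(win_loss_ratio([1.0, 1.0, 0.5]))    # no losses
print(win_loss_ratio([1.0, -1.0, -1.0]))  # one win, two losses
```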