<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Tic-Tac-Toe
</p><br>
<strong>3x3 version of the <a href='https://en.wikipedia.org/wiki/Hex_(board_game)'>Hex game</a> solved using the minimax method</strong>

Minimax applies:
- to two-player games
- when one player's win is the other player's loss (zero-sum games)
- when both players have complete information about the possible outcomes
- when each player tries to minimize their own loss, assuming the opponent is trying to maximize their gain

<p>Minimax can either be run as a brute-force search that evaluates every possible outcome, or be sped up with alpha-beta pruning. With n the number of positions examined by brute force, alpha-beta examines roughly sqrt(n) positions when moves are explored in the best possible order, and about n^0.75 with random move ordering.</p>
<p>We evaluate positions from the first player's point of view only.</p>
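
To make the pruning idea concrete, here is a minimal sketch on a hand-made tree (nested lists whose leaves are scores). The tree and the names `plain_minimax`, `alphabeta` and `toy_tree` are illustrative assumptions and are not part of the notebook's code; the counters only record how many leaves each search evaluates.

```python
import math

def plain_minimax(node, maximizing, counter):
    """Brute-force minimax over a nested-list tree whose leaves are numbers."""
    if not isinstance(node, list):
        counter[0] += 1  # count every leaf that gets evaluated
        return node
    values = [plain_minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, counter, alpha=-math.inf, beta=math.inf):
    """Same search, but stops exploring a node once it cannot change the result."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizer above will never allow this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizer above already has a better option
            break
    return best

# hypothetical toy tree: a max node over three min nodes with two leaves each
toy_tree = [[3, 5], [2, 9], [0, 1]]
plain_count, pruned_count = [0], [0]
plain_value = plain_minimax(toy_tree, True, plain_count)
pruned_value = alphabeta(toy_tree, True, pruned_count)
print(plain_value, plain_count[0], pruned_value, pruned_count[0])  # prints: 3 6 3 4
```

Both searches agree on the value; the pruned one skips two of the six leaves here. How much is saved in general depends on the order in which moves are explored.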

[@vbipin](https://github.com/vbipin)
A minimax implementation needs the following pieces:
* a function that generates the children of the current state (<strong>child_node()</strong>); for more complex games that use heuristics, split this into action(state) and move(state, action)
* a test for whether the current state is terminal, either because the maximum depth is reached or because a player has won (<strong>terminal()</strong>, which relies on <strong>calc_value()</strong>)
* a utility function (<strong>calc_score()</strong>) that interprets a terminal state as a value for the player (typically 1, -1 or 0). To favor paths that reach a terminal state faster, weight each node by its distance to the end of the game (a win becomes +1+depth_number, a loss -1-depth_number)
* the minimax itself, a recursive function that computes the utility of every child of a given state, for each state. Since the players alternate, the utility of a given state flips sign from the point of view of the original player at each level.
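
The four pieces listed above can be sketched on a game much smaller than tic-tac-toe. The example below is an assumption made purely for illustration: a pile of stones, each player removes 1 or 2, and whoever takes the last stone wins. The names `children`, `is_terminal`, `utility` and `minimax_value` are hypothetical and only mirror the roles of child_node, terminal and calc_score.

```python
def children(pile):
    """Successor states of the toy game: remove 1 or 2 stones."""
    return [pile - take for take in (1, 2) if pile - take >= 0]

def is_terminal(pile):
    """The game ends when no stone is left."""
    return pile == 0

def utility(remaining_depth, maximizer_just_moved):
    """Depth-weighted score: the player who took the last stone wins,
    and wins reached with more moves to spare weigh more."""
    score = 1 + remaining_depth
    return score if maximizer_just_moved else -score

def minimax_value(pile, maximizing, remaining_depth):
    """Recursive minimax; remaining_depth is a move budget that shrinks by one per move."""
    if is_terminal(pile):
        # the player who just moved is the opposite of the one to move now
        return utility(remaining_depth, maximizer_just_moved=not maximizing)
    values = [minimax_value(child, not maximizing, remaining_depth - 1)
              for child in children(pile)]
    return max(values) if maximizing else min(values)

print(minimax_value(4, True, 4))  # prints: 2  (the maximizer can force a win)
print(minimax_value(3, True, 3))  # prints: -2 (the maximizer loses against best play)
```

The tic-tac-toe code follows the same shape, with the number of empty cells playing the role of the remaining depth.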

### Calculate the win/loss function
<a href='http://ohboyigettodomath.blogspot.com/2015/05/tic-tac-toe-as-magic-square.html'>Magic square trick</a>
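
The trick relies on the 3x3 magic square, in which every row, column and diagonal sums to 15, and in which those eight lines are the only triples of distinct numbers from 1 to 9 that sum to 15. A minimal standalone sketch follows; `MAGIC` and `has_won` are hypothetical names, and this variant tests every triple of owned cells instead of each line in turn as the class below does.

```python
from itertools import combinations

# same layout as numbered_grid in the State class, read row by row
MAGIC = [2, 7, 6, 9, 5, 1, 4, 3, 8]

def has_won(grid, player):
    """True if three of the player's cells carry magic-square numbers summing to 15."""
    owned = [MAGIC[i] for i, cell in enumerate(grid) if cell == player]
    return any(sum(triple) == 15 for triple in combinations(owned, 3))

print(has_won(list('O.XOX.O..'), 'O'))  # prints: True  (left column: 2 + 9 + 4)
print(has_won(list('O.XOX....'), 'X'))  # prints: False (only two cells owned)
```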

In [1]:
import numpy as np

In [336]:
class State():
    
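    def __init__(self,state):
        self.grid = state
        #depth = number of empty cells left on the board
        self.depth = np.sum(np.isin(self.grid,'.'))
        #indices of the empty cells
        self.itemindex = np.where(np.asanyarray(self.grid) == '.')[0]
        self.id = ''.join([str(i) for i in self.grid])
        #player is the one who moved last, next_player the one about to move ('O' opens the game)
        self.player = 'X'
        self.next_player = 'O'
        if self.depth % 2 == 0:
            self.player = 'O'
            self.next_player = 'X'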
            
    def calc_value(self):
        #magic square: every row, column and diagonal sums to 15
        numbered_grid = [2, 7, 6, 9, 5, 1, 4, 3, 8]
        #keep the magic-square number of each cell owned by the player, 0 elsewhere
        self.nb_player = [numbered_grid[k] if x == self.player else 0 for k,x in enumerate(self.grid)]
        self.score_rows = max([sum(self.nb_player[x:x+3]) for x in [0,3,6]])
        self.score_cols = max([sum(self.nb_player[x::3]) for x in range(0,3)])
        self.score_diagonal1 = self.nb_player[0] + self.nb_player[4] + self.nb_player[8]
        self.score_diagonal2 = self.nb_player[2] + self.nb_player[4] + self.nb_player[6]
        #a value of 15 means the player owns a complete line
        return max(self.score_rows,self.score_cols,self.score_diagonal1,self.score_diagonal2)
    
    def calc_score(self):
        #returns depth+1 if 'O' (the max player) has just won, -(depth+1) if 'X' (the min player) has just won, 0 if nobody wins at this state
        if self.calc_value() == 15:
            if self.player == 'O':
                return self.depth + 1
            else:
                return -self.depth - 1
        return 0
    
    def child_node(self):
        #returns a list of all the possible grids after the next player's move
        child_node = []
        child_grid = self.grid.copy()
        #replace each remaining '.' by next_player, one at a time, and append the resulting grid to the child_node list
        for child in self.itemindex:
            child_grid[child] = self.next_player
            child_node.append(child_grid)
            child_grid = self.grid.copy()
        return child_node
    
    def terminal(self):
        #a state is terminal when a player has won, which is only possible once the first player has played at least 3 times; full boards (depth 0) are handled in minimax
        return self.depth<6 and self.calc_score()!=0

In [337]:
#five=State(np.array([['O','.','X'], ['O','X','.'], ['.','.','.']]))
five=State(['O','.','X','O','X','.', '.','.','.'])
four=State(['O','.','X','O','X','.','O','.','.'])
print(five.player,five.calc_value(),five.calc_score())
print(four.player,four.calc_value(),four.calc_score())
four.child_node()

X 11 0
O 15 5


[['O', 'X', 'X', 'O', 'X', '.', 'O', '.', '.'],
 ['O', '.', 'X', 'O', 'X', 'X', 'O', '.', '.'],
 ['O', '.', 'X', 'O', 'X', '.', 'O', 'X', '.'],
 ['O', '.', 'X', 'O', 'X', '.', 'O', '.', 'X']]

### Minimax

In [370]:
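#the game is zero-sum, so a negamax formulation would also work; here the max ('O' to move) and min ('X' to move) branches are kept explicit
def minimax(state,tree):
    #value starts at 0 (a draw), so the max branch never returns less than 0 and the min branch never more than 0
    value=0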
    if state.depth==0 or state.terminal():
        tree[state.id]=(state.depth,value)
        return state.calc_score()
    else:
        if state.player == 'X':                
            for child in state.child_node():
                node=State(child)
                value=max(value,minimax(node,tree))
                tree[node.id]=(node.depth,value)
        else:
            for child in state.child_node():
                node=State(child)
                value=min(value,minimax(node,tree))
                tree[node.id]=(node.depth,value)
        return value
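
Because `value` starts at 0 in the cell above, a position that is lost for the side to move is reported as 0. Tracing the code by hand on 'XX.OXOO..' (O to move, X threatens three lines) suggests it returns 0 even though every line of play ends in a win for X. The sketch below is a standalone variant that starts from minus or plus infinity instead; it is written under the same conventions ('O' maximizes, 'X' minimizes, a win is worth the number of empty cells plus one) but uses plain lists and hypothetical names (`LINES`, `winner`, `minimax_inf`) rather than the State class.

```python
import math

# the eight winning lines, as index triples into the flat 9-cell grid
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(grid):
    """Return 'X' or 'O' if that player owns a full line, else None."""
    for a, b, c in LINES:
        if grid[a] != '.' and grid[a] == grid[b] == grid[c]:
            return grid[a]
    return None

def minimax_inf(grid, to_move):
    """Minimax value of grid with to_move about to play ('O' maximizes, 'X' minimizes)."""
    empties = [i for i, cell in enumerate(grid) if cell == '.']
    won = winner(grid)
    if won is not None:
        score = len(empties) + 1  # faster wins leave more empty cells
        return score if won == 'O' else -score
    if not empties:
        return 0  # full board without a winner: draw
    maximizing = to_move == 'O'
    best = -math.inf if maximizing else math.inf
    other = 'X' if maximizing else 'O'
    for i in empties:
        child = grid.copy()
        child[i] = to_move
        value = minimax_inf(child, other)
        best = max(best, value) if maximizing else min(best, value)
    return best

print(minimax_inf(list('O.XOX....'), 'O'))  # prints: 5  (O completes the left column)
print(minimax_inf(list('XX.OXOO..'), 'O'))  # prints: -2 (O cannot stop X)
```

Switching the notebook's own `minimax` to this initialization would also change the values stored in `tree`, and therefore the percentages printed by `main`.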

In [354]:
def main(state):
    print(np.array(state.grid).reshape(-1,3))
    tree={}
    chance=minimax(state,tree)
    print('Best score: ',chance)
    if chance!=0 and state.depth!=chance-1:
        print('Next move:')
        #tree values are (depth, value) tuples, so max() compares depth first and value second
        winning_move=next(key for key, value in tree.items() if value == max(tree.values()))
        print(np.array(list(winning_move)).reshape(-1,3))
        results=[x[1] for x in tree.values()]
        print('Moves resulting in a draw: ',"{:.1%}".format(results.count(0)/len(results)))
        print('Moves resulting in a loss: ',"{:.1%}".format(sum(i<0 for i in results)/len(results)))
        print('Moves resulting in a victory: ',"{:.1%}".format(sum(i>0 for i in results)/len(results)))
    return tree

In [375]:
starting=main(State(list('....O....')))

[['.' '.' '.']
 ['.' 'O' '.']
 ['.' '.' '.']]
Best score:  0


In [377]:
test1 = main(five)

[['O' '.' 'X']
 ['O' 'X' '.']
 ['.' '.' '.']]
Best score:  5
Next move:
[['O' '.' 'X']
 ['O' 'X' '.']
 ['O' '.' '.']]
Moves resulting in a draw:  31.3%
Moves resulting in a loss:  37.3%
Moves resulting in a victory:  31.3%


In [378]:
main(State(list('X..O.OOXX')))

[['X' '.' '.']
 ['O' '.' 'O']
 ['O' 'X' 'X']]
Best score:  3
Next move:
[['X' '.' '.']
 ['O' 'O' 'O']
 ['O' 'X' 'X']]
Moves resulting in a draw:  44.4%
Moves resulting in a loss:  22.2%
Moves resulting in a victory:  33.3%


{'XOXOOOOXX': (0, 1),
 'XOXO.OOXX': (1, 0),
 'XO.OXOOXX': (1, -2),
 'XO.O.OOXX': (2, 0),
 'XXOOOOOXX': (0, 1),
 'XXOO.OOXX': (1, 0),
 'X.OOXOOXX': (1, -2),
 'X.OO.OOXX': (2, 0),
 'X..OOOOXX': (2, 3)}