# Build a Game Playing Agent - Knights Isolation
## James McGuigan
- Source: https://github.com/JamesMcGuigan/udacity-artificial-intelligence/blob/master/Projects/3_Adversarial%20Search/

# Unit Tests

In [1]:
from agents.AlphaBetaPlayer import AlphaBetaAreaPlayer, AlphaBetaPlayer, MinimaxPlayer
from agents.DistancePlayer import DistancePlayer, GreedyDistancePlayer
from agents.MCTS import MCTSMaximum, MCTSMaximumHeuristic, MCTSRandom, MCTSRandomHeuristic
from agents.UCT import UCTPlayer
from sample_players import GreedyPlayer, RandomPlayer
from run_backpropagation import run_backpropagation, TEST_AGENTS
from isolation import Agent, logger
from run_match_sync import play_sync
import time
import gc

%load_ext autoreload
%autoreload 2

In [2]:
! python3 -m unittest -v

test_get_action_midgame (tests.test_my_custom_player.CustomPlayerGetActionTest)
get_action() calls self.queue.put() before timeout in a game in progress ... ok
test_get_action_player1 (tests.test_my_custom_player.CustomPlayerGetActionTest)
get_action() calls self.queue.put() before timeout on an empty board ... ok
test_get_action_player2 (tests.test_my_custom_player.CustomPlayerGetActionTest)
get_action() calls self.queue.put() before timeout as player 2 ... ok
test_get_action_terminal (tests.test_my_custom_player.CustomPlayerGetActionTest)
get_action() calls self.queue.put() before timeout when the game is over ... ok
test_custom_player (tests.test_my_custom_player.CustomPlayerPlayTest)
CustomPlayer successfully completes a game against itself ... ok

----------------------------------------------------------------------
Ran 5 tests in 18.558s

OK


# Infrastructure

- [run_match_sync.py](run_match_sync.py)
- [run_backpropagation.py](run_backpropagation.py)
- [agents/DataSavePlayer.py](agents/DataSavePlayer.py)

I've rewritten `run_match.py` as a synchronous implementation in `run_match_sync.py` and `run_backpropagation.py`,
using signals rather than `multiprocessing.Pool()`, which allows for better profiling and a 2x performance speedup.
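
As a rough illustration of the signal-based approach, here is a minimal sketch of interrupting a synchronous call with a timer signal. It is not the code in `run_match_sync.py`: `call_with_timeout`, `TimeLimitExceeded` and `deepen_forever` are hypothetical names, and the technique only works on Unix, in the main thread.

```python
import signal


class TimeLimitExceeded(Exception):
    """Raised inside the running function when its time budget expires."""


def call_with_timeout(fn, seconds, *args):
    """Run fn(*args) synchronously, raising TimeLimitExceeded after `seconds`."""
    def on_alarm(signum, frame):
        raise TimeLimitExceeded()

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)   # fractional seconds allowed
    try:
        return fn(*args)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)     # restore the old handler


def deepen_forever(progress):
    """Stand-in for iterative deepening: records each completed depth."""
    depth = 0
    while True:
        depth += 1
        progress['depth'] = depth


progress = {'depth': 0}
try:
    call_with_timeout(deepen_forever, 0.05, progress)
except TimeLimitExceeded:
    pass  # keep whatever the search had finished before the alarm
print('deepest completed iteration:', progress['depth'])
```

Because everything runs in one process, a profiler sees the agent's search directly rather than a worker process.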

I have improved the CLI flags and logging output, with an extra `--verbose` flag which can
be used to ASCII print out the board state after each turn.

`DataSavePlayer` now handles loading and `atexit` autosaving of `cls.data`, whilst also gzipping the contents.
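
The load/autosave pattern can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of `agents/DataSavePlayer.py`: `CachedAgent`, `DemoAgent`, the `.pickle.gz` filename and the temporary directory are all hypothetical.

```python
import atexit
import gzip
import os
import pickle
import tempfile


class CachedAgent:
    """Hypothetical base class: one gzipped pickle file per subclass."""
    data = {}
    directory = tempfile.mkdtemp()   # stand-in for the ./data directory

    @classmethod
    def filename(cls):
        return os.path.join(cls.directory, cls.__name__ + '.pickle.gz')

    @classmethod
    def load(cls):
        path = cls.filename()
        if os.path.exists(path):
            with gzip.open(path, 'rb') as file:
                cls.data.update(pickle.load(file))

    @classmethod
    def save(cls):
        with gzip.open(cls.filename(), 'wb') as file:
            pickle.dump(cls.data, file)


class DemoAgent(CachedAgent):
    data = {}   # each subclass owns its own cache dict


atexit.register(DemoAgent.save)   # autosave when the interpreter exits

DemoAgent.data['some_state'] = float('inf')
DemoAgent.save()
DemoAgent.data.clear()
DemoAgent.load()
print(DemoAgent.data)
```

Keeping `data` on the class rather than the instance is what lets a cache survive across the many short-lived agent instances created during a run of matches.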

# Advanced Heuristic

- [agents/AlphaBetaPlayer.py](agents/AlphaBetaPlayer.py)

> What features of the game does your heuristic incorporate, and why do you think those features matter in evaluating states during search?

The main analogy is with the game of Go. The goal is to surround your opponent and capture a larger territory.

Recursively computing liberties several moves ahead shows the area of the board that the opponent
could potentially escape to.

At `depth=1` this `heuristic_area()` is equivalent to `#my_moves - #opponent_moves`

Early in the game the opponent effectively has access to the entire board, making this heuristic ineffective,
hence `max_area` is used in addition to depth to short-circuit the computational cost of
expanding breadth-first search to all possible future moves on a mostly empty board when neither side is trapped.

This heuristic is endgame focused. It solves the simplified subproblem of local search without an adversary,
and provides an upper-bound estimate for the difference in maximum number of total moves each player has remaining.
It reaches maximum value for moves that trap a player within a self-contained section whilst leaving
the other a means of escape. The player in the smaller territory will run out of moves first.
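
The idea can be sketched as a breadth-first flood fill over knight moves. This is a simplified illustration under stated assumptions, not the implementation in `agents/AlphaBetaPlayer.py`: the board is modelled as an 11x9 grid with a set of blocked `(x, y)` cells, and `reachable_area` plus the default `depth` and `max_area` values are hypothetical.

```python
KNIGHT_MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
WIDTH, HEIGHT = 11, 9


def reachable_area(start, blocked, depth, max_area):
    """Count squares a lone knight could reach from `start` within `depth` moves.

    Expansion stops once more than `max_area` squares have been found."""
    seen = set()
    frontier = {start}
    for _ in range(depth):
        next_frontier = set()
        for (x, y) in frontier:
            for dx, dy in KNIGHT_MOVES:
                cell = (x + dx, y + dy)
                on_board = 0 <= cell[0] < WIDTH and 0 <= cell[1] < HEIGHT
                if on_board and cell != start and cell not in blocked and cell not in seen:
                    seen.add(cell)
                    next_frontier.add(cell)
        if not next_frontier or len(seen) > max_area:
            break   # trapped, or the territory is already "large enough"
        frontier = next_frontier
    return len(seen)


def heuristic_area(my_loc, opp_loc, blocked, depth=4, max_area=30):
    """Difference in reachable territory, ignoring the adversary's replies."""
    blocked = blocked | {my_loc, opp_loc}
    return (reachable_area(my_loc, blocked, depth, max_area)
            - reachable_area(opp_loc, blocked, depth, max_area))


# At depth=1 this reduces to #my_moves - #opponent_moves
print(heuristic_area((5, 4), (0, 0), set(), depth=1))
```

With an opponent sealed in a corner, for example `heuristic_area((5, 4), (0, 0), {(1, 2), (2, 1)})`, the opponent's area is zero while the other player's area is capped by `max_area`.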

The other major impact on the performance of this agent is the addition of alpha-beta pruning
with the aggressive use of caching. This avoids the computational expense of recomputing previously explored subtrees, and by caching at the class level (via `@classmethod`),
parts of the cache can even be reused between runs. The net effect of this is to increase the
maximum depth of iterative deepening before the timeout to beyond what `MinimaxPlayer(depth=3)` can compute.

Improvements over previous submission:
- Iterative deepening terminates early if the action score is infinite
- Caching has been redone: it is now keyed on (player_id, state) and ignores depth/alpha/beta
- Alphabeta caching only stores infinite values, which is effectively an endgame table
- Caching is persisted to disk in ./data/*.zip.pickle via the DataSavePlayer base class
- Persisted caching means Alphabeta can be pretrained with a higher timeout before the match
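
To make the caching rules above concrete, here is a minimal sketch on a toy game rather than Knights Isolation. The toy game is an assumption for illustration: players alternately take 1 to 3 items from a pile, and a player who cannot move loses. The names `CACHE`, `alphabeta` and `iterative_deepening` and the zero placeholder heuristic are hypothetical, not the author's code.

```python
import math

CACHE = {}   # (player_id, state) -> +inf or -inf only; depth, alpha and beta are ignored


def actions(state):
    pile, _to_move = state
    return [take for take in (1, 2, 3) if take <= pile]


def result(state, take):
    pile, to_move = state
    return (pile - take, 1 - to_move)


def alphabeta(state, player_id, depth, alpha=-math.inf, beta=math.inf):
    key = (player_id, state)
    if key in CACHE:
        return CACHE[key]                   # proven win or loss, valid at any depth
    moves = actions(state)
    maximizing = (state[1] == player_id)
    if not moves:                           # the player to move is stuck and loses
        score = -math.inf if maximizing else math.inf
    elif depth == 0:
        return 0.0                          # placeholder heuristic, never cached
    else:
        score = -math.inf if maximizing else math.inf
        for take in moves:
            child = alphabeta(result(state, take), player_id, depth - 1, alpha, beta)
            if maximizing:
                score = max(score, child)
                alpha = max(alpha, score)
            else:
                score = min(score, child)
                beta = min(beta, score)
            if alpha >= beta:
                break                       # prune the remaining siblings
    if math.isinf(score):
        CACHE[key] = score                  # only store proven outcomes
    return score


def iterative_deepening(state, player_id, max_depth):
    """Assumes `state` has at least one legal action."""
    best_take, best_score, depth = None, -math.inf, 0
    for depth in range(1, max_depth + 1):
        scored = [(alphabeta(result(state, take), player_id, depth - 1), take)
                  for take in actions(state)]
        best_score, best_take = max(scored)
        if math.isinf(best_score):
            break                           # forced win or loss found: stop early
    return best_take, best_score, depth


print(iterative_deepening((5, 0), player_id=0, max_depth=10))
```

An infinite score is a proven outcome rather than an estimate, which is why it is safe to reuse it regardless of the depth or the alpha/beta window at which it was found. Finite heuristic values depend on the search depth, so this sketch never caches them.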

Performance improvements mean the algorithm now dominates the course Minimax implementation
- First percentage is the total winrate, second percentage is the rolling average winrate

In [3]:
!rm -f ./data/A*.pickle

The outcome of Greedy vs Minimax at depth 2 depends on who gets the first turn. At the standard depth 3, Minimax has a 100% winrate.

In [4]:
! python3 ./run_backpropagation.py  -a GREEDY -o MINIMAX --progress -r 70

---------------------------------------------------------------------- match_id:   70 |  72s |   0% ->   0% | Greedy vs Minimax


AlphaBeta gets a 93% winrate vs Greedy and 76% vs Minimax in the runs below

In [5]:
! python3 ./run_backpropagation.py  -a ALPHABETA -o GREEDY --progress -r 70

++++++++++++++++++++++-++++++-+-+++++++-+++++++-++++++++++++++++++++++ match_id:   70 | 290s |  93% ->  93% | AlphaBeta vs Greedy
wrote:  ./data/AlphaBetaPlayer.zip.pickle        |  0.8MB in  1.1s | entries: 55693


In [6]:
! python3 ./run_backpropagation.py  -a ALPHABETA -o MINIMAX --progress -r 70

loaded: ./data/AlphaBetaPlayer.zip.pickle        |  0.9MB in  0.3s | entries: 55693
+++-+-+-+---+++++++++++-++--+++-++++++++--+++-+++++-+-+++++++++-+++++- match_id:   70 | 304s |  76% ->  79% | AlphaBeta vs Minimax
wrote:  ./data/AlphaBetaPlayer.zip.pickle        |  1.6MB in  2.3s | entries: 107152


AlphaBetaAreaPlayer gets a 97% winrate vs both Greedy and Minimax, plus 56% vs AlphaBetaPlayer, in the runs below

In [7]:
! python3 ./run_backpropagation.py  -a AREA -o GREEDY --progress -r 70

++++++++++++++++++++++++++++++-+++++++++++++++-+++++++++++++++++++++++ match_id:   70 | 246s |  97% ->  97% | AlphaBeta Area vs Greedy
wrote:  ./data/AlphaBetaAreaPlayer.zip.pickle    |  5.8MB in  7.4s | entries: 368483


In [8]:
! python3 ./run_backpropagation.py  -a AREA -o MINIMAX --progress -r 70

loaded: ./data/AlphaBetaAreaPlayer.zip.pickle    |  5.9MB in  2.0s | entries: 368483
+++++++++++++++++++++++++++++++++++-++++++++++++++++++++++++-+++++++++ match_id:   70 | 327s |  97% ->  96% | AlphaBeta Area vs Minimax
wrote:  ./data/AlphaBetaAreaPlayer.zip.pickle    | 13.8MB in 18.1s | entries: 873487


In [9]:
! python3 ./run_backpropagation.py  -a AREA -o ALPHABETA --progress -r 70

loaded: ./data/AlphaBetaPlayer.zip.pickle        |  1.6MB in  0.5s | entries: 107152
loaded: ./data/AlphaBetaAreaPlayer.zip.pickle    | 13.8MB in  6.2s | entries: 873487
++++--+-+-+-+--+++-++-+-+-+-+-----+---+++-+-+++-++-+------+++++++++++- match_id:   70 | 623s |  56% ->  58% | AlphaBeta Area vs AlphaBeta
wrote:  ./data/AlphaBetaPlayer.zip.pickle        |  4.0MB in  4.5s | entries: 261045
wrote:  ./data/AlphaBetaAreaPlayer.zip.pickle    | 22.9MB in 25.2s | entries: 1452080


# Depth Analysis

- MiniMax course default is depth 3
- AlphaBeta
    - has trouble getting past depth 1 on the first turn
    - can get to depth 5 on the second turn
    - remains at depth 4-6 for the early game
    - grows to depth 6-9 during the mid-game
    - expands out to max depth of 14 before finding a -inf lose condition
- AlphaBetaArea
    - usually searches 2 levels shallower than AlphaBeta
    - the area heuristic effectively adds an extra 4 layers of hidden depth
    - so for the same CPU cost, AlphaBetaArea has a depth 2 advantage
    - finds the inf win condition before AlphaBeta and at a lower depth

In [10]:
AlphaBetaPlayer.verbose_depth     = True
AlphaBetaAreaPlayer.verbose_depth = True

play_sync( ( Agent(AlphaBetaPlayer,'AlphaBeta'), Agent(AlphaBetaAreaPlayer,'AlphaBetaArea') ) )
pass

loaded: ./data/AlphaBetaPlayer.zip.pickle        |  3.1MB in  1.1s | entries: 203452
loaded: ./data/AlphaBetaAreaPlayer.zip.pickle    | 18.0MB in  6.1s | entries: 1142325

AlphaBetaPlayer      | depth: 1 
AlphaBetaAreaPlayer  | depth: 1 2 
AlphaBetaPlayer      | depth: 1 2 3 4 
AlphaBetaAreaPlayer  | depth: 1 2 3 4 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaAreaPlayer  | depth: 1 2 3 4 5 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaAreaPlayer  | depth: 1 2 3 4 5 
AlphaBetaPlayer      | depth: 1 2 3 4 
AlphaBetaAreaPlayer  | depth: 1 2 3 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaAreaPlayer  | depth: 1 2 3 
AlphaBetaPlayer      | depth: 1 2 3 4 
AlphaBetaAreaPlayer  | depth: 1 2 3 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaAreaPlayer  | depth: 1 2 3 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaAreaPlayer  | depth: 1 2 3 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaAreaPlayer  | depth: 1 2 3 
AlphaBetaPlayer      | depth: 1 2 3 4 5 
AlphaBetaArea

> If this can be done, then it turns out that alpha–beta needs to examine only O(b^(m/2)) nodes to pick the best move, instead of O(b^m) for minimax. This means that the effective branching factor becomes √b instead of b—for chess, about 6 instead of 35. Put another way, alpha–beta can solve a tree roughly twice as deep as minimax in the same amount of time.
> - Artificial Intelligence: A Modern Approach (p160)

In practice, AlphaBeta gets a smaller depth lead over MiniMax, with an average of 1 and a range of 0-2

In [32]:
# don't load or save existing caches, so both agents start from an empty cache
class AlphaBeta(AlphaBetaPlayer):
    search_fn     = 'alphabeta'
    heuristic_fn  = 'heuristic_liberties'  # 'heuristic_liberties' | 'heuristic_area'
    verbose_depth = True
    @classmethod
    def load(cls): pass
    @classmethod
    def save(cls): pass

class MiniMax(AlphaBetaPlayer):
    search_fn     = 'minimax'
    heuristic_fn  = 'heuristic_liberties'  # 'heuristic_liberties' | 'heuristic_area'
    verbose_depth = True
    @classmethod
    def load(cls): pass
    @classmethod
    def save(cls): pass

play_sync( ( Agent(MiniMax,'MiniMax'), Agent(AlphaBeta,'AlphaBeta') ) )
pass


MiniMax              | depth: 1 
AlphaBeta            | depth: 1 2 
MiniMax              | depth: 1 2 3 4 
AlphaBeta            | depth: 1 2 3 4 5 
MiniMax              | depth: 1 2 3 4 
AlphaBeta            | depth: 1 2 3 4 5 
MiniMax              | depth: 1 2 3 4 5 
AlphaBeta            | depth: 1 2 3 4 5 
MiniMax              | depth: 1 2 3 4 5 
AlphaBeta            | depth: 1 2 3 4 5 6 
MiniMax              | depth: 1 2 3 4 5 
AlphaBeta            | depth: 1 2 3 4 5 6 
MiniMax              | depth: 1 2 3 4 5 
AlphaBeta            | depth: 1 2 3 4 5 6 
MiniMax              | depth: 1 2 3 4 5 
AlphaBeta            | depth: 1 2 3 4 5 
MiniMax              | depth: 1 2 3 4 
AlphaBeta            | depth: 1 2 3 4 5 
MiniMax              | depth: 1 2 3 4 
AlphaBeta            | depth: 1 2 3 4 5 
MiniMax              | depth: 1 2 3 4 
AlphaBeta            | depth: 1 2 3 4 5 6 
MiniMax              | depth: 1 2 3 4 5 6 
AlphaBeta            | depth: 1 2 3 4 5 6 7 
MiniMax              | de

# Monte Carlo Tree Search

- [agents/MCTS.py](agents/MCTS.py)
- [agents/UCT.py](agents/UCT.py)

I have implemented variations on Monte Carlo Tree Search.

## MCTSMaximum

MCTSMaximum is designed as a reinforcement learning agent,
to be trained by running repeatedly before the match. Initially it will make random moves.
After each match, the `.backpropergate()` function is called, which builds a record of
the win/loss ratio for each seen board position, and also computes the BestChild score
`(w/n + c√(ln N/n))` for each seen node. The agent is fast at runtime in the sense that it only
needs to read from its own cache to compute the max score of available actions, which assumes
the agent has seen this exact board position before during training.
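
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code in `agents/MCTS.py`: `best_child_score`, `backpropagate`, `select_action`, the `(wins, count)` tuple layout and the default `c = √2` are all assumptions.

```python
import math


def best_child_score(wins, count, parent_count, c=math.sqrt(2)):
    """BestChild score: w/n + c * sqrt(ln(N) / n)."""
    if count == 0:
        return math.inf   # unvisited nodes are tried first
    return wins / count + c * math.sqrt(math.log(parent_count) / count)


def backpropagate(data, path, winner):
    """Update (wins, count) for every (state, mover) pair seen in one finished game.

    `mover` is the id of the player whose move produced `state`."""
    for state, mover in path:
        wins, count = data.get(state, (0, 0))
        data[state] = (wins + (mover == winner), count + 1)


def select_action(data, parent_state, children):
    """`children` maps action -> child state; pick the action with the max score."""
    parent_count = max(1, data.get(parent_state, (0, 0))[1])

    def score(action):
        wins, count = data.get(children[action], (0, 0))
        return best_child_score(wins, count, parent_count)

    return max(children, key=score)


data = {}
backpropagate(data, [('a', 0), ('b', 1)], winner=0)
print(data)
print(best_child_score(5, 10, 100))
```

At match time only `select_action` is needed, which is a handful of dictionary lookups per move.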

## MCTSRandom

MCTSRandom is a variation inspired by the Ant Colony Optimization Algorithm.
It removes the exploration term, and instead of selecting the action with the maximum score,
it uses the difference in score as a random weighting factor to stochastically select the next action.
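
One way to read "difference in score as a weighting factor" is sketched below. The exact weighting is an assumption: here each action is weighted by how far its score sits above the lowest score, plus a small epsilon so the worst action can still be picked. `weighted_choice` is a hypothetical helper, not the author's function.

```python
import random


def weighted_choice(scores, rng=random):
    """`scores` maps action -> score (e.g. wins/count). Returns one action at random,
    with probability proportional to its score's lead over the lowest score."""
    actions = list(scores)
    lowest = min(scores.values())
    weights = [scores[action] - lowest + 1e-3 for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)   # seeded only to make the demonstration repeatable
picks = [weighted_choice({'good': 0.9, 'bad': 0.1}, rng) for _ in range(1000)]
print(picks.count('good'), picks.count('bad'))
```

Unlike a strict maximum, this keeps some probability on every action, so exploration comes from the sampling itself rather than from an explicit exploration term.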

## MCTSRandomHeuristic + MCTSMaximumHeuristic

These are variants that add the liberties heuristic to the BestChild score.


## Results

- MCTSRandom wins about 70% of matches against MCTSMaximum

In [25]:
! rm -f ./data/MCTS*.pickle
! python3 ./run_backpropagation.py  -a MCR  -o MCM  -r 100 -l 100

 match_id:  100 |   1s |  70% ->  70% | MCTS Random vs MCTS Maximum
wrote:  ./data/MCTSMaximum.zip.pickle            |  0.1MB in  0.1s | entries: 4354
wrote:  ./data/MCTSRandom.zip.pickle             |  0.1MB in  0.1s | entries: 4354


MCTSRandom can be shown to be trainable against a Greedy opponent

In [26]:
! python3 ./run_backpropagation.py  -a MCR -o GREEDY -r 1000  -l 100

loaded: ./data/MCTSRandom.zip.pickle             |  0.1MB in  0.0s | entries: 4354
 match_id:  100 |   1s |  39% ->  37% | MCTS Random vs Greedy
 match_id:  200 |   3s |  44% ->  46% | MCTS Random vs Greedy
 match_id:  300 |   4s |  47% ->  50% | MCTS Random vs Greedy
 match_id:  400 |   5s |  45% ->  46% | MCTS Random vs Greedy
 match_id:  500 |   7s |  45% ->  45% | MCTS Random vs Greedy
 match_id:  600 |   8s |  46% ->  47% | MCTS Random vs Greedy
 match_id:  700 |   9s |  48% ->  49% | MCTS Random vs Greedy
 match_id:  800 |  11s |  48% ->  49% | MCTS Random vs Greedy
 match_id:  900 |  12s |  48% ->  50% | MCTS Random vs Greedy
 match_id: 1000 |  13s |  48% ->  49% | MCTS Random vs Greedy
wrote:  ./data/MCTSRandom.zip.pickle             |  0.6MB in  0.8s | entries: 29185


MCTSRandom has a curious dip in performance against Minimax.
- After training against Greedy, its winrate starts at 17%
- Repeatedly losing seems to amplify exploration, decreasing to a minimum 11% winrate
- It eventually finds better countermoves, training itself back up to an 18% winrate

In [27]:
! python3 ./run_backpropagation.py  -a MCR -o MINIMAX -r 1000

loaded: ./data/MCTSRandom.zip.pickle             |  0.6MB in  0.1s | entries: 29185
 match_id:  100 |  32s |  17% ->  17% | MCTS Random vs Minimax
 match_id:  200 |  63s |  13% ->  11% | MCTS Random vs Minimax
 match_id:  300 |  92s |  13% ->  13% | MCTS Random vs Minimax
 match_id:  400 | 122s |  14% ->  14% | MCTS Random vs Minimax
 match_id:  500 | 153s |  14% ->  14% | MCTS Random vs Minimax
 match_id:  600 | 186s |  16% ->  17% | MCTS Random vs Minimax
 match_id:  700 | 216s |  16% ->  18% | MCTS Random vs Minimax
 match_id:  800 | 246s |  17% ->  19% | MCTS Random vs Minimax
 match_id:  900 | 275s |  17% ->  18% | MCTS Random vs Minimax
 match_id: 1000 | 303s |  17% ->  18% | MCTS Random vs Minimax
wrote:  ./data/MCTSRandom.zip.pickle             |  1.3MB in  1.8s | entries: 64980


It has a hard time against AlphaBeta, with only a 4-6% winrate

In [28]:
! python3 ./run_backpropagation.py  -a MCR -o ALPHABETA -r 250 -l 50

loaded: ./data/MCTSRandom.zip.pickle             |  1.3MB in  0.3s | entries: 64980
 match_id:   50 | 192s |   4% ->   7% | MCTS Random vs AlphaBeta
 match_id:  100 | 377s |   6% ->   7% | MCTS Random vs AlphaBeta
 match_id:  150 | 570s |   6% ->   6% | MCTS Random vs AlphaBeta
 match_id:  200 | 762s |   6% ->   5% | MCTS Random vs AlphaBeta
 match_id:  250 | 957s |   6% ->   6% | MCTS Random vs AlphaBeta
wrote:  ./data/AlphaBetaPlayer.zip.pickle        |  4.7MB in  5.3s | entries: 308358
wrote:  ./data/MCTSRandom.zip.pickle             |  1.5MB in  2.1s | entries: 75992


What if we pretrain all the Monte Carlo agents against each other?

NOTE: Cross-training in a league is the method used by the StarCraft AlphaStar agent.

In [29]:
for agent in TEST_AGENTS.keys():
    if agent.startswith('MC'):
        TEST_AGENTS[agent].agent_class.verbose = False
        TEST_AGENTS[agent].agent_class.load()

for agent in TEST_AGENTS.keys():
    for opponent in TEST_AGENTS.keys():
        if agent.startswith('MC') and opponent.startswith('MC'):
            time.sleep(0.1)
            run_backpropagation({
                "agent":      agent,
                "opponent":   opponent,
                "rounds":     1000,
                "logging":    1000,
                "progress":   False,
                "time_limit": 0   # reduce freak TimeoutErrors
            })

for agent in TEST_AGENTS.keys():
    if agent.startswith('MC'):
        TEST_AGENTS[agent].agent_class.save()

 match_id: 1000 |   9s |  51% ->  50% | MCTS Maximum vs MCTS Maximum 2
 match_id: 1000 |   9s |  26% ->  26% | MCTS Maximum vs MCTS Random
 match_id: 1000 |  10s |  53% ->  51% | MCTS Maximum vs MCTS Maximum Heuristic
 match_id: 1000 |  20s |  31% ->  32% | MCTS Maximum vs MCTS Random Heuristic
 match_id: 1000 |  11s |  70% ->  72% | MCTS Random vs MCTS Maximum
 match_id: 1000 |  10s |  49% ->  49% | MCTS Random vs MCTS Random 2
 match_id: 1000 |  12s |  74% ->  74% | MCTS Random vs MCTS Maximum Heuristic
 match_id: 1000 |  14s |  53% ->  55% | MCTS Random vs MCTS Random Heuristic
 match_id: 1000 |  10s |  53% ->  51% | MCTS Maximum Heuristic vs MCTS Maximum
 match_id: 1000 |  12s |  24% ->  24% | MCTS Maximum Heuristic vs MCTS Random
 match_id: 1000 |  12s |  45% ->  46% | MCTS Maximum Heuristic vs MCTS Maximum Heuristic 2
 match_id: 1000 |  17s |  32% ->  32% | MCTS Maximum Heuristic vs MCTS Random Heuristic
 match_id: 1000 |  22s |  69% ->  70% | MCTS Random Heuristic vs MCTS Maximu

Then attempt a rematch against AlphaBeta. In the run below the winrate stays at around 5%, so the league pretraining shows no clear improvement over the earlier 6%.

In [30]:
! python3 ./run_backpropagation.py  -a MCR -o ALPHABETA -r 250 -l 50

loaded: ./data/AlphaBetaPlayer.zip.pickle        |  4.7MB in  1.4s | entries: 308358
loaded: ./data/MCTSRandom.zip.pickle             |  7.2MB in  2.1s | entries: 369486
 match_id:   50 | 194s |   4% ->   5% | MCTS Random vs AlphaBeta
 match_id:  100 | 378s |   4% ->   4% | MCTS Random vs AlphaBeta
 match_id:  150 | 565s |   5% ->   5% | MCTS Random vs AlphaBeta
 match_id:  200 | 752s |   5% ->   5% | MCTS Random vs AlphaBeta
 match_id:  250 | 947s |   5% ->   5% | MCTS Random vs AlphaBeta
wrote:  ./data/AlphaBetaPlayer.zip.pickle        |  9.5MB in 12.0s | entries: 629184
wrote:  ./data/MCTSRandom.zip.pickle             |  7.4MB in 10.1s | entries: 380447


## UCTPlayer

UCTPlayer uses its 150ms time limit to run MCTSMaximum simulations from the current board position, then returns an answer based on the current scores of the available actions.

In [31]:
! python3 ./run_backpropagation.py  -a UCT -o GREEDY -r 70 --progress

loaded: ./data/MCTSMaximum.zip.pickle            |  5.4MB in  1.5s | entries: 280851
loaded: ./data/MCTSMaximum.zip.pickle            |  5.4MB in  1.5s | entries: 280851
-+----++--+------------++--++----+-+--------+---+-+----+-------+--++-- match_id:   70 | 266s |  24% ->  24% | UCT vs Greedy
wrote:  ./data/MCTSMaximum.zip.pickle            |  5.7MB in  9.0s | entries: 295859


In [32]:
! python3 ./run_backpropagation.py  -a UCT -o MINIMAX -r 70 --progress

loaded: ./data/MCTSMaximum.zip.pickle            |  5.7MB in  2.0s | entries: 295859
loaded: ./data/MCTSMaximum.zip.pickle            |  5.7MB in  2.0s | entries: 295859
----------------------------------------+--------------------------+-- match_id:   70 | 282s |   3% ->   4% | UCT vs Minimax
wrote:  ./data/MCTSMaximum.zip.pickle            |  6.0MB in  8.3s | entries: 307631


In [33]:
! python3 ./run_backpropagation.py  -a UCT -o ALPHABETA -r 70 --progress

loaded: ./data/AlphaBetaPlayer.zip.pickle        |  9.5MB in  2.8s | entries: 629184
loaded: ./data/MCTSMaximum.zip.pickle            |  6.0MB in  1.8s | entries: 307631
loaded: ./data/MCTSMaximum.zip.pickle            |  6.0MB in  1.7s | entries: 307631
---------------+-----------------+-----------------------------------+ match_id:   70 | 529s |   4% ->   5% | UCT vs AlphaBeta
wrote:  ./data/MCTSMaximum.zip.pickle            |  6.2MB in  8.8s | entries: 319067
wrote:  ./data/AlphaBetaPlayer.zip.pickle        | 10.8MB in 16.7s | entries: 719376


# League Tables

Let's run every agent against every other agent and compare results:
- AlphaBetaArea has a 100% winrate against everything except AlphaBeta and MCTS Random
- AlphaBetaArea vs AlphaBeta scores 60/40 both ways, which may depend on who has the bigger cache
- MCTS Random scored a maximum of 90% vs AlphaBeta and 24% vs AlphaBeta Area in the reverse matchups
- MCTS Random was beaten 100% of the time by AlphaBeta and AlphaBeta Area later, when it was in first position; this may be cache related

In [3]:
for agent in TEST_AGENTS.keys():
    for opponent in TEST_AGENTS.keys():
        try: TEST_AGENTS[agent].agent_class.load()
        except Exception: pass  # ignore failures from load(), e.g. agents without a cache

        time.sleep(0.1)
        TEST_AGENTS[agent   ].agent_class.verbose = False
        TEST_AGENTS[opponent].agent_class.verbose = False
        is_slow = any(
            name in TEST_AGENTS[agent].name + TEST_AGENTS[opponent].name
            for name in ['Alpha', 'Area', 'UCT', 'Minimax', 'Custom']
        )
        run_backpropagation({
            "agent":      agent,
            "opponent":   opponent,
            "time_limit": 150,
            "rounds":     10  if is_slow else 100,
            "progress":   False,
            "exceptions": False,
        })
    print()

for agent in TEST_AGENTS.keys():
    try: TEST_AGENTS[agent].agent_class.save()
    except Exception: pass  # ignore failures from save(), e.g. agents without a cache

 match_id:  100 |   1s |  52% ->  54% | Random vs Random 2
 match_id:  100 |   1s |  17% ->  17% | Random vs Greedy
 match_id:  100 |   1s |  61% ->  62% | Random vs Distance
 match_id:  100 |   2s |  22% ->  21% | Random vs Greedy Distance
 match_id:   10 |   3s |   0% ->   0% | Random vs Minimax
 match_id:   10 |  41s |  30% ->  23% | Random vs AlphaBeta
 match_id:   10 |  42s |   0% ->   0% | Random vs AlphaBeta Area
 match_id:  100 |   2s |  70% ->  69% | Random vs MCTS Maximum
 match_id:  100 |   4s |  50% ->  48% | Random vs MCTS Random
 match_id:  100 |   2s |  70% ->  69% | Random vs MCTS Maximum Heuristic
 match_id:  100 |   1s |  58% ->  62% | Random vs MCTS Random Heuristic
 match_id:   10 |  35s |   0% ->   0% | Random vs UCT
 match_id:   10 |  35s |   0% ->   0% | Random vs Custom TestAgent

 match_id:  100 |   1s |  75% ->  75% | Greedy vs Random
 match_id:  100 |   1s |  50% ->  50% | Greedy vs Greedy 2
 match_id:  100 |   2s |  50% ->  50% | Greedy vs Distance
 match_id

 match_id:   10 |  32s |   0% ->   0% | MCTS Maximum Heuristic vs AlphaBeta Area
 match_id:  100 |   2s |  48% ->  46% | MCTS Maximum Heuristic vs MCTS Maximum
 match_id:  100 |   2s |  35% ->  33% | MCTS Maximum Heuristic vs MCTS Random
 match_id:  100 |   2s |  46% ->  50% | MCTS Maximum Heuristic vs MCTS Maximum Heuristic 2
 match_id:  100 |   2s |  35% ->  36% | MCTS Maximum Heuristic vs MCTS Random Heuristic
 match_id:   10 |  32s |  40% ->  52% | MCTS Maximum Heuristic vs UCT
 match_id:   10 |  31s |   0% ->   0% | MCTS Maximum Heuristic vs Custom TestAgent



# Opening Book

## Monte Carlo Method

We can derive an opening book by looking at the cached scores in MCTSRandom
- score prioritises exploration of 100% winrate nodes that have only been explored once
    - sorting by score actually suggested the best countermove was the corner square
    - sort by wins * wins/count instead to find the best move
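
A small worked example shows why the two sort keys disagree. The `Record` tuple here is a hypothetical stand-in whose fields mirror the printed `MCTSRecord`, and the `score` values are simply set to `wins/count` for illustration.

```python
from collections import namedtuple

Record = namedtuple('Record', ['wins', 'count', 'score'])

lucky  = Record(wins=1,   count=1,    score=1 / 1)        # seen once, won once
proven = Record(wins=977, count=1487, score=977 / 1487)   # heavily explored

by_score    = max([lucky, proven], key=lambda record: record.score)
by_evidence = max([lucky, proven], key=lambda record: record.wins * record.wins / record.count)

print('by score:          ', by_score)
print('by wins*wins/count:', by_evidence)
```

Multiplying the winrate by the number of wins rewards positions with a lot of supporting evidence, so a single lucky win (1 * 1/1 = 1) no longer outranks a well-explored position (977 * 977/1487 is roughly 642).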

- Monte Carlo suggests the best opening strategy is:
    - defend the corner on the 3,3 point
    - attack the knight's blind spot on the 4,3 point
        - a knight requires three turns to move one space sideways
    - the first player is attempting to move into position to directly attack their opponent
    - the second player is attempting to remain in the blind spot adjacent to the first player
    - the pieces only diverge after move 8
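
The blind-spot claim can be checked with a short breadth-first search. This sketch assumes an empty 11x9 board and ignores blocked squares; `knight_distance` is a hypothetical helper, not part of the project.

```python
from collections import deque

KNIGHT_MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
WIDTH, HEIGHT = 11, 9


def knight_distance(start, goal):
    """Minimum number of knight moves from `start` to `goal` on an empty board."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (x, y), distance = queue.popleft()
        if (x, y) == goal:
            return distance
        for dx, dy in KNIGHT_MOVES:
            cell = (x + dx, y + dy)
            if 0 <= cell[0] < WIDTH and 0 <= cell[1] < HEIGHT and cell not in seen:
                seen.add(cell)
                queue.append((cell, distance + 1))
    return None   # unreachable


print(knight_distance((5, 4), (6, 4)))   # one space sideways
print(knight_distance((5, 4), (6, 5)))   # one space diagonally
```

A piece sitting on an orthogonally adjacent square therefore cannot be attacked on the next turn, which is what makes it a blind spot.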

In [4]:
from agents.MCTS import MCTSRandom
from isolation.isolation import Isolation, DebugState
MCTSRandom.load()

def best_response(parent_state=Isolation(), verbose=True):
    actions      = parent_state.actions()
    child_states = [ parent_state.result(action) for action in actions ]
    child_states = [ child for child in child_states if child in MCTSRandom.data ]
    records      = [ MCTSRandom.data[child] for child in child_states  ]
    best_state, best_record = max(zip(child_states, records),
                                  key=lambda key_value: key_value[1].wins * key_value[1].wins/key_value[1].count)
    best_move    = DebugState.ind2xy(best_state.locs[parent_state.player()])
    if verbose:
        print('Best move '+str(best_state.ply_count)+' is:')
        print(best_move, best_record)
        print( DebugState.from_state(best_state).bitboard_string )
        print( DebugState.from_state(best_state) )
    return best_move, best_record, best_state

best_state = Isolation()
for i in range(8):
    try:
        best_move, best_record, best_state = best_response(best_state)
    except Exception:
        break  # stop once lookup fails, e.g. max() on an empty list when no child state is cached


loaded: ./data/MCTSRandom.zip.pickle             |  7.4MB in  2.6s | entries: 380447
Best move 1 is:
(2, 2) MCTSRecord(wins=977, count=1487, score=0.6567967698519516)
1111111111100111111111110011111111111001111111111100111111111110011111111111001111111101100111111111110011111111111

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   | 1 |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   | 

## AlphaBetaArea Method

Alternatively, we could see how AlphaBetaArea plays when given different depths to search during the opening.

Opening Move:
- Monte Carlo: likes the 3,3 corner position
- Depth 1: main diagonal, one off from center
- Depth 2: short centerline, one off from center
- Depth 3: center,3 position
- Depth 4: corner 4,4 position
- Depth 5: long-edge but one off from center

Subsequent Moves:
- Usually ends up in either an attacking or adjacent square
- Except at depth 4, when p1 approaches diagonally, p2 jumps away

In [5]:
AlphaBetaAreaPlayer.load()
class AlphaBetaOpening(AlphaBetaAreaPlayer):
    data             = AlphaBetaAreaPlayer.data  # import cache
    verbose_depth    = True
for depth in range(1,6):
    AlphaBetaOpening.search_max_depth = depth
    play_sync(
        agents=(
            Agent(AlphaBetaOpening, 'AlphaBetaOpening'),
            Agent(AlphaBetaOpening, 'AlphaBetaOpening'),
        ),
        verbose=True,
        time_limit=0,
        max_moves=4,
    )


AlphaBetaOpening     | depth: 1 
AlphaBetaOpening     | depth: 1 
match: 0 | move: 2 | 0.06s | AlphaBetaOpening(1) => (6, 4)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   | 2 |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   | 1 |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +

AlphaBetaOpening     | depth: 1 2 3 4 5 
AlphaBetaOpening     | depth: 1 2 3 4 5 
match: 0 | move: 4 | 1.46s | AlphaBetaOpening(1) => (11, 0)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   | 2 |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   | X |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   | 1 |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   | X |   |   |   |   |
+ - + - + - + - + - + - + - +

# Full Game (Final Section)

Here is a full game between AlphaBeta and AlphaBetaArea, with 1 second per move
- Curiously, AlphaBeta at depth 2 picks the Monte Carlo 3,3 opening position
- AlphaBetaArea(2) figures out a forced checkmate in 26 on move 54 and delivers it on move 65 (11 moves later)
- The checkmate strategy for player 2 involves forcing player 1 into a trapped corner

In [18]:
AlphaBetaPlayer.verbose_depth     = True
AlphaBetaAreaPlayer.verbose_depth = True
play_sync(
    agents=(
        Agent(AlphaBetaPlayer,     'AlphaBeta'),
        Agent(AlphaBetaAreaPlayer, 'AlphaBetaArea'),
    ),
    verbose=True,
    verbose_depth=True,
    time_limit=1*1000,
)
AlphaBetaPlayer.verbose_depth     = False
AlphaBetaAreaPlayer.verbose_depth = False



AlphaBetaPlayer      | depth: 1 2 
AlphaBetaAreaPlayer  | depth: 1 2 3 
match: 0 | move: 2 | 1.00s | AlphaBetaAreaPlayer(1) => (6, 2)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   | 2 |   |   |   | 1 |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - 

AlphaBetaPlayer      | depth: 1 2 3 4 5 6 7 
AlphaBetaAreaPlayer  | depth: 1 2 3 4 5 
match: 0 | move: 18 | 1.00s | AlphaBetaAreaPlayer(1) => (11, -2)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   | X |   | X |   |   | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   | X |   |   |   | X | X | 2 | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   | X |   |   | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X |   |   |   |   |   | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   | X |   |   |   | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   | 1 |   |   |   |   |
+ - + - + - + - + - 

AlphaBetaPlayer      | depth: 1 2 3 4 5 6 7 8 
AlphaBetaAreaPlayer  | depth: 1 2 3 4 5 
match: 0 | move: 34 | 1.11s | AlphaBetaAreaPlayer(1) => (1, 2)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   |   |   | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | 2 |   | X | X |   | X |   |   | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X | X |   |   |   | X | X | X | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   | X | X |   | X | X | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
| X |   | X |   |   | X | X |   | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | 1 | X | X |   |   | X | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X |   |   |   | X |   |   | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   | X |   |   | X |   |   |   |   |
+ - + - + - + - + - 

AlphaBetaPlayer      | depth: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
AlphaBetaAreaPlayer  | depth: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 
match: 0 | move: 50 | 1.00s | AlphaBetaAreaPlayer(1) => (1, 2)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   | X |   | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   |   |   |   |   | 1 |   | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X |   | X | X | X | X | X |   | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X | X | X |   |   | X | X | X | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X | X | X | X | X | X | X | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
| X | 2 | X | X | X | X | X | X | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X | X | X |   |   | X | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X | X |   |   | X | X |   | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |  

AlphaBetaPlayer      | depth: 1 -inf 
match: 0 | move: 65 | 0.00s | AlphaBetaPlayer(0) => (11, 0)

+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X |   | X | X | X | X | X | X | 1 |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X |   | X | X |   | X | X | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X | X | X | X | X | X | X | X | 2 | X |
+ - + - + - + - + - + - + - + - + - + - + - +
| X |   | X | X | X |   |   | X | X | X | X |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X | X | X | X | X | X | X | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
| X | X | X | X | X | X | X | X | X | X |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   | X | X | X |   |   | X | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   | X | X |   |   | X | X |   | X |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +
|   |   |   | X | X |   | X |   |   |   |   |
+ - + - + - + - + - + - + - + - + - + - + - +

