# Quickstart Guide

OAZ has three built-in games: `Bandits`, `TicTacToe` and `ConnectFour`. Let's create an instance of the `TicTacToe` game.

In [None]:
from pyoaz.games.tic_tac_toe import TicTacToe
from pyoaz.games.tic_tac_toe.viz import view_board

In [None]:
game = TicTacToe()

Let's play a few moves. Moves are integers in $\{0, 1, \ldots, 8\}$. To place a token in row $i$, column $j$ (with $i, j \in \{0, 1, 2\}$), play move $i + 3j$.
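The helper functions below make the move numbering concrete. This is a plain-Python sketch that only encodes the $i + 3j$ convention stated above. `cell_to_move`, `move_to_cell` and `render_moves` are hypothetical helpers, not part of `pyoaz`, and the text rendering is only a stand-in for `view_board`.

```python
from typing import List, Tuple


def cell_to_move(row: int, col: int) -> int:
    """Move index for a token in (row, col), using move = row + 3 * col."""
    if not (0 <= row < 3 and 0 <= col < 3):
        raise ValueError("row and col must be in {0, 1, 2}")
    return row + 3 * col


def move_to_cell(move: int) -> Tuple[int, int]:
    """Inverse mapping: recover (row, col) from a move index in 0..8."""
    if not 0 <= move <= 8:
        raise ValueError("move must be in {0, ..., 8}")
    return move % 3, move // 3


def render_moves(moves: List[int]) -> str:
    """Text picture of the board: Player 1 is X, Player 2 is O, players alternate."""
    grid = [["." for _ in range(3)] for _ in range(3)]
    for turn, move in enumerate(moves):
        row, col = move_to_cell(move)
        grid[row][col] = "X" if turn % 2 == 0 else "O"
    return "\n".join(" ".join(cells) for cells in grid)


print(render_moves([0, 4, 8, 2]))
```

Under this convention, the four moves played below put Player 1 on $(0, 0)$ and $(2, 2)$, and Player 2 on $(1, 1)$ and $(2, 0)$.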

In [None]:
game.play_move(0) # Player 1's first move
game.play_move(4) # Player 2's first move
game.play_move(8) # Player 1's second move
game.play_move(2) # Player 2's second move

In [None]:
view_board(game.board)

## Monte Carlo Tree Search

Let's use Monte Carlo Tree Search (MCTS) to find the best move from this position.

In [None]:
from pyoaz.search import Search, select_best_move_by_visit_count
from pyoaz.selection import UCTSelector
from pyoaz.evaluator.simulation_evaluator import SimulationEvaluator
from pyoaz.thread_pool import ThreadPool

In [None]:
thread_pool = ThreadPool(n_workers=1) # Set to the number of CPU cores on your machine
evaluator = SimulationEvaluator(thread_pool=thread_pool)
selector = UCTSelector()

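During search, the selector decides which child node to descend into next. The following is a minimal sketch of the textbook UCT rule (UCB1 applied to trees). It assumes each child stores a running sum of values and a visit count, and it uses an exploration constant of $\sqrt{2}$. The formula and constant inside `UCTSelector` may differ. `uct_score` and `select_child` are hypothetical names used only for illustration.

```python
import math
from typing import Dict, Tuple


def uct_score(value_sum: float, visits: int, parent_visits: int,
              c: float = math.sqrt(2)) -> float:
    """UCB1-style score: mean value (exploitation) plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    exploitation = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(stats: Dict[int, Tuple[float, int]], parent_visits: int,
                 c: float = math.sqrt(2)) -> int:
    """Return the move whose child has the highest UCT score.

    `stats` maps each move to a (value_sum, visits) pair for that child.
    """
    return max(stats, key=lambda move: uct_score(*stats[move], parent_visits, c))


print(select_child({0: (3.0, 5), 1: (1.0, 2)}, parent_visits=7))
```

In this example, move 0 has the higher mean value (0.6 vs 0.5). However, move 1 has been visited less often, so its larger exploration bonus makes it the selected child.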
In [None]:
search = Search(
    game=game,
    selector=selector,
    thread_pool=thread_pool,
    evaluator=evaluator,
    n_iterations=100_000
)

In [None]:
best_move = select_best_move_by_visit_count(search)

In [None]:
print(best_move)

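Choosing the final move by visit count, rather than by mean value, is the usual MCTS convention: the most-visited child is the one the search explored most thoroughly. Here is a sketch of that selection rule. The dictionary input is a hypothetical stand-in for the root's statistics in the search tree, and the counts are invented for illustration.

```python
from typing import Dict


def best_move_by_visit_count(visit_counts: Dict[int, int]) -> int:
    """Return the root move whose child was visited most often during search."""
    return max(visit_counts, key=visit_counts.get)


# Hypothetical root statistics: move -> number of visits.
print(best_move_by_visit_count({1: 10, 3: 5, 6: 120, 7: 2}))
```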
In [None]:
game.play_move(best_move)

In [None]:
view_board(game.board)

## Train an Alpha Zero agent

First, create a neural network model. We use a convenience function to create a ResNet model, but you could also implement your own architecture.

In [None]:
from pyoaz.models import create_tic_tac_toe_model
from pyoaz.utils import get_keras_model_node_names
import tensorflow.compat.v1 as tf
from tensorflow.compat.v1 import disable_v2_behavior
from tensorflow.compat.v1.keras import backend as K

In [None]:
disable_v2_behavior() # OAZ is not yet compatible with eager execution mode

In [None]:
keras_model = create_tic_tac_toe_model(depth=3)
keras_model.compile(
    loss={
        "policy": "categorical_crossentropy",
        "value": "mean_squared_error",
    },
    optimizer=tf.keras.optimizers.SGD()
)

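For reference, the two losses passed to `compile` correspond to the per-position training objective from the AlphaZero paper:

$$
l = (z - v)^2 - \boldsymbol{\pi}^\top \log \mathbf{p} + c \lVert \theta \rVert^2
$$

The symbols are as follows:

- $v$ and $\mathbf{p}$ are the outputs of the value and policy heads for a position.
- $z$ is the game outcome.
- $\boldsymbol{\pi}$ is the MCTS visit distribution used as the policy target. These are presumably what self-play returns under `"Values"` and `"Policies"` in the training loop.
- $\theta$ denotes the network weights and $c$ the L2 regularization weight.

`"mean_squared_error"` supplies the first term and `"categorical_crossentropy"` the second. The `compile` call above adds no explicit L2 term. Whether `create_tic_tac_toe_model` adds layer-level weight regularization is not stated in this guide.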
In [None]:
K.get_session().run(tf.global_variables_initializer())

By default, the input node is named `input`, the value head `value` and the policy head `policy`. However, if the TensorFlow graph is not empty, these nodes may have been renamed, because TensorFlow keeps node names unique by appending a suffix such as `_1`. The next cell fetches the actual node names.

In [None]:
input_node_name, value_node_name, policy_node_name = get_keras_model_node_names(keras_model)

In [None]:
from pyoaz.self_play import SelfPlay
import numpy as np

Now let's implement a (simplified) Alpha Zero training loop. For a more complete implementation, use the `Trainer` class from `pyoaz.training.trainer`.

In [None]:
for i in range(50):
    self_play = SelfPlay(
        game=TicTacToe, # Game class used to create new game instances
        n_tree_workers=1, # Number of threads working simultaneously on one MCTS tree
        n_games_per_worker=100, # Number of games to play per Python worker thread
        n_workers=4, # Size of the underlying ThreadPool object
        n_threads=8, # Number of Python threads to use
        evaluator_batch_size=4, # Number of game positions to accumulate before running neural network inference
        epsilon=0.25, # Weight of the Dirichlet noise mixed into the root priors (see Alpha Zero paper)
        alpha=1.0, # Concentration parameter of that Dirichlet noise (see Alpha Zero paper)
    )
    dataset = self_play.self_play(
        session=K.get_session(),
        input_node_name=input_node_name,
        value_node_name=value_node_name,
        policy_node_name=policy_node_name,
    )
    
    dataset_size = dataset["Boards"].shape[0]

    train_select = np.random.choice(
        a=[False, True], size=dataset_size, p=[0.1, 0.9]
    )
    validation_select = ~train_select

    train_boards = dataset["Boards"][train_select]
    train_policies = dataset["Policies"][train_select]
    train_values = dataset["Values"][train_select]

    validation_boards = dataset["Boards"][validation_select]
    validation_policies = dataset["Policies"][validation_select]
    validation_values = dataset["Values"][validation_select]
    
    validation_data = (
        validation_boards,
        {"value": validation_values, "policy": validation_policies},
    )

    keras_model.fit(
        train_boards,
        {"value": train_values, "policy": train_policies},
        validation_data=validation_data,
        batch_size=512,
        epochs=2,
        verbose=1,
    )
    

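The `epsilon` and `alpha` arguments follow the AlphaZero paper. There, exploration at the root is encouraged by mixing Dirichlet noise into the network's prior probabilities: $P(s, a) = (1 - \varepsilon)\,p_a + \varepsilon\,\eta_a$ with $\eta \sim \mathrm{Dir}(\alpha)$. The sketch below shows that mixing step in NumPy and assumes the priors are already restricted to legal moves. `add_dirichlet_noise` is a hypothetical helper. `SelfPlay` presumably performs an equivalent step internally, and its details may differ.

```python
import numpy as np


def add_dirichlet_noise(priors, epsilon=0.25, alpha=1.0, rng=None):
    """Return (1 - epsilon) * priors + epsilon * eta, with eta ~ Dir(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    priors = np.asarray(priors, dtype=float)
    eta = rng.dirichlet(np.full(priors.shape[0], alpha))  # symmetric Dirichlet sample
    return (1.0 - epsilon) * priors + epsilon * eta


noisy = add_dirichlet_noise([0.5, 0.3, 0.2], rng=np.random.default_rng(0))
print(noisy)
```

Both the priors and the noise sum to 1, so the mixed vector is still a probability distribution. A smaller `alpha` concentrates the noise on fewer moves.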
In [None]:
game = TicTacToe()
game.play_move(0) # Player 1's first move
game.play_move(4) # Player 2's first move
game.play_move(8) # Player 1's second move
game.play_move(2) # Player 2's second move

In [None]:
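outputs = keras_model.predict(game.canonical_board.reshape((1, 3, 3, 2)))
# predict returns one array per model output; select the policy head by name
policy = outputs[keras_model.output_names.index("policy")]
best_move = np.argmax(policy[0])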

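The network assigns some probability to every move, so taking `np.argmax` over the raw policy can in principle return a square that is already occupied. A common safeguard is to mask illegal moves before taking the argmax. The sketch below assumes the policy is a length-9 vector indexed by move number and that the legal moves are supplied as a list, written out by hand here for the position above. `masked_argmax` is a hypothetical helper, and the probabilities are invented.

```python
import numpy as np


def masked_argmax(policy, legal_moves):
    """Return the legal move with the highest policy probability."""
    policy = np.asarray(policy, dtype=float).reshape(-1)  # flatten e.g. a (1, 9) batch
    masked = np.full_like(policy, -np.inf)  # illegal moves can never win the argmax
    masked[legal_moves] = policy[legal_moves]
    return int(np.argmax(masked))


# Invented policy output for the position above; squares 0, 2, 4 and 8 are occupied.
policy = [0.5, 0.1, 0.05, 0.05, 0.1, 0.0, 0.15, 0.0, 0.05]
print(masked_argmax(policy, legal_moves=[1, 3, 5, 6, 7]))
```

With this invented policy, a plain argmax would pick the occupied square 0, while the masked version picks 6.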
In [None]:
print(best_move) # As with MCTS, the best move should still be 6, which blocks Player 2's 2-4-6 diagonal