In [1]:
import logging
import random

import numpy as np

import gym
from gym.envs.board_game import go

logging.basicConfig(level=logging.INFO)

DependencyNotInstalled: No module named 'pachi_py'. (HINT: you may need to install the Go dependencies via "pip install gym[pachi]".)

# ACS2 in Go environment
## Playing Go with Pachi
The OpenAI Gym environment for Go uses the *Pachi* engine (Go is also known as Weiqi in Chinese and Baduk in Korean).

From the [docs](http://repo.or.cz/pachi.git/blob_plain/HEAD:/README) we can read that: 

> The default engine plays by Chinese rules and should be about 7d KGS strength on 9x9. On 19x19 (using e.g. six-way Intel i7), it can hold a solid KGS 2d rank.  When using a large cluster (64 machines, 20 cores each), it maintains KGS 3d to 4d and has won e.g. a 7-stone handicap game against Zhou Junxun 9p. 

> By default, Pachi currently uses the UCT engine that combines Monte Carlo approach with tree search; UCB1AMAF tree policy using the RAVE method is used for tree search, while the Moggy playout policy using 3x3 patterns and various tactical checks is used for the semi-random Monte Carlo playouts.  Large-scale board patterns are used in the tree search.

> At the same time, we keep trying a wide variety of other approaches and enhancements. Pachi is an active research platform and quite a few improvements have been already achieved. We rigorously play-test new features and enable them by default only when they give a universal strength boost.

Start by creating the environment, resetting it to its initial state, and printing the board to the console.

In [None]:
env = gym.make('Go9x9-v0')

# Reset the state
state = env.reset()

# Render current state
print(env._state)

Pachi uses its own representation of the game state. To perform an action we need to convert it from its symbolic form (such as *A8*) to Pachi's internal numeric representation. OpenAI Gym provides a function for this mapping, but to know which actions are possible we first need to enumerate all moves in symbolic form.

In [None]:
def moves_9x9():
    """
    Return a list of all board moves (such as A8) on a 9x9 board.
    """
    # Go coordinates skip the letter 'I': the nine columns are A-H and J
    cols = "ABCDEFGHJ"
    rows = range(1, 9 + 1)

    return ["{}{}".format(col, row) for col in cols for row in rows]

We can define a function that maps an action index from `0-80` to Pachi's internal representation.

In [None]:
def map_action(action_idx):
    """
    Maps an action index in [0, num_actions) to the Pachi representation.
    """
    move = moves_9x9()[action_idx]
    return go.str_to_action(env._state.board, move)

Let's create a map of all available actions.

In [None]:
action_map = {i: map_action(i) for i in range(len(moves_9x9()))}
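
Go coordinate labels conventionally skip the letter I, so a 9x9 board runs from A1 to J9. The sketch below is independent of Gym and Pachi and only illustrates the symbolic side of the mapping. `GO_COLUMNS`, `coordinate_labels` and `parse_coordinate` are hypothetical helper names introduced here, not part of any library.

```python
# Column letters for boards up to 19x19; 'I' is skipped by convention.
GO_COLUMNS = "ABCDEFGHJKLMNOPQRST"


def coordinate_labels(size=9):
    """Return labels such as 'A1' for every point of a size x size board."""
    return ["{}{}".format(col, row)
            for col in GO_COLUMNS[:size]
            for row in range(1, size + 1)]


def parse_coordinate(label):
    """Convert a label such as 'A8' to zero-based (column, row) indices."""
    column = GO_COLUMNS.index(label[0].upper())
    row = int(label[1:]) - 1
    return column, row


labels = coordinate_labels(9)
print(len(labels))              # 81
print(labels[0], labels[-1])    # A1 J9
print(parse_coordinate("A8"))   # (0, 7)
```

Keeping the label list in a fixed order means that a plain list index can serve as the agent's action number.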

To make a move, let's pick a random opening action.

In [None]:
# Sample from the mapped Pachi actions explicitly rather than indexing the dict
action = random.choice(list(action_map.values()))

print("Random Pachi action id: [{}]".format(action))

Now we can execute it and inspect everything the environment returns.

In [None]:
state, reward, done, info = env.step(action)

print("Reward: [{}]".format(reward))
print("Done: [{}]".format(done))
print(info['state'])

## ACS2

In [None]:
# Import PyALCS code from local path
# import sys
# sys.path.append('/Users/khozzy/Projects/pyalcs')

from alcs import ACS2, ACS2Configuration

# Enable automatic module reload
%load_ext autoreload
%autoreload 2

Configure the agent. We have to specify the classifier length (the number of points on the board) and the number of possible actions (all possible moves). The state returned by the environment cannot be used directly as an ACS2 perception, so we define a dedicated mapper function.

In [None]:
def map_state(state):
    """
    Returns the board of the given state as a flattened list.
    Black stones are represented as 'B', white stones as 'W'.
    """
    black_stones = state[0]
    white_stones = state[1] * 2
    
    board = (black_stones + white_stones).astype('str')
    board[board == '1'] = 'B'
    board[board == '2'] = 'W'
    
    return list(board.flatten())
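
To see what the mapper produces, here is a toy 3x3 example. The observation layout (plane 0 for black stones, plane 1 for white stones, plane 2 for empty points) is an assumption inferred from how `map_state` indexes `state`. The function is repeated only to keep the snippet self-contained, and `toy_state` is made-up data.

```python
import numpy as np


def map_state(state):
    """Same logic as the mapper above, repeated for a standalone run."""
    black_stones = state[0]
    white_stones = state[1] * 2

    board = (black_stones + white_stones).astype('str')
    board[board == '1'] = 'B'
    board[board == '2'] = 'W'

    return list(board.flatten())


# One black stone in the corner, one white stone in the centre
black = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
white = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
empty = 1 - black - white
toy_state = np.stack([black, white, empty])

perception = map_state(toy_state)
print("".join(perception))  # B000W0000
```

Empty points stay as `'0'`, so every perception has exactly one symbol per board point.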

We want our agent to focus more on exploration than exploitation (`epsilon` parameter).

In [None]:
CLASSIFIER_LENGTH = env._state.board.size ** 2
NUMBER_OF_POSSIBLE_ACTIONS = len(moves_9x9())

cfg = ACS2Configuration(
    classifier_length=CLASSIFIER_LENGTH,
    number_of_possible_actions=NUMBER_OF_POSSIBLE_ACTIONS,
    perception_mapper_fcn=map_state,
    action_mapping_dict=action_map,
    epsilon=0.7
)

print(cfg)
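
As a rough illustration of what an exploration rate means, the sketch below implements a generic epsilon-greedy choice. It assumes `epsilon` is the probability of picking a random action, which is the textbook reading. It is not taken from the ACS2 implementation, and `choose_action` and the value list are hypothetical.

```python
import random


def choose_action(action_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    # exploit: index of the highest estimated value
    return max(range(len(action_values)), key=lambda a: action_values[a])


rng = random.Random(0)       # fixed seed for repeatability
values = [0.1, 0.9, 0.3]     # made-up value estimates; action 1 is the best
picks = [choose_action(values, 0.7, rng) for _ in range(10000)]

# With epsilon=0.7 the best action is expected about 0.3 + 0.7 / 3 of the time
greedy_share = picks.count(1) / len(picks)
print(round(greedy_share, 2))
```

With a high exploration rate the agent still favours the best known action, but spends most of its decisions trying alternatives.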

Now we can build the agent using the previously defined configuration.

In [None]:
agent = ACS2(cfg)

Run the agent in exploration mode. It returns the resulting classifier population together with the collected metrics.

In [None]:
population, metrics = agent.explore(env, 10)

In [None]:
population