JavaScript HTML CSS
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.

README.md

UCThello icon UCThello

UCThello - a board game demonstrator with computer AI using Monte-Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short)

Keywords, Categories Monte-Carlo Tree Search (MCTS), Upper Confidence Bounds (UCB), UCB applied to trees (UCT), AI, 2-player board game, deterministic game with perfect information, JavaScript, ECMAScript, W3C WebWorker

Abstract

UCThello is a board game using Monte-Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short) for the computer player AI. The board game used for demonstration purposes of the UCT algorithm is close to a game named Othello depending on selected options. In fact it can be played depending on your configuration following the official tournament rules of the WOF - World Othello Federation - if intended [WOF14]. Other rule settings to play variants are available, too. Per design decision the playing strength is limited for pleasure and fun level. Thus it is kept at a moderate to quite strong level on purpose due to the target environment, device platform, and audience expectations. This is done e.g. by limitation of the maximum AI response time and using a single execution thread for AI only plus just a second independent execution thread for a responsive user interface to avoid battery drains if full CPU and GPU core support would be implemented leading to bad user experience. Other possible but at least currently postponed improvements could be done by simple usage of a well-known and available game opening book. Although such simple modifications could improve the playing strength these features are not implemented in the current version yet.

Othello is a derivative of the board game Reversi which can be played by UCThello as well. Reversi is claimed to be invented by either Lewis Waterman or John W. Mollett. Predecessor of Reversi created by Mollett is The game of Annexation, also called Annex back in 19th century.

Monte-Carlo Tree Search

The Monte-Carlo Tree Search (MCTS in short) represents an algorithms used to build a Search Tree interatively by successively adding nodes according to traversing of nodes and simulations in the problem domain. If the problem domain is a game then the nodes can represent moves according to the game rules. Traversing nodes follows a Selection Strategy. Simulations are often called playouts, too. The different nodes inside the simulated paths get statistics reflecting ratios of win and loss related to total amount of simulations. Assumption is that with higher total amount of simulations the confidence in the statistics gets high enough and allows to select quality nodes or moves. Such that the idea is to retrieve the acceptable next node or move with optimal ratio then.

The iterative MCTS algorithm is modelled to perform four main states typically called

  • Selection,
  • Expansion,
  • Simulation, and
  • Backpropagation. See [Cha10] & [CBSS08]

In UCThello the related code fragment for this loop is close to

Uct.prototype.getActionInfo = function ( board, maxIterations, verbose ) {
  var root = new UctNode(null, board, null);
  for(var iterations=0; iterations<maxIterations; iterations+=1) {
    var node = root;
    var variantBoard = board.copy();
    /* Selection */
    ...
    /* Expansion */
    ...
    /* Simulation */
    ...
    /* Backpropagation */
    ...
  }
  return { action : root.mostVisitedChild().action };
};

On a given board the most visited and therefore best information describing an action according to the rules performed by the current player shall be determined.

Selection

First step or state in an MCTS algorithm iteration is the Selection. Objective of the Selection is to retrieve a path beginning at the root node towards a selected leaf node from the search tree. The Search Tree stays fixed inside the Selection state. It grows in a later state of the algorithm by appending more nodes on each iteration of the MCTS. Only exception is when a selected path has a final leaf node that is a terminal node. A terminal node simply is a move representation of an end of game situation according to the rules. The root node represents the current game or problem domain situation. To traverse the search tree from the root node towards the leaf nodes simply means to follow a possible predicted variant of game play.

The objective of the Selection Strategy is to branch the intended search path in a balance of information exploration and exploitation. If a branch is selected following a search path branch already examined previously this is seen as an exploit. An exploit shall confirm the quality of an already examined node in terms of gaining higher statistical confidence. Higher statistical confidence does mean to have more reliable estimates. Exploration is performed by creating new nodes in later MCTS steps or alternatively search path branch selection of relatively rare traversed nodes. Nodes traversed in a low amount simply reflects a low reliability or statistical confidence. The border between exploit and explore is often seen as being soft and fluent.

Thus a selection of a child node to traverse next at each level of the already build search tree path is usually based on a quality value of the visited nodes in earlier iterations. An optimal Selection Strategy to best support the objective is unknown. One statistical approach called Upper Confidence Bounds (UCB) algorithm uses a logarithm based formula on collected quality values correlated to the nodes on the search path if applied to MCTS. The combination of MCTS and UCB called UCT (short for UCB applied to trees) is credited to [KS06]. Other approaches or additional supporting ideas for a Selection Strategy are presented and discussed e.g. in [CSUB06].

Besides the Selection Strategy in search path branch Selection an additional aspect is seen. To avoid a risk that any high quality node is unvisited that is located near the rood node already. To reach such a design goal a possible solution is to favor traversing any unexplored child node over following explored siblings. Widening the search tree is then favored over deepening. Critics could be that randomness of Monte-Carlo methods is reduced if applied.

In UCThello the select child step implements the UCT algorithm. The UCB related code is part of the UctNode.prototype.selectChild function.

Additionally UCThello implements to favor early Selection of a traversed node on any unexplored (or unexamined) child existing. Such an unexplored (or unexamined) child is preferred over continuing traversing any explored node.

var node = root;
var variantBoard = board.copy();
/* Selection */
while (node.unexamined.length == 0 && node.children.length > 0) {
  node = node.selectChild();
  variantBoard.doAction(node.action);
}

Expansion

The objective of the Expansion step is to add a new unexplored child of the node determined by the previous Selection.

If the node determined by the Selection is an inner node instead of a leaf node then this node has a combination of explored and unexplored children. Either way an unexplored child shall be added for the coming Simulation state. Only exception is that a leaf node has been reached representing a terminal node. In such a case no Expansion and Simulation is needed since a terminal node means that a end of game is implied at that node on the search path.

Sometimes you will find implementations where multiple Expansions take place on the Selection node. This simply means a set of child nodes is added at once then.

In UCThello exactly one node will be added unless a terminal node is reached and the list of remaining unexplored child nodes is determined before. To avoid any preferred order when getting a node from the set of remaining nodes or when a dependency from any parameter or state exists the returned node is selected randomly.

/* Selection */
...
/* Expansion */
if (node.unexamined.length > 0) {
  var j = Math.floor(Math.random() * node.unexamined.length);
  variantBoard.doAction(node.unexamined[j]);
  node = node.addChild(variantBoard, j);
}

Terminal nodes do not have any child nodes. So it is sufficient to check for the unexamined.length in case a terminal node has been selected.

Simulation

Now the objective of a Simulation is to playout a possible scenario starting from the newly expanded search tree leaf node. Simulation is performed until end of game is reached.

Mind the playout does not modify the expanded search tree leaf node. The fixed leaf node - respectively the correlated game state - is used as the base for the simulation only.

On each simulation step a player's action valid by the rules is performed on the created variant board. The variant board is used as a complete copy of the current board and game state. This is to avoid changes to the board and game state while following the full search path and simulation steps.

Instead of doing just a single playout alternatively several playouts could be started from the selected and expanded search tree leaf node. Idea behind this would be to save the run time needed for a possible choice of the same selection path in later iterations.

In UCThello a single playout is performed per iteration. The number of MCTS algorithm iterations equals the number of simulations then.

var variantBoard = board.copy();
/* Selection */
...
/* Expansion */
...
/* Simulation */
var actions = variantBoard.getActions();
while(actions.length > 0) {
  variantBoard.doAction(actions[Math.floor(Math.random() * actions.length)]);
  ...
  actions = variantBoard.getActions();
}

Backpropagation

Objective of the Backpropagation is to update the statistics of all nodes along the search tree path in reverse order until the root node is reached. The Simulation did not perform any changes on the search tree path. Since the search tree path is unchanged this means the eventually played or predicted result on the playout can be used to update statistics starting at the search tree path leaf node via the parent nodes until the root node is reached.

var node = root;
var variantBoard = board.copy();
/* Selection */
...
/* Expansion */
...
/* Simulation */
...
/* Backpropagation */
var result = variantBoard.getResult();
while(node) {
  node.update(result);
  node = node.parentNode;
}

In UCThello the UCT AI player does not maximize for the amount of discs of own color on board. Instead it analyzes the end of game situation just for any result being a win. The call variantBoard.getResult() returns an array of length two. The two values returned stand for the game result of the players in terms of win or loss. The winning player gets a full point while his opponent scores zero points. Meaning the result is either [ 1, 0 ] or [ 0, 1 ]. A draw or stalemate situation is represented as an array [ 0.5, 0.5 ]. Meaning a draw is better than a loss but shall be interpreted as half a win for both players.

The statistics for a node is updated by node.update(result). Mind the search tree node is representing a move of the active player according to the rules. The update picks the active player's end of game result from the result array and adds it to a statistics value representing the total amount of wins found traversing the search tree node over all MCTS iterations. Additionally the amount of visits for the node is increased.

References

3rd Party Libraries

Links

Othello Organizations

Mind that UCThello follows (most) official tournament rules of the listed organizations depending on your selected options. Still UCThello is independent development from any work of these organizations.

Contributors / Authors

Oliver Merkel,
Creative Commons License
This image is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Oliver Merkel, Creative Commons License, This image is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All logos, brands and trademarks mentioned belong to their respective owners.