Update MCTS documentation.

PiperOrigin-RevId: 274022018
Change-Id: Iaf546af20ea4488d3b51ef60bc80a5b1d06437f7
google-deepmind · Oct 10, 2019 · b1b321a
1 parent 4465531
Showing 2 changed files with 20 additions and 0 deletions.
diff --git a/open_spiel/algorithms/mcts.h b/open_spiel/algorithms/mcts.h
@@ -45,13 +45,23 @@
 // non-zero-sum games. It doesn't have any special handling for imperfect
 // information games.
 //
+// The implementation also supports backing up solved states, i.e. MCTS-Solver.
+// Solving is based on a max^n backup: a decision node is proven once all of
+// its children are proven (the player to move takes their maximum among the
+// proven child values), or as soon as one child is proven to be worth
+// Game::MaxUtility() to that player. It therefore works for multiplayer,
+// general-sum, and arbitrary-payoff games (not just win/loss/draw games).
+// Chance nodes are considered proven only if all children have the same value.
+//
 // Some references:
 // - Sturtevant, An Analysis of UCT in Multi-Player Games,  2008,
 //   https://web.cs.du.edu/~sturtevant/papers/multi-player_UCT.pdf
 // - Nijssen, Monte-Carlo Tree Search for Multi-Player Games, 2013,
 //   https://project.dke.maastrichtuniversity.nl/games/files/phd/Nijssen_thesis.pdf
 // - Silver, AlphaGo Zero: Starting from scratch, 2017
 //   https://deepmind.com/blog/article/alphago-zero-starting-scratch
+// - Winands, Bjornsson, and Saito, Monte-Carlo Tree Search Solver, 2008,
+//   https://dke.maastrichtuniversity.nl/m.winands/documents/uctloa.pdf
 
 
 namespace open_spiel {

diff --git a/open_spiel/python/algorithms/mcts.py b/open_spiel/python/algorithms/mcts.py
@@ -332,13 +332,23 @@ def mcts_search(self, state):
     non-zero-sum games. It doesn't have any special handling for imperfect
     information games.
 
+    The implementation also supports backing up solved states, i.e. MCTS-Solver.
+    Solving is based on a max^n backup: a decision node is proven once all of
+    its children are proven (the player to move takes their maximum among the
+    proven child values), or as soon as one child is proven to be worth
+    game.max_utility() to that player. It therefore works for multiplayer,
+    general-sum, and arbitrary-payoff games (not just win/loss/draw games).
+    Chance nodes are considered proven only if all children have the same value.
+
     Some references:
     - Sturtevant, An Analysis of UCT in Multi-Player Games,  2008,
       https://web.cs.du.edu/~sturtevant/papers/multi-player_UCT.pdf
     - Nijssen, Monte-Carlo Tree Search for Multi-Player Games, 2013,
       https://project.dke.maastrichtuniversity.nl/games/files/phd/Nijssen_thesis.pdf
     - Silver, AlphaGo Zero: Starting from scratch, 2017
       https://deepmind.com/blog/article/alphago-zero-starting-scratch
+    - Winands, Bjornsson, and Saito, Monte-Carlo Tree Search Solver, 2008,
+      https://dke.maastrichtuniversity.nl/m.winands/documents/uctloa.pdf
 
     Arguments:
       state: pyspiel.State object, state to search from
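
For readers of this change, the solved-state backup rule described in the new comments can be sketched roughly as follows. This is a minimal standalone illustration, not the code in mcts.py or mcts.h: the `Node` class, the `CHANCE` marker, and `backup_solved` are hypothetical names introduced here, and proven values are assumed to be stored as one return per player.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CHANCE = -1  # hypothetical marker for chance nodes (not the pyspiel constant)


@dataclass
class Node:
    """Hypothetical search node; `outcome` holds proven per-player returns."""
    player: int                             # player to move, or CHANCE
    outcome: Optional[List[float]] = None   # None while the node is unproven
    children: List["Node"] = field(default_factory=list)


def backup_solved(node: Node, max_utility: float) -> Optional[List[float]]:
    """Tries to prove `node` from its children; returns the proven outcome or None."""
    if not node.children:
        # Leaf: proven only if an outcome was already recorded (e.g. terminal).
        return node.outcome

    if node.player == CHANCE:
        # Chance node: proven only if every child is proven with the same value.
        first = node.children[0].outcome
        if first is not None and all(c.outcome == first for c in node.children):
            node.outcome = first
        return node.outcome

    # Decision node: max^n backup for the player to move.
    proven = [c for c in node.children if c.outcome is not None]
    if not proven:
        return None
    best = max(proven, key=lambda c: c.outcome[node.player])
    all_proven = len(proven) == len(node.children)
    # Proven if every child is proven, or if one proven child already gives
    # the player to move the maximum utility available in the game.
    if all_proven or best.outcome[node.player] == max_utility:
        node.outcome = best.outcome
    return node.outcome
```

In this sketch a decision node with one proven winning child is solved immediately, while a node whose best proven child is only a draw stays unsolved until all of its siblings are proven too.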