A student implementation of Alpha Go Zero
Switch branches/tags
Nothing to show
Clone or download
dylan djian
Latest commit 27da9db Aug 1, 2018

README.md

SuperGo

A student implementation of AlphaGo Zero paper with documentation.

Ongoing project.

TODO (in order of priority)

  • Do something about the process leaking
  • File of constants that match the paper constants ?
  • OGS / KGS API ?
  • Use logging instead of prints ?

CURRENTLY DOING

  • Optimizations
  • Clean code, create install script, write documentation
  • Trying to see if it learns something on my computer

DONE

  • Statistics (branch statistics)
  • Game that are longer than the threshold of moves are now used
  • MCTS
    • Tree search
    • Dirichlet noise to prior probabilities in the rootnode
    • Adaptative temperature (either take max or proportionally)
    • Sample random rotation or reflection in the dihedral group
    • Multithreading of search
    • Batch size evaluation to save computation
  • Dihedral group of board for more training samples
  • Learning without MCTS doesnt seem to work
  • Resume training
  • GTP on trained models (human.py, to plug with Sabaki)
  • Learning rate annealing (see this)
  • Better display for game (viewer.py, converting self-play games into GTP and then using Sabaki)
  • Make the 3 components (self-play, training, evaluation) asynchronous
  • Multiprocessing of games for self-play and evaluation
  • Models and training without MCTS
  • Evaluation
  • Tromp Taylor scoring
  • Dataset ring buffer of self-play games
  • Loading saved models
  • Database for self-play games

LONG TERM PLAN ?

  • Compile my own version of Sabaki to watch games automatically while traning
  • Resignation ?
  • Training on a big computer / server once everything is ready ?

Resources

Statistics, check branch stats

For a 10 layers deep Resnet

9x9 board

soon

19x19 board

Differences with the official paper

  • No resignation
  • PyTorch instead of Tensorflow
  • Python instead of (probably) C++ / C