Michael Pang edited this page Dec 23, 2017 · 22 revisions

Welcome to the chess-alpha-zero wiki!

What I'm doing on this fork:

Model: Diagram

  • Input: 12 planes for pieces, 4 planes for castling rights, 1 plane for the 50-move rule counter and 1 plane for en passant (no move history; side to move is handled by the flip-color transform). Simple, and in theory reduces overfitting
  • Hidden: a conv3-256 stem + 7 residual blocks, with batch norm in between (15 conv layers total)
  • Output: 1968-wide vector for policy, scalar for value
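The input encoding above can be sketched as a FEN-to-planes function. The plane order, the 50-move scaling, and the rank/file layout here are illustrative assumptions, not necessarily the repo's actual conventions:

```python
import numpy as np

PIECES = "PNBRQKpnbrqk"  # assumed plane order: white P N B R Q K, then black

def fen_to_planes(fen: str) -> np.ndarray:
    """Encode a FEN string as an 18x8x8 float array (rank 8 = row 0)."""
    board, _turn, castling, ep, halfmove = fen.split()[:5]
    planes = np.zeros((18, 8, 8), dtype=np.float32)
    # 12 piece planes: one-hot square occupancy per piece type
    for row, rank_str in enumerate(board.split("/")):
        col = 0
        for ch in rank_str:
            if ch.isdigit():
                col += int(ch)
            else:
                planes[PIECES.index(ch), row, col] = 1.0
                col += 1
    # 4 castling planes (K, Q, k, q), each all-ones while the right remains
    for i, right in enumerate("KQkq"):
        if right in castling:
            planes[12 + i] = 1.0
    # 50-move-rule plane: halfmove clock scaled to roughly [0, 1]
    planes[16] = int(halfmove) / 50.0
    # en-passant plane: mark the target square, if any
    if ep != "-":
        planes[17, 8 - int(ep[1]), ord(ep[0]) - ord("a")] = 1.0
    # side to move gets no plane: the flip-color transform covers it
    return planes
```

For the starting position this yields 32 occupied piece-plane squares and four filled castling planes.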

Speed improvements:

  • All workers are multithreaded/multiprocess
  • The SL (supervised learning) and opt (optimization) workers are especially fast, loading thousands of games in minutes, which is great for collecting more data!
  • The self-play/eval/uci workers are also several times faster.
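The multiprocess worker pattern above might look like the following minimal sketch. `play_one_game` is a hypothetical stand-in for a real self-play game, not the project's actual loop:

```python
from multiprocessing import Pool
import random

def play_one_game(seed: int) -> int:
    # Stand-in for a full self-play game; here it just returns a
    # pseudo-random game length so the pattern is runnable.
    rng = random.Random(seed)
    return rng.randint(20, 200)

def self_play_parallel(num_games: int, workers: int = 4) -> list:
    """Play num_games independently across a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # map distributes one game per seed and gathers results in order
        return pool.map(play_one_game, range(num_games))
```

Because each game is independent, self-play parallelizes almost linearly until the GPU (batched inference) becomes the bottleneck.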

SL techniques:

  • Weight the policy target by player Elo
  • Train on the material value of the position as an additional target
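Elo weighting can be sketched as a per-sample weight on the policy cross-entropy, so games by stronger players count for more. The weighting function and its constants (`floor`, `scale`) are illustrative assumptions, not the repo's actual formula:

```python
import numpy as np

def elo_weight(elo: float, floor: float = 1200.0, scale: float = 800.0) -> float:
    """Map a player's Elo to a loss weight >= 1 (constants are made up)."""
    return 1.0 + max(0.0, elo - floor) / scale

def weighted_policy_loss(pred: np.ndarray, target: np.ndarray,
                         elos: list) -> float:
    """Cross-entropy over move distributions, weighted by player Elo."""
    w = np.array([elo_weight(e) for e in elos])
    # per-sample cross-entropy between target policy and predicted policy
    ce = -np.sum(target * np.log(pred + 1e-8), axis=1)
    return float(np.sum(w * ce) / np.sum(w))
```

With these constants a 2500-rated player's moves weigh roughly twice as much as a 1500-rated player's.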

Other:

  • Extraneous bias removal

TODO:

  • Implement MCTS in C++
  • Variable regularization
  • Try 5x5 convs in the first few layers

Goals:

  • Get a model that beats the materialistic MCTS agent