hyper-alpha-zero

hyper-optimized alpha-zero implementation with ray + cython for speed

train an agent that beats random play and pure MCTS in 2 minutes

file structure

  • train.py: distributed training with ray
  • ctree/: mcts nodes in cython (node.py is the pure-python equivalent)
  • mcts.py: mcts playouts (see the sketch after this list)
  • network.py: neural net stuff
  • board.py: gomoku board
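
a minimal pure-python sketch of the PUCT selection step that each playout runs; the Node class and field names here are illustrative stand-ins, not the actual node.py / mcts.py API:

```python
import math

class Node:
    """one mcts tree node (hypothetical layout, mirroring a typical node.py)."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> child Node

    def value(self):
        # Q(s, a) = W(s, a) / N(s, a)
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_score(parent, child, c_puct=1.5):
    # alpha-zero selection rule: Q + c_puct * P * sqrt(N_parent) / (1 + N_child)
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + u

def select_child(node):
    # a playout descends by repeatedly taking the highest-scoring child
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1]))
```

this selection loop is the hot path that ctree/ reimplements in c++/cython (node.py keeps the pure-python version around).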

system design

  • ray distributed parts (train.py), sketched in code after this list:
    • one distributed replay buffer
    • N self-play actors holding the 'best model' weights, which play games against themselves and store the data in the replay buffer
    • M 'candidate models' which pull batches from the replay buffer and train
      • each iteration a candidate plays against the 'best model', and if it wins, the 'best model' weights are updated
      • write/evaluation locks guard the 'best weights'
    • 1 best-model weight store (PS / parameter server)
      • stores the best weights, which are retrieved by the self-play actors and updated when a candidate wins

  • cython impl
    • ctree/: c++/cython mcts
    • node.py: pure python mcts
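
a minimal runnable sketch of that loop, assuming toy stand-ins for every piece: ReplayBuffer, ParameterServer, self_play, and train_candidate are hypothetical names, and the weights, game data, and evaluation match are placeholders, not the repo's actual train.py:

```python
import random
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class ReplayBuffer:
    """one shared buffer: self-play actors push data, candidates sample it."""
    def __init__(self, capacity=10_000):
        self.data, self.capacity = [], capacity

    def add(self, samples):
        self.data.extend(samples)
        self.data = self.data[-self.capacity:]   # keep the newest samples

    def sample(self, n):
        return random.sample(self.data, min(n, len(self.data)))

@ray.remote
class ParameterServer:
    """holds the 'best model' weights; a version check stands in for the write/eval locks."""
    def __init__(self, weights):
        self.weights, self.version = weights, 0

    def get(self):
        return self.weights, self.version

    def update(self, weights, evaluated_version):
        # reject stale updates: the candidate must have beaten the *current* best
        if evaluated_version == self.version:
            self.weights, self.version = weights, self.version + 1
            return True
        return False

@ray.remote
def self_play(ps, buffer, games=10):
    # pull the current best weights, self-play, push (state, policy, value) records
    weights, _ = ray.get(ps.get.remote())
    data = [("state", [0.5, 0.5], 0.0)] * games   # placeholder for real game records
    buffer.add.remote(data)

@ray.remote
def train_candidate(ps, buffer, steps=100):
    # train on sampled batches, then challenge the best model
    weights, version = ray.get(ps.get.remote())
    for _ in range(steps):
        batch = ray.get(buffer.sample.remote(32))  # a gradient step would go here
    if random.random() > 0.45:                     # placeholder for the evaluation match
        ps.update.remote(weights, version)

buffer = ReplayBuffer.remote()
ps = ParameterServer.remote(weights={"step": 0})
ray.get([self_play.remote(ps, buffer) for _ in range(4)])        # N self-play actors
ray.get([train_candidate.remote(ps, buffer) for _ in range(2)])  # M candidates
```

the version check is a simplification of the write/evaluation locks above: a candidate can only promote weights if it evaluated against the best model's current version.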

-- todos --

  • jax network impl
  • tpu + gpu support
  • saved model weights
