AlphaGoZero - Reinforcement Learning Project

This is my implementation of the DeepMind's AlphaZero algorithm for the Game of Go. For reference, Mastering the game of Go without human knowledge

Our source code consists of the following files inside the utils_6 folder :

policyValueNet.py : Architecture and training procedure for the neural network
MCTS.py : Gives policy for a given state after running MCTS for 100 simulations.
selfPlay.py : Starts training the model after loading the latest checkpoint
utils.py : Contains helper functions to be used for MCTS & Policy-Value Network
config.py : Parameters required for the configuration
enums.py : Enumeration class for BLACK & WHITE colours

To install the dependencies of this project, run: pip install -r requirements.txt

To download the model in model_6 folder: sh download_model_6.sh

To start the train (best checkpoint will be automatically loaded): sh train.sh

To start playing with AlphaGoPlayer_6.py against a random/fixed agent, execute: python tournament.py

To visualise the decrement in the loss of policy & value (inside the utils_6 folder): tensorboard logdirs='./logs' --port=6006

Model can be downloaded from: https://drive.google.com/open?id=1qADVoBD2fx48wy7pNTe76S2qv0FxYF5D

Report link: https://docs.google.com/document/d/1ZFzSJXpWKzD6DVGYP27FuhNmCPUAKruKWqbLKJEeTks/edit?usp=sharing

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
model_6		model_6
utils_6		utils_6
.gitignore		.gitignore
AlphaGoPlayer_1.py		AlphaGoPlayer_1.py
AlphaGoPlayer_2.py		AlphaGoPlayer_2.py
AlphaGoPlayer_6.py		AlphaGoPlayer_6.py
README.md		README.md
Readme.txt		Readme.txt
Report.pdf		Report.pdf
download_model_6.sh		download_model_6.sh
goSim.py		goSim.py
requirements.txt		requirements.txt
single_match.py		single_match.py
time_handler.py		time_handler.py
tournament.py		tournament.py
train.sh		train.sh

Provide feedback