- CrazyAra 0.1 used a training schedule similar to AlphaZero's:
A constant learning rate of 0.1, dropped by a factor of 10 whenever no improvement was made on the validation dataset for a given period.
- CrazyAra 0.2 uses a One-Cycle-Policy learning rate schedule combined with a momentum schedule. The maximum learning rate was determined using an LR-range test (a sketch of the schedule follows the references below).
- Smith and Topin - 2017 - Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates - https://arxiv.org/pdf/1708.07120.pdf
- Smith - 2018 - A disciplined approach to neural network hyper-parameters - https://arxiv.org/pdf/1803.09820.pdf
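
Below is a minimal, self-contained Python sketch of such a one-cycle schedule with a coupled momentum schedule. It is illustrative only: the maximum learning rate, momentum bounds, warm-up fraction, and cycle length are placeholder values rather than CrazyAra's actual hyper-parameters, and in practice the maximum learning rate would be read off the LR-range test mentioned above.

```python
# Sketch of a one-cycle learning-rate schedule with an inverse momentum schedule.
# All concrete numbers below are illustrative placeholders, not CrazyAra's settings.

def one_cycle(iteration, total_iterations, max_lr=0.35, div_factor=25.0,
              final_div_factor=1e4, max_momentum=0.95, min_momentum=0.85,
              pct_start=0.3):
    """Return (learning_rate, momentum) for the given training iteration.

    The learning rate ramps linearly from max_lr / div_factor up to max_lr
    during the first pct_start of training, then decays linearly down to
    max_lr / final_div_factor. Momentum moves in the opposite direction:
    it falls while the learning rate rises and rises while it falls.
    """
    warmup_iters = int(pct_start * total_iterations)
    base_lr = max_lr / div_factor
    final_lr = max_lr / final_div_factor

    if iteration < warmup_iters:
        # Linear ramp-up phase.
        t = iteration / max(1, warmup_iters)
        lr = base_lr + t * (max_lr - base_lr)
        momentum = max_momentum - t * (max_momentum - min_momentum)
    else:
        # Linear decay phase.
        t = (iteration - warmup_iters) / max(1, total_iterations - warmup_iters)
        lr = max_lr - t * (max_lr - final_lr)
        momentum = min_momentum + t * (max_momentum - min_momentum)
    return lr, momentum


# Example: print the schedule at a few points of a 100,000-iteration run.
for it in (0, 15_000, 30_000, 65_000, 99_999):
    lr, mom = one_cycle(it, total_iterations=100_000)
    print(f"iter {it:>6}: lr={lr:.4f}, momentum={mom:.3f}")
```

The coupling of the two schedules follows the cited papers: high momentum compensates for the small learning rate at the start and end of the cycle, while the large learning rate in the middle acts as a regularizer.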
The deeper model, which combines 7 standard residual blocks with 12 bottleneck residual blocks (sketched below the dataset description), was trained purely supervised on the same training and validation dataset:
- 569,537 human games played by lichess.org users from January 2016 to June 2018 (database.lichess.org/) in which both players had an Elo rating >= 2000
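
A bottleneck residual block replaces the two full-width 3x3 convolutions of a standard residual block with a 1x1 reduce / 3x3 / 1x1 expand pattern, which keeps the per-block cost low enough to afford the extra depth. The sketch below is written in PyTorch purely for illustration and is not taken from CrazyAra's code base; the channel count and reduction factor are placeholder assumptions.

```python
# Sketch of a bottleneck residual block (illustrative, not CrazyAra's implementation).
import torch
import torch.nn as nn


class BottleneckResidualBlock(nn.Module):
    """1x1 reduce -> 3x3 conv -> 1x1 expand, with an identity skip connection.

    The 3x3 convolution operates on a reduced number of channels, lowering the
    parameter count and FLOPs per block compared to a standard residual block.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        reduced = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity skip connection around the bottleneck body.
        return self.relu(x + self.body(x))


# Example: a board encoded as a (batch, channels, 8, 8) tensor keeps its shape.
if __name__ == "__main__":
    block = BottleneckResidualBlock(channels=256)
    dummy = torch.randn(1, 256, 8, 8)
    print(block(dummy).shape)  # torch.Size([1, 256, 8, 8])
```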
As can be seen in the graphs, the deeper model converged faster. Despite using half the batch size and being a deeper model, the total training time was reduced from previously ~40 hours to ~36.5 hours.
Overview of results for the evaluation metrics
Current overall best network, trained on the full dataset of games in which both players had an Elo rating over 2000:
| Metric | CrazyAra 0.1 | CrazyAra 0.2 |
| --- | --- | --- |