Supervised training

QueensGambit edited this page Dec 18, 2018 · 5 revisions


  • CrazyAra 0.1 used a training schedule similar to AlphaZero's:

The learning rate was held constant at 0.1 and dropped by a factor of 10 whenever no improvement on the validation dataset was observed for a given period.
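A plateau-based schedule of this kind can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CrazyAra's training code: the class name `PlateauLRSchedule` and its `patience` and `min_delta` parameters are hypothetical, and the "given period" is modeled as a number of consecutive validation evaluations without improvement in the validation loss.

```python
class PlateauLRSchedule:
    """Constant learning rate that is divided by `factor` after `patience`
    consecutive validation evaluations without improvement.

    Hypothetical helper: `patience` and `min_delta` are illustrative knobs,
    the wiki only says "for a given period".
    """

    def __init__(self, lr=0.1, factor=10.0, patience=3, min_delta=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # lower validation loss is better
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one validation loss and return the learning rate to use next."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr /= self.factor  # e.g. 0.1 -> 0.01 -> 0.001
                self.bad_evals = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauLRSchedule(lr=0.1, factor=10.0, patience=2)
    for loss in [1.50, 1.40, 1.41, 1.42, 1.39, 1.39, 1.39]:
        print(f"val_loss={loss:.2f} -> lr={sched.step(loss):g}")
```

Counting evaluations rather than wall-clock time keeps the rule independent of hardware speed; how CrazyAra actually measured the period is not stated here.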

  • CrazyAra 0.2 uses a one-cycle policy learning-rate schedule combined with a momentum schedule. The learning rate was determined with a learning-rate range test.
[Figures: learning-rate schedule and momentum schedule]
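A one-cycle schedule with an inverse momentum schedule, in the form popularized by Leslie Smith, might look like the sketch below. The function `one_cycle`, all default bounds, and the final low-learning-rate phase are assumptions for illustration; the exact values and phase lengths used for CrazyAra 0.2 are not given on this page.

```python
def one_cycle(step, total_steps, lr_min=0.005, lr_max=0.05,
              mom_max=0.95, mom_min=0.85, final_frac=0.1):
    """Return (learning_rate, momentum) for iteration `step` of `total_steps`.

    Phase 1 (first half of the cycle): lr rises linearly lr_min -> lr_max
    while momentum falls linearly mom_max -> mom_min.
    Phase 2 (second half of the cycle): both move back to their start values.
    Final phase (last `final_frac` of training): lr decays linearly towards
    lr_min / 100 while momentum is held at mom_max.
    All default values are hypothetical.
    """
    cycle_steps = int(round(total_steps * (1.0 - final_frac)))
    half = max(1, cycle_steps // 2)
    if step < half:
        t = step / half
        lr = lr_min + t * (lr_max - lr_min)
        mom = mom_max - t * (mom_max - mom_min)
    elif step < cycle_steps:
        t = (step - half) / max(1, cycle_steps - half)
        lr = lr_max - t * (lr_max - lr_min)
        mom = mom_min + t * (mom_max - mom_min)
    else:
        t = (step - cycle_steps) / max(1, total_steps - cycle_steps)
        lr = lr_min - t * (lr_min - lr_min / 100.0)
        mom = mom_max
    return lr, mom


if __name__ == "__main__":
    for s in (0, 25, 45, 70, 90, 99):
        lr, mom = one_cycle(s, 100)
        print(f"step {s:3d}: lr={lr:.5f} momentum={mom:.3f}")
```

Momentum moves opposite to the learning rate: when the step size is large, lower momentum keeps the updates from overshooting, and momentum is restored as the learning rate comes back down.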

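A learning-rate range test runs a short training session while increasing the learning rate exponentially and records the loss at each step. The sketch below assumes a hypothetical `train_step(lr)` callback that performs one update and returns the batch loss. The loss smoothing, the divergence cutoff, and the "one order of magnitude below the lowest-loss learning rate" selection rule are common heuristics, not necessarily the ones used for CrazyAra.

```python
import math


def lr_range_test(train_step, lr_start=1e-5, lr_end=10.0, num_steps=100,
                  beta=0.9, diverge_factor=4.0):
    """Sweep lr exponentially from lr_start to lr_end, calling the hypothetical
    train_step(lr) -> loss once per step. Returns (lrs, smoothed_losses) and
    stops early once the smoothed loss exceeds diverge_factor * best loss."""
    mult = (lr_end / lr_start) ** (1.0 / max(1, num_steps - 1))
    lr, avg, best = lr_start, 0.0, math.inf
    lrs, losses = [], []
    for i in range(num_steps):
        loss = train_step(lr)
        avg = beta * avg + (1 - beta) * loss       # exponential moving average
        smoothed = avg / (1 - beta ** (i + 1))     # bias correction
        if smoothed > diverge_factor * best:       # loss is blowing up
            break
        best = min(best, smoothed)
        lrs.append(lr)
        losses.append(smoothed)
        lr *= mult
    return lrs, losses


def suggest_lr(lrs, losses):
    """Heuristic: pick a learning rate ten times below the one with the lowest loss."""
    i_min = min(range(len(losses)), key=losses.__getitem__)
    return lrs[i_min] / 10.0


if __name__ == "__main__":
    # Toy loss curve with its minimum at lr = 0.01, standing in for real training.
    def toy_step(lr):
        return (math.log10(lr) + 2.0) ** 2 + 1.0

    lrs, losses = lr_range_test(toy_step, beta=0.0)
    print(f"suggested lr: {suggest_lr(lrs, losses):.2e}")
```

In a one-cycle setup the upper end of the usable range found this way typically becomes the peak learning rate and a value roughly ten times smaller becomes the starting rate.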

Training Data

The deeper model, which uses 7 standard residual blocks and 12 bottleneck residual blocks, was trained purely supervised on the same training and validation dataset:

  • 569,537 human games played by users between January 2016 and June 2018 in which both players had an Elo rating of at least 2000
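The selection criterion above can be expressed as a simple filter over PGN header tags. This is an illustrative sketch, not the project's preprocessing code: the function name `keep_game` is hypothetical, it assumes each game's headers are already parsed into a dict keyed by the standard PGN tags `WhiteElo`, `BlackElo`, and `Date` (format `YYYY.MM.DD`), and it assumes June 30, 2018 as the inclusive end date.

```python
from datetime import date


def keep_game(headers, min_elo=2000,
              start=date(2016, 1, 1), end=date(2018, 6, 30)):
    """Return True if both players are rated >= min_elo and the game date
    lies in [start, end]. Hypothetical helper; `headers` maps PGN tag names
    to string values. Games with missing or unparsable ratings/dates
    (e.g. "?" or "????.??.??") are rejected."""
    try:
        white = int(headers["WhiteElo"])
        black = int(headers["BlackElo"])
        y, m, d = (int(part) for part in headers["Date"].split("."))
        played = date(y, m, d)
    except (KeyError, ValueError):
        return False
    return white >= min_elo and black >= min_elo and start <= played <= end


if __name__ == "__main__":
    game = {"WhiteElo": "2150", "BlackElo": "2034", "Date": "2017.03.05"}
    print(keep_game(game))
```

Requiring both ratings to pass, rather than the average, keeps games where a strong player beats a much weaker one out of the dataset.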


Training Results

[Figures: training-result graphs]

As can be seen in the graphs, the deeper model converged faster. Despite using half the batch size and a deeper architecture, the total training time dropped from about 40 hours to about 36.5 hours.
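The quoted times correspond to a reduction of roughly 8.75 %, which is worth setting against the fact that halving the batch size roughly doubles the number of optimizer updates per pass over the data. The small helpers below make both calculations explicit; the dataset and batch sizes in the second print are hypothetical, since the actual batch sizes are not given on this page.

```python
import math


def relative_change(old, new):
    """Relative change from old to new (negative means a reduction)."""
    return (new - old) / old


def updates_per_epoch(num_samples, batch_size):
    """Number of optimizer steps needed for one pass over num_samples."""
    return math.ceil(num_samples / batch_size)


# Training times quoted above (both approximate).
print(f"training time change: {relative_change(40.0, 36.5):.2%}")  # -8.75%

# Hypothetical sizes: halving the batch roughly doubles the updates per epoch.
print(updates_per_epoch(1_000_000, 1024), updates_per_epoch(1_000_000, 512))
```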

Results overview by metric

Current overall best network, trained on the full 2000+ Elo game dataset:

Metric                            CrazyAra 0.1   CrazyAra 0.2
val_policy_loss                   1.2680         1.2647
val_value_loss                    0.7817         0.7386
val_policy_acc                    0.5930         0.5895
val_value_acc_sign                0.6818         0.7010
mate_in_one_policy_policy_loss    0.5859         0.5514
mate_in_one_value_loss            0.0769         0.0534
mate_in_one_acc                   0.939          0.939
mate_in_one_top_5_acc             0.997          0.998
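The accuracy metrics in the table can be read as top-k move accuracy and value-sign agreement. The sketch below shows one plausible way to compute them; it is an interpretation of the metric names, not CrazyAra's evaluation code. The helper names are hypothetical, and `value_sign_accuracy` assumes game outcomes of +1, -1, or 0 from the side to move, skipping draws because they carry no sign.

```python
def top_k_accuracy(policy_scores, targets, k=1):
    """Fraction of positions whose target move index is among the k
    highest-scored moves (k=1 for *_acc, k=5 for *_top_5_acc)."""
    hits = 0
    for scores, target in zip(policy_scores, targets):
        top_k = sorted(range(len(scores)), key=lambda i: scores[i],
                       reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


def value_sign_accuracy(pred_values, outcomes):
    """Fraction of decisive positions where the predicted value has the same
    sign as the outcome. Assumption: draws (outcome 0) are skipped and a
    prediction of exactly 0 counts as wrong."""
    pairs = [(p, o) for p, o in zip(pred_values, outcomes) if o != 0]
    correct = sum(p * o > 0 for p, o in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    targets = [1, 1, 0]
    print(top_k_accuracy(scores, targets, k=1), top_k_accuracy(scores, targets, k=2))
    print(value_sign_accuracy([0.4, -0.2, 0.1, -0.3], [1, 1, 0, -1]))
```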