Johannes Czech edited this page Apr 11, 2021 · 20 revisions

Strength Evaluation v0.8.4

This page describes how to replicate the experiments presented in our paper Improving AlphaZero Using Monte-Carlo Graph Search.

Binaries for Linux and Windows using the TensorRT backend can be found in release 0.8.4.

⚠️ The ClassicAra binary is not crash-free. The crashes were caused by a variable overflow of numberParentNodes and a wrong ply-counter check (#83). A fix is applied in release 0.9.0.

The ClassicAra and CrazyAra 0.8.4 binaries expect the Model_Directory to be model/ while release 0.9.0 expects it to be model/chess and model/crazyhouse by default instead (#75).

There are three ways to fix this problem.

    1. Manually set the Model_Directory before calling isready: setoption name Model_Directory value model/chess
    2. Move the files from model/chess into model/.
    3. If you have already generated the trt-files, you should be able to edit the UCI-Option Model_Directory directly via the GUI.
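
The first option depends on the order in which the commands reach the engine. The following is a minimal sketch of that order, not part of the CrazyAra code base; the helper name `uci_startup_commands` is hypothetical, and it assumes the engine reads plain UCI commands from its standard input.

```python
def uci_startup_commands(model_directory="model/chess"):
    """Return UCI commands in the order the engine should receive them.

    The Model_Directory option has to be set before "isready",
    because "isready" is the point at which the option must already hold.
    """
    return [
        "uci",
        f"setoption name Model_Directory value {model_directory}",
        "isready",
    ]


if __name__ == "__main__":
    # Print the commands, one per line, as they would be sent to the engine.
    print("\n".join(uci_startup_commands()))
```

The printed lines can be written to the standard input of the engine process, or typed manually in a terminal session with the engine.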

Hardware Setup

Hardware / Software       Description
GPU                       NVIDIA GeForce RTX2070 OC
Backend                   TensorRT-7.0.0.11, float16 precision
GPU-Driver                CUDA 10.2, cuDNN 7.6.5
CPU                       AMD® Ryzen 7 1700 eight-core processor × 16
Memory (RAM)              31.4 GiB
Operating System          Ubuntu 18.04.3 LTS, 64-bit
Tournament Environment    Cutechess 1.1.0
CrazyAra                  500da21e0bd9152657adbbc6118f3ebbc660e449
Multi-Variant-Stockfish   2020-06-13
Stockfish                 12-NNUE, nn-82215d0fd0df.nnue

Opening Suites

The following opening suites were used when conducting the engine tournaments:

UCI-Options

Multi-Variant-Stockfish (2020-06-13)

All default except:

  • Threads: 8

Stockfish 12-NNUE

All default except:

  • Threads: 2

CrazyAra 0.8.4

The configuration labeled as AlphaZero* in the paper corresponds to the following options:

  • Search_Type: MCTS
  • Context: gpu
  • Device_ID: 0
  • Batch_Size: 16
  • Threads: 2
  • Centi_CPuct_Init: 250
  • CPuct_Base: 19652
  • Centi_Dirichlet_Epsilon: 0
  • Centi_Dirichlet_Alpha: 20
  • Centi_U_Init: 100
  • Centi_U_Min: 100
  • U_Base: 1965
  • Centi_U_Init_Divisor: 100
  • Centi_Q_Value_Weight: 0
  • Centi_Q_Thresh_Init: 50
  • Centi_Q_Thresh_Max: 90
  • Q_Thresh_Base: 1965
  • Max_Search_Depth: 99
  • Centi_Temperature: 80
  • Temperature_Moves: 0
  • Centi_Temperature_Decay: 92
  • Centi_Node_Temperature: 200
  • Virtual_Loss: 1
  • Nodes: 0
  • Allow_Early_Stopping: True
  • Use_Raw_Network: False
  • Enhance_Checks: False
  • Enhance_Captures: False
  • Use_Transposition_Table: False
  • Use_TensorRT: True
  • Fixed_Movetime: 5000
  • Model_Directory: model/
  • Move_Overhead: 50
  • Centi_Random_Move_Factor: 0
  • Use_Random_Playout: False
  • MCTS_Solver: False

The configuration labeled as MCGS-Combined in the paper corresponds to AlphaZero* with the following overrides.

Note: Use_Transposition_Table = True activates the MCGS in this case.

  • Centi_Q_Value_Weight: 200 (Q-Values for Move)
  • Enhance_Checks: True (Enhanced Checks)
  • Use_Transposition_Table: True (MCGS)
  • Use_Random_Playout: True (Epsilon-Greedy)
  • MCTS_Solver: True (Terminal Solver)
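
The relation between the two configurations can be sketched as a base dictionary plus overrides. This is an illustrative sketch only: the names `ALPHAZERO_STAR`, `MCGS_COMBINED_OVERRIDES`, `merge_options` and `to_setoption_commands` are hypothetical, the base dictionary lists only the options that MCGS-Combined changes, and the lowercase `true`/`false` spelling assumes the engine follows the UCI convention for check options.

```python
# Subset of the AlphaZero* options listed above: only those that
# MCGS-Combined changes. All other options stay at their AlphaZero* values.
ALPHAZERO_STAR = {
    "Centi_Q_Value_Weight": 0,
    "Enhance_Checks": False,
    "Use_Transposition_Table": False,
    "Use_Random_Playout": False,
    "MCTS_Solver": False,
}

# Overrides that turn AlphaZero* into MCGS-Combined.
MCGS_COMBINED_OVERRIDES = {
    "Centi_Q_Value_Weight": 200,      # Q-Values for Move
    "Enhance_Checks": True,           # Enhanced Checks
    "Use_Transposition_Table": True,  # MCGS
    "Use_Random_Playout": True,       # Epsilon-Greedy
    "MCTS_Solver": True,              # Terminal Solver
}


def merge_options(base, overrides):
    """Return a new dict with the overrides applied on top of the base."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def to_setoption_commands(options):
    """Convert an option dict into UCI setoption command strings."""
    commands = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: boolean options are sent as lowercase true/false.
            value = "true" if value else "false"
        commands.append(f"setoption name {name} value {value}")
    return commands


if __name__ == "__main__":
    mcgs_combined = merge_options(ALPHAZERO_STAR, MCGS_COMBINED_OVERRIDES)
    print("\n".join(to_setoption_commands(mcgs_combined)))
```

Applying a single override at a time on top of `ALPHAZERO_STAR` gives one search modification in isolation, which is how the individual variants in the result tables differ from the combined one.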

Nodes per Second (NPS)

  • Multi-Variant-Stockfish: 7.9 Million NPS
  • Stockfish 12-NNUE: 1.6 Million NPS
  • CrazyAra 0.8.4: 17 K NPS

Experiments

The raw data for all experiments shown in Figures 3 to 7 is given below.

The experiments used either a Fixed_Movetime between 100 and 5000 ms or a fixed number of Simulations and Nodes, so that the time manager does not distort the results.

Figure 3

  • Elo development relative to the number of neural network evaluations in crazyhouse.
Rank Name                                                                 Elo     +/-   Games   Score   Draws
   1 CrazyAra-0.8.4-ALL [3200sim]                                          371      27    1490   89.4%    4.0%
   2 CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 [3200sim]     356      26    1488   88.6%    4.0%
   3 CrazyAra-0.8.4-Random-Playout [3200sim]                               346      26    1490   88.0%    3.6%
   4 CrazyAra-0.8.4-Solver [3200sim]                                       342      26    1490   87.8%    3.7%
   5 CrazyAra-0.8.4-CHECK-ENHANCE [3200sim]                                341      25    1488   87.7%    5.0%
   6 CrazyAra-0.8.4-Q-2.0 [3200sim]                                        339      25    1490   87.6%    4.3%
   7 CrazyAra-0.8.4-DAG [3200sim]                                          338      25    1488   87.5%    4.4%
   8 CrazyAra-0.8.4-ALL [1600sim]                                          204      20    1490   76.4%    4.6%
   9 CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 [1600sim]     196      20    1488   75.5%    4.5%
  10 CrazyAra-0.8.4-CHECK-ENHANCE [1600sim]                                195      20    1488   75.5%    4.0%
  11 CrazyAra-0.8.4-DAG [1600sim]                                          194      20    1488   75.4%    4.1%
  12 CrazyAra-0.8.4-Random-Playout [1600sim]                               192      20    1490   75.1%    3.8%
  13 CrazyAra-0.8.4-Q-2.0 [1600sim]                                        180      20    1488   73.9%    4.4%
  14 CrazyAra-0.8.4-Solver [1600sim]                                       175      19    1490   73.3%    4.8%
  15 CrazyAra-0.8.4-DAG [800sim]                                            74      18    1488   60.6%    4.0%
  16 CrazyAra-0.8.4-ALL [800sim]                                            72      18    1488   60.2%    4.8%
  17 CrazyAra-0.8.4-CHECK-ENHANCE [800sim]                                  71      18    1488   60.1%    4.3%
  18 CrazyAra-0.8.4-Q-2.0 [800sim]                                          71      18    1490   60.1%    4.3%
  19 CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 [800sim]       63      18    1488   58.9%    4.4%
  20 CrazyAra-0.8.4-Random-Playout [800sim]                                 61      18    1490   58.7%    4.1%
  21 CrazyAra-0.8.4-Solver [800sim]                                         55      17    1490   57.9%    4.4%
  22 CrazyAra-0.8.4-Q-2.0 [400sim]                                         -44      17    1490   43.7%    4.6%
  23 CrazyAra-0.8.4-ALL [400sim]                                           -46      17    1488   43.4%    4.6%
  24 CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 [400sim]      -52      17    1488   42.6%    5.0%
  25 CrazyAra-0.8.4-CHECK-ENHANCE [400sim]                                 -52      17    1488   42.6%    4.9%
  26 CrazyAra-0.8.4-Solver [400sim]                                        -57      17    1490   41.9%    4.8%
  27 CrazyAra-0.8.4-DAG [400sim]                                           -61      17    1488   41.4%    4.9%
  28 CrazyAra-0.8.4-Random-Playout [400sim]                                -61      18    1490   41.3%    4.3%
  29 CrazyAra-0.8.4-Q-2.0 [200sim]                                        -166      19    1490   27.8%    4.4%
  30 CrazyAra-0.8.4-ALL [200sim]                                          -183      19    1490   25.9%    5.1%
  31 CrazyAra-0.8.4-Solver [200sim]                                       -189      20    1490   25.2%    5.0%
  32 CrazyAra-0.8.4-CHECK-ENHANCE [200sim]                                -201      20    1488   23.9%    4.4%
  33 CrazyAra-0.8.4-Random-Playout [200sim]                               -204      20    1490   23.6%    4.7%
  34 CrazyAra-0.8.4-DAG [200sim]                                          -206      20    1488   23.4%    4.5%
  35 CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 [200sim]     -211      20    1488   22.9%    3.7%
  36 CrazyAra-0.8.4-ALL [100sim]                                          -334      26    1490   12.8%    2.9%
  37 CrazyAra-0.8.4-Q-2.0 [100sim]                                        -365      27    1488   10.9%    3.0%
  38 CrazyAra-0.8.4-DAG [100sim]                                          -377      28    1488   10.2%    2.9%
  39 CrazyAra-0.8.4-Solver [100sim]                                       -380      28    1490   10.1%    2.9%
  40 CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 [100sim]     -388      29    1488    9.7%    2.2%
  41 CrazyAra-0.8.4-CHECK-ENHANCE [100sim]                                -390      29    1488    9.6%    2.8%
  42 CrazyAra-0.8.4-Random-Playout [100sim]                               -392      29    1490    9.5%    2.6%

31269 games finished.

Notes

  • Figure 3 only shows CrazyAra-0.8.4-ALL and CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 to avoid over-plotting.
  • CrazyAra-0.8.4 735b33481bfab02754f002926c4895d9aabdb7a1 is described as AlphaZero* in the paper.
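
The Elo and Score columns in the table above are consistent with the standard logistic Elo model. The following sketch shows that conversion; it assumes the ratings were computed with this formula, and the function name `elo_from_score` is hypothetical.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) into an Elo difference.

    Uses the logistic Elo model: elo = -400 * log10(1 / score - 1).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Score of CrazyAra-0.8.4-ALL [1600sim] from the table above.
    print(round(elo_from_score(0.764)))
```

For the 76.4% score of CrazyAra-0.8.4-ALL [1600sim] this gives roughly 204, matching the Elo column. Small deviations for other rows are expected, because the Score column is rounded to one decimal place.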

Figure 5

  • Elo development relative to the number of neural network evaluations in chess.
Rank Name                                                                       Elo     +/-   Games   Score   Draws
   1 ClassicAra-0.8.6-ALL 1600-Evals                                            328      41     414   86.8%   15.7%
   2 ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 1600-Evals       298      37     416   84.7%   19.5%
   3 ClassicAra-0.8.6-ALL 800-Evals                                             159      32     416   71.4%   19.2%
   4 ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 800-Evals        142      32     414   69.3%   17.4%
   5 ClassicAra-0.8.6-ALL 400-Evals                                              17      30     416   52.4%   21.6%
   6 ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 400-Evals          4      30     414   50.6%   19.6%
   7 ClassicAra-0.8.6-ALL 200-Evals                                            -140      33     416   30.9%   14.7%
   8 ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 200-Evals       -141      32     416   30.8%   17.3%
   9 ClassicAra-0.8.6-ALL 100-Evals                                            -349      47     414   11.8%    8.7%
  10 ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 100-Evals       -358      47     416   11.3%    8.7%

2077 of 45000 games finished.

Notes

  • ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 is described as AlphaZero* in the paper.

Figure 4

  • Elo development in crazyhouse over time of MCGS compared to MCTS which uses a hash table as a transposition buffer to copy neural network evaluations.
Version                                              Time per Move [ms]        Elo              +/-
CrazyAra-0.8.4 (MCGS)                                 5000                     511              36
CrazyAra-0.8.4-pre-pull-47 (Transposition Table)      5000                     456              81
CrazyAra-0.8.4 (MCGS)                                 2500                     252              36
CrazyAra-0.8.4-pre-pull-47 (Transposition Table)      2500                     224              50
CrazyAra-0.8.4 (MCGS)                                 1000                     120              41
CrazyAra-0.8.4-pre-pull-47 (Transposition Table)      1000                     74               43
CrazyAra-0.8.4 (MCGS)                                 500                      33               42
CrazyAra-0.8.4-pre-pull-47 (Transposition Table)      500                      -53              42
CrazyAra-0.8.4 (MCGS)                                 250                      -188             22
CrazyAra-0.8.4-pre-pull-47 (Transposition Table)      250                      -222             42
CrazyAra-0.8.4 (MCGS)                                 100                      -554             21
CrazyAra-0.8.4-pre-pull-47 (Transposition Table)      100                      -573             42

Notes

  • CrazyAra-0.8.4 (MCGS) corresponds to the binary CrazyAra-0.8.4.

  • CrazyAra-0.8.4-pre-pull-47 (Transposition Table) corresponds to the binary CrazyAra-0.8.4_pre_pull_47.

  • First, all CrazyAra-0.8.4-pre-pull-47 (Transposition Table) versions were tested against each other at different time controls.

  • Then each time control was tested against the respective CrazyAra-0.8.4 (MCGS) version. This is the reason why the error is lower for the CrazyAra-0.8.4 (MCGS) versions.

  • The configuration labeled as "MCTS which uses a hash table as a transposition buffer to copy neural network evaluations" corresponds to AlphaZero* with the override Use_Transposition_Table = True.

  • Use_Transposition_Table = True activates the usage of a transposition table to copy over neural network evaluations for CrazyAra-0.8.4-pre-pull-47.

Figure 6

  • Elo comparison of the proposed search modification in crazyhouse using five seconds per move. On the used hardware this resulted in 100 000 - 800 000 total nodes per move.
Rank  Name                                                     Movetime Elo   Uncertainty
1     CrazyAra-0.8.4-ALL                                       5000ms   283   96
2     CrazyAra-0.8.4-DAG                                       5000ms    81   62
3     CrazyAra-0.8.4-Random-Playout                            5000ms    63   84
4     CrazyAra-0.8.4-CHECK-ENHANCE                             5000ms    19   65
5     CrazyAra-0.8.4-Q-2.0                                     5000ms   -18   65
6     CrazyAra-0.8.4-Solver                                    5000ms   -28   59
7     CrazyAra-0.8.4-735b33481bfab02754f002926c4895d9aabdb7a1  5000ms   -36   62

Notes

  • CrazyAra-0.8.4-735b33481bfab02754f002926c4895d9aabdb7a1 is described as AlphaZero* in the paper.

Figure 7

  • Elo comparison of the proposed search modification in chess using five seconds per move.
Rank Name                                                      Movetime  Elo  Uncertainty
1    ClassicAra-0.8.6-ALL                                      5000ms    51   38.99 
2    ClassicAra-0.8.6-DAG                                      5000ms    51   36.83 
3    ClassicAra-0.8.6-Q-2.0                                    5000ms    17   34    
4    ClassicAra-0.8.6-Random-Playout                           5000ms     9   54     
5    ClassicAra-0.8.6-CHECK-ENHANCE                            5000ms     0   43     
6    ClassicAra-0.8.6-Solver                                   5000ms    -9   48   
7    ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 5000ms   -18   66 

Notes

  • ClassicAra-0.8.6-500da21e0bd9152657adbbc6118f3ebbc660e449 is described as AlphaZero* in the paper.