Madness in computer chess

Andreas Matthies edited this page Mar 10, 2023 · 4 revisions

Destroying faith in testsuite results

... some experiments on learning to solve a testsuite while losing general strength.

The testsuite EN-Test 2022.epd was taken from https://solistachess.jimdosite.com/testing/ and contains 120 positions compiled by someone who insists on measuring the strength of a chess engine by counting how many positions of a testsuite it can solve. Let's see if RubiChess can learn to solve this testsuite (better).
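
An EPD testsuite like this stores one position per line: the first four FEN fields followed by opcodes such as `bm` (best move) and `id`. As a rough illustration of the file format, here is a minimal parsing sketch; it is an assumption for demonstration, not RubiChess's actual EPD reader:

```python
def parse_epd_line(line):
    """Split an EPD record into the position (first four FEN fields)
    and a dict of opcode/operand pairs such as {"bm": "Nc3"}."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) == 5:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                name, _, operand = op.partition(" ")
                ops[name] = operand.strip('"')
    return position, ops

pos, ops = parse_epd_line(
    'r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - '
    'bm Nc3; id "demo";'
)
# ops["bm"] now holds the expected best move for scoring the benchmark.
```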

Where we come from:

RubiChess.exe -bench -epdfile EN-Test_2022.epd -maxtime 5 > nul
Benchmark results
========================================================================================
RubiChess 20230225 NN-0cea6 (avx2) (Build Feb 27 2023 09:18:06 commit 80fb9ee Clang 9)
UCI compatible chess engine by Andreas Matthies
----------------------------------------------------------------------------------------
System: AMD Ryzen 7 3700X 8-Core Processor               Family: 23  Model: 113
CPU-Features of system: sse2 ssse3 popcnt lzcnt bmi1 avx2
CPU-Features of binary: sse2 ssse3 popcnt lzcnt bmi1 avx2
========================================================================================
...
=============================================================================================================
Overall:                    54/120 = 45.0%                    588.022705 sec. 1050412894 nodes    1786347 nps

Commands to generate training positions using the trainonepd branch (wip), which labels positions where the engine plays the correct root move a 'win' and positions with an incorrect root move a 'loss':

sfen1:	gensfen loop 10000000 book C:\Entwicklung\EPD\EN-Test_2022.epd random_book_pos 0 result_on_bm 1 write_minply 0 maxply 20 depth 9 disable_prune 1 random_multi_pv 5 random_multi_pv_depth 7 random_multi_pv_diff 100
sfen2:  gensfen loop 10000000 book C:\Entwicklung\EPD\EN-Test_2022.epd random_book_pos 0 result_on_bm 1 write_minply 0 maxply 20 depth 8 disable_prune 1 random_multi_pv 4 random_multi_pv_depth 6 random_multi_pv_diff 100
sfen3:  gensfen loop 10000000 book C:\Entwicklung\EPD\EN-Test_2022.epd random_book_pos 0 result_on_bm 1 write_minply 0 maxply 30 depth 7 disable_prune 1 random_multi_pv 4 random_multi_pv_depth 5 random_multi_pv_diff 150
sfen4:  gensfen loop 10000000 book C:\Entwicklung\EPD\EN-Test_2022.epd random_book_pos 0 result_on_bm 1 write_minply 0 maxply 30 depth 7 disable_prune 1 random_multi_pv 4 random_multi_pv_depth 5 random_multi_pv_diff 150 bm_factor 5
sfen5:  gensfen loop 10000000 book C:\Entwicklung\EPD\EN-Test_2022.epd random_book_pos 0 result_on_bm 1 write_minply 0 maxply 6 depth 7 disable_prune 1 random_multi_pv 4 random_multi_pv_depth 5 random_multi_pv_diff 150 bm_factor 10
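
The key idea behind `result_on_bm` and `bm_factor` can be sketched in a few lines: positions whose root move matches a `bm` move get the result 'win' and (with `bm_factor > 1`) are emphasized by duplication, while all others get 'loss'. This is a hypothetical simplification of the labeling logic, not the trainonepd branch's code:

```python
def label_positions(records, bm_factor=1):
    """records: iterable of (fen, root_move, best_moves).
    Positions whose root move is one of the testsuite's "bm" moves are
    labeled a win (+1) and replicated bm_factor times to weight them
    more heavily; all other positions are labeled a loss (-1)."""
    labeled = []
    for fen, root_move, best_moves in records:
        if root_move in best_moves:
            labeled.extend([(fen, +1)] * bm_factor)
        else:
            labeled.append((fen, -1))
    return labeled
```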

Training on the resulting concatenated binpack using lambda = 0.5 starting from current master network:

python train.py C:\Schach\nnue-work\EN-train.binpack C:\Schach\nnue-work\EN-train.binpack --lambda 0.5 --threads 8 --num-workers 8 --gpus 1 --batch-size 8192 --smart-fen-skipping --random-fen-skipping 4 --features="HalfKAv2_hm^" --network-save-period 1 --resume-from-model master-0cea6.pt
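
What `--lambda` does: the nnue-pytorch trainer interpolates between the search evaluation and the game result when building the training target, so lambda = 0.5 means the artificial win/loss labels above already carry half the weight. A rough sketch of that interpolation (the sigmoid scale constant here is an assumption, not the trainer's exact value):

```python
import math

def blended_target(eval_cp, game_result, lam, scale=361):
    """Mix the search eval (centipawns, squashed to a win probability)
    with the game result ({-1, 0, 1} mapped to {0, 0.5, 1}).
    lam = 1.0 trains purely on evals, lam = 0.0 purely on results."""
    p_eval = 1.0 / (1.0 + math.exp(-eval_cp / scale))
    p_result = (game_result + 1) / 2.0
    return lam * p_eval + (1.0 - lam) * p_result
```

With lambda = 0.5, a drawn-looking eval (0 cp) in a 'win'-labeled position yields a target of 0.75, which is how the fake results pull the net toward the testsuite moves.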

Result after epoch 0 (network en-ep00-la05.nnue):

=============================================================================================================
Overall:                    93/120 = 77.5%                    588.023376 sec.  967252128 nodes    1644921 nps

Same results for epoch 1 and epoch 2.

Okay, already saturated. So let's try something even more extreme... an even lower lambda and no smart fen skipping:

python train.py C:\Schach\nnue-work\EN-train.binpack C:\Schach\nnue-work\EN-train.binpack --lambda 0.25 --threads 8 --num-workers 8 --gpus 1 --batch-size 8192  --random-fen-skipping 4 --features="HalfKAv2_hm^" --network-save-period 1 --resume-from-model master-0cea6.pt

Result after epoch 0 (network en-ep00-la025-nosfs.nnue):

=============================================================================================================
Overall:                    95/120 = 79.2%                    588.023438 sec.  991273523 nodes    1685772 nps

New record. Now let's reintroduce smart fen skipping and reduce random fen skipping to 3, with lambda still at 0.25:

Result after epoch 0 (network en-ep00-la025.nnue):

=============================================================================================================
Overall:                    99/120 = 82.5%                    588.023376 sec.  990447890 nodes    1684368 nps

Result after epoch 6 (network en-ep06-la025.nnue):

=============================================================================================================
Overall:                   102/120 = 85.0%                    588.023560 sec.  931192100 nodes    1583596 nps

Okay, this is good enough. Almost doubled the success rate on this test suite. Now let's test whether this net is twice as strong in normal play... (network file is archived here).

Playing an STC match between en-ep06-la025 and master ended in disaster.

Score of EN-ep06-la025 vs Master: 0 - 400 - 0  [0.000] 400
Elo difference: -inf +/- nan, LOS: 0.0 %, DrawRatio: 0.0 %
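
The "-inf" is exactly what the logistic Elo formula produces for a 0% score. As a quick illustration (a generic sketch of the standard formula, not cutechess-cli's implementation):

```python
import math

def elo_from_score(wins, draws, losses):
    """Logistic Elo difference from a match score. A score of 0% or
    100% maps to minus/plus infinity, hence the "-inf" reported above."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0:
        return float("-inf")
    if score >= 1.0:
        return float("inf")
    return -400.0 * math.log10(1.0 / score - 1.0)

elo_from_score(0, 0, 400)  # 0 wins, 0 draws, 400 losses -> -inf
```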

Conclusion

By training on <50MB of data generated from the testsuite positions, we improved the solve rate on this testsuite from 45% to 85% and at the same time decreased playing strength to... a level that still needs to be measured, but is certainly very low.
