The Story of Ender
I’ve started training Ender 160x10-se.
I will need to release Ender128-90l, the strongest of the 128x10 nets. I trained the last stretch with Q+Z (q * 0.25 + z * 0.75). It's possible that this incarnation of Ender could have become stronger still, but progress was getting harder and harder to come by.
I ran a comparison match with ID11258 and Ender against Komodo 12. Leela Ratio was 0.75, and time per move was 2.5 seconds. In these matches, ID11258 plays a game to completion, then Ender takes over at 16 pieces (16p) and replays the endgame, so two games are recorded per start.
The results (divided by white and black) were:
White

# PLAYER : RATING POINTS PLAYED (%)
1 Franken-Ender-83 : 3517.9 23.5 44 53.4%
2 Komodo 12 : 3494.0 43.0 88 48.9%
3 ID11258 lc0 0.19.1-rc1 : 3486.0 21.5 44 48.9%

Black

# PLAYER : RATING POINTS PLAYED (%)
1 Komodo 12 : 3494.0 45.0 78 57.7%
2 ID11258 lc0 0.19.1-rc2 : 3439.6 16.5 39 42.3%
3 Franken-Ender-88 : 3439.6 16.5 39 42.3%
You'll note I switched from Ender 83 and RC1 for white to Ender 88 and RC2 for black.
Dodgy Position Training
I've been feeding "dodgy" positions to Ender. How do I identify them? I run an SF10 300k-node search against a big batch of epd's and convert the centipawn eval to the same scale as Leela via the formula
2/(1+math.exp(-0.004 * cp)) - 1
Then I run a 0-node search on the same epd via lc0 and read the value head from verbose move stats. If the absolute difference between the SF10 and lc0 values is greater than 0.5, it's a dodgy position and goes into the training set for self-play.
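The conversion and filter above fit in a few lines. A minimal sketch (the function names are mine, not from the actual pipeline):

```python
import math

def cp_to_value(cp: int) -> float:
    """Convert a Stockfish centipawn eval to Leela's [-1, 1] value scale."""
    return 2 / (1 + math.exp(-0.004 * cp)) - 1

def is_dodgy(sf_cp: int, lc0_value: float, threshold: float = 0.5) -> bool:
    """A position is 'dodgy' when the SF10 eval (converted to Leela's scale)
    and lc0's value head disagree by more than the threshold."""
    return abs(cp_to_value(sf_cp) - lc0_value) > threshold
```

So a position SF10 scores at +300 cp (about +0.54 on Leela's scale) that lc0's value head calls -0.2 would be flagged as dodgy.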
I glued up ID11258 and Ender83 to play as a UCI engine, Ender taking over when the piece count drops to 16 or less.
- All engines had access to 6 man tb.
- Openings were random 3 move Noomen, played twice with colors reversed.
- TC 1s per move.
- Leela Ratio 3.12
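The handoff rule in the wrapper is just a piece count on the current position. A minimal sketch, assuming the position arrives as a FEN string (helper names are mine, not from the actual wrapper):

```python
def piece_count(fen: str) -> int:
    """Count the pieces on the board (kings and pawns included) by counting
    the letters in the board field of a FEN string."""
    board_part = fen.split()[0]
    return sum(c.isalpha() for c in board_part)

def ender_should_take_over(fen: str, threshold: int = 16) -> bool:
    """The wrapper hands the game to Ender once the piece count
    drops to the threshold or below."""
    return piece_count(fen) <= threshold
```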
This Frankenstein did well against Komodo 12, but not so well vs SF10.
Komodo 12 results:
Score of Dual vs Komodo 12 TB 1: 6 - 0 - 14 [0.650]
Elo difference: 107.54 +/- 78.22
Stockfish 10 results:
Score of Dual vs Stockfish 10 TB: 3 - 7 - 10 [0.400]
Elo difference: -70.44 +/- 111.56
I’ve committed a “some assembly required” UCI wrapper to bolt Ender onto another net here.
Distilling Problem Positions
Some major developments. Thanks to @oscardssmith for suggesting the idea and providing some initial code for distilling problem positions: positions where Ender's value evaluation is at least 0.5 winrate away from a predicted value. Initially I used @oscardssmith's code to generate random 6 man positions and take the predicted value from WDL tablebase probes.
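One way to turn a tablebase probe into the predicted value and apply the filter, assuming Syzygy's WDL convention; this is my reading, and the helper names are mine:

```python
def wdl_to_value(wdl: int) -> float:
    """Map a Syzygy WDL probe result (-2 loss, -1 blessed loss, 0 draw,
    +1 cursed win, +2 win) to a predicted value in [-1, 1].
    Cursed wins and blessed losses are draws under the 50-move rule."""
    if wdl >= 2:
        return 1.0
    if wdl <= -2:
        return -1.0
    return 0.0

def is_problem_position(ender_value: float, predicted: float,
                        threshold: float = 0.5) -> bool:
    """Keep positions where Ender's value head disagrees with
    the tablebase prediction by at least the threshold."""
    return abs(ender_value - predicted) >= threshold
```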
I've been mixing in 5k dodgy positions per 20k for a few rounds, and the effect has been dramatic. Ender128-80 reached the best results so far, both on the endgame test suite and in play (vs sf9tb with 250k nodes).
Test Suite:
Ender 128-80 - 5 sec, 2 thr, 1.0 prune Success rate: 63.09% (94/149)

Play:
Score of stockfishTB vs Ender128-80: 140 - 156 - 104 [0.480] 400
Elo difference: -13.90 +/- 29.33
16p SFNODES=250000 LCNODES=62000
Batch 81 has dodgy positions with a higher number of pieces mixed in, where the prediction comes from SF9 at 350k nodes. BTW, no training data is taken from SF9; it is only used to filter the positions that Ender trains on.
The 128x10 and Half-Move Clock
- I’ve started training a 128x10 network and upped the takeover target to 16p. I continue to train the 64x6 Ender network on the self-play games produced by the 128x10.
- I’ve started adding a random half-move clock value between 0 and 99 to 10% of the epd start positions. Typical evaluations for drawn positions have converged rapidly toward 0. So, for example, some of the drawn endgames in the Carlsen-Caruana WC match went from 2.5-3.0 (also in 11258) to 0.1-0.4.
Ender 62 wins QvR. It's the first NN to do so! What made the difference? I was converting the epd’s to fen’s by tacking on “ 0 1”. Now I’m tacking on “ X 80”, where X is a random half-move clock in the range 0-99. Also, self-play is now using 6 man TB.
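The epd-to-fen conversion can be sketched as follows (a sketch under my reading of the text: the last two FEN fields are the half-move clock and the fullmove number, and the function name is mine):

```python
import random

def epd_to_fen(epd: str, randomize_hmc: bool = True) -> str:
    """Turn an EPD (board, side to move, castling, en passant) into a full FEN
    by appending the half-move clock and fullmove number. Instead of the usual
    " 0 1", append a random half-move clock in 0-99 and fullmove number 80."""
    fields = epd.split()[:4]  # board, side to move, castling, en passant
    hmc = random.randint(0, 99) if randomize_hmc else 0
    fullmove = 80 if randomize_hmc else 1
    return " ".join(fields) + f" {hmc} {fullmove}"
```

With a nonzero half-move clock the net sees positions where the 50-move rule is actually close, which is presumably why drawn-eval convergence improved.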
Also, Ender is starting to get the best of sf9tb from 12 man positions. Ender 62 got 1.5-0.5 from a 12 man position played twice with colors reversed. Still early days, though.
TC 0.25 sec per move on 200 12 man positions (400 games).
Score of ID9149 vs Ender62: 94 - 144 - 162 [0.438] 400 Elo difference: -43.66 +/- 26.34
I’ve mostly moved over to self-play with 14 man epd’s. Ender 52 is the high-water mark so far. In a 0.25s per move match from 200 12 man epd’s, we get:
Score of stockfishTB vs Ender52: 146 - 101 - 153 [0.556] 400
Elo difference: 39.25 +/- 26.84

Score of stockfishTB vs 11258: 157 - 88 - 155 [0.586] 400
Elo difference: 60.54 +/- 26.81
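The Elo differences quoted throughout follow (approximately) from the match score via the standard logistic model; small discrepancies from the quoted values come from how the match tool models draws. A quick sketch:

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a match score (fraction of points won),
    inverting the logistic model: score = 1 / (1 + 10 ** (-elo / 400))."""
    return -400 * math.log10(1 / score - 1)
```

For example, a 0.556 score implies roughly +39 Elo, and a 0.438 score roughly -43 Elo, matching the numbers above and below to within the draw-model difference.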
Hopefully we can improve some more.
The EPD News
I've modified lc0 to take a file of epd's as starting positions. I'm now feeding it the same data as the adversarial play against sf9tb and alternating training on 20k batches of games. A 64x6 net really breezes through 20k endgame positions.
The command line for the self plays is
./lc0.ender selfplay --training --games=20000 -w ender-latest.txt.gz --visits=800 --cpuct=5.0 --resign-percentage=0
It’s changed a bit. I am using temp=1 in self-play from 20k randomized 12 man epd’s, and I use noise for adversarial play with sf9tb. The node counts there are 350k for SF and 1600 for lc0.
I feed 2 adversarial batches for every 1 self play batch.
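The 2:1 interleave of batch types can be sketched as a simple repeating schedule (a sketch; how the batches are actually queued is not shown in the text):

```python
from itertools import cycle, islice

def batch_schedule(n_batches: int) -> list:
    """Interleave training batches: two adversarial (vs sf9tb) batches
    for every one self-play batch."""
    pattern = ["adversarial", "adversarial", "selfplay"]
    return list(islice(cycle(pattern), n_batches))
```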
I’m going to play a 100 12 man epd match, with colors reversed, against 11258.
T11258 - 5 sec, 2 thr, no prune Success rate: 53.69% (80/149)
Ender 38 - 5 sec, 2 thr, no prune Success rate: 62.42% (93/149)
Based on my most recent test suite run, I am hopeful.
The Ender net (64x6) was initially trained on ~400k semirandom 6, 5, 4, and 3 man positions with perfect playouts by sf9tb. This led to mediocre play.
Currently the net is being trained on 20k batches of playouts (500k window), played from 12 and 6 man positions sampled from a CCRL database and from Kingbase games played out from the point of resignation, as well as 12, 6, 5, 4, and 3 man semirandom positions. The positions are played both ways between sf9tb and the latest net at 0.25s vs 3200 nodes per move.
The training makes use of @borg's zero history patch, with the added wrinkle that it is only applied 10% of the time. The net does well with and without history, as a result. (See This GoNN page for thoughts on this approach.)
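The 10%-of-the-time masking might look like this in training-data preparation (a rough sketch; the actual patch by @borg works inside lc0's training pipeline, and the function name and plane layout here are my own assumptions):

```python
import random

def maybe_zero_history(planes: list, zero_prob: float = 0.1) -> list:
    """With probability zero_prob, blank out all but the most recent
    position's planes, so the net learns to evaluate without history.
    `planes` is a list of per-position plane stacks, newest first."""
    if random.random() < zero_prob:
        return [planes[0]] + [[0.0] * len(p) for p in planes[1:]]
    return planes
```

Training on a mix of masked and unmasked samples is what lets the same net run well whether or not history is available at inference time.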
Test suites aren't the be-all and end-all of testing, but Ender 5 has finally surpassed the 20b networks:
Ender 5 - 5 seconds, 2 threads Success rate: 52.35% (78/149)
T902 - 5 seconds, 2 threads Success rate: 45.64% (68/149)
First Goal: win KQvKR
Right now none of the Leela nets can do this.
ID512 over 1000 positions (2000 playouts)
12 man CCRL Elo difference: 61.08 +/- 24.17
12 man Kingbase Elo difference: 76.98 +/- 23.64
12 man semi random Elo difference: 10.43 +/- 47.67
6 man CCRL Elo difference: 36.62 +/- 35.66
6 man Kingbase Elo difference: 47.19 +/- 37.99
6 man semi random Elo difference: 10.43 +/- 61.08
5 man semi random Elo difference: 10.43 +/- 61.08
4 man semi random Elo difference: 0.00 +/- 54.85
3 man semi random Elo difference: 0.00 +/- 60.66
Ender 3 (trained on batch 3) over 10k positions (20k playouts)
12 man CCRL Elo difference: 99.05 +/- 7.85
12 man Kingbase Elo difference: 99.65 +/- 7.75
12 man semi random Elo difference: 15.99 +/- 14.86
6 man CCRL Elo difference: 38.02 +/- 10.72
6 man Kingbase Elo difference: 50.56 +/- 11.37
6 man semi random Elo difference: 16.69 +/- 19.51
5 man semi random Elo difference: 13.56 +/- 18.61
4 man semi random Elo difference: 15.99 +/- 16.95
3 man semi random Elo difference: 0.00 +/- 19.49