
Endgame Net


The Story of Ender

This page is organized in reverse chronological order; in other words, the beginnings are at the bottom.


Latest News

I’ve started training Ender 160x10-se.

I will need to release Ender128-90l, the strongest of the 128x10 nets. I trained the last stretch with a mixed Q+Z target (q * 0.25 + z * 0.75). This incarnation of Ender might have become stronger still, but gains were getting harder and harder to come by.

Matches

I ran a comparison match pitting ID11258 and Ender against Komodo 12. The Leela Ratio was 0.75 and the time per move was 2.5 seconds. In these matches, ID11258 plays a game to completion, then Ender takes over at 16p and replays the endgame, so two games are recorded.

The results (divided by white and black) were:

White
   # PLAYER                    : RATING    POINTS  PLAYED    (%)
   1 Franken-Ender-83          : 3517.9      23.5      44   53.4%
   2 Komodo 12                 : 3494.0      43.0      88   48.9%
   3 ID11258 lc0 0.19.1-rc1    : 3486.0      21.5      44   48.9%

Black
   # PLAYER                    : RATING    POINTS  PLAYED    (%)
   1 Komodo 12                 : 3494.0      45.0      78   57.7%
   2 ID11258 lc0 0.19.1-rc2    : 3439.6      16.5      39   42.3%
   3 Franken-Ender-88          : 3439.6      16.5      39   42.3%

You'll note I switched from Ender 83 and RC1 for white to Ender 88 and RC2 for black.

The delta games, where the results were different, are here for white and black.

Dodgy Position Training

I've been feeding "dodgy" positions to Ender. How do I identify them? Well, I run an sf10 300k-node search against a big batch of epd's and convert the centipawn eval to the same scale as Leela via the formula

2/(1+math.exp(-0.004 * cp)) - 1

Then I run a 0-node search on the same epd via lc0 and read the value head from the verbose move stats. If the absolute difference between the sf10 and lc0 values is greater than 0.5, it's a dodgy position and goes into the training set for self-play.
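A minimal sketch of that filter in Python, assuming the sf10 and lc0 evals have already been collected; the function names and the threshold parameter are mine:

import math

def cp_to_value(cp):
    # Map a Stockfish centipawn eval onto Leela's [-1, 1] value scale.
    return 2 / (1 + math.exp(-0.004 * cp)) - 1

def is_dodgy(sf_cp, lc0_value, threshold=0.5):
    # Flag positions where the two evals disagree by more than the threshold.
    return abs(cp_to_value(sf_cp) - lc0_value) > threshold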

Franken-Ender

I glued ID11258 and Ender 83 together to play as a single UCI engine, with Ender taking over when the piece count drops to 16 or less.

  • All engines had access to 6 man tb.
  • Openings were random 3 move Noomen, played twice with colors reversed.
  • TC 1s per move.
  • Leela Ratio 3.12

This Frankenstein did well against Komodo 12, but not so well vs SF10.

Komodo 12 results:

Score of Dual vs Komodo 12 TB 1: 6 - 0 - 14 [0.650]
Elo difference: 107.54 +/- 78.22

Games

Stockfish 10 results:

Score of Dual vs Stockfish 10 TB: 3 - 7 - 10 [0.400]
Elo difference: -70.44 +/- 111.56

Games

I’ve committed a “some assembly required” UCI wrapper for bolting Ender onto another net here.
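For illustration, here is a toy sketch of the switching rule (not the actual wrapper), using python-chess; the function names and the piece_map-based count are assumptions:

import chess

def piece_count(board):
    # Count all men on the board, kings included.
    return len(board.piece_map())

def pick_engine(board, opening_engine, endgame_engine, threshold=16):
    # Route the search to Ender once 16 or fewer men remain.
    return endgame_engine if piece_count(board) <= threshold else opening_engine

board = chess.Board()  # starting position: 32 men, so the opening net plays
engine = pick_engine(board, "ID11258", "Ender83")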

Distilling Problem Positions

Some major developments. Thanks to @oscardssmith for suggesting the idea and providing some initial code for distilling problem positions: positions where Ender's value evaluation is at least 0.5 winrate away from a predicted value. Initially I used @oscardssmith's code to generate random 6-man positions and take the predicted value from WDL tablebase probes.
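A rough sketch of that generation step, assuming python-chess with Syzygy probing; the tablebase path, piece mix, and retry loop are illustrative, not @oscardssmith's actual code:

import random
import chess
import chess.syzygy

TB_PATH = "syzygy/6man"  # hypothetical path to the 6-man tables

def random_6man():
    # Scatter both kings plus four random pieces, retrying until the position
    # passes python-chess's legality checks (no pawns on the back rank,
    # side not to move not in check, and so on).
    piece_types = [chess.QUEEN, chess.ROOK, chess.BISHOP, chess.KNIGHT, chess.PAWN]
    while True:
        board = chess.Board(None)
        squares = random.sample(range(64), 6)
        board.set_piece_at(squares[0], chess.Piece(chess.KING, chess.WHITE))
        board.set_piece_at(squares[1], chess.Piece(chess.KING, chess.BLACK))
        for sq in squares[2:]:
            board.set_piece_at(sq, chess.Piece(random.choice(piece_types),
                                               random.choice(chess.COLORS)))
        board.turn = random.choice(chess.COLORS)
        if board.is_valid():
            return board

with chess.syzygy.open_tablebase(TB_PATH) as tb:
    board = random_6man()
    wdl = tb.probe_wdl(board)         # -2..2 from the side to move's view
    predicted = max(-1, min(1, wdl))  # collapse cursed wins/blessed losses
    print(board.fen(), predicted)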

I've been mixing in 5k dodgy positions per 20k batch for a few rounds, and the effect has been dramatic. Ender128-80 achieved the best results so far, both on the endgame test suite and in play (vs sf9tb with 250k nodes).

Test Suite:
Ender 128-80 - 5 sec, 2 thr, 1.0 prune
Success rate: 63.09% (94/149)
Play:
Score of stockfishTB vs Ender128-80: 140 - 156 - 104  [0.480] 400
Elo difference: -13.90 +/- 29.33
16p SFNODES=250000 LCNODES=62000

Batch 81 mixes in dodgy positions with higher piece counts, where the prediction comes from SF9 with 350k nodes. Note that no training data is taken from SF9; it is only used to filter the positions that Ender is trained on.

Stay tuned.

The 128x10 and Half-Move Clock

  • I’ve started training a 128x10 network and upped the target to 16p. I continue to train the 64x6 Ender network on the self-play games produced by the 128x10.
  • I’ve started adding a random half-move clock value between 0 and 99 to 10% of the epd start positions (a sketch follows below). Typical evaluations of drawn positions have converged rapidly to 0. For example, some of the drawn endgames in the Carlsen-Caruana WC match went from 2.5-3.0 (also in 11258) to 0.1-0.4.
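A minimal sketch of that preprocessing step, assuming plain 4-field epd's; the function name is mine, and the fullmove number of 80 comes from the EPD-to-FEN note in the QvR section below:

import random

def epd_to_fen(epd, prob=0.1):
    # Convert a 4-field EPD into a 6-field FEN. 10% of positions get a
    # random half-move clock in 0-99; the rest keep a clock of 0.
    fields = epd.split()[:4]  # placement, side to move, castling, en passant
    hmc = random.randrange(100) if random.random() < prob else 0
    return " ".join(fields + [str(hmc), "80"])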

QvR Milestone

Ender 62 wins QvR, the first nn to do so! What was the difference? I was converting the epd’s to fen’s by tacking on a “ 0 1”. Now I’m tacking on “ X 80”, where X is a half-move clock in the range 0-99. Also, self-play now uses 6-man TB.

Lichess PGN

Also, Ender is starting to get the best of sf9tb from 12-man positions. Ender 62 scored 1.5-0.5 from a 12-man position played twice with colors reversed. Still early days, though.

Against 9149

TC 0.25 sec per move on 200 12-man positions, each played twice (400 games).

Score of ID9149 vs Ender62: 94 - 144 - 162  [0.438] 400
Elo difference: -43.66 +/- 26.34

Self Play

I’ve moved over mostly to self-play with 14-man epd’s. Ender 52 is the high-water mark so far. In a 0.25s-per-move match from 200 12-man epd’s, we get:

Score of stockfishTB vs Ender52: 146 - 101 - 153  [0.556] 400
Elo difference: 39.25 +/- 26.84
Score of stockfishTB vs 11258: 157 - 88 - 155  [0.586] 400
Elo difference: 60.54 +/- 26.81

Hopefully we can improve some more.

The EPD News

I've modified lc0 to take a file of epd's as starting positions. I'm now feeding it the same data as the adversarial play against sf9tb, alternating training on 20k batches of games. A 64x6 net really breezes through 20k endgame positions.

The command line for the self-play runs is

./lc0.ender selfplay --training --games=20000 -w ender-latest.txt.gz --visits=800 --cpuct=5.0 --resign-percentage=0

Latest Training

It’s changed a bit. I’m using temp=1 in self-play from 20k randomized 12-man epd’s, and noise for the adversarial play against sf9tb. The node counts there are 350k for sf and 1600 for lc0.

I feed 2 adversarial batches for every 1 self play batch.

I’m going to play a match against 11258 from 100 12-man epd’s, with colors reversed.

T11258 - 5 sec, 2 thr, no prune
Success rate: 53.69% (80/149)
Ender 38 - 5 sec, 2 thr, no prune
Success rate: 62.42% (93/149)

Based on my most recent test suite run, I am hopeful.

History

The Ender net (64x6) was initially trained on ~400k semirandom 6-, 5-, 4-, and 3-man positions with perfect playouts by sf9tb. This led to mediocre play.

Currently the net is being trained on 20k batches of playouts (500k window), played from 12- and 6-man positions sampled from a CCRL database and from Kingbase games played out from the point of resignation, as well as from 12-, 6-, 5-, 4-, and 3-man semirandom positions. The positions are played both ways between sf9tb and the latest net at 0.25s vs 3200 nodes per move.

The training makes use of @borg's zero-history patch, with the added wrinkle that it is only applied 10% of the time. As a result, the net does well both with and without history. (See this GoNN page for thoughts on this approach.)
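Conceptually, the zero-history trick amounts to something like this on the input planes (the 112-plane layout and the exact indices are assumptions, not @borg's actual patch):

import numpy as np

def maybe_zero_history(planes, prob=0.1, rng=np.random):
    # lc0 stacks 8 board states of 13 planes each (104 planes) plus 8
    # auxiliary planes. With probability `prob`, blank everything except
    # the current board state so the net also learns a history-free eval.
    if rng.random() < prob:
        planes = planes.copy()
        planes[13:104] = 0  # assumed indices of the 7 older history states
    return planes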

Test Suite

Test suites aren't the be-all and end-all of testing, but Ender 5 has finally surpassed the 20b networks:

Ender 5 - 5 seconds, 2 threads
Success rate: 52.35% (78/149)
T902 - 5 seconds, 2 threads
Success rate: 45.64% (68/149)

First Goal: win KQvKR

Right now none of the Leela nets can do this.

ID512 over 1000 positions (2000 playouts)

12 man CCRL        Elo difference: 61.08 +/- 24.17
12 man Kingbase    Elo difference: 76.98 +/- 23.64
12 man semi random Elo difference: 10.43 +/- 47.67
6 man CCRL         Elo difference: 36.62 +/- 35.66
6 man Kingbase     Elo difference: 47.19 +/- 37.99
6 man semi random  Elo difference: 10.43 +/- 61.08
5 man semi random  Elo difference: 10.43 +/- 61.08
4 man semi random  Elo difference: 0.00 +/- 54.85
3 man semi random  Elo difference: 0.00 +/- 60.66

Ender 3 (trained on batch 3) over 10k positions (20k playouts)

12 man CCRL        Elo difference: 99.05 +/- 7.85
12 man Kingbase    Elo difference: 99.65 +/- 7.75
12 man semi random Elo difference: 15.99 +/- 14.86
6 man CCRL         Elo difference: 38.02 +/- 10.72
6 man Kingbase     Elo difference: 50.56 +/- 11.37
6 man semi random  Elo difference: 16.69 +/- 19.51
5 man semi random  Elo difference: 13.56 +/- 18.61
4 man semi random  Elo difference: 15.99 +/- 16.95
3 man semi random  Elo difference: 0.00 +/- 19.49