Moves left head #961

Merged: 22 commits, Mar 4, 2020
Conversation

@Ttl (Member) commented Sep 29, 2019

Adds support for an optional moves left head in the NN that tries to predict how many moves remain in the current game. An NN trained with a moves left head is needed for it to have any effect.

The moves left head is currently used to prefer moves leading to shorter games when the root Q is above a configurable threshold, and moves leading to longer games when the root Q is below the negative threshold. This fixes the trolling problem where lc0 either throws away pieces when losing or doesn't make progress when winning.

I trained a 128x10 network from rescored T40 data with a moves left head and matched it against a 128x10 net trained from the CCRL dataset. No adjudication or TBs. Each player had 10k fixed nodes. lc0_128x10_moves_left_t40_0.95_0.05_10 uses the moves left logic, lc0_128x10_moves_left_t40_master is the same network using the master branch, and lc0_128x10_ccrl is the CCRL-trained net using the master branch.

The 128x10 network with moves left head can be downloaded here.

   # PLAYER                                    :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)
   1 lc0_128x10_moves_left_t40_0.95_0.05_10    :    83.4   15.3   308.0     500    62     100
   2 lc0_128x10_moves_left_t40_master          :    44.0   15.4   281.0     500    56     100
   3 lc0_128x10_ccrl                           :     0.0    9.5   411.0    1000    41     ---

White advantage = 29.85 +/- 7.47
Draw rate (equal opponents) = 54.67 % +/- 1.68

[Image: moves_left_winning]

Won games with the moves left logic enabled were on average 70 plies shorter.

Since the moves left head is trained from T40 games, it might not be that representative of full-strength games. The rescorer fixes the moves left counter for some endgame positions. An example endgame position whose moves counter the rescorer can currently fix:

position fen 4R3/8/3K4/8/8/5k2/8/8 w - - 5 93

Stockfish multipv = 4:

info depth 72 seldepth 24 multipv 1 score mate 12 nodes 100000916 nps 4562293 hashfull 223 tbhits 0 time 21919 pv d6e5 f3e3 ...
info depth 71 seldepth 24 multipv 2 score mate 12 nodes 100000916 nps 4562293 hashfull 223 tbhits 0 time 21919 pv e8e1 f3f4 ..
info depth 71 seldepth 24 multipv 3 score mate 12 nodes 100000916 nps 4562293 hashfull 223 tbhits 0 time 21919 pv e8e5 f3g4 ...
info depth 71 seldepth 24 multipv 4 score mate 12 nodes 100000916 nps 4562293 hashfull 223 tbhits 0 time 21919 pv d6c5 f3f4 ...

lc0 with 128x10 net using moves left head (M column):
info string e8c8  (1715) N:     101 (+11) (P:  7.21%) (Q:  0.99727) (D:  0.003) (M: 12.0) (U: 0.07066) (Q+U:  1.06792) (V:  0.9981) 
info string e8d8  (1716) N:     103 (+15) (P:  7.40%) (Q:  0.99741) (D:  0.003) (M: 11.8) (U: 0.06883) (Q+U:  1.06623) (V:  0.9958) 
info string e8h8  (1719) N:     105 (+16) (P:  7.67%) (Q:  0.99734) (D:  0.003) (M: 11.9) (U: 0.06957) (Q+U:  1.06691) (V:  0.9965) 
info string e8b8  (1714) N:     106 (+16) (P:  7.79%) (Q:  0.99714) (D:  0.003) (M: 12.0) (U: 0.07013) (Q+U:  1.06727) (V:  0.9977) 

The moves left head is very small, and nps is within normal variation compared to the same net on the master branch.

This PR needs updated protobuf definitions from LeelaChessZero/lczero-common#9. Only the cudnn backend is supported at the moment.

@leedavid left a comment:

very useful function. thanks

@MelleKoning (Contributor) commented:

This is a great enhancement for learning.

  • Its beauty lies in the fact that the net learns to prefer playing for shorter wins. Amazing.
  • At the same time it prefers longer lines when losing, hoping the opponent makes a mistake.
  • All this without search enhancements, but by the net learning to size up positions and estimate the moves left, which influences which moves to take.
  • Another benefit of shorter average games will be a faster return on investment for training games, which currently take up the most time from the community.
  • And in general, less end-of-game trolling.

Nice work @Ttl !

@Ttl (Member, Author) commented Nov 12, 2019

I added a moves left head to T40B.4-160 by freezing the existing weights and training only the moves left head from CCRL dataset. Policy and value outputs are identical to the original net. It can be downloaded from: https://hforsten.com/leelaz/T40B.4-160_moves-left.pb.gz

Since all existing weights were frozen, the moves left output might not be as accurate as if it had been included from the start, but in my testing it is still useful.

@ASilver commented Jan 29, 2020

So I ran some tests on this and the conclusions are curious. There is good reason to believe this might be an Elo gainer, but there are unknowns at play and at least one question I cannot answer.

I trained a 128x10 net with this using a mix of human games rated 2300 to 2800 Elo and selfplay games from my FF/DeusX project. All were rescored with the rescoring binary by Tilps, as per the instructions to test this. Training went fairly smoothly, and I used a moves-left weight of 0.1, the value initially suggested by Ttl in the training code PR. The LR drop was done at 60k steps, and I then produced an NN at 90k steps and one at 120k steps and tested both against Test59 id59300 at 6400 nodes.

To avoid a comparison requiring two different NNs, where variance in training might affect the results, I used the exact same NN and tested it once in lc0 v23, which simply ignores the ML head, and once in the special binary that supports the ML head. This way the exact same NN was used, once with the ML head ignored and once with it consulted. Please note that the match conditions allowed no draws before move 60, hence the spike in the number of games that ended at exactly that move number.

[Image: Game length without ML head]

[Image: Game length with ML head]

At 90k steps I ran a first match of 1000 games to see how the results looked. The first result is with v23 and uses no ML head, and the second is with the ML head.

Score of lc0-v23 ML2-90 vs lc0-v23 Test59300: 293 - 200 - 507 [0.546]
Elo difference: 32.41 +/- 15.11

1000 of 1000 games finished.

Score of lc0-moves-left ML2-90 vs lc0-v23 Test59300: 315 - 178 - 507 [0.569]
Elo difference: 47.90 +/- 15.10

1000 of 1000 games finished.

Obviously the error margins still overlap, meaning the result might simply be luck and the two could in fact be of equal strength. I trained a bit further, until 120k steps, and reran the matches:

Score of lc0-v23 ML2-120 vs lc0-v23 Test59300: 292 - 196 - 512 [0.548]
Elo difference: 33.46 +/- 15.03

1000 of 1000 games finished.

Score of lc0-moves-left ML2-120 vs lc0-v23 Test59300: 337 - 176 - 487 [0.581]
Elo difference: 56.43 +/- 15.42

1000 of 1000 games finished.

With two matches and two results suggesting a difference in strength, it starts to become compelling evidence that this needs to be explored further. One question I have is whether the training of the rest of the NN is the same as normal, with the one exception of the additional ML head.

I ran speed tests, to ensure there was no significant difference, and came up with only a 5% slowdown with the ML head (103knps with v23, and 97knps with ML - tested on a single 2080ti after 60 seconds).

For absolute transparency, I am attaching here not only the PGNs of the matches, but the NN file itself, and a link to the ML binary compiled for Windows.

Windows compile of Lc0 ML (courtesy of Daniel Uranga)
lc0-moves-left.zip

NN file trained to 120k steps (128x10)
AZK-ML2-swa-120000.pb.gz

PGN matches of 120k build
1k-games-matches.zip

@ASilver commented Jan 29, 2020

I'm adding a few more graphs that might be of interest. Here are the graphs of only the decisive games (wins and losses), and then the graphs of only the wins, to see if the average length of its wins shortened in any visible way.

[Image: Game length of all decisive games without ML]

[Image: Game length of all decisive games with ML]

[Image: Game length of all decisive games won by the ML NN without ML]
Average win length: 64 moves

[Image: Game length of all decisive games won by the ML NN with ML]
Average win length: 61 moves

@dkappe commented Jan 29, 2020

Looking at the wins and losses of ML vs no ML, I calculated the average game length in plies for each:

Wins ML: 123.2
Wins no ML: 129.4

Loss ML: 145.7
Loss no ML: 139.3

So, wins are shorter and losses are longer.

@Naphthalin (Contributor) left a comment:

I think this is something @ankan-ban should take a look at

@Ttl (Member, Author) commented Feb 17, 2020

Here is an explanation of this PR: its purpose, some current limitations, and possible improvements.

Purpose of the moves left head

When the root node Q (win probability minus loss probability) is extreme (close to -1 or +1), all moves are winning or losing and the tree search can't choose between them. The search is very flat since all moves have the same value and the search can't commit to any one of them. lc0 tends to just shuffle pieces when very sure of winning, as it's not programmed to favour moves that lead to shorter wins. It seems that sometimes lc0 shuffles so much that the position isn't winning anymore (especially with small nets and low node counts). AB engines can often calculate much deeper in endgame positions and can sometimes take advantage of this.

Moves left prediction can also be used for time management and analysis purposes.

Use in search

For every NN eval, if the NN has a moves left head, the predicted moves left is stored in the node. During backup, when Q is updated, moves left is also updated on the parent nodes. The moves left estimate is a floating point number like Q. Moves left is incremented by one in the backup on white's turn. Terminal nodes have zero moves left, including TB terminals (it's updated again if a node is made un-terminal). If a node is certain and made terminal, the moves left is not zeroed and stays at the value it had when made terminal. Certain nodes can be left with a wrong moves left since it's not updated when making a node certain. Usually it should be very close though, since there should have been many backups from terminal nodes. This could probably be fixed with little effort, but I haven't bothered yet. Certain nodes are favoured over non-certain ones when picking the best move, so it doesn't affect the chosen move.
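
For illustration, here is a minimal C++ sketch of such a backup, assuming moves left is averaged over visits the same way Q is (the names are hypothetical, not the actual lc0 structures):

    struct Node {
      float q = 0.0f;        // Running average of the value (Q).
      float m = 0.0f;        // Running average of the moves-left estimate.
      int n = 0;             // Visit count.
      Node* parent = nullptr;
    };

    void Backup(Node* leaf, float v, float m_est) {
      for (Node* node = leaf; node != nullptr; node = node->parent) {
        ++node->n;
        // Incremental running averages, same form as the Q update.
        node->q += (v - node->q) / node->n;
        node->m += (m_est - node->m) / node->n;
        v = -v;         // The value flips sign every ply.
        m_est += 1.0f;  // Simplified to one per ply here; the PR adds one
                        // per full move, i.e. on white's turn.
      }
    }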

When the absolute value of the best child's Q exceeds the MovesLeftThreshold UCI parameter, the moves left head is used to add a bonus to the score of the moves. The bonus is calculated as bonus = factor * max(min(this_node_m - best_node_m, max_moves_scale), -max_moves_scale) / max_moves_scale, where best_node_m is the moves left estimate of the best child node and this_node_m is the same for the node whose score is being calculated. max_moves_scale is the MovesLeftScale UCI parameter and factor is the MovesLeftFactor UCI parameter.

If Q is negative, the bonus is added to the score so that moves leading to longer games are preferred, and it is subtracted if Q is positive to prefer shorter games. For predicted opponent moves the same logic is used, but Q has the opposite sign for the opponent's moves, so the engine assumes the opponent would also try to play for faster wins and longer losses.

The max and min in the bonus clamp it to the range [-max_moves_scale, +max_moves_scale], which is then divided by max_moves_scale and multiplied by factor to get a bonus in the range [-factor, +factor]. Clamping is used so that the bonus is bounded and its maximum range is known accurately, as we still want to choose the move mainly by Q. We wouldn't want the search to focus on just one move that seems to lead to a very short win and ignore all the others.

Often there are shuffling moves that don't make progress and moves that do make progress; the difference in moves left between them is expected to be one. A shuffling move wasting one move would then get a bonus of factor / max_moves_scale (0.05 / 10 = 0.005) with the current parameters. This is a very small bonus that is subtracted when winning and added when losing.

However, sometimes there are, for example, piece-sacrificing moves that keep the outcome of the game unchanged but greatly affect its length. Let's assume the position is winning and there is a move sacrificing a piece that makes the game an estimated 20 moves longer. With the default max_moves_scale = 10, the bonus would be clamped to MovesLeftFactor = 0.05, and this score would be subtracted from the score of that move. The penalty is bigger than for a move that wastes just one move.
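
A minimal C++ sketch of this bonus (the function and argument names are illustrative; factor, threshold and max_moves_scale correspond to the MovesLeftFactor, MovesLeftThreshold and MovesLeftScale UCI parameters):

    #include <algorithm>
    #include <cmath>

    float MovesLeftBonus(float best_q, float this_node_m, float best_node_m,
                         float factor = 0.05f, float threshold = 0.95f,
                         float max_moves_scale = 10.0f) {
      // Only applied when the best child's Q is extreme.
      if (std::abs(best_q) < threshold) return 0.0f;
      // Clamp the moves-left difference to [-max_moves_scale, +max_moves_scale].
      const float diff = std::max(
          std::min(this_node_m - best_node_m, max_moves_scale),
          -max_moves_scale);
      const float bonus = factor * diff / max_moves_scale;  // In [-factor, +factor].
      // Subtract when winning (prefer shorter games), add when losing.
      return best_q > 0 ? -bonus : bonus;
    }

    // Example: a shuffling move that wastes one move gets 0.05 * 1 / 10 = 0.005,
    // while a sacrifice adding ~20 moves is clamped to the full 0.05 penalty.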

There are probably other equations that could be used; this one is fast to calculate and seems to work in testing. It might make sense to decrease the bonus as some function of the node count, since the U term decreases too, but that needs testing.

UseMovesLeft UCI parameter can be used to disable the moves left logic.

Structure of moves left head neural net

The structure of the moves left head is very similar to the WDL value head. It has a 1x1 convolution with 8 output channels from the end of the residual stack. Similarly to the value head, there are two fully connected layers with a relu non-linearity between them. The output of the second fully connected layer is softmaxed. The moves left head is trained to predict the distribution of moves left in the game using cross entropy, with a one-hot vector as the training target. The expected value of moves left is obtained by the usual formula for discrete distributions (https://en.wikipedia.org/wiki/Expected_value#Finite_case).
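
As a sketch, that expected-value step amounts to a dot product of the softmax probabilities with the bucket indices (illustrative C++, assuming bucket i means "i moves left"):

    #include <cstddef>
    #include <vector>

    // E[M] = sum_i i * p_i over the softmax output.
    float ExpectedMovesLeft(const std::vector<float>& probs) {
      float expected = 0.0f;
      for (std::size_t i = 0; i < probs.size(); ++i) {
        expected += static_cast<float>(i) * probs[i];
      }
      return expected;
    }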

I tested a version of the head with a sigmoid non-linearity at the end that directly output the expected moves left, but that seemed too noisy, and it's hard for the net to predict moves left accurately using just one scaled sigmoid. Softmax is used so that the net can better differentiate between the expected moves left for positions with very similar moves left.

A WDL moves left head could probably be trained by triplicating the output so that there are three softmax outputs corresponding to moves left assuming a win/draw/loss outcome. Training would be done by the same method as before, but only the output corresponding to the result of the game would be trained; the other two branches wouldn't be backpropagated. (Credit to Straw from the lc0 Discord for the idea.)

A WDL moves left head would be bigger and take slightly longer to calculate. It would have three output values instead of one and would increase the size of nodes and cache entries by three floats instead of the one float of the simple moves left head (or four floats if the expected value is also needed for time management and analysis output).

The benefit of having a WDL moves left output is questionable. The main proposed purpose has been to use the moves left assuming the position is winning to choose moves leading to shorter wins. However, when Q is extreme the outcome is very certain, and the expected moves left is very close to the moves left assuming the position is winning/losing. Currently the search only uses the moves left head to choose shorter moves when Q is very close to one. To me it doesn't make sense to try to favour shorter moves when not sure of winning; ordinary tree search plays well when Q is not extreme. The issues occur only at extreme Q, when all moves are winning or losing and the tree search can't choose between them. Only then does it make sense to use moves left to try to favour shorter or longer games.

@Ttl (Member, Author) commented Feb 26, 2020

I changed the moves left head to output a scalar value and to predict plies instead of moves. A scalar seems to work just as well as softmax with the correct loss. The structure is now similar to the non-WDL value head, but the last non-linearity is a relu.

Here are three nets with the new moves left head:

https://hforsten.com/leelaz/128x10-t59-moves_left-scalar-plies-huber10-10000.pb.gz
https://hforsten.com/leelaz/128x10-ccrl-moves_left-scalar-plies-huber-10000.pb.gz
https://hforsten.com/leelaz/62535_moves_left.pb.gz

T59 is 591200 with a moves left head trained from the CCRL dataset with the other weights frozen. It's not very accurate since it wasn't trained with it from the beginning. 62535_moves_left is the T60 net 62535 with a moves left head added by the same method. The moves left predictions of both nets are worse than those of a 128x10 net trained from scratch with a moves left head.

The other net is trained from the CCRL dataset with a moves left head for 200k steps with a batch size of 2048, plus 10k steps afterwards training only the moves left head with a slightly different loss function. It's pretty weak, but its moves left predictions are much better.

I played some games with the latest code and 591200 with several moves left head settings against a net trained from T40 games for a short duration. No adjudication, 3k fixed nodes, no TBs.

   # PLAYER                                :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 lc0_591200_moves_left_10_0.05_0.90    :   227.7   24.1   313.0     400    78      55  237  152   11    38
   2 lc0_591200_moves_left_10_0.05_0.95    :   225.1   24.4   312.0     400    78      53  236  152   12    38
   3 lc0_591200                            :   223.9   24.1   311.5     400    78      55  233  157   10    39
   4 lc0_591200_moves_left_10_0.05_0.98    :   221.3   23.6   310.5     400    78     100  235  151   14    38
   5 lc0_t40_128x10                        :     0.0   11.3   353.0    1600    22     ---   47  612  941    38
Player, average plies in won games
lc0_591200 191.8
lc0_591200_moves_left_10_0.05_0.90 144.2
lc0_591200_moves_left_10_0.05_0.95 146.4
lc0_591200_moves_left_10_0.05_0.98 161.1

With moves left head enabled the strength is within error bounds, but games are on average 30 to 50 plies shorter depending on the settings.

Also, for anyone testing: note that the search logic is disabled by default. Set, for example, MovesLeftFactor = 0.05, MovesLeftThreshold = 0.95 and MovesLeftScale = 10 to enable it.
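
For example, via standard UCI setoption commands (the parameter names are as defined in this PR):

    setoption name MovesLeftFactor value 0.05
    setoption name MovesLeftThreshold value 0.95
    setoption name MovesLeftScale value 10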

@Naphthalin (Contributor) left a comment:

As there already was some confusion about the interpretation of the parameters concerning the actual slope (which isn't a direct parameter but the ratio of factor and scale) -- maybe it would be helpful to refactor the parameters and replace scale with slope? Then we would see the 3 important parameters directly: when to apply it, what the maximum effect on Q is, and how important a 1-move difference is.

@jhorthos (Contributor) commented Feb 28, 2020

One issue is adding a float to every node in the current tree - is it possible to get away with a 2-byte float or a short? We are already pushing memory limits at Sufi time control, and for long analysis tests (e.g. ICCF) it is also an issue.

@Naphthalin (Contributor) replied:

> One issue is adding a float to every node in the current tree - is it possible to get away with a 2-byte float or a short? We are already pushing memory limits at Sufi time control, and for long analysis tests (e.g. ICCF) it is also an issue.

Please correct me if I'm wrong, but I think master currently has 4 bytes (= 1 free float slot) of padding per node before sizeof(Node) needs to be increased, as the size of a node only increases in steps of 8.

@jhorthos (Contributor) commented Feb 28, 2020

Ah you are probably right - I am not much good at C++. :-)
In which case one float is free for now.

Commit: Propagate moves left for certain nodes and assign moves left for TB nodes based on parent node.
@Ttl (Member, Author) commented Mar 1, 2020

The latest commit improves the moves left estimate for certain and TB nodes. Not all certain-node siblings are considered: if a node is a certain loss, the parent node is made a certain win with plies left of 1 + the child node's plies left, but there could be siblings with faster wins. This shouldn't underestimate the plies left and is better than just freezing the plies left, as was done previously.

TB nodes don't have an NN evaluation, so plies left is assigned to them as the parent node's plies left - 1.
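
A minimal sketch of the two rules just described (hypothetical function names, not the actual lc0 code):

    // A certain loss among the children makes the parent a certain win one
    // ply deeper. This may overestimate plies left if a sibling wins faster.
    float CertainWinPliesLeft(float losing_child_plies_left) {
      return 1.0f + losing_child_plies_left;
    }

    // TB nodes get no NN eval of their own, so they inherit the parent's
    // estimate minus one ply.
    float TbNodePliesLeft(float parent_plies_left) {
      return parent_plies_left - 1.0f;
    }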

Some tests. lc0_591200_moves_left_10_0.05_0.95_terminals is the latest commit:

fixed 5k nodes, 5 piece TBs, syzygy-fast play disabled.

   # PLAYER                                          :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)    W     D    L  D(%)
   1 lc0_591200                                      :    57.9    8.9   697.5    1200    58      71  293   809   98    67
   2 lc0_591200_moves_left_10_0.05_0.95_terminals    :    53.7    8.8   690.5    1200    58      56  287   807  106    67
   3 lc0_591200_moves_left_10_0.05_0.95              :    52.5    8.8   688.5    1200    57     100  285   807  108    67
   4 lc0_ld2                                         :     0.0    5.1  1523.5    3600    42     ---  312  2423  865    67

Player, Average plies in won games
lc0_591200 207.3
lc0_591200_moves_left_10_0.05_0.95 182.3
lc0_591200_moves_left_10_0.05_0.95_terminals 167.2

fixed 5k nodes, no TBs.
   # PLAYER                                          :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)    W     D    L  D(%)
   1 lc0_591200_moves_left_10_0.05_0.95_terminals    :    61.0    8.7   702.5    1200    59      58  305   795  100    66
   2 lc0_591200_moves_left_10_0.05_0.95              :    59.5    8.8   700.0    1200    58      91  306   788  106    66
   3 lc0_591200                                      :    49.6    8.8   683.5    1200    57     100  259   849   92    71
   4 lc0_ld2                                         :     0.0    5.1  1514.0    3600    42     ---  298  2432  870    68

Player, Average plies in won games
lc0_591200 205.3
lc0_591200_moves_left_10_0.05_0.95 187.5
lc0_591200_moves_left_10_0.05_0.95_terminals 183.4
