Moves left head #961
Conversation
Very useful function, thanks.
This is a great enhancement for learning.
I added a moves left head to T40B.4-160 by freezing the existing weights and training only the moves left head on the CCRL dataset. Policy and value outputs are identical to the original net. It can be downloaded from: https://hforsten.com/leelaz/T40B.4-160_moves-left.pb.gz Since all existing weights were frozen, the moves left output might not be as accurate as if it had been trained in from the start, but in my testing it is still useful.
So I ran some tests on this, and the conclusions are curious. There is good reason to believe this might be an Elo gainer, but there are unknowns at play and at least one question I cannot answer.

I trained a 128x10 net with this using a mix of human games rated 2300 to 2800 Elo and selfplay games from my FF/DeusX project. All were rescored with the rescoring binary by Tilps, as per the instructions for testing this. Training went fairly smoothly, and I used a moves left weight of 0.1, the initial value suggested by Ttl in the training code PR. The LR drop was done at 60k steps, and I then produced one NN at 90k steps and one at 120k steps and tested both against Test59 id59300 at 6400 nodes.

To avoid a comparison between two different NNs, where variance in the training might affect the results, I used the exact same NN twice: once in lc0 v23, which simply ignores the moves left (ML) head, and once in the special binary that supports it. This way the exact same NN was used, once with the ML head ignored and once with it consulted.

Please note the match conditions allowed no draws before move 60, hence the spike in the number of games that ended at exactly that move count.

[Histogram: game length without ML head]
[Histogram: game length with ML head]

At 90k steps I ran a first match of 1000 games to see how the results looked. The first result is with v23 and uses no ML head; the second is with the ML head.
Obviously the error margins still mean there could be some overlap, in which case the result might simply be luck and the two are in fact of equal strength. I trained a bit further, to 120k steps, and reran the matches:
With two matches and two results suggesting a difference in strength, the evidence that this needs further exploration starts to become compelling. One question I have is whether the training of the rest of the NN is the same as normal, with the one exception of the additional ML head.

I ran speed tests to make sure there was no significant difference and measured only about a 5% slowdown with the ML head (103 knps with v23 versus 97 knps with ML, tested on a single 2080 Ti after 60 seconds).

For absolute transparency, I am attaching here not only the PGNs of the matches, but the NN file itself and a link to the ML binary compiled for Windows:
- Windows compile of Lc0 ML (courtesy of Daniel Uranga)
- NN file trained to 120k steps (128x10)
- PGN matches of the 120k build
Looking at the wins and losses of ML vs no ML, I calculated the average game length in plies for each: wins with ML averaged 123.2 plies, losses with ML averaged 145.7 plies. So wins are shorter and losses are longer.
I think this is something @ankan-ban should take a look at
Here is an explanation of this PR: its purpose, some current limitations, and possible improvements.

Purpose of the moves left head

When the root node Q (win probability minus loss probability) is extreme (close to -1 or +1), all moves are winning or losing and the tree search can't choose between them. The search is very flat, since all moves have the same value and the search can't commit to any one of them. lc0 tends to just shuffle pieces when very sure of winning, as it's not programmed to favour moves that lead to shorter wins. It seems that sometimes lc0 shuffles so much that the position isn't winning anymore (especially with small nets and low node counts). AB engines can often calculate much deeper in endgame positions and can sometimes take advantage of this. A moves left prediction can also be used for time management and analysis purposes.

Use in search

For every NN eval, if the NN has a moves left head, the predicted moves left is stored in the node. During the backup, when Q is updated, moves left is also updated on the parent nodes. The moves left estimate is a floating point number, like Q. Moves left is incremented by one in the backup on white's turn. Terminal nodes have zero moves left, including TB terminals (it is updated again if a node is made un-terminal). If a node is certain and made terminal, the moves left is not zeroed; it keeps the value it had when made terminal. Certain nodes can therefore be left with a wrong moves left value, since it's not updated when making a node certain. Usually it should be very close, though, since there should have been many backups from terminal nodes. This could probably be fixed with little effort, but I haven't bothered yet. Certain nodes are favoured over non-certain ones when picking the best move, so this doesn't affect the chosen move.

When the absolute value of the best child's Q exceeds a configurable threshold, a bonus is applied. If Q is negative, the bonus is added to the score so that moves leading to longer games are preferred, and it is subtracted if Q is positive, to prefer shorter wins. For predicted opponent moves the same logic is used, but Q has the opposite sign for the opponent's moves, and the engine assumes that the opponent would also try to play faster wins and longer losses. The max and min in the bonus are used to clamp the bonus to a fixed range.

Often there are shuffling moves that don't make progress and moves that do make progress; the difference in moves left between them is expected to be about one, so a shuffling move wasting one move gets only a small penalty. However, sometimes there are, for example, moves sacrificing a piece that keep the outcome of the game unchanged but would greatly affect the length of the game. Assume the position is winning and there is a move sacrificing a piece that makes the game an estimated 20 moves longer: the bonus would be clamped at the default limit, so such a large moves left difference can't dominate the score.

There are probably other equations that could be used; this one is fast to calculate and seems to work in testing. It might make sense to decrease the bonus as some function of nodes, since the U term decreases too, but that needs testing.
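To make the bonus shape concrete, here is a minimal sketch of the logic described above. All identifiers (MovesLeftBonus, threshold, factor, scale, delta_m) are illustrative, not the PR's actual names; the exact parameter values and clamp range were elided from the comment. It assumes a bonus of the form factor * clamp(delta_m / scale, -1, 1), which is consistent with the later remark that the effective slope is the ratio of factor and scale.

```cpp
#include <algorithm>
#include <cmath>

// q:       best child's Q from the searching side's perspective.
// delta_m: this child's moves-left estimate minus a baseline (e.g. the
//          shortest sibling's), so larger delta_m means a longer game.
float MovesLeftBonus(float q, float delta_m, float threshold, float factor,
                     float scale) {
  // Only apply the bonus when the position is near-decided.
  if (std::abs(q) <= threshold) return 0.0f;
  // Clamp so one huge moves-left difference (e.g. a sacrifice that makes the
  // game ~20 moves longer) cannot dominate the score.
  const float clamped = std::max(-1.0f, std::min(1.0f, delta_m / scale));
  // Winning (q > 0): penalize longer games to prefer shorter wins.
  // Losing (q < 0): reward longer games to prefer longer losses.
  return (q > 0.0f ? -1.0f : 1.0f) * factor * clamped;
}
```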
Structure of the moves left head neural net

The structure of the moves left head is very similar to the WDL value head. It has a 1x1 convolution from the end of the residual stack with 8 output channels. As in the value head, there are two fully connected layers with a relu non-linearity between them. The output of the second fully connected layer is softmaxed. The moves left head is trained to predict the distribution of moves left in the game using cross entropy; the training target is a one-hot vector. The expected value of moves left is obtained by the usual formula for discrete distributions (https://en.wikipedia.org/wiki/Expected_value#Finite_case).

I tested a version of the head with a sigmoid non-linearity at the end that directly output the expected moves left, but that seemed to be too noisy: it's hard for the net to predict moves left accurately using just one scaled sigmoid. Softmax is used so that the net better differentiates between the expected moves left for positions with very similar moves left.

A WDL moves left head could probably be trained by triplicating the output so that there are three softmax outputs corresponding to moves left assuming a win/draw/loss outcome. Training would be done by the same method as before, but only the output corresponding to the actual result of the game would be trained; the other two branches wouldn't be backpropagated. (Credit to Straw from the lc0 Discord for the idea.) A WDL moves left head would be bigger and take slightly longer to calculate. It would have three output values instead of one and would increase the size of nodes and cache entries by three floats instead of the one float of the simple moves left head (or four floats if the expected value is also needed for time management and analysis output).

The benefit of a WDL moves left output is questionable: the main proposed purpose has been to use the moves left assuming the position is winning to choose moves leading to shorter wins. However, when Q is extreme the outcome is very certain, and the expected moves left is then very close to the moves left assuming the position is winning/losing.

Currently the search only uses the moves left head to choose shorter moves when Q is very close to one. To me it doesn't make sense to try to favour shorter moves when not sure of winning; ordinary tree search plays well when Q is not extreme. The issues occur only at extreme Q, when all moves are winning or losing and the tree search can't choose between them. Only then does it make sense to use moves left to favour shorter wins or longer losses.
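For the softmax variant, the expected moves left is just the mean of the discrete distribution. A minimal sketch, assuming the k-th softmax output is the probability that exactly k moves remain (bucket indexing is an assumption; the head's output size isn't stated here):

```cpp
#include <vector>

// Expected value of a discrete moves-left distribution: E[M] = sum_k k * p_k.
float ExpectedMovesLeft(const std::vector<float>& softmax_output) {
  float expected = 0.0f;
  for (size_t k = 0; k < softmax_output.size(); ++k) {
    expected += static_cast<float>(k) * softmax_output[k];
  }
  return expected;
}
```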
Use the estimate to better center the moves left score bonus.
I changed the moves left head to output a scalar value and to predict plies instead of moves. The scalar seems to work just as well as the softmax with the right loss. The structure is now similar to the non-WDL value head, but the last non-linearity is relu. Here are three nets with the new moves left head: https://hforsten.com/leelaz/128x10-t59-moves_left-scalar-plies-huber10-10000.pb.gz

T59 is 591200 with a moves left head trained from the CCRL dataset with the other weights frozen. It's not very accurate, since it wasn't trained with the head from the beginning. The other net is trained from the CCRL dataset with the moves left head for 200k steps at a batch size of 2048, plus 10k steps afterwards training only the moves left head with a slightly different loss function. It's pretty weak, but its moves left predictions are much better.

I played some games with the latest code and 591200, with several moves left head settings, against a net trained from T40 games for a short duration. No adjudication, 3k fixed nodes, no TBs.
With the moves left head enabled the strength is within error bounds, but games are on average 30 to 50 plies shorter depending on the settings. Also, for anyone testing: note that the search logic is disabled by default. Set the corresponding option to enable it.
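The "slightly different loss function" isn't spelled out in the thread, but the "huber10" tag in the net filenames above suggests a Huber loss with delta = 10. A sketch under that assumption (the exact form used in training is not confirmed here):

```cpp
#include <cmath>

// Huber loss on the predicted plies-left scalar: quadratic near zero, linear
// in the tails, so the long tail of game lengths doesn't dominate training
// the way plain squared error would.
float HuberLoss(float predicted_plies, float target_plies,
                float delta = 10.0f) {
  const float err = std::abs(predicted_plies - target_plies);
  if (err <= delta) return 0.5f * err * err;
  return delta * (err - 0.5f * delta);
}
```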
As there has already been some confusion about the interpretation of the parameters concerning the actual slope (which isn't a direct parameter but the ratio of `factor` and `scale`), maybe it would be helpful to refactor the parameters and replace `scale` with `slope`? Then we would see the three important parameters directly: when to apply it, what the maximum effect on Q is, and how important a 1-move difference is.
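In other words, the suggestion is a pure reparameterization. A sketch of the equivalence with hypothetical names, assuming the clamped-linear bonus form discussed earlier in the thread:

```cpp
#include <algorithm>

// Current form (assumed): factor sets the maximum effect on the score,
// scale sets how many moves of difference reach that maximum.
float BonusFromScale(float delta_m, float factor, float scale) {
  return factor * std::max(-1.0f, std::min(1.0f, delta_m / scale));
}

// Proposed form: expose slope = factor / scale directly, so the effect of a
// 1-move difference is a parameter. Identical output for factor > 0.
float BonusFromSlope(float delta_m, float factor, float slope) {
  return std::max(-factor, std::min(factor, slope * delta_m));
}
```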
One issue is adding a float to every node in the current tree: is it possible to get away with a 2-byte float or a short? We are already pushing memory limits at SuFi time control, and for long analysis tests (e.g. ICCF) it is also an issue.
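One possible answer to the two-byte question, purely as an illustration and not code from this PR: store the estimate as 8.8 fixed point, which covers 0 to 255 plies at 1/256-ply resolution (whether that range and precision suffice for the search is an assumption):

```cpp
#include <algorithm>
#include <cstdint>

// Pack a plies-left estimate into 16 bits as unsigned 8.8 fixed point.
uint16_t PackMovesLeft(float plies) {
  const float clamped = std::min(std::max(plies, 0.0f), 255.99f);
  return static_cast<uint16_t>(clamped * 256.0f + 0.5f);
}

// Recover the (quantized) estimate for use in the score bonus.
float UnpackMovesLeft(uint16_t packed) {
  return static_cast<float>(packed) / 256.0f;
}
```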
Please correct me if I'm wrong, but I think …
Ah, you are probably right. I am not much good at C++. :-)
Can't test this on Linux.
Propagate moves left for certain nodes and assign moves left for TB nodes based on parent node.
The latest commit improves the moves left estimate for certain and TB nodes. Not all siblings of a certain node are considered: if a node is a certain loss, the parent node is made a certain win with plies left of 1 + the child node's plies left, but there could be siblings with faster wins. This shouldn't underestimate the plies left, and it is better than just freezing the plies left as was done previously. TB nodes don't have an NN evaluation; plies left is assigned to them as the parent node's plies left minus one. Some tests:
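A sketch of the two propagation rules just described, with hypothetical function names (the PR's actual node-update code is not quoted in this thread):

```cpp
// If a child is a certain loss, the parent is a certain win one ply deeper.
// This may overestimate the true plies left when a sibling wins faster, but
// it never underestimates, and beats freezing the old estimate.
float CertainWinPliesLeft(float losing_child_plies_left) {
  return 1.0f + losing_child_plies_left;
}

// TB nodes get no NN eval, so derive their estimate from the parent's.
float TbNodePliesLeft(float parent_plies_left) {
  return parent_plies_left - 1.0f;
}
```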
Adds support for an optional moves left head in the NN that tries to predict how many moves remain in the current game. An NN trained with a moves left head is needed for it to have any effect.
The moves left head is currently used to pick moves leading to shorter games when the root Q is above a configurable threshold, and to longer games when the root Q is below the negative threshold. This addresses the trolling problem where lc0 either throws away pieces when losing or doesn't make progress when winning.
I trained a 128x10 network from rescored T40 data with a moves left head and matched it against a 128x10 net trained from the CCRL dataset. No adjudication or TB. Each player had 10k fixed nodes.
- lc0_128x10_moves_left_t40_0.95_0.05_10 uses the moves left logic,
- lc0_128x10_moves_left_t40_master is the same network using the master branch, and
- lc0_128x10_ccrl is the CCRL-trained net using the master branch.

The 128x10 network with the moves left head can be downloaded here.
Won games with the moves left logic enabled were on average 70 plies shorter.
Since the moves left head is trained from T40 games, it might not be representative of full-strength games. The rescorer fixes the moves left counter for some endgame positions. An example endgame position for which the rescorer can currently correct the moves counter:
The moves left head is very small, and nps is within normal variation compared to the same net on the master branch.
This PR needs the updated protobuf definitions from LeelaChessZero/lczero-common#9. Only the cudnn backend is supported at the moment.