Trade penalty per Node #536
* move kStartBoardPos to ChessBoard * always build with WholeProgramOptimization on Appveyor * make kStartingFen char* * better constant names
* add --movetime option * set benchmark search defaults to uci ones
…aChessZero#494) (It measures a slightly wrong time from what was intended)
* opencl: replace thread_local with a resource pool. * Local variables clean-up. * clang-format with -style=google * Removing compiler warnings. * Local variables naming fix. * Removed a no longer used mutex. * Fix XgemmBatched/Xgemv for retrieving wavefront size. * Fixing OpenCLBuffers ctor (const ref). * Fixing OpenCLComputation ctor (const ref).
With the help of jjosh and Videodr0me, I changed the trade penalty (piece-trade "contempt") so that it varies with search depth. The original seemed to use only the total piece count at the root, so the penalty never changed inside the search tree. My nickname on Discord is "Swynndla". The test below was run entirely with lc0 v0.19.0-rc5 at a time control of 30+0.5 (30s+0.5s).
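A minimal C++ sketch of why the piece count has to come from each node rather than from the root. Purely for illustration, it assumes the adjustment is linear, trade_penalty * (pieces - trade_penalty2), added to a node's Q. The names Child, TradeAdjustment, ScoreRootOnly and ScorePerNode are hypothetical; this is not the PR's actual code.

```cpp
#include <cassert>

// Hypothetical linear adjustment (an assumption, not the PR's formula):
// the bonus grows with the number of pieces left, so fewer pieces = lower value.
double TradeAdjustment(int pieces, double tp, double tp2) {
  assert(pieces >= 2 && pieces <= 32);  // 2 kings minimum, 32 pieces maximum
  return tp * (pieces - tp2);
}

// A child node in the search tree: its Q value plus the number of pieces
// on the board in that child's position.
struct Child {
  double q;
  int pieces;
};

// Root-only: the same offset is added to every child, so the preference
// between a trading line and a non-trading line never changes.
double ScoreRootOnly(const Child& c, int root_pieces, double tp, double tp2) {
  return c.q + TradeAdjustment(root_pieces, tp, tp2);
}

// Per-node: the offset depends on the child's own piece count, so a line
// that trades pieces receives a smaller bonus than one that keeps them.
double ScorePerNode(const Child& c, double tp, double tp2) {
  return c.q + TradeAdjustment(c.pieces, tp, tp2);
}
```

With a root-only count, a trading child {0.10, 22} still beats a quiet child {0.095, 24}; with a per-node count, the trading line loses part of its bonus and the quiet line overtakes it.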
I'm still hesitant to add ("non-zero") changes that patch some misbehavior rather than help the NN avoid that behavior.
This PR or #466 may not be "zero", but they seem very logical to me. To win, a chess-playing entity has to create a favorable imbalance, and agreeing to equal trades runs against that concept. If either PR gains significantly against AB engines and doesn't regress in self-play, why not just use it?
@mooskagh Even between opponents of equal strength (i.e. unrelated to contempt), it still makes sense to have an "avoid trading" penalty for the winning side. I agree the engine should find this out by itself, but that requires looking far ahead and having knowledge of the endgame.
From a "puristic" point of view, the cleanest solution is to let the NN decide on everything, including trading down pieces. Moreover, I disagree with the statement that you should tweak the behavior depending on the opponent. If you were a chess truth oracle, you would just play (one of) the best move(s), regardless of the opponent. I am not claiming Leela is such an oracle already, but why not aim for that in a consistent zero spirit?
Because chess starts with a score of 0.0, and even a perfect engine could run into a threefold-repetition draw. Would a human grandmaster ever do this to a chess rookie? By the way, I don't know why the zero concept is still floating around. Even the bugged test10 networks on good GPUs do better against a crippled Stockfish 8 than AlphaZero did on Google's hardware.
Let's indeed say chess is a draw (and, as an oracle, you know it). I didn't say that you have to make life easy for your opponent by going for the quickest threefold repetition. A perfect engine should also try to maximize the chance that its (imperfect) opponent goes astray, while still only playing best moves that don't change the outcome of the game. Keeping pieces on the board is most likely a good strategy for that, but why not let Leela find that out by herself? (EDIT: I now realize that there may be a loophole in my argument. As Leela gets better and better, she doesn't play against "weaker" opponents during self-play, which would reinforce this kind of behavior (namely not trading pieces); she only plays against her own, ever stronger, self. For this stronger self, it might matter less whether pieces are traded or not, as long as it doesn't affect the theoretical outcome of the game. This seems like a strong argument in favor of contempt during match play.)
As ddobbelaere points out, playing training games only against an opponent of the same strength will never teach Leela to maximise Elo in tournament settings. If we want Leela to really behave like a human and take the strength of the opponent into account, the zero way of doing it would be to provide the opponent's Elo as an input to the NN, include weaker versions of herself in training, and, during tournament play, include the opponent's Elo in the UCI protocol. However, the UCI protocol is what it is and not up to us to change, so while a zero solution would be possible and preferable, it is not likely to happen. That said, all non-zero patches of playing behaviour should also be thoroughly tested against SF-dev, since the goal is to be the best, not to beat Laser more convincingly, and they should only be merged into master if they do not perform worse against SF-dev.
"lc0-536-tuned" uses the parameters from the tuning I did, resulting in the following. Tuning and matches were at a very short TC. "lc0-536-tuned" vs. default:
@hans-ekbrand The UCI protocol already has the option to pass Elo (whether it's used in any tournaments is a separate question):
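The quoted option did not survive extraction. Assuming it refers to UCI_Opponent from the UCI specification (an option through which the GUI can send the opponent's title, Elo, computer/human flag and name; this identification is an assumption, not stated in the comment), a minimal C++ sketch of pulling the Elo out of that string might look like this. OpponentInfo and ParseUciOpponent are hypothetical names, not part of lc0.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Fields of the UCI_Opponent value, in the order the UCI spec lists them:
//   [GM|IM|FM|WGM|WIM|none] [<elo>|none] [computer|human] <name>
struct OpponentInfo {
  std::string title;         // "GM", "IM", ..., or "none"
  int elo = -1;              // -1 when the GUI sends "none"
  bool is_computer = false;  // true for "computer", false for "human"
  std::string name;          // remainder of the line; may contain spaces
};

// Hypothetical helper (not lc0 code): parses the <value> part of
//   setoption name UCI_Opponent value <value>
// Returns false if the value does not match the expected layout.
bool ParseUciOpponent(const std::string& value, OpponentInfo* out) {
  assert(out != nullptr);
  std::istringstream in(value);
  std::string elo_token, kind;
  if (!(in >> out->title >> elo_token >> kind)) return false;

  if (elo_token == "none") {
    out->elo = -1;
  } else {
    try {
      std::size_t used = 0;
      out->elo = std::stoi(elo_token, &used);
      if (used != elo_token.size()) return false;  // e.g. "2800x"
    } catch (const std::exception&) {  // invalid_argument / out_of_range
      return false;
    }
  }

  if (kind != "computer" && kind != "human") return false;
  out->is_computer = (kind == "computer");
  std::getline(in >> std::ws, out->name);  // name is everything left over
  return true;
}
```

For example, the value "GM 2800 human Gary Kasparov" would yield elo = 2800. Whether any GUI or tournament actually sends this option is, as the comment says, a separate question.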
This pr
Results so far vs SF (30s+0.5, SF only has 4 cores):
|
I'm wondering: if trade-penalty / piece-trade contempt works, then maybe it should have even more of an effect vs. SF (rather than Laser), because SF is so good at endgames ... and if so, we don't have to use it only against weaker opponents; we can use it all the time. Also, it's true that Leela is weak in endgames compared to AB engines. Part of that may be things like temperature in training, and Leela doesn't get to see many endgames, but even if that is fixed in training, Leela will still be weaker than AB engines in endgames due to the nature of MCTS vs. AB. AB really comes to life in endgames because of their low branching factor (i.e. seeing 50 to 70 ply ahead) and their requirement for precise play along lines, which is why MCTS tends to outperform AB more in middlegames than in endgames. Therefore some sort of piece-trade contempt would probably help even if Leela's endgame training is improved. Certainty propagation would help a bit, and maybe an alternative to MCTS would be better, but for now MCTS is the best we have.
Another thing I'd like to mention: if some sort of piece-trade penalty method is being tested, it probably doesn't make sense to test it at a very short TC, because it depends heavily on the search. Trades are compared within the search, so at a very short TC there would be a lot of noise, whereas the longer the TC, the more effect the piece-trade penalty would have. I've tested at 30s+0.5, and I wouldn't think anything shorter is a good idea. I'd like to see tests with longer TCs.
Final results vs SF (4-threads) 30s+0.5:
Will test longer TC next.
I added the source code update and tuned again for a long TC. The tuned parameter values are very similar to those from the short-TC tuning but converged in fewer games. Now there is clearly an improvement:
Running "lc0-356-tuned" vs. "lc0" next...
It’s also useful as an analysis tool, so I wouldn’t call it a patch for a misbehavior. It is an excellent feature for getting different insights into a position's complexity. You cannot train Leela to offer different analyses, so offering a feature like this is useful. Gaining Elo against some engines is a nice side effect.
I added parameter options for the trade penalty so people can specify their own values, e.g.: --trade-penalty=0.0258 --trade-penalty2=11.1
They should probably default to off.
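A sketch of what "default to off" could mean, assuming the linear form trade_penalty * (pieces - trade_penalty2): when --trade-penalty is 0 the adjustment vanishes whatever --trade-penalty2 is, so stock behaviour is preserved. TradePenaltyOptions and Adjust are hypothetical names; only the two flag names come from the thread.

```cpp
#include <cassert>

// Hypothetical option holder mirroring the two flags named in the thread.
// Both default to 0.0, which makes the adjustment a no-op ("off").
struct TradePenaltyOptions {
  double trade_penalty = 0.0;   // --trade-penalty
  double trade_penalty2 = 0.0;  // --trade-penalty2
  bool Enabled() const { return trade_penalty != 0.0; }
};

// Assumed linear form: tp * (pieces - tp2). With tp == 0 it is always 0,
// regardless of tp2, so the default leaves the search unchanged.
double Adjust(double q, int pieces, const TradePenaltyOptions& o) {
  assert(pieces >= 2 && pieces <= 32);
  if (!o.Enabled()) return q;  // default: behave exactly like stock lc0
  return q + o.trade_penalty * (pieces - o.trade_penalty2);
}
```

Gating on trade_penalty alone is a design choice of this sketch: a non-zero --trade-penalty2 on its own would otherwise be silently ignored, which keeps the PR neutral for anyone who does not opt in.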
All testing at 30+0.5. I tried ZZ's CLOP values and they were so much worse that I stopped early, since swylandia was also getting bad early results and I have more promising things to test.
The prior defaults, though, are stronger than mine against SF! 83% LOS
Both are nearly as strong as a fast cfish dev build from November 1st! Regular Leela is ~250 Elo behind, versus 60 Elo or 8 Elo here.
Yeah, I tried ZZ's CLOP-tuned results and they were worse. I'll try again with the 0.005 & 32 values I was using before, just to make sure I didn't make a mistake in the parameter-options version (although testing on my test position showed it was fine). I'll also change the parameters to default to off, as gonzalezjo suggested.
Performed extensive NPS-Tuner CLOP runs to identify the best parameters for the following versions.
Followed up with gauntlets against TCEC Div 3 engines at 30 min + 10 sec increment.
Config 1 (v0.19): submitted to CCC 3, operating well to date at 78% (current td 2nd)
Config 4 v0.19-rc4 contempt and v20-dev trade prevention both returned 82.14% against TCEC Div 3 engines with no losses.
I strongly recommend submitting v20-dev “Trade Prevention” for TCEC Div 3 with the above recommended parameters and/or adjustments by jjosh.
The final v20 release, with Trade Prevention merged, should then be submitted for TCEC Div 2 after parameter tuning and testing, before Div 2 starts.
There is little to zero downside risk (imo) in adopting the above approach.
Trade-penalty should probably default to off so the PR is neutral for people who don't like the concept (@mooskagh), but there isn't really a good reason not to use this in TCEC/CCC. The tunings show a rather convincing benefit in my opinion.
Trade Penalty is no penalty; it is an advantage!
Selfplay with my tuned values didn't go well:
Now I will try tuning 1) in self-play and 2) in self-play alternating with games against a weaker Stockfish (equal and weaker opponents simultaneously).
I re-ran the 30s+0.5 test vs. SF for peace of mind, because my previous test used v19 rc5 and also didn't use the -0.9999 to 0.9999 bit ... the PR uses the v19 release and also uses:
... the results of the new 30s + 0.5 vs SF that I ran with up-to-date PR are:
So it's still good: not quite as good, but still good.
Tuning result in self-play and vs. a 300-Elo-weaker Stockfish simultaneously: Edit: The difference in play with the parameter changes above was negligible. I retuned vs. the weaker SF again, but with narrower bounds ([-0.01;0.01] for the first parameter), and this time I got:
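The code that followed "it's also using:" did not survive. One plausible reading, and it is only an assumption, is that the "-0.9999 to 0.9999 bit" clamps the adjusted Q so the penalty can never push an evaluation to or beyond ±1 (a certain win or loss). A minimal sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>

// Bounds taken from the "-0.9999 to 0.9999" mentioned above; how the PR
// actually uses them is an assumption here.
constexpr double kMinAdjustedQ = -0.9999;
constexpr double kMaxAdjustedQ = 0.9999;

// Hypothetical helper: apply the trade adjustment, then clamp so the result
// stays strictly inside (-1, 1) and can never read as a certain win or loss.
double AdjustAndClamp(double q, double adjustment) {
  assert(q >= -1.0 && q <= 1.0);  // Q values in lc0 lie in [-1, 1]
  const double adjusted = q + adjustment;
  return std::min(std::max(adjusted, kMinAdjustedQ), kMaxAdjustedQ);
}
```

Without such a bound, a near-winning Q of 0.98 plus a 0.04 bonus would exceed 1.0 and could outrank a genuinely proven win.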
Selfplay:
I'm continuing to test this tp version with a lot of different values and ideas, but so far the one submitted to TCEC (i.e. the 0.005 & 32 one) keeps coming out on top, at least for short time controls, as can be seen here (30s+0.5):
... and if I combine that version with previous 30s+0.5 results:
(I believe jjosh & Navs are testing longer time controls.)
...and combining the 30s+0.5 tests vs SF9:
The threefold-repetition draw: rofChade vs Lc0 TP in TCEC S14 Div 3, 2018-12-05:
SF wants to play 22...Kg8 after thinking for a while. Using --trade-penalty=0.005 --trade-penalty2=16.0 instead of --trade-penalty=0.005 --trade-penalty2=32.0 tests just as well so far (not many games) at 30s+0.5:
Interestingly, tests show there isn't much Elo difference for values of --trade-penalty2 between 32 and 16 (and even 36 seems fine), but 0 loses some Elo, -16 loses more, and -32 loses so much that it's weaker than Laser. So it looks like --trade-penalty=0.005 provides the penalty for trading pieces, while --trade-penalty2=16.0 provides standard contempt (given the calculation used, in the position above it would be 24 pieces - 16).
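Taking the "24 pieces - 16" remark at face value, and assuming the adjustment is trade_penalty * (pieces - trade_penalty2) (a reconstruction, not confirmed by the PR), the expression splits into a term that changes when pieces come off and a term that is the same at every node. This hypothetical C++ sketch makes that split explicit:

```cpp
#include <cassert>

// Hypothetical decomposition of the assumed formula
//   adjustment = tp * (pieces - tp2) = tp * pieces  -  tp * tp2
// The first term changes when pieces are traded (the "trade penalty");
// the second is the same at every node (a flat, contempt-like offset).
struct AdjustmentParts {
  double trade_term;     // tp * pieces: varies with material left
  double constant_term;  // -tp * tp2: identical for every node
  double Total() const { return trade_term + constant_term; }
};

AdjustmentParts Decompose(int pieces, double tp, double tp2) {
  assert(pieces >= 2 && pieces <= 32);
  return {tp * pieces, -tp * tp2};
}
```

Under that assumption, at 24 pieces with --trade-penalty=0.005, --trade-penalty2=16.0 gives 0.005 * (24 - 16) = 0.04 and --trade-penalty2=32.0 gives -0.04, while the trade-dependent term is 0.12 in both cases. Only the flat offset moves, which would fit the observation that --trade-penalty2 mostly behaves like standard contempt.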
For v0.19.1-rc2 vs v0.19.1-rc2 tp (30s+0.5) tested:
... i.e. tp still helps
For v0.19.1-rc2 vs SF9 (30s+0.5), although it started out well, it hasn't done so well since in my test:
... the number of games isn't great, so this might be noise, but it looks as though tp isn't that helpful vs. SF (even though it seemed to be for v0.19.0).
I tried v0.19.1-rc2 tp vs SF with a longer TC of 2m+2s (120+2) so that variable cpuct could kick in (and also my theory is that tp does well with deeper search):
... not many games yet, but at least it's promising so far. Next I'll merge rc2 into this PR so others can test too.
I broke git trying to upgrade to rc2 ... I googled, asked on Discord and tried heaps of things (spent ages on this), but I think I've broken git even more. When I do a git clone: git clone -b patch-1 --recurse-submodules https://github.com/nblaxall/lc0.git
Add trade-penalty to search