WDL Conversion for more realistic WDL and contempt #1791
Conversation
…et's see if this compiles
**Explaining the general mechanics**

The formulas used for WDL conversion aren't as random as they look; they are the result of an underlying mathematical model. The current parametrization reflects this inner model, but it would probably be much more accessible if one could enter playing level and Elo difference, or simply white and black Elo.

**The mathematical model**

We model a chess game (or really any other game where there is a theoretically best move) as players taking turns, with playing inaccuracies drawn from an inaccuracy distribution. The current position gets an "objective" value, from which a random walk is started; if at some point the value is above (or below) a certain value, the game is scored as a win (loss), and if it ultimately ends up between these values, it's a draw. The WDL distribution at any point should predict the expected outcomes of the three results based on the skill level of both players. Assumptions for simplification:
**Derivation of the formulas:**

The areas above +1 and below −1 of a logistic curve with mean $\mu$ and scale $s$ give the win and loss probabilities, with the remaining area being the draw probability:

$$W = \frac{1}{1 + e^{(1-\mu)/s}}, \qquad L = \frac{1}{1 + e^{(1+\mu)/s}}, \qquad D = 1 - W - L.$$

Applying a random walk with the players' inaccuracy distributions widens this curve, i.e. effectively increases $s$. Since we know mean and variance of all 4 involved players (white/black, reference/target players), we can approximate the resulting outcome distribution by a logistic curve again. However, in this transformation not all 8 parameters are needed. In fact, as training is between equal opponents anyway, scaling $E$ and $\mathrm{var}$ by a constant factor only changes the internal value of the scale $s$.
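To make this concrete, here is a minimal self-contained sketch (the names `LogisticCdf` and `WdlFromMuS` are illustrative, not the PR's code) that computes W/D/L as the three areas of the logistic curve, and shows how a larger scale $s$ (more expected inaccuracies) shifts probability mass from the draw region toward decisive results:

```cpp
#include <cmath>
#include <cstdio>

// Logistic CDF with mean mu and scale s: F(x) = 1 / (1 + exp(-(x - mu) / s)).
static float LogisticCdf(float x, float mu, float s) {
  return 1.0f / (1.0f + std::exp(-(x - mu) / s));
}

struct Wdl {
  float w, d, l;
};

// W/D/L as the areas of the logistic curve above +1, between -1 and +1,
// and below -1, per the derivation above.
static Wdl WdlFromMuS(float mu, float s) {
  const float w = 1.0f - LogisticCdf(1.0f, mu, s);  // area above +1
  const float l = LogisticCdf(-1.0f, mu, s);        // area below -1
  return {w, 1.0f - w - l, l};
}

int main() {
  // A position slightly better for white (mu = 0.3): a small s (few expected
  // inaccuracies, i.e. strong players) keeps most mass in the draw region,
  // while a larger s spreads it toward decisive results.
  const float scales[] = {0.3f, 0.8f};
  for (float s : scales) {
    const Wdl r = WdlFromMuS(0.3f, s);
    std::printf("s=%.1f  W=%.3f  D=%.3f  L=%.3f\n", s, r.w, r.d, r.l);
  }
  return 0;
}
```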
} else {
  node_to_process->v = v;
  node_to_process->d = d;
}
I think that this new code should be a function, instead of an inlined block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree! However, I am not yet worried about optimization and code style, only about functionality. Also, I was wondering whether it is a good idea to do the conversion here, where it is also executed on NN cache hits, instead of doing it in a way that the converted values are cached.
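For illustration only, the suggested refactor might look roughly like the following sketch (`ConvertVD` and its parameter are hypothetical names, not the PR's actual code):

```cpp
// Hypothetical helper: adjusts the raw NN (value, draw) pair in place
// according to the active WDL-conversion parameters.
void ConvertVD(float* v, float* d, float wdl_mu_shift /* placeholder */) {
  // Recover raw W and L from v = W - L and d = D.
  const float w = (1.0f - *d + *v) / 2.0f;
  const float l = (1.0f - *d - *v) / 2.0f;
  // ... apply the logistic-model conversion from the derivation above,
  // parameterized by wdl_mu_shift and friends ...
  *v = w - l;          // converted value
  *d = 1.0f - w - l;   // converted draw probability
}
```

The block under review would then reduce to a single `ConvertVD(&v, &d, ...)` call before the two assignments.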
I would like to see this conversion pushed "deeper", so that search is getting already-adjusted NNEvals. This should also make the merge with DAG easier.
What do you consider "deeper"? Right now, it is as deep as it can get without touching the NN output itself; whenever a new node is created, it gets the (unaltered) WDL from the NN (either by an actual NN GPU eval or by a cache hit) and applies the conversion. Do you suggest altering the NN WDL output, so we don't have to calculate the conversion multiple times on cache hits?
Yes, I suggest doing the WDL adjustment before or on cache insertion, so that we don't have to do it (multiple times) while fetching NN evals in search.
I have thought about it a bit more, and I don't overly like storing it in the NN cache, as I can easily imagine restarting search with different parameters without clearing the cache. However, the conversion formula now really is lightweight, equivalent to calculating the PUCT score for maybe 5 nodes, but only performed once per addition of a new node. If there is any measurable CPU delay at all and (with DAG) the ratio of (new nodes requesting NN evals) / (cache insertions) is much bigger than 1, it is worth reconsidering, though that would likely mean touching multiple backends and somehow transferring search parameters there.
…he parameters are meaningless like that
…instead of 0. Also added a few comments and turned magic numbers into meaningful const float variables.
    throw Exception("Invalid default contempt: " + entry);
  }
} else if (parts.size() == 2) {
  if (std::search(name.begin(), name.end(), parts[0].begin(),
This is to check that parts[x] is a substring of name, right? (case insensitive)
(Are you sure that you need a substring rather than a full match?)
Yes; common practice in TCEC and CCC seems to be to add some versioning etc. to the name, which we won't know in advance.
IIRC the original reasoning was to catch something like "Dragon" and "Komodo Dragon".
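For reference, the case-insensitive substring idiom under discussion generally looks like the following sketch (a generic example, not the PR's exact code):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true if `needle` occurs in `haystack`, ignoring case.
bool ContainsCaseInsensitive(const std::string& haystack,
                             const std::string& needle) {
  auto it = std::search(
      haystack.begin(), haystack.end(), needle.begin(), needle.end(),
      [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
      });
  return it != haystack.end();
}

// With this, "dragon" matches both "Dragon 3.1" and "Komodo Dragon",
// which is why a full-name match would be too strict for TCEC/CCC names.
```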
Adds an Elo-based WDL transformation of the NN value head output. Helps with: more accurate play at high level (WDL sharpening); more aggressive play against weaker opponents and draw-avoiding openings (contempt); piece odds play. Also adds a new ScoreType `WDL_mu` which follows the new eval convention, where +1.00 means a 50% white win chance. (cherry picked from commit 53b31ae)
} else if (score_type == "WDL_mu") {
  // Reports the WDL mu value whenever it is reasonable, and defaults to
  // centipawn otherwise.
  const float centipawn_fallback_threshold = 0.99f;
The threshold currently triggers the fallback if W or L drops below 0.5%, which seems to be a bit soon. It should probably be changed to 0.996 or so, which would make it act if W or L drops below 0.2%.
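A condition consistent with the numbers quoted above would be `min(W, L) < (1 - threshold) / 2` — this is an inference from the quoted percentages, not necessarily the PR's literal code:

```cpp
#include <algorithm>

// Inferred form of the fallback check (not necessarily the PR's literal
// code): with threshold = 0.99 this fires when W or L is below 0.5%, and
// with threshold = 0.996 when W or L is below 0.2%.
bool UseCentipawnFallback(float w, float l, float threshold = 0.99f) {
  return std::min(w, l) < (1.0f - threshold) / 2.0f;
}
```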
If using Lc0 in analysis mode on an already-completed armageddon game between two human players, is there a combination of values to use?
**Examples on how to use the PR, pt 3**

**Armageddon games**

Armageddon games are usually played under the following conditions: a draw counts as a win for black, and as compensation white gets more time on the clock.
This could, for example, be achieved by the following settings:
Note: a position with equal chances for white and black will show as approx. 0.00 when using `WDL_mu`.

I used these settings to kibitz the KGA Armageddon game between Hikaru and Magnus, and from a "practical chances" perspective the KGA didn't seem like too bad a choice for an Armageddon game :) Just curious, is this also the game you were interested in @yuzisee ?

PS: In the near future, the …
How did you know 🙂
The current contempt implementation is based on drawscore, which works to some extent but is ultimately flawed, because the WDL distribution (especially the draw rate) coming from training games doesn't take the Elo difference into account; ultimately the stronger side avoids not only short draws, but also positions Leela can likely hold against herself.
Similarly, in high-level matches the predicted D is usually too low (and the winning chances of the disadvantaged side absurdly high), which again makes the WDL inaccurate, as DivP opponents won't make the same frequent inaccuracies as training-game Leela.
Third, Leela would be much more interesting for opening preparation if she suggested different lines in the opening depending on the Elo difference between black and white and on the expected draw rate, which depends on time control and playing level. Current drawscore contempt absolutely fails at this, as it basically never influences the opening choice.
This PR adds a general WDL conversion which hopefully serves both as an accuracy rescaling and as an expected inaccuracy based contempt. I will explain the general mechanics and the derivation of the theoretical model in a separate post.