Update q to cp conversion formula #1193

Merged: 3 commits, Apr 22, 2020

Conversation

@Ttl (Member) commented Apr 10, 2020

Based on the current TCEC games, the Q to centipawn conversion formula seems to be too pessimistic at high Q. lc0 cp evaluations are often much lower than those of the other engines.

I downloaded the TCEC SUFI evaluations and compared the lc0 cp evaluations to the SF evaluations. The results are below:

[Plot: SF centipawn evaluation vs. lc0 Q over the SUFI games, with the candidate conversion curves]

The evaluation for Q < 0 comes from only one game and can be ignored. The current formula is the line labeled centipawn. The original, too optimistic conversion function is the centipawn_2018 line. I also plotted the conversion function from #841, which used the old equation but fitted to more games. The #841 equation seems to fit the median SF evaluation very well. I can get a slightly better fit with more complex functions, but this one has the advantage of being easily invertible, which is nice for calculating Q from centipawns. This PR reverts the conversion function to the one from #841.
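For reference, here is a minimal sketch of what that invertible tan-form conversion looks like in both directions. The constants are my reading of the #841 equation and should be treated as an assumption; they are not quoted in this thread.

```cpp
#include <cmath>

// Sketch of an easily invertible Q <-> centipawn mapping of the #841 form
// cp = a * tan(b * q). The constants below are assumed, not taken from this PR.
constexpr double kScale = 111.714640912;  // assumed value of a
constexpr double kSlope = 1.5620688421;   // assumed value of b, just below pi/2

// Q in (-1, 1) -> centipawns.
double QToCentipawns(double q) { return kScale * std::tan(kSlope * q); }

// Centipawns -> Q; this direction is why easy invertibility is convenient.
double CentipawnsToQ(double cp) { return std::atan(cp / kScale) / kSlope; }
```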

The average error relative to the median SF eval on this dataset is 0.79 pawns with the current function, 0.28 with this function, and 1.85 with centipawn_2018.

The script for downloading evaluations and plotting the graphs can be found at: https://gist.github.com/Ttl/9926f70800b3fbd7314c150108d4ba61

Commit: Revert the formula to PR841 equation.
@mooskagh (Member) commented
It seems that after every TCEC season we start to retune our formula as it doesn't fit the centipawn scale well.

I guess what's really happening is that different networks have different Q scales, so to speak, meaning that different networks will need different tunes.

Maybe it makes sense to have a library of a few hundred positions with SF evals and periodically run Lc0 evals on them to re-tune the formula.
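A minimal sketch of what such a periodic re-tune check could look like, assuming a stored library of (Q, SF centipawn) pairs; the data layout, the sample values, and the candidate constants here are all placeholders rather than an existing lc0 tool.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical reference point: lc0 Q and the matching SF centipawn eval
// for one position from the fixed library.
struct RefPoint {
  double q;
  double sf_cp;
};

// Mean absolute error (in pawns) of a candidate Q->cp formula over the library.
double MeanAbsError(const std::vector<RefPoint>& refs, double (*q_to_cp)(double)) {
  if (refs.empty()) return 0.0;
  double total = 0.0;
  for (const auto& r : refs) total += std::fabs(q_to_cp(r.q) - r.sf_cp) / 100.0;
  return total / refs.size();
}

// Candidate formula under test (tan form with assumed constants).
double Candidate(double q) { return 111.714640912 * std::tan(1.5620688421 * q); }

int main() {
  // Made-up sample points; a real library would hold a few hundred positions.
  std::vector<RefPoint> refs = {{0.25, 60.0}, {0.50, 160.0}, {0.80, 420.0}};
  std::printf("mean abs error: %.2f pawns\n", MeanAbsError(refs, Candidate));
}
```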

@Ttl (Member, Author) commented Apr 12, 2020

Don't merge this just yet; I'll tune it further once slightly more SUFI games have been played.

@Naphthalin (Contributor) commented
I think this formula is much better than our current one for two reasons:
1.) it is invertible;
2.) if I see that correctly, it reaches +10.0 just below our resign threshold in training (threshold 0.96, +10.0 at Q = 0.934; a quick check is sketched below);
and I would really appreciate adopting it as our new default formula.
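A quick numeric check of point 2, again assuming the tan-form constants sketched earlier (they are not quoted in this thread):

```cpp
#include <cmath>
#include <cstdio>

int main() {
  const double q = 0.934;  // just below the training resign threshold of 0.96
  // Assumed #841-style constants; see the earlier sketch.
  const double cp = 111.714640912 * std::tan(1.5620688421 * q);
  std::printf("cp at q = %.3f: %.0f\n", q, cp);  // ~995 cp, i.e. about +10.0
}
```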

@zz4032 (Contributor) commented Apr 13, 2020

"I guess what's really happening is that different networks have different Q scales, so to speak, meaning that different networks will need different tunes."

I reran the adjusting process on random positions from human games, as I did previously for the existing conversion formula.

Current formula + old sample points (T40):
[Plot: current formula against the old T40 sample points]
Indeed it seems that T60 networks produce far fewer extreme Q values (close to -1 or 1) than T40. The current formula (red) and the new T60 sample data don't correlate well starting from about CP +/-300.
However, a small adjustment of the current formula, with the exponent parameter changed from 14 to 8, shows a much better fit (green): uci_info.score = 295 * wl / (1 - 0.976953126 * std::pow(wl, 8)); Orange is PR #1193.
Current formula + new sample points (SV-T60):
[Plot: current formula (red), exponent-8 adjustment (green), and PR #1193 (orange) against the new SV-T60 sample points]
Same diagram zoomed to CP [-200; 200]:
[Plot: zoomed view]
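To make that comparison concrete, a small sketch that evaluates the three variants at a few Q values; the exponent-14 "current" formula is inferred from the description above, and the PR #1193 constants are an assumption on my part.

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>

// Current formula as inferred from the comment above (exponent 14).
double CurrentCp(double wl) { return 295 * wl / (1 - 0.976953126 * std::pow(wl, 14)); }

// zz4032's adjustment: the same form with the exponent changed to 8.
double AdjustedCp(double wl) { return 295 * wl / (1 - 0.976953126 * std::pow(wl, 8)); }

// PR #1193 tan form; constants are assumed, not quoted in this thread.
double Pr1193Cp(double wl) { return 111.714640912 * std::tan(1.5620688421 * wl); }

int main() {
  for (double wl : {0.25, 0.50, 0.75, 0.90}) {
    std::printf("wl=%.2f  current=%4.0f  exp8=%4.0f  pr1193=%4.0f\n",
                wl, CurrentCp(wl), AdjustedCp(wl), Pr1193Cp(wl));
  }
}
```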

@Ttl (Member, Author) commented Apr 21, 2020

All of the SUFI games have now been played. The best fit is now slightly closer to the current formula than it was previously.

[Plot: updated fit of SF centipawn evaluation vs. lc0 Q over all SUFI games]

An evaluation of about +5 with the previous formula corresponds to roughly +10 with this one.
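A rough check of that claim, assuming the previous formula is the exponent-14 form quoted above and that the final tuned constants of this PR are 90 and 1.5637541897 (an assumption on my part, not quoted in this thread):

```cpp
#include <cmath>
#include <cstdio>

// Previous formula as described earlier in the thread (exponent 14).
double OldCp(double wl) { return 295 * wl / (1 - 0.976953126 * std::pow(wl, 14)); }

// Final tuned formula of this PR; constants are assumed, not quoted here.
double NewCp(double wl) { return 90 * std::tan(1.5637541897 * wl); }

int main() {
  const double wl = 0.945;  // the old formula gives roughly +5.00 here
  std::printf("old: %.0f cp  new: %.0f cp\n", OldCp(wl), NewCp(wl));  // ~500 vs ~965
}
```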

Labels: release blocker (Bugs which block releases)