Update q to cp conversion formula #1193

Merged: 3 commits, Apr 22, 2020

Conversation

@Ttl (Member) commented Apr 10, 2020

Based on the current TCEC games, the Q to centipawn conversion formula seems to be too pessimistic at high Q. lc0 cp evaluations are often much lower than those of the other engines.

I downloaded the TCEC SUFI evaluations and compared the lc0 cp evaluations to the SF evaluations. The results are below:

[Plot: SF centipawn evaluation vs. lc0 Q over the SUFI games, with the candidate conversion curves]

The evaluation for Q < 0 comes from only one game and can be ignored. The current formula is the line labeled centipawn. The original, too optimistic conversion function is the centipawn_2018 line. I also plotted the conversion function from #841, which used the old equation but fitted to more games. The #841 equation seems to fit the median SF evaluation very well. I can get a slightly better fit with more complex functions, but this one has the advantage of being easily invertible, which is nice for calculating Q from centipawns. This PR reverts the conversion function to the one from #841.
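For reference, here is a minimal sketch of what that invertible tan-form conversion looks like in both directions. The constants are my reading of the #841 equation and should be treated as an assumption; they are not quoted in this thread.

```cpp
#include <cmath>

// Sketch of an easily invertible Q <-> centipawn mapping of the #841 form
// cp = a * tan(b * q). The constants below are assumed, not taken from this PR.
constexpr double kScale = 111.714640912;  // assumed value of a
constexpr double kSlope = 1.5620688421;   // assumed value of b, just below pi/2

// Q in (-1, 1) -> centipawns.
double QToCentipawns(double q) { return kScale * std::tan(kSlope * q); }

// Centipawns -> Q; this direction is why easy invertibility is convenient.
double CentipawnsToQ(double cp) { return std::atan(cp / kScale) / kSlope; }
```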

The average error relative to the median SF eval on this dataset is 0.79 pawns with the current function, 0.28 with this function, and 1.85 with centipawn_2018.

The script for downloading evaluations and plotting the graphs can be found at: https://gist.github.com/Ttl/9926f70800b3fbd7314c150108d4ba61

Commit: Revert the formula to PR841 equation.
@mooskagh (Member) commented
It seems that after every TCEC season we start to retune our formula as it doesn't fit the centipawn scale well.

I guess what's really happening is that different networks have different Q scales, so to speak, meaning that different networks will need different tunes.

Maybe it makes sense to have a library of a few hundred positions with SF evals and periodically run Lc0 evals on them to re-tune the formula.
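A minimal sketch of what such a periodic re-tune check could look like, assuming a stored library of (Q, SF centipawn) pairs; the data layout, the sample values, and the candidate constants here are all placeholders rather than an existing lc0 tool.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical reference point: lc0 Q and the matching SF centipawn eval
// for one position from the fixed library.
struct RefPoint {
  double q;
  double sf_cp;
};

// Mean absolute error (in pawns) of a candidate Q->cp formula over the library.
double MeanAbsError(const std::vector<RefPoint>& refs, double (*q_to_cp)(double)) {
  if (refs.empty()) return 0.0;
  double total = 0.0;
  for (const auto& r : refs) total += std::fabs(q_to_cp(r.q) - r.sf_cp) / 100.0;
  return total / refs.size();
}

// Candidate formula under test (tan form with assumed constants).
double Candidate(double q) { return 111.714640912 * std::tan(1.5620688421 * q); }

int main() {
  // Made-up sample points; a real library would hold a few hundred positions.
  std::vector<RefPoint> refs = {{0.25, 60.0}, {0.50, 160.0}, {0.80, 420.0}};
  std::printf("mean abs error: %.2f pawns\n", MeanAbsError(refs, Candidate));
}
```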

@Ttl (Member, Author) commented Apr 12, 2020

Don't merge this just yet; I'll tune it further once slightly more SUFI games have been played.

@Naphthalin (Contributor) commented
I think this formula is much better than our current one for two reasons:
1.) it is invertible;
2.) if I see that correctly, it reaches +10.0 just below our resign threshold in training (threshold 0.96, +10.0 at Q = 0.934; a quick check is sketched below);
and I would really appreciate adopting it as our new default formula.
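A quick numeric check of point 2, again assuming the tan-form constants sketched earlier (they are not quoted in this thread):

```cpp
#include <cmath>
#include <cstdio>

int main() {
  const double q = 0.934;  // just below the training resign threshold of 0.96
  // Assumed #841-style constants; see the earlier sketch.
  const double cp = 111.714640912 * std::tan(1.5620688421 * q);
  std::printf("cp at q = %.3f: %.0f\n", q, cp);  // ~995 cp, i.e. about +10.0
}
```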

@zz4032 (Contributor) commented Apr 13, 2020

"I guess what's really happening is that different networks have different Q scales, so to speak, meaning that different networks will need different tunes."

I reran the adjusting process on random positions from human games, as I did previously for the existing conversion formula.

Current formula + old sample points (T40):
[Plot: current formula against the old T40 sample points]
Indeed it seems that T60 networks produce far fewer extreme Q values (close to -1 or 1) than T40. The current formula (red) and the new T60 sample data don't correlate well starting from about CP +/-300.
However, a small adjustment of the current formula, with the exponent parameter changed from 14 to 8, shows a much better fit (green): uci_info.score = 295 * wl / (1 - 0.976953126 * std::pow(wl, 8)); Orange is PR #1193.
Current formula + new sample points (SV-T60):
[Plot: current formula (red), exponent-8 adjustment (green), and PR #1193 (orange) against the new SV-T60 sample points]
Same diagram zoomed to CP [-200; 200]:
[Plot: zoomed view]
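To make that comparison concrete, a small sketch that evaluates the three variants at a few Q values; the exponent-14 "current" formula is inferred from the description above, and the PR #1193 constants are an assumption on my part.

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>

// Current formula as inferred from the comment above (exponent 14).
double CurrentCp(double wl) { return 295 * wl / (1 - 0.976953126 * std::pow(wl, 14)); }

// zz4032's adjustment: the same form with the exponent changed to 8.
double AdjustedCp(double wl) { return 295 * wl / (1 - 0.976953126 * std::pow(wl, 8)); }

// PR #1193 tan form; constants are assumed, not quoted in this thread.
double Pr1193Cp(double wl) { return 111.714640912 * std::tan(1.5620688421 * wl); }

int main() {
  for (double wl : {0.25, 0.50, 0.75, 0.90}) {
    std::printf("wl=%.2f  current=%4.0f  exp8=%4.0f  pr1193=%4.0f\n",
                wl, CurrentCp(wl), AdjustedCp(wl), Pr1193Cp(wl));
  }
}
```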

@Ttl (Member, Author) commented Apr 21, 2020

All of the SUFI games have now been played. The best fit is now slightly closer to the current formula than it was previously.

[Plot: updated fit of SF centipawn evaluation vs. lc0 Q over all SUFI games]

An evaluation of about +5 with the previous formula corresponds to roughly +10 with this one.
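A rough check of that claim, assuming the previous formula is the exponent-14 form quoted above and that the final tuned constants of this PR are 90 and 1.5637541897 (an assumption on my part, not quoted in this thread):

```cpp
#include <cmath>
#include <cstdio>

// Previous formula as described earlier in the thread (exponent 14).
double OldCp(double wl) { return 295 * wl / (1 - 0.976953126 * std::pow(wl, 14)); }

// Final tuned formula of this PR; constants are assumed, not quoted here.
double NewCp(double wl) { return 90 * std::tan(1.5637541897 * wl); }

int main() {
  const double wl = 0.945;  // the old formula gives roughly +5.00 here
  std::printf("old: %.0f cp  new: %.0f cp\n", OldCp(wl), NewCp(wl));  // ~500 vs ~965
}
```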

Labels: release blocker (Bugs which block releases)