Add value pessimism parameter #707

Open
wants to merge 4 commits into
base: master

@Ttl Ttl commented Jan 27, 2019

This parameter controls how many visits the initial Q counts as in later Q updates. Tuning with CLOP at very quick time controls put the best value at around 0.6.
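For readers who want the idea in concrete terms, here is a minimal sketch of one way to read "the number of visits the initial Q should have": the initial Q acts as a prior that is averaged in with a fractional weight. This is an illustration only, not the code in this PR; the function name `pessimistic_q` and its arguments are hypothetical.

```python
def pessimistic_q(value_sum: float, visits: int,
                  initial_q: float, pessimism: float = 0.6) -> float:
    """Blend real backed-up values with `pessimism` virtual visits at `initial_q`.

    Hypothetical helper: all names here are assumptions for illustration.
    """
    total_weight = visits + pessimism
    if total_weight == 0:
        # No real visits and no virtual visits: fall back to the initial Q.
        return initial_q
    return (value_sum + pessimism * initial_q) / total_weight


# One real visit worth +1.0 against an initial Q of -1.0 with weight 0.6.
print(pessimistic_q(value_sum=1.0, visits=1, initial_q=-1.0))
```

With `pessimism = 0` this reduces to the plain average, and the influence of the initial Q fades as real visits accumulate.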

At longer time controls, with all other parameters at their defaults, it appears to be an improvement:

Score of lc0_fpu_visits_35258 vs lc0_master_35258: 45 - 25 - 330  [0.525] 400
Elo difference: 17.39 +/- 14.16, LOS: 99.16 %, DrawRatio: 82.5 %

Network 35258, TC: 10s+0.5s, GPU: GTX 1080 Ti, 5-man TBs, all other parameters at defaults.
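As a sanity check on how these figures relate, the sketch below recomputes the score, Elo difference and LOS from the win/loss/draw counts. It assumes the standard logistic Elo formula and the usual normal approximation for likelihood of superiority; the function name `elo_and_los` is made up for this example, and the error margin is not computed.

```python
import math


def elo_and_los(wins: int, losses: int, draws: int):
    """Return (score, elo_difference, los) for a match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400)).
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # LOS under a normal approximation; draws do not enter the formula.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, elo, los


score, elo, los = elo_and_los(wins=45, losses=25, draws=330)
print(f"score={score:.3f} elo={elo:.2f} los={los * 100:.2f}%")
```

For 45 - 25 - 330 this gives a score of 0.525, about 17.39 Elo and about 99.16 % LOS, in line with the output above.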

Ttl commented Feb 15, 2019

Some more tests with the latest commit. Hopefully no functional changes. It uses the pessimistic Q value only for picking the positions for the batch. The reported score is not affected anymore.
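To illustrate the split between selection and reporting, here is a rough sketch in which a child keeps a plain average for the reported score and uses a pessimistic blend only when ranking children. It is a simplification under stated assumptions: the class `Child`, its methods and the numbers are hypothetical, and the policy/exploration term of the real selection formula is left out.

```python
from dataclasses import dataclass


@dataclass
class Child:
    name: str
    value_sum: float  # sum of backed-up values
    visits: int       # number of real visits

    def reported_q(self, initial_q: float) -> float:
        # Plain average, used for the score shown to the user.
        return self.value_sum / self.visits if self.visits else initial_q

    def selection_q(self, initial_q: float, pessimism: float) -> float:
        # Pessimistic blend, used only when choosing what to visit next.
        weight = self.visits + pessimism
        if weight == 0:
            return initial_q
        return (self.value_sum + pessimism * initial_q) / weight


children = [Child("a", value_sum=0.4, visits=1),
            Child("b", value_sum=3.5, visits=10)]
initial_q, pessimism = -1.0, 0.6  # illustrative values only

best_reported = max(children, key=lambda c: c.reported_q(initial_q))
best_selected = max(children, key=lambda c: c.selection_q(initial_q, pessimism))
print(best_reported.name, best_selected.name)
```

In this toy case the lightly visited child "a" has the higher reported Q, but the well-visited child "b" ranks higher for selection, because a single visit is not enough to outweigh the pessimistic prior.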

Score of lc0_value_pessimism_06 vs lc0_master: 92 - 74 - 634  [0.511] 800
Elo difference: 7.82 +/- 10.95, LOS: 91.88 %, DrawRatio: 79.3 %

800 nodes, MinibatchSize=32, Slowmover=0 
Score of lc0_value_pessimism_06 vs lc0_master: 68 - 60 - 272  [0.510] 400
Elo difference: 6.95 +/- 19.25, LOS: 76.02 %, DrawRatio: 68.0 %

800 nodes, MinibatchSize=32 
Score of lc0_value_pessimism_06 vs lc0_master: 92 - 74 - 634  [0.511] 800
Elo difference: 7.82 +/- 10.95, LOS: 91.88 %, DrawRatio: 79.3 %

1+0.5s, GTX 1080 Ti

The later tests are all positive, but all are inside the error bounds. Probably still needs more tests to make sure that this gains Elo.

@Ttl Ttl added the wip (Work in progress) label Feb 15, 2019
@Ttl Ttl changed the title Add FPU visits parameter Add value pessimism parameter Feb 15, 2019
@Naphthalin

This is an interesting approach (and somewhat similar in spirit to the initial AlphaGo paper, where the NN evaluation was treated as 100 MC rollouts). @Ttl do you still think this is a sensible approach, and if so, could you update the PR and test with recent nets?

@Naphthalin Naphthalin added the stale (Outdated PR, might be closed due to merge conflicts or inactivity) and enhancement (New feature or request) labels and removed the wip (Work in progress) label Nov 2, 2022