-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full implementation of resign #47
Full implementation of resign #47
Conversation
GetBestMove was modified because I couldn't think of a cleaner way of doing things. If you know a better way (there has to be) then I'd say go ahead and implement it. I would've done a struct, but I wasn't sure if that would be okay with the project or not.
Really concerned about code quality here and in search.cc. Better approaches are welcome.
Gets rid of a comment from when resign was only partially implemented.
src/mcts/search.cc
Outdated
@@ -83,6 +85,9 @@ void Search::PopulateUciParams(OptionsParser* options) { | |||
"policy-softmax-temp") = 1.0f; | |||
options->Add<IntOption>(kAllowedNodeCollisionsStr, 0, 1024, | |||
"allowed-node-collisions") = 0; | |||
options->Add<IntOption>(kResignPercentageStr, 0, 100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FloatOption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks.
src/mcts/search.cc
Outdated
@@ -49,6 +49,8 @@ const char* Search::kExtraVirtualLossStr = "Extra virtual loss"; | |||
const char* Search::kPolicySoftmaxTempStr = "Policy softmax temperature"; | |||
const char* Search::kAllowedNodeCollisionsStr = | |||
"Allowed node collisions, per batch"; | |||
const char* Search::kResignPercentageStr = | |||
"Minimum estimated win chance before resign. Zero is off."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many GUIs long UCI params don't fit, so consider shortening it.
Also UCI params don't have periods in the end (and --help
adds period itself)
src/mcts/search.cc
Outdated
@@ -718,17 +725,29 @@ std::pair<Node*, bool> Search::PickNodeToExtend(Node* node, | |||
} | |||
} | |||
|
|||
std::pair<Move, Move> Search::GetBestMove() const { | |||
std::tuple<Move, Move, int> Search::GetBestMove() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's search's setting and not calling class, then it should return decision, and not percentage.
In this case, it's better to replace that tuple with a struct (maybe reuse BestMoveInfo, but actually it has too many unneeded fields, so probably make another struct).
However, as it doesn't really makes sense in some modes (uci), maybe it's better to move this setting out of search.cc into tournament.cc or game.cc, and instead of GetBestMove() returning decision, have a separate function float Search::GetBestMoveWinrate() const
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's better to move this setting out of search.cc into tournament.cc or game.cc, and instead of GetBestMove()
Not sure how to do this cleanly w/regards to getting an OptionsParser in game.cc. I'll take a look tomorrow, but if you can offer any tips or code for that, I'd be super grateful!
src/mcts/search.cc
Outdated
|
||
// if resigning is enabled, we check if we should resign. | ||
auto win_percent = GetBestNodeInternal()->GetQ(0, 0) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetQ
returns values from -1 to 1. Therefore percent will be 0% .. 200% in this case. Should be divided by 2.
Also win_percent
doesn't really contain percents, it's in scale of 0..2 at that point, not 0..200, so the name is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetQ returns values from -1 to 1. Therefore percent will be 0% .. 200% in this case. Should be divided by 2.
This was intentional. From my understanding, it only matters when it's negative, because that's what the bot that would resign would see. If GetQ is positive, the current side has an advantage, but it's not like it can decide for the other player to resign anyway. I'm probably really misunderstanding something, though.
Also win_percent doesn't really contain percents, it's in scale of 0..2 at that point, not 0..200, so the name is incorrect.
Good catch.
src/mcts/search.cc
Outdated
return std::make_tuple( | ||
best.first, | ||
best.second, | ||
win_percent * 100 < kResignPercentage ? 1 : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it doesn't return winrate but rather decision? Why int
then instead of bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used to C. My bad. That should've been a pretty obvious simplification.
src/mcts/search.h
Outdated
@@ -69,7 +69,8 @@ class Search { | |||
void Wait(); | |||
|
|||
// Returns best move, from the point of view of white player. And also ponder. | |||
std::pair<Move, Move> GetBestMove() const; | |||
// The integer represents whether or not the engine believes this is a helpless move based on resign settings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have clang installed, run
clang-format -I -style=google filename.cc
on modified files
src/mcts/search.h
Outdated
|
||
private: | ||
// Can run several copies of it in separate threads. | ||
void Worker(); | ||
|
||
Node* GetBestNodeInternal() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not expected to be modified, consider returning const Node*
src/selfplay/game.cc
Outdated
if (std::get<2>(move_data)) { | ||
game_result_ = blacks_move ? | ||
GameResult::WHITE_WON : GameResult::BLACK_WON; | ||
puts("Will resign."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that intended? It surely shouldn't write anything to stdout
because it's for protocol (uci or interaction with client or whatever).
For logging into stderr
, I think it's not important enough to output unconditionally.
If we had proper logging (like glog), it would surely be good to log it there, but we don't have it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied what lczero does: https://github.com/glinscott/leela-chess/blob/39009b46065af65ab23181308aa56ab609d2f7f0/src/UCTSearch.cpp#L228
I will remove it, though.
Fix type mismatch + accidentally removing line... sigh
I made some changes. I'm still looking for help with the GetQ/win_percent stuff, and with the organization of GetBestMove(). I replied to your (very nicely presented, by the way. thanks!) requested changes with more specific questions and comments. Hopefully you or someone else can help me out more with that, but I'm not going to be able to do anything else today. |
// std::min()'d to make the variable name accurate | ||
// someone, *please* suggest a better name. | ||
auto win_percent = std::min( | ||
(GetBestNodeInternal()->GetQ(0, 0) + 1) * 100, 100.0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is what we want.
There are 3 options that seem valid to me.
- An overload of GetBestNodeInternal which disables temperature to ensure we use the actual best node.
- Use the Q of the current head instead (if that is being updated by the search code, I've not checked).
- Use the node which corresponds to the selected move, rather than resampling again at random.
The scripts which are used to decide on a resignpct which is acceptable have been tuned on one of these three, so it should probably match those scripts - or the scripts will need updating and a new resignpct threshold decided.
I don't really like option 3 because it might have a q value based off one visit, which is a very poor estimate so it makes it more likely to resign incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mm, I see what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that the resign option should be in game.cc, not in search.cc -- search doesn't really care about resigning or not, it's just evaluating. The game driving code is what cares -- either engine.cc in usermode case, or game.cc in training mode case, and since UCI doesn't do resign, there's not much point in putting it in engine.cc.
So GetBestMove should return the float eval rather than a resign bool, and let the driver do what it wants with that.
Having said all that, Tilps is correct that how it is right now isn't quite right, since resign needs to be decided on the best move's eval, not on the chosen-with-temperature move's eval. And you are also correct that the GetBest{Move,Child} hierarchy is a bit convoluted, I've run into similar problems with my dynamic temp implementation as well.
I think what I'm going to do is submit a separate PR which redesigns Search's GetBest* API altogether, and make it both cleaner, and more suitable to resign/dynamic temp purposes. Then we can rebase both this PR and my dynamictemp one top of that redesigned API.
// std::min()'d to make the variable name accurate | ||
// someone, *please* suggest a better name. | ||
auto win_percent = std::min( | ||
(GetBestNodeInternal()->GetQ(0, 0) + 1) * 100, 100.0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mm, I see what you mean.
Upon further consideration, I believe crem's current setup is indeed ~optimal. It's not the first time it's happened to me that "more thought" -> "oh, crem was right the whole time" :) What resign should do is to leave current Edit: And of course none of this isn't stuff that crem hasn't already suggested. Doh. I could make the changes and make my own PR, or push my own changes to your branch here, or if you prefer you can just do it yourself |
Fixing up the GetBests (either through lots of refactoring, or just the rename + new function) would be a really good place to start. Seems like you've already went ahead with that on PR 80 (and then some) which cleans things up a lot. I'm completely Git-clueless, so I think continuing on your PR is the best way forward. |
Seriously concerned about code quality here, because I'm really not the best at C++. I feel like there were better ways to do things better. That said, these edits seem to work perfectly fine in my testing.