Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full implementation of resign #47

Closed

Conversation

gonzalezjo
Copy link

Seriously concerned about code quality here, because I'm really not the best at C++. I feel like there were better ways to do things better. That said, these edits seem to work perfectly fine in my testing.

GetBestMove was modified because I couldn't think of a cleaner way of doing things. If you know a better way (there has to be) then I'd say go ahead and implement it. I would've done a struct, but I wasn't sure if that would be okay with the project or not.
Really concerned about code quality here and in search.cc. Better approaches are welcome.
Gets rid of a comment from when resign was only partially implemented.
This was referenced Jun 6, 2018
@@ -83,6 +85,9 @@ void Search::PopulateUciParams(OptionsParser* options) {
"policy-softmax-temp") = 1.0f;
options->Add<IntOption>(kAllowedNodeCollisionsStr, 0, 1024,
"allowed-node-collisions") = 0;
options->Add<IntOption>(kResignPercentageStr, 0, 100,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FloatOption

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks.

@@ -49,6 +49,8 @@ const char* Search::kExtraVirtualLossStr = "Extra virtual loss";
const char* Search::kPolicySoftmaxTempStr = "Policy softmax temperature";
const char* Search::kAllowedNodeCollisionsStr =
"Allowed node collisions, per batch";
const char* Search::kResignPercentageStr =
"Minimum estimated win chance before resign. Zero is off.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many GUIs long UCI params don't fit, so consider shortening it.
Also UCI params don't have periods in the end (and --help adds period itself)

@@ -718,17 +725,29 @@ std::pair<Node*, bool> Search::PickNodeToExtend(Node* node,
}
}

std::pair<Move, Move> Search::GetBestMove() const {
std::tuple<Move, Move, int> Search::GetBestMove() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's search's setting and not calling class, then it should return decision, and not percentage.
In this case, it's better to replace that tuple with a struct (maybe reuse BestMoveInfo, but actually it has too many unneeded fields, so probably make another struct).

However, as it doesn't really makes sense in some modes (uci), maybe it's better to move this setting out of search.cc into tournament.cc or game.cc, and instead of GetBestMove() returning decision, have a separate function float Search::GetBestMoveWinrate() const.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's better to move this setting out of search.cc into tournament.cc or game.cc, and instead of GetBestMove()

Not sure how to do this cleanly w/regards to getting an OptionsParser in game.cc. I'll take a look tomorrow, but if you can offer any tips or code for that, I'd be super grateful!


// if resigning is enabled, we check if we should resign.
auto win_percent = GetBestNodeInternal()->GetQ(0, 0) + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetQ returns values from -1 to 1. Therefore percent will be 0% .. 200% in this case. Should be divided by 2.
Also win_percent doesn't really contain percents, it's in scale of 0..2 at that point, not 0..200, so the name is incorrect.

Copy link
Author

@gonzalezjo gonzalezjo Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetQ returns values from -1 to 1. Therefore percent will be 0% .. 200% in this case. Should be divided by 2.

This was intentional. From my understanding, it only matters when it's negative, because that's what the bot that would resign would see. If GetQ is positive, the current side has an advantage, but it's not like it can decide for the other player to resign anyway. I'm probably really misunderstanding something, though.

Also win_percent doesn't really contain percents, it's in scale of 0..2 at that point, not 0..200, so the name is incorrect.

Good catch.

return std::make_tuple(
best.first,
best.second,
win_percent * 100 < kResignPercentage ? 1 : 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it doesn't return winrate but rather decision? Why int then instead of bool?

Copy link
Author

@gonzalezjo gonzalezjo Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used to C. My bad. That should've been a pretty obvious simplification.

@@ -69,7 +69,8 @@ class Search {
void Wait();

// Returns best move, from the point of view of white player. And also ponder.
std::pair<Move, Move> GetBestMove() const;
// The integer represents whether or not the engine believes this is a helpless move based on resign settings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have clang installed, run
clang-format -I -style=google filename.cc
on modified files


private:
// Can run several copies of it in separate threads.
void Worker();

Node* GetBestNodeInternal() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not expected to be modified, consider returning const Node*

if (std::get<2>(move_data)) {
game_result_ = blacks_move ?
GameResult::WHITE_WON : GameResult::BLACK_WON;
puts("Will resign.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that intended? It surely shouldn't write anything to stdout because it's for protocol (uci or interaction with client or whatever).
For logging into stderr, I think it's not important enough to output unconditionally.

If we had proper logging (like glog), it would surely be good to log it there, but we don't have it yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gonzalezjo
Copy link
Author

gonzalezjo commented Jun 7, 2018

I made some changes. I'm still looking for help with the GetQ/win_percent stuff, and with the organization of GetBestMove(). I replied to your (very nicely presented, by the way. thanks!) requested changes with more specific questions and comments. Hopefully you or someone else can help me out more with that, but I'm not going to be able to do anything else today.

// std::min()'d to make the variable name accurate
// someone, *please* suggest a better name.
auto win_percent = std::min(
(GetBestNodeInternal()->GetQ(0, 0) + 1) * 100, 100.0f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is what we want.
There are 3 options that seem valid to me.

  1. An overload of GetBestNodeInternal which disables temperature to ensure we use the actual best node.
  2. Use the Q of the current head instead (if that is being updated by the search code, I've not checked).
  3. Use the node which corresponds to the selected move, rather than resampling again at random.
    The scripts which are used to decide on a resignpct which is acceptable have been tuned on one of these three, so it should probably match those scripts - or the scripts will need updating and a new resignpct threshold decided.

I don't really like option 3 because it might have a q value based off one visit, which is a very poor estimate so it makes it more likely to resign incorrectly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, I see what you mean.

Copy link
Member

@dubslow dubslow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that the resign option should be in game.cc, not in search.cc -- search doesn't really care about resigning or not, it's just evaluating. The game driving code is what cares -- either engine.cc in usermode case, or game.cc in training mode case, and since UCI doesn't do resign, there's not much point in putting it in engine.cc.

So GetBestMove should return the float eval rather than a resign bool, and let the driver do what it wants with that.

Having said all that, Tilps is correct that how it is right now isn't quite right, since resign needs to be decided on the best move's eval, not on the chosen-with-temperature move's eval. And you are also correct that the GetBest{Move,Child} hierarchy is a bit convoluted, I've run into similar problems with my dynamic temp implementation as well.

I think what I'm going to do is submit a separate PR which redesigns Search's GetBest* API altogether, and make it both cleaner, and more suitable to resign/dynamic temp purposes. Then we can rebase both this PR and my dynamictemp one top of that redesigned API.

// std::min()'d to make the variable name accurate
// someone, *please* suggest a better name.
auto win_percent = std::min(
(GetBestNodeInternal()->GetQ(0, 0) + 1) * 100, 100.0f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, I see what you mean.

@dubslow
Copy link
Member

dubslow commented Jun 12, 2018

Upon further consideration, I believe crem's current setup is indeed ~optimal. It's not the first time it's happened to me that "more thought" -> "oh, crem was right the whole time" :) What resign should do is to leave current Search functions untouched (except possibly rename GetBestChild to GetBestChildNoTemperature), and instead add new separate function named GetBestEval or similar which returns the eval of the no-temp-best move, then selfplay/game.cc can use that to make an adjudication decision.

Edit: And of course none of this isn't stuff that crem hasn't already suggested. Doh.

I could make the changes and make my own PR, or push my own changes to your branch here, or if you prefer you can just do it yourself

@gonzalezjo
Copy link
Author

Fixing up the GetBests (either through lots of refactoring, or just the rename + new function) would be a really good place to start. Seems like you've already went ahead with that on PR 80 (and then some) which cleans things up a lot. I'm completely Git-clueless, so I think continuing on your PR is the best way forward.

@gonzalezjo gonzalezjo closed this Jun 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants