Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explanations for all AllocationDeciders #4934

Merged
merged 1 commit into from Jan 31, 2014

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Jan 28, 2014

This adds explanations for all of the allocation deciders for their yes and no answers. It should help when using the reroute API to explain why a shard can or cannot be moved to a different node.

I would like to move to a full explain-like API for shard allocation, but I wanted to submit this as a separate PR since it can easily be backported to all branches to be useful without any breaking changes.

I tried to keep the explanations short but distinct.

Related to #4380 and #2483

@spinscale
Copy link
Contributor

@dakrone great usability feature

Might make more sense to use static fields for all the strings being used?

@kimchy
Copy link
Member

kimchy commented Jan 29, 2014

If we have this explanation, where will it be used?

sync with @uboness, he tried to tackle this a bit, and should have a branch with many more explanations. I wonder how / where would you want to use that info? We found logging to be close to useless.

An idea was to allow to run reroute API in "debug" mode, and gather the decisions made, and return them as part of the reroute response. But note that with the way the balanced shard allocator works, its going to be very verbose. I believe @uboness tried it, and it ended being so verbose that again became useless.

Another option is to allow in reroute to give a shard, and a node, and return why this shard is not allocated on a node.

@kimchy
Copy link
Member

kimchy commented Jan 29, 2014

ahh, I see when it will be used, in the reroute when we return the list of decisions of why we can't move one shard to another for example.

It would be nice to have those explanations only enabled we we want them. This run will create a lot of garbage during normal operation that ends up calling the deciders a lot.

Also, for example, canAllocate does an early break on NO, where its used in move command in reroute as an example, I would not want to do an early break on NO in the move command case, so the full explanation on why a shard can't be moved to a node will be provided.

I am thinking of a decision debug flag on RoutingAllocation, that when enabled, we will not shortcut on NO decisions, and create Decision.single instead of using the enum Decision.NO (this can be abstracted in a method like "RoutingAllocation#decision(Enum, String, params)", that returns the full decision only when debug is enabled.

@spinscale I don't think we need static vars for strings, we only have them once? its cruft?

@synhershko
Copy link
Contributor

@kimchy yes, see #4380

Thanks for this guys, it looks great

IMO it should always be on when called from the reroute API, otherwise it probably is easier to just return a YES/NO value, though it may be worthwhile to have some decisions logged (low hard disk space is one example that comes to mind where you want this logged).

@kimchy
Copy link
Member

kimchy commented Jan 29, 2014

@synhershko agreed, having this debug flag turned on for the explicit reroute API call makes sense.

@dakrone
Copy link
Member Author

dakrone commented Jan 29, 2014

Added changes that delegate to RoutingAllocation.decision() that only includes the reason if a debug flag is set to true. The debug flag defaults to false, being set only in the case where the reroute API is used.

Also made the decisions not short-circuit if the debug flag is true.

@kimchy
Copy link
Member

kimchy commented Jan 30, 2014

looks great!. I am missing one more thing, when a no or throttle decision is made (or even YES...), a lot of times is because a some sort of threshold matched or not. I would love to see those values in the message we associate when in debug mode.

I would add Object... args to the decision method, and call String#format on the text with args when in debug mode. Then, in all the places where we provide a debug message, add more info on relevant values that caused that decision.

@dakrone
Copy link
Member Author

dakrone commented Jan 30, 2014

That's a good idea, I will make that change.

@dakrone
Copy link
Member Author

dakrone commented Jan 30, 2014

Added the parameter passing and constraints for the Deciders where it makes sense. Also added a .toString() method for the DiscoverNodeFilters so they're human readable now.

@kimchy
Copy link
Member

kimchy commented Jan 30, 2014

LGTM, this is great!.

@dakrone dakrone merged commit 5448477 into elastic:master Jan 31, 2014
@s1monw
Copy link
Contributor

s1monw commented Jan 31, 2014

very cool stuff I think we should backport this to 0.90.12 as well as 1.0.0.RC2

@dakrone dakrone deleted the 4380-explain-decisions branch April 21, 2014 22:56
@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >enhancement v0.90.11 v1.0.0 v1.1.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants