Add explanations for all AllocationDeciders #4934
@dakrone great usability feature! Might it make more sense to use static fields for all the strings being used?
If we have this explanation, where will it be used? Sync with @uboness; he tried to tackle this a bit and should have a branch with many more explanations. I wonder how and where you would want to use that info? We found logging to be close to useless. One idea was to allow running the reroute API in "debug" mode, gathering the decisions made, and returning them as part of the reroute response. But note that with the way the balanced shard allocator works, it's going to be very verbose. I believe @uboness tried it, and it ended up being so verbose that it again became useless. Another option is to allow reroute to take a shard and a node, and return why this shard is not allocated on that node.
Ahh, I see when it will be used: in reroute, when we return the list of decisions explaining why we can't move one shard to another node, for example. It would be nice to have those explanations enabled only when we want them; otherwise this will create a lot of garbage during normal operation, which calls the deciders a lot. Also, canAllocate does an early break on NO. Where it's used in the move command in reroute, for example, I would not want to break early on NO, so that the full explanation of why a shard can't be moved to a node is provided. I am thinking of a decision debug flag on
@spinscale I don't think we need static vars for the strings; each one is only used once? It's cruft?
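A minimal sketch of the two evaluation modes discussed above, assuming a hypothetical `Decider` interface and `Decision` record (simplified illustrations, not Elasticsearch's actual classes): in normal mode the combined check stops at the first NO, while in debug mode every decider runs so the caller gets the full list of reasons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types for illustration; not the Elasticsearch API.
enum Type { YES, THROTTLE, NO }

record Decision(Type type, String explanation) {}

interface Decider {
    Decision canAllocate(String shard, String node);
}

final class CombinedDecider {
    private final List<Decider> deciders;

    CombinedDecider(List<Decider> deciders) {
        this.deciders = deciders;
    }

    /**
     * Runs the deciders in order. Without debug, stop at the first NO because the
     * overall answer cannot change. With debug, keep going so every reason is reported.
     */
    List<Decision> canAllocate(String shard, String node, boolean debug) {
        List<Decision> results = new ArrayList<>();
        for (Decider d : deciders) {
            Decision decision = d.canAllocate(shard, node);
            results.add(decision);
            if (decision.type() == Type.NO && !debug) {
                break; // early exit: a single NO is enough to reject the allocation
            }
        }
        return results;
    }

    /** Overall answer: NO beats THROTTLE, and THROTTLE beats YES (enum order). */
    static Type overall(List<Decision> decisions) {
        Type result = Type.YES;
        for (Decision d : decisions) {
            if (d.type().ordinal() > result.ordinal()) {
                result = d.type();
            }
        }
        return result;
    }
}
```

The early break is only safe because NO dominates every other answer; the debug path trades that shortcut for a complete explanation, which is what a move command in reroute would want.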
Thanks for this, guys, it looks great! IMO it should always be on when called from the reroute API; otherwise it is probably easier to just return a YES/NO value, though it may be worthwhile to have some decisions logged (low hard disk space is one example that comes to mind where you want this logged).
@synhershko agreed, having this debug flag turned on for the explicit reroute API call makes sense. |
Added changes that delegate to RoutingAllocation.decision(), which only includes the reason if a debug flag is set to true. The debug flag defaults to false and is set only when the reroute API is used. Also made the decisions not short-circuit when the debug flag is true.
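One way such a helper could avoid building strings when nobody is looking, sketched with hypothetical names loosely modeled on the `RoutingAllocation.decision()` method mentioned above (the real signature and fields may differ): when debug is off it returns a shared constant per decision type, and only in debug mode does it format the explanation with its parameters.

```java
import java.util.Locale;

// Hypothetical, simplified types for illustration; not the Elasticsearch API.
enum Type { YES, THROTTLE, NO }

record Decision(Type type, String label, String explanation) {
    // Shared instances without an explanation, reused on every non-debug call.
    static final Decision YES_NO_REASON = new Decision(Type.YES, null, null);
    static final Decision THROTTLE_NO_REASON = new Decision(Type.THROTTLE, null, null);
    static final Decision NO_NO_REASON = new Decision(Type.NO, null, null);
}

final class Allocation {
    private final boolean debugDecision; // hypothetical flag, false by default in normal operation

    Allocation(boolean debugDecision) {
        this.debugDecision = debugDecision;
    }

    boolean debugDecision() {
        return debugDecision;
    }

    /**
     * Returns a decision that carries a formatted explanation only in debug mode.
     * The varargs array is still allocated at the call site, but String.format
     * (the expensive part) is skipped entirely when debugging is off.
     */
    Decision decision(Type type, String label, String explanation, Object... params) {
        if (!debugDecision) {
            return switch (type) {
                case YES -> Decision.YES_NO_REASON;
                case THROTTLE -> Decision.THROTTLE_NO_REASON;
                case NO -> Decision.NO_NO_REASON;
            };
        }
        return new Decision(type, label, String.format(Locale.ROOT, explanation, params));
    }
}
```

Passing the template and parameters separately, rather than a pre-built string, is what lets the non-debug path skip the formatting work.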
Looks great! I am missing one more thing: when a NO or THROTTLE decision is made (or even YES...), a lot of the time it is because some sort of threshold matched or not. I would love to see those values in the message we associate with the decision in debug mode. I would add
That's a good idea, I will make that change. |
Added the parameter passing and constraints for the deciders where it makes sense. Also added a
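To illustrate the request for threshold values in debug messages, here is a sketch of a hypothetical disk-usage decider (the class name, the single "high watermark" threshold, and the message wording are all assumptions, not the code from this PR) that reports both the observed value and the threshold it was compared against.

```java
import java.util.Locale;

// Hypothetical, simplified types for illustration; not the Elasticsearch API.
enum Type { YES, NO }

record Decision(Type type, String explanation) {}

/** Illustrative disk-usage decider; the threshold and names are assumptions. */
final class DiskUsageDecider {
    private final double highWatermarkPercent;

    DiskUsageDecider(double highWatermarkPercent) {
        this.highWatermarkPercent = highWatermarkPercent;
    }

    Decision canAllocate(double usedPercent, boolean debug) {
        Type type = usedPercent > highWatermarkPercent ? Type.NO : Type.YES;
        if (!debug) {
            return new Decision(type, null); // cheap path: no message is built
        }
        String template = type == Type.NO
                ? "disk usage [%.1f%%] exceeds the high watermark [%.1f%%]"
                : "disk usage [%.1f%%] is at or below the high watermark [%.1f%%]";
        // Include both the observed value and the threshold it was compared against.
        return new Decision(type, String.format(Locale.ROOT, template, usedPercent, highWatermarkPercent));
    }
}
```

With the numbers in the message, a reader of the reroute response can tell how close a node was to the limit instead of only seeing a bare NO.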
LGTM, this is great!
Relates to elastic#4380
Relates to elastic#2483
Very cool stuff! I think we should backport this to
This adds explanations for all of the allocation deciders for their YES and NO answers. It should help when using the reroute API to explain why a shard can or cannot be moved to a different node. I would like to move to a full explain-like API for shard allocation, but I wanted to submit this as a separate PR, since it can easily be backported to all branches and be useful there without any breaking changes.
I tried to keep the explanations short but distinct.
Related to #4380 and #2483