Add explanations for all AllocationDeciders #4934

dakrone · 2014-01-28T23:44:28Z

This adds explanations for all of the allocation deciders for their yes and no answers. It should help when using the reroute API to explain why a shard can or cannot be moved to a different node.

I would like to move to a full explain-like API for shard allocation, but I wanted to submit this as a separate PR since it can easily be backported to all branches to be useful without any breaking changes.

I tried to keep the explanations short but distinct.

Related to #4380 and #2483

spinscale · 2014-01-29T08:40:58Z

@dakrone great usability feature

Might make more sense to use static fields for all the strings being used?

kimchy · 2014-01-29T08:52:41Z

If we have this explanation, where will it be used?

Sync with @uboness; he tried to tackle this a bit and should have a branch with many more explanations. I wonder how and where you would want to use that info? We found logging to be close to useless.

One idea was to allow running the reroute API in "debug" mode, gather the decisions made, and return them as part of the reroute response. But note that with the way the balanced shard allocator works, it's going to be very verbose. I believe @uboness tried it, and it ended up being so verbose that it again became useless.

Another option is to let reroute accept a shard and a node, and return why this shard is not allocated on that node.

kimchy · 2014-01-29T09:04:00Z

ahh, I see when it will be used, in the reroute when we return the list of decisions of why we can't move one shard to another for example.

It would be nice to have those explanations enabled only when we want them. Otherwise this will create a lot of garbage during normal operation, which ends up calling the deciders a lot.

Also, canAllocate, for example, does an early break on NO. Where it's used in the move command in reroute, I would not want an early break on NO, so that the full explanation of why a shard can't be moved to a node is provided.

I am thinking of a decision debug flag on RoutingAllocation that, when enabled, means we do not shortcut on NO decisions and create Decision.single instead of using the enum Decision.NO (this can be abstracted in a method like "RoutingAllocation#decision(Enum, String, params)" that returns the full decision only when debug is enabled).
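A minimal sketch of the debug-flag idea described above. All class and method names here (`DecisionType`, `SketchDecision`, `SketchAllocation`) are hypothetical stand-ins and not the actual Elasticsearch classes. The assumption is that normal operation reuses shared, explanation-free decisions, while debug mode allocates a new decision carrying the explanation.

```java
// Hypothetical, simplified stand-ins; not the real Elasticsearch types.
enum DecisionType { YES, NO, THROTTLE }

final class SketchDecision {
    // Shared instances without explanations, reused on the hot path
    // so that normal operation allocates nothing per decision.
    static final SketchDecision YES = new SketchDecision(DecisionType.YES, null);
    static final SketchDecision NO = new SketchDecision(DecisionType.NO, null);
    static final SketchDecision THROTTLE = new SketchDecision(DecisionType.THROTTLE, null);

    final DecisionType type;
    final String explanation; // null when no explanation was requested

    SketchDecision(DecisionType type, String explanation) {
        this.type = type;
        this.explanation = explanation;
    }

    // Maps a type to its shared, explanation-free instance.
    static SketchDecision shared(DecisionType type) {
        switch (type) {
            case YES: return YES;
            case NO: return NO;
            default: return THROTTLE;
        }
    }
}

final class SketchAllocation {
    private final boolean debugDecision;

    SketchAllocation(boolean debugDecision) {
        this.debugDecision = debugDecision;
    }

    boolean debugDecision() {
        return debugDecision;
    }

    // Returns the full decision only when debug is enabled;
    // otherwise returns the shared constant and drops the explanation.
    SketchDecision decision(DecisionType type, String explanation) {
        if (!debugDecision) {
            return SketchDecision.shared(type);
        }
        return new SketchDecision(type, explanation);
    }
}
```

With the flag off, `decision(DecisionType.NO, "...")` returns the same `SketchDecision.NO` instance every time; with it on, the caller gets a fresh object whose `explanation` field is populated.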

@spinscale I don't think we need static vars for the strings; we only use each of them once, so it's cruft?

synhershko · 2014-01-29T09:15:45Z

@kimchy yes, see #4380

Thanks for this guys, it looks great

IMO it should always be on when called from the reroute API, otherwise it probably is easier to just return a YES/NO value, though it may be worthwhile to have some decisions logged (low hard disk space is one example that comes to mind where you want this logged).

kimchy · 2014-01-29T09:17:04Z

@synhershko agreed, having this debug flag turned on for the explicit reroute API call makes sense.

dakrone · 2014-01-29T18:44:59Z

Added changes that delegate to RoutingAllocation.decision(), which only includes the reason if a debug flag is set to true. The debug flag defaults to false and is set only when the reroute API is used.

Also made the decisions not short-circuit if the debug flag is true.
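To illustrate the non-short-circuiting behavior, here is a rough sketch under simplifying assumptions. The types `Answer`, `SketchDecider`, and `SketchDeciders` are hypothetical and only model the control flow: in normal mode the loop stops at the first NO, and in debug mode every decider is consulted so all reasons are collected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types for illustration only.
final class Answer {
    final boolean yes;
    final String reason;

    Answer(boolean yes, String reason) {
        this.yes = yes;
        this.reason = reason;
    }
}

interface SketchDecider {
    Answer canAllocate(String shard, String node);
}

final class SketchDeciders {
    // Collects the answers of the deciders in order.
    // Without debug, evaluation stops after the first NO (early break).
    // With debug, every decider runs so the caller sees every reason.
    static List<Answer> canAllocate(List<SketchDecider> deciders,
                                    String shard, String node, boolean debug) {
        List<Answer> answers = new ArrayList<>();
        for (SketchDecider decider : deciders) {
            Answer answer = decider.canAllocate(shard, node);
            answers.add(answer);
            if (!answer.yes && !debug) {
                break; // short-circuit only in normal operation
            }
        }
        return answers;
    }
}
```

The trade-off is the one raised earlier in the thread: the debug path does more work and allocates more, so it is only worth enabling for an explicit request.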

kimchy · 2014-01-30T19:30:25Z

Looks great! I am missing one more thing: when a NO or THROTTLE decision is made (or even YES...), a lot of the time it is because some sort of threshold matched or not. I would love to see those values in the message we associate with the decision when in debug mode.

I would add Object... args to the decision method, and call String#format on the text with args when in debug mode. Then, in all the places where we provide a debug message, add more info on relevant values that caused that decision.
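A small sketch of the `Object... args` suggestion, assuming hypothetical names (`Outcome`, `ExplainedOutcome`, `FormattingAllocation`) rather than the real classes. The point it shows is that `String.format` runs only in debug mode, so the formatting cost is skipped during normal operation. The disk percentages in the usage note are made-up illustration values.

```java
import java.util.Locale;

// Hypothetical, simplified types for illustration only.
enum Outcome { YES, NO, THROTTLE }

final class ExplainedOutcome {
    final Outcome outcome;
    final String explanation; // null outside debug mode

    ExplainedOutcome(Outcome outcome, String explanation) {
        this.outcome = outcome;
        this.explanation = explanation;
    }
}

final class FormattingAllocation {
    private final boolean debug;

    FormattingAllocation(boolean debug) {
        this.debug = debug;
    }

    // Formats the template with the given values only when debug is on.
    ExplainedOutcome decision(Outcome outcome, String template, Object... args) {
        if (!debug) {
            return new ExplainedOutcome(outcome, null); // no formatting work
        }
        return new ExplainedOutcome(outcome, String.format(Locale.ROOT, template, args));
    }
}
```

For example, `new FormattingAllocation(true).decision(Outcome.NO, "free disk %d%% is below the required %d%%", 5, 15)` carries the explanation `free disk 5% is below the required 15%`. One caveat of this shape is that the varargs array and any boxing are still created at the call site even when debug is off; only the formatting itself is avoided.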

dakrone · 2014-01-30T19:31:46Z

That's a good idea, I will make that change.

dakrone · 2014-01-30T22:11:30Z

Added the parameter passing and constraints for the Deciders where it makes sense. Also added a .toString() method for the DiscoveryNodeFilters so they're human readable now.

kimchy · 2014-01-30T22:16:09Z

LGTM, this is great!

Relates to elastic#4380 Relates to elastic#2483

s1monw · 2014-01-31T20:24:04Z

Very cool stuff. I think we should backport this to 0.90.12 as well as 1.0.0.RC2.

dakrone merged commit 5448477 into elastic:master Jan 31, 2014

dakrone deleted the 4380-explain-decisions branch April 21, 2014 22:56

clintongormley added the :Allocation label Jun 7, 2015

lcawl added the :Distributed/Distributed label and removed the :Allocation label Feb 13, 2018
