Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explain flag support to the reroute API #5027

Closed
wants to merge 22 commits into from

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Feb 5, 2014

By specifying the explain flag, an explanation for the reason a
command can or cannot be executed is returned. No allocation commands
are actually performed.

Returns a response similar to:

{
  "state": { ... cluster state ...},
  "acknowledge": true,
  "explanations" : [ {
    "command" : "cancel",
      "parameters" : {
        "index" : "decide",
        "shard" : 0,
        "node" : "IvpoKRdtRiGrQ_WKtt4_4w",
        "allow_primary" : false
      },
      "decisions" : [ {
        "decider" : "cancel_allocation_command",
        "decision" : "YES",
        "explanation" : "..."
        } ]
     }, {
      "command" : "move",
      "parameters" : {
        "index" : "decide",
        "shard" : 0,
        "from_node" : "IvpoKRdtRiGrQ_WKtt4_4w",
        "to_node" : "IvpoKRdtRiGrQ_WKtt4_4w"
       },
       "decisions" : [ {
         "decider" : "same_shard",
         "decision" : "NO",
         "explanation" : "shard cannot be allocated on same node [IvpoKRdtRiGrQ_WKtt4_4w] it already exists on"
       },
       etc, etc, etc
       ]
  }]
}

An example of the full output is in #2483

Closes #5169

clusterStateToSend = newState;
if (request.dryRun) {
return currentState;
if (request.explain()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it is not possible to get the explain if I also want to reroute ie. actually exectue the operation? I mean we could still make ups of the dryRun flag and get explains if dryRun = false

By specifying the `explain` flag, an explanation for the reason a
command can or cannot be executed is returned. No allocation commands
are actually performed.

Returns a response similar to:

{
  "explanations" : [ {
    "command" : "cancel",
      "parameters" : {
        "index" : "decide",
        "shard" : 0,
        "node" : "IvpoKRdtRiGrQ_WKtt4_4w",
        "allow_primary" : false
      },
      "decisions" : [ {
        "decider" : "CancelAllocationCommand",
        "decision" : "YES",
        "explanation" : "..."
        } ]
     }, {
      "command" : "move",
      "parameters" : {
        "index" : "decide",
        "shard" : 0,
        "from_node" : "IvpoKRdtRiGrQ_WKtt4_4w",
        "to_node" : "IvpoKRdtRiGrQ_WKtt4_4w"
       },
       "decisions" : [ {
         "decider" : "same_shard",
         "decision" : "NO",
         "explanation" : "shard cannot be allocated on same node [IvpoKRdtRiGrQ_WKtt4_4w] it already exists on"
       },
       etc
       ]
  }]
}

Closes elastic#2483
@@ -110,6 +128,7 @@ public void readFrom(StreamInput in) throws IOException {
super.readFrom(in);
commands = AllocationCommands.readFrom(in);
dryRun = in.readBoolean();
explain = in.readBoolean();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need a version check here sicne we are changing the protocol?

@s1monw
Copy link
Contributor

s1monw commented Feb 19, 2014

I have a couple of problems with how this at the current stage. Implementation wise the code looks great but I think from my understanding a reroute call with explain = true implies dryRun=true no matter what you set there. Is that correct? If so I really thing we shouldn't do this. It will confuse the hell out of people if we treat this not as a flag but as a different action. I personally thing we should make explain the default and execute it all the time. so basically we should only have a flag for dryRun and always build an explain. if folks are not interested in the response they can just ignore it. If you want to simulate you set dryRun=true and you always see what is going on, makes sense?

@dakrone
Copy link
Member Author

dakrone commented Feb 19, 2014

Yea, I understand what you're saying. I will work on removing the RoutingAllocation.Result from the cluster state, which should allow me to return the explain results every time, even while executing the commands.

@dakrone
Copy link
Member Author

dakrone commented Feb 21, 2014

@s1monw okay, I have made a lot of changes, please take another look

  • removed AllocationExplanation from cluster state, as per remove RoutingAllocation.Result from cluster state #5169 (the RoutingTable is still part of the state, which is why the entire Result object is not removed)
  • made the RoutingExplanations part of the Result object now
  • you can now do .explain(true) and execute the reroute commands in addition to explaining (or still do a dry_run with explanations)
  • removed the separate .explain() methods on the allocation commands to removed the duplicated code
  • added version checks for the changed serialization in the Reroute stuff and the ClusterState

@@ -110,6 +129,11 @@ public void readFrom(StreamInput in) throws IOException {
super.readFrom(in);
commands = AllocationCommands.readFrom(in);
dryRun = in.readBoolean();
if (in.getVersion().onOrAfter(Version.V_2_0_0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be V_1_1_0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I wasn't sure where this was going to end up, thought it might be 2.0-only since it includes cluster state changes, but I can definitely change that.

@s1monw
Copy link
Contributor

s1monw commented Feb 26, 2014

I like this much better I left some comments I think we are close...

@dakrone
Copy link
Member Author

dakrone commented Feb 26, 2014

@s1monw updated again with the changes you mentioned.

@s1monw
Copy link
Contributor

s1monw commented Feb 27, 2014

LGTM

@dakrone
Copy link
Member Author

dakrone commented Feb 27, 2014

Merged to 1.x and master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove RoutingAllocation.Result from cluster state
3 participants