index.unassigned.node_left.delayed_timeout not working stably in 1.7 #12566

Closed
mkliu opened this issue Jul 30, 2015 · 7 comments

mkliu commented Jul 30, 2015

For example in the case below (data retrieved from _cluster/health)
Right after I kill the node:

{
  "cluster_name" : "essandbox-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 19,
  "number_of_data_nodes" : 13,
  "active_primary_shards" : 783,
  "active_shards" : 1480,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 86,
  "delayed_unassigned_shards" : 86,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

I set the timeout to 30s. The node is back around 10s later, but shards only gradually start recovering ~1.5 min later, and not at the speed I'm expecting. I also don't know why there are relocating_shards.
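
(For reference, this timeout is an index-level setting; a minimal sketch of how it would be set on all indices through the index settings API follows. The use of _all is illustrative, not my exact command:)

curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "index.unassigned.node_left.delayed_timeout": "30s"
}'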

Worse, sometimes after a while it looks as if recovery has stopped entirely, and I have to manually reroute the unassigned shards.

{
  "cluster_name" : "essandbox-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 20,
  "number_of_data_nodes" : 14,
  "active_primary_shards" : 783,
  "active_shards" : 1482,
  "relocating_shards" : 2,
  "initializing_shards" : 1,
  "unassigned_shards" : 83,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}

dakrone commented Jul 30, 2015

@mkliu when the node left the cluster, did you see the log message about the delay in the ES logs? It should look like:

delaying allocation for [N] unassigned shards, next check in [Ns]

(where N is a number). Can you paste what it says?


mkliu commented Jul 30, 2015

July 30th 2015, 11:13:47.609    essandbox-cluster   INFO    [xxx] delaying allocation for [86] unassigned shards, next check in [29.1s]
July 30th 2015, 11:14:19.007    essandbox-cluster   INFO    [xxx] delaying allocation for [0] unassigned shards, next check in [0s]
July 30th 2015, 11:14:19.580    essandbox-cluster   INFO    [xxx] delaying allocation for [0] unassigned shards, next check in [0s]


dakrone commented Jul 30, 2015

@mkliu according to the timestamps it looks like it did do the reroute at the correct time (13:47, then ~30 seconds later at 14:19).

The log message is confusing and will be fixed by #12532


mkliu commented Jul 30, 2015

@dakrone hmm, it's actually not doing the reroute. As described in the first post, I had to manually kick-start it in the end. The

  "delaying allocation for [0] unassigned shards"

message goes on and on and on.
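
(For context, the manual kick-start was a cluster reroute. A minimal sketch of allocating one stuck replica with the 1.x reroute API, where the index name, shard number, and node name are placeholders:)

curl -XPOST 'localhost:9200/_cluster/reroute' -d '
{
  "commands": [
    {
      "allocate": {
        "index": "my_index",
        "shard": 0,
        "node": "some-data-node"
      }
    }
  ]
}'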


dakrone commented Aug 5, 2015

@mkliu can you increase the logging level for your cluster to DEBUG and make the master log available so I can take a look?
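
(One way to do that without a restart is the dynamic logger setting in the cluster settings API; the logger name below is only a guess at the relevant package, and editing logging.yml on the master works as well:)

curl -XPUT 'localhost:9200/_cluster/settings' -d '
{
  "transient": {
    "logger.cluster.routing.allocation": "DEBUG"
  }
}'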


dakrone commented Oct 30, 2015

I think this may have been fixed by #12678. @mkliu can you confirm?

@clintongormley

No further feedback. Closing

@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
@clintongormley added the :Distributed/Allocation label and removed the :Distributed/Distributed label on Feb 14, 2018