
index.auto_expand_replicas and shard allocation settings don't play well together #2869

Closed
synhershko opened this issue Apr 7, 2013 · 14 comments · Fixed by #69334
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement high hanging fruit Team:Distributed Meta label for distributed team

Comments

@synhershko
Contributor

With an index configured with auto_expand_replicas="0-all", the cluster will try to allocate one primary and ({number of nodes} - 1) replicas for each shard, i.e. a copy of each shard on every node.

However, with "index.routing.allocation.include.zone"="zone1" the cluster is blocked from allocating a shard (either primary or replica) to any node that does not have its zone attribute set to "zone1", i.e. "node.zone: zone1" in the elasticsearch.yml config file. So if a cluster has 3 nodes in "zone1" and 3 nodes in "zone2", it will allocate 3 shard copies (one primary + 2 replicas) to the "zone1" nodes and mark an additional 3 replicas as unassigned. I observed this using the elasticsearch-head utility.

So the allocation behaviour is as desired, i.e. replicas auto-expand to all nodes with the specified zone attribute, but the unassigned replicas leave the cluster in a yellow state.

See https://groups.google.com/forum/#!msg/elasticsearch/95hC-wGu7GE/BPPSWsfj8UkJ
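
For reference, a minimal sketch of the settings involved (the index name test is a placeholder, requests are shown in console syntax, and newer versions use node.attr.zone rather than node.zone for custom node attributes):

    # elasticsearch.yml on the "zone1" nodes
    node.zone: zone1

    # dynamic index settings combining auto-expansion with zone filtering
    PUT /test/_settings
    {
        "index.auto_expand_replicas": "0-all",
        "index.routing.allocation.include.zone": "zone1"
    }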

@synhershko
Contributor Author

Using latest master, I should add.

@clintongormley

I'm wondering if we should deprecate auto-expand... It just seems like the wrong solution

@synhershko
Contributor Author

IMO no, auto-expand is a very useful feature and a way for clusters to automatically grow to accommodate peaks with read-heavy operations. It's just somewhat broken when shard-allocation rules aren't trivial.

I admit, though, that auto-expand does require multicast to be enabled (or predefining IPs for the machines dynamically provisioned during peaks), so at this point it does seem more like an exotic feature.

@nik9000
Member

nik9000 commented Aug 8, 2014

We used it in CirrusSearch so people with only one or two nodes don't end up with a yellow cluster. We document the auto-expand behavior and its implications for redundancy. I still don't particularly trust it, BUT it is a nice way to make installation smooth even in small environments. We use most of the CirrusSearch defaults both in development and on the production cluster, and it helps with that.

@markharwood
Contributor

Attached an "adoptme" tag as this needs some further investigation to establish whether this is an issue.
The auto-expand feature is used internally to keep a copy of the .scripts index (stored scripts) on each node, so this does need to work.

@clintongormley

Having reread this issue, it seems that if we were able to limit the max number of auto-expand replicas to the number of nodes that would allow allocation (eg awareness rules) then we'd avoid the yellow status.

Not sure how feasible this is.

@colings86
Contributor

@dakrone Do you know whether @clintongormley's suggestion above is feasible?

@dakrone
Member

dakrone commented Nov 16, 2015

@colings86 @clintongormley I looked at where the setting is updated: it's currently updated in MetaDataUpdateSettingsService, which submits a new cluster state update to change the number of replicas if it detects that the setting must be changed.

I think it would be possible to try and use the AllocationDeciders to adjust the number of replicas to the number of nodes where it can be allocated, but I don't think it's trivial.

Maybe it should be a separate setting? Instead of 0-all it can be 0-available or something?
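
If such a separate setting value existed, usage might look like this (purely hypothetical; 0-available is not an actual supported value, and the index name is a placeholder):

    PUT /my-index/_settings
    {
        "index.auto_expand_replicas": "0-available"
    }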

@vb3

vb3 commented Sep 20, 2016

I noticed that in ES 1.7 the .scripts index has the 0-all setting automatically. My cluster is yellow because it is unable to expand replicas to nodes that are full in data capacity, as bounded by

        "cluster.routing.allocation.disk.watermark.low": "80%",
        "cluster.routing.allocation.disk.watermark.high": "85%"

Is there a solution for this?
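
One possible workaround, assuming the .scripts settings can still be updated on that version, is to switch the index from auto-expansion to a fixed replica count (the replica count of 1 here is only an example, not an official recommendation):

    PUT /.scripts/_settings
    {
        "index.auto_expand_replicas": false,
        "index.number_of_replicas": 1
    }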

@AlexP-Elastic

AlexP-Elastic commented Oct 17, 2016

We have run into this issue (or a heavily related one) twice in the last month on Cloud, both times on 2.x clusters (#1656, details here and here in the comments).

In particular, it appears that an unallocated "auto expand" index is preventing other indexes from migrating given a cluster.routing.allocation.exclude._name directive. Is that even remotely possible? (It's happening on production clusters, so unfortunately there's a limit to how much debugging we can do before it is necessary to reset the cluster state; here are my notes on what I saw the most recent time.)

We are about to deploy a workaround that will temporarily disable "auto expand" for .scripts and .security while updating a cluster (and perhaps user indexes in the future), but .security settings cannot be changed in 2.x, so this workaround is incomplete.
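
A sketch of that kind of workaround for the .scripts index (this is not the exact Cloud tooling, just the settings calls it would roughly boil down to):

    # before the maintenance operation, stop auto-expansion
    PUT /.scripts/_settings
    {
        "index.auto_expand_replicas": false
    }

    # afterwards, restore the original behaviour
    PUT /.scripts/_settings
    {
        "index.auto_expand_replicas": "0-all"
    }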

cc @alexbrasetvik

@s1monw
Contributor

s1monw commented Jan 19, 2018

We spoke about this in FixIt Friday to keep this issue going, and we came up with a possible different way of looking at the problem. For an index that has n-all set, we could just consider it fully allocated as long as at least max(1, n) replicas are allocated, since this is all the allocator can do at this point. We haven't fully fleshed out the consequences and how hard it would be to do that, but it might be a simpler solution, independent of the allocation deciders. /cc @ywelsch

@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
@DaveCTurner DaveCTurner added the :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Mar 15, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner DaveCTurner removed the :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. label Mar 15, 2018
@inqueue
Member

inqueue commented May 2, 2018

This issue can also manifest when decommissioning a data node where, for example, the node is holding a .security-6 copy and cluster-level allocation filtering has been applied by _ip. Applying shard allocation filtering does not remove shards that have auto-expand configured. While this is not a detrimental problem, it can cause confusion, and it should be verified that any remaining shards on a node are there because auto-expand replicas are configured. It also requires the user to know which indices have auto-expand configured.
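
To find out which indices have it configured, setting-name filtering on the get settings API should be enough (shown here across all indices):

    GET /_all/_settings/index.auto_expand_replicas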

Perhaps shard allocation settings and shard allocation behavior with auto-expand replicas should be documented until a technical resolution is reached, given how long this issue has been open.

@ywelsch
Contributor

ywelsch commented May 11, 2018

Perhaps shard allocation settings and shard allocation behavior with auto-expand replicas should be documented until a technical resolution is reached, given how long this issue has been open.

@inqueue I've opened #30531 to document this.

@ywelsch ywelsch removed the v5.4.4 label Nov 15, 2019
@rjernst rjernst added the Team:Distributed Meta label for distributed team label May 4, 2020
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Feb 22, 2021
Today if an index is set to `auto_expand_replicas: N-all` then we will
try and create a shard copy on every node that matches the applicable
allocation filters. This conflicts with shard allocation awareness and
the same-host allocation decider if there is an uneven distribution of
nodes across zones or hosts, since these deciders prevent shard copies
from being allocated unevenly and may therefore leave some unassigned
shards.

The point of these two deciders is to improve resilience given a limited
number of shard copies but there is no need for this behaviour when the
number of shard copies is not limited, so this commit suppresses them in
that case.

Closes elastic#54151
Closes elastic#2869
DaveCTurner added a commit that referenced this issue Feb 22, 2021
DaveCTurner added a commit that referenced this issue Feb 22, 2021
easyice pushed a commit to easyice/elasticsearch that referenced this issue Mar 25, 2021