
index.auto_expand_replicas and shard allocation settings don't play well together #2869

Closed
synhershko opened this issue Apr 7, 2013 · 14 comments · Fixed by #69334
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement high hanging fruit Team:Distributed Meta label for distributed team

Comments

@synhershko
Contributor

With an index configured with auto_expand_replicas="0-all", the cluster will try to allocate one primary and ({number of nodes} - 1) replicas for each shard, i.e. a copy of each shard on every node.

However, with "index.routing.allocation.include.zone"="zone1" the cluster is blocked from allocating a shard (either primary or replica) to any node that does not have its zone attribute set to "zone1", i.e. "node.zone: zone1" in the elasticsearch.yml config file. So if a cluster has 3 nodes in "zone1" and 3 nodes in "zone2", it will allocate 3 shard copies (one primary + 2 replicas) to the "zone1" nodes and mark an additional 3 replicas as unassigned. I observed this using the elasticsearch-head utility.

So the allocation behaviour is as desired, i.e. replicas auto-expand to all nodes with the specified zone attribute, but the unassigned replicas leave the cluster in a yellow state.

See https://groups.google.com/forum/#!msg/elasticsearch/95hC-wGu7GE/BPPSWsfj8UkJ
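
For reference, a minimal sketch of the settings involved (the index name test is a placeholder, requests are shown in console syntax, and newer versions use node.attr.zone rather than node.zone for custom node attributes):

    # elasticsearch.yml on the "zone1" nodes
    node.zone: zone1

    # dynamic index settings combining auto-expansion with zone filtering
    PUT /test/_settings
    {
        "index.auto_expand_replicas": "0-all",
        "index.routing.allocation.include.zone": "zone1"
    }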

@synhershko
Contributor Author

Using latest master, I should add.

@clintongormley

I'm wondering if we should deprecate auto-expand... It just seems like the wrong solution

@synhershko
Contributor Author

IMO no, auto-expand is a very useful feature and a way for clusters to automatically grow to accommodate peaks with read-heavy operations. It's just somewhat broken when shard-allocation rules aren't trivial.

I admit, though, that auto-expand does require multicast to be enabled (or predefining IPs for the machines dynamically provisioned during peaks), so at this point it does seem more like an exotic feature.

@nik9000
Member

nik9000 commented Aug 8, 2014

We used it in CirrusSearch so people with only one or two nodes don't end up with a yellow cluster. We document the auto-expand behavior and its implications for redundancy. I still don't particularly trust it, BUT it is a nice way to make installation smooth even in small environments. We use most of the CirrusSearch defaults both in development and on the production cluster, and it helps with that.

@markharwood
Contributor

Attached an "adoptme" tag as this needs some further investigation to establish whether this is an issue.
The auto-expand feature is used internally to keep a copy of the .scripts index (stored scripts) on each node, so this does need to work.

@clintongormley

Having reread this issue, it seems that if we were able to limit the max number of auto-expand replicas to the number of nodes that would allow allocation (eg awareness rules) then we'd avoid the yellow status.

Not sure how feasible this is.

@colings86
Contributor

@dakrone Do you know whether @clintongormley's suggestion above is feasible?

@dakrone
Member

dakrone commented Nov 16, 2015

@colings86 @clintongormley I looked at where the setting is updated: it's currently updated in MetaDataUpdateSettingsService, which submits a new cluster state update to change the number of replicas if it detects that the setting must be changed.

I think it would be possible to try and use the AllocationDeciders to adjust the number of replicas to the number of nodes where it can be allocated, but I don't think it's trivial.

Maybe it should be a separate setting? Instead of 0-all it can be 0-available or something?
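
If such a separate setting value existed, usage might look like this (purely hypothetical; 0-available is not an actual supported value, and the index name is a placeholder):

    PUT /my-index/_settings
    {
        "index.auto_expand_replicas": "0-available"
    }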

@vb3

vb3 commented Sep 20, 2016

I noticed that in ES 1.7 the .scripts index has the 0-all setting automatically. My cluster is yellow because it is unable to expand replicas to nodes that are full in data capacity, as bounded by

        "cluster.routing.allocation.disk.watermark.low": "80%",
        "cluster.routing.allocation.disk.watermark.high": "85%"

Is there a solution for this?
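
One possible workaround, assuming the .scripts settings can still be updated on that version, is to switch the index from auto-expansion to a fixed replica count (the replica count of 1 here is only an example, not an official recommendation):

    PUT /.scripts/_settings
    {
        "index.auto_expand_replicas": false,
        "index.number_of_replicas": 1
    }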

@AlexP-Elastic

AlexP-Elastic commented Oct 17, 2016

We have run into this issue (or a heavily related one) twice in the last month on Cloud, both times on 2.x clusters (#1656, details here and here in the comments).

In particular, it appears that an unallocated "auto expand" index is preventing other indexes from migrating given a cluster.routing.allocation.exclude._name directive. Is that even remotely possible? (It's happening on production clusters, so unfortunately there's a limit to how much debugging we can do before it is necessary to reset the cluster state; here are my notes on what I saw the most recent time.)

We are about to deploy a workaround that will temporarily disable "auto expand" for .scripts and .security while updating a cluster (and perhaps user indexes in the future), but .security settings cannot be changed in 2.x, so this workaround is incomplete.
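
A sketch of that kind of workaround for the .scripts index (this is not the exact Cloud tooling, just the settings calls it would roughly boil down to):

    # before the maintenance operation, stop auto-expansion
    PUT /.scripts/_settings
    {
        "index.auto_expand_replicas": false
    }

    # afterwards, restore the original behaviour
    PUT /.scripts/_settings
    {
        "index.auto_expand_replicas": "0-all"
    }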

cc @alexbrasetvik

@s1monw
Contributor

s1monw commented Jan 19, 2018

We spoke about this in FixIt Friday to keep this issue going, and we came up with a possible different way of looking at the problem. For an index that has n-all set, we could just consider it fully allocated as long as at least max(1, n) replicas are allocated, since this is all the allocator can do at this point. We haven't fully fleshed out the consequences and how hard it would be to do that, but it might be a simpler solution, independent of the allocation deciders. /cc @ywelsch

@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
@DaveCTurner DaveCTurner added the :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Mar 15, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner DaveCTurner removed the :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. label Mar 15, 2018
@inqueue
Member

inqueue commented May 2, 2018

This issue can also manifest when decommissioning a data node where, for example, the node is holding a .security-6 copy and cluster-level allocation filtering has been applied by _ip. Applying shard allocation filtering does not remove shards that have auto-expand configured. While this is not a detrimental problem, it can cause confusion, and it should be verified that any remaining shards on a node are there because auto-expand replicas are configured. It also requires the user to know which indices have auto-expand configured.
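
To find out which indices have it configured, setting-name filtering on the get settings API should be enough (shown here across all indices):

    GET /_all/_settings/index.auto_expand_replicas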

Perhaps shard allocation settings and shard allocation behavior with auto-expand replicas should be documented until a technical resolution is reached, given how long this issue has been open.

@ywelsch
Contributor

ywelsch commented May 11, 2018

Perhaps shard allocation settings and shard allocation behavior with auto-expand replicas should be documented until a technical resolution is reached, given how long this issue has been open.

@inqueue I've opened #30531 to document this.

@ywelsch ywelsch removed the v5.4.4 label Nov 15, 2019
@rjernst rjernst added the Team:Distributed Meta label for distributed team label May 4, 2020
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Feb 22, 2021
Today if an index is set to `auto_expand_replicas: N-all` then we will
try and create a shard copy on every node that matches the applicable
allocation filters. This conflicts with shard allocation awareness and
the same-host allocation decider if there is an uneven distribution of
nodes across zones or hosts, since these deciders prevent shard copies
from being allocated unevenly and may therefore leave some unassigned
shards.

The point of these two deciders is to improve resilience given a limited
number of shard copies but there is no need for this behaviour when the
number of shard copies is not limited, so this commit suppresses them in
that case.

Closes elastic#54151
Closes elastic#2869
DaveCTurner added a commit that referenced this issue Feb 22, 2021
DaveCTurner added a commit that referenced this issue Feb 22, 2021
easyice pushed a commit to easyice/elasticsearch that referenced this issue Mar 25, 2021