Elasticsearch AWS EKS with multiple availability zones #927
Comments
Hi, this is something we have been experiencing issues with, not only with EKS but also with other clusters that have nodes in different availability zones. Having the pod deployed in one zone and the volume in a different one makes the solution undeployable. I'm sure that k8s will implement a solution that addresses this kind of issue. Some threads about the issue:
1.12 introduced "VolumeBindingMode: WaitForFirstConsumer", which would have improved the situation, likely avoiding the pod startup failures. EKS is at 1.11. However, Elasticsearch behavior when shard allocation awareness is set justifies more control of placement across AZs than Kubernetes will provide with a single StatefulSet. If there are 3 coordinating nodes and 4 AZs, and pods are distributed across AZs using anti-affinity, then no load will go to one AZ, at least once there is a copy of each shard in each AZ (guaranteed if there are 4 replicas). There is a tradeoff between abstraction and control. Availability zone and region boundaries have implicit and explicit costs in crossing them (AWS charges for cross-AZ traffic). For Elasticsearch, our conclusion is that we need explicit control of the number of pods and PVs in each AZ. I will likely hack in templates data2* and data3* as I try to learn enough about Helm to do something cleaner.
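For context, the shard allocation awareness behavior referred to here is driven by Elasticsearch settings along these lines (a minimal sketch; the attribute name "zone" and the AZ values are illustrative, not taken from this chart):

```yaml
# elasticsearch.yml -- sketch of shard allocation awareness.
# Each node advertises which AZ it lives in via a custom attribute.
node.attr.zone: us-east-1a

# Spread primary/replica copies of each shard across distinct "zone" values.
cluster.routing.allocation.awareness.attributes: zone

# Forced awareness: do not over-allocate copies into the surviving zones
# when one zone is down (zone list is illustrative).
cluster.routing.allocation.awareness.force.zone.values: us-east-1a,us-east-1b,us-east-1c
```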
I see. In principle I do not see any alternative other than creating different StatefulSets per AZ. You could have a loop in Helm that creates different StatefulSets (using the
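A rough sketch of what such a loop could look like, assuming a hypothetical `availabilityZones` list in values.yaml and a heavily simplified StatefulSet body (the real chart's template is much larger):

```yaml
# templates/statefulset.yaml -- sketch: one StatefulSet per AZ.
# Assumes a hypothetical values.yaml key such as:
#   availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]
{{- range $zone := .Values.availabilityZones }}
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ $.Release.Name }}-data-{{ $zone }}
spec:
  serviceName: {{ $.Release.Name }}-data-{{ $zone }}
  replicas: 1
  selector:
    matchLabels:
      app: {{ $.Release.Name }}-data
      zone: {{ $zone }}
  template:
    metadata:
      labels:
        app: {{ $.Release.Name }}-data
        zone: {{ $zone }}
    spec:
      nodeSelector:
        # Pin the pods (and therefore their EBS volumes) to this AZ.
        failure-domain.beta.kubernetes.io/zone: {{ $zone }}
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:6.5.4
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        # Hypothetical per-AZ storage class name, e.g. "gp2-us-east-1a".
        storageClassName: gp2-{{ $zone }}
        resources:
          requests:
            storage: 30Gi
{{- end }}
```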
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
When I ran the elasticsearch chart on AWS after configuring multiple availability zones, some StatefulSet pods would not start. This is due to AWS volumes: once created, they can only be used in one AZ. A newer k8s version will delay binding of the volume until the pod is created, but that is not available yet.
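The delayed binding referred to is the `volumeBindingMode: WaitForFirstConsumer` option on a StorageClass, available from Kubernetes 1.12 for EBS. A minimal sketch (the class name is illustrative):

```yaml
# With WaitForFirstConsumer, the EBS volume is only provisioned once a pod
# using the PVC has been scheduled, so it is created in that pod's AZ.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-delayed
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
```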
Even if it was available, Elasticsearch, with shard allocation awareness enabled, is strict about placement and also about routing of traffic from coordinating nodes to data nodes. An approximately even random distribution of pods across AZs is less desirable than a guaranteed even distribution, so even the features of 1.13 will not create a complete solution.
To support multiple AZs, it is best to have a storage class per AZ, as well as a StatefulSet per AZ that uses that storage class (along with an even distribution of worker nodes across AZs). We can then configure the number of AWS volumes per AZ, and be sure that we launch, in each AZ, enough StatefulSet pods to distribute replicas completely evenly across AZs.
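A sketch of the per-AZ storage classes this would require (names and zones are illustrative); each AZ's StatefulSet would reference the matching class in its volumeClaimTemplates, pinning its EBS volumes to that zone:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-us-east-1a
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1a
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-us-east-1b
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1b
```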
My intent is to learn enough about Helm to implement this in some other way than brute-force manual creation of 3 completely different StatefulSets. I'd like to be able to instantiate each StatefulSet from a common template. Please let me know if I'm missing something.