
[Feature Request] Configuration to customize discovery/zen/fd/master_ping #36822

Closed
kimxogus opened this issue Dec 19, 2018 · 14 comments
Labels
:Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement team-discuss

Comments

@kimxogus

Describe the feature:

  • Configuration to customize discovery/zen/fd/master_ping: a config option to make Elasticsearch skip pinging and waiting for the old master before a new master takes over.

In a Kubernetes environment, the IP of each member node in the cluster belongs to a pod, which is a Docker container. When a pod (node) is terminated, pings to the old master address time out, because the newly created pod (node) has a different IP address. In this situation the cluster outage lasts for `discovery.zen.join_timeout` (which defaults to 20 × the ping timeout, per the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election)), i.e. more than a minute. Reducing `ping_timeout` below 1 second is too dangerous (it may cause problems with master election), and making Elasticsearch wait several seconds after SIGTERM so the pod IP stays reachable for pinging doesn't seem like a proper solution either. As in [this discussion](https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590), I believe that adding a config option to make Elasticsearch skip pinging and waiting for the old master before a new master takes over would be a good solution.
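
For reference, here is a minimal `elasticsearch.yml` sketch (assuming Elasticsearch 6.x zen discovery) of the fault-detection settings involved; the values mirror the ones used in this report and are not recommendations:

```yaml
# Zen discovery fault-detection settings relevant to this report (Elasticsearch 6.x).
discovery.zen.ping_timeout: 3s      # default 3s; timeout for master-election pings
discovery.zen.fd.ping_timeout: 3s   # default 30s; per-ping timeout for fault detection
discovery.zen.fd.ping_retries: 3    # default 3; failed pings before the master is considered lost
# discovery.zen.join_timeout is not set here; it defaults to 20 * ping_timeout,
# which is the delay this feature request asks to avoid.
```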

Elasticsearch version (bin/elasticsearch --version): 6.2.3

Plugins installed: [ingest-geoip, ingest-user-agent, repository-s3]

JVM version (java -version):

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

OS version (uname -a if on a Unix-like system): Linux {HOSTNAME} 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce, the more likely it is that somebody will take the time to look at it.

  1. Deploy an Elasticsearch cluster in Kubernetes (a helm chart in my case)
  2. Terminate the current master pod (node)
  3. A new master is elected within 3~5 seconds, but no member node in the cluster
    responds to HTTP requests for about 1 minute (with discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s).

Provide logs (if relevant):

[2018-12-19T09:12:33,326][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] detected_master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, added {{es-monitoring-elasticsearch-client-57654b8f98-p47cm}{HJlePFqgQxq_wmFDEDNQEw}{Thx48_UDSL2CwzLrg0NL2w}{100.96.161.172}{100.96.161.172:9300},{es-monitoring-elasticsearch-master-2}{v3FjSTfcQ4OHAzXCzDcKFQ}{e1X2hVV8SIOkDk1wvE3LKw}{100.96.162.240}{100.96.162.240:9300},{es-monitoring-elasticsearch-data-2}{V2meIqpNTQOH8zY4PCtQ7g}{Pr9uoG03Qc6Xx2h4x-o62A}{100.96.162.225}{100.96.162.225:9300},{es-monitoring-elasticsearch-data-1}{mqfXo0yqTaCcEc956tVmpA}{NQehgvsvQq2Kh1K6tKZaxA}{100.96.161.175}{100.96.161.175:9300},{es-monitoring-elasticsearch-data-0}{rn-v-yB8RbeoHXovkC4UYQ}{vF1s-vC7TheNqloIyEJg4A}{100.96.165.88}{100.96.165.88:9300},{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300},{es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300},}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [367]])
[2018-12-19T09:12:43,331][INFO ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] master_left [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], reason [failed to ping, tried [3] times, each with  maximum [3s] timeout]
[2018-12-19T09:12:43,332][WARN ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] master left (reason = failed to ping, tried [3] times, each with  maximum [3s] timeout), current nodes: nodes:
   {es-monitoring-elasticsearch-client-57654b8f98-p47cm}{HJlePFqgQxq_wmFDEDNQEw}{Thx48_UDSL2CwzLrg0NL2w}{100.96.161.172}{100.96.161.172:9300}
   {es-monitoring-elasticsearch-master-2}{v3FjSTfcQ4OHAzXCzDcKFQ}{e1X2hVV8SIOkDk1wvE3LKw}{100.96.162.240}{100.96.162.240:9300}
   {es-monitoring-elasticsearch-data-2}{V2meIqpNTQOH8zY4PCtQ7g}{Pr9uoG03Qc6Xx2h4x-o62A}{100.96.162.225}{100.96.162.225:9300}
   {es-monitoring-elasticsearch-master-0}{K6kMktL9QJC2sc7K-35McA}{srwO3u3SS9GYAWYeLyUn-g}{100.96.165.141}{100.96.165.141:9300}, local
   {es-monitoring-elasticsearch-data-1}{mqfXo0yqTaCcEc956tVmpA}{NQehgvsvQq2Kh1K6tKZaxA}{100.96.161.175}{100.96.161.175:9300}
   {es-monitoring-elasticsearch-data-0}{rn-v-yB8RbeoHXovkC4UYQ}{vF1s-vC7TheNqloIyEJg4A}{100.96.165.88}{100.96.165.88:9300}
   {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, master
   {es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300}

[2018-12-19T09:12:57,612][INFO ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] failed to send join request to master [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-12-19T09:13:01,851][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] detected_master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [368]])
[2018-12-19T09:13:01,857][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [27532ms] ago, timed out [24531ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [29]
[2018-12-19T09:13:01,857][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [24529ms] ago, timed out [21529ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [30]
[2018-12-19T09:13:01,858][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [21530ms] ago, timed out [18530ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [31]
[2018-12-19T09:15:54,284][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] removed {{es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300},}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [369]])
@kimxogus kimxogus changed the title Configuration to customize discovery/zen/fd/master_ping [Feature Request] Configuration to customize discovery/zen/fd/master_ping Dec 19, 2018
@dliappis dliappis added the :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label Dec 19, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dliappis
Contributor

@DaveCTurner not sure if it makes sense to consider a zen proposal such as this, given zen2 progress.

@dliappis
Contributor

@kimxogus Might be worth taking a look at Elastic's own helm chart -- currently in alpha status -- for Elasticsearch and esp. the clustering and node discovery approach.

@DaveCTurner
Contributor

We certainly won't fix this as described - the fault detection and master election mechanisms are completely changing for 7.0 as described in #32006 - but I do think we can do better in this situation. Marking this for team discussion.

The proposal doesn't actually fix the problem described anyway, because it's not a pinging problem:

A new master is elected within 3~5 seconds, but no member node in the cluster responds
to HTTP requests for about 1 minute (with discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s).

I think the actual problem here is #29025, but a more orderly master handover process would also help.

@DaveCTurner
Contributor

DaveCTurner commented Dec 19, 2018

On Linux, reducing net.ipv4.tcp_retries2 (the sysctl, i.e. /proc/sys/net/ipv4/tcp_retries2, not an Elasticsearch setting) ought to help here too. See #34405 (comment).

@kimxogus
Author

Reducing net.ipv4.tcp_retries2 to 3 didn't help in my case with either https://github.com/helm/charts/tree/master/stable/elasticsearch or Elastic's own chart.
I made sure net.ipv4.tcp_retries2 = 3 and discovery.zen.ping_timeout = 3s, but the outage was still about 1 minute.

@kimxogus
Author

And internal:discovery/zen/fd/master_ping is still taking longer than 120000ms.

Log from Elastic's own chart with net.ipv4.tcp_retries2 = 3 and discovery.zen.ping_timeout = 3s:

[2018-12-20T01:10:54,073][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [148120ms] ago, timed out [145119ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [77]
[2018-12-20T01:10:54,074][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [145119ms] ago, timed out [142117ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [78]
[2018-12-20T01:10:54,074][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [142117ms] ago, timed out [139115ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [79]

@DaveCTurner
Contributor

I do not understand what these messages have to do with the original post, or how you managed to get them. The OP was talking about shutting down a master, but if the master were shut down then it'd never respond, so that's not how these messages arose. Also these requests timed out after 3 seconds, and Elasticsearch reacted to the timeout at that time.

@DaveCTurner
Contributor

Could you share logs from both the old, stopping master and the newly-elected master, covering the period from when the old master stopped until the new master was elected and the cluster had fully recovered?

@kimxogus
Author

@DaveCTurner
This is my test chart based on https://github.com/elastic/helm-charts.

I created a test master cluster with helm install ./elasticsearch --name es-test

and collected logs with the logger.level=debug config option.

The old master (master-0) received SIGTERM at about 2018-12-21T07:46:16,521, and the outage was about 1 minute.

@kimxogus
Author

kimxogus commented Dec 21, 2018

vm.max_map_count=262144 and net.ipv4.tcp_retries2=3 are set via sysctl by the k8s pod's init containers.
logger.level=debug, discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s are set via environment variables.

Other settings are the default values from the original chart, and the image is the official image.
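
For context, a minimal sketch (hypothetical names, not the exact chart manifest) of the kind of privileged init container used to apply those sysctls before Elasticsearch starts:

```yaml
# Hypothetical pod-spec fragment: a privileged init container that applies the
# sysctls described above before the Elasticsearch container starts.
initContainers:
  - name: sysctl
    image: busybox
    securityContext:
      privileged: true
    command:
      - sh
      - -c
      - |
        sysctl -w vm.max_map_count=262144
        sysctl -w net.ipv4.tcp_retries2=3
```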

@DaveCTurner
Contributor

Thanks, the logs were helpful. The issue you are facing is related to #29025: the first cluster state update from the new master causes all the nodes to try and re-establish their connections to the old master, expecting this either to succeed or fail immediately. However Docker's network doesn't behave as expected: if the container has completely gone away, connection attempts receive no response and eventually time out. Worse, we try twice before continuing, so it takes two connection timeouts (each 30 seconds by default) before the cluster proceeds.

I would reset your ping_timeout since it's actually making things a bit worse here, and instead consider reducing transport.tcp.connect_timeout until #29025 is resolved.
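
For example, a minimal `elasticsearch.yml` sketch of that workaround (the 3s value is illustrative and in line with what the reporter used later in this thread):

```yaml
# Workaround sketch until #29025 is resolved: fail connection attempts to a
# vanished node quickly instead of waiting out the 30s default (twice).
transport.tcp.connect_timeout: 3s   # default 30s
```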

@DaveCTurner
Contributor

Duplicates #29025.

@kimxogus
Author

kimxogus commented Dec 21, 2018

Thank you 👍
Reducing transport.tcp.connect_timeout to 2~3 seconds brought the outage down to around 8~10 seconds.
