
[Feature Request] Configuration to customize discovery/zen/fd/master_ping #36822

Closed
kimxogus opened this issue Dec 19, 2018 · 14 comments
Labels
:Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement team-discuss

Comments

@kimxogus

Describe the feature:

  • Configuration to customize discovery/zen/fd/master_ping: a config option to make Elasticsearch skip pinging and waiting for the old master before a new master takes over.

In a Kubernetes environment, the IP of each member node in the cluster belongs to a pod, which is a Docker container. When a pod (node) is terminated, pings to the old master address time out, because the newly created pod (node) has a different IP address. In this situation the cluster outage lasts for `discovery.zen.join_timeout` (which defaults to 20 × the ping timeout, per the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election)), i.e. more than a minute. Reducing `ping_timeout` below 1 second is too dangerous (it may cause problems with master election), and making Elasticsearch wait several seconds after SIGTERM so the pod IP stays reachable for pinging doesn't seem like a proper solution either. As in [this discussion](https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590), I believe that adding a config option to make Elasticsearch skip pinging and waiting for the old master before a new master takes over would be a good solution.
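
For reference, here is a minimal `elasticsearch.yml` sketch (assuming Elasticsearch 6.x zen discovery) of the fault-detection settings involved; the values mirror the ones used in this report and are not recommendations:

```yaml
# Zen discovery fault-detection settings relevant to this report (Elasticsearch 6.x).
discovery.zen.ping_timeout: 3s      # default 3s; timeout for master-election pings
discovery.zen.fd.ping_timeout: 3s   # default 30s; per-ping timeout for fault detection
discovery.zen.fd.ping_retries: 3    # default 3; failed pings before the master is considered lost
# discovery.zen.join_timeout is not set here; it defaults to 20 * ping_timeout,
# which is the delay this feature request asks to avoid.
```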

Elasticsearch version (bin/elasticsearch --version): 6.2.3

Plugins installed: [ingest-geoip, ingest-user-agent, repository-s3]

JVM version (java -version):

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

OS version (uname -a if on a Unix-like system): Linux {HOSTNAME} 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce, the more likely it is that somebody will take the time to look at it.

  1. Deploy an Elasticsearch cluster in Kubernetes (a helm chart in my case)
  2. Terminate the current master pod (node)
  3. A new master is elected within 3~5 seconds, but no member node in the cluster
    responds to HTTP requests for about 1 minute (with discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s).

Provide logs (if relevant):

[2018-12-19T09:12:33,326][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] detected_master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, added {{es-monitoring-elasticsearch-client-57654b8f98-p47cm}{HJlePFqgQxq_wmFDEDNQEw}{Thx48_UDSL2CwzLrg0NL2w}{100.96.161.172}{100.96.161.172:9300},{es-monitoring-elasticsearch-master-2}{v3FjSTfcQ4OHAzXCzDcKFQ}{e1X2hVV8SIOkDk1wvE3LKw}{100.96.162.240}{100.96.162.240:9300},{es-monitoring-elasticsearch-data-2}{V2meIqpNTQOH8zY4PCtQ7g}{Pr9uoG03Qc6Xx2h4x-o62A}{100.96.162.225}{100.96.162.225:9300},{es-monitoring-elasticsearch-data-1}{mqfXo0yqTaCcEc956tVmpA}{NQehgvsvQq2Kh1K6tKZaxA}{100.96.161.175}{100.96.161.175:9300},{es-monitoring-elasticsearch-data-0}{rn-v-yB8RbeoHXovkC4UYQ}{vF1s-vC7TheNqloIyEJg4A}{100.96.165.88}{100.96.165.88:9300},{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300},{es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300},}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [367]])
[2018-12-19T09:12:43,331][INFO ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] master_left [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], reason [failed to ping, tried [3] times, each with  maximum [3s] timeout]
[2018-12-19T09:12:43,332][WARN ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] master left (reason = failed to ping, tried [3] times, each with  maximum [3s] timeout), current nodes: nodes:
   {es-monitoring-elasticsearch-client-57654b8f98-p47cm}{HJlePFqgQxq_wmFDEDNQEw}{Thx48_UDSL2CwzLrg0NL2w}{100.96.161.172}{100.96.161.172:9300}
   {es-monitoring-elasticsearch-master-2}{v3FjSTfcQ4OHAzXCzDcKFQ}{e1X2hVV8SIOkDk1wvE3LKw}{100.96.162.240}{100.96.162.240:9300}
   {es-monitoring-elasticsearch-data-2}{V2meIqpNTQOH8zY4PCtQ7g}{Pr9uoG03Qc6Xx2h4x-o62A}{100.96.162.225}{100.96.162.225:9300}
   {es-monitoring-elasticsearch-master-0}{K6kMktL9QJC2sc7K-35McA}{srwO3u3SS9GYAWYeLyUn-g}{100.96.165.141}{100.96.165.141:9300}, local
   {es-monitoring-elasticsearch-data-1}{mqfXo0yqTaCcEc956tVmpA}{NQehgvsvQq2Kh1K6tKZaxA}{100.96.161.175}{100.96.161.175:9300}
   {es-monitoring-elasticsearch-data-0}{rn-v-yB8RbeoHXovkC4UYQ}{vF1s-vC7TheNqloIyEJg4A}{100.96.165.88}{100.96.165.88:9300}
   {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, master
   {es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300}

[2018-12-19T09:12:57,612][INFO ][o.e.d.z.ZenDiscovery     ] [es-monitoring-elasticsearch-master-0] failed to send join request to master [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-12-19T09:13:01,851][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] detected_master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [368]])
[2018-12-19T09:13:01,857][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [27532ms] ago, timed out [24531ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [29]
[2018-12-19T09:13:01,857][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [24529ms] ago, timed out [21529ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [30]
[2018-12-19T09:13:01,858][WARN ][o.e.t.TransportService   ] [es-monitoring-elasticsearch-master-0] Received response for a request that has timed out, sent [21530ms] ago, timed out [18530ms] ago, action [internal:discovery/zen/fd/master_ping], node [{es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300}], id [31]
[2018-12-19T09:15:54,284][INFO ][o.e.c.s.ClusterApplierService] [es-monitoring-elasticsearch-master-0] removed {{es-monitoring-elasticsearch-client-57654b8f98-dgvxm}{462pBrdyScC9WgmlkJr8ug}{vv_jrSxbTHi3wo-r03k0fQ}{100.96.166.205}{100.96.166.205:9300},}, reason: apply cluster state (from master [master {es-monitoring-elasticsearch-master-1}{IYHqXZysTTeNLaIGIs3Ggw}{SAoOYOl1T0W-XdV2kEzoYA}{100.96.166.11}{100.96.166.11:9300} committed version [369]])
@kimxogus kimxogus changed the title Configuration to customize discovery/zen/fd/master_ping [Feature Request] Configuration to customize discovery/zen/fd/master_ping Dec 19, 2018
@dliappis dliappis added the :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label Dec 19, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dliappis
Contributor

@DaveCTurner not sure if it makes sense to consider a zen proposal such as this, given zen2 progress.

@dliappis
Contributor

@kimxogus Might be worth taking a look at Elastic's own helm chart -- currently in alpha status -- for Elasticsearch and esp. the clustering and node discovery approach.

@DaveCTurner
Contributor

We certainly won't fix this as described - the fault detection and master election mechanisms are completely changing for 7.0 as described in #32006 - but I do think we can do better in this situation. Marking this for team discussion.

The proposal doesn't actually fix the problem described anyway, because it's not a pinging problem:

A new master is elected within 3~5 seconds, but no member node in the cluster responds
to HTTP requests for about 1 minute (with discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s).

I think the actual problem here is #29025, but a more orderly master handover process would also help.

@DaveCTurner
Contributor

DaveCTurner commented Dec 19, 2018

On Linux, reducing net.ipv4.tcp_retries2 (the sysctl, i.e. /proc/sys/net/ipv4/tcp_retries2, not an Elasticsearch setting) ought to help here too. See #34405 (comment).

@kimxogus
Author

Reducing net.ipv4.tcp_retries2 to 3 didn't help in my case with either https://github.com/helm/charts/tree/master/stable/elasticsearch or Elastic's own chart.
I made sure net.ipv4.tcp_retries2 = 3 and discovery.zen.ping_timeout = 3s, but the outage was still about 1 minute.

@kimxogus
Author

And internal:discovery/zen/fd/master_ping is still taking longer than 120000ms.

Log from Elastic's own chart with net.ipv4.tcp_retries2 = 3 and discovery.zen.ping_timeout = 3s:

[2018-12-20T01:10:54,073][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [148120ms] ago, timed out [145119ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [77]
[2018-12-20T01:10:54,074][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [145119ms] ago, timed out [142117ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [78]
[2018-12-20T01:10:54,074][WARN ][o.e.t.TransportService   ] [elasticsearch-master-0] Received response for a request that has timed out, sent [142117ms] ago, timed out [139115ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elasticsearch-master-1}{tmWFtydtQBiU54yOoiN0Jw}{pr09c9sPQtGOEZ2AtdUaFQ}{100.96.166.76}{100.96.166.76:9300}{ml.machine_memory=805306368, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [79]

@DaveCTurner
Contributor

I do not understand what these messages have to do with the original post, or how you managed to get them. The OP was talking about shutting down a master, but if the master were shut down then it'd never respond, so that's not how these messages arose. Also these requests timed out after 3 seconds, and Elasticsearch reacted to the timeout at that time.

@DaveCTurner
Contributor

Could you share logs from both the old, stopping master and the newly-elected master, covering the period from when the old master stopped until the new master was elected and the cluster had fully recovered?

@kimxogus
Author

@DaveCTurner
This is my test chart based on https://github.com/elastic/helm-charts.

I created a test master cluster with helm install ./elasticsearch --name es-test

and collected logs with the logger.level=debug config option.

The old master (master-0) received SIGTERM at about 2018-12-21T07:46:16,521, and the outage was about 1 minute.

@kimxogus
Author

kimxogus commented Dec 21, 2018

vm.max_map_count=262144 and net.ipv4.tcp_retries2=3 are set via sysctl by the k8s pod's init containers.
logger.level=debug, discovery.zen.ping_timeout=3s and discovery.zen.fd.ping_timeout=3s are set via environment variables.

Other settings are the default values from the original chart, and the image is the official image.
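
For context, a minimal sketch (hypothetical names, not the exact chart manifest) of the kind of privileged init container used to apply those sysctls before Elasticsearch starts:

```yaml
# Hypothetical pod-spec fragment: a privileged init container that applies the
# sysctls described above before the Elasticsearch container starts.
initContainers:
  - name: sysctl
    image: busybox
    securityContext:
      privileged: true
    command:
      - sh
      - -c
      - |
        sysctl -w vm.max_map_count=262144
        sysctl -w net.ipv4.tcp_retries2=3
```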

@DaveCTurner
Contributor

Thanks, the logs were helpful. The issue you are facing is related to #29025: the first cluster state update from the new master causes all the nodes to try and re-establish their connections to the old master, expecting this either to succeed or fail immediately. However Docker's network doesn't behave as expected: if the container has completely gone away, connection attempts receive no response and eventually time out. Worse, we try twice before continuing, so it takes two connection timeouts (each 30 seconds by default) before the cluster proceeds.

I would reset your ping_timeout since it's actually making things a bit worse here, and instead consider reducing transport.tcp.connect_timeout until #29025 is resolved.
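
For example, a minimal `elasticsearch.yml` sketch of that workaround (the 3s value is illustrative and in line with what the reporter used later in this thread):

```yaml
# Workaround sketch until #29025 is resolved: fail connection attempts to a
# vanished node quickly instead of waiting out the 30s default (twice).
transport.tcp.connect_timeout: 3s   # default 30s
```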

@DaveCTurner
Contributor

Duplicates #29025.

@kimxogus
Author

kimxogus commented Dec 21, 2018

Thank you 👍
Reducing transport.tcp.connect_timeout to 2~3 seconds brought the outage down to around 8~10 seconds.
