An Issue Named “minimum_master_nodes does not prevent split-brain if splits are intersecting” #19

LiangShang · 2016-07-06T03:05:17Z

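An earlier post roughly translated an article on why split brain occurs in ES and how setting the minimum_master_nodes parameter guards against it. That post ended by mentioning an issue which reports that split brain can still occur even with this parameter set. This post tries to explain the problem described in that issue, along with a few related topics.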

Why Still Split Brain?

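Suppose three nodes form a cluster, with Node2 as the master, as shown in the figure below.

Then the network connection between Node2 and Node3 breaks, while both of them can still ping Node1. Node2 now believes the cluster consists only of Node1 and Node2, and Node3 believes it consists only of Node1 and Node3. Node2 remains the master of its smaller cluster. If Node3 now elects itself master as well, the cluster has two masters and split brain occurs.

This split brain can occur even when minimum_master_nodes is set to 2, that is, 3/2 + 1 (using integer division).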
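The scenario above can be sketched as a small reachability check. This is only an illustrative model, not Elasticsearch code: the node names, the `links` set, and the helper functions are hypothetical, and it assumes each node counts only the master-eligible nodes it can reach directly.

```python
# Hypothetical model of the three-node scenario; not Elasticsearch code.
MINIMUM_MASTER_NODES = 3 // 2 + 1  # quorum for 3 master-eligible nodes

NODES = ["node1", "node2", "node3"]

# Links that are still up. The node2 <-> node3 link is broken.
links = {frozenset({"node1", "node2"}), frozenset({"node1", "node3"})}


def visible_from(node):
    """Return the nodes that `node` can reach directly, including itself."""
    return {node} | {other for other in NODES if frozenset({node, other}) in links}


def has_quorum(node):
    """A node believes it has a quorum if it sees enough master-eligible nodes."""
    return len(visible_from(node)) >= MINIMUM_MASTER_NODES


for name in ("node2", "node3"):
    print(name, sorted(visible_from(name)), has_quorum(name))
```

Both Node2 and Node3 see two nodes (themselves plus Node1), so each side passes the minimum_master_nodes check on its own. The check counts visible nodes but does not verify that the two sides agree on who the master is.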

Follow-up

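The issue has since been closed, which means the problem has been resolved.

A PR that improved Zen Discovery fixed the problem. I have not yet understood exactly how it does so, so let's first introduce Zen Discovery, which ES uses.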

Zen Discovery

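(Understanding this part really requires reading the relevant code, which defeated me, so the content here borrows from this article.)

Within a cluster, ES uses a mechanism called Zen Discovery. It provides unicast discovery to maintain the cluster state. Zen Discovery mainly consists of the following modules: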

Ping
Unicast
Master Election
Fault Detection
Cluster State Updates
No Master Block

Ping

An ES instance finds other nodes by pinging. What exactly happens inside a ping requires reading the code.

Unicast

Unicast is point-to-point communication. To improve efficiency, certain hosts can be designated as gossip routers, so that unicast is carried out in the style of a gossip protocol.

Unicast uses the Transport module to perform discovery.

Master Election

Master Election is how the master is elected. This part also requires reading the code first.

As part of the ping process a master of the cluster is either elected or joined to. This is done automatically. The discovery.zen.ping_timeout (which defaults to 3s) allows for the tweaking of election time to handle cases of slow or congested networks (higher values assure less chance of failure). Once a node joins, it will send a join request to the master (discovery.zen.join_timeout) with a timeout defaulting at 20 times the ping timeout.

When the master node stops or has encountered a problem, the cluster nodes start pinging again and will elect a new master. This pinging round also serves as a protection against (partial) network failures where a node may unjustly think that the master has failed. In this case the node will simply hear from other nodes about the currently active master.

If discovery.zen.master_election.filter_client is true, pings from client nodes (nodes where node.client is true, or both node.data and node.master are false) are ignored during master election; the default value is true. If discovery.zen.master_election.filter_data is true, pings from non-master-eligible data nodes (nodes where node.data is true and node.master is false) are ignored during master election; the default value is false. Pings from master-eligible nodes are always observed during master election.
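The filtering rules in the paragraph above can be sketched as follows. This is a minimal illustration of the described behavior, not the actual Elasticsearch implementation: `PingResponse` and `pings_for_election` are hypothetical names, and the node roles are modeled as plain booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PingResponse:
    """Hypothetical ping response carrying the sender's role settings."""
    node_id: str
    master: bool          # node.master
    data: bool            # node.data
    client: bool = False  # node.client


def is_client(ping):
    # A client node has node.client set, or has both node.data and node.master false.
    return ping.client or (not ping.data and not ping.master)


def pings_for_election(pings, filter_client=True, filter_data=False):
    """Return the pings that are taken into account during master election."""
    kept = []
    for ping in pings:
        if ping.master and not ping.client:
            kept.append(ping)          # master-eligible: always observed
        elif is_client(ping):
            if not filter_client:
                kept.append(ping)      # client node, kept only if not filtered
        elif not filter_data:
            kept.append(ping)          # non-master-eligible data node
    return kept


pings = [
    PingResponse("m1", master=True, data=True),
    PingResponse("d1", master=False, data=True),
    PingResponse("c1", master=False, data=False),
]
print([p.node_id for p in pings_for_election(pings)])
print([p.node_id for p in pings_for_election(pings, filter_data=True)])
```

With the defaults described above (filter_client true, filter_data false), the client node's ping is dropped while the data node's ping is kept.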

Nodes can be excluded from becoming a master by setting node.master to false. Note, once a node is a client node (node.client set to true), it will not be allowed to become a master (node.master is automatically set to false).

The discovery.zen.minimum_master_nodes sets the minimum number of master eligible nodes that need to join a newly elected master in order for an election to complete and for the elected node to accept its mastership. The same setting controls the minimum number of active master eligible nodes that should be a part of any active cluster. If this requirement is not met the active master node will step down and a new master election will begin.

This setting must be set to a quorum of your master eligible nodes. It is recommended to avoid having only two master eligible nodes, since a quorum of two is two. Therefore, a loss of either master node will result in an inoperable cluster.
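The quorum arithmetic behind this recommendation is easy to check. The sketch below assumes the usual majority formula (N / 2 + 1 with integer division); the function names are illustrative only.

```python
def quorum(master_eligible: int) -> int:
    """Majority of the master-eligible nodes, using integer division."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a quorum remains."""
    return master_eligible - quorum(master_eligible)


for n in range(1, 6):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

With two master-eligible nodes the quorum is two, so no failure can be tolerated. Three nodes is the smallest setup that survives the loss of one master-eligible node.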

Fault Detection

Pinging is used to confirm whether other nodes are still in the cluster. Detection is split into:

MasterFaultDetection, which runs on all nodes.
NodesFaultDetection, which runs only on the master.

There are two fault detection processes running. The first is by the master, to ping all the other nodes in the cluster and verify that they are alive. And on the other end, each node pings the master to verify if it's still alive or an election process needs to be initiated.

The following settings control the fault detection process using the discovery.zen.fd prefix:

ping_interval: How often a node gets pinged. Defaults to 1s.
ping_timeout: How long to wait for a ping response. Defaults to 30s.
ping_retries: How many ping failures / timeouts cause a node to be considered failed. Defaults to 3.
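To make these settings concrete, here is a rough sketch of a retry-based failure check. It is not how Elasticsearch implements fault detection: the `ping` callable and both function names are hypothetical, and the time estimate assumes the attempts run one after another, each waiting for the full timeout, with ping_interval ignored.

```python
from typing import Callable


def is_node_failed(ping: Callable[[], bool], ping_retries: int = 3) -> bool:
    """Return True if `ping` fails `ping_retries` times in a row.

    `ping` is a hypothetical callable that returns True when the node
    answered within the timeout and False on a failure or timeout.
    """
    for _ in range(ping_retries):
        if ping():
            return False  # any successful ping means the node is alive
    return True


def worst_case_detection_seconds(ping_timeout: float = 30.0, ping_retries: int = 3) -> float:
    """Rough upper bound, assuming sequential attempts that each time out."""
    return ping_timeout * ping_retries


print(is_node_failed(lambda: False))   # every attempt fails
print(is_node_failed(lambda: True))    # first attempt succeeds
print(worst_case_detection_seconds())  # estimate with the default values
```

Under these assumptions, the defaults mean an unresponsive node could take on the order of 90 seconds to be declared failed.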


LiangShang added the elasticsearch label Jul 7, 2016
