Minority failure during cluster configuration change risks deadlock #3699

andres-erbsen · 2015-10-17T19:31:30Z

If a node is added to a degraded cluster and then elected leader, the membership change never reaches the nodes that were down at the time. Concretely: start with a 3-node cluster a, b, c; add d while a is unavailable; then elect d as the leader. Node a is now unable to participate in the cluster, and it stays useless for as long as d is the leader. This is especially bad if another node (e.g. c) then goes down, because the cluster has to rely on a's participation to make progress. The result is a situation where a majority is live in both the old configuration (a, b out of a, b, c) and the new one (a, b, d out of a, b, c, d), yet the cluster is stuck because a does not listen to d even though d is the leader.
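The quorum arithmetic behind this scenario can be checked with a small sketch. This is only an illustration of majority counting, not etcd code; `hasQuorum` and the node sets are hypothetical names made up for the example.

```go
package main

import "fmt"

// hasQuorum reports whether the live nodes form a strict majority of members.
// Hypothetical helper for illustration; not part of etcd or its raft package.
func hasQuorum(members []string, live map[string]bool) bool {
	n := 0
	for _, m := range members {
		if live[m] {
			n++
		}
	}
	return n > len(members)/2
}

func main() {
	oldCluster := []string{"a", "b", "c"}
	newCluster := []string{"a", "b", "c", "d"}

	// c has gone down; a, b and d are running.
	live := map[string]bool{"a": true, "b": true, "d": true}
	fmt.Println(hasQuorum(oldCluster, live)) // a and b out of 3 members
	fmt.Println(hasQuorum(newCluster, live)) // a, b and d out of 4 members

	// If a does not listen to leader d, only b and d effectively
	// take part in the new configuration.
	effective := map[string]bool{"b": true, "d": true}
	fmt.Println(hasQuorum(newCluster, effective)) // 2 of 4 is not a majority
}
```

Under these assumptions the first two checks succeed and the third fails, which matches the stuck state described above: enough nodes are alive, but not enough of them are following the leader.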

I am not sure how to fix this. The invariant from the thesis does not seem applicable, since etcd applies configuration changes at a different time, and I do not understand the correctness reasoning behind the etcd configuration change algorithm well enough to make changes to it.

A screencast of me reproducing this issue is available at
http://web.mit.edu/andreser/Public/etcd-reconfiguration-deadlock/index.html


yichengq · 2015-10-17T22:34:59Z

@andres-erbsen I could reproduce this, so it is a bug.

The reason, AFAIK, is that a doesn't know the address of d, so it cannot send Raft messages to it. We need to let a know d's address in some way.

xiang90 · 2015-10-17T22:42:37Z

@andres-erbsen Interesting... This is a bug. As far as I can see, the root cause is not in Raft itself but in the transport layer. We will fix it soon.

daniel-ziegler · 2015-10-20T01:04:22Z

cc @aphyr

daniel-ziegler · 2015-12-14T02:32:19Z

Any updates on this?

xiang90 · 2015-12-22T01:07:06Z

@daniel-ziegler I am going to look into this soon. This was not a priority since it does not happen frequently in the real world.

andres-erbsen mentioned this issue Oct 17, 2015

replication: configuration changes without downtime YahooArchive/coname#30

Open

yichengq added the type/bug label Oct 17, 2015

xiang90 added the priority/P1 label Dec 22, 2015

heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Feb 3, 2016

etcdserver: add leader to transport if not in member list

e661a7d

cluster integration now supports adding members with stopped nodes, too Fixes etcd-io#3699

heyitsanthony mentioned this issue Feb 3, 2016

rafthttp: add leader to transport if peer does not exist #4402

Merged

heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Feb 3, 2016

rafthttp: add requester to transport if peer does not exist

db0b505

cluster integration now supports adding members with stopped nodes, too Fixes etcd-io#3699

heyitsanthony closed this as completed in #4402 Feb 3, 2016

heyitsanthony mentioned this issue Feb 6, 2016

rafthttp: leader spoofing via X-PeerURLs #4444

Closed
