Possible cluster unavailability #6378

Closed
ramanala opened this issue Sep 7, 2016 · 3 comments

Comments

ramanala commented Sep 7, 2016

Possible cluster unavailability

1 rename(source="/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal.tmp", dest="/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
2 append("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal", offset=88, count=4096)
3 append("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal", offset=4184, count=4242)
4 fdatasync("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")

I see the above sequence of system calls when etcd appends a user data item to its WAL file. If a crash happens just before the 4th operation (fdatasync), and the file system reorders the two appends (operations 2 and 3) so that the later one reaches disk without the earlier one (this reordering is possible on commonly used file systems such as ext4 in ordered mode), then during recovery the server crashes with the following error in its debug log file:

...timestamp... I | etcdmain: etcd Version: 2.3.0
...timestamp... I | etcdmain: Git SHA: 3719912
...timestamp... I | etcdmain: Go Version: go1.6
...timestamp... I | etcdmain: Go OS/Arch: linux/amd64
...timestamp... I | etcdmain: setting maximum number of CPUs to 40, total number of available CPUs is 40
...timestamp... N | etcdmain: the server is already initialized as member before, starting as etcd member...
...timestamp... I | etcdmain: listening for peers on http://172.17.0.2:2380
...timestamp... I | etcdmain: listening for client requests on http://172.17.0.2:2379
...timestamp... I | etcdserver: name = infra0
...timestamp... I | etcdserver: data dir = /data/etcd/infra0.etcd/
...timestamp... I | etcdserver: member dir = /data/etcd/infra0.etcd//member
...timestamp... I | etcdserver: heartbeat = 100ms
...timestamp... I | etcdserver: election = 1000ms
...timestamp... I | etcdserver: snapshot count = 10000
...timestamp... I | etcdserver: advertise client URLs = http://172.17.0.2:2379
...timestamp... C | etcdserver: read wal error (walpb: crc mismatch) and cannot be repaired
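
The failure can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: the record layout (length, chained CRC32, payload), the helper names, and the choice to model the missing append as a zero-filled range are all hypothetical and are not etcd's actual WAL format or code.

```python
import struct
import zlib

# Hypothetical record layout for illustration only (not etcd's on-disk format):
# 4-byte little-endian payload length, 4-byte chained CRC32, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes, prev_crc: int) -> tuple[bytes, int]:
    """Return the encoded record and the new chained CRC."""
    crc = zlib.crc32(payload, prev_crc)
    return HEADER.pack(len(payload), crc) + payload, crc


def read_records(buf: bytes) -> list[bytes]:
    """Decode every record in buf, raising ValueError on the first CRC mismatch."""
    records, offset, prev_crc = [], 0, 0
    while offset + HEADER.size <= len(buf):
        length, stored_crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        payload = buf[start:start + length]
        crc = zlib.crc32(payload, prev_crc)
        if len(payload) != length or crc != stored_crc:
            raise ValueError(f"crc mismatch at offset {offset}")
        records.append(payload)
        offset, prev_crc = start + length, crc
    return records


def crash_image(head: bytes, first: bytes, second: bytes, first_persisted: bool) -> bytes:
    """File contents after a crash in which the second append reached disk.

    If the first append did not, its byte range is modeled as zeros (an assumption).
    """
    middle = first if first_persisted else bytes(len(first))
    return head + middle + second


head, crc = encode_record(b"wal metadata", 0)
first, crc = encode_record(b"entry 1", crc)
second, crc = encode_record(b"entry 2", crc)

# Appends persisted in order: all three records decode.
print(read_records(crash_image(head, first, second, first_persisted=True)))

# Second append persisted without the first: the reader hits a CRC mismatch.
try:
    read_records(crash_image(head, first, second, first_persisted=False))
except ValueError as err:
    print("recovery failed:", err)
```

Because each CRC is chained to the previous record, the reader cannot skip the damaged range and resume at the second append; it stops at the first record that fails to verify.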

Two nodes in a three-node cluster can easily get into this state, leaving a majority of the servers unusable. The third node then cannot make progress alone because there is no majority, which renders the cluster unavailable.

Although the window of vulnerability is small, this is a potential problem that can be fixed in etcd's recovery code after a crash.
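
One possible shape for such a recovery step is sketched below. It is a hypothetical approach, not etcd's repair logic: it reuses an invented record layout (length, chained CRC32, payload) and simply keeps the longest prefix of records that verify, dropping everything from the first bad record onward.

```python
import struct
import zlib

# Hypothetical record layout (not etcd's on-disk format):
# 4-byte little-endian payload length, 4-byte chained CRC32, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes, prev_crc: int) -> tuple[bytes, int]:
    """Return the encoded record and the new chained CRC."""
    crc = zlib.crc32(payload, prev_crc)
    return HEADER.pack(len(payload), crc) + payload, crc


def valid_prefix_length(buf: bytes) -> int:
    """Return the byte length of the longest prefix made of intact records."""
    offset, prev_crc = 0, 0
    while offset + HEADER.size <= len(buf):
        length, stored_crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        payload = buf[start:start + length]
        crc = zlib.crc32(payload, prev_crc)
        if len(payload) != length or crc != stored_crc:
            break  # first record that fails to verify: stop here
        offset, prev_crc = start + length, crc
    return offset


def repair(buf: bytes) -> bytes:
    """Drop the unverifiable tail and keep only the intact records."""
    return buf[:valid_prefix_length(buf)]


head, crc = encode_record(b"wal metadata", 0)
first, crc = encode_record(b"entry 1", crc)
second, crc = encode_record(b"entry 2", crc)

# Crash image in which the first append is missing (modeled as zeros).
damaged = head + bytes(len(first)) + second
print(len(damaged), "bytes before repair,", len(repair(damaged)), "bytes after")
```

Truncating like this is only reasonable for a tail that was never made durable, as in the scenario above where the crash precedes fdatasync. The sketch cannot tell such a tail apart from corruption in the middle of already-synced data, which should not be dropped silently.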


xiang90 commented Sep 7, 2016

I think this is already fixed, no? @heyitsanthony

heyitsanthony commented

@xiang90 yeah all the O_APPEND stuff is gone as of 3.0.


xiang90 commented Sep 7, 2016

@ramanala This should be fixed. FYI.
