Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AE prev conflicts with committed entry #53

Closed
aphyr opened this issue May 10, 2020 · 1 comment
Closed

AE prev conflicts with committed entry #53

aphyr opened this issue May 10, 2020 · 1 comment

Comments

@aphyr
Copy link
Collaborator

aphyr commented May 10, 2020

With process crashes and partitions, we observed the following crash on startup of node n4:

20200510T091809.000-0400.zip

2020-05-10 06:27:47 Jepsen starting redis-server --bind 0.0.0.0 --dbfilename redis.rdb --loadmodule /opt/redis/redisraft.so loglevel=debug raft-log-filename=raf
tlog.db raft-log-max-file-size=32000 raft-log-max-cache-size=1000000 follower-proxy=yes
197382:C 10 May 2020 06:27:47.277 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
197382:C 10 May 2020 06:27:47.277 # Redis version=999.999.999, bits=64, commit=f88f8661, modified=0, pid=197382, just started
197382:C 10 May 2020 06:27:47.277 # Configuration loaded
197382:M 10 May 2020 06:27:47.277 * Increased maximum number of open files to 10032 (it was originally set to 1024).
197382:M 10 May 2020 06:27:47.278 * Running mode=standalone, port=6379.
197382:M 10 May 2020 06:27:47.278 # Server initialized
197382:M 10 May 2020 06:27:47.278 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overco
mmit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
197382:M 10 May 2020 06:27:47.278 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issu
es with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order t
o retain the setting after a reboot. Redis must be restarted after THP is disabled.
197382:M 10 May 2020 06:27:47.278 * <redisraft> RedisRaft starting, arguments: loglevel=debug raft-log-filename=raftlog.db raft-log-max-file-size=32000 raft-log
-max-cache-size=1000000 follower-proxy=yes
197382:M 10 May 2020 06:27:47.279 * Module 'redisraft' loaded from /opt/redis/redisraft.so
197382:M 10 May 2020 06:27:47.279 * Loading RDB produced by version 999.999.999
197382:M 10 May 2020 06:27:47.279 * RDB age 12 seconds
197382:M 10 May 2020 06:27:47.279 * RDB memory usage when created 1.54 Mb
197382:M 10 May 2020 06:27:47.280 * DB loaded from disk: 0.001 seconds
197382:M 10 May 2020 06:27:47.280 * Ready to accept connections
197382:10 May 06:27:47.779 Loading: Redis loading complete, snapshot LOADED
197382:10 May 06:27:47.779 Loading: Snapshot: applied term=177 index=12482
197382:10 May 06:27:47.779 Loading: Snapshot config: node id=290960629 [192.168.122.11:6379], active=1, voting=1
197382:10 May 06:27:47.779 Loading: Snapshot config: node id=1114280924 [192.168.122.14:6379], active=1, voting=1
197382:10 May 06:27:47.779 Loading: Snapshot config: node id=1919582493 [192.168.122.15:6379], active=1, voting=1
197382:10 May 06:27:47.779 Loading: Snapshot config: node id=1347621502 [192.168.122.12:6379], active=1, voting=1
197382:10 May 06:27:47.779 Loading: Snapshot config: node id=1823052095 [192.168.122.13:6379], active=1, voting=1
197382:10 May 06:27:47.779 Snapshot configuration loaded. Raft state:
197382:10 May 06:27:47.779   node <unknown?>
197382:10 May 06:27:47.779   node id=290960629,addr=192.168.122.11,port=6379
197382:10 May 06:27:47.779   node id=1919582493,addr=192.168.122.15,port=6379
197382:10 May 06:27:47.779   node id=1347621502,addr=192.168.122.12,port=6379
197382:10 May 06:27:47.780   node id=1823052095,addr=192.168.122.13,port=6379
197382:10 May 06:27:47.780 Loading: Log loaded, 0 entries, snapshot last term=177, index=12482
197382:10 May 06:27:47.780 Loading: Log starts from snapshot term=177, index=12482
197382:10 May 06:27:47.780 Raft Log: loaded current term=183, vote=1114280924
197382:10 May 06:27:47.780 Raft state after applying log: log_count=0, current_idx=12482, last_applied_idx=12482
197382:10 May 06:27:47.888 node:1823052095: node.c:74: Node connection established.
197382:10 May 06:27:47.888 node:1347621502: node.c:74: Node connection established.
197382:10 May 06:27:47.888 node:1919582493: node.c:74: Node connection established.
197382:10 May 06:27:47.888 node:290960629: node.c:74: Node connection established.
197382:10 May 06:27:48.696 <raftlib> becoming follower
197382:10 May 06:27:48.696 State change: Node is now a follower, term 184
197382:10 May 06:27:48.696 <raftlib> randomize election timeout to 1574
197382:10 May 06:27:48.696 node:290960629: raft.c:479: <raftlib> node requested vote: -1728024480 replying: not granted
197382:10 May 06:27:48.708 node:290960629: raft.c:479: <raftlib> recvd appendentries t:184 ci:12483 lc:12482 pli:12576 plt:177 #1
197382:10 May 06:27:48.708 node:290960629: raft.c:479: <raftlib> AE no log at prev_idx 12576
197382:10 May 06:27:48.709 node:290960629: raft.c:479: <raftlib> recvd appendentries t:184 ci:12483 lc:12482 pli:12483 plt:177 #94
197382:10 May 06:27:48.709 node:290960629: raft.c:479: <raftlib> AE no log at prev_idx 12483
197382:10 May 06:27:48.711 node:290960629: raft.c:479: <raftlib> recvd appendentries t:184 ci:12483 lc:12482 pli:12482 plt:177 #95
197382:10 May 06:27:49.401 node:290960629: raft.c:479: <raftlib> recvd appendentries t:184 ci:12578 lc:12577 pli:12482 plt:177 #95
197382:10 May 06:27:50.401 node:290960629: raft.c:479: <raftlib> recvd appendentries t:184 ci:12673 lc:12637 pli:12577 plt:184 #61
197382:10 May 06:27:50.401 node:290960629: raft.c:479: <raftlib> AE term doesn't match prev_term (ie. 177 vs 184) ci:12673 comi:12577 lcomi:12637 pli:12577
197382:10 May 06:27:50.401 node:290960629: raft.c:479: <raftlib> AE prev conflicts with committed entry
redis-server: raft.c:784: callRaftPeriodic: Assertion `ret == 0' failed.

The error appeared recoverable; a subsequent restart of the node did not encounter it, and we did not observe any safety impact from this crash. However, a write-loss event (#52) did happen roughly forty seconds prior to this assertion error, which raises the possibility that the two events are somehow related.

You can reproduce this problem on Jepsen.redis ddb68c8961a8f2b1cb0b37f263bd1eb0930ab5b8 by running:

lein run test-all --raft-version 2d1cf30 --time-limit 600 --nemesis partition,kill --follower-proxy --test-count 40 --concurrency 4n
@aphyr
Copy link
Collaborator Author

aphyr commented Jun 15, 2020

This also appears resolved in e0123a9

@aphyr aphyr closed this as completed Jun 15, 2020
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant