Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash (but also not a crash?) in RaftLogWriteEntry+0x85 #54

Closed
aphyr opened this issue May 10, 2020 · 1 comment
Closed

Crash (but also not a crash?) in RaftLogWriteEntry+0x85 #54

aphyr opened this issue May 10, 2020 · 1 comment

Comments

@aphyr
Copy link
Collaborator

aphyr commented May 10, 2020

With redis f88f866 (redisraft 2d1cf30), processes kills can result in a node segfaulting, but... also maybe the process kept running? I'm not sure what to make of these log entries.

20200510T112604.000-0400.zip

214601:10 May 08:33:28.418 Raft log file size is 38850, initiating snapshot.
214601:10 May 08:33:28.418 Initiating snapshot.
214601:10 May 08:33:28.418 <raftlib> begin snapshot sli:14981 slt:80 slogs:234
214601:10 May 08:33:28.418 Snapshot scope: first_entry_idx=14981, current_idx=15026
214601:10 May 08:33:28.420 node:560516317: raft.c:479: <raftlib> recvd appendentries t:83 ci:15026 lc:15029 pli:15008 plt:80 #21
214612:C 10 May 2020 08:33:28.429 * DB saved on disk


=== REDIS BUG REPORT START: Cut & paste starting from here ===
214612:C 10 May 2020 08:33:28.438 # Redis 999.999.999 crashed by signal: 11
214612:C 10 May 2020 08:33:28.438 # Crashed running the instruction at: 0x7f60c1bb249f
214612:C 10 May 2020 08:33:28.438 # Accessing address: (nil)
214612:C 10 May 2020 08:33:28.438 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/opt/redis/redisraft.so(RaftLogWriteEntry+0x85)[0x7f60c1bb249f]

Backtrace:
redis-server 0.0.0.0:6379(logStackTrace+0x37)[0x561e561f5e57]
redis-server 0.0.0.0:6379(sigsegvHandler+0xb0)[0x561e561f6580]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f60c2b460e0]
/opt/redis/redisraft.so(RaftLogWriteEntry+0x85)[0x7f60c1bb249f]
/opt/redis/redisraft.so(RaftLogRewrite+0x9e)[0x7f60c1bb2aad]
/opt/redis/redisraft.so(initiateSnapshot+0x5c0)[0x7f60c1bad33b]
/opt/redis/redisraft.so(+0x2464e)[0x7f60c1ba864e]
/opt/redis/redisraft.so(uv__run_timers+0x58)[0x7f60c1bc56f8]
/opt/redis/redisraft.so(uv_run+0x7e)[0x7f60c1bc9b7e]
/opt/redis/redisraft.so(+0x24737)[0x7f60c1ba8737]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f60c2b3c4a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f60c287ed0f]

------ INFO OUTPUT ------
# Server
redis_version:999.999.999
redis_git_sha1:f88f8661
redis_git_dirty:0
redis_build_id:e03d01ed22d54f5a
redis_mode:standalone
os:Linux 5.6.0-1-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:6.3.0
process_id:214612
run_id:8a724e3ef721e32c7c617ae3a012f0b0754c69f5
tcp_port:6379
uptime_in_seconds:2
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12066504
executable:/opt/redis/redis-server
config_file:

# Clients
connected_clients:8
client_recent_max_input_buffer:4
client_recent_max_output_buffer:0
blocked_clients:3
tracking_clients:0

# Memory
used_memory:1372672
used_memory_human:1.31M
used_memory_rss:6799360
used_memory_rss_human:6.48M
used_memory_peak:1372672
used_memory_peak_human:1.31M
used_memory_peak_perc:103.74%
used_memory_overhead:1076190
used_memory_startup:830000
used_memory_dataset:296482
used_memory_dataset_perc:54.63%
allocator_allocated:2144136
allocator_active:2641920
allocator_resident:10711040
total_system_memory:135201345536
total_system_memory_human:125.92G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.23
allocator_frag_bytes:497784
allocator_rss_ratio:4.05
allocator_rss_bytes:8069120
rss_overhead_ratio:0.63
rss_overhead_bytes:-3911680
mem_fragmentation_ratio:5.14
mem_fragmentation_bytes:5476208
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:234150
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1589124808
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:8
total_commands_processed:732
instantaneous_ops_per_sec:76
total_net_input_bytes:11846
total_net_output_bytes:1853
instantaneous_input_kbps:5.05
instantaneous_output_kbps:1.11
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:283
keyspace_misses:1
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0

# Replication
role:master
connected_slaves:0
master_replid:48be50ae093588329980597873dfc4702b0eb26d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.000000
used_cpu_user:0.003208
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000

# Modules
module:name=redisraft,ver=1,api=1,filters=1,usedby=[],using=[],options=[]

# Commandstats
cmdstat_info:calls=1,usec=79,usec_per_call=79.00
cmdstat_raft:calls=121,usec=3882,usec_per_call=32.08
cmdstat_lrange:calls=284,usec=3494,usec_per_call=12.30
cmdstat_raft.ae:calls=3,usec=230,usec_per_call=76.67
cmdstat_save:calls=1,usec=8621,usec_per_call=8621.00
cmdstat_raft.requestvote:calls=1,usec=43,usec_per_call=43.00
cmdstat_rpush:calls=318,usec=473,usec_per_call=1.49
cmdstat_config:calls=3,usec=47,usec_per_call=15.67

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=249,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=8 addr=192.168.122.1:47660 fd=16 name= age=2 idle=0 flags=b db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=raft user=default
id=11 addr=192.168.122.1:47666 fd=17 name= age=2 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=12 addr=192.168.122.1:47668 fd=18 name= age=2 idle=0 flags=b db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=raft user=default
id=38 addr=192.168.122.13:58300 fd=23 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=39 addr=192.168.122.12:42016 fd=24 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=40 addr=192.168.122.11:51572 fd=25 name= age=1 idle=0 flags=b db=0 sub=0 psub=0 multi=-1 qbuf=3481 qbuf-free=29287 obl=0 oll=0 omem=0 events=r cmd=raft.ae user=default
id=41 addr=192.168.122.14:36048 fd=26 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=7 addr=192.168.122.1:47658 fd=15 name= age=2 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default

------ REGISTERS ------
214612:C 10 May 2020 08:33:28.439 # 
RAX:0000000000000000 RBX:0000000000003ab2
RCX:0000000000000a0d RDX:0000000000000005
RDI:00007f60b0004b2d RSI:00007f60b00047a0
RBP:00007f60c1b80f30 RSP:00007f60c1b80f00
R8 :00007f60b0004700 R9 :00007f60b00046c0
R10:0000000000000075 R11:00000000ffffffff
R12:00007f60c2324000 R13:00007f60c2324048
R14:00007f60c1383000 R15:0000000000000000
RIP:00007f60c1bb249f EFL:0000000000010206
CSGSFS:002b000000000033
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f0f) -> 00007f60c1bad33b
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f0e) -> 00007f60c1b812e0
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f0d) -> 0000000000000001
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f0c) -> 0000000000003a86
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f0b) -> 00007f60c2222300
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f0a) -> 0000000000000000
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f09) -> 00007f60c1dfecc0
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f08) -> 00007f60c1b810a0
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f07) -> 00007f60c1bb2aad
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f06) -> 00007f60c1b80f70
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f05) -> 000000000000000f
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f04) -> 0000000b00000000
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f03) -> 00007f60c1dfecc0
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f02) -> 0000000000003a86
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f01) -> 00007f60c2222300
214612:C 10 May 2020 08:33:28.439 # (00007f60c1b80f00) -> 0000000000000000

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
*** Preparing to test memory region 561e56545000 (2523136 bytes)
*** Preparing to test memory region 561e58638000 (135168 bytes)
*** Preparing to test memory region 7f60a4000000 (135168 bytes)
*** Preparing to test memory region 7f60ac000000 (135168 bytes)
*** Preparing to test memory region 7f60b0000000 (139264 bytes)
*** Preparing to test memory region 7f60b7800000 (8388608 bytes)
*** Preparing to test memory region 7f60b8000000 (135168 bytes)
*** Preparing to test memory region 7f60bc57c000 (2621440 bytes)
*** Preparing to test memory region 7f60bc7fd000 (8388608 bytes)
*** Preparing to test memor2020-05-10 08:34:00 Jepsen starting redis-server --bind 0.0.0.0 --dbfilename redis.rdb --loadmodule /opt/redis/redisraft.so loglevel=debug raft-log-filename=raftlog.db raft-log-max-file-size=32000 raft-log-max-cache-size=1000000 follower-proxy=yes
y region 7f60bcffe000 (8388608 bytes)
*** Preparing to test memory region 7f60bd7ff000 (8388608 bytes)
*** Preparing to test memory region 7f60be000000 (8388608 bytes)
*** Preparing to test memory region 7f60be800000 (10485760 bytes)
*** Preparing to test memory region 7f60bf380000 (8388608 bytes)
*** Preparing to test memory region 7f60bfb81000 (8388608 bytes)
*** Preparing to test memory region 7f60c0382000 (8388608 bytes)
*** Preparing to test memory region 7f60c0b83000 (8388608 bytes)
*** Preparing to test memory region 7f60c1384000 (8388608 bytes)
*** Preparing to test memory region 7f60c1dff000 (4096 bytes)
*** Preparing to test memory region 7f60c1e00000 (8388608 bytes)
*** Preparing to test memory region 7f60c2b31000 (16384 bytes)
*** Preparing to test memory region 7f60c2d4e000 (16384 bytes)
*** Preparing to test memory region 7f60c3674000 (32768 bytes)
*** Preparing to test memory region 7f60c3687000 (4096 bytes)

Weirdly, it doesn't look like that segfault actually triggered a crash, because it kept logging applying log messages:

*** Preparing to test memory region 7f60c3687000 (4096 bytes)
.214601:10 May 08:33:28.447 node:560516317: raft.c:479: <raftlib> recvd appendentries t:83 ci:15029 lc:15029 pli:15008 plt:80 #21
214601:10 May 08:33:28.447 node:560516317: raft.c:479: <raftlib> recvd appendentries t:83 ci:15029 lc:15030 pli:15026 plt:83 #5
O.O.O.O.O.214601:10 May 08:33:28.465 node:560516317: node.c:361: NodeAddPendingResponse: id=1, type=proxy, request_time=1589124808465
214601:10 May 08:33:28.465 node:560516317: node.c:361: NodeAddPendingResponse: id=2, type=proxy, request_time=1589124808465
214601:10 May 08:33:28.476 node:560516317: raft.c:479: <raftlib> recvd appendentries t:83 ci:15031 lc:15033 pli:15026 plt:83 #9
O.O.O.214601:10 May 08:33:28.502 node:560516317: raft.c:479: <raftlib> recvd appendentries t:83 ci:15035 lc:15033 pli:15029 plt:83 #9
214601:10 May 08:33:28.529 <raftlib> applying log: 14982, id: 122846204 size: 65
214601:10 May 08:33:28.529 <raftlib> applying log: 14983, id: 1563058141 size: 65
214601:10 May 08:33:28.529 <raftlib> applying log: 14984, id: 966673969 size: 121
214601:10 May 08:33:28.529 <raftlib> applying log: 14985, id: 451325546 size: 115

We also encountered duplicate elements and incompatible reads in this history, which might be #52. To reproduce, use jepsen.redis ddb68c89, and try:

lein run test-all --raft-version 2d1cf30 --time-limit 600 --nemesis kill --follower-proxy --test-count 40 --concurrency 4n
@aphyr
Copy link
Collaborator Author

aphyr commented Jun 15, 2020

This looks resolved as of e0123a9!

@aphyr aphyr closed this as completed Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant