Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible segfault? #50

Closed
aphyr opened this issue May 9, 2020 · 4 comments
Closed

Possible segfault? #50

aphyr opened this issue May 9, 2020 · 4 comments

Comments

@aphyr
Copy link
Collaborator

aphyr commented May 9, 2020

With redis f88f866 and redis-raft 2d1cf30, tests with process pauses, crashes, network partitions, and membership changes could rarely cause nodes to exit with a nil assertion failure, no stacktrace, and signal 11, which is SIGSEGV, right?

20200509T141124.000-0400.zip

84255:09 May 11:14:05.742 node:1701768179: raft.c:479: <raftlib> recvd appendentries t:15 ci:4836 lc:4836 pli:4836 plt:15 #1
84255:09 May 11:14:05.750 node:1701768179: node.c:361: NodeAddPendingResponse: id=744, type=proxy, request_time=1589048045750
84255:09 May 11:14:05.757 node:1701768179: raft.c:479: <raftlib> recvd appendentries t:15 ci:4837 lc:4836 pli:4836 plt:15 #3
84255:09 May 11:14:05.770 node:964805122: node.c:74: Node connection established.
84255:09 May 11:14:05.770 node:644665335: node.c:74: Node connection established.


=== REDIS BUG REPORT START: Cut & paste starting from here ===
84255:M 09 May 2020 11:14:05.770 # Redis 999.999.999 crashed by signal: 11
84255:M 09 May 2020 11:14:05.770 # Crashed running the instruction at: 0x7fac51395854
84255:M 09 May 2020 11:14:05.770 # Accessing address: (nil)
84255:M 09 May 2020 11:14:05.770 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------

I recognize this is incredibly unhelpful, and I am SO sorry to file this. Maybe my cluster is cursed.

@aphyr
Copy link
Collaborator Author

aphyr commented May 10, 2020

Got a second case, at least!

20200510T062516.000-0400.zip

@aphyr
Copy link
Collaborator Author

aphyr commented May 10, 2020

Oh hey, this one's got an actual stacktrace!

20200510T051208.000-0400.zip

=== REDIS BUG REPORT START: Cut & paste starting from here ===
171068:M 10 May 2020 02:14:02.541 # Redis 999.999.999 crashed by signal: 11
171068:M 10 May 2020 02:14:02.541 # Crashed running the instruction at: 0x7f2d8abbad20
171068:M 10 May 2020 02:14:02.541 # Accessing address: 0x18
171068:M 10 May 2020 02:14:02.541 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/opt/redis/redisraft.so(raft_node_is_voting+0x0)[0x7f2d8abbad20]

Backtrace:
redis-server 0.0.0.0:6379(logStackTrace+0x37)[0x56466e753e57]
redis-server 0.0.0.0:6379(sigsegvHandler+0xb0)[0x56466e754580]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f2d8ba500e0]
/opt/redis/redisraft.so(raft_node_is_voting+0x0)[0x7f2d8abbad20]
/opt/redis/redisraft.so(raft_node_set_voting+0x2d)[0x7f2d8abbad5d]
/opt/redis/redisraft.so(RaftLogDelete+0xcd)[0x7f2d8abb28d8]
/opt/redis/redisraft.so(+0x2f14b)[0x7f2d8abb314b]
/opt/redis/redisraft.so(raft_recv_appendentries+0x398)[0x7f2d8abb9008]
/opt/redis/redisraft.so(+0x25522)[0x7f2d8aba9522]
/opt/redis/redisraft.so(RaftReqHandleQueue+0x49)[0x7f2d8aba9386]
/opt/redis/redisraft.so(+0x453ab)[0x7f2d8abc93ab]
/opt/redis/redisraft.so(uv__io_poll+0x298)[0x7f2d8abd6d88]
/opt/redis/redisraft.so(uv_run+0x115)[0x7f2d8abc9c15]
/opt/redis/redisraft.so(+0x24737)[0x7f2d8aba8737]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f2d8ba464a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f2d8b788d0f]

------ INFO OUTPUT ------
# Server
redis_version:999.999.999
redis_git_sha1:f88f8661
redis_git_dirty:0
redis_build_id:79b3a241a6702136
redis_mode:standalone
os:Linux 5.6.0-1-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:6.3.0
process_id:171068
run_id:f4ce065d0c3cab5986ce70f0632e952aab2f1177
tcp_port:6379
uptime_in_seconds:1
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12043738
executable:/opt/redis/redis-server
config_file:

# Clients
connected_clients:12
client_recent_max_input_buffer:4
client_recent_max_output_buffer:0
blocked_clients:1
tracking_clients:0

# Memory
used_memory:1265144
used_memory_human:1.21M
used_memory_rss:8908800
used_memory_rss_human:8.50M
used_memory_peak:1265144
used_memory_peak_human:1.21M
used_memory_peak_perc:100.30%
used_memory_overhead:1132866
used_memory_startup:830000
used_memory_dataset:132278
used_memory_dataset_perc:30.40%
allocator_allocated:2006696
allocator_active:2494464
allocator_resident:8622080
total_system_memory:135201345536
total_system_memory_human:125.92G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.24
allocator_frag_bytes:487768
allocator_rss_ratio:3.46
allocator_rss_bytes:6127616
rss_overhead_ratio:1.03
rss_overhead_bytes:286720
mem_fragmentation_ratio:7.06
mem_fragmentation_bytes:7647440
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:302066
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1589102041
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:12
total_commands_processed:179
instantaneous_ops_per_sec:171
total_net_input_bytes:4658
total_net_output_bytes:2627
instantaneous_input_kbps:2.71
instantaneous_output_kbps:1.59
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0

# Replication
role:master
connected_slaves:0
master_replid:8b81b8a668d033e7476928ccd377cccf4d4a5462
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.049448
used_cpu_user:0.019548
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000

# Modules
module:name=redisraft,ver=1,api=1,filters=1,usedby=[],using=[],options=[]

# Commandstats
cmdstat_config:calls=2,usec=29,usec_per_call=14.50
cmdstat_raft.requestvote:calls=1,usec=68,usec_per_call=68.00
cmdstat_raft.ae:calls=1,usec=50,usec_per_call=50.00
cmdstat_info:calls=1,usec=87,usec_per_call=87.00
cmdstat_raft:calls=174,usec=6280,usec_per_call=36.09

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=16,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=12 addr=192.168.122.1:48964 fd=18 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=raft user=default
id=32 addr=192.168.122.1:48986 fd=19 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=33 addr=192.168.122.1:48988 fd=20 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=raft user=default
id=60 addr=192.168.122.14:48210 fd=25 name= age=1 idle=0 flags=b db=0 sub=0 psub=0 multi=-1 qbuf=76 qbuf-free=32692 obl=0 oll=0 omem=0 events=r cmd=raft.ae user=default
id=7 addr=192.168.122.1:48950 fd=15 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=8 addr=192.168.122.1:48952 fd=16 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=raft user=default
id=61 addr=192.168.122.13:56556 fd=26 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=62 addr=192.168.122.12:59360 fd=27 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=63 addr=192.168.122.15:48520 fd=28 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=122 addr=192.168.122.1:49038 fd=29 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default
id=123 addr=192.168.122.1:49040 fd=30 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=raft user=default
id=11 addr=192.168.122.1:48962 fd=17 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL user=default

------ REGISTERS ------
171068:M 10 May 2020 02:14:02.542 # 
RAX:0000000000000000 RBX:0000000000000000
RCX:0000000000000363 RDX:00007f2d8b30fb80
RDI:0000000000000000 RSI:0000000000000000
RBP:00007f2d8ab7dee0 RSP:00007f2d8ab7dec8
R8 :00007f2d8b209f51 R9 :0000000000000000
R10:0000000000000000 R11:00007f2d8b7ec040
R12:0000000000000000 R13:00007f2d8ab7e080
R14:00007f2d8b30fe28 R15:000000000000035d
RIP:00007f2d8abbad20 EFL:0000000000010246
CSGSFS:002b000000000033
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded7) -> 00007f2d8ab7df80
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded6) -> 0000000000000363
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded5) -> 000000000000035d
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded4) -> 0000000000006423
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded3) -> 00007f2d8b30fb80
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded2) -> 0000000587e3a000
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded1) -> 00007f2d8b30da60
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7ded0) -> 00007f2d8b222180
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7decf) -> 000000000000035d
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7dece) -> 00007f2d8abb8290
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7decd) -> 00007f2d87e17000
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7decc) -> 00007f2d8abb28d8
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7decb) -> 00007f2d8ab7df40
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7deca) -> 00007f2d87e17000
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7dec9) -> 00007f2d8ab7e080
171068:M 10 May 2020 02:14:02.542 # (00007f2d8ab7dec8) -> 00007f2d8abbad5d

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
171068:M 10 May 2020 02:14:02.543 # Bio thread for job type #0 terminated
171068:M 10 May 2020 02:14:02.543 # Bio thread for job type #1 terminated
171068:M 10 May 2020 02:14:02.543 # Bio thread for job type #2 terminated
*** Preparing to test memory region 56466eaa3000 (2523136 bytes)
*** Preparing to test memory region 564670163000 (135168 bytes)
*** Preparing to test memory region 7f2d70000000 (135168 bytes)
*** Preparing to test memory region 7f2d78000000 (139264 bytes)
*** Preparing to test memory region 7f2d7c000000 (135168 bytes)
*** Preparing to test memory region 7f2d80000000 (135168 bytes)
*** Preparing to test memory region 7f2d8537c000 (8388608 bytes)
*** Preparing to test memory region 7f2d85b7d000 (8388608 bytes)
*** Preparing to test memory region 7f2d8637e000 (8388608 bytes)
*** Preparing to test memory region 7f2d86b7f000 (8388608 bytes)
*** Preparing to test memory region 7f2d8737f000 (2621440 bytes)
*** Preparing to test memory region 7f2d87600000 (8388608 bytes)
*** Preparing to test memory region 7f2d87e00000 (4194304 bytes)
*** Preparing to test memory region 7f2d88380000 (8388608 bytes)
*** Preparing to test memory region 7f2d88b81000 (8388608 bytes)
*** Preparing to test memory region 7f2d89382000 (8388608 bytes)
*** Preparing to test memory region 7f2d89b83000 (8388608 bytes)
*** Preparing to test memory region 7f2d8a384000 (8388608 bytes)
*** Preparing to test memory region 7f2d8adff000 (4096 bytes)
*** Preparing to test memory region 7f2d8ae00000 (8388608 bytes)
*** Preparing to test memory region 7f2d8ba3b000 (16384 bytes)
*** Preparing to test memory region 7f2d8bc58000 (16384 bytes)
*** Preparing to test memory region 7f2d8c57d000 (32768 bytes)
*** Preparing to test memory region 7f2d8c591000 (4096 bytes)

@aphyr
Copy link
Collaborator Author

aphyr commented May 10, 2020

Huh, and here's another, from raft_node_set_next_idx--maybe a different bug?

20200510T050141.000-0400.zip

------ STACK TRACE ------
EIP:
/opt/redis/redisraft.so(raft_node_set_next_idx+0x10)[0x7f7bfc9baca0]

Backtrace:
redis-server 0.0.0.0:6379(logStackTrace+0x37)[0x55e4282f1e57]
redis-server 0.0.0.0:6379(sigsegvHandler+0xb0)[0x55e4282f2580]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f7bfd96a0e0]
/opt/redis/redisraft.so(raft_node_set_next_idx+0x10)[0x7f7bfc9baca0]
/opt/redis/redisraft.so(+0x2a4a7)[0x7f7bfc9ae4a7]
/opt/redis/redisraft.so(redisProcessCallbacks+0xdf)[0x7f7bfc9bcc9f]
/opt/redis/redisraft.so(redisAsyncHandleRead+0x66)[0x7f7bfc9bd316]
/opt/redis/redisraft.so(+0x19a91)[0x7f7bfc99da91]
/opt/redis/redisraft.so(uv__io_poll+0x298)[0x7f7bfc9d6d88]
/opt/redis/redisraft.so(uv_run+0x115)[0x7f7bfc9c9c15]
/opt/redis/redisraft.so(+0x24737)[0x7f7bfc9a8737]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f7bfd9604a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f7bfd6a2d0f]

@aphyr
Copy link
Collaborator Author

aphyr commented Jun 15, 2020

This looks resolved as of e0123a9.

@aphyr aphyr closed this as completed Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant