Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH] Segfault starting up using flash and external module #821

Open
lukepalmer opened this issue Apr 10, 2024 · 4 comments
Open

[CRASH] Segfault starting up using flash and external module #821

lukepalmer opened this issue Apr 10, 2024 · 4 comments

Comments

@lukepalmer
Copy link
Contributor

lukepalmer commented Apr 10, 2024

Crash report

When using the RedisTimeSeries module, restoring data from flash results in a segmentation fault.

To reproduce:

  1. Start keydb --storage-provider flash --loadmodule redistimeseries.so
  2. Write some data with TS.ADD
  3. Shut down keydb
  4. Start keydb same as before
  5. Observe immediate segfault on startup

The wildly large number of bytes attempting to be allocated is both impossible and changes every time, so there's probably garbage getting in there somehow.

Works fine with flash off. Same thing happens on the async branch.

Happy to help debug if you'd have suggestions.

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
1029861:1029861:M 09 Apr 2024 23:11:03.669 # ------------------------------------------------
1029861:1029861:M 09 Apr 2024 23:11:03.669 # !!! Software Failure. Press left mouse button to continue
1029861:1029861:M 09 Apr 2024 23:11:03.669 # Guru Meditation: KeyDB aborting for OUT OF MEMORY. Allocating 3386706370027781980 bytes! #server.cpp:7104

------ STACK TRACE ------

Backtrace:
/usr/local/home/xxxxxxxx/keydb-server *:6379(redisOutOfMemoryHandler(unsigned long)+0x26) [0x6f5016]
/usr/local/home/xxxxxxxx/keydb-server *:6379(zmalloc_usable+0x2d) [0x6ee81d]
/usr/local/home/xxxxxxxx/keydb-server *:6379(_sdsnewlen+0x1a5) [0x6efa35]
/usr/local/home/xxxxxxxx/keydb-server *:6379(dupStringObject(redisObject const*)+0x89) [0x6d7db9]
/usr/local/home/xxxxxxxx/keydb-server *:6379(RM_CreateStringFromString(RedisModuleCtx*, redisObject const*)+0x10) [0x649550]
/usr/local/home/xxxxxxxx/redistimeseries.so(IndexMetricFromName+0x1b) [0x7ffff1470a0b]
/usr/local/home/xxxxxxxx/redistimeseries.so(NotifyCallback+0xe6) [0x7ffff1466d56]
/usr/local/home/xxxxxxxx/keydb-server *:6379(moduleNotifyKeyspaceEvent(int, char const*, redisObject*, int)+0xa8) [0x6406f8]
/usr/local/home/xxxxxxxx/keydb-server *:6379(moduleLoadCallback(char const*, unsigned long, void*)+0x43) [0x6d3ba3]
/usr/local/home/xxxxxxxx/keydb-server *:6379(RocksDBStorageFactory::create(int, void (*)(char const*, unsigned long, void*), void*)+0x241) [0x602851]
/usr/local/home/xxxxxxxx/keydb-server *:6379(redisDb::storageProviderInitialize()+0x9f) [0x6d43ef]
/usr/local/home/xxxxxxxx/keydb-server *:6379(InitServerLast()+0x38) [0x6fcd78]
/usr/local/home/xxxxxxxx/keydb-server *:6379(main+0x537) [0x5d7687]
/lib64/libc.so.6(__libc_start_main+0xe5) [0x7ffff4e6fd85]
/usr/local/home/xxxxxxxx/keydb-server *:6379(_start+0x2e) [0x5f71de]

------ INFO OUTPUT ------
# Server
redis_version:255.255.255
redis_git_sha1:603ebb27
redis_git_dirty:1
redis_build_id:7086df96b5071d74
redis_mode:standalone
os:Linux 4.18.0-513.18.1.el8_9.jane2.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.5.0
process_id:1029861
process_supervised:no
run_id:f1cfb6f2b2fd79d2f02a537405aa70a3775211b4
tcp_port:6379
server_time_usec:1712718663510859
uptime_in_seconds:0
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:1442631
executable:/usr/local/home/xxxxxxxx/keydb-server
config_file:
availability_zone:
features:cluster_mget

# Clients
connected_clients:0
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:0

# Memory
used_memory:3705888
used_memory_human:3.53M
used_memory_rss:0
used_memory_rss_human:0B
used_memory_peak:3705888
used_memory_peak_human:3.53M
used_memory_peak_perc:101.29%
used_memory_overhead:0
used_memory_startup:0
used_memory_dataset:3705888
used_memory_dataset_perc:100.00%
allocator_allocated:0
allocator_active:0
allocator_resident:0
total_system_memory:1068558966784
total_system_memory_human:995.17G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:30000000000
maxmemory_human:27.94G
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:-nan
allocator_frag_bytes:0
allocator_rss_ratio:-nan
allocator_rss_bytes:0
rss_overhead_ratio:-nan
rss_overhead_bytes:0
mem_fragmentation_ratio:-nan
mem_fragmentation_bytes:0
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:flash
available_system_memory:unavailable

# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1712718663
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
storage_flash_used_bytes:17843716542464
storage_flash_total_bytes:18394273284096
storage_flash_rocksdb_bytes:5379700319

# Stats
total_connections_received:0
total_commands_processed:2
instantaneous_ops_per_sec:0
total_net_input_bytes:0
total_net_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:267
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:0
total_writes_processed:0
instantaneous_lock_contention:1
avg_lock_contention:0.000000
storage_provider_read_hits:0
storage_provider_read_misses:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:5b48a46c63cc6c12f813f6c1a23be68f801c089f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.075850
used_cpu_user:0.099823
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:1
long_lock_waits:0
used_cpu_sys_main_thread:0.063032
used_cpu_user_main_thread:0.097057

# Modules
module:name=timeseries,ver=999999,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]

# Commandstats
cmdstat_info:calls=2,usec=15,usec_per_call=7.50,rejected_calls=0,failed_calls=0

# Errorstats

# Cluster
cluster_enabled:0

# Keyspace

# KeyDB
mvcc_depth:0

------ CLIENT LIST OUTPUT ------

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
1029861:1029861:M 09 Apr 2024 23:11:03.670 # KeyDB 255.255.255 crashed by signal: 11, si_code: 1
1029861:1029861:M 09 Apr 2024 23:11:03.670 # Accessing address: 0x2d0
1029861:1029861:M 09 Apr 2024 23:11:03.670 # Crashed running the instruction at: 0x7ffff5209b74

------ STACK TRACE ------
EIP:
/lib64/libpthread.so.0(pthread_cancel+0x4) [0x7ffff5209b74]

Backtrace:
/lib64/libpthread.so.0(+0x12d10) [0x7ffff520cd10]
/lib64/libpthread.so.0(pthread_cancel+0x4) [0x7ffff5209b74]
/usr/local/home/xxxxxxxx/keydb-server *:6379() [0x6792ac]
/usr/local/home/xxxxxxxx/keydb-server *:6379(doFastMemoryTest()+0x38) [0x679a78]
/usr/local/home/xxxxxxxx/keydb-server *:6379(_serverPanic+0x161) [0x679f41]
/usr/local/home/xxxxxxxx/keydb-server *:6379(redisOutOfMemoryHandler(unsigned long)+0x26) [0x6f5016]
/usr/local/home/xxxxxxxx/keydb-server *:6379(zmalloc_usable+0x2d) [0x6ee81d]
/usr/local/home/xxxxxxxx/keydb-server *:6379(_sdsnewlen+0x1a5) [0x6efa35]
/usr/local/home/xxxxxxxx/keydb-server *:6379(dupStringObject(redisObject const*)+0x89) [0x6d7db9]
/usr/local/home/xxxxxxxx/keydb-server *:6379(RM_CreateStringFromString(RedisModuleCtx*, redisObject const*)+0x10) [0x649550]
/usr/local/home/xxxxxxxx/redistimeseries.so(IndexMetricFromName+0x1b) [0x7ffff1470a0b]
/usr/local/home/xxxxxxxx/redistimeseries.so(NotifyCallback+0xe6) [0x7ffff1466d56]
/usr/local/home/xxxxxxxx/keydb-server *:6379(moduleNotifyKeyspaceEvent(int, char const*, redisObject*, int)+0xa8) [0x6406f8]
/usr/local/home/xxxxxxxx/keydb-server *:6379(moduleLoadCallback(char const*, unsigned long, void*)+0x43) [0x6d3ba3]
/usr/local/home/xxxxxxxx/keydb-server *:6379(RocksDBStorageFactory::create(int, void (*)(char const*, unsigned long, void*), void*)+0x241) [0x602851]
/usr/local/home/xxxxxxxx/keydb-server *:6379(redisDb::storageProviderInitialize()+0x9f) [0x6d43ef]
/usr/local/home/xxxxxxxx/keydb-server *:6379(InitServerLast()+0x38) [0x6fcd78]
/usr/local/home/xxxxxxxx/keydb-server *:6379(main+0x537) [0x5d7687]
/lib64/libc.so.6(__libc_start_main+0xe5) [0x7ffff4e6fd85]
/usr/local/home/xxxxxxxx/keydb-server *:6379(_start+0x2e) [0x5f71de]

------ REGISTERS ------
1029861:1029861:M 09 Apr 2024 23:11:03.672 # 
RAX:0000000000000000 RBX:0000000000000000
RCX:0000000000000b40 RDX:0000000000d24940
RDI:0000000000000000 RSI:00007ffff51f6860
RBP:00007ffff7faf680 RSP:00007fffffffcb08
R8 :00007ffff51f6860 R9 :00007ffff7faf680
R10:ffffffffffffff84 R11:00007ffff5209b70
R12:0000000000000004 R13:00007fffdef46512
R14:2effff80000eff5c R15:0000000000000021
RIP:00007ffff5209b74 EFL:0000000000010202
CSGSFS:002b000000000033
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb17) -> 61636f6c6c41202e
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb16) -> 59524f4d454d2046
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb15) -> 4f2054554f20726f
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb14) -> 6620676e6974726f
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb13) -> 626120424479654b
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb12) -> 0000000000001000
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb11) -> 00007fffffffcc60
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb10) -> 00007fffffffcd30
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb0f) -> 0000003000000020
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb0e) -> 0000000000679f41
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb0d) -> 0000000000000004
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb0c) -> 0000000000679a78
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb0b) -> 0000000000001bc0
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb0a) -> 0000000000bd2dd2
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb09) -> 00007fffdef46512
1029861:1029861:M 09 Apr 2024 23:11:03.672 # (00007fffffffcb08) -> 00000000006792ac

------ INFO OUTPUT ------
# Server
redis_version:255.255.255
redis_git_sha1:603ebb27
redis_git_dirty:1
redis_build_id:7086df96b5071d74
redis_mode:standalone
os:Linux 4.18.0-513.18.1.el8_9.jane2.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.5.0
process_id:1029861
process_supervised:no
run_id:f1cfb6f2b2fd79d2f02a537405aa70a3775211b4
tcp_port:6379
server_time_usec:1712718663510859
uptime_in_seconds:0
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:1442631
executable:/usr/local/home/xxxxxxx/keydb-server
config_file:
availability_zone:
features:cluster_mget

# Clients
connected_clients:0
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:0

# Memory
used_memory:3705664
used_memory_human:3.53M
used_memory_rss:0
used_memory_rss_human:0B
used_memory_peak:3705888
used_memory_peak_human:3.53M
used_memory_peak_perc:99.99%
used_memory_overhead:0
used_memory_startup:0
used_memory_dataset:3705664
used_memory_dataset_perc:100.00%
allocator_allocated:0
allocator_active:0
allocator_resident:0
total_system_memory:1068558966784
total_system_memory_human:995.17G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:30000000000
maxmemory_human:27.94G
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:-nan
allocator_frag_bytes:0
allocator_rss_ratio:-nan
allocator_rss_bytes:0
rss_overhead_ratio:-nan
rss_overhead_bytes:0
mem_fragmentation_ratio:-nan
mem_fragmentation_bytes:0
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:flash
available_system_memory:unavailable

# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1712718663
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
storage_flash_used_bytes:17843782774784
storage_flash_total_bytes:18394273284096
storage_flash_rocksdb_bytes:5379700319

# Stats
total_connections_received:0
total_commands_processed:2
instantaneous_ops_per_sec:0
total_net_input_bytes:0
total_net_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:267
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:0
total_writes_processed:0
instantaneous_lock_contention:1
avg_lock_contention:0.000000
storage_provider_read_hits:0
storage_provider_read_misses:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:5b48a46c63cc6c12f813f6c1a23be68f801c089f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.077773
used_cpu_user:0.101721
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:1
long_lock_waits:0
used_cpu_sys_main_thread:0.063032
used_cpu_user_main_thread:0.098880

# Modules
module:name=timeseries,ver=999999,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]

# Commandstats
cmdstat_info:calls=2,usec=15,usec_per_call=7.50,rejected_calls=0,failed_calls=0

# Errorstats

# Cluster
cluster_enabled:0

# Keyspace

# KeyDB
mvcc_depth:0

------ CLIENT LIST OUTPUT ------

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
Segmentation fault (core dumped)

Aditional information

  1. OS distribution and version
  2. Steps to reproduce (if any)
@lukepalmer lukepalmer changed the title [CRASH] [CRASH] Segfault starting up using flash and external module Apr 10, 2024
@keithchew
Copy link

keithchew commented Apr 12, 2024

Hi @lukepalmer

I recall getting this error a while back, but cannot remember the exact fix I did on my local.

In moduleLoadCallback() db.cpp, can you try replacing:

    initStaticStringObject(keyobj, const_cast<char *>(rgchKey));

with

    initStaticStringObject(keyobj, (char *)rgchKey);

@lukepalmer
Copy link
Contributor Author

Thanks @keithchew!
Unfortunately no difference with that change (and I'd have been really curious why if that worked). Let me know if you have other ideas and I'm happy to dig a bit.

@keithchew
Copy link

Yeah, apologies, it must have been another change or bug. The last time I was testing KeyDB key notifications to modules was here:

#708
#536

Back then, I made numerous hacks on my local copy and got things running smoothly.

@keithchew
Copy link

keithchew commented Apr 13, 2024

Ah, it might be thing one I was thinking about:
#742

Just looking at my notes, even though I found the cause and fix in the link above, it appears that I encountered subsequent problems with KeyDB and modules on startup. I ended up not using key_load_itr in the creation of the storage factory, but created a new method in server.cpp to perform key loading notification after everything has been initialised.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants