Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Crash when booting up (Segmentation Fault) #1176

Closed
namirsab opened this issue May 24, 2022 · 32 comments · Fixed by #1185
Closed

Unexpected Crash when booting up (Segmentation Fault) #1176

namirsab opened this issue May 24, 2022 · 32 comments · Fixed by #1185

Comments

@namirsab
Copy link

namirsab commented May 24, 2022

Redis Version: 6.2.6
RedisTimeSeries version: 1.6.7

After a crash, we tried bringing redis up and it keeps crashing in RedisModuleCommandDispatcher and the only module we have is TimeSeries, hence I think this bug belongs here.

The trace:


=== REDIS BUG REPORT START: Cut & paste starting from here ===

1:M 24 May 2022  # Redis 6.2.6 crashed by signal: 11, si_code: 1

1:M 24 May 2022  # Accessing address: 0x18d121284

1:M 24 May 2022  # Crashed running the instruction at: 0x5641cd3e6531


redis-server *:6379(loadAppendOnlyFile+0x639)[0x5641cd3aa419]

redis-server *:6379(loadDataFromDisk+0x13c)[0x5641cd3603fc]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fec2d23f09b]


1:M 24 May 2022  # 

RCX:00007fcf80000000 RDX:0000000000000100

R8 :0000000000000001 R9 :00007fec2ce064f8

R12:00007fec2cf43b80 R13:00007fec2ce299a0


------ STACK TRACE ------

EIP:

redis-server *:6379(RedisModuleCommandDispatcher+0x81)[0x5641cd3e6531]


Backtrace:

/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fec2d3f0730]

redis-server *:6379(RedisModuleCommandDispatcher+0x81)[0x5641cd3e6531]

redis-server *:6379(call+0xa1)[0x5641cd35a1e1]

redis-server *:6379(execCommand+0x1f4)[0x5641cd3ac524]

redis-server *:6379(main+0x2b8)[0x5641cd34f698]

redis-server *:6379(_start+0x2a)[0x5641cd34fbea]

------ REGISTERS ------

RAX:00007fd6e8c1dde0 RBX:0000000000000000

RDI:000000018d121280 RSI:00007fec2d000900

RBP:00007ffd73c1ca70 RSP:00007ffd73c1c9e0

R10:00007fec2d217898 R11:0000000000000008

R14:00007fec2cf43b80 R15:0000033acf9a5c7e

RIP:00005641cd3e6531 EFL:0000000000010246

1:M 24 May 2022  # (00007ffd73c1c9ea) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e8) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e6) -> 0000000000000041

1:M 24 May 2022  # (00007ffd73c1c9e4) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e1) -> 00007fec2ce0d580

1:M 24 May 2022  # (00007ffd73c1c9e0) -> 00005641cd3de8f0


------ INFO OUTPUT ------

# Server

redis_version:6.2.6

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:c97a7d6fe7316bcd

redis_mode:standalone

os:Linux 5.4.0-90-generic x86_64

arch_bits:64

multiplexing_api:epoll

atomicvar_api:c11-builtin

gcc_version:8.3.0

process_id:1

process_supervised:no

run_id:751c6b795f6a5451a824c34d3ec38f1260778248

tcp_port:6379

server_time_usec:1653413017240336

uptime_in_seconds:647

uptime_in_days:0

hz:10

configured_hz:10

lru_clock:9245202

executable:/redis-server

config_file:

io_threads_active:0


# Clients

connected_clients:2

cluster_connections:0

maxclients:10000

client_recent_max_input_buffer:0

client_recent_max_output_buffer:0

blocked_clients:0

tracking_clients:0

clients_in_timeout_table:0


# Memory

used_memory:125409205256

used_memory_human:116.80G

used_memory_rss:128869486592

used_memory_rss_human:120.02G

used_memory_peak:125409209352

used_memory_peak_human:116.80G

used_memory_peak_perc:100.00%

used_memory_overhead:119403240

used_memory_startup:881136

used_memory_dataset:125289802016

allocator_allocated:125409116928

total_system_memory:270163423232

used_memory_scripts_human:0B

number_of_cached_scripts:0

maxmemory_human:0B

maxmemory_policy:noeviction

allocator_frag_bytes:2952448

allocator_rss_ratio:1.03

allocator_rss_bytes:3461402624

rss_overhead_ratio:1.00

rss_overhead_bytes:-3985408

mem_fragmentation_ratio:1.03

mem_fragmentation_bytes:3460408464

mem_not_counted_for_evict:0

mem_replication_backlog:0

mem_clients_slaves:0

mem_clients_normal:0

mem_aof_buffer:0

mem_allocator:jemalloc-5.1.0

active_defrag_running:0

lazyfree_pending_objects:0

lazyfreed_objects:0


# Persistence

loading:1

current_cow_size:0

current_cow_size_age:0

current_fork_perc:0.00

current_save_keys_processed:0

current_save_keys_total:0

rdb_changes_since_last_save:8530222

rdb_bgsave_in_progress:0

rdb_last_save_time:1653412370

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:-1

rdb_current_bgsave_time_sec:-1

rdb_last_cow_size:0

aof_enabled:0

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:-1

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_last_cow_size:0

module_fork_in_progress:0

module_fork_last_cow_size:0

aof_current_size:0

aof_base_size:0

aof_pending_rewrite:0

aof_buffer_length:0

aof_rewrite_buffer_length:0

loading_start_time:1653412370

loading_loaded_bytes:105470092522

# Stats

total_connections_received:768

total_commands_processed:3287083

instantaneous_ops_per_sec:0

total_net_input_bytes:260281

total_net_output_bytes:519910

instantaneous_input_kbps:0.00

instantaneous_output_kbps:0.00

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:0

expired_stale_perc:0.00

expired_time_cap_reached_count:0

expire_cycle_cpu_milliseconds:0

evicted_keys:0

keyspace_hits:2124177

CSGSFS:002b000000000033

1:M 24 May 2022  # (00007ffd73c1c9ef) -> 00005641cd40a6a6

1:M 24 May 2022  # (00007ffd73c1c9ee) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9ed) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9ec) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9eb) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e9) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e7) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e5) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e3) -> 0000000000000000

1:M 24 May 2022  # (00007ffd73c1c9e2) -> 00007fec2cf43b80

used_memory_dataset_perc:99.91%

allocator_active:125412069376

allocator_resident:128873472000

total_system_memory_human:251.61G

used_memory_lua:37888

used_memory_lua_human:37.00K

used_memory_scripts:0

maxmemory:0

allocator_frag_ratio:1.00

aof_pending_bio_fsync:0

aof_delayed_fsync:0

loading_total_bytes:105470266717

loading_rdb_used_mem:125333379408

loading_loaded_perc:100.00

loading_eta_seconds:0


keyspace_misses:0

pubsub_channels:1

pubsub_patterns:0

latest_fork_usec:0

total_forks:0

migrate_cached_sockets:0

slave_expires_tracked_keys:0

active_defrag_hits:0

active_defrag_misses:0

active_defrag_key_hits:0

active_defrag_key_misses:0

tracking_total_keys:0

tracking_total_items:0

tracking_total_prefixes:0

unexpected_error_replies:2480

total_error_replies:4484

dump_payload_sanitizations:0

total_reads_processed:3872

total_writes_processed:3418

io_threaded_reads_processed:0

io_threaded_writes_processed:0


# Replication

connected_slaves:0

master_replid:9d7b455337bbd82c0b989626a8463cc363d6d4df

master_replid2:0000000000000000000000000000000000000000

master_repl_offset:0

second_repl_offset:-1

repl_backlog_active:0

repl_backlog_size:1048576

repl_backlog_first_byte_offset:0

repl_backlog_histlen:0


# CPU

used_cpu_sys:71.949834

used_cpu_user:566.245590

used_cpu_sys_children:0.010117

used_cpu_user_children:0.048656

used_cpu_user_main_thread:566.240511

# Modules

module:name=timeseries,ver=10607,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]


# Commandstats

cmdstat_subscribe:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0

cmdstat_psync:calls=0,usec=0,usec_per_call=0.00,rejected_calls=629,failed_calls=0

cmdstat_client:calls=2,usec=3,usec_per_call=1.50,rejected_calls=0,failed_calls=0

cmdstat_publish:calls=389,usec=1286,usec_per_call=3.31,rejected_calls=0,failed_calls=0

cmdstat_replconf:calls=1258,usec=686,usec_per_call=0.55,rejected_calls=0,failed_calls=0

cmdstat_ts.madd:calls=0,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=502


# Errorstats

errorstat_ERR:count=2480

errorstat_LOADING:count=1375

errorstat_NOAUTH:count=629


*** Preparing to test memory region 7fec2bdee000 (8388608 bytes)

*** Preparing to test memory region 7fec2c788000 (16384 bytes)

*** Preparing to test memory region 7fec2c9ff000 (4096 bytes)

*** Preparing to test memory region 7fec2d217000 (16384 bytes)

*** Preparing to test memory region 7fec2d3d8000 (24576 bytes)

*** Preparing to test memory region 7fec2d3fb000 (16384 bytes)

*** Preparing to test memory region 7fec2d90c000 (8192 bytes)

role:master

master_failover_state:no-failover

used_cpu_sys_main_thread:71.949188


cmdstat_auth:calls=768,usec=3083,usec_per_call=4.01,rejected_calls=0,failed_calls=0

cmdstat_info:calls=79,usec=3805,usec_per_call=48.16,rejected_calls=0,failed_calls=0

cmdstat_ping:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1375,failed_calls=0

# Cluster

cluster_enabled:0


# Keyspace

db0:keys=2124191,expires=0,avg_ttl=0


------ CLIENT LIST OUTPUT ------

id=7 addr=100.64.19.20:44336 laddr=100.64.19.20:6379 fd=14 name=sentinel-7e6731eb-cmd age=646 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61448 events=r cmd=ping user=default redir=-1

id=8 addr=100.64.19.20:44338 laddr=100.64.19.20:6379 fd=15 name=sentinel-7e6731eb-pubsub age=646 idle=1 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61456 events=r cmd=subscribe user=default redir=-1


------ MODULES INFO OUTPUT ------


------ FAST MEMORY TEST ------

1:M 24 May 2022  # Bio thread for job type #0 terminated

1:M 24 May 2022  # Bio thread for job type #1 terminated

1:M 24 May 2022  # Bio thread for job type #2 terminated

*** Preparing to test memory region 5641cd511000 (2277376 bytes)

*** Preparing to test memory region 5641cecc6000 (405504 bytes)

*** Preparing to test memory region 7fcc3fa00000 (136828682240 bytes)

*** Preparing to test memory region 7fec1b5e6000 (218103808 bytes)

*** Preparing to test memory region 7fec285e7000 (8388608 bytes)

*** Preparing to test memory region 7fec28de8000 (8388608 bytes)

*** Preparing to test memory region 7fec295e9000 (8388608 bytes)

*** Preparing to test memory region 7fec29dea000 (8388608 bytes)

*** Preparing to test memory region 7fec2a5eb000 (8388608 bytes)

*** Preparing to test memory region 7fec2adec000 (8388608 bytes)

*** Preparing to test memory region 7fec2b5ed000 (8388608 bytes)

*** Preparing to test memory region 7fec2ca00000 (8388608 bytes)

*** Preparing to test memory region 7fec2d6e4000 (16384 bytes)

*** Preparing to test memory region 7fec2d93a000 (4096 bytes)

1:X 24 May 2022  # +sdown master ampmasters 100.64.19.20 6379

.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O

Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.


------ DUMPING CODE AROUND EIP ------

Symbol: RedisModuleCommandDispatcher (base: 0x5641cd3e64b0)

Module: redis-server *:6379 (base 0x5641cd30c000)

$ xxd -r -p /tmp/dump.hex /tmp/dump.bin

$ objdump --adjust-vma=0x5641cd3e64b0 -D -b binary -m i386:x86-64 /tmp/dump.bin

------

1:M 24 May 2022  # dump of function (hexdump of 257 bytes):

55b90c0000004889e541544989fc53488d9d70ffffff4883c480488b4770488d7d88418b542448498b7424504c8b402831c0f348ab488d050484ffff4889dfc745a04000000048898570ffffff498b004c89658048898578ffffff41ff50084889dfe8e9feffff418b44244885c07e2731db660f1f440000498b442450488b3cd8837f04017e05e8e41ff9ff4883c30141395c24487fe14883ec805b415c5dc35531c04889e5415741564189d641554989f5be0001000041544c8da550ffffff534889cbb90d0000004881ec880000004c8b7f28488dbd60fffffff348ab488d055b83ffff4889df48898550ffffff498b07c745800200000048898558ffffffe8

Function at 0x5641cd3e6400 is moduleFreeContext

Function at 0x5641cd378520 is trimStringObjectIfNeeded


=== REDIS BUG REPORT END. Make sure to include from START to END. ===

We tried fixing the AOF with the redis-check-aof tool, but it didn't work out.

Finally we had to restore from a backup, unfortunately.

Any clue why this happened?

@namirsab namirsab changed the title Unexpected Crash when booting up Unexpected Crash when booting up (Segmentation Fault) May 24, 2022
@OfirMos
Copy link
Contributor

OfirMos commented May 24, 2022

Hi @namirsab
Can you pls share the command or sequence of commands that caused the crash?

@namirsab
Copy link
Author

Hey @OfirMos
This crash happens when booting.

I could show you the crash that happened before:

=== REDIS BUG REPORT START: Cut & paste starting from here ===

1:S 24 May 2022  # Redis 6.2.6 crashed by signal: 11, si_code: 128

1:S 24 May 2022  # Accessing address: (nil)

1:S 24 May 2022  # Crashed running the instruction at: 0x5563dbd0947a


------ STACK TRACE ------

EIP:

redis-server *:6379(ACLCheckAllPerm+0x1a)[0x5563dbd0947a]


Backtrace:

/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f005bb53730]

redis-server *:6379(ACLCheckAllPerm+0x1a)[0x5563dbd0947a]

redis-server *:6379(execCommand+0x1d2)[0x5563dbcb2502]

redis-server *:6379(call+0xa1)[0x5563dbc601e1]

redis-server *:6379(processCommand+0x644)[0x5563dbc61e14]

redis-server *:6379(processCommandAndResetClient+0x1d)[0x5563dbc75d1d]

redis-server *:6379(processInputBuffer+0xea)[0x5563dbc7854a]

redis-server *:6379(+0xfbf21)[0x5563dbd0df21]

redis-server *:6379(aeProcessEvents+0x2a5)[0x5563dbc58f45]

redis-server *:6379(aeMain+0x25)[0x5563dbc591b5]

redis-server *:6379(main+0x31d)[0x5563dbc556fd]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f005b9a209b]

redis-server *:6379(_start+0x2a)[0x5563dbc55bea]


------ REGISTERS ------

1:S 24 May 2022  # 

RAX:0000000000000000 RBX:00007ee3c96c5780

RCX:000002000000000a RDX:0043455845010404

RDI:00007ee3c96c5780 RSI:00007ffd3abcd60c

RBP:00007ffd3abcd5c0 RSP:00007ffd3abcd5b0

R8 :00007ffd3abfa090 R9 :00007ffd3abcd638

R10:00007ffd3abcd630 R11:00000000003625f3

R12:00007ffd3abcd60c R13:00007ee3c96c5780

R14:0000000000000000 R15:0000033a3c903a68

RIP:00005563dbd0947a EFL:0000000000010246

CSGSFS:002b000000000033

1:S 24 May 2022  # (00007ffd3abcd5bf) -> 00005563dbe0e4c8

1:S 24 May 2022  # (00007ffd3abcd5be) -> 000000000000000a

1:S 24 May 2022  # (00007ffd3abcd5bd) -> 000000000000000f

1:S 24 May 2022  # (00007ffd3abcd5bc) -> 00007ef0b5c52a68

1:S 24 May 2022  # (00007ffd3abcd5bb) -> 00007f005ba84ff6

1:S 24 May 2022  # (00007ffd3abcd5ba) -> 00007ffd3abcd630

1:S 24 May 2022  # (00007ffd3abcd5b9) -> 000000000000000a

1:S 24 May 2022  # (00007ffd3abcd5b8) -> 00007f005b44a321

1:S 24 May 2022  # (00007ffd3abcd5b7) -> 00007ee3d6bc4108

1:S 24 May 2022  # (00007ffd3abcd5b6) -> 000000013abcd600

1:S 24 May 2022  # (00007ffd3abcd5b5) -> 00005563dbe0e4c8

1:S 24 May 2022  # (00007ffd3abcd5b4) -> 0000000000000001

1:S 24 May 2022  # (00007ffd3abcd5b3) -> 00005563dbcb2502

1:S 24 May 2022  # (00007ffd3abcd5b2) -> 00007ffd3abcd640

1:S 24 May 2022  # (00007ffd3abcd5b1) -> 00007ffd3abcd60c

1:S 24 May 2022  # (00007ffd3abcd5b0) -> 0000000000000000


------ INFO OUTPUT ------

# Server

redis_version:6.2.6

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:c97a7d6fe7316bcd

redis_mode:standalone

os:Linux 5.4.0-90-generic x86_64

arch_bits:64

multiplexing_api:epoll

atomicvar_api:c11-builtin

gcc_version:8.3.0

process_id:1

process_supervised:no

run_id:3ab0c37104d212eb3c43a9b51f30bc0c3dfd6529

tcp_port:6379

server_time_usec:1653410548602698

uptime_in_seconds:23428

uptime_in_days:0

hz:10

configured_hz:10

lru_clock:9243380

executable:/redis-server

config_file:

io_threads_active:0


# Clients

connected_clients:12

cluster_connections:0

maxclients:10000

client_recent_max_input_buffer:48

client_recent_max_output_buffer:0

blocked_clients:0

tracking_clients:0

clients_in_timeout_table:0


# Memory

used_memory:125410786584

used_memory_human:116.80G

used_memory_rss:128877752320

used_memory_rss_human:120.03G

used_memory_peak:125416641424

used_memory_peak_human:116.80G

used_memory_peak_perc:100.00%

used_memory_overhead:120701504

used_memory_startup:881208

used_memory_dataset:125290085080

used_memory_dataset_perc:99.90%

allocator_allocated:125411004728

allocator_active:125414772736

allocator_resident:128881516544

total_system_memory:270163410944

total_system_memory_human:251.61G

used_memory_lua:37888

used_memory_lua_human:37.00K

used_memory_scripts:0

used_memory_scripts_human:0B

number_of_cached_scripts:0

maxmemory:0

maxmemory_human:0B

maxmemory_policy:noeviction

allocator_frag_ratio:1.00

allocator_frag_bytes:3768008

allocator_rss_ratio:1.03

allocator_rss_bytes:3466743808

rss_overhead_ratio:1.00

rss_overhead_bytes:-3764224

mem_fragmentation_ratio:1.03

mem_fragmentation_bytes:3466921504

mem_not_counted_for_evict:3578

mem_replication_backlog:1048576

mem_clients_slaves:0

mem_clients_normal:246032

mem_aof_buffer:3584

mem_allocator:jemalloc-5.1.0

active_defrag_running:0

lazyfree_pending_objects:0

lazyfreed_objects:0


# Persistence

loading:0

current_cow_size:0

current_cow_size_age:0

current_fork_perc:0.00

current_save_keys_processed:0

current_save_keys_total:0

rdb_changes_since_last_save:37665789

rdb_bgsave_in_progress:0

rdb_last_save_time:1653387120

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:-1

rdb_current_bgsave_time_sec:-1

rdb_last_cow_size:0

aof_enabled:1

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:1331

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_last_cow_size:3269165056

module_fork_in_progress:0

module_fork_last_cow_size:0

aof_current_size:108360172205

aof_base_size:104490092022

aof_pending_rewrite:0

aof_buffer_length:0

aof_rewrite_buffer_length:0

aof_pending_bio_fsync:0

aof_delayed_fsync:0


# Stats

total_connections_received:153669

total_commands_processed:42939108

instantaneous_ops_per_sec:5401

total_net_input_bytes:108397403576

total_net_output_bytes:5943015556

instantaneous_input_kbps:321.33

instantaneous_output_kbps:1003.78

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:0

expired_stale_perc:0.00

expired_time_cap_reached_count:0

expire_cycle_cpu_milliseconds:0

evicted_keys:0

keyspace_hits:4388557

keyspace_misses:15

pubsub_channels:1

pubsub_patterns:0

latest_fork_usec:1660219

total_forks:1

migrate_cached_sockets:0

slave_expires_tracked_keys:0

active_defrag_hits:0

active_defrag_misses:0

active_defrag_key_hits:0

active_defrag_key_misses:0

tracking_total_keys:0

tracking_total_items:0

tracking_total_prefixes:0

unexpected_error_replies:7916

total_error_replies:31313

dump_payload_sanitizations:0

total_reads_processed:9060089

total_writes_processed:693168

io_threaded_reads_processed:0

io_threaded_writes_processed:0


# Replication

role:slave

master_host:100.64.19.17

master_port:6379

master_link_status:up

master_last_io_seconds_ago:0

master_sync_in_progress:0

slave_read_repl_offset:422289938704

slave_repl_offset:422289938514

slave_priority:100

slave_read_only:1

replica_announced:1

connected_slaves:0

master_failover_state:no-failover

master_replid:93839d9a9a85c2da2941c80f9851df552953c141

master_replid2:0000000000000000000000000000000000000000

master_repl_offset:422289938514

second_repl_offset:-1

repl_backlog_active:1

repl_backlog_size:1048576

repl_backlog_first_byte_offset:422288889939

repl_backlog_histlen:1048576


# CPU

used_cpu_sys:1286.525431

used_cpu_user:2908.672416

used_cpu_sys_children:151.054667

used_cpu_user_children:878.864505

used_cpu_sys_main_thread:1256.781258

used_cpu_user_main_thread:2896.467580


# Modules

module:name=timeseries,ver=10607,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]


# Commandstats

cmdstat_hset:calls=2335,usec=6154,usec_per_call=2.64,rejected_calls=0,failed_calls=0

cmdstat_subscribe:calls=9,usec=20,usec_per_call=2.22,rejected_calls=0,failed_calls=0

cmdstat_exec:calls=8885418,usec=1348527169,usec_per_call=151.77,rejected_calls=0,failed_calls=0

cmdstat_exists:calls=4,usec=13,usec_per_call=3.25,rejected_calls=0,failed_calls=0

cmdstat_sadd:calls=516,usec=4234,usec_per_call=8.21,rejected_calls=0,failed_calls=0

cmdstat_ts.alter:calls=21,usec=1169,usec_per_call=55.67,rejected_calls=0,failed_calls=0

cmdstat_ts.mrange:calls=65,usec=2888,usec_per_call=44.43,rejected_calls=0,failed_calls=1

cmdstat_ts.del:calls=129,usec=31977,usec_per_call=247.88,rejected_calls=0,failed_calls=0

cmdstat_hdel:calls=11,usec=112,usec_per_call=10.18,rejected_calls=0,failed_calls=0

cmdstat_ts.add:calls=13696597,usec=92502460,usec_per_call=6.75,rejected_calls=0,failed_calls=0

cmdstat_del:calls=35,usec=468,usec_per_call=13.37,rejected_calls=1,failed_calls=0

cmdstat_ts.madd:calls=8885418,usec=1281330472,usec_per_call=144.21,rejected_calls=0,failed_calls=1594

cmdstat_type:calls=15513,usec=18115,usec_per_call=1.17,rejected_calls=0,failed_calls=0

cmdstat_ts.info:calls=1,usec=32,usec_per_call=32.00,rejected_calls=0,failed_calls=0

cmdstat_dump:calls=126999,usec=92336703,usec_per_call=727.07,rejected_calls=21953,failed_calls=0

cmdstat_publish:calls=67936,usec=479671,usec_per_call=7.06,rejected_calls=0,failed_calls=0

cmdstat_info:calls=14914,usec=1663549,usec_per_call=111.54,rejected_calls=0,failed_calls=0

cmdstat_multi:calls=8885419,usec=3235181,usec_per_call=0.36,rejected_calls=0,failed_calls=0

cmdstat_hmset:calls=1247,usec=470847,usec_per_call=377.58,rejected_calls=0,failed_calls=0

cmdstat_client:calls=16,usec=12,usec_per_call=0.75,rejected_calls=0,failed_calls=0

cmdstat_set:calls=66,usec=816,usec_per_call=12.36,rejected_calls=0,failed_calls=0

cmdstat_auth:calls=153669,usec=1185969,usec_per_call=7.72,rejected_calls=0,failed_calls=0

cmdstat_select:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0

cmdstat_hgetall:calls=2,usec=15,usec_per_call=7.50,rejected_calls=0,failed_calls=0

cmdstat_keys:calls=2,usec=2211178,usec_per_call=1105589.00,rejected_calls=0,failed_calls=0

cmdstat_srem:calls=96,usec=1140,usec_per_call=11.88,rejected_calls=0,failed_calls=0

cmdstat_ping:calls=71551,usec=79191,usec_per_call=1.11,rejected_calls=1431,failed_calls=0

cmdstat_scan:calls=31,usec=30875,usec_per_call=995.97,rejected_calls=0,failed_calls=0

cmdstat_ts.range:calls=40,usec=24952,usec_per_call=623.80,rejected_calls=0,failed_calls=4

cmdstat_ts.create:calls=36,usec=1087,usec_per_call=30.19,rejected_calls=1,failed_calls=0


# Errorstats

errorstat_ERR:count=29880

errorstat_LOADING:count=1431

errorstat_READONLY:count=2


# Cluster

cluster_enabled:0


# Keyspace

db0:keys=2124191,expires=0,avg_ttl=0

@namirsab
Copy link
Author

But regarding commands, we have the feeling that the problem might be related to TS.DEL

We called TS.DEL on a non existing time series, that's for sure. This normally shouldn't cause any isseus, but my feeling is that if you call TS.DEL with a big enough time range, it might crash redis.

I couldn't reproduce this though, as we are right now dealing with the original problem: our database doesn't boot up

@namirsab
Copy link
Author

In case this helps, this also happens a lot when reading the AOF. Even though this happened before and nothing crashed:

1:M 24 May 2022 19:24:05.517 # == CRITICAL == This server is sending an error to its AOF-loading-client: '-ERR TSDB: the key is not a TSDB key' after processing the command 'exec'

@OfirMos
Copy link
Contributor

OfirMos commented May 24, 2022

@namirsab Is it possible for you to send me the AOF or some other reproduction script?

@namirsab
Copy link
Author

We could, but the AOF is quite big, 100GB. Our database is quite big unfortunately.

@OfirMos
Copy link
Contributor

OfirMos commented May 24, 2022

@namirsab So maybe a small reproduction script

@namirsab
Copy link
Author

namirsab commented May 24, 2022

As said, I cannot reproduce it, I just told you all the information I have. Sorry :(

@OfirMos
Copy link
Contributor

OfirMos commented May 24, 2022

namirsab is it cluster or single shard(node) redis?
And can you send me the redis logs pls?
The logs should be in: /var/log/redis/
If you can send me the folder it will be great.

@namirsab
Copy link
Author

@OfirMos is a single node using sentinel, with 3 replicas.

Unfortuntately there are not more logs. Before the crashes, the ony logs we had were similar to this one:

1:M 24 May 2022 19:24:05.517 # == CRITICAL == This server is sending an error to its AOF-loading-client: '-ERR TSDB: the key is not a TSDB key' after processing the command 'exec'

Also I see other previous crashes, which I don't know if they are related. I'll post one, but I think they are mostly related to TS.MADD:

=== REDIS BUG REPORT START: Cut & paste starting from here ===

1:M 23 May 2022  # Crashed running the instruction at: 0x559f40006876


1:M 23 May 2022  # Redis 6.2.6 crashed by signal: 11, si_code: 128

1:M 23 May 2022  # Accessing address: (nil)

------ STACK TRACE ------

EIP:


redis-server *:6379(zrealloc+0x13)[0x559f3ff27143]

redis-server *:6379(call+0xa1)[0x559f3ff1d1e1]

redis-server *:6379(processInputBuffer+0xea)[0x559f3ff3554a]

redis-server *:6379(+0xfbf21)[0x559f3ffcaf21]

redis-server *:6379(main+0x31d)[0x559f3ff126fd]

------ REGISTERS ------

RAX:0000559f4008edc0 RBX:0000000000000050

RBP:00007ffe656835b0 RSP:00007ffe656833f0

R12:0000000000000050 R13:00007f567007a730


Backtrace:

/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f5670253730]

redis-server *:6379(sdsRemoveFreeSpace+0x8b)[0x559f3ff2456b]

redis-server *:6379(RedisModuleCommandDispatcher+0x8c)[0x559f3ffa953c]

redis-server *:6379(je_arena_ralloc_no_move+0x1c6)[0x559f40006876]

redis-server *:6379(je_arena_ralloc+0xab)[0x559f40006c9b]

redis-server *:6379(ztryrealloc_usable+0x39)[0x559f3ff270b9]

redis-server *:6379(processCommand+0x644)[0x559f3ff1ee14]

redis-server *:6379(processCommandAndResetClient+0x1d)[0x559f3ff32d1d]

redis-server *:6379(aeProcessEvents+0x2a5)[0x559f3ff15f45]

redis-server *:6379(_start+0x2a)[0x559f3ff12bea]


1:M 23 May 2022  # 

RIP:0000559f40006876 EFL:0000000000010a07

CSGSFS:002b000000000033

1:M 23 May 2022  # (00007ffe656833ff) -> 0000559f3fffc792

redis-server *:6379(je_arena_ralloc_no_move+0x1c6)[0x559f40006876]

redis-server *:6379(je_realloc+0xe2)[0x559f3fffc792]

redis-server *:6379(+0x6ba14)[0x559f3ff3aa14]

redis-server *:6379(aeMain+0x25)[0x559f3ff161b5]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f56700a209b]

RCX:0000559f4008edc0 RDX:0000000000000050

RDI:1fffffffffffffff RSI:00007f297a3b3640

R8 :0000000040000001 R9 :0000559f4008ebc0

R10:7000000000000000 R11:0000000000000009

R14:0000000000000000 R15:0000000000000008

1:M 23 May 2022  # (00007ffe656833fd) -> 0000000000000002

1:M 23 May 2022  # (00007ffe656833fc) -> 0000000000000008

1:M 23 May 2022  # (00007ffe656833fb) -> 0000000000000014

mem_fragmentation_bytes:3604518376

mem_clients_normal:909568

active_defrag_running:0

lazyfreed_objects:0

loading:0

rdb_changes_since_last_save:562714066

rdb_bgsave_in_progress:0

rdb_last_bgsave_status:ok

aof_current_rewrite_time_sec:-1

module_fork_in_progress:0

aof_current_size:107377523681

aof_base_size:103296135040

aof_buffer_length:15


instantaneous_ops_per_sec:17487

expired_stale_perc:0.00

expire_cycle_cpu_milliseconds:26048

evicted_keys:0

pubsub_patterns:0

total_forks:7854

active_defrag_misses:0

active_defrag_key_hits:0

tracking_total_keys:0

tracking_total_prefixes:0

total_error_replies:228297755

1:M 23 May 2022  # (00007ffe656833f6) -> 00007f3bb2bc68c0

1:M 23 May 2022  # (00007ffe656833f5) -> 0000559f40006f90

1:M 23 May 2022  # (00007ffe656833f3) -> 0000000000000010

------ INFO OUTPUT ------

# Server

redis_build_id:c97a7d6fe7316bcd

arch_bits:64

multiplexing_api:epoll

process_supervised:no

tcp_port:6379

uptime_in_seconds:2156194

hz:10

configured_hz:10

cluster_connections:0

maxclients:10000

tracking_clients:0

used_memory_peak_human:116.19G

used_memory_dataset_perc:99.90%

allocator_active:124783517696

used_memory_lua:16555008

used_memory_scripts:264

maxmemory_human:0B

maxmemory_policy:noeviction

allocator_rss_ratio:1.03

rss_overhead_ratio:1.00

rss_overhead_bytes:13275136

mem_replication_backlog:1048576

mem_allocator:jemalloc-5.1.0

lazyfree_pending_objects:0

# Persistence

current_cow_size_age:0

current_fork_perc:0.00

current_save_keys_processed:0

rdb_last_save_time:1652706182

rdb_last_bgsave_time_sec:1419

rdb_last_cow_size:1639383040

aof_rewrite_scheduled:0

aof_last_write_status:ok

aof_last_cow_size:1641897984

module_fork_last_cow_size:0

# Stats

total_connections_received:34644520

total_net_output_bytes:10728826411065

sync_partial_ok:36

keyspace_hits:20618966756

1:M 23 May 2022  # (00007ffe656833f9) -> 00007f567007a730

1:M 23 May 2022  # (00007ffe656833f2) -> 0000000000000180

1:M 23 May 2022  # (00007ffe656833f1) -> 0000559f4005aaf5

1:M 23 May 2022  # (00007ffe656833f0) -> 00007ffe65683410


redis_git_dirty:0

redis_mode:standalone

os:Linux 5.4.0-90-generic x86_64

atomicvar_api:c11-builtin

gcc_version:8.3.0

process_id:1

run_id:381a82c5c54be9a42ca7f315ddf172aa525ae21d

config_file:


used_memory_overhead:120852560

used_memory_startup:881208

used_memory_dataset:124637881672

allocator_allocated:124759000704

allocator_resident:128349978624

total_system_memory:270163423232

used_memory_scripts_human:264B

number_of_cached_scripts:1

maxmemory:0

allocator_frag_bytes:24516992

allocator_rss_bytes:3566460928

mem_fragmentation_ratio:1.03

mem_clients_slaves:61536

aof_rewrite_in_progress:0

instantaneous_input_kbps:1802.26

sync_full:7784

keyspace_misses:3218763023

migrate_cached_sockets:0

slave_expires_tracked_keys:0

active_defrag_hits:0

active_defrag_key_misses:0

tracking_total_items:0

unexpected_error_replies:11955

dump_payload_sanitizations:0

1:M 23 May 2022  # (00007ffe656833fe) -> 00007ffe656834c0

1:M 23 May 2022  # (00007ffe656833fa) -> 00007f3dee2624b8

1:M 23 May 2022  # (00007ffe656833f8) -> 0000000000000011

1:M 23 May 2022  # (00007ffe656833f7) -> 00007f566fe009c0

1:M 23 May 2022  # (00007ffe656833f4) -> 00007ffe656835f0

redis_version:6.2.6

redis_git_sha1:00000000

server_time_usec:1653312609892187

uptime_in_days:24

lru_clock:9145441

executable:/redis-server

io_threads_active:0

# Clients

connected_clients:40

client_recent_max_input_buffer:12296

client_recent_max_output_buffer:451088

blocked_clients:0

clients_in_timeout_table:0


# Memory

used_memory:124758734232

used_memory_human:116.19G

used_memory_rss:128363253760

used_memory_rss_human:119.55G

used_memory_peak:124762231088

used_memory_peak_perc:100.00%

total_system_memory_human:251.61G

used_memory_lua_human:15.79M

allocator_frag_ratio:1.00

mem_not_counted_for_evict:3578

mem_aof_buffer:3584


current_cow_size:0

current_save_keys_total:0

rdb_current_bgsave_time_sec:-1

aof_enabled:1

aof_last_rewrite_time_sec:1476

aof_last_bgrewrite_status:ok

aof_pending_rewrite:0

aof_rewrite_buffer_length:0

aof_pending_bio_fsync:0

aof_delayed_fsync:0

total_commands_processed:28269698851

total_net_input_bytes:2747555241613

instantaneous_output_kbps:4621.77

rejected_connections:0

sync_partial_err:465

expired_keys:0

expired_time_cap_reached_count:0

pubsub_channels:1

latest_fork_usec:1687729

io_threaded_reads_processed:0

# Replication

role:master

connected_slaves:3

slave1:ip=100.64.18.25,port=6379,state=online,offset=402877047697,lag=0

master_repl_offset:402877561347

repl_backlog_first_byte_offset:402876512772


# CPU

used_cpu_user_children:506795.606184

used_cpu_user_main_thread:467589.110525

# Modules

cmdstat_ts.del:calls=27809986,usec=1882471024,usec_per_call=67.69,rejected_calls=0,failed_calls=0

cmdstat_exec:calls=12843956,usec=2051038982,usec_per_call=159.69,rejected_calls=0,failed_calls=0

cmdstat_select:calls=7543,usec=10988,usec_per_call=1.46,rejected_calls=0,failed_calls=0

cmdstat_publish:calls=3289124,usec=25672584,usec_per_call=7.81,rejected_calls=0,failed_calls=0

cmdstat_ts.madd:calls=684264155,usec=133005516795,usec_per_call=194.38,rejected_calls=0,failed_calls=120949

cmdstat_ts.mrevrange:calls=23154455,usec=7595500279,usec_per_call=328.04,rejected_calls=0,failed_calls=0

cmdstat_auth:calls=34644514,usec=240130914,usec_per_call=6.93,rejected_calls=0,failed_calls=0

cmdstat_spop:calls=3263852,usec=6423520,usec_per_call=1.97,rejected_calls=0,failed_calls=0

cmdstat_scan:calls=1489049258,usec=29711523470,usec_per_call=19.95,rejected_calls=1,failed_calls=0

cmdstat_ts.mrange:calls=5969463,usec=1067388776,usec_per_call=178.81,rejected_calls=0,failed_calls=0

cmdstat_hlen:calls=47,usec=105,usec_per_call=2.23,rejected_calls=0,failed_calls=0

cmdstat_hscan:calls=25,usec=525,usec_per_call=21.00,rejected_calls=0,failed_calls=0

cmdstat_info:calls=10567692,usec=1268348464,usec_per_call=120.02,rejected_calls=0,failed_calls=0

cmdstat_memory:calls=96,usec=330,usec_per_call=3.44,rejected_calls=0,failed_calls=0

id=34640698 addr=100.64.16.208:45150 laddr=100.64.19.13:6379 fd=35 name= age=18662 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=33933409 addr=100.64.15.33:54962 laddr=100.64.19.13:6379 fd=29 name= age=40102 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61472 events=r cmd=ts.madd user=default redir=-1

total_reads_processed:9588338689

io_threaded_writes_processed:0

slave0:ip=100.64.17.8,port=6379,state=online,offset=402877047697,lag=0

slave2:ip=100.64.0.6,port=6379,state=online,offset=402877019034,lag=0

master_failover_state:no-failover

master_replid:29575aa569caf1e86a2fed6e6a478b4342324175

master_replid2:bcfd96dbc3c7f893aff1f82f26bdfb83c3f6c42a

repl_backlog_active:1

used_cpu_sys:432033.100249

used_cpu_user:467103.172939


module:name=timeseries,ver=10607,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]

cmdstat_sadd:calls=1475376788,usec=3782426073,usec_per_call=2.56,rejected_calls=0,failed_calls=0

cmdstat_ttl:calls=63,usec=155,usec_per_call=2.46,rejected_calls=0,failed_calls=0

cmdstat_get:calls=1340325372,usec=2028436158,usec_per_call=1.51,rejected_calls=0,failed_calls=0

cmdstat_set:calls=13891,usec=142310,usec_per_call=10.24,rejected_calls=0,failed_calls=0

cmdstat_mget:calls=6322,usec=25340,usec_per_call=4.01,rejected_calls=0,failed_calls=0

cmdstat_hgetall:calls=711221543,usec=2073282726,usec_per_call=2.92,rejected_calls=1,failed_calls=0

cmdstat_multi:calls=12843956,usec=4475384,usec_per_call=0.35,rejected_calls=0,failed_calls=0

cmdstat_module:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0


errorstat_NOMASTERLINK:count=31523

# Keyspace

db0:keys=2109834,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------

id=34640699 addr=100.64.16.208:45166 laddr=100.64.19.13:6379 fd=42 name= age=18662 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61472 events=r cmd=ts.madd user=default redir=-1

id=33933408 addr=100.64.15.33:54952 laddr=100.64.19.13:6379 fd=25 name= age=40102 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=28953009 addr=100.64.8.15:44340 laddr=100.64.19.13:6379 fd=33 name= age=349574 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61456 events=r cmd=smembers user=default redir=-1

id=9156361 addr=100.64.2.31:52192 laddr=100.64.19.13:6379 fd=46 name= age=1556174 idle=934217 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20544 events=r cmd=sscan user=default redir=-1

id=1484911 addr=100.64.19.13:50850 laddr=100.64.19.13:6379 fd=20 name=sentinel-6b236530-pubsub age=2092589 idle=0 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=subscribe user=default redir=-1

id=11271931 addr=100.64.18.25:49882 laddr=100.64.19.13:6379 fd=49 name= age=1434272 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=15 oll=0 omem=0 tot-mem=61464 events=r cmd=replconf user=default redir=-1

id=33933436 addr=100.64.5.86:52720 laddr=100.64.19.13:6379 fd=31 name= age=39988 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=1638 qbuf-free=39316 argv-mem=1327 obl=245 oll=0 omem=0 tot-mem=63151 events=r cmd=ts.madd user=default redir=-1

id=34643632 addr=100.64.8.15:45906 laddr=100.64.19.13:6379 fd=22 name= age=4375 idle=79 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.get user=default redir=-1

id=34643721 addr=100.64.5.85:51678 laddr=100.64.19.13:6379 fd=24 name= age=3960 idle=521 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.create user=default redir=-1

id=28952994 addr=100.64.11.168:60154 laddr=100.64.19.13:6379 fd=59 name= age=349587 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=34643350 addr=100.64.7.193:45474 laddr=100.64.19.13:6379 fd=16 name= age=5758 idle=430 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.create user=default redir=-1

id=1484910 addr=100.64.19.13:50848 laddr=100.64.19.13:6379 fd=17 name=sentinel-6b236530-cmd age=2092589 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=publish user=default redir=-1

id=24719429 addr=100.64.0.6:51360 laddr=100.64.19.13:6379 fd=37 name= age=607849 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=15 oll=0 omem=0 tot-mem=61464 events=r cmd=replconf user=default redir=-1

id=33933384 addr=100.64.11.167:60660 laddr=100.64.19.13:6379 fd=40 name= age=40209 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61472 events=r cmd=ts.madd user=default redir=-1

id=28952989 addr=100.64.15.32:39510 laddr=100.64.19.13:6379 fd=53 name= age=349588 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1


total_writes_processed:11575601875


second_repl_offset:4050312776

repl_backlog_size:1048576

repl_backlog_histlen:1048576

used_cpu_sys_children:125502.382651

used_cpu_sys_main_thread:429896.435160


cmdstat_slowlog:calls=134001,usec=218572,usec_per_call=1.63,rejected_calls=0,failed_calls=0

cmdstat_ts.create:calls=23924,usec=572481,usec_per_call=23.93,rejected_calls=0,failed_calls=194

cmdstat_type:calls=13404,usec=14826,usec_per_call=1.11,rejected_calls=0,failed_calls=0

cmdstat_hset:calls=2124621,usec=4370479,usec_per_call=2.06,rejected_calls=0,failed_calls=0

cmdstat_config:calls=3,usec=54,usec_per_call=18.00,rejected_calls=0,failed_calls=2

cmdstat_ts.info:calls=12666,usec=239943,usec_per_call=18.94,rejected_calls=0,failed_calls=1206

cmdstat_keys:calls=18,usec=16416953,usec_per_call=912052.94,rejected_calls=0,failed_calls=0

cmdstat_subscribe:calls=125,usec=539,usec_per_call=4.31,rejected_calls=0,failed_calls=0

cmdstat_ts.get:calls=14787235,usec=88482991,usec_per_call=5.98,rejected_calls=0,failed_calls=342041

cmdstat_sismember:calls=4744598731,usec=7403959325,usec_per_call=1.56,rejected_calls=0,failed_calls=0

cmdstat_srem:calls=9084,usec=112660,usec_per_call=12.40,rejected_calls=0,failed_calls=0

cmdstat_psync:calls=39343,usec=650071784,usec_per_call=16523.19,rejected_calls=1372,failed_calls=31523

cmdstat_ts.alter:calls=11458,usec=412960,usec_per_call=36.04,rejected_calls=0,failed_calls=0

cmdstat_ts.range:calls=1723704368,usec=62794077445,usec_per_call=36.43,rejected_calls=0,failed_calls=222376617

cmdstat_ping:calls=6622740,usec=5682503,usec_per_call=0.86,rejected_calls=43688,failed_calls=0

errorstat_ERR:count=228221170

errorstat_LOADING:count=4347

# Cluster

cluster_enabled:0



id=33933463 addr=100.64.13.211:54494 laddr=100.64.19.13:6379 fd=15 name= age=39867 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=33933464 addr=100.64.13.211:54510 laddr=100.64.19.13:6379 fd=34 name= age=39867 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61472 events=r cmd=ts.madd user=default redir=-1

id=28953002 addr=100.64.7.193:40380 laddr=100.64.19.13:6379 fd=14 name= age=349577 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=28953012 addr=100.64.5.85:48084 laddr=100.64.19.13:6379 fd=39 name= age=349573 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=33933435 addr=100.64.5.86:52712 laddr=100.64.19.13:6379 fd=30 name= age=39988 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=sismember user=default redir=-1

id=1484991 addr=100.64.2.31:49518 laddr=100.64.19.13:6379 fd=32 name= age=2092358 idle=940195 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20512 events=r cmd=memory user=default redir=-1

id=34643348 addr=100.64.6.213:38968 laddr=100.64.19.13:6379 fd=27 name= age=5764 idle=733 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.create user=default redir=-1

id=17032879 addr=100.64.17.8:34882 laddr=100.64.19.13:6379 fd=43 name=sentinel-fc5d8434-cmd age=1069238 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20496 events=r cmd=ping user=default redir=-1

id=33933455 addr=100.64.7.194:53888 laddr=100.64.19.13:6379 fd=8 name= age=39897 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=33933456 addr=100.64.7.194:53896 laddr=100.64.19.13:6379 fd=13 name= age=39897 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61472 events=r cmd=ts.madd user=default redir=-1

id=17032957 addr=100.64.17.8:34952 laddr=100.64.19.13:6379 fd=51 name=sentinel-fc5d8434-pubsub age=1069231 idle=0 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=subscribe user=default redir=-1

id=1484994 addr=100.64.2.31:49576 laddr=100.64.19.13:6379 fd=41 name= age=2092348 idle=465 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=del user=default redir=-1

id=34643715 addr=100.64.13.210:41094 laddr=100.64.19.13:6379 fd=18 name= age=3981 idle=384 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.create user=default redir=-1

id=34644526 addr=100.64.7.234:59348 laddr=100.64.19.13:6379 fd=48 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61520 events=r cmd=ts.madd user=default redir=-1

id=17032871 addr=100.64.18.25:34988 laddr=100.64.19.13:6379 fd=23 name=sentinel-c4d261ae-cmd age=1069266 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61448 events=r cmd=ping user=default redir=-1

id=34643640 addr=100.64.11.168:46198 laddr=100.64.19.13:6379 fd=19 name= age=4344 idle=39 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.create user=default redir=-1

------ CURRENT CLIENT INFO ------

# Commandstats

cmdstat_dump:calls=29257090,usec=18924775253,usec_per_call=646.84,rejected_calls=4899207,failed_calls=0

cmdstat_sscan:calls=8,usec=266,usec_per_call=33.25,rejected_calls=0,failed_calls=0

cmdstat_exists:calls=5826792322,usec=4751302702,usec_per_call=0.82,rejected_calls=0,failed_calls=0

cmdstat_hmset:calls=120167,usec=45632824,usec_per_call=379.75,rejected_calls=0,failed_calls=0

cmdstat_replconf:calls=5187094,usec=7402125,usec_per_call=1.43,rejected_calls=0,failed_calls=0

cmdstat_slaveof:calls=2,usec=6325,usec_per_call=3162.50,rejected_calls=0,failed_calls=0

cmdstat_hget:calls=7279753861,usec=13267624983,usec_per_call=1.82,rejected_calls=0,failed_calls=0

cmdstat_scard:calls=16,usec=34,usec_per_call=2.12,rejected_calls=0,failed_calls=0

cmdstat_ts.add:calls=1189803458,usec=9396429250,usec_per_call=7.90,rejected_calls=0,failed_calls=0

cmdstat_eval:calls=1,usec=1392276,usec_per_call=1392276.00,rejected_calls=0,failed_calls=0

cmdstat_hdel:calls=530174,usec=378856,usec_per_call=0.71,rejected_calls=0,failed_calls=0

cmdstat_client:calls=548,usec=1733,usec_per_call=3.16,rejected_calls=0,failed_calls=0

cmdstat_del:calls=2044,usec=859655,usec_per_call=420.57,rejected_calls=0,failed_calls=0

cmdstat_smembers:calls=1610792449,usec=3506060566,usec_per_call=2.18,rejected_calls=0,failed_calls=0

# Errorstats

errorstat_NOAUTH:count=40715


id=28952986 addr=100.64.16.207:54884 laddr=100.64.19.13:6379 fd=36 name= age=349588 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=19148282 addr=100.64.2.31:47658 laddr=100.64.19.13:6379 fd=45 name= age=940192 idle=1363 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20496 events=r cmd=ping user=default redir=-1

id=34643719 addr=100.64.16.207:58718 laddr=100.64.19.13:6379 fd=28 name= age=3967 idle=770 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.create user=default redir=-1

id=1484912 addr=100.64.17.8:33228 laddr=100.64.19.13:6379 fd=21 name= age=2092585 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=15 oll=0 omem=0 tot-mem=61464 events=r cmd=replconf user=default redir=-1

id=34643383 addr=100.64.15.32:50946 laddr=100.64.19.13:6379 fd=44 name= age=5596 idle=1468 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=ts.get user=default redir=-1

id=28953004 addr=100.64.6.213:45864 laddr=100.64.19.13:6379 fd=26 name= age=349576 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61456 events=r cmd=smembers user=default redir=-1

id=17032868 addr=100.64.18.25:34862 laddr=100.64.19.13:6379 fd=55 name=sentinel-c4d261ae-pubsub age=1069279 idle=0 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=subscribe user=default redir=-1

id=34644525 addr=100.64.7.234:59328 laddr=100.64.19.13:6379 fd=47 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=33933383 addr=100.64.11.167:60650 laddr=100.64.19.13:6379 fd=38 name= age=40209 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

id=28952991 addr=100.64.13.210:48022 laddr=100.64.19.13:6379 fd=56 name= age=349588 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=61464 events=r cmd=hget user=default redir=-1

argv[1]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[8]: '1653312420000'

argv[10]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[24]: '-10842500'

argv[29]: '1653312000000'

argv[31]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

------ MODULES INFO OUTPUT ------

1:M 23 May 2022  # Bio thread for job type #0 terminated

1:M 23 May 2022  # Bio thread for job type #1 terminated

*** Preparing to test memory region 7f36a22fb000 (4198400 bytes)

*** Preparing to test memory region 7f36a26fc000 (2101248 bytes)

*** Preparing to test memory region 7f36a3800000 (8388608 bytes)

*** Preparing to test memory region 7f566c403000 (8388608 bytes)

*** Preparing to test memory region 7f566e407000 (8388608 bytes)

*** Preparing to test memory region 7f566f588000 (16384 bytes)

*** Preparing to test memory region 7f566f800000 (8388608 bytes)

*** Preparing to test memory region 7f567007a000 (16384 bytes)

*** Preparing to test memory region 7f567025e000 (16384 bytes)

*** Preparing to test memory region 7f567079d000 (4096 bytes)

argv[12]: '-10842600'

argv[14]: '1653312300000'

argv[16]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[23]: '1653312120000'

argv[25]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[30]: '-10842500'

argv[32]: '1653311940000'

argv[33]: '-10842500'

argv[34]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[35]: '1653311880000'

argv[36]: '-10842500'

argv[37]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[38]: '1653311820000'

argv[39]: '-10842400'

argv[40]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[41]: '1653311760000'

argv[42]: '-10842400'

argv[43]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[45]: '-10842400'

1:M 23 May 2022  # key 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]' found in DB containing the following object:

1:M 23 May 2022  # Object type: 5

1:M 23 May 2022  # Object refcount: 1


------ FAST MEMORY TEST ------

1:M 23 May 2022  # Bio thread for job type #2 terminated

*** Preparing to test memory region 559f412d9000 (12300288 bytes)

*** Preparing to test memory region 7f36a3400000 (2097152 bytes)

*** Preparing to test memory region 7f36a4000000 (136486846464 bytes)

argv[0]: 'TS.MADD'

argv[5]: '1653312480000'

argv[9]: '-10842600'

argv[17]: '1653312240000'

argv[20]: '1653312180000'

argv[22]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

*** Preparing to test memory region 7f5670547000 (16384 bytes)

id=33933436 addr=100.64.5.86:52720 laddr=100.64.19.13:6379 fd=31 name= age=39988 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=1638 qbuf-free=39316 argv-mem=1327 obl=245 oll=0 omem=0 tot-mem=63151 events=r cmd=ts.madd user=default redir=-1

argv[2]: '1653312540000'

argv[3]: '-10842600'

argv[4]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[6]: '-10842600'

argv[7]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[11]: '1653312360000'

argv[13]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[15]: '-10842600'

argv[18]: '-10842600'

argv[19]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[21]: '-10842500'

argv[26]: '1653312060000'

argv[27]: '-10842500'

argv[28]: 'raw:618b9088541c180007b61601:janitza1001/Active Energy L1234 [kWh]'

argv[44]: '1653311700000'

1:M 23 May 2022  # Object encoding: 0


*** Preparing to test memory region 559f400d4000 (2277376 bytes)

*** Preparing to test memory region 7f2974000000 (56505798656 bytes)

*** Preparing to test memory region 7f36a2c00000 (8388608 bytes)

*** Preparing to test memory region 7f566b401000 (8388608 bytes)

*** Preparing to test memory region 7f566bc02000 (8388608 bytes)

*** Preparing to test memory region 7f566cc04000 (8388608 bytes)

*** Preparing to test memory region 7f566d405000 (8388608 bytes)

*** Preparing to test memory region 7f566dc06000 (8388608 bytes)

*** Preparing to test memory region 7f566ec08000 (8388608 bytes)

*** Preparing to test memory region 7f566f7ff000 (4096 bytes)

*** Preparing to test memory region 7f567023b000 (24576 bytes)

*** Preparing to test memory region 7f567076f000 (8192 bytes)

.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O

Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.


------ DUMPING CODE AROUND EIP ------

Symbol: je_arena_ralloc_no_move (base: 0x559f400066b0)

Module: redis-server *:6379 (base 0x559f3fecf000)

$ xxd -r -p /tmp/dump.hex /tmp/dump.bin

$ objdump --adjust-vma=0x559f400066b0 -D -b binary -m i386:x86-64 /tmp/dump.bin

------

1:M 23 May 2022  # dump of function (hexdump of 582 bytes):

49ba00000000000000704c39d10f873d030000554889f04c8d5f284889e541574589cf41564989d641554989fd41544989cc534c89c34881ec980100004885ff0f845a0200004989c04889c249c1e81a4881e2000000c04181e0f00000004d01d8498b084839ca0f85a301000048c1e80925f8ff1f0049034008488b30498d4424ff48c1e61048c1fe104883e6fe4981fc001000000f877502000048c1e803488d3d6a840800440fb61c07488d055e8608004a8b14d84c01e34981fe003800000f96c14881fa00380000410f96c04121c84881fb001000000f87720100004883eb014c8d0d27840800488d0d2086080048c1eb03410fb63c19488b1cf94584c00f84d20100004881fb003800000f867d0000004d39f40f87c40000004939de0f87bb0000004c8b264c8d35e1ac0e004181e4ff0f00004b8b1ce64d85ed743c488bb3106800004d8bbdb80100008b364d85ff0f8498000000413b75040f838e0000004189f14b8d04cf4885c00f847e0000008328010f88e501000031c04881c4980100005b415c415d415e415f5dc3904c8d5bff4881fb001000000f87ef0100004c8d0d6883080049c1eb03470fb63c194981fe001000000f875a020000498d7eff48c1ef03450fb614394539fa0f8451ffffff4d39f40f863fffffff0f1f00b801000000eb96660f1f8400000000004c89efe8482cffff4885c00f8571ffffffe975ffffff662e0f1f840000000000493b93000100000f85420200004d8b8b0801000049898b0001000048c1e809498b480825f8ff1f0049898b080100004c01c84989104d

Function at 0x559f3fff94f0 is je_arena_tdata_get_hard


=== REDIS BUG REPORT END. Make sure to include from START to END. ===

I'll try to get you the log files, but I think it might be lost. I can only see the logs so far in Kibana.

Do you have an idea what dththis could be related to?

@gkorland
Copy link
Contributor

@namirsab I see that you're using Redis 6.2.4 how did you install/build it?

@namirsab
Copy link
Author

Hey @gkorland sorry, but we are using 6.2.6, i made I mistake in the first comment.
We are using the official bitnami docker image for redis 6.2.6

The exact image we are using is bitnami/redis:6.2.6-debian-10-r103

@WeaselScience
Copy link

Since we have an AOF file that can reliably reproduce the crash, we could provide you with a dedicated Ubuntu machine (by authorising your public SSH key) that has sufficient RAM and the AOF file in place for debugging this. Would that work for you?

@OfirMos
Copy link
Contributor

OfirMos commented May 25, 2022

Sounds great!
@namirsab @WeaselScience, before we do this, can you try to upgrade the RedisTimeSeries version to 1.6.11 to see if it still reproduces?

@oranagra
Copy link

i spent a few minutes looking at the disassembly,
the crash that happen on startup was in this line in RedisModuleCommandDispatcher

        if (c->argv[i]->refcount > 1)

the attempt to access the refcount member produced segmentation fault.
this code runs after the module command was executed, so i'm guessing that somehow in that flow (EXEC from within AOF), the module releases that string object, or messes up the array, and that causes redis to crash.

as for the other crash, the one in ACLCheckAllPerm, the disassembly dump is missing here, @namirsab can you add it?

p.s. the stack traces seems a bit messed up and out of order. as if several processes are printing to the same log file and crashed at the same time.

i also like to draw attention to these:

unexpected_error_replies:2480

and

# == CRITICAL == This server is sending an error to its AOF-loading-client: '-ERR TSDB: the key is not a TSDB key' after processing the command 'exec'

this seem like they could cause some corruption.

@namirsab
Copy link
Author

namirsab commented May 25, 2022

Hey @oranagra
Thanks for your help

We are trying to figure out why that CRITICAL error happens. What we are doing is taking the latest backlog that gets output after the error comes:

1:S 25 May 2022 16:20:28.846 # == CRITICAL == This replica is sending an error to its master: '-ERR TSDB: the key is not a TSDB key' after processing the command 'exec'
18:20:28.846
1:S 25 May 2022 16:20:28.846 # Latest backlog is: '"2.01.dc_current_string_22\r\n$13\r\n1653495480000\r\n$1\r\n0\r\n$60\r\nraw:61e81942595e540007d28274:- WR 02.01.dc_current_string_22\r\n$13\r\n1653495420000\r\n$1\r\n0\r\n$60\r\nraw:61e81942595e540007d28274:- WR 02.01.dc_current_string_22\r\n$13\r\n1653495360000\r\n$1\r\n0\r\n*1\r\n$4\r\nEXEC\r\n"'

Here we can see that it's an EXEC command in the end, but in other instances we see of course MULTI before and also TS.MADD so it seems to be related with adding samples to those time series.

We are checking manually what happens if we run TS.INFO in the exact same replica, and it tells us everything is fine and that the key exists. Really strange.

Regarding the missing disassembly dump, unfortunately that data is not there, the bug report from Redis stopped before somehow... 🤷

@OfirMos we are already running in 1.6.10 since yesterday after the crash, do you think upgrading to 1.6.11 would have a big impact?

@OfirMos
Copy link
Contributor

OfirMos commented May 25, 2022

@namirsab the CRITICAL error is indeed a bug which will be fixed in the next version (I was able to reproduce it).
This bug shouldn't cause a crash or any other corruption or data loss and it's happening because you are trying to madd a sample to a key which isn't of type timeseries.
@oranagra can you think of something else that might cause the crash?
In 1.6.11 there we some fixes of some issues but I'm not sure if it will really fix the problem.

@OfirMos
Copy link
Contributor

OfirMos commented May 25, 2022

@namirsab Can you share what are the type of commands you are using and also give me access to the dedicated ubuntu machine on which it reproduces?

@oranagra
Copy link

@OfirMos i can't think of anything else. as i said it looks like the command arguments are freed before the module command returns. maybe that only happens in some error flow. e.g. the cleanup code releases (decrements the refcount) a string on error, but in some flows it was not retained (refcount wasn't incremented).

i think it would be a good idea to disassemble the other crashes (the one in ACLCheckAllPerm), we can do that despite the incomplete crash log, if we get a hold of the exact same binaries (version and build) that were used.

@namirsab
Copy link
Author

namirsab commented May 26, 2022

@namirsab the CRITICAL error is indeed a bug which will be fixed in the next version (I was able to reproduce it). This bug shouldn't cause a crash or any other corruption or data loss and it's happening because you are trying to madd a sample to a key which isn't of type timeseries. @oranagra can you think of something else that might cause the crash? In 1.6.11 there we some fixes of some issues but I'm not sure if it will really fix the problem.

Hey @OfirMos
Thanks for the update. We doubled checked in every replica and the commands that are triggering the error in the logs are actually correct: we've run them in each replica and they work, so i wonder how we get the error. But we will of course update as soon as you release the fix if you think.

Regarding the environment, we will prepare it and share it with you as soon as we can, but we have a weird situation: the crash bug stopped happening, even when we use the same AOF and the same version. We don't really know why.

EDIT: We can reproduce it 100% of the times, it seems we were using a different version accidentally.

My colleague will write a follow up message with more details

@WeaselScience
Copy link

In the end, we did manage to produce an Ubuntu machine that can reliably crash an AOF file that is present there, which is the file that originally took our production down. The crash happens with Redis TimeSeries 1.6.7 and with 1.6.10. For both cases, we used Redis 6.2.6.

If you provide your SSH public keys, i can authorise them on the machine. If you would prefer, you can also join a space on our company's Google Chat (by Google account email), so we can have more real-time communication on the matter. Or we could join a Slack space or similar.

@OfirMos
Copy link
Contributor

OfirMos commented May 26, 2022

@WeaselScience @namirsab I merged a fix that will probably fix the bug, it's in origin/1.6 , can you try to upgrade to this branch? You'll probably still won't be able to load the AOF and if it's still needed we will have to edit it.

@namirsab
Copy link
Author

namirsab commented May 26, 2022

@OfirMos that's great, thanks.

We found a probably related issue with our database, that might show either corruption or a bug in TS.RANGE. We are testing this with version 1.6.7, because we are trying to reproduce the original problem we had.

Querying the same key, first for the whole month, and then for just a subset of the month, gives us different results.

For the whole month we run

# Entire Month (1st of January to 1st of February 2022)
TS.RANGE energyDelta:6193a943383f940007c0b782 1640991600000 1643670000000

and we get results until the 19th of the month, which is not correct because data is missing.

We know that because when we run a query for a subset of the month, we get data:

  # 22nd-26th of January
TS.RANGE energyDelta:6193a943383f940007c0b782 1642806000000 1643151600000

And this is correct.

Even more weird, when we use TS.REVRANGE we get correct results for the entire month.

Also TS.INFO shows also less samples than the correct result of TS.REVRANGE, and it should be more because we also have date in other months

You can find the results attached. You can just check that timestamps in the the results from the 22nd to the 26th are missing in the entireMonth file, but are present in the revrangeEntireMonth.

entireMonth.txt
22nd-26th.txt
revrangeEntireMonth.txt

@namirsab
Copy link
Author

We have been able to export the affected time series. The script below will show you that:

  • Range returns a result that does not include the result below, even though it should
  • Revrange returns the data as expected
# Import
cat energyDelta_6193a943383f940007c0b782.bin.txt | redis-cli -x RESTORE test_energyDelta_6193a943383f940007c0b782 0
# Create Check Directory
mkdir check
# Entire Month
echo "TS.RANGE test_energyDelta_6193a943383f940007c0b782 1640991600000 1643670000000" | redis-cli > check/range_full.txt
# 22nd-26th
echo "TS.RANGE test_energyDelta_6193a943383f940007c0b782 1642806000000 1643151600000" | redis-cli > check/range_part.txt
# Entire Month (Reverse)
echo "TS.REVRANGE test_energyDelta_6193a943383f940007c0b782 1640991600000 1643670000000" | redis-cli > check/revrange_full.txt
# Show Filesizes
ls -lh check

We hope this helps!

energyDelta_6193a943383f940007c0b782.bin.txt

@WeaselScience
Copy link

WeaselScience commented May 27, 2022

I am able to reproduce the above mentioned issue in origin/1.6 running Redis using the following docker image:

FROM ubuntu:18.04 AS builder
RUN apt-get update && apt-get install -y git build-essential
RUN git clone --recursive https://github.com/RedisTimeSeries/RedisTimeSeries.git
RUN cd RedisTimeSeries && git checkout "origin/1.6" && ./deps/readies/bin/getpy3 && ./system-setup.py && make build

FROM redis:6.2.6
COPY --from=builder /RedisTimeSeries/bin/redistimeseries.so /redistimeseries.so

CMD redis-server --loadmodule /redistimeseries.so DUPLICATE_POLICY LAST

Running the commands in the above post by @namirsab indeed produced unexpectedly inconsistent datasets.

@WeaselScience
Copy link

We've been able to reproduce the issue using a fresh Redis 6.2.6 instance with no data, with the latest commit of Redis TimeSeries branch 1.6.

Reproduction steps:

  • Create a new timeseries called subject.
  • Populate it using TS.MADD for two months time, from January 1st 2022 till March 1st 2022, in intervals of 30 minutes, with random numbers for data.
  • Use TS.RANGE subject - + to observe the expected number of samples.
  • Use TS.DEL to delete samples from January 10th 2022 till January 20th 2022.
  • Use TS.RANGE subject - + to observe the gap that correctly forms in the timeseries data.
  • Repopulate the above deleted interval of samples using TS.MADD, again in intervals of 30 minutes, with random data.
  • Run TS.MRANGE subject - + and observe that, unexpectedly, data until around January 13th 2022 is missing from the response.

We've packaged the reproduction into a repository: https://gitlab.com/weasel.science/redis-timeseries-tsdel-bug-reproduction

It includes a Dockerfile that self-containedly compiles Redis TimeSeries and reproduces the bug.

In this reproduction, the behavior of TS.RANGE and TS.REVRANGE seems consistently broken.

@WeaselScience
Copy link

Thank you for the patch. I've rerun the reproduction on the master branch, and it does seem to work properly now. Is this patch going to be released as a bugfix version for the 1.6 track?

@OfirMos
Copy link
Contributor

OfirMos commented May 30, 2022

Yes, it's going to be released hopefully this week as 1.6.12

@OfirMos
Copy link
Contributor

OfirMos commented May 30, 2022

@WeaselScience Please rebase over master/1.6 latest again there is another PR that is relevant to the bug:
#1188

@namirsab
Copy link
Author

namirsab commented Jun 1, 2022

@OfirMos is it safe to use the latest tag 1.6.13 or should we wait for the official release? Thanks!

@LiorKogan
Copy link
Member

@namirsab you can use it on an intel CPU. It yet to be tested on M1 before release.

@WeaselScience
Copy link

We've retested on 1.6.13, we can't reproduce the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants