
SLOWLOG related crash in Redis 2.4.4 #322

Closed
antirez opened this Issue · 10 comments

2 participants

Salvatore Sanfilippo
Owner

@jeznet reported the following:

I got an error while Redis was under heavy load:

[3149] 29 Jan 17:25:33 # === REDIS BUG REPORT START: Cut & paste starting from here ===
[3149] 29 Jan 17:25:33 #     Redis 2.4.4 crashed by signal: 11
[3149] 29 Jan 17:25:33 #     Failed assertion: <no assertion failed> (<no file>:0)
[3149] 29 Jan 17:25:33 # --- STACK TRACE
[3149] 29 Jan 17:25:33 # ./redis-server() [0x80851f3]
[3149] 29 Jan 17:25:33 # ./redis-server() [0x808ebbf]
[3149] 29 Jan 17:25:33 # ./redis-server(je_free+0x35a) [0x807f0ea]
[3149] 29 Jan 17:25:33 # ./redis-server(slowlogFreeEntry+0x28) [0x807bd28]
[3149] 29 Jan 17:25:33 # ./redis-server(listDelNode+0x32) [0x80514a2]
[3149] 29 Jan 17:25:33 # ./redis-server(slowlogPushEntryIfNeeded+0x77) [0x807beb7]
[3149] 29 Jan 17:25:33 # ./redis-server(call+0x75) [0x80550c5]
[3149] 29 Jan 17:25:33 # ./redis-server(processCommand+0x253) [0x8056c13]
[3149] 29 Jan 17:25:33 # ./redis-server(processInputBuffer+0x57) [0x805e7d7]
[3149] 29 Jan 17:25:33 # ./redis-server(readQueryFromClient+0x85) [0x805e8d5]
[3149] 29 Jan 17:25:33 # ./redis-server(aeProcessEvents+0x12e) [0x8051dee]
[3149] 29 Jan 17:25:33 # ./redis-server(aeMain+0x37) [0x8052187]
[3149] 29 Jan 17:25:33 # ./redis-server(main+0x110) [0x8057890]
[3149] 29 Jan 17:25:33 # /lib32/libc.so.6(__libc_start_main+0xe7) [0xf75b1e37]
[3149] 29 Jan 17:25:33 # ./redis-server() [0x80511b1]
[3149] 29 Jan 17:25:33 # --- INFO OUTPUT
[3149] 29 Jan 17:25:33 # redis_version:2.4.4
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:32
multiplexing_api:epoll
process_id:3149
uptime_in_seconds:352769
uptime_in_days:4
lru_clock:664857
used_cpu_sys:23251.42
used_cpu_user:16173.52
used_cpu_sys_children:9795.19
used_cpu_user_children:7721.67
connected_clients:12
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:4064789620
used_memory_human:3.79G
used_memory_rss:4084191232
used_memory_peak:4064771596
used_memory_peak_human:3.79G
mem_fragmentation_ratio:1.00
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:172547
bgsave_in_progress:1
last_save_time:1327854257
bgrewriteaof_in_progress:0
total_connections_received:240
total_commands_processed:3339961223
expired_keys:0
evicted_keys:0
keyspace_hits:3257524674
keyspace_misses:82035803
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:54424
vm_enabled:0
role:master
db0:keys=19175,expires=0
db1:keys=5431,expires=0
db3:keys=438,expires=0

[3149] 29 Jan 17:25:33 # --- CLIENT LIST OUTPUT
[3149] 29 Jan 17:25:33 # addr=127.0.0.1:37539 fd=6 idle=349310 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=slowlog
addr=127.0.0.1:32846 fd=10 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=exists
addr=127.0.0.1:34065 fd=11 idle=26 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=exec
addr=127.0.0.1:34066 fd=12 idle=37759 flags=N db=1 sub=1 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=subscribe
addr=127.0.0.1:40813 fd=5 idle=183759 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lrem
addr=127.0.0.1:36696 fd=8 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=sismember
addr=127.0.0.1:36734 fd=15 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=hget
addr=127.0.0.1:39132 fd=13 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=127.0.0.1:46486 fd=14 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=hget
addr=127.0.0.1:33530 fd=7 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush
addr=127.0.0.1:41346 fd=16 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=7 oll=0 events=rw cmd=hget
addr=127.0.0.1:54446 fd=9 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush

[3149] 29 Jan 17:25:33 # === REDIS BUG REPORT END. Make sure to include from START to END. ===

Segmentation fault
[30611] 29 Jan 17:26:03 * DB saved on disk
Szymon Jeż

It might help to know that I:

  • set slowlog-log-slower-than 25 via CONFIG SET,
  • compiled Redis as a 32 bit binary, and
  • had used_memory_human at 3.79G at the time of the crash.
Salvatore Sanfilippo
Owner

@jeznet well, this is very useful information indeed! Can you tell me whether this ever happened when you were not so near the 4GB address space limit of a 32 bit process? I think this is very likely not a bug in Redis, but an issue with jemalloc when you allocate all the available address space in a 32 bit process. Thanks.

Szymon Jeż

@antirez It is the first time I ran into this issue. The available system memory was also near its limit at that time.

Salvatore Sanfilippo
Owner

So the 4GB limit is likely our issue here; however, I'm not sure why jemalloc would behave like this, since Redis should exit with an out of memory error in such a case rather than crash. I'll try to reproduce the issue locally and report back in this thread. Thanks!

p.s. please if you see any further issue post the stack trace here.

Salvatore Sanfilippo
Owner

I tried it with 2.6 and got this:

zmalloc: Out of memory trying to allocate 123 bytes

So Redis did not crash with SIGSEGV, but failed more "gracefully" with an out of memory error.
I'm not sure why you got a different kind of crash, but I'm pretty confident it is for the exact same reason.

Could you tell me whether this is a 32 bit Redis instance running on a 64 bit box, or whether the box itself is 32 bit? I tried the former setup (a 32 bit instance on a 64 bit box), and this may lead to differences.

Thanks,
Salvatore

Salvatore Sanfilippo
Owner

Hey @jeznet, all right, as expected I got a corruption as well once I retried with a different workload in your exact setup:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
[23222] 01 Feb 10:42:06 # ------------------------------------------------
[23222] 01 Feb 10:42:06 # !!! Software Failure. Press left mouse button to continue   
[23222] 01 Feb 10:42:06 # Guru Meditation: "decrRefCount against refcount <= 0" #object.c:195
[23222] 01 Feb 10:42:06 # (forcing SIGSEGV in order to print the stack trace)
[23222] 01 Feb 10:42:06 # ------------------------------------------------
[23222] 01 Feb 10:42:06 #     Redis 2.9.3 crashed by signal: 11
[23222] 01 Feb 10:42:06 #     Failed assertion: <no assertion failed> (<no file>:0)   
[23222] 01 Feb 10:42:06 # --- STACK TRACE
[23222] 01 Feb 10:42:06 # ./redis-server(_redisPanic+0x84) [0x807f094]
[23222] 01 Feb 10:42:06 # ./redis-server(decrRefCount+0xd6) [0x8066d96]
[23222] 01 Feb 10:42:06 # ./redis-server(resetClient+0x29) [0x8062d99]
[23222] 01 Feb 10:42:06 # ./redis-server(processInputBuffer+0x5b) [0x806488b]
[23222] 01 Feb 10:42:06 # ./redis-server(readQueryFromClient+0x9d) [0x806499d]
[23222] 01 Feb 10:42:06 # ./redis-server(aeProcessEvents+0x126) [0x8056956]
[23222] 01 Feb 10:42:06 # ./redis-server(aeMain+0x34) [0x8056b84]
[23222] 01 Feb 10:42:06 # ./redis-server(main+0x2b9) [0x805d329]
[23222] 01 Feb 10:42:06 # /lib32/libc.so.6(__libc_start_main+0xe6) [0xf75eabd6]
[23222] 01 Feb 10:42:06 # ./redis-server() [0x8055f71]
[23222] 01 Feb 10:42:06 # --- INFO OUTPUT

Long story short: jemalloc can corrupt memory badly once you run out of address space (mine crashed at used_memory_human:3.89G). We should probably place a default limit of around 3.8 GB on all 32 bit instances, preventing Redis from accepting more data when this limit is reached.

So we can be sure that what you experienced is not a bug in SLOWLOG or broken code, but an out of memory condition. However, I'm keeping this issue open since I want to resolve the 32 bit limit problem in some way.

Thanks!

Salvatore Sanfilippo
Owner

@jeznet thanks. For now I suggest that you set a memory limit to avoid hitting this issue again; I'll provide a general fix for 32 bit instances later. However, you can already get 99% of what the final fix will be with:

maxmemory 3500MB
maxmemory-policy noeviction

Cheers,
Salvatore

Salvatore Sanfilippo
Owner

Both the 2.4 and unstable branches now have a default limit of 3.5 GB, applied when a 32 bit instance without an explicit user-specified maxmemory limit is detected. Closing this issue, thanks for participating!
