SLOWLOG related crash in Redis 2.4.4 #322

Closed
antirez opened this Issue Jan 30, 2012 · 10 comments

@antirez
Owner

antirez commented Jan 30, 2012

@jeznet reported the following:

I got an error while Redis was under heavy load:

[3149] 29 Jan 17:25:33 # === REDIS BUG REPORT START: Cut & paste starting from here ===
[3149] 29 Jan 17:25:33 #     Redis 2.4.4 crashed by signal: 11
[3149] 29 Jan 17:25:33 #     Failed assertion: <no assertion failed> (<no file>:0)
[3149] 29 Jan 17:25:33 # --- STACK TRACE
[3149] 29 Jan 17:25:33 # ./redis-server() [0x80851f3]
[3149] 29 Jan 17:25:33 # ./redis-server() [0x808ebbf]
[3149] 29 Jan 17:25:33 # ./redis-server(je_free+0x35a) [0x807f0ea]
[3149] 29 Jan 17:25:33 # ./redis-server(slowlogFreeEntry+0x28) [0x807bd28]
[3149] 29 Jan 17:25:33 # ./redis-server(listDelNode+0x32) [0x80514a2]
[3149] 29 Jan 17:25:33 # ./redis-server(slowlogPushEntryIfNeeded+0x77) [0x807beb7]
[3149] 29 Jan 17:25:33 # ./redis-server(call+0x75) [0x80550c5]
[3149] 29 Jan 17:25:33 # ./redis-server(processCommand+0x253) [0x8056c13]
[3149] 29 Jan 17:25:33 # ./redis-server(processInputBuffer+0x57) [0x805e7d7]
[3149] 29 Jan 17:25:33 # ./redis-server(readQueryFromClient+0x85) [0x805e8d5]
[3149] 29 Jan 17:25:33 # ./redis-server(aeProcessEvents+0x12e) [0x8051dee]
[3149] 29 Jan 17:25:33 # ./redis-server(aeMain+0x37) [0x8052187]
[3149] 29 Jan 17:25:33 # ./redis-server(main+0x110) [0x8057890]
[3149] 29 Jan 17:25:33 # /lib32/libc.so.6(__libc_start_main+0xe7) [0xf75b1e37]
[3149] 29 Jan 17:25:33 # ./redis-server() [0x80511b1]
[3149] 29 Jan 17:25:33 # --- INFO OUTPUT
[3149] 29 Jan 17:25:33 # redis_version:2.4.4
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:32
multiplexing_api:epoll
process_id:3149
uptime_in_seconds:352769
uptime_in_days:4
lru_clock:664857
used_cpu_sys:23251.42
used_cpu_user:16173.52
used_cpu_sys_children:9795.19
used_cpu_user_children:7721.67
connected_clients:12
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:4064789620
used_memory_human:3.79G
used_memory_rss:4084191232
used_memory_peak:4064771596
used_memory_peak_human:3.79G
mem_fragmentation_ratio:1.00
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:172547
bgsave_in_progress:1
last_save_time:1327854257
bgrewriteaof_in_progress:0
total_connections_received:240
total_commands_processed:3339961223
expired_keys:0
evicted_keys:0
keyspace_hits:3257524674
keyspace_misses:82035803
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:54424
vm_enabled:0
role:master
db0:keys=19175,expires=0
db1:keys=5431,expires=0
db3:keys=438,expires=0

[3149] 29 Jan 17:25:33 # --- CLIENT LIST OUTPUT
[3149] 29 Jan 17:25:33 # addr=127.0.0.1:37539 fd=6 idle=349310 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=slowlog
addr=127.0.0.1:32846 fd=10 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=exists
addr=127.0.0.1:34065 fd=11 idle=26 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=exec
addr=127.0.0.1:34066 fd=12 idle=37759 flags=N db=1 sub=1 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=subscribe
addr=127.0.0.1:40813 fd=5 idle=183759 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lrem
addr=127.0.0.1:36696 fd=8 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=sismember
addr=127.0.0.1:36734 fd=15 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=hget
addr=127.0.0.1:39132 fd=13 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=127.0.0.1:46486 fd=14 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=hget
addr=127.0.0.1:33530 fd=7 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush
addr=127.0.0.1:41346 fd=16 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=7 oll=0 events=rw cmd=hget
addr=127.0.0.1:54446 fd=9 idle=0 flags=N db=1 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush

[3149] 29 Jan 17:25:33 # === REDIS BUG REPORT END. Make sure to include from START to END. ===

Segmentation fault
[30611] 29 Jan 17:26:03 * DB saved on disk
@szymon-jez

szymon-jez commented Jan 30, 2012

It might help that I have:

  • set CONFIG SET slowlog-log-slower-than 25,
  • compiled Redis as a 32-bit binary, and
  • reached a used_memory_human of 3.79G.

@antirez

Owner

antirez commented Jan 30, 2012

@jeznet well, this is very useful information indeed! Can you please tell me whether it ever happened when you were not so near the 4 GB address-space limit of a 32-bit process? I think this is very likely not a bug but an issue with jemalloc when you allocate all the available address space in a 32-bit process. Thanks.

@szymon-jez

szymon-jez commented Jan 30, 2012

@antirez This is the first time I have run into this issue. The available system memory was also near its limit at that time.

@antirez

Owner

antirez commented Jan 30, 2012

So the 4 GB limit is likely our issue here; however, I'm not sure why jemalloc would behave like this, since Redis should exit with an out-of-memory error in such a case. I'll try to simulate the issue locally and report back in this thread. Thanks!

P.S. If you see any further issues, please post the stack trace here.

@ghost ghost assigned antirez Jan 30, 2012

@antirez

Owner

antirez commented Jan 31, 2012

I tried it with 2.6 and got this:

zmalloc: Out of memory trying to allocate 123 bytes

So Redis did not crash with SIGSEGV but instead failed more "gracefully" with an out-of-memory error.
I'm not sure why you got this different kind of crash, but I'm pretty confident it happens for the exact same reason.
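
For illustration, a minimal fail-fast allocation wrapper in the same spirit (a sketch only, not the actual zmalloc.c code) could look like this:

/* Sketch of a fail-fast allocation wrapper: abort with a clear message
 * when the allocator returns NULL instead of letting the caller run on
 * a bad pointer and corrupt memory. */
#include <stdio.h>
#include <stdlib.h>

void *safe_malloc(size_t size) {
    void *ptr = malloc(size);
    if (ptr == NULL) {
        fprintf(stderr, "zmalloc: Out of memory trying to allocate %zu bytes\n",
                size);
        fflush(stderr);
        abort();  /* fail fast: a clean OOM exit beats silent corruption */
    }
    return ptr;
}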

Could you please tell me whether this is a 32-bit Redis instance running on a 64-bit box, or whether the box itself is 32-bit? I tried the former setup (a 64-bit box with a 32-bit instance), and that difference may lead to different behaviour.

Thanks,
Salvatore

@szymon-jez

szymon-jez commented Feb 1, 2012

It was a 64-bit box with a 32-bit Redis, so like your test setup. Did you enable the SLOWLOG setting I posted above?
Maybe relevant: a few Resque queues filled up to 100k+ jobs, which caused the memory to fill up.

@antirez

Owner

antirez commented Feb 1, 2012

Hey @jeznet, all right: as expected, I got a corruption as well once I retried with a different workload in your exact setup:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
[23222] 01 Feb 10:42:06 # ------------------------------------------------
[23222] 01 Feb 10:42:06 # !!! Software Failure. Press left mouse button to continue   
[23222] 01 Feb 10:42:06 # Guru Meditation: "decrRefCount against refcount <= 0" #object.c:195
[23222] 01 Feb 10:42:06 # (forcing SIGSEGV in order to print the stack trace)
[23222] 01 Feb 10:42:06 # ------------------------------------------------
[23222] 01 Feb 10:42:06 #     Redis 2.9.3 crashed by signal: 11
[23222] 01 Feb 10:42:06 #     Failed assertion: <no assertion failed> (<no file>:0)   
[23222] 01 Feb 10:42:06 # --- STACK TRACE
[23222] 01 Feb 10:42:06 # ./redis-server(_redisPanic+0x84) [0x807f094]
[23222] 01 Feb 10:42:06 # ./redis-server(decrRefCount+0xd6) [0x8066d96]
[23222] 01 Feb 10:42:06 # ./redis-server(resetClient+0x29) [0x8062d99]
[23222] 01 Feb 10:42:06 # ./redis-server(processInputBuffer+0x5b) [0x806488b]
[23222] 01 Feb 10:42:06 # ./redis-server(readQueryFromClient+0x9d) [0x806499d]
[23222] 01 Feb 10:42:06 # ./redis-server(aeProcessEvents+0x126) [0x8056956]
[23222] 01 Feb 10:42:06 # ./redis-server(aeMain+0x34) [0x8056b84]
[23222] 01 Feb 10:42:06 # ./redis-server(main+0x2b9) [0x805d329]
[23222] 01 Feb 10:42:06 # /lib32/libc.so.6(__libc_start_main+0xe6) [0xf75eabd6]
[23222] 01 Feb 10:42:06 # ./redis-server() [0x8055f71]
[23222] 01 Feb 10:42:06 # --- INFO OUTPUT

Long story short: jemalloc can corrupt memory badly once you run out of address space (mine was at used_memory_human:3.89G), so we should probably place a default limit of around 3.8 GB on all 32-bit instances to prevent Redis from accepting more data once that limit is reached.

So we can be sure that what you experienced is not a bug in SLOWLOG or broken code, but an out-of-memory condition. However, I'm keeping this issue open since I want to resolve the 32-bit limit problem in some way.

Thanks!

@szymon-jez

szymon-jez commented Feb 1, 2012

Good to hear that you have found out what is going on. I'm glad I could help.

@antirez

Owner

antirez commented Feb 1, 2012

@jeznet thanks. For now I suggest that you set a limit yourself to avoid hitting this issue again; I'll provide a general fix for 32-bit instances. However, you can already get 99% of what the final fix will be with:

maxmemory 3500MB
maxmemory-policy noeviction

Cheers,
Salvatore

@antirez

Owner

antirez commented Feb 2, 2012

Both the 2.4 and the unstable branch now apply a default 3.5 GB limit when a 32-bit instance without an explicit, user-specified maxmemory limit is detected. Closing this issue, thanks for participating!
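
A minimal, self-contained sketch of that idea (hypothetical struct and constant names, not the actual Redis patch) might look like this:

/* Hypothetical sketch: on a 32-bit build where the user has not set
 * maxmemory explicitly, install a 3.5 GB cap and a no-eviction policy so
 * that writes fail cleanly instead of exhausting the address space. */
#include <stddef.h>
#include <stdint.h>

#define POLICY_NOEVICTION 0           /* stand-in for Redis' real constant */

struct server_limits {                /* stand-in for the relevant redisServer fields */
    uint64_t maxmemory;               /* 0 means "no limit configured" */
    int maxmemory_policy;
};

static void apply_32bit_default_limit(struct server_limits *s) {
    if (sizeof(void *) == 4 && s->maxmemory == 0) {
        s->maxmemory = 3584ULL * 1024 * 1024;    /* 3.5 GB */
        s->maxmemory_policy = POLICY_NOEVICTION;
    }
}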

@antirez antirez closed this Feb 2, 2012
