Skip to content
This repository

slave crash -- Redis 2.4.8 crashed by signal: 11 #504

Open
cutter opened this Issue May 08, 2012 · 31 comments

4 participants

cutter Salvatore Sanfilippo jokea Andrew Chilton
cutter
cutter commented May 08, 2012

Let me know if you need any more information that isn't included in the auto-generated report

May 8 22:14:26 redis2 redis[15908]: === REDIS BUG REPORT START: Cut & paste starting from here ===
May 8 22:14:26 redis2 redis[15908]: Redis 2.4.8 crashed by signal: 11
May 8 22:14:26 redis2 redis[15908]: Failed assertion: (:0)
May 8 22:14:26 redis2 redis[15908]: --- STACK TRACE
May 8 22:14:26 redis2 redis[15908]: /lib/x86_64-linux-gnu/libc.so.6(memcpy+0xe1) [0x7fc195226a41]
May 8 22:14:26 redis2 redis[15908]: /lib/x86_64-linux-gnu/libc.so.6(memcpy+0xe1) [0x7fc195226a41]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(sdscatlen+0x55) [0x411f15]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(feedAppendOnlyFile+0x12d) [0x42cb7d]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(call+0xd3) [0x40f613]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(execCommand+0x8d) [0x4310fd]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(call+0x34) [0x40f574]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(processCommand+0x23c) [0x410f2c]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(processInputBuffer+0x4f) [0x41917f]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(readQueryFromClient+0x8a) [0x41927a]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(aeProcessEvents+0x15d) [0x40c3bd]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(aeMain+0x2e) [0x40c72e]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(main+0xf7) [0x411ae7]
May 8 22:14:26 redis2 redis[15908]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x7fc1951baeff]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server() [0x40b6b9]
May 8 22:14:26 redis2 redis[15908]: --- INFO OUTPUT
May 8 22:14:26 redis2 redis[15908]: redis_version:2.4.8
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.5.2
process_id:15908
uptime_in_seconds:11629
uptime_in_days:0
lru_clock:1530950
used_cpu_sys:1754.67
used_cpu_user:4405.30
used_cpu_sys_children:268.87
used_cpu_user_children:3022.72
connected_clients:1
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:10411
blocked_clients:0
used_memory:6434505224
used_memory_human:5.99G
used_memory_rss:6761967616
used_memory_peak:6442464320
used_memory_peak_human:6.00G
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:1
changes_since_last_save:72470421
bgsave_in_progress:0
last_save_time:1336514845
bgrewriteaof_in_progress:1
total_connections_received:64
total_commands_processed:659188863
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1384839
vm_enabled:0
role:slave
aof_current_size:9883656677
aof_base_size:3006068095
aof_pending_rewrite:0
aof_buffer_length:191276
aof_pending_bio_fsync:1
master_host:XXXXXX
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
db0:keys=95007,expires=0
May 8 22:14:26 redis2 redis[15908]: --- CLIENT LIST OUTPUT
May 8 22:14:26 redis2 redis[15908]: addr=X.X.x.X:6379 fd=7 idle=0 flags=Mx db=0 sub=0 psub=0 qbuf=10411 obl=2803 oll=0 events=rw cmd=exec
May 8 22:14:26 redis2 redis[15908]: --- CURRENT CLIENT INFO
May 8 22:14:26 redis2 redis[15908]: client: addr=X.X.X.X:6379 fd=7 idle=0 flags=Mx db=0 sub=0 psub=0 qbuf=10411 obl=2803 oll=0 events=rw cmd=exec
May 8 22:14:26 redis2 redis[15908]: argv[0]: 'ZADD'
May 8 22:14:26 redis2 redis[15908]: argv[1]: 'magp:f:u:139061:n'
May 8 22:14:26 redis2 redis[15908]: argv[2]: '1.336514862E9'
May 8 22:14:26 redis2 redis[15908]: argv[3]: '{"aid":165952,"t":"user_followed","fid":92289}'
May 8 22:14:26 redis2 redis[15908]: key 'magp:f:u:139061:n' found in DB containing the following object:
May 8 22:14:26 redis2 redis[15908]: Object type: 3
May 8 22:14:26 redis2 redis[15908]: Object encoding: 7
May 8 22:14:26 redis2 redis[15908]: Object refcount: 1
May 8 22:14:26 redis2 redis[15908]: Sorted set size: 574
May 8 22:14:26 redis2 redis[15908]: Skiplist level: 7
May 8 22:14:26 redis2 redis[15908]: === REDIS BUG REPORT END. Make sure to include from START to END. ===#12#012 Please report the crash opening an issue on github:#12#012 http://github.com/antirez/redis/issues#012
May 8 22:16:33 redis2 init: redis-server main process (15907) terminated with status 139

Salvatore Sanfilippo
Owner
antirez commented May 08, 2012

Thanks I'll look into it ASAP

Salvatore Sanfilippo
Owner
antirez commented May 11, 2012

Hello, please could you provide the following informations:

  • Kind of hardware / vm where the instance is running.
  • If this is the first crash or if it happened other times.

Thanks,
Salvatore

cutter
cutter commented May 11, 2012
jokea
jokea commented May 21, 2012

Same crash happened on one of our instances running on a real linux box:

[5816] 06 May 01:00:10 # === REDIS BUG REPORT START: Cut & paste starting from here ===
[5816] 06 May 01:00:10 # Redis 2.4.7 crashed by signal: 11
[5816] 06 May 01:00:10 # Failed assertion: (:0)
[5816] 06 May 01:00:10 # --- STACK TRACE
[5816] 06 May 01:00:10 # /lib64/libc.so.6(memcpy+0xe1) [0x358007c321]
[5816] 06 May 01:00:10 # /lib64/libc.so.6(memcpy+0xe1) [0x358007c321]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(sdscatlen+0x4c) [0x7fe1734a27dc]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(feedAppendOnlyFile+0x130) [0x7fe1734c0c10]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(call+0x118) [0x7fe17349e478]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(processCommand+0x357) [0x7fe1734a0a97]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(processInputBuffer+0x4d) [0x7fe1734a946d]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(readQueryFromClient+0x168) [0x7fe1734a9638]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(aeProcessEvents+0x168) [0x7fe17349bb58]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(aeMain+0x2e) [0x7fe17349bd6e]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(redismain+0x1a2) [0x7fe17349fef2]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server(main+0x1e8) [0x402cc8]
[5816] 06 May 01:00:10 # /lib64/libc.so.6(__libc_start_main+0xf4) [0x358001d994]
[5816] 06 May 01:00:10 # /usr/local/redis/bin/redis-server [0x401ff9]
[5816] 06 May 01:00:10 # --- INFO OUTPUT
[5816] 06 May 01:00:10 # redis_version:2.4.7
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.1.2
process_id:5816
uptime_in_seconds:5562075
uptime_in_days:64
lru_clock:1503144
used_cpu_sys:1631685.62
used_cpu_user:1283509.62
used_cpu_sys_children:11923.27
used_cpu_user_children:1127.09
connected_clients:1605
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:2171300952
used_memory_human:2.02G
used_memory_rss:2173657088
used_memory_peak:8192716024
used_memory_peak_human:7.63G
mem_fragmentation_ratio:1.00
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:1
changes_since_last_save:26231039047
bgsave_in_progress:0
last_save_time:1330675135
bgrewriteaof_in_progress:1
total_connections_received:1566860857
total_commands_processed:162322121019
expired_keys:0
evicted_keys:0
keyspace_hits:16593910369
keyspace_misses:121
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1459
vm_enabled:0
role:master
aof_current_size:2149564731
aof_base_size:916682
aof_pending_rewrite:0
aof_buffer_length:33
aof_pending_bio_fsync:0
db0:keys=3,expires=0
[5816] 06 May 01:00:10 # --- CLIENT LIST OUTPUT
[5816] 06 May 01:00:10 # addr=172.16.106.163:17663 fd=1612 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush
addr=172.16.106.164:38915 fd=166 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush
addr=172.16.106.164:18373 fd=1574 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush
addr=172.16.106.163:35167 fd=969 idle=1 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=rpush
addr=172.16.35.131:20634 fd=31 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.234:30843 fd=15 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:54405 fd=1847 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:22668 fd=1520 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:38440 fd=20 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:20888 fd=1575 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:39500 fd=1676 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:22384 fd=1304 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:42711 fd=169 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.234:28839 fd=17 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:46625 fd=1716 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:12505 fd=210 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:15253 fd=1464 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:53445 fd=343 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:53344 fd=1570 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:25833 fd=815 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:17862 fd=543 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:33888 fd=1713 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:58148 fd=406 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.234:49764 fd=356 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:41197 fd=1013 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.131:14577 fd=93 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:38251 fd=1132 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:40231 fd=416 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:40713 fd=450 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:41169 fd=1412 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:41456 fd=729 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.234:11880 fd=1848 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:23682 fd=8 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:56097 fd=16 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.234:18248 fd=489 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:50384 fd=1477 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.235:55205 fd=157 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:29120 fd=1435 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.233:36860 fd=562 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lpop
addr=172.16.35.232:30546 fd=1784 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=lp
[5816] 06 May 01:00:10 # --- CURRENT CLIENT INFO
[5816] 06 May 01:00:10 # client: addr=172.16.106.164:57728 fd=10 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=1480 oll=0 events=rw cmd=lpop
[5816] 06 May 01:00:10 # argv[0]: 'LPOP'
[5816] 06 May 01:00:10 # argv[1]: 'FH_protoFeed'
[5816] 06 May 01:00:10 # === REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash opening an issue on github:
http://github.com/antirez/redis/issues

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

Thanks @jokea, this time role is master so I can disregard the possibility this is slave specific.
Btw a crash after 64 days of uptime means this is not going to be the most trivial thing on the earth to spot / reproduce without a core dump.

Please could you send me the redis-server executable? Thank you.

Andrew Chilton
chilts commented May 21, 2012

The previous report said :

uptime_in_seconds:11629
uptime_in_days:0

(Just in case you missed it.) So yeah, 64 days and 0 days. :)

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

@chilts this suggests that the first instance has conditions that make the error more likely to happen in theory, but if you check what @cutter says, it's the first time that this happens... so I really need a core maybe, but I'll make sure to investigate very seriously in order to understand if it is possible to fix this analytically.

jokea
jokea commented May 21, 2012

Actually this instance is running with our modified version, which builds redis to a shared library, and use an outside
executable which loads the library and calls redis:main().

The reason for this modification is that we are running a really large amount of instances, and we need a easier
way to update them to a newer version. With this modification, we can update to a new version with simply a CONFIG
command.

The crash happened two weeks ago and I didn't report it because I was thinking that it was caused by our modification.
Maybe @cutter could provide his redis-server for further investigation.

jokea
jokea commented May 21, 2012

Both instance were doing bgrewrite when the crash happens:

bgrewriteaof_in_progress:1

I think this may provide some clue.

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

@jokea definitely, there must be some kind of race there. Thanks.

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

One thing is sure... even without the executable my GCC version gives offsets that are more or less the same and the crash happens here:

243             server.bgrewritebuf = sdscatlen(server.bgrewritebuf,buf,sdslen(b...

This was pretty obvious but it's good to be sure about this since it was not completely impossible that the call to sdscatprintf() was somewhat inlined.

So now there is to understand, starting from the rewrite process, when it can happen that server.bgrewritebuf gets corrupted, set to NULL, or otherwise broken in case of some kind of race.

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

p.s. it's also worth to note that the call sdslen(buf) does not crash, so it is somewhat more likely that the crash is due to memcpy() trying to write against the destination, and not trying to read from the source. Not a matter of fact with the current information, but I would say, it's at 99% of probabilities.

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

I've an idea... this could be as simple as the fact that an SDS string can't hold more than 2GB of data.
On slow rewrites, what we get is that the rewrite buffer accumulating the new changes may overflow and crash everything...

Both this instances are large, it makes sense. Now simulating this stuff myself to see the stack trace I get, but should be almost identical.

Salvatore Sanfilippo
Owner
antirez commented May 21, 2012

That's the stack trace in my simulation, this really seems identical to the other ones:

3324] 21 May 13:06:20 # === REDIS BUG REPORT START: Cut & paste starting from h
ere ===
[3324] 21 May 13:06:20 #     Redis 2.4.8 crashed by signal: 11
[3324] 21 May 13:06:20 #     Failed assertion: <no assertion failed> (<no file>:
0)
[3324] 21 May 13:06:20 # --- STACK TRACE
[3324] 21 May 13:06:20 # /lib/x86_64-linux-gnu/libc.so.6(+0x14881d) [0x7f4eb46b3
81d]  
[3324] 21 May 13:06:20 # /lib/x86_64-linux-gnu/libc.so.6(+0x14881d) [0x7f4eb46b3
81d]  
[3324] 21 May 13:06:20 # ./redis-server(sdscatlen+0x4f) [0x411eef]
[3324] 21 May 13:06:20 # ./redis-server(feedAppendOnlyFile+0x12d) [0x42c76d]
[3324] 21 May 13:06:20 # ./redis-server(call+0xd3) [0x40f8d3]
[3324] 21 May 13:06:20 # ./redis-server(processCommand+0x253) [0x4111c3]
[3324] 21 May 13:06:20 # ./redis-server(processInputBuffer+0x4f) [0x418dff]
[3324] 21 May 13:06:20 # ./redis-server(readQueryFromClient+0x8a) [0x418efa]
[3324] 21 May 13:06:20 # ./redis-server(aeProcessEvents+0x135) [0x40c685]
[3324] 21 May 13:06:20 # ./redis-server(aeMain+0x2e) [0x40ca1e]
[3324] 21 May 13:06:20 # ./redis-server(main+0x106) [0x40b8b6]
[3324] 21 May 13:06:20 # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)
 [0x7f4eb458c76d]
[3324] 21 May 13:06:20 # ./redis-server() [0x40b9dd]
[3324] 21 May 13:06:20 # --- INFO OUTPUT
[3324] 21 May 13:06:20 # redis_version:2.4.8
jokea
jokea commented May 21, 2012

Good job!

Salvatore Sanfilippo antirez referenced this issue from a commit May 22, 2012
Salvatore Sanfilippo Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.

We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.

In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.

Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().

For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.

The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
67eccdb
Salvatore Sanfilippo antirez referenced this issue from a commit May 24, 2012
Commit has since been removed from the repository and is no longer available.
Salvatore Sanfilippo antirez referenced this issue from a commit May 22, 2012
Salvatore Sanfilippo Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.

We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.

In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.

Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().

For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.

The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
47ca4b6
Salvatore Sanfilippo antirez referenced this issue from a commit May 22, 2012
Salvatore Sanfilippo Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.

We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.

In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.

Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().

For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.

The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
edb9525
jokea
jokea commented May 25, 2012

@cutter could you please provide some log before the crash? My instance's
loglevel was set to warning so nothing useful can be found.

I've a feeling that there's something wrong with the bgrewrite process and
it got stucked by some reason. Since in my case, it's a relatively small
database, and what's more important, the instance is configured to
do bgrewrite at 5:20 everyday(auto rewrite is disabled), and it crashes at 01:00,
which means that the bgrewrite process runs for 20 hours...

cutter
cutter commented May 25, 2012

Hi yes, It started crashing again today coincidentally except it was crashing every 15 mins or so. So much that we've had to turn off appendonly on the slave (note that we have upgraded to 2.4.10 in the meantime), you can see how much it hates EBS:

May 25 01:02:00 redis5 redis[21372]: 1 clients connected (0 slaves), 23676998584 bytes in use
May 25 01:02:02 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:06 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:13 redis5 redis[21372]: DB 0: 106792 keys (0 volatile) in 131072 slots HT.
May 25 01:02:13 redis5 redis[21372]: 1 clients connected (0 slaves), 23730775784 bytes in use
May 25 01:02:22 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:26 redis5 redis[21372]: DB 0: 106793 keys (0 volatile) in 131072 slots HT.
May 25 01:02:26 redis5 redis[21372]: 1 clients connected (0 slaves), 23783761520 bytes in use
May 25 01:02:27 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:30 redis5 redis[21372]: Accepted 127.0.0.1:54208
May 25 01:02:30 redis5 redis[21372]: Client closed connection
May 25 01:02:35 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:38 redis5 redis[21372]: DB 0: 106793 keys (0 volatile) in 131072 slots HT.
May 25 01:02:38 redis5 redis[21372]: 1 clients connected (0 slaves), 23840845312 bytes in use
May 25 01:02:48 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:52 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:02:52 redis5 redis[21372]: DB 0: 106793 keys (0 volatile) in 131072 slots HT.
May 25 01:02:52 redis5 redis[21372]: 1 clients connected (0 slaves), 23895410072 bytes in use
May 25 01:02:56 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:03:05 redis5 redis[21372]: Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
May 25 01:03:08 redis5 redis[21372]: DB 0: 106793 keys (0 volatile) in 131072 slots HT.
May 25 01:03:08 redis5 redis[21372]: 1 clients connected (0 slaves), 23959511824 bytes in use
May 25 01:03:12 redis5 redis[21372]: === REDIS BUG REPORT START: Cut & paste starting from here ===
May 25 01:03:12 redis5 redis[21372]: Redis 2.4.10 crashed by signal: 11
May 25 01:03:12 redis5 redis[21372]: Failed assertion: (:0)
May 25 01:03:12 redis5 redis[21372]: --- STACK TRACE
May 25 01:03:12 redis5 redis[21372]: /lib/x86_64-linux-gnu/libc.so.6(memcpy+0xe1) [0x7fb630d97a41]
May 25 01:03:12 redis5 redis[21372]: /lib/x86_64-linux-gnu/libc.so.6(memcpy+0xe1) [0x7fb630d97a41]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(sdscatlen+0x55) [0x412215]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(feedAppendOnlyFile+0x12d) [0x42ce5d]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(call+0xd3) [0x40f863]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(execCommand+0x8d) [0x4313dd]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(call+0x34) [0x40f7c4]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(processCommand+0x23c) [0x41117c]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(processInputBuffer+0x4f) [0x4194af]
May 25 01:03:12 redis5 redis[21372]: /usr/local/bin/redis-server(readQueryFromClient+0x8a) [0x4195aa]
etc...

Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

@jokea @cutter yes I concur that there are two different problems here: one is the overflow because of the 2GB limit (that was fixed in 2.6 and unstable branch). Another issue is, why the rewrite process is taking so long to complete?

15 minutes are ok if the disk is extremely slow like in the EC2 EBS case, and this can be a plain overflow issue, but many hours is strange: either the process exited but was not "catched" by Redis's wait syscall, or something else.

Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

p.s. it is not excluded that issue #507 is somewhat related to this one, but I'm not sure.

jokea
jokea commented May 25, 2012

I think it has something to do with jemalloc, see tests here: https://gist.github.com/2786537

Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

@jokea: you rock, see jemalloc 3.0.0 changelog:

  • Fix fork-related bugs that could cause deadlock in children between fork and exec.
Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

p.s. if while you are it you can try your test code against 3.0.0 to verify it actually fixes the issue, I'll appreciate it :)

jokea
jokea commented May 25, 2012

I've tried 3.0.0 and it fails too, the result is in previous post too.

Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

oops... 3.0.0 still broken is definitely not good news, but on the other side with your test code they should be able to fix jemalloc without too much efforts.

However we have an alternative... if I understand correctly what is happening in Redis is that the thread doing the allocation while forking is the one at bio.c. We can make sure in some way it does not allocate at all while forking... can't see other alternatives if 3.0.0 is still broken.

Thanks a lot for this stuff, it's really useful.

jokea
jokea commented May 25, 2012

That alternative totally works.

Still I'm not sure if I did the test right since jemalloc-3.0.0 claimed to have fixed the deadlock.

Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

We could try posting the issue into the jemalloc-discussion mailing list to see if they have any clue about it.

jokea
jokea commented May 25, 2012

Looks like this is the best thing we can do for now. I'll do it right now.

Salvatore Sanfilippo
Owner
antirez commented May 25, 2012

Thanks!

Salvatore Sanfilippo antirez referenced this issue from a commit May 25, 2012
Salvatore Sanfilippo Don't fork() when threads are into critical sections.
While investigating issue #504 it was found that jemalloc may produce a
deadlock in the child process when forking a parent with threads doing
allocations at the same time.

In oder to make sure that the bio.c threads (the only currently in use
within Redis) are not holding mutexes nor performing allocations this
commit modifies the code so that Redis waits for the count of pending
jobs to reach zero before calling fork().

When this happens, all the threads should be blocked into
pthread_cond_wait(), so blocked (not doing allocations) and not holding
the lock.

This commit is indirectly related to issue #504, where the deadlock is
not the actual bug reason but the indirect cause, blocking the AOF rewrite
child and allowing the AOF background rewrite buffer to go over 2GB,
resulting in the server crash.
938d6d9
Salvatore Sanfilippo
Owner

An update about this issue:

  • Unstable and 2.6 no longer have 2GB limit.
  • The jemalloc developer acknowledged there was a problem before jemalloc 3.0, and that our testing code was not working on 3.0 as well because there was no malloc in the main thread before creating new threads. So I upgrade 2.4.x to Jemaloc 3.0 as well.

This is almost enough to close the issue, but there is still something missing, that 2.4 does not produce a warning when near to the 2GB limit (2.6 / unstable now log warnings when the bg aof buffer grows too much), it should so taking the issue open, but I'll fix it in the next days.

jokea
jokea commented June 05, 2012

It's great to finally locate the issue.

About the 2.4 release, how about setting a hard limit for the size of the bg aof buffer to prevent redis crash?
When it reaches the limit, we can free the buffer and kill the child process and warn the user about what had happened.
I think most users/admins don't like crashes.

Salvatore Sanfilippo
Owner

@jokea it is a good idea indeed, my only fear is that a few users will not notice the warning and get an huge AOF, but I guess we can't have everything at the same time, so seems like the best option indeed :) Thanks.

Geoff Garside geoffgarside referenced this issue from a commit June 07, 2012
Commit has since been removed from the repository and is no longer available.
Steffen Müller tsee referenced this issue from a commit in tsee/redis May 22, 2012
Salvatore Sanfilippo Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.

We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.

In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.

Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().

For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.

The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
a6fc07b
jokea jokea referenced this issue April 26, 2013
Closed

Redis crash #1056

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.