slave crash -- Redis 2.4.8 crashed by signal: 11 #504
Thanks, I'll look into it ASAP.
Hello, could you please provide the following information:
Thanks,
Hey Salvatore! It was in EC2, on the largest memory host they have: High-Memory Quadruple Extra Large Instance, 68.4 GB of memory. And yes, it was the first time we've seen that. Cheers, cb On May 11, 2012, at 6:37 AM, Salvatore Sanfilippo wrote:
The same crash happened on one of our instances running on a real Linux box: [5816] 06 May 01:00:10 # === REDIS BUG REPORT START: Cut & paste starting from here ===
Thanks @jokea, this time the role is master, so I can disregard the possibility that this is slave specific. Could you please send me the redis-server executable? Thank you.
The previous report said: uptime_in_seconds:11629 (just in case you missed it). So yeah, 64 days and 0 days. :)
@chilts this suggests that the first instance has conditions that in theory make the error more likely to happen, but if you check what @cutter says, it's the first time this has happened... so I may really need a core dump, but I'll make sure to investigate very seriously in order to understand whether it is possible to fix this analytically.
Actually this instance is running our modified version, which builds Redis as a shared library and uses an outside The reason for this modification is that we are running a really large number of instances, and we need an easier The crash happened two weeks ago and I didn't report it because I thought it was caused by our modification.
Both instances were doing a bgrewriteaof when the crash happened: bgrewriteaof_in_progress:1 I think this may provide some clue.
@jokea definitely, there must be some kind of race there. Thanks.
One thing is sure... even without the executable my GCC version gives offsets that are more or less the same and the crash happens here:
This was pretty obvious, but it's good to be sure about this, since it was not completely impossible that the call to sdscatprintf() was somehow inlined. So now we need to understand, starting from the rewrite process, when it can happen that server.bgrewritebuf gets corrupted, set to NULL, or otherwise broken by some kind of race.
p.s. it's also worth noting that the call
I have an idea... this could be as simple as the fact that an SDS string can't hold more than 2GB of data. Both these instances are large, so it makes sense. I'm now simulating this myself to see the stack trace I get, but it should be almost identical.
That's the stack trace from my simulation; it really seems identical to the other ones:
Good job!
During the AOF rewrite process, the parent process needs to accumulate the new writes in an in-memory buffer: when the child terminates the AOF rewriting process, this buffer (that is, the difference between the dataset when the rewrite was started and the current dataset) is flushed to the new AOF file. We used to implement this buffer using an sds.c string, but sds.c has a 2GB limit. Sometimes the dataset can be big enough, the amount of writes so high, and the rewrite process slow enough that we overflow the 2GB limit, causing a crash, documented on GitHub by issue #504. In order to prevent this from happening, this commit introduces a new system to accumulate writes, implemented by a linked list of blocks of 10 MB each, so that we also avoid paying the reallocation cost. Note that theoretically modern operating systems may implement realloc() simply as a remapping of the old pages, thus with very good performance; see for instance the mremap() syscall on Linux. However this is not always true, and jemalloc by default avoids doing this because there are issues with the current implementation of mremap(). For this reason we are using a linked list of blocks instead of a single block that gets reallocated again and again. The changes in this commit lack testing, which will be performed before merging into the unstable branch. This fix will not enter 2.4 because it is too invasive. However 2.4 will log a warning when the AOF rewrite buffer is near the 2GB limit.
@cutter could you please provide some logs before the crash? My instance's I've a feeling that there's something wrong with the bgrewrite process and
Hi, yes. Coincidentally it started crashing again today, except it was crashing every 15 minutes or so, so much that we've had to turn off appendonly on the slave (note that we have upgraded to 2.4.10 in the meantime). You can see how much it hates EBS: May 25 01:02:00 redis5 redis[21372]: 1 clients connected (0 slaves), 23676998584 bytes in use
@jokea @cutter yes, I concur that there are two different problems here: one is the overflow due to the 2GB limit (which was fixed in 2.6 and the unstable branch). The other issue is: why is the rewrite process taking so long to complete? 15 minutes is OK if the disk is extremely slow, as in the EC2 EBS case, and that can be a plain overflow issue, but many hours is strange: either the process exited but was not caught by Redis's wait syscall, or something else.
p.s. it's possible that issue #507 is somewhat related to this one, but I'm not sure.
I think it has something to do with jemalloc, see tests here: https://gist.github.com/2786537
@jokea: you rock, see jemalloc 3.0.0 changelog:
p.s. if, while you're at it, you can try your test code against 3.0.0 to verify that it actually fixes the issue, I'd appreciate it :)
I've tried 3.0.0 and it fails too; the result is in the previous post as well.
oops... 3.0.0 still being broken is definitely not good news, but on the other hand, with your test code they should be able to fix jemalloc without too much effort. However, we have an alternative... if I understand correctly, what is happening in Redis is that the thread doing the allocation while forking is the one in bio.c. We can make sure in some way that it does not allocate at all while forking... I can't see other alternatives if 3.0.0 is still broken. Thanks a lot for this stuff, it's really useful.
That alternative totally works. Still, I'm not sure if I did the test right, since jemalloc-3.0.0 claimed to have fixed the deadlock.
We could try posting the issue to the jemalloc-discussion mailing list to see if they have any clue about it.
Looks like this is the best thing we can do for now. I'll do it right now.
Thanks!
While investigating issue #504 it was found that jemalloc may produce a deadlock in the child process when forking a parent with threads doing allocations at the same time. In order to make sure that the bio.c threads (the only ones currently in use within Redis) are not holding mutexes nor performing allocations, this commit modifies the code so that Redis waits for the count of pending jobs to reach zero before calling fork(). When this happens, all the threads should be blocked in pthread_cond_wait(), so blocked (not doing allocations) and not holding the lock. This commit is indirectly related to issue #504, where the deadlock is not the actual bug but the indirect cause, blocking the AOF rewrite child and allowing the AOF background rewrite buffer to go over 2GB, resulting in the server crash.
An update about this issue:
This is almost enough to close the issue, but there is still something missing: 2.4 does not produce a warning when near the 2GB limit (2.6 / unstable now log warnings when the bg AOF buffer grows too much). It should, so I'm keeping the issue open, but I'll fix it in the next few days.
It's great to finally locate the issue. About the 2.4 release, how about setting a hard limit on the size of the bg AOF buffer to prevent Redis from crashing?
@jokea it is a good idea indeed; my only fear is that a few users will not notice the warning and end up with a huge AOF, but I guess we can't have everything at the same time, so it seems like the best option indeed :) Thanks.
<ChangeLog> UPGRADE URGENCY: moderate if you use AOF, otherwise low. * [BUGFIX] Jemalloc updated to 3.0.0. This fixes a possible AOF rewrite issue. See redis/redis#504 for info. </ChangeLog>
Let me know if you need any more information that isn't included in the auto-generated report
May 8 22:14:26 redis2 redis[15908]: === REDIS BUG REPORT START: Cut & paste starting from here ===
May 8 22:14:26 redis2 redis[15908]: Redis 2.4.8 crashed by signal: 11
May 8 22:14:26 redis2 redis[15908]: Failed assertion: (:0)
May 8 22:14:26 redis2 redis[15908]: --- STACK TRACE
May 8 22:14:26 redis2 redis[15908]: /lib/x86_64-linux-gnu/libc.so.6(memcpy+0xe1) [0x7fc195226a41]
May 8 22:14:26 redis2 redis[15908]: /lib/x86_64-linux-gnu/libc.so.6(memcpy+0xe1) [0x7fc195226a41]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(sdscatlen+0x55) [0x411f15]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(feedAppendOnlyFile+0x12d) [0x42cb7d]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(call+0xd3) [0x40f613]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(execCommand+0x8d) [0x4310fd]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(call+0x34) [0x40f574]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(processCommand+0x23c) [0x410f2c]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(processInputBuffer+0x4f) [0x41917f]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(readQueryFromClient+0x8a) [0x41927a]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(aeProcessEvents+0x15d) [0x40c3bd]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(aeMain+0x2e) [0x40c72e]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server(main+0xf7) [0x411ae7]
May 8 22:14:26 redis2 redis[15908]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x7fc1951baeff]
May 8 22:14:26 redis2 redis[15908]: /usr/local/bin/redis-server() [0x40b6b9]
May 8 22:14:26 redis2 redis[15908]: --- INFO OUTPUT
May 8 22:14:26 redis2 redis[15908]: redis_version:2.4.8
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.5.2
process_id:15908
uptime_in_seconds:11629
uptime_in_days:0
lru_clock:1530950
used_cpu_sys:1754.67
used_cpu_user:4405.30
used_cpu_sys_children:268.87
used_cpu_user_children:3022.72
connected_clients:1
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:10411
blocked_clients:0
used_memory:6434505224
used_memory_human:5.99G
used_memory_rss:6761967616
used_memory_peak:6442464320
used_memory_peak_human:6.00G
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:1
changes_since_last_save:72470421
bgsave_in_progress:0
last_save_time:1336514845
bgrewriteaof_in_progress:1
total_connections_received:64
total_commands_processed:659188863
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1384839
vm_enabled:0
role:slave
aof_current_size:9883656677
aof_base_size:3006068095
aof_pending_rewrite:0
aof_buffer_length:191276
aof_pending_bio_fsync:1
master_host:XXXXXX
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
db0:keys=95007,expires=0
May 8 22:14:26 redis2 redis[15908]: --- CLIENT LIST OUTPUT
May 8 22:14:26 redis2 redis[15908]: addr=X.X.x.X:6379 fd=7 idle=0 flags=Mx db=0 sub=0 psub=0 qbuf=10411 obl=2803 oll=0 events=rw cmd=exec
May 8 22:14:26 redis2 redis[15908]: --- CURRENT CLIENT INFO
May 8 22:14:26 redis2 redis[15908]: client: addr=X.X.X.X:6379 fd=7 idle=0 flags=Mx db=0 sub=0 psub=0 qbuf=10411 obl=2803 oll=0 events=rw cmd=exec
May 8 22:14:26 redis2 redis[15908]: argv[0]: 'ZADD'
May 8 22:14:26 redis2 redis[15908]: argv[1]: 'magp:f:u:139061:n'
May 8 22:14:26 redis2 redis[15908]: argv[2]: '1.336514862E9'
May 8 22:14:26 redis2 redis[15908]: argv[3]: '{"aid":165952,"t":"user_followed","fid":92289}'
May 8 22:14:26 redis2 redis[15908]: key 'magp:f:u:139061:n' found in DB containing the following object:
May 8 22:14:26 redis2 redis[15908]: Object type: 3
May 8 22:14:26 redis2 redis[15908]: Object encoding: 7
May 8 22:14:26 redis2 redis[15908]: Object refcount: 1
May 8 22:14:26 redis2 redis[15908]: Sorted set size: 574
May 8 22:14:26 redis2 redis[15908]: Skiplist level: 7
May 8 22:14:26 redis2 redis[15908]: === REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash opening an issue on github:
http://github.com/antirez/redis/issues
May 8 22:16:33 redis2 init: redis-server main process (15907) terminated with status 139