Redis 2.4.16 crashed by signal: 7 #733

Closed
bfanti opened this Issue · 9 comments

3 participants

@bfanti

Log dump with partial stack trace and info pasted below.
Please let me know if you need any other information about the environment.
Totally clueless as to what could have caused this.
Thanks!

[30010] 26 Oct 04:26:04 * DB saved on disk
[19869] 26 Oct 04:26:05 * Background saving terminated with success
[19869] 26 Oct 04:30:29 # === REDIS BUG REPORT START: Cut & paste starting from here ===
[19869] 26 Oct 04:30:29 # Redis 2.4.16 crashed by signal: 7
[19869] 26 Oct 04:30:29 # Failed assertion: (:0)
[19869] 26 Oct 04:30:29 # --- STACK TRACE
[19869] 26 Oct 04:30:29 # /lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x33) [0x7f708f1bbb53]
[19869] 26 Oct 04:30:29 # /lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x33) [0x7f708f1bbb53]
[19869] 26 Oct 04:30:29 # /usr/local/bin/redis-server(aeProcessEvents+0x66) [0x40c8f6]
[19869] 26 Oct 04:30:29 # /usr/local/bin/redis-server(aeMain+0x2e) [0x40cc4e]
[19869] 26 Oct 04:30:29 # /usr/local/bin/redis-server(main+0x14e) [0x40bbae]
[19869] 26 Oct 04:30:29 # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f708f0ea76d]
[19869] 26 Oct 04:30:29 # /usr/local/bin/redis-server() [0x40bd19]
[19869] 26 Oct 04:30:29 # --- INFO OUTPUT
[19869] 26 Oct 04:30:29 # redis_version:2.4.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.6.3
process_id:19869
uptime_in_seconds:5973455
uptime_in_days:69
lru_clock:904854
used_cpu_sys:134937.92
used_cpu_user:15677.37
used_cpu_sys_children:164864.86
used_cpu_user_children:766204.81
connected_clients:652
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:7323584792
used_memory_human:6.82G
used_memory_rss:7183241216
used_memory_peak:7545377128
used_memory_peak_human:7.03G
mem_fragmentation_ratio:0.98
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:0
changes_since_last_save:0
bgsave_in_progress:0
last_save_time:1351225565
bgrewriteaof_in_progress:0
total_connections_received:263682
total_commands_processed:515894950
expired_keys:685610
evicted_keys:6243392
keyspace_hits:40612362
keyspace_misses:89092494
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:2651518
vm_enabled:0
role:master
db0:keys=49694,expires=33044

[19869] 26 Oct 04:30:29 # --- CLIENT LIST OUTPUT
[19869] 26 Oct 04:30:29 # addr=67.192.172.206:64232 fd=798 idle=67145 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64233 fd=799 idle=66447 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64314 fd=800 idle=66447 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64315 fd=801 idle=66447 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64342 fd=802 idle=66447 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64347 fd=803 idle=66447 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64395 fd=804 idle=66447 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64396 fd=805 idle=66426 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64477 fd=806 idle=66426 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64478 fd=807 idle=66426 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64610 fd=808 idle=66426 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64611 fd=809 idle=66416 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64622 fd=810 idle=66416 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64623 fd=811 idle=66416 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64752 fd=815 idle=66089 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64847 fd=816 idle=66089 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64848 fd=817 idle=66089 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64849 fd=818 idle=66089 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64850 fd=819 idle=65545 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64851 fd=820 idle=65545 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64852 fd=821 idle=65545 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64943 fd=822 idle=65545 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64944 fd=823 idle=64867 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64945 fd=824 idle=64867 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64946 fd=825 idle=64867 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64947 fd=826 idle=64867 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:64948 fd=827 idle=64853 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:64949 fd=828 idle=64853 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:65200 fd=829 idle=64852 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:65201 fd=830 idle=64852 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:49292 fd=831 idle=64698 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:49293 fd=832 idle=64698 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:49294 fd=833 idle=64697 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:49295 fd=834 idle=64697 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=expire
addr=67.192.172.206:49299 fd=835 idle=64697 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:49300 fd=836 idle=64697 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.172.206:49304 fd=837 idle=64697 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=setex
addr=67.192.172.206:49329 fd=838 idle=64131 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=get
addr=67.192.17
[19869] 26 Oct 04:30:29 # === REDIS BUG REPORT END. Make sure to include from START to END. ===

   Please report the crash opening an issue on github:

       http://github.com/antirez/redis/issues

Suspect RAM error? Use redis-server --test-memory to veryfy it.

[8797] 26 Oct 04:35:47 * Server started, Redis version 2.4.16
[8797] 26 Oct 04:36:47 * DB loaded from disk: 60 seconds
[8797] 26 Oct 04:36:47 * The server is now ready to accept connections on port 6379
[8797] 26 Oct 04:50:48 * 1 changes in 900 seconds. Saving...
[8797] 26 Oct 04:50:50 * Background saving started by pid 9591

@antirez
Owner

Hello, could you please report the uname -a output for that system?

@bfanti

Good call, apologies, here it is:

Linux liveredis 3.2.0-24-virtual #37-Ubuntu SMP Wed Apr 25 10:17:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

@antirez
Owner

Thank you,

I'm a bit puzzled by the fact that, after two months of high traffic, the process seems to have crashed while calling a syscall:

[19869] 26 Oct 04:30:29 # /lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x33) [0x7f708f1bbb53]

The only thing that could be somewhat related is that 2.4 had a few limitations on the maximum number of clients it can handle correctly, but here I see we are still pretty low compared to the limit.

I see that there are 652 connected clients, was that the norm or a spike? Thank you.

@bfanti

ulimit -n is set to 10240, and the connection limit in our config is set to 5000 for safety reasons. Daily operations run at around 650 connected clients, with spikes up to 1250 and rarely any higher than that. We have had sustained client counts in the 1200s without issues.
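
The headroom between the descriptor limit and the client limit quoted above can be checked with a few lines of Python. This is only a minimal sketch: `fd_headroom` and its `reserved` allowance are hypothetical names and values chosen for illustration, not something taken from Redis or from this thread.

```python
import resource


def fd_headroom(max_clients, nofile_limit, reserved=32):
    """Return how many descriptors are left once max_clients are connected.

    `reserved` is an assumed allowance for the server's own files and
    listening sockets; it is an illustrative default, not a measured value.
    """
    return nofile_limit - reserved - max_clients


if __name__ == "__main__":
    # Limit of the process running this script (Unix only).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft RLIMIT_NOFILE for this process:", soft)
    # Figures quoted in this thread: ulimit -n 10240, client limit 5000.
    print("headroom with the thread's settings:", fd_headroom(5000, 10240))
```

Note that `ulimit -n` in a shell describes that shell, not necessarily the running redis-server; on Linux the limits of a live process can be read from `/proc/<pid>/limits`.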

We did roll out an update to the ServiceStack C# Redis driver to production yesterday at 5pm, which could have caused issues later in the night; we're looking into it on our end. All 650-1250 connections are generated by the different app pools on our primary and secondary App servers via the C# driver.

Another thing to note: the box has ~8GB of RAM, we have configured a hard limit of 7GB in the redis.config, and the box is generally in the 99% RAM usage range. This is a dedicated Redis box in the RackSpace cloud.
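
To put the memory figures from the INFO dump next to the configured limit, a small parser is enough. The sketch below assumes that "7GB" means 7 GiB (the exact maxmemory value is not shown in this thread); `parse_info` and `memory_summary` are hypothetical helper names, and the sample values are copied from the crash report.

```python
SAMPLE = """\
used_memory:7323584792
used_memory_rss:7183241216
used_memory_peak:7545377128
mem_fragmentation_ratio:0.98
"""


def parse_info(text):
    """Parse 'key:value' lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()  # also drops a trailing carriage return
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_summary(fields, maxmemory_bytes):
    """Relate used_memory to the limit and RSS to used_memory."""
    used = int(fields["used_memory"])
    rss = int(fields["used_memory_rss"])
    return {
        "used_fraction_of_limit": used / maxmemory_bytes,
        "rss_over_used": rss / used,
    }


GIB = 1024 ** 3
# Assumption: the 7GB hard limit is 7 GiB.
summary = memory_summary(parse_info(SAMPLE), 7 * GIB)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Under that assumption the instance was at roughly 97% of its limit at crash time, and the RSS-to-used ratio matches the reported mem_fragmentation_ratio of 0.98.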

Thanks for looking into this!

@antirez
Owner

Thanks, that's a lot of useful info.

I would rule out the number of clients as the cause: we were at about 650 at the time of the crash, while everything usually runs fine at 1250.

If it is the C# driver upgrade, it is still a bug in Redis, but that seems highly unlikely as well, because the stack trace shows that we are inside a system call rather than in the protocol parsing code or the like.

Is it possible that the OOM killer killed the process? That would not be compatible with the signal the process received, though.
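
The mismatch is easy to see by mapping signal numbers to names: the Linux OOM killer terminates processes with SIGKILL (9), while on Linux x86_64 signal 7 is SIGBUS. A minimal sketch follows; `describe_signal` is a hypothetical helper, and since signal numbering is platform-specific the output for 7 differs on other systems.

```python
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Number is not a known signal on this platform.
        return "unknown"


if __name__ == "__main__":
    # The crash report says "crashed by signal: 7".
    print(7, describe_signal(7))
    # The OOM killer uses SIGKILL, which cannot be caught or logged by Redis.
    print(9, describe_signal(9))
```

Because SIGKILL cannot be caught, a process killed by the OOM killer would not have been able to write a bug report to its log at all.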

Did you try a memory test against the box? ./redis-server --test-memory is a possibility.

It's a very strange issue, and since the stack trace is strange I don't really have any good hints.
If it happens again, I would appreciate it if you could post an update here.

In general, for anything new you want to discuss, I'm here to help track down the problem as more data points become available. However, there is no known issue like this in this Redis release, so it is certainly not a documented / known bug.

Thanks!

@anydot
@antirez
Owner

This can happen if there is some glibc preprocessing, but if it is a pure wrapper, what should happen is that the function call returns EFAULT. That's why I'm puzzled.

@antirez
Owner

P.S. With 2.6 we get a register and stack dump, which is much more useful for investigating this kind of strange issue.

@bfanti

Grazie Salvatore!

We're looking into upgrading to 2.6 shortly.
I ran the redis-server memory test immediately after the crash and it seemed to work properly, but I'm seeing other weird issues on that box right now (we're suddenly using swap space and I can't determine why), which could possibly point to damaged RAM on that machine. Will report back asap!

@antirez antirez closed this