Serf spin causes 100% CPU usage #674
Comments
|
Looking at The socket comes from |
Looking at the apr code, I think we may be hitting a bug in The non- |
Current guess: |
What OS are you on? |
We encountered the same problem. Here are the stack of the nginx process
and the system call traces:
We are running |
Hi jeffkaufman,
|
Hi, I've just upgraded to nginx 1.10.1 with ngx_pagespeed 1.11.33.2 (using the packages from dotdeb.org) and it looks like I'm being affected by a bug very similar to this one. strace output is the same (repeated reads returning EAGAIN), and the gdb backtraces look like:
libapr1 is at version 1.5.1 (standard Debian Jessie package). Is there anything I can do to avoid this problem? Sorry if it's a different issue, if so let me know and I'll open a new one instead. |
That block in apr hasn't been updated since 2003, so a recent regression there seems unlikely (although it's possible that apr is still to blame). This does look like the same issue. How often does this happen, and can you reproduce it with any regularity? |
We only went live with this new version a couple of days ago, but I've seen it happen a couple of times so far. Our traffic peaks at around 300 reqs/sec currently, so it's not exactly that frequent, relatively speaking. I have a theory (as yet untested) that it happens when our failover system (using keepalived) triggers -- I actually noticed this happening because we'd just failed over and there were still nginx processes using CPU on the server that had just gone out of service. I suspect that the connection that pagespeed had open became invalid because the IP it had been using as its source address was no longer on the interface. |
Your theory seems plausible. One thing I can't tell for sure from your If it's in serf_url_async_fetcher.cc we might be able to work around it by -Josh On Fri, Jun 10, 2016 at 7:31 AM, crispygoth notifications@github.com
|
Thanks for the stack trace @crispygoth! It turns out that SerfFetch::ReadBody doesn't handle EAGAIN correctly and immediately tries again to read from a non-blocking socket. If there's no data available, like when a socket is hung after a fail-over, it will spin. I'm working on a fix. |
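For anyone following along, here is a rough illustration of the failure mode and the usual way around it. This is a Python sketch, not the SerfFetch::ReadBody code or the actual fix; the names `read_available` and `fetch_body` and the timeout value are made up for the example. The idea is that when a non-blocking read reports EAGAIN (surfaced in Python as `BlockingIOError`), the reader should go back to waiting on the socket rather than call read again right away.

```python
import selectors
import socket


def read_available(sock, chunks):
    """Drain whatever is readable right now from a non-blocking socket.

    Returns "again" when there is no more data for the moment (the EAGAIN
    case), or "eof" when the peer has closed the connection.
    """
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: stop and let the caller wait for readiness.
            # Retrying immediately at this point is what produces the spin.
            return "again"
        if not data:
            return "eof"
        chunks.append(data)


def fetch_body(sock, timeout=1.0):
    """Read until EOF, sleeping in select() whenever the socket is idle."""
    sock.setblocking(False)
    chunks = []
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while True:
            # Block here (no CPU use) until data arrives or the timeout hits.
            if not sel.select(timeout):
                raise TimeoutError("no data within %.2fs" % timeout)
            if read_available(sock, chunks) == "eof":
                return b"".join(chunks)
```

With a hung connection (for example after a fail-over removes the source IP), `fetch_body` would sit in `select()` and eventually time out instead of burning CPU on repeated reads that return EAGAIN.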
The fix for this was in the serf fetcher, which is in the core mod_pagespeed repo. I found a number of problems with the error handling and the fix will definitely be in our next release. If you want to patch it locally, here are links to the sequence of commits you'll need: apache/incubator-pagespeed-mod@9d76d90 |
The process stack: