nginx worker 100% cpu usage (spinning on write returning EAGAIN) #1380
Comments
I suspect the SSI module may also be required to reproduce this. As a quick workaround, increasing the send buffer size of the pipe used to communicate between nginx and ngx_pagespeed may lessen or even eliminate the problem.
Hi - just wondering if there are any plans to fix this? We have mitigated it in our environment by setting up monitoring and manually fixing the locked threads when detected, but as you can imagine that is not ideal. Disabling SSI did not solve this.
@urifoox What makes this hard to fix is that I can't reproduce it myself. If my hunch is correct, bumping the buffer size of the pipe used to communicate between nginx and pagespeed would at least lessen, and maybe in practice even eliminate, the problem. The following patch does that: https://gist.github.com/oschaaf/2382c735e29f4c960b1e3ca1dacc22fd If that works, we can:
So it would be very helpful if you could try that patch.
@oschaaf - For thoroughness, even though it is discussed at https://groups.google.com/forum/#!searchin/ngx-pagespeed-discuss/uri$20foox%7Csort:relevance/ngx-pagespeed-discuss/yrEMXV2m-Ig/r9PMBzPPCQAJ: this patch has resolved our issue.
Bump the pipe capacity, because running out of buffer space may cause a write to spin indefinitely on EAGAIN. Bumping the pipe capacity should eliminate the problem in practice, though in theory the module could still be subject to it. For now, leaving behind a todo with a suggested solution (should the problem ever show up again). Fixes #1380
Re-opening, as this fix depends on a kernel feature not everyone has.
@jmarantz It wouldn't win a beauty prize, but perhaps round-robin assignment of base fetches over a configurable number of pipes, instead of a single one, would be sufficient (and simple). Wdyt?
We think we are hitting this: http://mailman.nginx.org/pipermail/nginx/2021-January/060344.html. Pagespeed saves us a lot of bandwidth. It seems to me that in our situation nothing is being read from the pipe?
We are running 1.13.35.2, which I believe includes that patch, on kernel 5.9.0-0.bpo.2-amd64.
@zz9pzza Seems relevant to this issue, sure. Apparently the tweak to bump the buffer sizes associated with the pipe didn't help in your case, so it sounds like a proper fix for the TODO over at https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_event_connection.cc#L157 is necessary. It's been a while, so my memory isn't crystal clear, but I think the fix would be: when a write fails with EAGAIN, queue the message and arm a timer to retry, instead of spinning on the write.
The code that now writes to the pipe should first check the queue and process any buffered writes, and then proceed as normal (though possibly disarming the timer if the queue was fully drained). Unfortunately I don't think I will have time soon to make this happen, but maybe the braindump above will enthuse someone to write a fix, or else it might serve as a note to future me :-)
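A minimal sketch of that shape (hypothetical names, not the actual `ngx_event_connection` code; in nginx the retry would be driven by an event timer, so `RetryLater()` is just a placeholder hook):

```cpp
#include <cerrno>
#include <deque>
#include <string>
#include <unistd.h>

// Sketch of the proposed fix: instead of spinning when write(2) returns
// EAGAIN, park the message in a queue and retry later. New writes drain
// the backlog first so ordering is preserved.
class PipeWriter {
 public:
  explicit PipeWriter(int fd) : fd_(fd) {}

  // Enqueue `msg` and try to flush; returns true if everything
  // (including any backlog) went out, false if bytes remain queued.
  bool Write(const std::string& msg) {
    queue_.push_back(msg);
    return Drain();
  }

  // Flush queued messages in order; stop at the first EAGAIN.
  bool Drain() {
    while (!queue_.empty()) {
      std::string& head = queue_.front();
      ssize_t n = write(fd_, head.data(), head.size());
      if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
          RetryLater();  // arm a timer; do NOT spin on the write
          return false;
        }
        queue_.pop_front();  // real error: real code would log/abort here
      } else if (static_cast<size_t>(n) < head.size()) {
        head.erase(0, n);  // partial write: keep the unwritten tail queued
      } else {
        queue_.pop_front();
      }
    }
    return true;  // caller could disarm the retry timer here
  }

 private:
  void RetryLater() {}  // placeholder for ngx_add_timer-style scheduling
  int fd_;
  std::deque<std::string> queue_;
};
```

The key property is that a full pipe costs one failed `write()` plus a timer, rather than a busy loop pinning the worker at 100% CPU.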
Would just raising the number at https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_event_connection.cc#L64 help (not as a solution for everyone, just to make it less likely)?
Well, it's worth trying, and the change is probably trivial, but I wouldn't bet on it :(
I bumped the number from
And over 4 servers with 200, we hit the issue 17 times on a day with 57 million page views. The next week was 58 million, and the same servers with the new code hit the issue 6 times.
Reported via https://groups.google.com/forum/#!topic/ngx-pagespeed-discuss/yrEMXV2m-Ig