
nginx worker 100% cpu usage (spinning on write returning EAGAIN) #1380

Open
oschaaf opened this issue Feb 12, 2017 · 13 comments
oschaaf (Member) commented Feb 12, 2017

Reported via https://groups.google.com/forum/#!topic/ngx-pagespeed-discuss/yrEMXV2m-Ig

root@xx:~# lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty
root@xx# uname -r
3.19.0-79-generic
root@xx:~# nginx -V
nginx version: nginx/1.10.3
built by gcc 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.1)
built with OpenSSL 1.0.2j  26 Sep 2016
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' --add-module=/root/ngx_pagespeed-release-1.11.33.4-beta --with-ipv6
(gdb) bt
#0  0x00007fe4e9a4135d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00000000004c1d92 in net_instaweb::NgxEventConnection::WriteEvent (this=0x20d1100, type=<optimized out>, sender=sender@entry=0x21e8a30) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_event_connection.cc:142
#2  0x00000000004c124c in net_instaweb::NgxBaseFetch::RequestCollection (this=0x21e8a30, type=<optimized out>) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_base_fetch.cc:290
#3  0x00000000004c12ae in RequestCollection (type=70 'F', this=<optimized out>) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_base_fetch.cc:317
#4  net_instaweb::NgxBaseFetch::HandleFlush (this=<optimized out>, handler=<optimized out>) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_base_fetch.cc:318
#5  0x00000000004c6fc2 in net_instaweb::(anonymous namespace)::ps_send_to_pagespeed (in=in@entry=0x2228470, cfg_s=<optimized out>, ctx=<optimized out>, ctx=<optimized out>, r=<optimized out>)
    at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_pagespeed.cc:2087
#6  0x00000000004c7147 in net_instaweb::(anonymous namespace)::html_rewrite::ps_html_rewrite_body_filter (r=<optimized out>, in=0x2228470) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_pagespeed.cc:2309
#7  0x0000000000481753 in ngx_http_ssi_body_filter (r=0x21dd6e0, in=<optimized out>) at src/http/modules/ngx_http_ssi_filter_module.c:447
#8  0x00000000004846b7 in ngx_http_charset_body_filter (r=0x21dd6e0, in=0x2228470) at src/http/modules/ngx_http_charset_filter_module.c:647
#9  0x0000000000486787 in ngx_http_addition_body_filter (r=0x21dd6e0, in=0x2228470) at src/http/modules/ngx_http_addition_filter_module.c:166
#10 0x0000000000486dfc in ngx_http_gunzip_body_filter (r=0x21dd6e0, in=0x2228470) at src/http/modules/ngx_http_gunzip_filter_module.c:185
#11 0x000000000042dae7 in ngx_output_chain (ctx=ctx@entry=0x2214f70, in=in@entry=0x21cf640) at src/core/ngx_output_chain.c:214
#12 0x0000000000488d5c in ngx_http_copy_filter (r=0x21dd6e0, in=0x21cf640) at src/http/ngx_http_copy_filter_module.c:152
#13 0x000000000045c88b in ngx_http_output_filter (r=r@entry=0x21dd6e0, in=<optimized out>) at src/http/ngx_http_core_module.c:1970
#14 0x0000000000470489 in ngx_http_upstream_output_filter (data=0x21dd6e0, chain=<optimized out>) at src/http/ngx_http_upstream.c:3587
#15 0x00000000004479c3 in ngx_event_pipe_write_to_downstream (p=0x21df188) at src/event/ngx_event_pipe.c:690
#16 ngx_event_pipe (p=0x21df188, do_write=do_write@entry=0) at src/event/ngx_event_pipe.c:33
#17 0x0000000000470c7b in ngx_http_upstream_process_upstream (r=0x21dd6e0, u=0x21ded10) at src/http/ngx_http_upstream.c:3727
#18 0x000000000046f639 in ngx_http_upstream_handler (ev=<optimized out>) at src/http/ngx_http_upstream.c:1117
#19 0x000000000044d857 in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:822
#20 0x0000000000445b13 in ngx_process_events_and_timers (cycle=cycle@entry=0x1f1c390) at src/event/ngx_event.c:242
#21 0x000000000044ba71 in ngx_worker_process_cycle (cycle=cycle@entry=0x1f1c390, data=data@entry=0x16) at src/os/unix/ngx_process_cycle.c:753
#22 0x000000000044a544 in ngx_spawn_process (cycle=cycle@entry=0x1f1c390, proc=proc@entry=0x44b9f0 <ngx_worker_process_cycle>, data=data@entry=0x16, name=name@entry=0xa6a72f "worker process", respawn=respawn@entry=-3)
    at src/os/unix/ngx_process.c:198
#23 0x000000000044bc34 in ngx_start_worker_processes (cycle=cycle@entry=0x1f1c390, n=32, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:358
#24 0x000000000044c638 in ngx_master_process_cycle (cycle=cycle@entry=0x1f1c390) at src/os/unix/ngx_process_cycle.c:130
#25 0x0000000000426779 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
I forgot to add the output of strace, which is basically just:

write(64, "F\0\0\0\0\0\0\0\220\231y\2\0\0\0\0\260AJ\2\0\0\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)

repeated indefinitely.
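For anyone unfamiliar with the failure mode: once the kernel's pipe buffer is full, write() on a non-blocking pipe fails with EAGAIN, and retrying immediately without waiting for the reader to drain the pipe is exactly the busy loop above. A minimal stand-alone illustration (plain C, not ngx_pagespeed code):

```c
/* Fill a non-blocking pipe until write() fails, and report the errno.
 * Spinning on the failed write without waiting for the reader to drain
 * the pipe is the 100%-CPU loop shown in the strace above. */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Creates a pipe, makes the write end non-blocking, writes until the
 * kernel buffer (64 KiB by default on Linux) is full, and returns the
 * errno of the first failed write -- EAGAIN. */
int fill_pipe_until_eagain(void) {
    int fds[2];
    char buf[4096];
    ssize_t n;
    int err;

    if (pipe(fds) != 0) return errno;
    fcntl(fds[1], F_SETFL, O_NONBLOCK);  /* writer must not block */
    memset(buf, 0, sizeof buf);

    do {
        n = write(fds[1], buf, sizeof buf);
    } while (n > 0);

    err = errno;  /* EAGAIN: buffer full, try again later */
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

The fix space is then either making the buffer bigger (less likely to fill) or buffering and retrying the write later, both of which come up below in the thread.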
oschaaf (Member, Author) commented Feb 12, 2017

I suspect the SSI module may also be a required element to reproduce this.

As a quick workaround, increasing the send buffer size for the pipe used to communicate between nginx and ngx_pagespeed may lessen or even eliminate the problem.
But a better solution would probably be a mechanism that reschedules events for sending later when we cannot send them right away (because write returned EAGAIN).

urifoox commented Apr 28, 2017

Hi - just wondering if there are any plans to fix this? We have mitigated it in our environment by setting up monitoring and manually fixing the locked threads when detected, but as you can imagine that is not ideal. Disabling SSI did not solve this.

oschaaf (Member, Author) commented Apr 29, 2017

@urifoox so what makes this hard to fix is that I can't reproduce it myself.

If my hunch is correct, bumping the buffer size of the pipe used to communicate between nginx and pagespeed would at least lessen and maybe in practice even eliminate the problem. The following patch does that: https://gist.github.com/oschaaf/2382c735e29f4c960b1e3ca1dacc22fd

If that works, we can:

  • at least confirm that the problem is diagnosed correctly
  • maybe merge the patch, as that would be a quick improvement at least, and
  • land a proper fix later on

So it would be very helpful if you could try that patch.

urifoox commented Sep 24, 2017

oschaaf added a commit that referenced this issue Sep 25, 2017
Bump the pipe capacity, because running out of buffer space may cause
a write to spin indefinitely on EAGAIN.
Bumping the pipe capacity should eliminate the problem in practice,
though in theory the module could still be subject to it.
For now, leaving behind a todo with a suggested solution (should
the problem ever show up again).

Fixes #1380
oschaaf added a second commit with the same message that referenced this issue Sep 29, 2017

Fixes #1380
jmarantz (Contributor) commented Nov 27, 2017

Re-opening as this fix depends on a kernel not everyone has.

jmarantz reopened this Nov 27, 2017
oschaaf (Member, Author) commented Nov 27, 2017

@jmarantz It wouldn't win a beauty prize, but perhaps round-robin assignment of base fetches over a configurable number of pipes, instead of a single one, would be sufficient (and simple). Wdyt?
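For illustration, the round-robin idea could look something like this sketch (names like init_pipe_pool and assign_pipe_fd are hypothetical, not the ngx_pagespeed API). A full pipe would then stall only 1/NUM_PIPES of the in-flight fetches:

```c
/* Hypothetical sketch: a small pool of event pipes instead of a single
 * one, with each base fetch assigned the next pipe in turn. */
#include <unistd.h>

#define NUM_PIPES 4

static int read_fds[NUM_PIPES];   /* would be registered with the nginx event loop */
static int write_fds[NUM_PIPES];
static int next_pipe = 0;

/* Creates the pool of pipes; returns 0 on success, -1 on failure. */
int init_pipe_pool(void) {
    for (int i = 0; i < NUM_PIPES; i++) {
        int fds[2];
        if (pipe(fds) != 0) return -1;
        read_fds[i] = fds[0];
        write_fds[i] = fds[1];
    }
    return 0;
}

/* Round-robin assignment: each new base fetch gets the next write fd. */
int assign_pipe_fd(void) {
    int fd = write_fds[next_pipe];
    next_pipe = (next_pipe + 1) % NUM_PIPES;
    return fd;
}
```

This spreads load but does not remove the underlying spin; a pathological burst could still fill every pipe in the pool.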

zz9pzza commented Jan 19, 2021

We believe we are hitting this too.

http://mailman.nginx.org/pipermail/nginx/2021-January/060344.html
http://mailman.nginx.org/pipermail/nginx/2021-January/060346.html

pagespeed saves us a lot of bandwidth.

It seems to me that in our case nothing is being read from the pipe at all?

urifoox commented Jan 19, 2021

@zz9pzza - did you try to apply the patch / recompile? #1481

zz9pzza commented Jan 19, 2021

james_@DESKTOP-OTOBGUP:~$ curl -I https://archiveofourown.org/
HTTP/2 200
server: nginx/1.19.6
....
x-page-speed: 1.13.35.2-0

We are running 1.13.35.2, which I believe includes that patch, on kernel 5.9.0-0.bpo.2-amd64.

oschaaf (Member, Author) commented Jan 19, 2021

@zz9pzza seems relevant to this issue, sure. Apparently the tweak that bumps the buffer sizes associated with the pipe didn't help in your case, so it sounds like a proper fix for the TODO over at https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_event_connection.cc#L157 is necessary.

It's been a while, so my memory isn't crystal clear, but I think that when NgxEventConnection runs into the situation where it repeatedly cannot write, two changes are needed:

  • A queue to buffer writes (ordering is important)
  • A timer, to ensure processing events in the flow when there are no subsequent i/o wakeups

The code that now writes to the pipe should first check the queue and process any buffered writes, then proceed as normal (possibly disarming the timer if the queue was fully drained).
Similarly, the new code associated with the timer event should also process buffered-up writes.

Unfortunately I don't think I will have time to make this happen soon, but maybe the braindump above will inspire someone to write a fix, or else it might serve as a note to future me :-)
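A rough shape of the queue-and-timer design described in that comment (the record layout and names like drain_queue/write_event are illustrative assumptions, not the actual NgxEventConnection code):

```c
/* Sketch: a FIFO of pending event records, a drain step that preserves
 * ordering, and a write path that buffers on EAGAIN. A periodic timer
 * (not shown) would call drain_queue() when no i/o wakeup arrives. */
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REC_SIZE 24  /* one fixed-size event record, as in the strace */

struct pending {
    char data[REC_SIZE];
    struct pending *next;
};

static struct pending *head = NULL, *tail = NULL;  /* FIFO: order matters */

static void enqueue(const char *rec) {
    struct pending *p = malloc(sizeof *p);
    memcpy(p->data, rec, REC_SIZE);
    p->next = NULL;
    if (tail) tail->next = p; else head = p;
    tail = p;
}

/* Flush queued records in order; stop at the first failed write.
 * Returns 1 if the queue is fully drained (timer can be disarmed). */
int drain_queue(int fd) {
    while (head) {
        if (write(fd, head->data, REC_SIZE) != REC_SIZE) return 0;
        struct pending *p = head;
        head = p->next;
        if (!head) tail = NULL;
        free(p);
    }
    return 1;
}

/* Write path: drain buffered records first so ordering is preserved,
 * then write the new record, buffering it on EAGAIN instead of spinning. */
int write_event(int fd, const char *rec) {
    if (!drain_queue(fd) || write(fd, rec, REC_SIZE) != REC_SIZE) {
        if (errno == EAGAIN) { enqueue(rec); return 0; }
        return -1;
    }
    return 1;
}
```

Since each record is well under PIPE_BUF, pipe writes are all-or-nothing, so the sketch doesn't need to handle partial writes.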

zz9pzza commented Jan 21, 2021

Would just raising the number in https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_event_connection.cc#L64 help (not as a solution for everyone, just to make it less likely)?

oschaaf (Member, Author) commented Jan 21, 2021

Would just raising the number in https://github.com/apache/incubator-pagespeed-ngx/blob/master/src/ngx_event_connection.cc#L64 help (not as a solution for everyone, just to make it less likely)?

Well, it's worth trying, and the change is probably trivial, but I wouldn't bet on it :(

zz9pzza commented Jan 31, 2021

I bumped the number from 200*1024 to 1024*1024 (diff shown with the modified file on the left):

61c61
<     fcntl(pipe_write_fd_, F_SETPIPE_SZ, 1024*1024 /* minimal amount of bytes */);
---
>     fcntl(pipe_write_fd_, F_SETPIPE_SZ, 200*1024 /* minimal amount of bytes */);

Across 4 servers with the 200*1024 size we hit the issue 17 times in a day with 57 million page views; the next week, with 58 million page views, the same servers with the new code hit it 6 times.
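For reference, the fcntl calls in the diff above can be exercised stand-alone. This Linux-only sketch (the helper name resize_pipe is mine) requests a new capacity and reads back what the kernel actually granted; the kernel rounds requests up to a power of two, and unprivileged processes are capped by /proc/sys/fs/pipe-max-size (1 MiB by default), which is why 1024*1024 is about the practical ceiling without also raising that sysctl:

```c
/* Resize a pipe with F_SETPIPE_SZ and read the resulting capacity
 * back with F_GETPIPE_SZ. Linux-only (kernel >= 2.6.35); requires
 * _GNU_SOURCE for the F_*PIPE_SZ constants. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Requests a capacity of `bytes` for the pipe and returns the
 * capacity actually granted, or -1 on failure. */
long resize_pipe(int write_fd, int bytes) {
    if (fcntl(write_fd, F_SETPIPE_SZ, bytes) < 0) return -1;
    return fcntl(write_fd, F_GETPIPE_SZ);
}
```

A quick way to sanity-check a deployed mitigation is to call F_GETPIPE_SZ on the running worker's pipe fd (e.g. via a gdb session) and confirm the bumped size actually took effect.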
