New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker process exited on signal 11 (core dump) #1514

Closed
discostur opened this Issue Dec 5, 2017 · 10 comments

Comments

Projects
None yet
4 participants
@discostur
Copy link

discostur commented Dec 5, 2017

Hi,

seems like i found a problem mit pagespeed / nginx. I configured nginx with debug mode and have some core dumps:

Nginx version:

nginx version: nginx/1.13.7
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) 
built with OpenSSL 1.1.0g  2 Nov 2017
TLS SNI support enabled
configure arguments: --user=nginx --group=nginx --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --with-poll_module --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_perl_module --with-stream --with-stream_ssl_module --with-stream_realip_module --with-stream_geoip_module --with-threads --with-libatomic --with-pcre-jit --with-pcre --with-ipv6 --with-http_slice_module --with-stream_ssl_preread_module --with-http_sub_module --with-debug --add-module=/home/abuild/rpmbuild/BUILD/nginx-1.13.7/naxsi-0.55.3/naxsi_src --with-ld-opt='-Wl,-z,relro,-z,now -Wl,-rpath -Wl,-Bsymbolic-functions -Wl,--as-needed -fPIE' --with-cc-opt='-g -Ofast -march=native -ffast-math -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2' --with-openssl=/home/abuild/rpmbuild/BUILD/nginx-1.13.7/openssl-1.1.0g --with-openssl-opt=enable-ec_nistp_64_gcc_128 --with-file-aio --add-module=/home/abuild/rpmbuild/BUILD/nginx-1.13.7/Nginx-limit-traffic-rate-module --add-module=/home/abuild/rpmbuild/BUILD/nginx-1.13.7/ngx_cache_purge-2.3 --add-module=/home/abuild/rpmbuild/BUILD/nginx-1.13.7/ngx_pagespeed-1.13.35.1-beta --with-threads --with-mail --with-mail_ssl_module

Core Dump:

(gdb) bt full
#0  0x0000000000916c26 in net_instaweb::ResourceSlot::LastContext() const ()
No symbol table info available.
#1  0x000000000092119a in net_instaweb::RewriteContext::AddSlot(net_instaweb::RefCountedPtr<net_instaweb::ResourceSlot> const&) ()
No symbol table info available.
#2  0x00000000008dbdfa in net_instaweb::CacheExtender::StartElementImpl(net_instaweb::HtmlElement*) ()
No symbol table info available.
#3  0x00000000006e3a3b in net_instaweb::HtmlParse::ApplyFilter(net_instaweb::HtmlFilter*) ()
No symbol table info available.
#4  0x00000000005c64ca in net_instaweb::RewriteDriver::FlushAsync(net_instaweb::Function*) ()
No symbol table info available.
#5  0x00000000005a3826 in net_instaweb::ProxyFetch::ExecuteQueued() ()
No symbol table info available.
#6  0x0000000000683617 in net_instaweb::Function::CallRun() ()
No symbol table info available.
#7  0x000000000074b910 in net_instaweb::QueuedWorkerPool::Run(net_instaweb::QueuedWorkerPool::Sequence*, net_instaweb::QueuedWorker*) ()
No symbol table info available.
#8  0x0000000000683617 in net_instaweb::Function::CallRun() ()
No symbol table info available.
#9  0x000000000074e9f7 in net_instaweb::Worker::WorkThread::Run() ()
No symbol table info available.
#10 0x00000000007508b2 in net_instaweb::PthreadThreadImpl::InvokeRun(void*) ()
No symbol table info available.
#11 0x00007f536ffe3e25 in start_thread (arg=0x7f536970a700) at pthread_create.c:308
        __res = <optimized out>
        pd = 0x7f536970a700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139996228003584, -3917339975527390799, 0, 139996228004288, 139996228003584, 6, 3964296789653234097, 3964311097298297265}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
              canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#12 0x00007f536de6c34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
No locals.

Debug log before crash:

2017/12/05 11:20:47 [debug] 760#760: *5359 xslt filter body
2017/12/05 11:20:47 [debug] 760#760: *5359 http postpone filter "/?" 000000001151C750
2017/12/05 11:20:47 [debug] 760#760: *5359 http pagespeed html rewrite body filter "/"
2017/12/05 11:20:47 [debug] 760#760: *5359 http pagespeed write filter "/"
2017/12/05 11:20:47 [debug] 760#760: *5359 http copy filter: -2 "/?"
2017/12/05 11:20:47 [debug] 760#760: *5359 pipe write downstream done
2017/12/05 11:20:47 [debug] 760#760: *5359 event timer: 39, old: 1512469307602, new: 1512469307608
2017/12/05 11:20:47 [debug] 760#760: *5359 http file cache update
2017/12/05 11:20:47 [debug] 760#760: *5359 http file cache rename: "/var/ngx_filecache/9/d9/a5a0e6a69817ba1b9925eee9579abd99.0000001955" to "/var/ngx_filecache/9/d9/a5a0e6a69817ba1b9925eee9579abd99"
2017/12/05 11:20:47 [debug] 760#760: shmtx lock
2017/12/05 11:20:47 [debug] 760#760: shmtx unlock
2017/12/05 11:20:47 [debug] 760#760: *5359 http upstream exit: 0000000000000000
2017/12/05 11:20:47 [debug] 760#760: *5359 finalize http upstream request: 0
2017/12/05 11:20:47 [debug] 760#760: *5359 finalize http proxy request
2017/12/05 11:20:47 [debug] 760#760: *5359 free keepalive peer
2017/12/05 11:20:47 [debug] 760#760: *5359 free keepalive peer: saving connection 0000000005792C60
2017/12/05 11:20:47 [debug] 760#760: *5359 event timer del: 39: 1512469307602
2017/12/05 11:20:47 [debug] 760#760: *5359 free rr peer 1 0
2017/12/05 11:20:47 [debug] 760#760: *5359 http upstream temp fd: 40
2017/12/05 11:20:47 [debug] 760#760: *5359 http output filter "/?"
2017/12/05 11:20:47 [debug] 760#760: *5359 http copy filter: "/?"
2017/12/05 11:20:47 [debug] 760#760: *5359 limit traffic rate filter
2017/12/05 11:20:47 [debug] 760#760: *5359 image filter
2017/12/05 11:20:47 [debug] 760#760: *5359 xslt filter body
2017/12/05 11:20:47 [debug] 760#760: *5359 http postpone filter "/?" 00007FFC399B1830
2017/12/05 11:20:47 [debug] 760#760: *5359 http pagespeed html rewrite body filter "/"
2017/12/05 11:20:47 [debug] 760#760: [ngx_pagespeed 1.13.35.1-0] [1205/112047:VERBOSE1:proxy_fetch.cc(993)] Fetch result:1 https://www.domain.url/ : 200
2017/12/05 11:20:47 [debug] 760#760: *5359 http pagespeed write filter "/"
2017/12/05 11:20:47 [debug] 760#760: *5359 http copy filter: -2 "/?"
2017/12/05 11:20:47 [debug] 760#760: *5359 http finalize request: -2, "/?" a:1, c:1
2017/12/05 11:20:47 [debug] 760#760: *5359 event timer add: 37: 60000:1512469307608
2017/12/05 11:20:47 [debug] 760#760: keepalive dummy handler
2017/12/05 11:20:47 [debug] 760#760: timer delta: 1
2017/12/05 11:20:47 [debug] 760#760: worker cycle
2017/12/05 11:20:47 [debug] 760#760: epoll timer: 12911
2017/12/05 11:20:47 [info] 760#762: [ngx_pagespeed 1.13.35.1-0] No permission to rewrite 'https://www.facebook.com/tr?id=1655255864705433&ev=PageView&noscript=1'
2017/12/05 11:20:47 [info] 760#762: [ngx_pagespeed 1.13.35.1-0] No permission to rewrite 'https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css?ver=4.3.1'
2017/12/05 11:20:47 [info] 760#762: [ngx_pagespeed 1.13.35.1-0] No permission to rewrite 'https://fonts.googleapis.com/css?family=Roboto+Slab%3A300%2C400%2C500%2C600%2C700%2C900%7CLora%3A300%2C400%2C500%2C600%2C700%2C900&subset=latin&ver=4.9.1'
2017/12/05 11:20:47 [info] 760#762: [ngx_pagespeed 1.13.35.1-0] No permission to rewrite 'https://fonts.googleapis.com/css?family=Oswald%3A400%2C300%7COpen+Sans%3A700%2C400%2C400i%7CRoboto%3A700%2C400&ver=4.9.1'
2017/12/05 11:20:47 [info] 760#762: [ngx_pagespeed 1.13.35.1-0] No permission to rewrite 'https://platform-api.sharethis.com/js/sharethis.js#product=ga&property=58adab5aec741d001187f4b1'
2017/12/05 11:20:47 [info] 760#762: [ngx_pagespeed 1.13.35.1-0] No permission to rewrite 'https://www.facebook.com/tr?id=1655255864705433&ev=PageView&noscript=1'
2017/12/05 11:20:48 [notice] 207#207: signal 17 (SIGCHLD) received from 760
2017/12/05 11:20:48 [alert] 207#207: worker process 760 exited on signal 11 (core dumped)
2017/12/05 11:20:48 [debug] 207#207: shmtx forced unlock
2017/12/05 11:20:48 [debug] 207#207: shmtx forced unlock
2017/12/05 11:20:48 [debug] 207#207: shmtx forced unlock
2017/12/05 11:20:48 [debug] 207#207: wake up, sigio 0
2017/12/05 11:20:48 [debug] 207#207: reap children
@jmarantz

This comment has been minimized.

Copy link
Contributor

jmarantz commented Dec 5, 2017

Hi -- can you share your pagespeed configuration options? In particular, we are interested in several things based on your stack trace:

  1. is honoring CSP enabled (ModPagespeedHonorCsp)?
  2. do you have srcset references in some of the HTML being served?
  3. are you able to repro this crash, or does it happen randomly while serving?
  4. can you correlate the crash with a particular URL in your access log?
@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented Dec 6, 2017

A reproduction of the back trace above on my system is fixed via apache/incubator-pagespeed-mod#1691

On top of jmarantz his questions, I'm also curious if memcached is involved.

@discostur

This comment has been minimized.

Copy link

discostur commented Dec 7, 2017

@jmarantz I traced it down and found that it only happens in a special configuration, where i have activated mod_pagespeed AND naxsi (https://github.com/nbs-system/naxsi) for one of my websites. If i disable naxsi and only mod_pagespeed is enabled, i can't reproduce the error. I think it may come from a special URL, because it only happens on one special domain, but i haven't found the particular URL yet :/
I can reproduce it, but only by browsing the website, not by hitting on a particular URL (not yet). CSP is not enabled.

@oschaaf Yes, two memcached backends are active in this configuration

oschaaf referenced this issue in apache/incubator-pagespeed-mod Dec 20, 2017

oschaaf added a commit to apache/incubator-pagespeed-mod that referenced this issue Dec 20, 2017

@nolybom

This comment has been minimized.

Copy link

nolybom commented Dec 20, 2017

Having the exact same error. Installed ngx_pagespeed 1.12.34stable about 12h ago.
Obviously missed the fix by just a few hours. Recompiling nginx now and testing again.

@jmarantz

This comment has been minimized.

Copy link
Contributor

jmarantz commented Dec 20, 2017

@nolybom Just a heads up that the fix hasn't been merged into trunk yet, IIUC.

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented Dec 20, 2017

@jmarantz is right, the fix for this is still pending review

@nolybom

This comment has been minimized.

Copy link

nolybom commented Dec 20, 2017

Debugged a little deeper and can confirm @Volodymyr Lupu 's (centminmod) catch that the combination of brotli and pagespeed causes a huge loading delay which sometimes leads to a site crash.
Like @discostur mentioned, it happens on random uri‘s while browsing. Adding..

#pagespeed HttpCacheCompressionLevel 0;

...to the pagespeed.conf didn’t help.

I have a https site, three more http and phpmyadmin open. Switching in between and reloading them causes at some point the server to go down. Log file show signal 11 and a few signal 9s. And trying to display..

_DOMAIN_/pagespeed_global_admin/message_history

shows the error:

Writing to mod_pagespeed_message failed. Verify that MessageBufferSize is not set to 0 in pagespeed.conf.

One interesting thing is that when brotli and pagespeed are both enabled and only http sites are reloaded then everything seems fine. But the moment the https site comes to the game with a fresh reload the problems start.

Log file show besided signal 11 then also:

worker process xxxxx exited on signal 9

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented Dec 20, 2017

@nolybom I think you found a different problem, I can't really relate what you describe to the stack trace posted initially here (which we reproduced, and for which a fix is pending).
Could you file a new issue?

@nolybom

This comment has been minimized.

Copy link

nolybom commented Dec 21, 2017

The relation are the log errors "worker process exited on signal 11" and saw a similiarity to @discostur 's post...but sure....just opened a new issue -> issue 1518

oschaaf added a commit to apache/incubator-pagespeed-mod that referenced this issue Dec 21, 2017

@oschaaf

This comment has been minimized.

Copy link
Member

oschaaf commented Dec 21, 2017

@oschaaf oschaaf closed this Dec 21, 2017

oschaaf added a commit to apache/incubator-pagespeed-mod that referenced this issue Dec 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment