Crashes on custom 404s for .pagespeed. resources #1081
Comments
Yikes, sorry about that! Were there any other ngx_pagespeed logs around when this happened? Do you have the core dump it says it took? Would you be up for running a 1.10 compiled with --with-debug so we can get cores and backtraces with symbols? |
Yes, I can try compiling pagespeed --with-debug and send you the core dump, but please teach me how to get it, or point me to a how-to for obtaining the core dump file. I'm not a C developer: I just compile nginx and pagespeed to use on my server. So I don't have a core dump for this case, as I don't know how to get it saved to a file. I searched a bit and tried putting this in nginx.conf: worker_rlimit_core 500M; But the directory was empty when the issue happened (maybe because the main process didn't die but just became unable to serve requests?). Is this enough? I still haven't compiled with --with-debug; was that the reason I was unable to see the core dumps? After recompiling with debug, is debugging with gdb a good way to get the data you need? What command should I issue to do this? Is # gdb /path/to/binary/nginx/ enough? Thanks |
@creativeprogramming When you have a core dump, this should give you a backtrace: gdb /path/to/nginx /path/to/cores/nginx.core And then, when gdb has started:
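Something along these lines should work (illustrative; adjust the binary and core paths to your setup):

bt full
thread apply all bt
quit

Or, non-interactively, to capture everything in one go:

gdb /path/to/nginx /path/to/cores/nginx.core -batch -ex "set pagination off" -ex "thread apply all bt full" > backtrace.txt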
Having nginx compiled with --with-debug would help debugging a lot, as it will give us easier-to-understand backtraces. Seeing the messages in your logs, I would have expected a core dump to exist in the configured directory. If you want, we could try to figure this out together via instant messaging |
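For reference, a debug-enabled build is just the existing configure invocation plus --with-debug; a rough sketch (paths, versions, and module list are placeholders, not the exact build from this thread):

cd nginx-1.9.9
./configure --prefix=/usr/local/nginx \
    --add-module=/path/to/ngx_pagespeed-1.10.33.1-beta \
    --with-debug
make && make install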
I finally managed to get the core dump files. Before posting the backtrace details, I'll write down here how I solved my issue, for people coming from search engines and as a public note to myself. It was all due to ulimits:
so I had to increase those limits.
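For people coming from a search engine: the exact limit values I used aren't shown above, but a common way to allow core dumps is along these lines (illustrative; adjust sizes and scope to your own setup):

# for the current shell / the user that starts nginx:
ulimit -c unlimited

# or persistently, in /etc/security/limits.conf:
*    soft    core    unlimited
*    hard    core    unlimited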
|
Here are more details. Let's start with the nginx error.log just seconds after the nginx restart:
Sometimes I get the 'no threading detected' message at startup and slow reads/writes, as I previously posted, but not this time |
@jeffkaufman @oschaaf here's the backtrace
|
my nginx compile info:
Note: I also tried compiling nginx without any other third-party module (e.g. naxsi, purge, etc.); the problem persists |
Tell me if you need more |
PS: no rush for me. I'm running very fine anyway with 1.9.32.10 and have no urgency to upgrade, so keep up the good work, and thank you for it. |
Thanks a lot for diving in and getting the backtrace out with gdb. 1.10 contains changes that make ngx_pagespeed use a single named pipe to communicate with PageSpeed instead of a named pipe per request. Looking at the backtrace, what happens is that an assumption I made in that change fails: we receive an event from the pipe originating from a base fetch that is no longer associated with the active request, while the nginx side has not released the base fetch associated with the event. To properly handle that situation, I need to understand precisely what leads up to it.
And subsequently run with a debug build? Note that this will probably generate a LOT of logging, so you'll have to keep an eye on that. In the meantime, I will try to reproduce this myself. After some thought I have a suspicion about what may be leading up to this (nginx overwriting the complete request context in some situations). I'll update here if I succeed in that. |
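(For reference: with a --with-debug build, the debug log can be enabled with a single error_log directive in nginx.conf; the path below is just an example.)

error_log /var/log/nginx/error.debug.log debug;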
Cross-linking the related post in pagespeed-dev:
Here are some minutes of debug log: |
Here's nginx.conf (never mind the comments, it's a bit messy):
#TODO: check buffers and other things described here: http://www.cyberciti.biz/tips/linux-unix-bsd-nginx-webserver-security.html
user www-data www-data;
worker_processes 6;
worker_rlimit_core 1M;
working_directory /usr/local/nginx-coredumps;
#thread_pool default threads=64 max_queue=65536;
#server_tokens off
#error_log /var/log/nginx/error.debug.log debug;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
multi_accept on; # on 14.10.2014 see:http://seravo.fi/2013/optimizing-web-server-performance-with-nginx-and-php
# Linux performance awesomeness on
use epoll;
}
http {
include /etc/nginx/naxsi_core.rules;
log_format vhostslogging '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time for $server_name';
log_format my_post_tracking $request_body;
server_tokens off;
include /etc/nginx/mime.types;
default_type application/octet-stream;
#include /etc/nginx/passenger.conf;
#added 6 sept 2013 as joomla isotopic send server seems to go in error: upstream sent too big header while reading response header from upstream
#see also: http://stackoverflow.com/questions/13894386/upstream-too-big-nginx-codeigniter
#proxy_buffer_size 128k;
#proxy_buffers 4 256k;
#proxy_busy_buffers_size 256k;
proxy_buffering off;
#end added 6 sept 2013 as joomla isotopic send server seems to go in error: upstream sent too big header while reading response header from upstream
#see also: http://stackoverflow.com/questions/13894386/upstream-too-big-nginx-codeigniter
# sendfile on;
# before 17 May 2013 was sendfile on, and aio line not present!!! i'm testing async io
# should improve cpu usage and troughput, see here: http://blog.lighttpd.net/articles/2006/11/09/async-io-on-linux/
#sendfile off; # on before 17 May 2013 18.20
#AIO should be good for large files but disables linux VFS cache TEMP EXPERIMENT
#On Linux, AIO is usable starting from kernel version 2.6.22; plus, it is also necessary to enable directio, otherwise reading will be blocking:
#directio on; # not present before 17 May 2013
#aio on; # not present before 17 May 2013
#experimental performance 17 May 2013
client_body_buffer_size 128k; #http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size
#Sets buffer size for reading client request body. In case request body is larger than the buffer, the whole body or only its part is written to a temporary file.
large_client_header_buffers 4 128k;
#end experimental performance 17 may 2013
#experimental protection 17 May 2013
## Start: Timeouts ##
client_body_timeout 20s;
client_header_timeout 20s;
#send_timeout 10; #36s see below
## End: Timeouts ##
#end experimental protection 17 may 2013
#http://wiki.nginx.org/ReadMoreAboutTcpNopush
#tcp_nopush off; #before 23 mar 2013 was on
#TODO: ON FOR WEBSOCKETS
tcp_nodelay on; #before 23 mar 2013 was offi
#The tcp_nodelay disables the Nagle buffering algorithm. It is usable when the server doesn’t require a response from the client.
#General web use does need a response, so this should be turned off, unless you need to send out small bursts of information, like tracking mouse movements.
# server_names_hash_bucket_size 64;
types_hash_max_size 2048;
# server_tokens off;
# Where to store the body of large client requests on disk
# NGINX will stream this to disk before posting it to your Mongrels,
# preventing slow clients tying up your app.
client_body_temp_path /tmp/nginx-client-body 1 2;
# Max size of a request from a client (usually a POST). This will limit
# the size of file uploads to your app
client_max_body_size 32m;
access_log /var/log/nginx/access.log;
#Optional otpimization for avoid too much handshakes,
#see: http://nginx.org/en/docs/http/configuring_https_servers.html
ssl_session_cache shared:SSL:128m; #1m = around 4000 ssl sessions
# server_names_hash_bucket_size 64;
# server_name_in_redirect off;
#added 23 mar 2013 open_file_cache to try improve wait time (first byte of static)
open_file_cache max=30000 inactive=120s;
open_file_cache_valid 120s;
open_file_cache_min_uses 1;
open_file_cache_errors on;
send_timeout 60s; #was 36s until 15.10.2014
#end 23 mar 2013
#keepalive_timeout 0;
#keepalive_timeout 65; #default
#keepalive_timeout 120s; <-- was this until 17 may 2013
keepalive_timeout 120s 60s; #13 may added header timeout to convince some browsers to close the keepalive
keepalive_requests 9999; #nginx default is 100
#see: http://en.wikipedia.org/wiki/HTTP_persistent_connection
#tcp_noauth on;
gzip on; #turned off on 15.10.14 to improve TTFB
#sendfile on;
sendfile on;
#sendfile_max_chunk 512k;
tcp_nopush on;
#seehttps://t37.net/nginx-optimization-understanding-sendfile-tcp_nodelay-and-tcp_nopush.html
#aio sendfile;
#directio on;
#aio on;
#gzip_static off; #turned on 14.10.2014 see:http://www.media-division.com/generation-of-gzip-files-for-nginx/
gzip_http_version 1.1;
gzip_comp_level 1; #was unusefully 9 until 23 mar 2013 (no gain cp wrost)
gzip_min_length 20;
# gzip_buffers 16 8k;
gzip_types "application/x-javascript; charset=utf-8";
gzip_types text/plain application/xhtml+xml text/css application/xml application/xml+rss text/javascript application/javascript application/x-javascript;
gzip_proxied any;
gzip_disable "MSIE [1-6]\.";
gzip_vary on;
charset utf-8;
source_charset utf-8;
#gzip_proxied any;
#gzip_types text/plain text/html text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;
#limit_req_zone $binary_remote_addr zone=antiddosspider:1m rate=2r/m;
#limit_req_zone $binary_remote_addr zone=antiddosphp:1m rate=3r/s;
#limit_req_zone $binary_remote_addr zone=antiddosstatic:1m rate=30r/s;
#To make an exception, you have to provide empty value for a
#variable in limit_req_zone (see http://nginx.org/r/limit_req_zone).
#Correct config for exceptions based geo would be (involving
#intermediate map as geo doesn't allow variables in a resulting
#value):
geo $limited {
default 1;
120.0.0.1/32 0;
}
map $limited $if_nonlocal_binary_remoteaddr {
1 $binary_remote_addr;
0 "";
}
limit_req_zone $if_nonlocal_binary_remoteaddr zone=antiddosspider:1m rate=2r/m;
limit_req_zone $if_nonlocal_binary_remoteaddr zone=antiddosphp:1m rate=7r/s; #was 3r/s until 20 gen 2015
limit_req_zone $if_nonlocal_binary_remoteaddr zone=antiddosstatic:1m rate=500r/s; #was 30r/s until 20 gen 2015
# limit_req_zone $limit zone=foo:1m rate=10r/m;
# limit_req zone=foo burst=5;
#As you can see from the above config, limit_req_zone now works
#based on a $limit variable, which is either client address, or an
#empty string. In a latter case client isn't limited.
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
#pagespeed
#pagespeed Directives that can only be set globally:
#Global scope in Apache, http block in Nginx.
#DangerPermitFetchFromUnknownHosts
#default fetcher psol serp (threaded, native is evented)
pagespeed FileCachePath /home/pagespeed_cache/others;
pagespeed UseNativeFetcher on; #experimental
resolver 8.8.8.8;
#pagespeed FetchWithGzip on;
#pagespeed FetcherTimeoutMs 30000; #millis = 30s
#InheritVHostConfig
#MessageBufferSize
#NumRewriteThreads
#NumExpensiveRewriteThreads
#UsePerVHostStatistics
#end pagespeed see: https://developers.google.com/speed/docs/mod_pagespeed/configuration
}
# mail {
# # See sample authentication script at:
# # http://wiki.nginx.org/NginxImapAuthenticateWithApachePhpScript
#
# # auth_http localhost/auth.php;
# # pop3_capabilities "TOP" "USER";
# # imap_capabilities "IMAP4rev1" "UIDPLUS";
#
# server {
# listen localhost:110;
# protocol pop3;
# proxy on;
# }
#
# server {
# listen localhost:143;
# protocol imap;
# proxy on;
# }
# }
|
Thanks. The log contains 18
I can now consistently reproduce the core dump by setting up a custom 404 location like:
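A minimal sketch of such a setup (illustrative; the exact block from the original reproduction is not shown here):

error_page 404 /custom_404.html;
location = /custom_404.html {
    internal;
}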
And requesting a pagespeed resource I know doesn't exist:
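For example (URL purely illustrative, following the usual name.pagespeed.<filter>.<hash>.<ext> pattern):

curl -v http://localhost/missing.css.pagespeed.ce.0123456789.css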
I think this was triggered by 3efebb7, which forwards handling of error responses (like 404) for .pagespeed. resources. There's also an internal redirect involved in the reproduction case, so I'm going to dig some more to make sure I understand/fix the root cause. |
- Fix crasher on .pagespeed. resources that return 404.
- Additionally, check for a wiped request context and make sure we do not dereference a null pointer, which is what hurt in the flow we hit above.
- Also, do not check-fail when we receive a stale event originating from an NgxBaseFetch that is no longer associated with our request context.

This is not the final change yet, as I want to back this with tests where possible and log a warning where appropriate. For #1081
@creativeprogramming #1085 should fix this problem. |
@oschaaf many thanks, I compiled your |
I confirm: with #1085 everything is running fine, no crashes, no errors in the log.
@creativeprogramming Great, thanks for confirming. |
Re-opening, we will close this issue when the fix is reviewed and merged. Thanks! |
- Fix the nginx-side flow so we handle .pagespeed. resources correctly when they land on a customized 404 internal location.
- Additionally, check for a wiped request context and make sure we do not dereference a null pointer, which is what hurt in the flow we entered above, as the IPRO lookup was still generating events while the nginx-side request context was gone.
- Also, as a preliminary measure, do not check-fail when we receive a stale event originating from an NgxBaseFetch that is no longer associated with our request context. Do log a warning so we'll hear about this happening, either through system test failures or a bug report.

Fixes #1081
Fix is merged.
I recently upgraded to 1.10.33.1, but I had to switch back to 1.9.32.10-7423 (the last version I was using) due to the following errors, which caused my nginx to freeze after a few hours of high traffic (the nginx daemon didn't die, but was unable to serve any requests).
I'm using nginx 1.8.0, but I also tried 1.9.9 with no luck. Now I think something new in pagespeed is causing the signal 11 issue, as I'm not getting it anymore after switching back to 1.9.32.10.