IPRO recorder crashes on corrupt brigades with multiple EOS buckets. #1191

mfiala · 2015-11-19T15:16:04Z

Hello,

We are running the latest mod_pagespeed on Ubuntu 14.04 LTS (Linux xhost 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8 10:21:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux), mod_pagespeed version 1.9.32.10-7497 @8494, Apache 2.4 (2.4.7-1ubuntu4.8) with the worker MPM. We have tested both the stable and the beta binary releases, and both crash with a segfault. So I have now compiled the latest version with debug symbols; the gdb output is below. Where could the problem be?

Thanks

Module configuration:

    ModPagespeed on
    ModPagespeedInheritVHostConfig on
    AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
    ModPagespeedFileCachePath            "/var/cache/mod_pagespeed/"
    ModPagespeedLogDir "/var/log/pagespeed"
    ModPagespeedSslCertDirectory "/etc/ssl/certs"
    ModPagespeedFileCacheInodeLimit        500000
    ModPagespeedStatistics off
    ModPagespeedRateLimitBackgroundFetches off
    <Location /pagespeed_admin>
        Order allow,deny
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler pagespeed_admin
    </Location>
    <Location /pagespeed_global_admin>
        Order allow,deny
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler pagespeed_global_admin
    </Location>
    ModPagespeedStatisticsLogging on
    ModPagespeedMessageBufferSize 100000

In the vhost, there is a local configuration:

  ModPagespeed on
  ModPagespeedEnableFilters combine_javascript
  ModPagespeedLoadFromFileMatch ...

Gdb output:

Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff255780e09 in net_instaweb::Variable::Add (this=0xdeadbeefdeadbeef, non_negative_delta=1) at ./pagespeed/kernel/base/statistics.h:56
56      return AddHelper(non_negative_delta);
(gdb) bt
#0  0x00007ff255780e09 in net_instaweb::Variable::Add (this=0xdeadbeefdeadbeef, non_negative_delta=1) at ./pagespeed/kernel/base/statistics.h:56
#1  0x00007ff255c73f3c in net_instaweb::InPlaceResourceRecorder::DoneAndSetHeaders (this=0x7ff1e4188868, response_headers=0x7ff242fec920, 
    entire_response_received=false) at net/instaweb/system/in_place_resource_recorder.cc:230
#2  0x00007ff255795b64 in net_instaweb::(anonymous namespace)::instaweb_in_place_check_headers_filter (filter=0x7ff2503f46c0, bb=0x7ff25a2a4830)
    at net/instaweb/apache/mod_instaweb.cc:823
#3  0x00007ff25a3af5dc in ap_content_length_filter (f=0x7ff25a2a3520, b=0x7ff25a2a4830) at protocol.c:1403
#4  0x00007ff25a3d811e in ap_send_error_response (r=0x7ff2503fa028, recursive_error=0) at http_protocol.c:1526
#5  0x00007ff25885de42 in action_handler (r=0x7ff2503f4718) at mod_actions.c:205
#6  0x00007ff25a3c3be0 in ap_run_handler (r=0x7ff2503f4718) at config.c:169
#7  0x00007ff25a3c4129 in ap_invoke_handler (r=r@entry=0x7ff2503f4718) at config.c:439
#8  0x00007ff25a3d918c in ap_internal_redirect (new_uri=<optimized out>, r=<optimized out>) at http_request.c:648
#9  0x00007ff254abacfc in handler_redirect (r=0x7ff25a2a20a0) at mod_rewrite.c:5063
#10 0x00007ff25a3c3be0 in ap_run_handler (r=0x7ff25a2a20a0) at config.c:169
#11 0x00007ff25a3c4129 in ap_invoke_handler (r=r@entry=0x7ff25a2a20a0) at config.c:439
#12 0x00007ff25a3d96ca in ap_process_async_request (r=r@entry=0x7ff25a2a20a0) at http_request.c:317
#13 0x00007ff25a3d99a4 in ap_process_request (r=r@entry=0x7ff25a2a20a0) at http_request.c:363
#14 0x00007ff25a3d6442 in ap_process_http_sync_connection (c=0x7ff25a2b02c8) at http_core.c:190
#15 ap_process_http_connection (c=0x7ff25a2b02c8) at http_core.c:231
#16 0x00007ff25a3cd220 in ap_run_process_connection (c=0x7ff25a2b02c8) at connection.c:41
#17 0x00007ff25a3cd608 in ap_process_connection (c=c@entry=0x7ff25a2b02c8, csd=csd@entry=0x7ff25a2b00b0) at connection.c:202
#18 0x00007ff2567e6293 in process_socket (bucket_alloc=0x7ff25a2a6028, my_thread_num=18, my_child_num=1, sock=0x7ff25a2b00b0, p=0x7ff25a2b0028, 
    thd=0x7ff25a5539a8) at worker.c:619
#19 worker_thread (thd=0x7ff25a5539a8, dummy=<optimized out>) at worker.c:978
#20 0x00007ff25989c182 in start_thread (arg=0x7ff242fed700) at pthread_create.c:312
#21 0x00007ff2595c947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


jmarantz · 2015-11-19T15:32:03Z

Hi, thanks for the detailed traceback. Could you correlate this failure
against a request in your access.log and share with us the resource that
was fetched, plus your PageSpeed configuration?

One other thing: can you tell us what other modules you have installed?

This looks like something we should be able to reproduce and fix. I just
glanced at the code in question, and I see it's not coded defensively
against a corrupt Apache bucket brigade.


mfiala · 2015-11-20T08:36:43Z

Hi,

I created a test virtual host in Apache and tested it with the benchmarking tool siege:

 siege -v -i -d 1 -c 8 -t 10s -f urls.txt

The URL list is attached: urls.txt
I tested both the worker and the prefork MPMs; the problem exists with both.
There is no problem while the siege test is running, but after the test a segfault appears (when the number of Apache threads/processes decreases). The problem does not appear every time I run the test: if I repeat the test 10 times, it appears in roughly one of the runs.

Loaded Apache modules:

  core.c, http_core.c, mod_access_compat.c, mod_actions.c, mod_alias.c,
  mod_auth_basic.c, mod_authn_core.c, mod_authn_file.c, mod_authz_core.c,
  mod_authz_host.c, mod_authz_user.c, mod_autoindex.c, mod_deflate.c,
  mod_dir.c, mod_env.c, mod_fastcgi.c, mod_filter.c, mod_info.c,
  mod_instaweb.cc, mod_log_config.c, mod_logio.c, mod_mime.c,
  mod_negotiation.c, mod_rewrite.c, mod_setenvif.c, mod_so.c, mod_status.c,
  mod_unixd.c, mod_version.c, mod_watchdog.c, worker.c,

Server Apache Settings

  Server Version: Apache/2.4.7 (Ubuntu) mod_fastcgi/mod_fastcgi-SNAP-0910052141
  Server Built: Oct 14 2015 14:20:21
  Server loaded APR Version: 1.5.1-dev
  Compiled with APR Version: 1.5.1-dev
  Server loaded APU Version: 1.5.3
  Compiled with APU Version: 1.5.3
  Module Magic Number: 20120211:27
  Hostname/port: localhost:80
  Timeouts: connection: 300    keep-alive: 5
  MPM Name: worker
  MPM Information: Max Daemons: 1 Threaded: yes Forked: yes
  Server Architecture: 64-bit
  Server Root: /etc/apache2
  Config File: /etc/apache2/apache2.conf
  Server Built With:
    -D APR_HAS_SENDFILE
    -D APR_HAS_MMAP
    -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
    -D APR_USE_SYSVSEM_SERIALIZE
    -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
    -D APR_HAS_OTHER_CHILD
    -D AP_HAVE_RELIABLE_PIPED_LOGS
    -D HTTPD_ROOT="/etc/apache2"
    -D SUEXEC_BIN="/usr/lib/apache2/suexec"
    -D DEFAULT_PIDLOG="/var/run/apache2.pid"
    -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
    -D DEFAULT_ERRORLOG="logs/error_log"
    -D AP_TYPES_CONFIG_FILE="mime.types"
    -D SERVER_CONFIG_FILE="apache2.conf"

Module configuration (global):

    <IfModule pagespeed_module>
    ModPagespeed on
    ModPagespeedInheritVHostConfig on
    AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
    ModPagespeedFileCachePath            "/var/cache/mod_pagespeed/"
    ModPagespeedLogDir "/var/log/pagespeed"
    ModPagespeedSslCertDirectory "/etc/ssl/certs"
    ModPagespeedFileCacheInodeLimit        500000
    ModPagespeedStatistics off
    ModPagespeedRateLimitBackgroundFetches off
    <Location /pagespeed_admin>
        Order allow,deny
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler pagespeed_admin
    </Location>
    <Location /pagespeed_global_admin>
        Order allow,deny
        Allow from localhost
        Allow from 127.0.0.1
        SetHandler pagespeed_global_admin
    </Location>
    ModPagespeedStatisticsLogging on
    ModPagespeedMessageBufferSize 100000
    </IfModule>

Module configuration (test vhost; the other vhosts have ModPagespeed turned off):

    ModPagespeed on
    ModPagespeedEnableFilters combine_javascript
    ModPagespeedLoadFromFileMatch "^https?://(amplion.centrum.cz|econ.slevydnes.cz)/(ads|adSeznam|cache|cache_web|css|gfx|img|js|mailing|newsletters|soutez)/" "/sdb1/www/slevydnes.cz/www/\\1/"
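As a rough illustration of what the `ModPagespeedLoadFromFileMatch` rule above does, this C++ sketch applies the same pattern with `std::regex`. It assumes the directive replaces the matched URL prefix with the target path (with backreferences substituted) and keeps the rest of the URL; `MapUrlToFile` is a hypothetical helper, PageSpeed's own regex handling may differ, and `$1` here stands in for `\\1` in the directive.

```cpp
#include <regex>
#include <string>

// Hypothetical helper approximating a LoadFromFileMatch-style mapping.
// ASSUMPTION: the matched URL prefix is replaced by the target path (with
// backreferences substituted) and the remainder of the URL is appended.
// Uses std::regex (ECMAScript syntax, "$1" backreference), not PageSpeed's engine.
std::string MapUrlToFile(const std::string& url) {
  static const std::regex kPattern(
      "^https?://(amplion.centrum.cz|econ.slevydnes.cz)/"
      "(ads|adSeznam|cache|cache_web|css|gfx|img|js|mailing|newsletters|soutez)/");
  if (!std::regex_search(url, kPattern)) {
    return "";  // No match: the resource would not be loaded from disk.
  }
  // Replace only the anchored prefix match; the rest of the URL is copied as-is.
  return std::regex_replace(url, kPattern, "/sdb1/www/slevydnes.cz/www/$1/",
                            std::regex_constants::format_first_only);
}
```

Under that assumption, `http://econ.slevydnes.cz/js/app.js` maps to `/sdb1/www/slevydnes.cz/www/econ.slevydnes.cz/app.js`: the first capture group is the host name, not the directory name.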

jmarantz · 2015-11-20T13:32:51Z

Thanks for the config info and URL list. It looks like the only non-default module you have besides PageSpeed is fastcgi; is that right?

The stack trace you gave me points to a bug in the code that is trivial to fix, and I have attached a source patch that you could apply and rebuild with. I think the reason the problem hasn't shown up in the past is that some other bug upstream of PageSpeed has to trigger it. Specifically, Apache passes response bodies between filters as bucket brigades, and I think that for us to hit our bug we have to receive a corrupt brigade (one with two EOS markers) from an upstream filter. It might be sufficient to just fix the problem in mod_pagespeed, but it would be better to fix the problem at the source, if we can figure out what it is.
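To make the "corrupt brigade" condition concrete, here is a minimal C++ sketch that flags a brigade carrying more than one EOS marker, the kind of check one might log while hunting for the upstream culprit. It uses a simplified stand-in for APR's bucket brigade (real brigades are ring lists of `apr_bucket` objects tested with `APR_BUCKET_IS_EOS`); `BucketKind`, `CountEos`, and `HasMultipleEos` are hypothetical names, not part of Apache or PageSpeed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model: each bucket is reduced to its kind.
enum class BucketKind { kData, kFlush, kEos };
using Brigade = std::vector<BucketKind>;

// Number of end-of-stream markers in the brigade.
std::size_t CountEos(const Brigade& brigade) {
  std::size_t n = 0;
  for (BucketKind k : brigade) {
    if (k == BucketKind::kEos) ++n;
  }
  return n;
}

// Flags the corrupt shape described above: more than one EOS marker.
bool HasMultipleEos(const Brigade& brigade) { return CountEos(brigade) > 1; }
```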

That's why I was hoping you could correlate the crash against your access log, though that might be hard. In the meantime, if you can, please apply this patch and see whether it solves the problem:

diff --git a/pagespeed/apache/mod_instaweb.cc b/pagespeed/apache/mod_instaweb.cc
index c9d6686..b697558 100644
--- a/pagespeed/apache/mod_instaweb.cc
+++ b/pagespeed/apache/mod_instaweb.cc
@@ -828,6 +828,8 @@ apr_status_t instaweb_in_place_check_headers_filter(ap_filter_t* filter,
       // Deletes recorder
       recorder->DoneAndSetHeaders(&response_headers,
                                   !request->connection->aborted);
+      filter->ctx = NULL;
+      break;
     }
   }
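The shape of this fix can be sketched in C++ with simplified stand-ins for the APR and PageSpeed types; `Bucket`, `Recorder`, `FilterContext`, and `RunFilter` are hypothetical names, not the real mod_instaweb.cc code. The context owns the recorder the way `filter->ctx` does in the patch: on the first EOS the recorder is finished and released, the context is cleared, and the loop breaks, so a second (bogus) EOS can no longer reach a deleted recorder.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins, not the real APR / mod_pagespeed types.
enum class BucketKind { kData, kEos };

struct Bucket {
  BucketKind kind;
  std::string data;  // Payload for kData buckets; empty for kEos.
};

// Accumulates the response body, loosely like an in-place resource recorder.
struct Recorder {
  std::string body;
  void Write(const std::string& s) { body += s; }
};

// Plays the role of filter->ctx: owns the recorder until it is finished.
struct FilterContext {
  std::unique_ptr<Recorder> recorder;
  std::string finished_body;  // What the recorder captured when it finished.
};

// Scans one brigade. Returns true if this call finished the recording.
bool RunFilter(FilterContext& ctx, const std::vector<Bucket>& brigade) {
  if (ctx.recorder == nullptr) {
    return false;  // Already finished (context cleared): touch no recorder.
  }
  bool finished = false;
  for (const Bucket& b : brigade) {
    if (b.kind == BucketKind::kEos) {
      // Stand-in for the "done" call, which deletes the recorder.
      ctx.finished_body = ctx.recorder->body;
      ctx.recorder.reset();  // Delete the recorder and clear the context pointer.
      finished = true;
      break;  // Stop scanning: a later EOS must not reach the freed recorder.
    }
    ctx.recorder->Write(b.data);
  }
  // A real output filter would now pass the brigade downstream; omitted here.
  return finished;
}
```

Fed a brigade of data, EOS, data, EOS, this records only the bytes before the first EOS and leaves `ctx.recorder` empty; a later call returns `false` without touching any recorder.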

jmarantz · 2015-11-20T13:35:12Z

That patch got garbled when I pasted it in, so I am attaching it instead:

patch.txt


jmarantz · 2015-11-20T21:45:37Z

I checked in a potential fix to trunk, which is slightly different from the patch proposed above: 4846786

If you can, I'd love for you to try it, preferably by applying this patch (identical to the commit) instead of the earlier one:

patch2.txt

If this fixes the problem, we should consider doing a 1.9 patch release to resolve it; however, we will soon release 1.10, which will include this fix.

mfiala · 2015-11-20T21:56:19Z

Hi Joshua,
Thank you, I will work on it on Monday.
Have a nice weekend.

mfiala · 2015-11-23T12:42:37Z

Hi,

I was not able to reproduce the problem, so it looks like your patch has fixed it. Thank you, Joshua. We will use the patched version of PageSpeed in production; if something goes wrong, I will contact you. Are you going to include this patch in the next version of PageSpeed? If so, when?

Regards

Michal

jeffkaufman · 2015-12-08T22:10:37Z

This patch will be in 1.10, which should be out soon.

jmarantz added a commit that referenced this issue Nov 20, 2015

Break out of bucket-loop in ipro-data-recorder after we hit EOS and delete the recorder. This is an attempt to address the stack-trace reported in #1191

4846786

jeffkaufman closed this as completed Dec 8, 2015

jeffkaufman changed the title from "Segmentation fault on Ubuntu 14.04 LTS" to "IPRO recorder crashes on corrupt brigades with multiple EOS buckets." Dec 8, 2015
