This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

mod_pagespeed uses too many threads -- in proportion to the number of vhosts #334

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 21 comments

Comments

@GoogleCodeExporter

mod_pagespeed crashes within the first 30 seconds of a JMeter stress test with 
100 threads. I increased ULIMIT_MAX_FILES for user "nobody" but still no 
progress.

Below are lines from the Apache error log:
[Mon Aug 15 14:35:50 2011] [alert] [mod_pagespeed 0.9.17.7-716 @20399] 
[0815/143550:FATAL:net/instaweb/apache/serf_url_async_fetcher.cc(466)] Check 
failed: 0 == apr_thread_create(&thread_id_, __null, SerfThreadFn, this, pool_) 
(0 vs. 11)\nBacktrace:\n\t/etc/httpd/modules/mod_pagespeed.so(+0x36b1a) 
[0x7f66c5bd0b1a]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x24efa) 
[0x7f66c5bbeefa]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x38b6d) 
[0x7f66c5bd2b6d]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x34982) 
[0x7f66c5bce982]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x2b0e7) 
[0x7f66c5bc50e7]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x9b33b) 
[0x7f66c5c3533b]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x9a3d5) 
[0x7f66c5c343d5]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x9a4fc) 
[0x7f66c5c344fc]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x23b24) 
[0x7f66c5bbdb24]\n\t/usr/sbin/httpd(ap_run_translate_name+0x68) 
[0x7f66d1f8a5e8]\n\t/usr/sbin/httpd(ap_process_request_internal+0x126) 
[0x7f66d1f8c7c6]\n\t/usr/sbin/httpd(ap_process_request+0x1b0) 
[0x7f66d1f9e790]\n\t/usr/sbin/httpd(+0x37668) 
[0x7f66d1f9b668]\n\t/usr/sbin/httpd(ap_run_process_connection+0x68) 
[0x7f66d1f97398]\n\t/usr/sbin/httpd(+0x3f097) 
[0x7f66d1fa3097]\n\t/usr/sbin/httpd(+0x3f3aa) 
[0x7f66d1fa33aa]\n\t/usr/sbin/httpd(ap_mpm_run+0xc1c) 
[0x7f66d1fa402c]\n\t/usr/sbin/httpd(main+0xb30) 
[0x7f66d1f7b840]\n\t/lib64/libc.so.6(__libc_start_main+0xfd) 
[0x7f66d04d3c5d]\n\t/usr/sbin/httpd(+0x16809) [0x7f66d1f7a809]
[Mon Aug 15 14:35:50 2011] [notice] child pid 20392 exit signal Aborted (6)
[Mon Aug 15 14:35:50 2011] [notice] child pid 20393 exit signal Aborted (6)
[Mon Aug 15 14:35:50 2011] [notice] child pid 20394 exit signal Aborted (6)
[Mon Aug 15 14:35:50 2011] [notice] child pid 20396 exit signal Aborted (6)
[Mon Aug 15 14:35:50 2011] [notice] child pid 20399 exit signal Aborted (6)
[Mon Aug 15 14:35:50 2011] [alert] [mod_pagespeed 0.9.17.7-716 @20397] 
[0815/143550:FATAL:net/instaweb/apache/serf_url_async_fetcher.cc(466)] Check 
failed: 0 == apr_thread_create(&thread_id_, __null, SerfThreadFn, this, pool_) 
(0 vs. 11)\nBacktrace:\n\t/etc/httpd/modules/mod_pagespeed.so(+0x36b1a) 
[0x7f66c5bd0b1a]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x24efa) 
[0x7f66c5bbeefa]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x38b6d) 
[0x7f66c5bd2b6d]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x34982) 
[0x7f66c5bce982]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x2b0e7) 
[0x7f66c5bc50e7]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x9b33b) 
[0x7f66c5c3533b]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x9a3d5) 
[0x7f66c5c343d5]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x9a4fc) 
[0x7f66c5c344fc]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x23b24) 
[0x7f66c5bbdb24]\n\t/usr/sbin/httpd(ap_run_translate_name+0x68) 
[0x7f66d1f8a5e8]\n\t/usr/sbin/httpd(ap_process_request_internal+0x126) 
[0x7f66d1f8c7c6]\n\t/usr/sbin/httpd(ap_process_request+0x1b0) 
[0x7f66d1f9e790]\n\t/usr/sbin/httpd(+0x37668) 
[0x7f66d1f9b668]\n\t/usr/sbin/httpd(ap_run_process_connection+0x68) 
[0x7f66d1f97398]\n\t/usr/sbin/httpd(+0x3f097) 
[0x7f66d1fa3097]\n\t/usr/sbin/httpd(+0x3f3aa) 
[0x7f66d1fa33aa]\n\t/usr/sbin/httpd(ap_mpm_run+0xc1c) 
[0x7f66d1fa402c]\n\t/usr/sbin/httpd(main+0xb30) 
[0x7f66d1f7b840]\n\t/lib64/libc.so.6(__libc_start_main+0xfd) 
[0x7f66d04d3c5d]\n\t/usr/sbin/httpd(+0x16809) [0x7f66d1f7a809]

More Information
1. X-Mod-Pagespeed  0.9.17.7-716
2. Apache version 2.2.15
3. OS: Redhat Enterprise 6 running Linux 2.6.x (64 bit)




Original issue reported on code.google.com by haris...@gmail.com on 15 Aug 2011 at 9:42


Can you attach your jmeter config files?

Also, we've released 0.9.18.7.  Could you upgrade & try with that?  It looks 
like you are running out of threads on your machine though -- we're failing 
with apr_thread_create.

We only create 1 serf thread per process, so it seems likely that the system is 
running out of resources due to Apache spawning large numbers of child 
processes.  What do you have your child-limit set to in httpd.conf?
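As a sketch, the limits that can make apr_thread_create fail with EAGAIN (errno 11, the "0 vs. 11" in the check failure above) can be inspected on Linux like this:

```shell
# errno 11 (EAGAIN) from apr_thread_create usually means a process,
# thread, or memory limit was hit -- inspect the usual suspects:
ulimit -u                          # max user processes (threads count toward this)
ulimit -s                          # per-thread stack size, in KB
ulimit -v                          # virtual memory limit, in KB
cat /proc/sys/kernel/threads-max   # system-wide thread cap
```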

Original comment by jmara...@google.com on 15 Aug 2011 at 9:48


Thanks for the quick response. I upgraded mod_pagespeed; still no progress. 
Attached are the JMeter configuration and error log. MaxClients is set to 256. 
Thanks.

Original comment by haris...@gmail.com on 15 Aug 2011 at 11:16



Can you try cutting MaxClients in half a few times?  By the time you get to 20 
it should work, but it would be good to learn the cutoff.

This is probably not a load issue for mod_pagespeed as much as it is a poorly 
handled and reported resource limit.


Original comment by jmara...@google.com on 15 Aug 2011 at 11:25


Reducing MaxClients from 256 to 20 does work, but it limits the number of users 
visiting the website and results in poor response times. Also, a MaxClients 
value of 20 sounds low. Is there any middle ground that makes mod_pagespeed 
work without crashing?

Original comment by haris...@gmail.com on 15 Aug 2011 at 11:53


Somewhere in between 20 and 256 is the real limit.  Can you try zeroing in via 
binary search?

It's also quite possible you can work around the issue by adjusting a system 
limit on thread-count.  E.g. see

   http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux

It has some suggestions for bumping up thread-count limits but I'm not sure if 
they are applicable in this case.  mod_pagespeed should only create one or two 
worker threads per child process...hmm....now I'm not recalling whether it 
might create a distinct worker thread or two for each vhost in each process.  
That might be expensive...

Do you have a lot of VirtualHosts?

By the way are you using pre-fork MPM or some other MPM?

Finally, what sort of response-times do you measure with 256 child processes vs 
20?  Are you doing a lot of server-side computation for each request?



Original comment by jmara...@google.com on 16 Aug 2011 at 12:06


Seems like the real limit is less than 128. This just won't work. I would like 
to get a response time of 400 ms (current) or better with mod_pagespeed. 
There isn't much computation on the server side, and the current Apache 
configuration uses prefork.

Original comment by haris...@gmail.com on 16 Aug 2011 at 1:02


How many VirtualHosts do you have?

Are you using KeepAlive?

Original comment by jmara...@google.com on 16 Aug 2011 at 1:16


Also, did you look at that stackoverflow page about increasing the thread-limit 
on your OS?

setrlimit or that magic 'cat' might help.

Original comment by jmara...@google.com on 16 Aug 2011 at 1:24


I have 6 VirtualHosts & KeepAlive is On. I increased 
/proc/sys/kernel/threads-max to 100000 and my stack size to 64 MiB; 
mod_pagespeed still crashes.  Do you think this could be due to resource 
limits on user "nobody"? If so, how can I change them?

[Tue Aug 16 07:24:55 2011] [alert] [mod_pagespeed 0.9.18.7-900 @8028] 
[0816/072455:FATAL:net/instaweb/apache/serf_url_async_fetcher.cc(470)] Check 
failed: 0 == apr_thread_create(&thread_id_, __null, SerfThreadFn, this, pool_) 
(0 vs. 11)\nBacktrace:\n\t/etc/httpd/modules/mod_pagespeed.so(+0x3aeba) 
[0x7f75ebf53eba]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x28bfa) 
[0x7f75ebf41bfa]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x3cf0d) 
[0x7f75ebf55f0d]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x38d12) 
[0x7f75ebf51d12]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x2eda7) 
[0x7f75ebf47da7]\n\t/etc/httpd/modules/mod_pagespeed.so(+0xa3abb) 
[0x7f75ebfbcabb]\n\t/etc/httpd/modules/mod_pagespeed.so(+0xa1c79) 
[0x7f75ebfbac79]\n\t/etc/httpd/modules/mod_pagespeed.so(+0x277d1) 
[0x7f75ebf407d1]\n\t/usr/sbin/httpd(ap_run_translate_name+0x68) 
[0x7f75f833b5e8]\n\t/usr/sbin/httpd(ap_process_request_internal+0x126) 
[0x7f75f833d7c6]\n\t/usr/sbin/httpd(ap_process_request+0x1b0) 
[0x7f75f834f790]\n\t/usr/sbin/httpd(+0x37668) 
[0x7f75f834c668]\n\t/usr/sbin/httpd(ap_run_process_connection+0x68) 
[0x7f75f8348398]\n\t/usr/sbin/httpd(+0x3f097) 
[0x7f75f8354097]\n\t/usr/sbin/httpd(+0x3f3aa) 
[0x7f75f83543aa]\n\t/usr/sbin/httpd(ap_mpm_run+0xc1c) 
[0x7f75f835502c]\n\t/usr/sbin/httpd(main+0xb30) 
[0x7f75f832c840]\n\t/lib64/libc.so.6(__libc_start_main+0xfd) 
[0x7f75f6884c5d]\n\t/usr/sbin/httpd(+0x16809) [0x7f75f832b809]
[Tue Aug 16 07:24:55 2011] [emerg] (22)Invalid argument: couldn't grab the 
accept mutex
[Tue Aug 16 07:24:55 2011] [emerg] (22)Invalid argument: couldn't grab the 
accept mutex
[Tue Aug 16 07:24:55 2011] [emerg] (22)Invalid argument: couldn't grab the 
accept mutex
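Per-user limits are commonly raised in /etc/security/limits.conf -- a sketch with hypothetical values for user "nobody". Note that this file applies to PAM sessions, so a daemon started from an init script may instead need a ulimit line in /etc/sysconfig/httpd; note also that RHEL 6 ships a low default nproc cap in /etc/security/limits.d/90-nproc.conf that can trigger exactly this kind of EAGAIN.

```
# /etc/security/limits.conf -- hypothetical values; tune for your load
nobody  soft  nproc   8192
nobody  hard  nproc   8192
nobody  soft  nofile  65536
nobody  hard  nofile  65536
```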



Original comment by haris...@gmail.com on 16 Aug 2011 at 2:56


Summary was: mod_pagespeed crashes while stress testing using JMeter with 100 
threads in the first 30 seconds

First of all I want to acknowledge we have a real problem here: we use too many 
threads.  Specifically we are using a "serf thread" per vhost per process.  
That adds up, and we can fix this, but it won't be done instantly.

To make matters worse I suggested you upgrade to 0.9.18.7 and in fact I think 
that we might allocate yet another thread per vhost in that version, plus 
another thread per process (for cache cleaning).  Sorry about that.



Now I'm thinking of tweaks you can make to your systems.  6 vhosts * 256 
processes * 2 threads per vhost + 256 cache-cleaning threads = 3328 threads.  
This is a lot but your system should be able to handle it if the settings are 
tuned.

I'm wondering if you went the wrong direction with the stack size.  Can you 
make the stack size much smaller?  64 MB should not be necessary.  What was it 
before?  Can you make it something more like 200 KB?  Even at that size, 
thread stacks alone would reserve roughly 700 MB.
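The arithmetic above can be checked directly (assuming the figures in this comment: 6 vhosts, 256 processes, 2 threads per vhost per process, plus one cache-cleaning thread per process):

```shell
# Estimated thread count for this configuration:
threads=$(( 6 * 256 * 2 + 256 ))
echo "threads: $threads"                       # threads: 3328

# Stack reservation at a 200 KB per-thread stack, in MB:
echo "stack MB: $(( threads * 200 / 1024 ))"   # stack MB: 650 (roughly the 700M figure above)
```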


And again, I think we regressed a bit on thread-usage with 0.9.18.7 and we need 
to do some work on this in our system.  Stay tuned.

Original comment by jmara...@google.com on 16 Aug 2011 at 3:22

  • Changed title: mod_pagespeed uses too many threads -- in proportion to the number of vhosts


My original stack size was 10 MB, which I changed to 64 MB. Now it is back to 
10 MB.

I have now increased /proc/sys/kernel/threads-max to 100000 - far more than 
the ~3328 threads estimated above!  Here is my updated error - uid 99 is 'nobody'.


[Tue Aug 16 09:25:57 2011] [alert] (11)Resource temporarily unavailable: setuid: 
unable to change to uid: 99
[Tue Aug 16 09:25:57 2011] [alert] (11)Resource temporarily unavailable: 
setuid: unable to change to uid: 99
[Tue Aug 16 09:25:57 2011] [alert] (11)Resource temporarily unavailable: 
setuid: unable to change to uid: 99
[Tue Aug 16 09:25:57 2011] [alert] (11)Resource temporarily unavailable: 
setuid: unable to change to uid: 99
[Tue Aug 16 09:25:57 2011] [error] [mod_pagespeed 0.9.18.7-900 @10784] Unable 
to start background work thread.
[Tue Aug 16 09:25:58 2011] [alert] Child 10768 returned a Fatal error... Apache 
is exiting!

Original comment by haris...@gmail.com on 16 Aug 2011 at 4:33


So I think we need about 3k threads.  At 10 MB per stack, we are going to 
reserve ~30 GB of virtual memory just for stack space.

That's why I suggested 200k for stack size, but why don't you ratchet it down 
by cutting it in half until it works.  You might also take a look at httpd in 
'top' and see how much virtual memory is being consumed by each child process 
as it starts up.

Note that httpd will not allocate your max-child processes until your jmeter 
test prods it into doing so by sending it requests.  So to get a better view in 
'top' you might want to slow jmeter down a little.
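The halving experiment might be run like this (a sketch; the restart command and init-script details depend on your setup -- on Red Hat, the httpd init script sources /etc/sysconfig/httpd, so a ulimit line there also works):

```shell
# Halve the per-thread stack for this shell and verify it took effect;
# children started from this shell inherit the new soft limit.
ulimit -s 512    # 512 KB; try 5120, 2560, 1280, ... until the crash returns
ulimit -s        # prints the current soft limit
# service httpd restart   <- hypothetical: then restart Apache from this shell
```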

You are probably not running out of thread-count, but our call to create a 
thread for doing http fetches is failing for another reason, likely memory 
exhaustion.

Original comment by jmara...@google.com on 16 Aug 2011 at 4:41



Thanks for explaining. I reduced the stack size from 10240 KB to 200 KB and 
mod_pagespeed still crashes! Any other options?

Original comment by haris...@gmail.com on 16 Aug 2011 at 7:54


All I can suggest is to keep tuning till you find the sweet spot.

I wish I knew how many threads you were actually creating on your system.  
Perhaps the "Tasks" count from "top" will be a good estimate, if you take that 
number's peak as apache/mod_pagespeed starts and compare it to the trough when 
apache is not running.

I would also like to know if you can take a look at the dynamics of 
memory usage as you ramp up JMeter request pressure.  BTW JMeter allows a slow 
ramp-up, and that might help you watch the VIRT & RES sizes in top as they 
grow.  Even better might be to start up an xterm running "vmstat 1" before you 
start Apache.

Also try running "ps -o thcount" periodically in another xterm (e.g. in a bash 
loop with a sleep 1).
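That monitoring loop might look like the following (ps -o thcount is the procps spelling on Linux; shown bounded for illustration):

```shell
# Sample the total httpd thread count once per second, five times.
for i in 1 2 3 4 5; do
    ps -C httpd -o thcount= | awk '{sum += $1} END {print sum + 0}'
    sleep 1
done
```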


I'm going to start looking at a proper resolution to this.  My goal is to have 
no more than a small fixed number of threads per child process.  Right now I 
think it's the VirtualHost multiplier that's killing you.

Original comment by jmara...@google.com on 16 Aug 2011 at 8:05

  • Changed state: Started


I still can't find a sweet spot after much testing!  The tools mentioned 
earlier didn't provide much information either. mod_pagespeed crashed when the 
task count exceeded 425. I appreciate your time and support. Looking forward 
to the next release!

Original comment by haris...@gmail.com on 16 Aug 2011 at 10:30


I think we will push forward on reducing our thread usage, but the fact that 
you are not able to affect your JMeter results at all by tweaking system 
settings concerns me.  I'm not sure we are fixing the right problem.

A couple of questions answered might help:

1. Did you check the user-specific usage limits, e.g.  what does 'ulimit -a' 
say? 

2. Your tasks exceeded "425" while jmeter was ramping up?  Is that right?  What 
was it when apache was not running?

3. How much memory & swap-space do you have configured on your machine?

4. Do you have any other interesting modules loaded in Apache?  It's possible 
some may be interacting with mod_pagespeed in an unexpected way.

Original comment by jmara...@google.com on 18 Aug 2011 at 4:56


I have found that using the Worker MPM even with just one VirtualHost results 
in a large number of threads being created.  This may help us repro this 
problem.  I found this due to Issue 337:

http://code.google.com/p/modpagespeed/issues/detail?id=337

Original comment by jmara...@google.com on 28 Aug 2011 at 6:05


The number of threads consumed under the Worker MPM should be fixed as of 
release 0.10.19.5.

Original comment by jmara...@google.com on 7 Feb 2012 at 3:22

  • Changed state: Fixed
  • Added labels: release-note


This was actually addressed in release 0.10.19.5, but was never release-noted.  
And it will be addressed better in 0.10.22.* due to 
http://code.google.com/p/modpagespeed/source/detail?r=1595

Original comment by jmara...@google.com on 23 May 2012 at 2:42

  • Added labels: Milestone-v21


Original comment by jmara...@google.com on 23 May 2012 at 2:42

  • Added labels: Milestone-v22
  • Removed labels: Milestone-v21
