Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Mod_pagespeed deadlock? #1662

Open
lanki567 opened this issue Sep 11, 2019 · 16 comments
Open

Mod_pagespeed deadlock? #1662

lanki567 opened this issue Sep 11, 2019 · 16 comments

Comments

@lanki567
Copy link

Our apache 2.4 is configured to use latest stable version of mod_pagespeed but unfortunately after sometime apache doesn't serve the response for all the requests configured to use pagespeed. As per the Mod_forensic log, Response was never sent for these requests. During this period, all url's configured with "ModPagespeedDisallow" works as expected. Even the url's configured to use pagespeed works as expected when the header/request parameter "ModPagespeed=off" is set. Thread dump shows several threads stuck at the address "0x00007f1417d94253". We are consistently able to reproduce this on all our RHEL 7.6 servers(kernel version is 3.10.0-957.21.3.el7.x86_64. )

addr2line -e mod_pagespeed_ap24.so 0x00007f1417d94253 ??:0
#0 0x00007f1427f42d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1417d94253 in ?? () from /apache/modules/mod_pagespeed_ap24.so #2 0x00007f1417bfccfc in ?? () from /apache/modules/mod_pagespeed_ap24.so #3 0x00007f1417bfcd8e in ?? () from /apache/modules/mod_pagespeed_ap24.so #4 0x00007f1417acda2e in ?? () from /apache/modules/mod_pagespeed_ap24.so #5 0x00007f1417aa44e8 in ?? () from /apache/modules/mod_pagespeed_ap24.so #6 0x00007f1417aa626f in ?? () from /apache/modules/mod_pagespeed_ap24.so #7 0x00007f1426f24194 in filter_harness (f=0x7f12580c6e48, bb=0x7f12580617e0) at mod_filter.c:323 #8 0x00007f1418621d4d in ajp_process_callback (msg=<optimized out>, pmsg=0x292bfc8, ae=0x292bf50, r=0x7f12d47f2a60, l=0x2727470) at jk_ajp_common.c:2146 #9 0x00007f14186243ca in ajp_get_reply (e=<optimized out>, s=0x7f12d47f2a60, l=0x2727470, p=0x292bf50, op=0x7f12d47f1830) at jk_ajp_common.c:2310 #10 0x00007f1418628cb7 in ajp_service (e=<optimized out>, s=0x7f12d47f2a60, l=<optimized out>, is_error=<optimized out>) at jk_ajp_common.c:2678 #11 0x00007f1418611b4f in service (e=<optimized out>, s=<optimized out>, l=<optimized out>, is_error=0x7f12d47f2cbc) at jk_lb_worker.c:1418 #12 0x00007f14185fe121 in jk_handler (r=<optimized out>) at mod_jk.c:2896 #13 0x0000000000453960 in ap_run_handler (r=r@entry=0x7f1258069ce0) at config.c:170 #14 0x0000000000453ea9 in ap_invoke_handler (r=r@entry=0x7f1258069ce0) at config.c:444 #15 0x00000000004697ba in ap_process_async_request (r=0x7f1258069ce0) at http_request.c:453 #16 0x0000000000469a7e in ap_process_request (r=r@entry=0x7f1258069ce0) at http_request.c:488 #17 0x0000000000465c45 in ap_process_http_sync_connection (c=0x7f12ac021098) at http_core.c:210 #18 ap_process_http_connection (c=0x7f12ac021098) at http_core.c:251 #19 0x000000000045d480 in ap_run_process_connection (c=c@entry=0x7f12ac021098) at connection.c:42 #20 0x000000000045d9a8 in ap_process_connection (c=c@entry=0x7f12ac021098, csd=csd@entry=0x7f12ac020e80) at connection.c:219 #21 0x00000000004706f2 in process_socket (bucket_alloc=0x7f125801ef08, my_thread_num=13, my_child_num=7, sock=0x7f12ac020e80, p=0x7f12ac020df8, thd=0x2f481e8) at worker.c:479 #22 worker_thread (thd=0x2f481e8, dummy=<optimized out>) at worker.c:808 #23 0x00007f1427f3edd5 in start_thread () from /lib64/libpthread.so.0 #24 0x00007f1427a6402d in clone () from /lib64/libc.so.6

@oschaaf
Copy link
Member

oschaaf commented Sep 11, 2019

would you be willing to try and reproduce this with a debug build?
Hopefully that gives us a backtrace with names instead of addresses.
(and then it might be interesting to see backtraces of all the threads)

@lanki567
Copy link
Author

I will check internally if we are allowed to install the apache 2.4 debug build and update here. Does the same pagespeed rpm work or do we need to rebuild pagespeed from source?

@Lofesa
Copy link

Lofesa commented Sep 14, 2019

Hi @lanki567
I think you have to rebuild it. RPM are build as Release branch and you need a Debug branch. If you see the docs for rebuild pagespeed you will see a parameter BUILDTYPE set to Release, you ust change it to Debug

@lanki567
Copy link
Author

lanki567 commented Sep 24, 2019

Hi @Lofesa

We don't possess skillset on c++ or python. it is taking forever to fix the build issues. All branches are currently broken due to missing git submodules hosted on apache infra. Is there any debug build RPM available for centos 7.x?

@Lofesa
Copy link

Lofesa commented Sep 24, 2019

I don´t know if there is. But as stated in apache infra maybe the solution take too much time so maybe pagespeed module must change their deps to the github himself.
@oschaaf and @jmarantz what do you think?

@oschaaf
Copy link
Member

oschaaf commented Sep 24, 2019

@lanki567 you could try this to work around git.apache.org being down:

git config --global url.https://github.com/apache/.insteadOf https://git.apache.org/

@Lofesa
Updating the submodules to point them to github sounds lika a good idea to me, but what makes me hesitate is that I'm not sure if it's according to ASF policy to do so. Maybe ask around on the dev list about that?

@Lofesa
Copy link

Lofesa commented Sep 25, 2019

@oschaaf
As stated in this doc we should use the github url.
"....There will no longer be a git.apache.org URL for git mirrors, to lessen the confusion with gitbox.apache.org. Projects wishing to point to a git copy of their subversion repository should use their respective GitHub URLs. "

@oschaaf
Copy link
Member

oschaaf commented Sep 25, 2019

Oh wow, I did not see that coming; but yes, we should run replace git.apache.org with github.com/apache/ then

@lanki567
Copy link
Author

lanki567 commented Oct 2, 2019

We have executed the below command on the latest stable branch but the build fails for the BUILDTYPE=Debug however build is successful with BUILDTYPE=Release. Any help is much appreciated. Thanks in advance

Release build type generated binary for the following commands on RHEL 7

  • make BUILDTYPE=Release mod_pagespeed_test pagespeed_automatic_test
  • ./out/Release/mod_pagespeed_test
  • ./out/Release/pagespeed_automatic_test
  • make BUILDTYPE=Release

Debug build fails for the following command on RHEL 7

Please find the log below. Let me know if you need the complete log
make -d BUILDTYPE=Debug mod_pagespeed_test pagespeed_automatic_test

Partial Log
Rejecting impossible implicit prerequisitepagespeed/kernel/http/bot_checker.gperf.s'.
Trying pattern rule with stem bot_checker.gperf'. Rejecting impossible implicit prerequisite pagespeed/kernel/http/bot_checker.gperf.S'.
Trying pattern rule with stem bot_checker.gperf'. Rejecting impossible implicit prerequisite pagespeed/kernel/http/bot_checker.gperf.mod'.
Trying pattern rule with stem bot_checker.gperf'. Trying implicit prerequisite pagespeed/kernel/http/bot_checker.gperf.sh'.
Looking for a rule with intermediate file pagespeed/kernel/http/bot_checker.gperf.sh'. Avoiding implicit rule recursion. No implicit rule found for pagespeed/kernel/http/bot_checker.gperf'.
Finished prerequisites of target file pagespeed/kernel/http/bot_checker.gperf'. No need to remake target pagespeed/kernel/http/bot_checker.gperf'.
Pruning file FORCE_DO_CMD'. Finished prerequisites of target file out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc'.
Prerequisite pagespeed/kernel/http/bot_checker.gperf' is older than target out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc'.
Prerequisite FORCE_DO_CMD' of target out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc' does not exist.
Must remake target out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc'. Invoking recipe from pagespeed/pagespeed_http_gperf.target.mk:10 to update target out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc'.
Successfully remade target file out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc'. Pruning file FORCE_DO_CMD'.
Pruning file out/Debug/obj/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.cc'. Finished prerequisites of target file out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o'.
Must remake target out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o'. Invoking recipe from pagespeed/pagespeed_http_gperf.target.mk:327 to update target out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o'.
Putting child 0x2a9b070 (out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o) PID 11913 on the chain.
Live child 0x2a9b070 (out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o) PID 11913
CXX(target) out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o
Reaping winning child 0x2a9b070 PID 11913
Live child 0x2a9b070 (out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o) PID 11914
Reaping winning child 0x2a9b070 PID 11914
Live child 0x2a9b070 (out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o) PID 11915
In file included from /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/os_defines.h:39:0,
from /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/c++config.h:2097,
from /usr/include/c++/4.8.2/cctype:41,
from ./pagespeed/kernel/base/string_util.h:22,
from ./pagespeed/kernel/http/bot_checker.h:22,
from ../pagespeed/kernel/http/bot_checker.gperf:5:
/usr/include/features.h:330:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp]
warning _FORTIFY_SOURCE requires compiling with optimization (-O)
^
cc1plus: all warnings being treated as errors
Reaping losing child 0x2a9b070 PID 11915
make: *** [out/Debug/obj.target/pagespeed_http_gperf/gen/gperf_out/instaweb/kernel/http/bot_checker.gp.o] Error 1
Removing child 0x2a9b070 PID 11915 from chain.
`

@lanki567
Copy link
Author

lanki567 commented Oct 3, 2019

Below is the stacktrace of 2 threads with the Release module compiled from latest stable build branch. We see 100's of threads with the below stacktrace
`Thread 24 (Thread 0x7f3c425d6700 (LWP 12478)):
#0 0x00007f3da8e984ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f3da8e93dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f3da8e93c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f3d98e11afb in net_instaweb::SharedMemCache<64ul>::Get(std::string const&, net_instaweb::CacheInterface::Callback*) () from /apache/config/modules/mod_pagespeed_ap24.so
#4 0x00007f3d98e05977 in net_instaweb::CacheStats::Get(std::string const&, net_instaweb::CacheInterface::Callback*) () from /apache/config/modules/mod_pagespeed_ap24.so
#5 0x00007f3d98e05977 in net_instaweb::CacheStats::Get(std::string const&, net_instaweb::CacheInterface::Callback*) () from /apache/config/modules/mod_pagespeed_ap24.so
#6 0x00007f3d98bb1888 in net_instaweb::CachePropertyStore::Get(std::string const&, std::string const&, std::string const&, std::vector<net_instaweb::PropertyCache::Cohort const*, std::allocator<net_instaweb::PropertyCache::Cohort const*> > const&, net_instaweb::PropertyPage*, net_instaweb::Callback1, net_instaweb::AbstractPropertyStoreGetCallback**) () from /apache/config/modules/mod_pagespeed_ap24.so
#7 0x00007f3d98bb53b4 in net_instaweb::PropertyPage::Read(std::vector<net_instaweb::PropertyCache::Cohort const
, std::allocator<net_instaweb::PropertyCache::Cohort const*> > const&) () from /apache/config/modules/mod_pagespeed_ap24.so
#8 0x00007f3d989e0606 in net_instaweb::InstawebContext::BlockingPropertyCacheLookup() () from /apache/config/modules/mod_pagespeed_ap24.so
#9 0x00007f3d989e0eb5 in net_instaweb::InstawebContext::InstawebContext(request_rec*, net_instaweb::RequestHeaders*, net_instaweb::ContentType const&, net_instaweb::ApacheServerContext*, std::string const&, net_instaweb::RefCountedPtr<net_instaweb::RequestContext> const&, net_instaweb::QueryParams const&, net_instaweb::QueryParams const&, bool, net_instaweb::RewriteOptions const&) () from /apache/config/modules/mod_pagespeed_ap24.so
#10 0x00007f3d989e8169 in net_instaweb::(anonymous namespace)::build_context_for_request(request_rec*) () from /apache/config/modules/mod_pagespeed_ap24.so
#11 0x00007f3d989e8755 in net_instaweb::(anonymous namespace)::instaweb_out_filter(ap_filter_t*, apr_bucket_brigade*) () from /apache/config/modules/mod_pagespeed_ap24.so
#12 0x00007f3da7e77194 in filter_harness (f=0x7f3bf8071fa0, bb=0x7f3bf80723a0) at mod_filter.c:323
#13 0x00000000004421f7 in default_handler (r=0x7f3bf8078020) at core.c:4836
#14 0x0000000000453960 in ap_run_handler (r=r@entry=0x7f3bf8078020) at config.c:170
#15 0x0000000000453ea9 in ap_invoke_handler (r=r@entry=0x7f3bf8078020) at config.c:444
#16 0x00000000004697ba in ap_process_async_request (r=0x7f3bf8078020) at http_request.c:453
#17 0x0000000000469a7e in ap_process_request (r=r@entry=0x7f3bf8078020) at http_request.c:488
#18 0x0000000000465c45 in ap_process_http_sync_connection (c=0x7f3c3005b2c8) at http_core.c:210
#19 ap_process_http_connection (c=0x7f3c3005b2c8) at http_core.c:251
#20 0x000000000045d480 in ap_run_process_connection (c=c@entry=0x7f3c3005b2c8) at connection.c:42
#21 0x000000000045d9a8 in ap_process_connection (c=c@entry=0x7f3c3005b2c8, csd=csd@entry=0x7f3c3005b0b0) at connection.c:219
#22 0x00000000004706f2 in process_socket (bucket_alloc=0x7f3bf806aa08, my_thread_num=44, my_child_num=3, sock=0x7f3c3005b0b0, p=0x7f3c3005b038, thd=0xc47928) at worker.c:479
#23 worker_thread (thd=0xc47928, dummy=) at worker.c:808
#24 0x00007f3da8e91dd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f3da89b702d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f3c42fd7700 (LWP 12477)):
#0 0x00007f3da8e984ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f3da8e93dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f3da8e93c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f3d98e11afb in net_instaweb::SharedMemCache<64ul>::Get(std::string const&, net_instaweb::CacheInterface::Callback*) () from /apache/config/modules/mod_pagespeed_ap24.so
#4 0x00007f3d98e05977 in net_instaweb::CacheStats::Get(std::string const&, net_instaweb::CacheInterface::Callback*) () from /apache/config/modules/mod_pagespeed_ap24.so
#5 0x00007f3d98e05977 in net_instaweb::CacheStats::Get(std::string const&, net_instaweb::CacheInterface::Callback*) () from /apache/config/modules/mod_pagespeed_ap24.so
#6 0x00007f3d98bb1888 in net_instaweb::CachePropertyStore::Get(std::string const&, std::string const&, std::string const&, std::vector<net_instaweb::PropertyCache::Cohort const*, std::allocator<net_instaweb::PropertyCache::Cohort const*> > const&, net_instaweb::PropertyPage*, net_instaweb::Callback1, net_instaweb::AbstractPropertyStoreGetCallback**) () from /apache/config/modules/mod_pagespeed_ap24.so
#7 0x00007f3d98bb53b4 in net_instaweb::PropertyPage::Read(std::vector<net_instaweb::PropertyCache::Cohort const
, std::allocator<net_instaweb::PropertyCache::Cohort const*> > const&) () from /apache/config/modules/mod_pagespeed_ap24.so
#8 0x00007f3d989e0606 in net_instaweb::InstawebContext::BlockingPropertyCacheLookup() () from /apache/config/modules/mod_pagespeed_ap24.so
#9 0x00007f3d989e0eb5 in net_instaweb::InstawebContext::InstawebContext(request_rec*, net_instaweb::RequestHeaders*, net_instaweb::ContentType const&, net_instaweb::ApacheServerContext*, std::string const&, net_instaweb::RefCountedPtr<net_instaweb::RequestContext> const&, net_instaweb::QueryParams const&, net_instaweb::QueryParams const&, bool, net_instaweb::RewriteOptions const&) () from /apache/config/modules/mod_pagespeed_ap24.so
#10 0x00007f3d989e8169 in net_instaweb::(anonymous namespace)::build_context_for_request(request_rec*) () from /apache/config/modules/mod_pagespeed_ap24.so
#11 0x00007f3d989e8755 in net_instaweb::(anonymous namespace)::instaweb_out_filter(ap_filter_t*, apr_bucket_brigade*) () from /apache/config/modules/mod_pagespeed_ap24.so
#12 0x00007f3da7e77194 in filter_harness (f=0x7f3bec071f88, bb=0x7f3bec072388) at mod_filter.c:323
#13 0x00000000004421f7 in default_handler (r=0x7f3bec078020) at core.c:4836
#14 0x0000000000453960 in ap_run_handler (r=r@entry=0x7f3bec078020) at config.c:170
#15 0x0000000000453ea9 in ap_invoke_handler (r=r@entry=0x7f3bec078020) at config.c:444
#16 0x00000000004697ba in ap_process_async_request (r=0x7f3bec078020) at http_request.c:453
#17 0x0000000000469a7e in ap_process_request (r=r@entry=0x7f3bec078020) at http_request.c:488
#18 0x0000000000465c45 in ap_process_http_sync_connection (c=0x7f3c300591e8) at http_core.c:210
#19 ap_process_http_connection (c=0x7f3c300591e8) at http_core.c:251
#20 0x000000000045d480 in ap_run_process_connection (c=c@entry=0x7f3c300591e8) at connection.c:42
#21 0x000000000045d9a8 in ap_process_connection (c=c@entry=0x7f3c300591e8, csd=csd@entry=0x7f3c30058fd0) at connection.c:219
#22 0x00000000004706f2 in process_socket (bucket_alloc=0x7f3bec06aa08, my_thread_num=43, my_child_num=3, sock=0x7f3c30058fd0, p=0x7f3c30058f58, thd=0xc478f8) at worker.c:479
#23 worker_thread (thd=0xc478f8, dummy=) at worker.c:808
#24 0x00007f3da8e91dd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f3da89b702d in clone () from /lib64/libc.so.6`

@oschaaf
Copy link
Member

oschaaf commented Oct 3, 2019

Thanks so much for hanging in there and persisting. These backtraces look very very helpful. One last question; could you, while you're at it, post backtraces from all the threads somewhere for reference later on?

@lanki567
Copy link
Author

lanki567 commented Oct 3, 2019

Unfortunately, Policies here doesn't let me upload all the logs. Are you specifically looking for anything? I can paste those logs here.

Is there any workaround/property to disable this specific functionality?

@oschaaf
Copy link
Member

oschaaf commented Oct 3, 2019

Well, I was hoping to inspect what all the threads are at when the hang occurs. The two threads you posted both seem to be waiting to acquire a lock, but there's probably (hopefully!) also another one out there that holds it.

@jmarantz
Copy link
Contributor

jmarantz commented Oct 3, 2019

Thank you for the backtraces. They reveal this is happening in the shared-memory cache, where multiple processes use a cross-process mutex to access shared memory.

I have always been concerned that a process crash, or a SIGKILL while holding onto a shared-memory mutex could make it impossible to release the lock. However the shared-memory cache has been running for probably 7 years now and I've never heard of this actually occurring.

If this is just a once in a decade fluke, it might be a reasonable workaround to simply reboot the machine to clear the shm-locks and hopefully you will not be unlucky twice.

If, after reboot, the problem re-appears, a workaround is to disable the shared-memory cache, and instead use per-process in-memory LRU caches as your L1s. See https://www.modpagespeed.com/doc/system for details.

Hope this helps!

@lanki567
Copy link
Author

lanki567 commented Oct 9, 2019

I just wanted to provide an update. Into 6th day, We didn't see the issue yet after disabling shared memory cache. Thanks for the workaround

We are now little bit concerned about performance. Should be consider memcache/redis as alternative solution?

@jmarantz
Copy link
Contributor

jmarantz commented Oct 9, 2019 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants