Daemon segfault after a few graceful reloads with multiple threads #81
Comments
If ceilometer is the only WSGI application running in that specific mod_wsgi daemon process group within that VirtualHost, instead of using:

WSGIApplicationGroup ceilometer

try:

WSGIApplicationGroup %{GLOBAL}

Some third party Python packages don't always work properly in Python sub interpreters and instead need to run in the main (first) Python interpreter. If this isn't done, one of the symptoms can be process crashes. Using '%{GLOBAL}' will force the use of the main interpreter context within the process.

So as a first step, that would be the best thing to try, to see if it helps. If not, we can look in more detail at the other information you have collected. There is a bit more information on this issue with sub interpreters in the mod_wsgi documentation.
Graham
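For reference, a minimal sketch of the suggested change, based on the ceilometer virtual host shown later in this thread (the python-path option is omitted here for brevity); the application keeps running in its own daemon process group, but in the main interpreter:

WSGIDaemonProcess ceilometer user=ceilometer group=ceilometer processes=4 threads=5
WSGIProcessGroup ceilometer
WSGIApplicationGroup %{GLOBAL}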
OK, I changed WSGIApplicationGroup from ceilometer to %{GLOBAL} in the ceilometer virtual host. (For the record, all of the other virtual host files already had this %{GLOBAL} setting.) After changing the virtual host file, I restarted Apache and tried my "for i in `seq 1 10` ..." reload loop again.
Well, I'm at a loss here. I can't even tell whether this issue is a problem with mod_wsgi or with apache2. I found the Apache code that generates the error "seg fault or similar nasty error detected in the parent process"; in the Apache (v2.4.12-2) source it is in httpd-2.4.12/server/mpm_unix.c. My attention is drawn to the comment: "This comparison won't match if the crashing thread is some module's thread that runs in the parent process."

$ ps -p 11120 -lfT

So we have 8 threads of ceilometer running in process 11120, all under the parent process 11117. For the record, process 11117 is the Apache parent process itself. I think that all of the ceilometer and keystone processes are mod_wsgi daemon processes (because each of their virtual host files has WSGIDaemonProcess, WSGIApplicationGroup and WSGIProcessGroup). Is this correct?

If processes 11120-11127 are mod_wsgi daemon processes (module processes), then none of their threads has an SPID that matches the parent process of apache2, 11117. Do you have any suggestions?
Sorry for the delay on this one. I was in the process of starting a new job, so have been busy with that.

Do you have full stack traces from a core dump? I can't see that you provided any. Once in gdb, try dumping the stack traces of all threads. I think this will give full stack traces for all active threads in the process for the core dump. Usually I use this when attached to a live process, so hopefully it works for a core dump as well.

As for the issue being related to receiving signals while things were possibly still starting up, I can imagine what the issue may be, as I have had problems with that in the past, but I believed they were fixed. I do still know of some issues related to signal handling when preloading Python code into a process, but I can't see right now how that would be related.
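A minimal sketch of pulling full stack traces for every thread out of the core dump with gdb, using the same core file shown elsewhere in this thread:

$ sudo gdb /usr/sbin/apache2 /tmp/mycoredump/core
(gdb) thread apply all bt full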
Here is the full stack trace:

$ sudo gdb /usr/sbin/apache2 /tmp/mycoredump/core

Thread 1 (Thread 0x7f5086dff780 (LWP 33130)):
I ran into the same issue too. Segfaults happen on logrotate.
The same is true for me; I have run into this issue several times.
Try 4.4.15 and see if the issue goes away. It includes a fix for core dumps when doing graceful restarts while mod_wsgi was being loaded for the first time. That fix was for a permanent failure and not a transient one though, so this could well be different.
The same for me. 4.4.15 didn't fix the issue.
@bkupidura Are you using embedded mode or daemon mode of mod_wsgi? If you are using daemon mode, are you also setting:

WSGIRestrictEmbedded On

Use of that directive ensures that WSGI applications cannot accidentally end up running in embedded mode inside the Apache child processes.
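For illustration, a minimal sketch of that setting at server configuration level, for example in mods-enabled/wsgi.conf as mentioned in the reply below:

# Refuse to run WSGI applications in embedded mode; they must be
# delegated to a mod_wsgi daemon process group.
WSGIRestrictEmbedded On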
Daemon mode. Configuration:

<VirtualHost *:35357>
    # Vhost docroot
    DocumentRoot "/usr/lib/cgi-bin/keystone"

    # Directories, there should at least be a declaration for /usr/lib/cgi-bin/keystone
    <Directory "/usr/lib/cgi-bin/keystone">

    # Logging
    ErrorLog "/var/log/apache2/keystone_wsgi_admin_error.log"

    # Custom fragment
    LimitRequestFieldSize 81900

Adding "WSGIRestrictEmbedded" to mods-enabled/wsgi.conf didn't change anything. The issue is still there.
Ahhhh. The Keystone application from OpenStack. I have had separate reports via Red Hat of Apache process crashes on process shutdown when it specifically is being hosted. I am not sure yet what it is about that specific application which is causing problems; I am not seeing reports like this for other applications.

One possible cause being investigated is whether Keystone is creating background threads. These can be a problem because they will not be stopped prior to the Python interpreter being destroyed, and so if a thread gets woken up again during shutdown it can access invalid memory. It is not an easy problem to solve completely. Any application creating background threads should really be registering an atexit callback that attempts to shut down the threads, so they aren't running when the interpreter is destroyed.
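For illustration, a minimal sketch (not Keystone's actual code, and all names here are hypothetical) of the kind of atexit callback an application that creates background threads could register:

import atexit
import threading

_shutdown = threading.Event()

def _worker():
    # Hypothetical periodic background task.
    while not _shutdown.wait(timeout=5.0):
        pass  # do periodic work here

_thread = threading.Thread(target=_worker)
_thread.daemon = True
_thread.start()

def _stop_worker():
    # Signal the worker to exit and wait for it, so it is no longer
    # running when the Python interpreter is torn down.
    _shutdown.set()
    _thread.join(timeout=10.0)

atexit.register(_stop_worker)

With this in place the thread is asked to stop during normal interpreter shutdown instead of being left running while the interpreter is destroyed underneath it.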
See also the stack traces in #132.
I am going to close this finally. There have been a few fixes since this was originally reported which address problems in mod_wsgi related to memory usage. I have also not seen any further reports related to Keystone on Apache for a long time now, so I am assuming either Apache is no longer being used or the problems have resolved themselves. Create a new issue if this is still a problem.
We are running on our own Debian Jessie derivative with Apache 2.4.10 and mod_wsgi 4.4.13, and we experience a segfault after multiple graceful reloads of Apache. The segfault also seems to depend on the amount of time between reloads. For example, if we do

for i in `seq 1 10`; do echo $i; sudo service apache2 reload; sleep 2; done;

it will fail on the 7th reload every time. If we increase the sleep in between reloads to sleep 3, all 10 reloads are issued successfully every time.

Our Jessie derivative is the base OS for OpenStack, so we have several OpenStack processes running:
$ ps -ef | grep apache2
root 27505 33129 0 14:52 pts/1 00:00:00 sudo vi /var/log/apache2/error.log
root 27506 27505 0 14:52 pts/1 00:00:00 vi /var/log/apache2/error.log
root 31470 1 0 15:07 ? 00:00:00 /usr/sbin/apache2 -k start
ceilome+ 31474 31470 0 15:07 ? 00:00:39 /usr/sbin/apache2 -k start
ceilome+ 31475 31470 0 15:07 ? 00:00:40 /usr/sbin/apache2 -k start
ceilome+ 31476 31470 0 15:07 ? 00:00:40 /usr/sbin/apache2 -k start
ceilome+ 31477 31470 0 15:07 ? 00:00:39 /usr/sbin/apache2 -k start
horizon 31478 31470 1 15:07 ? 00:03:27 /usr/sbin/apache2 -k start
horizon 31479 31470 1 15:07 ? 00:03:28 /usr/sbin/apache2 -k start
horizon 31480 31470 1 15:07 ? 00:03:24 /usr/sbin/apache2 -k start
keystone 31481 31470 0 15:07 ? 00:01:53 /usr/sbin/apache2 -k start
keystone 31482 31470 0 15:07 ? 00:01:50 /usr/sbin/apache2 -k start
keystone 31483 31470 1 15:07 ? 00:02:16 /usr/sbin/apache2 -k start
keystone 31484 31470 1 15:07 ? 00:02:15 /usr/sbin/apache2 -k start
www-data 31485 31470 0 15:07 ? 00:00:37 /usr/sbin/apache2 -k start
www-data 31486 31470 0 15:07 ? 00:01:06 /usr/sbin/apache2 -k start
stack 32639 25024 0 18:38 pts/3 00:00:00 grep apache2
If I do a sudo service apache2 reload anywhere from 3-10 times, at some point it will fail. Note that this happens when there is less than 1 second in between reloads:

$ sudo service apache2 reload
Job for apache2.service failed. See 'systemctl status apache2.service' and 'journalctl -xn' for details.
$ systemctl status apache2.service
● apache2.service - LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2)
Active: active (running) (Result: exit-code) since Fri 2015-07-10 15:07:19 EDT; 5h 35min ago
Process: 31436 ExecStop=/etc/init.d/apache2 stop (code=exited, status=1/FAILURE)
Process: 13873 ExecReload=/etc/init.d/apache2 reload (code=exited, status=1/FAILURE)
Process: 31449 ExecStart=/etc/init.d/apache2 start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/apache2.service
└─13413 /usr/sbin/apache2 -k start
$ journalctl -xn
No journal files were found.
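For reference, getting the Apache processes to write the core file analysed below into /tmp/mycoredump typically requires something along these lines (a sketch; the directory comes from this report, the rest are assumptions about how the system was set up):

# In the Apache server configuration; the directory must exist and be
# writable by the Apache user.
CoreDumpDirectory /tmp/mycoredump

plus an unlimited core file size for the Apache processes, for example:

ulimit -c unlimited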
Analyzing the core dump file with gdb yields:
$ sudo gdb /usr/sbin/apache2 /tmp/mycoredump/core
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/apache2...done.
[New LWP 31470]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f1526f92ad6 in ?? ()
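The ?? in frame #0 means gdb has no debugging symbols for the faulting address. A sketch of how symbols could be added on a Debian-based system so the backtrace resolves to function names (the -dbg package names are assumptions and depend on how Apache, Python and mod_wsgi were packaged):

$ sudo apt-get install apache2-dbg python2.7-dbg

After installing symbols, re-opening the same core file in gdb should show named frames instead of ??.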
In /var/log/apache2/error.log, each successful reload produces the expected sequence of messages. For the situation where we use sleep 2 (or less) in between reloads, we instead see in the error log:

[core:notice] [pid 48512] AH00060: seg fault or similar nasty error detected in the parent process

and the usual "Attach interpreter" messages never appear.
apache2 then has to be fully restarted (sudo service apache2 restart) before we can do any further reloads. Also, searching around online, I found https://bugs.launchpad.net/ubuntu/+source/redland-bindings/+bug/1416875, which may be related.
Looking at one of the virtual host conf files, it is running mod_wsgi in daemon mode and everything looks fine to me:
Listen 127.0.0.1:8777
WSGIPythonHome /opt/stack/service/ceilometer-common/venv/bin/../

<VirtualHost *:8777>
    WSGIScriptAlias / /opt/stack/service/ceilometer-common/venv/bin/../lib/python2.7/site-packages/ceilometer/api/app.wsgi
    WSGIDaemonProcess ceilometer user=ceilometer group=ceilometer processes=4 threads=5 python-path=/opt/stack/service/ceilometer-common/venv/bin/../lib/python2.7/site-packages
    WSGIApplicationGroup ceilometer
    WSGIProcessGroup ceilometer

    ErrorLog /var/log/apache2/ceilometer_modwsgi.log
    LogLevel info
    CustomLog /var/log/apache2/ceilometer_access.log combined

    <Directory /opt/stack/service/ceilometer-common/venv/bin/../lib/python2.7/site-packages/ceilometer>
        Options Indexes FollowSymLinks MultiViews
        Require all granted
        AllowOverride None
        Order allow,deny
        allow from all
        LimitRequestBody 102400
    </Directory>
</VirtualHost>
Do you have any ideas of what could be causing the reloads to fail? Any advice or thoughts would be very useful.
Thank you for looking at this issue,
Heather Brown
hbrown@hp.com