
apache plugin HTTPS - multithread crash on RHES 5.7 #858

Open
toni-moreno opened this issue Dec 11, 2014 · 13 comments

Comments

@toni-moreno
Contributor

I recently deployed collectd to read Apache mod_status information through the collectd apache plugin (over HTTPS).

collectd was compiled and built directly from the 25 May 2014 master branch (d76d251).
libcurl (libcurl.so.4.3.0) was also compiled and built from the official curl-7.35.0 sources.

System: RHEL 5.7 (Linux 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux)
OpenSSL: openssl-0.9.8e-27.el5_10.3

After running fine for some days, the process suddenly crashes (the core dump points to an error inside the SSL library).

Core was generated by `/opt/collectd/sbin/collectd -C /opt/collectd/etc/collectd.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003d5787c39b in ?? ()
(gdb) backtrace
#0  0x0000003d5787c39b in ?? ()
#1  0x00000036efcdeb5e in SHA1_Update () from /lib64/libcrypto.so.6
#2  0x00000036efcdbd0f in ?? () from /lib64/libcrypto.so.6
#3  0x00000036efcdb5a6 in ?? () from /lib64/libcrypto.so.6
#4  0x00000036f001d351 in ssl3_client_hello () from /lib64/libssl.so.6
#5  0x00000036f001e7c9 in ssl3_connect () from /lib64/libssl.so.6
#6  0x00002aed7c3d180d in ossl_connect_common () from /opt/collectd/lib/libcurl.so.4
#7  0x00002aed7c3d273d in Curl_ssl_connect_nonblocking () from /opt/collectd/lib/libcurl.so.4
#8  0x00002aed7c398cde in https_connecting () from /opt/collectd/lib/libcurl.so.4
#9  0x00002aed7c3a5a3e in Curl_protocol_connect () from /opt/collectd/lib/libcurl.so.4
#10 0x00002aed7c3b888f in multi_runsingle () from /opt/collectd/lib/libcurl.so.4
#11 0x00002aed7c3b9895 in curl_multi_perform () from /opt/collectd/lib/libcurl.so.4
#12 0x00002aed7c3b33f4 in curl_easy_perform () from /opt/collectd/lib/libcurl.so.4
#13 0x00002aed7c189521 in apache_read_host (user_data=0xb96a768) at apache.c:615
#14 0x000000000041017f in plugin_read_thread (args=0x0) at plugin.c:462
#15 0x0000003d5840673d in ?? ()
#16 0x0000000000000000 in ?? ()

Does anyone know the origin of the problem? Which versions are known to run correctly with collectd (5.4.1) + apache plugin (HTTPS) + libcurl 7.35 on RHEL 5.7?

@mfournier

The symptom you describe reminds me a lot of #513. See 5f2f969 and ddffda7. If you can't update to current master, try cherry-picking these 2 patches.

Please keep us updated!

@toni-moreno
Contributor Author

Hi @mfournier, I've patched apache.c with 5f2f969 and deployed it on test servers; now I'm waiting to reproduce the crash.

The other patch doesn't apply to my environment since we are not loading the network plugin.

@toni-moreno
Contributor Author

Hi @mfournier, after applying the patch and deploying it to several servers (same software versions, but different servers from the one in the previous backtrace), two things happened:

  1. On the servers where it is still working (7 servers), there are many random connection errors:

https://gist.github.com/toni-moreno/5a30930fe7361afefa1b#file-collectd_apache_log

  2. One server is still crashing, but as you can see in the backtrace, no debugging information is available (even though I compiled with -g -O0):

https://gist.github.com/toni-moreno/5a30930fe7361afefa1b#file-collectd-gdb

The crashing server is a production server (there is no way to reproduce the crash on the test and preproduction servers).

What should I do?

@toni-moreno
Contributor Author

Hi @mfournier,
The last crash I posted at https://gist.github.com/toni-moreno/5a30930fe7361afefa1b#file-collectd-gdb has been fixed (it was caused by a separate bug in a self-made plugin, which is now fixed).

Even after that fix, the apache + OpenSSL crash persists, as you can see in the following backtrace:

[root]# gdb /opt/collectd/sbin/collectd ./core.20150120
.
.
.
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc11fd000
Core was generated by `/opt/collectd/sbin/collectd -C /opt/collectd/etc/collectd.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000320347b11b in ?? ()
(gdb) backtrace
#0  0x000000320347b11b in ?? ()
#1  0x0000003f92adeb5e in SHA1_Update () from /lib64/libcrypto.so.6
#2  0x0000003f92adbd0f in ?? () from /lib64/libcrypto.so.6
#3  0x0000003f92adb5a6 in ?? () from /lib64/libcrypto.so.6
#4  0x0000003f92e1d351 in ssl3_client_hello () from /lib64/libssl.so.6
#5  0x0000003f92e1e7c9 in ssl3_connect () from /lib64/libssl.so.6
#6  0x00002b5c167da80d in ossl_connect_common () from /opt/collectd/lib/libcurl.so.4
#7  0x00002b5c167db73d in Curl_ssl_connect_nonblocking () from /opt/collectd/lib/libcurl.so.4
#8  0x00002b5c167a1cde in https_connecting () from /opt/collectd/lib/libcurl.so.4
#9  0x00002b5c167aea3e in Curl_protocol_connect () from /opt/collectd/lib/libcurl.so.4
#10 0x00002b5c167c188f in multi_runsingle () from /opt/collectd/lib/libcurl.so.4
#11 0x00002b5c167c2895 in curl_multi_perform () from /opt/collectd/lib/libcurl.so.4
#12 0x00002b5c167bc3f4 in curl_easy_perform () from /opt/collectd/lib/libcurl.so.4
#13 0x00002b5c165925c1 in apache_read_host (user_data=0x40f0b88) at apache.c:615
#14 0x000000000041017f in plugin_read_thread (args=0x0) at plugin.c:462
#15 0x000000320400677d in ?? ()
#16 0x0000000000000000 in ?? ()

The only change applied was this patch:

[root]# diff src/apache.c.prepatch858 src/apache.c
673a674,681
> static int apache_init (void) /* {{{ */
> {
>       /* Call this while collectd is still single-threaded to avoid
>        * initialization issues in libgcrypt. */
>       curl_global_init (CURL_GLOBAL_SSL);
>       return (0);
> } /* }}} int apache_init */
>
676a685
>       plugin_register_init ("apache", apache_init);

What should I do now?

@toni-moreno
Contributor Author

Hi @mfournier, I'm reviewing how to use libcurl in a multithreaded environment, and it seems that a different flag is needed from the one you are using:

curl_global_init(CURL_GLOBAL_ALL);

instead of

curl_global_init (CURL_GLOBAL_SSL);

as in 5f2f969.

At least, this is the approach shown at

http://curl.haxx.se/libcurl/c/multithread.html

What do you think?

@toni-moreno
Contributor Author

Hi @mfournier, after reviewing the init flags in curl.h: CURL_GLOBAL_ALL is just CURL_GLOBAL_SSL plus CURL_GLOBAL_WIN32, so on Linux the two flags seem to be equivalent.

I have found that collectd + the apache plugin works fine with HTTP queries, and with HTTPS when collecting data from only one HTTPS instance.

So it seems to be an HTTPS bug in a multithreaded environment, and I've opened an issue with the libcurl developers:

http://sourceforge.net/p/curl/bugs/1475/

@toni-moreno
Contributor Author

Hi @mfournier, the libcurl developer (Daniel Stenberg) answered us about this bug.

It seems that collectd needs to set up proper OpenSSL mutex (locking) callbacks to work in a multithreaded environment.

https://sourceforge.net/p/curl/bugs/1475/

It seems that libcurl + OpenSSL needs a thread setup, like in the following example code:

http://curl.haxx.se/libcurl/c/opensslthreadlock.html

@toni-moreno
Contributor Author

Hi @mfournier,

cc @octo, @tokkee, @pyr
Hi all,

After reviewing the new information from Daniel Stenberg, it seems that a single OpenSSL setup is needed for all multithreaded OpenSSL communication.

I would like to build a patch for this issue, but I think apache_init() is not the best place for this common initialization. Other plugins may also need this custom SSL initialization.

One option could be a global parameter "EnableMultithreadSSL" in the base collectd.conf, plus a new sslmultithread.c file in the src/daemon directory, with cd_ssl_mthead_init() called from:

collectd.c::init_global_variables()

That way, the CRYPTO_xxxxx() callbacks would be available to all plugins.

What do you think about this?

@toni-moreno toni-moreno changed the title "apache plugin crash on RHES 5.7" to "apache plugin HTTPS - multithread crash on RHES 5.7" Jan 31, 2015
@tokkee
Member

tokkee commented Feb 2, 2015

A config parameter does not sound like the right solution to me. Users should not have to worry about this. Would it hurt to call the SSL init function multiple times (i.e. from each affected plugin's init function)?

@toni-moreno
Contributor Author

@tokkee I agree that users should not have to worry about this.

As for your question of whether there can be more than one CRYPTO_xxx callback (one per plugin): I don't know. I reviewed the documentation and it says nothing about that.

http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION

I can ask Daniel Stenberg, but it seems that only one callback is allowed, and it will be overwritten if the same initialization is done twice.

When in doubt, it is probably better to force this crypto initialization to happen only once.

@toni-moreno
Contributor Author

@tokkee, another way could be to create sslmultithread.c, link it into the collectd binary without initializing it, and let each plugin trigger the initialization from its _init() function.

cd_ssl_mthead_init() would perform the CRYPTO callback initialization only on its first call.
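A minimal sketch of this "initialize on first call" helper, assuming POSIX threads. The proposal does not say how the first call is detected; pthread_once() is one option, and it guarantees the setup runs exactly once even if several plugins call the helper, including concurrently. The name cd_ssl_mthead_init() comes from the proposal; ssl_mt_setup() and cd_ssl_mthead_setup_count() are hypothetical, and the actual OpenSSL callback registration is left as a comment.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t ssl_mt_once = PTHREAD_ONCE_INIT;
static int ssl_mt_setup_count = 0; /* how many times the setup actually ran */

/* Hypothetical one-time setup routine, run via pthread_once(). */
static void ssl_mt_setup(void)
{
  assert(ssl_mt_setup_count == 0); /* must never run twice */
  ssl_mt_setup_count++;
  /* Placeholder: real code would allocate the OpenSSL mutexes and register
   * the locking and thread-id callbacks here. */
}

/* Safe to call from every plugin's init function: only the first call
 * performs the setup. Returns 0 on success, like pthread_once(). */
int cd_ssl_mthead_init(void)
{
  return pthread_once(&ssl_mt_once, ssl_mt_setup);
}

/* Hypothetical accessor, useful for testing the run-once behaviour. */
int cd_ssl_mthead_setup_count(void)
{
  return ssl_mt_setup_count;
}
```

Each plugin's init function would then just call cd_ssl_mthead_init() and check for a non-zero return; second and later calls return immediately without repeating the setup, which also sidesteps the question of whether registering the callbacks twice is harmful.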

What do you think about this approach?

@tokkee
Member

tokkee commented Feb 3, 2015

Without having looked into the details, that's exactly what I would have proposed then as well.

@toni-moreno
Contributor Author

Hi guys.

/cc @tokkee

After applying the patch (#943), collectd has been running fine in production (more than 12 different Apache instances over HTTPS) for more than 3 weeks!

@rpv-tomsk rpv-tomsk added this to the issues milestone Jul 7, 2018