rgw: setup locks for libopenssl #20390
@mdw-at-linuxbox please have a look, sir :) [I'm interested in whether this could be a root cause and fix for memory leak issues, as well?] |
jenkins test this please (arm test failure on bluefs: ceph.conf not found) |
This was triggered by using https://github.com/rakyll/hey, a load generator similar to Apache Bench, against a vstart cluster with OpenStack Keystone set up, and making AWS requests that force Keystone to be contacted on every request. Something as simple as
could trigger the crash in my env. (s3 auth use keystone and keystone url were set) |
I don't think this will fix any memory leak problems with curl/nss. I can believe it will fix crashing problems with curl/openssl. This is a really ugly property of curl, and one that does not make me happy. What it means is that somehow, at compile time, the configuration logic needs to determine whether curl is going to be linked against openssl or gnutls, and make the appropriate lock setup calls directly. Failure to get it right = crash.
I don't think so either, but we do see crashes with openssl and I believe it is related to this. I was able to test this locally with the command I posted.
We probably also need checks for openssl versions < 1.1.0? I'm not sure whether we need similar callbacks for newer versions, but I could be wrong.
In fact you don't need those locking calls in newer versions of openssl - 1.1 removes them. So ideally there should also be a cmake test to see if openssl has/needs this locking stuff. The locking stuff also affects civetweb but that's a different problem. |
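For readers unfamiliar with the pre-1.1.0 locking scheme discussed above, here is a minimal sketch of the idea: the application owns a table of mutexes and hands OpenSSL a callback that locks or unlocks slot `n`. It is an illustration under stated assumptions, not the code in this PR. `kCryptoLock` stands in for OpenSSL's `CRYPTO_LOCK` flag, and `setup_ssl_locks` and `ssl_locking_callback` are hypothetical names. Real code would include `<openssl/crypto.h>`, size the table with `CRYPTO_num_locks()`, and register the callback with `CRYPTO_set_locking_callback()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

// Stand-in for OpenSSL's CRYPTO_LOCK flag (assumption: real code would take
// the macro from <openssl/crypto.h> instead of defining its own constant).
constexpr int kCryptoLock = 1;

// One mutex per lock slot requested by the library.
static std::unique_ptr<std::mutex[]> ssl_mutexes;
static std::size_t ssl_mutex_count = 0;

// Same shape as the locking callback OpenSSL < 1.1.0 expects: lock slot n
// when the lock flag is set in mode, otherwise unlock it.
void ssl_locking_callback(int mode, int n, const char* /*file*/, int /*line*/) {
  std::mutex& m = ssl_mutexes[static_cast<std::size_t>(n)];
  if (mode & kCryptoLock) {
    m.lock();
  } else {
    m.unlock();
  }
}

// Allocate the lock table. Real code would pass CRYPTO_num_locks() here and
// then register ssl_locking_callback with CRYPTO_set_locking_callback().
void setup_ssl_locks(std::size_t num_locks) {
  ssl_mutexes = std::make_unique<std::mutex[]>(num_locks);
  ssl_mutex_count = num_locks;
}
```

Because OpenSSL 1.1.0 and later handle locking internally, a build-time check would typically compile this setup out for those versions.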
+1. and some way to test whether libcurl is using openssl (it's nss on rhel) - whether that's at build- or runtime |
I tried using curl-config and cmake to get around. I assume since we depend on curl-devel we should be able to do it this way? (GetPrerequisites may be an alternative if we can map from the shared library back to package.. assuming we know the libopenssl so names for all distros) |
the curl-config stuff looks good!
CMakeLists.txt
if (NOT NO_CURL_SSL_LINK)
  if (OPENSSL_VERSION VERSION_LESS "1.1.0")
    message(STATUS "Found openssl ${OPENSSL_VERSION} < 1.1.0 need to explicitly set locking primitives")
    set(WITH_RADOSGW_OPENSSL ON)
this define is a lot more specific than 'radosgw uses openssl' - maybe something like WITH_LIBCURL_OPENSSL_LOCK_COMPAT
ack much better
src/rgw/CMakeLists.txt
@@ -189,6 +194,10 @@ add_dependencies(radosgw cls_rgw cls_lock cls_refcount
  cls_version cls_replica_log cls_user)
install(TARGETS radosgw DESTINATION bin)

if (WITH_RADOSGW_OPENSSL)
  target_link_libraries(radosgw ${OPENSSL_LIBRARIES})
@mdw-at-linuxbox recently argued against linking directly to openssl libs (in discussions about ssl for the beast frontend). Marcus, do you think we should use dlopen(LIBCRYPTO_SONAME) for this stuff instead? Is there any risk that civetweb and libcurl would be linking this differently?
src/rgw/rgw_http_client_ssl.h
unsigned long rgw_ssl_thread_id_callback();

class RGWSSLSetup
this class, along with the dout defines/includes, can live in the source file instead
ack
ed53413 to 4f083be
I've figured out there's another wrinkle to this. Our debian build of ceph builds against "curl-gnutls", which needs a similar but different locking fix. curl-config should be a safe way to do things provided it can tell you if openssl/gnutls is there. For openssl, you don't want to try forcing the system version of openssl - that way just creates many minefields. You can instead use OPENSSL_API_COMPAT to dissect what should be there, but it's usually better to just see if the relevant functions exist (or in this case do not). |
https://curl.haxx.se/docs/install.htm at select-tls-backend mentions --without-ssl --with-gnutls for this. All other ssl engines seem to be added after explicitly disabling openssl and enabling the other tls backends.
Thanks; I'll try to rework the PR based on this |
I have moved the curl related global functions to a separate file; I have yet to use OPENSSL_API_COMPAT or cmake's check_symbol to identify whether openssl needs those locks. GnuTLS locking primitives look much simpler, but they need to be added as well.
b9e8112 to 5d9b884
I tried looking into gnutls though I'm not sure what needs to be done in order to initialize this lib. https://gnutls.org/manual/html_node/Thread-safety.html doesn't mention any special functions to be called to set up other than linking the application against pthreads; https://www.gnupg.org/documentation/manuals/gcrypt-devel/Multi_002dThreading.html mentions the need for |
Changelog:
|
src/rgw/rgw_http_client_curl.cc
#include <openssl/crypto.h>
#endif

namespace rgw {
Since we're now using C++17, you can also write:
namespace rgw::curl {
My hope is to backport some of this to Luminous and Jewel since the bug affects all versions of rgw atm
src/rgw/rgw_http_client_curl.cc
namespace rgw {
namespace curl {

#ifdef HAVE_CURL_MULTI_WAIT
This is fine, but you might also consider moving the ifdef inside of the function; the functionality is the same but it's a bit more concise.
  return (unsigned long)pthread_self();
}

void init_ssl(){
Same comment here as above-- it may be slightly more clear moving the ifdefs inside of the function body, but it's your call.
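To illustrate the suggestion about moving the ifdefs inside the function body, here is a small sketch, not the PR's actual code. `WITH_CURL_OPENSSL` is the build flag named in this PR's commit message. `setup_ssl_locks` is a hypothetical placeholder for the real lock setup, and `init_ssl` returns a `bool` here purely so the effect is observable.

```cpp
#include <cassert>

#ifdef WITH_CURL_OPENSSL
// Hypothetical placeholder for the real lock setup; it only counts calls.
static int locks_installed = 0;
void setup_ssl_locks() { ++locks_installed; }
#endif

// A single definition of init_ssl(); only the backend-specific part of the
// body is conditional, so callers never need their own #ifdef.
bool init_ssl() {
#ifdef WITH_CURL_OPENSSL
  setup_ssl_locks();
  return true;   // locks were installed for the openssl backend
#else
  return false;  // nothing to set up for other TLS backends
#endif
}
```

With this shape there is one declaration and one call site regardless of how the build was configured.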
}

unsigned long rgw_ssl_thread_id_callback(){
  return (unsigned long)pthread_self();
Insert my usual nit against C style casts here. ;-)
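As a sketch of what a named cast could look like here: `pthread_t` is an opaque type, so which named cast compiles depends on the platform. The example below therefore swaps in a different technique and derives the id from `std::this_thread::get_id()` via `std::hash`, which makes a plain `static_cast` sufficient. The function name `ssl_thread_id` is hypothetical, and this is an alternative illustration rather than what the PR does.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Returns a value that is stable for the calling thread. The hash of
// std::thread::id is a std::size_t, so a static_cast is enough to narrow or
// widen it to unsigned long.
unsigned long ssl_thread_id() {
  const auto id = std::hash<std::thread::id>{}(std::this_thread::get_id());
  return static_cast<unsigned long>(id);
}
```

Repeated calls from the same thread return the same value, which is the property a thread-id callback needs.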
There is a major case I didn't factor when writing this. Civetweb will init openssl libraries and setup locks when civetweb is terminated with an ssl endpoint. We shouldn't override the libopenssl locks in that case. I haven't looked into libcurl's code yet, but in the case of civetweb with ssl termination, there is a possibility that both the libraries might initialize libssl, in which case; we may need to tell curl not to initialize libopenssl. cc @mdw-at-linuxbox |
Looking more closely at the gnutls stuff I don't see where there are any necessary locks either. |
@mdw-at-linuxbox with openssl (assuming civetweb isn't https terminated) I'm almost always able to reproduce the crash with keystone turned on & if I run the program as I mentioned in comment#3; with civetweb terminated on ssl however I haven't been able to reproduce the crash (but civetweb sets up openssl locks when it is terminated with ssl; so that explains curl not needing this) |
42627a8 to 35f0f46
changeset
}

bool fe_inits_ssl(boost::optional <const fe_map_t&> m, long& curl_global_flags){
  if (m) {
Not necessary, but notice that if you turned this into a guard:
if (!m)
return false;
...then the whole function doesn't need to be in an if-block. Up to you!
we also have to return false when we later find out that ssl_certificate is not set.
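The guard-clause shape under discussion, including the later `return false` when `ssl_certificate` is not set, could be sketched as follows. This is a simplified illustration under assumptions. `fe_map_t` here is a hypothetical stand-in (frontend name mapped to its key/value options) and not Ceph's real type. A plain pointer replaces `boost::optional<const fe_map_t&>`, and the `curl_global_flags` handling is left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in: frontend name -> that frontend's options.
using fe_map_t =
    std::multimap<std::string, std::map<std::string, std::string>>;

// Returns true when a civetweb/beast frontend terminates ssl itself.
bool fe_inits_ssl(const fe_map_t* m) {
  if (!m)
    return false;  // guard: no frontend map at all (e.g. an admin tool)

  for (const auto& [name, opts] : *m) {
    if (name != "civetweb" && name != "beast")
      continue;
    const auto cert = opts.find("ssl_certificate");
    if (cert != opts.end() && !cert->second.empty())
      return true;  // this frontend sets up openssl on its own
  }
  return false;  // a map was given, but no frontend had ssl_certificate set
}
```

Both `return false` paths remain, but the body no longer needs to be nested inside an `if (m)` block.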
@theanalyst this needs rebase --- pr 20390 --- pulling https://github.com/theanalyst/ceph.git branch rgw/openssl-init
35f0f46 to 4571e64
i've been following the changes and my approval still stands 👍 |
@theanalyst Please rebase. |
4571e64 to 3828715
openssl <= 1.0.2 requires explicit callbacks for locking, which libcurl doesn't set. This causes random segmentation faults when openssl uses its global structures across multiple threads. Provide simple mutex lock/unlock functions as callbacks.

We determine whether openssl is used for libcurl via the curl-config utility, which should be installed as part of our curl development headers package. We additionally check that the openssl version is < 1.1.0, since newer versions no longer need these callbacks.

In this patchset we have done the following:
- move all curl related global init functionality under the rgw::curl namespace, since libcurl may need to set up various ssl libraries etc. during its init
- introduce WITH_CURL_OPENSSL in cmake; this checks which backend curl is deployed with using curl-config. Since curl devel is expected to be installed anyway, this binary should be available and can help identify the ssl backend curl was compiled with.
- only set up the locks if beast/civetweb aren't terminated with ssl, since these libraries set up the locks anyway and we want to prevent double initialization of openssl. We also pass in ~CURL_GLOBAL_SSL, making curl not initialize openssl if civetweb/beast is initializing it. Unfortunately this flag is a no-op from curl >= 7.57, where both libraries will end up initializing openssl anyway, which might override certain settings like error strings when using openssl < 1.1.

https://curl.haxx.se/libcurl/c/threadsafe.html
https://www.openssl.org/docs/man1.0.2/crypto/threads.html#DESCRIPTION

Fixes: http://tracker.ceph.com/issues/22951
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Signed-off-by: Jesse Williamson <jwilliamson@suse.com>
Since rgw admin also needs to init curl globally where we don't have a frontend map Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
since we use http manager which in turn uses curl and uses curl multi interfaces. While curl is initialized at the first call of curl_easy_init(), this isn't guaranteed to be safe when multiple threads may call the function, since curl_global_init isn't reentrant. Call curl_global_init via rgw::curl::setup_curl, which additionally sets up ssl interfaces etc. when openssl is used as curl's ssl backend. Similarly, move the rgw target link to accommodate this change. Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
3828715 to 112ba0b
Since rgw admin can also use it which will fail otherwise. Fixes: http://tracker.ceph.com/issues/23203 Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
src/rgw/rgw_admin.cc
@@ -7246,5 +7247,6 @@ int main(int argc, const char **argv)
    }
  }

  rgw::curl::cleanup_curl();
a lot of the admin commands will do an early return, so we can't rely on this to catch everything - that's why i had to go with raii in #20694
makes sense, I'll update rgw_admin
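The early-return problem raised above is what an RAII guard addresses: cleanup lives in a destructor, so it runs on every path out of the scope. The sketch below is an illustration under assumptions, not the code from #20694 or from this PR. `setup_curl` and `cleanup_curl` are hypothetical stand-ins for `rgw::curl::setup_curl()` and `rgw::curl::cleanup_curl()` that only count calls, and `CurlGuard` and `run_command` are made-up names.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real setup/cleanup functions; they only
// count calls so the sketch runs without libcurl.
static int setup_calls = 0;
static int cleanup_calls = 0;
void setup_curl() { ++setup_calls; }
void cleanup_curl() { ++cleanup_calls; }

// Runs setup on construction and cleanup on destruction, so every return
// path out of the enclosing scope triggers cleanup exactly once.
class CurlGuard {
public:
  CurlGuard() { setup_curl(); }
  ~CurlGuard() { cleanup_curl(); }
  CurlGuard(const CurlGuard&) = delete;
  CurlGuard& operator=(const CurlGuard&) = delete;
};

// Mimics an admin command that may return early.
int run_command(bool fail_early) {
  CurlGuard guard;
  if (fail_early)
    return 1;  // the guard's destructor still runs cleanup here
  return 0;
}
```

A trailing `cleanup_curl()` call at the end of `main` would be skipped by the early return, while the guard covers both paths.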
Since a lot of rgw-admin commands can exit early Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
6fc49c7 to 4667937
@cbodley updated |
openssl <= 1.0.2 requires explicit callbacks for locking, which libcurl doesn't
set. This causes random segmentation faults when openssl uses its global
structures across multiple threads. Provide simple mutex lock/unlock
functions as callbacks.
https://curl.haxx.se/libcurl/c/threadsafe.html
https://www.openssl.org/docs/man1.0.2/crypto/threads.html#DESCRIPTION
Fixes: http://tracker.ceph.com/issues/22951
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Signed-off-by: Jesse Williamson <jwilliamson@suse.com>
edit- tracker ref