Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: setup locks for libopenssl #20390

Merged
merged 5 commits into from Mar 8, 2018
Merged

Conversation

theanalyst
Copy link
Member

@theanalyst theanalyst commented Feb 9, 2018

openssl <= 1.02 requires explicit callbacks for locking which libcurl doesn't
set. This causes random segmentation faults when openssl uses its global
structures across multiple threads. Providing a simple mutex lock/unlock
functions as a callback.

https://curl.haxx.se/libcurl/c/threadsafe.html
https://www.openssl.org/docs/man1.0.2/crypto/threads.html#DESCRIPTION

Fixes: http://tracker.ceph.com/issues/22951
Signed-off-by: Abhishek Lekshmanan abhishek@suse.com
Signed-off-by: Jesse Williamson jwilliamson@suse.com

edit- tracker ref

@mattbenjamin
Copy link
Contributor

mattbenjamin commented Feb 9, 2018

@mdw-at-linuxbox please have a look, sir :) [I'm interested in whether this could be a root cause and fix for memory leak issues, as well?]

@theanalyst
Copy link
Member Author

jenkins test this please (arm test failure on bluefs: ceph.conf not found)

@theanalyst
Copy link
Member Author

theanalyst commented Feb 12, 2018

This was triggered by using https://github.com/rakyll/hey which is a load gen similar to apache bench and setting up openstack keystone on a vstart cluster and making aws requests which force keystone to be contacted at every request.

something as simple as

$ hey -c 50 -n 100000 -H "Authorization: AWS access1:bar" -H "DATE: $(LC_ALL=C date -u +'%a, %d %b %Y %X %z')" http://localhost:8000/foo/bar

could trigger the crash in my env. (s3 auth use keystone and keystone url were set)

@mdw-at-linuxbox
Copy link
Contributor

I don't think this will fix any memory leak problems with curl/nss. I can believe it will fix crashing problems with curl/openssl.

This is a really ugly property of curl, and one that does not make me happy. What it means is that somehow, at compile time, the configuration logic needs to determine if curl is going to be linked against openssl or gnutls, and make the appropriate lock setup calls directly. Failure to get it rlght = crash.

@theanalyst
Copy link
Member Author

I don't think this will fix any memory leak problems with curl/nss. I can believe it will fix crashing problems with curl/openssl.

I don't think so either, but we do see crashes with openssl and I believe it is related to this. I was able to test this locally with the command I posted.

This is a really ugly property of curl, and one that does not make me happy. What it means is that somehow, at compile time, the configuration logic needs to determine if curl is going to be linked against openssl or gnutls, and make the appropriate lock setup calls directly. Failure to get it rlght = crash.

We probably also need checks for openssl versions <=1.10? I'm not so sure whether we need a similar callbacks for newer versions, but I could be wrong

@mdw-at-linuxbox
Copy link
Contributor

In fact you don't need those locking calls in newer versions of openssl - 1.1 removes them. So ideally there should also be a cmake test to see if openssl has/needs this locking stuff.

The locking stuff also affects civetweb but that's a different problem.

@cbodley
Copy link
Contributor

cbodley commented Feb 13, 2018

So ideally there should also be a cmake test to see if openssl has/needs this locking stuff.

+1. and some way to test whether libcurl is using openssl (it's nss on rhel) - whether that's at build- or runtime

@theanalyst
Copy link
Member Author

theanalyst commented Feb 13, 2018

+1. and some way to test whether libcurl is using openssl (it's nss on rhel) - whether that's at build- or runtime

I tried using curl-config and cmake to get around. I assume since we depend on curl-devel we should be able to do it this way? (GetPrerequisites may be an alternative if we can map from the shared library back to package.. assuming we know the libopenssl so names for all distros)

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the curl-config stuff looks good!

CMakeLists.txt Outdated
if (NOT NO_CURL_SSL_LINK)
if (OPENSSL_VERSION VERSION_LESS "1.1.0")
message(STATUS "Found openssl ${OPENSSL_VERSION} < 1.1.0 need to explicitly set locking primitives")
set(WITH_RADOSGW_OPENSSL ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this define is a lot more specific than 'radosgw uses openssl' - maybe something like WITH_LIBCURL_OPENSSL_LOCK_COMPAT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack much better

@@ -189,6 +194,10 @@ add_dependencies(radosgw cls_rgw cls_lock cls_refcount
cls_version cls_replica_log cls_user)
install(TARGETS radosgw DESTINATION bin)

if (WITH_RADOSGW_OPENSSL)
target_link_libraries(radosgw ${OPENSSL_LIBRARIES})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdw-at-linuxbox recently argued against linking directly to openssl libs (in discussions about ssl for the beast frontend). Marcus, do you think we should use dlopen(LIBCRYPTO_SONAME) for this stuff instead? is there any risk that civetweb and libcurl would be linking this differently?

unsigned long rgw_ssl_thread_id_callback();


class RGWSSLSetup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class, along with the dout defines/includes, can live in the source file instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@theanalyst theanalyst force-pushed the rgw/openssl-init branch 3 times, most recently from ed53413 to 4f083be Compare February 13, 2018 18:19
@mdw-at-linuxbox
Copy link
Contributor

I've figured out there's another wrinkle to this. Our debian build of ceph builds against "curl-gnutls", which needs a similar but different locking fix.

curl-config should be a safe way to do things provided it can tell you if openssl/gnutls is there. For openssl, you don't want to try forcing the system version of openssl - that way just creates many minefields. You can instead use OPENSSL_API_COMPAT to dissect what should be there, but it's usually better to just see if the relevant functions exist (or in this case do not).

@theanalyst
Copy link
Member Author

I've figured out there's another wrinkle to this. Our debian build of ceph builds against "curl-gnutls", which needs a similar but different locking fix.

https://curl.haxx.se/docs/install.htm at select-tls-backend mentions --without-ssl --with-gnutls for this. All other ssl engines seem to be added after explicitly disabling openssl and enabling the other tls backends.

curl-config should be a safe way to do things provided it can tell you if openssl/gnutls is there. For openssl, you don't want to try forcing the system version of openssl - that way just creates many minefields. You can instead use OPENSSL_API_COMPAT to dissect what should be there, but it's usually better to just see if the relevant functions exist (or in this case do not).

Thanks; I'll try to rework the PR based on this

@theanalyst
Copy link
Member Author

I have moved the curl related global functions to a seperate file; I'm still to use openssl_api_compat or using cmake's check_symbol for identifying fi openssl needs those locks. GnuTLS locking primitives look much simpler but they need to be added as well.

@theanalyst
Copy link
Member Author

I tried looking into gnutls though I'm not sure what needs to be done in order to initialize this lib. https://gnutls.org/manual/html_node/Thread-safety.html doesn't mention any special functions to be called to set up other than linking the application against pthreads; https://www.gnupg.org/documentation/manuals/gcrypt-devel/Multi_002dThreading.html mentions the need for gcry_control(GCRYCTL_SET_THREAD_CBS); however https://www.gnupg.org/documentation/manuals/gcrypt/Controlling-the-library.html mentions this option is obsolete.

@theanalyst
Copy link
Member Author

Changelog:

  • moved all curl related initialization routines from rgw_main to a seperate rgw_http_client_curl
  • used openssl_api_compat to determine whether we need to include the locking callbacks.

#include <openssl/crypto.h>
#endif

namespace rgw {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're now using C++17, you can also write:
namespace rgw::curl {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope is to backport some of this to Luminous and Jewel since the bug affects all versions of rgw atm

namespace rgw {
namespace curl {

#ifdef HAVE_CURL_MULTI_WAIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but you might also consider moving the ifdef inside of the function; the functionality is the same but it's a bit more concise.

return (unsigned long)pthread_self();
}

void init_ssl(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as above-- it may be slightly more clear moving the ifdefs inside of the function body, but it's your call.

}

unsigned long rgw_ssl_thread_id_callback(){
return (unsigned long)pthread_self();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert my usual nit against C style casts here. ;-)

@theanalyst
Copy link
Member Author

There is a major case I didn't factor when writing this. Civetweb will init openssl libraries and setup locks when civetweb is terminated with an ssl endpoint. We shouldn't override the libopenssl locks in that case. I haven't looked into libcurl's code yet, but in the case of civetweb with ssl termination, there is a possibility that both the libraries might initialize libssl, in which case; we may need to tell curl not to initialize libopenssl. cc @mdw-at-linuxbox

@mdw-at-linuxbox
Copy link
Contributor

Looking more closely at the gnutls stuff I don't see where there are any necessary locks either.
I also put together a heavily threaded standalone program that does lots of connects. Interestingly enough, I did not find any locks to be necessary with either gnutls or openssl. No crashes, no complaints either. (Openssl was xenial openssl 1.0.2g-1ubuntu4.6)

@theanalyst
Copy link
Member Author

@mdw-at-linuxbox with openssl (assuming civetweb isn't https terminated) I'm almost always able to reproduce the crash with keystone turned on & if I run the program as I mentioned in comment#3; with civetweb terminated on ssl however I haven't been able to reproduce the crash (but civetweb sets up openssl locks when it is terminated with ssl; so that explains curl not needing this)

@theanalyst
Copy link
Member Author

changeset

  • made rgw frontend map a boost::optional type
  • rgw::curl::setup_curl is called by rgw admin as well (fixed cmake accordingly to optionally link on rgw_a)
  • print in status when we may be setting up locks due to curl being linked with ssl

}

bool fe_inits_ssl(boost::optional <const fe_map_t&> m, long& curl_global_flags){
if (m) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but notice that if you turned this into a guard:
if (!m)
return false;

...then the whole function doesn't need to be in an if-block. Up to you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also have to return false when we later find out that ssl_certificate is not set.

@yuriw
Copy link
Contributor

yuriw commented Mar 1, 2018

@theanalyst this needs rebase

--- pr 20390 --- pulling https://github.com/theanalyst/ceph.git branch rgw/openssl-init
remote: Counting objects: 24, done.
remote: Total 24 (delta 19), reused 19 (delta 19), pack-reused 5
Unpacking objects: 100% (24/24), done.
From https://github.com/theanalyst/ceph

  • branch rgw/openssl-init -> FETCH_HEAD
    Auto-merging src/rgw/rgw_main.cc
    CONFLICT (content): Merge conflict in src/rgw/rgw_main.cc
    Auto-merging src/rgw/rgw_admin.cc
    Auto-merging CMakeLists.txt
    Automatic merge failed; fix conflicts and then commit the result.
    Traceback (most recent call last):
    File "/home/yuriw/wip1/src/script/build-integration-branch", line 62, in
    assert not r
    AssertionError

@theanalyst
Copy link
Member Author

@yuriw rebased, but I think this may conflict with #20635

@theanalyst
Copy link
Member Author

@cbodley can you take a look at the new changes when you have the time.

@yuriw this definitely would conflict with #20635

@cbodley
Copy link
Contributor

cbodley commented Mar 1, 2018

@cbodley can you take a look at the new changes when you have the time.

i've been following the changes and my approval still stands 👍

@smithfarm
Copy link
Contributor

@theanalyst Please rebase.

openssl <= 1.02 requires explicit callbacks for locking which libcurl doesn't
set. This causes random segmentation faults when openssl uses its global
structures across multiple threads. Providing a simple mutex lock/unlock
functions as a callback. We determine whether openssl is used for libcurl via
curl-config utility which should be installed as a part of our curl development
headers package. We also additionally check that the openssl version is < 1.1.0
which alleviates the need for these callbacks. In this patchset we have done the
following:

- move all curl related global init functionality under rgw::curl namespace
  since libcurl may need to set up various ssl libraries etc during its init
- introduce WITH_CURL_OPENSSL in cmake
  this checks the backend curl is deployed with using curl-config. Since curl
  devel is expected to be installed anyway, this binary should be available and
  can help identify the ssl backend curl was compiled with.
- we only setup the locks if beast/civetweb aren't terminated with ssl, since
  these libraries setup the locks anyway and we want to prevent double
  initialization of openssl. Also we pass in ~CURL_GLOBAL_SSL making curl not
  initialize openssl if civetwb/beast is initializing them. Unfortunately this
  flag is a noop from curl >= 7.57 wherein both the libraries will end up
  initializing openssl anyway, which might override certain settings like error
  strings if using openssl < 1.1

https://curl.haxx.se/libcurl/c/threadsafe.html
https://www.openssl.org/docs/man1.0.2/crypto/threads.html#DESCRIPTION

Fixes: http://tracker.ceph.com/issues/22951
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Signed-off-by: Jesse Williamson <jwilliamson@suse.com>
Since rgw admin also needs to init curl globally where we don't have a frontend
map

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
since we use http manager which in turn uses curl and uses curl multi
interfaces. While curl is initialized at the first call of curl_easy_init() this
method isn't guaranteed to be safe when multiple threads may call the function
since curl_global_init isn't reentrant. Calling curl_global_init via
rgw::curl::setup_curl which additionally sets up ssl interfaces etc. when
openssl is used as curl's ssl backend. Similarly moving rgw target link to
accomodate this change.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Since rgw admin can also use it which will fail otherwise.

Fixes: http://tracker.ceph.com/issues/23203

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
@@ -7246,5 +7247,6 @@ int main(int argc, const char **argv)
}
}

rgw::curl::cleanup_curl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of the admin commands will do an early return, so we can't rely on this to catch everything - that's why i had to go with raii in #20694

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, I'll update rgw_admin

@cbodley cbodley dismissed their stale review March 5, 2018 16:44

use raii for cleanup in rgw_admin.cc

Since a lot of rgw-admin commands can exit early

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
@theanalyst
Copy link
Member Author

@cbodley updated

@yuriw
Copy link
Contributor

yuriw commented Mar 6, 2018

@yuriw
Copy link
Contributor

yuriw commented Mar 7, 2018

@yuriw yuriw merged commit 39f7377 into ceph:master Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants