Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: fix valgrind errors when protected_fixedsize_stack is used #39025

Merged
merged 1 commit into from Jan 29, 2021

Conversation

yuvalif
Copy link
Contributor

@yuvalif yuvalif commented Jan 22, 2021

Fixes: https://tracker.ceph.com/issues/48963

Signed-off-by: Yuval Lifshitz ylifshit@redhat.com


Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting; does protected_fixedsize_stack supercede the approaches we've been using in other coroutines?

@@ -19,6 +19,8 @@ function(gperf_generate input output)
)
endfunction()

add_definitions(-DBOOST_USE_VALGRIND)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this one will need to apply globally in ceph, because we use boost all over. we don't want rgw to build it differently than other libraries like common/librados

it's also not clear what performance impact this has, so we might only want to turn it on when ALLOCATOR=libc, which (i think?) is the only way we can run with valgrind

i'd also like to see a short comment here describing why we need the flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be turned on when WITH_BOOST_VALGRIND is turned on. I thought it was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the pointer @adamemerson!

in CMakeLists.txt:

CMAKE_DEPENDENT_OPTION(WITH_BOOST_VALGRIND "Boost support for valgrind" OFF
  "NOT WITH_SYSTEM_BOOST" OFF)

and cmake/modules/BuildBoost.cmake:

    if((c MATCHES "coroutine|context") AND (WITH_BOOST_VALGRIND))
      set_target_properties(Boost::${c} PROPERTIES
        INTERFACE_COMPILE_DEFINITIONS "BOOST_USE_VALGRIND")
    endif()

it looks like that only takes effect for non-system boost, which i don't think is the case for any of our shaman builds. for our teuthology testing, we'd specifically need this on for the centos notcmalloc build

@cbodley
Copy link
Contributor

cbodley commented Jan 22, 2021

interesting; does protected_fixedsize_stack supercede the approaches we've been using in other coroutines?

nope, that is the same mmap+mprotect allocator that we switched to for rgw_asio_frontend.cc. i don't really think it's necessary here in rgw_notify.cc, where the code is very small and unlikely to overflow its stack - but these coroutines are long-lived, so the overhead of the extra system calls on spawn() is negligible

@yuvalif
Copy link
Contributor Author

yuvalif commented Jan 22, 2021

interesting; does protected_fixedsize_stack supercede the approaches we've been using in other coroutines?

nope, that is the same mmap+mprotect allocator that we switched to for rgw_asio_frontend.cc. i don't really think it's necessary here in rgw_notify.cc, where the code is very small and unlikely to overflow its stack - but these coroutines are long-lived, so the overhead of the extra system calls on spawn() is negligible

i did not increase the stack size beyond the default, but wanted to make sure it crashed if actual size is too big

@yuvalif
Copy link
Contributor Author

yuvalif commented Jan 23, 2021

@cbodley
Copy link
Contributor

cbodley commented Jan 25, 2021

that's with valgrind off. there's #38871 to turn it back on. you can test against that with:

--suite-repo https://github.com/cbodley/ceph.git --suite-branch wip-qa-rgw-valgrind-on

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Copy link
Contributor

@adamemerson adamemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lawndarts, a Game Too Menacing

@cbodley cbodley merged commit 09bdf42 into ceph:master Jan 29, 2021
@cbodley
Copy link
Contributor

cbodley commented Feb 3, 2021

i still saw the InvalidRead issues from spawn::detail::continuation_context::resume() on master with https://pulpito.ceph.com/cbodley-2021-02-02_00:26:24-rgw:verify-master-distro-basic-gibba/

looking through the shaman build logs, it doesn't look like it's using WITH_SYSTEM_BOOST so this PR doesn't take effect

WITH_BOOST_VALGRIND seems to be the right way to resolve this one. i opened ceph/ceph-build#1736 against the ceph-build project to add that flag to the cmake command line for notcmalloc builds

regarding this PR, i think we should revert the cmake change but keep and backport the protected_fixedsize_stack change in rgw_notify.cc

@cbodley
Copy link
Contributor

cbodley commented Feb 3, 2021

opened #39263 to revert the cmake change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants