librbd: use async librados notifications #7668

Merged
merged 11 commits into from Feb 19, 2016


dillaman

There is a possible edge condition when multiple images are opened and a refresh is required. The WorkQueue will be blocked waiting for the notification to be ACKed, but it cannot be ACKed until the (blocked) refresh is complete.

This also includes minor modifications (and associated tests) to ensure librbd can properly replay uncommitted events and then resume live ops without upsetting lockdep.
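
To make the edge condition above concrete, here is a minimal single-threaded sketch. It is not librbd code: `WorkQueue`, `Peer`, and `run_async_notify_demo` are hypothetical names invented for illustration. The peer can only ACK after its refresh runs on the same queue, so a sender that blocked inside its work item would never see the ACK. The asynchronous form registers a continuation and returns instead.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded work queue: items run one at a time, in order.
class WorkQueue {
public:
  void queue(std::function<void()> work) { m_items.push_back(std::move(work)); }

  // Run queued items until the queue is empty; returns how many items ran.
  int drain() {
    int ran = 0;
    while (!m_items.empty()) {
      std::function<void()> work = std::move(m_items.front());
      m_items.pop_front();
      work();
      ++ran;
    }
    return ran;
  }

private:
  std::deque<std::function<void()>> m_items;
};

// Hypothetical peer: it can only ACK a notification after its own refresh,
// and that refresh is itself an item on the shared work queue.
class Peer {
public:
  explicit Peer(WorkQueue &wq) : m_wq(wq) {}

  // Asynchronous notify: queue the peer's refresh and return immediately.
  // on_ack fires later, from the work queue, once the refresh has run.
  void aio_notify(std::function<void()> on_ack) {
    m_wq.queue([this, on_ack]() {
      ++m_refresh_count;  // stands in for the refresh that precedes the ACK
      on_ack();
    });
  }

  int refresh_count() const { return m_refresh_count; }

private:
  WorkQueue &m_wq;
  int m_refresh_count = 0;
};

// The sender's work item does not wait for the ACK. It registers a
// continuation and hands the work queue back so the refresh can run.
std::vector<std::string> run_async_notify_demo() {
  std::vector<std::string> log;
  WorkQueue wq;
  Peer peer(wq);

  wq.queue([&]() {
    log.push_back("notify sent");
    peer.aio_notify([&]() { log.push_back("notify acked"); });
    log.push_back("work item returned");
  });

  wq.drain();
  return log;
}
```

Calling `run_async_notify_demo()` records the send, then the work item returning, and only then the ACK. A blocking variant of the sender would have to wait inside the first work item while the refresh sat behind it in the queue, which is the cycle described above.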

@dillaman dillaman force-pushed the wip-librbd-replay-locks branch 3 times, most recently from 7f5be72 to b70b4d8 Compare February 17, 2016 14:23
@dillaman dillaman changed the title [DNM] librbd: use async librados notifications librbd: use async librados notifications Feb 17, 2016
@jdurgin
Member

jdurgin commented Feb 18, 2016

getting a unittest failure with RBD_FEATURES=5:

[ RUN      ] DiffIterateTest/1.DiffIterateDiscard
using new format!
[New Thread 0x7fffd238b700 (LWP 2955)]
[New Thread 0x7fffdd3a1700 (LWP 2956)]
iterate_cb 0~4194304
iterate_cb 0~4194304

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffde3a3700 (LWP 1876)]
Mutex::Lock (this=this@entry=0x8, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:95
95        if (lockdep && g_lockdep && !no_lockdep) _will_lock();
Missing separate debuginfos, use: debuginfo-install boost-iostreams-1.54.0-10.fc20.x86_64 boost-random-1.54.0-10.fc20.x86_64 boost-system-1.54.0-10.fc20.x86_64 boost-thread-1.54.0-10.fc20.x86_64 bzip2-libs-1.0.6-9.fc20.x86_64 libblkid-2.24.2-2.fc20.x86_64 libgcc-4.8.3-7.fc20.x86_64 libstdc++-4.8.3-7.fc20.x86_64 libuuid-2.24.2-2.fc20.x86_64 lttng-ust-2.3.0-1.fc20.x86_64 nspr-4.10.7-1.fc20.x86_64 nss-3.17.2-1.fc20.x86_64 nss-util-3.17.2-1.fc20.x86_64 userspace-rcu-0.7.7-2.fc20.x86_64 zlib-1.2.8-3.fc20.x86_64
(gdb) bt
#0  Mutex::Lock (this=this@entry=0x8, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:95
#1  0x0000555555955260 in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:115
#2  librbd::ExclusiveLock<librbd::ImageCtx>::handle_lock_released (this=0x0) at ./librbd/ExclusiveLock.cc:163
#3  0x0000555555b86975 in librbd::ImageWatcher::handle_payload (this=this@entry=0x7fffc8188ce0, payload=..., ack_ctx=ack_ctx@entry=0x7fffc002c010) at librbd/ImageWatcher.cc:524
#4  0x0000555555b8710c in operator()<librbd::watch_notify::ReleasedLockPayload> (payload=..., this=<optimized out>) at ./librbd/ImageWatcher.h:215
#5  internal_visit<librbd::watch_notify::ReleasedLockPayload const> (operand=..., this=<synthetic pointer>) at /usr/include/boost/variant/variant.hpp:1017
#6  visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<librbd::ImageWatcher::HandlePayloadVisitor const>, void const*, librbd::watch_notify::ReleasedLockPayload> (storage=0x7fffde3a0df8, 
    visitor=<synthetic pointer>) at /usr/include/boost/variant/detail/visitation_impl.hpp:130
#7  visitation_impl_invoke<boost::detail::variant::invoke_visitor<librbd::ImageWatcher::HandlePayloadVisitor const>, void const*, librbd::watch_notify::ReleasedLockPayload, boost::variant<librbd::watch_notify::AcquiredLockPayload, librbd::watch_notify::ReleasedLockPayload, librbd::watch_notify::RequestLockPayload, librbd::watch_notify::HeaderUpdatePayload, librbd::watch_notify::AsyncProgressPayload, librbd::watch_notify::AsyncCompletePayload, librbd::watch_notify::FlattenPayload, librbd::watch_notify::ResizePayload, librbd::watch_notify::SnapCreatePayload, librbd::watch_notify::SnapRemovePayload, librbd::watch_notify::SnapRenamePayload, librbd::watch_notify::SnapProtectPayload, librbd::watch_notify::SnapUnprotectPayload, librbd::watch_notify::RebuildObjectMapPayload, librbd::watch_notify::RenamePayload, librbd::watch_notify::UnknownPayload>::has_fallback_type_> (internal_which=<optimized out>, t=0x0, storage=0x7fffde3a0df8, visitor=<synthetic pointer>) at /usr/include/boost/variant/detail/visitation_impl.hpp:173
#8  visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<16l>, librbd::watch_notify::AcquiredLockPayload, boost::mpl::l_item<mpl_::long_<15l>, librbd::watch_notify::ReleasedLockPayload, boost::mpl::l_item<mpl_::long_<14l>, librbd::watch_notify::RequestLockPayload, boost::mpl::l_item<mpl_::long_<13l>, librbd::watch_notify::HeaderUpdatePayload, boost::mpl::l_item<mpl_::long_<12l>, librbd::watch_notify::AsyncProgressPayload, boost::mpl::l_item<mpl_::long_<11l>, librbd::watch_notify::AsyncCompletePayload, boost::mpl::l_item<mpl_::long_<10l>, librbd::watch_notify::FlattenPayload, boost::mpl::l_item<mpl_::long_<9l>, librbd::watch_notify::ResizePayload, boost::mpl::l_item<mpl_::long_<8l>, librbd::watch_notify::SnapCreatePayload, boost::mpl::l_item<mpl_::long_<7l>, librbd::watch_notify::SnapRemovePayload, boost::mpl::l_item<mpl_::long_<6l>, librbd::watch_notify::SnapRenamePayload, boost::mpl::l_item<mpl_::long_<5l>, librbd::watch_notify::SnapProtectPayload, boost::mpl::l_item<mpl_::long_<4l>, librbd::watch_notify::SnapUnprotectPayload, boost::mpl::l_item<mpl_::long_<3l>, librbd::watch_notify::RebuildObjectMapPayload, boost::mpl::l_item<mpl_::long_<2l>, librbd::watch_notify::RenamePayload, boost::mpl::l_item<mpl_::long_<1l>, librbd::watch_notify::UnknownPayload, boost::mpl::l_end> > > > > > > > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<librbd::ImageWatcher::HandlePayloadVisitor const>, void const*, boost::variant<librbd::watch_notify::AcquiredLockPayload, librbd::watch_notify::ReleasedLockPayload, librbd::watch_notify::RequestLockPayload, librbd::watch_notify::HeaderUpdatePayload, librbd::watch_notify::AsyncProgressPayload, librbd::watch_notify::AsyncCompletePayload, librbd::watch_notify::FlattenPayload, librbd::watch_notify::ResizePayload, librbd::watch_notify::SnapCreatePayload, librbd::watch_notify::SnapRemovePayload, 
librbd::watch_notify::SnapRenamePayload, librbd::watch_notify::SnapProtectPayload, librbd::watch_notify::SnapUnprotectPayload, librbd::watch_notify::RebuildObjectMapPayload, librbd::watch_notify::RenamePayload, librbd::watch_notify::UnknownPayload>::has_fallback_type_> (no_backup_flag=..., storage=0x7fffde3a0df8, visitor=<synthetic pointer>, 
    logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/detail/visitation_impl.hpp:256
#9  internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<librbd::ImageWatcher::HandlePayloadVisitor const>, void const*> (storage=0x7fffde3a0df8, visitor=<synthetic pointer>, 
    logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/variant.hpp:2326
#10 internal_apply_visitor<boost::detail::variant::invoke_visitor<librbd::ImageWatcher::HandlePayloadVisitor const> > (visitor=<synthetic pointer>, this=0x7fffde3a0df0)
    at /usr/include/boost/variant/variant.hpp:2348
#11 apply_visitor<librbd::ImageWatcher::HandlePayloadVisitor const> (visitor=..., this=0x7fffde3a0df0) at /usr/include/boost/variant/variant.hpp:2370
#12 apply_visitor<librbd::ImageWatcher::HandlePayloadVisitor, boost::variant<librbd::watch_notify::AcquiredLockPayload, librbd::watch_notify::ReleasedLockPayload, librbd::watch_notify::RequestLockPayload, librbd::watch_notify::HeaderUpdatePayload, librbd::watch_notify::AsyncProgressPayload, librbd::watch_notify::AsyncCompletePayload, librbd::watch_notify::FlattenPayload, librbd::watch_notify::ResizePayload, librbd::watch_notify::SnapCreatePayload, librbd::watch_notify::SnapRemovePayload, librbd::watch_notify::SnapRenamePayload, librbd::watch_notify::SnapProtectPayload, librbd::watch_notify::SnapUnprotectPayload, librbd::watch_notify::RebuildObjectMapPayload, librbd::watch_notify::RenamePayload, librbd::watch_notify::UnknownPayload> const> (visitable=..., visitor=...) at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:76
#13 librbd::ImageWatcher::process_payload (this=this@entry=0x7fffc8188ce0, notify_id=notify_id@entry=299, handle=handle@entry=127, payload=..., r=r@entry=0) at librbd/ImageWatcher.cc:760
#14 0x0000555555b87326 in librbd::ImageWatcher::handle_notify (this=0x7fffc8188ce0, notify_id=299, handle=127, bl=...) at librbd/ImageWatcher.cc:787
#15 0x0000555555c3fa78 in librados::TestWatchNotify::execute_notify (this=0x55555f20de00, oid="rbd_header.078024832", bl=..., notify_id=299) at test/librados_test_stub/TestWatchNotify.cc:171
#16 0x00005555559ecb1a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#17 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at ./include/Context.h:460
#18 0x0000555555914fe9 in Context::complete (this=0x7fffc801af80, r=<optimized out>) at ./include/Context.h:64
#19 0x0000555555ca2b76 in Finisher::finisher_thread_entry (this=0x55555f20e000) at common/Finisher.cc:68
#20 0x00007fffed8f8f33 in start_thread (arg=0x7fffde3a3700) at pthread_create.c:309
#21 0x00007fffec9e3ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
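
Reading the backtrace: frame #2 shows `handle_lock_released` called with `this=0x0`, and frame #0 shows `Mutex::Lock` with `this=0x8`, which suggests a null `ExclusiveLock` pointer was dereferenced to reach a mutex member at a small offset. The sketch below shows one generic way a notification handler can guard against that. It is an assumption for illustration only, not necessarily how this pull request addressed the crash; the `ImageCtx` and `ExclusiveLock` structs here are simplified stand-ins and `handle_released_lock_notification` is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Simplified stand-in for the object whose member function crashed.
struct ExclusiveLock {
  std::mutex lock;
  int released_events = 0;

  void handle_lock_released() {
    std::lock_guard<std::mutex> locker(lock);
    ++released_events;
  }
};

// Simplified stand-in for the image state shared with the watcher.
struct ImageCtx {
  std::mutex owner_lock;                          // guards exclusive_lock
  std::unique_ptr<ExclusiveLock> exclusive_lock;  // may be null
};

// Hypothetical handler: returns true if the notification was delivered,
// false if it was dropped because no ExclusiveLock object exists.
bool handle_released_lock_notification(ImageCtx &image_ctx) {
  std::lock_guard<std::mutex> owner_locker(image_ctx.owner_lock);
  if (!image_ctx.exclusive_lock) {
    return false;  // nothing to notify; avoid the null dereference
  }
  image_ctx.exclusive_lock->handle_lock_released();
  return true;
}
```

Holding `owner_lock` across both the null check and the call matters in this sketch: checking first and locking afterwards would leave a window in which another thread could reset the pointer.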

Jason Dillaman added 11 commits February 18, 2016 15:45
The header update and lock notifications might be invoked
from the librados AIO thread.  Update the close state
machine to flush any potential AIO notifications.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
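
The idea in the commit message above, flushing in-flight notifications before close proceeds, can be sketched with a small counter that defers a callback until nothing is outstanding. `InFlightTracker` is a hypothetical name, and this single-threaded version is an illustration only; real code shared with an AIO thread would need to protect the counter with a lock.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical tracker: counts outstanding async notifications and runs a
// flush callback once the count reaches zero. Not thread-safe as written.
class InFlightTracker {
public:
  void start_op() { ++m_in_flight; }

  void finish_op() {
    assert(m_in_flight > 0);
    --m_in_flight;
    maybe_complete_flush();
  }

  // Request a flush: on_flushed runs as soon as no ops remain in flight,
  // which may be immediately.
  void flush(std::function<void()> on_flushed) {
    m_on_flushed = std::move(on_flushed);
    maybe_complete_flush();
  }

  int in_flight() const { return m_in_flight; }

private:
  void maybe_complete_flush() {
    if (m_in_flight == 0 && m_on_flushed) {
      std::function<void()> cb = std::move(m_on_flushed);
      m_on_flushed = nullptr;  // fire the callback at most once
      cb();
    }
  }

  int m_in_flight = 0;
  std::function<void()> m_on_flushed;
};
```

A close path built on this would call `flush` with the next step of its state machine as the callback, so that step cannot run while a notification handler might still touch the image.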
…s thread"

This reverts commit d898995.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
If two or more images share the same CephContext, notifications
from one image can block the shared work queue, which can in turn
delay acknowledging the notification until after it times out.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
lockdep will complain about loop cycles that won't cause an
issue in reality as replay and record are two different
journal states.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Previously, the image could not be renamed twice without
being re-opened.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Avoid leaving in-flight notification messages when transitioning lock
states.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The write lock will be taken when the new state is applied.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
@dillaman
Author

@jdurgin I ran run-rbd-unit-tests.sh for two hours in a loop under a single core and under a 6-core environment without issue. Let me know if something happens in your environment.

jdurgin added a commit that referenced this pull request Feb 19, 2016
librbd: use async librados notifications

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
@jdurgin jdurgin merged commit b556b24 into ceph:master Feb 19, 2016
@dillaman dillaman deleted the wip-librbd-replay-locks branch February 19, 2016 14:34
2 participants