Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: image refresh code paths converted to async state machines #6859

Merged
merged 32 commits into from Dec 15, 2015

Conversation

dillaman
Copy link

@dillaman dillaman commented Dec 8, 2015

The goal is to eliminate all possible deadlock potential between mixing synchronous and asynchronous workloads.

@jdurgin
Copy link
Member

jdurgin commented Dec 10, 2015

I like the approach. The ascii-art diagrams and send/handle naming conventions help a lot. lgtm in general, though I didn't go over all the new state machines in detail. I'm curious to see the integration.

@dillaman dillaman changed the title librbd: image refresh code paths converted to async state machines [DNM] librbd: image refresh code paths converted to async state machines Dec 11, 2015
@dillaman dillaman force-pushed the wip-librbd-async-refresh branch 2 times, most recently from 8f5d671 to b38fd31 Compare December 11, 2015 21:24
@jdurgin
Copy link
Member

jdurgin commented Dec 12, 2015

Guessing there are still some bugs to work out once the lab is back next week. unit tests pass, but opening an image hangs, i.e.: rbd create -s 1 test; rbd info test

Thread 1 (Thread 0x7ffff7fd59c0 (LWP 13705)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff4fb022a in Wait (mutex=..., this=0x7fffffffd3d0) at ./common/Cond.h:55
#2  wait (this=0x7fffffffd370) at ./common/Cond.h:186
#3  librbd::ImageState<librbd::ImageCtx>::open (this=<optimized out>) at librbd/ImageState.cc:39
#4  0x00007ffff4f5a19a in librbd::RBD::open_read_only (this=0x7fffffffd57e, io_ctx=..., image=..., name=0x555557d4d0a8 "test", 
    snap_name=0x0) at librbd/librbd.cc:151
#5  0x000055555568ea4f in rbd::utils::open_image (io_ctx=..., image_name="test", read_only=read_only@entry=true, 
    image=image@entry=0x7fffffffd6c0) at tools/rbd/Utils.cc:546
#6  0x000055555568ec79 in rbd::utils::init_and_open_image (pool_name="rbd", image_name="test", snap_name="", 
    read_only=read_only@entry=true, rados=rados@entry=0x7fffffffd6a0, io_ctx=io_ctx@entry=0x7fffffffd6b0, 
    image=image@entry=0x7fffffffd6c0) at tools/rbd/Utils.cc:569
#7  0x00005555556b2c85 in rbd::action::info::execute (vm=...) at tools/rbd/action/Info.cc:219
#8  0x000055555568ac6b in rbd::Shell::execute (this=this@entry=0x7fffffffddcf, arg_count=arg_count@entry=3, 
    arg_values=arg_values@entry=0x7fffffffdee8) at tools/rbd/Shell.cc:154
#9  0x000055555565d3c2 in main (argc=3, argv=0x7fffffffdee8) at tools/rbd/rbd.cc:19

Thread 15 (Thread 0x7fffe1ffb700 (LWP 13724)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff5116c12 in Wait (mutex=..., this=0x555557da60c8) at common/Cond.h:55
#2  ThreadPool::drain (this=0x555557da6000, wq=0x555557da3a30) at common/WorkQueue.cc:266
#3  0x00007ffff4fa4229 in drain (this=0x555557da3a30) at ./common/WorkQueue.h:373
#4  librbd::ImageCtx::~ImageCtx (this=0x555557db0760, __in_chrg=<optimized out>) at librbd/ImageCtx.cc:215
#5  0x00007ffff4ffb639 in librbd::image::OpenRequest<librbd::ImageCtx>::handle_close_image (this=0x555557daeeb0,
    result=result@entry=0x7fffe1ff760c) at librbd/image/OpenRequest.cc:361
#6  0x00007ffff4ffb92d in librbd::util::detail::C_StateCallbackAdapter<librbd::image::OpenRequest<librbd::ImageCtx>, &librbd::image::OpenRequest<librbd::ImageCtx>::handle_close_image, true>::complete (this=0x7fffc40039a0, r=0) at ./librbd/Utils.h:61
#7  0x00007ffff4ff9683 in librbd::image::CloseRequest<librbd::ImageCtx>::finish (this=0x7fffc40028c0)
    at librbd/image/CloseRequest.cc:252
#8  0x00007ffff4ff9c15 in librbd::image::CloseRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7fffc40028c0)
    at librbd/image/CloseRequest.cc:221
#9  0x00007ffff4ff9cb8 in librbd::image::CloseRequest<librbd::ImageCtx>::handle_flush_op_work_queue (this=0x7fffc40028c0, r=0)
    at librbd/image/CloseRequest.cc:215
#10 0x00007ffff4f7f0b9 in Context::complete (this=0x7fffc8000ba0, r=<optimized out>) at ./include/Context.h:64
#11 0x00007ffff4fab814 in ContextWQ::process (this=0x555557da3a30, ctx=0x7fffc8000ba0) at ./common/WorkQueue.h:610
#12 0x00007ffff511c11e in ThreadPool::worker (this=0x555557da6000, wt=0x555557da62d0) at common/WorkQueue.cc:128
#13 0x00007ffff511cff0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:448
#14 0x00007ffff17c5f33 in start_thread (arg=0x7fffe1ffb700) at pthread_create.c:309
#15 0x00007ffff0344ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@dillaman dillaman changed the title [DNM] librbd: image refresh code paths converted to async state machines librbd: image refresh code paths converted to async state machines Dec 14, 2015
@jdurgin
Copy link
Member

jdurgin commented Dec 15, 2015

running it through tests now, but needs a rebase

Jason Dillaman added 9 commits December 14, 2015 20:30
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
It should only run when the object map is enabled for the image.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
New async versions of get_id, get_immutable_metadata,
get_mutable_metadata, get_flags, get_stripe_unit_count,
snapshot_list, and old_snapshot_list.  These are needed
by the new librbd async ImageCtx op state machines.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman added 23 commits December 14, 2015 20:30
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Creating async versions to support an async image refresh

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Use new ExclusiveLock state machine to handle all the proper
transitions between lock states.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This class handles more than just refreshing the image from
disk.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This avoids blocking the AIO path with the previous synchronous
refresh path.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The AIO path already ensures asynchronous image refreshes, but the
watch/notify path for maintenance ops was still using the sync refresh
path.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
An image refresh will no longer trigger an automatic object map
reload, therefore explicitly reload the object map after rollback.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
@dillaman
Copy link
Author

@jdurgin rebased

@dillaman
Copy link
Author

RGW failure in loic-bot

@jdurgin
Copy link
Member

jdurgin commented Dec 15, 2015

@yehudasa

rgw/rgw_rest_swift.cc: In member function 'virtual void RGWDeleteObj_ObjStore_SWIFT::send_response()':
rgw/rgw_rest_swift.cc:822:29: error: 'struct req_state' has no member named 'bucket_name_str'
       path.bucket_name = s->bucket_name_str;
                             ^

jdurgin added a commit that referenced this pull request Dec 15, 2015
librbd: image refresh code paths converted to async state machines

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
@jdurgin jdurgin merged commit 265e18f into ceph:master Dec 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants