Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: asynchronous clone state machine #12041

Merged
merged 6 commits into from Mar 21, 2017

Conversation

Projects
None yet
5 participants
@yangdongsheng
Copy link
Member

yangdongsheng commented Nov 17, 2016

No description provided.

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Nov 17, 2016

I am still in working struggling with some bugs.

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch 3 times, most recently from 62d09c1 to bfa98fe Nov 18, 2016

@yangdongsheng yangdongsheng changed the title [RFC] rbd: asynchronous clone [DNM] rbd: asynchronous clone Nov 18, 2016

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Nov 18, 2016

There is still some problems I have to figure out.

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from bfa98fe to cf71908 Nov 19, 2016

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Nov 19, 2016

@dillaman @trociny It's almost ready for review, Please help to take a look. thanx :)

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from cf71908 to f25bc57 Nov 19, 2016

class CloneRequest {
public:
static CloneRequest *create(ImageCtx *p_imctx, IoCtx &c_ioctx, const std::string &c_name,
ImageOptions c_options,

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

@yangdongsheng Why not pass image options by reference? And then you don't need ImageOptions copy constructor?

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

CloneRequest need to pass an options to CreateRequest. So, we should keep an m_image_opts in CloneRequest, Because rbd-mirror is passing a local variable of options to CloneRequest, if we pass it by reference to CloneRequest, we will get an seg fault when defer it.

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

Ok, agree we need a copy constructor. Still, I would like the options to be passed by (const) reference here, to be consistent with CreateRequest. It will still be copied to m_opts in the constructor.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

Agreed, good point. Thanx @trociny

Context *ctx = new FunctionContext([this](int r) {
librbd::NoOpProgressContext no_op;
r = librbd::remove(m_ioctx, m_name, "", no_op);
handle_remove(r);

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

So it looks you need async remove. And that is why TestMockImageReplayerCreateImageRequest.Clone unittest hangs in Jenkins.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

Hmmm, m_work_queue is working on tp_librbd right, so even I queue this ctx to m_work_queue, there would be a deadlock, right? So I have to create another thread ? Any suggestion? thanx

This comment has been minimized.

Copy link
@dillaman

dillaman Nov 21, 2016

Contributor

You will need to create an asynchronous remove image state machine before you can create an async clone state machine.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

yes, that's true. And @vshankar has send a pr for the async remove state machine. #10896

Maybe I should wait for that pr merged, but before that, am I understanding it correctly? I have to create another thread for the librbd::remove(), instead of calling it in tp_librbd? am I right? thanx @dillaman

This comment has been minimized.

Copy link
@vshankar

vshankar Nov 21, 2016

Contributor

@yangdongsheng you could carve out image remove async state machine from PR #10896 (https://github.com/ceph/ceph/pull/10896/commits) as a separate PR and reference it here if you want.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

Oh, thanx @vshankar 😉

handle_clone_image(r);
});
m_work_queue->queue(ctx, 0);
RWLock::RLocker snap_locker(m_remote_image_ctx->snap_lock);

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

I suppose no need for snap lock here?

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

Oh, right, I just copy it from the original sync version implementation. thanx

}
}
}

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

You could have something like below instead:

  void image_options_copy(rbd_image_options_t* opts,
			  rbd_image_options_t orig)
  {
    image_options_ref* opts_ = new image_options_ref(new image_options_t());
    image_options_ref* orig_ = static_cast<image_options_ref*>(orig);

    *opts_ = *orig_;
    *opts = static_cast<rbd_image_options_t>(opts_);
  }

  ImageOptions::ImageOptions(const ImageOptions &imgopts)
  {
    librbd::image_options_copy(&opts, imgopts.opts);
  }

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

but opts is private....

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

So we have to iterate the image options and use the accessors (get() and set()) to do a full copy.

@@ -586,6 +586,11 @@ namespace librbd {
librbd::image_options_create_ref(&opts, opts_);
}

ImageOptions::ImageOptions(ImageOptions &imgopts)

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

const ImageOptions &imgopt

This comment has been minimized.

Copy link
@trociny

trociny Dec 8, 2016

Contributor

Why no const modifier?

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Dec 8, 2016

Author Member

Thanx @trociny , Actually I agree with you here. But I have to wait #12102 to be merged. Then I will rebase this branch and address the comments all. Thanx for your kindly help.

return m_on_finish;
}

if (m_r_saved == 0) {

This comment has been minimized.

Copy link
@trociny

trociny Nov 21, 2016

Contributor

Please initialize m_r_saved in the class definition or the constructor. It may be uninitialized at this point.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Nov 21, 2016

Author Member

yes, I forgot it.

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch 2 times, most recently from 0bac6ac to f4f3506 Dec 1, 2016

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Dec 5, 2016

Hi @trociny There is some error in my unittest

/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockOperationSnapshotRemoveRequest.FlattenedCloneRemovesChild) should be deleted but never is. Its address is @0x7fe778007440.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockOperationSnapshotRemoveRequest.FlattenedCloneRemovesChild) should be deleted but never is. Its address is @0x7fe77800e540.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockOperationSnapshotRemoveRequest.RemoveChildError) should be deleted but never is. Its address is @0x7fe778014b60.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockOperationSnapshotRemoveRequest.RemoveChildError) should be deleted but never is. Its address is @0x7fe77801e210.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockImageRefreshRequest.SuccessChild) should be deleted but never is. Its address is @0x7fe798016fe0.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockImageRefreshRequest.SuccessChild) should be deleted but never is. Its address is @0x7fe7980193c0.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockOperationSnapshotRemoveRequest.FlattenedCloneRemovesChild) should be deleted but never is. Its address is @0x7fe7deb05510.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockOperationSnapshotRemoveRequest.RemoveChildError) should be deleted but never is. Its address is @0x7fe7deb137d0.
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librados_test_stub/MockTestMemIoCtxImpl.h:148: ERROR: this mock object (used in test TestMockImageRefreshRequest.SuccessChild) should be deleted but never is. Its address is @0x7fe7deb1ac40.
ERROR: 9 leaked mock objects found at program exit.

Any suggestion about it? thanx a lot.

@trociny

This comment has been minimized.

Copy link
Contributor

trociny commented Dec 5, 2016

@yangdongsheng I suppose those just are consequences of earlier test failures, like this one:

[ RUN      ] TestMockImageReplayerCreateImageRequest.CloneError
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/rbd_mirror/image_replayer/test_mock_CreateImageRequest.cc:605: Failure
      Expected: -22
To be equal to: ctx.wait()
      Which is: 0

When you fix these I guess those you referred will have gone too.
Just in case you are not aware, you can run this locally:
CEPH_LIB=./lib ./bin/unittest_rbd_mirror
or for one test:
CEPH_LIB=./lib ./bin/unittest_rbd_mirror --gtest_filter='TestMockImageReplayerCreateImageRequest.CloneError'

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from f4f3506 to 8452919 Dec 6, 2016

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Dec 6, 2016

@ Thanx for your help, But the failure is fixed, the leaked error has not gone. @trociny

@trociny

This comment has been minimized.

Copy link
Contributor

trociny commented Dec 6, 2016

@yangdongsheng Right now I don't have an idea about these leaked mock objects (will have to look closer later), but in Jenkins I see there other warnings that need to be fixed, like this one:

[ RUN      ] TestMockImageSyncSnapshotCopyRequest.SnapUnprotectRemove

GMOCK WARNING:
Uninteresting mock function call - returning default value.
    Function call: get_snap_namespace(8, 0x7fcee47c8600)
          Returns: 0
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/master/googlemock/docs/CookBook.md#knowing-when-to-expect for details.

GMOCK WARNING:
Uninteresting mock function call - returning default value.
    Function call: get_snap_namespace(11, 0x7fcee47c85d0)
          Returns: 0
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/master/googlemock/docs/CookBook.md#knowing-when-to-expect for details.
[       OK ] TestMockImageSyncSnapshotCopyRequest.SnapUnprotectRemove (21 ms)

Note, there are also warnings from TestMockImageReplayerBootstrapRequest about is_resync_requested call. These are not related to your changes. I will create a PR for these.

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch 2 times, most recently from 86f372f to 28db2c0 Dec 7, 2016

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Dec 7, 2016

@trociny Thanx for your help. I have found the problems now. it's about I forgot to delete the objects in CloneRequest. Now the Error about leaked objects has gone.

.WillOnce(Return(r));
void expect_clone_image(MockCloneRequest &mock_clone_request,
librbd::MockTestImageCtx *mock_parent_imctx,
int r) {

This comment has been minimized.

Copy link
@trociny

trociny Dec 8, 2016

Contributor

Why not pass mock_parent_imctx by reference? To be consistent with other methods like expect_open/close_image.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Dec 19, 2016

Author Member

Good suggestion, will change it as suggested. thanx

m_op_work_queue(op_work_queue), m_on_finish(on_finish),
m_use_p_features(true) {

m_ioctx.dup(c_ioctx);

This comment has been minimized.

Copy link
@trociny

trociny Dec 8, 2016

Contributor

Do we need to duplicate ioctx here? Wouldn't it work if we use a reference (define m_ioctx as IoCtx &m_ioctx)?

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Dec 19, 2016

Author Member

I followed the conversation in src/librbd/image/XXXrequest. For example, CreateRequest and RemoveRequest. IMO, the difference is that reference way does not work if we delete the ioctx before this request complete, although I did not find a scenario like this. So, I think the dup way is more safe.

This comment has been minimized.

Copy link
@dillaman

dillaman Dec 19, 2016

Contributor

The dup call will make it impossible to create mock tests against any librados calls within the state machine. I think dup should basically be reserved for the case when you need to have a separate librados::IoCtx w/ different settings from the base one (i.e. snapshot context).

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Dec 20, 2016

Author Member

thanx @trociny and @dillaman , testing point is convincible to me. So I will use reference way in CloneRequest here and then open a separate PR to fix the similar problems otherwhere if necessary. thanx a lot.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Dec 20, 2016

Author Member

BTW, @trociny and @dillaman I think the #12102 is almost ready, please help to take a look. This PR is based on it. I prefer to merge #12102 at first. thanx

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch 2 times, most recently from 3791b43 to 0ea70c8 Dec 19, 2016

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from 8da6d03 to acfaebf Feb 28, 2017

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Mar 3, 2017

@dillaman Therer is a fail, but seems not related.

@trociny

This comment has been minimized.

Copy link
Contributor

trociny commented Mar 3, 2017

@yangdongsheng github reports about a conflict in src/librbd/internal.cc

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Mar 3, 2017

@trociny oops, will rebase soon.

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from acfaebf to 8125c29 Mar 3, 2017

@yangdongsheng yangdongsheng changed the title [DNM] rbd: asynchronous clone rbd: asynchronous clone Mar 8, 2017

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from 8125c29 to 161d8e2 Mar 13, 2017

cls_rbd_client: support asynchronous add_child.
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch 3 times, most recently from 81eb652 to e6ccf78 Mar 13, 2017

@dillaman

This comment has been minimized.

Copy link
Contributor

dillaman commented Mar 13, 2017

Nit: spelling mistake in "rbd-mirror: use asynchorous CloneRequest" commit subject

@dillaman
Copy link
Contributor

dillaman left a comment

@yangdongsheng sorry for the delay -- just noticed that you pushed an update

*opts = static_cast<rbd_image_options_t>(opts_);

for (auto &i : IMAGE_OPTIONS_TYPE_MAPPING) {
if (i.second == STR) {

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Nit: for code maintenance, I would recommend using a switch statement

@@ -34,6 +34,7 @@ set(librbd_internal_srcs
exclusive_lock/StandardPolicy.cc
image/CloseRequest.cc
image/CreateRequest.cc
image/CloneRequest.cc

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Nit: move this before image/CloseRequest.cc to keep in alphabetical order

@@ -35,6 +35,7 @@
#include "librbd/exclusive_lock/StandardPolicy.h"
#include "librbd/image/CreateRequest.h"
#include "librbd/image/RemoveRequest.h"
#include "librbd/image/CloneRequest.h"

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Nit: put in alphabetical order above CreateRequest

* | | REFRESH
* | | / |
* | CLEAN DIR_CHILDREN <--------/ v (meta is empty)
* | |\ GET METAS IN PARENT . . . . . . .

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Nit: METAS -> META here and below

* | | SET METAS IN CHILD v
* | | / | .
* | -------<-------/ v .
* | CLOSE IMAGE . . . . .< . . .

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Enabling mirroring needs to be added to the diagram

void send_close();
Context *handle_close(int *r);

void switch_thread_context();

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Hopefully this is no longer required

Context *CloneRequest<I>::handle_close(int *r) {
ldout(m_cct, 20) << this << " " << __func__ << dendl;

if (*r < 0) {

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Assuming switch_thread_context can be removed now, ensure the ImageCtx is destroyed even on error

}

m_on_finish->complete(r);
delete this;

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 13, 2017

Contributor

Nit: if this state machine is going to mix-and-match the Context *handle_XYX(int *)-style which auto-deletes itself and this style, I'd rather convert the whole state machine to eliminate the Context *handle_XYZ(int *)-style in favor of this style.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Mar 14, 2017

Author Member

Thanx I use the Context *handle_xxx(int *) style in the whole state machine, same with other state machine in librbd/image/

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 15, 2017

Contributor

OK -- but that implies this method shouldn't exist.

This comment has been minimized.

Copy link
@yangdongsheng

yangdongsheng Mar 16, 2017

Author Member

But there is some cases using this method before we start a "real" state machine. such as the validate_options(). The CreateRequest and CloseRequest have the same problem. So they implemented CreateRequest::complete(int r) and CloseRequest::finish().

What I can do here I think is converting Context *handle_xxx(int *) into void handle_xxx(int r). And use the complete() method everywhere in CloneRequest as what you suggested previously.

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 16, 2017

Contributor

Yeah, I noticed CreateRequest had the same issue and I just fixed it while fixing another ticket.

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from e6ccf78 to 14cbf1a Mar 14, 2017

rbd: introduce a new constructor for ImageOptions
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from 14cbf1a to a33dfdf Mar 14, 2017

@dillaman

This comment has been minimized.

Copy link
Contributor

dillaman commented Mar 15, 2017

retest this please

}

if (m_r_saved == 0) {
complete(0);

This comment has been minimized.

Copy link
@dillaman

dillaman Mar 15, 2017

Contributor

@yangdongsheng One of many examples of how complete(XYZ) is being used instead of just returning m_on_finish from the Context *handle_XYZ function.

yangdongsheng added some commits Nov 17, 2016

librbd: Asynchronous image clone
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
librbd: use CloneRequest in librbd::clone
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
rbd-mirror: use asynchrorous CloneRequest
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
rbd-mirror: fix the unittest for new CloneRequest
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>

@yangdongsheng yangdongsheng force-pushed the yangdongsheng:rbd_mirror_clone branch from a33dfdf to 8283cb5 Mar 16, 2017

@yangdongsheng

This comment has been minimized.

Copy link
Member Author

yangdongsheng commented Mar 16, 2017

@dillaman updated. :)

@dillaman
Copy link
Contributor

dillaman left a comment

lgtm

out-of-date review

@dillaman dillaman changed the title rbd: asynchronous clone librbd: asynchronous clone state machine Mar 21, 2017

@dillaman dillaman merged commit 16f7da4 into ceph:master Mar 21, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.