New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd-mirror: replace remote pool polling with add/remove notifications #12364
Conversation
[DNM] rbd-mirror: replace remote pool polling with add/remove notifications #12364
@dillaman observing crashes when running rbd_mirror(_stress).sh on teuthology: http://pulpito.ceph.com/trociny-2016-12-09_06:43:36-rbd-wip-mgolub-testing---basic-mira/ and locally:
|
ad77529
to
e319db9
Compare
@trociny Pushed a fix for that crash |
@dillaman looks like related:
|
6eb4019
to
8b071d7
Compare
} | ||
|
||
void expect_mirroring_watcher_is_unregister(MockMirroringWatcher &mock_mirroring_watcher, | ||
bool unregistered) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillaman May be the function name was supposed to be expect_mirroring_watcher_is_unregistered
?
1b43779
to
87e198d
Compare
Note: there is a race condition with the |
Outdated |
Sorry, wrong. |
@dillaman May be a subset of this PR (namely, "utilize global image id as internal unique key" and "preliminary support to track multiple remote peer image sources") could be merged as a separate PR, not waiting for #10896? I am just starting working on InstanceReplayerInterface [1] and it looks to me it would be good if your patches were merged before. |
if (r < 0) { | ||
derr << "error resolving remote pool " << m_remote_pool_id | ||
derr << "error resolving remote pool " << m_local_pool_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillaman I think this error message is confusing now -- I suppose users would thinking the ID in the message is from the remote cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed -- probably should just pass in the pool name string since there really isn't a need to look it up.
@trociny Sure -- I'll pull out the parts I can into a cleanup PR |
1f5ec42
to
d49b4dc
Compare
src/tools/rbd_mirror/Replayer.cc
Outdated
stop_image_replayers(on_finish); | ||
}); | ||
ctx = create_async_context_callback(m_threads->work_queue, ctx); | ||
m_threads->timer->add_event_after(1, ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dillaman Observing teuthology failures [1]
It looks like all cases are due to the timer lock is not held here:
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-1436-gb9f5564/rpm/el7/BUILD/ceph-12.0.0-1436-gb9f5564/src/common/Timer.cc: 127: FAILED assert(lock.is_locked())
ceph version 12.0.0-1436-gb9f5564 (b9f556438d3c7612e9c0a042c8d8f0959cabd3a0)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f351bd9c1a0]
2: (()+0x2b840d) [0x7f351bd9440d]
3: (rbd::mirror::Replayer::stop_image_replayers(Context*)+0x179) [0x7f3524d19119]
4: (rbd::mirror::Replayer::handle_shut_down_pool_watcher(int, Context*)+0xb6) [0x7f3524d194a6]
5: (FunctionContext::finish(int)+0x2a) [0x7f3524d1dc0a]
6: (Context::complete(int)+0x9) [0x7f3524d1cc29]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7f351bda4649]
8: (ThreadPool::WorkThread::entry()+0x10) [0x7f351bda5660]
9: (()+0x7dc5) [0x7f351a686dc5]
10: (clone()+0x6d) [0x7f351914573d]
[1] http://pulpito.ceph.com/trociny-2017-03-16_07:53:00-rbd-wip-mgolub-testing---basic-smithi/
d49b4dc
to
2bdf25a
Compare
src/tools/rbd_mirror/PoolWatcher.cc
Outdated
} | ||
|
||
virtual void handle_rewatch_complete(int r) override { | ||
m_pool_watcher->handle_rewatch_complete(r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using both virtual and override intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just very old code from before the switch-over -- I'll correct it
src/tools/rbd_mirror/PoolWatcher.cc
Outdated
m_pool_watcher->handle_rewatch_complete(r); | ||
} | ||
|
||
virtual void handle_mode_updated(cls::rbd::MirrorMode mirror_mode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override
src/tools/rbd_mirror/PoolWatcher.cc
Outdated
|
||
virtual void handle_image_updated(cls::rbd::MirrorImageState state, | ||
const std::string &remote_image_id, | ||
const std::string &global_image_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override
src/tools/rbd_mirror/PoolWatcher.cc
Outdated
} | ||
for (auto &updated_image : m_updated_images) { | ||
updated_image.invalid = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this for
loop, taking that after the code above we should have a single (invalid) in-flight request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed -- missed this during a refactor
} | ||
|
||
virtual void handle_update(const ImageIds &added_image_ids, | ||
const ImageIds &removed_image_ids) override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both virtual and override, it is intentional?
src/tools/rbd_mirror/Replayer.h
Outdated
} | ||
|
||
virtual void handle_update(const ImageIds &added_image_ids, | ||
const ImageIds &removed_image_ids) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
…ions Fixes: http://tracker.ceph.com/issues/15029 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The local image id set should be up-to-date when attempting to determine which images need to be deleted. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
2bdf25a
to
b8e70d5
Compare
@ceph-jenkins retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ceph-jenkins try again: retest this please |
No description provided.