rbd-mirror: add support for active/passive daemon instances #12948
Conversation
@dillaman updated according to your comments. Still, I am not sure about my approach for the LeaderWatcher state machine. I use the variables "acquiring" and "releasing" to make sure we don't start a new state transition while we are still executing the previous one. So if we are in a transition state, leader watcher messages (that may trigger a transition) are ignored, which I suppose is wrong. I am not sure what is correct to do in this case. Maybe I could use ManagedLock hooks to serialize events properly?
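One alternative to ignoring mid-transition messages is to queue them and replay them once the transition completes. The sketch below illustrates that idea only; it is not the actual LeaderWatcher code, and all names (`TransitionSerializer`, `submit`, `transition_complete`) are hypothetical:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>

// Rough sketch: queue events that arrive while a state transition is in
// flight and replay them when the transition completes, instead of
// dropping them.
class TransitionSerializer {
public:
  // Run the event now if idle, otherwise defer it until the current
  // transition finishes.
  void submit(std::function<void()> event) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      if (m_in_transition) {
        m_pending.push_back(std::move(event));  // defer, don't drop
        return;
      }
      m_in_transition = true;
    }
    event();  // run outside the lock so the event may call submit()
  }

  // Called by the state machine when the current transition finishes.
  void transition_complete() {
    std::function<void()> next;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      if (m_pending.empty()) {
        m_in_transition = false;
        return;
      }
      next = std::move(m_pending.front());
      m_pending.pop_front();
    }
    next();  // still "in transition"; completion will be signalled again
  }

  size_t pending() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_pending.size();
  }

private:
  mutable std::mutex m_mutex;
  bool m_in_transition = false;
  std::deque<std::function<void()>> m_pending;
};
```

With this shape, a notification arriving during a release is deferred rather than lost, which sidesteps the "ignored message" concern above.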
src/tools/rbd_mirror/LeaderWatcher.h
Outdated
void notify_lock_acquired(Context *on_finish);
void notify_lock_released(Context *on_finish);

virtual void handle_heartbeat(Context *on_ack) = 0;
Nit: these methods are unused in this commit and refactored into protected, non-virtual methods in another commit
namespace {

static const uint64_t NOTIFY_TIMEOUT_MS = 5000;
Nit: unused variable
Context *on_notify_ack) {
  dout(20) << "heartbeat" << dendl;

  handle_heartbeat(on_notify_ack);
Question: why do you have these overloaded methods just invoke other methods instead of just combining the two?
namespace {

static const uint64_t NOTIFY_TIMEOUT_MS = 5000;
Nit: unused variable
#include "librbd/Watcher.h"

namespace librbd {
class ImageCtx;
Nit: not used in this header
Mutex::Locker locker(m_lock);

if (r < 0) {
  derr << "error acquiring leader lock: " << cpp_strerror(r)
Nit: if -EAGAIN, no need to output an error message -- just a debug message
Mutex::Locker locker(m_lock);

if (r < 0) {
  derr << "error retrieving leader lock owner: " << cpp_strerror(r) << dendl;
Nit: if -ENOENT, no need to output error message. Should you immediately retry to acquire the lock in that case?
src/tools/rbd_mirror/LeaderWatcher.h
Outdated
}

void check_leader_alive(utime_t &now, int heartbeat_interval);

void notify_heartbeat(Context *on_finish);
void notify_lock_acquired(Context *on_finish);
Do these two methods (notify_lock_acquired and notify_lock_released) need to be public? You have overloaded versions that interact w/ the internal state machine and these stand-alone ones.
Mutex::Locker locker(m_lock);
if (!m_stopping && !m_releasing && !m_acquiring) {
  if (m_leader_lock->is_lock_owner()) {
    dout(0) << "got another leader heartbeat, releasing lock" << dendl;
I don't think you want to do that here (or below). You can have a transient network hiccup and the Objecter could replay old messages from before this instance acquired the lock. I'd just ignore it since you should have been blacklisted if the lock was broken.
src/tools/rbd_mirror/LeaderWatcher.h
Outdated
* REGISTER_WATCH * lock_released received |
* |  /-------------------------\
* v  v                         |
* ACQUIRE_LEADER_LOCK * *> GET_LOCKER -> <peon>
Nit: peon -> passive or secondary
@dillaman significantly modified:
@trociny For (5), I think it would be good for
Force-pushed from 810a7a1 to 0eb941a
@@ -0,0 +1,270 @@
// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
Can you create a basic mock test case for the leader watcher? It would require templatizing the LeaderWatcher class.
Yes, this is in progress.
  return 0;
}

void LeaderWatcher::shut_down() {
Might be better to make this an async method w/ helper methods, now that you have to unlock/relock over and over to get around the fact that it internally calls async methods.
  release_leader_lock();
}

Context *LeaderWatcher::async(Context *ctx) {
Nit: rename to create_async_context? Or update the existing librbd::util::create_async_context to be able to handle either an ImageCtx-like param or a ContextWQ-like param.
  m_locker = locker;
}

schedule_timer_task("break leader", 2, false,
Two missed heartbeats results in a break lock? This should probably be configurable and be more aligned to 30 seconds like the watch timeout.
assert(m_lock.is_locked());

std::shared_ptr<librbd::managed_lock::Locker> locker(
Why not just use a member variable for this in-flight request storage? You shouldn't have concurrent get locker requests in-flight, correct?
Actually, I expect concurrent get locker requests. They are generated by these events:
- lock acquire failed,
- lock break failed,
- lock released,
- "heartbeat" message received,
- "lock acquired" message received.
As these are both internal and external events, some serialization is needed if we want to avoid concurrent requests. In my previous version I had a check in get_locker to skip it if there was already an in-flight request. This requires two additional member variables (the locker and an "in-flight" boolean). Using a shared pointer looked a little nicer to me, but I don't have a strong opinion here.
You could also just create a custom struct context wrapper that has its own locker member variable. That would combine the FunctionContext and Locker allocations into a single wrapper.
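A minimal sketch of that wrapper idea follows. The `Context` and `Locker` definitions below are simplified stand-ins for the Ceph classes, and `C_GetLocker`/`FakeWatcher` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for Ceph's Context and managed_lock::Locker types;
// names and shapes are assumptions for illustration only.
struct Context {
  virtual ~Context() {}
  virtual void finish(int r) = 0;
  void complete(int r) { finish(r); delete this; }
};

struct Locker {
  std::string cookie;
  std::string address;
};

// One allocation carrying both the completion callback and the output
// storage for the in-flight get_locker request, instead of a separate
// FunctionContext plus a shared_ptr<Locker>.
template <typename Watcher>
struct C_GetLocker : public Context {
  Watcher *watcher;
  Locker locker;  // the request writes its result here

  explicit C_GetLocker(Watcher *w) : watcher(w) {}

  void finish(int r) override {
    watcher->handle_get_locker(r, locker);
  }
};

// Hypothetical consumer, used only for the demo below.
struct FakeWatcher {
  int last_r = -1;
  std::string last_cookie;
  void handle_get_locker(int r, Locker &l) {
    last_r = r;
    last_cookie = l.cookie;
  }
};
```

The wrapper owns the result storage for exactly as long as the request is in flight, so no shared ownership is needed.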
src/tools/rbd_mirror/Mirror.cc
Outdated
@@ -147,6 +160,13 @@ class MirrorAdminSocketHook : public AdminSocketHook {
  if (r == 0) {
    commands[command] = new FlushCommand(mirror);
  }

  command = "rbd mirror release leader";
Nit: perhaps "rbd mirror leader release" in case new leader asok commands are added in the future
src/tools/rbd_mirror/Replayer.cc
Outdated
@@ -152,6 +166,13 @@ class ReplayerAdminSocketHook : public AdminSocketHook {
  if (r == 0) {
    commands[command] = new FlushCommand(replayer);
  }

  command = "rbd mirror release leader " + name;
Nit: same comment here
src/tools/rbd_mirror/Replayer.cc
Outdated
set_sources(ImageIds());
if (!m_image_replayers.empty()) {
  Mutex::Locker timer_locker(m_threads->timer_lock);
  Context *task = new FunctionContext(
Can this be cleaned up? The nested FunctionContexts are hard to read.
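One common way to flatten such nesting is to turn each completion step into a named method that explicitly starts the next. This is a hypothetical sketch (the class and step names are invented, not the Replayer's actual steps):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical refactor of nested FunctionContext lambdas into named
// continuation methods: each handler performs one step and explicitly
// starts the next, so the flow reads top-to-bottom instead of inside-out.
class StopChain {
public:
  void stop() { handle_stop_replayers(); }

  std::vector<std::string> steps;  // records execution order for the demo

private:
  void handle_stop_replayers() {
    steps.push_back("stop_replayers");
    handle_clear_sources();  // next continuation, named instead of nested
  }
  void handle_clear_sources() {
    steps.push_back("clear_sources");
    handle_shut_down();
  }
  void handle_shut_down() {
    steps.push_back("shut_down");
  }
};
```

Each handler stays small, and inserting a new step only touches two methods rather than re-indenting a lambda pyramid.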
src/tools/rbd_mirror/LeaderWatcher.h
Outdated
bool is_leader();
void release_leader();

virtual void post_acquire_leader_handler(Context *on_finish) = 0;
Perhaps just create a listener interface to avoid having to inherit from this class just to receive callbacks.
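A sketch of that listener pattern (the names `LeaderListener`, `LeaderWatcherSketch`, and the handler signatures are illustrative assumptions, not the final Ceph API):

```cpp
#include <cassert>

// Consumers implement a small Listener interface instead of inheriting
// from the watcher itself.
class LeaderListener {
public:
  virtual ~LeaderListener() {}
  virtual void post_acquire_handler() = 0;
  virtual void pre_release_handler() = 0;
};

class LeaderWatcherSketch {
public:
  explicit LeaderWatcherSketch(LeaderListener *listener)
    : m_listener(listener) {}

  // Invoked by the internal state machine on leadership changes.
  void handle_post_acquire() { m_listener->post_acquire_handler(); }
  void handle_pre_release() { m_listener->pre_release_handler(); }

private:
  LeaderListener *m_listener;
};

// Demo listener that just counts callbacks.
struct RecordingListener : LeaderListener {
  int acquired = 0;
  int released = 0;
  void post_acquire_handler() override { ++acquired; }
  void pre_release_handler() override { ++released; }
};
```

This decouples the watcher from its consumer and makes it easy to plug in a recording listener for tests.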
@dillaman It looks ready for review now.
@trociny Thanks -- I'll start a deep review today. Did you see the build failure?
src/tools/rbd_mirror/LeaderWatcher.h
Outdated
LeaderWatcher(Threads *threads, librados::IoCtx &io_ctx);

void add_listener(Listener *listener);
Is there a reason to need the add/remove listener methods and not just pass in the listener in the constructor? Is there a one-to-many relationship somewhere?
There is no particular reason so far. It just looked more generic and could be useful for testing (to add test hooks). I can also imagine some useful applications like a CLI watcher, though it is unlikely we will have such, and this can always be changed later. So if you prefer the listener to be passed in the constructor, I will update this.
My only concern was the added complexity w/ handling multiple observers and how to properly handle adding / removing them if a callback is in-flight. The code becomes cleaner w/o support for multiple.
assert(m_lock.is_locked());

Context *ctx = create_async_context_callback(
Nit: librbd::Watcher::register_watch is guaranteed to be an async callback already. Did you hit some oddity here that required an extra async wrapper?
No, I didn't. I did hit it in unregister_watch, when the client was blacklisted, and just changed this one too for safety (even if it is guaranteed to be an async callback now, someone might change this in the future and forget to update LeaderWatcher).
If you don't like unnecessary async calls I will update this. Though I would like us to have some clear convention (markers in the header file?) for knowing whether a function is guaranteed to invoke an async callback without needing to look at the code. Or update functions like unregister_watch to provide this guarantee.
I'm fine w/ leaving it -- no harm and this isn't a high performance code path. It was more of a curiosity.
void handle_get_locker(int r, librbd::managed_lock::Locker& locker);

void acquire_leader_lock(bool reset_attempt_counter);
void acquire_leader_lock();
Nit: I'd just update the call sites to this override to use acquire_leader_lock(false) directly.
I need to pass it to schedule_timer_task, which expects void(*callback)().
Ack
if (is_leader(m_lock)) {
  dout(5) << "got another leader lock_acquired, ignoring" << dendl;
} else {
  cancel_timer_task();
Should "acquire" and "release" notifications reset m_acquire_attempts?
For the "release" case it is already reset, but it is missing for "acquire". Thanks, I will update.
namespace {

static const uint64_t NOTIFY_TIMEOUT_MS = 5000;
Nit: comment still open
src/tools/rbd_mirror/Replayer.cc
Outdated
void Replayer::set_sources(const ImageIds &image_ids)
{
  dout(20) << "enter" << dendl;

  assert(m_lock.is_locked());

  if (!m_init_images.empty()) {
  if (!m_init_images.empty() && !m_stopping.read()
Nit: move && to the end of this line.
@dillaman updated
@@ -0,0 +1,207 @@
#!/bin/sh
@trociny Just realized while running the tests that this needs an associated suite test case to execute it under teuthology.
@dillaman The test starts/stops rbd-mirror daemons, while under teuthology this is out of the script's control.
I suppose I can modify the script so that, when it is being run under teuthology, it executes only a subset of tests (maybe using "leader release" instead of stop in some cases). I will investigate this tomorrow. Maybe you already have some hints or suggestions?
@trociny Hmm -- perhaps what we really just need (long term) is a "thrasher" task like the ones for OSDs/MDSs. I'll create a separate tracker ticket for that.
@trociny Running the rbd_mirror_ha.sh script locally has failed for me 3 out of 4 times. I will try to investigate more tomorrow.
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Fixes: http://tracker.ceph.com/issues/17018 Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Fixes: http://tracker.ceph.com/issues/17019 Signed-off-by: Mykola Golub <mgolub@mirantis.com>
…ired/released Fixes: http://tracker.ceph.com/issues/17020 Signed-off-by: Mykola Golub <mgolub@mirantis.com>
by optionally specifying a daemon instance after the cluster name and a colon, like: start_mirror ${cluster}:${instance} Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
This var is mostly used when running rbd_mirror test scripts on teuthology. It can be used locally though to speed up re-running the tests:

Set a test temp directory:
  export RBD_MIRROR_TEMDIR=/tmp/tmp.rbd_mirror
Run the tests the first time with the NOCLEANUP flag (the cluster and daemons are not stopped on finish):
  RBD_MIRROR_NOCLEANUP=1 ../qa/workunits/rbd/rbd_mirror.sh
Now, to re-run the test without restarting the cluster, run cleanup with the USE_EXISTING_CLUSTER flag:
  RBD_MIRROR_USE_EXISTING_CLUSTER=1 ../qa/workunits/rbd/rbd_mirror_ha.sh cleanup
and then run the tests:
  RBD_MIRROR_USE_EXISTING_CLUSTER=1 ../qa/workunits/rbd/rbd_mirror_ha.sh

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
@dillaman I was able to reproduce the rbd_mirror_ha.sh test failure after several runs (not sure it was your case though). In my case it failed on the "crash leader" test due to a too-short timeout in rbd_mirror_ha.sh wait_for_leader (I forgot to update the test after I added acquire attempts to the LeaderWatcher). I updated the PR. Also, I have added a suite test and am trying to run this on teuthology...
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
But I like your idea with a "thrasher" task too.
@trociny Thanks -- works for me now
@dillaman after this PR, can we no longer add more than one peer for a local pool?
@runsisi That has never been supported (yet) -- not sure why you think this PR removed that capability.
@dillaman did some tests on the latest Jewel release, i.e. 10.2.6; the results are shown as follows. It shows the images in the cluster, so I think the... do you mean the...
@runsisi N remote peers syncing to 1 local cluster is definitely a scenario we have never tested nor designed against. Feel free to submit a PR to fix whatever got broken, but N clusters -> 1 cluster is a very odd scenario that could run into other issues in the future (e.g. name collisions).
@dillaman yeah, thank you for your explanation. But then I am wondering why the rbd CLI added support for adding more than one peer for a local pool?
@runsisi It's definitely a long-term goal to support N-to-N replication. We would like to have a better understanding of required use-cases in the real world before we dive into such an endeavor. It probably would be a good idea to prevent the rbd CLI from adding more than one peer to a pool in the meantime.
@dillaman I see. Jason, thank you very much!
@runsisi In the meantime, if cluster cs4 and cluster cs5 use different pool names, they could replicate to the associated two pools in cluster cs6.
@dillaman yes, I suppose this is the designed behavior :)
No description provided.