rbd-mirror A/A: coordinate image syncs with leader #14745

Merged: 4 commits merged into ceph:master on Jun 7, 2017

@trociny
Contributor

trociny commented Apr 24, 2017

Fixes: http://tracker.ceph.com/issues/18789
Signed-off-by: Mykola Golub mgolub@mirantis.com

@trociny requested a review from @dillaman on Apr 24, 2017

@trociny


Contributor

trociny commented Apr 24, 2017

Implementation details:

  • The throttler is changed from per-cluster to per-pool, because the instance leader control is per pool.
  • The LeaderWatcher is used for sync messages.
  • To handle situations like a lost message, an instance crash, or a leader change, the following strategy is used: the proxy instance keeps resending the sync_request message at a 10 sec interval until sync_request_ack is received from the leader, and the leader keeps resending sync_request_ack at a 10 sec interval until sync_complete is received (a minimal standalone simulation of this resend loop follows after this list). Also, an instance remove hook is added to clear the in-progress syncs of a removed instance.
  • The sync state is not persisted in case of leader failover (as was proposed in the ticket [1]). I am not sure it is necessary. Due to the resent sync_request messages, the new leader will handle the syncs that have not started yet. Those that have already started will send a sync_complete notification to the new leader, which will just be ignored. The only drawback I see is that the number of concurrent syncs may temporarily double when the leader changes.
  • sync_started messages do not look necessary for this implementation.
  • The instance removal hook is added as a separate commit for now, because I am not sure it is the best approach.

[1] http://tracker.ceph.com/issues/18789
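
For illustration, a minimal standalone simulation of the resend loop described above (plain C++, not the actual rbd-mirror code; message names follow the list, and one lost ack is simulated to show the retry):

#include <iostream>

int main() {
  const int kResendSec = 10;   // resend interval from the description above
  bool ack_received = false;
  bool drop_first_ack = true;  // simulate one lost sync_request_ack
  int t = 0;

  while (!ack_received) {
    std::cout << "t=" << t << "s  proxy -> leader: sync_request\n";
    if (drop_first_ack) {
      drop_first_ack = false;  // the ack is lost; the timer fires again
    } else {
      std::cout << "t=" << t << "s  leader -> proxy: sync_request_ack\n";
      ack_received = true;
    }
    t += kResendSec;           // wait out the resend timer
  }
  std::cout << "proxy: slot granted, image sync starts\n";
  // On completion the proxy sends sync_complete; until then the leader
  // applies the same resend rule to sync_request_ack.
  return 0;
}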

@dillaman


Contributor

dillaman commented Apr 24, 2017

Nit: typo of "separate" in "rbd-mirror: refactor ImageSyncThrottler" commit message

@dillaman

My original thinking was that these messages would be part of InstanceWatcher since that would allow direct RPC between the instance requesting the sync and the leader that can grant it. Sending the RPC via the LeaderWatcher would result in a broadcast message.

The InstanceWatcher already has the notion of repeating messages until acked, which seems like it could be re-used here. Also, I think you would be able to eliminate the listener pattern by instantiating the InstanceSyncThrottler when the leader role is acquired and by passing the pointer to InstanceWatcher (and clearing it when the leader role is lost).
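
A minimal standalone sketch of that ownership pattern (the class names here are placeholders, not the real Ceph types): the throttler exists only while the leader role is held, and InstanceWatcher borrows a raw pointer instead of registering a listener:

#include <memory>

struct SyncThrottler {};          // stands in for the real sync throttler

struct InstanceWatcher {          // stands in for rbd::mirror::InstanceWatcher
  SyncThrottler *m_throttler = nullptr;
  void set_sync_throttler(SyncThrottler *t) { m_throttler = t; }
};

struct Replayer {                 // owner of both objects
  InstanceWatcher m_instance_watcher;
  std::unique_ptr<SyncThrottler> m_throttler;

  void handle_leader_acquired() {
    // the throttler only exists while this instance is the leader
    m_throttler = std::make_unique<SyncThrottler>();
    m_instance_watcher.set_sync_throttler(m_throttler.get());
  }

  void handle_leader_released() {
    // clear the borrowed pointer before destroying the throttler
    m_instance_watcher.set_sync_throttler(nullptr);
    m_throttler.reset();
  }
};

int main() {
  Replayer replayer;
  replayer.handle_leader_acquired();   // leader role won: throttler created
  replayer.handle_leader_released();   // role lost: throttler torn down
  return 0;
}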

}
sync_holder->m_on_finish->complete(r);
m_throttler->finish_op(sync_holder->m_local_image_id);
Mutex::Locker locker(m_lock);


@dillaman

dillaman Apr 24, 2017

Contributor

Nit: I would prefer to clean up the local state before firing the completion down to the instance throttler

@@ -100,6 +100,7 @@ std::vector<MockImageSync *> MockImageSync::instances;
// template definitions
#include "tools/rbd_mirror/InstanceSyncThrottler.cc"


@dillaman

dillaman Apr 24, 2017

Contributor

A template specialization of InstanceSyncThrottler should be defined above to eliminate the need to pull in the CC file here. That would also allow the test cases below to specifically test ImageSyncThrottler instead of its interactions w/ InstanceSyncThrottler.
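
A minimal standalone illustration of that suggestion (simplified stand-ins, not the actual gmock-based test code): specializing the template for the mock context means the real .cc file never has to be compiled into the test:

#include <cassert>
#include <string>

struct MockImageCtx {};        // stands in for librbd::MockImageCtx

template <typename ImageCtxT>
struct InstanceSyncThrottler;  // primary template: defined in the real .cc

template <>
struct InstanceSyncThrottler<MockImageCtx> {
  // test-only specialization: record the calls instead of throttling
  int start_ops = 0;
  int finish_ops = 0;
  void start_op(const std::string &, void *on_start) { ++start_ops; (void)on_start; }
  void finish_op(const std::string &) { ++finish_ops; }
};

int main() {
  InstanceSyncThrottler<MockImageCtx> throttler;
  throttler.start_op("local_image_id", nullptr);
  throttler.finish_op("local_image_id");
  assert(throttler.start_ops == 1 && throttler.finish_ops == 1);
  return 0;
}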

@@ -68,6 +136,10 @@ struct UnknownPayload {
typedef boost::variant<HeartbeatPayload,
LockAcquiredPayload,
LockReleasedPayload,
SyncRequestPayload,
SyncRequestAckPayload,


@dillaman

dillaman Apr 24, 2017

Contributor

Nit: perhaps this message could be called SyncStartPayload (from leader -> instance to grant permission to start the sync) and we could eliminate the previous use of SyncStartPayload below. The Ack suffix just seems odd to me since it's not an acknowledgement of the request but rather a command to start.

@trociny


Contributor

trociny commented Apr 24, 2017

@dillaman Thank you for the review. Using LeaderWatcher for sync messages looked appealing to me because there is no need to track who the current leader is when sending a sync notification. The broadcast message also made things easier. Using InstanceWatcher looked like it would complicate the implementation considerably, but I will try this way. Hope I was wrong.

@trociny


Contributor

trociny commented May 9, 2017

@dillaman updated

std::list<C_SyncHolder *> m_sync_queue;
std::map<PoolImageId, C_SyncHolder *> m_inflight_syncs;
std::map<std::string, C_SyncHolder *> m_waiting_syncs;


@dillaman

dillaman May 16, 2017

Contributor

Nit: seems like you can remove this collection since once you schedule the sync, you hand the management over to InstanceWatcher for starting / canceling. This collection just seems to be used for an assert.

class ProgressContext;
/**
* Manage concurrent image-syncs
*/
template <typename ImageCtxT = librbd::ImageCtx>
class ImageSyncThrottler : public md_config_obs_t {
class ImageSyncThrottler {


@dillaman

dillaman May 16, 2017

Contributor

Nit: should this class be the InstanceSyncThrottler since it's associated w/ the instance and the original version (now named InstanceSyncThrottler) should remain the ImageSyncThrottler (or LeaderSyncThrottler)? Right now, the InstanceXYZ classes are tied to a specific instance of rbd-mirror.

Alternatively, if you pass the InstanceWatcher reference down to the bootstrap state machine, it could instantly create an ImageSync state machine whose first state is just to invoke InstanceWatcher::notify_sync_start and wait. That would eliminate this pass-through class entirely.
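
A minimal standalone sketch of that alternative (illustrative names and a simplified Context type, not the PR's code): ImageSync's first state just requests a slot from InstanceWatcher and continues once it is granted:

#include <functional>
#include <iostream>
#include <string>

using Context = std::function<void(int)>;  // stands in for ceph's Context

struct InstanceWatcher {  // stands in for rbd::mirror::InstanceWatcher
  // forwards the request to the leader; calls on_start once granted
  void notify_sync_request(const std::string &sync_id, Context on_start) {
    std::cout << "leader granted a sync slot for " << sync_id << "\n";
    on_start(0);
  }
};

struct ImageSync {  // stands in for the ImageSync state machine
  InstanceWatcher *m_instance_watcher;
  std::string m_local_image_id;

  void send() { send_notify_sync_request(); }  // first state of the machine

  void send_notify_sync_request() {
    m_instance_watcher->notify_sync_request(
        m_local_image_id, [this](int r) { handle_notify_sync_request(r); });
  }

  void handle_notify_sync_request(int r) {
    if (r < 0) {
      std::cout << "sync request failed: " << r << "\n";
      return;
    }
    std::cout << "proceeding to the actual sync states\n";
  }
};

int main() {
  InstanceWatcher instance_watcher;
  ImageSync image_sync{&instance_watcher, "local_image_id"};
  image_sync.send();
  return 0;
}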


@trociny

trociny May 16, 2017

Contributor

The InstanceSyncThrottler name was chosen when I was working on the previous version, when this class handled sync notifications via the leader object. Now this name really looks wrong. I wouldn't like to change it to ImageSyncThrottler because it knows nothing about 'images', or even that the operations it throttles are syncs. I am going to change it to just Throttler.

I will evaluate your idea to remove ImageSyncThrottler and use the InstanceWatcher::notify_sync_xyz methods directly in the ImageSync state machine. Thanks.

<< ": canceling request to local throttler" << dendl;
if (on_start != nullptr) {
if (instance_id == m_instance_id) {
send_sync_start(sync_id, on_start);


@dillaman

dillaman May 16, 2017

Contributor

Nit: can you just bubble up -ESTALE for the local instance as well to avoid the special case?

return;
}
// This should only be possible when notify_finish_sync for the


@dillaman

dillaman May 16, 2017

Contributor

How does this situation occur?


@trociny

trociny May 16, 2017

Contributor

A user completes a sync by sending notify_finish_sync (which returns immediately, after queuing the sync notification to the leader), and after this (for some reason) calls notify_start_sync for the same image again. At this moment the previous notification for this sync_id may still be in progress.

This could be avoided if the caller used some unique value for sync_id instead of image_id, but that would complicate things on the caller side.


@dillaman

dillaman May 16, 2017

Contributor

ACK. Would it be possible to remove the special on_complete callback and just wrap the on_start context in that case with this special handling?

@@ -72,8 +74,19 @@ class InstanceWatcher : protected librbd::Watcher {
const std::string &peer_image_id,
bool schedule_delete, Context *on_notify_ack);
void notify_start_sync(const std::string &request_id,


@dillaman

dillaman May 16, 2017

Contributor

Nit: perhaps swap to "noun_verb" form like the methods above? notify_sync_start, notify_sync_cancel, etc

@dillaman


Contributor

dillaman commented May 16, 2017

@trociny I think the optimization to directly pass through image sync requests for the local leader's images might be adding undue complications. The code complexity would most likely be reduced if we just eliminated all those special cases, always sent the message, and allowed it to be received back by the watcher.

assert(sync_ctx->on_complete == nullptr);
sync_ctx->on_complete = new FunctionContext(
[this, sync_id, on_start] (int r) {
if (r == -ECANCELED) {


@dillaman

dillaman May 16, 2017

Contributor

Is this state possible? If chaining off a previous sync request that was canceled, and we somehow raced to restart the sync request, should the previous -ECANCELED be applied to the new sync request?


@trociny

trociny May 16, 2017

Contributor

When the notification for the previous request is completed, on_complete is always called with r == 0 (see handle_notify_complete_sync). This case occurs when the caller cancels a sync that has not started yet (handled in notify_cancel_sync, if on_complete is not null).


@dillaman

dillaman May 16, 2017

Contributor

Would that be a race condition back to the throttler? Since the cancel method doesn't take an "on_finish" callback, the caller could just assume it's all properly cleaned up. However, eventually the "on_start" will get invoked with an -ECANCELED parameter.

If contexts were passed to the finish and cancel methods, perhaps these boundary cases would be clearer (see the sketch below)?
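
A standalone sketch of what passing a context to cancel might look like (hypothetical names and a simplified Context type; not the PR's code). The caller learns exactly when cleanup finished instead of inferring it from the -ECANCELED delivery:

#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

using Context = std::function<void(int)>;  // stands in for ceph's Context

struct SyncRequests {
  std::map<std::string, Context> m_pending;  // sync_id -> on_start

  void request(const std::string &sync_id, Context on_start) {
    m_pending[sync_id] = std::move(on_start);
  }

  // cancel completes on_start with -ECANCELED *and* tells the caller when
  // the request is fully cleaned up, closing the race window
  void cancel(const std::string &sync_id, Context on_finish) {
    auto it = m_pending.find(sync_id);
    if (it == m_pending.end()) {
      on_finish(-2 /* -ENOENT: nothing pending for this sync_id */);
      return;
    }
    Context on_start = std::move(it->second);
    m_pending.erase(it);
    on_start(-125 /* -ECANCELED */);
    on_finish(0);
  }
};

int main() {
  SyncRequests requests;
  requests.request("img0", [](int r) { std::cout << "on_start r=" << r << "\n"; });
  requests.cancel("img0", [](int r) { std::cout << "cancel done r=" << r << "\n"; });
  return 0;
}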


@trociny

trociny May 16, 2017

Contributor

@dillaman I think I see some problems with my current code here. I am not sure we are talking about the same thing, but let's discuss this when I update the code, fixing the problems I see and addressing your other comments.

@trociny


Contributor

trociny commented May 25, 2017

@dillaman Here is a new version. The main changes:

  • ImageSyncThrottler is removed (the sync request logic moved to the ImageSync state machine).
  • I tried to simplify sync message tracking in InstanceWatcher by allowing duplicates.
  • I returned to the SyncRequest/SyncStart scheme to make sure the throttler's finish_op will not be lost (e.g. due to a lost ack or an instance crash).

Some details on the last item. Now, when requesting a sync slot, the instance sends SyncRequest messages and waits for an ack (resending after a timeout). When the leader throttler grants the sync slot, the SyncRequest is acked and the leader sends a SyncStart message (resending after a timeout until it is acked). The SyncStart is acked by the instance when the sync is complete.

A situation is possible where the leader receives a duplicate 'SyncRequest' (resent after a timeout) after it has already granted the sync slot and sent 'SyncStart'. In this case a duplicate 'SyncStart' is sent, which is handled on the receiver side by completing the previous 'SyncStart' with -ESTALE (a simplified model of this handling follows below).

In this scheme it looks like 'SyncComplete' notifications are not necessary and just lead to a duplicate sync throttler finish_op. It looks like I can remove them, so notify_sync_complete would only complete the ack context for the received 'SyncStart' notification. I have not removed them yet though -- I want to know your opinion first, about the approach in general and about the 'SyncComplete' notifications in particular.
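
A simplified standalone model of the duplicate-SyncStart handling described above (illustrative names and a simplified Context type; the errno value is assumed from Linux):

#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

using Context = std::function<void(int)>;  // stands in for ceph's Context

struct Instance {
  std::map<std::string, Context> m_inflight;  // sync_id -> ack for SyncStart

  void handle_sync_start(const std::string &sync_id, Context ack) {
    auto it = m_inflight.find(sync_id);
    if (it != m_inflight.end()) {
      // Duplicate SyncStart (e.g. a re-sent SyncRequest raced with the
      // grant): complete the previous one with -ESTALE, keep the new ack.
      Context prev = std::move(it->second);
      it->second = std::move(ack);
      prev(-116 /* -ESTALE on Linux */);
      return;
    }
    m_inflight[sync_id] = std::move(ack);
    std::cout << "sync " << sync_id << " started\n";
  }

  void complete_sync(const std::string &sync_id) {
    auto it = m_inflight.find(sync_id);
    if (it == m_inflight.end()) {
      return;
    }
    Context ack = std::move(it->second);
    m_inflight.erase(it);
    ack(0);  // acking SyncStart releases the leader's throttler slot
  }
};

int main() {
  Instance instance;
  instance.handle_sync_start("img0", [](int r) { std::cout << "ack1 r=" << r << "\n"; });
  instance.handle_sync_start("img0", [](int r) { std::cout << "ack2 r=" << r << "\n"; });  // duplicate
  instance.complete_sync("img0");
  return 0;
}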

@dillaman


Contributor

dillaman commented May 26, 2017

@trociny In general it sounds good to me.

@trociny


Contributor

trociny commented May 29, 2017

@dillaman updated (unnecessary 'SyncComplete' messages are removed)

@@ -785,8 +785,14 @@ void InstanceWatcher<I>::handle_image_acquire(
const std::string &peer_image_id, Context *on_finish) {
dout(20) << "global_image_id=" << global_image_id << dendl;
m_instance_replayer->acquire_image(global_image_id, peer_mirror_uuid,
peer_image_id, on_finish);
auto ctx = new FunctionContext(


@dillaman

dillaman May 30, 2017

Contributor

It would probably be a good idea to track these queued callbacks via an AsyncOpTracker to prevent lock release / shutdown races.
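
For reference, a simplified single-threaded model of the AsyncOpTracker pattern being suggested (Ceph's real tracker in common/AsyncOpTracker.h is thread-safe; this sketch only shows the counting idea):

#include <functional>
#include <iostream>

struct AsyncOpTracker {
  int m_pending = 0;
  std::function<void()> m_on_drain;

  void start_op() { ++m_pending; }

  void finish_op() {
    if (--m_pending == 0 && m_on_drain) {
      auto cb = std::move(m_on_drain);
      m_on_drain = nullptr;
      cb();
    }
  }

  void wait_for_ops(std::function<void()> on_drain) {
    if (m_pending == 0) {
      on_drain();
      return;
    }
    m_on_drain = std::move(on_drain);
  }
};

int main() {
  AsyncOpTracker tracker;
  tracker.start_op();                       // a queued image_acquire callback
  tracker.wait_for_ops([] { std::cout << "safe to shut down\n"; });
  tracker.finish_op();                      // callback ran; tracker drains
  return 0;
}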

// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab
#include "Throttler.h"


@dillaman

dillaman May 30, 2017

Contributor

Nit: why the name change to something so generic? This seems to be heavily tied to sync throttling.


@trociny

trociny May 30, 2017

Contributor

Now it can throttle any ops; it has nothing specific to image sync (or even sync in general). That is why the name is so generic. If you think it is a bad idea, I will change it. ImageSyncThrottler? SyncThrottler?
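
A minimal standalone sketch of the generic throttling core under discussion (illustrative, not the final Ceph class): a fixed cap on concurrent ops, with waiters queued until a slot frees up:

#include <deque>
#include <functional>
#include <iostream>
#include <set>
#include <string>
#include <utility>

using Context = std::function<void(int)>;  // stands in for ceph's Context

struct Throttler {
  size_t m_max_concurrent;
  std::set<std::string> m_inflight;
  std::deque<std::pair<std::string, Context>> m_queue;

  explicit Throttler(size_t max_concurrent) : m_max_concurrent(max_concurrent) {}

  void start_op(const std::string &id, Context on_start) {
    if (m_inflight.size() < m_max_concurrent) {
      m_inflight.insert(id);
      on_start(0);
    } else {
      m_queue.emplace_back(id, std::move(on_start));  // wait for a free slot
    }
  }

  void finish_op(const std::string &id) {
    m_inflight.erase(id);
    if (!m_queue.empty()) {
      auto [next_id, on_start] = std::move(m_queue.front());
      m_queue.pop_front();
      m_inflight.insert(next_id);
      on_start(0);  // hand the freed slot to the next waiter
    }
  }
};

int main() {
  Throttler throttler(1);  // cap of 1 concurrent op, for illustration
  throttler.start_op("img0", [](int) { std::cout << "img0 running\n"; });
  throttler.start_op("img1", [](int) { std::cout << "img1 running\n"; });
  throttler.finish_op("img0");  // img1 now gets the slot
  return 0;
}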


@dillaman

dillaman May 30, 2017

Contributor

I figured that was the idea, but it is still tied to a specific config option for the max number of syncs, and it dumps status for the number of syncs.


@trociny

trociny May 30, 2017

Contributor

Ah, good point. I forgot about this. I will change it back to ImageSyncThrottler.

@@ -85,10 +85,14 @@ template <typename I>
void BootstrapRequest<I>::cancel() {
dout(20) << dendl;
Mutex::Locker locker(m_lock);
m_canceled = true;
{


@dillaman

dillaman May 30, 2017

Contributor

Nit: no need for new block

m_client_meta, m_work_queue, ctx,
m_progress_ctx);
if (m_canceled) {
m_ret_val = -ECANCELED;


@dillaman

dillaman May 30, 2017

Contributor

Nit: was this change needed? If the image sync wasn't started, no need to worry about a race w/ the result code.


@trociny

trociny May 30, 2017

Contributor

There was no special need; this just looked more readable to me. And even if there is no need to worry about a race, I prefer to keep the lock in cases like this if it does not cause trouble, because when I look at the code later it always looks alarming to see variables like these accessed without a lock, and I need to go through the code to ensure it is safe.

OK, I will revert it to make the changes less intrusive.


@dillaman

dillaman May 30, 2017

Contributor

No worries -- just double checking

@@ -72,8 +74,18 @@ class InstanceWatcher : protected librbd::Watcher {
const std::string &peer_image_id,
bool schedule_delete, Context *on_notify_ack);
void notify_sync_request(const std::string &sync_id, Context *on_sync_start);
bool cancel_notify_sync_request(const std::string &sync_id);


@dillaman

dillaman May 30, 2017

Contributor

Nit: perhaps just cancel_sync_request?

@trociny changed the title from "[DNM] rbd-mirror A/A: coordinate image syncs with leader" to "rbd-mirror A/A: coordinate image syncs with leader" on May 31, 2017

@trociny


Contributor

trociny commented May 31, 2017

@dillaman updated

@dillaman


Contributor

dillaman commented Jun 2, 2017

retest this please

Mykola Golub added some commits Apr 16, 2017

Mykola Golub
rbd-mirror: make sync throttler per pool
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
test/rbd_mirror: TestMockImageReplayer cleanup
Remove unnecessary ImageSyncThrottler dependency.

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
rbd-mirror: resolve potential recursive lock
This would have been possible when release_image led to canceling a sync (i.e. cancel_sync_request).

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
rbd-mirror A/A: coordinate image syncs with leader
Fixes: http://tracker.ceph.com/issues/18789
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
@trociny


Contributor

trociny commented Jun 6, 2017

rebased to resolve conflicts with master

@dillaman

lgtm -- test failures seem unrelated to this change

@dillaman merged commit 7bda34e into ceph:master on Jun 7, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.