Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-mirror A/A: coordinate image syncs with leader #14745

Merged
merged 4 commits into from Jun 7, 2017

Conversation

trociny
Copy link
Contributor

@trociny trociny commented Apr 24, 2017

Fixes: http://tracker.ceph.com/issues/18789
Signed-off-by: Mykola Golub mgolub@mirantis.com

@trociny
Copy link
Contributor Author

trociny commented Apr 24, 2017

Implementation details:

  • The throttler changed from per cluster to per pool, because the instance leader control is per pool.
  • The LeaderWatcher is used for sync messages.
  • To handle situations like a message lost, an instance crash or the leader changed, the following strategy is used. The proxy instance keeps resending sync_request message with 10 sec interval until sync_request_ack is recieved from the leader. The leader keeps to re-send sync_request_ack with 10 sec interval until sync_complete is received. Also the instance remove hook is added to clear removed instance syncs in progress.
  • The sync state is not persisted in case of leader failover (as it was proposed in the ticket [1]). I am not sure it is necessary. Due to resending sync_request messages, the new leader will handle the syncs that are not started yet. Those that are already started will send sync_complete notification to the new leader, which will be just ignored. The only drawback I see is the number of concurrent syncs may temporary double when the leader is changed.
  • sync_started messages look like not necessary for this implementation.
  • Instance removal hook is added as a separate commit right now, because I am not sure it is the best approach.

[1] http://tracker.ceph.com/issues/18789

@dillaman
Copy link

Nit: typo of "separate" in "rbd-mirror: refactor ImageSyncThrottler" commit message

Copy link

@dillaman dillaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original thinking was that these messages would be part of InstanceWatcher since that would allow direct RPC between the instance requesting the sync and the leader that can grant it. Sending the RPC via the LeaderWatcher would result in a broadcast message.

The InstanceWatcher already has the notion of repeating messages until acked which seems like it could be re-used here. Also, I think you would be able to eliminate the listener pattern by instantiating the InstanceSyncThrottler when the leader is acquired and by passing the pointer to InstanceWatcher (and clearing it when the leader role is lost).

}
sync_holder->m_on_finish->complete(r);
m_throttler->finish_op(sync_holder->m_local_image_id);
Mutex::Locker locker(m_lock);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would prefer to clean up the local state before firing the completion down to the instance throttler

@@ -100,6 +100,7 @@ std::vector<MockImageSync *> MockImageSync::instances;


// template definitions
#include "tools/rbd_mirror/InstanceSyncThrottler.cc"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A template specialization of InstanceSyncThrottle should be defined above and eliminate the need to pull in the CC file here. That would also allow the test cases below to specifically test ImageSyncThrottler instead of its interactions w/ InstanceSyncThrottler.

@@ -68,6 +136,10 @@ struct UnknownPayload {
typedef boost::variant<HeartbeatPayload,
LockAcquiredPayload,
LockReleasedPayload,
SyncRequestPayload,
SyncRequestAckPayload,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps this message could be called SyncStartPayload (from leader -> instance to grant permission to start the sync) and we could eliminate the previous use of SyncStartPayload below. The Ack suffix just seems odd to me since it's not an acknowledgement of the request but rather a command to start.

@trociny
Copy link
Contributor Author

trociny commented Apr 24, 2017

@dillaman Thank you for the review. Using LeaderWatcher for sync messages looked appealing to me because of no need to track who is the leader right now to send sync notification. Broadcast message also made things easier. Using InstanceWatcher looked like would complicate the implementation considerably, but I will try this way. Hope I was wrong.

@trociny
Copy link
Contributor Author

trociny commented May 9, 2017

@dillaman updated

std::list<C_SyncHolder *> m_sync_queue;
std::map<PoolImageId, C_SyncHolder *> m_inflight_syncs;

std::map<std::string, C_SyncHolder *> m_waiting_syncs;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: seems like you can remove this collection since once you schedule the sync, you hand the management over to InstanceWatcher for starting / canceling. This collection just seems to be used for an assert.


class ProgressContext;

/**
* Manage concurrent image-syncs
*/
template <typename ImageCtxT = librbd::ImageCtx>
class ImageSyncThrottler : public md_config_obs_t {
class ImageSyncThrottler {
Copy link

@dillaman dillaman May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should this class be the InstanceSyncThrottler since it's associated w/ the instance and the original version (now named InstanceSyncThrottler) should remain the ImageSyncThrottler (or LeaderSyncThrottler)? Right now, the InstanceXYZ classes are tied to a specific instance of rbd-mirror.

Alternatively, if you pass the InstanceWatcher reference down to the bootstrap state machine, it could instantly create a ImageSync state machine whose first state is just to invoke InstanceWatcher::notify_sync_start and wait. That would eliminate this pass-through class entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InstanceSyncThrottler name was chosen when I was working on previous version, when this class was handling sync notifications via the leader object. Now this name really looks wrong. I wouldn't like to change it to ImageSyncThrottler because it knows nothing about 'image', and even that operations it throttles are syncs. I am going to change it to just Throttler.

I will evaluate your idea to remove ImageSyncThrottler and use InstanceWatcher::notify_sync_xyz methods directly in ImageSync state machine. Thanks.

<< ": canceling request to local throttler" << dendl;
if (on_start != nullptr) {
if (instance_id == m_instance_id) {
send_sync_start(sync_id, on_start);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you just bubble up -ESTALE for the local instance as well to avoid the special case?

return;
}

// This should only be possible when notify_finish_sync for the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this situation occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user completes a sync by sending notify_finish_sync (which returns immediately, after queuing the sync notification to the leader), and after this (by some reason) calls notify_start_sync for the same image again. At this moment the previous notification for this sync_id may still be in progress.

This could be avoided by using some unique value for sync_id instead of image_id by the caller, but this will complicate things on the caller side.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Would it be possible to remove the special on_complete callback and just wrap the on_start context in that case with this special handling?

@@ -72,8 +74,19 @@ class InstanceWatcher : protected librbd::Watcher {
const std::string &peer_image_id,
bool schedule_delete, Context *on_notify_ack);

void notify_start_sync(const std::string &request_id,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps swap to "noun_verb" form like the methods above? notify_sync_start, notify_sync_cancel, etc

@dillaman
Copy link

@trociny I think the optimization to directly pass-through image sync requests for the local leader's images might be adding undo complications. The code complexity would most likely be reduced if we just eliminated all those special cases and just send the message and allowed it to be received back by the watcher.

assert(sync_ctx->on_complete == nullptr);
sync_ctx->on_complete = new FunctionContext(
[this, sync_id, on_start] (int r) {
if (r == -ECANCELED) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this state possible? If chaining off a previous sync request that was canceled and somehow we raced to restart the sync request, should the previous -ECANCLED be applied to the new sync request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the notification for the previous request is completed, on_complete is always called with r =0 (see handle_notify_complete_sync). This case is when the caller cancels this (not started yet) sync (handled in notify_cancel_sync, if on_complete is not null).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that be a race condition back to the throttler? Since the cancel method doesn't take an "on_finish" callback, it could just assume it's all properly cleaned up. However, eventually, the "on_start" will get invoked with an -ECANCELED parameter.

If contexts were passed to the finish and cancel methods, perhaps these boundary cases would be clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dillaman I think I see some problems with my current code here. I am not sure we are talking about the same thing though, but lets discuss this when I update the code fixing the problems I see and addressing other your comments.

@trociny
Copy link
Contributor Author

trociny commented May 25, 2017

@dillaman Here is a new version. The main changes:

  • ImageSyncThrottler removed (moving the sync request logic to ImageSync state machine).
  • I tried to simplify sync messages tracking in InstanceWatcher by allowing duplicates.
  • I returned to SyncRequest/SyncStart scheme to make sure the throttler's finish_op will not be lost (e.g. due to lost ack or the instance crash).

Some details for the last item. Now, when requesting a sync slot, the instance sends SyncRequest messages and waits for ack (resending after timeout). When the leader throttler grants the sync slot, the SyncRequest is acked and the leader sends SyncStart message (resending after timeout until it is acked). The SyncStart is acked by the instance when the sync is complete.

A situation is possible, when the leader receives a duplicate 'SyncRequest' resent after timeout, after it has already granted the sync slot and sent 'SyncStart'. In this case a duplicate 'SyncStart' will be sent, which is handled on the receiver side by completing the previous 'SyncStart' with -ESTALE.

In this scheme it looks like 'SyncComplete' notifications are not necessary and just lead to duplicate sync throttler finish_op. It looks I may remove them, so notify_sync_complete will only complete ack context for received 'SyncStart' notification. I have not removed them yet though -- I want to know your opinion first, about the approach in general, and about 'SyncComplet' notifications in particularly.

@dillaman
Copy link

@trociny In general it sounds good to me.

@trociny
Copy link
Contributor Author

trociny commented May 29, 2017

@dillaman updated (unnecessary 'SyncComplet' messages are removed)

@@ -785,8 +785,14 @@ void InstanceWatcher<I>::handle_image_acquire(
const std::string &peer_image_id, Context *on_finish) {
dout(20) << "global_image_id=" << global_image_id << dendl;

m_instance_replayer->acquire_image(global_image_id, peer_mirror_uuid,
peer_image_id, on_finish);
auto ctx = new FunctionContext(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably would be a good idea to track these queued callbacks via an AsyncOpTracker to prevent lock release / shut down races.

// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab

#include "Throttler.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why the name change to something so generic? This seems to be heavily tied to sync throttling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it can throttle any ops, it has nothing specific to image sync (or just sync). That is why so generic name. If you think it is a bad idea I will change it. ImageSyncThrottler? SyncThrottler?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that was the idea, but it is still tied to a specific config option for max sync and it dump status for number of syncs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. I forgot about this. I will change it (return back) to ImageSyncThrottler.

@@ -85,10 +85,14 @@ template <typename I>
void BootstrapRequest<I>::cancel() {
dout(20) << dendl;

Mutex::Locker locker(m_lock);
m_canceled = true;
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need for new block

m_client_meta, m_work_queue, ctx,
m_progress_ctx);
if (m_canceled) {
m_ret_val = -ECANCELED;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: was this change needed? If the image sync wasn't started, no need to worry about a race w/ the result code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no special need, this just looked more readable to me. And yes even if there no need to worry about a race I prefer to keep the lock in the cases like this if it does not make trouble. Because when I look at the code later it always looks alarming to me when I see variables like these accessed without lock and I need to go through the code to ensure it is safe.

Ok, I will return it back to make changes less intrusive.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries -- just double checking

@@ -72,8 +74,18 @@ class InstanceWatcher : protected librbd::Watcher {
const std::string &peer_image_id,
bool schedule_delete, Context *on_notify_ack);

void notify_sync_request(const std::string &sync_id, Context *on_sync_start);
bool cancel_notify_sync_request(const std::string &sync_id);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps just cancel_sync_request?

@trociny trociny force-pushed the wip-18789 branch 2 times, most recently from 4baf9a0 to 967ad82 Compare May 31, 2017 11:54
@trociny trociny changed the title [DNM] rbd-mirror A/A: coordinate image syncs with leader rbd-mirror A/A: coordinate image syncs with leader May 31, 2017
@trociny
Copy link
Contributor Author

trociny commented May 31, 2017

@dillaman updated

@dillaman
Copy link

dillaman commented Jun 2, 2017

retest this please

Mykola Golub added 4 commits June 6, 2017 10:42
Remove unnecessary ImageSyncThrottler dependency.

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Would have been possible when release_image had lead to cancel
sync (i.e. cancel_sync_request).

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Fixes: http://tracker.ceph.com/issues/18789
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
@trociny
Copy link
Contributor Author

trociny commented Jun 6, 2017

rebased to resolve conflicts with master

Copy link

@dillaman dillaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm -- test failures seem unrelated to this change

@dillaman dillaman merged commit 7bda34e into ceph:master Jun 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants