librbd: optionally unregister "laggy" journal clients #10378

trociny · 2016-07-21T07:57:24Z

Fixes: http://tracker.ceph.com/issues/14738

dillaman · 2016-07-21T19:18:52Z

@trociny Instead of unregistering the laggy client (which would work), I put a placeholder ClientState in the registration record [1]. My original idea was to set this to DISCONNECTED (via a new cls method) and adjust the JournalTrimmer to ignore clients with the state set to DISCONNECTED when determining the minimum set.

The rbd-mirror daemon can then detect if and when its client state is set to DISCONNECTED, stop replay (if in-progress), and re-initiate an image sync w/o the need to split-brain. The sync point record supports a "from snapshot" so it can be modified to only resync from the last known good snapshot (if any). For example, if the system gets disconnected during the initial sync, it can recover w/o starting over from scratch by reconnecting to the journal, creating a second sync point record tied to the first, and once the first sync is complete, it would start with the second (similar to how VM/storage live migration transfers the full base, and then one or more deltas until it can cut over).

The iterative / recoverable image sync wouldn't be part of this effort, but I wanted to lay out my full plans.

[1] https://github.com/ceph/ceph/blob/master/src/cls/journal/cls_journal_types.h#L79
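To make the trimming idea above concrete, here is a minimal sketch of computing the minimum commit set while skipping clients flagged DISCONNECTED. The `Client` struct, the `ClientState` enum and `minimum_commit_set` are simplified, hypothetical stand-ins and not the actual cls::journal or JournalTrimmer code.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the registered journal clients.
enum class ClientState { Connected, Disconnected };

struct Client {
  std::string id;
  ClientState state;
  uint64_t commit_object_set;  // object set of the client's commit position
};

// Stores in *minimum the lowest object set still needed by a connected
// client. Returns false (leaving *minimum untouched) when no connected
// client is registered.
bool minimum_commit_set(const std::vector<Client> &clients,
                        uint64_t *minimum) {
  bool found = false;
  uint64_t result = 0;
  for (const auto &c : clients) {
    if (c.state == ClientState::Disconnected) {
      continue;  // a disconnected client no longer holds back trimming
    }
    result = found ? std::min(result, c.commit_object_set)
                   : c.commit_object_set;
    found = true;
  }
  if (found) {
    *minimum = result;
  }
  return found;
}
```

With a connected master at set 10 and a disconnected mirror client at set 2, the minimum is 10, so the object sets below it become eligible for trimming.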

trociny · 2016-07-22T06:33:09Z

@dillaman Got it! Thanks. Will redo.

trociny · 2016-07-27T13:16:23Z

@dillaman How do you like this version?

dillaman · 2016-07-28T12:22:52Z

src/common/config_opts.h

@@ -1250,6 +1250,7 @@ OPTION(rbd_journal_object_flush_interval, OPT_INT, 0) // maximum number of pendi
 OPTION(rbd_journal_object_flush_bytes, OPT_INT, 0) // maximum number of pending bytes per journal object
 OPTION(rbd_journal_object_flush_age, OPT_DOUBLE, 0) // maximum age (in seconds) for pending commits
 OPTION(rbd_journal_pool, OPT_STR, "") // pool for journal objects
+OPTION(rbd_journal_client_object_sets_behind_max, OPT_INT, 0) // maximum number of object sets a journal client can be behind before it is automatically unregistered


Minor: perhaps rbd_journal_max_concurrent_object_sets?

dillaman · 2016-07-28T12:36:07Z

@trociny Looking good to me. The only missing piece is the smarts in rbd::mirror::ImageReplayer to catch a disconnect while we are actively replaying.

trociny · 2016-08-01T13:38:57Z

@dillaman Updated, still not tested well. Also there are potential issues to discuss:

A laggy client is disconnected, a resync is started, and the client is re-registered. While it is resyncing, the master commit position keeps growing, and the mirror client is unregistered again before the resync is complete. Should we add a way for rbd-mirror to tell that the client cannot be disconnected (during sync/bootstrap)?
What should we do when a disconnect is detected while replaying (or just after a resync)? Doing a resync immediately does not look like a good idea (mirroring is likely overloaded, and a resync will only increase the load). Currently it just stops image replay (though without the manual flag, so it will be restarted on the next pool rescan). Maybe schedule the resync after a configurable interval?

trociny · 2016-08-03T15:15:46Z

Added some tests. Also added an experimental rbd journal disconnect command. It is used in the functional tests, though I am not sure it is useful enough to add upstream. @dillaman do you think it would be useful to have such a command? It could also be implemented as rbd mirror image disconnect.

dillaman · 2016-08-04T21:29:17Z

src/journal/JournalMetadata.cc

+    const std::string &client_id = c.id;
+    uint64_t object_set = 0;
+    if (!c.commit_position.object_positions.empty()) {
+      auto position = *(c.commit_position.object_positions.begin());


Minor: auto &
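For context, a rough sketch of how the commit position in this hunk could feed a "laggy" check. The types are simplified, hypothetical stand-ins. Two things are assumptions rather than facts taken from the diff: that an object set is the object number divided by the journal's splay width, and that a limit of 0 (the option's default) disables the check.

```cpp
#include <cstdint>
#include <list>

// Hypothetical, simplified stand-ins for the journal types in the diff.
struct ObjectPosition {
  uint64_t object_number;
};

struct CommitPosition {
  std::list<ObjectPosition> object_positions;
};

// Object set of the client's commit position; 0 when nothing has been
// committed yet. Assumption: set = object_number / splay_width.
// Precondition: splay_width > 0.
uint64_t commit_object_set(const CommitPosition &commit_position,
                           uint8_t splay_width) {
  if (commit_position.object_positions.empty()) {
    return 0;
  }
  // Bind by reference to avoid copying the position ("auto &").
  const auto &position = commit_position.object_positions.front();
  return position.object_number / splay_width;
}

// True when the client trails the active set by more than max_behind sets.
// Assumption: max_behind == 0 means the feature is disabled.
bool is_laggy(uint64_t active_set, uint64_t client_set, uint64_t max_behind) {
  if (max_behind == 0) {
    return false;
  }
  return active_set > client_set + max_behind;
}
```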

trociny · 2016-08-10T15:24:46Z

src/journal/JournalMetadata.cc

@@ -986,6 +993,7 @@ void JournalMetadata::committed(uint64_t commit_tid,

    ldout(m_cct, 20) << "updated commit position: " << commit_position << ", "
                     << "on_safe=" << m_commit_position_ctx << dendl;
+


TODO: remove empty line

trociny · 2016-08-10T15:27:41Z

@dillaman Updated. Now when a journal client is disconnected, rbd-mirror just stops the image replayer until the user issues the 'mirror image resync' command. Also added the rbd_mirror_resync_after_disconnect configuration option to start the resync automatically.
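The behaviour described above can be sketched as a small decision function. All names here (`Action`, `Decision`, `on_client_state`) are hypothetical, and the `-ENOTCONN` error code is only an illustrative choice, not something taken from the patch.

```cpp
#include <cerrno>

// Hypothetical outcome of inspecting the mirror client's journal state.
enum class Action { KeepReplaying, StopAndWait, Resync };

struct Decision {
  Action action;
  int error_code;  // 0 on success; negative errno when replay is stopped
};

// connected: the journal client is still flagged as connected.
// resync_after_disconnect: value of the automatic-resync config option.
Decision on_client_state(bool connected, bool resync_after_disconnect) {
  if (connected) {
    return {Action::KeepReplaying, 0};
  }
  if (resync_after_disconnect) {
    return {Action::Resync, 0};  // start the resync without user action
  }
  // Stop replay and wait for an explicit resync request from the user.
  return {Action::StopAndWait, -ENOTCONN};
}
```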

dillaman · 2016-08-10T20:19:32Z

src/tools/rbd/action/Journal.cc

@@ -985,6 +1084,11 @@ Shell::Action action_import(
  {"journal", "import"}, {}, "Import image journal.", "",
  &get_import_arguments, &execute_import);

+Shell::Action action_disconnect(
+  {"journal", "client", "disconnect"}, {},
+  "Flag image journal client disconnected.", "",


Minor: "... as disconnected"

dillaman · 2016-08-10T20:25:01Z

src/tools/rbd_mirror/ImageReplayer.cc

+
+  if (client.state != cls::journal::CLIENT_STATE_CONNECTED) {
+    dout(0) << "client flagged disconnected, stopping image replay" << dendl;
+    stop(nullptr, false, "disconnected");


Same comment here: should we set an error code so that an admin can see the problem?

dillaman · 2016-08-10T20:27:02Z

@trociny few minor comments, but otherwise lgtm

trociny · 2016-08-15T13:24:07Z

@dillaman Updated. I think I addressed all your comments. The only open item is renaming the rbd_mirror_resync_after_disconnect config option to rbd_resync_to_primary_mirror_after_disconnect. I agree that using the rbd_mirror prefix is unfortunate here, but I am still not very happy with the rbd_resync_to_primary_mirror_after_disconnect name. I changed it to rbd_mirroring_resync_after_disconnect instead. What do you think? If you still like rbd_resync_to_primary_mirror_after_disconnect more, I will change it.

dillaman · 2016-08-15T13:30:48Z

@trociny That's fine with me -- just wanted to ensure it didn't overlap with "rbd_mirror".

dillaman · 2016-09-05T01:54:51Z

@trociny rebase required

trociny · 2016-09-05T09:57:00Z

@dillaman While testing after the rebase I noticed an issue with ImageReplayer<I>::handle_remote_journal_metadata_updated(): it could be called while the ImageReplayer was stopping and shutting the journal down, when the metadata was already null, so it crashed.

I have updated handle_remote_journal_metadata_updated to check the current ImageReplayer state and return if it is not running.

Now, retesting this locally. I will let you know about test results.
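A minimal sketch of the guard described above, using a hypothetical `Replayer` class and not the real rbd::mirror::ImageReplayer: the callback returns early unless the replayer is actively replaying, so it never dereferences metadata that shutdown has already released.

```cpp
#include <memory>

// Hypothetical, simplified stand-ins for the replayer and its metadata.
enum class State { Starting, Replaying, Stopping, Stopped };

struct JournalMetadata {
  bool client_connected;
};

class Replayer {
public:
  State state = State::Stopped;
  std::unique_ptr<JournalMetadata> remote_metadata;  // null after shutdown
  bool stop_requested = false;

  // Returns true when the update was processed, false when it was ignored.
  bool handle_remote_journal_metadata_updated() {
    if (state != State::Replaying) {
      // The journal may be shutting down and remote_metadata may be null.
      return false;
    }
    if (!remote_metadata->client_connected) {
      stop_requested = true;  // client flagged disconnected: stop replay
    }
    return true;
  }
};
```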

trociny · 2016-09-05T13:07:04Z

@dillaman it passed local tests

trociny added feature rbd labels Jul 21, 2016

dillaman self-assigned this Jul 21, 2016

trociny force-pushed the wip-14738 branch 2 times, most recently from 6f922c7 to 8799931 Compare July 27, 2016 13:13

dillaman reviewed Jul 28, 2016

trociny force-pushed the wip-14738 branch from 8799931 to 7706e6c Compare August 1, 2016 13:28

trociny force-pushed the wip-14738 branch from 7706e6c to 2173c85 Compare August 3, 2016 14:03

dillaman reviewed Aug 4, 2016

trociny force-pushed the wip-14738 branch from 2173c85 to f2c7855 Compare August 10, 2016 15:18

trociny reviewed Aug 10, 2016

dillaman reviewed Aug 10, 2016

trociny force-pushed the wip-14738 branch from f2c7855 to 892082f Compare August 15, 2016 13:18

trociny force-pushed the wip-14738 branch 2 times, most recently from d827361 to 066b516 Compare August 19, 2016 05:14

dillaman added the wip-jason-testing label Aug 23, 2016

Mykola Golub added 6 commits September 5, 2016 08:51

cls/journal: add async client_update_state method

58b8c66

Signed-off-by: Mykola Golub <mgolub@mirantis.com>

journal: allow to trim journal for "laggy" clients

0b8b1aa

Signed-off-by: Mykola Golub <mgolub@mirantis.com>

librbd: optionally flag "laggy" journal clients disconnected

b8eafef

Fixes: http://tracker.ceph.com/issues/14738 Signed-off-by: Mykola Golub <mgolub@mirantis.com>

rbd: new command to disconnect journal client

fc3ba54

Signed-off-by: Mykola Golub <mgolub@mirantis.com>

rbd-mirror: decode_client_meta should return false on error

cd5eb36

Signed-off-by: Mykola Golub <mgolub@mirantis.com>

rbd-mirror: resync was possible only when image replayer start had su…

4bf6912

…cceeded Signed-off-by: Mykola Golub <mgolub@mirantis.com>

trociny force-pushed the wip-14738 branch from 066b516 to ac2bdad Compare September 5, 2016 05:53

Mykola Golub added 2 commits September 5, 2016 12:48

rbd-mirror: stop replay when client is disconnected

330dba0

Signed-off-by: Mykola Golub <mgolub@mirantis.com>

rbd-mirror: option to automatically resync after journal client disco…

77fd6a1

…nnect Signed-off-by: Mykola Golub <mgolub@mirantis.com>

trociny force-pushed the wip-14738 branch from ac2bdad to 77fd6a1 Compare September 5, 2016 09:49

dillaman merged commit c2a5e70 into ceph:master Sep 7, 2016

trociny deleted the wip-14738 branch September 27, 2016 12:36

dillaman mentioned this pull request Oct 11, 2016

jewel: rbd: mirror: improve resiliency of stress test case #11433

Merged
