New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-mirror: delayed replication support #11879

Merged
merged 5 commits into from Jan 13, 2017

Conversation

Projects
None yet
3 participants
@trociny
Contributor

trociny commented Nov 9, 2016

Fixes: http://tracker.ceph.com/issues/15371
Signed-off-by: Mykola Golub mgolub@mirantis.com

@dillaman dillaman self-assigned this Nov 9, 2016

@dillaman dillaman changed the title from rbd-mirror: delayed replication support to [DNM] rbd-mirror: delayed replication support Nov 9, 2016

@dillaman

This comment has been minimized.

Contributor

dillaman commented Nov 9, 2016

Post-Kraken feature

@@ -384,11 +385,15 @@ void EventEntry::decode(bufferlist::iterator& it) {
}
boost::apply_visitor(DecodeVisitor(struct_v, it), event);
if (struct_v >= 4) {
::decode(timestamp, it);

This comment has been minimized.

@trociny

trociny Nov 14, 2016

Contributor

@dillaman Thinking about this a little more, not sure it will always work. If a new event type is added (or an existent one is extended) I expect it will fail.

Actually, I would like timestamp be encoded at the beginning of the EventEntry, before Event, but I don't see how to preserve compatibility, so an older client still could read a journal created by a new client. A way I see is to
make timestamp a member of Event, i.e. adding to every event structure (or use inheritance). I am not like this much though.

Do you have any suggestions? Do we want older clients to be able to decode newer journals?

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

New fields always need to be added to the end of the structure to preserve compatibility. You are correct that by placing the encode/decode in this common location, there is the possibility that an updated Event will now be creating incompatible streams. I think you need to add this to every event structure.

@@ -1272,6 +1272,7 @@ OPTION(rbd_validate_pool, OPT_BOOL, true) // true if empty pools should be valid
OPTION(rbd_validate_names, OPT_BOOL, true) // true if image specs should be validated
OPTION(rbd_auto_exclusive_lock_until_manual_request, OPT_BOOL, true) // whether to automatically acquire/release exclusive lock until it is explicitly requested, i.e. before we know the user of librbd is properly using the lock API
OPTION(rbd_mirroring_resync_after_disconnect, OPT_BOOL, false) // automatically start image resync after mirroring is disconnected due to being laggy
OPTION(rbd_mirroring_delay, OPT_INT, 0) // time-delay in seconds for rbd-mirror asynchronous replication

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

Minor: perhaps rbd_mirroring_replication/replay_delay?

@@ -384,11 +385,15 @@ void EventEntry::decode(bufferlist::iterator& it) {
}
boost::apply_visitor(DecodeVisitor(struct_v, it), event);
if (struct_v >= 4) {
::decode(timestamp, it);

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

New fields always need to be added to the end of the structure to preserve compatibility. You are correct that by placing the encode/decode in this common location, there is the possibility that an updated Event will now be creating incompatible streams. I think you need to add this to every event structure.

return 0;
}
return (event_time + mirroring_delay - now) + 1;

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

Minor: why add an extra second?

This comment has been minimized.

@trociny

trociny Nov 14, 2016

Contributor

It is to convert a double to the smallest following int (I could use ceil() but thought my version was good enough for this purposes).

return 0;
}
utime_t now = ceph_clock_now(g_ceph_context);

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

Minor: I think we want a new time helper to retrieving the UTC time, otherwise it would rely on users correctly setting the timezone offset and keeping it updated w/ daylight savings, etc.

This comment has been minimized.

@trociny

trociny Nov 14, 2016

Contributor

As I see, ceph_clock_now() uses clock_gettime(), which gets time since the Epoch, so it should not depend on the user timezone?

Also, not sure about using g_ceph_context here. In journal/Types.h I used nullptr, just not to add an additional header file from common. I think this should be consistent, i.e. nullptr here too. On the other hand, being able to shift time via clock_offset config option could be useful for testing, so may be use g_ceph_context here and when encoding?

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

Well, if you are using UTC there should be no need to shift.

@@ -225,6 +225,19 @@ class ImageReplayerAdminSocketHook : public AdminSocketHook {
Commands commands;
};
int replay_delay(const utime_t &event_time, int mirroring_delay) {

This comment has been minimized.

@dillaman

dillaman Nov 14, 2016

Contributor

Minor: perhaps uint32_t calculate_replay_delay?

@dillaman

This comment has been minimized.

Contributor

dillaman commented Nov 14, 2016

I would like to see a mock unit test for the delay where it properly handles a stop while waiting for the delay to complete.

@ghost

This comment has been minimized.

ghost commented Nov 17, 2016

jenkins test this please (no log)

@trociny

This comment has been minimized.

Contributor

trociny commented Nov 17, 2016

@dillaman I am still working on ImageReplayer mock tests. Your other comments are addressed and I would like to know what do you think about my approach for encoding the timestamp? I wouldn't like to add the timestamp into every event entry type. I think, by encoding it in an additional ENCODE_START/FINISH section, I have addressed potential compatibility issues, not sure about the interface (EventMetada type) though, how nice it looks...

@dillaman

This comment has been minimized.

Contributor

dillaman commented Nov 17, 2016

@trociny I like the EventMetadata wrapper -- but why not just add it to EventEntry and have its encode / decode methods encode/decode the metadata after encoding/decoding the actual event?

@trociny

This comment has been minimized.

Contributor

trociny commented Nov 17, 2016

@dillaman I though having a dedicated object for every ENCODE_START/FINISH section would be a little cleaner, but I don't have a strong opinion here, so I will do as you suggest.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Nov 17, 2016

@trociny My rationale is just because EventEntry is already a wrapper -- it runs the encode/decode on the actual Event type it contains. Therefore, it seems like a natural extension point instead of adding external helpers for encoding/decoding the variant and the metadata record.

@trociny

This comment has been minimized.

Contributor

trociny commented Nov 17, 2016

@dillaman Then do we really need a new struct EventMetadata? May be just add timestamp to EventEntry, end have it something like below?

void EventEntry::encode(bufferlist& bl) const {
  ENCODE_START(3, 1, bl);
  boost::apply_visitor(EncodeVisitor(bl), event);
  ENCODE_FINISH(bl);
  ENCODE_START(1, 1, bl);
  ::encode(timestamp, bl);
  ENCODE_FINISH(bl);
}

Or do you think EventMetadata still could be useful?

@dillaman

This comment has been minimized.

Contributor

dillaman commented Nov 17, 2016

@trociny That approach is fine with me as well.

@ghost

This comment has been minimized.

ghost commented Nov 18, 2016

jenkins test this please (jenkins failure)

@trociny

This comment has been minimized.

Contributor

trociny commented Nov 23, 2016

@dillaman updated (timestamp encoding, and ImageReplayer mock tests added)

@@ -710,7 +710,8 @@ int Journal<I>::demote() {
return r;
}
journal::EventEntry event_entry{journal::DemoteEvent{}};
journal::EventEntry event_entry{journal::DemoteEvent{},
ceph_clock_now(nullptr)};

This comment has been minimized.

@dillaman

dillaman Nov 29, 2016

Contributor

I would still like to see a specialty ceph_clock_utc or similar method to ensure the clocks remain consistent throughout the world and are not tied to the local timezone of the daemon and clients.

This comment has been minimized.

@trociny

trociny Nov 29, 2016

Contributor

@dillaman I am not sure I see your point here. Looking at the implementation of ceph_clock_now(), it uses clock_gettime() on linux and gettimeofday() on other platforms. Both provide number of seconds since the epoch (which is always in UTC timezone), so that should not be TZ dependent. Am I missing something?

zhuzha:/tmp% cat test.c  
#include <time.h>
#include <stdio.h>

#include <sys/time.h>

int
main() {

  struct timespec tp;
  clock_gettime(CLOCK_REALTIME, &tp);
  printf("tp.tv_sec=%d\n", (long)tp.tv_sec);

  struct timeval tv;
  gettimeofday(&tv, NULL);
  printf("tv.tv_sec=%d\n", (long)tv.tv_sec);

  return 0;
}

zhuzha:/tmp% gcc /tmp/test.c                        
zhuzha:/tmp% TZ=UTC ./a.out ; TZ=Europe/Kiev ./a.out
tp.tv_sec=1480427775
tv.tv_sec=1480427775
tp.tv_sec=1480427775
tv.tv_sec=1480427775

This comment has been minimized.

@dillaman

dillaman Nov 29, 2016

Contributor

My mistake -- I didn't realize they tracked from the Epoch.

trociny pushed a commit that referenced this pull request Dec 2, 2016

Mykola Golub
Merge branch 'wip-15371' into wip-mgolub-testing
[DNM] rbd-mirror: delayed replication support #11879
@trociny

This comment has been minimized.

Contributor

trociny commented Dec 8, 2016

tests updated after the recent changes to master

@dillaman dillaman changed the title from [DNM] rbd-mirror: delayed replication support to rbd-mirror: delayed replication support Jan 4, 2017

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 4, 2017

@trociny Can you rebase this to the latest master? It looks like ceph_clock_now no longer takes an argument.

https://github.com/ceph/ceph/blob/master/src/common/Clock.h#L22

@trociny

This comment has been minimized.

Contributor

trociny commented Jan 4, 2017

@dillaman updated

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 5, 2017

@trociny

This comment has been minimized.

Contributor

trociny commented Jan 5, 2017

@dillaman I fixed this by adding a dummy ImageSync class to test_mock_ImageReplayer.cc.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 5, 2017

@trociny thanks

@trociny

This comment has been minimized.

Contributor

trociny commented Jan 8, 2017

@dillaman It turned out I didn't test well it after the latest rebase , after metadata replication had been merged. I just checked unittest and it worked. My test used set_metadata to set delay param on local image, and after the rebase I had to update the test to set metadata param on the remote image and wait for it to be replicated.

I thought after metadata_set operation the corresponding ImageCtx parameter would be updated automatically (by refresh request) but it turned out it is only updated in open request.

I changed the test to set the delay using m_local_cluster->conf_set instead of set_metadata, so it should work now.

But don't you think it is a bug that metadata is not updated by refresh request?

Also, interesting, I investigated why the old version worked with librados test stub. It turned out, that this test would pass on the librados test stub and fail on a real rados:

  open_remote_image(&ictx);
  ictx->cct->_conf->set_val("rbd_mirroring_replay_delay", "111");
  close_image(ictx);
  open_local_image(&ictx);
  ASSERT_EQ(111, ictx->cct->_conf->rbd_mirroring_replay_delay);
  close_image(ictx);

So, it looks like in the librados test stub config is shared between remote and local contexts, while in the real rados it is not. metadata_set operation also changes the current config value, so in the old test version, when metdata_set was executed for the remote context it also updated the config value for local context and the test worked.

Actually, I am also not sure that this is correct to change config value after metadata_set operation. I would rather expect the corresponding ImgCtx variable is only updated.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 9, 2017

@trociny I don't necessarily think it's a bug that the config overrides aren't dynamically picked up via metadata set since no other rbd config values are dynamic (via asok config set).

The issue w/ librados_test_stub is probably due to the fact that it only really supports a singleton version of RadosClient. It's the same reason why blacklisting cannot work w/o a slight refactor. I guess it's never risen to the level of importance to tackle yet, though.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 9, 2017

retest this please

@trociny

This comment has been minimized.

Contributor

trociny commented Jan 9, 2017

I don't necessarily think it's a bug that the config overrides aren't dynamically picked up via metadata set since no other rbd config values are dynamic (via asok config set).

I can agree that it might be not necessary to update relevant ImageCtx vars on metadata_set operation. But still it looks strange that we update contex config variables on metadata update operation. That means if a user executes metadata_set for one image, and then open another image it will use metadata settings for the previous image in this session.

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 9, 2017

@trociny Oh -- I see what you are saying now. The fact that Operations<I>::metadata_set is setting the configuration setting is a bug. It should use a dummy md_config_t just to validate that the setting is valid [1]

[1] http://tracker.ceph.com/issues/18465

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 12, 2017

retest this please

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jan 12, 2017

@trociny needs a rebase

Mykola Golub added some commits Nov 24, 2016

Mykola Golub
qa/workunits/rbd: allow to tweak rbd-mirror test setup
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
rbd-mirror: tweaks to support creating mock test cases
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
test/rbd_mirror: ImageReplayer mock tests
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
rbd-mirror: delayed replication support
Fixes: http://tracker.ceph.com/issues/15371
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
test: add "resync requested" unit test for rbd-mirror bootstrap request
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
@trociny

This comment has been minimized.

Contributor

trociny commented Jan 12, 2017

@dillaman updated

@dillaman dillaman merged commit 5ecfc2c into ceph:master Jan 13, 2017

2 of 3 checks passed

default Build finished.
Details
Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details

@trociny trociny deleted the trociny:wip-15371 branch Jan 13, 2017

@chenliuzhong

This comment has been minimized.

Contributor

chenliuzhong commented Feb 7, 2018

@trociny how to use "rbd_mirroring_replay_delay"?I try to use it by setting "rbd image-meta set pooltest/rbdtest conf_rbd_mirroring_replay_delay 600" for primary rbd image,but it did not work.Could you tell how to use this setting Correctly.
ceph version: 12.2.2
Thank you!

@trociny

This comment has been minimized.

Contributor

trociny commented Feb 7, 2018

Your image-meta command looks correct. I have just checked on the master and it works for me. How did you test that it did not work for you?

I was testing this way:

  1. set conf_rbd_mirroring_replay_delay on the primary image
  2. check (with rbd image-meta list) on the secondary, that the setting was successfully mirrored
  3. check (with rbd mirror image status) on the secondary, that the status is replaying and entries_behind_master is 0 (or small).
  4. write some data to the primary image (with rbd bench --io-type write).
  5. monitor rbd mirror image status on the scondary: entries_behind_master is non zero, and only after 600 second it starts decreasing.
@dillaman

This comment has been minimized.

Contributor

dillaman commented Feb 7, 2018

v12.2.2 has a bug where metadata config overrides are not applied to an already opened image (this is fixed in the forthcoming v12.2.3). Therefore, you could either set the "rbd_mirroring_replay_delay = XYZ" override in the rbd-mirror daemon's config file to globally apply a delay, wait for 12.2.3, or with 12.2.2 you can restart rbd-mirror after the metadata change propagates so that it picks up the new configuration option.

@trociny

This comment has been minimized.

Contributor

trociny commented Feb 7, 2018

Ah, I forgot about that bug.

@chenliuzhong also instead of restarting rbd-mirror daemon, you can just restart the image replayer via rbd-mirror admin socket:

ceph --admin-daemon  ${path_to_rbd_mirror_asok} rbd mirror restart ${POOL}/${image}
@chenliuzhong

This comment has been minimized.

Contributor

chenliuzhong commented Feb 8, 2018

@dillaman @trociny Thank you! is there a patch to solve this issue now?

@trociny

This comment has been minimized.

Contributor

trociny commented Feb 8, 2018

You need these:

#19447
#19485

(ProTip! Add .patch or .diff to the end of URLs for Git’s plaintext views.)

@chenliuzhong

This comment has been minimized.

Contributor

chenliuzhong commented Feb 8, 2018

@trociny OK ,thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment