New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw multisite: feature of bucket sync enable/disable #15801

Merged
merged 5 commits into from Aug 1, 2017

Conversation

Projects
None yet
2 participants
@zhangsw
Contributor

zhangsw commented Jun 21, 2017

I rebase #10995 for reviewing and testing.

use bucket info'flag to store bucket sync flag, before adding the data log, we check the bucket flag.

add two rgw-admin command
radosgw-admin bucket sync enable
radosgw-admin bucket sync disable

Signed-off-by: Zengran Zhang zhangzengran@h3c.com

@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jun 21, 2017

@cbodley I've rebased that PR.

@cbodley cbodley self-requested a review Jun 21, 2017

@@ -356,6 +356,8 @@ enum {
OPT_BUCKET_SYNC_STATUS,
OPT_BUCKET_SYNC_INIT,
OPT_BUCKET_SYNC_RUN,
OPT_BUCKET_SYNC_DISABLE,
OPT_BUCKET_SYNC_ENABLE,

This comment has been minimized.

@cbodley

cbodley Jun 21, 2017

Contributor

these new commands will need to be added to _usage() and src/test/cli/radosgw-admin/help.t

if (entry.id > header.max_marker)
header.max_marker = entry.id;
header.syncstopped = false;

This comment has been minimized.

@cbodley

cbodley Jun 21, 2017

Contributor

i think we're still missing the part that avoids writing new entries to the bucket index log while sync is stopped (that means avoiding calls to log_index_operation())

This comment has been minimized.

@cbodley

cbodley Jun 23, 2017

Contributor

followup: all of these calls to log_index_operation() have access to rgw_bucket_dir_header::syncstopped, so you should be able to just do this:

-  if (op.log_op) {
+  if (op.log_op && !header.syncstopped) {
     rc = log_index_operation(hctx, op.key, op.op, op.tag, entry.meta.mtime,

instead of passing bucket_info.datasync_flag_enabled() all the way down through RGWRados

This comment has been minimized.

@zhangsw

zhangsw Jun 26, 2017

Contributor

Yes~ that's elegant. I've updated it.

@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jun 23, 2017

jenkins test this please

@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jun 23, 2017

@cbodley I've modified it according to your suggestion.

@@ -538,6 +549,37 @@ class RGWSimpleRadosUnlockCR : public RGWSimpleCoroutine {
int request_complete() override;
};
class RGWSimpleRadosDeleteCR : public RGWSimpleCoroutine {

This comment has been minimized.

@cbodley

cbodley Jun 28, 2017

Contributor

there's a RGWRadosRemoveCR now, can you use that instead? RGWSimpleRadosDeleteCR and RGWAsyncDeleteSystemObj shouldn't be needed

Zengran Zhang and others added some commits Dec 7, 2016

rgw multisite: feature of bucket sync enable/disable
use bucket info'flag to store bucket sync flag, before adding the data log, we check the bucket flag.

add two rgw-admin command
radosgw-admin bucket sync enable
radosgw-admin bucket sync disable

Signed-off-by: Zengran Zhang <zhangzengran@h3c.com>
rgw multisite: add command to radosgw-admin. Stop writing new entries to
bucket index log while bucket sync is stopped.

Signed-off-by: Zhang Shaowen <zhangshaowen@cmss.chinamobile.com>
test/rgw: add test_bucket_sync_disable_enable
Signed-off-by: Zhang Shaowen <zhangshaowen@cmss.chinamobile.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley

This comment has been minimized.

Contributor

cbodley commented Jun 29, 2017

@zhangsw I updated your test case from #14218 and pushed to cbodley@de5f616, could you pull that into this pr? it's passing consistently for me with up to 6 zones

there are a couple cases that I'd like to get better coverage on, though:

  • after disable, testing that the sync status objects actually get deleted
  • when RGWBucketShardIncrementalSyncCR sees both SYNCSTOP and RESYNC in the same batch of entries
rgw multisite: remove RGWSimpleRadosDeleteCR and RGWAsyncDeleteSystem…
…Obj.

Use RGWRadosRemoveCR instead.

Signed-off-by: Zhang Shaowen <zhangshaowen@cmss.chinamobile.com>
@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jul 6, 2017

@cbodley I add two test case: enable sync right after disable sync, check bucket sync status obj

@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jul 7, 2017

jenkins test this please

yield {
const string& oid = RGWBucketSyncStatusManager::status_oid(sync_env->source_zone, bs);
RGWRados *store = sync_env->store;
call(new RGWRadosRemoveCR(store, rgw_raw_obj{store->get_zone_params().log_pool, oid}));

This comment has been minimized.

@cbodley

cbodley Jul 10, 2017

Contributor

directly after this [delete] op succeeds, i see a [call lock.unlock] op fail with -ENOENT. can you try doing a lease_cr->abort() here to avoid sending that unlock?

@cbodley

tests are passing with rgw override bucket index max shards = 8 - looking good

zone.cluster.admin(cmd)
def check_bucket_sync_status_obj_not_exist(zone, bucket_name):
cmd = ['log', 'list'] + zone.zone_arg() + ['|grep ', bucket_name]

This comment has been minimized.

@cbodley

cbodley Jul 10, 2017

Contributor

i worry about the |grep part here. adding read_only=True to the zone.cluster.admin() call below makes it fail, because the --rgw-cache-enabled false part gets added after the |. it does work as is with test_multi.py - i'll have to see if it works in teuthology

for bucket_name in buckets:
for zone in zonegroup.zones:
check_bucket_sync_status_obj_not_exist(zone, bucket_name)

This comment has been minimized.

@cbodley

cbodley Jul 10, 2017

Contributor

it's probably better to just loop over zones here, and grep the log list output for all bucket names at the same time

@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jul 13, 2017

@cbodley I've updated.

a = '|'
cmd = ['log', 'list'] + zone.zone_arg() + ['|grep ', '"' + a.join(buckets) + '"']
for _ in range(config.checkpoint_retries):
log_list, ret = zone.cluster.admin(cmd, check_retcode=False)

This comment has been minimized.

@cbodley

cbodley Jul 18, 2017

Contributor

yeah, teuthology is not handling this part well: example results

the Cluster implementation for teuthology is adding extra --cluster and --debug-rgw arguments after the pipe (and since this happens in a loop, those arguments get appended multiple times: --cluster c1 --debug-rgw 0 --cluster c1 --debug-rgw 0 --cluster c1 --debug-rgw 0 ...)

as gross as it is, can you remove the |grep part, and search the log_list string manually? it should also verify that the object name starts with bucket.sync-status. in the run above, i see that it matched an object named 2017-07-18-14-c772d205-b341-46d5-af49-43c21450c74f.4196.9-hoxbdk-35 - i'm not sure where that came from

then we can also add read_only=True to zone.cluster.admin()

@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jul 24, 2017

@cbodley I've updated it.

def check_buckets_sync_status_obj_not_exist(zone, buckets):
cmd = ['log', 'list'] + zone.zone_arg()
for _ in range(config.checkpoint_retries):
log_list, ret = zone.cluster.admin(cmd, check_retcode=False)

This comment has been minimized.

@cbodley

cbodley Jul 24, 2017

Contributor
  • could you move the cmd = assignment inside the for loop, so teuthology doesn't keep appending extra --cluster c1 --debug-rgw 0 args?
  • add a read_only=True arg for zone.cluster.admin()
rgw multisite: add two bucket sync test case: enable right after disa…
…ble,

sync status obj deleted after disable sync.

Signed-off-by: Zhang Shaowen <zhangshaowen@cmss.chinamobile.com>
@zhangsw

This comment has been minimized.

Contributor

zhangsw commented Jul 26, 2017

jenkins test this please

@cbodley

This comment has been minimized.

Contributor

cbodley commented Jul 26, 2017

thanks, i'll start another teuthology run 👍

@cbodley

This comment has been minimized.

Contributor

cbodley commented Aug 1, 2017

thanks, the test changes are working in teuthology. the failures there are unrelated

@cbodley

cbodley approved these changes Aug 1, 2017

@cbodley cbodley merged commit 14aed80 into ceph:master Aug 1, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@zhangsw zhangsw deleted the zhangsw:feature-bucket-sync branch Aug 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment