
rgw: improve sync status #19573

Merged
merged 4 commits into ceph:master from Leeshine:wip-rgw-sync-status on Feb 15, 2018

Conversation

@Leeshine
Contributor

commented Dec 18, 2017

improve sync status in radosgw-admin:

  • display behind shards when executing radosgw-admin sync status
  • display errors in the sync error list when syncing an object fails
  • do not update the marker when sync fails
  • display error data log shards when executing radosgw-admin sync status
  • display lagging bucket shards related to behind shards and error shards
  • display recovering data log shards when executing radosgw-admin sync status

Signed-off-by: lvshanchun lvshanchun@gmail.com

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from 9c6f0c0 to 3862dad Dec 18, 2017

@Leeshine

Contributor Author

commented Dec 18, 2017

jenkins test this please

@Leeshine

Contributor Author

commented Dec 18, 2017

@oritwas would you mind taking a look?

@cbodley cbodley added the rgw label Dec 18, 2017

@@ -1057,7 +1057,7 @@ class RGWDataSyncSingleEntryCR : public RGWCoroutine {
sync_env->observer->on_bucket_changed(bs.bucket.get_key());
}
/* FIXME: what do do in case of error */
if (marker_tracker && !entry_marker.empty()) {
if (marker_tracker && !entry_marker.empty() && sync_status == 0) {

cbodley Dec 18, 2017

Contributor

i think we do still want to update the marker here, so that an error from one bucket shard doesn't prevent progress on bucket shards that come later in the datalog

instead, we write this key to the error_repo, and RGWDataSyncShardCR::incremental_sync() will continue to retry it (see /* process bucket shards that previously failed */), while also making progress on the next bucket shards in the datalog
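For illustration, a minimal self-contained sketch of that flow; this is plain C++ modeling the idea, not the actual coroutine code, and the names (ShardSyncModel, sync_one, the in-memory error repo) are assumptions made for the example:

#include <functional>
#include <iterator>
#include <set>
#include <string>

// a toy model of the datalog shard loop: the marker always advances, and keys that
// fail to sync are parked in an error repo for later retries, so one bad bucket shard
// never blocks the bucket shards that come after it in the datalog
struct ShardSyncModel {
  std::string marker;                 // high-water mark of processed datalog entries
  std::set<std::string> error_repo;   // keys that failed and still need to be retried

  void process_entry(const std::string& key, const std::string& entry_marker,
                     const std::function<int(const std::string&)>& sync_one) {
    if (sync_one(key) < 0) {
      error_repo.insert(key);         // remember the failure instead of halting
    }
    marker = entry_marker;            // advance regardless of the result
  }

  void retry_previously_failed(const std::function<int(const std::string&)>& sync_one) {
    for (auto it = error_repo.begin(); it != error_repo.end(); ) {
      it = (sync_one(*it) == 0) ? error_repo.erase(it) : std::next(it);
    }
  }
};

int main() {
  ShardSyncModel shard;
  auto sync_one = [](const std::string& key) { return key == "bad-bucket:3" ? -1 : 0; };
  shard.process_entry("good-bucket:0", "marker-1", sync_one);
  shard.process_entry("bad-bucket:3", "marker-2", sync_one);   // fails, parked in error_repo
  shard.process_entry("good-bucket:7", "marker-3", sync_one);  // still makes progress
  shard.retry_previously_failed(sync_one);                     // retried later
}

The real logic lives in RGWDataSyncSingleEntryCR and RGWDataSyncShardCR::incremental_sync(); the sketch only shows why advancing the marker and recording the failure are compatible.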

Leeshine Dec 19, 2017

Author Contributor

Thank you for the kind review. To be honest, not updating the marker here has a lot of side effects; I pushed this commit so that we could discuss it with all of you. Instead of this implementation, maybe we can judge whether a shard has failed in get_data_sync_status() in rgw_admin.cc?

cbodley Dec 19, 2017

Contributor

not updating the marker here has a lot of side effects

i agree that it makes introspection more difficult - we might add some information to the sync status command about the number of entries in each shard's error_repo. are there other side effects that you worry about?

cc @yehudasa @oritwas

Leeshine Dec 25, 2017

Author Contributor

Getting the failed shard_id and the number of entries in the error_repo seems not too difficult, but getting the oldest change not applied for a failed shard seems harder. I hope you can all give some suggestions on how to implement it in an appropriate way.
@cbodley @oritwas @yehudasa

shards_behind_str.append(*it + ", ");
}
shards_behind_str.erase(shards_behind_str.size()-2);
shards_behind_str.append("]");

cbodley Dec 18, 2017

Contributor

it looks like this could be simplified by using a std::set<int>, and taking advantage of its operator<< defined in include/types.h:

std::set<int> shards = {0,1,2,3};
std::cout << '[' << shards << ']'; // prints [0,1,2,3]
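For context, a self-contained version of that simplification applied to the shards_behind_str code above; the templated operator<< below stands in for the one ceph defines in include/types.h, and shards_behind is a made-up example set:

#include <iostream>
#include <set>
#include <sstream>

// stand-in for the generic operator<< that include/types.h provides for std::set
template <typename T>
std::ostream& operator<<(std::ostream& out, const std::set<T>& s) {
  bool first = true;
  for (const auto& v : s) {
    if (!first) out << ",";
    out << v;
    first = false;
  }
  return out;
}

int main() {
  std::set<int> shards_behind = {29, 57, 104};
  std::ostringstream ss;
  ss << "[" << shards_behind << "]";   // replaces the manual append()/erase() bookkeeping
  std::cout << ss.str() << std::endl;  // prints [29,57,104]
}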

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch 2 times, most recently from 81ddd29 to 84f8b06 Dec 28, 2017

@Leeshine

Contributor Author

commented Dec 28, 2017

Now it will display lagging and error shards in radosgw-admin sync status, as below:

[root@tencent64 build]# bin/radosgw-admin sync status --rgw-zone=zone1  
          realm 9061760c-9820-42b8-a782-d277a1338d07 (earth)
      zonegroup 2f8830fb-7afb-4cd0-abfd-51cab29ba0f6 (us)
           zone 7acd39bd-70c1-4afa-9e2f-dc9cbc8011b8 (zone1)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 1 shards
                behind shards: [57]
                oldest incremental change not applied: 2017-12-28 10:21:16.0.774857s
      data sync source: bc3cf7fe-fa03-431c-b417-139c3c37ad33 (zone0)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        1 shards have encountered errors
                        error shards: [104]

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from 84f8b06 to 3c882bb Dec 29, 2017

@Leeshine

Contributor Author

commented Dec 29, 2017

Now, when --sync-detail is specified for radosgw-admin sync status, it will display the lagging bucket shards related to the behind log shards and error log shards, as below:

[root@tencent64 build]# bin/radosgw-admin sync status --rgw-zone=zone1 --sync-detail
          realm 22432a03-2ef9-4fe2-96fe-b47b106aa3d9 (earth)
      zonegroup 1f99eab5-0d81-4430-ac8b-86a0a64787ef (us)
           zone b659bb91-74e4-4050-8e10-6ca36c962523 (zone1)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 2 shards
                behind shards: [29, 57]
                oldest incremental change not applied: 2017-12-29 20:28:13.0.58078s
      data sync source: cffa4822-5131-4678-a371-7cf9d6078695 (zone0)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 2 shards
                        behind shards: [104, 114]
                        oldest incremental change not applied: 2017-12-29 20:32:59.0.53065s
                        1 shards have encountered errors
                        error shards: [104]
                        data is lagging on 2 buckets
                        lagging buckets: buk1:cffa4822-5131-4678-a371-7cf9d6078695.4121.2
                                         buk2:cffa4822-5131-4678-a371-7cf9d6078695.4121.1

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from 3c882bb to c77de5d Dec 29, 2017

@Leeshine

Contributor Author

commented Jan 2, 2018

@cbodley ping

for (auto entry : log_entries_map) {
list<rgw_data_change_log_entry> log_entries = entry.second;
for (list<rgw_data_change_log_entry>::iterator log_iter = log_entries.begin(); log_iter != log_entries.end(); ++log_iter) {
ldout(sync_env->cct, 10 ) << "TESTLV " << log_iter->entry.key << dendl;

liuchang0812 Jan 2, 2018

Contributor

TESTLV should be removed

}
for (auto entry : log_entries_map) {
list<rgw_data_change_log_entry> log_entries = entry.second;
for (list<rgw_data_change_log_entry>::iterator log_iter = log_entries.begin(); log_iter != log_entries.end(); ++log_iter) {

liuchang0812 Jan 2, 2018

Contributor

for (const auto& log_iter : log_entries)
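A self-contained sketch of that rewrite; the log_entry_t struct below is a trivial stand-in for rgw_data_change_log_entry, just to show the loop shape:

#include <iostream>
#include <list>
#include <map>
#include <string>

// trivial stand-in for rgw_data_change_log_entry, only to demonstrate the loop
struct log_entry_t {
  std::string key;   // stand-in for entry.key
};

int main() {
  std::map<int, std::list<log_entry_t>> log_entries_map = {
    {0, {{"bucket1:shard0"}, {"bucket2:shard0"}}},
  };

  // range-for by const reference instead of spelling out iterator types;
  // neither the map entries nor the log entries are copied
  for (const auto& entry : log_entries_map) {
    for (const auto& log_entry : entry.second) {
      std::cout << log_entry.key << std::endl;
    }
  }
}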

@@ -2713,6 +2735,8 @@ int main(int argc, const char **argv)
// do nothing
} else if (ceph_argparse_binary_flag(args, i, &sync_stats, NULL, "--sync-stats", (char*)NULL)) {
// do nothing
} else if (ceph_argparse_binary_flag(args, i, &sync_detail, NULL, "--sync-detail", (char*)NULL)) {

liuchang0812 Jan 2, 2018

Contributor

/doc/man/8/radosgw-admin.rst should be updated.

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch 3 times, most recently from 27037e6 to 08cbb15 Jan 2, 2018

@cbodley
Contributor

left a comment

excellent work overall!

@@ -222,6 +238,42 @@ class RGWSimpleRadosReadCR : public RGWSimpleCoroutine {
}
};

class RGWSimpleOmapKeysReadCR : public RGWSimpleCoroutine {

cbodley Jan 3, 2018

Contributor

can you use the existing RGWRadosGetOmapKeysCR instead?

Leeshine Jan 7, 2018

Author Contributor

@cbodley RGWRadosGetOmapKeysCR will return -EIO rather than -ENOENT if the related rados object does not exist. When we use it in RGWReadDataSyncErrorShardsCR, that CR will fail, since its superclass RGWShardCollectCR always expects a child CR to succeed or return -ENOENT. So maybe we should use RGWSimpleOmapKeysReadCR here, as its retcode will be -ENOENT, or just make some change to the existing RGWRadosGetOmapKeysCR? The related code is below:

int RGWShardCollectCR::operate() {
  reenter(this) {
    while (spawn_next()) {
      current_running++;

      while (current_running >= max_concurrent) {
        int child_ret;
        yield wait_for_child();
        if (collect_next(&child_ret)) {
          current_running--;
          if (child_ret < 0 && child_ret != -ENOENT) {
            ldout(cct, 10) << __func__ << ": failed to fetch log status, ret=" << child_ret << dendl;
            status = child_ret;
          }
        }
      }
    }
    while (current_running > 0) {
      int child_ret;
      yield wait_for_child();
      if (collect_next(&child_ret)) {
        current_running--;
        if (child_ret < 0 && child_ret != -ENOENT) {
          ldout(cct, 10) << __func__ << ": failed to fetch log status, ret=" << child_ret << dendl;
          status = child_ret;
        }
      }
    }
    if (status < 0) {
      return set_cr_error(status);
    }
    return set_cr_done();
  }
  return 0;
}

cbodley Jan 8, 2018

Contributor

can you tell where EIO is coming from in that case? i don't see much difference between RGWRadosGetOmapKeysCR and the new RGWSimpleOmapKeysReadCR, except that yours is calling the synchronous version of ioctx.omap_get_vals() where RGWRadosGetOmapKeysCR::send_request() is using omap_get_vals2() with aio_operate()

Leeshine Jan 9, 2018

Author Contributor

@cbodley Both RGWSimpleOmapKeysReadCR and RGWRadosGetOmapKeysCR call librados::ObjectReadOperation::omap_get_vals2(), but RGWRadosGetOmapKeysCR calls it with rval specified; if the related rados object does not exist, RGWRadosGetOmapKeysCR will return that rval. The related code is below:

//rgw_cr_rados.cc
int RGWRadosGetOmapKeysCR::send_request() {
  int r = store->get_raw_obj_ref(obj, &ref);
  if (r < 0) {
    lderr(store->ctx()) << "ERROR: failed to get ref for (" << obj << ") ret=" << r << dendl;
    return r;
  }

  set_status() << "send request";

  librados::ObjectReadOperation op;
  op.omap_get_vals2(marker, max_entries, entries, nullptr, &rval);

  cn = stack->create_completion_notifier();
  return ref.ioctx.aio_operate(ref.oid, cn->completion(), &op, NULL);
}
//rgw_cr_rados.h
int request_complete() override {
    return rval;
  }

But deeper in the call chain of RGWSimpleOmapKeysReadCR, we can see that it calls omap_get_vals2() with rval not specified; if the related rados object does not exist, the return value of operate(oid, &op, &bl) is passed on instead. The related code is below:

//librados.cc
int librados::IoCtx::omap_get_vals(const std::string& oid,
                                   const std::string& orig_start_after,
                                   const std::string& filter_prefix,
                                   uint64_t max_return,
                                   std::map<std::string, bufferlist> *out_vals)
{
  bool first = true;
  string start_after = orig_start_after;
  bool more = true;
  while (max_return > 0 && more) {
    std::map<std::string,bufferlist> out;
    ObjectReadOperation op;
    op.omap_get_vals2(start_after, filter_prefix, max_return, &out, &more,
		      nullptr);
    bufferlist bl;
    int ret = operate(oid, &op, &bl);
    if (ret < 0) {
      return ret;
    }
    if (more) {
      if (out.empty()) {
	return -EINVAL;  // wth
      }
      start_after = out.rbegin()->first;
    }
    if (out.size() <= max_return) {
      max_return -= out.size();
    } else {
      max_return = 0;
    }
    if (first) {
      out_vals->swap(out);
      first = false;
    } else {
      out_vals->insert(out.begin(), out.end());
      out.clear();
    }
  }
  return 0;
}

In order to analyze the rval passed back by the osd, we specified rval in RGWSimpleOmapKeysReadCR's call chain and printed it, as below:

[root@0deae4d9bb38 librados]# git diff librados.cc 
diff --git a/src/librados/librados.cc b/src/librados/librados.cc
index e9157ab..d1789d7 100644
--- a/src/librados/librados.cc
+++ b/src/librados/librados.cc
@@ -396,10 +396,12 @@ int librados::IoCtx::omap_get_vals(const std::string& oid,
   while (max_return > 0 && more) {
     std::map<std::string,bufferlist> out;
     ObjectReadOperation op;
+    int val;
     op.omap_get_vals2(start_after, filter_prefix, max_return, &out, &more,
-                     nullptr);
+                     &val);
     bufferlist bl;
     int ret = operate(oid, &op, &bl);
+    std::cout << cpp_strerror(ret) << " : " << cpp_strerror(val) << std::endl;
     if (ret < 0) {
       return ret;
     }

With that change, the output is as below:

[root@0deae4d9bb38 build]# rgw sync status --rgw-zone=zone1 --debug-rgw=20 2>kk
          realm d642c3d5-eea7-4a55-8a17-bf6296cf4548 (earth)
      zonegroup 38a8d75d-c834-4987-a78f-e615328fa663 (us)
           zone 71b1be97-96d1-4e0a-81e2-c5e430a186a7 (zone1)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
....
(2) No such file or directory : (5) Input/output error
(0) Success : (0) Success
(2) No such file or directory : (5) Input/output error
....
      data sync source: 9a4a40f0-9b31-4820-a5a8-6be46155f3e6 (zone0)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        1 shards are recovering
                        recovering shards: [104]

So, if the related rados object does not exist, the rval from the osd is always -EIO (besides this, we can also see it if we display the retcode of RGWRadosGetOmapKeysCR in rgw_data_sync.cc when processing bucket shards that previously failed), so maybe we should fix RGWRadosGetOmapKeysCR so that it returns -ENOENT instead of -EIO.

Leeshine Jan 9, 2018

Author Contributor

Why the rval of librados::ObjectReadOperation::omap_get_vals2() ends up as -EIO is not exactly clear to us yet; @liuchang0812 and I will take a deeper look into it later on. But if we use rados listomapkeys on a nonexistent object, the retcode is -ENOENT, as below:

[root@0deae4d9bb38 build]# bin/rados -p zone1.rgw.log listomapkeys non-exist-object 
error getting omap key set zone1.rgw.log/non-exist-object: (2) No such file or directory

So maybe the reasonable rval in omap_get_vals2() should be -ENOENT, so that it stays consistent with the osds.
@cbodley @yehudasa @oritwas hope you can give some suggestions on it

cbodley Jan 9, 2018

Contributor

it looks like that EIO is coming from C_ObjectOperation_decodevals::finish() in Objecter.h. i see the osd replying to the op with ENOENT, but finish() gets called with r=0 and an empty bufferlist. trying to decode that bufferlist results in EIO
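A minimal sketch of that failure mode with simplified stand-ins; the real code is C_ObjectOperation_decodevals::finish() in Objecter.h decoding a ceph bufferlist, and the names below are illustrative only:

#include <cerrno>
#include <stdexcept>
#include <string>

struct decode_error : std::runtime_error {
  decode_error() : std::runtime_error("end of buffer") {}
};

// stand-in for decoding the omap reply payload: an empty payload cannot be decoded
void decode_omap_values(const std::string& payload) {
  if (payload.empty()) {
    throw decode_error();
  }
  // ... decode the key/value entries ...
}

// stand-in for the completion callback: the op "succeeded" (r == 0) even though the
// osd answered ENOENT, so the payload is empty and the decode failure surfaces as -EIO
int finish_omap_read(int r, const std::string& payload) {
  if (r == 0) {
    try {
      decode_omap_values(payload);
    } catch (const decode_error&) {
      r = -EIO;
    }
  }
  return r;
}

int main() {
  return finish_omap_read(0, "") == -EIO ? 0 : 1;  // empty reply surfaces as -EIO
}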

cbodley Jan 9, 2018

Contributor

please see #19878

@@ -279,7 +282,8 @@ class RGWRemoteDataLog : public RGWCoroutinesManager {
int read_log_info(rgw_datalog_info *log_info);
int read_source_log_shards_info(map<int, RGWDataChangesLogInfo> *shards_info);
int read_source_log_shards_next(map<int, string> shard_markers, map<int, rgw_datalog_shard_data> *result);
int read_sync_status(rgw_data_sync_status *sync_status);
int read_sync_status(rgw_data_sync_status *sync_status, bool read_error);

cbodley Jan 3, 2018

Contributor

i think it would make more sense to split this into a separate function like read_error_shards(set<int>&), instead of extending read_sync_status() and struct rgw_data_sync_status

@@ -279,7 +282,8 @@ class RGWRemoteDataLog : public RGWCoroutinesManager {
int read_log_info(rgw_datalog_info *log_info);
int read_source_log_shards_info(map<int, RGWDataChangesLogInfo> *shards_info);
int read_source_log_shards_next(map<int, string> shard_markers, map<int, rgw_datalog_shard_data> *result);
int read_sync_status(rgw_data_sync_status *sync_status);
int read_sync_status(rgw_data_sync_status *sync_status, bool read_error);
int read_lagging_buckets(rgw_data_sync_status *sync_status, map<int, string>& shard_markers, set<int>& error_shards);

cbodley Jan 3, 2018

Contributor

the first argument can just be set<string>& lagging_buckets instead of adding it to rgw_data_sync_status


if (error_shards.size() > 0) {
push_ss(ss, status, tab) << error_shards.size() << " shards have encountered errors";
push_ss(ss, status, tab) << "error shards: " << "[" << error_shards << "]";

cbodley Jan 3, 2018

Contributor

i'd rather we didn't call these "errors" here, since the majority of them are likely to be from EBUSY and will be retried without admin intervention. can we use different terms, like maybe " shards are recovering from errors" and "recovering shards:"?

}

if (sync_detail) {
ret = sync.read_lagging_buckets(&sync_status, shards_behind, error_shards); // read error shards

cbodley Jan 3, 2018

Contributor

i'm not sure how i feel about this 'lagging buckets' part

on the one hand, it's useful to see which buckets could be affected by the datalog shards that are behind. but on the other hand, it's the bucket's sync status that tells whether the bucket is actually behind. so there will be many cases where buckets have finished sync but are reported as laggy here because there are still some datalog entries referring to them (this would often be due to failures in other buckets on the same datalog shard that delay progress). so i worry that false positives here will confuse the admin, and they'll assume that they need to do something to fix all of those buckets

it's also fairly expensive to compute - first, RGWReadRemoteDataLogCR will read up to 1000 datalog entries from each shard (so up to 128000 entries in memory), then RGWReadDataSyncErrorShardsCR reads all of the error entries (which is not bounded in size). in large deployments, this is likely to either crash radosgw-admin, or just take a really long time and produce too much output

Leeshine Jan 4, 2018

Author Contributor

Thank you for the kind review. Maybe using just the bucket's sync status cannot find all of the lagging bucket shards (i.e. where the related bucket instance hasn't been synced yet). Instead of this lagging buckets part, maybe we can do the following:

  1. get the behind data log shards and the error data log shards
  2. for every behind data log shard, read the related bucket instances, then use the bucket's sync status to judge whether the bucket is actually behind
  3. for every error data log shard, just read the error bucket shard records in the related *.retry object

any suggestions are appreciated!

yehudasa Jan 4, 2018

Member

I agree with @cbodley here. I think the proper way to do it would be to flip the responsibility: instead of having radosgw-admin query the status of buckets, have radosgw report the status somewhere durable when the state changes. It is already being done for the sync trace node info, just that we currently only keep it in memory.

cbodley Jan 4, 2018

Contributor

@Leeshine i would really like to see improvements to the bucket sync status command so it can query the remote zones and compare their markers with the local bucket sync status. i think that would be the ideal way to inspect a single bucket/shard

so maybe the overall approach should be to start with a sync status command that tells which datalog shards are behind (as you've done). then have some kind of data sync status --shard=x command that lists which bucket shards are left to process (this 'lagging buckets' part). then the admin could use bucket sync status to look into the status of each one

that way we don't make the sync status command itself too complicated or verbose, but provide more tools to inspect the status in detail. what do you guys think?

oritwas Jan 4, 2018

Contributor

@cbodley sounds good!

Leeshine Jan 5, 2018

Author Contributor

All right, I'll try working on it. @cbodley

cbodley Jan 5, 2018

Contributor

@Leeshine can we limit this pr to just the sync status changes (minus --sync-detail and 'lagging buckets') and follow up with the other commands separately? i think what you've done so far is close to being ready to merge

Leeshine Jan 7, 2018

Author Contributor

@cbodley sure

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from 08cbb15 to caa8286 Jan 8, 2018

@Leeshine

Contributor Author

commented Jan 8, 2018

@cbodley ping (limited this pr to just the sync status changes)

Leeshine added some commits Dec 18, 2017

rgw: add lagging shard ids in rgw sync status
Signed-off-by: lvshanchun <lvshanchun@gmail.com>
rgw: display errors of object sync failed in sync error list
Signed-off-by: lvshanchun <lvshanchun@gmail.com>

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from caa8286 to f22a217 Jan 17, 2018

@Leeshine

Contributor Author

commented Jan 17, 2018

@cbodley ping

[root@node43 /data/samonlv/ceph-dev/build]# bin/radosgw-admin sync status --rgw-zone=zone1
          realm 1f0890d7-123d-4919-bb55-d69d5fcc933d (earth)
      zonegroup 1ff058f2-e6c9-4c09-9551-24234bac59f8 (us)
           zone 6f6a8125-af64-4a5f-90db-564a331f63d7 (zone1)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 1 shards
                behind shards: [57]
                oldest incremental change not applied: 2018-01-17 15:49:58.0.405048s
      data sync source: 5b48656c-1868-49c3-bf6a-fdd517742983 (zone0)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        1 shards are recovering
                        recovering shards: [104]

public:
RGWReadDataSyncRecoveringShardsCR(RGWDataSyncEnv *env, uint64_t _max_entries, set<int>& _shards,
map<int, std::set<std::string>>& _entries_map)

cbodley Jan 17, 2018

Contributor

since the only caller is always requesting all shards from [0,num_shards), the use of set<int> shards seems like overkill - can you have it take int num_shards instead? see RGWReadDataSyncStatusMarkersCR as an example
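A sketch of what taking a plain count could look like, loosely following the shape of RGWReadDataSyncStatusMarkersCR; the class below is a simplified stand-in written for illustration, not the real coroutine:

#include <map>
#include <set>
#include <string>

// simplified stand-in for RGWReadDataSyncRecoveringShardsCR: instead of taking a
// set<int>& of shard ids, take the shard count and iterate over [0, num_shards)
class ReadRecoveringShardsModel {
  int num_shards;
  int shard_id = 0;                                   // iteration state for spawn_next()
  std::map<int, std::set<std::string>>& entries_map;  // per-shard error-repo keys (output)
public:
  ReadRecoveringShardsModel(int num_shards,
                            std::map<int, std::set<std::string>>& entries_map)
    : num_shards(num_shards), entries_map(entries_map) {}

  // stand-in for spawn_next(): returns false once every shard has been requested;
  // the real coroutine would spawn an omap-keys read of that shard's error repo here
  bool spawn_next() {
    if (shard_id >= num_shards) {
      return false;
    }
    entries_map[shard_id];   // ensure an entry exists for the shard being read
    ++shard_id;
    return true;
  }
};

int main() {
  std::map<int, std::set<std::string>> entries_map;
  ReadRecoveringShardsModel reader(128, entries_map);
  while (reader.spawn_next()) {
    // the real RGWShardCollectCR machinery would throttle and collect children here
  }
}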

int operate() override;
};

int RGWReadRecoveringShardsCoroutine::operate()

cbodley Jan 17, 2018

Contributor

since this coroutine is only making one asynchronous call to RGWReadDataSyncRecoveringShardsCR, with some extra logic for setup/teardown, it probably makes sense to move that extra logic into read_recovering_shards() so we only need the one RGWReadDataSyncRecoveringShardsCR coroutine

return set_cr_error(retcode);
}

for (auto entry : entries_map) {

cbodley Jan 17, 2018

Contributor

use of auto entry here will copy each entry by value, and this is expensive for std::set - prefer const auto& entry
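A small self-contained illustration of the copy being avoided; the map type mirrors the entries_map used in this code:

#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
  std::map<int, std::set<std::string>> entries_map = {
    {104, {"buk1:shard3", "buk2:shard0"}},
  };

  // copies the whole pair, including the std::set, on every iteration
  for (auto entry : entries_map) {
    std::cout << entry.first << " has " << entry.second.size() << " entries\n";
  }

  // binds a const reference instead: no per-iteration copy of the set
  for (const auto& entry : entries_map) {
    std::cout << entry.first << " has " << entry.second.size() << " entries\n";
  }
}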

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from f22a217 to 517ce71 Jan 18, 2018

@Leeshine

Contributor Author

commented Jan 18, 2018

@cbodley ping

@cbodley
Contributor

left a comment

one minor comment, but otherwise looks great 👍

@@ -672,6 +706,34 @@ int RGWRemoteDataLog::read_sync_status(rgw_data_sync_status *sync_status)
return ret;
}

int RGWRemoteDataLog::read_recovering_shards(rgw_data_sync_status *sync_status, set<int>& recovering_shards)

cbodley Jan 18, 2018

Contributor

it looks like this function only uses sync_status->sync_info.num_shards from sync_status, could you just pass int num_shards instead?
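A sketch of the narrowed signature; the types and the helper below are illustrative assumptions only, and the real method belongs to RGWRemoteDataLog:

#include <set>

// hypothetical stand-in for "does shard X have entries in its error repo?"
static bool shard_error_repo_nonempty(int shard) { return shard == 104; }

// after the suggested change: pass just the shard count instead of the whole
// rgw_data_sync_status, since only sync_info.num_shards was read from it
int read_recovering_shards(int num_shards, std::set<int>& recovering_shards)
{
  for (int shard = 0; shard < num_shards; ++shard) {
    if (shard_error_repo_nonempty(shard)) {
      recovering_shards.insert(shard);
    }
  }
  return 0;
}

int main() {
  std::set<int> recovering_shards;
  read_recovering_shards(128, recovering_shards);   // recovering_shards == {104}
}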

Leeshine added some commits Dec 28, 2017

rgw: add RGWReadDataSyncRecoveringShardsCR to read recovering shards
Signed-off-by: lvshanchun <lvshanchun@gmail.com>
rgw: display data sync recovering shards in radosgw-admin sync status
Signed-off-by: lvshanchun <lvshanchun@gmail.com>

@Leeshine Leeshine force-pushed the Leeshine:wip-rgw-sync-status branch from 517ce71 to 8c3ea26 Jan 19, 2018

@Leeshine

Contributor Author

commented Jan 19, 2018

@cbodley thank you for the kind review, I'll put up a PR to display lagging bucket shards and their sync detail later on

@cbodley

Contributor

commented Jan 19, 2018

@cbodley thank you for the kind review, I'll put up a PR to display lagging bucket shards and their sync detail later on

great. i've been meaning to create a tracker issue for the next steps, and add some more details about how it should look. i'll try to do that today

@cbodley cbodley added the needs-qa label Jan 19, 2018

@vumrao

Contributor

commented Jan 26, 2018

@Leeshine @cbodley I think we can use this old tracker - http://tracker.ceph.com/issues/20473

@ktdreyer ktdreyer requested a review from mattbenjamin Jan 29, 2018

@yuriw

Contributor

commented Feb 14, 2018

@yuriw yuriw merged commit bb4b98b into ceph:master Feb 15, 2018

5 checks passed

Docs: build check: OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
make check (arm64): make check succeeded

@Leeshine Leeshine deleted the Leeshine:wip-rgw-sync-status branch Feb 23, 2018
