
rgw: fix error handling in ListBucketIndexesCR #18198

Merged
merged 3 commits into from Oct 26, 2017

Conversation

cbodley (Contributor) commented Oct 9, 2017

The call to set_state() returns 0, but we want operate() to return the
error code instead. Use set_cr_error() to do this.

Fixes: http://tracker.ceph.com/issues/21735

cbodley (Contributor, Author) commented Oct 9, 2017

To test, I used this diff to inject an error in RGWListBucketIndexesCR::operate():

   int operate() override {
     reenter(this) {
+#if 0
       yield {
         string entrypoint = string("/admin/metadata/bucket.instance");
         /* FIXME: need a better scaling solution here, requires streaming output */
         call(new RGWReadRESTResourceCR<list<string> >(store->ctx(), sync_env->conn, sync_env->http_manager,
                                                       entrypoint, NULL, &result));
       }
+#else
+      set_retcode(-EINVAL); // inject failure
+#endif

Without the fix applied, data sync status got stuck on full sync:

 data sync source: 6e5c3ecc-50ea-4a1f-a6b1-ecfbbb2bd259 (na-1)
                   syncing
                   full sync: 128/128 shards
                   full sync: 0 buckets to sync
                   incremental sync: 0/128 shards
                   data is behind on 128 shards

With the fix applied, data sync status instead stays in the correct 'preparing for full sync' state:

 data sync source: f5628297-2ae2-4a62-a02a-771737830af8 (na-1)
                   preparing for full sync
                   full sync: 1/1 shards
                   full sync: 0 buckets to sync
                   incremental sync: 0/1 shards
                   data is behind on 1 shards

With the change to RGWDataSyncControlCR, these errors are retried with backoff instead of returning control and stopping the data sync thread.

yuriw (Contributor) commented Oct 10, 2017

yuriw (Contributor) commented Oct 11, 2017

Removed tags per @cbodley; see test results above.

If the metadata listing fails, we won't have to clean up entries_index.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
The call to set_state() returns 0, but we want operate() to return the
error code instead. Use set_cr_error() to do this.

Fixes: http://tracker.ceph.com/issues/21735

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Similar to RGWMetaSyncShardControlCR, we don't want to exit and
stop the data sync processor thread on failures; we want to keep
retrying with backoff.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
yuriw (Contributor) commented Oct 23, 2017

@mattbenjamin mattbenjamin merged commit 664ad1a into ceph:master Oct 26, 2017
@cbodley cbodley deleted the wip-21735 branch October 26, 2017 17:42