
rgw: fix error handling in ListBucketIndexesCR #18198

Merged
merged 3 commits into from Oct 26, 2017

Conversation

cbodley (Contributor) commented Oct 9, 2017

The call to set_state() returns 0, but we want operate() to return the
error code instead. Use set_cr_error() to do this.

Fixes: http://tracker.ceph.com/issues/21735

cbodley (Contributor, Author) commented Oct 9, 2017

To test, I used this diff to inject an error in RGWListBucketIndexesCR::operate():

   int operate() override {
     reenter(this) {
+#if 0
       yield {
         string entrypoint = string("/admin/metadata/bucket.instance");
         /* FIXME: need a better scaling solution here, requires streaming output */
         call(new RGWReadRESTResourceCR<list<string> >(store->ctx(), sync_env->conn, sync_env->http_manager,
                                                       entrypoint, NULL, &result));
       }
+#else
+      set_retcode(-EINVAL); // inject failure
+#endif

Without the fix applied, data sync status got stuck on full sync:

 data sync source: 6e5c3ecc-50ea-4a1f-a6b1-ecfbbb2bd259 (na-1)
                   syncing
                   full sync: 128/128 shards
                   full sync: 0 buckets to sync
                   incremental sync: 0/128 shards
                   data is behind on 128 shards

With the fix applied, data sync status instead stays in the correct 'preparing for full sync' state:

 data sync source: f5628297-2ae2-4a62-a02a-771737830af8 (na-1)
                   preparing for full sync
                   full sync: 1/1 shards
                   full sync: 0 buckets to sync
                   incremental sync: 0/1 shards
                   data is behind on 1 shards

With the change to RGWDataSyncControlCR, these errors are retried with backoff instead of returning control and stopping the data sync thread.

yuriw (Contributor) commented Oct 10, 2017

yuriw (Contributor) commented Oct 11, 2017

Removed tags per @cbodley; see test results above.

If the metadata listing fails, we won't have to clean up entries_index.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
The call to set_state() returns 0, but we want operate() to return the
error code instead. Use set_cr_error() to do this.

Fixes: http://tracker.ceph.com/issues/21735

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Similar to RGWMetaSyncShardControlCR, we don't want to exit and
stop the data sync processor thread on failures; we want to keep
retrying with backoff.

Signed-off-by: Casey Bodley <cbodley@redhat.com>
yuriw (Contributor) commented Oct 23, 2017

@mattbenjamin mattbenjamin merged commit 664ad1a into ceph:master Oct 26, 2017
@cbodley cbodley deleted the wip-21735 branch October 26, 2017 17:42