
rgw: fix error handling in ListBucketIndexesCR #18198

Merged: 3 commits merged into ceph:master from cbodley:wip-21735 on Oct 26, 2017

@cbodley (Contributor) commented Oct 9, 2017

The call to set_state() returns 0, but we want operate() to return the
error code instead; use set_cr_error() to do this.

Fixes: http://tracker.ceph.com/issues/21735

@cbodley cbodley requested a review from yehudasa Oct 9, 2017

@cbodley (Contributor, Author) commented Oct 9, 2017
To test, I used this diff to inject an error in RGWListBucketIndexesCR::operate():

   int operate() override {
     reenter(this) {
+#if 0
       yield {
         string entrypoint = string("/admin/metadata/bucket.instance");
         /* FIXME: need a better scaling solution here, requires streaming output */
         call(new RGWReadRESTResourceCR<list<string> >(store->ctx(), sync_env->conn, sync_env->http_manager,
                                                       entrypoint, NULL, &result));
       }
+#else
+      set_retcode(-EINVAL); // inject failure
+#endif

Without the fix applied, data sync status got stuck on full sync:

 data sync source: 6e5c3ecc-50ea-4a1f-a6b1-ecfbbb2bd259 (na-1)
                   syncing
                   full sync: 128/128 shards
                   full sync: 0 buckets to sync
                   incremental sync: 0/128 shards
                   data is behind on 128 shards

With the fix applied, data sync status instead gets stuck in the correct 'preparing for full sync' state:

 data sync source: f5628297-2ae2-4a62-a02a-771737830af8 (na-1)
                   preparing for full sync
                   full sync: 1/1 shards
                   full sync: 0 buckets to sync
                   incremental sync: 0/1 shards
                   data is behind on 1 shards

With the change to RGWDataSyncControlCR, these errors are retried with backoff instead of returning control and stopping the data sync thread.

@yuriw (Contributor) commented Oct 10, 2017 (comment minimized)

@yuriw (Contributor) commented Oct 11, 2017:

Removed tags per @cbodley; see test results above.

cbodley added some commits Oct 9, 2017

rgw: ListBucketIndexesCR spawns entries_index after listing metadata
if the metadata listing fails, we won't have to clean up entries_index

Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: fix error handling in ListBucketIndexesCR
the call to set_state() returns 0, when we want operate() to return the
error code instead. use set_cr_error() to do this

Fixes: http://tracker.ceph.com/issues/21735

Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: RGWDataSyncControlCR retries on all errors
similar to RGWMetaSyncShardControlCR, we don't want to exit and
stop the data sync processor thread on failures. we want to keep
retrying with backoff

Signed-off-by: Casey Bodley <cbodley@redhat.com>

@cbodley cbodley force-pushed the cbodley:wip-21735 branch from 8bcf355 to 065e67b Oct 11, 2017

@yuriw (Contributor) commented Oct 23, 2017 (comment minimized)

@mattbenjamin mattbenjamin merged commit 664ad1a into ceph:master Oct 26, 2017

5 checks passed:

 Docs: build check OK - docs built
 Signed-off-by: all commits in this PR are signed
 Unmodified Submodules: submodules for project are unmodified
 make check: make check succeeded
 make check (arm64): make check succeeded

@cbodley cbodley deleted the cbodley:wip-21735 branch Oct 26, 2017
