rgw: fix PUT on versioned bucket fails with NoSuchKey by mkogan1 · Pull Request #38705 · ceph/ceph

mkogan1 · 2020-12-23T17:28:21Z

Fixes: https://tracker.ceph.com/issues/48709

Signed-off-by: Mark Kogan <mkogan@redhat.com>

Checklist

References tracker ticket
Updates documentation if necessary
Includes tests for new functionality or reproducer for bug



mkogan1 · 2020-12-23T18:54:45Z

Submitted the full teuthology rgw suite:
https://pulpito.ceph.com/mkogan-2020-12-23_18:51:19-rgw-wip-rgw-broken-pipe_i01-distro-basic-smithi/

mattbenjamin · 2020-12-23T19:22:52Z

src/rgw/rgw_rados.cc

-      ldout(store->ctx(), 5) << "failed to get BucketShard object: ret=" << ret << dendl;
+      if (ret == -ENOENT) {
+        ldout(store->ctx(), 5) << "failed to get BucketShard object: ret=" << ret << ", modifying to " << -ERR_INTERNAL_ERROR << dendl;
+        ret = -ERR_INTERNAL_ERROR;


The change on this path seems sane: I think nothing has happened yet at this point, so the request seems definitely retryable.
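The remapping in the hunk above can be sketched in isolation. This is a minimal sketch, not the RGW code: the numeric value of ERR_INTERNAL_ERROR below is a placeholder (RGW defines its own constant, which this PR does not show), and remap_bucket_shard_error() and is_client_retryable() are hypothetical helper names. The idea is that a transient -ENOENT from the bucket-shard lookup is surfaced as an internal (presumably 500-class, retryable) error rather than being reported to the client as a NoSuchKey "not found".

```cpp
#include <cassert>
#include <cerrno>

// Placeholder value for illustration only; RGW's real ERR_INTERNAL_ERROR
// constant is defined elsewhere and its value is not shown in this PR.
constexpr int ERR_INTERNAL_ERROR = 9999;

// Hypothetical helper mirroring the hunk: a missing bucket shard (-ENOENT)
// is rewritten to an internal error; every other result passes through.
int remap_bucket_shard_error(int ret) {
  if (ret == -ENOENT) {
    return -ERR_INTERNAL_ERROR;
  }
  return ret;
}

// Hypothetical classifier: the client is expected to retry internal errors,
// whereas a "not found" result would be treated as final.
bool is_client_retryable(int ret) {
  return ret == -ERR_INTERNAL_ERROR;
}
```

Whether this remapping is safe depends on the question raised in the review: it is only appropriate on paths where the request has had no side effects before the lookup failed.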

mattbenjamin · 2020-12-23T19:23:43Z

src/rgw/rgw_rados.cc

  int ret = get_bucket_shard(&bs);
  if (ret < 0) {
-    ldout(store->ctx(), 5) << "failed to get BucketShard object: ret=" << ret << dendl;
+    if (ret == -ENOENT) {


Less sure about this path: what request are we executing, and what has happened in the current request up to this point?

mattbenjamin · 2020-12-23T19:24:49Z

src/rgw/rgw_rados.cc

  for (int i = 0; i < NUM_RESHARD_RETRIES; ++i) {
    r = bs->init(pobj->bucket, *pobj, nullptr /* no RGWBucketInfo */);
    if (r < 0) {
-      ldout(cct, 5) << "bs.init() returned ret=" << r << dendl;


Assuming this is the same as RGWRados::Bucket::UpdateIndex::guard_reshard(); I didn't realize there were two guard_reshard() methods.

mattbenjamin · 2020-12-23T19:25:35Z

src/rgw/rgw_rados.cc

    bs.init(obj_instance.bucket, obj_instance, nullptr /* no RGWBucketInfo */);
  if (ret < 0) {
-    ldout(cct, 5) << "bs.init() returned ret=" << ret << dendl;
+    if (ret == -ENOENT) {


Here again, I'm unsure whether to trust that the current request has had no side effects and is safely retryable.

mattbenjamin · 2020-12-23T19:25:56Z

src/rgw/rgw_rados.cc

    bs.init(obj_instance.bucket, obj_instance, nullptr /* no RGWBucketInfo */);
  if (ret < 0) {
-    ldout(cct, 5) << "bs.init() returned ret=" << ret << dendl;
+    if (ret == -ENOENT) {


Ditto here: what has happened in the current request up to this point?

cbodley · 2021-01-05T18:26:43Z

This looks valid as a workaround, but I think we're still missing something here.

Once a reshard completes, guard_reshard() calls target->update_bucket_id() so that it can retry the op against the new bucket instance/index. We read that new bucket instance metadata into RGWRados::Bucket::bucket_info and use it for the retry.
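The retry pattern described above can be sketched as follows. This is a simplified sketch, not RGW's guard_reshard(): BucketInfo, fetch_new_bucket_id and ERR_BUSY_RESHARDING are hypothetical stand-ins, the value of NUM_RESHARD_RETRIES is assumed, and the real function also waits for the reshard to finish before refreshing the bucket id, which is omitted here.

```cpp
#include <cassert>
#include <functional>
#include <string>

constexpr int NUM_RESHARD_RETRIES = 10;    // assumed value for illustration
constexpr int ERR_BUSY_RESHARDING = 2300;  // hypothetical stand-in error code

// Hypothetical simplification of the bucket info: only the instance id.
struct BucketInfo {
  std::string bucket_id;
};

// Run `op` against the current bucket instance. If it reports that the
// bucket is being resharded, refresh the bucket id (the analogue of
// update_bucket_id()) and retry, up to NUM_RESHARD_RETRIES attempts.
int guard_reshard(BucketInfo& info,
                  const std::function<std::string()>& fetch_new_bucket_id,
                  const std::function<int(const BucketInfo&)>& op) {
  int r = 0;
  for (int i = 0; i < NUM_RESHARD_RETRIES; ++i) {
    r = op(info);
    if (r != -ERR_BUSY_RESHARDING) {
      return r;  // success, or an error unrelated to resharding
    }
    info.bucket_id = fetch_new_bucket_id();  // point at the new instance
  }
  return r;  // still busy after all retries
}
```

Note that the refresh only reaches callers that pass the same `info` object into later operations, which is the crux of the issue discussed next.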

But when writing versioned objects, there are multiple calls to guard_reshard() in RGWRados::Object::Write::_do_write_meta(). If the first one hits this update_bucket_id() case, the updated RGWRados::Bucket::bucket_info is only used for that retry; later calls to guard_reshard() still use the original, pre-reshard bucket_info, and so get ENOENT once the reshard finishes deleting it.

It sounds like everything under _do_write_meta() should take the bucket info as a mutable reference (RGWBucketInfo&), so that the call to update_bucket_id() applies throughout. Does that make sense?
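The difference between passing the bucket info by value and by mutable reference can be illustrated with a toy model. This is a hypothetical sketch, not RGW code: BucketInfo, index_op() and the instance ids are invented for illustration, and the two loop iterations stand in for the multiple guard_reshard() calls in _do_write_meta(). Each iteration counts how many index operations start from the pre-reshard instance id.

```cpp
#include <cassert>
#include <string>

// Hypothetical instance ids before and after the reshard.
const std::string kOldId = "instance-old";
const std::string kNewId = "instance-new";

// Hypothetical simplification of RGWBucketInfo: only the instance id.
struct BucketInfo {
  std::string bucket_id;
};

// Toy index operation: on seeing the old instance it refreshes the id it
// was given (the analogue of update_bucket_id()) before proceeding.
void index_op(BucketInfo& info) {
  if (info.bucket_id == kOldId) {
    info.bucket_id = kNewId;
  }
}

// Problematic shape: each step works on its own copy of the original info,
// so the refresh from the first step is lost and later steps start from the
// stale id again (in RGW, where ENOENT appears once the old index is gone).
int stale_starts_by_value(const BucketInfo& original) {
  int stale = 0;
  for (int step = 0; step < 2; ++step) {
    BucketInfo copy = original;
    if (copy.bucket_id == kOldId) ++stale;
    index_op(copy);
  }
  return stale;
}

// Suggested shape: all steps share one mutable reference, so only the first
// step sees the old id and every later step uses the refreshed one.
int stale_starts_by_ref(BucketInfo& shared) {
  int stale = 0;
  for (int step = 0; step < 2; ++step) {
    if (shared.bucket_id == kOldId) ++stale;
    index_op(shared);
  }
  return stale;
}
```

With a copy, both steps start from the old instance; with a shared reference, only the first one does, and that is the step whose retry already handles the reshard.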

mattbenjamin · 2021-01-05T18:31:35Z

@cbodley That sounds right, in that all uses after the reshard starts would then be uniform?

github-actions · 2021-05-05T16:57:58Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

stale · 2021-07-21T02:00:42Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

mkogan1 · 2021-08-05T08:55:18Z

Unstale please

stale · 2022-01-09T08:27:12Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

mkogan1 · 2022-03-16T15:53:01Z

Fixed by:

#45345 cls/rgw: rgw_dir_suggest_changes detects race with completion
#45300 rgw: Update "CEPH_RGW_DIR_SUGGEST_LOG_OP" for remove entries

rgw: fix PUT on versioned bucket fails with NoSuchKey

7c809a5


mkogan1 self-assigned this Dec 23, 2020

github-actions bot added the rgw label Dec 23, 2020

mkogan1 added bug-fix DNM needs-qa labels Dec 23, 2020

mkogan1 requested review from cbodley and mattbenjamin December 23, 2020 17:29

mattbenjamin reviewed Dec 23, 2020


github-actions bot added the needs-rebase label May 5, 2021

stale bot added the stale label Jul 21, 2021

stale bot removed the stale label Aug 5, 2021

stale bot added the stale label Jan 9, 2022

cbodley removed the needs-qa label Jan 27, 2022

stale bot removed the stale label Jan 27, 2022

mkogan1 closed this Mar 16, 2022
