Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: fix versioned bucket index logic so that incomplete index transactions can be fixed #55165

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cfsnyder
Copy link
Contributor

rgw: fix versioned bucket index logic so that incomplete index transactions can be fixed

When a versioned bucket index transaction does not complete normally,
it can leave behind index entries which cannot be cleaned up
by subsequent delete requests or by radosgw-admin commands. These
leftover index entries can also cause the bucket object count
to be incorrect in a manner that is not fixable by existing tooling.
This commit fixes the logic to allow those transactions to be
completed by subsequent deletes or by radosgw-admin commands.

Fixes: https://tracker.ceph.com/issues/64016
Signed-off-by: Cory Snyder csnyder@1111systems.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@cfsnyder cfsnyder requested a review from a team as a code owner January 12, 2024 21:26
@cfsnyder cfsnyder removed the request for review from a team January 12, 2024 21:27
@cfsnyder cfsnyder requested a review from a team January 12, 2024 21:27
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Copy link
Member

@ivancich ivancich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only one specific question. Also, a lot of technical debt has built up over time with long functions with intricate logic and very little documentation. If you're inspired and wouldn't mind helping out on that front, it'd be greatly appreciated.

if (r < 0)
return r;
if (exists) {
r = target->prepare_atomic_modification(dpp, op, false, NULL, NULL, NULL, true, false, y);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prepare_atomic_modification is protected by a check of exists. Yet I don't see a similar protection below in the corresponding call to complete_atomic_modification. Is that an issue or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check my analysis here, but it looks to me like the complete_atomic_modification method is cleaning up the tail of an object and it seems that it's possible for the tail to not be cleaned up (if the process is aborted or killed or the request thread is terminated early in some other way). I was thinking that the subsequent attempts to delete the object should again attempt to delete any tail objects, in case some were left behind. It's a good point though, and a scenario that we should probably test to make sure there are no surprises.

@cfsnyder
Copy link
Contributor Author

I had only one specific question. Also, a lot of technical debt has built up over time with long functions with intricate logic and very little documentation. If you're inspired and wouldn't mind helping out on that front, it'd be greatly appreciated.

If I can carve out some time, I'd be happy to do a little refactoring here and write some better documentation on how the versioned bucket index is implemented. I'll definitely try.

@cbodley
Copy link
Contributor

cbodley commented Feb 1, 2024

@cfsnyder can you please rebase?

…ith existing tooling

Adds test cases to reproduce a scenario where leftover versioned
bucket index entries are not fixable/removable by existing
radosgw-admin commands nor by subsequent delete requests.

Signed-off-by: Cory Snyder <csnyder@1111systems.com>
…ctions can be fixed

When a versioned bucket index transaction does not complete normally,
it can leave behind index entries which cannot be cleaned up
by subsequent delete requests or by radosgw-admin commands. These
leftover index entries can also cause the bucket object count
to be incorrect in a manner that is not fixable by existing tooling.
This commit fixes the logic to allow those transactions to be
completed by subsequent deletes or by radosgw-admin commands.

Fixes: https://tracker.ceph.com/issues/64016
Signed-off-by: Cory Snyder <csnyder@1111systems.com>
@cbodley
Copy link
Contributor

cbodley commented Feb 15, 2024

jenkins test windows

@cbodley
Copy link
Contributor

cbodley commented Feb 16, 2024

failed qa with ceph_test_cls_rgw failures (example teuthology.log):

2024-02-16T10:39:29.879 INFO:tasks.workunit.client.0.smithi049.stdout:[==========] Running 16 tests from 1 test suite.
2024-02-16T10:39:29.879 INFO:tasks.workunit.client.0.smithi049.stdout:[----------] Global test environment set-up.
2024-02-16T10:39:29.879 INFO:tasks.workunit.client.0.smithi049.stdout:[----------] 16 tests from cls_rgw
2024-02-16T10:39:31.738 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_basic
2024-02-16T10:39:31.783 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.index_basic (45 ms)
2024-02-16T10:39:31.783 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_multiple_obj_writers
2024-02-16T10:39:31.819 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.index_multiple_obj_writers (36 ms)
2024-02-16T10:39:31.819 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_remove_object
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-1461-g6a4516ad/rpm/el9/BUILD/ceph-19.0.0-1461-g6a4516ad/src/test/cls_rgw/test_cls_rgw.cc:94: Failure
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout:Expected equality of these values:
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout:  0
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout:  ioctx.operate(oid, &op)
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:    Which is: -22
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-1461-g6a4516ad/rpm/el9/BUILD/ceph-19.0.0-1461-g6a4516ad/src/test/cls_rgw/test_cls_rgw.cc:69: Failure
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:Expected equality of these values:
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:  total_size
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:    Which is: 41472
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:  size
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout:    Which is: 40960
2024-02-16T10:39:31.865 INFO:tasks.workunit.client.0.smithi049.stdout:[  FAILED  ] cls_rgw.index_remove_object (47 ms)
2024-02-16T10:39:31.865 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_suggest
2024-02-16T10:39:33.350 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.index_suggest (1484 ms)
2024-02-16T10:39:33.350 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_suggest_complete
2024-02-16T10:39:33.358 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.index_suggest_complete (8 ms)
2024-02-16T10:39:33.358 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_list
2024-02-16T10:39:33.385 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.index_list (27 ms)
2024-02-16T10:39:33.385 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_list_delimited
2024-02-16T10:39:53.157 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.index_list_delimited (19771 ms)
2024-02-16T10:39:53.157 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.bi_list
2024-02-16T10:39:53.229 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.bi_list (73 ms)
2024-02-16T10:39:53.229 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.gc_set
2024-02-16T10:39:53.240 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.gc_set (11 ms)
2024-02-16T10:39:53.240 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.gc_list
2024-02-16T10:39:53.250 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.gc_list (10 ms)
2024-02-16T10:39:53.250 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.gc_defer
2024-02-16T10:40:00.807 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.gc_defer (7557 ms)
2024-02-16T10:40:00.808 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.usage_basic
2024-02-16T10:40:00.901 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.usage_basic (94 ms)
2024-02-16T10:40:00.901 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.usage_clear_no_obj
2024-02-16T10:40:00.902 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.usage_clear_no_obj (1 ms)
2024-02-16T10:40:00.902 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.usage_clear
2024-02-16T10:40:00.999 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.usage_clear (98 ms)
2024-02-16T10:40:00.999 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.bi_log_trim
2024-02-16T10:40:01.040 INFO:tasks.workunit.client.0.smithi049.stdout:[       OK ] cls_rgw.bi_log_trim (40 ms)
2024-02-16T10:40:01.040 INFO:tasks.workunit.client.0.smithi049.stdout:[ RUN      ] cls_rgw.index_racing_removes
2024-02-16T10:40:01.059 INFO:tasks.workunit.client.0.smithi049.stdout:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-1461-g6a4516ad/rpm/el9/BUILD/ceph-19.0.0-1461-g6a4516ad/src/test/cls_rgw/test_cls_rgw.cc:1308: Failure
2024-02-16T10:40:01.060 INFO:tasks.workunit.client.0.smithi049.stdout:Expected equality of these values:
2024-02-16T10:40:01.060 INFO:tasks.workunit.client.0.smithi049.stdout:  1
2024-02-16T10:40:01.060 INFO:tasks.workunit.client.0.smithi049.stdout:  entries.size()
2024-02-16T10:40:01.060 INFO:tasks.workunit.client.0.smithi049.stdout:    Which is: 0
2024-02-16T10:40:01.060 INFO:tasks.workunit.client.0.smithi049.stdout:[  FAILED  ] cls_rgw.index_racing_removes (20 ms)
2024-02-16T10:40:01.808 INFO:tasks.workunit.client.0.smithi049.stdout:[----------] 16 tests from cls_rgw (29322 ms total)
2024-02-16T10:40:01.808 INFO:tasks.workunit.client.0.smithi049.stdout:
2024-02-16T10:40:01.808 INFO:tasks.workunit.client.0.smithi049.stdout:[----------] Global test environment tear-down
2024-02-16T10:40:01.808 INFO:tasks.workunit.client.0.smithi049.stdout:[==========] 16 tests from 1 test suite ran. (31929 ms total)
2024-02-16T10:40:01.808 INFO:tasks.workunit.client.0.smithi049.stdout:[  PASSED  ] 14 tests.
2024-02-16T10:40:01.809 INFO:tasks.workunit.client.0.smithi049.stdout:[  FAILED  ] 2 tests, listed below:
2024-02-16T10:40:01.809 INFO:tasks.workunit.client.0.smithi049.stdout:[  FAILED  ] cls_rgw.index_remove_object
2024-02-16T10:40:01.809 INFO:tasks.workunit.client.0.smithi049.stdout:[  FAILED  ] cls_rgw.index_racing_removes

@cbodley
Copy link
Contributor

cbodley commented Feb 16, 2024

2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-1461-g6a4516ad/rpm/el9/BUILD/ceph-19.0.0-1461-g6a4516ad/src/test/cls_rgw/test_cls_rgw.cc:94: Failure
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout:Expected equality of these values:
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout: 0
2024-02-16T10:39:31.860 INFO:tasks.workunit.client.0.smithi049.stdout: ioctx.operate(oid, &op)
2024-02-16T10:39:31.861 INFO:tasks.workunit.client.0.smithi049.stdout: Which is: -22

this one corresponds to https://github.com/ceph/ceph/blob/6e39aa0/src/test/cls_rgw/test_cls_rgw.cc#L207-L231 where DEL races with ADD. because the DEL completes first and deletes the entry, the ADD fails afterwards

@cfsnyder
Copy link
Contributor Author

Thanks @cbodley , I'll be looking into it this week.

@cfsnyder
Copy link
Contributor Author

cfsnyder commented Apr 1, 2024

@cbodley I'm not seeing how this failing test case represents a realistic scenario. How would a delete happen concurrently with an add for the same object instance? It seems to me that an RGW client would always need to get the response from a PUT to be aware of the instance ID before being able to issue an associated delete, so in practice there should always be a happens-before relationship. Am I missing something?

@cbodley
Copy link
Contributor

cbodley commented Apr 2, 2024

@cbodley I'm not seeing how this failing test case represents a realistic scenario. How would a delete happen concurrently with an add for the same object instance? It seems to me that an RGW client would always need to get the response from a PUT to be aware of the instance ID before being able to issue an associated delete, so in practice there should always be a happens-before relationship. Am I missing something?

  • operations like PutObjectAcl can mutate an existing version, where i believe we send another ADD op to the bucket index. those could race with a DEL on the same version id
  • that failing test case assumes that versioning is disabled. without version ids, normal PutObject requests can race with deletes on the same index entry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants