cls/rgw: index cancelation still cleans up remove_objs #43854

Merged 4 commits on Nov 16, 2021

@cbodley cbodley commented Nov 9, 2021

when multipart uploads complete their final bucket index transaction, they pass the list of part objects in 'remove_objs' for bulk removal - the part objects, along with their bucket stats, get replaced by the head object

but if CompleteMultipart races with another upload, the head object write will fail with ECANCELED and the bucket index transaction gets canceled with CLS_RGW_OP_CANCEL. these canceled uploads still need to clean up their 'remove_objs', but cancelation was returning too early. as a result, these bucket index entries get orphaned and leave the bucket stats inconsistent

this commit reworks rgw_bucket_complete_op() so that CLS_RGW_OP_CANCEL is handled the same way as OP_ADD and OP_DEL, which means it always runs the loop to clean up 'remove_objs'
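
A minimal sketch of the behavior described above, using a toy in-memory index rather than the real cls_rgw code. The `Index`, `Entry`, `Op`, `prepare_op`, `complete_op` and `simulate_race` names are hypothetical stand-ins, and the model tracks a single stats category for simplicity. The only point it illustrates is that the cancel branch falls through to the `remove_objs` loop instead of returning early:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model only: these are NOT the real cls_rgw types or signatures.
enum class Op { Add, Del, Cancel };

struct Entry {
  uint64_t size = 0;
  bool exists = false;  // counted in stats only when true
  int pending = 0;      // prepared transactions not yet completed
};

struct Index {
  std::map<std::string, Entry> entries;
  uint64_t num_objects = 0;
  uint64_t total_size = 0;

  void account(const Entry& e)   { num_objects += 1; total_size += e.size; }
  void unaccount(const Entry& e) { num_objects -= 1; total_size -= e.size; }
};

// Register a pending transaction, creating a placeholder entry if needed.
void prepare_op(Index& idx, const std::string& key) {
  idx.entries[key].pending += 1;
}

void complete_op(Index& idx, Op op, const std::string& key, uint64_t size,
                 const std::vector<std::string>& remove_objs) {
  auto it = idx.entries.find(key);
  if (it != idx.entries.end()) {
    Entry& e = it->second;
    if (e.pending > 0) e.pending -= 1;
    switch (op) {
      case Op::Cancel:
        // Drop a placeholder that no other transaction still needs.
        // No early return here, so the cleanup loop below still runs.
        if (!e.exists && e.pending == 0) idx.entries.erase(it);
        break;
      case Op::Del:
        if (e.exists) idx.unaccount(e);
        e.exists = false;
        if (e.pending == 0) idx.entries.erase(it);
        break;
      case Op::Add:
        if (e.exists) idx.unaccount(e);  // an overwrite replaces the old stats
        e.size = size;
        e.exists = true;
        idx.account(e);
        break;
    }
  }
  // Runs for every op, including Cancel: remove the listed entries and
  // subtract their contribution from the stats.
  for (const auto& name : remove_objs) {
    auto r = idx.entries.find(name);
    if (r == idx.entries.end()) continue;
    if (r->second.exists) idx.unaccount(r->second);
    idx.entries.erase(r);
  }
}

// Two uploads of the same object race: A wins, B is canceled.
Index simulate_race() {
  constexpr uint64_t MiB = 1024 * 1024;
  Index idx;
  const std::vector<std::string> parts_a = {"obj.A.1", "obj.A.2"};
  const std::vector<std::string> parts_b = {"obj.B.1", "obj.B.2"};
  for (const auto& p : parts_a) { prepare_op(idx, p); complete_op(idx, Op::Add, p, 16 * MiB, {}); }
  for (const auto& p : parts_b) { prepare_op(idx, p); complete_op(idx, Op::Add, p, 16 * MiB, {}); }
  prepare_op(idx, "obj");  // upload A
  prepare_op(idx, "obj");  // upload B
  complete_op(idx, Op::Add, "obj", 32 * MiB, parts_a);
  complete_op(idx, Op::Cancel, "obj", 0, parts_b);
  return idx;
}
```

In this model, `simulate_race()` leaves only the head entry in the index. If the cancel branch returned before the loop, the two part entries of upload B would stay behind and keep inflating the stats.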

Fixes: https://tracker.ceph.com/issues/53199

cbodley commented Nov 9, 2021

with this fix, the bucket stats correctly show a single 32M object using the reproducer from https://tracker.ceph.com/issues/53199, even with 36 concurrent multipart uploads (30 of which were canceled):

    "usage": {
        "rgw.main": {
            "size": 33554432,
            "size_actual": 33554432,
            "size_utilized": 33554432,
            "size_kb": 32768,
            "size_kb_actual": 32768,
            "size_kb_utilized": 32768,
            "num_objects": 1
        },
        "rgw.multimeta": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 0
        }
    }

@ivancich ivancich left a comment

nice!

@ivancich ivancich self-requested a review November 9, 2021 15:13

@ivancich ivancich left a comment

oops, my previous review was not the approval that I intended

cbodley commented Nov 9, 2021

thanks for the review @ivancich! i'm still planning to model multipart uploads in #43843, so that should give us good test coverage for this

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Signed-off-by: Casey Bodley <cbodley@redhat.com>
whenever an index transaction uses remove_objs for complete(), it also
needs to pass them for cancel() to avoid leaking index entries

Signed-off-by: Casey Bodley <cbodley@redhat.com>
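
One way to picture the caller-side rule from the commit message above is a wrapper that captures the removal list once and forwards it on both paths. This is a hypothetical sketch, not the RGW index-update API: `IndexTxn`, `IndexCall`, the `Sink` callback and `demo_both_paths` are invented for illustration.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of what was sent to the bucket index.
struct IndexCall {
  bool canceled;
  std::vector<std::string> remove_objs;
};

// Hypothetical caller-side transaction: the removal list is stored once,
// so complete() and cancel() cannot disagree about it.
class IndexTxn {
 public:
  using Sink = std::function<void(bool, const std::vector<std::string>&)>;

  IndexTxn(std::vector<std::string> remove_objs, Sink sink)
      : remove_objs_(std::move(remove_objs)), sink_(std::move(sink)) {}

  void complete() { finish(false); }
  void cancel() { finish(true); }

 private:
  void finish(bool canceled) {
    if (done_) return;  // finish at most once
    done_ = true;
    sink_(canceled, remove_objs_);  // same list on both paths
  }

  std::vector<std::string> remove_objs_;
  Sink sink_;
  bool done_ = false;
};

// Drive one completed and one canceled transaction and record the calls.
std::vector<IndexCall> demo_both_paths() {
  std::vector<IndexCall> calls;
  auto sink = [&calls](bool canceled, const std::vector<std::string>& objs) {
    calls.push_back({canceled, objs});
  };
  const std::vector<std::string> parts_a = {"a.1", "a.2"};
  const std::vector<std::string> parts_b = {"b.1", "b.2"};
  IndexTxn won(parts_a, sink);
  won.complete();
  IndexTxn lost(parts_b, sink);
  lost.cancel();
  return calls;
}
```

The canceled transaction still reports its two part names, which is what lets the index side remove them.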

cbodley commented Nov 15, 2021

rebased over #43103

cbodley commented Nov 15, 2021

jenkins test api

cbodley commented Nov 16, 2021

jenkins test api
