rgw/notifications: fetch object state to get size, in rgw_lc.cc #49466

Merged
merged 2 commits into ceph:main on Mar 11, 2023

Conversation

mattbenjamin
Contributor

@mattbenjamin mattbenjamin commented Dec 15, 2022

rgw/notifications: fetch object state to get size, in rgw_lc.cc

Failure to call get_obj_state() leaves object size and other members
uninitialized, and appears to result in lc delete notifications
with 0 for object size.

Fixes: https://tracker.ceph.com/issues/58287
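
The failure mode described above can be illustrated with a small standalone sketch. None of this is Ceph code: `FakeObject`, `FakeObjState`, `load_state()` and `size_for_notification()` are hypothetical stand-ins, assumed only to show that a size read before the state has been fetched reports 0. The sketch zero-initializes the members for determinism, whereas the commit message describes them as uninitialized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the state that a get_obj_state()-style call fills in.
struct FakeObjState {
  uint64_t size = 0;
  bool exists = false;
};

// Hypothetical stand-in for an object handle with lazily loaded state.
class FakeObject {
 public:
  explicit FakeObject(uint64_t stored_size) : stored_size_(stored_size) {}

  // Returns the cached size; it stays 0 until load_state() has run.
  uint64_t cached_size() const { return state_.size; }

  // Models fetching the object state from the backing store.
  const FakeObjState& load_state() {
    state_.size = stored_size_;
    state_.exists = true;
    return state_;
  }

 private:
  uint64_t stored_size_;
  FakeObjState state_;
};

// Size to report in a delete notification: fetch the state first, then read it.
uint64_t size_for_notification(FakeObject& obj) {
  const FakeObjState& st = obj.load_state();
  return st.size;
}

// Small self-check showing the before/after behaviour.
void check_example() {
  FakeObject obj(4096);
  assert(obj.cached_size() == 0);              // state not fetched yet
  assert(size_for_notification(obj) == 4096);  // state fetched, real size
}
```

In this model the fix corresponds to routing the notification through `size_for_notification()` instead of reading the cached size directly.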

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions github-actions bot added the rgw label Dec 15, 2022
@mattbenjamin mattbenjamin changed the title from "Wip lc size" to "rgw/notifications: fetch object state to get size, in rgw_lc.cc" Dec 15, 2022
@@ -582,7 +586,7 @@ static int remove_expired_obj(
 "ERROR: publishing notification failed, with error: " << ret << dendl;
 } else {
 // send request to notification manager
-(void) notify->publish_commit(dpp, obj->get_obj_size(),
+(void) notify->publish_commit(dpp, obj_state->size,
 ceph::real_clock::now(),
 obj->get_attrs()[RGW_ATTR_ETAG].to_str(),
Contributor

should we change this line as well to:

obj_state->attrset[RGW_ATTR_ETAG].to_str()

Contributor Author

I will add that (thanks for contributing this!)
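
A rough sketch of the suggestion in this thread: take both the size and the ETag from the same fetched state, so the notification is built from one snapshot. Everything here is hypothetical and only for illustration: `FakeObjState`, `FakeNotification`, `build_notification()` and the `kEtagAttr` key are made-up names, and the attribute map is assumed to be a plain `std::map` of strings. The lookup uses `find()` because `operator[]` on a `std::map` would insert an empty entry when the key is missing.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Placeholder key standing in for RGW_ATTR_ETAG; the real value is not assumed.
const std::string kEtagAttr = "etag";

// Hypothetical snapshot of object state: size and attributes loaded together.
struct FakeObjState {
  uint64_t size = 0;
  std::map<std::string, std::string> attrset;
};

// Hypothetical payload that a delete notification would carry.
struct FakeNotification {
  uint64_t size;
  std::string etag;
};

// Builds the payload from a single state snapshot.
FakeNotification build_notification(const FakeObjState& st) {
  std::string etag;  // stays empty if the attribute is absent
  auto it = st.attrset.find(kEtagAttr);
  if (it != st.attrset.end()) {
    etag = it->second;
  }
  return FakeNotification{st.size, etag};
}

// Small self-check: size and ETag both come from the same snapshot.
void check_example() {
  FakeObjState st;
  st.size = 10;
  st.attrset[kEtagAttr] = "abc";
  FakeNotification n = build_notification(st);
  assert(n.size == 10);
  assert(n.etag == "abc");
}
```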

@yuvalif yuvalif self-requested a review December 16, 2022 16:29
mattbenjamin and others added 2 commits December 16, 2022 13:55
Failure to call get_obj_state() leaves object size and other members
uninitialized, and appears to result in lc delete notifications
with 0 for object size.

Fixes: https://tracker.ceph.com/issues/58287

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
@yuvalif
Contributor

yuvalif commented Dec 18, 2022

added this commit for testing: 864cf96

@ivancich
Member

jenkins test api

@cbodley
Contributor

cbodley commented Mar 11, 2023

@cbodley cbodley merged commit ba99497 into ceph:main Mar 11, 2023
@cfsnyder
Contributor

@cbodley @mattbenjamin I'd like to backport this one to Pacific as well, if there are no objections. My interest is not the notification bug itself: this change also resolves another issue, where index entries are left behind after all instances of an object are deleted.

The case is specifically when the OLH has stale pending xattrs (that is a separate set of issues, but we have seen it happen very frequently in our production environments). These xattrs are left behind when ops complete abnormally, and they aren't cleaned up unless there are future requests for the key. Without the get_obj_state(follow_olh=true) call here, they aren't cleaned up before the apply_olh_log condition [1] that gates removal of the OLH object and the OLH/plain index entries, so the OLH object and index entries stay there permanently if the key isn't reused.

When there are a lot of these leftover entries, they hurt bucket listing latency because they can cause significant extra iteration. That in turn causes problems for future LC invocations, since errors occur when trying to do bucket listings. We currently have about a dozen large buckets where LC fails to make progress due to fallout from this issue.

[1]

cls_obj_check_prefix_exist(rm_op, RGW_ATTR_OLH_PENDING_PREFIX, true); /* fail if found one of these, pending modification */
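
The interaction described above can be modelled in a few lines. This is a toy sketch, not the RGW implementation: `FakeOlh`, `has_prefix()`, `try_remove()` and `clear_stale_pending()` are hypothetical names, and `kPendingPrefix` is a placeholder rather than the real value of RGW_ATTR_OLH_PENDING_PREFIX. It only assumes what the comment states: removal is refused while any xattr with the pending prefix exists, and clearing the stale entries first lets the removal go through.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Placeholder prefix; the real RGW_ATTR_OLH_PENDING_PREFIX value is not assumed.
const std::string kPendingPrefix = "pending.";

using Xattrs = std::map<std::string, std::string>;

// Hypothetical model of an OLH object: its xattrs plus a "removed" flag.
struct FakeOlh {
  Xattrs xattrs;
  bool removed = false;
};

// True if any xattr key starts with the given prefix.
bool has_prefix(const Xattrs& xattrs, const std::string& prefix) {
  auto it = xattrs.lower_bound(prefix);
  return it != xattrs.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
}

// Models the guarded removal: fail if a pending entry is still present.
bool try_remove(FakeOlh& olh) {
  if (has_prefix(olh.xattrs, kPendingPrefix)) {
    return false;  // pending modification found, leave everything in place
  }
  olh.removed = true;
  return true;
}

// Models the cleanup step: erase stale pending entries, return how many.
std::size_t clear_stale_pending(FakeOlh& olh) {
  std::size_t erased = 0;
  auto it = olh.xattrs.lower_bound(kPendingPrefix);
  while (it != olh.xattrs.end() &&
         it->first.compare(0, kPendingPrefix.size(), kPendingPrefix) == 0) {
    it = olh.xattrs.erase(it);
    ++erased;
  }
  return erased;
}

// Small self-check: removal is blocked until the stale entry is cleared.
void check_example() {
  FakeOlh olh;
  olh.xattrs["pending.0001"] = "stale";
  assert(!try_remove(olh));
  assert(clear_stale_pending(olh) == 1);
  assert(try_remove(olh));
}
```

In this model, skipping `clear_stale_pending()` leaves `try_remove()` failing indefinitely, which mirrors the leftover OLH object and index entries described in the comment.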

@mattbenjamin
Contributor Author

@cfsnyder I wasn't aware of this side-effect, thanks for the detailed explanation. I have no objection to backporting.

6 participants