rgw/cloud-transition: fix the crash with publish_commit #57356
src/rgw/rgw_lc.cc
    RGWObjState* obj_state{nullptr};
    ret = obj->get_obj_state(oc.dpp, &obj_state, null_yield, true);
we don't really need RGWObjState to get attrs, do we? what does obj->get_attrs() return at this point?
if I am not mistaken, you do indeed need it
if you omit get obj state, you'll get an empty attrs sequence
(ask me how I know)
RadosObject::transition_to_cloud() calls RadosReadOp::prepare() to read the head object. this part should initialize those attrs:
https://github.com/ceph/ceph/blob/9b6d380/src/rgw/driver/rados/rgw_sal_rados.cc#L2738
I just noticed that in remove_expired_obj(), which is used in regular LC expiration too, publish_commit is called after the object is deleted - https://github.com/ceph/ceph/blob/9b6d380/src/rgw/rgw_lc.cc#L625 . Will the attrset still be valid?
Will it be cleaner and safe to just save etag in all the callers of publish_commit() before applying any LC operation (like you mentioned above)?
> Will it be cleaner and safe to just save etag in all the callers of publish_commit() before applying any LC operation (like you mentioned above)?
i think so
in the refactoring call, we talked about changes to get_obj_state() to avoid the dangling RGWObjState*. @dang's suggestion is to rename it to load_obj_state() and rely on StoreObject::state to store its updated state. so we'd call obj->load_obj_state() then read the etag from obj->get_attrs(). if you leave the get_obj_state() call where it was but just save the etag, it should be easy to follow up with those other changes later
what do you think @mattbenjamin @dang?
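For readers following along, the shape proposed above (load the state into the object, read the etag from get_attrs(), keep a copy before the LC operation) might look roughly like the sketch below. This is a toy model and not the real rgw::sal interface: ToyStoreObject, transition() and transition_and_notify() are hypothetical names, attrs are plain strings, and the assumption that the transition wipes the cached attrs is only there to make the lifetime problem visible.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using Attrs = std::map<std::string, std::string>;

// Toy model (hypothetical, not the real RGW classes): the object owns its
// state, and load_obj_state() refreshes it instead of handing out a raw
// pointer that could dangle later.
class ToyStoreObject {
 public:
  explicit ToyStoreObject(Attrs stored) : stored_(std::move(stored)) {}

  // Loads the stored attrs into the object's own cached state.
  int load_obj_state() {
    state_attrs_ = stored_;
    return 0;
  }

  const Attrs& get_attrs() const { return state_attrs_; }

  // Stand-in for an LC operation that rewrites the head object.
  // Assumption: the cached attrs are no longer usable afterwards.
  void transition() {
    stored_.clear();
    state_attrs_.clear();
  }

 private:
  Attrs stored_;
  Attrs state_attrs_;
};

// Copies the etag by value before the operation, then hands the copy to the
// notification callback, so nothing is read from stale state.
int transition_and_notify(
    ToyStoreObject& obj,
    const std::function<void(const std::string&)>& publish) {
  int ret = obj.load_obj_state();
  if (ret < 0) {
    return ret;
  }
  std::string etag;
  auto it = obj.get_attrs().find("etag");
  if (it != obj.get_attrs().end()) {
    etag = it->second;  // saved copy survives the transition
  }
  obj.transition();
  publish(etag);
  return 0;
}
```

The point of the sketch is only the ordering: the etag is copied while the state is known to be fresh, and the notification never touches the object's state after the operation has run.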
I took the original commit for 7.1, but don't object to the idea, I'll rebase it if we take that version this weekish
@dang @cbodley @mattbenjamin updated the PR with the changes as discussed above. Please review.
The obj_state may not be valid anymore post LC operations (esp., cloud-transition). Hence read and store etag prior to them to be used later by notification (publish_commit).

Fixes: https://tracker.ceph.com/issues/65862
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
LC Cloud transition should use set_atomic() to prevent any overwrite while updating the HEAD object.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
fa28bc2 to 5be9503
https://jenkins.ceph.com/job/ceph-api/73729/
commented (again) on https://tracker.ceph.com/issues/62972
jenkins test api
jenkins test make check arm64
@@ -1445,7 +1460,7 @@ class LCOpAction_Transition : public LCOpAction {
       // send request to notification manager
       int publish_ret = notify->publish_commit(oc.dpp, obj_state->size,
                                 ceph::real_clock::now(),
-                                obj_state->attrset[RGW_ATTR_ETAG].to_str(),
+                                etag,
sorry, just noticed when reviewing #57079 - this call to publish_commit() still relies on obj_state->size. don't we have the same lifetime issue there?
Right, and moreover, after cloud-transition obj_size can be '0', which can be misleading in the notification.
right, we should save the value of size before the expiration/transition like you did for etag
what do you want to do about the downstream version of this change?
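A minimal sketch of saving both values together, under the same caveats as the rest of this thread: ToyObjState, NotifyInfo, snapshot_for_notify() and toy_cloud_transition() are hypothetical stand-ins, not RGW types, and the "etag" key and the zero-size head after transition are assumptions taken from the comments above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for the per-object state (hypothetical, not RGWObjState).
struct ToyObjState {
  std::uint64_t size = 0;
  std::map<std::string, std::string> attrset;
};

// The values the notification needs, held by value so they do not depend on
// the lifetime or later contents of the object state.
struct NotifyInfo {
  std::uint64_t size;
  std::string etag;
};

// Takes the snapshot before any expiration or transition runs.
NotifyInfo snapshot_for_notify(const ToyObjState& state) {
  NotifyInfo info{state.size, std::string{}};
  auto it = state.attrset.find("etag");
  if (it != state.attrset.end()) {
    info.etag = it->second;
  }
  return info;
}

// Hypothetical transition: the head object is rewritten and, as noted above,
// its reported size can end up as 0.
void toy_cloud_transition(ToyObjState& state) {
  state.size = 0;
  state.attrset.clear();
}
```

With this ordering, a caller would build the NotifyInfo first, run the LC operation, and then pass info.size and info.etag to the notification call instead of reading obj_state again.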
As part of cloud transition, object's head/attrs may get updated and hence state->attrs will not be valid anymore. Fetch obj_state post the transition to access the attrs.
Fixes: https://tracker.ceph.com/issues/65862
Signed-off-by: Soumya Koduri <skoduri@redhat.com>