Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/PrimaryLogPG: send requests to primary on cache miss #16884

Merged
merged 2 commits into from Sep 30, 2017

Conversation

liewegas
Copy link
Member

@liewegas liewegas commented Aug 7, 2017

If a client has {BALANCE,LOCALIZE}_READS and sends a request to a
replica, but the object isn't in the cache, send them back to the
primary. Otherwise we might do something rash (like trigger a
promotion from a replica).

Fixes: http://tracker.ceph.com/issues/20919
Signed-off-by: Sage Weil sage@redhat.com

if (!is_primary()) {
dout(20) << __func__ << " cache miss; ask the primary" << dendl;
osd->reply_op_error(op, -EAGAIN);
return cache_result_t::REPLIED_WITH_EAGAIN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually need this to go at the top or we might get weird ordering bugs. eg, if the primary is promoting objects while the listing is happening, or if we return out for NOOPs because we have incorrect metadata that we're looking at as a result of being non-primary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replica will have a missing set, so it'll bail out already if the pool is out of sync for the object in question before this method is called in do_op(). And more practically we can't move it up any higher in this method anyway or else we're above the "cache hit" case. Do you have a specific case in mind? I can't think of anything..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't remember anything more specific now. I guess if we're a replica we're only serving read operations, so as long as we can't serve a read while the object is somehow invalid then we're okay.
I can't think of any way the primary could logically finish nuking or overwriting an object that we wouldn't know about as a replica, so we should be good. Can you?

@gregsfortytwo
Copy link
Member

While you're editing the function, I'm also a bit surprised we check if there's a cache enabled after we check if the CACHE_IGNORE flag is set.

If a client has {BALANCE,LOCALIZE}_READS and sends a request to a
replica, but the object isn't in the cache, send them back to the
primary.  Otherwise we might do something rash (like trigger a
promotion from a replica).

Fixes: http://tracker.ceph.com/issues/20919
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas
Copy link
Member Author

Yeah, that was weird.. moved it up!

@liewegas
Copy link
Member Author

retest this please

@liewegas
Copy link
Member Author

@gregsfortytwo ping

@gregsfortytwo
Copy link
Member

Approved this @liewegas.

@tchaikov
Copy link
Contributor

the failed one is caused by d7b29ac#r141996981, i believe

@tchaikov tchaikov merged commit 259b3c1 into ceph:master Sep 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants