Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: populate last_epoch_split during build_initial_pg_history #16519

Merged
merged 1 commit into from Jul 26, 2017

Conversation

liewegas
Copy link
Member

If the request was sent before a split then this isn't actually a bug.

Fixes: http://tracker.ceph.com/issues/20754
Signed-off-by: Sage Weil sage@redhat.com

@liewegas liewegas added this to the luminous milestone Jul 24, 2017
@liewegas liewegas requested a review from jdurgin July 24, 2017 02:37
@@ -1848,7 +1848,8 @@ void PrimaryLogPG::do_op(OpRequestRef& op)
<< std::hex << head.get_hash() << std::dec << dendl;
osd->clog->warn() << info.pgid.pgid << " does not contain " << head
<< " op " << *m;
assert(!cct->_conf->osd_debug_misdirected_ops);
assert(!cct->_conf->osd_debug_misdirected_ops ||
m->get_epoch() < info.history.last_epoch_split);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not compile, s/m->get_epoch()/m->get_map_epoch()/

Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the op have been "split" as well as the PG?

Or if the send epoch was across a boundary when the message was received, we should have detected that and dropped it because the client will resend, right?

If we get a pg create and have to generate the pg_history_t fields,
populate the last_epoch_split field too.  This is needed discard ops sent
from before the last split.  Specifically, PG::can_discard_op() looks at
it when the client has the RESEND_ON_SPLIT feature bit.  If we don't
discard, we may run afoul of assertions later (e.g., that the object
belongs to the PG at all) or potentially process an op out of order.

Fixes: http://tracker.ceph.com/issues/20754
Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas
Copy link
Member Author

You're right.. I thought this was before that check but it's not. core reveals that the last_epoch_split was 0 (should be 80). Pushed a new fix.

@gregsfortytwo
Copy link
Member

jenkins test this please

@liu-chunmei
Copy link
Contributor

Does this PR will avoid the assert? if the send epoch across the boundary when pg split, it should be detected and resend, right?

@liewegas
Copy link
Member Author

The client will already resend; this PR fixes it so that we properly ignore the op sent before the split.

@idryomov
Copy link
Contributor

@liewegas This explains my kernel client paste from last week: https://pastebin.com/B3Tvx2kG. Should have pressed on, I guess ;)

@liewegas
Copy link
Member Author

liewegas commented Jul 25, 2017 via email

@liewegas liewegas changed the title osd/PrimaryLogPG: fix bad assert osd: populate last_epoch_split during build_initial_pg_history Jul 25, 2017
@tchaikov tchaikov merged commit 483efaf into ceph:master Jul 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants