New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: populate last_epoch_split during build_initial_pg_history #16519
Conversation
src/osd/PrimaryLogPG.cc
Outdated
@@ -1848,7 +1848,8 @@ void PrimaryLogPG::do_op(OpRequestRef& op) | |||
<< std::hex << head.get_hash() << std::dec << dendl; | |||
osd->clog->warn() << info.pgid.pgid << " does not contain " << head | |||
<< " op " << *m; | |||
assert(!cct->_conf->osd_debug_misdirected_ops); | |||
assert(!cct->_conf->osd_debug_misdirected_ops || | |||
m->get_epoch() < info.history.last_epoch_split); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not compile, s/m->get_epoch()
/m->get_map_epoch()
/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the op have been "split" as well as the PG?
Or if the send epoch was across a boundary when the message was received, we should have detected that and dropped it because the client will resend, right?
If we get a pg create and have to generate the pg_history_t fields, populate the last_epoch_split field too. This is needed discard ops sent from before the last split. Specifically, PG::can_discard_op() looks at it when the client has the RESEND_ON_SPLIT feature bit. If we don't discard, we may run afoul of assertions later (e.g., that the object belongs to the PG at all) or potentially process an op out of order. Fixes: http://tracker.ceph.com/issues/20754 Signed-off-by: Sage Weil <sage@redhat.com>
You're right.. I thought this was before that check but it's not. core reveals that the last_epoch_split was 0 (should be 80). Pushed a new fix. |
jenkins test this please |
Does this PR will avoid the assert? if the send epoch across the boundary when pg split, it should be detected and resend, right? |
The client will already resend; this PR fixes it so that we properly ignore the op sent before the split. |
@liewegas This explains my kernel client paste from last week: https://pastebin.com/B3Tvx2kG. Should have pressed on, I guess ;) |
Oh, good!
|
If the request was sent before a split then this isn't actually a bug.
Fixes: http://tracker.ceph.com/issues/20754
Signed-off-by: Sage Weil sage@redhat.com