New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: make snapmapper warn+clean up instead of assert #20040
Conversation
src/osd/PG.cc
Outdated
@@ -4182,9 +4182,10 @@ void PG::_scan_snaps(ScrubMap &smap) | |||
<< "...repaired"; | |||
} | |||
snap_mapper.add_oid(hoid, obj_snaps, &_t); | |||
r = osd->store->apply_transaction(osr.get(), std::move(t)); | |||
// we don't particularly care when this is visible or committed. | |||
r = osd->store->queue_transaction(osr.get(), std::move(t), nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to worry about skipping a snaptrim, because it gets processed before the repair is complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think that race can happen (although it'd be pretty extreme).
The alternative fix here is to change apply_transaction() to use onreadable_sync so that we don't deadlock against items ahead of os waiting for pg->lock, but that affects apply_transaction behavior everywhere.
Probably the best thing here is to make a apply-transaction variant (or hard code it) to wait on onreadable_sync instead of onreadable. I'll do the latter I think, since apply_transaction goes away in wip-kill-onreadable anyway so this is really a luminous-only fix!
1824b87
to
e17493b
Compare
This looks good to me, but it's still marked WIP so I'm not marking it as approved. Feel free to do so when ready! |
e17493b
to
9d6c58b
Compare
Old option is unused Signed-off-by: Sage Weil <sage@redhat.com>
Better to clean up than to crash the OSD. Signed-off-by: Sage Weil <sage@redhat.com>
This shouldn't ever happen. Signed-off-by: Sage Weil <sage@redhat.com>
9d6c58b
to
393293d
Compare
Wait for our repair to apply before continuing to avoid reading the broken state again. Signed-off-by: Sage Weil <sage@redhat.com>
393293d
to
ea531df
Compare
|
@@ -243,6 +243,7 @@ void SnapMapper::add_oid( | |||
MapCacher::Transaction<std::string, bufferlist> *t) | |||
{ | |||
dout(20) << __func__ << " " << oid << " " << snaps << dendl; | |||
assert(!snaps.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the case, we might want to assert(0 == "decode snaps failure")
at https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L3747 in PG::update_snap_map()
.
Signed-off-by: Sage Weil <sage@redhat.com>
http://tracker.ceph.com/issues/22752