New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: allow peek_map_epoch to return an error #5892
Conversation
dedde69
to
e1f7709
Compare
@@ -2864,13 +2866,19 @@ epoch_t PG::peek_map_epoch(ObjectStore *store, | |||
values.clear(); | |||
keys.insert(ek); | |||
store->omap_get_values(META_COLL, legacy_infos_oid, keys, &values); | |||
assert(values.size() == 1); | |||
if (values.size() < 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to ensure this is not an accident data file loss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to ensure this is not an accident data file loss?
I'm not sure how to tell.
But, let's say the infos object is missing but shouldn't be.. in that
case, all PGs will be ignored and the osd will come up (vs previously we
would crash). There is a corner case where this is bad: the osd in
question had the only copy of the latest write, peering probed it, and got
a does not exist response and silently reverts to an older copy.
Something similar happens from the pg import/export tests currently (see
#12687). I don't think there is a way to tell, though...
ceph/ceph-qa-suite#563 verifies the hammer part of the fix. waiting on gitbuilder to verify the infernalis fix too. |
@dzafman Do you mind reviewing this one? |
e1f7709
to
1123439
Compare
bufferlist *bl) | ||
int PG::peek_map_epoch(ObjectStore *store, | ||
spg_t pgid, | ||
epoch_t *pepoch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm use to the p at then end of the name instead of the front.
Other than some minor nits we can fix later, 👍 |
Allow PG::peek_map_epoch to return an error indicating the PG should be skipped. Signed-off-by: Sage Weil <sage@redhat.com> (cherry picked from commit f15d958) [fixed *pepoch default of 0] [fixed other return paths in peek_map_epoch]
- pg is removed - osd is stopped before pg is fully removed - on restart, we ignore/skip the pg because its epoch is too old - we upgrade to hammer and convert other pgs, skipping this one, and then remove the legacy infos object - hammer starts, tries to parse the legacy pg, and fails because the infos object is gone, crashing. The fix is to continue ignoring the zombie pg. Fixes: #16030 Signed-off-by: Sage Weil <sage@redhat.com>
1123439
to
4be8a28
Compare
osd: allow peek_map_epoch to return an error Reviewed-by: David Zafman <dzafman@redhat.com>
http://tracker.ceph.com/issues/13060