Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: allow peek_map_epoch to return an error #5892

Merged
merged 2 commits into from Sep 12, 2015

Conversation

Projects
None yet
4 participants
@liewegas
Copy link
Member

liewegas commented Sep 11, 2015

@liewegas liewegas added this to the hammer milestone Sep 11, 2015

@liewegas liewegas force-pushed the wip-13060-hammer branch from dedde69 to e1f7709 Sep 11, 2015

@@ -2864,13 +2866,19 @@ epoch_t PG::peek_map_epoch(ObjectStore *store,
values.clear();
keys.insert(ek);
store->omap_get_values(META_COLL, legacy_infos_oid, keys, &values);
assert(values.size() == 1);
if (values.size() < 1) {

This comment has been minimized.

@yuyuyu101

yuyuyu101 Sep 11, 2015

Member

Do we need to ensure this is not an accident data file loss?

This comment has been minimized.

@liewegas

liewegas Sep 11, 2015

Author Member

Do we need to ensure this is not an accident data file loss?

I'm not sure how to tell.

But, let's say the infos object is missing but shouldn't be.. in that
case, all PGs will be ignored and the osd will come up (vs previously we
would crash). There is a corner case where this is bad: the osd in
question had the only copy of the latest write, peering probed it, and got
a does not exist response and silently reverts to an older copy.
Something similar happens from the pg import/export tests currently (see
#12687). I don't think there is a way to tell, though...

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Sep 11, 2015

ceph/ceph-qa-suite#563 verifies the hammer part of the fix. waiting on gitbuilder to verify the infernalis fix too.

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Sep 11, 2015

@dzafman Do you mind reviewing this one?

@liewegas liewegas force-pushed the wip-13060-hammer branch from e1f7709 to 1123439 Sep 11, 2015

bufferlist *bl)
int PG::peek_map_epoch(ObjectStore *store,
spg_t pgid,
epoch_t *pepoch,

This comment has been minimized.

@dzafman

dzafman Sep 11, 2015

Member

I'm use to the p at then end of the name instead of the front.

@@ -738,10 +738,14 @@ int mark_pg_for_removal(ObjectStore *fs, spg_t pgid, ObjectStore::Transaction *t
ghobject_t pgmeta_oid(info.pgid.make_pgmeta_oid());

bufferlist bl;
PG::peek_map_epoch(fs, pgid, &bl);
epoch_t pg_epoch;
int r = PG::peek_map_epoch(fs, pgid, &pg_epoch, &bl);

This comment has been minimized.

@dzafman

dzafman Sep 11, 2015

Member

Shouldn't pg_epoch be called "dummy" or "unused" ?

map_epoch = PG::peek_map_epoch(fs, pgid, &bl);
r = PG::peek_map_epoch(fs, pgid, &map_epoch, &bl);
if (r < 0)
cerr << "peek_map_epoch returns an error" << std::endl;

This comment has been minimized.

@dzafman

dzafman Sep 11, 2015

Member

We need to initialize map_epoch to 0. We should disallow a pg import with a map_epoch == 0, while allowing recovery of data using rados import. I would fail a rm-past-intervals in the error case.

@dzafman

This comment has been minimized.

Copy link
Member

dzafman commented Sep 11, 2015

Other than some minor nits we can fix later, 👍

liewegas added some commits Sep 11, 2015

osd: allow peek_map_epoch to return an error
Allow PG::peek_map_epoch to return an error indicating the PG
should be skipped.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f15d958)

[fixed *pepoch default of 0]
[fixed other return paths in peek_map_epoch]
osd/PG: peek_map_epoch: skip legacy PGs if infos object is missing
- pg is removed
- osd is stopped before pg is fully removed
- on restart, we ignore/skip the pg because its epoch is too old
- we upgrade to hammer and convert other pgs, skipping this one, and then
  remove the legacy infos object
- hammer starts, tries to parse the legacy pg, and fails because the infos
  object is gone, crashing.

The fix is to continue ignoring the zombie pg.

Fixes: #16030
Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the wip-13060-hammer branch from 1123439 to 4be8a28 Sep 12, 2015

liewegas added a commit that referenced this pull request Sep 12, 2015

Merge pull request #5892 from ceph/wip-13060-hammer
osd: allow peek_map_epoch to return an error

Reviewed-by: David Zafman <dzafman@redhat.com>

@liewegas liewegas merged commit f35c53d into hammer Sep 12, 2015

@liewegas liewegas deleted the wip-13060-hammer branch Sep 12, 2015

@ghost ghost changed the title DNM: osd: allow peek_map_epoch to return an error osd: allow peek_map_epoch to return an error Sep 13, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.