Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon,mgr: trim osdmap without the help of pgmap #14504

Merged
merged 4 commits into from Apr 21, 2017

Conversation

Projects
None yet
3 participants
@tchaikov
Copy link
Contributor

tchaikov commented Apr 13, 2017

No description provided.

@tchaikov tchaikov added mgr mon labels Apr 13, 2017

@tchaikov tchaikov requested a review from liewegas Apr 13, 2017

@tchaikov

This comment has been minimized.

Copy link
Contributor Author

tchaikov commented Apr 13, 2017

i will run rados suites with this change tomorrow. but any comment is welcomed!

@@ -2331,6 +2331,8 @@ int OSD::init()
MPGStats *m = new MPGStats(monc->get_fsid(), osdmap->get_epoch(), had_for);
m->osd_stat = cur_stat;

min_last_epoch_clean = osdmap->get_epoch();
min_last_epoch_clean_pgs.clear();

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 13, 2017

Member

where are these declared? it looks like they are protected by map_lock here...

This comment has been minimized.

Copy link
@tchaikov

tchaikov Apr 13, 2017

Author Contributor

the are defined as member variables of this. and we need to reset them every time.

monc->send_mon_message(new MOSDBeacon());
auto beacon = new MOSDBeacon(osdmap->get_epoch(), min_last_epoch_clean);
beacon->pgs = std::move(min_last_epoch_clean_pgs);
monc->send_mon_message(beacon);

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 13, 2017

Member

...but we need to hold the same (map_lock) or another lock here, right?

This comment has been minimized.

Copy link
@tchaikov

tchaikov Apr 13, 2017

Author Contributor

yes. i will guard this with a lock (map_lock maybe).

for (const auto& pg_stat : pg_map->pg_stat) {
auto lec = pg_stat.second.get_effective_last_epoch_clean();
min_last_epoch_clean_by_pg[pg_stat.first] = lec;
}

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 13, 2017

Member

this runs every iteration.. we probably don't want that?

maybe it isn't even necessary to transition.. we can just wait until teh new beacons come in and resume trimming once we have a complete view from that?

This comment has been minimized.

Copy link
@tchaikov

tchaikov Apr 13, 2017

Author Contributor

i just want to make sure we have all PGs in min_last_epoch_clean_by_pg before using it to trim osdmap. otherwise, the value of min_lec could be determined by a subset of pg, that would be catastrophic.

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Apr 13, 2017

@tchaikov

This comment has been minimized.

Copy link
Contributor Author

tchaikov commented Apr 13, 2017

ahh, right. i will find another way to ensure we have all pg in min_last_epoch_clean_by_pg or check for it before calc the lec.

@gregsfortytwo

This comment has been minimized.

Copy link
Member

gregsfortytwo commented Apr 13, 2017

Can you talk through the strategy here for making sure we don't accidentally trim past where we should? This is something I'm very interested in, but I'm also concerned that certain sequences of events could lead to OSDs which have last_epoch_cleans implying we can trim, yet have PGs which get left behind.
For instance, does last_epoch_clean mean the OSD didn't have any stray PGs at that epoch? (I'm not sure.) If it doesn't, I don't think this can ever work.
I skimmed this and don't see you persisting the osd_epoch map you're using to find floors. Doesn't that mean when the leader mon changes, it'll disregard any OSDs which it doesn't yet have a beacon for when calculating trim? That seems like a bunch of down OSDs can just get ignored!

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Apr 13, 2017

@tchaikov tchaikov force-pushed the tchaikov:wip-another-mgr-command branch from 40f07a4 to 44b156e Apr 14, 2017

@tchaikov

This comment has been minimized.

@tchaikov tchaikov requested a review from liewegas Apr 14, 2017

@tchaikov

This comment has been minimized.

@tchaikov

This comment has been minimized.

Copy link
Contributor Author

tchaikov commented Apr 18, 2017

@liewegas @gregsfortytwo mind taking another look?

}
if (!mon->pgmon()->is_readable()) {

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 18, 2017

Member

I think we also need to make sure we have the complete set of pgs accounted for here... that's a slow enumeration though. maybe we could restructure it into a two-level map, poolid -> pg -> epoch, so that you can count by pool, and only trust the value if the pg count matches the pool's pg_num? then if there is a recent pool creation or deletion we end up not trimming at all, which ought to be fine?

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 18, 2017

Member

could put that into a class too with a simple interface like report(pg_t, epoch_t) and bool get_lower_bound(const OSDMap& latest, epoch_t *bound) ?

@@ -2134,6 +2134,9 @@ class OSD : public Dispatcher,
void flush_pg_stats();

ceph::coarse_mono_clock::time_point last_sent_beacon;
epoch_t min_last_epoch_clean = 0;
// which pgs were scanned for min_lec
std::vector<pg_t> min_last_epoch_clean_pgs;

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 18, 2017

Member

these belong in the next commit

{
Mutex::Locker l{min_last_epoch_clean_lock};
beacon = new MOSDBeacon(osdmap->get_epoch(), min_last_epoch_clean);
beacon->pgs = std::move(min_last_epoch_clean_pgs);

This comment has been minimized.

Copy link
@liewegas

liewegas Apr 18, 2017

Member

swap() instead? i don't really understand what the post-condition of std::move is. here we want min_last_epoch_clean_pgs to be valid but empty, but in other cases the object is unusable

@tchaikov tchaikov force-pushed the tchaikov:wip-another-mgr-command branch from 71b5e10 to c1d29b1 Apr 19, 2017

tchaikov added some commits Apr 19, 2017

osd/OSDMonitor: trim last_osd_report if the osd is marked up/down
and clear it at restart()

Signed-off-by: Kefu Chai <kchai@redhat.com>
messages/MOSDBeacon: add last_epoch_clean support
Signed-off-by: Kefu Chai <kchai@redhat.com>
osd: piggyback lec in OSDBeacon message
Signed-off-by: Kefu Chai <kchai@redhat.com>

@tchaikov tchaikov force-pushed the tchaikov:wip-another-mgr-command branch from c1d29b1 to 672e72c Apr 19, 2017

mon/OSDMonitor: use lec in OSDBeacon to trim osdmap
Signed-off-by: Kefu Chai <kchai@redhat.com>

@tchaikov tchaikov force-pushed the tchaikov:wip-another-mgr-command branch from 672e72c to b5dd112 Apr 19, 2017

@liewegas
Copy link
Member

liewegas left a comment

this looks great!

@tchaikov

This comment has been minimized.

@tchaikov tchaikov merged commit 2f09a20 into ceph:master Apr 21, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@tchaikov tchaikov deleted the tchaikov:wip-another-mgr-command branch Apr 21, 2017

}
epoch_by_pg[ps] = last_epoch_clean;
if (last_epoch_clean < floor) {
floor = last_epoch_clean;

This comment has been minimized.

Copy link
@gregsfortytwo

gregsfortytwo Apr 27, 2017

Member

It's a bit late, but this part here is a bug @tchaikov: there might be another PG whose lec is between floor and ps' last_epoch_clean, so updating the pool's floor requires doing a scan over the PGs. Right?

This comment has been minimized.

Copy link
@tchaikov

tchaikov Apr 27, 2017

Author Contributor

no, we are conservative here. even if there is one or more pgs whose lec is between floor and ps. we use the smallest lec. so we don't trim the epochs still being used by the pg whose lec is the smallest.

This comment has been minimized.

Copy link
@gregsfortytwo

gregsfortytwo Apr 27, 2017

Member

Ah right, this only lets us push floor downwards, never up.

So the part I'm having trouble with is how we ever increase the reported floor at all. Am I missing something where we re-create this LastEpochClean struct?

This comment has been minimized.

Copy link
@tchaikov

tchaikov Apr 28, 2017

Author Contributor

ahh, that's a bug! lemme fix it.

This comment has been minimized.

Copy link
@tchaikov
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.