mon,osd,osdc: refactor snap trimming (phase 1) #18276

Merged
merged 32 commits into from
Dec 7, 2017
Changes from 1 commit
c536d4c
osd/osd_types: note about removed_snaps hack
liewegas Oct 13, 2017
c8bfe3f
osd/PG: share_pg_info shares past_intervals, not PastIntervals()
liewegas Dec 1, 2017
81d63f2
osd/OSDMap: improve osdmap flag dumping in json
liewegas Dec 1, 2017
df7523b
qa/suites/rados/singleton/all/thrash-eio: more whitelist
liewegas Dec 2, 2017
ea308ad
include/interval_set: add get_end() to iterator
liewegas Oct 30, 2017
3119cf5
include/mempool: add flat_set alias
liewegas Oct 16, 2017
1b1eec2
include/types: flat_set operator<<
liewegas Oct 16, 2017
b9c5a24
osd/osd_types: SnapSet: remove get_first_snap_after()
liewegas Oct 28, 2017
e89649d
mds/SnapServer: fix reset()
liewegas Oct 17, 2017
1f133a2
mon/OSDMonitor: reset OSDMap state before decode
liewegas Oct 13, 2017
37c4aff
mon/OSDMonitor: clear pending_metadata* in create_pending
liewegas Oct 12, 2017
553048f
osd/OSDMap: track newly removed and purged snaps in each epoch
liewegas Oct 11, 2017
9d606c5
mon/OSDMonitor: record removed_snaps by epoch outside of the osdmap
liewegas Oct 13, 2017
49833c3
mon/OSDMonitor: share snaps removed during a map gap
liewegas Oct 12, 2017
38e96ec
mon/MgrStatMonitor: dump PGMapDigest at debug level 20
liewegas Nov 29, 2017
32d7538
osdc/Objecter: prune new_removed_snaps from active op snapc's
liewegas Oct 12, 2017
b1b8fc6
osdc/Objecter: rename _scan_requests force_resend -> skipped_map
liewegas Oct 12, 2017
192a8dc
osdc/Objecter: apply removed_snaps from gap to in-flight requests
liewegas Oct 12, 2017
a53ba73
osd,mon: add 'nosnaptrim' osd flag
liewegas Dec 1, 2017
345d3b6
osd/osd_types: add purged_snaps to pg_stat_t
liewegas Oct 12, 2017
6df912b
osd/PG: share purged_snaps with mgr at mimic
liewegas Oct 12, 2017
86f0b81
mon/PGMap: add purged_snaps map to PGMapDigest
liewegas Oct 13, 2017
e5f62fb
osd/PG: move debug_verify_cached_snaps check into PGPool::update
liewegas Oct 13, 2017
33c9907
osd/PG: some whitespace
liewegas Nov 3, 2017
f04729c
osd/PG: break out of Active AdvMap handler if interval change
liewegas Dec 1, 2017
231ec67
osd/PG: simplify replica purged_snaps update
liewegas Dec 1, 2017
6e1b7c4
osd/PG: use new mimic osdmap structures for removed, pruned snaps
liewegas Nov 3, 2017
16c5bcc
osd/osd_types: pg_pool_t: add FLAG_{SELFMANAGED,POOL}_SNAPS flags
liewegas Oct 13, 2017
fd6a59e
mon/OSDMonitor: convert removed_snaps on first mimic map
liewegas Oct 16, 2017
9607a2d
mon/OSDMonitor: prune purged snaps
liewegas Oct 28, 2017
f2d602a
mon/OSDMonitor: propagate new_removed_snaps to other tiers
liewegas Nov 5, 2017
8c44dab
osd/PG: ignore purged_snaps inconsistencies for now
liewegas Dec 2, 2017
10 changes: 10 additions & 0 deletions src/osd/PG.cc
@@ -7274,6 +7274,16 @@ PG::RecoveryState::Active::Active(my_context ctx)
boost::statechart::result PG::RecoveryState::Active::react(const AdvMap& advmap)
{
  PG *pg = context< RecoveryMachine >().pg;
  if (pg->should_restart_peering(
        advmap.up_primary,
        advmap.acting_primary,
        advmap.newup,
        advmap.newacting,
        advmap.lastmap,
        advmap.osdmap)) {
    ldout(pg->cct, 10) << "Active advmap interval change, fast return" << dendl;
    return forward_event();
  }
Member

Why isn’t this caught somewhere else? We haven’t changed the peering algorithms here...
I presume it’s because of the map processing change, but I’m not quite seeing how.

Member Author

This was one of those cases where I was surprised we hadn't hit it before. Until now, the deeply nested states' AdvMap handler didn't do anything important, so it didn't matter that the outer-state handler, which detects the interval change, runs after it. Now I've added processing to the nested handler that gets royally confused when it isn't (yet) aware of the interval change. I forget which crash I saw, but I think it was that purged_snaps (in pg_info_t) was being updated differently. I suspect option 1 tolerates that better now, but the rest of this function is still work that shouldn't run at all if the interval just changed.
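
To make the ordering problem concrete, here is a minimal standalone sketch. It is not the Ceph code and does not use boost::statechart. PGSketch, AdvMapEvent, inner_react and outer_react are hypothetical names, and "interval change" is reduced to "the primary changed", which is a simplification of should_restart_peering(). It also assumes the nested handler forwards the event to the outer handler on both paths.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names exist in Ceph.
struct AdvMapEvent {
  int old_primary;                       // primary in the previous map
  int new_primary;                       // primary in the new map
  std::vector<int> newly_removed_snaps;  // snaps removed in the new map
};

struct PGSketch {
  std::vector<int> snap_trimq;     // snaps queued for trimming
  std::vector<std::string> trace;  // records which handlers ran, in order

  // Simplified stand-in for should_restart_peering(): treat any change
  // of primary as an interval change.
  bool interval_changed(const AdvMapEvent& ev) const {
    return ev.old_primary != ev.new_primary;
  }
};

// Outer-state handler: the one that detects the interval change.
void outer_react(PGSketch& pg, const AdvMapEvent& ev) {
  pg.trace.push_back(pg.interval_changed(ev) ? "outer: restart peering"
                                             : "outer: no change");
}

// Nested-state handler: it sees the event before the outer handler does.
void inner_react(PGSketch& pg, const AdvMapEvent& ev) {
  if (pg.interval_changed(ev)) {
    // Fast return: skip all per-map work and hand the event outward,
    // analogous to `return forward_event();` in the diff.
    pg.trace.push_back("inner: fast return");
    outer_react(pg, ev);
    return;
  }
  // From here on the handler may rely on the interval being unchanged.
  assert(!pg.interval_changed(ev));
  pg.trace.push_back("inner: process snaps");
  pg.snap_trimq.insert(pg.snap_trimq.end(),
                       ev.newly_removed_snaps.begin(),
                       ev.newly_removed_snaps.end());
  outer_react(pg, ev);  // assumed: the event is still forwarded afterwards
}
```

Calling inner_react with differing primaries leaves snap_trimq untouched and records only the fast return followed by the outer handler. With equal primaries, the removed snaps are queued before the event is forwarded.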

Member

Sounds good.

  ldout(pg->cct, 10) << "Active advmap" << dendl;
  if (!pg->pool.newly_removed_snaps.empty()) {
    pg->snap_trimq.union_of(pg->pool.newly_removed_snaps);