osd/PGPool::update: optimize with deleting_snaps #18147
Conversation
Add a pg_pool_t.deleting_snaps member that tracks the current set of snaps to be purged, in order to avoid expensive operations involving the entire set of removed snaps. The snaps are trimmed from deleting snaps at the same time as they are added to the pg_info_t.purged_snaps set. Suggested-by: Greg Farnum <gfarnum@redhat.com> Signed-off-by: Zac Medico <zmedico@gmail.com>
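A rough sketch of the bookkeeping this commit message describes, using std::set as a stand-in for Ceph's interval_set (the struct and member names here are hypothetical illustrations, not the actual pg_pool_t/pg_info_t code):

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the state described above: snaps move from
// the to-be-purged set into the purged set as the trimmer processes them,
// so the to-be-purged set stays small instead of growing with the entire
// history of removed snaps.
struct pool_snap_state {
  std::set<unsigned> deleting_snaps;  // snaps still queued for purging
  std::set<unsigned> purged_snaps;    // snaps already purged

  void queue_removal(unsigned snap) { deleting_snaps.insert(snap); }

  // Called when the snap trimmer finishes purging one snap: trim it from
  // deleting_snaps at the same time it is recorded as purged.
  void mark_purged(unsigned snap) {
    deleting_snaps.erase(snap);
    purged_snaps.insert(snap);
  }
};
```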
Force-pushed from 5dc4bff to 31d20fc
@@ -91,6 +91,7 @@ DEFINE_CEPH_FEATURE(15, 1, MONENC)
 DEFINE_CEPH_FEATURE_RETIRED(16, 1, QUERY_T, JEWEL, LUMINOUS)
 DEFINE_CEPH_FEATURE(16, 3, SERVER_O)
 DEFINE_CEPH_FEATURE_RETIRED(17, 1, INDEP_PG_MAP, JEWEL, LUMINOUS)
+DEFINE_CEPH_FEATURE(17, 1, OSD_DELETING_SNAPS)
we may want to piggy-back this on SERVER_M to avoid spending a distinct bit.
I don't think we need the feature bit at all. We only use it when encoding and decoding the pg_pool_t, but we can do that entirely based on the local struct_v. :)
What information do we use to choose between 26 and 27 here, if we don't use a flag like OSD_DELETING_SNAPS or SERVER_M?
uint8_t v = 27;
if (!HAVE_FEATURE(features, OSD_DELETING_SNAPS)) {
  v = 26;
}
If we go this route I think we do need feature-dependent decoding. We have a newish requirement that the canonical OSDMaps encoded at the monitor be encoded in the way that the oldest osd in the cluster would encode (to avoid divergence and the need for OSDs to pull full osdmaps from the mon). And pg_pool_t is in the OSDMap. We should just reuse SERVER_MIMIC for this.
But.. I'm not so sure we want to go down this path. I agree that it's a big win compared to the status quo, but once we have a more complete fix I think it will be needlessly complex. I think the question is whether we expect to address that in the mimic timeframe. Started pondering that... let's discuss next week?
My other thought is that if the plan is to make purged_snaps only reflect snaps that have been purged recently but not yet retired into some other non-OSDMap-resident structure, then the sets will be small and this might not be that helpful.
src/osd/osd_types.cc
Outdated
@@ -1310,6 +1310,14 @@ bool pg_pool_t::maybe_updated_removed_snaps(const interval_set<snapid_t>& cached
   return true;
 }
 
+void pg_pool_t::purged_snap(const snapid_t& purged) {
+  deleting_snaps.erase(purged);
It looks like this needs to check if deleting_snaps contains the purged snap, since I have an OSD that crashes during startup as follows:
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
1: (()+0x9fbb3e) [0x562b96be1b3e]
2: (()+0x10ea0) [0x7f0753f13ea0]
3: (gsignal()+0x38) [0x7f075328a148]
4: (abort()+0x16a) [0x7f075328b5ca]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x562b96c2489e]
6: (pg_pool_t::purged_snap(snapid_t const&)+0x245) [0x562b968b64f5]
7: (PrimaryLogPG::AwaitAsyncWork::react(PrimaryLogPG::DoSnapWork const&)+0x105a) [0x562b96803fca]
8: (boost::statechart::simple_state<PrimaryLogPG::AwaitAsyncWork, PrimaryLogPG::Trimming, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xca) [0x562b96854a3a]
9: (boost::statechart::state_machine<PrimaryLogPG::SnapTrimmer, PrimaryLogPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x56) [0x562b96835146]
10: (PrimaryLogPG::snap_trimmer(unsigned int)+0x19c) [0x562b967a163c]
11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x100c) [0x562b9666987c]
12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x562b96c29504]
13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562b96c2c540]
14: (()+0x73e4) [0x7f0753f0a3e4]
15: (clone()+0x6d) [0x7f075333ea9d]
This will fix the following crash during OSD startup (see the stack trace above).

Fixes: 31d20fc ("osd/PGPool::update: optimize with deleting_snaps")
Signed-off-by: Zac Medico <zmedico@gmail.com>
src/osd/osd_types.cc
Outdated
@@ -1311,7 +1311,8 @@ bool pg_pool_t::maybe_updated_removed_snaps(const interval_set<snapid_t>& cached
 }
 
 void pg_pool_t::purged_snap(const snapid_t& purged) {
-  deleting_snaps.erase(purged);
+  if (deleting_snaps.contains(purged, 1))
+    deleting_snaps.erase(purged);
It may be worth doing the iterator-based find() and erase, instead of having to look it up twice.
Normally I wouldn't fuss about something like that, but since you're having CPU issues dealing with the size of these sets... ;)
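For illustration, the single-lookup pattern looks like this on an STL set (std::set stands in for interval_set, whose API differs; erase_if_present is a hypothetical helper, not Ceph code):

```cpp
#include <cassert>
#include <set>

// Erase `snap` only if present, using one tree lookup instead of a
// contains() check followed by a second lookup inside erase().
template <typename T>
bool erase_if_present(std::set<T>& snaps, const T& snap) {
  auto it = snaps.find(snap);  // single O(log n) lookup
  if (it == snaps.end())
    return false;              // already gone; nothing to do
  snaps.erase(it);             // erase via iterator, no second lookup
  return true;
}
```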
Sounds good. However, in order to do this cleanly, it looks like we'll have to change the interval_set::erase
signature to void erase(T start, T len, bool ignore_missing=false)
or something like that.
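A toy illustration of that signature change (this is not Ceph's interval_set; it is a minimal map-based container that handles only exact-interval erases, which is enough to show the proposed ignore_missing flag):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy interval container: start -> length. Only exact start matches are
// handled here, enough to sketch the proposed erase() signature.
struct toy_interval_set {
  std::map<uint64_t, uint64_t> m;

  void insert(uint64_t start, uint64_t len) { m[start] = len; }

  bool contains(uint64_t start, uint64_t len) const {
    auto it = m.find(start);
    return it != m.end() && it->second >= len;
  }

  // Proposed shape: erasing an absent interval asserts by default
  // (preserving the old behavior for existing callers), but becomes
  // a no-op when ignore_missing is set.
  void erase(uint64_t start, uint64_t len, bool ignore_missing = false) {
    (void)len;  // exact-match toy: length is not split or merged
    auto it = m.find(start);
    if (it == m.end()) {
      assert(ignore_missing);
      return;
    }
    m.erase(it);
  }
};
```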
Oh boo, I was just thinking in terms of the STL. Presumably we'd want to add an iterator-based erase() rather than remove the existing interface though.
So based on the name I thought this PR was the (much more extensive!) thing involving the monitor and actually reducing the size of the stuff in the OSDMap. But going through here, it's actually "just" reducing the size of the set used for doing pg-local comparisons against the OSDMap. So, that is presumably actually useful. But it's also not reducing the memory used at all (in fact, it's inflating it a bit), which I thought was also part of what you wanted out of doing the local caching like this. Do you have numbers on how this changes CPU and memory use?
I agree that it would be better to work toward reducing the size of the OSDMap. Given that, I'm not sure that it's really worthwhile to attempt to measure the benefits of this patch. Maybe it's better to simply abandon it. My feeling is that the other optimizations that have recently been merged (like #17410 and #17820) are probably good enough for anyone in need of short-term solutions.
I may have sounded more down on this than I meant. The patch is certainly a good direction to go. I just want to make sure we're not making things worse for these extreme use cases, and that if we're going to complicate the code it's extensible for the future.
src/osd/PG.cc
Outdated
 interval_set<snapid_t> removed_snaps = newly_removed_snaps;
 newly_removed_snaps.subtract(cached_removed_snaps);
 cached_removed_snaps.swap(removed_snaps);
+if (cached_deleting_snaps.subset_of(pi->get_deleting_snaps())) {
I suspect that the subset_of logic is not quite right here, since deleting_snaps is expected to shrink when snaps are purged.
The deleting_snaps set is expected to shrink when a snap is purged, so cached_deleting_snaps is typically not expected to be a subset of the pool's current deleting_snaps.

Fixes: 31d20fc ("osd/PGPool::update: optimize with deleting_snaps")
Signed-off-by: Zac Medico <zmedico@gmail.com>
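To make the direction of the containment concrete, here std::includes over std::set stands in for interval_set::subset_of (a simplified sketch, not the Ceph implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// True if every element of `a` is also in `b` (a is a subset of b).
// std::includes requires sorted ranges; std::set iterates in order.
bool subset_of(const std::set<int>& a, const std::set<int>& b) {
  return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

Once a snap has been purged, the pool's current deleting set has shrunk, so it is a subset of the cached copy rather than the other way around, and the check in the diff above tests the wrong direction.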
Force-pushed from 0afbb6d to aa70ee6
Closing in favor of #18276.