
osd/PGPool::update: optimize with deleting_snaps #18147

Closed
wants to merge 3 commits into from

Conversation

zmedico
Contributor

@zmedico zmedico commented Oct 6, 2017

Add a pg_pool_t.deleting_snaps member that tracks
the current set of snaps to be purged, in order to
avoid expensive operations involving the entire
set of removed snaps. Snaps are trimmed from
deleting_snaps at the same time as they are added
to the pg_info_t.purged_snaps set.

Suggested-by: Greg Farnum

@gregsfortytwo @branch-predictor

Add a pg_pool_t.deleting_snaps member that tracks
the current set of snaps to be purged, in order to
avoid expensive operations involving the entire
set of removed snaps. Snaps are trimmed from
deleting_snaps at the same time as they are added
to the pg_info_t.purged_snaps set.

Suggested-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Zac Medico <zmedico@gmail.com>
@@ -91,6 +91,7 @@ DEFINE_CEPH_FEATURE(15, 1, MONENC)
 DEFINE_CEPH_FEATURE_RETIRED(16, 1, QUERY_T, JEWEL, LUMINOUS)
 DEFINE_CEPH_FEATURE(16, 3, SERVER_O)
 DEFINE_CEPH_FEATURE_RETIRED(17, 1, INDEP_PG_MAP, JEWEL, LUMINOUS)
+DEFINE_CEPH_FEATURE(17, 1, OSD_DELETING_SNAPS)
Member

we may want to piggy-back this on SERVER_M to avoid spending a distinct bit.

Member

I don't think we need the feature bit at all. We only use it when encoding and decoding the pg_pool_t, but we can do that entirely based on the local struct_v. :)

Contributor Author

What information do we use to choose between 26 and 27 here, if we don't use a flag like OSD_DELETING_SNAPS or SERVER_M?

  uint8_t v = 27;
  if (!HAVE_FEATURE(features, OSD_DELETING_SNAPS)) {
    v = 26;
  }
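The two options under discussion can be contrasted in a small sketch: an encoder has to pick a wire version from the negotiated feature bits up front, while a decoder can branch purely on the struct_v it reads back from the stream. This is an illustrative standalone sketch, not Ceph's actual encode/decode machinery; the function names and the feature-bit constant here are hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for the OSD_DELETING_SNAPS feature bit (bit 17).
constexpr uint64_t OSD_DELETING_SNAPS = 1ull << 17;

// Encoding: the version must be chosen from the peer's feature set,
// because the encoder cannot know otherwise what the peer can decode.
uint8_t pick_encode_version(uint64_t peer_features) {
  return (peer_features & OSD_DELETING_SNAPS) ? 27 : 26;
}

// Decoding: the struct_v stored in the stream is sufficient on its own;
// no feature bit is needed to decide whether the new field is present.
bool decode_has_deleting_snaps(uint8_t struct_v) {
  return struct_v >= 27;
}
```

This is why the feature bit is only strictly needed on the encode side: the decode side is self-describing via struct_v.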

Member

If we go this route I think we do need feature-dependent decoding. We have a newish requirement that the canonical OSDMaps encoded at the monitor be encoded in the way that the oldest osd in the cluster would encode (to avoid divergence and the need for OSDs to pull full osdmaps from the mon). And pg_pool_t is in the OSDMap. We should just reuse SERVER_MIMIC for this.

But.. I'm not so sure we want to go down this path. I agree that it's a big win compared to the status quo, but once we have a more complete fix I think it will be needlessly complex. I think the question is whether we expect to address that in the mimic timeframe. Started pondering that... let's discuss next week?

@liewegas
Member

liewegas commented Oct 6, 2017

My other thought is that if the plan is to make purged_snaps only reflect snaps that have been purged recently but not yet retired into some other non-OSDMap-resident structure, then the sets will be small and this might not be that helpful.

@@ -1310,6 +1310,14 @@ bool pg_pool_t::maybe_updated_removed_snaps(const interval_set<snapid_t>& cached
   return true;
 }
 
+void pg_pool_t::purged_snap(const snapid_t& purged) {
+  deleting_snaps.erase(purged);
Contributor Author

It looks like this needs to check if deleting_snaps contains the purged snap, since I have an OSD that crashes during startup as follows:

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0x9fbb3e) [0x562b96be1b3e]
 2: (()+0x10ea0) [0x7f0753f13ea0]
 3: (gsignal()+0x38) [0x7f075328a148]
 4: (abort()+0x16a) [0x7f075328b5ca]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x562b96c2489e]
 6: (pg_pool_t::purged_snap(snapid_t const&)+0x245) [0x562b968b64f5]
 7: (PrimaryLogPG::AwaitAsyncWork::react(PrimaryLogPG::DoSnapWork const&)+0x105a) [0x562b96803fca]
 8: (boost::statechart::simple_state<PrimaryLogPG::AwaitAsyncWork, PrimaryLogPG::Trimming, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xca) [0x562b96854a3a]
 9: (boost::statechart::state_machine<PrimaryLogPG::SnapTrimmer, PrimaryLogPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x56) [0x562b96835146]
 10: (PrimaryLogPG::snap_trimmer(unsigned int)+0x19c) [0x562b967a163c]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x100c) [0x562b9666987c]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x562b96c29504]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562b96c2c540]
 14: (()+0x73e4) [0x7f0753f0a3e4]
 15: (clone()+0x6d) [0x7f075333ea9d]

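The crash happens because interval_set's erase asserts that the range being removed is actually present, so erasing a snap that is no longer in deleting_snaps aborts the OSD. A minimal illustration of the guarded pattern, using std::set in place of Ceph's interval_set (the container and function name here are illustrative only):

```cpp
#include <set>

// Sketch of the guarded erase: only remove the purged snap if it is
// still tracked, mirroring the contains()-before-erase() fix. With
// Ceph's interval_set, an unguarded erase of a missing element would
// trip an assertion instead of being a no-op.
void purged_snap_guarded(std::set<unsigned>& deleting_snaps, unsigned purged) {
  if (deleting_snaps.count(purged))
    deleting_snaps.erase(purged);
}
```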
This will fix the crash during OSD startup reported above (ceph version 12.2.0 (32ce2a3) luminous (rc); same backtrace through pg_pool_t::purged_snap).

Fixes: 31d20fc ("osd/PGPool::update: optimize with deleting_snaps")
Signed-off-by: Zac Medico <zmedico@gmail.com>
@@ -1311,7 +1311,8 @@ bool pg_pool_t::maybe_updated_removed_snaps(const interval_set<snapid_t>& cached
 }
 
 void pg_pool_t::purged_snap(const snapid_t& purged) {
-  deleting_snaps.erase(purged);
+  if (deleting_snaps.contains(purged, 1))
+    deleting_snaps.erase(purged);
Member

It may be worth doing the iterator-based find() and erase, instead of having to look it up twice.
Normally I wouldn't fuss about something like that, but since you're having CPU issues dealing with the size of these sets... ;)
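In STL terms, the single-lookup version the reviewer has in mind looks like the following (shown with std::set, since interval_set's interface differs and would need the signature change discussed below; this is an illustrative sketch, not the PR's code):

```cpp
#include <set>

// One lookup instead of two: find() locates the element once, and
// erase-by-iterator removes it without re-searching the tree.
void purged_snap_one_lookup(std::set<unsigned>& deleting_snaps, unsigned purged) {
  auto it = deleting_snaps.find(purged);
  if (it != deleting_snaps.end())
    deleting_snaps.erase(it);
}
```

For large sets this halves the number of tree traversals compared with a contains()-then-erase() pair, which is the point being made about CPU cost.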

Contributor Author

Sounds good. However, in order to do this cleanly, it looks like we'll have to change the interval_set::erase signature to void erase(T start, T len, bool ignore_missing=false) or something like that.

Member

Oh boo, I was just thinking in terms of the STL. Presumably we'd want to add an iterator-based erase() rather than remove the existing interface, though.

@gregsfortytwo
Member

So based on the name I thought this PR was the (much more extensive!) thing involving the monitor and actually reducing the size of the stuff in the OSDMap. But going through here, it's actually "just" reducing the size of the set used for doing pg-local comparisons against the OSDMap.

So, that is presumably actually useful. But it's also not reducing memory use at all (in fact, it's inflating it a bit), which I thought was also part of what you wanted out of doing the local caching like this. Do you have numbers on how this change affects CPU and memory use?

@zmedico
Contributor Author

zmedico commented Oct 6, 2017

I agree that it would be better to work toward reducing the size of the OSDMap. Given that, I'm not sure that it's really worthwhile to attempt to measure the benefits of this patch. Maybe it's better to simply abandon it.

My feeling is that the other optimizations that have recently been merged (like #17410 and #17820) are probably good enough for anyone in need of short-term solutions.

@gregsfortytwo
Member

I may have sounded more down on this than I meant. The patch is certainly a good direction to go. I just want to make sure we're not making things worse for these extreme use cases, and that if we're going to complicate the code it's extensible for the future.

src/osd/PG.cc Outdated
  interval_set<snapid_t> removed_snaps = newly_removed_snaps;
  newly_removed_snaps.subtract(cached_removed_snaps);
  cached_removed_snaps.swap(removed_snaps);
  if (cached_deleting_snaps.subset_of(pi->get_deleting_snaps())) {
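The direction problem can be seen concretely: once a snap is purged, the pool's deleting_snaps shrinks, so the cached copy becomes a superset of the new value, and a subset_of check in this direction fails. A small illustration using std::includes in place of interval_set::subset_of (names are illustrative):

```cpp
#include <algorithm>
#include <set>

// subset_of(a, b): true iff every element of a is also in b,
// mirroring the semantics of interval_set::subset_of.
bool subset_of(const std::set<unsigned>& a, const std::set<unsigned>& b) {
  return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

With cached = {1,2,3} and the pool's current set = {1,3} after snap 2 is purged, the cached set is not a subset of the current set, but the current set is a subset of the cached one.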
Contributor Author

I suspect that the subset_of logic is not quite right here, since deleting_snaps is expected to shrink when snaps are purged.

The deleting_snaps set is expected to shrink when a
snap is purged, so cached_deleting_snaps is typically
not expected to be a subset of the pool's current
deleting_snaps set.

Fixes: 31d20fc ("osd/PGPool::update: optimize with deleting_snaps")
Signed-off-by: Zac Medico <zmedico@gmail.com>
@zmedico
Contributor Author

zmedico commented Nov 17, 2017

Closing in favor of #18276.

@zmedico zmedico closed this Nov 17, 2017