osd/PGPool::update: optimize with subset_of #17820
I need to benchmark this. It looks like subset_of might need to be optimized, since it calls contains, which uses the find_inc/lower_bound method. In order for subset_of to be viable, it would need to dynamically choose between sequential search and lower_bound methods, like the intersect_of implementation since #17088.
2e77d46 to 3e14687
I've added an optimized subset_of implementation, and my benchmark program is showing a 38% improvement in performance.
c224030 to aad01d5
I've updated unittest_interval_set to cover all cases of the |
ceph#17820 (1 of 2) Optimize subset_of to use sequential search when it performs better than the lower_bound method, for set size ratios smaller than 10. This is analogous to intersection_of behavior since commit 825470f. The subset_of method can be used in some cases as a less-expensive alternative to the intersection_of method, since subset_of can return early if any element of the smaller set is not contained in the larger set, and intersection_of has the added burden of storing the intersecting elements. Signed-off-by: Zac Medico <zmedico@gmail.com>
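The dispatch described in this commit message can be sketched roughly as follows. This is only an illustration, assuming intervals are stored in a std::map from start to length; IntervalMap, contains_range, subset_lookup, subset_sequential, subset_of_sketch and kRatioThreshold are hypothetical names, not the actual interval_set code. Only the threshold of 10 comes from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set: start -> length. Intervals are
// assumed non-empty, sorted, and non-overlapping.
using IntervalMap = std::map<std::uint64_t, std::uint64_t>;

// True if [start, start + len) lies inside a single interval of b.
// One tree lookup, O(log |b|) per call.
inline bool contains_range(const IntervalMap &b, std::uint64_t start,
                           std::uint64_t len) {
  auto it = b.upper_bound(start);  // first interval starting after `start`
  if (it == b.begin())
    return false;                  // nothing starts at or before `start`
  --it;                            // last interval starting at or before `start`
  return start + len <= it->first + it->second;
}

// Lookup strategy: one tree search per interval of a.
inline bool subset_lookup(const IntervalMap &a, const IntervalMap &b) {
  for (const auto &iv : a)
    if (!contains_range(b, iv.first, iv.second))
      return false;
  return true;
}

// Sequential strategy: walk both maps once, O(|a| + |b|).
inline bool subset_sequential(const IntervalMap &a, const IntervalMap &b) {
  auto pb = b.begin();
  for (const auto &iv : a) {
    // Skip intervals of b that end at or before the start of iv.
    while (pb != b.end() && pb->first + pb->second <= iv.first)
      ++pb;
    if (pb == b.end() || iv.first < pb->first ||
        iv.first + iv.second > pb->first + pb->second)
      return false;
  }
  return true;
}

// Pick a strategy from the ratio of interval counts (assumed threshold: 10).
inline bool subset_of_sketch(const IntervalMap &a, const IntervalMap &b) {
  constexpr std::size_t kRatioThreshold = 10;
  if (a.empty())
    return true;
  if (b.size() / a.size() < kRatioThreshold)
    return subset_sequential(a, b);
  return subset_lookup(a, b);
}
```

With sets of similar size the single linear walk avoids repeated tree descents; when b is much larger than a, a few logarithmic lookups touch far fewer nodes than a full walk.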
ceph#17820 (2 of 2) Replace expensive interval_set intersection_of and operator== calls with a single subset_of call. I borrowed this idea from Piotr Dałek's "osd/PGPool: don't use intermediate interval set" patch. The following benchmark program demonstrates a 38% performance increase:

    #include <cassert>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include "include/interval_set.h"

    #define NANOSECONDS(d) \
      std::chrono::duration_cast<std::chrono::nanoseconds>(d).count()

    typedef uint64_t snapid_t;
    typedef std::chrono::steady_clock::duration duration;

    // Old logic: build the intersection, compare it, then subtract it.
    duration PGPool_update_old(const interval_set<snapid_t> &rs) {
      std::chrono::steady_clock::time_point start, end;
      interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

      // initialize state
      cached_removed_snaps = rs;

      // start timed simulation
      start = std::chrono::steady_clock::now();
      {
        newly_removed_snaps = cached_removed_snaps;
        interval_set<snapid_t> intersection;
        intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);
        assert(intersection == cached_removed_snaps);
        cached_removed_snaps.swap(newly_removed_snaps);
        newly_removed_snaps = cached_removed_snaps;
        newly_removed_snaps.subtract(intersection);
      }
      // end timed simulation
      end = std::chrono::steady_clock::now();
      return end - start;
    }

    // New logic: a single subset_of call replaces intersection_of and operator==.
    duration PGPool_update_new(const interval_set<snapid_t> &rs) {
      std::chrono::steady_clock::time_point start, end;
      interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

      // initialize state
      cached_removed_snaps = rs;

      // start timed simulation
      start = std::chrono::steady_clock::now();
      {
        newly_removed_snaps = cached_removed_snaps;
        assert(cached_removed_snaps.subset_of(newly_removed_snaps));
        interval_set<snapid_t> removed_snaps = newly_removed_snaps;
        newly_removed_snaps.subtract(cached_removed_snaps);
        cached_removed_snaps.swap(removed_snaps);
      }
      // end timed simulation
      end = std::chrono::steady_clock::now();
      return end - start;
    }

    int main(int argc, char *argv[]) {
      assert(argc == 3);
      const int sample_count = std::stoi(argv[1]);
      const int interval_count = std::stoi(argv[2]);
      const int interval_distance = 4;
      const int interval_size = 2;
      const int max_offset = interval_count * interval_distance;

      interval_set<snapid_t> removed_snaps;
      for (int i = 0; i < max_offset; i += interval_distance)
        removed_snaps.insert(i, interval_size);

      duration old_delta(0), new_delta(0);
      for (int i = 0; i < sample_count; ++i) {
        old_delta += PGPool_update_old(removed_snaps);
        new_delta += PGPool_update_new(removed_snaps);
      }

      // ratio of old to new total time
      float ratio = float(NANOSECONDS(old_delta)) / NANOSECONDS(new_delta);
      std::cout << ratio << std::endl;
    }

Suggested-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Signed-off-by: Zac Medico <zmedico@gmail.com>
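The replacement relies on the identity that A is a subset of B exactly when the intersection of A and B equals A. As a rough illustration using std::set<int> and standard algorithms rather than interval_set (the function names subset_via_intersection and subset_direct are hypothetical), the two shapes compare like this:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Old shape: materialize the intersection, then compare it with `a`.
inline bool subset_via_intersection(const std::set<int> &a,
                                    const std::set<int> &b) {
  std::set<int> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(common, common.begin()));
  return common == a;
}

// New shape: a single pass that stores nothing and can return false as
// soon as an element of `a` is found to be missing from `b`.
inline bool subset_direct(const std::set<int> &a, const std::set<int> &b) {
  // std::includes(range1, range2) tests whether range2 is contained in range1.
  return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

Both functions return the same answer for any pair of sets; the second simply avoids building and comparing a temporary container.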
Looks good except for a missing bounds check.
src/include/interval_set.h (outdated)
    while (pa != a_end && pb != b_end) {

      if (pb->first + pb->second <= pa->first)
        { ++pb; continue; }
This will break if you give it input where pb runs through to the end.
The pb != b_end bound is checked by the outer while loop. I did think about using an inner while loop here, but it didn't seem worth it since it only optimizes away one pa != a_end comparison.
I'll go ahead and change it to a while loop with bounds check.
aad01d5 to 710e253
Added the following bounds check:
while (pb->first + pb->second <= pa->first) {
  ++pb;
  if (pb == b_end)
    return false;
}
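The same idea can be written as a small helper. This is only a sketch under the assumption that intervals live in a std::map from start to length; IntervalMap and skip_before are hypothetical names. Unlike the loop above, which relies on the outer loop having already checked pb, this variant tests the bound before every dereference.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using IntervalMap = std::map<std::uint64_t, std::uint64_t>;  // start -> length

// Advance pb past every interval that ends at or before `start`.
// Returns false if pb reaches b_end, so the caller never dereferences it.
inline bool skip_before(IntervalMap::const_iterator &pb,
                        IntervalMap::const_iterator b_end,
                        std::uint64_t start) {
  while (pb != b_end && pb->first + pb->second <= start)
    ++pb;
  return pb != b_end;
}
```

A caller would return false from the subset check whenever skip_before returns false, since no remaining interval of the larger set can cover the current one.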
Heh, I guess since I misread it that's a good enough reason to change — thanks!
I now believe this is functional but I noted another confusing point...
src/include/interval_set.h (outdated)
      b_end = b.m.end();

    while (pa != a_end && pb != b_end) {
No big deal but FYI for next time: we don't usually put that much whitespace in logic flows like this.
Okay, I've removed some whitespace in the latest update. Let me know how it looks.
src/include/interval_set.h (outdated)
      }

      if (pa->first < pb->first)
        return false;
Sorry, should have noticed this before... this check is stronger than if (pa->first + pa->second <= pb->first), but I don't think it needs to follow.
Why not just use it to replace the prior one? I think that will make it clearer over all:
- skip through any extra intervals in (what we assume to be) the longer set, if it doesn't overlap the current shorter set
- if the shorter set has an interval starting point which we just found out isn't covered by the longer set, return false
- if the intervals are equal, iterate to the next one in each set until the intervals aren't equal (and we haven't run out), then start over from the top
- if the shorter set's interval is longer than the longer set's, return false
- or else move to the next interval in the shorter set and loop to the top
Instead of what looks like 1, weak 2, 3, strong 2, 4, 5 as written. :)
Both pa->first < pb->first and pa->first + pa->second <= pb->first are distinct cases that need to be handled.
pa->first < pb->first means the shorter set's interval begins before the other interval.
pa->first + pa->second <= pb->first means the shorter set's interval ends before the other interval begins.
So, there's no redundancy in the existing logic. It's tricky enough to visualize that maybe we should have comments for each case, like:
// interval begins before other
if (pa->first < pb->first)
  return false;
// interval ends before other begins
if (pa->first + pa->second <= pb->first)
  return false;
// interval is longer than other
if (pa->first + pa->second > pb->first + pb->second)
  return false;
I'd also like to add an earlier range_end() comparison back in the subset_of method.
I see it now, "interval ends before other begins" is a case of "interval begins before other", so I've removed the redundant case.
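The redundancy follows from interval lengths being at least 1: if pa->first + pa->second <= pb->first and pa->second >= 1, then pa->first < pb->first. A brute-force sketch of that implication over small values, using a hypothetical Interval struct in place of the map entries:

```cpp
#include <cassert>
#include <cstdint>

struct Interval {        // hypothetical stand-in for a map entry
  std::uint64_t first;   // start
  std::uint64_t second;  // length, assumed >= 1
};

// "interval begins before other"
inline bool begins_before(const Interval &a, const Interval &b) {
  return a.first < b.first;
}

// "interval ends before other begins"
inline bool ends_before_begin(const Interval &a, const Interval &b) {
  return a.first + a.second <= b.first;
}

// Checks every small pair; returns true if ends_before_begin always
// implied begins_before.
inline bool implication_holds(std::uint64_t limit) {
  for (std::uint64_t as = 0; as < limit; ++as)
    for (std::uint64_t al = 1; al <= limit; ++al)
      for (std::uint64_t bs = 0; bs < limit; ++bs) {
        Interval a{as, al}, b{bs, 1};
        if (ends_before_begin(a, b) && !begins_before(a, b))
          return false;
      }
  return true;
}
```

The converse does not hold: an interval starting at 3 with length 4 begins before one starting at 5 but does not end before it, which is why the "begins before" test is the one to keep.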
Let's get it into the next testing run somebody does as-is though, in case I'm missing something.
710e253 to 11bba1c
Updates:
|
src/include/interval_set.h (outdated)
@@ -286,6 +286,43 @@ class interval_set {
      }
    }
  }

  bool subset_size_sym(const interval_set &b) const {
    typename decltype(m)::const_iterator pa = m.begin(),
might want to use
auto pa = m.cbegin(), pb = b.m.cbegin();
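A small sketch of why the shorter form is equivalent, assuming the member m behaves like a std::map (IntervalMap and first_start are hypothetical names): cbegin() always yields a const_iterator, so auto deduces the same type that the longer declaration spelled out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <type_traits>
#include <utility>

using IntervalMap = std::map<std::uint64_t, std::uint64_t>;  // start -> length

// cbegin() returns const_iterator even when called on a non-const map.
static_assert(std::is_same<decltype(std::declval<IntervalMap &>().cbegin()),
                           IntervalMap::const_iterator>::value,
              "cbegin() returns const_iterator");

// Returns the start of the first interval, or 0 for an empty map.
inline std::uint64_t first_start(const IntervalMap &m) {
  auto pa = m.cbegin(), pe = m.cend();  // both deduce to const_iterator
  return pa != pe ? pa->first : 0;
}
```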
Thanks, done.
23938f7 to 1f9d8a4
Okay, I believe the docs issue was in master and fixed in master, and this looks great. Just pending a testing branch run with it.
1f9d8a4 to 8935f65
Rebased on changes from caf6803.
8935f65 to 18cbba4
RADOS |
ceph#17820 merged in 5c37758: Optimize subset_of to use sequential search when it performs better than the lower_bound method, for set size ratios smaller than 10.
ceph#17820 merged in 18cbba4: Replace expensive interval_set intersection_of and operator== calls with a single subset_of call.