osd/PGPool::update: optimize with subset_of #17820

zmedico · 2017-09-20T02:14:24Z

Replace expensive interval_set intersection_of and operator==
calls with a single subset_of call. I borrowed this idea from
Piotr Dałek's "osd/PGPool: don't use intermediate interval set"
patch. The following benchmark program demonstrates a 38%
performance increase:

#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include "include/interval_set.h"

#define NANOSECONDS(d) \
    std::chrono::duration_cast<std::chrono::nanoseconds>(d).count()

typedef uint64_t snapid_t;
typedef std::chrono::steady_clock::duration duration;

duration PGPool_update_old(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;
    interval_set<snapid_t> intersection;
    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);

    assert(intersection == cached_removed_snaps);
    cached_removed_snaps.swap(newly_removed_snaps);
    newly_removed_snaps = cached_removed_snaps;
    newly_removed_snaps.subtract(intersection);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

duration PGPool_update_new(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;

    assert(cached_removed_snaps.subset_of(newly_removed_snaps));
    interval_set<snapid_t> removed_snaps = newly_removed_snaps;
    newly_removed_snaps.subtract(cached_removed_snaps);
    cached_removed_snaps.swap(removed_snaps);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

int main(int argc, char *argv[])
{
  assert(argc == 3);
  const int sample_count = std::stoi(argv[1]);
  const int interval_count = std::stoi(argv[2]);
  const int interval_distance = 4;
  const int interval_size = 2;
  const int max_offset = interval_count * interval_distance;
  interval_set<snapid_t> removed_snaps;

  for (int i = 0; i < max_offset; i += interval_distance)
    removed_snaps.insert(i, interval_size);

  duration old_delta(0), new_delta(0);

  for (int i = 0; i < sample_count; ++i) {
    old_delta += PGPool_update_old(removed_snaps);
    new_delta += PGPool_update_new(removed_snaps);
  }

  float ratio = float(NANOSECONDS(old_delta)) / NANOSECONDS(new_delta);

  std::cout << ratio << std::endl;
}

@branch-predictor
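The replaced check relies on a simple identity: for interval sets A and B, (A ∩ B) == A holds exactly when A ⊆ B, so the intersection never needs to be built. Below is a minimal sketch of the two checks. It uses a hypothetical `Intervals` map (start -> length) as a stand-in for Ceph's `interval_set` and assumes disjoint, non-adjacent, non-empty intervals. Neither function is Ceph code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length.
// Assumes disjoint, non-adjacent, non-empty intervals.
using Intervals = std::map<uint64_t, uint64_t>;

// Old pattern: materialize A ∩ B, then compare it with A.
bool subset_via_intersection(const Intervals &a, const Intervals &b) {
  Intervals out;
  auto pa = a.begin();
  auto pb = b.begin();
  while (pa != a.end() && pb != b.end()) {
    const uint64_t a_stop = pa->first + pa->second;
    const uint64_t b_stop = pb->first + pb->second;
    const uint64_t lo = std::max(pa->first, pb->first);
    const uint64_t hi = std::min(a_stop, b_stop);
    if (lo < hi)
      out[lo] = hi - lo;  // store the overlapping piece
    if (a_stop < b_stop)  // advance whichever interval ends first
      ++pa;
    else
      ++pb;
  }
  return out == a;
}

// New pattern: check each interval of A against the interval of B that
// starts at or before it, stopping at the first one that is not covered.
bool subset_via_lookup(const Intervals &a, const Intervals &b) {
  for (const auto &iv : a) {
    auto p = b.upper_bound(iv.first);  // first interval starting after iv
    if (p == b.begin())
      return false;                    // nothing in B starts at or before iv
    --p;
    if (p->first + p->second < iv.first + iv.second)
      return false;                    // iv is not fully covered
  }
  return true;
}
```

Both functions agree on inputs like these. The lookup version allocates nothing and can stop at the first uncovered interval. The intersection version stores every overlapping piece before it can compare.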

zmedico · 2017-09-20T02:37:27Z

I need to benchmark this. It looks like subset_of might need to be optimized, since it calls contains, which uses the find_inc/lower_bound method. For subset_of to be viable, it would need to choose dynamically between the sequential search and lower_bound methods, as the intersection_of implementation has done since #17088.
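The lookup-based path mentioned here costs one tree search per interval of the smaller set. That is roughly O(n log m) for n intervals checked against m. A single sequential pass over both sorted sets costs O(n + m), so it tends to win when the sets are of similar size.

Below is a minimal sketch of the lookup path. It assumes a hypothetical `Intervals` map (start -> length) in place of Ceph's `interval_set`. It uses `std::map::upper_bound` for the predecessor search rather than Ceph's own `find_inc`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length.
// Assumes disjoint, non-adjacent, non-empty intervals.
using Intervals = std::map<uint64_t, uint64_t>;

// Is [start, start + len) covered by `set`? One O(log m) tree search.
bool contains(const Intervals &set, uint64_t start, uint64_t len) {
  auto p = set.upper_bound(start);  // first interval starting after `start`
  if (p == set.begin())
    return false;                   // no interval starts at or before `start`
  --p;                              // the only candidate that could cover it
  return p->first + p->second >= start + len;
}

// Lookup-based subset test: n calls to contains(), about O(n log m) in
// total, whatever the ratio between the two set sizes.
bool subset_by_lookup(const Intervals &sub, const Intervals &big) {
  for (const auto &iv : sub)
    if (!contains(big, iv.first, iv.second))
      return false;
  return true;
}
```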

zmedico · 2017-09-22T04:18:40Z

I've added an optimized subset_of implementation, and my benchmark program is showing a 38% improvement in performance.

zmedico · 2017-09-22T20:03:35Z

I've updated unittest_interval_set to cover all cases of the subset_size_sym function.

ceph#17820 (1 of 2)

Optimize subset_of to use sequential search when it performs better than the lower_bound method, for set size ratios smaller than 10. This is analogous to intersection_of behavior since commit 825470f. The subset_of method can be used in some cases as a less-expensive alternative to the intersection_of method, since subset_of can return early if any element of the smaller set is not contained in the larger set, and intersection_of has the added burden of storing the intersecting elements.

Signed-off-by: Zac Medico <zmedico@gmail.com>
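The size-ratio dispatch described in this commit message can be sketched as follows. This is a minimal illustration, not the merged code. Its assumptions are:

- `Intervals` is a hypothetical start -> length map standing in for `interval_set`.
- "Size" means the number of intervals. The commit message does not say which size is compared, so this is an assumption.
- The threshold of 10 comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length.
// Assumes disjoint, non-adjacent, non-empty intervals.
using Intervals = std::map<uint64_t, uint64_t>;

// Sequential strategy: one forward pass over both sets, O(n + m).
static bool subset_sequential(const Intervals &sub, const Intervals &big) {
  auto pb = big.begin();
  for (const auto &iv : sub) {
    // skip intervals of `big` that end at or before iv starts
    while (pb != big.end() && pb->first + pb->second <= iv.first)
      ++pb;
    if (pb == big.end() || pb->first > iv.first ||
        pb->first + pb->second < iv.first + iv.second)
      return false;
  }
  return true;
}

// Lookup strategy: one binary search per interval of `sub`, O(n log m).
static bool subset_lookup(const Intervals &sub, const Intervals &big) {
  for (const auto &iv : sub) {
    auto p = big.upper_bound(iv.first);
    if (p == big.begin())
      return false;
    --p;
    if (p->first + p->second < iv.first + iv.second)
      return false;
  }
  return true;
}

// Choose a strategy from the size ratio (interval counts assumed).
bool subset_of(const Intervals &sub, const Intervals &big) {
  if (sub.empty())
    return true;
  if (big.size() / sub.size() < 10)
    return subset_sequential(sub, big);
  return subset_lookup(sub, big);
}
```

Several small intervals can fit inside one large interval. A count-based `sub.size() > big.size()` shortcut would therefore be wrong in this sketch, so none is used.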

ceph#17820 (2 of 2)

Replace expensive interval_set intersection_of and operator== calls with a single subset_of call. I borrowed this idea from Piotr Dałek's "osd/PGPool: don't use intermediate interval set" patch. The following benchmark program demonstrates a 38% performance increase: (same benchmark program as in the PR description above)

Suggested-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Signed-off-by: Zac Medico <zmedico@gmail.com>

gregsfortytwo

Looks good except for a missing bounds check.

gregsfortytwo · 2017-09-22T23:23:58Z

src/include/interval_set.h

+    while (pa != a_end && pb != b_end) {
+
+      if (pb->first + pb->second <= pa->first)
+        { ++pb;  continue; }


This will break if you give it input where pb runs through to the end.

The pb != b_end bound is checked by the outer while loop. I did think about using an inner while loop here, but it didn't seem worth it, since it only optimizes away one pa != a_end comparison.

I'll go ahead and change it to a while loop with a bounds check.

zmedico · 2017-09-25T19:02:31Z

Added the following bounds check:

      while (pb->first + pb->second <= pa->first) {
        ++pb;
        if (pb == b_end)
          return false;
      }
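Below is a minimal sketch of a sequential walk built around the bounds check above. It assumes a hypothetical `Intervals` map (start -> length) in place of `interval_set`. The checks after the skip loop follow the later review discussion, not any specific revision of the patch. The second test case is the input the reviewer describes, where `pb` runs through to the end.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length.
using Intervals = std::map<uint64_t, uint64_t>;

// Sequential walk shaped like the reviewed loop. The inner while loop skips
// intervals of `b` that end before the current interval of `a` begins, and
// its bounds check keeps it from dereferencing b.end().
bool subset_walk(const Intervals &a, const Intervals &b) {
  auto pa = a.cbegin(), pb = b.cbegin();
  const auto a_end = a.cend(), b_end = b.cend();
  while (pa != a_end && pb != b_end) {
    while (pb->first + pb->second <= pa->first) {
      ++pb;
      if (pb == b_end)
        return false;  // `a` still has an interval but `b` has run out
    }
    if (pa->first < pb->first)
      return false;    // the start of *pa is not covered
    if (pa->first + pa->second > pb->first + pb->second)
      return false;    // *pa extends past the covering interval
    ++pa;
  }
  return pa == a_end;  // true only if every interval of `a` was matched
}
```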

gregsfortytwo

Heh, I guess the fact that I misread it is a good enough reason to change it. Thanks!

I now believe this is functional, but I noted another confusing point...

gregsfortytwo · 2017-09-25T21:01:32Z

src/include/interval_set.h

+                                               b_end = b.m.end();
+
+    while (pa != a_end && pb != b_end) {
+


No big deal, but FYI for next time: we don't usually put that much whitespace in logic flows like this.

Okay, I've removed some whitespace in the latest update. Let me know how it looks.

gregsfortytwo · 2017-09-25T21:12:28Z

src/include/interval_set.h

+      }
+
+      if (pa->first < pb->first)
+        return false;


Sorry, I should have noticed this before: this check is stronger than if (pa->first + pa->second <= pb->first), but I don't think it needs to come after it.

Why not just use it to replace the prior one? I think that will make it clearer overall:

1. skip through any extra intervals in (what we assume to be) the longer set if they don't overlap the current interval of the shorter set
2. if the shorter set has an interval starting point which we just found out isn't covered by the longer set, return false
3. if the intervals are equal, iterate to the next one in each set until the intervals aren't equal (and we haven't run out), then start over from the top
4. if the shorter set's interval is longer than the longer set's, return false
5. or else move to the next interval in the shorter set and loop to the top

Instead of what looks like 1, weak 2, 3, strong 2, 4, 5 as written. :)
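The five steps listed above map onto a loop like the following. This is a hypothetical sketch, not the merged implementation. `subset_five_steps` and `Intervals` are illustrative names, with `Intervals` a start -> length map standing in for `interval_set`. The loop mirrors the proposed ordering of the steps.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length.
// Assumes disjoint, non-adjacent, non-empty intervals.
using Intervals = std::map<uint64_t, uint64_t>;

bool subset_five_steps(const Intervals &a, const Intervals &b) {
  auto pa = a.cbegin(), pb = b.cbegin();
  const auto a_end = a.cend(), b_end = b.cend();
  while (pa != a_end && pb != b_end) {
    // 1. skip intervals of the longer set that end before *pa begins
    while (pb->first + pb->second <= pa->first) {
      ++pb;
      if (pb == b_end)
        return false;
    }
    // 2. the start of *pa is not covered by *pb
    if (pa->first < pb->first)
      return false;
    // 3. identical intervals: advance both while they keep matching
    if (*pa == *pb) {
      do {
        ++pa;
        ++pb;
      } while (pa != a_end && pb != b_end && *pa == *pb);
      continue;
    }
    // 4. *pa runs past the end of *pb
    if (pa->first + pa->second > pb->first + pb->second)
      return false;
    // 5. *pa is covered; move to the next interval of the shorter set
    ++pa;
  }
  return pa == a_end;
}
```

Step 3 can safely advance `pb` past a matching interval. With non-adjacent intervals, the next interval of the shorter set must start beyond that shared end point.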

Both pa->first < pb->first and pa->first + pa->second <= pb->first are distinct cases that need to be handled.

pa->first < pb->first means the shorter set's interval begins before the other interval

pa->first + pa->second <= pb->first means the shorter set's interval ends before the other interval begins

So, there's no redundancy in the existing logic. It's tricky enough to visualize that maybe we should have comments for each case, like:

      // interval begins before other
      if (pa->first < pb->first)
        return false;
      // interval ends before other begins
      if (pa->first + pa->second <= pb->first)
        return false;
      // interval is longer than other
      if (pa->first + pa->second > pb->first + pb->second)
        return false;

I'd also like to add an earlier range_end() comparison back in the subset_of method.

I see it now: "interval ends before other begins" is a case of "interval begins before other", so I've removed the redundant case.
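The redundancy is easy to confirm. With a non-zero length, pa->first < pa->first + pa->second <= pb->first, so an interval that ends before the other begins also begins before it. Below is a small exhaustive check of that implication over a bounded range of values. It assumes non-empty intervals, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every a_start, a_len >= 1 and b_start below `limit`,
// "ends before other begins" (a_start + a_len <= b_start) implies
// "begins before other" (a_start < b_start).
bool ends_before_implies_begins_before(uint64_t limit) {
  for (uint64_t a_start = 0; a_start < limit; ++a_start)
    for (uint64_t a_len = 1; a_len < limit; ++a_len)
      for (uint64_t b_start = 0; b_start < limit; ++b_start) {
        const bool ends_before = a_start + a_len <= b_start;
        const bool begins_before = a_start < b_start;
        if (ends_before && !begins_before)
          return false;  // counterexample found
      }
  return true;
}
```

A zero-length interval would break the implication, which is why the non-empty assumption matters.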

gregsfortytwo · 2017-09-26T00:33:17Z

Let's get it into the next testing run somebody does as-is though, in case I'm missing something.

zmedico · 2017-09-26T07:14:16Z

Updates:

Removed some whitespace
Removed redundant "interval ends before other begins" case, and added comments
Added early range_end() > big.range_end() check to subset_of
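The early range_end() rejection mentioned in these updates can be sketched as below. This is an illustration under assumptions, not Ceph's code:

- `Intervals` is a hypothetical start -> length map.
- `range_end` is a local helper returning one past the last covered offset. It is assumed to match what interval_set::range_end() reports.
- The remaining work falls back to a per-interval lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: start -> length.
// Assumes disjoint, non-adjacent, non-empty intervals.
using Intervals = std::map<uint64_t, uint64_t>;

// One past the last covered offset; `s` must be non-empty.
uint64_t range_end(const Intervals &s) {
  auto last = std::prev(s.end());
  return last->first + last->second;
}

bool subset_of(const Intervals &sub, const Intervals &big) {
  if (sub.empty())
    return true;   // the empty set is a subset of any set
  if (big.empty() || range_end(sub) > range_end(big))
    return false;  // cheap rejection before walking any intervals
  for (const auto &iv : sub) {
    auto p = big.upper_bound(iv.first);
    if (p == big.begin())
      return false;
    --p;
    if (p->first + p->second < iv.first + iv.second)
      return false;
  }
  return true;
}
```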

tchaikov · 2017-09-26T07:23:20Z

src/include/interval_set.h

@@ -286,6 +286,43 @@ class interval_set {
      }
    }
  }
+
+  bool subset_size_sym(const interval_set &b) const {
+    typename decltype(m)::const_iterator pa = m.begin(),


might want to use

auto pa = m.cbegin(), pb = b.m.cbegin();

Thanks, done.

gregsfortytwo · 2017-09-26T17:54:14Z

Okay, I believe the docs issue was in master and fixed in master, and this looks great. Just pending a testing branch run with it.


zmedico · 2017-09-27T00:41:33Z

Rebased on changes from caf6803.


yuriw · 2017-09-28T22:52:10Z

RBD
http://pulpito.ceph.com/yuriw-2017-09-27_15:26:37-rbd-wip-yuri-testing4-2017-09-27-1402-distro-basic-smithi/

http://pulpito.ceph.com/yuriw-2017-09-27_20:29:25-rbd-wip-yuri-testing4-2017-09-27-1402-distro-basic-smithi/

RADOS

http://pulpito.ceph.com/yuriw-2017-09-27_15:37:08-rados-wip-yuri-testing4-2017-09-27-1402-distro-basic-smithi/

http://pulpito.ceph.com/yuriw-2017-09-28_02:30:38-rados-wip-yuri-testing4-2017-09-27-1402-distro-basic-smithi/

ceph#17820 merged in 5c37758 Optimize subset_of to use sequential search when it performs better than the lower_bound method, for set size ratios smaller than 10.

ceph#17820 merged in 18cbba4 Replace expensive interval_set intersection_of and operator== calls with a single subset_of call.

zmedico mentioned this pull request Sep 20, 2017

osd_types: optimized version of pg_pool_t::build_removed_snaps #17493

Closed

zmedico changed the title from PGPool::update: optimize with subset_of to osd/PGPool::update: optimize with subset_of Sep 20, 2017

zmedico mentioned this pull request Sep 20, 2017

osd/PGPool::update: optimize with std::vector for intersection results #17618

Closed

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch 5 times, most recently from 2e77d46 to 3e14687 (September 22, 2017 03:30)

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch 3 times, most recently from c224030 to aad01d5 (September 22, 2017 19:27)

gregsfortytwo requested changes Sep 22, 2017

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from aad01d5 to 710e253 (September 22, 2017 23:41)

gregsfortytwo reviewed Sep 26, 2017

gregsfortytwo added cleanup core needs-qa performance labels Sep 26, 2017

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from 710e253 to 11bba1c (September 26, 2017 07:08)

tchaikov reviewed Sep 26, 2017

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch 2 times, most recently from 23938f7 to 1f9d8a4 (September 26, 2017 08:10)

gregsfortytwo approved these changes Sep 26, 2017

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from 1f9d8a4 to 8935f65 (September 27, 2017 00:38)

zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from 8935f65 to 18cbba4 (September 27, 2017 01:35)

yuriw added the wip-yuri4-testing label Sep 27, 2017

yuriw merged commit 488c6e4 into ceph:master Sep 28, 2017

zmedico mentioned this pull request Oct 6, 2017

osd/PGPool::update: optimize with deleting_snaps #18147

Closed

zmedico mentioned this pull request Sep 6, 2018

luminous: core: PGPool::update optimizations #23969

Merged
