Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/PGPool::update: optimize with subset_of #17820

Merged
merged 2 commits into from Sep 28, 2017

Conversation

zmedico
Copy link
Contributor

@zmedico zmedico commented Sep 20, 2017

Replace expensive inverval_set intersection_of and operator==
calls with a single subset_of call. I borrowed this idea from
Piotr Dałek's "osd/PGPool: don't use intermediate interval set"
patch. The following benchmark program demonstrates a 38%
performance increase:

#include <iostream>
#include <chrono>
#include "include/interval_set.h"

#define NANOSECONDS(d) \
    std::chrono::duration_cast<std::chrono::nanoseconds>(d).count()

typedef uint64_t snapid_t;
typedef std::chrono::steady_clock::duration duration;

duration PGPool_update_old(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;
    interval_set<snapid_t> intersection;
    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);

    assert(intersection == cached_removed_snaps);
    cached_removed_snaps.swap(newly_removed_snaps);
    newly_removed_snaps = cached_removed_snaps;
    newly_removed_snaps.subtract(intersection);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

duration PGPool_update_new(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;

    assert(cached_removed_snaps.subset_of(newly_removed_snaps));
    interval_set<snapid_t> removed_snaps = newly_removed_snaps;
    newly_removed_snaps.subtract(cached_removed_snaps);
    cached_removed_snaps.swap(removed_snaps);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

int main(int argc, char *argv[])
{
  assert(argc == 3);
  const int sample_count = std::stoi(argv[1]);
  const int interval_count = std::stoi(argv[2]);
  const int interval_distance = 4;
  const int interval_size = 2;
  const int max_offset = interval_count * interval_distance;
  interval_set<snapid_t> removed_snaps;

  for (int i = 0; i < max_offset; i += interval_distance)
    removed_snaps.insert(i, interval_size);

  duration old_delta(0), new_delta(0);

  for (int i = 0; i < sample_count; ++i) {
    old_delta += PGPool_update_old(removed_snaps);
    new_delta += PGPool_update_new(removed_snaps);
  }

  float ratio = float(NANOSECONDS(old_delta)) / NANOSECONDS(new_delta);

  std::cout << ratio << std::endl;
}

@branch-predictor

@zmedico zmedico changed the title PGPool::update: optimize with subset_of osd/PGPool::update: optimize with subset_of Sep 20, 2017
@zmedico
Copy link
Contributor Author

zmedico commented Sep 20, 2017

I need to benchmark this. It looks like subset_of might need to be optimized, since it calls contains, which uses the find_inc/lower_bound method. In order for subset_of to be viable, it would need to dynamically choose between sequential search and lower_bound methods, like the intersect_of implementation since #17088.

@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch 5 times, most recently from 2e77d46 to 3e14687 Compare September 22, 2017 03:30
@zmedico
Copy link
Contributor Author

zmedico commented Sep 22, 2017

I've added an optimized subset_of implementation, and my benchmark program is showing a 38% improvement in performance.

@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch 3 times, most recently from c224030 to aad01d5 Compare September 22, 2017 19:27
@zmedico
Copy link
Contributor Author

zmedico commented Sep 22, 2017

I've updated unittest_interval_set to cover all cases of the subset_size_sym function.

zmedico added a commit to zmedico/ceph that referenced this pull request Sep 22, 2017
ceph#17820 (1 of 2)

Optimize subset_of to use sequential search when it
performs better than the lower_bound method, for set
size ratios smaller than 10. This is analogous to
intersection_of behavior since commit 825470f.

The subset_of method can be used in some cases as a
less-expensive alternative to the intersection_of
method, since subset_of can return early if any element
of the smaller set is not contained in the larger set,
and intersection_of has the added burden of storing
the intersecting elements.

Signed-off-by: Zac Medico <zmedico@gmail.com>
zmedico added a commit to zmedico/ceph that referenced this pull request Sep 22, 2017
ceph#17820 (2 of 2)

Replace expensive inverval_set intersection_of and operator==
calls with a single subset_of call. I borrowed this idea from
Piotr Dałek's "osd/PGPool: don't use intermediate interval set"
patch. The following benchmark program demonstrates a 38%
performance increase:

#include <iostream>
#include <chrono>
#include "include/interval_set.h"

#define NANOSECONDS(d) \
    std::chrono::duration_cast<std::chrono::nanoseconds>(d).count()

typedef uint64_t snapid_t;
typedef std::chrono::steady_clock::duration duration;

duration PGPool_update_old(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;
    interval_set<snapid_t> intersection;
    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);

    assert(intersection == cached_removed_snaps);
    cached_removed_snaps.swap(newly_removed_snaps);
    newly_removed_snaps = cached_removed_snaps;
    newly_removed_snaps.subtract(intersection);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

duration PGPool_update_new(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;

    assert(cached_removed_snaps.subset_of(newly_removed_snaps));
    interval_set<snapid_t> removed_snaps = newly_removed_snaps;
    newly_removed_snaps.subtract(cached_removed_snaps);
    cached_removed_snaps.swap(removed_snaps);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

int main(int argc, char *argv[])
{
  assert(argc == 3);
  const int sample_count = std::stoi(argv[1]);
  const int interval_count = std::stoi(argv[2]);
  const int interval_distance = 4;
  const int interval_size = 2;
  const int max_offset = interval_count * interval_distance;
  interval_set<snapid_t> removed_snaps;

  for (int i = 0; i < max_offset; i += interval_distance)
    removed_snaps.insert(i, interval_size);

  duration old_delta(0), new_delta(0);

  for (int i = 0; i < sample_count; ++i) {
    old_delta += PGPool_update_old(removed_snaps);
    new_delta += PGPool_update_new(removed_snaps);
  }

  float ratio = float(NANOSECONDS(old_delta)) / NANOSECONDS(new_delta);

  std::cout << ratio << std::endl;
}

Suggested-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Signed-off-by: Zac Medico <zmedico@gmail.com>
Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for a missing bounds check.

while (pa != a_end && pb != b_end) {

if (pb->first + pb->second <= pa->first)
{ ++pb; continue; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break if you give it input where pb runs through to the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pb != b_end bound is checked by the outer while loop. I did think about using an inner while loop here, but it didn't seem worth it since in only optimizes away one pa != a_end comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go ahead and change it to a while loop with bounds check.

@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from aad01d5 to 710e253 Compare September 22, 2017 23:41
@zmedico
Copy link
Contributor Author

zmedico commented Sep 25, 2017

Added the following bounds check:

      while (pb->first + pb->second <= pa->first) {
        ++pb;
        if (pb == b_end)
          return false;
      }

Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I guess since I misread it that's a good enough reason to change — thanks!

I now believe this is functional but I noted another confusing point...

b_end = b.m.end();

while (pa != a_end && pb != b_end) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal but FYI for next time: we don't usually put that much whitespace in logic flows like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've removed some whitespace in the latest update. Let me know how it looks.

}

if (pa->first < pb->first)
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should have noticed this before...this check is stronger than if (pa->first + pa->second <= pb->first), but I don't think it needs to follow.

Why not just use it to replace the prior one? I think that will make it clearer over all:

  1. skip through any extra intervals in (what we assume to be) the longer set, if it doesn't overlap the current shorter set
  2. if the shorter set has an interval starting point which we just found out isn't covered by the longer set, return false
  3. if the intervals are equal, iterate to the next one in each set until the intervals aren't equal (and we haven't run out), then start over from the top
  4. if the shorter set's interval is longer than the longer set's, return false
  5. or else move to the next interval in the shorter set and loop to the top

Instead of what looks like 1, weak 2, 3, strong 2, 4, 5 as written. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both pa->first < pb->first and pa->first + pa->second <= pb->first are distinct cases that need to be handled.

pa->first < pb->first means the shorter set's interval begins before the other interval

pa->first + pa->second <= pb->first means the shorter set's interval ends before the other interval begins

So, there's no redundancy in the existing logic. It's tricky enough to visualize that maybe we should have comments for each case, like:

      // interval begins before other
      if (pa->first < pb->first)
        return false;

      // interval ends before other begins
      if (pa->first + pa->second <= pb->first)
        return false;

      // interval is longer than other
      if (pa->first + pa->second > pb->first + pb->second)
        return false;

I'd also like to add an earlier range_end() comparison back in the subset_of method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it now, "interval ends before other begins" is a case of "interval begins before other", so I've removed the redundant case.

@gregsfortytwo
Copy link
Member

Let's get it into the next testing run somebody does as-is though, in case I'm missing something.

@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from 710e253 to 11bba1c Compare September 26, 2017 07:08
@zmedico
Copy link
Contributor Author

zmedico commented Sep 26, 2017

Updates:

  • Removed some whitespace
  • Removed redundant "interval ends before other begins" case, and added comments
  • Added early range_end() > big.range_end() check to subset_of

@@ -286,6 +286,43 @@ class interval_set {
}
}
}

bool subset_size_sym(const interval_set &b) const {
typename decltype(m)::const_iterator pa = m.begin(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to use

auto pa = m.cbegin(), pb = b.m.cbegin();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch 2 times, most recently from 23938f7 to 1f9d8a4 Compare September 26, 2017 08:10
@gregsfortytwo
Copy link
Member

Okay, I believe the docs issue was in master and fixed in master, and this looks great. Just pending a testing branch run with it.

Optimize subset_of to use sequential search when it
performs better than the lower_bound method, for set
size ratios smaller than 10. This is analogous to
intersection_of behavior since commit 825470f.

The subset_of method can be used in some cases as a
less-expensive alternative to the intersection_of
method, since subset_of can return early if any element
of the smaller set is not contained in the larger set,
and intersection_of has the added burden of storing
the intersecting elements.

Signed-off-by: Zac Medico <zmedico@gmail.com>
@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from 1f9d8a4 to 8935f65 Compare September 27, 2017 00:38
@zmedico
Copy link
Contributor Author

zmedico commented Sep 27, 2017

Rebased on changes from caf6803.

Replace expensive inverval_set intersection_of and operator==
calls with a single subset_of call. I borrowed this idea from
Piotr Dałek's "osd/PGPool: don't use intermediate interval set"
patch. The following benchmark program demonstrates a 38%
performance increase:

#include <iostream>
#include <chrono>
#include "include/interval_set.h"

#define NANOSECONDS(d) \
    std::chrono::duration_cast<std::chrono::nanoseconds>(d).count()

typedef uint64_t snapid_t;
typedef std::chrono::steady_clock::duration duration;

duration PGPool_update_old(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;
    interval_set<snapid_t> intersection;
    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);

    assert(intersection == cached_removed_snaps);
    cached_removed_snaps.swap(newly_removed_snaps);
    newly_removed_snaps = cached_removed_snaps;
    newly_removed_snaps.subtract(intersection);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

duration PGPool_update_new(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;

    assert(cached_removed_snaps.subset_of(newly_removed_snaps));
    interval_set<snapid_t> removed_snaps = newly_removed_snaps;
    newly_removed_snaps.subtract(cached_removed_snaps);
    cached_removed_snaps.swap(removed_snaps);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

int main(int argc, char *argv[])
{
  assert(argc == 3);
  const int sample_count = std::stoi(argv[1]);
  const int interval_count = std::stoi(argv[2]);
  const int interval_distance = 4;
  const int interval_size = 2;
  const int max_offset = interval_count * interval_distance;
  interval_set<snapid_t> removed_snaps;

  for (int i = 0; i < max_offset; i += interval_distance)
    removed_snaps.insert(i, interval_size);

  duration old_delta(0), new_delta(0);

  for (int i = 0; i < sample_count; ++i) {
    old_delta += PGPool_update_old(removed_snaps);
    new_delta += PGPool_update_new(removed_snaps);
  }

  float ratio = float(NANOSECONDS(old_delta)) / NANOSECONDS(new_delta);

  std::cout << ratio << std::endl;
}

Suggested-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Signed-off-by: Zac Medico <zmedico@gmail.com>
@zmedico zmedico force-pushed the PGPool-update-optimize-with-subset-of branch from 8935f65 to 18cbba4 Compare September 27, 2017 01:35
@yuriw yuriw merged commit 488c6e4 into ceph:master Sep 28, 2017
zmedico added a commit to zmedico/ceph that referenced this pull request Oct 13, 2017
ceph#17820 merged in 5c37758

Optimize subset_of to use sequential search when it
performs better than the lower_bound method, for set
size ratios smaller than 10. This is analogous to
intersection_of behavior since commit 825470f.

The subset_of method can be used in some cases as a
less-expensive alternative to the intersection_of
method, since subset_of can return early if any element
of the smaller set is not contained in the larger set,
and intersection_of has the added burden of storing
the intersecting elements.

Signed-off-by: Zac Medico <zmedico@gmail.com>
zmedico added a commit to zmedico/ceph that referenced this pull request Oct 13, 2017
ceph#17820 merged in 18cbba4

Replace expensive inverval_set intersection_of and operator==
calls with a single subset_of call. I borrowed this idea from
Piotr Dałek's "osd/PGPool: don't use intermediate interval set"
patch. The following benchmark program demonstrates a 38%
performance increase:

#include <iostream>
#include <chrono>
#include "include/interval_set.h"

#define NANOSECONDS(d) \
    std::chrono::duration_cast<std::chrono::nanoseconds>(d).count()

typedef uint64_t snapid_t;
typedef std::chrono::steady_clock::duration duration;

duration PGPool_update_old(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;
    interval_set<snapid_t> intersection;
    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);

    assert(intersection == cached_removed_snaps);
    cached_removed_snaps.swap(newly_removed_snaps);
    newly_removed_snaps = cached_removed_snaps;
    newly_removed_snaps.subtract(intersection);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

duration PGPool_update_new(const interval_set<snapid_t> &rs) {
  std::chrono::steady_clock::time_point start, end;
  interval_set<snapid_t> newly_removed_snaps, cached_removed_snaps;

  // initialize state
  cached_removed_snaps = rs;

  // start timed simulation
  start = std::chrono::steady_clock::now();

  {
    newly_removed_snaps = cached_removed_snaps;

    assert(cached_removed_snaps.subset_of(newly_removed_snaps));
    interval_set<snapid_t> removed_snaps = newly_removed_snaps;
    newly_removed_snaps.subtract(cached_removed_snaps);
    cached_removed_snaps.swap(removed_snaps);
  }

  // end timed simulation
  end = std::chrono::steady_clock::now();

  return end - start;
}

int main(int argc, char *argv[])
{
  assert(argc == 3);
  const int sample_count = std::stoi(argv[1]);
  const int interval_count = std::stoi(argv[2]);
  const int interval_distance = 4;
  const int interval_size = 2;
  const int max_offset = interval_count * interval_distance;
  interval_set<snapid_t> removed_snaps;

  for (int i = 0; i < max_offset; i += interval_distance)
    removed_snaps.insert(i, interval_size);

  duration old_delta(0), new_delta(0);

  for (int i = 0; i < sample_count; ++i) {
    old_delta += PGPool_update_old(removed_snaps);
    new_delta += PGPool_update_new(removed_snaps);
  }

  float ratio = float(NANOSECONDS(old_delta)) / NANOSECONDS(new_delta);

  std::cout << ratio << std::endl;
}

Suggested-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Signed-off-by: Zac Medico <zmedico@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants