Improve components that accept forward iterators and compute the size #169

Closed
13 tasks done
Morwenn opened this issue Aug 22, 2020 · 2 comments
Morwenn commented Aug 22, 2020

Several components in the library still accept forward or bidirectional iterators/iterables and start by computing std::distance(first, last). This is sub-optimal when the container can already provide its size in O(1), as std::list does, which is why some sorters use utility::size to take advantage of it.

For version 1.8.0 we should inspect the library for sorters, adapters and other algorithms that could be improved with direct size information but are not.

Components that could obviously be improved:

  • equal_range: seldom used, unclear whether it's worth it.
  • lower_bound
  • probe::exc
  • probe::ham
  • probe::inv
  • probe::max
  • schwartz_adapter
  • stable_adapter
  • upper_bound
  • vergesort

Components where improvement is more involved if possible at all:

  • inplace_merge: passing the size might allow improvements for verge_sort with bidirectional iterators.
  • probe::dis
  • probe::rem
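For reference, the kind of dispatch a size utility performs can be sketched as follows. This is a minimal illustration with hypothetical names (`range_size`, `size_impl`), not the library's actual utility::size implementation: it calls the container's O(1) `.size()` when one exists and falls back to std::distance otherwise.

```cpp
#include <cassert>
#include <forward_list>   // containers used in the usage example
#include <iterator>
#include <list>
#include <type_traits>

// Preferred overload: picked when the iterable has a .size() member,
// which is O(1) for standard containers such as std::list.
template<typename Iterable>
auto size_impl(const Iterable& iterable, int)
    -> decltype(iterable.size())
{
    return iterable.size();
}

// Fallback overload: O(n) std::distance for iterables without .size(),
// e.g. std::forward_list.
template<typename Iterable>
auto size_impl(const Iterable& iterable, long)
    -> typename std::iterator_traits<decltype(std::begin(iterable))>::difference_type
{
    return std::distance(std::begin(iterable), std::end(iterable));
}

template<typename Iterable>
auto range_size(const Iterable& iterable)
    -> decltype(size_impl(iterable, 0))
{
    // Passing 0 (an int) makes the int overload preferred when it is viable
    return size_impl(iterable, 0);
}
```

With such a helper, a sorter taking an iterable can obtain the size once, cheaply when possible, and pass it down to size-aware internal algorithms.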

I hope that counted sentinels (#45) will make the situation even better and allow us to take advantage of size information even when the algorithm only operates on part of a collection.

@Morwenn Morwenn added this to the 1.8.0 milestone Aug 22, 2020
Morwenn added a commit that referenced this issue Aug 22, 2020
The inplace_merge algorithm that works for forward iterators when no
heap memory is available was passing next(i, n) to upper_bound and
lower_bound, while both of those start by computing the size again from
the passed iterators. Introducing lower_bound_n and upper_bound_n
avoids traversing the subrange twice to compute information we already
have.

This change makes the forward iterator version of merge_sort way faster
than it used to be when no heap memory is available.
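The idea behind lower_bound_n can be sketched like this — a binary search over [first, first + size) that takes the size directly instead of recomputing std::distance(first, last). This is my own minimal sketch, not the library's exact code:

```cpp
#include <cassert>
#include <forward_list>   // container used in the usage example
#include <iterator>

// Binary search taking an explicit size: with forward iterators this avoids
// a full O(n) traversal just to measure the search space before searching it.
template<typename ForwardIterator, typename T>
ForwardIterator lower_bound_n(
    ForwardIterator first,
    typename std::iterator_traits<ForwardIterator>::difference_type size,
    const T& value)
{
    while (size > 0) {
        auto half = size / 2;
        auto middle = std::next(first, half);
        if (*middle < value) {
            // value is in the upper half: skip middle and everything before it
            first = std::next(middle);
            size -= half + 1;
        } else {
            // value is in the lower half, middle included
            size = half;
        }
    }
    return first;
}
```

A matching upper_bound_n only differs by using `!(value < *middle)` as the descend-right condition.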

Morwenn commented Aug 23, 2020

Introducing lower_bound_n and upper_bound_n, which take an iterator and a size, yielded some surprisingly good results: the graph below shows the result of benchmarking the old version of merge_sort against the new one when sorting an std::forward_list with no extra heap memory available. As you can see, the new version is significantly faster than the previous one in almost every scenario:

Old vs. new merge_sort benchmark

It's now story time, fellow mortals: that improvement comes from the use of lower_bound_n and upper_bound_n in the fallback algorithm used by inplace_merge to merge forward iterators when no extra heap memory is available. There were places where I called lower_bound(it, std::next(it, n), ...); however, lower_bound starts by computing std::distance(first, last) to get the size of the search space. With forward iterators, that meant traversing the same sequence twice to get information we already had to start with, hence the poor results.

Now those of you who learnt the README by heart will remember this piece of attribution trivia:

The library internally uses an inplace_merge function that works with forward iterators. Its implementation uses a merge algorithm proposed by Dudziński and Dydek, and implemented by Alexander Stepanov and Paul McJones in their book Elements of Programming.

As far as I know, Stepanov consistently recommends fully using the information we have and not recomputing what we already know (see for example the Law of Useful Return). With that in mind, I wondered how Elements of Programming could contain suboptimal code like this, so I decided to go back and have a look at the original code.

...

It turns out that Elements of Programming uses functions named lower_bound_n and upper_bound_n too, so it does not waste time traversing the search space twice. My guess is that I originally took that piece of code back when I was still trying not to reimplement half of the standard library's algorithms in cpp-sort and just "patched" the algorithm on the fly with std::next so that I could use std::lower_bound directly, and never gave it a second thought or tried to benchmark how bad the impact would be.

Once again Stepanov and McJones were right, and the mistakes were mine.

Morwenn added a commit that referenced this issue Aug 27, 2020
probe::dis was accidentally O(n^3) with forward iterators and
bidirectional iterators. This commit changes the algorithm a bit to
avoid a lot of unnecessary calls to std::distance, and gives it its
marketed O(n^2) complexity for all categories of iterators.

Loosely related to issue #169.

Morwenn commented Aug 27, 2020

probe::dis got some love too because of this issue and the improvement is just ridiculously huge (to the point that the bars for the new speed are almost invisible):

Old vs. new probe::dis

Long story short: probe::dis was accidentally cubic instead of quadratic for forward and bidirectional iterators. The latest commit made it O(n²), as it was expected to be. I've still got a few ideas to make probe::dis a bit faster again, but none would be that big: issue #85 tracks the measures of presortedness for which the time complexity could be improved according to the literature, and Dis(X) is not among them.
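As a point of reference, the quadratic behaviour is easy to picture with the usual definition Dis(X) = max { j − i : i < j and X[i] > X[j] } (0 for a sorted sequence). The sketch below is only an illustration of that definition with a hypothetical name (`dis_probe`), not the library's actual probe::dis:

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>   // container used in the usage example
#include <iterator>

// O(n^2) Dis: for every pair (i, j) with i < j, record the largest
// distance j - i over which an inversion occurs.
template<typename ForwardIterator>
auto dis_probe(ForwardIterator first, ForwardIterator last)
    -> typename std::iterator_traits<ForwardIterator>::difference_type
{
    using diff_t = typename std::iterator_traits<ForwardIterator>::difference_type;
    diff_t max_dist = 0;
    diff_t i = 0;
    for (auto it = first; it != last; ++it, ++i) {
        diff_t j = i + 1;
        for (auto it2 = std::next(it); it2 != last; ++it2, ++j) {
            if (*it > *it2) {
                max_dist = std::max(max_dist, j - i);
            }
        }
    }
    return max_dist;
}
```

Note that only O(1) std::next steps are performed per inner iteration; the accidental cubic behaviour came from extra std::distance calls on top of a loop like this.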

EDIT: with the addition of a small heuristic that short-circuits parts of the computation when we know that we can't find a better result, we get an even faster algorithm on average:

Even newer probe::dis

The orange bar in this graph is the O(n²) version from the previous graph (where it was also an orange bar), and the green bar is the O(n²) version with the new optimization. It is worth noting that this second benchmark uses a collection 10 times bigger than that of the previous benchmark.
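To illustrate the kind of short-circuit described above — this is my own guess at such a heuristic with a hypothetical name (`dis_with_cutoff`), not necessarily the library's actual one — note that for a given position i, only positions j > i + current_max can improve the result, so the inner scan can skip everything that cannot beat the best distance found so far:

```cpp
#include <cassert>
#include <iterator>
#include <vector>   // container used in the usage example

// O(n^2) worst case, but each inner scan stops as soon as no remaining
// position could yield a distance larger than the current maximum.
template<typename RandomAccessIterator>
auto dis_with_cutoff(RandomAccessIterator first, RandomAccessIterator last)
    -> typename std::iterator_traits<RandomAccessIterator>::difference_type
{
    auto n = std::distance(first, last);
    decltype(n) max_dist = 0;
    for (decltype(n) i = 0; i < n; ++i) {
        // Scan right-to-left: the first inversion found for this i is
        // automatically the farthest one, so we can stop immediately,
        // and positions j <= i + max_dist cannot improve the result
        for (auto j = n - 1; j > i + max_dist; --j) {
            if (first[i] > first[j]) {
                max_dist = j - i;
                break;
            }
        }
    }
    return max_dist;
}
```

On average this prunes a large part of the inner loops, which is consistent with the kind of speedup shown in the graph above.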

Morwenn added a commit that referenced this issue Aug 30, 2020
Add a new inplace_merge overload for bidirectional iterators that
allows passing the sizes of the subranges to merge without passing a
buffer. Use this information to avoid recomputing size information as
often in the vergesort overload that handles bidirectional iterators.

This commit also cleans up the bidirectional overload of vergesort,
which had been kind of neglected.
@Morwenn Morwenn closed this as completed Sep 3, 2020