Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 6192 - std.algorithm.sort performance #3922

Closed
wants to merge 1,823 commits into from
Closed

Conversation

andralex
Copy link
Member

std.sort has had a performance issue. The quicksort algorithm uses the median of three: first, middle, and last element.

Say an array is already sorted except for the last element, which is small (e.g. the same as the first element). In that case, the median of three will choose the first element as the pivot, i.e. an extremum, leading to a wasted pass. This is a common situation (e.g. append to an array then resort).

The proposed solution is to use median of five with a bit of randomness. Five elements are sorted in place using Demuth's algorithm (1956!) which is very advantageous because it only uses up to seven comparisons.

@andralex
Copy link
Member Author

My measurements on random arrays, almost sorted arrays, and sorted arrays all indicate good improvements.

@JakobOvrum
Copy link
Member

Took me a while to grok, but this is awesome. LGTM.

@andralex
Copy link
Member Author

Some measurements in https://issues.dlang.org/show_bug.cgi?id=6192

@JackStouffer
Copy link
Member

Here is some benchmarking code comparing 2.069 and master with this pull using std.datetime.benchmark which makes the code much more readable than the example posted in the bug report (which AFAIK was created before benchmark existed). Mostly posting this for the eventual changelog entry.

import std.stdio;
import std.datetime;
import std.conv : to;
import std.algorithm;
import std.range : iota;
import std.array : array;
import std.random;

enum array_size = 6_000_000;

double[] random_array;
double[] sorted_array;
double[] semi_sorted_array;

auto randomArray()
{
    sort(random_array);
}

auto sortedArray()
{
    sort(sorted_array);
}

auto semiSortedArray()
{
    sort(semi_sorted_array);
}

void main()
{
    // data construction
    foreach (i; 0 .. array_size)
        random_array ~= uniform(0.0, 10000.0);

    sorted_array = iota(double(array_size)).array;

    semi_sorted_array = sorted_array[0 .. $ - 100].dup;
    foreach (i; 0 .. 100)
        semi_sorted_array ~= uniform(0.0, 10000.0);

    // benchmark
    auto r = benchmark!(randomArray, sortedArray, semiSortedArray)(1);
    auto random_result = to!Duration(r[0]);
    auto sorted_result = to!Duration(r[1]);
    auto semi_sorted_result = to!Duration(r[2]);

    writeln("Random: ", random_result);
    writeln("Sorted: ", sorted_result);
    writeln("Semi Sorted: ", semi_sorted_result);
}

built with -O -release -inline -boundscheck=off

DMD 2.069

Random: 633 ms, 898 μs, and 6 hnsecs
Sorted: 80 ms, 714 μs, and 5 hnsecs
Semi Sorted: 720 ms, 744 μs, and 2 hnsecs

DMD with commit 31d75df

Random: 636 ms, 427 μs, and 6 hnsecs
Sorted: 99 ms, 512 μs, and 7 hnsecs
Semi Sorted: 178 ms, 920 μs, and 9 hnsecs

Updated: DMD with commit bc1a23b

Random: 671 ms, 162 μs, and 6 hnsecs
Sorted: 94 ms, 572 μs, and 8 hnsecs
Semi Sorted: 181 ms, 890 μs, and 1 hnsec

@Xinok
Copy link
Contributor

Xinok commented Jan 12, 2016

Since this hasn't been pulled yet, I may as well share: It's possible to find the median of five in six comparisons. Better yet, somebody hardcoded all possible cases in this super elegant function.

I also wrote my own function a while back which also partitions the elements by the median, still in six comparisons.

@andralex
Copy link
Member Author

@Xinok cool, let me adapt your function tomorrow for some testing. Partitioning (not sorting) by the median should be fine.

@schuetzm
Copy link
Contributor

I guess bringToFront can be used for the chain-swap operations, which is better if we have non-POD types.

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
6192 std.algorithm.sort performance

@andralex
Copy link
Member Author

@Xinok thanks, that helped!! I adapted your code with credit. Please lmk what you think. @schuetzm yes, bringToFront would be of good use in a number of places but sadly it's not that special-cased for random access ranges, etc. I'll leave that to another day.

@JackStouffer
Copy link
Member

@andralex your recent change made the random array sort slightly slower, see my updated comment.

@dnadlinger
Copy link
Member

Regarding "slightly slower", I'd always do such tests using GDC or LDC. The results are pretty much worthless otherwise, unless you are absolutely sure that the cause is an unavoidable algorithmic regression.

@andralex
Copy link
Member Author

@JackStouffer what is the pessimization? (Can I see the history of the comment?) @klickverbot well arguably improving performance with dmd is both good in and of itself, and an indicator of possible improvements on others. Also improvements in the algos used are likely to improve across the lot.

@JackStouffer
Copy link
Member

@andralex I updated my comment again. Sorry, I shouldn't have overwritten my previous results.

std.datetime.benchmark needs a teardown function option to be called at the end of every test. This would allow for tests with side effects, like this one, to be run multiple times to allow for an averaging out of outliers. I will make a bugzilla request for this.

@andralex
Copy link
Member Author

I'm trying to get gdc to work, how do I override its paths? It always picks up its own phobos.

@andralex
Copy link
Member Author

Same question for ldc2... I tried -I/path/to/phobos, no avail.

@andralex
Copy link
Member Author

My tests also show a slight performance decrease in all scenarios when using Xinok's partition instead of Demuth's, so I'm getting back to Demuth.

@andralex
Copy link
Member Author

My speculation: the additional structuring brought about by Demuth's algo justifies the extra cost.

@ibuclaw
Copy link
Member

ibuclaw commented Jan 13, 2016

You should also test on different sized arrays, if you are not already. :)

@ibuclaw
Copy link
Member

ibuclaw commented Jan 13, 2016

I'm trying to get gdc to work, how do I override its paths? It always picks up its own phobos.

There is -nophobos switch. However I'd just compile the sort function on it's own with the test.

@andralex
Copy link
Member Author

ping? guess we should move this forward

@quickfur
Copy link
Member

What @ibuclaw said.

Yes, I'd like to see some "real" measurements with gdc/ldc instead of dmd. Over the years I'd tended not to trust in dmd performance measurements, because generally gdc/ldc do it about 20-30% faster (sometimes more).

@andralex
Copy link
Member Author

@quickfur this is not a microoptimization, it mainly removes a provable pathological case.

@9il
Copy link
Member

9il commented Jan 15, 2016

This PR changes the topN performance for the ndslice example:

  • 3x3 blocks: topN is 20% slower with this PR
  • 7x7 blocks: topN is 5% slower with this PR

@andralex
Copy link
Member Author

@9il cool, could you please create a paste at dpaste.dzfl.pl with a complete benchmark and email it to me so I can look over it?

@9il
Copy link
Member

9il commented Jan 16, 2016

@9il cool, could you please create a paste at dpaste.dzfl.pl with a complete benchmark and email it to me so I can look over it?

Emailed. I hope that email is correct.
https://github.com/DlangScience/examples/tree/master/image_processing/median-filter

@andralex
Copy link
Member Author

@9il got it thx!

andralex and others added 26 commits September 25, 2016 18:13
Optimized sort, 4%-8% speed improvements
This feature will hopefully be deprecated soon.
[TRIVIAL] Remove last use of implicit string concatenation
add mapSlice and fix  Issue 16501
[trivial] Added const to varibles in std.file that aren't modified
…verloads in order to facilitate further improvements
[Issue 16170] Partial Fix for Broken std.algorithm.sorting.partition
@andralex
Copy link
Member Author

So, I rebased and got this million commits in there.... will clean this crap up. Anyhow, now we can approach this PR differently because medianOf is available. I've also introduced a simple watermark-based regression buster.

@andralex
Copy link
Member Author

Restarted in #4826

@andralex andralex closed this Sep 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.