Issue 6192 - std.algorithm.sort performance #3922

andralex · 2016-01-12T03:08:36Z

std.algorithm.sort has had a performance issue. Its quicksort chooses the pivot as the median of three: the first, middle, and last elements.

Say an array is already sorted except for the last element, which is small (e.g., equal to the first element). In that case, the median of three picks the first element as the pivot, i.e., an extremum, which leads to a wasted pass. This is a common situation (e.g., appending to an array and then re-sorting it).
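A minimal D sketch of the pathology, assuming a plain median-of-three selection over the first, middle, and last indices. `medianOf3` is a hypothetical helper written for illustration, not the code in std.algorithm. On an array that is sorted except for a trailing element equal to the first, it selects index 0, i.e., the minimum, so partitioning around it splits off almost nothing.

```d
import std.stdio : writeln;

// Hypothetical helper (not the Phobos code): returns whichever of the
// indices i, j, k holds the median of a[i], a[j], a[k].
size_t medianOf3(T)(const T[] a, size_t i, size_t j, size_t k)
{
    if (a[i] < a[j])
    {
        if (a[j] < a[k]) return j;   // a[i] < a[j] < a[k]
        return a[i] < a[k] ? k : i;  // a[k] <= a[j]: larger of a[i], a[k]
    }
    if (a[i] < a[k]) return i;       // a[j] <= a[i] < a[k]
    return a[j] < a[k] ? k : j;      // a[k] <= a[i]: larger of a[j], a[k]
}

void main()
{
    // Sorted, then a small element (equal to the first) appended.
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0];
    immutable p = medianOf3(a, 0, a.length / 2, a.length - 1);
    writeln("pivot index ", p, ", value ", a[p]); // pivot index 0, value 0
}
```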

The proposed solution is to use the median of five with a bit of randomness. The five elements are sorted in place using Demuth's algorithm (1956!), which is very advantageous because it uses at most seven comparisons.
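Demuth's procedure itself isn't reproduced in this thread. As a hedged illustration of the seven-comparison bound (which is optimal, since the 5! = 120 possible orderings exceed the 2^6 = 64 outcomes of six comparisons), the D sketch below sorts five values with the merge-insertion (Ford-Johnson) scheme, a different method that reaches the same bound. `sortFive` and its comparison counter are hypothetical, and for clarity it returns a sorted copy instead of sorting in place.

```d
import std.algorithm.mutation : swap;
import std.algorithm.sorting : isSorted, nextPermutation;
import std.stdio : writeln;

// Hypothetical helper: returns a sorted copy of five values using at most
// seven comparisons (merge-insertion / Ford-Johnson scheme for n = 5).
// `cmps` is incremented once per comparison so the bound can be checked.
T[5] sortFive(T)(T[5] v, ref size_t cmps)
{
    bool less(T x, T y) { ++cmps; return x < y; }

    T a = v[0], b = v[1], c = v[2], d = v[3], e = v[4];
    // 2 comparisons: order each pair so that a <= b and c <= d.
    if (less(b, a)) swap(a, b);
    if (less(d, c)) swap(c, d);
    // 1 comparison: order the pairs by their larger element, so b <= d.
    if (less(d, b)) { swap(a, c); swap(b, d); }
    // Now a <= b <= d is a sorted chain, and c <= d is also known.
    T[5] chain;
    chain[0] = a; chain[1] = b; chain[2] = d;
    size_t len = 3;

    // Binary-inserts x into the sorted prefix chain[0 .. n], shifting
    // later elements right; returns the index where x was placed.
    size_t binaryInsert(T x, size_t n)
    {
        size_t lo = 0, hi = n;
        while (lo < hi)
        {
            immutable mid = (lo + hi) / 2;
            if (less(x, chain[mid]))
                hi = mid;
            else
                lo = mid + 1;
        }
        foreach_reverse (i; lo .. len)
            chain[i + 1] = chain[i];
        chain[lo] = x;
        ++len;
        return lo;
    }

    // 2 comparisons: insert e into the three-element chain a, b, d.
    immutable posE = binaryInsert(e, 3);
    // d was at index 2; it moved to index 3 if e landed at or before it.
    immutable size_t posD = posE <= 2 ? 3 : 2;
    // At most 2 comparisons: since c <= d, c only has to be placed among
    // the two or three chain elements that precede d.
    binaryInsert(c, posD);
    return chain;
}

void main()
{
    int[5] p = [1, 2, 3, 4, 5];
    size_t perms = 0, worst = 0;
    do
    {
        size_t cmps = 0;
        auto r = sortFive(p, cmps);
        assert(r[].isSorted);
        if (cmps > worst) worst = cmps;
        ++perms;
    } while (nextPermutation(p[]));
    // prints "120 permutations, worst case 7 comparisons"
    writeln(perms, " permutations, worst case ", worst, " comparisons");
}
```

Unlike a sorting network (the smallest one for five inputs needs nine comparators), this approach is adaptive: which pairs get compared depends on the outcomes of earlier comparisons.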

andralex · 2016-01-12T03:12:30Z

My measurements on random arrays, almost sorted arrays, and sorted arrays all indicate good improvements.

JakobOvrum · 2016-01-12T03:23:07Z

Took me a while to grok, but this is awesome. LGTM.

andralex · 2016-01-12T03:27:45Z

Some measurements in https://issues.dlang.org/show_bug.cgi?id=6192

JackStouffer · 2016-01-12T04:14:04Z

Here is some benchmarking code comparing 2.069 with master plus this pull. It uses std.datetime.benchmark, which makes the code much more readable than the example posted in the bug report (which AFAIK was written before benchmark existed). I'm mostly posting this for the eventual changelog entry.

import std.stdio;
import std.datetime;
import std.conv : to;
import std.algorithm;
import std.range : iota;
import std.array : array;
import std.random;

enum array_size = 6_000_000;

double[] random_array;
double[] sorted_array;
double[] semi_sorted_array;

auto randomArray()
{
    sort(random_array);
}

auto sortedArray()
{
    sort(sorted_array);
}

auto semiSortedArray()
{
    sort(semi_sorted_array);
}

void main()
{
    // data construction
    foreach (i; 0 .. array_size)
        random_array ~= uniform(0.0, 10000.0);

    sorted_array = iota(double(array_size)).array;

    semi_sorted_array = sorted_array[0 .. $ - 100].dup;
    foreach (i; 0 .. 100)
        semi_sorted_array ~= uniform(0.0, 10000.0);

    // benchmark
    auto r = benchmark!(randomArray, sortedArray, semiSortedArray)(1);
    auto random_result = to!Duration(r[0]);
    auto sorted_result = to!Duration(r[1]);
    auto semi_sorted_result = to!Duration(r[2]);

    writeln("Random: ", random_result);
    writeln("Sorted: ", sorted_result);
    writeln("Semi Sorted: ", semi_sorted_result);
}

built with -O -release -inline -boundscheck=off

DMD 2.069

Random: 633 ms, 898 μs, and 6 hnsecs
Sorted: 80 ms, 714 μs, and 5 hnsecs
Semi Sorted: 720 ms, 744 μs, and 2 hnsecs

DMD with commit 31d75df

Random: 636 ms, 427 μs, and 6 hnsecs
Sorted: 99 ms, 512 μs, and 7 hnsecs
Semi Sorted: 178 ms, 920 μs, and 9 hnsecs

Updated: DMD with commit bc1a23b

Random: 671 ms, 162 μs, and 6 hnsecs
Sorted: 94 ms, 572 μs, and 8 hnsecs
Semi Sorted: 181 ms, 890 μs, and 1 hnsec

Xinok · 2016-01-12T05:18:06Z

Since this hasn't been pulled yet, I may as well share: it's possible to find the median of five in six comparisons. Better yet, somebody hardcoded all possible cases in this super elegant function.

I also wrote my own function a while back that additionally partitions the elements around the median, still in six comparisons.
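Neither Xinok's function nor the hardcoded one is reproduced in this thread. As a hedged sketch of how six comparisons can suffice, the D code below uses a standard elimination argument: a value known to be below three others is among the two smallest and can be dropped; the median is then the second smallest of the remaining four, and one more elimination plus a final comparison finds it. `medianOfFive` is hypothetical and only returns the median, without partitioning.

```d
import std.algorithm.mutation : swap;
import std.algorithm.sorting : nextPermutation;
import std.stdio : writeln;

// Hypothetical helper: returns the median of five values using exactly
// six comparisons. `cmps` counts comparisons so the bound can be checked.
T medianOfFive(T)(T a, T b, T c, T d, T e, ref size_t cmps)
{
    bool less(T x, T y) { ++cmps; return x < y; }

    // 2 comparisons: order the pairs so that a <= b and c <= d.
    if (less(b, a)) swap(a, b);
    if (less(d, c)) swap(c, d);
    // 1 comparison: swap whole pairs so that a <= c. Now a <= b, c, d,
    // so a is among the two smallest values and cannot be the median.
    if (less(c, a)) { swap(a, c); swap(b, d); }
    // The median is now the second smallest of b, c, d, e.
    // 1 comparison: order the new pair so that e <= b.
    if (less(b, e)) swap(b, e);
    // 1 comparison: the smaller of e and c is the minimum of the four;
    // drop it, then 1 more comparison picks the smallest of the other three.
    if (less(c, e))
        return less(d, e) ? d : e;   // c dropped; e <= b, so median = min(d, e)
    return less(c, b) ? c : b;       // e dropped; c <= d, so median = min(b, c)
}

void main()
{
    int[5] p = [1, 2, 3, 4, 5];
    size_t worst = 0;
    do
    {
        size_t cmps = 0;
        immutable m = medianOfFive(p[0], p[1], p[2], p[3], p[4], cmps);
        assert(m == 3);              // the median of 1..5 in any order
        if (cmps > worst) worst = cmps;
    } while (nextPermutation(p[]));
    writeln("worst case: ", worst, " comparisons"); // worst case: 6 comparisons
}
```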

andralex · 2016-01-12T06:13:11Z

@Xinok cool, let me adapt your function tomorrow for some testing. Partitioning (not sorting) by the median should be fine.

schuetzm · 2016-01-12T10:10:25Z

I guess bringToFront can be used for the chain-swap operations, which is better if we have non-POD types.
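To illustrate the suggestion: a chain swap (here a[0] <- a[1] <- a[2] <- old a[0]) is a rotation, and std.algorithm.mutation.bringToFront rotates two adjacent ranges. The D sketch compares a hand-written chain swap, which holds a temporary copy of one element, with the equivalent bringToFront call. Whether bringToFront is actually cheaper for non-POD types is the point under discussion; this example only shows that the two produce the same permutation.

```d
import std.algorithm.mutation : bringToFront;
import std.stdio : writeln;

void main()
{
    // Chain swap by hand: a[0] <- a[1] <- a[2] <- old a[0],
    // which needs a temporary copy of one element.
    int[] x = [10, 20, 30, 40];
    auto tmp = x[0];
    x[0] = x[1];
    x[1] = x[2];
    x[2] = tmp;

    // The same permutation as a rotation: bringToFront moves the `back`
    // range (y[1 .. 3]) in front of the adjacent `front` range (y[0 .. 1]).
    int[] y = [10, 20, 30, 40];
    bringToFront(y[0 .. 1], y[1 .. 3]);

    assert(x == y);
    writeln(y); // [20, 30, 10, 40]
}
```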

dlang-bot · 2016-01-12T16:35:33Z

Fix	Bugzilla	Description
✗	6192	std.algorithm.sort performance

andralex · 2016-01-12T16:38:58Z

@Xinok thanks, that helped!! I adapted your code with credit. Please lmk what you think. @schuetzm yes, bringToFront would be useful in a number of places, but sadly it isn't specialized much for random-access ranges, etc. I'll leave that for another day.

JackStouffer · 2016-01-12T16:45:36Z

@andralex your recent change made the random array sort slightly slower, see my updated comment.

dnadlinger · 2016-01-12T16:50:34Z

Regarding "slightly slower", I'd always do such tests using GDC or LDC. The results are pretty much worthless otherwise, unless you are absolutely sure that the cause is an unavoidable algorithmic regression.

andralex · 2016-01-12T17:15:47Z

@JackStouffer what is the pessimization? (Can I see the history of the comment?) @klickverbot well, arguably improving performance with dmd is both good in and of itself and an indicator of possible improvements with the other compilers. Also, improvements in the algorithms used are likely to carry over across the lot.

JackStouffer · 2016-01-12T18:18:43Z

@andralex I updated my comment again. Sorry, I shouldn't have overwritten my previous results.

std.datetime.benchmark needs an option for a teardown function to be called at the end of every test. This would allow tests with side effects, like this one, to be run multiple times so that outliers average out. I will file a Bugzilla request for this.
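Until such an option exists, one workaround is to time each run by hand and restore the input from a pristine copy between runs, outside the timed region. The D sketch below assumes the newer std.datetime.stopwatch module (which postdates this thread; the benchmark above uses the older std.datetime API), and `timeSorts` is a hypothetical helper. Taking the minimum or median of several runs then dampens outliers.

```d
import core.time : Duration;
import std.algorithm.sorting : isSorted, sort;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform;
import std.stdio : writeln;

// Hypothetical helper: sorts a fresh copy of `pristine` `runs` times and
// returns the duration of each sort. Restoring the copy plays the role of
// the missing teardown step and is not included in the measured time.
Duration[] timeSorts(const double[] pristine, size_t runs)
{
    auto work = pristine.dup;
    Duration[] times;
    foreach (_; 0 .. runs)
    {
        work[] = pristine[];                 // restore the input (untimed)
        auto sw = StopWatch(AutoStart.yes);
        sort(work);
        sw.stop();
        times ~= sw.peek();
        assert(work.isSorted);               // sanity check (untimed)
    }
    return times;
}

void main()
{
    auto rng = Random(42);
    auto data = new double[](1_000_000);
    foreach (ref x; data)
        x = uniform(0.0, 10_000.0, rng);

    auto times = timeSorts(data, 5);
    sort(times);
    writeln("min: ", times[0], ", median: ", times[$ / 2]);
}
```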

andralex · 2016-01-12T18:28:33Z

I'm trying to get gdc to work, how do I override its paths? It always picks up its own phobos.

andralex · 2016-01-12T18:34:59Z

Same question for ldc2... I tried -I/path/to/phobos, to no avail.

andralex · 2016-01-12T18:49:19Z

My tests also show a slight performance decrease in all scenarios when using Xinok's partition instead of Demuth's, so I'm going back to Demuth's.

andralex · 2016-01-12T18:50:32Z

My speculation: the additional structure brought about by Demuth's algorithm (which fully sorts the five elements rather than only partitioning them) justifies the extra cost.

ibuclaw · 2016-01-13T00:05:09Z

You should also test on different sized arrays, if you are not already. :)

ibuclaw · 2016-01-13T00:07:48Z

> I'm trying to get gdc to work, how do I override its paths? It always picks up its own phobos.

There is a -nophobos switch. However, I'd just compile the sort function on its own with the test.

andralex · 2016-01-13T17:06:11Z

ping? guess we should move this forward

quickfur · 2016-01-13T19:51:10Z

What @ibuclaw said.

Yes, I'd like to see some "real" measurements with gdc/ldc instead of dmd. Over the years I've tended not to trust dmd performance measurements, because gdc/ldc generally produce code that is about 20-30% faster (sometimes more).

andralex · 2016-01-15T00:50:43Z

@quickfur this is not a micro-optimization; it mainly removes a provable pathological case.

9il · 2016-01-15T07:55:31Z

This PR changes the topN performance for the ndslice example:

3x3 blocks: topN is 20% slower with this PR
7x7 blocks: topN is 5% slower with this PR
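For context, a median filter needs one median per window, and std.algorithm.sorting.topN yields it without a full sort. A minimal D sketch, assuming the window is flattened into a slice; `windowMedian` is hypothetical and not taken from the linked example. With windows of only 9 (3x3) or 49 (7x7) elements, fixed per-call costs such as pivot selection make up a larger share of the work, which is one plausible reason the smaller blocks regress more; that reading is an assumption, not something measured here.

```d
import std.algorithm.sorting : topN;
import std.stdio : writeln;

// Hypothetical helper: median of one filter window. topN reorders
// `window` in place so that window[$ / 2] holds the element a full sort
// would put there, with no larger element before it.
T windowMedian(T)(T[] window)
{
    topN(window, window.length / 2);
    return window[window.length / 2];
}

void main()
{
    // One 3x3 block of pixel values, flattened row by row.
    int[] block = [9, 1, 8, 2, 7, 3, 6, 4, 5];
    writeln(windowMedian(block)); // 5
}
```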

andralex · 2016-01-15T17:32:33Z

@9il cool, could you please create a paste at dpaste.dzfl.pl with a complete benchmark and email it to me so I can look over it?

9il · 2016-01-16T08:59:51Z

> @9il cool, could you please create a paste at dpaste.dzfl.pl with a complete benchmark and email it to me so I can look over it?

Emailed. I hope that email is correct.
https://github.com/DlangScience/examples/tree/master/image_processing/median-filter

andralex · 2016-01-16T12:00:37Z

@9il got it thx!

andralex · 2016-09-30T19:36:08Z

So, I rebased and got a million commits in there... will clean this crap up. Anyhow, now we can approach this PR differently because medianOf is available. I've also introduced a simple watermark-based regression buster.

andralex · 2016-09-30T20:00:13Z

Restarted in #4826

andralex force-pushed the 6192 branch 2 times, most recently from 9b2dd54 to bc1a23b, January 12, 2016 16:42

andralex force-pushed the 6192 branch from bc1a23b to 032bc80, January 12, 2016 18:48

9il mentioned this pull request Jan 16, 2016: Faster topN using a heap (or two) #3934 (Closed)

andralex and others added 26 commits September 25, 2016 18:13

a439324 Tighter loop for insertion sorting
dd5ebbf Merge pull request dlang#4816 from andralex/sort: Optimized sort, 4%-8% speed improvements
e43a3a1 Remove last use of implicit string concatenation: This feature will hopefully be deprecated soon.
5813fac Merge pull request dlang#4821 from mathias-lang-sociomantic/fix-3827: [TRIVIAL] Remove last use of implicit string concatenation
d63572f Fix Issue 16544 - Add File.reopen
a5e9353 Merge pull request dlang#4781 from 9il/mapSlice: add mapSlice and fix Issue 16501
81c09ed Merge pull request dlang#4811 from JackStouffer/file-const: [trivial] Added const to varibles in std.file that aren't modified
dcd00a7 Add medianOf
15ee49f Review
0a96757 @wilzbach review
1859f0d Make medianOf private for now to have freedom in choosing a public spec later
25dac83 medianOf restricted for now to only size_t indexes
c39ec4c reduce ndslice template bloat part 1
a90f857 reduce ndslice template bloat part 2
ee1110d Merge pull request dlang#4810 from andralex/medianOf: Add medianOf function
59b6392 Merge pull request dlang#4822 from CyberShadow/pull-20160926-152950: Fix Issue 16544 - Add File.reopen
a725dbc Merge pull request dlang#4823 from 9il/templatebloat: reduce ndslice template bloat
545dfd0 workaround for Issue 16473
e21f272 [Issue 16170] Seperate std.algorithm.sorting.partition into various overloads in order to facilitate further improvements
1c9ff2f fix mixin style
3026329 Merge pull request dlang#4820 from 9il/workaround16473: workaround for Issue 16473
b297804 Merge pull request dlang#4429 from JackStouffer/issue16170: [Issue 16170] Partial Fix for Broken std.algorithm.sorting.partition
3c79f6b Issue 6192 - std.algorithm.sort performance
dfbef2e Improve optimisticInsertionSort by using Demuth's algorithm for the first five elements
3a22963 Merge branch '6192' of github.com:andralex/phobos into 6192
0486b52 Use medianOf 5 to compute pivot

andralex mentioned this pull request Sep 30, 2016: Use medianOf 3 and 5 to estimate pivot #4826 (Merged)

andralex closed this Sep 30, 2016
