fix stable sort (4584) #787

DmitryOlshansky · 2012-09-11T20:38:28Z

Use Xinok's public domain Tim Sort for stable sort.

We had broken stable sort for 2 years now but I really need a working one for upcoming std.uni ;)

andralex · 2012-09-17T03:01:58Z

std/algorithm.d

@@ -7295,7 +7298,7 @@ sort(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable,
    {
        static assert(false, "Invalid predicate passed to sort: "~less);
    }
-    return assumeSorted!less(r);
+    return assumeSorted!(less)(r);


why change from good style to bad style?

andralex · 2012-09-17T03:43:18Z

There are quite a few stylistic issues and unrealized opportunities for adding more useful algorithms. For now, however, let's focus on fixing the more obvious ones and getting this in. Thanks!

DmitryOlshansky · 2012-09-17T09:08:45Z

Okay, I knew I'm up for some work :)
I'll do it step by step starting with style.

DmitryOlshansky · 2012-09-17T11:17:19Z

Fixed all of raised issues.

Xinok · 2012-09-17T22:32:13Z

I'm happy to see this coming into fruition. It looks bad for D when simple things like a stable sort are broken.

I suggest removing the "Adopted From" comment and replace it with something more proper. If you really want to give attribution, you could simply provide a link to the original code.

You could replace the "optimisticInsertionSort" with binary insertion sort, whether mine or your own, and call that instead. While it's a bit slower when sorting primitive types, it does far fewer comparisons which would benefit the unstable sort as well.

As for moveAll, I would avoid that altogether. The optimizer isn't really to blame; The implementation uses input range primitives which are just slower. Also, destructively moving each element for insertion will add much unneeded overhead, especially for larger types. Insertion sort is where the majority of time is spent and is integral for performance.

My email is in my profile if you need help with anything. I'm not sure how difficult it is to understand as I don't thoroughly comment my code.

DmitryOlshansky · 2012-09-18T21:02:57Z

@Xinok About "Adopted From..." I conjured this note in a couple of minutes and not particularly fond of it myself.

Still IRC public domain license requires to explicilty distinguish altered versions and this one is getting some changes. Anyway I'm more then willing to put in any note that you as an author feel would be more appropriate.

optimisticInsertionSort vs binaryInsertionSort - looks like I'd need to play some more with them to have an idea of their performance characteristics on small inputs.

Now speaking of input range primitives and moveAll I completely disagree. If range primitives are to have any success they must be (almost) as fast as naked arrays. If not then we are doing something wrong and that leaves us with 2 options: ether redesign ranges or hack compiler to optimize them better.

And I'd argue that ranges as defined are good enough ground for optimal (or near optimal) performance as they are just compile-time hardwired thin wrappers for arrays unlike say GoF Iterator that has dynamic dispatch.

See this sample that includes 3 versions of shift right array to the right:
https://gist.github.com/3745627

And the disassembly of 3 relevant pieces here:
https://gist.github.com/3745686
(cmd line: dmd -g -O -inline -release -noboundscheck test.d)
Note: inclusion of symbolic info does affect timings, meaning that '-g' doesn't disable any optimizations.

Obviously compiler doesn't do a prticularly good job and happily reloads plenty of stuff from stack even in simpliest case. 2nd version doesn't have move inlined, poor fellow.
3rd version is unrolled but compiler again uses stack a lot (and~2x as much as in 1st version), including some obviously awful cases like swapping 2 registers through the stack.

Having looked at its source code of DMD inliner I can confidently say: it's not good enough. For instance it won't ever optimize functions with multiple returns and in fact move currently has an early return and thus is not inlined.
All in all, it tries to rewrite everything into a kind of long comma-expressions and a lot of stuff can't be converted that easily.

DmitryOlshansky · 2012-09-18T21:13:19Z

Bottom line:
In general I believe it IS a compiler issue but there could be a few minor faults in Phobos:
For instance after fixing move for builtins (removing branch) I finally see identical code for fn1 vs fn2 in test sample.
The good thing about move is that it incapsulates proper logic for postblit/destructors.

Xinok · 2012-09-20T03:48:53Z

Considering that arrays are a pointer and length pair, input ranges have to increment the pointer and decrement the length for each iteration. Even when iterating in reverse (bidirectional ranges), although it only has to decrement the length, getting the back element is arr[$-1]. Optimizers aren't magic that can whisk away any inefficiency. The programmer must still put in effort to manually optimize their code.

The best thing to do is use foreach loops when possible. In this way, the compiler can choose the optimal method for iteration without the programmer writing specialized code for different types. In this case, the most efficient way to iterate an array is using pointers.

Personally, I optimize my code empirically. DMD is an odd compiler, things that should be faster in theory actually turn out to be slower. So in cases where a foreach loop should be optimal, I use a for loop instead because it proved to be faster in benchmarks.

I pushed out an update for Timsort today. Read the comment to understand what I changed and why.

I also wrote this. I tried to optimize insertion sort using Duff's Device, but again, it failed to be any faster in benchmarks.

As for attribution, just include a link to the original code. This is sort of what is done already; std.algorithm:sort links to STL's sort and stable sort.

DmitryOlshansky · 2012-09-20T09:24:43Z

Considering that arrays are a pointer and length pair, input ranges have to increment the pointer and decrement the length for each iteration. Even when iterating in reverse (bidirectional ranges), although it only has to decrement the length, getting the back element is arr[$-1].

Okay, I got it. What could be improved is rewriting copy/moveAll for random access ranges so that it can use single index loop instead of doing 2 popFronts. Experimenting this way I'm able to get near identical performance. I suspect it could be applied to other similar algorithms benefiting even more code.

To make it clear - I'm not opposed to optimizing by hand, I'm just in favor of reducing that amount and allow greater exposure of optimization. It's all about reuse and impact - if we are to make moveAll as fast as technically possible for random access ranges it would benefit a much broader audience then if we just hack stable sort to be faster.

Either way I see that even straight loop could use more efficient code. There has to be an optimized copy backwards primitive like memcpy but typesafe (a-la array ops).

Optimizers aren't magic that can whisk away any inefficiency. The programmer must still put in effort to manually optimize their code.

I'm mostly aware of what they can and can't do. I recall being surprised by both how much they can do and how much they can't. The good read is Agner's Fog optimization tutorials http://www.agner.org/optimize/.

Personally, I optimize my code empirically. DMD is an odd compiler, things that should be faster in theory actually turn out to be slower.

If there is anything stable it is "look at disassembly" method. I found it to be quite revealing.

So in cases where a foreach loop should be optimal, I use a for loop instead because it proved to be faster in benchmarks.

BTW I've found it myself on numerous occasions. It sounds like a bug.

As for attribution, just include a link to the original code. This is sort of what is done already; std.algorithm:sort links to STL's sort and stable sort.

Will do. Though in this case it was a pointer to how analogous C++ function works not that implementation was taken from STL.

DmitryOlshansky · 2012-09-21T21:21:00Z

I also wrote this. I tried to optimize insertion sort using Duff's Device, but again, it failed to be any faster in benchmarks.

I suspect it's because every step is dependent on the next one (i.e. all of these upper--). What helps a bit is partially unrolling the loop by factor of 2-4 and using Duff's Device to handle first length % factor.

Okay, I think I'd better leave fine-tuning to some time later e.g. when we have std.benchmark ;)
As it is now it works within couple of % of original that should be good enough.

UPDATE: looking over disassembly shows the move+retro code should be indeed slower, as it has more complex inner loop. So I revert to stright for-loop again :)

I added source link, see if it looks okay. The only thing left should be your latest enhancements to TimSort. I'll pull these in soon.

DmitryOlshansky · 2012-09-22T18:00:18Z

Folded in. Interestingly I do observe slight speed up compared to a version of Tim Sort that I checked out from your repo previously.

Xinok · 2012-09-24T23:20:28Z

std/algorithm.d

+        immutable minTemp   = max(range.length / 2, MIN_STORAGE);
+        size_t    minGallop = MIN_GALLOP;
+        Slice[STACK_SIZE] stack = void;
+        size_t    stackLen  = 0;


Line up these declarations or remove the extra spaces.

DmitryOlshansky · 2012-09-26T14:13:18Z

@Xinok Done.

@andralex Ready for another round of destruction.

andralex · 2012-09-29T19:06:27Z

std/algorithm.d

        {
-            put(target, source.front);
+            auto len = source.length;
+            for (size_t idx = 0; idx < len; idx++)


Xinok · 2012-10-10T20:20:57Z

@blackwhale I edited my original post. You can include my more descriptive comment on that line. In fact, please do. :P

I don't thoroughly comment my code, so if there's anywhere that needs a more descriptive comment, leave a note on that line and I'll write a better explanation like I did above.

Xinok · 2012-10-10T20:22:06Z

std/algorithm.d

                {
-                    swapAt(r, lessI, greaterI);
+                    size_t at = stack[run1].length <= stack[run3].length 
+                        ? run1 : run2;


size_t at -> immutable at

andralex · 2012-10-14T13:49:45Z

Ping on this? Any work left prompted by Xinok's comments?

DmitryOlshansky · 2012-10-14T16:55:05Z

Ping on this? Any work left prompted by Xinok's comments?

Yup. I need to add an exteneded unittest and do a few extra style changes.

DmitryOlshansky · 2012-10-15T20:54:50Z

Ready to go.

monarchdodra · 2012-10-19T07:23:49Z

std/algorithm.d

+            for (; !source.empty; source.popFront())
+            {
+                put(target, source.front);
+            }


Put natively supports a range as input, so you can replace this with put(target, source);.

I know the for loop wasn't yours, but if you touch it, might as well improve it.

DmitryOlshansky · 2012-10-27T20:26:19Z

Sorry, I've been somewhat busy. @monarchdodra nits are addressed, waiting for more comments if any.

monarchdodra · 2012-10-29T20:52:32Z

I had nothing but nitpicks.

alexrp · 2012-10-31T11:06:58Z

Are we good to go?

DmitryOlshansky · 2012-10-31T11:37:46Z

Let me fold revamped unit test and squash it then we are good to go.

DmitryOlshansky · 2012-10-31T21:47:32Z

I'm done.

DmitryOlshansky · 2012-11-13T19:21:53Z

Squashed and waiting.

dnadlinger · 2012-11-23T16:10:49Z

Could you separate out the random whitespace/style changes all over the modules to a separate commit? Other than that, looks good to me. Let's get this merged.

DmitryOlshansky · 2012-11-23T17:45:33Z

The whitespace changes are not random it's a deliberate pass over std.algo to bring existing code to the prevalent standard.

Anyway will do.

…orks

DmitryOlshansky · 2012-11-23T19:57:54Z

Here you go. Hopefully we can finally get this in.

dnadlinger · 2012-11-23T23:26:26Z

Thanks – and sorry for being such a stickler, but it is helpful to be able to quickly skip such commits when browsing the history (git blame, …).

DmitryOlshansky · 2012-12-01T09:04:31Z

Ehm! Cool, now the win32 build runs out of memory...

DmitryOlshansky · 2012-12-12T12:36:22Z

Ping.
Is it out of reach of 2.061 or is there any other problem?

andralex · 2012-12-12T15:38:08Z

I assume "cannot read file b6395.d" is unrelated. Merging now.

fix stable sort (4584)

andralex reviewed Sep 17, 2012
View reviewed changes

Xinok reviewed Sep 24, 2012
View reviewed changes

andralex reviewed Sep 29, 2012
View reviewed changes

std/algorithm.d

{

put(target, source.front);

auto len = source.length;

for (size_t idx = 0; idx < len; idx++)

Copy link

Member

andralex Sep 29, 2012

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foreach

Xinok reviewed Oct 10, 2012
View reviewed changes

monarchdodra reviewed Oct 19, 2012
View reviewed changes

monarchdodra mentioned this pull request Oct 31, 2012

Copy re-impl. #839

Closed

DmitryOlshansky mentioned this pull request Nov 3, 2012

[RFC] Fix move #923

Closed

DmitryOlshansky added 3 commits November 23, 2012 22:58

add stable sort, tweaks to std.algorithm

585a9a2

whitespace & std.algortihm style unification

dca2a00

extra test to make sure stable sort of non-lvalue ranges compiles & w…

3aa8e20

…orks

andralex added a commit that referenced this pull request Dec 12, 2012

Merge pull request #787 from blackwhale/loan-stable-sort

0fcca44

fix stable sort (4584)

andralex merged commit 0fcca44 into dlang:master Dec 12, 2012

Uh oh!

fix stable sort (4584) #787

fix stable sort (4584) #787

Uh oh!

Conversation

DmitryOlshansky commented Sep 11, 2012

Uh oh!

andralex Sep 17, 2012

Choose a reason for hiding this comment

Uh oh!

andralex commented Sep 17, 2012

Uh oh!

DmitryOlshansky commented Sep 17, 2012

Uh oh!

DmitryOlshansky commented Sep 17, 2012

Uh oh!

Xinok commented Sep 17, 2012

Uh oh!

DmitryOlshansky commented Sep 18, 2012

Uh oh!

DmitryOlshansky commented Sep 18, 2012

Uh oh!

Xinok commented Sep 20, 2012

Uh oh!

DmitryOlshansky commented Sep 20, 2012

Uh oh!

DmitryOlshansky commented Sep 21, 2012

Uh oh!

DmitryOlshansky commented Sep 22, 2012

Uh oh!

Xinok Sep 24, 2012

Choose a reason for hiding this comment

Uh oh!

DmitryOlshansky commented Sep 26, 2012

Uh oh!

andralex Sep 29, 2012

Choose a reason for hiding this comment

Uh oh!

Xinok commented Oct 10, 2012

Uh oh!

Xinok Oct 10, 2012

Choose a reason for hiding this comment

Uh oh!

andralex commented Oct 14, 2012

Uh oh!

DmitryOlshansky commented Oct 14, 2012

Uh oh!

DmitryOlshansky commented Oct 15, 2012

Uh oh!

monarchdodra Oct 19, 2012

Choose a reason for hiding this comment

Uh oh!

DmitryOlshansky Oct 20, 2012

Choose a reason for hiding this comment

Uh oh!

DmitryOlshansky commented Oct 27, 2012

Uh oh!

monarchdodra commented Oct 29, 2012

Uh oh!

alexrp commented Oct 31, 2012

Uh oh!

DmitryOlshansky commented Oct 31, 2012

Uh oh!

DmitryOlshansky commented Oct 31, 2012

Uh oh!

DmitryOlshansky commented Nov 13, 2012

Uh oh!

dnadlinger commented Nov 23, 2012

Uh oh!

DmitryOlshansky commented Nov 23, 2012

Uh oh!

DmitryOlshansky commented Nov 23, 2012

Uh oh!

dnadlinger commented Nov 23, 2012

Uh oh!

DmitryOlshansky commented Dec 1, 2012

Uh oh!

DmitryOlshansky commented Dec 12, 2012

Uh oh!

andralex commented Dec 12, 2012