-
-
Notifications
You must be signed in to change notification settings - Fork 741
fix stable sort (4584) #787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -7295,7 +7298,7 @@ sort(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable, | |||
{ | |||
static assert(false, "Invalid predicate passed to sort: "~less); | |||
} | |||
return assumeSorted!less(r); | |||
return assumeSorted!(less)(r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change from good style to bad style?
There are quite a few stylistic issues and unrealized opportunities for adding more useful algorithms. For now, however, let's focus on fixing the more obvious ones and getting this in. Thanks! |
Okay, I knew I'm up for some work :) |
Fixed all of raised issues. |
I'm happy to see this coming into fruition. It looks bad for D when simple things like a stable sort are broken. I suggest removing the "Adopted From" comment and replace it with something more proper. If you really want to give attribution, you could simply provide a link to the original code. You could replace the "optimisticInsertionSort" with binary insertion sort, whether mine or your own, and call that instead. While it's a bit slower when sorting primitive types, it does far fewer comparisons which would benefit the unstable sort as well. As for moveAll, I would avoid that altogether. The optimizer isn't really to blame; The implementation uses input range primitives which are just slower. Also, destructively moving each element for insertion will add much unneeded overhead, especially for larger types. Insertion sort is where the majority of time is spent and is integral for performance. My email is in my profile if you need help with anything. I'm not sure how difficult it is to understand as I don't thoroughly comment my code. |
@Xinok About "Adopted From..." I conjured this note in a couple of minutes and not particularly fond of it myself. Still IRC public domain license requires to explicilty distinguish altered versions and this one is getting some changes. Anyway I'm more then willing to put in any note that you as an author feel would be more appropriate. optimisticInsertionSort vs binaryInsertionSort - looks like I'd need to play some more with them to have an idea of their performance characteristics on small inputs. Now speaking of input range primitives and moveAll I completely disagree. If range primitives are to have any success they must be (almost) as fast as naked arrays. If not then we are doing something wrong and that leaves us with 2 options: ether redesign ranges or hack compiler to optimize them better. And I'd argue that ranges as defined are good enough ground for optimal (or near optimal) performance as they are just compile-time hardwired thin wrappers for arrays unlike say GoF Iterator that has dynamic dispatch. See this sample that includes 3 versions of shift right array to the right: And the disassembly of 3 relevant pieces here: Obviously compiler doesn't do a prticularly good job and happily reloads plenty of stuff from stack even in simpliest case. 2nd version doesn't have move inlined, poor fellow. Having looked at its source code of DMD inliner I can confidently say: it's not good enough. For instance it won't ever optimize functions with multiple returns and in fact move currently has an early return and thus is not inlined. |
Bottom line: |
Considering that arrays are a pointer and length pair, input ranges have to increment the pointer and decrement the length for each iteration. Even when iterating in reverse (bidirectional ranges), although it only has to decrement the length, getting the back element is arr[$-1]. Optimizers aren't magic that can whisk away any inefficiency. The programmer must still put in effort to manually optimize their code. The best thing to do is use foreach loops when possible. In this way, the compiler can choose the optimal method for iteration without the programmer writing specialized code for different types. In this case, the most efficient way to iterate an array is using pointers. Personally, I optimize my code empirically. DMD is an odd compiler, things that should be faster in theory actually turn out to be slower. So in cases where a foreach loop should be optimal, I use a for loop instead because it proved to be faster in benchmarks. I pushed out an update for Timsort today. Read the comment to understand what I changed and why. I also wrote this. I tried to optimize insertion sort using Duff's Device, but again, it failed to be any faster in benchmarks. As for attribution, just include a link to the original code. This is sort of what is done already; std.algorithm:sort links to STL's sort and stable sort. |
Okay, I got it. What could be improved is rewriting copy/moveAll for random access ranges so that it can use single index loop instead of doing 2 popFronts. Experimenting this way I'm able to get near identical performance. I suspect it could be applied to other similar algorithms benefiting even more code. To make it clear - I'm not opposed to optimizing by hand, I'm just in favor of reducing that amount and allow greater exposure of optimization. It's all about reuse and impact - if we are to make moveAll as fast as technically possible for random access ranges it would benefit a much broader audience then if we just hack stable sort to be faster. Either way I see that even straight loop could use more efficient code. There has to be an optimized copy backwards primitive like memcpy but typesafe (a-la array ops).
I'm mostly aware of what they can and can't do. I recall being surprised by both how much they can do and how much they can't. The good read is Agner's Fog optimization tutorials http://www.agner.org/optimize/.
If there is anything stable it is "look at disassembly" method. I found it to be quite revealing.
BTW I've found it myself on numerous occasions. It sounds like a bug.
Will do. Though in this case it was a pointer to how analogous C++ function works not that implementation was taken from STL. |
I suspect it's because every step is dependent on the next one (i.e. all of these upper--). What helps a bit is partially unrolling the loop by factor of 2-4 and using Duff's Device to handle first length % factor. Okay, I think I'd better leave fine-tuning to some time later e.g. when we have std.benchmark ;) UPDATE: looking over disassembly shows the move+retro code should be indeed slower, as it has more complex inner loop. So I revert to stright for-loop again :) I added source link, see if it looks okay. The only thing left should be your latest enhancements to TimSort. I'll pull these in soon. |
Folded in. Interestingly I do observe slight speed up compared to a version of Tim Sort that I checked out from your repo previously. |
immutable minTemp = max(range.length / 2, MIN_STORAGE); | ||
size_t minGallop = MIN_GALLOP; | ||
Slice[STACK_SIZE] stack = void; | ||
size_t stackLen = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line up these declarations or remove the extra spaces.
{ | ||
put(target, source.front); | ||
auto len = source.length; | ||
for (size_t idx = 0; idx < len; idx++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foreach
@blackwhale I edited my original post. You can include my more descriptive comment on that line. In fact, please do. :P I don't thoroughly comment my code, so if there's anywhere that needs a more descriptive comment, leave a note on that line and I'll write a better explanation like I did above. |
{ | ||
swapAt(r, lessI, greaterI); | ||
size_t at = stack[run1].length <= stack[run3].length | ||
? run1 : run2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t at -> immutable at
Ping on this? Any work left prompted by Xinok's comments? |
Yup. I need to add an exteneded unittest and do a few extra style changes. |
Ready to go. |
for (; !source.empty; source.popFront()) | ||
{ | ||
put(target, source.front); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put natively supports a range as input, so you can replace this with put(target, source);
.
I know the for loop
wasn't yours, but if you touch it, might as well improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
Sorry, I've been somewhat busy. @monarchdodra nits are addressed, waiting for more comments if any. |
I had nothing but nitpicks. |
Are we good to go? |
Let me fold revamped unit test and squash it then we are good to go. |
I'm done. |
Squashed and waiting. |
Could you separate out the random whitespace/style changes all over the modules to a separate commit? Other than that, looks good to me. Let's get this merged. |
The whitespace changes are not random it's a deliberate pass over std.algo to bring existing code to the prevalent standard. Anyway will do. |
Here you go. Hopefully we can finally get this in. |
Thanks – and sorry for being such a stickler, but it is helpful to be able to quickly skip such commits when browsing the history (git blame, …). |
Ehm! Cool, now the win32 build runs out of memory... |
Ping. |
I assume "cannot read file b6395.d" is unrelated. Merging now. |
Use Xinok's public domain Tim Sort for stable sort.
We had broken stable sort for 2 years now but I really need a working one for upcoming std.uni ;)