New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stable sort for Array#sort. #5661
Comments
Have the stable-sorting engines converged on a single behavior, or are they differently stable? |
I don't believe folks have normalized on the type of stable sort. For example JavaScriptCore and SpiderMonkey uses a MergeSort algorithm while V8 now uses Timsort (hybrid of MergeSort and InsertionSort). \cc @mathiasbynens @zenparsing |
Stability is the important bit here, as it’s observable by JS developers. The exact sorting algorithm used behind the scenes is an implementation detail. |
Yep, that's the point of a stable sort: Because the relative order of same-comparing items isn't altered, then given identical comparer functions, the same input always produces the same output regardless of the underlying algorithm. |
In other words, a stable sort means that: arr.sort(() => 0); never moves a single item, by definition. There is no such thing as “differently stable” because all comparison results between two items are well-defined w.r.t. where the item ends up (assuming the return values are self-consistent, at least). |
@fatcerberus I think @pleath might have been referring to the remaining differences in observability. The order in which getters/setters are called is observably different depending on the sorting algorithm used behind the scenes. E.g. const array = [1, 2, 3];
Object.defineProperty(array, '0', {
get() { console.log('get'); },
set(value) { console.log('set'); },
});
array.sort();
As stated before, I think these differences are less important than the differences w.r.t. stability. |
Okay, that makes sense, thanks. 😃 I would be surprised if there was much real-world code that depended on the exact order of comparisons (particularly because the specification doesn’t mandate an algorithm—and even with the same algorithm the order will change with the number of items in the list), so I’m inclined to agree that stability is the important thing here. |
The original message implies that there'd be a change to the sort spec after this change is made - is there any indication that this would be the limit of the change to the spec? It'd be unfortunate if we went through the work of replacing the sort implementation, only to then have to re-implement it again if the sort specification gets narrower. Currently, the sort spec allows a great deal of latitude in many regards, and doesn't reflect web reality very well - by spec, many situations would allow us to not sort at all, while the web reality is that we need to make some effort to sort the array reasonably in these situations. I don't think it'd be too surprising if more of these facets were touched upon if the issue of changing the spec to require stability came up... |
cc @littledan To my knowledge, only requiring a stable sorting algorithm has been discussed. Did you have a specific unspecced situation in mind? In my experience, leaving corner cases - such as the existence of elements on the prototype chain of a sparse array - as unspecified behavior has not been problematic. |
Discussing |
V8's change came after years of the lack of stability being a very frequent developer complaint about V8, both on Twitter and the bug tracker; I believe it'd be similarly developer-friendly to make this change in ChakraCore, and I support the proposed specification change. I believe previous specification discussions didn't get very far, with the argument that forcing a stable sort could be worse for performance and therefore worse for developers (I'm not sure who exactly made this argument); that argument would be less valid with the demonstration of improving performance in V8's transition. @schuay and I discussed the possibility of locking down even more things than stability, e.g., not reading up prototype chains for hole-y arrays. My guess is that it would be difficult to sell the committee on this sort of change, and it would risk web compatibility, given our long history of supporting reads through holes across many Array methods. |
Adding a note to this topic - I've been experimenting with JS implementations of merge sort and timSort and running on macOS they so far seem significantly faster than CC's quick sort (now that the quickSort actually works properly i.e. since #5670 ) Could a JsBuiltin mergesort or timSort be a good answer to this question? And... Let me know if you'd like a Pull Request to implement such. (Unsure if there's a license issue with using timSort - noted that v8's implementation says copyright Python foundation at the top of the file even though it's their own implementation of it - is the algorithm itself copyrighted?) |
Adding to the above: V8 already uses timSort today (unsure on SM or JSC) so this would make us consistent in that regard: |
@rhuanjl Given the relatively low frequency of array.sort being a performance limiter, a jsBuiltin is probably the best route to go. I suspect that the Python copyright being at the top of the file (in v8) is due to the functionality being a direct port of the python implementation, rather than a re-implementation of the algorithm from the general outline. @liminzhu, do you know what the license compatibility is for the Python Software Foundation V2 license in our codebase? |
I know PSF is one of the more permissive licenses but I don't know the details :/. Will need to check with the lawyers on that if we use Python's implementation. |
If it's not considered a key performance target then merge sort would likely be a better choice:
Anyway let me know if you'd like me to make a PR for a new sort (and if so which one is wanted). |
Personally, I'd lean towards MergeSort as a JsBuiltin -- if performance becomes an issue, we can re-evaluate. I'd also advocate for using the same sort across all platforms (which a JsBuiltin would enable). To that end, I'd also recommend keeping the native sort implementations as a fallback, and have a flag to use the JsBuiltin version, maybe on-by-default for now. |
@dilijev idk, if we don't have a situation in which we really want to use the native fallback behavior, it's probably cleaner to just remove it now, imo - it'll be in our git history if we need it, and historically off-by-default features tend to rot quickly in our codebase for various reasons. Getting a mergesort implementation in and unifying our supported platforms' behavior would be good. In any case, if the spec were to eventually become even stricter in terms of what behavior is required of the standard array sort, then I imagine that it'd likely standardize on a merge sort - it's significantly easier to reason about and work with consistently between engines. @rhuanjl, if you'd like to make a PR for a merge sort, I'm sure we'd be interested in looking at it. |
I have an implementation that works BUT for some reason testing it as a JsBuiltin it runs at about half the speed it does if I put the same logic in an external JS file - going to spend a bit of time trying to work out what causes the slow down. EDIT: the issue seems to be that inlining goes wrong somehow when the sort is a JsBuiltin hence causing the slowdown this issue has remained the case despite:
|
@sigatrev or @jackhorton might be able to help with figuring out performance issues in JsBuiltins |
Agreed on removing the native version from the code, if the JsBuiltin solution has acceptable performance. |
@rhuanjl Out of curiosity, what was the difference in performance between native sorting and the JS (external version)? |
I'll try and clean up my implementation sufficiently to open a PR tomorrow or the day after, it doesn't pass all tests yet - though this performance point likely needs to be resolved before it's merged. On the performance point: Once put into a JsBuiltin it dropped to half the speed of the native sort on windows (though still 10 times faster than native on macOS). Running it with -trace:inline it appears that the compare function gets immediately inlined when the sort is in a JS file along with the compare function; however when the compare function is passed as a parameter to the JsBuiltin I couldn't understand the output from -trace:inline though the inline count reached 960 - I think it may have been repeatedly trying and failing to inline it for some reason. |
I know @LouisLaf and @sigatrev both shivered at my suggestion of "turning on JsBuiltInForceInline for Intl and seeing what happens," since the Intl.js code has numerous helper functions and is generally extremely complex. I think in general, inlining complex functions was (originally?) an explicit non-goal, so I am not surprised if there are some issues. I know my recent JsBuiltIn PRs force-enabled force-inlining for every jsbuiltin, but you can try to play around with disabling inlining for the jsbuiltin sort implementation to see if that helps. If the user provides a simple callback for the comparison function, that should get inlined into the sort impl regardless due to normal inlining heuristics. I would also be interested in the perf difference between the default comparison function (a native call, I would imagine) and a user-provided comparison. |
Just throwing in my two cents, I would guess that the default comparator isn't going to be particularly fast for most data sets regardless of whether it's implemented as a native or JS call as it roughly translates to (assuming my understanding of the spec is correct): (x, y) => {
let xStr = String(x);
let yStr = String(y);
return xStr < yStr ? -1
: xStr > yStr ? +1
: 0;
} ...which is to say that every comparison involves two string coercions, per specification. Unless your data set is made up of strings to begin with, that's not going to be particularly speedy (and probably won't yield the behavior you want either, in most cases, but I digress 😄). |
This is not necessarily the case. Callback inlining would require we inline sort into the function that called it. In test cases calling sort with a single sort functions you should see regular inlining kick in, but in the general case, if sort is called with numerous different sort functions you wouldn't get inlining unless sort is fully inlined into it's caller. @rhuanjl When you open a PR I'll be happy to look at the perf and see what's going wrong. |
The comparison function supplied to Array.prototype.sort() should satisfy the property: cmp(a, b) == -cmp(b, a) but this one did not. In particular, if both 'a' and 'b' are on the desktop, or if neither is, it would claim that one sorted before the other (which one depended on the parameter order) rather than comparing them equal. With a stable sort, elements that compare equal will not be reordered relative to one another, so fixing this function means that when two apps are both on the grid, the higher-ranked one will not be reorded to be below the lower-ranked one. (It is not currently defined whether Array.prototype.sort() is stable, but mozjs's implementation happens to be. Microsoft's ChakraCore is the only mainstream JS implementation with an unstable sort, and they are discussing changing that, then changing the spec to require stability. chakra-core/ChakraCore#5661) https://phabricator.endlessm.com/T23669
The comparison function supplied to Array.prototype.sort() should satisfy the property: cmp(a, b) == -cmp(b, a) but this one did not. In particular, if both 'a' and 'b' are on the desktop, or if neither is, it would claim that one sorted before the other (which one depended on the parameter order) rather than comparing them equal. With a stable sort, elements that compare equal will not be reordered relative to one another, so fixing this function means that when two apps are both on the grid, the higher-ranked one will not be reorded to be below the lower-ranked one. (It is not currently defined whether Array.prototype.sort() is stable, but mozjs's implementation happens to be. Microsoft's ChakraCore is the only mainstream JS implementation with an unstable sort, and they are discussing changing that, then changing the spec to require stability. chakra-core/ChakraCore#5661) https://phabricator.endlessm.com/T23669
….sort Merge pull request #5724 from rhuanjl:mergeSort Picking up on the discussion in #5661 This PR implements a stable bottom up Merge Sort as a JsBuiltin for arrays of any length up to 2^32 (well I hit out of memory trying to allocate an array with length above 2^29 but in theory). I'm not sure if it's good enough to merge as is but would appreciate feedback. **EDIT:** I've made some large edits to the below to reflect changes made. **Issues to consider:** 1. **Performance - DefaultCompare** - My Default Compare sort is very slow despite cacheing all the string conversions at the start it. The string less than operation is a significant bottle neck - I have tried: - a native chakraLibrary method to compare strings - this was about the same performance as using less than - using charCodeAt in a loop - this was also about the same performance as using less than 1. **Insertion sort** - I have included an insertion sort directly in the Array.prototype.sort function used for short arrays - could consider what the best cut off is before switching to mergeSort instead - currently length of 2048 is used. 1. **Memory usage** - My implementation of merge sort needs a buffer array with length up to half the length of the array being sorted. 1. **Scope** - I've not looked at the sort method for TypedArrays obviously stabilising that doesn't make sense (though may be worth looking at its performance on xplat as it uses the earlier mentioned slow xplat qsort for arrays of any length) 1. **Tests** - I've consolidated most of the pre-existing array sort tests and also added a test for sorting a variety of random arrays and ensuring that the sort is both correct and stable 1. **General** - see other comments I've added below... fixes: #5661 fixes: #5719
Now that V8 has patched their
Array#sort
to be stable (see here) it appears that Chakra may be one of the last (or the last) of the browser JS engines (JS Core, SpiderMonkey, V8, etc.) to perform a non-stable sort (see demo).https://github.com/Microsoft/ChakraCore/blob/0f40413d2e29863edf5f82de14fac41843deb79f/lib/Runtime/Library/JavascriptArray.cpp#L6640-L6675
While not a spec violation, if all engines perform a stable sort we can work towards standardizing it in the TC39.
The text was updated successfully, but these errors were encountered: