Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable sort for Array#sort. #5661

Closed
jdalton opened this issue Sep 3, 2018 · 27 comments · Fixed by #5724
Closed

Stable sort for Array#sort. #5661

jdalton opened this issue Sep 3, 2018 · 27 comments · Fixed by #5724

Comments

@jdalton
Copy link

jdalton commented Sep 3, 2018

Now that V8 has patched their Array#sort to be stable (see here) it appears that Chakra may be one of the last (or the last) of the browser JS engines (JS Core, SpiderMonkey, V8, etc.) to perform a non-stable sort (see demo).

https://github.com/Microsoft/ChakraCore/blob/0f40413d2e29863edf5f82de14fac41843deb79f/lib/Runtime/Library/JavascriptArray.cpp#L6640-L6675

While not a spec violation, if all engines perform a stable sort we can work towards standardizing it in the TC39.

@pleath
Copy link
Contributor

pleath commented Sep 3, 2018

Have the stable-sorting engines converged on a single behavior, or are they differently stable?

@jdalton
Copy link
Author

jdalton commented Sep 3, 2018

I don't believe folks have normalized on the type of stable sort. For example JavaScriptCore and SpiderMonkey uses a MergeSort algorithm while V8 now uses Timsort (hybrid of MergeSort and InsertionSort). \cc @mathiasbynens @zenparsing

@mathiasbynens
Copy link
Contributor

Stability is the important bit here, as it’s observable by JS developers. The exact sorting algorithm used behind the scenes is an implementation detail.

@fatcerberus
Copy link
Contributor

Yep, that's the point of a stable sort: Because the relative order of same-comparing items isn't altered, then given identical comparer functions, the same input always produces the same output regardless of the underlying algorithm.

@fatcerberus
Copy link
Contributor

fatcerberus commented Sep 5, 2018

In other words, a stable sort means that:

arr.sort(() => 0);

never moves a single item, by definition. There is no such thing as “differently stable” because all comparison results between two items are well-defined w.r.t. where the item ends up (assuming the return values are self-consistent, at least).

@mathiasbynens
Copy link
Contributor

@fatcerberus I think @pleath might have been referring to the remaining differences in observability. The order in which getters/setters are called is observably different depending on the sorting algorithm used behind the scenes. E.g.

const array = [1, 2, 3];
Object.defineProperty(array, '0', {
  get() { console.log('get'); },
  set(value) { console.log('set'); },
});
array.sort();
$ chakra example.js
get
set

$ spidermonkey example.js
get
set

$ javascriptcore example.js # after replacing console.log() with print()
get
get
set
get
get
set

$ v8 example.js
get
get
set
get
get
set
get
get
set

As stated before, I think these differences are less important than the differences w.r.t. stability.

@fatcerberus
Copy link
Contributor

fatcerberus commented Sep 6, 2018

Okay, that makes sense, thanks. 😃

I would be surprised if there was much real-world code that depended on the exact order of comparisons (particularly because the specification doesn’t mandate an algorithm—and even with the same algorithm the order will change with the number of items in the list), so I’m inclined to agree that stability is the important thing here.

@Penguinwizzard
Copy link
Contributor

The original message implies that there'd be a change to the sort spec after this change is made - is there any indication that this would be the limit of the change to the spec? It'd be unfortunate if we went through the work of replacing the sort implementation, only to then have to re-implement it again if the sort specification gets narrower. Currently, the sort spec allows a great deal of latitude in many regards, and doesn't reflect web reality very well - by spec, many situations would allow us to not sort at all, while the web reality is that we need to make some effort to sort the array reasonably in these situations. I don't think it'd be too surprising if more of these facets were touched upon if the issue of changing the spec to require stability came up...

@schuay
Copy link

schuay commented Sep 12, 2018

cc @littledan

To my knowledge, only requiring a stable sorting algorithm has been discussed.

Did you have a specific unspecced situation in mind? In my experience, leaving corner cases - such as the existence of elements on the prototype chain of a sparse array - as unspecified behavior has not been problematic.

@mathiasbynens
Copy link
Contributor

Discussing Array.prototype.sort stability is on the agenda for this month's TC39 meeting: https://github.com/tc39/agendas/blob/master/2018/09.md We'll update this issue based on the committee's feedback.

@littledan
Copy link

littledan commented Sep 12, 2018

V8's change came after years of the lack of stability being a very frequent developer complaint about V8, both on Twitter and the bug tracker; I believe it'd be similarly developer-friendly to make this change in ChakraCore, and I support the proposed specification change. I believe previous specification discussions didn't get very far, with the argument that forcing a stable sort could be worse for performance and therefore worse for developers (I'm not sure who exactly made this argument); that argument would be less valid with the demonstration of improving performance in V8's transition.

@schuay and I discussed the possibility of locking down even more things than stability, e.g., not reading up prototype chains for hole-y arrays. My guess is that it would be difficult to sell the committee on this sort of change, and it would risk web compatibility, given our long history of supporting reads through holes across many Array methods.

@rhuanjl
Copy link
Collaborator

rhuanjl commented Sep 15, 2018

Adding a note to this topic - I've been experimenting with JS implementations of merge sort and timSort and running on macOS they so far seem significantly faster than CC's quick sort (now that the quickSort actually works properly i.e. since #5670 )

Could a JsBuiltin mergesort or timSort be a good answer to this question?

And... Let me know if you'd like a Pull Request to implement such.

(Unsure if there's a license issue with using timSort - noted that v8's implementation says copyright Python foundation at the top of the file even though it's their own implementation of it - is the algorithm itself copyrighted?)

@fatcerberus
Copy link
Contributor

Adding to the above: V8 already uses timSort today (unsure on SM or JSC) so this would make us consistent in that regard:
https://github.com/v8/v8/blob/fa11e2ac0380948a2f57963e6651402c3ca07d04/third_party/v8/builtins/array-sort.tq

@Penguinwizzard
Copy link
Contributor

Penguinwizzard commented Sep 16, 2018

@rhuanjl Given the relatively low frequency of array.sort being a performance limiter, a jsBuiltin is probably the best route to go. I suspect that the Python copyright being at the top of the file (in v8) is due to the functionality being a direct port of the python implementation, rather than a re-implementation of the algorithm from the general outline. @liminzhu, do you know what the license compatibility is for the Python Software Foundation V2 license in our codebase?

@liminzhu
Copy link
Collaborator

I know PSF is one of the more permissive licenses but I don't know the details :/. Will need to check with the lawyers on that if we use Python's implementation.

@rhuanjl
Copy link
Collaborator

rhuanjl commented Sep 17, 2018

If it's not considered a key performance target then merge sort would likely be a better choice:

  1. ~150 lines of code for merge sort vs >1000 lines of code for timSort
  2. merge sort is simple for someone else to understand/maintain whereas timSort implementations tend to look like some kind of eldritch creation
  3. the license point (the licensing issue with timSort being that the algorithm was invented for Python and any implementation of it effectively ends up being a port of the code written for python)
  4. JSC uses mergesort so wouldn't be going solo to go that way

Anyway let me know if you'd like me to make a PR for a new sort (and if so which one is wanted).

@dilijev
Copy link
Contributor

dilijev commented Sep 19, 2018

Personally, I'd lean towards MergeSort as a JsBuiltin -- if performance becomes an issue, we can re-evaluate. I'd also advocate for using the same sort across all platforms (which a JsBuiltin would enable).

To that end, I'd also recommend keeping the native sort implementations as a fallback, and have a flag to use the JsBuiltin version, maybe on-by-default for now.

@Penguinwizzard
Copy link
Contributor

@dilijev idk, if we don't have a situation in which we really want to use the native fallback behavior, it's probably cleaner to just remove it now, imo - it'll be in our git history if we need it, and historically off-by-default features tend to rot quickly in our codebase for various reasons. Getting a mergesort implementation in and unifying our supported platforms' behavior would be good. In any case, if the spec were to eventually become even stricter in terms of what behavior is required of the standard array sort, then I imagine that it'd likely standardize on a merge sort - it's significantly easier to reason about and work with consistently between engines. @rhuanjl, if you'd like to make a PR for a merge sort, I'm sure we'd be interested in looking at it.

@rhuanjl
Copy link
Collaborator

rhuanjl commented Sep 19, 2018

I have an implementation that works BUT for some reason testing it as a JsBuiltin it runs at about half the speed it does if I put the same logic in an external JS file - going to spend a bit of time trying to work out what causes the slow down.

EDIT: the issue seems to be that inlining goes wrong somehow when the sort is a JsBuiltin hence causing the slowdown this issue has remained the case despite:

  1. trying a stack based mergesort instead of a recursive one (this was slower)
  2. Disabling the "force - inline" call for JsBuiltins - this did nothing

@dilijev
Copy link
Contributor

dilijev commented Sep 20, 2018

@sigatrev or @jackhorton might be able to help with figuring out performance issues in JsBuiltins

@dilijev
Copy link
Contributor

dilijev commented Sep 20, 2018

Agreed on removing the native version from the code, if the JsBuiltin solution has acceptable performance.

@dilijev
Copy link
Contributor

dilijev commented Sep 20, 2018

@rhuanjl - (Lack of) inlining of JsBuiltins is most likely the problem, yes. I also think that issue was being worked on, but not sure who was working on it. Maybe @sigatrev

@dilijev
Copy link
Contributor

dilijev commented Sep 20, 2018

@rhuanjl Out of curiosity, what was the difference in performance between native sorting and the JS (external version)?

@rhuanjl
Copy link
Collaborator

rhuanjl commented Sep 21, 2018

I'll try and clean up my implementation sufficiently to open a PR tomorrow or the day after, it doesn't pass all tests yet - though this performance point likely needs to be resolved before it's merged.

On the performance point:
The JS implementation in a separate file was comparable to the native sort on windows - within margin of error of each other so can't say which is faster. But it was 20 times faster than the native sort on macOS (on xplat a different qsort implementation is used and it's clearly not the fastest).

Once put into a JsBuiltin it dropped to half the speed of the native sort on windows (though still 10 times faster than native on macOS).

Running it with -trace:inline it appears that the compare function gets immediately inlined when the sort is in a JS file along with the compare function; however when the compare function is passed as a parameter to the JsBuiltin I couldn't understand the output from -trace:inline though the inline count reached 960 - I think it may have been repeatedly trying and failing to inline it for some reason.

@jackhorton
Copy link
Contributor

I know @LouisLaf and @sigatrev both shivered at my suggestion of "turning on JsBuiltInForceInline for Intl and seeing what happens," since the Intl.js code has numerous helper functions and is generally extremely complex. I think in general, inlining complex functions was (originally?) an explicit non-goal, so I am not surprised if there are some issues.

I know my recent JsBuiltIn PRs force-enabled force-inlining for every jsbuiltin, but you can try to play around with disabling inlining for the jsbuiltin sort implementation to see if that helps. If the user provides a simple callback for the comparison function, that should get inlined into the sort impl regardless due to normal inlining heuristics. I would also be interested in the perf difference between the default comparison function (a native call, I would imagine) and a user-provided comparison.

@fatcerberus
Copy link
Contributor

fatcerberus commented Sep 21, 2018

Just throwing in my two cents, I would guess that the default comparator isn't going to be particularly fast for most data sets regardless of whether it's implemented as a native or JS call as it roughly translates to (assuming my understanding of the spec is correct):

(x, y) => {
    let xStr = String(x);
    let yStr = String(y);
    return xStr < yStr ? -1
           : xStr > yStr ? +1
           : 0;
}

...which is to say that every comparison involves two string coercions, per specification. Unless your data set is made up of strings to begin with, that's not going to be particularly speedy (and probably won't yield the behavior you want either, in most cases, but I digress 😄).

@sigatrev
Copy link
Contributor

sigatrev commented Sep 21, 2018

If the user provides a simple callback for the comparison function, that should get inlined into the sort impl regardless due to normal inlining heuristics.

This is not necessarily the case. Callback inlining would require we inline sort into the function that called it. In test cases calling sort with a single sort functions you should see regular inlining kick in, but in the general case, if sort is called with numerous different sort functions you wouldn't get inlining unless sort is fully inlined into it's caller.

@rhuanjl When you open a PR I'll be happy to look at the perf and see what's going wrong.

wjt added a commit to endlessm/gnome-shell that referenced this issue Sep 24, 2018
The comparison function supplied to Array.prototype.sort() should
satisfy the property:

    cmp(a, b) == -cmp(b, a)

but this one did not. In particular, if both 'a' and 'b' are on the
desktop, or if neither is, it would claim that one sorted before the
other (which one depended on the parameter order) rather than comparing
them equal.

With a stable sort, elements that compare equal will not be reordered
relative to one another, so fixing this function means that when two
apps are both on the grid, the higher-ranked one will not be reorded to
be below the lower-ranked one.

(It is not currently defined whether Array.prototype.sort() is stable,
but mozjs's implementation happens to be. Microsoft's ChakraCore is the
only mainstream JS implementation with an unstable sort, and they are
discussing changing that, then changing the spec to require stability.
chakra-core/ChakraCore#5661)

https://phabricator.endlessm.com/T23669
wjt added a commit to endlessm/gnome-shell that referenced this issue Sep 24, 2018
The comparison function supplied to Array.prototype.sort() should
satisfy the property:

    cmp(a, b) == -cmp(b, a)

but this one did not. In particular, if both 'a' and 'b' are on the
desktop, or if neither is, it would claim that one sorted before the
other (which one depended on the parameter order) rather than comparing
them equal.

With a stable sort, elements that compare equal will not be reordered
relative to one another, so fixing this function means that when two
apps are both on the grid, the higher-ranked one will not be reorded to
be below the lower-ranked one.

(It is not currently defined whether Array.prototype.sort() is stable,
but mozjs's implementation happens to be. Microsoft's ChakraCore is the
only mainstream JS implementation with an unstable sort, and they are
discussing changing that, then changing the spec to require stability.
chakra-core/ChakraCore#5661)

https://phabricator.endlessm.com/T23669
chakrabot pushed a commit that referenced this issue Nov 1, 2018
….sort

Merge pull request #5724 from rhuanjl:mergeSort

Picking up on the discussion in #5661 This PR implements a stable bottom up Merge Sort as a JsBuiltin for arrays of any length up to 2^32 (well I hit out of memory trying to allocate an array with length above 2^29 but in theory).

I'm not sure if it's good enough to merge as is but would appreciate feedback.

**EDIT:** I've made some large edits to the below to reflect changes made.

**Issues to consider:**
1. **Performance - DefaultCompare** - My Default Compare sort is very slow despite cacheing all the string conversions at the start it. The string less than operation is a significant bottle neck - I have tried:
    - a native chakraLibrary method to compare strings - this was about the same performance as using less than
    - using charCodeAt in a loop - this was also about the same performance as using less than

1. **Insertion sort** - I have included an insertion sort directly in the Array.prototype.sort function used for short arrays - could consider what the best cut off is before switching to mergeSort instead - currently length of 2048 is used.

1. **Memory usage** - My implementation of merge sort needs a buffer array with length up to half the length of the array being sorted.

1. **Scope** - I've not looked at the sort method for TypedArrays obviously stabilising that doesn't make sense (though may be worth looking at its performance on xplat as it uses the earlier mentioned slow xplat qsort for arrays of any length)

1. **Tests** - I've consolidated most of the pre-existing array sort tests and also added a test for sorting a variety of random arrays and ensuring that the sort is both correct and stable

1. **General** - see other comments I've added below...

fixes: #5661
fixes: #5719
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.