Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add specifications for sorting functions #19

Merged
merged 8 commits into from
Nov 8, 2020
Merged

Add specifications for sorting functions #19

merged 8 commits into from
Nov 8, 2020

Conversation

kgryte
Copy link
Contributor

@kgryte kgryte commented Aug 24, 2020

This PR

  • adds specifications for sorting functions.
  • is derived from comparing API signatures across array libraries.
  • includes a keyword argument for specifying the default sort order (e.g., 'descending'). This follows Torch and TensorFlow.
  • includes a keyword argument for specifying whether to use a stable sorting algorithm. This is necessary for ensuring reproducible results for argsort where a stable sort permutation may be desired.

Notes

  • NumPy includes additional sorting functions (e.g., lexsort, partition, argpartition, msort); however, these functions are not widely implemented by other analyzed array libraries (less than half; see here) and, thus, were not included in this initial specification. Should additional sorting functions be necessary, they can be proposed in follow-up proposals.

@rgommers
Copy link
Member

Thanks @kgryte. Can you also list the ones you considered and left out here? E.g. lexsort, searchsorted, argpartition.

Should the spec mandate a default sort order (e.g., 'ascending')? Or should we allow this to be implementation-dependent?

I'd say we want the same order for all libraries by default, so should be specified.

If we specify a default sort order, should this be configurable? If so, what should be the corresponding keyword? Currently, among array libraries, Torch and Tensorflow allow specifying the sort order, while NumPy et al do not.

Hmm good question. It makes sense as a convenience; the alternative is to add [::-1] to the output to reverse the sort order. There's only two possible options, so if we add it I like the boolean version of PyTorch better. Also ref the builtin sorted, which has a reverse keyword.

Given that it's trivial to do [::-1], maybe we should be conservative here and not implement it on the basis that the majority of libraries don't have this option.

Should the spec require a stable sorting algorithm? Or should this be implementation-dependent? Note that this mainly has implications if we want the sort permutation to be stable (e.g., argsort).

Not sure - from first principles specifying a stable sort as the default may make sense, but then it will require a method= keyword probably. I just quickly checked the TensorFlow and PyTorch docs, they don't bother to even state what algorithm is used.

@kgryte
Copy link
Contributor Author

kgryte commented Aug 24, 2020

Re: other APIs. Updated the OP.

Re: sort order. Updated the proposal to sort in ascending order (by default).

Re: [::-1]. I suppose one argument in favor of configurability is for performance reasons. While strided implementations can simply return a view, reversing the sort order view via indexing may require another array traversal and potentially additional memory allocation. If the sort order could be specified up front (ascending or descending), then the sort algorithm could support either mode.

And agreed. I also prefer a boolean keyword argument (e.g., descending=False, similar to Torch), if configurability is desired.

Re: stability. My guess is that array libraries may want the flexibility to choose their default implementation. We could go a middle path and follow Tensorflow argsort and require a stable=True keyword to enforce the availability of a stable sort.

Maybe one additional forward looking concern is, should we spec complex data type support, whether "ascending" and "descending" is appropriate language. In which case, for sort order configurability, using reverse (similar to the Python built-in sorted) may be more appropriate.

@rgommers
Copy link
Member

While strided implementations can simply return a view, reversing the sort order view via indexing may require another array traversal and potentially additional memory allocation. If the sort order could be specified up front (ascending or descending), then the sort algorithm could support either mode.

That's a good reason indeed, +1 for adding a keyword.

Maybe one additional forward looking concern is, should we spec complex data type support, whether "ascending" and "descending" is appropriate language. In which case, for sort order configurability, using reverse (similar to the Python built-in sorted) may be more appropriate.

I'd be inclined to not worry about that - complex sorting may need specific keywords and hence another function. The "sort by real values first" in numpy.sort_complex is pretty arbitrary, sorting by magnitude would be an equally valid option.

@kgryte
Copy link
Contributor Author

kgryte commented Aug 24, 2020

Re: sorting_complex. Agreed. The ambiguity of what it means to sort complex numbers was what I had in mind. I am more inclined, given the nuances of sorting, e.g., complex numbers, to advocate for a separate method, as done in NumPy now.

@rgommers
Copy link
Member

rgommers commented Sep 8, 2020

Re: stability. My guess is that array libraries may want the flexibility to choose their default implementation. We could go a middle path and follow Tensorflow argsort and require a stable=True keyword to enforce the availability of a stable sort.

@kgryte this was discussed in the 27 Aug meeting; the minutes still need to be updated from the video, and then this PR needs an update I think (or a confirmation that we agreed on stable=True).

@rgommers
Copy link
Member

@kgryte ping on this last point on stable=True.

@kgryte
Copy link
Contributor Author

kgryte commented Nov 8, 2020

@rgommers Updated the PR to include a stable keyword. This PR is ready for final review.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks @kgryte

@rgommers rgommers merged commit eb0a991 into master Nov 8, 2020
@rgommers rgommers deleted the sort branch November 8, 2020 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants