Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workspace/buffer support to sorting #45330

Merged
merged 1 commit into from
Jun 2, 2022

Conversation

LilithHafner
Copy link
Member

This PR adds an additional workspace argument to most internal sorting functions and allows the end-user to pass one in with the keyword argument workspace (undocumented).

There are two types of workspaces: nothing, and an AbstractVector. nothing indicates that sorting functions should allocate new workspaces as needed and discard them when done. An AbstractVector indicates that sorting functions should use that vector, resizing it larger if needed. Wherever possible, sorting functions can handle receiving nothing, and AbstractVector with appropriate eltype, or no argument at all.

This PR also utilizes that feature to implement solution 3 of #45326 and therefore closes #45326.

As is, this PR exposes a new public-facing keyword argument for (almost) all sorting functions:

sort(x, workspace=y)

Is there a standard name for this optional keyword argument?

@pcjentsch
Copy link
Contributor

Is there a standard name for this optional keyword argument?

I think SciML usually calls it cache.

@LilithHafner
Copy link
Member Author

I crossposted to discourse and am crossposting Jar1's response back

I usually think of a “cache” as being a cache of values, rather than a location with empty available memory.

Perhaps “workspace” or “buffer” works better.

@LilithHafner
Copy link
Member Author

There's likely still room for improvement by letting the workspace function depend on OffsetArrays, but this is already pretty good:

@btime sort(x; dims=1) setup=(x=rand(Int, 3000, 3000)) evals=1;
v1.7.2 => 357.704 ms (5 allocations: 68.66 MiB)
v1.9.0 => 10.980 s (12005 allocations: 201.24 GiB)
PR     => 229.195 ms (3018 allocations: 150.60 MiB)

@LilithHafner
Copy link
Member Author

Bump :)

@oscardssmith oscardssmith merged commit bd8dbc3 into JuliaLang:master Jun 2, 2022
@musm
Copy link
Contributor

musm commented Jun 2, 2022

I would've preferred the keyword name buffer for this. Looking over the PR I kept expecting workspace to denote something related to the current session's work space of variables.

Is the intention to keep this as an internal undocumented feature?

@StefanKarpinski
Copy link
Sponsor Member

Agree that buffer would be a clearer name. I thought this might be about implementing variable clearing.

@LilithHafner
Copy link
Member Author

Both seem fine to me. There is no reason we have to stick to "workspace" just because this merged. Googling definitions yields:

Buffer:
a temporary memory area in which data is stored while it is being processed or transferred, especially one used while streaming video or downloading audio.

Workspace:
a memory storage facility for temporary use.

Scratchspace:
Scratch space is space on the hard disk drive that is dedicated for storage of temporary user data

As long as we reach consensus by the time we start 1.9 prereleases this shouldn't be a problem.

@LilithHafner
Copy link
Member Author

FWIW I just talked with a CS researcher who doesn't use Julia. She prefers "workspace".

@LilithHafner
Copy link
Member Author

Another CS researcher who doesn't use Julia prefers Workspace over Cache, Buffer, and Scratchspace, saying "I feel like workspace is a more intuitive word for non-computer scientist people to understand, if you have to use something"

@musm
Copy link
Contributor

musm commented Jun 5, 2022

I think that's arguable, but I thought I'd share my reaction looking through this PR. I'd hazard that throughout most of the Julia code base buffer is used to denote this concept.

Workspace to me denotes something entirely different. For example, in MATLAB (https://www.mathworks.com/help/matlab/ref/workspace.html) and VS Code https://code.visualstudio.com/docs/editor/workspaces. This was why I was thrown off initially. Apparently, there is a precedent from Fortran to use workspace for the concept, but that almost feels to me a little antiquated.

@LilithHafner
Copy link
Member Author

Unless anyone else chimes in in favor of keeping workspace I'll do a renaming workspace => buffer once an opportunity arises that avoids excessive merge conflicts with ongoing sorting PRs.

@LilithHafner LilithHafner changed the title Add workspace support to sorting Add workspace/buffer support to sorting Jun 8, 2022
oscardssmith pushed a commit that referenced this pull request Jun 16, 2022
* Fix and test sort!(OffsetArray(rand(200), -10))

* Convert to 1-based indexing rather than generalize to arbitrary indexing

* avoid overhead of views where reasonable

* style

* handle edge cases better, making the workspace function unhelpful. Also minor style changes and fixups from #45596 and local review.

* move comments in tests for discoverability

Co-authored-by: Lilith Hafner <Lilith.Hafner@gmail.com>
@LilithHafner
Copy link
Member Author

The renaming pull request is here

@fredrikekre fredrikekre added needs news A NEWS entry is required for this change needs compat annotation Add !!! compat "Julia x.y" to the docstring labels Jun 16, 2022
@LilithHafner LilithHafner added the domain:sorting Put things in order label Jul 19, 2022
@LilithHafner LilithHafner deleted the scratchspace branch July 19, 2022 13:19
@LilithHafner
Copy link
Member Author

As this is internal, I don't think it needs news or a compat annotation.

@LilithHafner LilithHafner removed needs news A NEWS entry is required for this change needs compat annotation Add !!! compat "Julia x.y" to the docstring labels Oct 10, 2022
@LilithHafner LilithHafner mentioned this pull request Dec 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:sorting Put things in order
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecessary allocations in sorting
6 participants