Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: std::partition #81

Open
jfb-h opened this issue Dec 1, 2023 · 7 comments · May be fixed by #82
Open

Feature request: std::partition #81

jfb-h opened this issue Dec 1, 2023 · 7 comments · May be fixed by #82

Comments

@jfb-h
Copy link

jfb-h commented Dec 1, 2023

Following this slack thread, it would be nice to have an implementation of std::partition.

@LilithHafner
Copy link
Member

This is already possible with sort!(data, by=iseven)

sort! is stable, but has a worse performance

Benchmarks
using Chairmarks, Plots

function partition!(v::AbstractVector; by)
    i, j = firstindex(v), lastindex(v)
    @inbounds while true
        while !by(v[i]); i += 1; end;
        while by(v[j]); j -= 1; end;
        i >= j && break
        v[i], v[j] = v[j], v[i]
        i += 1; j -= 1
    end

    v # maybe return j?
end

@time for n in 1:100
    x = rand(1:10, n);
    # @assert sort!(copy(x), by=iseven) == partition!(copy(x), by=iseven) # (sort! is stable, partition! is not)
    @assert iseven.(sort!(copy(x), by=iseven)) == iseven.(partition!(copy(x), by=iseven))
end

f(n) = (@b n sort!(rand(1:10, _), by=iseven) seconds=.02).time
g(n) = (@b n partition!(rand(1:10, _), by=iseven) seconds=.02).time

x = unique(round.(Int, 2 .* 1.1 .^ (1:150)));
@time y = [f(n) for n in x];
@time z = [g(n) for n in x];

plot(x, 1e9y ./ x, label="sort!", xaxis=:log, ylabel="time per element (ns)", xlabel="length", legend=:topleft, ylims=(0, 35), xticks=10)
plot!(x, 1e9z ./ x, label="partition!", xaxis=:log)
Screenshot 2023-12-03 at 2 06 04 PM

Do either of you have large inputs and care about a 4x performance loss? or want the index where the low values transition to high values? (j in the benchmark implementation)

Maybe I'll make ScratchQuickSort faster for inputs with few unique values so that this feature request is unnecessary :P that optimization will help the by case once JuliaLang/julia#52033 merges.

If we do this we could either use

partition!(by, x)

or

sort!(x; rev, by, ..., alg=Partition)

@jariji
Copy link

jariji commented Dec 3, 2023

I don't especially feel the need for this functionality at the moment, but if we do implement it, it should probably be one that returns the cutoff/transition locations (or equivalently the a tuple of views/slices).

@jfb-h
Copy link
Author

jfb-h commented Dec 4, 2023

Yeah, for now I've basically been using this:

function partition!(f, x)
    sort!(x; by=!f)
    findfirst(!f, x)
end

I guess it would make sense to bikeshed the name with Iterators.partition being something different. Though having this be super performant is not urgent for me, either.

@LilithHafner
Copy link
Member

lol, yeah. Iterators.partition has literally no relation to this.

Stability characteristics are another thing. You can get no stability, one-sided stability, or two-sided stability (though I can't think of any way to get two-sided stability in place and in linear time). We use all three stability characteristics in various places within Base.Sort. std::partition provides no stability guarantees—is that the semantic you would want for quad trees?

@jariji
Copy link

jariji commented Dec 6, 2023

@LilithHafner
Copy link
Member

LilithHafner commented Dec 8, 2023

Some design questions and proposed answers to those questions

Return value

  • The input
  • j
  • Two SubArrays (This)
  • Two Vectors (only possible with Memory and Julia 1.11)

Order

  • true first
  • false first (consistent with sort(_; by)) (This)

Input

  • Specify alg arg? (No)
  • Specify stability args? (Yes)
  • Specify if allocations are allowed? (No)
  • What is the default stability? (Stable except for the one-arg version which struggles to be stable with good perforamnce)
  • Provide both partition! and partition (Yes)
  • Allow partition_to! (which, I'm guessing, is fastest and stable) (Yes)

Proposal

export partition, partition!
public Stable, Unstable, ReverseStable

abstract type Stability end
struct Stable <: Stability end
struct Unstable <: Stability end
struct ReverseStable <: Stability end

partition!(by, v; stability=Unstable; false_stability=stability; true_stability=stability) -> NTuple{2, SubArray}
partition!(by, dest, src; stability=Stable; false_stability=stability; true_stability=stability) -> NTuple{2, SubArray}
partition(by, v; kws...) = partition!(by, similar(v), v; kws...)::NTuple{2, SubArray}

Of which partition and partition!(by, v; stability=Stable) would allocate, and partition!(by, dest, src; stability=Unstable) and partition!(by, dest, src; true_stability=ReverseStable) would be the fastest with partition!(by, dest, src) a close second fastest.

I'm certainly open to feedback and other ideas!

@LilithHafner LilithHafner linked a pull request Dec 8, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants