Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort!(::Vector{Union{T, Missing}}) could be more efficient #27781

Closed
rdeits opened this issue Jun 25, 2018 · 27 comments · Fixed by #27817 or #47383
Closed

sort!(::Vector{Union{T, Missing}}) could be more efficient #27781

rdeits opened this issue Jun 25, 2018 · 27 comments · Fixed by #27817 or #47383
Labels
domain:missing data Base.missing and related functionality domain:sorting Put things in order performance Must go faster

Comments

@rdeits
Copy link
Contributor

rdeits commented Jun 25, 2018

Opening an issue as suggested here: https://discourse.julialang.org/t/with-missings-julia-is-slower-than-r/11838/17?u=rdeits

Sorting a Float64 array in-place is quite fast and non-allocating:

julia> @btime sort!(y) setup=(y = rand(100)) evals=1
  2.414 μs (0 allocations: 0 bytes)

but sorting an array of Union{Float64, Missing} in-place allocates a new copy and is about 2x slower, regardless of the input size:

julia> @btime sort!(y2) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing)) evals=1
  4.157 μs (2 allocations: 624 bytes)
julia> versioninfo()
Julia Version 0.7.0-beta.5
Commit 948b088f17 (2018-06-24 17:50 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
@mbauman mbauman added performance Must go faster kind:regression Regression in behavior compared to a previous version domain:missing data Base.missing and related functionality labels Jun 25, 2018
@StefanKarpinski
Copy link
Sponsor Member

Not really a regression, right? Since this was previously definitely not efficient.

@nalimilan nalimilan removed the kind:regression Regression in behavior compared to a previous version label Jun 26, 2018
@haampie
Copy link
Contributor

haampie commented Jun 26, 2018

The difference is that sort!(v::Vector{Float64}) calls fpsort!(v), which partitions v like

[ ≤ -0.0 | ≥ +0.0 | NaN ]

and then uses Core.Intrinsics.slt_int for comparison on the first and second part.

Maybe we could treat missing in general like we treat NaN w/ floats?

What I mean is: when calling sort!(v::Vector{Union{T,Missing}}), first partition v as

[ non-missing | missing ]

and then call sort!(::Vector{T}) on the first block. Is something like this possible? Can you convince the compiler that the first bit of the vector consists of T's, not of Union{Missing,T}'s?

@nalimilan
Copy link
Member

As people have spotted on the Discourse thread, a first issue is that merge sort is used instead of quick sort. This explains the allocations, but using quick sort "only" reduces the time by one third. Probably still worth doing though, see #27789.

julia> @btime sort!(y, alg=QuickSort) setup=(y = rand(100));
  460.335 ns (0 allocations: 0 bytes)

julia> @btime sort!(y, alg=MergeSort) setup=(y = rand(100));
  665.809 ns (3 allocations: 608 bytes)

julia> @btime sort!(y2, alg=QuickSort) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.861 μs (0 allocations: 0 bytes)

julia> @btime sort!(y2, alg=MergeSort) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.918 μs (2 allocations: 624 bytes)

@nalimilan
Copy link
Member

What I mean is: when calling sort!(v::Vector{Union{T,Missing}}), first partition v as

[ non-missing | missing ]

and then call sort!(::Vector{T}) on the first block. Is something like this possible? Can you convince the compiler that the first bit of the vector consists of T's, not of Union{Missing,T}'s?

Good catch @haampie! I guess we could try extending nans2end! to also move missing values to the end, which will allow using the same algorithm. The compiler can be convinced using type assertions, but AFAIK they have some cost, so maybe more subtle solutions would be needed (like accessing directly the underlying data).

@haampie
Copy link
Contributor

haampie commented Jun 26, 2018

But is there a safe way to do so? Reinterpreting the vector will not work:

> reinterpret(Float64, Vector{Union{Missing,Float64}}(rand(100)))
ERROR: ArgumentError: cannot reinterpret `Union{Missing, Float64}` `Float64`, type `Union{Missing, Float64}` is not a bits type

@nalimilan
Copy link
Member

No, we'd need to access the data part of the array, which is a hidden implementation detail. But I hope we can avoid that.

@nalimilan
Copy link
Member

I've just tested a quick hack like this:

@@ -1021,7 +1021,7 @@ right(o::Perm) = Perm(right(o.order), o.data)
 lt(::Left, x::T, y::T) where {T<:Floats} = slt_int(y, x)
 lt(::Right, x::T, y::T) where {T<:Floats} = slt_int(x, y)

-isnan(o::DirectOrdering, x::Floats) = (x!=x)
+isnan(o::DirectOrdering, x::Union{Floats,Missing}) = (x === missing || x!=x)
@@ -1082,7 +1082,7 @@ end
 fpsort!(v::AbstractVector, a::Sort.PartialQuickSort, o::Ordering) =
     sort!(v, first(axes(v,1)), last(axes(v,1)), a, o)

-sort!(v::AbstractVector{<:Floats}, a::Algorithm, o::DirectOrdering) = fpsort!(v,a,o)
+sort!(v::AbstractVector{<:Union{Floats, Missing}, a::Algorithm, o::DirectOrdering) = fpsort!(v,a,o)

AFAICT it gives correct results for normal values, but mixes NaN and missing according to their order of appearance (this should be fixed and shouldn't be too costly). Performance is already quite good:

julia> @btime sort!(y2, alg=QuickSort) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  796.571 ns (0 allocations: 0 bytes)

@StefanKarpinski
Copy link
Sponsor Member

but mixes NaN and missing according to their order of appearance (this should be fixed and shouldn't be too costly)

The correct behavior based on isless seems to be sorting the NaNs first in a block in appearance order and then the missings in a block at the very end also in order of appearance.

@nalimilan
Copy link
Member

I've prepared a PR at #27817. It's about twice faster than the current QuickSort, which is itself faster by 30% than the current MergeSort default. That makes us about 50% slower than R AFAICT, so there's still room for improvement but it's quite reasonable. One should also test with different proportions of missing values.

@haampie
Copy link
Contributor

haampie commented Jun 27, 2018

What about larger vectors? I tried it with ~1000 elements, and that seemed to give the largest performance gap.

@nalimilan
Copy link
Member

You mean compared with R? Actually I could have mentioned that "50% slower" referred to a comparison for a vector with 10M entries (as in the original Discourse post), to limit the variability of measurements.

@haampie
Copy link
Contributor

haampie commented Jun 27, 2018

I opened this issue #27831 with a question about the reinterpret-trick, cause I feel that's the way to tackle this problem in general.

@haampie
Copy link
Contributor

haampie commented Jun 28, 2018

So, the consensus of #27831 seems that something nice & performant is not yet around, but something quick & dirty will work:

@inline function partition_missing!(v::Vector{Union{T,Missing}}) where {T}
    i, j = 1, length(v)
    @inbounds while true
        while i < j && v[i] !== missing; i += 1; end
        while i < j && v[j] === missing; j -= 1; end
        i >= j && break
        v[i], v[j] = v[j], v[i]
        i += 1; j -= 1;
    end
    @inbounds return v[i] === missing ? i - 1 : i
end

function my_sort!(v::Vector{Union{T,Missing}}) where {T}
    m = partition_missing!(v)
    w = unsafe_wrap(Array, Ptr{T}(pointer(v)), m)
    sort!(w)
    v
end

### test
using Test
function test_things()
    @test issorted(my_sort!([missing, 3, 2, 10, missing, 4]))
    @test issorted(my_sort!(Vector{Union{Int,Missing}}([missing, missing])))
    @test issorted(my_sort!(Vector{Union{Int,Missing}}([4,2,6,2,9])))
end

### bench

using BenchmarkTools
using Random

function bench(T::Type = Int, n = 1_000)
    y = rand(T, n)
    vec = ifelse.(y .< 0.9, y, missing)
    bench_new = @benchmark my_sort!(z) setup = (z = copy($vec))
    bench_curr = @benchmark sort!(z) setup = (z = copy($vec))
    bench_notmissing = @benchmark sort!(z) setup = (z = copy($y))
    bench_new, bench_curr, bench_notmissing
end

gives

julia> a, b, c = bench(Int, 10_000)
(Trial(352.888 μs), Trial(1.700 ms), Trial(532.544 μs))

julia> a, b, c = bench(Float64, 10_000)
(Trial(514.307 μs), Trial(2.488 ms), Trial(530.676 μs))

So, sorting a vector with missing data is actually faster than sorting a vector without missing values :). Probably has to do with the fact that we don't have to sort 10% of the data when moving the missing values to the end.

@nalimilan
Copy link
Member

Fascinating! ;-)

That approach sounds promising to me, maybe it can handle missing values in general and avoid the need for #27817. One question is to find what types T are safe with that approach, something @vtjnash and @quinnj can tell. Then you need to identify where in the complex dispatch tree of the sorting code this method can be introduced

@quinnj
Copy link
Member

quinnj commented Jun 28, 2018

Base.isbitsunion(T)

@StefanKarpinski
Copy link
Sponsor Member

I don’t think we really need a completely general solution to this right away, it would be fine to hack it in for common important bits types like ints and floats.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jun 28, 2018

That's not the right predicate, but the simplest is maybe Base.isbitstype(nonmissingtype(eltype(x)))

@nalimilan
Copy link
Member

I don’t think we really need a completely general solution to this right away, it would be fine to hack it in for common important bits types like ints and floats.

Sure. What I was wondering is whether we need special code in fpsort as in #27817 or whether a common solution can apply to both Int and Float64 (and probably others).

@haampie
Copy link
Contributor

haampie commented Jun 28, 2018

My guess is that the above might not be noticeably slower than #27817, maybe even faster?

@nalimilan
Copy link
Member

Yes, it should be similar, though moving NaNs and missings to the end in a single pass, and separating them only in a second pass is probably slightly more efficient when they are the minority. Likely not worth the additional code if we can have a common solution for several element types.

@nalimilan
Copy link
Member

@haampie Do you want to give a try to the general approach you presented above? It would be interesting to know whether it can make #27817 unnecessary or not.

@haampie
Copy link
Contributor

haampie commented Jul 10, 2018

Yes, I'm sorry for not pursuing this right away. I'll give it a shot this weekend!

@pdeffebach
Copy link
Contributor

Gently bumping this. Would be nice to get something along these lines into 1.7.

@nalimilan
Copy link
Member

nalimilan commented Feb 26, 2021

I've rebased #27817. Given that we haven't made any progress on the more general solution for more than two years, it's probably a good idea to merge it to make common cases fast at least.

EDIT: and anyway #27817 works for any AbstractArray{Union{Missing, Float64}}, while the general solution proposed above would work for more element types, but only for Array. So these are complementary.

@nalimilan
Copy link
Member

Thanks for merging #27817 @vtjnash. I think it's still work keeping this open for non-floating point types given that we have a good discussion above.

@nalimilan nalimilan reopened this Mar 11, 2021
@LilithHafner
Copy link
Member

This is still an issue in 1.9.0-DEV right now for non-floating point unions with missing.

@LilithHafner
Copy link
Member

using BenchmarkTools
@btime sort!(x) setup=(x=vcat([(rand(),) for x in 1:10^5]));
#  8.858 ms (2 allocations: 390.67 KiB)
@btime sort!(x) setup=(x=vcat([(rand(),) for x in 1:10^5], [missing]));
#  15.823 ms (2 allocations: 439.55 KiB)

@LilithHafner LilithHafner added the domain:sorting Put things in order label Jul 19, 2022
@LilithHafner LilithHafner changed the title sort!(::Vector{Union{Float64, Missing}}) allocates unexpectedly in v0.7-beta sort!(::Vector{Union{T, Missing}}) could be more efficient Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:missing data Base.missing and related functionality domain:sorting Put things in order performance Must go faster
Projects
None yet
9 participants