`sort!(::Vector{Union{T, Missing}})` could be more efficient #27781

rdeits · 2018-06-25T21:34:38Z

Opening an issue as suggested here: https://discourse.julialang.org/t/with-missings-julia-is-slower-than-r/11838/17?u=rdeits

Sorting a Float64 array in-place is quite fast and non-allocating:

julia> @btime sort!(y) setup=(y = rand(100)) evals=1
  2.414 μs (0 allocations: 0 bytes)

but sorting an array of Union{Float64, Missing} in-place allocates a new copy and is about 2x slower, regardless of the input size:

julia> @btime sort!(y2) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing)) evals=1
  4.157 μs (2 allocations: 624 bytes)

julia> versioninfo()
Julia Version 0.7.0-beta.5
Commit 948b088f17 (2018-06-24 17:50 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

StefanKarpinski · 2018-06-25T22:32:43Z

Not really a regression, right? Since this was previously definitely not efficient.

haampie · 2018-06-26T07:13:56Z

The difference is that sort!(v::Vector{Float64}) calls fpsort!(v), which partitions v like

[ ≤ -0.0 | ≥ +0.0 | NaN ]

and then uses Core.Intrinsics.slt_int for comparison on the first and second part.
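
As a rough illustration of why the split at zero matters (a sketch, not the code in Base): IEEE 754 doubles reinterpreted as signed integers keep their order when non-negative, and have it reversed when negative. The helper name `asint` is made up for this example.

```julia
# Hypothetical helper: view the bit pattern of a Float64 as a signed 64-bit integer.
asint(x::Float64) = reinterpret(Int64, x)

# Non-negative floats: integer order matches floating-point order.
@assert (asint(1.5) < asint(2.5)) == (1.5 < 2.5)

# Negative floats: integer order is reversed,
# so the integer comparison has to be done with swapped arguments.
@assert (asint(-2.5) < asint(-1.5)) != (-2.5 < -1.5)
@assert (asint(-1.5) < asint(-2.5)) == (-2.5 < -1.5)
```

NaN has no place in this ordering, which is presumably why it gets its own block at the end.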

Maybe we could treat missing in general like we treat NaN w/ floats?

What I mean is: when calling sort!(v::Vector{Union{T,Missing}}), first partition v as

[ non-missing | missing ]

and then call sort!(::Vector{T}) on the first block. Is something like this possible? Can you convince the compiler that the first bit of the vector consists of T's, not of Union{Missing,T}'s?
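
For illustration only, here is a minimal sketch of the partition idea that sidesteps the compiler question by copying the non-missing values into a concretely typed `Vector{T}`. It allocates, so it does not solve the problem raised in this issue; `sort_missing_last!` is a hypothetical name, not a Base function.

```julia
# Hypothetical helper, not part of Base: sort `v` in place with missing values last.
function sort_missing_last!(v::Vector{Union{T,Missing}}) where {T}
    vals = T[x for x in v if x !== missing]  # copy into a concretely typed Vector{T}
    sort!(vals)                              # dispatches on Vector{T}, not on the Union
    n = length(vals)
    v[1:n] .= vals                           # sorted non-missing block first
    v[n+1:end] .= missing                    # missing block at the end
    return v
end

v = Union{Float64,Missing}[3.0, missing, 1.0, 2.0, missing]
sort_missing_last!(v)
@assert isequal(v, [1.0, 2.0, 3.0, missing, missing])
```

The open question in the thread is how to get the same effect without the temporary copy.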

nalimilan · 2018-06-26T07:18:23Z

As people have spotted on the Discourse thread, a first issue is that merge sort is used instead of quick sort. This explains the allocations, but using quick sort "only" reduces the time by one third. Probably still worth doing though, see #27789.

julia> @btime sort!(y, alg=QuickSort) setup=(y = rand(100));
  460.335 ns (0 allocations: 0 bytes)

julia> @btime sort!(y, alg=MergeSort) setup=(y = rand(100));
  665.809 ns (3 allocations: 608 bytes)

julia> @btime sort!(y2, alg=QuickSort) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.861 μs (0 allocations: 0 bytes)

julia> @btime sort!(y2, alg=MergeSort) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.918 μs (2 allocations: 624 bytes)

nalimilan · 2018-06-26T07:27:01Z

> What I mean is: when calling sort!(v::Vector{Union{T,Missing}}), first partition v as
>
> [ non-missing | missing ]
>
> and then call sort!(::Vector{T}) on the first block. Is something like this possible? Can you convince the compiler that the first bit of the vector consists of T's, not of Union{Missing,T}'s?

Good catch @haampie! I guess we could try extending nans2end! to also move missing values to the end, which will allow using the same algorithm. The compiler can be convinced using type assertions, but AFAIK they have some cost, so maybe more subtle solutions would be needed (like accessing directly the underlying data).

haampie · 2018-06-26T07:32:56Z

But is there a safe way to do so? Reinterpreting the vector will not work:

> reinterpret(Float64, Vector{Union{Missing,Float64}}(rand(100)))
ERROR: ArgumentError: cannot reinterpret `Union{Missing, Float64}` `Float64`, type `Union{Missing, Float64}` is not a bits type

nalimilan · 2018-06-26T08:03:13Z

No, we'd need to access the data part of the array, which is a hidden implementation detail. But I hope we can avoid that.

nalimilan · 2018-06-26T08:12:48Z

I've just tested a quick hack like this:

@@ -1021,7 +1021,7 @@ right(o::Perm) = Perm(right(o.order), o.data)
 lt(::Left, x::T, y::T) where {T<:Floats} = slt_int(y, x)
 lt(::Right, x::T, y::T) where {T<:Floats} = slt_int(x, y)

-isnan(o::DirectOrdering, x::Floats) = (x!=x)
+isnan(o::DirectOrdering, x::Union{Floats,Missing}) = (x === missing || x!=x)
@@ -1082,7 +1082,7 @@ end
 fpsort!(v::AbstractVector, a::Sort.PartialQuickSort, o::Ordering) =
     sort!(v, first(axes(v,1)), last(axes(v,1)), a, o)

-sort!(v::AbstractVector{<:Floats}, a::Algorithm, o::DirectOrdering) = fpsort!(v,a,o)
+sort!(v::AbstractVector{<:Union{Floats, Missing}}, a::Algorithm, o::DirectOrdering) = fpsort!(v,a,o)

AFAICT it gives correct results for normal values, but mixes NaN and missing according to their order of appearance (this should be fixed and shouldn't be too costly). Performance is already quite good:

julia> @btime sort!(y2, alg=QuickSort) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  796.571 ns (0 allocations: 0 bytes)

StefanKarpinski · 2018-06-26T13:50:02Z

> but mixes NaN and missing according to their order of appearance (this should be fixed and shouldn't be too costly)

The correct behavior based on isless seems to be sorting the NaNs first in a block in appearance order and then the missings in a block at the very end also in order of appearance.
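
A small check of that ordering, as a sketch using only `isless` and the generic `sort`:

```julia
v = [3.0, missing, NaN, 1.0, NaN, missing]

# isless puts NaN after every ordinary float, and missing after NaN.
@assert isless(1.0, NaN)
@assert isless(NaN, missing)

# So the expected layout is: ordinary values, then the NaN block, then the missing block.
@assert isequal(sort(v), [1.0, 3.0, NaN, NaN, missing, missing])
```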

nalimilan · 2018-06-27T08:06:36Z

I've prepared a PR at #27817. It's about twice as fast as the current QuickSort, which is itself 30% faster than the current MergeSort default. That makes us about 50% slower than R AFAICT, so there's still room for improvement, but it's quite reasonable. One should also test with different proportions of missing values.

haampie · 2018-06-27T11:25:33Z

What about larger vectors? I tried it with ~1000 elements, and that seemed to give the largest performance gap.

nalimilan · 2018-06-27T11:51:09Z

You mean compared with R? Actually I could have mentioned that "50% slower" referred to a comparison for a vector with 10M entries (as in the original Discourse post), to limit the variability of measurements.

haampie · 2018-06-27T20:31:04Z

I opened this issue #27831 with a question about the reinterpret-trick, cause I feel that's the way to tackle this problem in general.

haampie · 2018-06-28T17:21:42Z

So, the consensus of #27831 seems to be that something nice & performant is not yet around, but something quick & dirty will work:

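@inline function partition_missing!(v::Vector{Union{T,Missing}}) where {T}
    # Move all missing values to the end; return the number of non-missing entries.
    isempty(v) && return 0  # guard: the final v[i] would be out of bounds
    i, j = 1, length(v)
    @inbounds while true
        while i < j && v[i] !== missing; i += 1; end
        while i < j && v[j] === missing; j -= 1; end
        i >= j && break
        v[i], v[j] = v[j], v[i]
        i += 1; j -= 1
    end
    @inbounds return v[i] === missing ? i - 1 : i
end

function my_sort!(v::Vector{Union{T,Missing}}) where {T}
    m = partition_missing!(v)
    # Wrap the first m data slots as a Vector{T} without copying. This relies on
    # the internal array layout and is only meant for bits types T.
    GC.@preserve v begin
        w = unsafe_wrap(Array, Ptr{T}(pointer(v)), m)
        sort!(w)
    end
    v
end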

### test
using Test
function test_things()
    @test issorted(my_sort!([missing, 3, 2, 10, missing, 4]))
    @test issorted(my_sort!(Vector{Union{Int,Missing}}([missing, missing])))
    @test issorted(my_sort!(Vector{Union{Int,Missing}}([4,2,6,2,9])))
end

### bench

using BenchmarkTools
using Random

function bench(T::Type = Int, n = 1_000)
    y = rand(T, n)
    vec = ifelse.(y .< 0.9, y, missing)
    bench_new = @benchmark my_sort!(z) setup = (z = copy($vec))
    bench_curr = @benchmark sort!(z) setup = (z = copy($vec))
    bench_notmissing = @benchmark sort!(z) setup = (z = copy($y))
    bench_new, bench_curr, bench_notmissing
end

gives

julia> a, b, c = bench(Int, 10_000)
(Trial(352.888 μs), Trial(1.700 ms), Trial(532.544 μs))

julia> a, b, c = bench(Float64, 10_000)
(Trial(514.307 μs), Trial(2.488 ms), Trial(530.676 μs))

So, sorting a vector with missing data is actually faster than sorting a vector without missing values :). Probably has to do with the fact that we don't have to sort 10% of the data when moving the missing values to the end.

nalimilan · 2018-06-28T17:33:22Z

Fascinating! ;-)

That approach sounds promising to me; maybe it can handle missing values in general and avoid the need for #27817. One question is which types T are safe with that approach, which is something @vtjnash and @quinnj can tell us. Then you need to identify where in the complex dispatch tree of the sorting code this method can be introduced.

quinnj · 2018-06-28T17:46:31Z

Base.isbitsunion(T)

StefanKarpinski · 2018-06-28T18:00:23Z

I don’t think we really need a completely general solution to this right away, it would be fine to hack it in for common important bits types like ints and floats.

vtjnash · 2018-06-28T19:10:20Z

That's not the right predicate, but the simplest is maybe Base.isbitstype(nonmissingtype(eltype(x)))
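
To make the suggested predicate concrete, here is a sketch of how it behaves for a few element types. The wrapper name `wrappable` is hypothetical; only the predicate itself comes from the comment above.

```julia
# Hypothetical helper name wrapping the predicate suggested above.
wrappable(x::AbstractArray) = Base.isbitstype(Base.nonmissingtype(eltype(x)))

# Plain bits types combined with Missing pass the check.
@assert wrappable(Union{Int,Missing}[1, missing])
@assert wrappable(Union{Float64,Missing}[1.0, missing])

# String is not a bits type, so the pointer-wrapping trick must not be used.
@assert !wrappable(Union{String,Missing}["a", missing])
```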

nalimilan · 2018-06-28T19:27:44Z

> I don’t think we really need a completely general solution to this right away, it would be fine to hack it in for common important bits types like ints and floats.

Sure. What I was wondering is whether we need special code in fpsort as in #27817 or whether a common solution can apply to both Int and Float64 (and probably others).

haampie · 2018-06-28T20:04:55Z

My guess is that the above might not be noticeably slower than #27817, maybe even faster?

nalimilan · 2018-06-28T20:20:57Z

Yes, it should be similar, though moving NaNs and missings to the end in a single pass, and separating them only in a second pass is probably slightly more efficient when they are the minority. Likely not worth the additional code if we can have a common solution for several element types.

nalimilan · 2018-07-10T10:34:59Z

@haampie Do you want to give a try to the general approach you presented above? It would be interesting to know whether it can make #27817 unnecessary or not.

haampie · 2018-07-10T22:17:36Z

Yes, I'm sorry for not pursuing this right away. I'll give it a shot this weekend!

pdeffebach · 2021-02-26T00:03:53Z

Gently bumping this. Would be nice to get something along these lines into 1.7.

nalimilan · 2021-02-26T09:00:33Z

I've rebased #27817. Given that we haven't made any progress on the more general solution for more than two years, it's probably a good idea to merge it to make common cases fast at least.

EDIT: and anyway #27817 works for any AbstractArray{Union{Missing, Float64}}, while the general solution proposed above would work for more element types, but only for Array. So these are complementary.

nalimilan · 2021-03-11T20:30:49Z

Thanks for merging #27817 @vtjnash. I think it's still worth keeping this open for non-floating point types given that we have a good discussion above.

LilithHafner · 2022-06-02T21:29:03Z

This is still an issue in 1.9.0-DEV right now for non-floating point unions with missing.

LilithHafner · 2022-06-02T21:41:46Z

using BenchmarkTools
@btime sort!(x) setup=(x=vcat([(rand(),) for x in 1:10^5]));
#  8.858 ms (2 allocations: 390.67 KiB)
@btime sort!(x) setup=(x=vcat([(rand(),) for x in 1:10^5], [missing]));
#  15.823 ms (2 allocations: 439.55 KiB)

mbauman added the performance, kind:regression, and domain:missing data labels Jun 25, 2018

nalimilan removed the kind:regression label Jun 26, 2018

nalimilan mentioned this issue Jun 26, 2018

Use QuickSort rather than MergeSort by default for Union{<:Number, Missing} arrays #27789

Merged

GunnarFarneback mentioned this issue Jun 26, 2018

trouble with filter(!ismissing, x) #27809

Closed

nalimilan mentioned this issue Jun 27, 2018

Support missing values in fpsort #27817

Merged

haampie mentioned this issue Jun 27, 2018

Reinterpreting Vector{Union{T,Missing}} as Vector{T} #27831

Closed

vtjnash closed this as completed in #27817 Mar 11, 2021

nalimilan reopened this Mar 11, 2021

LilithHafner added the domain:sorting label Jul 19, 2022

LilithHafner changed the title ~~sort!(::Vector{Union{Float64, Missing}}) allocates unexpectedly in v0.7-beta~~ sort!(::Vector{Union{T, Missing}}) could be more efficient Jul 19, 2022

LilithHafner mentioned this issue Oct 29, 2022

Refactor and document sorting dispatch #47383

Merged

LilithHafner closed this as completed in #47383 Dec 3, 2022
