Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve quantile in corner cases of collection eltype #30938

Merged
merged 10 commits into from Apr 1, 2019

Conversation

@bkamins
Copy link
Contributor

commented Feb 1, 2019

This PR fixes an inconsistency in quantile which worked or failed depending whether p was a tuple or not. For example:

julia> using Statistics

julia> quantile(Any[1,2,3,4], (0.5, 0.7))
(2.5, 3.0999999999999996)

julia> quantile(Any[1,2,3,4], [0.5, 0.7])
ERROR: MethodError: no method matching AbstractFloat(::Type{Any})

A practical common case is when itr has eltype that allows Missing but it actually the collection does not contain any missing values (CSV.jl by default reads the data in this way). Now calculating quantiles on such collections fails:

julia> quantile(Union{Float64, Missing}[1,2,3,4], 0.1:0.1:0.9)
ERROR: MethodError: no method matching AbstractFloat(::Type{Union{Missing, Float64}})

while passing a tuple as p works:

julia> quantile(Union{Float64, Missing}[1,2,3,4], Tuple(0.1:0.1:0.9))
(1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.0999999999999996, 3.4000000000000004, 3.7)

As quantile does not allow missing values, we try to strip Missing from the return eltype if possible (this will work unless the eltype is Any, in which case it will stay Any).

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@nalimilan: four your reference, this causes cut in CategoricalArrays.jl fail when what you want to apply cat on allows Missing in eltype.

quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false)
T = Core.Compiler.typesubtract(eltype(v), Missing)
S = T <: AbstractFloat ? T : promote_type(T, Float64)

This comment has been minimized.

Copy link
@nalimilan

nalimilan Feb 5, 2019

Contributor

Is there any reason not to use the same approach as method below which is used when p is a tuple (which calls quantilesort! and then _quantile)? promote_type(T, Float64) is problematic here, as we claim to support types other than real.

This comment has been minimized.

Copy link
@bkamins

bkamins Feb 5, 2019

Author Contributor

Good point. Fixed.

bkamins added 2 commits Feb 5, 2019
@@ -821,8 +821,13 @@ function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
return q
end

quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false) =
quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false)

This comment has been minimized.

Copy link
@nalimilan

nalimilan Feb 5, 2019

Contributor

Actually can't we just use this method for tuple p? map should do the right thing in both cases.

This comment has been minimized.

Copy link
@bkamins

bkamins Feb 5, 2019

Author Contributor

Right - I will merge them.

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Feb 6, 2019

CI error is unrelated

@test quantile([1, 2, 3, 4], ()) == ()
@test isempty(quantile([1, 2, 3, 4], Float64[]))
@test quantile([1, 2, 3, 4], Float64[]) isa Vector{Float64}
@test quantile([1, 2, 3, 4], []) isa Vector{Any}

This comment has been minimized.

Copy link
@nalimilan

nalimilan Feb 6, 2019

Contributor

Could be worth testing with p::Vector{Any} when it is non-empty, since we allow for it.

This comment has been minimized.

Copy link
@bkamins

bkamins Feb 6, 2019

Author Contributor

Good point. I have added two more tests. What is relevant here is that using map we have some type instability (this problem was present with tuple version already) - I expose this instability in tests.

Here is the problem (I show it on tuple which works the same before and after this PR):

julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.9))
(2.0, 3.0, 5.0, 11.0)

julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.9))
(2, 3, 5, 11)

julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.99))
(2.0, 3.0, 5.0, 16.400000000000002)

julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.99))
(2, 3, 5, 16.400000000000002)

This comment has been minimized.

Copy link
@nalimilan

nalimilan Feb 6, 2019

Contributor

Ah, that's annoying. Maybe change:

T = promote_type(eltype(v), typeof(v[1]*h))

to

T = promote_type(typeof(v[1]), typeof(v[1]*h))

Or maybe we can just do T = typeof(v[1]*h)? What's the point of using promotion?

This comment has been minimized.

Copy link
@bkamins

bkamins Feb 6, 2019

Author Contributor

Both proposals are not ideal as v[1] may have a type unrelated to the type we actually need, e.g. for Vector{Real} with heterogeneous inputs. I have proposed something else that I think better captures what we want - it always combines the types of elements required and the type of quantile required.

Note that in corner cases of 0 or 1 as quantile we still get an integer:

julia> quantile([1,2,3], 1)
3

julia> quantile([1,2,3], 0)
1

but I think it is OK to leave it as is (we could enforce these cases to produce a float result, but I am not sure if we should, as in general we do not guarantee that quantile returns an AbstractFloat).

This comment has been minimized.

Copy link
@nalimilan

nalimilan Feb 6, 2019

Contributor

Makes sense. Just to be sure: when you say we get an integer for 0 and 1, that's not the case with 0.0 and 1.0, right?

This comment has been minimized.

Copy link
@bkamins

bkamins Feb 6, 2019

Author Contributor

Right. Simply only if you pass an integer as a quantile (which is possible if the quantile is equal to 0 or 1). The key line is f0 = (lv - 1)*p with evaluates to an integer if p is an integer, and this propagates (as we only have additions and multiplications later).

bkamins added 3 commits Feb 6, 2019
@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Feb 6, 2019

I made sure that what we have now is type stable for normal cases of containers with concrete types. However, still if we have an e.g. Union{Float64, Missing} container like Union{Float64, Missing}[1,2,3] then quantile will not be type stable as map is not type stable then either. Here is a MWE of the root of the problem:

julia> x = Union{Float64, Missing}[1,2,3]
3-element Array{Union{Missing, Float64},1}:
 1.0
 2.0
 3.0

julia> map(identity, x)
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

julia> @code_warntype map(identity, x)
Body::Array
1 ─ %1 = %new(Base.Generator{Array{Union{Missing, Float64},1},typeof(identity)}, identity, A)::Base.Generator{Array{Union{Missing, Float64},1},typeof(identity)}
│   %2 = invoke Base._collect(_3::Array{Union{Missing, Float64},1}, %1::Base.Generator{Array{Union{Missing, Float64},1},typeof(identity)}, $(QuoteNode(Base.EltypeUnknown()))::Base.EltypeUnknown, $(QuoteNode(Base.HasShape{1}()))::Base.HasShape{1})::Array
└──      return %2
@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2019

That's the map equivalent of the inference failure that has already been noted for broadcast at #28382.

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Feb 6, 2019

Agreed - but given you made #30485, is it possible to also cover map and comprehensions (as they have the same problem) there?

@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2019

Probably, but first the approach used by #30485 has to be approved...

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Feb 7, 2019

The CI errors seem unrelated.

@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Feb 20, 2019

Merge?

@bkamins

This comment has been minimized.

Copy link
Contributor Author

commented Mar 24, 2019

bump

@nalimilan nalimilan requested a review from StefanKarpinski Mar 25, 2019

@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Mar 25, 2019

My review is requested but I'm not really sure what I'm reviewing for. @nalimilan, what were you looking for input on?

@nalimilan

This comment has been minimized.

Copy link
Contributor

commented Mar 25, 2019

I just want you to merge if you think it's OK, so that I'm not the only one to blame if I missed something. ;-)

@nalimilan nalimilan merged commit cc516ab into JuliaLang:master Apr 1, 2019

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
julia freebsd ci Build done
Details

@bkamins bkamins deleted the bkamins:fix_quantile branch Apr 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.