Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexing support to SkipMissing #31008

Merged
merged 4 commits into from Feb 22, 2019

Conversation

@nalimilan
Copy link
Contributor

commented Feb 8, 2019

Define IndexStyle, eachindex, keys and getindex to match indices of wrapped array, skipping missing entries. This notably allows finding the index of elements in the wrapped array using findall, findnext, argmin/max and findmin/max.

Fixes #29305. Part of #30606.

FWIW, iterating over SkipMissing using eachindex and getindex rather than iterate is slower, but not that much (and I see no reason why the compiler couldn't improve since the involved code is quite simple) EDIT: with @inbounds annotations both solutions have the same performance (but the performance difference persists without it).

julia> function f(x)
           s = 0
           @inbounds for v in x
               s += v
           end
           s
       end
f (generic function with 1 method)

julia> function g(x)
           s = 0
           @inbounds for i in eachindex(x)
               s += x[i]
           end
           s
       end
g (generic function with 1 method)

julia> x = skipmissing(ifelse.(rand(10_000) .> 0.8, missing, rand(10_000)))
Base.SkipMissing{Array{Union{Missing, Float64},1}}(Union{Missing, Float64}[0.764807, 0.225138, missing, 0.920196, missing, 0.782529, 0.936112,0.0176555, 0.581311, 0.6244590.175387, missing, 0.393418, 0.717713, 0.898142, 0.899193, 0.177139, missing, 0.56281, 0.479712])

julia> using BenchmarkTools

julia> @assert f(x) == g(x)

julia> @btime f(x);
  24.347 μs (1 allocation: 16 bytes)

julia> @btime g(x);
  24.521 μs (1 allocation: 16 bytes)
Add indexing support to SkipMissing
Define IndexStyle, eachindex, keys and getindex to match indices of wrapped array,
skipping missing entries. This notably allows finding the index of elements
in the wrapped array using findall, findnext, argmin/max and findmin/max.
nalimilan added 2 commits Feb 8, 2019
@StefanKarpinski

This comment has been minimized.

Copy link
Member

commented Feb 20, 2019

I'm not sure that I'm the best person to review this. @mbauman, would you feel qualified to take a look at this and review it?

@nalimilan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 20, 2019

Given the small amount of code introduced, I think what's needed is mainly a decision regarding the proposal at #30606.

@mbauman
Copy link
Member

left a comment

This is great as is — just a few optional comments on docs that I noted as I was reading through.

base/missing.jl Outdated Show resolved Hide resolved
doc/src/manual/missing.md Outdated Show resolved Hide resolved
@nalimilan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 21, 2019

Thanks! I've pushed a commit to try to improve the wording.

@nalimilan nalimilan merged commit 0d21100 into master Feb 22, 2019

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
julia freebsd ci Build done
Details

@nalimilan nalimilan deleted the nl/skipmissing branch Feb 22, 2019

@nalimilan

This comment has been minimized.

Copy link
Contributor Author

commented Apr 16, 2019

I've just realized that broadcast(f, skipmissing(x)) currently works and returns a vector of length count(!ismissing, x). This is somewhat inconsistent with the indexing support added in this PR: for example a[findall(skipmissing(a) .< 0)] is a bit misleading since it will always work but return values which don't correspond to anything. Probably not a big deal since it's quite convoluted and the only alternative is to throw an error, but maybe something to change in 2.0.

@mbauman

This comment has been minimized.

Copy link
Member

commented Apr 16, 2019

Yup, that'll fit in nicely with my hope to do something about wider axis support beyond just AbstractUnitRanges. Linking to #24019 as a nice example of a type that's in-between associative and array. Also interesting that it's the first example that I've seen where the default broadcastable fallback does the wrong thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.