Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define length(::SkipMissing) #35946

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

andyferris
Copy link
Member

@andyferris andyferris commented May 20, 2020

This adds a definition for the length of SkipMissing. Note that the IteratorSize trait remains available for algorithms to know that this function is not O(1).

In particular, it seems odd that I can do:

julia> (sum  skipmissing)([1, 2, missing, 4])
7

but not:

julia> (length  skipmissing)([1, 2, missing, 4])
ERROR: MethodError: no method matching length(::Base.SkipMissing{Array{Union{Missing, Int64},1}})
Closest candidates are:
  length(::Base.Iterators.Flatten{Tuple{}}) at iterators.jl:1061
  length(::BitSet) at bitset.jl:365
  length(::Base.EnvDict) at env.jl:132
  ...
Stacktrace:
 [1] (::Base.var"#62#63"{typeof(length),typeof(skipmissing)})(::Array{Union{Missing, Int64},1}) at ./operators.jl:877
 [2] top-level scope at REPL[2]:1

Now we have:

julia> (length  skipmissing)([1, 2, missing, 4])
3

@andyferris
Copy link
Member Author

andyferris commented May 20, 2020

Actually I'm not sure why we don't have this as a definition for length(::Any) and tie it directly to the iterate API? I suppose there may be un-ending collections, we may want to exclude IsInfinite or only include SizeUnknown?

@andyferris
Copy link
Member Author

OK I nerd-sniped myself. The generalization is at #35947.

@andyferris
Copy link
Member Author

FYI It seems that #35947 won't be suitable (at least for the time being), so I'd still like to have this PR considered.

Copy link
Contributor

@stev47 stev47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use count?

@@ -258,6 +258,13 @@ keys(itr::SkipMissing) =
v === missing && throw(MissingException("the value at index $I is missing"))
v
end
function length(itr::SkipMissing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function length(itr::SkipMissing)
length(itr::SkipMissing) = count(_->true, itr)

@martinholters
Copy link
Member

Doesn't this suffer the same problem as #35947 in case someone wraps a stateful iterator in skipmissing? Maybe only do this for HasLength/HasShape underlying iterators?

@andyferris
Copy link
Member Author

andyferris commented May 20, 2020

Doesn't this suffer the same problem as #35947 in case someone wraps a stateful iterator in skipmissing?

This is where I'm confused - I'm not "saved" from calling sum or count on such an iterator. Nor should I be! It's neither better nor safer that I should use (itr -> count(_->true, itr)) ∘ skipmissing in my code.

I thought I've seen people argue it's a good model to use reductions over Channel-like objects.

Maybe only do this for HasLength/HasShape underlying iterators?

Unfortunately, having a determinate size is orthogonal to being stateful, and we don't have a trait for iterator-statefulness. Besides, that particular approach would destroy composibility of multiple layers of lazy filtering.

You can see the current situation below.

julia> a = Iterators.Stateful([1, 2, missing, 4]);

julia> Base.IteratorSize(a)
Base.HasLength()

julia> length(a)
4

julia> length(a)
4

julia> (length  skipmissing)(a)
ERROR: MethodError: no method matching length(::Base.SkipMissing{Base.Iterators.Stateful{Array{Union{Missing, Int64},1},Union{Nothing, Tuple{Union{Missing, Int64},Int64}}}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:596
  length(::Base.MethodList) at reflection.jl:852
  length(::Core.MethodTable) at reflection.jl:938
  ...
Stacktrace:
 [1] (::Base.var"#64#65"{typeof(length),typeof(skipmissing)})(::Base.Iterators.Stateful{Array{Union{Missing, Int64},1},Union{Nothing, Tuple{Union{Missing, Int64},Int64}}}) at ./operators.jl:859
 [2] top-level scope at REPL[4]:1
 [3] eval(::Module, ::Any) at ./boot.jl:331
 [4] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [5] run_backend(::REPL.REPLBackend) at /home/ferris/.julia/packages/Revise/AMRie/src/Revise.jl:1023
 [6] top-level scope at none:0
 [7] eval(::Module, ::Any) at ./boot.jl:331
 [8] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [9] run_backend(::REPL.REPLBackend) at /home/ferris/.julia/packages/Revise/AMRie/src/Revise.jl:1023
 [10] top-level scope at none:0

julia> ((itr -> count(_ -> true, itr))  skipmissing)(a)
3

julia> ((itr -> count(_ -> true, itr))  skipmissing)(a)
0

@martinholters
Copy link
Member

Right, doing this for HasLength()/HasShape() only doesn't improve the situation. But I still believe that having this for SkipMissing is not (much) better than for general iterators. Having the possibility that length(iter) != length(iter) seems very unfortunate to me, no matter whether that's restricted to iter::SkipMissing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants