Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HasLength for even more Flatten iterators #23431

Open
tkoolen opened this issue Aug 24, 2017 · 7 comments
Open

HasLength for even more Flatten iterators #23431

tkoolen opened this issue Aug 24, 2017 · 7 comments
Labels
iteration Involves iteration or the iteration protocol

Comments

@tkoolen
Copy link
Contributor

tkoolen commented Aug 24, 2017

I saw these recent changes: #22691, but maybe we can do better.

Even after these changes, the following example (from Discourse) still results in a SizeUnknown Flatten iterator:

using Base.Iterators: flatten
dr = [1:2, 4:7]
Base.iteratorsize(flatten(dr)) # SizeUnknown()

It would be possible to simply define

Base.length(fl::Flatten) = mapreduce(length, +, 0, fl.it)

and to add UnitRange to this list, but maybe it would be nice to do this more generically? Base.iteratorsize could return HasLength() for fl::Flatten if all iterators in fl.it are HasLength or HasSize, and SizeUnknown otherwise. That would mean that Base.iteratorsize(::Flatten) could no longer be optimized into a constant however; is that a big issue?

@rfourquet
Copy link
Member

Even only with UnitRange, it would mean that computing lenght is not O(1), which I think is a requirement for HasLength. I didn't see it specified though.

@stevengj
Copy link
Member

I think it is desirable for iteratorsize to be type stable (i.e. type of result known at compile time given the input type).

I don't think it is necessary for length to be O(1). (It isn't for String, after all.)

@stevengj
Copy link
Member

Also, I'm wary of actually using the iterator to compute the length in general, because an arbitrary iterator may have side effects. Think of Base.Iterators.flatten(eachline(filename)), for example.

@tkoolen
Copy link
Contributor Author

tkoolen commented Aug 25, 2017

OK, I clearly didn't think it through enough. Would a special case for UnitRange still be desirable, or would you consider it to just be code bloat for little additional gain? I think it may be a common enough case that it could warrant special optimization.

@stevengj
Copy link
Member

Couldn't we have a special case for any AbstractArray?

@tkoolen
Copy link
Contributor Author

tkoolen commented Aug 30, 2017

I don't think that's possible, because e.g. an OffsetArrays.OffsetArray is a subtype of AbstractArray, but does not define length or size methods.

This whole thing makes me wish that the interface were to define methods for Base.iteratorsize(::Type{T}) instead of for Base.iteratorsize(T); I believe everything could be done in type stable fashion using Tuple type recursion if that were the case.

@mcabbott
Copy link
Contributor

mcabbott commented Oct 27, 2022

Defining length for more Flatten types could greatly speed up collect ∘ Iterators.flatten. This has been discussed a bit in the context of #43334, and how best to express things like reduce(vcat, vec_of_vecs). Times:

julia> vecs = [rand(100) for _ in 1:100];

julia> @btime collect(Iterators.flatten($vecs));
  min 68.875 μs, mean 81.634 μs (9 allocations, 326.55 KiB) # master
  min 12.167 μs, mean 17.848 μs (2 allocations, 78.17 KiB)  # with length(::Flatten{...

I presume much has changed since discussion above in 2017, and in particular OffsetArrays do define length now, and Base.IteratorSize does work on types.

@brenhinkeller brenhinkeller added the iteration Involves iteration or the iteration protocol label Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iteration Involves iteration or the iteration protocol
Projects
None yet
Development

No branches or pull requests

5 participants