Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterators.cycle(iter, n) #47354

Merged
merged 7 commits into from
Feb 13, 2024
Merged

Add Iterators.cycle(iter, n) #47354

merged 7 commits into from
Feb 13, 2024

Conversation

mcabbott
Copy link
Contributor

@mcabbott mcabbott commented Oct 27, 2022

At present Iterators.repeated(iter, n) is the lazy cousin of fill(iter, n), but there is no iterator for repeat(iter, n). This PR proposes that Iterators.cycle(iter, n) should fill that role.

This relates to the one-argument form in the same way as Iterators.repeated. That is, cycle(iter) means cycle(iter, Inf) in the same way that repeated(iter) means repeated(iter, Inf)... or would be if Inf were an integer.

The implementation uses flatten(repeated(xs, n)). It could instead use take(cycle(xs), n * length(xs)) but that only works when the contents has known length. take(cycle...) tends to be faster, perhaps it should be used when possible? Some timing below. But perhaps this detail is a secondary question.

julia> takecycle(x, n::Int) = Iterators.take(Iterators.cycle(x), n * length(x));

julia> flatrep(x, n::Int) = Iterators.flatten(Iterators.repeated(x, n));  # as in PR, first commit

julia> takecycle(1:10, 100) |> length
1000

julia> flatrep(1:10, 100) |> Base.haslength  # and won't be helped by 47353
false

julia> @btime collect(takecycle(1:10, 100));
  min 1.642 μs, mean 2.554 μs (1 allocation, 7.94 KiB)

julia> @btime collect(flatrep(1:10, 100));
  min 6.617 μs, mean 9.107 μs (6 allocations, 21.86 KiB)

julia> flatrep(Tuple(1:10), 100) |> Base.haslength  # behaves better with tuples, but not faster:
true

julia> @btime collect(takecycle($(Tuple(rand(10))), 100));
  min 1.100 μs, mean 1.977 μs (1 allocation, 7.94 KiB)

julia> @btime collect(flatrep($(Tuple(rand(10))), 100));
  min 10.458 μs, mean 11.220 μs (1 allocation, 7.94 KiB)

Copy link
Contributor

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Maybe should have an entry in NEWS.md ?

base/iterators.jl Outdated Show resolved Hide resolved
@mcabbott
Copy link
Contributor Author

Thanks!

Running my benchmark above on 1.11, the advantage of take(cycle...) is not as clear as it was. Can investigate such optimisations a bit more, but will do so in a follow-up PR.

@fingolfin
Copy link
Contributor

I think the overall new API is good; optimizations are of course welcome but better to have this in than to wait for the perfect version, I'd say

@fingolfin fingolfin added the status:merge me PR is reviewed. Merge when all tests are passing label Feb 10, 2024
@jariji
Copy link
Contributor

jariji commented Feb 12, 2024

map is to Iterators.map as repeat is to Iterators.repeat?

@mcabbott
Copy link
Contributor Author

Maybe this comment was for #53291?

I note that Iterators.repeat and Iterators.fill are both errors at present, which is good.

@jariji
Copy link
Contributor

jariji commented Feb 12, 2024

I just mean if there's gonna be an Iterators version of repeat, there's a certain sense in calling it Iterators.repeat.

@mcabbott
Copy link
Contributor Author

This PR hoped not to get tangled up in renaming things...

I thought Base.Iterators deliberately chose distinct names, so that they could be exported. With the exception of Iterators.filter. Perhaps so that they could be imported without conflicts? Then someone added Iterators.map which is mostly just an alias for Base.Generator. And I forgot about Iterators.accumulate, Iterators.reverse, so IDK perhaps there was never a pattern. Or perhaps it depends on how similar the functions are.

This PR documents that Iterators.repeated(val, n) is the lazy cousin of fill(val, n). The lazy one has a default value n=Inf which would be impossible for the eager one. And in fact fill(val) already means something else, a zero-array, extending a pattern that fill(val, n, n) makes a matrix, etc. Maybe that's why calling it Iterators.fill would be confusing?

Iterators.cycle(iter) is similarly the lazy n=Inf cousin of eager repeat(vector, n). This PR gives it a finite variant, with exactly the same relationship as the two methods of Iterators.repeated. The eager repeat(1:4) does nothing, and is == repeat(1:4, 1) so I suppose has default n==1. Maybe that's why while Iterators.cycle doesn't share its name?

Besides arrays, the eager function has a method for Char: repeat('a', 5) isa String. And it does things like repeat(1:4; inner=2), whose lazy version is Iterators.flatten(Iterators.repeated(x, 2) for x in 1:4). IDK if there's any appetite for an easy lazy way to do such inner repetition.

One downside of calling this method Iterators.repeat is that it's one more symbol, and only two letters separate it from the very different Iterators.repeated. Unless that were first re-named, which is what I read #53291 as asking for.

@jariji
Copy link
Contributor

jariji commented Feb 12, 2024

SGTM

@vtjnash vtjnash merged commit e6992f7 into JuliaLang:master Feb 13, 2024
4 of 7 checks passed
@mcabbott mcabbott deleted the cycle_n branch February 13, 2024 03:01
@giordano giordano removed the status:merge me PR is reviewed. Merge when all tests are passing label Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:iteration Involves iteration or the iteration protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants