new feature: partition into N sets (to add to sets of length N) #37449

Open
bjarthur opened this issue Sep 7, 2020 · 3 comments

@bjarthur
Contributor

bjarthur commented Sep 7, 2020

currently, Base.Iterators.partition accepts an argument n which specifies the length of the disjoint subsets. i frequently find i would rather specify the number of disjoint subsets. has this been considered before? a similar interface-design choice exists between range(start, stop; length) and range(start, stop; step).

i actually already have some code that does this via a new iterator. and i was about to submit a PR to IterTools.
but i think a new method to Base.Iterators.partition might make more sense.

here is the docstring showing the API of what i currently have:

"""
split(xs, n)

Group values of xs into n disjoint sets. The sets will have identical
lengths if length(xs) is evenly divisible by n; otherwise their lengths
will be approximately equal.

julia> collect(split(1:9,4))
4-element Array{Any,1}:
 1:2
 3:4
 5:7
 8:9

See also: partition.
"""

the existing partition API could be changed to partition(collection; n, l), where only one of the two keyword arguments would be specified, one acting as the old second positional argument, and the other specifying the desired number of disjoint sets.

should i stick with IterTools or extend Base? or neither? thanks!
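
For illustration, one way to get groups like those in the docstring example is to round evenly spaced boundaries. This is only a sketch and not the iterator mentioned above: the name `split_into` is hypothetical, it returns an array instead of a lazy iterator, and it assumes `xs` supports `length` and 1-based range indexing.

```julia
# Hypothetical helper (not part of Base or IterTools): split an indexable
# collection into `n` contiguous groups of near-equal length.
function split_into(xs, n::Integer)
    n > 0 || throw(ArgumentError("n must be positive"))
    len = length(xs)
    # n + 1 evenly spaced boundaries from 0 to len, rounded to integers
    bounds = round.(Int, range(0, len; length = n + 1))
    # group i covers the indices bounds[i] + 1 through bounds[i + 1]
    return [xs[(bounds[i] + 1):bounds[i + 1]] for i in 1:n]
end

split_into(1:9, 4)   # [1:2, 3:4, 5:7, 8:9]
```

With this rounding rule the longer group can land in the middle, as in the docstring output. Other rules (for example, giving the extra elements to the first groups) are equally valid choices.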

@mcabbott
Contributor

mcabbott commented Sep 7, 2020

I agree this would often be convenient:

Iterators.partition(itr; count) = Iterators.partition(itr, cld(length(itr), count))

But unlike the present method, this one won't work for things like itr = (i for i in 1:10 if rand()>0.5), for which Base.haslength(itr) == false. That would perhaps make it unlike every other method in Iterators. Does that mean it should collect first in such cases? Or not live in Iterators? Or just give an error?
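
A quick sketch of how the one-liner above behaves, written under the hypothetical name `partition_count` so that no method is added to `Base.Iterators`. It also shows a side effect of using `cld`: the number of groups can come out smaller than the requested count.

```julia
# Hypothetical wrapper around the one-liner above (not a Base method).
partition_count(itr, count) = Iterators.partition(itr, cld(length(itr), count))

# 10 elements, 4 groups requested: chunk length cld(10, 4) == 3, so 4 chunks.
length(collect(partition_count(1:10, 4)))   # 4

# 9 elements, 4 groups requested: chunk length cld(9, 4) == 3, so only 3 chunks.
length(collect(partition_count(1:9, 4)))    # 3

# A filtered generator has no known length, so calling `length` on it errors.
gen = (i for i in 1:10 if isodd(i))
Base.IteratorSize(gen)                      # Base.SizeUnknown()
```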

@xgdgsc
Contributor

xgdgsc commented May 8, 2023

Besides a split for iterators, would including https://juliaml.github.io/MLUtils.jl/stable/api/#MLUtils.chunk for arrays also be useful?

@IanButterworth
Sponsor Member

Or not live in Iterators

Yeah, a Base.partition function that could do either ngroups or nelements while collecting seems reasonable
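
A minimal sketch of what such a collecting function might look like. The name `partition_collect` is hypothetical, the keyword names `ngroups` and `nelements` are borrowed from the comment above, and the rule for distributing the remainder is an assumption, since nothing in this thread fixes it.

```julia
# Hypothetical sketch: accepts exactly one of `ngroups` or `nelements`, and
# collects the input first so iterators without a known length work too.
function partition_collect(itr; ngroups = nothing, nelements = nothing)
    (ngroups === nothing) != (nelements === nothing) ||
        throw(ArgumentError("specify exactly one of ngroups or nelements"))
    xs = collect(itr)
    if nelements !== nothing
        nelements > 0 || throw(ArgumentError("nelements must be positive"))
        # fixed-size chunks; the last one may be shorter
        return [xs[i:min(i + nelements - 1, end)] for i in 1:nelements:length(xs)]
    end
    ngroups > 0 || throw(ArgumentError("ngroups must be positive"))
    # assumed rule: the first `r` groups each get one extra element
    q, r = divrem(length(xs), ngroups)
    stops = cumsum([q + (i <= r) for i in 1:ngroups])
    starts = [1; stops[1:end-1] .+ 1]
    return [xs[a:b] for (a, b) in zip(starts, stops)]
end

odds = (i for i in 1:10 if isodd(i))     # no known length
partition_collect(odds; ngroups = 2)     # [[1, 3, 5], [7, 9]]
partition_collect(odds; nelements = 2)   # [[1, 3], [5, 7], [9]]
```

Collecting up front trades laziness for a known length, which is why a function like this may fit better outside `Iterators`.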
