Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: partition slice into two slice by predicate #67326

Open
leaxoy opened this issue May 12, 2024 · 14 comments · May be fixed by #67327
Open

proposal: slices: partition slice into two slice by predicate #67326

leaxoy opened this issue May 12, 2024 · 14 comments · May be fixed by #67327
Labels
Milestone

Comments

@leaxoy
Copy link

leaxoy commented May 12, 2024

Proposal Details

In some scenarios, we need to divide the slice into two sub-slices that match and do not match by predicate, so it is proposed to provide the Partition and PartitionInPlace function.

Here are implementations

  1. Partition: Alloc new slice, and keep order, don't modify the original slice.
// Partition split the original slice into two slice.
// The elements of the first slice all satisfy the predicate.
// The elements of the second slice none satisfy the predicate.
func Partition[Slice ~[]E, E any](s Slice, f func(E) bool) (S, S) {
        if s == nil {
		return nil, nil
	}
	truthSlice := make([]E, 0)
	falseSlice := make([]E, 0)
	for _, x := range s {
		if f(x) {
			truthSlice = append(truthSlice, x)
		} else {
			falseSlice = append(falseSlice, x)
		}
	}
	return truthSlice, falseSlice
}
  1. PartitionInPlace: No alloc slice, also not keep order, modify original slice.
func PartitionInPlace[E any, S ~[]E](s S, f func(E) bool) (true, false S) {
	if s == nil {
		return nil, nil
	}

	i, t := 0, len(s)-1
	for i < t {
		if !f(s[i]) {
			i++
			continue
		}

		s[i], s[t] = s[t], s[i]
		t--
	}
	return s[:i], s[t:]
}

Usecases

Here is a simple search in github:
https://github.com/search?q=%2Ffunc%5Cs*%28%28P%7Cp%29artition%7C%28S%7Cs%29plit%29.*func%5C%28.*%5C%29+bool%5C%29%5Cs*%5C%28%2F+language%3AGo+++NOT+is%3Aarchived+NOT+is%3Afork+&type=code

It may not be an exact match, but it is enough to prove that the usage scenarios of partition are rich enough.

Also includes the following situations, which cannot be quickly retrieved by github search:

  1. There should be other function names that do similar things
  2. There are forms that do similar things directly instead of defining functions

Some specific examples:

  1. func (addrs addrList) partition(strategy func(Addr) bool) (primaries, fallbacks addrList) {
  2. https://github.com/jesseduffield/lazygit/blob/618fe533f8c6113392eea981d03c65e6da5860bb/pkg/utils/slice.go#L131
  3. https://github.com/telepresenceio/telepresence/blob/1d0e76afabb392079ad2397e30eeb31b3a0fccd8/pkg/subnet/subnet.go#L131
  4. https://github.com/database64128/tfo-go/blob/3f17e86cda66000ab80dec80ef780db99c51509d/tfo_supported.go#L347
  5. https://github.com/felix-kaestner/slices/blob/9b6acd0396d156fe509b8a228820530ce03cd1e1/slices.go#L200
@gopherbot gopherbot added this to the Proposal milestone May 12, 2024
leaxoy added a commit to leaxoy/go that referenced this issue May 12, 2024
Partition split the original slice into two slice.

Fixes golang#67326.
@leaxoy leaxoy linked a pull request May 12, 2024 that will close this issue
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/585018 mentions this issue: slices: add Partition

@septemhill
Copy link

It seems could only be two parts. Why don't we use GroupBy instead ?

@seankhliao
Copy link
Member

how common is this operation? "some scenarios" is not strong justification

@leaxoy
Copy link
Author

leaxoy commented May 12, 2024

@leaxoy
Copy link
Author

leaxoy commented May 12, 2024

It seems could only be two parts. Why don't we use GroupBy instead ?

@septemhill

Two main reason:

  1. Partition is convenient than GroupBy in the case of two classifications
  2. Usually groupby will return map, so I'm not sure if it makes sense to introduce map as a return value in slices. Before that, slices and maps didn't interact with each other.

@seankhliao
Copy link
Member

That search result contains a lot of results irrelevant to this proposal
a more accurate search may be something like https://github.com/search?q=%2Ffunc%5Cs*%28P%7Cp%29artition.*func%5C%28.*%5C%29+bool%5C%29+%5C%28%2F+language%3AGo+++NOT+is%3Aarchived+NOT+is%3Afork+&type=code

@leaxoy
Copy link
Author

leaxoy commented May 13, 2024

In addition, I also found that the split function also has similar functions.
https://github.com/search?q=%2Ffunc%5Cs*%28%28P%7Cp%29artition%7C%28S%7Cs%29plit%29.*func%5C%28.*%5C%29+bool%5C%29%5Cs*%5C%28%2F+language%3AGo+++NOT+is%3Aarchived+NOT+is%3Afork+&type=code

  1. And there should be other function names that do similar
    things
  2. And there are forms that use similar functions directly instead of defining functions

@adonovan
Copy link
Member

adonovan commented May 13, 2024

Potential optimization: allocate a bit vector containing f(i) for every index; then allocate a single array of length n and copy the true elements into the bottom and the false elements into the top; finally, return two cap-limited slices of this array. This would eliminate reallocations in append and reduce the amortized allocation overhead from 12.5% (append's average waste) to 1 bit per element (1.5% for a word-sized element, less for larger elements).

[Update: on reflection, a 12.5% allocation improvement doesn't seem worth the extra complexity. Never mind.]

@zigo101
Copy link

zigo101 commented May 13, 2024

It element order is not important ([edit] and the input one is allowed to be changed), then no allocation is needed.

@leaxoy
Copy link
Author

leaxoy commented May 14, 2024

Potential optimization: allocate a bit vector containing f(i) for every index; then allocate a single array of length n and copy the true elements into the bottom and the false elements into the top; finally, return two cap-limited slices of this array.

@adonovan This give me an idea to partition in place and return the index of the dividing point. I have seen this practice in rust, called as PartitionInPlace.

@go101 And if I understand correctly, what you expressed can also be considered as PartitionInPlace

@DeedleFake
Copy link

DeedleFake commented May 14, 2024

I wrote a Partition() function in an internal package in a project a little while back. I wound up removing it more recently though, mostly because it just wasn't useful anymore. It seems to me that an allocating version would make more sense in x/iter, while the slices implementation should be in-place.

For the fun of it, here's a zero-allocation implementation that runs in O(n) but does not preserve ordering.

Edit: Here's a version that does preserve ordering.

Edit 2: Version 3, now with overall easier-to-read code but annoyingly repetitive condition checks.

Edit 3: I wasn't paying attention to the elements of the true slice. Their order is not preserved. Whoops. I think it's possible to preserve it by keeping track of the index of the first one and reversing the sub-slice, but that'll reduce the efficiency just slightly.

@leaxoy
Copy link
Author

leaxoy commented May 14, 2024

I wrote a Partition() function in an internal package in a project a little while back. I wound up removing it more recently though, mostly because it just wasn't useful anymore. It seems to me that an allocating version would make more sense in x/iter, while the slices implementation should be in-place.

In fact, I write my version of Partition in my iter package, but is use a nested function as Seq, because I didn't find an efficient way to create seq without using slice.

Below is my implementations in my packgae iter.
Here is my first version implementation:

func appendSeq[E any](s Seq[E], x E) Seq[E] {
    return func(yield func(x E) bool {
        for e := range s {
            if !yield(e) {
                return
            }
        }
        yield(x)
    })
}

func Partition[E any](s Seq[E], f func(E) bool) (Seq[E], Seq[E]) {
    true:=func(yield func(E) bool) {}
    false:=func(yield func(E) bool) {}
    for x := range s {
        if f(x) {
            true = appendSeq(true, x)
        }else {
            false = appendSeq(false, x)
        }
    }
    return true, false
}

And the second implementation: Partition, it iterate over Seq twice, so it is not available for non-repeatable Seq(like: io stream).

For the fun of it, here's a zero-allocation implementation that runs in O(n) but does not preserve ordering.

I think we should reverse the version that keep order. Leave the choice to the user.

Finally, your comments are quite valuable and I will integrate these versions of implementation into the original proposal

@zigo101
Copy link

zigo101 commented May 14, 2024

Will iter become another feature which is abused in Go (other than channels)? :D

@leaxoy
Copy link
Author

leaxoy commented May 17, 2024

Following the above discussion, we will have two variants:

func Partition[S ~[]E, E any](s S, f func(E) bool) (S, S)

func PartitionInPlace[S ~[]E, E any](s S, f func(E) bool) (S, S)

Is there anything else that needs to be discussed? If not, I will update the PR later to reflect the latest changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

Successfully merging a pull request may close this issue.

7 participants