Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-4164] Consider treating LazyFilteredCollection like FlattenCollection #46747

Open
dabrahams opened this issue Mar 5, 2017 · 11 comments
Open

Comments

@dabrahams
Copy link
Collaborator

dabrahams commented Mar 5, 2017

Previous ID SR-4164
Radar None
Original Reporter @dabrahams
Type Bug
Additional Detail from JIRA
Votes 1
Component/s Standard Library
Labels Bug
Assignee Hektve87 (JIRA)
Priority Medium

md5: e621a9d1963ce2fe4205f377c6b2fb16

Issue Description:

The refusal to pre-count elements of a `FlattenCollection` is based on reasoning about the performance of `flatMap`. But `flatMap`, when applied to collections of optionals, doesn't produce a LazyFlattenedCollection<…>, but a LazyMapCollection<LazyFilterCollection<…>>. We are not doing a similar thing for LazyFilterCollection.

The following demonstrates:

struct Foo {
  let i: UInt ;
  init?(_ i: UInt) { print("Foo(\(i))"); if i % 4 == 0 { return nil }; self.i = i }
}
Array((0..<10).lazy.flatMap(Foo.init))

which currently generates Foo(i) 3 times for each i in 0..<10. If this FIXME were addressed it would go down to 2 times (once for counting), but if we take the option suggested in this bug, it would go down to 1 (at the cost of Array reallocations due to failing to pre-count, of course).

@swift-ci
Copy link
Collaborator

swift-ci commented Mar 5, 2017

Comment by Hector David (JIRA)

@davidgreens

@dabrahams
Copy link
Collaborator Author

dabrahams commented Mar 5, 2017

Hector, it doesn't look like @davidgreens is a recognized userID, so that comment probably will not have any effect.

@natecook1000
Copy link
Member

natecook1000 commented Mar 9, 2017

A big +1 for this change—the multiple iteration of lazily filtered collections is consistently surprising for users of the lazy system.

@dabrahams
Copy link
Collaborator Author

dabrahams commented Mar 9, 2017

Wow, this is very different from what I saw the last time I tried this. My preliminary benchmarks show that you have to have > 10⁵ elements in the resulting array before pre-counting is a win, at least when the elements are Ints. However, this still isn't as cut-and-dried as it looks, since this shows count to be at least 10x slower than it should be for a lazy filtered collection:

import Darwin
import CoreFoundation

var total = 0

let length = 10_000_000_00

let sc = (0..<length).lazy.filter { $0  != 0 }

let t0 = CFAbsoluteTimeGetCurrent()
total += sc.reduce(0) { $0 + $1 } >> 8  // shift to avoid overflows
let t1 = CFAbsoluteTimeGetCurrent()
total += sc.count >> 8
let t2 = CFAbsoluteTimeGetCurrent()

print("reduce", t1 - t0)
print("count", t2 - t1)

exit(total == 0 ? 1 : 0) // A way to make sure it isn't optimized out

@dabrahams
Copy link
Collaborator Author

dabrahams commented Mar 9, 2017

Here's a more complete test, with some results at the bottom. I don't have time to complete the testing right now but I've left a // TODO: comment at the bottom that shows what we might want to do to. If someone wants to pick that up and finish the testing we'd have all the experimental evidence we need to make a decision, IMO.

@natecook1000
Copy link
Member

natecook1000 commented Mar 12, 2017

@dabrahams—I ran with your test and added the remainder. My forked gist is here with some results in a comment. Interesting stuff!

@dabrahams
Copy link
Collaborator Author

dabrahams commented Mar 12, 2017

@natecook1000 Cool! What do you conclude from these results?

@natecook1000
Copy link
Member

natecook1000 commented Mar 12, 2017

Skipping preallocation looks like a win in enough cases that I'd skip it as the default, since it's an unexpected quirk of the implementation that filtered lazy collections get iterated multiple times. By the time it seems to start to matter, the collection is big enough (billions of items?) that I'd assume people are measuring performance. If someone does want the benefit of that preallocation, they should be able to measure the count of the filtered lazy collection directly, reserve that capacity in a new array, and then append the elements. This was my preference before testing, as well, but I don't seea strong case in this data for keeping the current behavior.

@dabrahams
Copy link
Collaborator Author

dabrahams commented Mar 12, 2017

@natecook1000 Hmm, I'd go further; to me it looks like there's a really strong case for dropping the current behavior. Time for a pull request?

@natecook1000
Copy link
Member

natecook1000 commented Mar 13, 2017

@dabrahams Time indeed!

@swift-ci
Copy link
Collaborator

swift-ci commented Apr 16, 2017

Comment by Hector David (JIRA)

#demo-1

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants