Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move task spawning to driver methods #4

Open
yoshuawuyts opened this issue Mar 14, 2020 · 2 comments
Open

Move task spawning to driver methods #4

yoshuawuyts opened this issue Mar 14, 2020 · 2 comments
Labels

Comments

@yoshuawuyts
Copy link
Collaborator

@yoshuawuyts yoshuawuyts commented Mar 14, 2020

Currently parallel-stream will start doing work the second it's initialized. This is much the same as task::spawn. The upside of this is that we got it to work, and it's what people want in the overwhelming amount of cases.

The downsides are that we're both spawning more tasks than needed, which interferes with the ability of the compiler to inline futures, which in turn will impact performance. Also there's no backpressure.

The solution to this seems to be to move "task spawning" to the edge methods: next, for_each, collect, sum that on the one hand spawn tasks as fast as possible, while on the other hand allow outputting them one-by-one.

There's probably some nuance here; for example next is by nature sequential so task spawning might not even make sense. But overall it seems that if we can invert the logic slightly this could lead to some neat results.

@yoshuawuyts

This comment has been minimized.

Copy link
Collaborator Author

@yoshuawuyts yoshuawuyts commented Mar 14, 2020

However something to be mindful of is that calling next in a loop can also be used to implement methods such as collect. It seems there's an inherent mismatch between backpressure and driving futures.

Perhaps we can learn from futures-rs terminology here, and the max concurrency (#3) is not only the upper bound of futures executing simultaneously. But also the amount of futures that have (potentially) been completed, are now buffered, and are waiting to be read.

This is an interesting one it seems.

@yoshuawuyts

This comment has been minimized.

Copy link
Collaborator Author

@yoshuawuyts yoshuawuyts commented Mar 19, 2020

On the topic of ordering: some discussion occurred on reddit, but I think that we should:

  • have a method called ordered which can enforce the stream is ordered
    • Enforcing ordering would add overhead to the trait, and each individual trait impl would need to branch. This is probably not worth it. If ordering is important there are ways to implement it on top of an unordered stream, for which for_each suffices. The main point of parallel streams is to increase throughput, and ordering is generally only useful when creating collections so they should probably be scoped to that.
  • final methods should be responsible for ordering themselves. E.g. collecting into a vec should probably be ordered, but collecting into a hashmap should not. Similarly for_each should not be ordered.
    • Important is for each final method to document which ordering they use.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant
You can’t perform that action at this time.