
Conversation

@jonasfj (Member) commented Oct 29, 2024

From dart-archive/async#249, which we never managed to land in `dart:async`.


```dart
Future<void> Stream<T>.parallelForEach(
    int maxParallel,
    FutureOr<void> Function(T item) each, {
    FutureOr<void> Function(Object e, StackTrace? st) onError = Future.error,
  })
```

Call `each` for each item in this stream, with at most `maxParallel` concurrent invocations.

This method will invoke `each` for each item in this stream, and wait for
all futures from `each` to be resolved. `parallelForEach` will call `each`
in parallel, but never with more than `maxParallel` invocations at a time.

If `each` throws and `onError` rethrows (the default behavior), then
`parallelForEach` will wait for ongoing `each` invocations to finish
before throwing the first error.

If `onError` does not throw, then iteration will not be interrupted and
errors from `each` will be ignored.

```dart
// Count size of all files in the current folder
var folderSize = 0;
// Use parallelForEach to read at-most 5 files at the same time.
await Directory.current.list().parallelForEach(5, (item) async {
  if (item is File) {
    final bytes = await item.readAsBytes();
    folderSize += bytes.length;
  }
});
print('Folder size: $folderSize');
```
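
To illustrate the non-throwing `onError` case described above, here is a small sketch; the `urls` list and the `fetchUrl` helper are hypothetical placeholders, not part of this PR. Because `onError` only records the error instead of rethrowing, iteration is not interrupted and `parallelForEach` completes normally.

```dart
// Sketch: collect errors instead of failing fast.
// `urls` and `fetchUrl` are hypothetical placeholders.
final errors = <Object>[];
await Stream.fromIterable(urls).parallelForEach(
  5,
  (url) async {
    await fetchUrl(url); // may throw
  },
  onError: (e, st) {
    // Not rethrowing: iteration continues, the error is only recorded here.
    errors.add(e);
  },
);
print('Finished with ${errors.length} error(s)');
```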

@jonasfj requested a review from isoos on October 29, 2024 20:22

@isoos (Collaborator) left a comment


lgtm, but I'd use the `concurrentForEach` name instead

````dart
/// });
/// print('Folder size: $folderSize');
/// ```
Future<void> parallelForEach(
````
@isoos (Collaborator) commented:

Shouldn't we call this `concurrentForEach` instead? Parallel would indicate separate isolates for me.

@jonasfj (Member, author) replied:

I recall discussing this on the PR a long time ago. I don't really like `concurrentForEach`.

Maybe `asyncForEach` or `boundedAsyncForEach`, or something completely different.

I think I'm going to keep the current name, just because it doesn't conflict with other extensions on `Stream`.

```dart
}());

if (running >= maxParallel) {
  await itemDone.wait;
```
@isoos (Collaborator) commented:

I'm wondering: will this implicitly pause the stream listener until the pending processing is done? Can we leave a note on that, or a TODO to investigate it at some point?

@jonasfj (Member, author) replied:

> I'm wondering: will this implicitly pause the stream listener until the pending processing is done?

Yes, that's kind of the point: `await stream.parallelForEach(N, each);` will:

- Call `each` for each item in the stream.
- Ensure that there are no more than `N` concurrent invocations of `each`.
  - This means pausing the stream while items are being processed.
- Wait for all `each` invocations to be completed.
  - Unless `each` throws, in which case the stream will be canceled, but pending `each` invocations will still be awaited.

I would think that pausing the stream is desirable. Suppose you have a `query` of type `Stream<Package>`, resulting from a datastore query.

- You don't really want to load all `Package` entities into memory; if you did, you could just do `await query.toList();`.
- You also don't really want to process each `Package` entity one at a time, because that's sort of slow. If you did, you could just do `await for (final p in query) {...}`.
- You don't really want to process all `Package` entities concurrently, because it'll do a crazy amount of I/O, won't reuse TCP connections, and will cause pain. If you did, you could do `await Future.wait(await query.map((p) async {...}).toList());`.

You really want to process `N` entities concurrently, and not more.
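
To make the pausing behavior discussed above concrete, here is a minimal sketch of the mechanism. It is not the implementation from this PR, it uses a hypothetical `boundedForEach` name, and it omits the error handling and cancellation that `parallelForEach` provides.

```dart
import 'dart:async';

// Minimal sketch of bounded concurrency via pausing the subscription.
// Not the code from this PR; error handling and cancellation are omitted.
extension BoundedForEachSketch<T> on Stream<T> {
  Future<void> boundedForEach(
    int maxParallel,
    Future<void> Function(T item) each,
  ) async {
    var running = 0;   // callbacks currently in flight
    var done = false;  // the stream has emitted its done event
    final finished = Completer<void>();
    late final StreamSubscription<T> sub;

    void maybeFinish() {
      if (done && running == 0 && !finished.isCompleted) {
        finished.complete();
      }
    }

    sub = listen((item) {
      running++;
      if (running >= maxParallel) {
        sub.pause(); // the "implicit pause" discussed in this thread
      }
      each(item).whenComplete(() {
        running--;
        if (sub.isPaused) {
          sub.resume(); // allow the next item to be delivered
        }
        maybeFinish();
      });
    }, onDone: () {
      done = true;
      maybeFinish();
    });

    await finished.future;
  }
}
```

With a sketch like this, `await someStream.boundedForEach(5, process)` would keep at most five `process` calls in flight, pausing the underlying subscription whenever the limit is reached.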

@jonasfj merged commit 596ff08 into dart-lang:master on Oct 31, 2024
32 checks passed
@jonasfj deleted the parallel-foreach branch on October 31, 2024 11:48