Users may want to limit the parallelism of a step. Two classic use cases are:
- User wants to produce at most k files, so sets TextIO.Write.withNumShards(k).
- External API only supports k QPS, so user sets a limit of k/(expected QPS/step) on the ParDo that makes the API call.
Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys guarantees that only k elements are produced, but runners are free to break fusion after the GroupByKey, so each of those elements may still be processed in parallel in later steps.
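To make the workaround concrete, here is a minimal sketch of the "exactly k keys" idea in plain Java. It does not use the Beam API: `ShardSketch` and `groupIntoShards` are hypothetical names, and the hash-modulo key assignment is an assumption about how a user might pick keys. It only shows that grouping by k keys yields at most k groups. It cannot show the actual problem, which is that a runner may still parallelize whatever runs after the grouping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardSketch {
    // Hypothetical helper, not part of Beam: assigns each element to one of
    // k keys (hash modulo k) and groups by that key, so at most k groups exist.
    public static <T> Map<Integer, List<T>> groupIntoShards(List<T> elements, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Map<Integer, List<T>> shards = new TreeMap<>();
        for (T element : elements) {
            // floorMod keeps the key in [0, k) even for negative hash codes.
            int key = Math.floorMod(element.hashCode(), k);
            shards.computeIfAbsent(key, unused -> new ArrayList<>()).add(element);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            input.add(i);
        }
        Map<Integer, List<Integer>> shards = groupIntoShards(input, 4);
        System.out.println("groups: " + shards.size());
    }
}
```

In a real pipeline the grouped output would feed the write or API-call step, and nothing in the model stops the runner from redistributing that work, which is why the limit needs to be expressible in the model itself.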
To implement this functionality, I believe we need to add this support to the Beam model.
Imported from Jira BEAM-68. Original Jira may contain additional context.
Reported by: dhalperi.