Support for limiting parallelism of a step #17963

@kennknowles

Users may want to limit the parallelism of a step. Two classic use cases are:

  • User wants to produce at most k files, so sets TextIO.Write.withNumShards(k).
  • External API only supports k QPS, so user sets a limit of k/(expected QPS/step) on the ParDo that makes the API call.
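
The arithmetic in the second use case can be sketched as follows. This is only an illustration in plain Python, not Beam API: the function name `max_parallelism` and the example rates are hypothetical, and it assumes each parallel instance of the step issues requests at a roughly constant rate.

```python
import math


def max_parallelism(api_qps_limit: float, expected_qps_per_instance: float) -> int:
    """Return k / (expected QPS per step instance), rounded down.

    Hypothetical helper: both arguments are assumed to be positive rates.
    """
    if api_qps_limit <= 0 or expected_qps_per_instance <= 0:
        raise ValueError("rates must be positive")
    # Round down so the combined rate stays within the limit. At least one
    # instance is returned so the step can run at all; in that edge case a
    # single instance may still exceed the limit on its own.
    return max(1, math.floor(api_qps_limit / expected_qps_per_instance))


# An API limited to 100 QPS, with each instance expected to issue about 30 QPS.
limit = max_parallelism(100, 30)
```

The resulting number is the cap the user would like to place on the ParDo, which is exactly what the model currently offers no reliable way to enforce.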

Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys guarantees that only k elements are produced, but runners are free to break fusion in such a way that each element may later be processed in parallel.
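
To make the k-key workaround concrete, here is a minimal simulation in plain Python. It does not use Beam; `group_into_k_keys` and the round-robin key assignment are assumptions made for illustration only.

```python
from collections import defaultdict


def group_into_k_keys(elements, k):
    """Assign each element one of k keys, then group by key.

    Stands in for "assign k keys, then GroupByKey"; round-robin key
    assignment is an arbitrary choice for this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    groups = defaultdict(list)
    for index, element in enumerate(elements):
        groups[index % k].append(element)
    return dict(groups)


grouped = group_into_k_keys(range(10), 3)
# The grouping produces at most k outputs (one per key), but nothing in
# `grouped` says how a runner must schedule the steps that consume it.
```

The grouping bounds the number of grouped outputs, not the parallelism of whatever consumes them, which is the gap this issue describes.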

To implement this functionality, I believe we need to add this support to the Beam Model.

Imported from Jira BEAM-68. Original Jira may contain additional context.
Reported by: dhalperi.
