Optimize GroupIntoBatches for batch Dataflow pipelines #19749

@damccorm

Description

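The GroupIntoBatches transform can be significantly optimized for batch pipelines on Dataflow, because Dataflow always ensures that a key K appears in only one bundle after a GroupByKey. With that guarantee, all values for a key can be batched directly from the grouped output, which removes the need for the state and timers used by the generic GroupIntoBatches transform.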
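
To illustrate the idea, here is a minimal sketch in plain Python, not Beam code and not the actual Dataflow implementation. It assumes that after grouping, every value for a key is available in one place, so batches can be cut by simply chunking the grouped values. The helper names `group_by_key`, `batch_values`, and `group_into_batches` are hypothetical and exist only for this example.

```python
from collections import defaultdict
from itertools import islice


def group_by_key(pairs):
    """Stand-in for a GroupByKey: collect all values of each key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def batch_values(key, values, batch_size):
    """Yield (key, batch) pairs by chunking one key's grouped values.

    No per-key state or timers are needed here, because all values for
    the key are assumed to arrive together in a single iterable.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(values)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield key, batch


def group_into_batches(pairs, batch_size):
    """Group by key, then emit batches of at most batch_size values per key."""
    result = []
    for key, values in group_by_key(pairs).items():
        result.extend(batch_values(key, values, batch_size))
    return result


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
    for key, batch in group_into_batches(data, batch_size=2):
        print(key, batch)
```

The last batch for a key may be smaller than `batch_size`. In the sketch it is emitted when that key's grouped values run out; the generic transform has no such single iterable to exhaust, which is where its state and timers come in.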

Imported from Jira BEAM-7912. Original Jira may contain additional context.
Reported by: lcwik.
