[BEAM-3711] Enabling combiner lifting in Dataflow Runner. #5974

Merged 2 commits on Jul 18, 2018

youngoli
Contributor

This change does two things:

  1. It changes how the Dataflow Runner transmits combines to Dataflow so
    that combiner lifting can be supported. When translating a
    CombineGroupedValues transform, the runner now encodes the ID of the
    parent Combine Per Key transform as a Serialized Fn.

  2. It also preemptively fixes an issue in which a CombineGroupedValues
    with side inputs would be translated for combiner lifting even though
    its parent transform is an anonymous composite transform, which
    indicates that the CombineGroupedValues should be translated as a
    ParDo instead. The fix is a slight adjustment to the PTransform
    Overrides in DataflowRunner.
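
For readers unfamiliar with the term, combiner lifting generally means running part of a per-key combine before the shuffle, so that only one accumulator per key per bundle is shuffled instead of every raw value. The sketch below is a conceptual illustration only, not the Dataflow service's implementation and not code from this PR. `MeanFn`, `partial_combine`, `lifted_combine_per_key`, and `unlifted_combine_per_key` are hypothetical names, and "bundle" here is just a Python list of (key, value) pairs.

```python
from collections import defaultdict


class MeanFn:
    """Hypothetical combine function with the usual four-phase shape."""

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, count)

    def add_input(self, acc, value):
        total, count = acc
        return (total + value, count + 1)

    def merge_accumulators(self, accs):
        return (sum(a[0] for a in accs), sum(a[1] for a in accs))

    def extract_output(self, acc):
        total, count = acc
        return total / count if count else float("nan")


def partial_combine(bundle, fn):
    """Pre-shuffle step: fold a bundle's values into one accumulator per key."""
    accs = {}
    for key, value in bundle:
        acc = accs[key] if key in accs else fn.create_accumulator()
        accs[key] = fn.add_input(acc, value)
    return list(accs.items())


def lifted_combine_per_key(bundles, fn):
    """Lifted form: shuffle accumulators, then merge and extract per key."""
    shuffled = defaultdict(list)
    for bundle in bundles:
        for key, acc in partial_combine(bundle, fn):
            shuffled[key].append(acc)  # one accumulator per key per bundle
    return {
        key: fn.extract_output(fn.merge_accumulators(accs))
        for key, accs in shuffled.items()
    }


def unlifted_combine_per_key(bundles, fn):
    """Unlifted form: shuffle every raw value, then combine the grouped values."""
    grouped = defaultdict(list)
    for bundle in bundles:
        for key, value in bundle:
            grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        acc = fn.create_accumulator()
        for value in values:
            acc = fn.add_input(acc, value)
        result[key] = fn.extract_output(acc)
    return result
```

Both forms should produce the same per-key result; the lifted form simply moves the `add_input` work ahead of the shuffle. Presumably this is why the service needs to be able to tie a CombineGroupedValues step back to its parent combine, which is what the encoded ID described above provides.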



@youngoli
Contributor Author

R: @lukecwik

@lukecwik lukecwik merged commit 602cac1 into apache:master Jul 18, 2018