[BEAM-2927] Python support for portable side inputs over Fn API #4781

Merged
merged 1 commit into from Mar 21, 2018

Contributor

@robertwb robertwb commented Mar 1, 2018

@robertwb robertwb force-pushed the fn-api-dataflow-side-inputs branch 2 times, most recently from 200a039 to e51bdb9 Compare March 1, 2018 23:17
Contributor Author

robertwb commented Mar 2, 2018

R: @tvalentyn

Member

lukecwik commented Mar 2, 2018

Nice

Contributor

@tvalentyn tvalentyn left a comment

Thanks. My only major comment: how well is the new code covered by existing tests? Do we need to add test coverage?


At sdks/python/apache_beam/typehints/typehints.py:716:

  Internally, KV[X, Y] proxies to Tuple[X, Y]. A KV type-hint accepts only

nit: drop accepts only.

from apache_beam.transforms.core import ParDo

class SideInputVisitor(PipelineVisitor):
  """Ensures input `PCollection` used as a side inputs have a `KV` type.
Contributor

Nit (grammar): "Ensures input PCollection used as a side input has a KV type."?

Also: s/SDk/SDK

Contributor Author

Done.

    pipeline,
    element_type=typehints.KV[
        str, side_input.pvalue.element_type])
parent = transform_node.parent or pipeline._root_transform()
Contributor

Curious: why is root_transform not defined as the default parent for transforms that don't have a parent?

Contributor Author

That is a good question, but I did run into this case. (Probably a bug, filed BEAM-3871 for follow-up.)

new_side_inputs = []
for ix, side_input in enumerate(transform_node.side_inputs):
  access_pattern = side_input._side_input_data().access_pattern
  if access_pattern == common_urns.ITERABLE_SIDE_INPUT:
Contributor

Do the branches here currently have any test coverage?

Contributor Author

Looks like external test coverage for DataflowRunner is pretty sparse... I added a unit test of this explicitly (and also tested manually against the Dataflow runner).
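
For readers following this thread, the kind of branch being discussed can be illustrated with a small, runner-free sketch. It assumes (the excerpt above does not show the branch body) that an iterable side input is made keyed by pairing each value with a constant key, which would be consistent with the `KV[str, ...]` element type in the earlier snippet. The URN strings, the function name `key_side_input_elements`, the empty-string key, and the second branch are hypothetical stand-ins, not Beam APIs.

```python
# Hypothetical stand-ins for access-pattern URNs; these are NOT Beam constants.
ITERABLE_SIDE_INPUT = "sketch:side_input:iterable"
MULTIMAP_SIDE_INPUT = "sketch:side_input:multimap"


def key_side_input_elements(access_pattern, elements):
    """Return the elements in (key, value) form for the given access pattern."""
    if access_pattern == ITERABLE_SIDE_INPUT:
        # Assumption: unkeyed values are paired with a constant string key,
        # so the result would match a KV[str, element_type] hint.
        return [("", value) for value in elements]
    if access_pattern == MULTIMAP_SIDE_INPUT:
        # Assumption: multimap inputs are already keyed; only check the shape.
        for element in elements:
            if not (isinstance(element, tuple) and len(element) == 2):
                raise ValueError("expected a (key, value) pair, got %r" % (element,))
        return list(elements)
    raise ValueError("unsupported access pattern: %r" % (access_pattern,))
```

A unit test for such a function would exercise each branch separately, including the error path, which is the kind of coverage the question above asks about.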

@@ -1098,3 +1098,36 @@ def is_consistent_with(sub, base):
    # Nothing but object lives above any type constraints.
    return base == object
  return issubclass(sub, base)


def coerce_to_kv_type(element_type, label=None):
Contributor

Do the branches here currently have any test coverage?

Contributor Author

Yes, they do.
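
The body of `coerce_to_kv_type` is not shown in this excerpt. As a rough illustration of what coercing an element type hint to a key-value type could involve, here is a standalone sketch built on the standard `typing` module rather than Beam's `typehints`. The name `coerce_to_kv_sketch`, the treatment of a missing or `Any` hint as `Tuple[Any, Any]`, and the error messages are all assumptions made for this example.

```python
from typing import Any, Optional, Tuple, get_args, get_origin


def coerce_to_kv_sketch(element_type, label: Optional[str] = None):
    """Return a two-element tuple hint compatible with element_type, or raise."""
    where = " for %s" % label if label else ""
    # Assumption: an unspecified or fully generic hint is read as "any key, any value".
    if element_type is None or element_type is Any:
        return Tuple[Any, Any]
    # A tuple hint qualifies only if it has exactly two fixed positions.
    if get_origin(element_type) is tuple:
        args = get_args(element_type)
        if len(args) == 2 and Ellipsis not in args:
            return element_type
        raise ValueError(
            "Tuple hint%s must have exactly two elements, got %r"
            % (where, element_type))
    raise ValueError(
        "Hint%s is not compatible with a key-value type: %r"
        % (where, element_type))
```

Each `if` and each `raise` here is a separate branch, so a test suite would need at least one case per branch to answer the coverage question above.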

Contributor Author

@robertwb robertwb left a comment

Thanks for the review, and sorry for the delay getting back to this. PTAL.

@robertwb robertwb force-pushed the fn-api-dataflow-side-inputs branch from 3d74d37 to 99f1ebf Compare March 20, 2018 21:21
@robertwb robertwb merged commit b51ed61 into apache:master Mar 21, 2018
charlesccychen added a commit to charlesccychen/beam that referenced this pull request Mar 28, 2018
aaltay added a commit that referenced this pull request Mar 28, 2018
Revert #4781 which broke Python postsubmits
robertwb added a commit to robertwb/incubator-beam that referenced this pull request Mar 30, 2018
robertwb added a commit to robertwb/incubator-beam that referenced this pull request Apr 9, 2018