Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7678] Fixes bug in output element_type generation in Kv PipelineVisitor #9238

Merged

Conversation

ecanzonieri
Copy link
Contributor

@ecanzonieri ecanzonieri commented Aug 3, 2019

This review tries to address BEAM-7678. When we use typehints in the output the element_type is already defined so in theory there is no need to try to infer it. There are some types that cannot be inferred by the function infer_output_type, in these cases the typehint should be used to bypass the inferred type.

I'm not sure how to test this code, I can reproduce the issue in a manual test, using as type a class that for some reason the infer_output_type is not able to infer.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@ecanzonieri
Copy link
Contributor Author

@aaltay would you have time to review or recommend another reviewer?

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it makes sense to preserve the previous output.element_type if one is not set.

It looks like you have a test case in (https://issues.apache.org/jira/browse/BEAM-7678). Would it possible to convert that to a unit test? It will be good to see a passing test.

sdks/python/apache_beam/pipeline.py Outdated Show resolved Hide resolved
@ecanzonieri
Copy link
Contributor Author

I agree that we should have a unit test for this. In the case of the manual test the Message class lives in a separate library and it's a very complex class (several hundreds of lines of code). My guess is that for some reason the inference is not able to work for that class.

I had spent some time trying to write a test doing exactly what you suggested before submitting the pr. The problem is that I can't find an easy way to create a type that would fail infer_output_type.
In my first attempt I had created a new class inside the pipeline_test.py as follow:

class TestOutputType(object):

  def __init__(self, value):
    self.value = value

I had created a stateful DoFn that would hit the Kv PipelineVisitor (verified via debugger) but the infer_output_type was able to find the proper type. In other words I wasn't able to get the unit test to fail with the older code. Replacing the TestOutputType with the Message would correctly fail the test before the fix. Is there any obvious way to create a type that will fail inference?

@ecanzonieri ecanzonieri force-pushed the BEAM-7678_use_typehints_to_resolve_kvtypes branch from b12f1ca to a70de1d Compare August 6, 2019 17:32
@ecanzonieri
Copy link
Contributor Author

ecanzonieri commented Aug 6, 2019

I was able to write a unit test for this feature. I rebased the code because of some wrong commits I accidentally pushed to my pr.
Please, let me know if there is anything else that I need to address as part of this pr.

@aaltay
Copy link
Member

aaltay commented Aug 6, 2019

Thank you this looks good. I will merge it after tests pass.

Just to be explicit, this test was failing before your fix, right?

@ecanzonieri
Copy link
Contributor Author

Correct, I verified that the test was failing before the fix and passed after the fix.

@aaltay
Copy link
Member

aaltay commented Aug 6, 2019

Test error looks like a flake. Filed: https://issues.apache.org/jira/browse/BEAM-7911

@aaltay
Copy link
Member

aaltay commented Aug 6, 2019

Run Python PreCommit

@aaltay aaltay merged commit bc2c6ff into apache:master Aug 7, 2019
@ecanzonieri
Copy link
Contributor Author

@aaltay do you think we can close the ticket BEAM-7678 now? Feel free to assign it me, I don't think I have permission in the Beam Jira to close or assign tickets to myself.

@aaltay
Copy link
Member

aaltay commented Aug 8, 2019

@ecanzonieri I assigned the issue to you and closed it. I added as a contributor, you should be able to assign JIRA issues to yourself from now on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants