Skip to content

[BEAM-9273] Explicitly disable @RequiresTimeSortedInput on unsupported runners#10816

Merged
je-ik merged 2 commits intoapache:masterfrom
je-ik:BEAM-9273
Feb 14, 2020
Merged

[BEAM-9273] Explicitly disable @RequiresTimeSortedInput on unsupported runners#10816
je-ik merged 2 commits intoapache:masterfrom
je-ik:BEAM-9273

Conversation

@je-ik
Copy link
Contributor

@je-ik je-ik commented Feb 10, 2020

Fixes [BEAM-9273]

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Flink ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Spark ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Spark StructuredStreaming ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Apex ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Dataflow ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Samza ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Flink ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Dataflow ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Spark StructuredStreaming ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Apex ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Spark ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Samza ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

Run Gearpump ValidatesRunner

@je-ik
Copy link
Contributor Author

je-ik commented Feb 10, 2020

The fail seems not to be related to this PR. First failed post commit seems to be this one.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta-comment: switching the whole world to use DoFnFeatures would be best as a separate commit, since it does not have to do with RequiresTimeSortedInput. It is OK to leave as one commit since it isn't that big a deal.

import org.apache.beam.sdk.values.TypeDescriptor;

/**
* Features a {@link DoFn} can posses. Each runner might implement a different (sub)set of this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DoFnSignature is precisely the list of features already, no? It is OK to have helper methods anyhow.

Style point: for a given class, utility static methods usually go in a companion class like DoFnSignature (class) & DoFnSignatures (utility methods)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. The reason I created this class is that I wanted to demonstrate the approach of declaring pipeline requirements and validating that runner supports all requirements.

That would go as follows:

 Set<Features> required = PipelineFeaturs.extract(pipeline);
 Runner.validateAllFeaturesSupported(required);

That way, adding a new feature to pipeline would not require any change to runner core, but the runner would reject the pipeline.

I then realized that this change would be too big and really not related to the annotation, so I returned back to the original approach, where runners explicitly reject features they do not support (and adding unsupported feature needs modification in runners code). I will rename the class as you suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm ... DoFnSignatures already exists. It does something different (creates DoFnSignature from DoFn). Feels weird to mix these two helper functionalities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kennknowles would you have any suggestions about naming the class? I think this code really should not go to DoFnSignatures, can we agree on some alternative name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite strongly believe it belongs in DoFnSignatures and that what you describe in the rest belongs in Pipelines. Static-method-only utility classes tend to be disorganized and undiscoverable. It is better to attach them to the thing that they are most related to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the code there. The reason I didn't like it is that DoFnSignatures probably when created served the purpose to create signatures from DoFn, while the code I added makes no use of DoFnSignature itself. But there seems to be more code like that already, so I'm fine with that.

DoFn<KV<K, InputT>, OutputT> fn = originalParDo.getFn();
verifyFnIsStateful(fn);
DataflowRunner.verifyStateSupported(fn);
DataflowRunner.verifyDoFnSupported(fn, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a raw bool into a function call is not very readable. I suggest splitting into verifyDoFnSupportedForStreaming and verifyDoFnSupportedForBatch. These can each call the common code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added both versions, although the second one verifyDoFnSupportedStreaming is not used. I added that so that the methods are not imbalanced. The streaming case is called from DataflowRunner.verifyDoFnSupported(fn, context.getPipelineOptions().isStreaming()), where it would be weird to do if (context..isStream()) verifyStreaming() else ...

Copy link
Contributor Author

@je-ik je-ik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed comments from CR. I'll squash the commit after approval.

@je-ik je-ik requested a review from kennknowles February 12, 2020 21:11

@Test
public void testAllDoFnFeatures() {
tests.forEach(FeatureTest::test);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there's a lot of logic in this test class. See https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html

It would be better to simply write out the boilerplate for each test, so each test can be read in a straight line with no context needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the declaration of a class and its features encapsulated in single class. Agree that this doesn't play well with the rest of the DoFnSignaturesTest. But because there is effort in the direction of superseding all this with "pipeline features", I think it is fine to keep is as is for now and drop it as part of the effort later.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see that you were raising it so that the feature checks take a DoFn as an argument instead of a DoFnSignature. So, yea, the methods would then belong in DoFns.java or since it is a class you can put static methods right on DoFn. The reason this isn't great is it creates a circular dependency. So IMO it is fine as-is.

@je-ik je-ik merged commit a149b6b into apache:master Feb 14, 2020
@je-ik je-ik deleted the BEAM-9273 branch February 14, 2020 21:40
@je-ik
Copy link
Contributor Author

je-ik commented Feb 14, 2020

Thanks @kennknowles

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants