[BEAM-8111] Enable CloudObjectsTest$DefaultCoders #9446

Merged: 13 commits, Oct 4, 2019

@TheNeuralBit (Member) commented Aug 29, 2019

Adds @RunWith(Enclosed.class) to CloudObjectsTest so that the nested DefaultCoders test class actually runs. Because this test hasn't been running, it has accumulated a few issues, which I've also attempted to resolve here. A summary of those changes:

  • When testing a StructuredCoder sub-class use the components list as the expected components, rather than the usual arguments list.
  • Add DoubleCoder to list of Dataflow known coders.
  • Validate only that Dataflow known coders, rather than all model coders, have no PIPELINE_PROTO_CODER_ID.
  • Add equals and hashCode to SchemaCoder and RowCoder so that they can be compared. The comparison currently covers only the underlying schema and the fromRow and toRow functions, so it doesn't work as expected for instances created with lambdas.

This also adds a test case that would have caught BEAM-8111.

@TheNeuralBit changed the title from "[Do not merge] Add asserts that always fail to CloudObjectsTest$CoderTest" to "[Do not merge] Enable CloudObjectsTest$CoderTest" on Aug 31, 2019
@TheNeuralBit
Member Author

Run Dataflow ValidatesRunner

@TheNeuralBit
Member Author

Run Dataflow ValidatesRunner

@TheNeuralBit changed the title from "[Do not merge] Enable CloudObjectsTest$CoderTest" to "[Do not merge] Enable CloudObjectsTest$DefaultCoders" on Sep 6, 2019
@TheNeuralBit changed the title from "[Do not merge] Enable CloudObjectsTest$DefaultCoders" to "[BEAM-8111] Enable CloudObjectsTest$DefaultCoders" on Sep 7, 2019
@TheNeuralBit
Member Author

R: @reuvenlax would you mind reviewing this?

SchemaCoder<?> that = (SchemaCoder<?>) o;
return rowCoder.equals(that.rowCoder)
    && toRowFunction.equals(that.toRowFunction)
    && fromRowFunction.equals(that.fromRowFunction);
Contributor

Doesn't this just revert to object equality comparison on the to/from functions?

Member Author

Yeah - I discussed this offline a bit with @kennknowles and he convinced me that it was better to have an equals function that might have some false negatives (if the toRowFunction and fromRowFunction don't have a good equals), rather than one that could have false positives (like if we rely on just checking the schema and typeDescriptor, and assume that the toRow/fromRow are the same).

I managed to make the CloudObjectsTest work by adding RowIdentity with an equals() function here.

Member

The way I would phrase this is: let the functions own their equals. If they say they are equal, they are. If they say they aren't, they aren't. So this equals() is relative to that.
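As a rough illustration of "let the functions own their equals", the sketch below uses a simplified stand-in class rather than Beam's actual SchemaCoder. The names `MiniCoder`, `IdentityFn`, and `identityCoder` are hypothetical and exist only for this example; `IdentityFn` plays a role similar to the RowIdentity mentioned above, but it is not that class.

```java
import java.util.Objects;
import java.util.function.Function;

final class DelegatingEqualsSketch {

  /** Hypothetical identity function with value-based equality. */
  static final class IdentityFn<T> implements Function<T, T> {
    @Override
    public T apply(T input) {
      return input;
    }

    @Override
    public boolean equals(Object o) {
      // All IdentityFn instances behave the same, so they compare as equal.
      return o instanceof IdentityFn;
    }

    @Override
    public int hashCode() {
      return IdentityFn.class.hashCode();
    }
  }

  /** Simplified stand-in for a schema-backed coder (assumption, not Beam's class). */
  static final class MiniCoder<T, R> {
    private final String schema;
    private final Function<T, R> toRow;
    private final Function<R, T> fromRow;

    MiniCoder(String schema, Function<T, R> toRow, Function<R, T> fromRow) {
      this.schema = Objects.requireNonNull(schema);
      this.toRow = Objects.requireNonNull(toRow);
      this.fromRow = Objects.requireNonNull(fromRow);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof MiniCoder)) {
        return false;
      }
      MiniCoder<?, ?> that = (MiniCoder<?, ?>) o;
      // The coder does not judge the functions itself; it asks them.
      return schema.equals(that.schema)
          && toRow.equals(that.toRow)
          && fromRow.equals(that.fromRow);
    }

    @Override
    public int hashCode() {
      return Objects.hash(schema, toRow, fromRow);
    }
  }

  /** Builds a coder whose conversion functions are both IdentityFn instances. */
  static MiniCoder<String, String> identityCoder(String schema) {
    return new MiniCoder<String, String>(
        schema, new IdentityFn<String>(), new IdentityFn<String>());
  }

  public static void main(String[] args) {
    System.out.println(identityCoder("name:STRING").equals(identityCoder("name:STRING")));
    System.out.println(identityCoder("name:STRING").equals(identityCoder("id:INT64")));
  }
}
```

With this shape, the coder's equality is only as precise as the equality of the functions it holds, which is the trade-off discussed in this thread.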

Contributor

Sounds good in theory. In practice these functions are usually lambdas, so we might have trouble making this work.
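The lambda concern can be seen in a few lines of plain Java. This is a minimal sketch with hypothetical names (`firstLambda`, `secondLambda`); it relies on lambdas inheriting reference equality from Object, and on the usual JVM behavior that two separate lambda expressions yield distinct objects. The Java Language Specification leaves instance identity for lambdas unspecified, so treat the second comparison as typical rather than guaranteed.

```java
import java.util.function.Function;

final class LambdaEqualitySketch {

  // Two textually identical lambdas defined at different places in the source.
  static Function<String, String> firstLambda() {
    return s -> s;
  }

  static Function<String, String> secondLambda() {
    return s -> s;
  }

  public static void main(String[] args) {
    Function<String, String> f = firstLambda();
    Function<String, String> g = secondLambda();

    // Same object compared with itself: equal.
    System.out.println(f.equals(f));

    // Same behavior, different lambda expressions: compared by reference,
    // so an equals() that delegates to them reports a false negative.
    System.out.println(f.equals(g));
  }
}
```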

Member Author

That's true. I was thinking it's not such a big deal to get false negatives when lambdas are used, since I really just want the equality check to use in tests.

What do you think about updating the various schema providers to create Function sub-classes (with equals implemented) instead of using lambdas?
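One possible shape for such a Function sub-class is sketched below. `GetFieldFn` is a hypothetical name, and a plain Map stands in for a row; this is not how Beam's schema providers are written, only an illustration of replacing a lambda with a named class whose equality depends on its configuration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

final class ComparableFunctionSketch {

  /** Hypothetical named function: equality is defined by the configured field name. */
  static final class GetFieldFn implements Function<Map<String, Object>, Object> {
    private final String fieldName;

    GetFieldFn(String fieldName) {
      this.fieldName = Objects.requireNonNull(fieldName);
    }

    @Override
    public Object apply(Map<String, Object> row) {
      return row.get(fieldName);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof GetFieldFn)) {
        return false;
      }
      return fieldName.equals(((GetFieldFn) o).fieldName);
    }

    @Override
    public int hashCode() {
      return fieldName.hashCode();
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row = Collections.<String, Object>singletonMap("id", 42);
    GetFieldFn a = new GetFieldFn("id");
    GetFieldFn b = new GetFieldFn("id");

    System.out.println(a.apply(row));   // value stored under "id"
    System.out.println(a.equals(b));    // separately constructed, same configuration
    System.out.println(a.equals(new GetFieldFn("name")));
  }
}
```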

Member Author

Another alternative could be to add something like assertEquivalentSchemaCoder that just checks schema and type, rather than continuing down this rabbit hole.
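A helper along those lines might look like the following sketch. Only the name assertEquivalentSchemaCoder comes from the comment above; its signature, the `MiniSchemaCoder` stand-in class, and the use of a String for the schema are all assumptions made for illustration. Because it ignores the conversion functions, it can report a match for coders that convert differently, which is the false-positive risk raised earlier in this thread.

```java
import java.util.Objects;
import java.util.function.Function;

final class EquivalentCoderSketch {

  /** Simplified stand-in for a schema-backed coder (assumption, not Beam's class). */
  static final class MiniSchemaCoder<T> {
    final String schema;
    final Class<T> type;
    final Function<T, String> toRow;
    final Function<String, T> fromRow;

    MiniSchemaCoder(
        String schema, Class<T> type, Function<T, String> toRow, Function<String, T> fromRow) {
      this.schema = Objects.requireNonNull(schema);
      this.type = Objects.requireNonNull(type);
      this.toRow = Objects.requireNonNull(toRow);
      this.fromRow = Objects.requireNonNull(fromRow);
    }
  }

  /** Hypothetical test helper: checks schema and type only, ignoring the functions. */
  static void assertEquivalentSchemaCoder(MiniSchemaCoder<?> expected, MiniSchemaCoder<?> actual) {
    if (!expected.schema.equals(actual.schema)) {
      throw new AssertionError(
          "Schema mismatch: expected " + expected.schema + " but was " + actual.schema);
    }
    if (!expected.type.equals(actual.type)) {
      throw new AssertionError(
          "Type mismatch: expected " + expected.type + " but was " + actual.type);
    }
  }

  public static void main(String[] args) {
    // The functions are lambdas without a meaningful equals(), so the helper skips them.
    MiniSchemaCoder<Integer> a =
        new MiniSchemaCoder<Integer>(
            "value:INT32", Integer.class, i -> Integer.toString(i), Integer::parseInt);
    MiniSchemaCoder<Integer> b =
        new MiniSchemaCoder<Integer>(
            "value:INT32", Integer.class, i -> String.valueOf(i), Integer::valueOf);

    assertEquivalentSchemaCoder(a, b);
    System.out.println("coders are equivalent by schema and type");
  }
}
```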

Member Author

Could we go ahead and merge this as is? I could follow up with more changes to the SchemaCoder equals: plumbing through a type descriptor and using that for comparison, and possibly changing the toRow/fromRow functions created by the existing SchemaProviders to make them comparable.

Member Author

I have a PR up now (#9493) that adds equals and hashCode to the fromRow and toRow functions created by all the GetterBasedSchemaProvider sub-classes.

Contributor

BTW this is not just for tests. The Flink runner appears to rely on coder equality (even though you can argue it shouldn't).

@TheNeuralBit
Member Author

R: @kennknowles
