[BEAM-8111] Enable CloudObjectsTest$DefaultCoders #9446
Run Dataflow ValidatesRunner
R: @reuvenlax would you mind reviewing this?
SchemaCoder<?> that = (SchemaCoder<?>) o;
return rowCoder.equals(that.rowCoder)
    && toRowFunction.equals(that.toRowFunction)
    && fromRowFunction.equals(that.fromRowFunction);
Doesn't this just revert to object equality comparison on the to/from functions?
Yeah - I discussed this offline a bit with @kennknowles and he convinced me that it was better to have an equals function that might have some false negatives (if the toRowFunction and fromRowFunction don't have a good equals), rather than one that could have false positives (like if we rely on just checking the schema and typeDescriptor, and assume that the toRow/fromRow are the same).
I managed to make the CloudObjectsTest work by adding RowIdentity with an equals() function here.
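To make the trade-off concrete, here is a minimal sketch of an equals() that defers to its parts. ConversionFn, IdentityFn and ToyCoder are hypothetical stand-ins written for this illustration, not the classes in this PR: the identity function is stateless, so any two instances can safely report themselves equal, and the coder's equals() is only as good as the equals() of the functions it holds.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical stand-in for a serializable conversion function; not Beam's interface. */
interface ConversionFn<A, B> extends Function<A, B>, Serializable {}

/** Identity conversion: stateless, so any two instances are interchangeable. */
final class IdentityFn<T> implements ConversionFn<T, T> {
  @Override
  public T apply(T input) {
    return input;
  }

  @Override
  public boolean equals(Object o) {
    return o != null && getClass() == o.getClass();
  }

  @Override
  public int hashCode() {
    return IdentityFn.class.hashCode();
  }
}

/** Toy coder whose equals() defers to the equals() of its parts. */
final class ToyCoder<T> {
  private final String schema; // stand-in for a real schema or row coder
  private final ConversionFn<T, String> toRow;
  private final ConversionFn<String, T> fromRow;

  ToyCoder(String schema, ConversionFn<T, String> toRow, ConversionFn<String, T> fromRow) {
    this.schema = schema;
    this.toRow = toRow;
    this.fromRow = fromRow;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    ToyCoder<?> that = (ToyCoder<?>) o;
    return schema.equals(that.schema)
        && toRow.equals(that.toRow)
        && fromRow.equals(that.fromRow);
  }

  @Override
  public int hashCode() {
    return Objects.hash(schema, toRow, fromRow);
  }
}

final class DelegatedEqualsDemo {
  public static void main(String[] args) {
    ToyCoder<String> a =
        new ToyCoder<String>("name:STRING", new IdentityFn<String>(), new IdentityFn<String>());
    ToyCoder<String> b =
        new ToyCoder<String>("name:STRING", new IdentityFn<String>(), new IdentityFn<String>());
    System.out.println(a.equals(b)); // prints true
  }
}
```

If either function were a class without its own equals(), the comparison would fall back to reference equality and report a false negative, which is the side of the trade-off accepted above.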
The way I would phrase this is: let the functions own their equals. If they say they are equal, they are. If they say they aren't, they aren't. So this equals() is relative to that.
Sounds good in theory. In practice these functions are usually lambdas, so we might have trouble making this work.
That's true. I was thinking it's not such a big deal to get false negatives when lambdas are used, since I really just want the equality check to use in tests.
What do you think about updating the various schema providers to create Function sub-classes (with equals implemented) instead of using lambdas?
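As a rough sketch of what that could look like (GetFieldFn and the map-backed "row" are made up for illustration and are not what the schema providers actually generate): a named class can compare by the state that determines its behavior, while two textually identical lambdas are separate objects that inherit Object.equals().

```java
import java.io.Serializable;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical named function: reads one field from a map-backed "row". */
final class GetFieldFn implements Function<Map<String, Object>, Object>, Serializable {
  private final String fieldName;

  GetFieldFn(String fieldName) {
    this.fieldName = Objects.requireNonNull(fieldName);
  }

  @Override
  public Object apply(Map<String, Object> row) {
    return row.get(fieldName);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof GetFieldFn)) return false;
    return fieldName.equals(((GetFieldFn) o).fieldName);
  }

  @Override
  public int hashCode() {
    return fieldName.hashCode();
  }
}

final class LambdaVsNamedFnDemo {
  public static void main(String[] args) {
    // Two lambdas with the same body: no custom equals(), so they are compared
    // by reference and are typically reported as unequal.
    Function<Map<String, Object>, Object> lambdaA = row -> row.get("id");
    Function<Map<String, Object>, Object> lambdaB = row -> row.get("id");
    System.out.println(lambdaA.equals(lambdaB));

    // Named classes compare by field name.
    System.out.println(new GetFieldFn("id").equals(new GetFieldFn("id")));   // prints true
    System.out.println(new GetFieldFn("id").equals(new GetFieldFn("name"))); // prints false
  }
}
```

The cost is one small class per function shape, plus keeping equals() and hashCode() in sync with whatever state apply() depends on.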
Another alternative could be to add something like assertEquivalentSchemaCoder that just checks schema and type, rather than continuing down this rabbit hole.
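A minimal sketch of such a helper, assuming a simplified coder that exposes its schema and type directly. SimpleSchemaCoder and CoderAssertions are hypothetical names for this example, and the helper throws AssertionError itself so the sketch does not depend on a test framework.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical, simplified coder: a schema, a target type, and conversion functions. */
final class SimpleSchemaCoder<T> {
  final String schema; // stand-in for a real schema object
  final Class<T> type; // stand-in for a type descriptor
  final Function<T, String> toRow;
  final Function<String, T> fromRow;

  SimpleSchemaCoder(
      String schema, Class<T> type, Function<T, String> toRow, Function<String, T> fromRow) {
    this.schema = schema;
    this.type = type;
    this.toRow = toRow;
    this.fromRow = fromRow;
  }
}

final class CoderAssertions {
  private CoderAssertions() {}

  /** Passes when schema and type match; the conversion functions are not compared. */
  static void assertEquivalentSchemaCoder(
      SimpleSchemaCoder<?> expected, SimpleSchemaCoder<?> actual) {
    if (!Objects.equals(expected.schema, actual.schema)) {
      throw new AssertionError(
          "schema mismatch: " + expected.schema + " vs " + actual.schema);
    }
    if (!Objects.equals(expected.type, actual.type)) {
      throw new AssertionError("type mismatch: " + expected.type + " vs " + actual.type);
    }
  }
}

final class EquivalenceDemo {
  public static void main(String[] args) {
    SimpleSchemaCoder<Integer> a =
        new SimpleSchemaCoder<Integer>("n:INT32", Integer.class, i -> i.toString(), Integer::valueOf);
    SimpleSchemaCoder<Integer> b =
        new SimpleSchemaCoder<Integer>("n:INT32", Integer.class, String::valueOf, Integer::parseInt);
    // Passes even though the two coders hold different function objects.
    CoderAssertions.assertEquivalentSchemaCoder(a, b);
    System.out.println("equivalent");
  }
}
```

Because the functions are ignored, this check would also pass for two coders whose toRow/fromRow behave differently, so it trades the false negatives of equals() for possible false positives in tests.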
Could we go ahead and merge this as is? I could follow up with more changes to the SchemaCoder equals: plumbing through a type descriptor and using that for comparison, and possibly changing the toRow/fromRow functions created by the existing SchemaProviders to make them comparable.
I have a PR up now (#9493) that adds equals and hashCode to the fromRow and toRow functions created by all the GetterBasedSchemaProvider sub-classes.
BTW this is not just for tests. The Flink runner appears to rely on coder equality (even though you can argue it shouldn't).
R: @kennknowles
Adds a @RunWith(Enclosed.class) to CloudObjectsTest so that DefaultCoders actually runs. Since this test hasn't been running it has a few issues, which I've also attempted to resolve here. A summary of the changes to that end:
- [...] StructuredCoder sub-class use the components list as the expected components, rather than the usual arguments list.
- [...] DoubleCoder to list of Dataflow known coders.
- [...] PIPELINE_PROTO_CODER_ID rather than all model coders.
- [...] fromRow, toRow, so it doesn't work as expected for instances created with lambdas.

This also adds a test case that would have caught BEAM-8111.