New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-5646] Fix quality and hashcode for bytes in Row. #6765
[BEAM-5646] Fix quality and hashcode for bytes in Row. #6765
Conversation
Row a = Row.withSchema(schema).addValue(a0).build(); | ||
Row b = Row.withSchema(schema).addValue(b0).build(); | ||
|
||
Assert.assertEquals(a, b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this fail without the change in hashCode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Seems like hashCode
is not on the path of quality.
sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
Outdated
Show resolved
Hide resolved
a0a2c23
to
b66a249
Compare
12c8527
to
e475c4c
Compare
271012c
to
2b77b97
Compare
There was an instability of Github on 10/21/2018. Please ignore those duplicate messages and other inconsistent behaviors on this PR. Comments are addressed. |
run java precommit |
2b77b97
to
848f7ec
Compare
848f7ec
to
d5a974c
Compare
Ping @reuvenlax could you take another look? |
This adds extra full-array copies on the equals path. I'm not sure if that will matter in practice though. I'll go ahead an merge it, but let's keep an eye on the benchmarks. |
lgtm |
@@ -347,12 +347,12 @@ public boolean equals(Object o) { | |||
} | |||
Row other = (Row) o; | |||
return Objects.equals(getSchema(), other.getSchema()) | |||
&& Objects.equals(getValues(), other.getValues()); | |||
&& Objects.deepEquals(getValues().toArray(), other.getValues().toArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not enough. If you have a schema with f: LIST<BYTES>
it will not have correct equality. We need our own schema-driven deep equality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is a valid concern. If there is no other option, we need our deep equality check for our schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem comes from Objects.deepEquals
only have a deep equal implementation for primitive types and array. So Map and List will at least fail to check the correct equality.
byte[] a0 = new byte[] {1, 2, 3, 4}; | ||
byte[] b0 = new byte[] {1, 2, 3, 4}; | ||
|
||
Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a test of many deeper structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quality of
Bytes
field inRow
breaks because ofbyte[].equals(byte[])
(which should be the content comparison).Change the implementation of Row's
equals
andhashcode
to handlebyte[]
as a special case.Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.It will help us expedite review of your Pull Request if you tag someone (e.g.
@username
) to look at it.Post-Commit Tests Status (on master branch)