[BEAM-5646] Fix equality and hashcode for bytes in Row. #6765

Merged
merged 1 commit into from Oct 25, 2018

@amaliujia amaliujia commented Oct 20, 2018

Equality of BYTES fields in Row is broken because byte[].equals(byte[]) compares references, whereas Row equality should compare array contents.

Change the implementation of Row's equals and hashCode to handle byte[] as a special case.
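
To make the underlying issue concrete, here is a minimal JDK-only sketch (it does not use Beam's Row; the class and method names are made up for illustration). It shows that byte[] inherits Object.equals, so two arrays with identical contents are not equal, while Arrays.equals and Arrays.hashCode work on contents.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Beam.
final class ByteArrayEqualityDemo {
  /** Reference comparison: byte[] inherits Object.equals, so this is the same as a == b. */
  static boolean sameReference(byte[] a, byte[] b) {
    return a.equals(b);
  }

  /** Content comparison: true when both arrays hold the same bytes in the same order. */
  static boolean sameContent(byte[] a, byte[] b) {
    return Arrays.equals(a, b);
  }

  public static void main(String[] args) {
    byte[] a0 = {1, 2, 3, 4};
    byte[] b0 = {1, 2, 3, 4};
    System.out.println(sameReference(a0, b0)); // false: distinct array objects
    System.out.println(sameContent(a0, b0)); // true: same bytes
    // Content-based hash codes agree for arrays with equal contents.
    System.out.println(Arrays.hashCode(a0) == Arrays.hashCode(b0)); // true
  }
}
```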



amaliujia commented Oct 20, 2018

R: @reuvenlax @xumingmin @kanterov

Row a = Row.withSchema(schema).addValue(a0).build();
Row b = Row.withSchema(schema).addValue(b0).build();

Assert.assertEquals(a, b);
Member

will this fail without the change in hashCode?

Contributor Author

Nope. It seems hashCode is not on the equality path.
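
As a side note on why hashCode still deserves a content-based fix even though assertEquals never calls it, here is a small JDK-only sketch. BytesKey is a hypothetical wrapper invented for this example, not a Beam class. Equality checks only go through equals, but hash-based collections such as HashSet consult hashCode first, so the two methods have to agree.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper used only to illustrate the equals/hashCode contract.
final class BytesKey {
  private final byte[] bytes;

  BytesKey(byte[] bytes) {
    this.bytes = bytes.clone();
  }

  @Override
  public boolean equals(Object o) {
    // Content comparison, analogous to special-casing byte[] in equals.
    return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
  }

  @Override
  public int hashCode() {
    // Content-based hash so that equal keys land in the same bucket.
    return Arrays.hashCode(bytes);
  }

  public static void main(String[] args) {
    BytesKey a = new BytesKey(new byte[] {1, 2, 3, 4});
    BytesKey b = new BytesKey(new byte[] {1, 2, 3, 4});
    Set<BytesKey> seen = new HashSet<>();
    seen.add(a);
    System.out.println(a.equals(b)); // equals alone decides this
    System.out.println(seen.contains(b)); // needs hashCode to agree with equals
  }
}
```

If hashCode were left as the identity-based default, the equals call would still succeed, which matches the observation that the test does not exercise hashCode, but lookups in hash-based collections would generally miss.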

@amaliujia amaliujia force-pushed the rui_wang-fix_row_bytes_equality branch 3 times, most recently from a0a2c23 to b66a249 Compare October 22, 2018 02:26
@amaliujia amaliujia closed this Oct 22, 2018
@amaliujia amaliujia force-pushed the rui_wang-fix_row_bytes_equality branch from 12c8527 to e475c4c Compare October 22, 2018 02:34
@amaliujia amaliujia reopened this Oct 22, 2018
@amaliujia amaliujia force-pushed the rui_wang-fix_row_bytes_equality branch from 271012c to 2b77b97 Compare October 22, 2018 17:00
@amaliujia
Contributor Author

GitHub was unstable on 10/21/2018. Please ignore the duplicate messages and other inconsistent behavior on this PR.

Comments are addressed.

@amaliujia
Contributor Author

run java precommit

@amaliujia amaliujia force-pushed the rui_wang-fix_row_bytes_equality branch from 2b77b97 to 848f7ec Compare October 24, 2018 18:00
@amaliujia amaliujia force-pushed the rui_wang-fix_row_bytes_equality branch from 848f7ec to d5a974c Compare October 24, 2018 22:53
@amaliujia
Contributor Author

Ping

@reuvenlax could you take another look?

@reuvenlax
Contributor

This adds extra full-array copies on the equals path. I'm not sure whether that will matter in practice, though. I'll go ahead and merge it, but let's keep an eye on the benchmarks.
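
For reference, one way the copies could be avoided is to walk both value lists in place instead of calling toArray(). The sketch below is an assumption about how that might look, not what was merged: NoCopyValuesEquality and valuesEqual are hypothetical names, and it only special-cases top-level byte[] values.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of Beam.
final class NoCopyValuesEquality {
  /** Compares two value lists element by element without copying them to arrays. */
  static boolean valuesEqual(List<?> left, List<?> right) {
    if (left.size() != right.size()) {
      return false;
    }
    Iterator<?> l = left.iterator();
    Iterator<?> r = right.iterator();
    while (l.hasNext() && r.hasNext()) {
      Object a = l.next();
      Object b = r.next();
      // byte[] needs a content comparison; everything else uses its own equals.
      boolean same =
          (a instanceof byte[] && b instanceof byte[])
              ? Arrays.equals((byte[]) a, (byte[]) b)
              : Objects.equals(a, b);
      if (!same) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Object> left = Arrays.<Object>asList("x", new byte[] {1, 2});
    List<Object> right = Arrays.<Object>asList("x", new byte[] {1, 2});
    System.out.println(valuesEqual(left, right)); // true
    System.out.println(left.equals(right)); // false: List.equals compares byte[] by reference
  }
}
```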

@reuvenlax reuvenlax closed this Oct 25, 2018
@reuvenlax
Contributor

lgtm

@reuvenlax reuvenlax reopened this Oct 25, 2018
@reuvenlax reuvenlax merged commit f965ebf into apache:master Oct 25, 2018
@amaliujia amaliujia deleted the rui_wang-fix_row_bytes_equality branch October 25, 2018 19:57
@kanterov kanterov mentioned this pull request Oct 26, 2018
2 tasks
@@ -347,12 +347,12 @@ public boolean equals(Object o) {
     }
     Row other = (Row) o;
     return Objects.equals(getSchema(), other.getSchema())
-        && Objects.equals(getValues(), other.getValues());
+        && Objects.deepEquals(getValues().toArray(), other.getValues().toArray());
Member

This is actually not enough. If you have a schema with f: LIST<BYTES> it will not have correct equality. We need our own schema-driven deep equality.
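
A rough sketch of what a recursive deep equality could look like follows. Note the assumptions: this version dispatches on the runtime type of each value rather than on the Beam schema, so it is not the schema-driven check asked for here, and DeepValueEquality is a hypothetical name. It only aims to show the recursion into lists and maps.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Beam; dispatches on runtime type, not on a schema.
final class DeepValueEquality {
  static boolean deepEquals(Object a, Object b) {
    if (a == b) {
      return true;
    }
    if (a == null || b == null) {
      return false;
    }
    if (a instanceof byte[] && b instanceof byte[]) {
      return Arrays.equals((byte[]) a, (byte[]) b);
    }
    if (a instanceof List && b instanceof List) {
      List<?> la = (List<?>) a;
      List<?> lb = (List<?>) b;
      if (la.size() != lb.size()) {
        return false;
      }
      Iterator<?> ia = la.iterator();
      Iterator<?> ib = lb.iterator();
      while (ia.hasNext() && ib.hasNext()) {
        if (!deepEquals(ia.next(), ib.next())) {
          return false;
        }
      }
      return true;
    }
    if (a instanceof Map && b instanceof Map) {
      Map<?, ?> ma = (Map<?, ?>) a;
      Map<?, ?> mb = (Map<?, ?>) b;
      if (ma.size() != mb.size()) {
        return false;
      }
      // Keys are looked up with their own equals/hashCode; only values are compared deeply.
      for (Map.Entry<?, ?> e : ma.entrySet()) {
        if (!mb.containsKey(e.getKey()) || !deepEquals(e.getValue(), mb.get(e.getKey()))) {
          return false;
        }
      }
      return true;
    }
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    List<byte[]> left = Collections.singletonList(new byte[] {1, 2, 3, 4});
    List<byte[]> right = Collections.singletonList(new byte[] {1, 2, 3, 4});
    System.out.println(deepEquals(left, right)); // true: recurses into the list
  }
}
```

A matching deep hashCode would be needed alongside it to keep the equals/hashCode contract.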

Contributor Author

Yes. This is a valid concern. If there is no other option, we need our deep equality check for our schema.

https://issues.apache.org/jira/browse/BEAM-5868

Contributor Author

The problem is that Objects.deepEquals only performs a deep comparison for arrays (including arrays of primitives). A byte[] nested inside a Map or List is still compared through the collection's own equals, so equality is wrong for at least those types.
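
The limitation can be reproduced with the JDK alone. The sketch below uses a hypothetical demo class with plain Object[] arrays standing in for getValues().toArray(); it shows a top-level byte[] comparing equal while the same bytes wrapped in a List do not.

```java
import java.util.Collections;
import java.util.Objects;

// Hypothetical demo class, not part of Beam.
final class DeepEqualsLimitDemo {
  /** A byte[] directly inside the values array: deepEquals compares contents. */
  static boolean topLevelBytes() {
    Object[] left = {new byte[] {1, 2, 3, 4}};
    Object[] right = {new byte[] {1, 2, 3, 4}};
    return Objects.deepEquals(left, right);
  }

  /** A byte[] inside a List: the List's equals is used, which compares byte[] by reference. */
  static boolean bytesNestedInList() {
    Object[] left = {Collections.singletonList(new byte[] {1, 2, 3, 4})};
    Object[] right = {Collections.singletonList(new byte[] {1, 2, 3, 4})};
    return Objects.deepEquals(left, right);
  }

  public static void main(String[] args) {
    System.out.println(topLevelBytes()); // true
    System.out.println(bytesNestedInList()); // false
  }
}
```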

byte[] a0 = new byte[] {1, 2, 3, 4};
byte[] b0 = new byte[] {1, 2, 3, 4};

Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
Member

We need tests covering deeper nested structures.
