[SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite #29259
Conversation
ok to test
I think #26548 only guarantees that the comparison results are consistent between reruns. I'm not sure if this is a problem if the Spark cluster is mixed-architecture. cc @kiszk @rednaxelafx @maropu @viirya
Are you saying the test fails on big-endian machines, as-is?
Yes. The long values being compared are in native byte order, so the result of the byte-by-byte comparison may differ between big- and little-endian machines.
Yes. This PR ensures that the order of the bytes provided to the comparator in the tests is the same on both big- and little-endian systems. This is required because #26548 didn't actually change the behaviour of the comparator on big-endian systems (which already compared 8-byte chunks in big-endian byte order) but did change the expected output of the comparisons in these tests. The change therefore broke these tests on big-endian systems.
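For illustration, a standalone sketch (not Spark code) of how the same long value produces different byte sequences depending on the byte order used to write it — these are the bytes a byte-by-byte comparator would actually see:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Standalone illustration: the raw bytes backing a long depend on the
// byte order used to write it, so a byte-by-byte comparator sees
// different inputs for the same value on different architectures.
public class EndianDemo {
    static byte[] longBytes(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    public static void main(String[] args) {
        // 11L as laid out by a little-endian machine vs a big-endian one.
        System.out.println(Arrays.toString(longBytes(11L, ByteOrder.LITTLE_ENDIAN)));
        System.out.println(Arrays.toString(longBytes(11L, ByteOrder.BIG_ENDIAN)));
        // [11, 0, 0, 0, 0, 0, 0, 0]
        // [0, 0, 0, 0, 0, 0, 0, 11]
    }
}
```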
The change you cite doesn't change how things are compared; it just fixes the intended logic. What's at issue here is what the test writes. Writing 11 to memory directly on a little-endian machine writes different bytes than on a big-endian one. That results in a different comparison because the test case ends up expressing a different test. I think the change is therefore reasonable, but I would instead swap the bytes when the arch is big-endian and leave the current assertion. It's just a minor semantic difference that preserves the existing test, but it's fine either way. Is that the only current big-endian test failure?
Ack. I'll rework that.
The reason for writing the values in big-endian byte order is that it preserves the original intent of the tests, which try to trigger old bugs related to mis-sized integer comparisons. Writing the values in little-endian byte order makes the tests less applicable since they aren't hitting those cases. It might be better to just write out the bytes individually and avoid the platform endianness check. I'll see if that change is cleaner.
In this suite, yes, but there are test failures in other places. I am currently going through and fixing what I can. Eventually we'd like to try to get a big-endian system into Jenkins, if possible.
I see your point; the original intent was likely to write the bytes as the big-endian representation of a long. OK. Up to you how you want to attack it: if there are other similar failures we can fix them together in one PR to group them logically. If there are several distinct issues you can tackle them separately if desired, or together could be OK too. I suspect they're all related to this general part of the code.
Test build #126648 has finished for PR 29259 at commit
Based on #26548 and the comments, RecordBinaryComparator just needs to ensure a consistent order on the records being compared between reruns. A different order between big-endian and little-endian systems is possible with RecordBinaryComparator, but I think it won't affect correctness generally.
@viirya but this test asserts a particular ordering -- for test purposes, sure. The ordering won't be consistent, so the test fails based on arch. Because this is a special case, asserting the ordering, the endianness has to be fixed.
OK, I can see the point. Looks like we'd better fix it.
long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;

// BinaryComparator compares longs in big-endian byte order.
Would you like to add a few more words? It is a bit hard to reason about this with only this sentence.
E.g. "... On a little-endian system, this byte comparison does not overflow. We swap it to make sure the overflow happens."
I don't even think that's quite true. The comparison isn't endian at all as it is byte-by-byte. But the point here is to write bytes in a certain order for the test, for sure.
I think the overflow only happened when comparing longs? @mundaym
I don't think overflow was the issue per se; signed vs unsigned bytes were, for sure, in the original issue. But not so much here in this test case.
long row2Data = 1;

// BinaryComparator compares longs in big-endian byte order.
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
therefore missing the original purpose of the test.
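The overflow described above is easy to reproduce in isolation. A hypothetical subtraction-based comparator (a sketch of the old bug class, not the actual Spark code) gets the sign wrong exactly for these constants:

```java
// Sketch of the historical bug class the test targets: a comparator that
// returns the sign of (a - b) overflows for extreme values. This is an
// illustration, not the actual RecordBinaryComparator code.
public class OverflowDemo {
    // Buggy: Long.MIN_VALUE - 1 wraps around to Long.MAX_VALUE, so the
    // subtraction reports Long.MIN_VALUE > 1, which is wrong.
    static int buggyCompare(long a, long b) {
        long diff = a - b;
        return diff < 0 ? -1 : (diff > 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(buggyCompare(Long.MIN_VALUE, 1L)); // prints 1 (wrong)
        System.out.println(Long.compare(Long.MIN_VALUE, 1L)); // prints -1 (correct)
    }
}
```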
I'm a bit confused by the PR description; I checked the original PR that added this test case and it seems like the overflow in the test title comes from the old code: https://github.com/apache/spark/pull/22101/files#diff-4ec35a60ad6a3f3f60f4d5ce91f59933L61-L63 To keep the original intention, why do you think we need to update the existing test in little-endian cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean, this is kind of a different issue -- yes. Should be a new JIRA. I'd summarize this as: the bytes that this test sets up and asserts about are different on big-endian. It creates the wrong test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the intention of this test is to perform the comparison at compare() regardless of endianness.
For readers, would it be good to add the byte patterns to be tested?
80 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01
Thank you for your contribution, @mundaym.
BTW, please create a new JIRA with the TESTS component because SPARK-29918 is already released.
Does anyone know if there is a clean(er) way to copy an 8-byte array into a single element in an UnsafeRow? I'm trying to avoid the byte reversal but as far as I can tell the interface requires me to go through
Maybe just leave it as is and write a comment about what the resulting byte order is, for readers.
I think we can move forward on this if we just change the comments, to show the desired bytes that are being written. Functionally this seems OK.
insertRow(row1);
insertRow(row2);

Assert.assertTrue(compare(0, 1) > 0);
Assert.assertTrue(compare(0, 1) < 0);
My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results.
I think a simple way to fix the test is:

if (LITTLE_ENDIAN) {
  Assert.assertTrue(compare(0, 1) < 0);
} else {
  Assert.assertTrue(compare(0, 1) > 0);
}
That would also work here. I think it's a little less ideal, because then you are running a different test on little- vs big-endian (and those are the correct answers to the two different tests). I think only one of those tests was the intended one AFAICT, and I buy the argument that the proposed change restores it as well as addresses endianness.
then you are running a different test on little- vs big-endian
Yea, it's true. But it's also true that this is what happens in reality: the comparator compares bytes from left to right, no matter whether it's little- or big-endian.
That's right. There is no endianness issue in the logic being tested. There is endianness in how this test constructs the bytes to compare, because it writes a long into memory. I doubt this test's purpose holds both ways: it "works", but either little- or big-endian machines aren't running the test as intended. It feels like there's one intended test to run here, not either of two.
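To make the discussion concrete, here is a minimal sketch of a left-to-right unsigned-byte comparison (mirroring the behaviour described, not the actual RecordBinaryComparator implementation) applied to the suite's constants as each architecture lays them out:

```java
// Minimal sketch of a lexicographic (left-to-right, unsigned-byte)
// comparison, mirroring the behaviour discussed in this thread; this is
// an illustration, not the actual RecordBinaryComparator code.
public class LexCompareDemo {
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int x = Byte.toUnsignedInt(a[i]);
            int y = Byte.toUnsignedInt(b[i]);
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // 11L and 11L + Integer.MAX_VALUE (= 0x000000008000000A) as raw memory:
        byte[] r1le = {11, 0, 0, 0, 0, 0, 0, 0};             // 11L, little-endian
        byte[] r2le = {0x0A, 0, 0, (byte) 0x80, 0, 0, 0, 0}; // 0x8000000A, little-endian
        byte[] r1be = {0, 0, 0, 0, 0, 0, 0, 11};             // 11L, big-endian
        byte[] r2be = {0, 0, 0, 0, (byte) 0x80, 0, 0, 0x0A}; // 0x8000000A, big-endian

        System.out.println(compareBytes(r1le, r2le) > 0); // true: 11 > 10 at byte 0
        System.out.println(compareBytes(r1be, r2be) < 0); // true: 0x00 < 0x80 at byte 4
    }
}
```

The same pair of values sorts in opposite directions on the two layouts, which is exactly why the asserted ordering depends on the architecture.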
…an platforms. Comparisons performed by RecordBinaryComparator are executed in lexicographic order (byte by byte starting from the byte at index 0). This means that the underlying endianness of the platform can affect the result of a comparison between multi-byte values such as longs. This difference means that two tests fail on big-endian platforms. Also, the two tests compare 'special' long values to test edge cases that triggered old bugs in the fast path of the comparator. However since PR apache#26548 these 'special' values are byte-reversed before being compared on little-endian platforms, which means that the edge cases these 'special' values were designed to trigger are no longer tested. This PR fixes both these issues by byte reversing the values on little-endian systems. RecordBinaryComparatorSuite tests now pass on a big-endian machine (s390x).
Thanks everyone. Sorry for the delay. I've updated the PR commit description and the comments in the tests. I've also opened and linked against a new JIRA issue.
This looks OK to me.
// on little-endian platforms.
long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
My worry is that this changes what we were testing before. When we wrote the test, we assumed it runs on little-endian machines. Now we reverse the bytes for little-endian machines, and as a result the test result also changes (result > 0 -> result < 0).
I'm not very sure about the intention of these 2 tests. Maybe they were wrong at the very beginning and we should reverse the bytes for testing. I'll leave it for others to decide.
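The reversal being discussed can be sketched as follows (an illustration using Long.reverseBytes; the helper name is hypothetical, and the premise assumed here is that the test wants the in-memory bytes to read as the big-endian representation on every platform):

```java
import java.nio.ByteOrder;

// Sketch of the fix being discussed: keep the test constants' in-memory
// bytes in big-endian order on every platform by reversing them on
// little-endian machines before they are written into the row.
// toBigEndianLayout is a hypothetical helper, not actual Spark code.
public class ReverseBytesDemo {
    static long toBigEndianLayout(long v) {
        return ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)
            ? Long.reverseBytes(v)
            : v;
    }

    public static void main(String[] args) {
        // Reversing 1L gives 0x0100000000000000, whose little-endian
        // in-memory layout reads 00 00 00 00 00 00 00 01 -- the
        // big-endian representation of 1.
        System.out.println(Long.toHexString(Long.reverseBytes(1L)));
    }
}
```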
That is perhaps the only remaining issue. We should fix the test that this sets up of course, but, which was the intended test? I'm also not 100% sure, but @mundaym had a reasonable argument at #29259 (comment)
ah I see, LGTM
Test build #127095 has finished for PR 29259 at commit
thanks, merging to master!
Closes apache#29259 from mundaym/fix-endian. Authored-by: Michael Munday <mike.munday@ibm.com>. Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
PR #26548 means that RecordBinaryComparator now uses big endian
byte order for long comparisons. However, this means that some of
the constants in the regression tests no longer map to the same
values in the comparison that they used to.
For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
therefore missing the original purpose of the test.
To fix this the constants are now explicitly written out in big
endian byte order to match the byte order used in the comparison.
This also fixes the tests on big endian machines (which would
otherwise get a different comparison result to the little-endian
machines).
Why are the changes needed?
The regression tests no longer serve their initial purposes and also fail on big-endian systems.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tests run on big-endian system (s390x).