[SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long #26548
Conversation
@jiangxb1987 @cloud-fan Could you please help review this?
ok to test
```diff
@@ -273,7 +273,7 @@ public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws
     insertRow(row1);
     insertRow(row2);

-    assert(compare(0, 1) < 0);
+    assert(compare(0, 1) > 0);
```
So, do you mean this is wrong before this PR?
The change definitely changes the ordering, as bytes are compared in a different order.
> So, do you mean this is wrong before this PR?

`RecordBinaryComparator` is used as a local sort comparator in `RoundRobinPartitioning`, to make sure the order of the records is the same after a rerun. The relative order itself is not important.
Since this is reported as a correctness issue, ping @gatorsmile, too.
```java
if (v1 != v2) {
  if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
```
Can't this check happen once and be made a constant, rather than being evaluated on each call?
Unfortunately, little-endian architectures are also the common case. Hm, I'm trying to think of an alternative that would introduce overhead only for big-endian, but I'm not sure it's possible. That said, `reverseBytes` ought to be very fast, and I guess this only happens once before the comparison returns.
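For reference, a tiny standalone check (plain Java; the class name is just illustrative) of what `Long.reverseBytes` does. It is a HotSpot intrinsic, typically JIT-compiled to a single byte-swap instruction on x86, so the per-comparison cost should indeed be small:

```java
public class ReverseBytesDemo {
    public static void main(String[] args) {
        long v = 0x0000000000000001L;
        // Swaps the byte order of the 64-bit value: the least significant
        // byte becomes the most significant one, and vice versa.
        long r = Long.reverseBytes(v);
        System.out.println(Long.toHexString(r));        // 100000000000000
        System.out.println(Long.reverseBytes(r) == v);  // true: it is an involution
    }
}
```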
To clarify: the issue isn't endianness alone, but aligned access? Varying either one could cause the result to change.
```diff
@@ -273,7 +273,7 @@ public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws
     insertRow(row1);
     insertRow(row2);

-    assert(compare(0, 1) < 0);
+    assert(compare(0, 1) > 0);
```
The change definitely changes the ordering, as bytes are compared in a different order.
Is https://issues.apache.org/jira/browse/SPARK-23207 actually the same issue? That was marked fixed. (Edited below to fix my example.) So, hm, do we not also have a subtle problem with the signed nature of bytes and longs? Putting aside endianness issues, if you compare two longs where only the first starts with the bits `1000 0000 ...`, the first is less than the second as a signed long, because it's clearly negative while the other isn't. That said, I also don't know how common it is to mix this hardware. (I don't even know which machines do and don't support unaligned access.) Is it common?
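The signed-comparison concern can be seen directly by contrasting `Long.compare` with `Long.compareUnsigned` (the values here are just illustrative):

```java
public class SignedVsUnsigned {
    public static void main(String[] args) {
        long a = 0x8000000000000000L; // top bit set: negative as a signed long
        long b = 0x0000000000000001L;
        // Signed view: a is Long.MIN_VALUE, so it sorts before b.
        System.out.println(Long.compare(a, b));          // -1
        // Unsigned view (what a byte-wise comparison of big-endian bytes
        // would give): a is 2^63, far larger than 1.
        System.out.println(Long.compareUnsigned(a, b));  // 1
    }
}
```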
```java
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
  v1 = Long.reverseBytes(v1);
  v2 = Long.reverseBytes(v2);
}
return v1 > v2 ? 1 : -1;
```
Pulling together my comment with #25491 (comment) - maybe this also has to be `Long.compareUnsigned`?
Yes, thanks for pointing this out. This is the same issue as #25491 (comment), and it can be reproduced in our cluster.
Test build #113892 has finished for PR 26548 at commit
https://issues.apache.org/jira/browse/SPARK-23207 is the initial commit to address this.

To clarify, suppose there is a map task whose input has two records A and B, both 8 bytes long. In the first run, suppose it runs on a little-endian machine that supports unaligned access, but in the following rerun (due to a shuffle fetch failure), suppose it runs on a machine that does not. The inconsistent ordering after the rerun can still cause incorrect data.
Yeah, I get the endianness issue; it is triggered even on little-endian architectures that vary in their ability to do unaligned access. Just clarifying that is the case. And yeah, I think there's a second, closely related issue here that's worth resolving. BTW, is the target architecture here really ARM? Like, you have a big Intel-based driver and ARM-based workers?
Is it officially supported to deploy Spark on mixed architectures? Spark reads/writes unsafe rows with the unsafe API (native byte order), and it seems many things can break on mixed architectures.
Agree that mixing big- and little-endian probably breaks more than this. Mixing aligned/unaligned seems a bit more "valid" (again, think mixing in some cheaper ARM instances, the "a" instances on AWS), though I still don't know whether it should be supported. It'd be nice, I guess, but it looks like it'd introduce a small perf hit. One argument for this kind of change is simply that it makes the code do what it appears to intend to do now, but doesn't, which makes me a little nervous.

However, if I read this right, this could come up even on the same hardware, if it doesn't support unaligned access: if the two offsets can be aligned, it'll use the 'longs' code path for some of the comparison; if not, it uses the byte code path for all of it. This suggests those can give different answers.

Even if unaligned access is supported everywhere, it seems like the same data could give different results depending on where the data are in memory, i.e., on the offsets mod 8. The code path again varies depending on whether they match or not at the start. If so, that's a bigger problem; but then why would we not observe it until now in testing? Is it that the data always happens to be aligned similarly, like always 0 or 4 mod 8, for internal reasons? Or is it just that this rarely actually causes a problem in practice, because consistent results would only matter in rare cases?

BTW @tcondie did your issue come up on mixed hardware?
@srowen the issue did not appear on mixed hardware. |
Maybe some words in my description misled you. In fact, it has nothing to do with mixing big- and little-endian. For two identical records, the comparison results may be different after a rerun; it is just a bug in the code logic. This can happen in a cluster with all x86-64, little-endian servers.

Take real test data from my test in a cluster with ALL x86-64, little-endian servers as an example. In the first run, recordA is at one offset; in the second run, it is at a different offset. Here comes the difference: for the last 4 bytes, in the first run they are compared byte-wise, while in the second run they are compared as part of a long. Comparing bytes as a long and comparing them byte-wise are not equivalent on a little-endian machine (explained in #26548 (comment)). The offsets of records A and B differ between the two runs, which causes the comparison code path in `RecordBinaryComparator` to differ.
Yes, it isn't really related to endian-ness, although to fix this issue you have to know whether the bytes were read as little-endian or not. Here's a simple proof of concept:
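(The original snippet did not survive extraction from the page. What follows is a standalone reconstruction of the two comparison code paths, with `ByteBuffer` standing in for Spark's `Platform` and with values chosen to reproduce the described -1 / 1 split; the class and method names are hypothetical.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PoCSigned {
    // Little-endian memory image of a long, as Platform.putLong would
    // produce on an x86 machine.
    static byte[] bytesOf(long v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }

    // Fast path: read the word back as a native-order long, compare as signed longs.
    static int compareAsLong(byte[] a, byte[] b) {
        long v1 = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getLong();
        long v2 = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return v1 == v2 ? 0 : (v1 > v2 ? 1 : -1);
    }

    // Slow path: compare one unsigned byte at a time, from the lowest address.
    static int compareByteByByte(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int v1 = a[i] & 0xff;
            int v2 = b[i] & 0xff;
            if (v1 != v2) return v1 > v2 ? 1 : -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] x = bytesOf(0xa000000000000000L);
        byte[] y = bytesOf(0x0000000000000000L);
        System.out.println(compareAsLong(x, y));      // -1: 0xa0... is negative as a signed long
        System.out.println(compareByteByByte(x, y));  //  1: the 0xa0 byte wins as unsigned
    }
}
```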
(Edited the example to correctly show the problem with signed vs unsigned long comparison.) This prints -1 and 1, but it's the same 8 bytes being compared with each other, even at the same alignment. In the first case, the data is aligned (12 + 4 = 16), so the comparison starts by comparing longs; the first value is negative when read as a signed long, so it compares before the second, 0. However, when not aligned in the second case, it starts with byte-by-byte comparison, and the first 0xa0 is read as an unsigned byte greater than 0, so it's larger. (Note that the endianness issue cancels out here, so to speak.) This example actually illustrates the other issue mentioned here, about unsigned long comparison. But I think this example shows the issue you reported:
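(This second snippet was also lost from the page; again a reconstruction in the same standalone style, with illustrative values chosen so that the long path and the byte path disagree purely because of byte order. Names and values are assumptions.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PoCEndian {
    // Little-endian memory image, as on x86.
    static byte[] bytesOf(long v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }

    // The aligned fast path before this PR: plain signed long comparison.
    static int compareAsLong(long v1, long v2) {
        return v1 == v2 ? 0 : (v1 > v2 ? 1 : -1);
    }

    // The unaligned slow path: unsigned bytes from the lowest address.
    static int compareByteByByte(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int v1 = a[i] & 0xff;
            int v2 = b[i] & 0xff;
            if (v1 != v2) return v1 > v2 ? 1 : -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        long x = 0x0100000000000000L;
        long y = 0x0000000000000001L;
        System.out.println(compareAsLong(x, y));                        //  1
        System.out.println(compareByteByByte(bytesOf(x), bytesOf(y)));  // -1
        // With the fix, the fast path reverses bytes first and agrees:
        System.out.println(compareAsLong(Long.reverseBytes(x), Long.reverseBytes(y))); // -1
    }
}
```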
Prints 1 and -1. And yes, this is just running locally on one machine, an Intel 64-bit one (which supports unaligned access). The second example is fixed by a change like the one in this PR.
BTW if we do make the compareUnsigned change, we could both optimize this bit of code and make it slightly more consistent:
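(The code Sean posted here was lost from the page. The following is a hypothetical sketch of that kind of simplification, folding the byte reversal and the unsigned comparison together; the `LITTLE_ENDIAN` constant follows his earlier suggestion, and a byte-wise reference comparator is included to check that the two always agree.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class UnsignedCompareSketch {
    private static final boolean LITTLE_ENDIAN =
        ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);

    // Compare two 8-byte words so the result matches an unsigned byte-by-byte
    // comparison of the underlying memory, regardless of native byte order.
    static int compareLongs(long v1, long v2) {
        if (LITTLE_ENDIAN) {
            v1 = Long.reverseBytes(v1);
            v2 = Long.reverseBytes(v2);
        }
        return Long.compareUnsigned(v1, v2);
    }

    // Reference: the slow byte-by-byte path over the native memory image.
    static int byteWiseCompare(long v1, long v2) {
        byte[] a = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder()).putLong(v1).array();
        byte[] b = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder()).putLong(v2).array();
        for (int i = 0; i < 8; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) return x > y ? 1 : -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, Long.MIN_VALUE, 0x0100000000000000L, 0xa000000000000000L};
        boolean allAgree = true;
        for (long v1 : samples) {
            for (long v2 : samples) {
                allAgree &= Integer.signum(compareLongs(v1, v2))
                         == Integer.signum(byteWiseCompare(v1, v2));
            }
        }
        System.out.println(allAgree); // true
    }
}
```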
@srowen are you for using unsigned long comparison? If so, then I can verify the fix with the internal customer that is still hitting the inconsistent round-robin repartition issue. I'm not 100% sure this is the root cause; I simply looked for areas of non-deterministic behavior, and the byte/long comparison seemed a likely source.
The first example is fixed with compareUnsigned, and the second is fixed with the change in this PR, independently. They should work together too. If you're testing, it's probably worth trying both changes.
Nice explanation. The second example is just what I mean.
Nice explanation @srowen! Shall we fix the issues in one PR?
Yes, I'll update the PR today.
Test build #114021 has finished for PR 26548 at commit
```diff
@@ -22,6 +22,12 @@
 public final class RecordBinaryComparator extends RecordComparator {
 
+  boolean isLittlenEndian;
+
+  public RecordBinaryComparator(boolean isLittlenEndian) {
```
No need for this. It's a fixed property of the entire VM / JVM. Just:

```java
private static final boolean LITTLE_ENDIAN =
    ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);
```
updated
```diff
@@ -108,7 +108,7 @@ private int compare(int index1, int index2) {
       baseOffset2, recordLength2);
   }
 
-  private final RecordComparator binaryComparator = new RecordBinaryComparator();
+  private final RecordComparator binaryComparator = new RecordBinaryComparator(true);
```
... per the above comment, this wouldn't work on big-endian systems.
```diff
@@ -321,4 +321,69 @@ public void testBinaryComparatorWhenOnlyTheLastColumnDiffers() throws Exception
 
     assert(compare(0, 1) < 0);
   }
 
+  @Test
+  public void testBinaryComparatorGiveSameResultWhenComparedByteByByteAndComparedByLong()
```
Heh, you might chop this down a bit. `testCompareLongsAsLittleEndian()` or something.
```diff
+      throws Exception {
+    int numFields = 1;
+
+    UnsafeRow row1 = new UnsafeRow(numFields);
```
Do you need UnsafeRow in this test? the test case I posted seems fine by itself. It isn't specific to UnsafeRow, or at least, I'd make sure this test still triggers the problem after the change.
test case updated
```diff
+  }
+
+  @Test
+  public void testBinaryComparatorShouldComparedWithUnsignedLong() throws Exception {
```
`testCompareLongAsUnsigned`? We already know this is about `RecordBinaryComparator`.
```diff
+    // so the RecordBinaryComparator will compare row1 and row2 with two long comparison directly,
+    // while the comparison between row1 and row3 is started with 4 bytes byte-by-byte comparison,
+    // followed by a long comparison, and lastly 4 bytes byte-by-byte comparison.
+    assert(compare(0, 1) < 0);
```
Just assert the comparison is the same, here and below.
Test build #114025 has finished for PR 26548 at commit
```java
long[] arr2 = new long[2];
Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000000L);
// both leftBaseOffset and rightBaseOffset are aligned, so it will start by comparing longs
int result1 = binaryComparator.compare(arr1, arrayOffset + 4, 8, arr2, arrayOffset + 4, 8);
```
I think several of these lines are too long (> 100 chars)
```java
Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
int result2 = binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8);

assert(result1 == result2);
```
BTW, I don't think we have to fix it here necessarily, but this whole suite is using JVM asserts, not JUnit asserts. That's still reasonable, as assertions are enabled in tests, but it isn't best practice: the error messages are not as informative, and if someone doesn't turn on asserts, it doesn't actually test anything. I don't mind fixing this separately; there are just a few Java suites with this problem.
I'm going to fix this in #26581 ; whichever is merged first, the other can follow suit. It doesn't block this.
I've changed the two newly added test cases to use JUnit asserts.
Test build #114022 has finished for PR 26548 at commit
Test build #114026 has finished for PR 26548 at commit
Test build #114046 has finished for PR 26548 at commit
…n compared by long

### What changes were proposed in this pull request?
This PR tries to make sure the comparison results of `compared by 8 bytes at a time` and `compared byte-wise` in RecordBinaryComparator are *consistent*, by reversing the long's bytes if the platform is little-endian and by using Long.compareUnsigned.

### Why are the changes needed?
If the architecture supports unaligned access or the offset is 8-byte aligned, `RecordBinaryComparator` compares 8 bytes at a time by reading 8 bytes as a long. Related code is

```
if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) % 8 == 0))) {
  while (i <= leftLen - 8) {
    final long v1 = Platform.getLong(leftObj, leftOff + i);
    final long v2 = Platform.getLong(rightObj, rightOff + i);
    if (v1 != v2) {
      return v1 > v2 ? 1 : -1;
    }
    i += 8;
  }
}
```

Otherwise, it compares byte by byte. Related code is

```
while (i < leftLen) {
  final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
  final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
  if (v1 != v2) {
    return v1 > v2 ? 1 : -1;
  }
  i += 1;
}
```

However, on a little-endian machine, the result of *compared as a long value* and *compared byte by byte* may be different. For two identical records, their offsets may vary between the first run and a rerun, which leads to comparing them via long comparison in one run and byte-by-byte comparison in the other; the results may differ.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added new test cases in RecordBinaryComparatorSuite

Closes #26548 from WangGuangxin/binary_comparator.

Authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ffc9753)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/2.4!
Yep looks good, I'll fix the other asserts separately.
Would it be possible to merge this into 2.3? I believe the fix could resolve the non-deterministic round-robin shuffle issue, for which an earlier fix was merged into 2.3, i.e., SPARK-28699.
Possible, but there are no more 2.3.x releases coming, so it wouldn't end up in a release.
I understand. Please let me know if it can be done in an existing release. We have a customer tied to 2.3 hitting the shuffle issue.
This will be in 3.0 and 2.4.5. Vendors typically can cherry-pick commits like this if maintaining an otherwise-EOL branch. It should be pretty clean.
Understood. Thank you very much, Sean!
```java
long[] arr4 = new long[2];
Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
// both left and right offset is not aligned, it will start with byte-by-byte comparison
int result2 = binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8);
```
Do we need to consider the uaoSize here?
Hm, good question. I don't think it comes up here, as it's just messing with a `long[]` that was allocated normally, rather than data we wrote into an arbitrary location. But I may not be fully aware of the issue you have in mind.
There are some issues with the test cases in this PR. @jiangxb1987 will send a follow-up PR to fix them.
```java
@Test
public void testCompareLongsAsLittleEndian() {
  long arrayOffset = 12;
```
The test cases in this PR look problematic: the `long arrayOffset = 12;` is hard-coded with an implicit assumption about the underlying JVM's object layout in memory. The object layout isn't guaranteed to be the same among different JVMs, and in fact can be different even with the same JVM on the same machine, just with different VM options.

It's preferable to use `Platform.LONG_ARRAY_OFFSET` as your mental model of where "0-offset" is. You should never write anything into the offset range `[0, Platform.LONG_ARRAY_OFFSET)`, because there's no guarantee what's there; e.g. on the HotSpot JVM, that's where the object header lives.

For example, on the HotSpot JVM, the object header size for `long[]` is:
- 32-bit: 16 bytes = 4 mark word + 4 klass ptr + 4 array length + 4 padding
- 64-bit with compressed class pointers: 16 bytes = 8 mark word + 4 klass ptr + 4 array length
- 64-bit without compressed class pointers: 24 bytes = 8 mark word + 8 klass ptr + 4 array length + 4 padding

12 is always a bad offset to use in this test case on HotSpot, regardless of which mode above it's in: writing to this offset will corrupt the header of the `long[]`, and if you happen to go into a GC right after the corruption, you'll find the VM doing weird stuff (including crashing with a SIGSEGV).
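Spark's `Platform.LONG_ARRAY_OFFSET` ultimately comes from `Unsafe.arrayBaseOffset`. A minimal sketch (using `sun.misc.Unsafe` directly via reflection, JDK 8 style; the class name is illustrative) of querying the real base offset instead of hard-coding 12:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class LongArrayOffsetDemo {
    static final Unsafe UNSAFE;
    static {
        try {
            // sun.misc.Unsafe has no public constructor; grab the singleton.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        // First valid byte offset of element data in a long[]; everything
        // below it is the object header and must never be written.
        long base = UNSAFE.arrayBaseOffset(long[].class);
        System.out.println("long[] base offset: " + base);

        long[] arr = new long[2];
        UNSAFE.putLong(arr, base, 42L); // safely writes arr[0]
        System.out.println(arr[0]);     // 42
    }
}
```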
…mparatorSuite

### What changes were proposed in this pull request?
As mentioned in #26548 (review), some test cases in `RecordBinaryComparatorSuite` use a fixed arrayOffset when writing to long arrays, which could lead to weird stuff, including crashing with a SIGSEGV. This PR fixes the problem by computing the arrayOffset based on `Platform.LONG_ARRAY_OFFSET`.

### How was this patch tested?
Tested locally. Previously, when we tried to add `System.gc()` between writing into the long array and comparing with RecordBinaryComparator, there was a chance to hit a JVM crash with SIGSEGV like:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007efc66970bcb, pid=11831, tid=0x00007efc0f9f9700
#
# JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x5fbbcb]
#
# Core dump written. Default location: /home/jenkins/workspace/sql/core/core or core.11831
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/sql/core/hs_err_pid11831.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
```

After the fix, those test cases didn't crash the JVM anymore.

Closes #26939 from jiangxb1987/rbc.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…an platforms.

Comparisons performed by RecordBinaryComparator are executed in lexicographic order (byte by byte, starting from the byte at index 0). This means that the underlying endianness of the platform can affect the result of a comparison between multi-byte values such as longs. This difference means that two tests fail on big-endian platforms.

Also, the two tests compare 'special' long values to test edge cases that triggered old bugs in the fast path of the comparator. However, since PR apache#26548 these 'special' values are byte-reversed before being compared on little-endian platforms, which means that the edge cases these 'special' values were designed to trigger are no longer tested.

This PR fixes both these issues by byte-reversing the values on little-endian systems. RecordBinaryComparatorSuite tests now pass on a big-endian machine (s390x).
…ryComparatorSuite

### What changes were proposed in this pull request?
PR #26548 means that RecordBinaryComparator now uses big-endian byte order for long comparisons. However, this means that some of the constants in the regression tests no longer map to the same values in the comparison that they used to. For example, one of the tests does a comparison between Long.MIN_VALUE and 1 in order to trigger an overflow condition that existed in the past (i.e. Long.MIN_VALUE - 1). These constants correspond to the values 0x80..00 and 0x00..01. However, on a little-endian machine the bytes in these values are now swapped before they are compared. This means that we will now be comparing 0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow, therefore missing the original purpose of the test.

To fix this, the constants are now explicitly written out in big-endian byte order to match the byte order used in the comparison. This also fixes the tests on big-endian machines (which would otherwise get a different comparison result to the little-endian machines).

### Why are the changes needed?
The regression tests no longer serve their initial purposes and also fail on big-endian systems.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tests run on a big-endian system (s390x).

Closes #29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

This PR tries to make sure the comparison results of `compared by 8 bytes at a time` and `compared byte-wise` in RecordBinaryComparator are *consistent*, by reversing the long's bytes if the platform is little-endian and by using Long.compareUnsigned.

Why are the changes needed?

If the architecture supports unaligned access or the offset is 8-byte aligned, `RecordBinaryComparator` compares 8 bytes at a time by reading 8 bytes as a long. Otherwise, it compares byte by byte.

However, on a little-endian machine, the result of *compared as a long value* and *compared byte by byte* may be different. For two identical records, their offsets may vary between the first run and a rerun, which leads to comparing them using long comparison in one run and byte-by-byte comparison in the other; the results may differ.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new test cases in RecordBinaryComparatorSuite