
[SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long #26548

Closed

Conversation

@WangGuangxin (Contributor) commented Nov 15, 2019

What changes were proposed in this pull request?

This PR tries to make sure that the results of comparing 8 bytes at a time and comparing byte by byte in RecordBinaryComparator are consistent, by reversing the long's bytes when the platform is little-endian and using Long.compareUnsigned.

Why are the changes needed?

If the architecture supports unaligned access, or the offsets are 8-byte aligned, RecordBinaryComparator compares 8 bytes at a time by reading them as a long. The relevant code is:

    if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) % 8 == 0))) {
      while (i <= leftLen - 8) {
        final long v1 = Platform.getLong(leftObj, leftOff + i);
        final long v2 = Platform.getLong(rightObj, rightOff + i);
        if (v1 != v2) {
          return v1 > v2 ? 1 : -1;
        }
        i += 8;
      }
    }

Otherwise, it compares byte by byte. The relevant code is:

    while (i < leftLen) {
      final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
      final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
      if (v1 != v2) {
        return v1 > v2 ? 1 : -1;
      }
      i += 1;
    }

However, on a little-endian machine, comparing as a long and comparing byte by byte may give different results.

For two identical records, the offsets may vary between the first run and a rerun, so the same pair may go through the long-comparison path in one run and the byte-by-byte path in the other, yielding different results.
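
To illustrate the discrepancy, here is a minimal standalone sketch (not Spark code; the class name is invented for illustration) showing the two strategies disagreeing on a little-endian JVM:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class EndianDemo {
      public static void main(String[] args) {
        byte[] a = {1, 0, 0, 0, 0, 0, 0, 0};
        byte[] b = {0, 0, 0, 0, 0, 0, 0, 1};
        // Read in native (little-endian) order: a = 1, b = 2^56, so a < b as longs.
        long la = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getLong();
        long lb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong();
        System.out.println(Long.compare(la, lb));                       // -1
        // Compare byte by byte from index 0: a[0] = 1 > b[0] = 0, so a > b.
        System.out.println(Integer.compare(a[0] & 0xff, b[0] & 0xff));  // 1
      }
    }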

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add new test cases in RecordBinaryComparatorSuite

@WangGuangxin WangGuangxin changed the title [SPARK-][SQL] RecordBinaryComparator should check endianness when compared by long [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long Nov 15, 2019
@WangGuangxin (Contributor, author):

@jiangxb1987 @cloud-fan Could you please help review this?

@dongjoon-hyun (Member):

ok to test

@@ -273,7 +273,7 @@ public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws
insertRow(row1);
insertRow(row2);

assert(compare(0, 1) < 0);
assert(compare(0, 1) > 0);
Member:

So, do you mean this is wrong before this PR?

Member:

The change definitely changes the ordering, as bytes are compared in a different order.

Contributor (author):

> So, do you mean this is wrong before this PR?

RecordBinaryComparator is used as a local sort comparator in RoundRobinPartitioning to make sure the order of the records is the same after a rerun; the specific relative order is not important.

@dongjoon-hyun (Member):

Since this is reported as a correctness issue, ping @gatorsmile , too.

if (v1 != v2) {
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Member:

Can't this check happen once and be made a constant, rather than checked on each call?

This is also, unfortunately, the common case: little-endian architectures. Hm, I'm trying to think of an alternative that would introduce overhead only for big-endian, but I'm not sure it's possible. That said, reverseBytes ought to be very fast, and I guess this only happens once before the loop terminates.

Member:

To clarify: the issue isn't endianness alone, but aligned access? Varying either one could cause the result to change.

@srowen (Member) commented Nov 15, 2019

Is https://issues.apache.org/jira/browse/SPARK-23207 actually the same issue? That was marked fixed.

(Edited below to fix my example)

So, hm, do we not also have a subtle problem with the signed nature of bytes and longs? Putting aside endianness issues, if you compare two longs starting with bytes like:

1000 0000 ...
0000 0000 ...

The first is less than the second, because it's clearly negative while the other isn't.
But comparing byte by byte, we'd consider the first one to be greater than the second.
Would this also be an issue between aligned and unaligned machines?

That said, I also don't know how common it is to mix this hardware? (I don't even know which machines do and don't support unaligned access.) Is it common?
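
A minimal sketch of that signedness gap (illustrative values, not from the PR):

    // The same leading byte 0x80 sorts differently as a signed long vs. an unsigned byte.
    long v1 = 0x8000000000000000L;  // first bits: 1000 0000 ...
    long v2 = 0x0000000000000000L;
    System.out.println(Long.compare(v1, v2));          // -1: the negative long is smaller
    System.out.println(Integer.compare(0x80, 0x00));   //  1: as an unsigned byte, 128 > 0
    System.out.println(Long.compareUnsigned(v1, v2));  //  1: agrees with the byte-wise view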

if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
v1 = Long.reverseBytes(v1);
v2 = Long.reverseBytes(v2);
}
return v1 > v2 ? 1 : -1;
Member:

Pulling together my comment with #25491 (comment) - maybe this also has to be Long.compareUnsigned?

@WangGuangxin (Contributor, author) Nov 15, 2019:

Yes, thanks for pointing this out. This is the same issue as #25491 (comment), and it can be reproduced in our cluster.

@SparkQA commented Nov 15, 2019

Test build #113892 has finished for PR 26548 at commit 37ebd38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WangGuangxin (Contributor, author), replying to srowen's comment above:

https://issues.apache.org/jira/browse/SPARK-23207 is the initial commit to address the shuffle + repartition case, and RecordBinaryComparator was introduced in that commit as a local sort comparator to make sure the partition output is consistent across reruns caused by stage failure.
What this PR does is make sure RecordBinaryComparator always produces the same result, even on different architectures.

To clarify, suppose there is a map task whose input has two records A and B, both 8 bytes long:

A: 00000001 00000000 * 6 00000000
B: 00000000 00000000 * 6 00000001

In the first run, suppose it runs on a little-endian machine where Platform.unaligned() is true; then A and B are compared by reading each as a long, so A = 1 and B = 72057594037927936L, and A < B.

But in a rerun triggered by a shuffle fetch failure, suppose it runs on a machine where Platform.unaligned() is false; then A and B are compared byte by byte from low address to high address, so A > B.

This inconsistency across reruns can still produce incorrect data.

@srowen (Member) commented Nov 15, 2019

Yeah I get the endianness issue; it is triggered even on little-endian architectures that vary in ability to do unaligned access. Just clarifying that is the case.

And yeah I think there's a second closely related issue here that's worth resolving.

BTW, is the target architecture here really ARM? Like a big Intel-based driver with ARM-based workers?

@cloud-fan (Contributor):

Is it officially supported to deploy Spark on mixed architectures? Spark reads/writes unsafe rows with the unsafe API (native byte order), and it seems many things could break on mixed architectures.

@srowen (Member) commented Nov 16, 2019

Agree that mixing big- and little-endian probably breaks more than this. Mixing aligned/unaligned seems a bit more "valid" (again think mixing in some cheaper ARM instances, the "a" instances on AWS), though I still don't know whether it should be supported. It'd be nice I guess, but looks like it'd introduce a small perf hit.

One argument for this kind of change is simply that it makes the code do what it appears to intend to do now, but doesn't, which makes me a little nervous.

However, if I read this right, this could come up even on identical hardware that doesn't support unaligned access: if the two offsets happen to be aligned, it uses the 'longs' code path for part of the comparison; if they aren't, it uses the byte code path for all of it. This suggests the two paths can give different answers.

Even if unaligned access is supported everywhere, it seems like the same data could give different results depending on where the data are in memory, on the offsets mod 8. The code path again varies depending on whether they match or do not at the start.

If so, that's a bigger problem; but then why would we not have observed it in testing until now? Is it that the data always happens to be aligned similarly, like always 0 or 4 mod 8, for internal reasons? Or does this rarely cause a problem in practice because consistent results only matter in rare cases? BTW @tcondie, did your issue come up on mixed hardware?

@tcondie (Contributor) commented Nov 16, 2019

@srowen the issue did not appear on mixed hardware.

@WangGuangxin (Contributor, author) commented Nov 16, 2019

Maybe some wording in my description was misleading. In fact, this has nothing to do with mixing big- and little-endian machines.

For two identical records, the comparison results may differ after a rerun. It is simply a logic bug in the code, and it can happen in a cluster where all servers are x86-64 and little-endian.

Take real data from my test in a cluster with ALL x86-64, little-endian servers as an example.

In the first run, record A's offset is 25617612 and its length is 40; record B's offset is 53434324 and its length is 40. Since 25617612 % 8 == 4 and 53434324 % 8 == 4, by the logic in RecordBinaryComparator it compares the first 4 bytes byte-wise, then the following 32 bytes as 4 longs, and the last 4 bytes byte-wise again.

In the second run, record A's offset is 19257280 and record B's offset is 16892896, both with length 40. Since 19257280 % 8 == 0 and 16892896 % 8 == 0, it compares all 40 bytes as 5 longs.

Here comes the difference: the last 4 bytes are compared byte-wise in the first run but as part of a long in the second run, and comparing as a long is not equivalent to comparing byte-wise on a little-endian machine (explained in #26548 (comment)). The offsets of records A and B differ between the two runs, which makes RecordBinaryComparator take different comparison code paths.
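
Concretely, following the three loops in the comparator:

    offset % 8 == 4:  bytes [0, 4) byte-wise, bytes [4, 36) as 4 longs, bytes [36, 40) byte-wise
    offset % 8 == 0:  bytes [0, 40) as 5 longs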

@srowen @cloud-fan

@srowen (Member) commented Nov 16, 2019

Yes, it isn't really related to endian-ness, although to fix this issue you have to know whether the bytes were read as little-endian or not.

Here's a simple proof of concept:

    long arrayOffset = 12;
    
    long[] arr1 = new long[2];
    // 12 + 4 = 16, aligned
    Platform.putLong(arr1, arrayOffset + 4, 0xa000000000000000L);
    long[] arr2 = new long[2];
    Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000000L);
    System.out.println(binaryComparator.compare(arr1, arrayOffset + 4, 8, arr2, arrayOffset + 4, 8));

    long[] arr3 = new long[2];
    Platform.putLong(arr3, arrayOffset, 0xa000000000000000L);
    long[] arr4 = new long[2];
    Platform.putLong(arr4, arrayOffset, 0x0000000000000000L);
    System.out.println(binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8));

(Edited the example to correctly show the problem with signed vs unsigned long comparison)

This prints -1 and 1, but it's the same 8 bytes being compared with each other, even at the same alignment on both sides. In the first case, it's aligned (12 + 4 = 16), so it starts by comparing longs; the first value is negative when read as a signed long, so it compares before the second, 0. When not aligned, in the second case, it starts with byte-by-byte comparison, and the 0xa0 is read as an unsigned byte greater than 0, so the first value is larger. (Note that the endianness issue cancels out here, so to speak.)

This example is actually illustrating the other issue mentioned here, about unsigned long comparison. But I think this example shows the issue you reported:

    long arrayOffset = 12;

    long[] arr1 = new long[2];
    Platform.putLong(arr1, arrayOffset, 0x0100000000000000L);
    long[] arr2 = new long[2];
    Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000001L);
    System.out.println(binaryComparator.compare(arr1, arrayOffset, 8, arr2, arrayOffset + 4, 8));

    long[] arr3 = new long[2];
    Platform.putLong(arr3, arrayOffset, 0x0100000000000000L);
    long[] arr4 = new long[2];
    Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
    System.out.println(binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8));

Prints 1 and -1.
Here, the first case again starts with long comparisons, since the architecture supports unaligned access. Both values are positive, but the long in the first array is larger. In the second case, the values are not aligned on an 8-byte boundary, so it starts with byte comparison. But the first byte it looks at is the least-significant byte of the longs, so it compares 0 to 1 and finds the second larger.

And yes this is just running locally on one machine even, Intel 64-bit (which supports unaligned access).

The second example is fixed by a change like the one in this PR.
The first is fixed by also comparing the longs with Long.compareUnsigned. The two paths agree in both examples if both changes are made.
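
A hedged sketch of what the combined fix looks like (names are illustrative; the PR applies this inside the comparator's long loop):

    import java.nio.ByteOrder;

    final class WordCompareSketch {
      private static final boolean LITTLE_ENDIAN =
          ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);

      // Compare two 8-byte words so the result matches an unsigned,
      // lowest-address-byte-first comparison.
      static int compareWords(long v1, long v2) {
        if (v1 == v2) return 0;
        if (LITTLE_ENDIAN) {
          v1 = Long.reverseBytes(v1);  // make the byte at the lowest address most significant
          v2 = Long.reverseBytes(v2);
        }
        return Long.compareUnsigned(v1, v2) > 0 ? 1 : -1;
      }
    }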

@srowen (Member) commented Nov 16, 2019

BTW, if we do make the compareUnsigned change, we could optimize this bit of code while making it slightly more consistent. The current code:

        final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
        final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
        if (v1 != v2) {
          return v1 > v2 ? 1 : -1;
        }

could become:

        final byte v1 = Platform.getByte(leftObj, leftOff + i);
        final byte v2 = Platform.getByte(rightObj, rightOff + i);
        if (v1 != v2) {
          return (v1 & 0xff) > (v2 & 0xff) ? 1 : -1;
        }

@tcondie (Contributor) commented Nov 16, 2019

@srowen, are you in favor of using unsigned long comparison? If so, I can verify the fix with the internal customer that is still hitting the inconsistent round-robin repartition issue. I'm not 100% sure this is the root cause; I simply looked for areas of non-deterministic behavior, and the byte/long comparison seemed a likely source.

@srowen (Member) commented Nov 16, 2019

The first example is fixed with compareUnsigned and the second with the change in this PR, independently. They should work together too. If you're testing, it's probably worth trying both changes.

@WangGuangxin (Contributor, author), replying to srowen's proof-of-concept comment above:

Nice explanation. The second example is just what I mean.

@cloud-fan (Contributor):

Nice explanation @srowen! Shall we fix both issues in one PR?

@WangGuangxin (Contributor, author):

Yes, I'll update the PR today.

@SparkQA commented Nov 18, 2019

Test build #114021 has finished for PR 26548 at commit 3cb480d.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -22,6 +22,12 @@

public final class RecordBinaryComparator extends RecordComparator {

boolean isLittlenEndian;

public RecordBinaryComparator(boolean isLittlenEndian) {
Member:

No need for this; it's a fixed property of the entire JVM. Just:

    private static final boolean LITTLE_ENDIAN = ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);

Contributor (author):

updated

@@ -108,7 +108,7 @@ private int compare(int index1, int index2) {
baseOffset2, recordLength2);
}

private final RecordComparator binaryComparator = new RecordBinaryComparator();
private final RecordComparator binaryComparator = new RecordBinaryComparator(true);
Member:

... per the comment above, this wouldn't work on big-endian systems.

@@ -321,4 +321,69 @@ public void testBinaryComparatorWhenOnlyTheLastColumnDiffers() throws Exception

assert(compare(0, 1) < 0);
}

@Test
public void testBinaryComparatorGiveSameResultWhenComparedByteByByteAndComparedByLong()
Member:

Heh, you might chop this name down a bit: testCompareLongsAsLittleEndian() or something.

throws Exception {
int numFields = 1;

UnsafeRow row1 = new UnsafeRow(numFields);
Member:

Do you need UnsafeRow in this test? The test case I posted seems fine by itself; it isn't specific to UnsafeRow. Or at least, I'd make sure this test still triggers the problem after the change.

Contributor (author):

Test case updated.

}

@Test
public void testBinaryComparatorShouldComparedWithUnsignedLong() throws Exception {
Member:

testCompareLongAsUnsigned? We already know this is about RecordBinaryComparator.

// so the RecordBinaryComparator will compare row1 and row2 with two long comparison directly,
// while the comparison between row1 and row3 is started with 4 bytes byte-by-byte comparison,
// followed by a long comparison, and lastly 4 bytes byte-by-byte comparison.
assert(compare(0, 1) < 0);
Member:

Just assert the comparison is the same, here and below.

@SparkQA commented Nov 18, 2019

Test build #114025 has finished for PR 26548 at commit fdbc871.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class RecordBinaryComparator extends RecordComparator

long[] arr2 = new long[2];
Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000000L);
// both leftBaseOffset and rightBaseOffset are aligned, so it will start by comparing longs
int result1 = binaryComparator.compare(arr1, arrayOffset + 4, 8, arr2, arrayOffset + 4, 8);
Member:

I think several of these lines are too long (> 100 chars).

Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
int result2 = binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8);

assert(result1 == result2);
Member:

BTW, I don't think we have to fix it here necessarily, but this whole suite is using JVM asserts, not JUnit asserts. That's still reasonable since assertions are enabled in tests, but it isn't best practice: error messages are less informative, and if someone doesn't turn on asserts, nothing is actually tested. I don't mind fixing this separately; there are just a few Java suites with this problem.
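
For concreteness (using the result1/result2 locals from the test above):

    // JVM assert: silently becomes a no-op unless the JVM runs with -ea
    assert result1 == result2;
    // JUnit assert: always executes, and reports both values on failure
    org.junit.Assert.assertEquals(result1, result2);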

Member:

I'm going to fix this in #26581; whichever is merged first, the other can follow suit. It doesn't block this.

Contributor (author):

I've changed the two newly added test cases to use JUnit asserts.

@SparkQA commented Nov 18, 2019

Test build #114022 has finished for PR 26548 at commit a07cfc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 18, 2019

Test build #114026 has finished for PR 26548 at commit 5cc04fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 19, 2019

Test build #114046 has finished for PR 26548 at commit 753883d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in ffc9753 Nov 19, 2019
cloud-fan pushed a commit that referenced this pull request Nov 19, 2019
[SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long

Closes #26548 from WangGuangxin/binary_comparator.

Authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ffc9753)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan (Contributor):

thanks, merging to master/2.4!

@srowen (Member) left a comment:

Yep looks good, I'll fix the other asserts separately.

@tcondie (Contributor) commented Nov 20, 2019

Would it be possible to merge this into 2.3? I believe the fix could resolve the non-deterministic round-robin shuffle issue, for which an earlier fix was merged into 2.3 (SPARK-28699).

@srowen (Member) commented Nov 20, 2019

Possible, but there are no more 2.3.x releases coming, so it wouldn't end up in a release.

@tcondie (Contributor) commented Nov 20, 2019

I understand. Please let me know if it can be done in an existing release. We have a customer tied to 2.3 hitting the shuffle issue.

@srowen (Member) commented Nov 20, 2019

This will be in 3.0 and 2.4.5. Vendors typically can cherry-pick commits like this if maintaining an otherwise EOL branch. It should be pretty clean.

@tcondie (Contributor) commented Nov 20, 2019

Understood. Thank you very much Sean!

long[] arr4 = new long[2];
Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
// both left and right offset is not aligned, it will start with byte-by-byte comparison
int result2 = binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8);
Contributor:

Do we need to consider the uaoSize here?

Member:

Hm, good question. I don't think it comes up here, as it's just messing with a long[] that was allocated normally, rather than data written into an arbitrary location. But I may not be fully aware of the issue you have in mind.

@rednaxelafx (Contributor) left a comment:

There are some issues with the test cases in this PR. @jiangxb1987 will send a follow-up PR to fix them.


@Test
public void testCompareLongsAsLittleEndian() {
long arrayOffset = 12;
Contributor:

The test cases in this PR look problematic: long arrayOffset = 12; is hard-coded with an implicit assumption about the underlying JVM's object layout in memory.
The object layout isn't guaranteed to be the same among different JVMs, and in fact can differ even with the same JVM on the same machine, just with different VM options.

It's preferable to use Platform.LONG_ARRAY_OFFSET as your mental model of where "offset 0" is. You should never write anything into the offset range [0, Platform.LONG_ARRAY_OFFSET) because there's no guarantee what's there.
e.g. on the HotSpot JVM, that's where the object header lives.

For example, on the HotSpot JVM, the object header size for long[] is:

  • 32-bit: 16 bytes = 4 mark word + 4 klassptr + 4 array length + 4 padding
  • 64-bit with compressed class pointers: 16 bytes = 8 mark word + 4 klassptr + 4 array length
  • 64-bit without compressed class pointers: 24 bytes = 8 mark word + 8 klassptr + 4 array length + 4 padding

So 12 is always a bad offset to use in this test case on HotSpot, regardless of which mode above it's in -- writing to this offset will corrupt the header of the long[], and if you happen to hit a GC right after the corruption, you'll find the VM doing weird stuff (including crashing with a SIGSEGV).
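
A minimal sketch of the safer pattern (assuming org.apache.spark.unsafe.Platform is on the classpath):

    import org.apache.spark.unsafe.Platform;

    long[] arr = new long[2];
    // Start of the array's element data; everything before it is the object header.
    long base = Platform.LONG_ARRAY_OFFSET;
    Platform.putLong(arr, base, 0x0100000000000000L);      // element 0
    Platform.putLong(arr, base + 8, 0x0000000000000001L);  // element 1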

cloud-fan pushed a commit that referenced this pull request Dec 19, 2019
…mparatorSuite`

### What changes were proposed in this pull request?

As mentioned in #26548 (review), some test cases in `RecordBinaryComparatorSuite` use a fixed arrayOffset when writing to long arrays, which could lead to weird behavior, including crashing with a SIGSEGV.

This PR fixes the problem by computing the arrayOffset based on `Platform.LONG_ARRAY_OFFSET`.

### How was this patch tested?
Tested locally. Previously, when we added System.gc() between writing into the long array and comparing with RecordBinaryComparator, there was a chance of hitting a JVM crash with SIGSEGV like:
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007efc66970bcb, pid=11831, tid=0x00007efc0f9f9700
#
# JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x5fbbcb]
#
# Core dump written. Default location: /home/jenkins/workspace/sql/core/core or core.11831
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/sql/core/hs_err_pid11831.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
```
After the fix, those test cases no longer crash the JVM.

Closes #26939 from jiangxb1987/rbc.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
srowen pushed a commit that referenced this pull request Dec 21, 2019
…mparatorSuite`

(same commit message as above)

Closes #26939 from jiangxb1987/rbc.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
mundaym added a commit to linux-on-ibm-z/spark that referenced this pull request Jul 27, 2020
…RecordBinaryComparatorSuite

PR apache#26548 means that RecordBinaryComparator now uses big endian
byte order for long comparisons. However, this means that some of
the constants in the regression tests no longer map to the same
values in the comparison that they used to.

For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow,
thereby missing the original purpose of the test.

To fix this the constants are now explicitly written out in big
endian byte order to match the byte order used in the comparison.
This also fixes the tests on big endian machines (which would
otherwise get a different comparison result to the little-endian
machines).

RecordBinaryComparatorSuite tests now pass on a big-endian machine
(s390x).
mundaym added a commit to linux-on-ibm-z/spark that referenced this pull request Aug 5, 2020
…an platforms.

Comparisons performed by RecordBinaryComparator are executed in
lexicographic order (byte by byte starting from the byte at index 0).
This means that the underlying endianness of the platform can affect
the result of a comparison between multi-byte values such as longs.
This difference means that two tests fail on big-endian platforms.

Also, the two tests compare 'special' long values to test edge cases
that triggered old bugs in the fast path of the comparator. However
since PR apache#26548 these 'special' values are byte-reversed before being
compared on little-endian platforms, which means that the edge cases
these 'special' values were designed to trigger are no longer tested.

This PR fixes both these issues by byte reversing the values on
little-endian systems.

RecordBinaryComparatorSuite tests now pass on a big-endian machine
(s390x).
cloud-fan pushed a commit that referenced this pull request Aug 5, 2020
…ryComparatorSuite

### What changes were proposed in this pull request?
(same message as the commit referenced above)

### Why are the changes needed?
The regression tests no longer serve their initial purposes and also fail on big-endian systems.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tests run on big-endian system (s390x).

Closes #29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…ryComparatorSuite

(same commit message as above)

Closes apache#29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>