
[SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long #26548

Closed

Conversation

@WangGuangxin (Contributor) commented Nov 15, 2019

What changes were proposed in this pull request?

This PR tries to make sure that the results of comparing 8 bytes at a time and comparing byte by byte in RecordBinaryComparator are consistent, by reversing the long's bytes when the platform is little-endian and using Long.compareUnsigned.

Why are the changes needed?

If the architecture supports unaligned access, or the offsets are 8-byte aligned, RecordBinaryComparator compares 8 bytes at a time by reading them as a long. The relevant code is:

    if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) % 8 == 0))) {
      while (i <= leftLen - 8) {
        final long v1 = Platform.getLong(leftObj, leftOff + i);
        final long v2 = Platform.getLong(rightObj, rightOff + i);
        if (v1 != v2) {
          return v1 > v2 ? 1 : -1;
        }
        i += 8;
      }
    }

Otherwise, it compares byte by byte. The relevant code is:

    while (i < leftLen) {
      final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
      final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
      if (v1 != v2) {
        return v1 > v2 ? 1 : -1;
      }
      i += 1;
    }

However, on a little-endian machine, comparing as a long and comparing byte by byte may give different results.

For two identical records, the offsets may vary between the first run and a rerun, so the same pair may go through the long-comparison path in one run and the byte-by-byte path in the other, yielding different results.
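
To illustrate the discrepancy, here is a minimal standalone sketch (not Spark code; the class name is invented for illustration) showing the two strategies disagreeing on a little-endian JVM:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class EndianDemo {
      public static void main(String[] args) {
        byte[] a = {1, 0, 0, 0, 0, 0, 0, 0};
        byte[] b = {0, 0, 0, 0, 0, 0, 0, 1};
        // Read in native (little-endian) order: a = 1, b = 2^56, so a < b as longs.
        long la = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getLong();
        long lb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong();
        System.out.println(Long.compare(la, lb));                       // -1
        // Compare byte by byte from index 0: a[0] = 1 > b[0] = 0, so a > b.
        System.out.println(Integer.compare(a[0] & 0xff, b[0] & 0xff));  // 1
      }
    }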

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add new test cases in RecordBinaryComparatorSuite

@WangGuangxin WangGuangxin changed the title [SPARK-][SQL] RecordBinaryComparator should check endianness when compared by long [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long Nov 15, 2019
@WangGuangxin (Contributor, author):

@jiangxb1987 @cloud-fan Could you please help review this?

@dongjoon-hyun (Member):

ok to test

@@ -273,7 +273,7 @@ public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws
insertRow(row1);
insertRow(row2);

assert(compare(0, 1) < 0);
assert(compare(0, 1) > 0);
Member:

So, do you mean this is wrong before this PR?

Member:

The change definitely changes the ordering, as bytes are compared in a different order.

Contributor (author):

> So, do you mean this is wrong before this PR?

RecordBinaryComparator is used as a local sort comparator in RoundRobinPartitioning to make sure the order of the records is the same after a rerun; the specific relative order is not important.

@dongjoon-hyun (Member):

Since this is reported as a correctness issue, ping @gatorsmile , too.

if (v1 != v2) {
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Member:

Can't this check happen once and be made a constant, rather than checked on each call?

This is also, unfortunately, the common case: little-endian architectures. Hm, I'm trying to think of an alternative that would introduce overhead only for big-endian, but I'm not sure it's possible. That said, reverseBytes ought to be very fast, and I guess this only happens once before the loop terminates.

Member:

To clarify: the issue isn't endianness alone, but aligned access? Varying either one could cause the result to change.

@srowen (Member) commented Nov 15, 2019

Is https://issues.apache.org/jira/browse/SPARK-23207 actually the same issue? That was marked fixed.

(Edited below to fix my example)

So, hm, do we not also have a subtle problem with the signed nature of bytes and longs? Putting aside endianness issues, if you compare two longs starting with bytes like:

1000 0000 ...
0000 0000 ...

The first is less than the second, because it's clearly negative while the other isn't.
But comparing byte by byte, we'd consider the first one to be greater than the second.
Would this also be an issue between aligned and unaligned machines?

That said, I also don't know how common it is to mix this hardware? (I don't even know which machines do and don't support unaligned access.) Is it common?
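
A minimal sketch of that signedness gap (illustrative values, not from the PR):

    // The same leading byte 0x80 sorts differently as a signed long vs. an unsigned byte.
    long v1 = 0x8000000000000000L;  // first bits: 1000 0000 ...
    long v2 = 0x0000000000000000L;
    System.out.println(Long.compare(v1, v2));          // -1: the negative long is smaller
    System.out.println(Integer.compare(0x80, 0x00));   //  1: as an unsigned byte, 128 > 0
    System.out.println(Long.compareUnsigned(v1, v2));  //  1: agrees with the byte-wise view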

if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
v1 = Long.reverseBytes(v1);
v2 = Long.reverseBytes(v2);
}
return v1 > v2 ? 1 : -1;
Member:

Pulling together my comment with #25491 (comment) - maybe this also has to be Long.compareUnsigned?

@WangGuangxin (Contributor, author) Nov 15, 2019:

Yes, thanks for pointing this out. This is the same issue as #25491 (comment), and it can be reproduced in our cluster.

@SparkQA commented Nov 15, 2019

Test build #113892 has finished for PR 26548 at commit 37ebd38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WangGuangxin (Contributor, author), replying to srowen's comment above:

https://issues.apache.org/jira/browse/SPARK-23207 is the initial commit to address the shuffle + repartition case, and RecordBinaryComparator was introduced in that commit as a local sort comparator to make sure the partition output is consistent across reruns caused by stage failure.
What this PR does is make sure RecordBinaryComparator always produces the same result, even on different architectures.

To clarify, suppose there is a map task whose input has two records A and B, both 8 bytes long:

A: 00000001 00000000 * 6 00000000
B: 00000000 00000000 * 6 00000001

In the first run, suppose it runs on a little-endian machine where Platform.unaligned() is true; then A and B are compared by reading each as a long, so A = 1 and B = 72057594037927936L, and A < B.

But in a rerun triggered by a shuffle fetch failure, suppose it runs on a machine where Platform.unaligned() is false; then A and B are compared byte by byte from low address to high address, so A > B.

This inconsistency across reruns can still produce incorrect data.

@srowen (Member) commented Nov 15, 2019

Yeah I get the endianness issue; it is triggered even on little-endian architectures that vary in ability to do unaligned access. Just clarifying that is the case.

And yeah I think there's a second closely related issue here that's worth resolving.

BTW, is the target architecture here really ARM? Like a big Intel-based driver with ARM-based workers?

@cloud-fan (Contributor):

Is it officially supported to deploy Spark on mixed architectures? Spark reads/writes unsafe rows with the unsafe API (native byte order), and it seems many things could break on mixed architectures.

@srowen (Member) commented Nov 16, 2019

Agree that mixing big- and little-endian probably breaks more than this. Mixing aligned/unaligned seems a bit more "valid" (again think mixing in some cheaper ARM instances, the "a" instances on AWS), though I still don't know whether it should be supported. It'd be nice I guess, but looks like it'd introduce a small perf hit.

One argument for this kind of change is simply that it makes the code do what it appears to intend to do now, but doesn't, which makes me a little nervous.

However, if I read this right, this could come up even on identical hardware that doesn't support unaligned access: if the two offsets happen to be aligned, it uses the 'longs' code path for part of the comparison; if they aren't, it uses the byte code path for all of it. This suggests the two paths can give different answers.

Even if unaligned access is supported everywhere, it seems like the same data could give different results depending on where the data are in memory, on the offsets mod 8. The code path again varies depending on whether they match or do not at the start.

If so, that's a bigger problem; but then why would we not have observed it in testing until now? Is it that the data always happens to be aligned similarly, like always 0 or 4 mod 8, for internal reasons? Or does this rarely cause a problem in practice because consistent results only matter in rare cases? BTW @tcondie, did your issue come up on mixed hardware?

@tcondie (Contributor) commented Nov 16, 2019

@srowen the issue did not appear on mixed hardware.

@WangGuangxin (Contributor, author) commented Nov 16, 2019

Maybe some wording in my description was misleading. In fact, this has nothing to do with mixing big- and little-endian machines.

For two identical records, the comparison results may differ after a rerun. It is simply a logic bug in the code, and it can happen in a cluster where all servers are x86-64 and little-endian.

Take real data from my test in a cluster with ALL x86-64, little-endian servers as an example.

In the first run, record A's offset is 25617612 and its length is 40; record B's offset is 53434324 and its length is 40. Since 25617612 % 8 == 4 and 53434324 % 8 == 4, by the logic in RecordBinaryComparator it compares the first 4 bytes byte-wise, then the following 32 bytes as 4 longs, and the last 4 bytes byte-wise again.

In the second run, record A's offset is 19257280 and record B's offset is 16892896, both with length 40. Since 19257280 % 8 == 0 and 16892896 % 8 == 0, it compares all 40 bytes as 5 longs.

Here comes the difference: the last 4 bytes are compared byte-wise in the first run but as part of a long in the second run, and comparing as a long is not equivalent to comparing byte-wise on a little-endian machine (explained in #26548 (comment)). The offsets of records A and B differ between the two runs, which makes RecordBinaryComparator take different comparison code paths.
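
Concretely, following the three loops in the comparator:

    offset % 8 == 4:  bytes [0, 4) byte-wise, bytes [4, 36) as 4 longs, bytes [36, 40) byte-wise
    offset % 8 == 0:  bytes [0, 40) as 5 longs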

@srowen @cloud-fan

@srowen (Member) commented Nov 16, 2019

Yes, it isn't really related to endian-ness, although to fix this issue you have to know whether the bytes were read as little-endian or not.

Here's a simple proof of concept:

    long arrayOffset = 12;
    
    long[] arr1 = new long[2];
    // 12 + 4 = 16, aligned
    Platform.putLong(arr1, arrayOffset + 4, 0xa000000000000000L);
    long[] arr2 = new long[2];
    Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000000L);
    System.out.println(binaryComparator.compare(arr1, arrayOffset + 4, 8, arr2, arrayOffset + 4, 8));

    long[] arr3 = new long[2];
    Platform.putLong(arr3, arrayOffset, 0xa000000000000000L);
    long[] arr4 = new long[2];
    Platform.putLong(arr4, arrayOffset, 0x0000000000000000L);
    System.out.println(binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8));

(Edited the example to correctly show the problem with signed vs unsigned long comparison)

This prints -1 and 1, but it's the same 8 bytes being compared with each other, even at the same alignment on both sides. In the first case, it's aligned (12 + 4 = 16), so it starts by comparing longs; the first value is negative when read as a signed long, so it compares before the second, 0. When not aligned, in the second case, it starts with byte-by-byte comparison, and the 0xa0 is read as an unsigned byte greater than 0, so the first value is larger. (Note that the endianness issue cancels out here, so to speak.)

This example is actually illustrating the other issue mentioned here, about unsigned long comparison. But I think this example shows the issue you reported:

    long arrayOffset = 12;

    long[] arr1 = new long[2];
    Platform.putLong(arr1, arrayOffset, 0x0100000000000000L);
    long[] arr2 = new long[2];
    Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000001L);
    System.out.println(binaryComparator.compare(arr1, arrayOffset, 8, arr2, arrayOffset + 4, 8));

    long[] arr3 = new long[2];
    Platform.putLong(arr3, arrayOffset, 0x0100000000000000L);
    long[] arr4 = new long[2];
    Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
    System.out.println(binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8));

Prints 1 and -1.
Here, the first case again starts with long comparisons, since the architecture supports unaligned access. Both values are positive, but the long in the first array is larger. In the second case, the values are not aligned on an 8-byte boundary, so it starts with byte comparison. But the first byte it looks at is the least-significant byte of the longs, so it compares 0 to 1 and finds the second larger.

And yes this is just running locally on one machine even, Intel 64-bit (which supports unaligned access).

The second example is fixed by a change like the one in this PR.
The first is fixed by also comparing the longs with Long.compareUnsigned. The two paths agree in both examples if both changes are made.
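
A hedged sketch of what the combined fix looks like (names are illustrative; the PR applies this inside the comparator's long loop):

    import java.nio.ByteOrder;

    final class WordCompareSketch {
      private static final boolean LITTLE_ENDIAN =
          ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);

      // Compare two 8-byte words so the result matches an unsigned,
      // lowest-address-byte-first comparison.
      static int compareWords(long v1, long v2) {
        if (v1 == v2) return 0;
        if (LITTLE_ENDIAN) {
          v1 = Long.reverseBytes(v1);  // make the byte at the lowest address most significant
          v2 = Long.reverseBytes(v2);
        }
        return Long.compareUnsigned(v1, v2) > 0 ? 1 : -1;
      }
    }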

@srowen (Member) commented Nov 16, 2019

BTW, if we do make the compareUnsigned change, we could optimize this bit of code while making it slightly more consistent. The current code:

        final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
        final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
        if (v1 != v2) {
          return v1 > v2 ? 1 : -1;
        }

could become:

        final byte v1 = Platform.getByte(leftObj, leftOff + i);
        final byte v2 = Platform.getByte(rightObj, rightOff + i);
        if (v1 != v2) {
          return (v1 & 0xff) > (v2 & 0xff) ? 1 : -1;
        }

@tcondie (Contributor) commented Nov 16, 2019

@srowen, are you in favor of using unsigned long comparison? If so, I can verify the fix with the internal customer that is still hitting the inconsistent round-robin repartition issue. I'm not 100% sure this is the root cause; I simply looked for areas of non-deterministic behavior, and the byte/long comparison seemed a likely source.

@srowen (Member) commented Nov 16, 2019

The first example is fixed with compareUnsigned and the second with the change in this PR, independently. They should work together too. If you're testing, it's probably worth trying both changes.

@WangGuangxin (Contributor, author), replying to srowen's proof-of-concept comment above:

Nice explanation. The second example is just what I mean.

@cloud-fan (Contributor):

Nice explanation @srowen! Shall we fix both issues in one PR?

@WangGuangxin (Contributor, author):

Yes, I'll update the PR today.

@SparkQA commented Nov 18, 2019

Test build #114021 has finished for PR 26548 at commit 3cb480d.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -22,6 +22,12 @@

public final class RecordBinaryComparator extends RecordComparator {

boolean isLittlenEndian;

public RecordBinaryComparator(boolean isLittlenEndian) {
Member:

No need for this; it's a fixed property of the entire JVM. Just:

    private static final boolean LITTLE_ENDIAN = ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);

Contributor (author):

updated

@@ -108,7 +108,7 @@ private int compare(int index1, int index2) {
baseOffset2, recordLength2);
}

private final RecordComparator binaryComparator = new RecordBinaryComparator();
private final RecordComparator binaryComparator = new RecordBinaryComparator(true);
Member:

... per the comment above, this wouldn't work on big-endian systems.

@@ -321,4 +321,69 @@ public void testBinaryComparatorWhenOnlyTheLastColumnDiffers() throws Exception

assert(compare(0, 1) < 0);
}

@Test
public void testBinaryComparatorGiveSameResultWhenComparedByteByByteAndComparedByLong()
Member:

Heh, you might chop this name down a bit: testCompareLongsAsLittleEndian() or something.

throws Exception {
int numFields = 1;

UnsafeRow row1 = new UnsafeRow(numFields);
Member:

Do you need UnsafeRow in this test? The test case I posted seems fine by itself; it isn't specific to UnsafeRow. Or at least, I'd make sure this test still triggers the problem after the change.

Contributor (author):

Test case updated.

}

@Test
public void testBinaryComparatorShouldComparedWithUnsignedLong() throws Exception {
Member:

testCompareLongAsUnsigned? We already know this is about RecordBinaryComparator.

// so the RecordBinaryComparator will compare row1 and row2 with two long comparison directly,
// while the comparison between row1 and row3 is started with 4 bytes byte-by-byte comparison,
// followed by a long comparison, and lastly 4 bytes byte-by-byte comparison.
assert(compare(0, 1) < 0);
Member:

Just assert the comparison is the same, here and below.

@SparkQA commented Nov 18, 2019

Test build #114025 has finished for PR 26548 at commit fdbc871.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class RecordBinaryComparator extends RecordComparator

long[] arr2 = new long[2];
Platform.putLong(arr2, arrayOffset + 4, 0x0000000000000000L);
// both leftBaseOffset and rightBaseOffset are aligned, so it will start by comparing longs
int result1 = binaryComparator.compare(arr1, arrayOffset + 4, 8, arr2, arrayOffset + 4, 8);
Member:

I think several of these lines are too long (> 100 chars).

Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
int result2 = binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8);

assert(result1 == result2);
Member:

BTW, I don't think we have to fix it here necessarily, but this whole suite is using JVM asserts, not JUnit asserts. That's still reasonable since assertions are enabled in tests, but it isn't best practice: error messages are less informative, and if someone doesn't turn on asserts, nothing is actually tested. I don't mind fixing this separately; there are just a few Java suites with this problem.
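
For concreteness (using the result1/result2 locals from the test above):

    // JVM assert: silently becomes a no-op unless the JVM runs with -ea
    assert result1 == result2;
    // JUnit assert: always executes, and reports both values on failure
    org.junit.Assert.assertEquals(result1, result2);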

Member:

I'm going to fix this in #26581; whichever is merged first, the other can follow suit. It doesn't block this.

Contributor (author):

I've changed the two newly added test cases to use JUnit asserts.

@SparkQA commented Nov 18, 2019

Test build #114022 has finished for PR 26548 at commit a07cfc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 18, 2019

Test build #114026 has finished for PR 26548 at commit 5cc04fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 19, 2019

Test build #114046 has finished for PR 26548 at commit 753883d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in ffc9753 Nov 19, 2019
cloud-fan pushed a commit that referenced this pull request Nov 19, 2019
[SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long

Closes #26548 from WangGuangxin/binary_comparator.

Authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ffc9753)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan (Contributor):

thanks, merging to master/2.4!

@srowen (Member) left a comment:

Yep looks good, I'll fix the other asserts separately.

@tcondie (Contributor) commented Nov 20, 2019

Would it be possible to merge this into 2.3? I believe the fix could resolve the non-deterministic round-robin shuffle issue, for which an earlier fix was merged into 2.3 (SPARK-28699).

@srowen (Member) commented Nov 20, 2019

Possible, but there are no more 2.3.x releases coming, so it wouldn't end up in a release.

@tcondie (Contributor) commented Nov 20, 2019

I understand. Please let me know if it can be done in an existing release. We have a customer tied to 2.3 hitting the shuffle issue.

@srowen (Member) commented Nov 20, 2019

This will be in 3.0 and 2.4.5. Vendors typically can cherry-pick commits like this if maintaining an otherwise EOL branch. It should be pretty clean.

@tcondie (Contributor) commented Nov 20, 2019

Understood. Thank you very much Sean!

long[] arr4 = new long[2];
Platform.putLong(arr4, arrayOffset, 0x0000000000000001L);
// both left and right offset is not aligned, it will start with byte-by-byte comparison
int result2 = binaryComparator.compare(arr3, arrayOffset, 8, arr4, arrayOffset, 8);
Contributor:

Do we need to consider the uaoSize here?

Member:

Hm, good question. I don't think it comes up here, as it's just messing with a long[] that was allocated normally, rather than data written into an arbitrary location. But I may not be fully aware of the issue you have in mind.

@rednaxelafx (Contributor) left a comment:

There are some issues with the test cases in this PR. @jiangxb1987 will send a follow-up PR to fix them.


@Test
public void testCompareLongsAsLittleEndian() {
long arrayOffset = 12;
Contributor:

The test cases in this PR look problematic: long arrayOffset = 12; is hard-coded with an implicit assumption about the underlying JVM's object layout in memory.
The object layout isn't guaranteed to be the same among different JVMs, and in fact can differ even with the same JVM on the same machine, just with different VM options.

It's preferable to use Platform.LONG_ARRAY_OFFSET as your mental model of where "offset 0" is. You should never write anything into the offset range [0, Platform.LONG_ARRAY_OFFSET) because there's no guarantee what's there.
e.g. on the HotSpot JVM, that's where the object header lives.

For example, on the HotSpot JVM, the object header size for long[] is:

  • 32-bit: 16 bytes = 4 mark word + 4 klassptr + 4 array length + 4 padding
  • 64-bit with compressed class pointers: 16 bytes = 8 mark word + 4 klassptr + 4 array length
  • 64-bit without compressed class pointers: 24 bytes = 8 mark word + 8 klassptr + 4 array length + 4 padding

So 12 is always a bad offset to use in this test case on HotSpot, regardless of which mode above it's in -- writing to this offset will corrupt the header of the long[], and if you happen to hit a GC right after the corruption, you'll find the VM doing weird stuff (including crashing with a SIGSEGV).
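
A minimal sketch of the safer pattern (assuming org.apache.spark.unsafe.Platform is on the classpath):

    import org.apache.spark.unsafe.Platform;

    long[] arr = new long[2];
    // Start of the array's element data; everything before it is the object header.
    long base = Platform.LONG_ARRAY_OFFSET;
    Platform.putLong(arr, base, 0x0100000000000000L);      // element 0
    Platform.putLong(arr, base + 8, 0x0000000000000001L);  // element 1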

cloud-fan pushed a commit that referenced this pull request Dec 19, 2019
…mparatorSuite`

### What changes were proposed in this pull request?

As mentioned in #26548 (review), some test cases in `RecordBinaryComparatorSuite` use a fixed arrayOffset when writing to long arrays, which could lead to weird behavior, including crashing with a SIGSEGV.

This PR fixes the problem by computing the arrayOffset based on `Platform.LONG_ARRAY_OFFSET`.

### How was this patch tested?
Tested locally. Previously, when we added System.gc() between writing into the long array and comparing with RecordBinaryComparator, there was a chance of hitting a JVM crash with SIGSEGV like:
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007efc66970bcb, pid=11831, tid=0x00007efc0f9f9700
#
# JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x5fbbcb]
#
# Core dump written. Default location: /home/jenkins/workspace/sql/core/core or core.11831
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/sql/core/hs_err_pid11831.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
```
After the fix, those test cases no longer crash the JVM.

Closes #26939 from jiangxb1987/rbc.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
srowen pushed a commit that referenced this pull request Dec 21, 2019
…mparatorSuite`

(same commit message as above)

Closes #26939 from jiangxb1987/rbc.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
mundaym added a commit to linux-on-ibm-z/spark that referenced this pull request Jul 27, 2020
…RecordBinaryComparatorSuite

PR apache#26548 means that RecordBinaryComparator now uses big endian
byte order for long comparisons. However, this means that some of
the constants in the regression tests no longer map to the same
values in the comparison that they used to.

For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow,
thereby missing the original purpose of the test.

To fix this the constants are now explicitly written out in big
endian byte order to match the byte order used in the comparison.
This also fixes the tests on big endian machines (which would
otherwise get a different comparison result to the little-endian
machines).

RecordBinaryComparatorSuite tests now pass on a big-endian machine
(s390x).
mundaym added a commit to linux-on-ibm-z/spark that referenced this pull request Aug 5, 2020
…an platforms.

Comparisons performed by RecordBinaryComparator are executed in
lexicographic order (byte by byte starting from the byte at index 0).
This means that the underlying endianness of the platform can affect
the result of a comparison between multi-byte values such as longs.
This difference means that two tests fail on big-endian platforms.

Also, the two tests compare 'special' long values to test edge cases
that triggered old bugs in the fast path of the comparator. However
since PR apache#26548 these 'special' values are byte-reversed before being
compared on little-endian platforms, which means that the edge cases
these 'special' values were designed to trigger are no longer tested.

This PR fixes both these issues by byte reversing the values on
little-endian systems.

RecordBinaryComparatorSuite tests now pass on a big-endian machine
(s390x).
cloud-fan pushed a commit that referenced this pull request Aug 5, 2020
…ryComparatorSuite

### What changes were proposed in this pull request?
(same message as the commit referenced above)

### Why are the changes needed?
The regression tests no longer serve their initial purposes and also fail on big-endian systems.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tests run on big-endian system (s390x).

Closes #29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…ryComparatorSuite

(same commit message as above)

Closes apache#29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>