
[SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite #29259

Closed

Conversation

mundaym
Contributor

@mundaym mundaym commented Jul 27, 2020

What changes were proposed in this pull request?

PR #26548 means that RecordBinaryComparator now uses big endian
byte order for long comparisons. However, this means that some of
the constants in the regression tests no longer map to the same
values in the comparison that they used to.

For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow,
so the original purpose of the test is defeated.

To fix this the constants are now explicitly written out in big
endian byte order to match the byte order used in the comparison.
This also fixes the tests on big endian machines (which would
otherwise get a different comparison result to the little-endian
machines).
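The effect described above can be illustrated with a small standalone sketch. The helper names `bytesOf` and `lexCompare` are illustrative only, and the comparison is a simplified stand-in for a byte-wise, left-to-right comparator, not Spark's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Serialize a long into 8 bytes using the given byte order.
    static byte[] bytesOf(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    // Unsigned lexicographic comparison, byte by byte from index 0.
    static int lexCompare(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] minBE = bytesOf(Long.MIN_VALUE, ByteOrder.BIG_ENDIAN);    // 80 00 .. 00
        byte[] oneBE = bytesOf(1L, ByteOrder.BIG_ENDIAN);                // 00 00 .. 01
        byte[] minLE = bytesOf(Long.MIN_VALUE, ByteOrder.LITTLE_ENDIAN); // 00 .. 00 80
        byte[] oneLE = bytesOf(1L, ByteOrder.LITTLE_ENDIAN);             // 01 00 .. 00

        // Big-endian layout: 0x80 > 0x00 at index 0, so MIN_VALUE sorts after 1.
        System.out.println(lexCompare(minBE, oneBE) > 0); // true
        // Little-endian layout: 0x00 < 0x01 at index 0, so the result flips.
        System.out.println(lexCompare(minLE, oneLE) < 0); // true
    }
}
```

The flipped sign on little-endian layouts is exactly why the constants no longer exercise the intended edge case.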

Why are the changes needed?

The regression tests no longer serve their initial purposes and also fail on big-endian systems.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tests run on big-endian system (s390x).

@HyukjinKwon
Member

ok to test

@cloud-fan
Contributor

I think #26548 only guarantees that comparing 8 bytes at a time and comparing byte by byte give consistent results, but it doesn't mean the comparison results are consistent between big-endian and little-endian systems.

I'm not sure if this is a problem if the Spark cluster is mixed-architecture. cc @kiszk @rednaxelafx @maropu @viirya

@srowen
Member

srowen commented Jul 27, 2020

Are you saying the test fails on big-endian machines, as-is?

@mundaym
Contributor Author

mundaym commented Jul 27, 2020

Are you saying the test fails on big-endian machines, as-is?

Yes. The long values being compared are in native byte order. Therefore the result of the byte-by-byte comparison may be different on big and little endian machines.

@mundaym
Contributor Author

mundaym commented Jul 27, 2020

I think #26548 only guarantees that comparing 8 bytes at a time and comparing byte by byte give consistent results, but it doesn't mean the comparison results are consistent between big-endian and little-endian systems.

Yes. This PR ensures that the order of the bytes provided to the comparator in the tests is the same on both big- and little-endian systems.

This is required because #26548 didn't actually change the behaviour of the comparator on big-endian systems (which already compared 8-byte chunks in big-endian byte order) but did change the expected output of the comparisons in these tests. The change therefore broke these tests on big-endian systems.

@srowen
Member

srowen commented Jul 27, 2020

The change you cite doesn't change how things are compared - just fixes the intended logic.
Endian-ness isn't really at issue here in the comparison logic, because this is byte-by-byte comparison, so I'd remove comments to that effect.

What's at issue here is what the test writes. Writing 11 to memory directly on little-endian produces different bytes than on big-endian. They result in a different comparison because the test case ends up expressing a different test.

I think the change is therefore reasonable, but would instead swap the bytes when the arch is big-endian, and leave the current assertion. Just a minor semantic difference that preserves the existing test, but, it's pretty fine either way.
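That point can be made concrete with a short sketch using the constants from one of the tests. The `nativeBytes` helper is hypothetical, with `ByteBuffer` standing in for the long being written into row memory:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WriteDemo {
    // Serialize a long the way memory would see it under the given order.
    static byte[] nativeBytes(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    public static void main(String[] args) {
        long row1 = 11L;                      // 0x000000000000000B
        long row2 = 11L + Integer.MAX_VALUE;  // 0x000000008000000A

        byte[] le1 = nativeBytes(row1, ByteOrder.LITTLE_ENDIAN); // 0B 00 00 00 00 00 00 00
        byte[] be1 = nativeBytes(row1, ByteOrder.BIG_ENDIAN);    // 00 00 00 00 00 00 00 0B

        // The first byte a left-to-right comparator sees differs per platform:
        System.out.println(le1[0]); // 11
        System.out.println(be1[0]); // 0
    }
}
```

So the same pair of literals sets up a different byte-level test case depending on the platform that runs it.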

Is that the only current big-endian test failure?

@mundaym
Contributor Author

mundaym commented Jul 27, 2020

Endian-ness isn't really at issue here in the comparison logic, because this is byte-by-byte comparison, so I'd remove comments to that effect.

Ack. I'll rework that.

I think the change is therefore reasonable, but would instead swap the bytes when the arch is big-endian, and leave the current assertion. Just a minor semantic difference that preserves the existing test, but, it's pretty fine either way.

The reason for writing the values in big-endian byte order is that it preserves the original intent of the tests, which try to trigger old bugs related to mis-sized integer comparisons. Writing the values in little-endian byte order makes the tests less useful since they no longer hit those cases.

It might be better to just write out the bytes individually and avoid the platform endianness check. I'll see if that change is cleaner.
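A sketch of the "write the bytes out individually" idea: build the 8-byte big-endian patterns by hand so they are identical on every platform. The array literals are illustrative only; this is not the code the PR ended up using:

```java
public class ExplicitBytes {
    // 80 00 00 00 00 00 00 00 : big-endian representation of Long.MIN_VALUE
    static final byte[] ROW1 = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0};
    // 00 00 00 00 00 00 00 01 : big-endian representation of 1L
    static final byte[] ROW2 = {0, 0, 0, 0, 0, 0, 0, 1};

    public static void main(String[] args) {
        // No ByteOrder check needed: the byte layout is fixed by construction,
        // so a byte-wise comparator sees the same input on every platform.
        System.out.println(ROW1.length == 8 && ROW2.length == 8); // true
    }
}
```

The trade-off, as noted above, is that the test framework's row-building API takes longs rather than raw byte arrays.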

Is that the only current big-endian test failure?

In this suite yes but there are test failures in other places. I am currently going through and fixing what I can. Eventually we'd like to try and get a big endian system into Jenkins, if possible.

@srowen
Member

srowen commented Jul 27, 2020

I see your point, the original intent was likely to write the bytes as if the big-endian representation of a long. OK.
Sure, agree, just writing bytes is probably clearer still. Or at least write the intended bytes in the comment for the reader.

Up to you about how you want to attack it - if there are other similar failures we can fix them together in one PR to group them logically. If there are several distinct issues you can tackle them separately if desired, or, together could be OK too. I suspect they're all related to this general part of the code.

@SparkQA

SparkQA commented Jul 27, 2020

Test build #126648 has finished for PR 29259 at commit f0abc7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Jul 27, 2020

I think #26548 only guarantees that comparing 8 bytes at a time and comparing byte by byte give consistent results, but it doesn't mean the comparison results are consistent between big-endian and little-endian systems.

I'm not sure if this is a problem if the Spark cluster is mixed-architecture. cc @kiszk @rednaxelafx @maropu @viirya

Based on #26548 and the comments, RecordBinaryComparator just needs to ensure a consistent order on the compared records between reruns. A different order between big-endian and little-endian systems is possible with RecordBinaryComparator, but I think it won't affect correctness generally.

@srowen
Member

srowen commented Jul 27, 2020

@viirya but this test asserts a particular ordering -- for test purposes, sure. The ordering won't be consistent, so the test fails based on arch. Because this is a special case, asserting the ordering, the endian-ness has to be fixed.

@viirya
Member

viirya commented Jul 27, 2020

OK, I can see the point. Looks like we'd better fix it.

long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;

// BinaryComparator compares longs in big-endian byte order.
Member


Would you like to add a few more words? It is a bit hard to reason about this with only this one sentence.

E.g. "... On a little-endian system, this byte comparison does not overflow. We swap the bytes to make sure the overflow happens."

Member


I don't even think that's quite true. The comparison isn't endian at all as it is byte-by-byte. But the point here is to write bytes in a certain order for the test, for sure.

Member


I think the overflow only happened when comparing longs? @mundaym

Member


I don't think overflow was the issue per se; signed vs unsigned bytes were, for sure, in the original issue. But not so much here in this test case.

long row2Data = 1;

// BinaryComparator compares longs in big-endian byte order.
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Member


For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
therefore missing the original purpose of the test.

I'm a bit confused by the PR description; I checked the original PR that added this test case and it seems like the overflow in the test title comes from the old code: https://github.com/apache/spark/pull/22101/files#diff-4ec35a60ad6a3f3f60f4d5ce91f59933L61-L63 To keep the original intention, why do you think we need to update the existing test in little-endian cases?
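For context, the kind of historical bug those constants are good at catching can be sketched as follows. This is an illustration of the failure mode (deciding a long comparison by subtraction), not the actual old Spark code:

```java
public class OverflowDemo {
    // Broken: Long.MIN_VALUE - 1L overflows to Long.MAX_VALUE, so the
    // difference has the wrong sign and the comparison is reversed.
    static int compareBySubtraction(long a, long b) {
        long diff = a - b;
        return diff < 0 ? -1 : (diff > 0 ? 1 : 0);
    }

    // Correct: compare without subtracting.
    static int compareCorrectly(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(compareBySubtraction(Long.MIN_VALUE, 1L)); // 1 (wrong)
        System.out.println(compareCorrectly(Long.MIN_VALUE, 1L));     // -1
    }
}
```

The pair (Long.MIN_VALUE, 1) only exercises this overflow if those exact byte patterns actually reach the comparison, which is the point of the fix.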

Member


If you mean, this is kind of a different issue -- yes. Should be a new JIRA. I'd summarize this as: the bytes that this test sets up and asserts about are different on big-endian. It creates the wrong test.

Member


IIUC, the intention of this test is to perform the comparison at compare() regardless of endianness.

For readers, it would be good to add the byte patterns being tested:

80 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for your contribution, @mundaym.
BTW, please create a new JIRA with the TESTS component because SPARK-29918 has already been released.

@mundaym
Contributor Author

mundaym commented Jul 29, 2020

Does anyone know if there is a clean(er) way to copy an 8 byte array into a single element in an UnsafeRow? I'm trying to avoid the byte reversal but as far as I can tell the interface requires me to go through setLong to set an 8 byte value which means I need to take into account the endianness of the platform to get a particular byte layout in memory (i.e. use reverseBytes or go through a ByteBuffer). The other option is to modify the byte array that backs the UnsafeRow directly but I don't think that would necessarily be robust against any future UnsafeRow changes.
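The `reverseBytes` workaround mentioned above can be sketched like this. `toPlatformLayout` is a hypothetical helper standing in for the value passed to a setLong-style API; it is not UnsafeRow's real interface:

```java
import java.nio.ByteOrder;

public class ReverseDemo {
    // When a test wants a specific in-memory byte layout but must go through
    // an API that stores longs in native order, flip the bytes on
    // little-endian platforms so the big-endian pattern lands in memory.
    static long toPlatformLayout(long bigEndianValue) {
        return ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)
            ? Long.reverseBytes(bigEndianValue)
            : bigEndianValue;
    }

    public static void main(String[] args) {
        long v = toPlatformLayout(Long.MIN_VALUE);
        // Applying the transform twice restores the value on any platform.
        System.out.println(toPlatformLayout(v) == Long.MIN_VALUE); // true
    }
}
```

Going through a ByteBuffer with an explicit ByteOrder would achieve the same layout; either way the platform check is unavoidable if the write path is setLong.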

@srowen
Member

srowen commented Jul 29, 2020

Maybe just leave it as is and write a comment about what the resulting byte order is, for readers

@srowen
Member

srowen commented Jul 31, 2020

I think we can move forward on this if we just change the comments, to show the desired bytes that are being written. Functionally this seems OK.


insertRow(row1);
insertRow(row2);

Assert.assertTrue(compare(0, 1) > 0);
Assert.assertTrue(compare(0, 1) < 0);
Contributor


My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results.

I think a simple way to fix the test is:

if (LITTLE_ENDIAN) {
  Assert.assertTrue(compare(0, 1) < 0);
} else {
  Assert.assertTrue(compare(0, 1) > 0);
}

Member


That would also work here. I think it's a little less ideal, because then you are running a different test on little- vs big-endian (and those are the correct answers to the two different tests). I think only one of those tests was the intended one AFAICT, and I buy the argument that the proposed change restores it as well as addresses endianness.

Contributor


then you are running a different test on little- vs big-endian

Yea, it's true. But it's also true that this is what happens in reality: the comparator compares bytes from left to right, whether it's little- or big-endian.

Member


That's right. There is no endian-ness issue in the logic being tested. There is endian-ness in how this test constructs the bytes to compare, because it writes a long into memory. I doubt this test's purpose holds both ways? It "works" but either little- or big-endian machines aren't running the test as intended. It feels like there's one intended test to run here, not either of two.

…an platforms.

Comparisons performed by RecordBinaryComparator are executed in
lexicographic order (byte by byte starting from the byte at index 0).
This means that the underlying endianness of the platform can affect
the result of a comparison between multi-byte values such as longs.
This difference means that two tests fail on big-endian platforms.

Also, the two tests compare 'special' long values to test edge cases
that triggered old bugs in the fast path of the comparator. However
since PR apache#26548 these 'special' values are byte-reversed before being
compared on little-endian platforms, which means that the edge cases
these 'special' values were designed to trigger are no longer tested.

This PR fixes both these issues by byte reversing the values on
little-endian systems.

RecordBinaryComparatorSuite tests now pass on a big-endian machine
(s390x).
@mundaym mundaym changed the title [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite Aug 5, 2020
@mundaym
Contributor Author

mundaym commented Aug 5, 2020

Thanks everyone. Sorry for the delay. I've updated the PR commit description and the comments in the tests. I've also opened and linked against a new JIRA issue.

Member

@srowen srowen left a comment


This looks OK to me.

// on little-endian platforms.
long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Contributor


My worry is that: this changes what we were testing before. When we wrote the test, we assume the test is run in little-endian machines. Now we reverse the bytes for little-endian machines, and as a result the testing result also changes (result > 0 -> result < 0).

I'm not very sure about the intention of these 2 tests. Maybe they were wrong at the very beginning and we should reverse the bytes for testing. I'll leave it for others to decide.

Member


That is perhaps the only remaining issue. We should fix the test that this sets up of course, but, which was the intended test? I'm also not 100% sure, but @mundaym had a reasonable argument at #29259 (comment)

Contributor


ah I see, LGTM

@SparkQA

SparkQA commented Aug 5, 2020

Test build #127095 has finished for PR 29259 at commit 5d29560.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4a0427c Aug 5, 2020
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
Closes apache#29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>