
[SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite #29259

Closed

Conversation

mundaym
Contributor

@mundaym mundaym commented Jul 27, 2020

What changes were proposed in this pull request?

PR #26548 means that RecordBinaryComparator now uses big endian
byte order for long comparisons. However, this means that some of
the constants in the regression tests no longer map to the same
values in the comparison that they used to.

For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow,
so the original purpose of the test is defeated.

To fix this the constants are now explicitly written out in big
endian byte order to match the byte order used in the comparison.
This also fixes the tests on big endian machines (which would
otherwise get a different comparison result to the little-endian
machines).
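The effect described above can be illustrated with a small standalone sketch. The helper names `bytesOf` and `lexCompare` are illustrative only, and the comparison is a simplified stand-in for a byte-wise, left-to-right comparator, not Spark's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Serialize a long into 8 bytes using the given byte order.
    static byte[] bytesOf(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    // Unsigned lexicographic comparison, byte by byte from index 0.
    static int lexCompare(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] minBE = bytesOf(Long.MIN_VALUE, ByteOrder.BIG_ENDIAN);    // 80 00 .. 00
        byte[] oneBE = bytesOf(1L, ByteOrder.BIG_ENDIAN);                // 00 00 .. 01
        byte[] minLE = bytesOf(Long.MIN_VALUE, ByteOrder.LITTLE_ENDIAN); // 00 .. 00 80
        byte[] oneLE = bytesOf(1L, ByteOrder.LITTLE_ENDIAN);             // 01 00 .. 00

        // Big-endian layout: 0x80 > 0x00 at index 0, so MIN_VALUE sorts after 1.
        System.out.println(lexCompare(minBE, oneBE) > 0); // true
        // Little-endian layout: 0x00 < 0x01 at index 0, so the result flips.
        System.out.println(lexCompare(minLE, oneLE) < 0); // true
    }
}
```

The flipped sign on little-endian layouts is exactly why the constants no longer exercise the intended edge case.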

Why are the changes needed?

The regression tests no longer serve their initial purposes and also fail on big-endian systems.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tests run on big-endian system (s390x).

@HyukjinKwon
Member

ok to test

@cloud-fan
Contributor

I think #26548 only guarantees that comparing 8 bytes at a time and comparing byte by byte give consistent results, but it doesn't mean the comparison results are consistent between big-endian and little-endian systems.

I'm not sure if this is a problem if the Spark cluster is mixed-architecture. cc @kiszk @rednaxelafx @maropu @viirya

@srowen
Member

srowen commented Jul 27, 2020

Are you saying the test fails on big-endian machines, as-is?

@mundaym
Contributor Author

mundaym commented Jul 27, 2020

Are you saying the test fails on big-endian machines, as-is?

Yes. The long values being compared are in native byte order. Therefore the result of the byte-by-byte comparison may be different on big and little endian machines.

@mundaym
Contributor Author

mundaym commented Jul 27, 2020

I think #26548 only guarantees that comparing 8 bytes at a time and comparing byte by byte give consistent results, but it doesn't mean the comparison results are consistent between big-endian and little-endian systems.

Yes. This PR ensures that the order of the bytes provided to the comparator in the tests is the same on both big- and little-endian systems.

This is required because #26548 didn't actually change the behaviour of the comparator on big-endian systems (which already compared 8-byte chunks in big-endian byte order) but did change the expected output of the comparisons in these tests. The change therefore broke these tests on big-endian systems.

@srowen
Member

srowen commented Jul 27, 2020

The change you cite doesn't change how things are compared - just fixes the intended logic.
Endian-ness isn't really at issue here in the comparison logic, because this is byte-by-byte comparison, so I'd remove comments to that effect.

What's at issue here is what the test writes. Writing 11 to memory directly on little-endian produces different bytes than on big-endian. They result in a different comparison because the test case ends up expressing a different test.

I think the change is therefore reasonable, but would instead swap the bytes when the arch is big-endian, and leave the current assertion. Just a minor semantic difference that preserves the existing test, but, it's pretty fine either way.
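That point can be made concrete with a short sketch using the constants from one of the tests. The `nativeBytes` helper is hypothetical, with `ByteBuffer` standing in for the long being written into row memory:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WriteDemo {
    // Serialize a long the way memory would see it under the given order.
    static byte[] nativeBytes(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    public static void main(String[] args) {
        long row1 = 11L;                      // 0x000000000000000B
        long row2 = 11L + Integer.MAX_VALUE;  // 0x000000008000000A

        byte[] le1 = nativeBytes(row1, ByteOrder.LITTLE_ENDIAN); // 0B 00 00 00 00 00 00 00
        byte[] be1 = nativeBytes(row1, ByteOrder.BIG_ENDIAN);    // 00 00 00 00 00 00 00 0B

        // The first byte a left-to-right comparator sees differs per platform:
        System.out.println(le1[0]); // 11
        System.out.println(be1[0]); // 0
    }
}
```

So the same pair of literals sets up a different byte-level test case depending on the platform that runs it.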

Is that the only current big-endian test failure?

@mundaym
Contributor Author

mundaym commented Jul 27, 2020

Endian-ness isn't really at issue here in the comparison logic, because this is byte-by-byte comparison, so I'd remove comments to that effect.

Ack. I'll rework that.

I think the change is therefore reasonable, but would instead swap the bytes when the arch is big-endian, and leave the current assertion. Just a minor semantic difference that preserves the existing test, but, it's pretty fine either way.

The reason for writing the values in big-endian byte order is that it preserves the original intent of the tests, which try to trigger old bugs related to mis-sized integer comparisons. Writing the values in little-endian byte order makes the tests less useful since they no longer hit those cases.

It might be better to just write out the bytes individually and avoid the platform endianness check. I'll see if that change is cleaner.
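A sketch of the "write the bytes out individually" idea: build the 8-byte big-endian patterns by hand so they are identical on every platform. The array literals are illustrative only; this is not the code the PR ended up using:

```java
public class ExplicitBytes {
    // 80 00 00 00 00 00 00 00 : big-endian representation of Long.MIN_VALUE
    static final byte[] ROW1 = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0};
    // 00 00 00 00 00 00 00 01 : big-endian representation of 1L
    static final byte[] ROW2 = {0, 0, 0, 0, 0, 0, 0, 1};

    public static void main(String[] args) {
        // No ByteOrder check needed: the byte layout is fixed by construction,
        // so a byte-wise comparator sees the same input on every platform.
        System.out.println(ROW1.length == 8 && ROW2.length == 8); // true
    }
}
```

The trade-off, as noted above, is that the test framework's row-building API takes longs rather than raw byte arrays.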

Is that the only current big-endian test failure?

In this suite yes but there are test failures in other places. I am currently going through and fixing what I can. Eventually we'd like to try and get a big endian system into Jenkins, if possible.

@srowen
Member

srowen commented Jul 27, 2020

I see your point, the original intent was likely to write the bytes as if the big-endian representation of a long. OK.
Sure, agree, just writing bytes is probably clearer still. Or at least write the intended bytes in the comment for the reader.

Up to you about how you want to attack it - if there are other similar failures we can fix them together in one PR to group them logically. If there are several distinct issues you can tackle them separately if desired, or, together could be OK too. I suspect they're all related to this general part of the code.

@SparkQA

SparkQA commented Jul 27, 2020

Test build #126648 has finished for PR 29259 at commit f0abc7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Jul 27, 2020

I think #26548 only guarantees that comparing 8 bytes at a time and comparing byte by byte give consistent results, but it doesn't mean the comparison results are consistent between big-endian and little-endian systems.

I'm not sure if this is a problem if the Spark cluster is mixed-architecture. cc @kiszk @rednaxelafx @maropu @viirya

Based on #26548 and the comments, RecordBinaryComparator just needs to ensure a consistent order on the compared records between reruns. A different order between big-endian and little-endian systems is possible with RecordBinaryComparator, but I think it won't affect correctness generally.

@srowen
Member

srowen commented Jul 27, 2020

@viirya but this test asserts a particular ordering -- for test purposes, sure. The ordering won't be consistent, so the test fails based on arch. Because this is a special case, asserting the ordering, the endian-ness has to be fixed.

@viirya
Member

viirya commented Jul 27, 2020

OK, I can see the point. Looks like we'd better fix it.

long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;

// BinaryComparator compares longs in big-endian byte order.
Member


Would you like to add a few more words? It is a bit hard to reason about this with only this one sentence.

E.g. "... On a little-endian system, this byte comparison does not overflow. We swap the bytes to make sure the overflow happens."

Member


I don't even think that's quite true. The comparison isn't endian at all as it is byte-by-byte. But the point here is to write bytes in a certain order for the test, for sure.

Member


I think the overflow only happened when comparing longs? @mundaym

Member


I don't think overflow was the issue per se; signed vs unsigned bytes were, for sure, in the original issue. But not so much here in this test case.

long row2Data = 1;

// BinaryComparator compares longs in big-endian byte order.
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Member


For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
therefore missing the original purpose of the test.

I'm a bit confused by the PR description; I checked the original PR that added this test case and it seems like the overflow in the test title comes from the old code: https://github.com/apache/spark/pull/22101/files#diff-4ec35a60ad6a3f3f60f4d5ce91f59933L61-L63 To keep the original intention, why do you think we need to update the existing test in little-endian cases?
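For context, the kind of historical bug those constants are good at catching can be sketched as follows. This is an illustration of the failure mode (deciding a long comparison by subtraction), not the actual old Spark code:

```java
public class OverflowDemo {
    // Broken: Long.MIN_VALUE - 1L overflows to Long.MAX_VALUE, so the
    // difference has the wrong sign and the comparison is reversed.
    static int compareBySubtraction(long a, long b) {
        long diff = a - b;
        return diff < 0 ? -1 : (diff > 0 ? 1 : 0);
    }

    // Correct: compare without subtracting.
    static int compareCorrectly(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(compareBySubtraction(Long.MIN_VALUE, 1L)); // 1 (wrong)
        System.out.println(compareCorrectly(Long.MIN_VALUE, 1L));     // -1
    }
}
```

The pair (Long.MIN_VALUE, 1) only exercises this overflow if those exact byte patterns actually reach the comparison, which is the point of the fix.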

Member


If you mean, this is kind of a different issue -- yes. Should be a new JIRA. I'd summarize this as: the bytes that this test sets up and asserts about are different on big-endian. It creates the wrong test.

Member


IIUC, the intention of this test is to perform the comparison at compare() regardless of endianness.

For readers, it would be good to add the byte patterns being tested:

80 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for your contribution, @mundaym.
BTW, please create a new JIRA with the TESTS component because SPARK-29918 has already been released.

@mundaym
Contributor Author

mundaym commented Jul 29, 2020

Does anyone know if there is a clean(er) way to copy an 8 byte array into a single element in an UnsafeRow? I'm trying to avoid the byte reversal but as far as I can tell the interface requires me to go through setLong to set an 8 byte value which means I need to take into account the endianness of the platform to get a particular byte layout in memory (i.e. use reverseBytes or go through a ByteBuffer). The other option is to modify the byte array that backs the UnsafeRow directly but I don't think that would necessarily be robust against any future UnsafeRow changes.
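The `reverseBytes` workaround mentioned above can be sketched like this. `toPlatformLayout` is a hypothetical helper standing in for the value passed to a setLong-style API; it is not UnsafeRow's real interface:

```java
import java.nio.ByteOrder;

public class ReverseDemo {
    // When a test wants a specific in-memory byte layout but must go through
    // an API that stores longs in native order, flip the bytes on
    // little-endian platforms so the big-endian pattern lands in memory.
    static long toPlatformLayout(long bigEndianValue) {
        return ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)
            ? Long.reverseBytes(bigEndianValue)
            : bigEndianValue;
    }

    public static void main(String[] args) {
        long v = toPlatformLayout(Long.MIN_VALUE);
        // Applying the transform twice restores the value on any platform.
        System.out.println(toPlatformLayout(v) == Long.MIN_VALUE); // true
    }
}
```

Going through a ByteBuffer with an explicit ByteOrder would achieve the same layout; either way the platform check is unavoidable if the write path is setLong.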

@srowen
Member

srowen commented Jul 29, 2020

Maybe just leave it as is and write a comment about what the resulting byte order is, for readers

@srowen
Member

srowen commented Jul 31, 2020

I think we can move forward on this if we just change the comments, to show the desired bytes that are being written. Functionally this seems OK.


insertRow(row1);
insertRow(row2);

Assert.assertTrue(compare(0, 1) > 0);
Assert.assertTrue(compare(0, 1) < 0);
Contributor


My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results.

I think a simple way to fix the test is:

if (LITTLE_ENDIAN) {
  Assert.assertTrue(compare(0, 1) < 0);
} else {
  Assert.assertTrue(compare(0, 1) > 0);
}

Member


That would also work here. I think it's a little less ideal, because then you are running a different test on little- vs big-endian (and those are the correct answers to the two different tests). I think only one of those tests was the intended one AFAICT, and I buy the argument that the proposed change restores it as well as addresses endianness.

Contributor


then you are running a different test on little- vs big-endian

Yea, it's true. But it's also true that this is what happens in reality: the comparator compares bytes from left to right, whether it's little- or big-endian.

Member


That's right. There is no endian-ness issue in the logic being tested. There is endian-ness in how this test constructs the bytes to compare, because it writes a long into memory. I doubt this test's purpose holds both ways? It "works" but either little- or big-endian machines aren't running the test as intended. It feels like there's one intended test to run here, not either of two.

…an platforms.

Comparisons performed by RecordBinaryComparator are executed in
lexicographic order (byte by byte starting from the byte at index 0).
This means that the underlying endianness of the platform can affect
the result of a comparison between multi-byte values such as longs.
This difference means that two tests fail on big-endian platforms.

Also, the two tests compare 'special' long values to test edge cases
that triggered old bugs in the fast path of the comparator. However
since PR apache#26548 these 'special' values are byte-reversed before being
compared on little-endian platforms, which means that the edge cases
these 'special' values were designed to trigger are no longer tested.

This PR fixes both these issues by byte reversing the values on
little-endian systems.

RecordBinaryComparatorSuite tests now pass on a big-endian machine
(s390x).
@mundaym mundaym changed the title [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite Aug 5, 2020
@mundaym
Contributor Author

mundaym commented Aug 5, 2020

Thanks everyone. Sorry for the delay. I've updated the PR commit description and the comments in the tests. I've also opened and linked against a new JIRA issue.

Member

@srowen srowen left a comment


This looks OK to me.

// on little-endian platforms.
long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Contributor


My worry is that: this changes what we were testing before. When we wrote the test, we assume the test is run in little-endian machines. Now we reverse the bytes for little-endian machines, and as a result the testing result also changes (result > 0 -> result < 0).

I'm not very sure about the intention of these 2 tests. Maybe they were wrong at the very beginning and we should reverse the bytes for testing. I'll leave it for others to decide.

Member


That is perhaps the only remaining issue. We should fix the test that this sets up of course, but, which was the intended test? I'm also not 100% sure, but @mundaym had a reasonable argument at #29259 (comment)

Contributor


ah I see, LGTM

@SparkQA

SparkQA commented Aug 5, 2020

Test build #127095 has finished for PR 29259 at commit 5d29560.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4a0427c Aug 5, 2020
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
Closes apache#29259 from mundaym/fix-endian.

Authored-by: Michael Munday <mike.munday@ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>