Skip to content

Conversation

@jonathan-albrecht-ibm
Copy link
Contributor

What changes were proposed in this pull request?

Fix minor endianness issues in the following tests.

ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms.

WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects.

Why are the changes needed?

Allow tests to pass on big endian platforms

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran existing tests on amd64 (little endian) and s390x (big endian)

Was this patch authored or co-authored using generative AI tooling?

No

ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on
the underlying bytes and is endian dependent. Add an expected value for big
endian platforms.

WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't
work on big endian platforms because the BucketFunction calls UnsafeRow.getLong()
for that column. That happens to work on little endian since an int field is
stored in the first 4 bytes of the 8 byte field so positive ints are layed out
the same as positive longs ie. little endian order. On big endian, the layout of
UnsafeRow int fields does not happen to match the layout of long fields for the
same number. Change the type of the id column to Long so that it matches what
BucketFunction expects.

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
@github-actions github-actions bot added the SQL label Feb 5, 2025
@jonathan-albrecht-ibm jonathan-albrecht-ibm changed the title Fix minor endianness issues in tests. [SPARK-51093][SQL] Fix minor endianness issues in tests. Feb 5, 2025
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the [TESTS] tag to PR's title, please.

condition = "DUPLICATED_MAP_KEY",
parameters = Map(
"key" -> "[0,1]",
"key" -> keyAsString,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just replace [0,1] by unsafeRow.toString?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MaxGekk! Yes, that's much better. I've pushed the change

@jonathan-albrecht-ibm jonathan-albrecht-ibm changed the title [SPARK-51093][SQL] Fix minor endianness issues in tests. [SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. Feb 6, 2025
…t literal expected values

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for CI.

@jonathan-albrecht-ibm
Copy link
Contributor Author

The CI failed on a pyspark mllib test with:

RuntimeError: The server socket has failed to listen on any local network address. port: 38587, useIpv6: 0, code: -98, name: EADDRINUSE, message: address already in use

So its not related to this change. All other tests look like they passed if I'm reading it correctly.

@MaxGekk
Copy link
Member

MaxGekk commented Feb 6, 2025

So its not related to this change.

Highly likely you are right, but this particular GitHub action stopped and didn't run the rest tests that might be related to your changes. May I ask you to re-run only the failed GA (you can do that in UI).

@jonathan-albrecht-ibm
Copy link
Contributor Author

Thanks @MaxGekk, I didn't know I could do that. I reran the failing build and it passed this time

@MaxGekk
Copy link
Member

MaxGekk commented Feb 7, 2025

+1, LGTM. Merging to master/4.0.
Thank you, @jonathan-albrecht-ibm.

@MaxGekk MaxGekk closed this in f5f7c36 Feb 7, 2025
MaxGekk pushed a commit that referenced this pull request Feb 7, 2025
### What changes were proposed in this pull request?
Fix minor endianness issues in the following tests.

ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms.

WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects.

### Why are the changes needed?
Allow tests to pass on big endian platforms

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran existing tests on amd64 (little endian) and s390x (big endian)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49812 from jonathan-albrecht-ibm/master-endian-testEndianness.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit f5f7c36)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@jonathan-albrecht-ibm
Copy link
Contributor Author

@MaxGekk Thanks for reviewing and merging!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jonathan-albrecht-ibm and @MaxGekk .

BTW, can we have this in branch-3.5 too?

@MaxGekk
Copy link
Member

MaxGekk commented Feb 7, 2025

BTW, can we have this in branch-3.5 too?

I didn't merge to 3.5 because the changes cause conflicts in branch-3.5. @jonathan-albrecht-ibm Could you open a separate PR, please.

dongjoon-hyun pushed a commit that referenced this pull request Feb 10, 2025
This is the branch-3.5 backport of #49812

### What changes were proposed in this pull request?
Fix minor endianness issues in the following tests.

ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms.

WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects.

### Why are the changes needed?
Allow tests to pass on big endian platforms

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Ran existing tests on amd64 (little endian) and s390x (big endian)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49866 from jonathan-albrecht-ibm/branch-3.5-endian-testEndianness.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
### What changes were proposed in this pull request?
Fix minor endianness issues in the following tests.

ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms.

WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects.

### Why are the changes needed?
Allow tests to pass on big endian platforms

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran existing tests on amd64 (little endian) and s390x (big endian)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#49812 from jonathan-albrecht-ibm/master-endian-testEndianness.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 9237eae)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants