[SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. #49812

jonathan-albrecht-ibm · 2025-02-05T13:42:30Z

What changes were proposed in this pull request?

Fix minor endianness issues in the following tests.

ArrayBasedMapBuilderSuite: The output of UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms.

WriteDistributionAndOrderingSuite: Casting the id column of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian because an int field is stored in the first 4 bytes of the 8 byte field, so positive ints are laid out the same as positive longs, i.e. in little endian order. On big endian, the layout of UnsafeRow int fields does not match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects.
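To illustrate the int-versus-long layout point, here is a minimal sketch that models a single 8 byte field slot with a plain `java.nio.ByteBuffer`. It does not use Spark's `UnsafeRow` or its memory access classes; the names `EndianSlotDemo` and `writeIntReadLong` are hypothetical, and the only assumption taken from the description above is that an int is written into the first 4 bytes of a zeroed 8 byte slot.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of one 8 byte field slot; not Spark code.
class EndianSlotDemo {

    // Write an int into the first 4 bytes of a zeroed 8 byte slot,
    // then read the whole slot back as a long in the same byte order.
    public static long writeIntReadLong(int value, ByteOrder order) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(order); // allocate() zero-fills
        slot.putInt(0, value);   // occupies bytes 0..3 only
        return slot.getLong(0);  // interprets bytes 0..7
    }

    public static void main(String[] args) {
        // Little endian: the low-order bytes come first, so a positive int reads back unchanged.
        System.out.println(writeIntReadLong(1, ByteOrder.LITTLE_ENDIAN)); // 1
        // Big endian: the int lands in the high-order half of the long.
        System.out.println(writeIntReadLong(1, ByteOrder.BIG_ENDIAN));    // 4294967296
        // Even on little endian a negative int is not sign-extended.
        System.out.println(writeIntReadLong(-1, ByteOrder.LITTLE_ENDIAN)); // 4294967295
    }
}
```

The last case shows why the description says "positive ints": the coincidence on little endian only holds while the upper 4 bytes of the slot are expected to be zero.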

Why are the changes needed?

Allow tests to pass on big endian platforms

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran existing tests on amd64 (little endian) and s390x (big endian)

Was this patch authored or co-authored using generative AI tooling?

No

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>

MaxGekk

Could you add the [TESTS] tag to the PR's title, please?

MaxGekk · 2025-02-05T21:15:29Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilderSuite.scala

      condition = "DUPLICATED_MAP_KEY",
      parameters = Map(
-        "key" -> "[0,1]",
+        "key" -> keyAsString,


Can't you just replace [0,1] by unsafeRow.toString?

Thanks @MaxGekk! Yes, that's much better. I've pushed the change.
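Why deriving the expected string from the row itself is more robust than a literal can be sketched as follows. This is a model under stated assumptions, not Spark's implementation: it assumes the row string is a comma-separated hex rendering of consecutive 8 byte words (consistent with the `[0,1]` literal in the diff, but not checked against the UnsafeRow source here), with one leading all-zero word followed by one slot holding an int. `RowStringDemo` and `render` are hypothetical names, and `ByteBuffer` stands in for the row's backing memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of a two-word row rendered as hex words; not Spark code.
class RowStringDemo {

    // Build a 16 byte "row": word 0 is all zeros, word 1 holds an int in its
    // first 4 bytes. Render each 8 byte word as hex, separated by commas.
    public static String render(int value, ByteOrder order) {
        ByteBuffer row = ByteBuffer.allocate(16).order(order); // zero-filled
        row.putInt(8, value);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 16; i += 8) {
            if (i != 0) sb.append(',');
            sb.append(Long.toHexString(row.getLong(i)));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(render(1, ByteOrder.LITTLE_ENDIAN)); // [0,1]
        System.out.println(render(1, ByteOrder.BIG_ENDIAN));    // [0,100000000]
    }
}
```

A test that compares against the string produced by the same object on the same platform never needs to know which of the two renderings applies, which is the point of the reviewer's suggestion.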

Just call unsafeRow.toString() instead of using two endian dependent literal expected values. Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>

MaxGekk

Waiting for CI.

jonathan-albrecht-ibm · 2025-02-06T17:02:12Z

The CI failed on a pyspark mllib test with:

RuntimeError: The server socket has failed to listen on any local network address. port: 38587, useIpv6: 0, code: -98, name: EADDRINUSE, message: address already in use

So it's not related to this change. All other tests look like they passed, if I'm reading it correctly.

MaxGekk · 2025-02-06T17:31:33Z

So it's not related to this change.

You are very likely right, but this particular GitHub Action stopped and didn't run the remaining tests, which might be related to your changes. May I ask you to re-run only the failed GA? (You can do that in the UI.)

jonathan-albrecht-ibm · 2025-02-06T20:02:15Z

Thanks @MaxGekk, I didn't know I could do that. I reran the failing build and it passed this time.

MaxGekk · 2025-02-07T09:47:54Z

+1, LGTM. Merging to master/4.0.
Thank you, @jonathan-albrecht-ibm.

Closes #49812 from jonathan-albrecht-ibm/master-endian-testEndianness.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit f5f7c36)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

jonathan-albrecht-ibm · 2025-02-07T13:25:34Z

@MaxGekk Thanks for reviewing and merging!

dongjoon-hyun

Thank you, @jonathan-albrecht-ibm and @MaxGekk.

BTW, can we have this in branch-3.5 too?

MaxGekk · 2025-02-07T16:19:15Z

BTW, can we have this in branch-3.5 too?

I didn't merge to 3.5 because the changes cause conflicts in branch-3.5. @jonathan-albrecht-ibm Could you open a separate PR, please?

This is the branch-3.5 backport of #49812.

Closes #49866 from jonathan-albrecht-ibm/branch-3.5-endian-testEndianness.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#49812 from jonathan-albrecht-ibm/master-endian-testEndianness.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 9237eae)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

github-actions bot added the SQL label Feb 5, 2025

jonathan-albrecht-ibm changed the title ~~Fix minor endianness issues in tests.~~ [SPARK-51093][SQL] Fix minor endianness issues in tests. Feb 5, 2025

MaxGekk reviewed Feb 5, 2025

jonathan-albrecht-ibm changed the title ~~[SPARK-51093][SQL] Fix minor endianness issues in tests.~~ [SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. Feb 6, 2025

Just call unsafeRow.toString() instead of using two endian dependent literal expected values (b86a9d2)

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>

MaxGekk approved these changes Feb 6, 2025

MaxGekk closed this in f5f7c36 Feb 7, 2025

dongjoon-hyun reviewed Feb 7, 2025

jonathan-albrecht-ibm mentioned this pull request Feb 10, 2025

[SPARK-51093][SQL][TESTS][3.5] Fix minor endianness issues in tests #49866

Closed
