-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. #49812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. #49812
Conversation
ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
MaxGekk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the [TESTS] tag to PR's title, please.
| condition = "DUPLICATED_MAP_KEY", | ||
| parameters = Map( | ||
| "key" -> "[0,1]", | ||
| "key" -> keyAsString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just replace [0,1] by unsafeRow.toString?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MaxGekk! Yes, that's much better. I've pushed the change
…t literal expected values Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
MaxGekk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting for CI.
|
The CI failed on a pyspark mllib test with: So its not related to this change. All other tests look like they passed if I'm reading it correctly. |
Highly likely you are right, but this particular GitHub action stopped and didn't run the rest tests that might be related to your changes. May I ask you to re-run only the failed GA (you can do that in UI). |
|
Thanks @MaxGekk, I didn't know I could do that. I reran the failing build and it passed this time |
|
+1, LGTM. Merging to master/4.0. |
### What changes were proposed in this pull request? Fix minor endianness issues in the following tests. ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms. WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects. ### Why are the changes needed? Allow tests to pass on big endian platforms ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Ran existing tests on amd64 (little endian) and s390x (big endian) ### Was this patch authored or co-authored using generative AI tooling? No Closes #49812 from jonathan-albrecht-ibm/master-endian-testEndianness. Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit f5f7c36) Signed-off-by: Max Gekk <max.gekk@gmail.com>
|
@MaxGekk Thanks for reviewing and merging! |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @jonathan-albrecht-ibm and @MaxGekk .
BTW, can we have this in branch-3.5 too?
I didn't merge to 3.5 because the changes cause conflicts in |
This is the branch-3.5 backport of #49812 ### What changes were proposed in this pull request? Fix minor endianness issues in the following tests. ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms. WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects. ### Why are the changes needed? Allow tests to pass on big endian platforms ### Does this PR introduce any user-facing change? No ### How was this patch tested? Ran existing tests on amd64 (little endian) and s390x (big endian) ### Was this patch authored or co-authored using generative AI tooling? No Closes #49866 from jonathan-albrecht-ibm/branch-3.5-endian-testEndianness. Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? Fix minor endianness issues in the following tests. ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms. WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects. ### Why are the changes needed? Allow tests to pass on big endian platforms ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Ran existing tests on amd64 (little endian) and s390x (big endian) ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49812 from jonathan-albrecht-ibm/master-endian-testEndianness. Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 9237eae) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Fix minor endianness issues in the following tests.
ArrayBasedMapBuilderSuite: The output of the UnsafeRow.toString() is based on the underlying bytes and is endian dependent. Add an expected value for big endian platforms. Add an expected value for big endian platforms.
WriteDistributionAndOrderingSuite: Casting the id of type Int to Long doesn't work on big endian platforms because the BucketFunction calls UnsafeRow.getLong() for that column. That happens to work on little endian since an int field is stored in the first 4 bytes of the 8 byte field so positive ints are layed out the same as positive longs ie. little endian order. On big endian, the layout of UnsafeRow int fields does not happen to match the layout of long fields for the same number. Change the type of the id column to Long so that it matches what BucketFunction expects. Change the type of the id column to Long so that it matches what BucketFunction expects.
Why are the changes needed?
Allow tests to pass on big endian platforms
Does this PR introduce any user-facing change?
No
How was this patch tested?
Ran existing tests on amd64 (little endian) and s390x (big endian)
Was this patch authored or co-authored using generative AI tooling?
No