@jonathan-albrecht-ibm
Contributor

What changes were proposed in this pull request?

Write the month and days fields of intervals with one call to Platform.put/getLong() instead of two calls to Platform.put/getInt().

Commit ac07cea improved the performance of reading and writing CalendarIntervals in UnsafeRow. This change makes writing intervals consistent with UnsafeRow and has better performance compared to the original code.

This also fixes big endian platforms, where the old method (two calls to Platform.put/getInt()) and the new method (one call to Platform.put/getLong()) of reading and writing CalendarIntervals do not order the bytes in the same way. Currently, CalendarInterval-related tests in Catalyst and SQL are failing on big endian platforms.

There is no effect on little endian platforms (byte order is not affected) except for the performance improvement.
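
The byte-order mismatch described above can be reproduced outside Spark. The following is a minimal sketch, not the code from this PR: it uses `java.nio.ByteBuffer` with an explicit `ByteOrder` as a stand-in for `Platform.putInt()`/`Platform.putLong()` (which are assumed here to use the platform's native byte order), and it assumes a packing with days in the high 32 bits and months in the low 32 bits of the long. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: ByteBuffer with an explicit ByteOrder stands in for
// Spark's Platform calls so both byte orders can be tried on any machine.
class IntervalByteOrderSketch {

    // Old-style write (assumed layout): months at offset 0, days at offset 4.
    static ByteBuffer writeAsTwoInts(int months, int days, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        buf.putInt(0, months);
        buf.putInt(4, days);
        return buf;
    }

    // New-style write (assumed packing): days in the high 32 bits,
    // months in the low 32 bits, stored with a single 8-byte write.
    static ByteBuffer writeAsOneLong(int months, int days, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        long packed = ((long) days << 32) | (months & 0xFFFFFFFFL);
        buf.putLong(0, packed);
        return buf;
    }

    // Read with a single 8-byte read and unpack; returns {months, days}.
    static int[] readAsOneLong(ByteBuffer buf) {
        long packed = buf.getLong(0);
        int months = (int) packed;          // low 32 bits
        int days = (int) (packed >>> 32);   // high 32 bits
        return new int[] {months, days};
    }

    public static void main(String[] args) {
        ByteOrder[] orders = {ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN};
        for (ByteOrder order : orders) {
            int[] mixed = readAsOneLong(writeAsTwoInts(14, 3, order));
            int[] paired = readAsOneLong(writeAsOneLong(14, 3, order));
            System.out.println(order
                + ": two ints then one long -> months=" + mixed[0] + ", days=" + mixed[1]
                + "; one long both ways -> months=" + paired[0] + ", days=" + paired[1]);
        }
    }
}
```

With little endian order, both write paths read back the same months and days. With big endian order, the value written as two ints comes back with months and days swapped when read as one long, while the value written as one long round-trips correctly in either order.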

Why are the changes needed?

  • Improves performance reading and writing CalendarIntervals in Unsafe* classes
  • Fixes big endian platforms where CalendarIntervals are not read or written correctly in Unsafe* classes

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests on big and little endian platforms

Was this patch authored or co-authored using generative AI tooling?

No

…m.putLong()

instead of two calls to Platform.putInt().

This makes writing intervals consistent with the way they are written and read in
other places like UnsafeRow.java and should have the same or better performance
compared to the original code.
It also makes the code endian independent so it works on big and little endian
platforms. After this change, all interval related tests are able to pass on big
endian platforms.

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
@github-actions github-actions bot added the SQL label Jan 30, 2025
Member

@MaxGekk MaxGekk left a comment

I think it is ok to align the implementation to UnsafeRow.java, and fix a big-endian issue.

@MaxGekk
Member

MaxGekk commented Feb 6, 2025

+1, LGTM. Merging to master/4.0.
Thank you, @jonathan-albrecht-ibm.

@MaxGekk MaxGekk closed this in a79ba48 Feb 6, 2025
MaxGekk pushed a commit that referenced this pull request Feb 6, 2025
…als with one call in Unsafe* classes

Closes #49737 from jonathan-albrecht-ibm/master-endian-interval.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit a79ba48)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk
Member

MaxGekk commented Feb 6, 2025

@jonathan-albrecht-ibm Congratulations on your first contribution to Apache Spark!

@jonathan-albrecht-ibm
Contributor Author

@MaxGekk Thanks for reviewing and merging!

@jonathan-albrecht-ibm
Contributor Author

@MaxGekk Would it be possible to merge this to branch-3.5 as well? I have also built and tested it on a local build of 3.5.4.

More generally, should I mention in the PR description, in the JIRA, or somewhere else that I'd like a PR merged back as far as branch-3.5? I didn't see any info on how to request which branches to merge to, and I'm not sure what the policies are.

@MaxGekk
Member

MaxGekk commented Feb 7, 2025

> Would it be possible to merge this to branch-3.5 as well?

I think so. In the public doc we declare:

> Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java.

see https://spark.apache.org/docs/latest/

> Fixes big endian platforms where CalendarIntervals are not read or written correctly in Unsafe* classes

Apparently, we can consider the changes as a bug fix. cc @dongjoon-hyun @cloud-fan WDYT?

@cloud-fan
Contributor

+1 to backport

@jonathan-albrecht-ibm
Contributor Author

@MaxGekk, @cloud-fan Thanks for considering this to be backported. Should I open a new PR against branch-3.5? The cherry-pick of a79ba48 applies cleanly on branch-3.5

@dongjoon-hyun
Member

+1 for backporting. Thank you for pinging me, @MaxGekk.

@jonathan-albrecht-ibm, yes, we need a new PR to make sure that it passes on branch-3.5.

jonathan-albrecht-ibm added a commit to linux-on-ibm-z/spark that referenced this pull request Feb 12, 2025
…als with one call in Unsafe* classes

Closes apache#49737 from jonathan-albrecht-ibm/master-endian-interval.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
dongjoon-hyun pushed a commit that referenced this pull request Feb 12, 2025
…ntervals with one call in Unsafe* classes

This is the branch-3.5 backport of #49737

Closes #49912 from jonathan-albrecht-ibm/branch-3.5-endian-interval.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…als with one call in Unsafe* classes

### What changes were proposed in this pull request?

Write the month and days fields of intervals with one call to Platform.put/getLong() instead of two calls to Platform.put/getInt().

Commit 86d3e80 improved the performance of reading and writing CalendarIntervals in UnsafeRow. This change makes writing intervals consistent with UnsafeRow and has better performance compared to the original code.

This also fixes big endian platforms, where the old method (two calls to Platform.put/getInt()) and the new method (one call to Platform.put/getLong()) of reading and writing CalendarIntervals do not order the bytes in the same way. Currently, CalendarInterval-related tests in Catalyst and SQL are failing on big endian platforms.

There is no effect on little endian platforms (byte order is not affected) except for the performance improvement.

### Why are the changes needed?

* Improves performance reading and writing CalendarIntervals in Unsafe* classes
* Fixes big endian platforms where CalendarIntervals are not read or written correctly in Unsafe* classes

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests on big and little endian platforms

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#49737 from jonathan-albrecht-ibm/master-endian-interval.

Authored-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 05ebdf1)
Signed-off-by: Max Gekk <max.gekk@gmail.com>