New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist #46372
Conversation
@cloud-fan Could you take a look at this followup to improve the primitive array with nulls issue. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making a PR, @gene-db .
- Could you provide a test case to validate your contribution and to prevent any future regression at this area?
- I'd like to recommend to use
[SQL]
in SQL module PR title next time.
Yes, the previous PR added many tests in this area, to make sure it is correct. The reason I am updating in this PR is to go back to using primitive arrays (and still produce correct results with the tests), which are faster than the object ones.
Ok, I will add the appropriate module name next time. Thanks! |
To @gene-db , for the following purpose, you should not use
|
Please revise the PR title and description. |
@dongjoon-hyun Updated the title and description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to change it back to use primitive arrays!
thanks, merging to master/3.5! |
…when nulls exist ### What changes were proposed in this pull request? This is a followup to #46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy. Primitive arrays are faster than object arrays and won't create unnecessary objects. ### Why are the changes needed? This will improve performance and memory usage, when nulls are present in the `ColumnarArray`. ### Does this PR introduce _any_ user-facing change? This is expected to be faster when copying `ColumnarArray`. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46372 from gene-db/primitive-nulls. Authored-by: Gene Pang <gene.pang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit bf2e254) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…when nulls exist ### What changes were proposed in this pull request? This is a followup to apache#46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy. Primitive arrays are faster than object arrays and won't create unnecessary objects. ### Why are the changes needed? This will improve performance and memory usage, when nulls are present in the `ColumnarArray`. ### Does this PR introduce _any_ user-facing change? This is expected to be faster when copying `ColumnarArray`. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46372 from gene-db/primitive-nulls. Authored-by: Gene Pang <gene.pang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This is a followup to #46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy.
Primitive arrays are faster than object arrays and won't create unnecessary objects.
Why are the changes needed?
This will improve performance and memory usage, when nulls are present in the
ColumnarArray
.Does this PR introduce any user-facing change?
This is expected to be faster when copying
ColumnarArray
.How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.