[SPARK-26001][SQL]Reduce memory copy when writing decimal #22998
cc @mgaido91, @dongjoon-hyun, @cloud-fan, @kiszk
Can one of the admins verify this patch?
This seems to me very similar to what was done in #20850. That caused correctness issues, which were fixed in d7ae36a. The point is that, in the same column, there may be decimals occupying fewer than 8 bytes and decimals occupying up to 16. So if you write a decimal with 16 bytes and then one with fewer than 8, with the current change the remaining 8 bytes would remain dirty. Hence I don't think we should do this, as it may introduce correctness issues. Please correct me if my understanding is wrong.
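To make the dirty-bytes concern concrete, here is a minimal sketch. It assumes a plain `byte[]` standing in for the 16-byte region reserved per decimal and a hypothetical class `DecimalSlotSketch`; it is not Spark's `UnsafeRowWriter`, only an illustration of what happens when a reused slot is not cleared between writes.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Simplified stand-in for the 16-byte region reserved per decimal (hypothetical, not Spark code).
final class DecimalSlotSketch {
    static final int SLOT_SIZE = 16;

    // Copies the unscaled value's bytes to the start of the slot without clearing it first.
    static void writeWithoutZeroing(byte[] slot, BigInteger unscaled) {
        byte[] bytes = unscaled.toByteArray();
        if (bytes.length > SLOT_SIZE) {
            throw new IllegalArgumentException("value needs more than 16 bytes");
        }
        System.arraycopy(bytes, 0, slot, 0, bytes.length);
    }

    // Clears the whole slot, then writes the value.
    static void writeWithZeroing(byte[] slot, BigInteger unscaled) {
        Arrays.fill(slot, (byte) 0);
        writeWithoutZeroing(slot, unscaled);
    }

    public static void main(String[] args) {
        // 10^38 - 1 has 38 digits and needs all 16 bytes; 42 needs one byte.
        BigInteger large = BigInteger.TEN.pow(38).subtract(BigInteger.ONE);
        BigInteger small = BigInteger.valueOf(42);

        // Reused slot: a large value followed by a small one, never cleared.
        byte[] reused = new byte[SLOT_SIZE];
        writeWithoutZeroing(reused, large);
        writeWithoutZeroing(reused, small);

        // Fresh slot holding only the small value.
        byte[] fresh = new byte[SLOT_SIZE];
        writeWithoutZeroing(fresh, small);

        System.out.println("without zeroing, same bytes: " + Arrays.equals(reused, fresh));

        // Same sequence, but the slot is cleared before the second write.
        writeWithoutZeroing(reused, large);
        writeWithZeroing(reused, small);
        System.out.println("with zeroing, same bytes: " + Arrays.equals(reused, fresh));
    }
}
```

In this sketch the two slots hold the same logical value, yet they only match byte for byte when the slot is cleared before the smaller value is written.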
I have two questions.
bab69d4
to
1a9b34b
Compare
@mgaido91
@kiszk thank you for reviewing it.
I think this is wrong. We have to zero out the bytes even when writing a null decimal, so that two unsafe rows with the same values (including null values) are exactly the same in binary format.
Yes, I agree with @cloud-fan, this can create wrong results with nulls...
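The binary-equality point can be illustrated with a toy row. The layout below (one null-flag byte followed by a 16-byte decimal slot) and the class `NullDecimalRowSketch` are assumptions made for illustration; the real unsafe row format is more involved.

```java
import java.util.Arrays;

// Toy row layout (an assumption): byte 0 is a null flag, bytes 1..16 are the decimal slot.
final class NullDecimalRowSketch {
    static final int ROW_SIZE = 17;

    // Writes a non-null decimal: clear the flag and the slot, then copy the bytes in.
    static void writeValue(byte[] row, byte[] unscaledBytes) {
        if (unscaledBytes.length > ROW_SIZE - 1) {
            throw new IllegalArgumentException("value needs more than 16 bytes");
        }
        row[0] = 0;
        Arrays.fill(row, 1, ROW_SIZE, (byte) 0);
        System.arraycopy(unscaledBytes, 0, row, 1, unscaledBytes.length);
    }

    // Writes a null decimal: set the flag and optionally clear the slot.
    static void writeNull(byte[] row, boolean zeroSlot) {
        row[0] = 1;
        if (zeroSlot) {
            Arrays.fill(row, 1, ROW_SIZE, (byte) 0);
        }
    }

    // Simulates a reused buffer: a value row is written first, then a null row on top of it.
    static byte[] nullRowAfterValue(boolean zeroSlot) {
        byte[] row = new byte[ROW_SIZE];
        writeValue(row, new byte[] {0x12, 0x34, 0x56});
        writeNull(row, zeroSlot);
        return row;
    }

    public static void main(String[] args) {
        byte[] freshNull = new byte[ROW_SIZE];
        writeNull(freshNull, true);

        System.out.println("null with zeroing matches fresh null row: "
            + Arrays.equals(freshNull, nullRowAfterValue(true)));
        System.out.println("null without zeroing matches fresh null row: "
            + Arrays.equals(freshNull, nullRowAfterValue(false)));
    }
}
```

Both rows represent a null decimal, but any byte-wise comparison or hash over the raw row bytes only treats them as equal when the slot is cleared.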
What changes were proposed in this pull request?
This PR makes two changes:
When writing a non-null decimal, we no longer zero out all 16 allocated bytes. If the decimal needs more than 8 bytes, the first 8 bytes do not need to be zeroed, because writing the decimal overwrites them.
When writing a null decimal, we no longer zero out the 16 allocated bytes. BitSetMethods.set marks the field as null and the length of the decimal is set to 0. Reading the decimal then never accesses the 16-byte region, so this is safe.
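The non-null case described above can be sketched as follows. This is one reading of the description, not the code in the patch: it assumes that values needing 8 bytes or fewer still clear the whole slot, and the class `PartialZeroSketch` and its methods are hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical comparison of full zeroing with the partial zeroing described for non-null decimals.
final class PartialZeroSketch {
    static final int SLOT_SIZE = 16;

    static byte[] toSlotBytes(BigInteger unscaled) {
        byte[] bytes = unscaled.toByteArray();
        if (bytes.length > SLOT_SIZE) {
            throw new IllegalArgumentException("value needs more than 16 bytes");
        }
        return bytes;
    }

    // Baseline: clear all 16 bytes, then copy the value in.
    static void writeFullZero(byte[] slot, BigInteger unscaled) {
        byte[] bytes = toSlotBytes(unscaled);
        Arrays.fill(slot, (byte) 0);
        System.arraycopy(bytes, 0, slot, 0, bytes.length);
    }

    // Assumed variant: skip clearing the first 8 bytes only when the value overwrites all of them.
    static void writePartialZero(byte[] slot, BigInteger unscaled) {
        byte[] bytes = toSlotBytes(unscaled);
        if (bytes.length > 8) {
            Arrays.fill(slot, 8, SLOT_SIZE, (byte) 0);
        } else {
            Arrays.fill(slot, (byte) 0);
        }
        System.arraycopy(bytes, 0, slot, 0, bytes.length);
    }

    // Starts from two dirty slots and checks that both strategies produce identical bytes.
    static boolean sameLayout(BigInteger unscaled) {
        byte[] full = new byte[SLOT_SIZE];
        byte[] partial = new byte[SLOT_SIZE];
        Arrays.fill(full, (byte) 0x7F);
        Arrays.fill(partial, (byte) 0x7F);
        writeFullZero(full, unscaled);
        writePartialZero(partial, unscaled);
        return Arrays.equals(full, partial);
    }

    public static void main(String[] args) {
        BigInteger[] samples = {
            BigInteger.valueOf(42),
            BigInteger.valueOf(Long.MAX_VALUE),
            BigInteger.ONE.shiftLeft(63),
            BigInteger.TEN.pow(38).subtract(BigInteger.ONE)
        };
        for (BigInteger sample : samples) {
            System.out.println(sample + " -> same layout: " + sameLayout(sample));
        }
    }
}
```

Under that reading, the non-null path leaves the same bytes as full zeroing, because every byte is either overwritten or cleared. The sketch says nothing about the null path, which leaves the slot untouched.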
How was this patch tested?
Existing test cases.