Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26001][SQL]Reduce memory copy when writing decimal #22998

Closed
wants to merge 1 commit into from

Conversation

heary-cao
Copy link
Contributor

What changes were proposed in this pull request?

this PR fix 2 here:

  • when writing non-null decimals, we not zero-out all the 16 allocated bytes. if the number of bytes needed for a decimal is greater than 8. then we not need zero-out between 0-byte and 8-byte. The first 8-byte will be covered when writing decimal.

  • when writing null decimals, we not zero-out all the 16 allocated bytes. BitSetMethods.set the label for null and the length of decimal to 0. when we get the decimal, will not access the 16 byte memory value, so this is safe.

How was this patch tested?

the existed test cases.

@heary-cao
Copy link
Contributor Author

cc @mgaido91, @dongjoon-hyun , @cloud-fan , @kiszk

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to me very similar to what was done in #20850. That caused correctness issues fixed in d7ae36a. The point is that in the same column, there may be decimals occupying less than 8 bytes and more then 16. So if you write a decimal with 16 bytes and then one with less than 8, with the current change the remaining 8 bytes would remain dirty. Hence I don't think we should do this as it may introduce correctness issues. Please correct me if my understanding is wrong.

@kiszk
Copy link
Member

kiszk commented Nov 10, 2018

I have two questions.

  1. Is this PR already tested with "SPARK-25538: zero-out all bits for decimals"?
  2. How does this PR achieve performance improvement? This PR may introduce some complication. We would like to know the trade-off between performance and ease of understanding.

@heary-cao
Copy link
Contributor Author

@mgaido91
thank you for review it. I added a test case to test "write a decimal with 16 bytes and then one with less than 8". then the current change the remaining 8 bytes would not dirty.

@heary-cao
Copy link
Contributor Author

heary-cao commented Nov 12, 2018

@kiszk thank you for review it.

  • when writing null decimals:
OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
iter length 1048576:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
before PR (input == null)                       51 /   56         20.4          49.0       1.0X
after PR (input == null)                         8 /    9        125.2           8.0       6.1X
  • when writing non-null decimals:
OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
iter length 1048576:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
before PR (input != null)                       52 /   53         20.3          49.2       1.0X
after PR (input != null)                        54 /   56         19.3          51.7       1.0X

@cloud-fan
Copy link
Contributor

I think this is wrong. We have to zero out the bytes even writing a null decimal, so that 2 unsafe rows with same values(including null values) are exactly same(in binary format).

@mgaido91
Copy link
Contributor

yes, I agree with @cloud-fan , this can create wrong results with nulls...

@heary-cao heary-cao closed this Nov 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants