-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Make decimal multiplication allow precision-loss in DataFusion #6103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| sum(l_extendedprice) as sum_base_price, | ||
| sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, | ||
| sum(cast(l_extendedprice as decimal(12,2)) * (1 - l_discount) * (1 + l_tax)) as sum_charge, | ||
| sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the additional cast now.
|
|
||
| pub(crate) fn multiply_dyn_decimal( | ||
| /// Remove this once arrow-rs provides `multiply_fixed_point_dyn`. | ||
| fn math_op_dict<K, T, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from arrow-rs. Once multiply_fixed_point_dyn is provided by arrow-rs. We can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any reference to multiply_fixed_point_dyn in arrow-rs -- is there an existing ticket?
https://github.com/search?q=repo%3Aapache%2Farrow-rs%20multiply_fixed_point_dyn&type=code
|
|
||
| /// Divide a decimal native value by given divisor and round the result. | ||
| /// Remove this once arrow-rs provides `multiply_fixed_point_dyn`. | ||
| fn divide_and_round<I>(input: I::Native, div: I::Native) -> I::Native |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| } | ||
|
|
||
| /// Remove this once arrow-rs provides `multiply_fixed_point_dyn`. | ||
| fn multiply_fixed_point_dyn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to submit this to arrow-rs. Having it here first to verify and show it work.
95f787d to
46203f1
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @viirya
The one thing I didn't see was tests (other than the change to the tpch.slt test)
I think there is value adding additional tests so that if we break this code accidentally in a future refactor the tests will fail too.
Could you please file / add a link to the upstream arrow-rs ticket that tracks adding multiply_fixed_point_dyn or whatever else is needed to remove the copy in DataFusion?
|
|
||
| pub(crate) fn multiply_dyn_decimal( | ||
| /// Remove this once arrow-rs provides `multiply_fixed_point_dyn`. | ||
| fn math_op_dict<K, T, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any reference to multiply_fixed_point_dyn in arrow-rs -- is there an existing ticket?
https://github.com/search?q=repo%3Aapache%2Farrow-rs%20multiply_fixed_point_dyn&type=code
| } | ||
| } | ||
|
|
||
| pub(crate) fn multiply_dyn_decimal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to eventually get ride of the decimal variants of the dyn kernels and simply use the mul_dyn
Yea, I have unit test for the kernel in the patch for arrow-rs. Let me also add it here too.
Yea, not filing a arrow-rs ticket yet, just first wanted to verify it pass all DataFusion existing tests. I will file one and link it here. |
5c9b304 to
36bdb44
Compare
|
The upstream ticket: apache/arrow-rs#4135 |
| let left = left.as_any().downcast_ref::<Decimal128Array>().unwrap(); | ||
| let right = right.as_any().downcast_ref::<Decimal128Array>().unwrap(); | ||
|
|
||
| Ok(multiply_fixed_point(left, right, required_scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused for a while why there is so much code for the dictionary case but the normal decimal case is calling multiply_fixed_point
ThenI see that part of the issue is the required precision / scale calculation is duplicated
https://github.com/apache/arrow-rs/blob/9fa8125fbe14a3a85b4995617945bda51ee3b055/arrow-arith/src/arithmetic.rs#L1508-L1528
I think this is good for DataFusion, and I'll comment on apache/arrow-rs#4136 about removing the duplication
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
|
Thanks @alamb |
Which issue does this PR close?
Closes #6068.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?