-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35942: [C++] Improve Decimal ToReal accuracy #36667
Conversation
|
|
|
|
|
@pitrou Would you mind having a look at this PR? Thanks! |
|
1 similar comment
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @js8544 ! This is a nice improvement.
Real x = RealTraits<Real>::two_to_64(static_cast<Real>(decimal.high_bits())); | ||
x += static_cast<Real>(decimal.low_bits()); | ||
x *= LargePowerOfTen<Real>(-scale); | ||
return x; | ||
} | ||
|
||
/// An appoximate conversion from Decimal128 to Real that guarantees: | ||
/// 1. If the decimal is an integer, the conversion is exact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the integer has more than 52 significant bits??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below
Here "exact value" means the closest representable value by Real.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, ok :-)
cpp/src/arrow/util/decimal_test.cc
Outdated
constexpr Real epsilon = 1.1920928955078125e-07f; // 2^-23 | ||
CheckDecimalToRealWithinEpsilon<Decimal, Real>( | ||
"112334829348925.99070703983306884765625", 23, epsilon, | ||
112334829348925.99070703983306884765625f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing this as a float
constant is weird. If you want to differentiate these tests between float
and double
, you may use if constexpr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The weird thing is even though it is parametrized over Real
, these tests were only run for float
. I kept it this way. Perhaps it's better to remove the Real parameter and make it clear that it's only for float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either that, or try to re-enable running them for double
. Are the tests redundant with other double
tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, double
is tested in TestDecimalToRealDouble::Precision. I added a static_assert at the beginning to make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the test parameterization is fake, then let's just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I've refactored this test as TestDecimalToRealFloat.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
CI failures are unrelated. |
### Rationale for this change The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode: ``` Real v = static_cast<Real>(decimal.as_int128/256()) return v * (10.0**-scale) ``` It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in apache#36602. ### What changes are included in this PR? Avoid storing the unscaled large int as float if the representation is not precise, by spliting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that: 1. If the decimal is an integer, the conversion is exact. 2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category. 3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value. Here "exact value" means the closest representable value by Real. I believe this algorithm is good enough, because an"exact" algorithm would require iterative multiplication and subtraction of decimals to determain the binary representation of its fractional part. Yet the result would still almost always be inaccurate because float/double can only accurately represent powers of two. IMHO It's not worth it to spend that many expensive operations just to improve the result by one ULP. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#35942 Lead-authored-by: Jin Shang <shangjin1997@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 245141e. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode: ``` Real v = static_cast<Real>(decimal.as_int128/256()) return v * (10.0**-scale) ``` It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in apache#36602. ### What changes are included in this PR? Avoid storing the unscaled large int as float if the representation is not precise, by spliting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that: 1. If the decimal is an integer, the conversion is exact. 2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category. 3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value. Here "exact value" means the closest representable value by Real. I believe this algorithm is good enough, because an"exact" algorithm would require iterative multiplication and subtraction of decimals to determain the binary representation of its fractional part. Yet the result would still almost always be inaccurate because float/double can only accurately represent powers of two. IMHO It's not worth it to spend that many expensive operations just to improve the result by one ULP. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#35942 Lead-authored-by: Jin Shang <shangjin1997@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
The current implementation of
Decimal::ToReal
can be naively represented as the following pseudocode:It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in #36602.
What changes are included in this PR?
Avoid storing the unscaled large int as float if the representation is not precise, by spliting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that:
Here "exact value" means the closest representable value by Real.
I believe this algorithm is good enough, because an"exact" algorithm would require iterative multiplication and subtraction of decimals to determain the binary representation of its fractional part. Yet the result would still almost always be inaccurate because float/double can only accurately represent powers of two. IMHO It's not worth it to spend that many expensive operations just to improve the result by one ULP.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.