-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] power_checked incorrectly returns NaN #36602
Comments
It was due to an inaccurate cast from decimal to float. >>> b = pa.array([7], type=pa.decimal128(38,18))
>>> pc.cast(b, pa.float64())
<pyarrow.lib.DoubleArray object at 0x7fbb78518d00>
[
7.000000000000001
] Powering a negative number to a non-integer number results in NaN. |
Thanks for checking, do you know when release 13.0 will be available? |
Would coming soon, 13.0.0 code freeze seems done yet, I guess it could be release about this month |
Actually, I was wrong. The PR I mentioned solves Float->Decimal, not Decimal->Float in your case. This is another issue #35942 which hasn't been resolved yet. I'll take a look at this issue today. |
### Rationale for this change The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode: ``` Real v = static_cast<Real>(decimal.as_int128/256()) return v * (10.0**-scale) ``` It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in #36602. ### What changes are included in this PR? Avoid storing the unscaled large int as float if the representation is not precise, by spliting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that: 1. If the decimal is an integer, the conversion is exact. 2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category. 3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value. Here "exact value" means the closest representable value by Real. I believe this algorithm is good enough, because an"exact" algorithm would require iterative multiplication and subtraction of decimals to determain the binary representation of its fractional part. Yet the result would still almost always be inaccurate because float/double can only accurately represent powers of two. IMHO It's not worth it to spend that many expensive operations just to improve the result by one ULP. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: #35942 Lead-authored-by: Jin Shang <shangjin1997@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
@rohanjain101 Hi, this was fixed in #36667, but will unfortunately only be available since pyarrow 14, which should be released in a few months. If you need this fix now, you can use the nightly pyarrow package. |
### Rationale for this change The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode: ``` Real v = static_cast<Real>(decimal.as_int128/256()) return v * (10.0**-scale) ``` It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in apache#36602. ### What changes are included in this PR? Avoid storing the unscaled large int as float if the representation is not precise, by spliting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that: 1. If the decimal is an integer, the conversion is exact. 2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category. 3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value. Here "exact value" means the closest representable value by Real. I believe this algorithm is good enough, because an"exact" algorithm would require iterative multiplication and subtraction of decimals to determain the binary representation of its fractional part. Yet the result would still almost always be inaccurate because float/double can only accurately represent powers of two. IMHO It's not worth it to spend that many expensive operations just to improve the result by one ULP. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#35942 Lead-authored-by: Jin Shang <shangjin1997@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode: ``` Real v = static_cast<Real>(decimal.as_int128/256()) return v * (10.0**-scale) ``` It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in apache#36602. ### What changes are included in this PR? Avoid storing the unscaled large int as float if the representation is not precise, by spliting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that: 1. If the decimal is an integer, the conversion is exact. 2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category. 3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value. Here "exact value" means the closest representable value by Real. I believe this algorithm is good enough, because an"exact" algorithm would require iterative multiplication and subtraction of decimals to determain the binary representation of its fractional part. Yet the result would still almost always be inaccurate because float/double can only accurately represent powers of two. IMHO It's not worth it to spend that many expensive operations just to improve the result by one ULP. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#35942 Lead-authored-by: Jin Shang <shangjin1997@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the bug, including details regarding any error messages, version, and platform.
I would expect this to return -300124211606973, not NaN. If B is re-scaled, then it returns correct result:
Component(s)
Python
The text was updated successfully, but these errors were encountered: