
[C++] power_checked incorrectly returns NaN #36602

Closed
rohanjain101 opened this issue Jul 10, 2023 · 5 comments

@rohanjain101

Describe the bug, including details regarding any error messages, version, and platform.

>>> a = pa.array([-117], type=pa.int8())
>>> b = pa.array([7], type=pa.decimal128(38,18))
>>> pa.compute.power_checked(a, b)
<pyarrow.lib.DoubleArray object at 0x000001F398AF9F00>
[
  nan
]
>>>

I would expect this to return -300124211606973, not NaN. If b is rescaled, the correct result is returned:

>>> b = pa.array([7], type=pa.decimal128(3,2))
>>> pa.compute.power_checked(a, b)
<pyarrow.lib.DoubleArray object at 0x000001F398AF8DC0>
[
  -3.00124211606973e+14
]
>>>
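For reference, the expected value can be checked with Python's exact arbitrary-precision integer arithmetic:

```python
# (-117) ** 7 computed exactly, with no floating-point rounding involved
print((-117) ** 7)  # -300124211606973
```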

Component(s)

Python

@westonpace westonpace changed the title power_checked incorrectly returns NaN [C++] power_checked incorrectly returns NaN Jul 11, 2023
@js8544 (Collaborator) commented Jul 12, 2023

It was due to an inaccurate cast from decimal to float.

>>> b = pa.array([7], type=pa.decimal128(38,18))
>>> pc.cast(b, pa.float64())
<pyarrow.lib.DoubleArray object at 0x7fbb78518d00>
[
  7.000000000000001
]

Raising a negative number to a non-integer power results in NaN.
The casting issue is solved by #35997, so this error won't happen anymore starting from release 13.0.
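To illustrate the domain rule in play, the following sketch uses Python's `math.pow`, which raises `ValueError` in the same situation where C's `pow` (and hence Arrow's power kernel) returns NaN:

```python
import math

# A negative base with a non-integer exponent is outside pow's real domain.
# C's pow returns NaN here; Python's math.pow raises ValueError instead.
bad_exponent = 7.000000000000001  # the inaccurately cast exponent from above

try:
    math.pow(-117.0, bad_exponent)
except ValueError:
    print("domain error: negative base, non-integer exponent")

# With the exact integer-valued exponent the result is well-defined
# (300124211606973 fits in a double's 53-bit mantissa, so it is exact):
print(math.pow(-117.0, 7.0))  # -300124211606973.0
```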

@rohanjain101 (Author)

Thanks for checking, do you know when release 13.0 will be available?

@mapleFU (Member) commented Jul 13, 2023

It should be coming soon. The 13.0.0 code freeze seems to be done, so I'd guess it could be released sometime this month.

@js8544 (Collaborator) commented Jul 13, 2023

> Thanks for checking, do you know when release 13.0 will be available?

Actually, I was wrong. The PR I mentioned fixes Float->Decimal, not Decimal->Float as in your case. That is a separate issue, #35942, which hasn't been resolved yet. I'll take a look at it today.

pitrou added a commit that referenced this issue Jul 18, 2023
### Rationale for this change

The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode:
```
Real v = static_cast<Real>(decimal.as_int128/256())
return v * (10.0**-scale)
```
It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in #36602.

### What changes are included in this PR?

Avoid storing the unscaled large int as float if the representation is not precise, by splitting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that:
1. If the decimal is an integer, the conversion is exact.
2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category.
3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value.

Here "exact value" means the closest representable value by Real.

I believe this algorithm is good enough, because an "exact" algorithm would require iterative multiplication and subtraction of decimals to determine the binary representation of its fractional part. Yet the result would still almost always be inaccurate, because float/double can only accurately represent powers of two. IMHO it's not worth spending that many expensive operations just to improve the result by one ULP.
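The split described above can be sketched in Python (a toy model of the approach, not Arrow's actual C++ implementation; the exactness claim only holds when the integral part fits the mantissa):

```python
def decimal_to_real(unscaled: int, scale: int) -> float:
    """Toy sketch of the integral/fractional split: the large unscaled
    integer is never converted to a float directly."""
    ten_pow = 10 ** scale
    integral, frac = divmod(abs(unscaled), ten_pow)
    # The integral part converts exactly whenever it fits the mantissa;
    # only the fractional part incurs rounding error.
    result = float(integral) + float(frac) / float(ten_pow)
    return -result if unscaled < 0 else result

# decimal128(38, 18) value 7 has unscaled representation 7 * 10**18,
# far too large for a double mantissa, yet the split converts it exactly:
print(decimal_to_real(7 * 10**18, 18))  # 7.0
```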

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: #35942 

Lead-authored-by: Jin Shang <shangjin1997@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@js8544 (Collaborator) commented Jul 19, 2023

@rohanjain101 Hi, this was fixed in #36667, but will unfortunately only be available starting with pyarrow 14, which should be released in a few months. If you need the fix now, you can use the nightly pyarrow package.

@pitrou closed this as not planned Jul 19, 2023
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this issue Jul 20, 2023
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023