
[C++] Decimal-to-real accuracy loss / rounding issue #35942

Closed
pitrou opened this issue Jun 6, 2023 · 1 comment · Fixed by #36667

pitrou commented Jun 6, 2023

Describe the bug, including details regarding any error messages, version, and platform.

It seems the expected result from converting a decimal to real may be off by one ULP:

>>> import math
>>> import pyarrow as pa
>>> a = pa.array(['999999999999999.9']).cast(pa.decimal128(17, 1))
>>> a
<pyarrow.lib.Decimal128Array object at 0x7f8ec6f3f6a0>
[
  999999999999999.9
]
>>> a.cast(pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f8fa4a88ee0>
[
  1e+15
]

>>> expected = 999999999999999.9
>>> actual = a.cast(pa.float64())[0].as_py()
>>> abs(expected-actual)/expected
1.25e-16
>>> abs(expected-actual) == expected - math.nextafter(expected, 0)
True  # Exactly one ULP

This is much less severe than #35576.
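The same one-ULP discrepancy can be reproduced without pyarrow by emulating the conversion in plain Python (cast the unscaled integer to a double first, then scale down); a minimal sketch, assuming this mirrors what the cast does internally:

```python
import math

# Emulate the naive decimal-to-double conversion for 999999999999999.9
# stored as decimal128(17, 1): unscaled integer 9999999999999999, scale 1.
unscaled, scale = 9999999999999999, 1
naive = float(unscaled) * 10.0 ** -scale      # cast first, then scale down

exact = 999999999999999.9                     # nearest double to the decimal value
print(naive)                                  # 1e+15, as in the cast above
print(abs(naive - exact) == math.ulp(exact))  # True: off by exactly one ULP
```

The cast of `9999999999999999` to double already rounds up to `1e16` (it exceeds 2^53), so the subsequent scaling cannot recover the lost digit.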

Component(s)

C++

js8544 commented Jul 17, 2023

take

pitrou added a commit that referenced this issue Jul 18, 2023
### Rationale for this change

The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode:
```
Real v = static_cast<Real>(decimal.as_int128_or_256())  // cast the whole unscaled integer
return v * (10.0 ** -scale)                             // then scale down by a power of ten
```
It stores the intermediate unscaled int128/256 value as a float/double. That unscaled value can be very large when the decimal has a large scale, which causes precision issues such as in #36602.

### What changes are included in this PR?

Avoid storing the large unscaled integer as a float when that representation is not exact, by splitting the decimal into integral and fractional parts and handling them separately. This algorithm guarantees that:
1. If the decimal is an integer, the conversion is exact.
2. If the number of fractional digits is <= `RealTraits<Real>::kMantissaDigits` (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example, `Decimal128::ToReal<float>(9999.999)` falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would fall into the next category.
3. Otherwise, the conversion is within 2^(-`RealTraits<Real>::kMantissaDigits`+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value.

Here "exact value" means the closest value representable by `Real`.

I believe this algorithm is good enough, because an "exact" algorithm would require iterative multiplication and subtraction of decimals to determine the binary representation of the fractional part. Yet the result would still almost always be inexact, because float/double can only represent powers of two exactly. IMHO it's not worth spending that many expensive operations just to improve the result by one ULP.
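The split approach described above can be illustrated with a rough Python stand-in (the helper name and details here are hypothetical, not the actual C++ implementation):

```python
def split_to_real(unscaled: int, scale: int) -> float:
    """Convert a decimal (unscaled integer + scale) to float by handling
    the integral and fractional parts separately, instead of casting the
    whole unscaled integer to float first."""
    sign = -1.0 if unscaled < 0 else 1.0
    integral, frac = divmod(abs(unscaled), 10 ** scale)
    # The integral part converts exactly whenever it fits in the mantissa;
    # the fractional part is < 1, so its rounding error stays small.
    return sign * (float(integral) + float(frac) / float(10 ** scale))

# 999999999999999.9 as decimal128(17, 1): correctly rounded here,
# instead of the 1e+15 produced by the naive conversion.
print(split_to_real(9999999999999999, 1))
```

In this case the integral part (999999999999999) fits in a double's 53-bit mantissa, so only the small fractional term contributes any rounding error.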

### Are these changes tested?

Yes.

### Are there any user-facing changes?
No.

* Closes: #35942 

Lead-authored-by: Jin Shang <shangjin1997@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 14.0.0 milestone Jul 18, 2023
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this issue Jul 20, 2023
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023