
[C++] Decimal-to-real accuracy loss / rounding issue #35942

Closed
pitrou opened this issue Jun 6, 2023 · 1 comment · Fixed by #36667

pitrou commented Jun 6, 2023

Describe the bug, including details regarding any error messages, version, and platform.

It seems the expected result from converting a decimal to real may be off by one ULP:

>>> import math
>>> import pyarrow as pa
>>> a = pa.array(['999999999999999.9']).cast(pa.decimal128(17, 1))
>>> a
<pyarrow.lib.Decimal128Array object at 0x7f8ec6f3f6a0>
[
  999999999999999.9
]
>>> a.cast(pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f8fa4a88ee0>
[
  1e+15
]

>>> expected = 999999999999999.9
>>> actual = a.cast(pa.float64())[0].as_py()
>>> abs(expected-actual)/expected
1.25e-16
>>> abs(expected-actual) == expected - math.nextafter(expected, 0)
True  # Exactly one ULP

This is much less severe than #35576.
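The same one-ULP discrepancy can be reproduced without pyarrow by emulating the conversion in plain Python (cast the unscaled integer to a double first, then scale down); a minimal sketch, assuming this mirrors what the cast does internally:

```python
import math

# Emulate the naive decimal-to-double conversion for 999999999999999.9
# stored as decimal128(17, 1): unscaled integer 9999999999999999, scale 1.
unscaled, scale = 9999999999999999, 1
naive = float(unscaled) * 10.0 ** -scale      # cast first, then scale down

exact = 999999999999999.9                     # nearest double to the decimal value
print(naive)                                  # 1e+15, as in the cast above
print(abs(naive - exact) == math.ulp(exact))  # True: off by exactly one ULP
```

The cast of `9999999999999999` to double already rounds up to `1e16` (it exceeds 2^53), so the subsequent scaling cannot recover the lost digit.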

Component(s)

C++

js8544 commented Jul 17, 2023

take

pitrou added a commit that referenced this issue Jul 18, 2023
### Rationale for this change

The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode:
```
Real v = static_cast<Real>(decimal.as_int128_or_256())  // cast the whole unscaled integer
return v * (10.0 ** -scale)                             // then scale down by a power of ten
```
It stores the intermediate unscaled int128/256 value as a float/double. That unscaled value can be very large when the decimal has a large scale, which causes precision issues such as in #36602.

### What changes are included in this PR?

Avoid storing the large unscaled integer as a float when that representation is not exact, by splitting the decimal into integral and fractional parts and handling them separately. This algorithm guarantees that:
1. If the decimal is an integer, the conversion is exact.
2. If the number of fractional digits is <= `RealTraits<Real>::kMantissaDigits` (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example, `Decimal128::ToReal<float>(9999.999)` falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would fall into the next category.
3. Otherwise, the conversion is within 2^(-`RealTraits<Real>::kMantissaDigits`+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value.

Here "exact value" means the closest value representable by `Real`.

I believe this algorithm is good enough, because an "exact" algorithm would require iterative multiplication and subtraction of decimals to determine the binary representation of the fractional part. Yet the result would still almost always be inexact, because float/double can only represent powers of two exactly. IMHO it's not worth spending that many expensive operations just to improve the result by one ULP.
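The split approach described above can be illustrated with a rough Python stand-in (the helper name and details here are hypothetical, not the actual C++ implementation):

```python
def split_to_real(unscaled: int, scale: int) -> float:
    """Convert a decimal (unscaled integer + scale) to float by handling
    the integral and fractional parts separately, instead of casting the
    whole unscaled integer to float first."""
    sign = -1.0 if unscaled < 0 else 1.0
    integral, frac = divmod(abs(unscaled), 10 ** scale)
    # The integral part converts exactly whenever it fits in the mantissa;
    # the fractional part is < 1, so its rounding error stays small.
    return sign * (float(integral) + float(frac) / float(10 ** scale))

# 999999999999999.9 as decimal128(17, 1): correctly rounded here,
# instead of the 1e+15 produced by the naive conversion.
print(split_to_real(9999999999999999, 1))
```

In this case the integral part (999999999999999) fits in a double's 53-bit mantissa, so only the small fractional term contributes any rounding error.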

### Are these changes tested?

Yes.

### Are there any user-facing changes?
No.

* Closes: #35942 

Lead-authored-by: Jin Shang <shangjin1997@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 14.0.0 milestone Jul 18, 2023
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this issue Jul 20, 2023
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023