
[C++] power_checked incorrectly returns NaN #36602

Closed
rohanjain101 opened this issue Jul 10, 2023 · 5 comments

@rohanjain101

Describe the bug, including details regarding any error messages, version, and platform.

>>> a = pa.array([-117], type=pa.int8())
>>> b = pa.array([7], type=pa.decimal128(38,18))
>>> pa.compute.power_checked(a, b)
<pyarrow.lib.DoubleArray object at 0x000001F398AF9F00>
[
  nan
]
>>>

I would expect this to return -300124211606973, not NaN. If b is rescaled, the correct result is returned:

>>> b = pa.array([7], type=pa.decimal128(3,2))
>>> pa.compute.power_checked(a, b)
<pyarrow.lib.DoubleArray object at 0x000001F398AF8DC0>
[
  -3.00124211606973e+14
]
>>>
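For reference, the expected value can be checked with Python's exact arbitrary-precision integer arithmetic:

```python
# (-117) ** 7 computed exactly, with no floating-point rounding involved
print((-117) ** 7)  # -300124211606973
```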

Component(s)

Python

@westonpace westonpace changed the title power_checked incorrectly returns NaN [C++] power_checked incorrectly returns NaN Jul 11, 2023
@js8544 (Collaborator) commented Jul 12, 2023

It was due to an inaccurate cast from decimal to float.

>>> b = pa.array([7], type=pa.decimal128(38,18))
>>> pc.cast(b, pa.float64())
<pyarrow.lib.DoubleArray object at 0x7fbb78518d00>
[
  7.000000000000001
]

Raising a negative number to a non-integer power results in NaN.
The casting issue is solved by #35997, so this error won't happen anymore starting from release 13.0.
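To illustrate the domain rule in play, the following sketch uses Python's `math.pow`, which raises `ValueError` in the same situation where C's `pow` (and hence Arrow's power kernel) returns NaN:

```python
import math

# A negative base with a non-integer exponent is outside pow's real domain.
# C's pow returns NaN here; Python's math.pow raises ValueError instead.
bad_exponent = 7.000000000000001  # the inaccurately cast exponent from above

try:
    math.pow(-117.0, bad_exponent)
except ValueError:
    print("domain error: negative base, non-integer exponent")

# With the exact integer-valued exponent the result is well-defined
# (300124211606973 fits in a double's 53-bit mantissa, so it is exact):
print(math.pow(-117.0, 7.0))  # -300124211606973.0
```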

@rohanjain101 (Author)

Thanks for checking, do you know when release 13.0 will be available?

@mapleFU (Member) commented Jul 13, 2023

It should be coming soon. The 13.0.0 code freeze seems to be done, so I'd guess it could be released sometime this month.

@js8544 (Collaborator) commented Jul 13, 2023

> Thanks for checking, do you know when release 13.0 will be available?

Actually, I was wrong. The PR I mentioned fixes Float->Decimal, not Decimal->Float as in your case. That is a separate issue, #35942, which hasn't been resolved yet. I'll take a look at it today.

pitrou added a commit that referenced this issue Jul 18, 2023
### Rationale for this change

The current implementation of `Decimal::ToReal` can be naively represented as the following pseudocode:
```
Real v = static_cast<Real>(decimal.as_int128/256())
return v * (10.0**-scale)
```
It stores the intermediate unscaled int128/256 value as a float/double. The unscaled int128/256 value can be very large when the decimal has a large scale, which causes precision issues such as in #36602.

### What changes are included in this PR?

Avoid storing the unscaled large int as float if the representation is not precise, by splitting the decimal into integral and fractional parts and dealing with them separately. This algorithm guarantees that:
1. If the decimal is an integer, the conversion is exact.
2. If the number of fractional digits is <= RealTraits<Real>::kMantissaDigits (e.g. 8 for float and 16 for double), the conversion is within 1 ULP of the exact value. For example Decimal128::ToReal<float>(9999.999) falls into this category because the integer 9999999 is precisely representable by float, whereas 9999.9999 would be in the next category.
3. Otherwise, the conversion is within 2^(-RealTraits<Real>::kMantissaDigits+1) (e.g. 2^-23 for float and 2^-52 for double) of the exact value.

Here "exact value" means the closest representable value by Real.

I believe this algorithm is good enough, because an "exact" algorithm would require iterative multiplication and subtraction of decimals to determine the binary representation of its fractional part. Yet the result would still almost always be inaccurate, because float/double can only accurately represent powers of two. IMHO it's not worth spending that many expensive operations just to improve the result by one ULP.
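The split described above can be sketched in Python (a toy model of the approach, not Arrow's actual C++ implementation; the exactness claim only holds when the integral part fits the mantissa):

```python
def decimal_to_real(unscaled: int, scale: int) -> float:
    """Toy sketch of the integral/fractional split: the large unscaled
    integer is never converted to a float directly."""
    ten_pow = 10 ** scale
    integral, frac = divmod(abs(unscaled), ten_pow)
    # The integral part converts exactly whenever it fits the mantissa;
    # only the fractional part incurs rounding error.
    result = float(integral) + float(frac) / float(ten_pow)
    return -result if unscaled < 0 else result

# decimal128(38, 18) value 7 has unscaled representation 7 * 10**18,
# far too large for a double mantissa, yet the split converts it exactly:
print(decimal_to_real(7 * 10**18, 18))  # 7.0
```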

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: #35942 

Lead-authored-by: Jin Shang <shangjin1997@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@js8544 (Collaborator) commented Jul 19, 2023

@rohanjain101 Hi, this was fixed in #36667, but will unfortunately only be available starting with pyarrow 14, which should be released in a few months. If you need the fix now, you can use the nightly pyarrow package.

@pitrou closed this as not planned Jul 19, 2023
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this issue Jul 20, 2023
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023