Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precision-Loss Decimal Arithmetic #4664

Closed
tustvold opened this issue Aug 7, 2023 · 2 comments
Closed

Precision-Loss Decimal Arithmetic #4664

tustvold opened this issue Aug 7, 2023 · 2 comments
Assignees
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented Aug 7, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently decimal arithmetic may return an overflow error if an arithmetic operation produces an output that exceeds the maximum precision of the storage type.

Describe the solution you'd like

Spark and similar systems instead truncate the precision of the output, if the precision would exceed the maximum of the storage type - https://github.com/apache/arrow/blob/36ddbb531cac9b9e512dfa3776d1d64db588209f/java/gandiva/src/main/java/org/apache/arrow/gandiva/evaluator/DecimalTypeUtil.java#L83

To achieve this the arithmetic kernels would need to be updated to detect this case, and instead perform arithmetic on double-width decimal primitives, before then truncating this down to an appropriate scale/precision.

Describe alternatives you've considered

We could continue to return an error, but this is likely to be surprising for users.

Additional context

#4640 proposes increasing the decimal output scale, which in turn would make this error case more likely.

The fundamental primitive necessary for this is, N-digit division, which will likely be implemented as part of #4663

@tustvold
Copy link
Contributor Author

tustvold commented Aug 12, 2023

Having spent a significant amount of time working on this I'm inclined to not move forward with this.

  • Decimal256 supports up to 76 decimal digits, which is more than most commercial systems, and makes me think supporting precision loss arithmetic, i.e. arithmetic with even greater precision than this, is of limited practical utility:
    • MySQL - 65
    • MariaDB - 65
    • SQL Server - 38
    • DuckDB - 38
    • Oracle - 38
    • Redshift - 38
    • Spark - 38
    • Postgres - 131072!! (postgres decimals appear to be rather special)
  • Silently truncating precision is potentially surprising and is inconsistent with the rest of the arithmetic kernels
  • Users can easily increase/decrease the precision of a calculation by casting the inputs
  • Performant precision-loss arithmetic is extremely complex
  • Ensuring good test coverage is very hard given the size of the numbers involved
  • I personally find it hard to justify spending significant amounts of time on decimals when they don't show up in my workloads

TLDR the current decimal support, whilst likely not perfect, covers most bases, and is at least as feature-full as many commercial DBMSs, and I therefore struggle to justify spending more time working on it

@tustvold tustvold closed this as not planned Won't fix, can't repro, duplicate, stale Aug 12, 2023
@tustvold tustvold added the arrow Changes to the arrow crate label Aug 15, 2023
@tustvold
Copy link
Contributor Author

label_issue.py automatically added labels {'arrow'} from #4672

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

1 participant