-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decimal enhancements in arrow-cast #5068
Comments
Decimals are rounded when casting between scales, so one option might be to parse using 1 higher scale and then cast. This might be simpler than supporting rounding whilst parsing, which is likely to be complex to achieve without regressing performance.
Correct, if CastOptions::safe is set to true (the default) values that fail to parse will be treated as nulls |
Ah, missed that. Thank you! |
My memory is hazy, but I seem to remember the current logic eliminates the need for overflow/underflow detection, or precision verification. Provided we can show we aren't massively regressing performance of decimal parsing, I don't see an issue with changing this behaviour as you describe. |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There are currently two different functions in arrow-cast which convert strings to decimals.
parse_string_to_decimal_native
is used by arrow-cast when casting an entire array from a string type to a decimal type.parse_decimal
is used by arrow-csv and arrow-json. The inputs differ a bit, but the return types are identical. The latter accepts an additional precision argument, however when the former is used it is followed up by a call tovalidate_decimal_precision
for the desired type. The former rounds values when input exceeds scale while the latter truncates. Neither function supports parsing scientific notation.Describe the solution you'd like
Describe alternatives you've considered
N/A
Additional context
I stumbled upon this when using arrow-json reader to build a record batch. I was getting errors when my decimal values contained scientific notation and noticed
parse_decimal
doesn't support it, despite scientific notation being valid for JSON numbers. When I tried forcing expanded notation I found the excess scale was being truncated instead of rounded. I then tried building my column using a string array and casting that to a decimal type. Instead of erroring on scientific notation the value was being treated asnull
. Expanded values did work and were rounded.The text was updated successfully, but these errors were encountered: