Decimal enhancements in arrow-cast #5068

ryanaston · 2023-11-12T15:48:20Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There are currently two different functions in arrow-cast which convert strings to decimals. parse_string_to_decimal_native is used by arrow-cast when casting an entire array from a string type to a decimal type. parse_decimal is used by arrow-csv and arrow-json. The inputs differ a bit, but the return types are identical. The latter accepts an additional precision argument, however when the former is used it is followed up by a call to validate_decimal_precision for the desired type. The former rounds values when input exceeds scale while the latter truncates. Neither function supports parsing scientific notation.

Describe the solution you'd like

Consolidate into a single, consistent function
Support scientific notation input
Default to "half up" rounding
Stretch goal: allow for configurable rounding (maybe as a CastOption?)

Describe alternatives you've considered
N/A

Additional context
I stumbled upon this when using arrow-json reader to build a record batch. I was getting errors when my decimal values contained scientific notation and noticed parse_decimal doesn't support it, despite scientific notation being valid for JSON numbers. When I tried forcing expanded notation I found the excess scale was being truncated instead of rounded. I then tried building my column using a string array and casting that to a decimal type. Instead of erroring on scientific notation the value was being treated as null. Expanded values did work and were rounded.

The text was updated successfully, but these errors were encountered:

tustvold · 2023-11-12T18:13:13Z

Decimals are rounded when casting between scales, so one option might be to parse using 1 higher scale and then cast. This might be simpler than supporting rounding whilst parsing, which is likely to be complex to achieve without regressing performance.

Instead of erroring on scientific notation the value was being treated as null

Correct, if CastOptions::safe is set to true (the default) values that fail to parse will be treated as nulls

ryanaston · 2023-11-12T19:13:37Z

Decimals are rounded when casting between scales, so one option might be to parse using 1 higher scale and then cast. This might be simpler than supporting rounding whilst parsing, which is likely to be complex to achieve without regressing performance.

parse_decimal already looks at every character in the string, even those beyond the desired scale to make sure they are valid characters for a decimal. Could it simply examine the character at scale + 1 and do an add_wrapping (for positives) or a sub_wrapping (for negatives)? Rounding is already present in parse_string_to_decimal_native and does it this way. Why would rounding be a performance concern in one but not the other?

Instead of erroring on scientific notation the value was being treated as null

Correct, if CastOptions::safe is set to true (the default) values that fail to parse will be treated as nulls

Ah, missed that. Thank you!

tustvold · 2023-11-12T19:22:41Z

Could it simply examine the character at scale + 1 and do an add_wrapping (for positives) or a sub_wrapping (for negatives)

My memory is hazy, but I seem to remember the current logic eliminates the need for overflow/underflow detection, or precision verification. Provided we can show we aren't massively regressing performance of decimal parsing, I don't see an issue with changing this behaviour as you describe.

ryanaston added the enhancement Any new improvement worthy of a entry in the changelog label Nov 12, 2023

ryanaston mentioned this issue Nov 12, 2023

Unable to write high-precision decimal values to Delta table using serde_json/JsonWriter delta-io/delta-rs#1778

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decimal enhancements in arrow-cast #5068

Decimal enhancements in arrow-cast #5068

ryanaston commented Nov 12, 2023 •

edited

Loading

tustvold commented Nov 12, 2023 •

edited

Loading

ryanaston commented Nov 12, 2023

tustvold commented Nov 12, 2023

Decimal enhancements in arrow-cast #5068

Decimal enhancements in arrow-cast #5068

Comments

ryanaston commented Nov 12, 2023 • edited Loading

tustvold commented Nov 12, 2023 • edited Loading

ryanaston commented Nov 12, 2023

tustvold commented Nov 12, 2023

ryanaston commented Nov 12, 2023 •

edited

Loading

tustvold commented Nov 12, 2023 •

edited

Loading