Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add missing precision overflow checking for cast_string_to_decimal #4830

Merged
merged 2 commits into from
Sep 25, 2023

Conversation

jonahgao
Copy link
Member

Which issue does this PR close?

Closes #4829.

Rationale for this change

Perform a precision overflow check when casting from a string to a decimal.

What changes are included in this PR?

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 18, 2023
@jonahgao jonahgao changed the title fix: add missing precision overflow checking for `cast_string_to_deci… fix: add missing precision overflow checking for cast_string_to_decimal Sep 18, 2023
Comment on lines +2804 to +2808
.and_then(|v| {
T::validate_decimal_precision(v, precision)
.is_ok()
.then_some(v)
})
Copy link
Member

@viirya viirya Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the original idea is to avoid this validation and leave the decision to the caller. But after several rounds of revamp on decimal, I'm not sure if that design idea is still kept and we want to add validation now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya I noticed that casting from integers has this validation, which is inconsistent with the behavior of casting from strings.

There is an intuitive example in Datafusion.

DataFusion CLI v31.0.0
❯ select cast(1000 as decimal(10,8));
Optimizer rule 'simplify_expressions' failed
caused by
Arrow error: Invalid argument error: 100000000000 is too large to store in a Decimal128 of precision 10. Max is 9999999999

❯ select cast('1000' as decimal(10,8));
+--------------+
| Utf8("1000") |
+--------------+
| 10.00000000  |
+--------------+

If they could have a unified behavior, it would be preferable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly here, casting in general does tend to check for overflow, and given it is parsing a string is unlikely to massively regress performance. I defer to you @viirya

@@ -8097,6 +8106,32 @@ mod tests {
test_cast_string_to_decimal(array);
}

#[test]
fn test_cast_string_to_decimal128_precision_overflow() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast_string_to_decimal is not only for decimal128 but also for decimal256, do you want to add a test for decimal256 too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya I think it is necessary, and a new test has been added for decimal256.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@tustvold tustvold merged commit 7e7ac15 into apache:master Sep 25, 2023
25 checks passed
ryanaston pushed a commit to segmentio/arrow-rs that referenced this pull request Nov 6, 2023
…mal` (apache#4830)

* fix: add missing precision overflow checking for `cast_string_to_decimal`

* Add test_cast_string_to_decimal256_precision_overflow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cast_string_to_decimal should check precision overflow
3 participants