Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Possible Bug] - Display panics for large numbers #108

Closed
DanielBauman88 opened this issue Jul 19, 2023 · 10 comments
Closed

[Possible Bug] - Display panics for large numbers #108

DanielBauman88 opened this issue Jul 19, 2023 · 10 comments
Labels

Comments

@DanielBauman88
Copy link
Contributor

DanielBauman88 commented Jul 19, 2023

Is this intentional? I think it'd be an improvement to avoid panics in display, and simply use a different display format for numbers that can't be displayed with the standard implementation.

EG: the following test will panic

    #[test]
    fn test_big_scale() {
        let min: i64 = i64::MIN;
        let str = &format!("1E{min}");
        BigDecimal::from_str(str).unwrap().to_string();
    }

Will panic with "attempt to subtract with overflow"

Same for example for

    #[test]
    fn test_small_scale() {
        let max: i64 = i64::MAX;
        let str = &format!("1E{}", max);
        BigDecimal::from_str(str).unwrap().to_string();
    }
@akubera
Copy link
Owner

akubera commented Jul 20, 2023

Ok: edge case comes from subtracting the length of the "decimal" string from the scale, counting how many digits should come before the decimal point:

let location = abs_int.len() as i64 - self.scale;

Interesting, I wrote this unit test

        let dec = BigDecimal::new(1.into(), stdlib::i64::MAX);
        let expected = format!("1E{}", stdlib::i64::MAX);
        assert_eq!(dec.to_string(), expected);

and got memory allocation of 9223372036854775806 bytes failed

So... that's a bad sign.

@akubera
Copy link
Owner

akubera commented Jul 20, 2023

It boils down to we don't support exponential syntax? In which case, anything close to i64 min or max (or even the 4GB of i32 min/max) will cause problems.

I don't know what heuristic to use for formatting decimals in exponential-form vs standard-form.

Definitely a bug that needs fixed, but I'm not too worried about it as it's been there for 6 years. It's on the todo list

@akubera akubera added the bug label Jul 20, 2023
@DanielBauman88
Copy link
Contributor Author

DanielBauman88 commented Jul 20, 2023

Another issue here which isn't related to the edge case, it's not obvious that display will allocate so much memory for large numbers.

EG - try those examples just below the limit. Display does huge allocations.

    #[test]
    fn test_big_scale() {
        let max: i64 = i64::MAX - 2;
        let str = &format!("1E{}", max);
        BigDecimal::from_str(str).unwrap().to_string();
    }

The to_string call here allocates 9223372036854775805 bytes of memory
https://github.com/akubera/bigdecimal-rs/blob/main/src/lib.rs#L1694-L1694

Doesn't seem necessary for displaying. Would be better to have some limit where you display in exponential form.

edit: I see now that you came across the same problem here - #108 (comment)

akubera pushed a commit that referenced this issue Jan 17, 2024
**Description**
Add a second path to the `Display` impl so that numbers that have a
large scale relative to length of their integer representation are
formatted differently. Specifically, I switched it to use
exponential number format like:
```
1.23456789E38
```

instead of
```
123456789000000000000000000000000000000
```

**Motivation**

If you extrapolate from the example above, and also looking at issue
#108, you could see
that with large exponents you run the risk of large allocations,
since the resulting bytestring is proportional in size to the
exponent.

Any software using the `bigdecimal` crate and converting unprotected
uer input into the `BigDecimal` type runs the risk of allocating
way too much and potentially crashing, if they happen to use either
the `Debug` or `Display` impls.

This change is useful because it bounds the output and prevents
unexpected large allocations when formatting the `BigDecimal` type

**Testing Done**
`cargo test && ./scripts/bigdecimal-property-tests test`

**Backwards Compatibility Criteria**

I believe that this does not constitue a backwards incompatible
change because the `BigDecimal::from_str` method already supports
exponential format numbers as input.
akubera pushed a commit that referenced this issue Jan 18, 2024
**Description**
Add a second path to the `Display` impl so that numbers that have a
large scale relative to length of their integer representation are
formatted differently. Specifically, I switched it to use
exponential number format like:
```
1.23456789E38
```

instead of
```
123456789000000000000000000000000000000
```

**Motivation**

If you extrapolate from the example above, and also looking at issue
#108, you could see
that with large exponents you run the risk of large allocations,
since the resulting bytestring is proportional in size to the
exponent.

Any software using the `bigdecimal` crate and converting unprotected
uer input into the `BigDecimal` type runs the risk of allocating
way too much and potentially crashing, if they happen to use either
the `Debug` or `Display` impls.

This change is useful because it bounds the output and prevents
unexpected large allocations when formatting the `BigDecimal` type

**Testing Done**
`cargo test && ./scripts/bigdecimal-property-tests test`

**Backwards Compatibility Criteria**

I believe that this does not constitue a backwards incompatible
change because the `BigDecimal::from_str` method already supports
exponential format numbers as input.
@ddimaria
Copy link

We're running into the same issue in production when converting a large BigDecimal to a string. Design-wise, the to_string() method should return a result or there should be different ways of displaying the stringified BigDecimal that avoids panics. Are you open to a PR to fix this?

@akubera
Copy link
Owner

akubera commented Feb 22, 2024

Does the branch in MR #121 fix the issue?

@akubera
Copy link
Owner

akubera commented Feb 22, 2024

It did not.

This line had a potential out of bounds error if the exponent was near the i64 boundary, and had to be shifted forward or backwards past the boundaries when displaying in scientific notation:

let exponent = abs_int.len() as i64 + exp - 1;

Casting the length and exp to i128's will solve that particular issue.

@akubera
Copy link
Owner

akubera commented Feb 23, 2024

I casted a bunch of i64's to i128's before doing formatting stuff, and added some tests where the exponent is near the i64 boundary:

impl_case!(test_max: format!("1E{}", i64::MAX) => "1E+9223372036854775807");
impl_case!(test_max_multiple_digits: format!("314156E{}", i64::MAX) => "3.14156E+9223372036854775812");
impl_case!(test_min_scale: "1E9223372036854775808" => "1E+9223372036854775808");
impl_case!(test_max_scale: "1E-9223372036854775807" => "1E-9223372036854775807");
impl_case!(test_min_multiple_digits: format!("271828182E-{}", i64::MAX) => "2.71828182E-9223372036854775799");
impl_case!((panics) test_max_exp_overflow: "1E9223372036854775809");
impl_case!((panics) test_min_exp_overflow: "1E-9223372036854775808");
}

Those test round-tripping, so everything rendered in .to_string() is able to be parsed (which wasn't true before).

@akubera
Copy link
Owner

akubera commented Feb 23, 2024

Where did these issues come up? I know Python limits Decimal exponents to 10^18 (about 60 bits) and Java uses 32-bit a scale so I thought 64 bits would be enough for anybody. But I guess not?

Obviously these were bugs needing to be fixed, but I'm curious where they come up?

@ddimaria
Copy link

Is was while importing a user-submitted CSV. It comes in as text, then BigDecimal parses the exponential number successfully (signals that the text was a number). When we serialize the value we get the error. Our temp fix is to manually check for overflows.

@akubera
Copy link
Owner

akubera commented Mar 6, 2024

Should be fixed in 0.4.3!

@akubera akubera closed this as completed Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants