Use exponential format for decimals with relatively large scale #121

declanvk · 2024-01-08T23:35:16Z

This PR is related to some of the comments in #108, but there is not a specific issue created separately for this. Please let me know if you want me to create an issue first for some discussion.

The first commit of the PR is mostly just adding tests to capture the current behavior.

The second commit is the more important, it adds a different formatting code path to the fmt::Display impl, so that numbers that have a large (absolute) scale relative to their integer representation length will be displayed in exponential format. The second commit message has more detail.

I chose the (abs_int.len() as i64 - self.scale - 1) as the criteria for the cutoff because I didn't want it to trigger only based on the scale. I imagined a scenario where the scale is large (either negative or positive), but it is inflated because the int_val has a large number of digits.

I'm specifically targeting cases where someone could input a small string like 1E100000 and then crash a program by making it do a huge allocation.

Thanks for making and maintaining the bigdecimal crate! We find it very useful

akubera · 2024-01-17T01:47:24Z

This is much needed, thanks. I think started that at some point, but I can't find the commits 😬 .

Like the other traits, I'd like to break this out into an impl_fmt.rs file. So I'll do that in trunk and rebase your branch onto that to keep accurate attribution.

I'll break your function-scope format_full_scale and format_exponential out into module-scope, so they may be individually, without dependency on EXPONENTIAL_FORMAT_THRESHOLD.

akubera · 2024-01-17T01:54:41Z

Regarding EXPONENTIAL_FORMAT_THRESHOLD, I think I'll make that a compile-time configurable; like RUST_BIGDECIMAL_DEFAULT_PRECISION.

Most people wont care, but someone might find it useful.

It looks like Python's default decimal formatter only keeps 5 leading zeros before going to exponential form:

>>> Decimal("1e-6")
Decimal('0.000001')
>>> Decimal("1e-7")
Decimal('1E-7')

>>> Decimal("11e-7")
Decimal('0.0000011')
>>> Decimal("11e-8")
Decimal('1.1E-7')

And trailing zeros are precision-dependent, which makes sense as those could indicate accuracy:

>>> Decimal("1e0")
Decimal('1')
>>> Decimal("1e1")
Decimal('1E+1')
>>> Decimal("1e3")
Decimal('1E+3')

>>> Decimal("10e1")
Decimal('1.0E+2')

>>> Decimal("10000e3")
Decimal('1.0000E+7')

__str__ follows same rules:

>>> str(Decimal("1e-6"))
'0.000001'
>>> str(Decimal("1e-7"))
'1E-7'
>>> str(Decimal("100e7"))
'1.00E+9'

akubera · 2024-01-17T02:17:30Z

I found my feature/impl-fmt branch. I rebased that onto trunk and I'll get your changes in there somewhere.

akubera · 2024-01-17T15:16:54Z

I applied your changes to the split file instead of src/lib.rs.

exponential-format

Only significant change was moving the format_exponential out of the fmt method, and leaving the other code in the function as-is (i.e. jump to format_exponential if length condition holds, otherwise use the old code)

I may change this back to your two-function approach, and move the whole thing into yet another function that parameterizes the threshold-value, so we can write nicer tests.

akubera · 2024-01-17T15:27:10Z

One thing to note: I set EXPONENTIAL_FORMAT_THRESHOLD = 2 (small value) and the exponential form is chosen when given a value like 7000:

println!("{}", BigDecimal::from(7000)); -> "7.000E3"

Is that expected behavior?

declanvk · 2024-01-17T16:29:12Z

Regarding EXPONENTIAL_FORMAT_THRESHOLD, I think I'll make that a compile-time configurable; like RUST_BIGDECIMAL_DEFAULT_PRECISION.

Sounds good to me! That is a cool technique you're using in the build.rs

Is that expected behavior?

This makes sense, but I did not anticipate it ahead of time. Sinceabs_int = "7000" and scale = 0, then we get "7000".len() - 0 > 2 which evaluates to true and goes to the exponential path.

I think this points out a weakness in the condition, I previously assumed that the trailing zeroes would not have this affect.

What is your preference on this? Should I make it count the abs_int.len() - num_trailing_zeroes?

akubera · 2024-01-17T23:53:55Z

The special case scale == 0 always means the BigDecimal is an integer, and the BigInt may be printed verbatim. So that's one.

akubera · 2024-01-18T00:02:51Z

Comparing to other libraries:

// Java
import java.math.*;
class BigDecimalTest {
    public static void main(String[] args) {
        String[] strs = new String[]{
            "0.123",
            "0.000001",
            "0.0000001",
            "100",
            "100e3",
            "1000000000",
            "1000000000.00000",
            "1000000000e1",
        };
        for (String s : strs) {
           BigDecimal n = new BigDecimal(s);
           System.out.println(s + " -> " + n);
        }
    }
}

outputs

0.123 -> 0.123
0.000001 -> 0.000001
0.0000001 -> 1E-7
100 -> 100
100e3 -> 1.00E+5
1000000000 -> 1000000000
1000000000.00000 -> 1000000000.00000
1000000000e1 -> 1.000000000E+10

# Python
from decimal import Decimal

strs = [
    "0.123",
    "0.000001",
    "0.0000001",
    "100",
    "100e3",
    "1000000000",
    "1000000000.00000",
    "1000000000e1",
]

for s in strs:
    n = Decimal(s)
    print(f"{s} -> {n}")

0.123 -> 0.123
0.000001 -> 0.000001
0.0000001 -> 1E-7
100 -> 100
100e3 -> 1.00E+5
1000000000 -> 1000000000
1000000000.00000 -> 1000000000.00000
1000000000e1 -> 1.000000000E+10

(good, it's the same)

Ruby:

require 'bigdecimal'

strs = [
    "0.123",
    "0.000001",
    "0.0000001",
    "100",
    "100e3",
    "1000000000",
    "1000000000.00000",
    "1000000000e1",
]

for s in strs do
  n = BigDecimal s
  print(s, " -> ", n, "\n")
end

0.123 -> 0.123e0
0.000001 -> 0.1e-5
0.0000001 -> 0.1e-6
100 -> 0.1e3
100e3 -> 0.1e6
1000000000 -> 0.1e10
1000000000.00000 -> 0.1e10
1000000000e1 -> 0.1e11

...... well I regret trying that one. They ALL start with 0. !?

akubera · 2024-01-18T00:36:07Z

0.1  ->  0.1
0.10  ->  0.10
0.100  ->  0.100
0.1000  ->  0.1000
...
0.100000000  ->  0.100000000
...
0.100000000000000000000000000  ->  0.100000000000000000000000000

So trailing zeros don't matter.

I think the rule is

// compare scale to zero
match self.scale.cmp(&0) {
   // simply print integer if scale is zero
   Equal => print(abs_int),

   // decimal point is "to the right" of the end of the integer  
   // always print exponential form
   // (example 100e3 => 100xxx. )
   Less => print_exponential_form(...),

   // greater means decimal point is to the left of the end of the integer 
   // we should do exponential form if the decimal point is past the left of
   // the start (significant) digit 
   //       123e-2 => 1.23         (scale - len = 2 - 3 = -1)
   //       123e-5 => 0.00123      (scale - len = 5 - 3 = 2)
   //       123e-9 => 0.000000123  (scale - len = 9 - 3 = 6 > threshold => 1.23e-7)
   _ => if self.scale - abs_int.len() as i64  > THRESHOLD  {
          print_exponential_form(...)
        } else {
          print_full_form(...)
        }
}

akubera · 2024-01-18T01:13:04Z

Ok: We have decimal with digits

dddddddddd

Only thing that determines exponential is placement of the decimal point. Value or number of digits do not matter:

          dddddddddd0…00.    <- exponential
          dddddddddd.        <- full (no decimal point to be printed)
          ddddd.dddd         <- full
         .dddddddddd         <- full
     .0000dddddddddd         <- full
.0000…0000dddddddddd     <- exponential if n > threshold
 |-- n --|

So I'm going with

// check if scale is outside of "boundary"
if scale < 0 || (scale as usize > abs_int.len() + threshold) {
    exponential
} else {
    full
}

akubera · 2024-01-18T02:00:25Z

These tests are passing for threshold = 2 👏

    impl_case!(case_0d123: "0.123" => "0.123");
    impl_case!(case_0d0123: "0.0123" => "0.0123");
    impl_case!(case_0d00123: "0.00123" => "0.00123");
    impl_case!(case_0d000123: "0.000123" => "1.23E-4");

    impl_case!(case_123d: "123." => "123");
    impl_case!(case_123de1: "123.e1" => "1.23E3");

test impl_fmt::test_display::case_0d000123 ... ok
test impl_fmt::test_display::case_123d ... ok
test impl_fmt::test_display::case_0d00123 ... ok
test impl_fmt::test_display::case_123de1 ... ok
test impl_fmt::test_display::case_0d123 ... ok
test impl_fmt::test_display::case_0d0123 ... ok

the last one there is almost consistent with Python. Do we force the +?

>>> Decimal("123.e1")
Decimal('1.23E+3')

declanvk · 2024-01-18T13:04:11Z

Nice work on adjusting the rule! I like the latest iteration, focusing in on the placement of the digits really brings it into focus

the last one there is almost consistent with Python. Do we force the +?

Yeah I think go for it! Looking at the python and java examples, it seems like they always include the sign on the exponent

declanvk · 2024-01-18T13:07:22Z

src/impl_fmt.rs

+    }
+
+    if exponent != 0 {
+        abs_int += "E";


nit: Do you want to make this lowercase e to match the other notations in this file? Or prefer to match other big decimal libraries with an uppercase E?

I will update the others to use "E"

I think they've all been addressed.

declanvk · 2024-01-18T13:09:45Z

src/impl_fmt.rs

+}
+
+
+fn dynamically_format_decimal(


idea,not blocking: Another idea I was playing around with was incorporating the use of the alternate format flag #, see https://doc.rust-lang.org/std/fmt/index.html#sign0.

We could use the alternate format to choose a different format, or force the output into the format_exponential or format_full_scale branches

You're saying println!("{:#}", d) should use "format_full_scale"? (No threshold check?)

I like it.

That paired with std::fmt::{UpperExp,LowerExp} should give the user full control.

println!("{:e}", d); // 1.2e+1 println!("{:E}", d); // 1.2E+1 println!("{:#}", d); // 12 println!("{}", d); // 12 println!("{:?}", d); // BigDecimal("12E+0")

I've amended format_exponential to take an "e_symbol" parameter so we can choose between e & E (and any other values as required in the future, though I'm not sure if any localization changes that particular char).

Do we leave the case that a user printing "{:#}", BigDecimal("1e-1000000")) will print many zeros? I.e. do we have a larger threshold for number of zeros for "alternate" mode?

These changes break a lot of tests so I'm going through those before pushing what I've done.

src/impl_fmt.rs

declanvk · 2024-01-18T13:14:55Z

I left some nonblocking comments (on my own PR hahaha), let me know what you think! I can go ahead and implement anything, I feel bad making you do all this work!

Thanks again for looking at this

akubera · 2024-01-18T13:44:20Z

I didn't mean to force push onto your branch, but if you like the changes I'm glad I did.
Thanks for your comments, they look good.
I'll rebase again to correct the last old-rust compatibility issue.

I removed the updated threshold check, I'm working on redoing that commit to also update the tests.

I'll look at all this again tonight.

declanvk · 2024-01-18T18:28:38Z

I didn't mean to force push onto your branch, but if you like the changes I'm glad I did.

I'm glad you did! The changes you made look good

akubera · 2024-01-21T19:20:16Z

I'm about ready to call this done. I wrote a little code-writing python script to generate unit-test mods so the Rust bigdecimal formatting of {:XYZ} aligns with the same format in Python.

akubera · 2024-03-06T00:59:59Z

This was merged and released last night in 0.4.3.

I forgot to use the GitHub ui, oops. To make it look merged I'm going to merge it now into trunk and then force push updates away.

declanvk · 2024-03-11T16:39:27Z

Thanks for all your work on this @akubera !

akubera · 2024-03-11T22:31:27Z

'twas a long time coming. Let me know right away if there's any problems!

akubera mentioned this pull request Jan 17, 2024

impl Arbitrary for BigDecimal #122

Open

akubera self-assigned this Jan 17, 2024

akubera force-pushed the exponential-format branch from b0a2ea3 to 952a505 Compare January 18, 2024 07:03

declanvk commented Jan 18, 2024

View reviewed changes

akubera force-pushed the exponential-format branch 2 times, most recently from 3b33646 to 379e1c2 Compare January 21, 2024 03:06

Enable property tests in circle-ci and cache via lockfile

dbf3b48

akubera force-pushed the exponential-format branch from 77a3641 to dbf3b48 Compare January 21, 2024 19:07

akubera mentioned this pull request Feb 22, 2024

[Possible Bug] - Display panics for large numbers #108

Closed

akubera force-pushed the trunk branch from 8033eb6 to 33f44aa Compare March 5, 2024 02:50

akubera merged commit 2ac1c62 into akubera:trunk Mar 6, 2024
11 checks passed

declanvk deleted the exponential-format branch March 11, 2024 16:39

declanvk mentioned this pull request Mar 28, 2024

Use scientific notation for fmt::Display and fmt::Debug please #91

Closed

rubdos mentioned this pull request Apr 1, 2024

std::fmt precision in format!: regression since 0.3? #126

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use exponential format for decimals with relatively large scale #121

Use exponential format for decimals with relatively large scale #121

declanvk commented Jan 8, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

declanvk commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 18, 2024

akubera commented Jan 18, 2024 •

edited

akubera commented Jan 18, 2024 •

edited

akubera commented Jan 18, 2024 •

edited

declanvk commented Jan 18, 2024

declanvk Jan 18, 2024

akubera Jan 19, 2024

akubera Jan 21, 2024

declanvk Jan 18, 2024

akubera Jan 20, 2024

akubera Jan 20, 2024

declanvk commented Jan 18, 2024

akubera commented Jan 18, 2024 •

edited

declanvk commented Jan 18, 2024

akubera commented Jan 21, 2024

akubera commented Mar 6, 2024

declanvk commented Mar 11, 2024

akubera commented Mar 11, 2024

Use exponential format for decimals with relatively large scale #121

Use exponential format for decimals with relatively large scale #121

Conversation

declanvk commented Jan 8, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 17, 2024

declanvk commented Jan 17, 2024

akubera commented Jan 17, 2024

akubera commented Jan 18, 2024

akubera commented Jan 18, 2024 • edited

akubera commented Jan 18, 2024 • edited

akubera commented Jan 18, 2024 • edited

declanvk commented Jan 18, 2024

declanvk Jan 18, 2024

Choose a reason for hiding this comment

akubera Jan 19, 2024

Choose a reason for hiding this comment

akubera Jan 21, 2024

Choose a reason for hiding this comment

declanvk Jan 18, 2024

Choose a reason for hiding this comment

akubera Jan 20, 2024

Choose a reason for hiding this comment

akubera Jan 20, 2024

Choose a reason for hiding this comment

declanvk commented Jan 18, 2024

akubera commented Jan 18, 2024 • edited

declanvk commented Jan 18, 2024

akubera commented Jan 21, 2024

akubera commented Mar 6, 2024

declanvk commented Mar 11, 2024

akubera commented Mar 11, 2024

akubera commented Jan 18, 2024 •

edited

akubera commented Jan 18, 2024 •

edited

akubera commented Jan 18, 2024 •

edited

akubera commented Jan 18, 2024 •

edited