Use exponential format for decimals with relatively large scale #121
This is much needed, thanks. I think I started that at some point, but I can't find the commits 😬. Like the other traits, I'd like to break this out into an I'll break your function-scope |
Regarding Most people won't care, but someone might find it useful. It looks like Python's default decimal formatter only keeps 5 leading zeros before going to exponential form:
And trailing zeros are precision-dependent, which makes sense as those could indicate accuracy:
|
I found my feature/impl-fmt branch. I rebased that onto trunk and I'll get your changes in there somewhere. |
I applied your changes to the split file instead of Only significant change was moving the I may change this back to your two-function approach, and move the whole thing into yet another function that parameterizes the threshold-value, so we can write nicer tests. |
One thing to note: I set
Is that expected behavior? |
Sounds good to me! That is a cool technique you're using in the
This makes sense, but I did not anticipate it ahead of time. Since I think this points out a weakness in the condition, I previously assumed that the trailing zeroes would not have this effect. What is your preference on this? Should I make it count the |
The special case |
Comparing to other libraries:
// Java
import java.math.*;

class BigDecimalTest {
    public static void main(String[] args) {
        String[] strs = new String[]{
            "0.123",
            "0.000001",
            "0.0000001",
            "100",
            "100e3",
            "1000000000",
            "1000000000.00000",
            "1000000000e1",
        };
        for (String s : strs) {
            BigDecimal n = new BigDecimal(s);
            System.out.println(s + " -> " + n);
        }
    }
}
outputs
# Python
from decimal import Decimal

strs = [
    "0.123",
    "0.000001",
    "0.0000001",
    "100",
    "100e3",
    "1000000000",
    "1000000000.00000",
    "1000000000e1",
]
for s in strs:
    n = Decimal(s)
    print(f"{s} -> {n}")
(good, it's the same)
Ruby:
require 'bigdecimal'

strs = [
  "0.123",
  "0.000001",
  "0.0000001",
  "100",
  "100e3",
  "1000000000",
  "1000000000.00000",
  "1000000000e1",
]
for s in strs do
  n = BigDecimal s
  print(s, " -> ", n, "\n")
end
...... well I regret trying that one. They ALL start with |
So trailing zeros don't matter. I think the rule is
// compare scale to zero
match self.scale.cmp(&0) {
    // simply print integer if scale is zero
    Equal => print(abs_int),
    // decimal point is "to the right" of the end of the integer
    // always print exponential form
    // (example 100e3 => 100xxx. )
    Less => print_exponential_form(...),
    // greater means decimal point is to the left of the end of the integer
    // we should do exponential form if the decimal point is past the left of
    // the start (significant) digit
    //   123e-2 => 1.23         (scale - len = 2 - 3 = -1)
    //   123e-5 => 0.00123      (scale - len = 5 - 3 = 2)
    //   123e-9 => 0.000000123  (scale - len = 9 - 3 = 6 > threshold => 1.23e-7)
    _ => if self.scale - abs_int.len() as i64 > THRESHOLD {
        print_exponential_form(...)
    } else {
        print_full_form(...)
    }
} |
Ok: We have decimal with digits
Only thing that determines exponential is placement of the decimal point. Value or number of digits do not matter:
So I'm going with
// check if scale is outside of "boundary"
if scale < 0 || (scale as usize > abs_int.len() + threshold) {
    exponential
} else {
    full
} |
These tests are passing for threshold = 2 👏
impl_case!(case_0d123: "0.123" => "0.123");
impl_case!(case_0d0123: "0.0123" => "0.0123");
impl_case!(case_0d00123: "0.00123" => "0.00123");
impl_case!(case_0d000123: "0.000123" => "1.23E-4");
impl_case!(case_123d: "123." => "123");
impl_case!(case_123de1: "123.e1" => "1.23E3");
The last one there is almost consistent with Python. Do we force the
>>> Decimal("123.e1")
Decimal('1.23E+3') |
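For anyone following along, here is a minimal sketch of the boundary rule discussed above, working on a plain digit string and a scale rather than on the crate's types. The names `format_decimal`, `format_full_scale` and `format_exponential` and their signatures are assumptions for illustration, not the code in this PR; the sketch assumes a non-empty ASCII digit string, ignores the sign of the number, and leaves the exponent unsigned as in the test cases above.

```rust
/// Hypothetical helper: renders `digits * 10^(-scale)`.
/// Assumes `digits` is a non-empty string of ASCII digits (no sign).
fn format_decimal(digits: &str, scale: i64, threshold: usize) -> String {
    let len = digits.len();
    // Exponential form when the decimal point lies to the right of the last
    // digit (negative scale) or more than `threshold` places to the left of
    // the first digit. Value and digit count alone do not decide anything.
    if scale < 0 || scale as usize > len + threshold {
        format_exponential(digits, scale)
    } else {
        format_full_scale(digits, scale as usize)
    }
}

/// Plain rendering; only called with a non-negative scale.
fn format_full_scale(digits: &str, scale: usize) -> String {
    let len = digits.len();
    if scale == 0 {
        digits.to_string()
    } else if scale < len {
        // Decimal point falls inside the digit string.
        let (int_part, frac_part) = digits.split_at(len - scale);
        format!("{}.{}", int_part, frac_part)
    } else {
        // Decimal point is at or left of the first digit: pad with zeros.
        format!("0.{}{}", "0".repeat(scale - len), digits)
    }
}

/// Scientific rendering with one digit before the decimal point.
fn format_exponential(digits: &str, scale: i64) -> String {
    let (first, rest) = digits.split_at(1);
    // Exponent of the leading digit once the point sits right after it.
    let exponent = digits.len() as i64 - 1 - scale;
    let mut out = String::from(first);
    if !rest.is_empty() {
        out.push('.');
        out.push_str(rest);
    }
    if exponent != 0 {
        out.push_str(&format!("E{}", exponent));
    }
    out
}
```

With a threshold of 2, `format_decimal("123", 6, 2)` takes the exponential branch and `format_decimal("123", 5, 2)` stays in full form, which matches the `impl_case!` lines above.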
b0a2ea3
to
952a505
Compare
Nice work on adjusting the rule! I like the latest iteration, focusing in on the placement of the digits really brings it into focus
Yeah I think go for it! Looking at the python and java examples, it seems like they always include the sign on the exponent |
src/impl_fmt.rs
Outdated
}

if exponent != 0 {
    abs_int += "E";
nit: Do you want to make this lowercase e to match the other notations in this file? Or prefer to match other big decimal libraries with an uppercase E?
I will update the others to use "E"
I think they've all been addressed.
src/impl_fmt.rs
Outdated
}

fn dynamically_format_decimal(
idea, not blocking: Another idea I was playing around with was incorporating the use of the alternate format flag #, see https://doc.rust-lang.org/std/fmt/index.html#sign0.
We could use the alternate format to choose a different format, or force the output into the format_exponential or format_full_scale branches.
You're saying println!("{:#}", d) should use "format_full_scale"? (No threshold check?)
I like it.
That paired with std::fmt::{UpperExp,LowerExp} should give the user full control.
println!("{:e}", d); // 1.2e+1
println!("{:E}", d); // 1.2E+1
println!("{:#}", d); // 12
println!("{}", d); // 12
println!("{:?}", d); // BigDecimal("12E+0")
I've amended format_exponential to take an "e_symbol" parameter so we can choose between e & E (and any other values as required in the future, though I'm not sure if any localization changes that particular char).
Do we leave the case where a user printing "{:#}", BigDecimal("1e-1000000")) will print many zeros? I.e. do we have a larger threshold for number of zeros for "alternate" mode?
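To make the flag discussion concrete, here is a rough sketch of how the three traits could cooperate on a toy type. `ToyDecimal`, its fields, the `THRESHOLD` value and the helper methods are all hypothetical stand-ins, not the crate's `BigDecimal`; only `std::fmt::Display`, `LowerExp`, `UpperExp` and `Formatter::alternate` are real standard-library items. The sketch assumes non-negative values with a non-empty ASCII digit string and always writes a signed exponent.

```rust
use std::fmt;

/// Toy stand-in for a decimal: `digits * 10^(-scale)`, non-negative only.
/// Hypothetical type for illustration.
struct ToyDecimal {
    digits: String,
    scale: i64,
}

/// Illustrative threshold; the value chosen in the crate may differ.
const THRESHOLD: usize = 2;

impl ToyDecimal {
    /// Plain rendering with every digit and zero written out.
    fn full(&self) -> String {
        let len = self.digits.len() as i64;
        if self.scale <= 0 {
            // Point is at or right of the last digit: append zeros.
            format!("{}{}", self.digits, "0".repeat((-self.scale) as usize))
        } else if self.scale < len {
            let (int_part, frac_part) = self.digits.split_at((len - self.scale) as usize);
            format!("{}.{}", int_part, frac_part)
        } else {
            format!("0.{}{}", "0".repeat((self.scale - len) as usize), self.digits)
        }
    }

    /// Scientific rendering; `e_symbol` selects "e" or "E".
    fn exponential(&self, e_symbol: &str) -> String {
        let (first, rest) = self.digits.split_at(1);
        let exponent = self.digits.len() as i64 - 1 - self.scale;
        let mantissa = if rest.is_empty() {
            first.to_string()
        } else {
            format!("{}.{}", first, rest)
        };
        // `{:+}` always writes the sign of the exponent.
        format!("{}{}{:+}", mantissa, e_symbol, exponent)
    }
}

impl fmt::Display for ToyDecimal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let outside = self.scale < 0 || self.scale as usize > self.digits.len() + THRESHOLD;
        // `{:#}` skips the threshold check and forces the full form.
        if f.alternate() || !outside {
            f.write_str(&self.full())
        } else {
            f.write_str(&self.exponential("E"))
        }
    }
}

impl fmt::LowerExp for ToyDecimal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.exponential("e"))
    }
}

impl fmt::UpperExp for ToyDecimal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.exponential("E"))
    }
}
```

Note that in this sketch `{:#}` on a value with a huge negative scale still allocates every zero, which is exactly the open question above about a separate limit for alternate mode.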
These changes break a lot of tests so I'm going through those before pushing what I've done.
I left some nonblocking comments (on my own PR hahaha), let me know what you think! I can go ahead and implement anything, I feel bad making you do all this work! Thanks again for looking at this |
I didn't mean to force push onto your branch, but if you like the changes I'm glad I did. I removed the updated threshold check, I'm working on redoing that commit to also update the tests. I'll look at all this again tonight. |
I'm glad you did! The changes you made look good |
3b33646
to
379e1c2
Compare
77a3641
to
dbf3b48
Compare
I'm about ready to call this done. I wrote a little code-writing python script to generate unit-test mods so the Rust bigdecimal formatting of |
This was merged and released last night in 0.4.3. I forgot to use the GitHub ui, oops. To make it look merged I'm going to merge it now into trunk and then force push updates away. |
Thanks for all your work on this @akubera ! |
'twas a long time coming. Let me know right away if there's any problems! |
This PR is related to some of the comments in #108, but there is not a specific issue created separately for this. Please let me know if you want me to create an issue first for some discussion.
The first commit of the PR is mostly just adding tests to capture the current behavior.
The second commit is the more important one: it adds a different formatting code path to the fmt::Display impl, so that numbers that have a large (absolute) scale relative to their integer representation length will be displayed in exponential format. The second commit message has more detail.
I chose (abs_int.len() as i64 - self.scale - 1) as the criterion for the cutoff because I didn't want it to trigger based only on the scale. I imagined a scenario where the scale is large (either negative or positive), but it is inflated because the int_val has a large number of digits.
I'm specifically targeting cases where someone could input a small string like 1E100000 and then crash a program by making it do a huge allocation.
Thanks for making and maintaining the bigdecimal crate! We find it very useful
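
As a rough illustration of the two quantities in the description, the sketch below computes the cutoff expression and the number of characters a full rendering would need, without building any string. The function names and the `digit_count` parameter (the number of digits in the integer value) are assumptions made for this example; it ignores the sign character and arithmetic overflow at extreme values.

```rust
/// Exponent of the leading digit, i.e. the cutoff expression
/// `abs_int.len() as i64 - self.scale - 1` from the description.
fn leading_exponent(digit_count: u64, scale: i64) -> i64 {
    digit_count as i64 - scale - 1
}

/// Characters needed by the full (non-exponential) rendering,
/// computed arithmetically so nothing is allocated.
fn full_form_len(digit_count: u64, scale: i64) -> u64 {
    if scale <= 0 {
        // Digits followed by `-scale` trailing zeros.
        digit_count + scale.unsigned_abs()
    } else if (scale as u64) < digit_count {
        // Digits plus one decimal point inside them.
        digit_count + 1
    } else {
        // "0." then leading zeros then the digits.
        2 + (scale as u64 - digit_count) + digit_count
    }
}
```

For the input `1E100000` (one digit, scale -100000) the full form would need 100001 characters, while a 50-digit value with scale 49 has a leading exponent of 0 even though its scale is large, which is the case the criterion is meant to leave alone.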