Update the double/float formatters to return the shortest roundtrippable string. #22040
This is marked [WIP] as I am still doing some final validation that everything works correctly.
This passes all 100,000,000 of the ES6 validation tests.
cc. @jkotas, @danmosemsft
Some examples: 1.1 (no change from previous):
double.MaxValue:
0.84551240822557006:
Do you aim to also fix …? Are you doing perf measurements?
There is nothing to fix here, the user explicitly requested 17 digits.
Yes, I plan on getting some perf measurements here.
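The distinction above is between an explicit request for 17 significant digits and the shortest string that still round-trips. The sketch below is a small Python illustration, not .NET code. It assumes `format(x, ".17g")` is a fair stand-in for an explicit 17-digit request such as G17. It also relies on Python's `repr()` already returning the shortest round-trippable string.

```python
# Compare an explicit 17-significant-digit request with the shortest
# round-trippable representation of the same double (Python stand-ins).
x = 1.1

explicit_17 = format(x, ".17g")  # analogous to asking for 17 digits
shortest = repr(x)               # Python's repr() is shortest round-trip

print(explicit_17)  # 1.1000000000000001
print(shortest)     # 1.1

# Both strings parse back to exactly the same double.
assert float(explicit_17) == x
assert float(shortest) == x
```

Both strings identify the same value. The 17-digit one simply carries digits the user asked for, which is why honoring an explicit precision is not a bug.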
Rebased onto dotnet/master
Going through the CoreFX failures now. There are 206 of them, but most of them look to be bugs that have been resolved. For example, in Microsoft.VisualBasic.Tests.ConversionsTests.ToSingle_Obejct_ReturnsExpected: the input is …; the input, when converted to a float, is exactly …; this PR causes the result to be …
Rebased and rerunning tests, as the old jobs have since been deleted.
Added a new commit which ensures
I tested the following values: … The results are here:
The latest commit (which is hopefully the last one, outside disabling the CoreFX tests in CoreFX.issues.json) fixes the formatters to take the format specifier into account when handling the precision. As a basic summary, this means … In the latter case, this means that the trailing digits (when more than 17 digits are requested) are no longer always 0. It also means that we always fill the integral portion and only use …
- C17 $1.10000000000000010
+ C17 $1.10000000000000009
- C18 $1.100000000000000090
+ C18 $1.100000000000000089
- C19 $1.1000000000000000890
+ C19 $1.1000000000000000888
...
- E17 4.94065645841246540E-324
+ E17 4.94065645841246544E-324
- E18 4.940656458412465440E-324
+ E18 4.940656458412465442E-324
- E19 4.9406564584124654420E-324
+ E19 4.9406564584124654418E-324
...
- F 179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00
+ F 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
- F0 179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+ F0 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
- F1 179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0
+ F1 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0
...
- N17 123,456,789.01234600000000000
+ N17 123,456,789.01234567165374756
- N18 123,456,789.012346000000000000
+ N18 123,456,789.012345671653747559
- N19 123,456,789.0123460000000000000
+ N19 123,456,789.0123456716537475586
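The new digits in the diff are the exact decimal expansion of the stored binary value, not padding. As a hedged cross-check, the sketch below uses Python's correctly rounded formatting rather than the .NET formatter. The Python format specs are chosen to mirror C17, C19, E18 and F0 from the diff, and they reproduce the same digits.

```python
import sys
from decimal import Decimal

# Python's fixed/exponent formatting is correctly rounded from the exact
# binary value, so digits past the 17th are real digits, not padding zeros.
c17 = format(1.1, ".17f")
c19 = format(1.1, ".19f")
e18 = format(5e-324, ".18e")               # 5e-324: smallest subnormal double
exact_max = str(int(sys.float_info.max))   # every integral digit of MaxValue

print(c17)                           # 1.10000000000000009
print(c19)                           # 1.1000000000000000888
print(e18)                           # 4.940656458412465442e-324
print(exact_max[:20], len(exact_max))  # 17976931348623157081 309

# Decimal(float) exposes the exact value the digits above are rounded from.
print(Decimal(1.1))  # 1.100000000000000088817841970012523233890533447265625
```

The output matches the "+" lines of the diff (apart from Python's lowercase `e`). This suggests the new behavior is simply correct rounding of the exact value at the requested precision.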
Why can't we make them always 0?
The user requested
This should all line up with what is documented and given in examples here: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings. They just don't use any numbers that have more than 15-17 significant digits in their examples.
The diff file (showing a few example numbers) from the latest change is here: new_diff.txt. The baseline remains the same.
The latest changes bring the CoreFX failure count from 206 down to 55. I have gone through all 55 tests, and they are all instances where either:
An example of the first case is for
An example of the second case is for
Another example of the second case is for
There look to be a couple of asserts being hit in the Checked jobs as well (around
This should be ready for review now. Perf tests are running and I should have numbers shortly. |
Perf numbers here, with those for … It can basically be summed up as:
I am also running a bench locally on the entire
Looks like, overall, the change (for …) … Some additional data: before this PR, we only had … After the change, by default, we are now producing roundtrippable results for 100% of the inputs and are producing the shortest string that allows this (which generally results in "prettier" results).
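The sketch below illustrates the gap between a fixed 15-significant-digit default and a shortest round-trippable default. It is an illustration in Python, not the .NET code path. It assumes `format(x, ".15g")` approximates a 15-digit formatter and uses `repr()` for the shortest string; `roundtrips` is a hypothetical helper.

```python
# Hypothetical comparison: a 15-significant-digit default versus the
# shortest round-trippable string (Python's repr()).
def roundtrips(x: float, s: str) -> bool:
    """True if parsing s gives back exactly the same double x."""
    return float(s) == x

x = 0.1 + 0.2
old_style = format(x, ".15g")  # 15 significant digits, trailing zeros dropped
shortest = repr(x)             # shortest string that parses back to x

print(old_style, roundtrips(x, old_style))  # 0.3 False
print(shortest, roundtrips(x, shortest))    # 0.30000000000000004 True
```

The 15-digit string is prettier but names a different double. The shortest round-trippable string is only as long as it needs to be to stay lossless.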
Given that we explicitly have the … However, doing so would mean some of the bugs listed in the original post would not be closed (or would only be partially resolved). One such case is https://github.com/dotnet/coreclr/issues/13615, which tracks … I would like to hear some other opinions here as well, but my preference would be to take the perf hit here and work on getting it back in other ways, such as:
It is also worth noting that, for what I would presume is the most common case (serialization), there should be no perf regression. For code using
Edit:
CC @danmosemsft, @stephentoub, @jkotas. Could I get some weigh-in here? The summary of the above is:
It is my belief that
The alternative is to keep
The different performance characteristics look fine to me. Thanks for collecting the data. I would be more worried about the compatibility / breaking potential of this change. I think we are ok on this front too, pending further feedback.
@eerhardt, @ahsonkhan. I believe I have responded to all feedback (either with an appropriate fix or a comment explaining the reasoning).
@ahsonkhan, any other feedback here?
Logged https://github.com/dotnet/coreclr/issues/22343 to track the potential perf improvements to the … I will hold off on merging until after I get a PR for the CoreFX test fixes up.
In that case, marking as no merge.
CoreFX test fixes are here: dotnet/corefx#35016. I will merge this after I see the NetFX leg pass; the NetCore tests passed locally.
The implementation of this was a terrible decision, especially for those migrating from full framework to .NET Core. Not everyone cares about round-tripping. Some of us actually care more about, I don't know, how things get displayed to users (.NET isn't just used as a backend). Changing the default behavior and then providing no easy way to override it showed very poor foresight.
The changes are detailed here, including several ways to get output that is generally compatible with the previous formatting behavior: https://devblogs.microsoft.com/dotnet/floating-point-parsing-and-formatting-improvements-in-net-core-3-0/#potential-impact-to-existing-code. However, to give some perspective on why this was done, imagine you were working with integers rather than floats, and that by default, when you called … This was the world of … Breaking changes are certainly frustrating, but so is not being able to write a program that can deterministically compute the correct result or that can share information losslessly with other programs. The same goes for favoring display usage over correctness as the default behavior (however, I think we hit a good middle ground by returning the shortest roundtrippable string). This, along with the changes required for users to continue printing "pretty" strings after the break being fairly minimal, tipped the scales in favor of making the break to provide a better .NET.
WHAT?!? That's nonsense. You're changing the default behavior of probably one of the most fundamental methods in the entire framework: … At the very least, make it easy to opt out: … If you want to fix bugs with round-tripping, great, fix those bugs. But don't change the default behavior/intention of the parameter-less … This is not "providing a better .NET". You've made it an order of magnitude worse.
Which explains why you should never use "==" to compare doubles, which I'm already aware of, but doesn't address the real issue. No non-programmer end user is ever going to be happy to see "3.5999999999999996". You changed the fundamental purpose of
IEEE 754 floating-point arithmetic is deterministic. One issue people frequently encounter is that they try to treat it as normal arithmetic and expect things like
All other primitive types produce roundtrippable numbers by default, and users are expected to call … For example:
Another example is, for
There are similar cutoffs for
@davidmilligan could you say a little more about what quantities you are using floats to represent, and why you're summing them? I assume these are continuous quantities like temperature or something of that sort?
Here's an example of the issue with a trivial WPF app: https://github.com/davidmilligan/RoundingRegressionExample Type 1.2 into the first text box and 2.4 into the second one and click Add. A normal user will not be very happy with that result. The "fixed" box has the suggested fix of using G15 from the article, and granted, in this trivial example, applying the fix is very easy, and it works. However, imagine any sort of non-trivial project. Perhaps it has hundreds of text boxes bound to doubles across dozens of screens, along with thousands of other text boxes not bound to doubles. Applying the fix is anything but easy; in fact, it's extremely time-consuming (you can't do a find-and-replace or anything like that; you've got to check each and every Binding in your application, out of maybe 10 thousand, and see if it's binding to a double), and there's no way to just globally change the default behavior of ToString() back to what it used to be, and no generic way to get WPF to do something different either. @tannergooding You keep trying to prove why it's better. I don't dispute that it probably is "better" in general, but I don't care. It's completely different and focused in a completely different direction now. That's my point. People have structured massive programs around the intentions of the previous behavior, and it's such a basic, fundamental part of the framework that making this breaking change just doesn't make sense, at least not without a way to opt out.
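The 1.2 + 2.4 repro and the G15 workaround can be reproduced outside WPF. The sketch below is a Python analogue, not the .NET or WPF fix itself. It assumes `format(x, ".15g")` stands in for G15 and uses `repr()` for the shortest round-trippable string.

```python
total = 1.2 + 2.4  # the sum from the WPF repro

shortest = repr(total)           # shortest round-trippable string
display = format(total, ".15g")  # 15 significant digits, like G15

print(shortest)  # 3.5999999999999996
print(display)   # 3.6

# The display string is friendlier, but it no longer identifies the exact
# double: parsing "3.6" yields a value one ulp above this sum.
assert float(shortest) == total
assert float(display) != total
```

This is the trade-off at the center of the discussion. A 15-digit string suits display, while the shortest round-trippable string is the lossless default.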
@davidmilligan I'm curious what kinds of quantities you're representing in your app with floats -- not how to repro what you're seeing. I'm not suggesting it's not a reasonable scenario, just curious what your scenario is for adding floats and presenting them.
In one particular scenario, these are weights in pounds. Users may be entering various weights of individual items that make up some larger components and the total weight is computed.
- Revert workarounds for the issues resolved by dotnet/coreclr#22040 - See that PR for the links to all the specific issues it resolved
…ble string. (dotnet/coreclr#22040)
* Updating Number.Formatting to always compute a round-trippable number by default.
* Adding a third-party notice entry for the Grisu3 reference implementation
* Fixing up the Grisu3 algorithm to better match the reference implementation (including comments, etc)
* Porting the Grisu3 implementation that generates the shortest roundtrippable sequence.
* Updating the Dragon4 algorithm to support printing the shortest roundtrippable string.
* Fixing the double/float formatters to ignore precision specifiers for 'R'
* Fixing the double/float formatters to handle +0.0 and -0.0
* Fix the casing of `point` in THIRD-PARTY-NOTICES
  Co-Authored-By: tannergooding <tagoo@outlook.com>
* Fixing the double/float formatting code to consider a precision specifier of 0 to be the same as default.
* Fixing the double/float formatter so that nMaxDigits is set appropriately.
* Changing the double/float formatting logic to account for the format when determining the correct precision to request.
* Updating the double/float formatter to take the format specifier into account when determining the number of digits to request.
* Fixing the double/float formatting code to continue handling zero correctly.
* Disabling some outdated CoreFX tests.
* Responding to various feedback from the PR
Commit migrated from dotnet/coreclr@a21b151
The double/float formatters are currently implemented using the Grisu3 and Dragon4 algorithms. However, they were only using the variants that return an explicitly provided digit count (precision). This updates the algorithms to also support the variants that return a "shortest roundtrippable string" (i.e. the shortest string that, when reparsed, will return the original value). This variant is chosen for "R" and when no precision specifier is given.
This allows us to return strings that are both "pretty" and that will return the original value when requested.
This resolves:
R (and other formats where no precision is specified) should now always round-trip.
G … are now equivalent to R and will return the shortest round-trippable string.
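The property the shortest-mode variants target can be illustrated with a brute-force sketch: the shortest string that reparses to the original value. This is explicitly not Grisu3 or Dragon4, which compute the digits directly. The sketch is Python, and `shortest_roundtrip` is a hypothetical helper. It tries precisions 1 through 17 and returns the first correctly rounded `%g` string that parses back to the input. Because it only tests the nearest p-digit decimal at each precision, it is a simplification and may differ from a true shortest-mode algorithm in rare boundary cases.

```python
import sys

def shortest_roundtrip(x: float) -> str:
    """Hypothetical brute-force search for a shortest round-trippable string.

    Tries the correctly rounded '%.{p}g' string for p = 1..17 and returns the
    first one that parses back to x. Seventeen significant digits always
    suffice for an IEEE 754 double, so the loop always returns for finite x.
    """
    if x != x or x in (float("inf"), float("-inf")):
        return repr(x)  # NaN and infinities have no digits to search
    for precision in range(1, 18):
        candidate = format(x, f".{precision}g")
        if float(candidate) == x:
            return candidate
    raise AssertionError("unreachable for finite IEEE 754 doubles")

samples = [1.1, 0.1 + 0.2, 1.2 + 2.4, 5e-324, sys.float_info.max]
for value in samples:
    s = shortest_roundtrip(value)
    assert float(s) == value  # every result round-trips
    print(s)
```

The loop costs up to 17 format/parse pairs per value. That cost is one reason production formatters use dedicated shortest-mode algorithms rather than a search like this one.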