Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the double/float formatters to return the shortest roundtrippable string. #22040

Merged
merged 15 commits into from Feb 1, 2019

Conversation

@tannergooding
Copy link
Member

commented Jan 17, 2019

The double/float formatters are currently implemented using the Grisu3 and Dragon4 algorithms. However, they were only using the variants that return an explicitly provided digit count (precision).

This updates the algorithms to also support the variants that return a "shortest roundtrippable string" (i.e. the shortest string that, when reparsed, will return the original value). This variant is chosen for "R" and when no precision specifier is given.

This allows us to return strings that are both "pretty" and that will return the original value when requested.

This resolves:

  • dotnet/corefx#26785
    • Values for R (and other formats where no precision is specified) should now always round-trip.
  • #21272
    • We now support returning the shortest round-trippable string.
  • #3313
    • The specific scenario listed was validated as part of this fix
  • #13106
    • The specific scenario listed was validated as part of this fix
  • #13615
    • No format specifier and just G are now equivalent to R and will return the shortest round-trippable string
@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

This is marked [WIP] as I am still doing some final validation that everything works correctly.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

This passes all 100,000,000 of the ES6 validation tests.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

Some examples:

1.1 (no change from previous):

R: 1.1
G: 1.1
G17: 1.1000000000000001

double.MaxValue:

R: 1.7976931348623157E+308
G: 1.7976931348623157E+308 (previously 1.79769313486232E+308)
G17: 1.7976931348623157E+308

0.84551240822557006:

R: 0.8455124082255701 (previously 0.84551240822557, which roundtrips to 0.84551240822556994)
G: 0.8455124082255701 (previously 0.84551240822557, which roundtrips to 0.84551240822556994)
G17: 0.84551240822557006
@danmosemsft

This comment has been minimized.

Copy link
Member

commented Jan 17, 2019

Do you aim to also fix G17: 1.1000000000000001 ?

Are you doing perf measurements?

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

Do you aim to also fix G17: 1.1000000000000001 ?

There is nothing to fix here, the user explicitly requested 17 digits.

Are you doing perf measurements?

Yes, I plan on getting some perf measurements here.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

Rebased onto dotnet/master

@tannergooding tannergooding force-pushed the tannergooding:roundtrip branch from 3bc18b1 to 8b104e3 Jan 17, 2019

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 17, 2019

Going through the CoreFX failures now. There are 206 of them, but most of them look to be bugs that have been resolved.

For example: Microsoft.VisualBasic.Tests.ConversionsTests.ToSingle_Obejct_ReturnsExpected

The input is -2147483648 and the expected result is -2.147484E+09.

The input, when converted to a float, is exactly -2147483650 (0xCF000000). The expected result, when parsed, actually results in -2147483900 (0xCF000001).

This PR causes the result to be -2.1474836E+09, which roundtrips to the expected value of -2147483650 (0xCF000000).

  • This new string is also "better" (shorter) than the previous string returned by R, which was -2.14748365E+09
  • This new string is longer (by one digit) than the previous string returned by G, but it is now correct.
THIRD-PARTY-NOTICES.TXT Outdated Show resolved Hide resolved
@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 24, 2019

Rebased and rerunning tests as the old jobs were since deleted.

@tannergooding tannergooding force-pushed the tannergooding:roundtrip branch 2 times, most recently from 781069e to d4b3c79 Jan 24, 2019

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 25, 2019

Added a new commit which ensures nMaxDigits is set appropriately:

  • digits=0 was only "invalid" for G, where it was treated the same as -1 (for others, it just removes digits after the decimal point).
  • For G and R when returning the shortest roundtrippable string, we need to ensure that nMaxDigits picks the higher of number.DigitsCount or Double/SinglePrecision, this ensures that numbers like -60 are printed as -60, rather than as -6E+01.
  • For all format specifies other than G and R, precision specifies the number of digits after the decimal point which should be printed. We need to ensure that we always request at least Double/SinglePrecision digits in these cases to ensure that we don't accidentally cut off digits in the fractional part.
    • Ideally we would actually always ensure that Grisu3/Dragon4 completely fills the integral portion and then also fills the fractional portion to the requested number of digits. This is possible, but it requires some additional work.

I tested the following values: -60, 1.1, double.Epsilon, double.MaxValue, 0.84551240822557006, Math.PI, and Math.E
with the following format specifiers: none, C, F, N, E, G, P, R; where each format specifier was tested standalone and with precisions of 0-19 (inclusive).

The results are here:

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 25, 2019

double.MaxValue is the only one, from the values tested, where the precision specifier shows how it impacts both the integral and fractional portion (bullet point 3 in the above). The fix (to always fill the integral portion for these specifiers) should be fairly straightforward, but we should determine that doing so is desirable.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

The latest commit (which is hopefully the last one, outside disabling the CoreFX tests in CoreFX.issues.json) fixes the formatters to take the format specifier into account when handling the precision.

As a basica summary, this means G and R (which are requesting a number of significant digits) continue behaving the same, but C, E, F, N, and P (which are requesting a number of decimal digits) now take that into account.

In the latter case, this means that the trailing digit (when more than 17 digits are requested) is no longer always 0, it also means that we always fill the integral portion and only use precision for the digits after the decimal point. Some examples are:

- C17  $1.10000000000000010
+ C17  $1.10000000000000009
- C18  $1.100000000000000090
+ C18  $1.100000000000000089
- C19  $1.1000000000000000890
+ C19  $1.1000000000000000888
...
- E17  4.94065645841246540E-324
+ E17  4.94065645841246544E-324
- E18  4.940656458412465440E-324
+ E18  4.940656458412465442E-324
-E19  4.9406564584124654420E-324
+ E19  4.9406564584124654418E-324
...
- F    179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00
+ F    179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
- F0   179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+ F0   179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
- F1   179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0
+ F1   179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0
...
- N17  123,456,789.01234600000000000
+ N17  123,456,789.01234567165374756
- N18  123,456,789.012346000000000000
+ N18  123,456,789.012345671653747559
- N19  123,456,789.0123460000000000000
+ N19  123,456,789.0123456716537475586
@jkotas

This comment has been minimized.

Copy link
Member

commented Jan 26, 2019

this means that the trailing digit (when more than 17 digits are requested) is no longer always 0

Why can't we make them always 0?

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

Why can't we make them always 0?

The user requested x digits after the decimal point and rather than giving them x digits after the decimal point, we would give them x significant digits (regardless of whether they appeared before or after the decimal point). This resulted in weird (and IMO incorrect) behavior for numbers that had more than 15-17 significant digits. The fix to always return the requested number of decimal digits was only a few lines of additional code (basically just adjusting where the cutoff point was).

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

This should all line up with what is documented and given in examples here: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings. They just don't use any numbers that have more than 15-17 significant digits in their examples.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

The diff file (showing a few example numbers) from the latest change is here: new_diff.txt. The baseline remains the same.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

The latest changes brings the CoreFX failure count from 206 down to 55. I have gone through all 55 tests and they are all instances where either:

  • We were returning a string that was too short so the number didn't roundtrip and now we are returning a slightly longer string that does (no more than 9 digits total for float and 17 digits for double), or
  • We were returning a string that did roundtrip, but there was also a shorter string that did the same, in which case we now return the latter.

An example of the first case is for 18446744073709551616, where:

We were returning:    double = 1.84467440737096E+19;   float = 1.844674E+19
We are now returning: double = 1.8446744073709552E+19; float = 1.8446744E+19

An example of the second case is for Epsilon, where:

We were returning:    double=4.94065645841247E-324; float=1.401298E-45
We are now returning: double=5E-324;                float=1E-45

Another example of the second case is for float -3.40282347E+38, where:

We were returning:    -3.40282347E+38
We are now returning: -3.4028235E+38
@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 26, 2019

There look to be a couple of asserts being hit in the Checked jobs as well (around LogBase2). I am looking into those.

@tannergooding tannergooding force-pushed the tannergooding:roundtrip branch 6 times, most recently from f848b11 to bfbaf07 Jan 27, 2019

@tannergooding tannergooding force-pushed the tannergooding:roundtrip branch from a2eedaa to f679baa Jan 31, 2019

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Jan 31, 2019

@eerhardt, @ahsonkhan. I believe I have responded to all feedback (either with an appropriate fix or a comment explaining the reasoning).

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

@ahsonkhan, any other feedback here?

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

Logged #22343 to track the potential perf improvements to the Number.BigInteger ref struct.

Will hold off on merging until after I get a PR for the CoreFX test fixes up.

@ahsonkhan

This comment has been minimized.

Copy link
Member

commented Feb 1, 2019

Will hold off on merging until after I get a PR for the CoreFX test fixes up.

In that case, marking as no merge.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Feb 1, 2019

CoreFX test fixes are here: dotnet/corefx#35016

Will merge this after I see the NetFX leg pass, the NetCore tests passed locally.

@tannergooding tannergooding merged commit a21b151 into dotnet:master Feb 1, 2019

25 checks passed

CentOS7.1 x64 Checked Innerloop Build and Test Build finished.
Details
CentOS7.1 x64 Debug Innerloop Build Build finished.
Details
Linux-musl x64 Debug Build Build finished.
Details
OSX10.12 x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu arm Cross Checked crossgen_comparison Build and Test Build finished.
Details
Ubuntu arm Cross Release crossgen_comparison Build and Test Build finished.
Details
Ubuntu x64 Checked CoreFX Tests Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test Build finished.
Details
Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
WIP Ready for review
Details
Windows_NT x64 Checked CoreFX Tests Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 Release CoreFX Tests Build finished.
Details
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test Build finished.
Details
Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0) Build finished.
Details
Windows_NT x86 Release Innerloop Build and Test Build finished.
Details
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
coreclr-ci Build #20190131.741 succeeded
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.