Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905

Closed · wants to merge 7 commits

Conversation

@tannergooding (Member) commented Sep 12, 2018

This resolves the majority of the issues covered by #19802:

  • Ensures that double.Parse for -0 works correctly
  • Ensures that recovering NaN, PositiveInfinity, and NegativeInfinity is done case insensitively
  • Ensures that "R" always roundtrips to the required number of digits
  • Ensures that the "precision" specifier works up to 50 digits
    • 50 is the current internal limit for DoubleToNumber
    • IEEE requires at least 20 for Double and 12 for Single
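
For concreteness, a minimal sketch of the behaviors the list above targets (the output comments describe the intended post-change behavior and are assumptions, not verified output):

using System;
using System.Globalization;

class IeeeToStringParseSketch
{
    static void Main()
    {
        // -0 should parse with its sign bit intact (observable via the raw bits).
        double negZero = double.Parse("-0", CultureInfo.InvariantCulture);
        Console.WriteLine(BitConverter.DoubleToInt64Bits(negZero) != 0); // True: sign bit set

        // The special-value symbols should be recognized case-insensitively.
        Console.WriteLine(double.IsNaN(double.Parse("nan", CultureInfo.InvariantCulture)));
        Console.WriteLine(double.IsPositiveInfinity(double.Parse("infinity", CultureInfo.InvariantCulture)));

        // "R" should always emit enough digits for an exact round-trip.
        double third = 1.0 / 3.0;
        string r = third.ToString("R", CultureInfo.InvariantCulture);
        Console.WriteLine(double.Parse(r, CultureInfo.InvariantCulture) == third); // True

        // Precision specifiers beyond the old 15-digit default now honor the request (up to 50).
        Console.WriteLine((0.1).ToString("G20", CultureInfo.InvariantCulture)); // 20 significant digits
    }
}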

This should resolve (validated locally, can close after CoreFX tests are added):

This should partially resolve the following (both deal with ToString("R") but also involve parsing the result back):

Not Resolved:

  • #1316
  • #13615
    • double.MinValue and double.MaxValue require 17 digits to round-trip; otherwise an overflow occurs. The current default number of digits is 15 for any format that isn't "R".
  • #17467

This does not handle:

  • -nan and +nan, which IEEE requires we support parsing, but which we can parse to "regular" NaN
  • snan, which IEEE requires we support parsing, but which we can parse to "regular" NaN
    • support for preceding - and + is also required for snan
    • IEEE does not require us to support formatting snan, we are free to format to "regular" NaN.
  • NaN and SNaN payloads, which are optional
  • +Infinity (case insensitive), which IEEE requires we support parsing
@tannergooding (Member, Author) commented Sep 12, 2018

@@ -304,13 +304,20 @@ void DoubleToNumber(double value, int precision, NUMBER* number)
WRAPPER_NO_CONTRACT
_ASSERTE(number != NULL);

if (precision > NUMBER_MAXDIGITS)
{
precision = NUMBER_MAXDIGITS;

@tannergooding (Author, Member) · Sep 12, 2018

Given that the format specifier is currently limited to xx (that is 0-99), maybe we should just set NUMBER_MAXDIGITS to 99. Thoughts?

@tannergooding (Author, Member) · Sep 12, 2018

Also, can users go over 99 digits with custom format specifiers? If so, we may need to tweak the algorithm to pad 0's past the "internal limit" (as per the IEEE requirement).

Edit: To clarify, I mean tweaking the NumberToString algorithm. DoubleToNumber can still be internally limited however we need, provided it is at least 20 for double and 12 for single.
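
A hypothetical illustration of that padding idea (PadToRequestedPrecision and the digit-string form are invented for this sketch; the real code operates on the native NUMBER buffer):

// Hypothetical sketch: once the converter has produced all the digits its internal
// limit allows, any additional requested digits are filled with '0', per the IEEE
// allowance to pad past the correctly rounded digits.
static string PadToRequestedPrecision(string significantDigits, int requestedDigits)
{
    return requestedDigits <= significantDigits.Length
        ? significantDigits.Substring(0, requestedDigits)
        : significantDigits.PadRight(requestedDigits, '0');
}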

@tannergooding (Author, Member) · Sep 12, 2018

I pushed a commit which changes NUMBER_MAXDIGITS to 99

@tannergooding force-pushed the tannergooding:ieee-tostring branch from 7ef4638 to 134cd94 on Sep 12, 2018

fmt = 'G';
digits = DoublePrecision;
}
else if (digits > 0)

@tannergooding (Author, Member) · Sep 12, 2018

G results in digits = -1 and G0 results in digits = 0.

In both cases, this results in the default precision (17 for double and 9 for single).

For G0, this matches the current behavior. The IEEE spec indicates that the lower limit for significant digits converted should be 1. So, we could also just make G0 be treated as G1.

  • Thoughts/Opinions?
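
For concreteness, a sketch of the mapping being discussed (assuming using System is in scope; the comments reflect the mapping described above, and the G0-vs-G1 choice is still open):

double v = 1.0 / 3.0;
Console.WriteLine(v.ToString("G"));   // digits = -1: falls back to the default precision
Console.WriteLine(v.ToString("G0"));  // digits = 0: currently treated the same as "G"
Console.WriteLine(v.ToString("G1"));  // one significant digit, e.g. "0.3"
Console.WriteLine(v.ToString("G17")); // 17 significant digits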
@tannergooding (Member, Author) commented Sep 12, 2018

We will need to update some of the documentation for numeric format strings and the like: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

@tannergooding (Member, Author) commented Sep 12, 2018

This should also measurably increase perf for "R". I will try to get numbers tomorrow.

@tannergooding force-pushed the tannergooding:ieee-tostring branch from 134cd94 to e5ddb22 on Sep 12, 2018

{
// This ensures that, for the default case, we return a string containing the old
// number of digits (15), which results in "prettier" numbers for most cases.
digits = DefaultDoubleDigits;

@tannergooding (Author, Member) · Sep 12, 2018

Now, (0.1).ToString() will print 0.1 (as before). But requesting more than 15 digits will return the requested number of digits (up to 50).

For example, previously (0.1).ToString("G20") returned "0.10000000000000001" (which is the same as "G17"), but now will return "0.10000000000000000555" (as is required by the IEEE spec).

As mentioned above, the internal buffer for DoubleToNumber is set to 50, but the precision specifier supports up to 99, so it might be worth resizing it so that any requested number of digits works.

@tannergooding (Member, Author) commented Sep 12, 2018

CoreFX has 393 failures from this, most of which are XML failures due to the "R" fix (XmlWriter serializes using "R": https://source.dot.net/#System.Private.Xml/System/Xml/XmlConvert.cs,780)

@danmosemsft (Member) commented Sep 12, 2018

Can you give some examples of the values causing the failures? It looks like they use Assert.True rather than Assert.Equals, unfortunately, so the values aren't in the log.

@tannergooding (Member, Author) commented Sep 12, 2018

One example is...

Expected:

<RectangleF xmlns=\"http://schemas.datacontract.org/2004/07/System.Drawing\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001</x>
  <y>-2.5</y>
</RectangleF>

Actual:

<RectangleF xmlns=\"http://schemas.datacontract.org/2004/07/System.Drawing\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001001</x>
  <y>-2.5</y>
</RectangleF>

Most of the other examples are similar. That is, they are numbers where the returned string did not have enough digits to be an "exact" conversion, and which could therefore round-trip to a different bit value.
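
For reference, a bit-level round-trip check (a sketch assuming using System and System.Globalization); the old, shorter strings could fail a check like this, which is why "R" now emits more digits:

// Round-trip a value through its "R" string and compare raw bit patterns,
// rather than comparing the decimal strings themselves.
static bool RoundTripsExactly(double value)
{
    string text = value.ToString("R", CultureInfo.InvariantCulture);
    double parsed = double.Parse(text, CultureInfo.InvariantCulture);
    return BitConverter.DoubleToInt64Bits(parsed) == BitConverter.DoubleToInt64Bits(value);
}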

else
{
// IEEE requires we correctly round to the requested number of digits
precision = digits;

@tannergooding (Author, Member) · Sep 12, 2018

Had to update this logic so that we preserve the fact that the user didn't explicitly specify a digit count, or that the user specified 0 for the digit count.

This makes a difference for things like double.ToString("N"), which I had accidentally made behave like double.ToString("N15").

@tannergooding (Author, Member) · Sep 12, 2018

Now, digits is always preserved (except for "R", where it is explicitly documented to be ignored) and precision is updated accordingly.

@tannergooding (Author, Member) · Sep 12, 2018

This brought the CoreFX failure count down to 354; the remaining failures are all XML and JSON serialization differences, plus the UTF8Parser differences (which need fixing up, and for which the code lives in CoreFX).

@tannergooding force-pushed the tannergooding:ieee-tostring branch from e16dfd1 to db591d5 on Sep 12, 2018

@tannergooding (Member, Author) commented Sep 12, 2018

Dropped the commit that "Re-enables some tests temporarily disabled by #19775". The CoreFX-side change hasn't flowed back into CoreCLR yet.

@tannergooding (Member, Author) commented Sep 12, 2018

I believe I have all the relevant tests disabled now and almost have corresponding fixes for all of them on the CoreFX side.

@tannergooding (Member, Author) commented Sep 13, 2018

CoreFX Side PR fixing up all the tests is here: dotnet/corefx#32268

@jkotas, @danmosemsft, @eerhardt. This should be ready for final review now.

@jkotas (Member) commented Sep 13, 2018

#1316

I do not think that these changes are fixing #1316 (and other issues in your list related to parsing)

@tannergooding (Member, Author) commented Sep 13, 2018

The big issue in #1316 was actually mostly fixed in #19775.

For x64, most of the issues listed in #1316 were fixed (except for -0.0) back when we moved to the Grisu3/Dragon4 algorithms (which ensured proper rounding and parsing).
For x86, the parsing issues remained until #19775 when I removed the custom x86 implementation that was using the x87 FPU stack.

The remaining issues related to parsing finite values should just be -0.0, which is being handled by this PR.

@tannergooding (Member, Author) commented Sep 13, 2018

The other parsing issues also fall into the same category as #1316 and could be considered dupes.

I had tested a number of these scenarios locally already and was planning on porting the RealParserTests from Roslyn before finally closing all the bugs.

@tannergooding (Member, Author) commented Sep 13, 2018

However, #13615 in particular is no longer fixed by this PR.

We would need to special-case double.MinValue and double.MaxValue to print at 17 digits for the default format specifier ("G").
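
A sketch of that failure mode (assuming using System and System.Globalization; the exact string shown is an assumption about the 15-digit default):

// The 15-digit default rounds MaxValue up to a decimal string slightly larger than
// double.MaxValue, so parsing it back overflows.
string s = double.MaxValue.ToString(CultureInfo.InvariantCulture); // e.g. "1.79769313486232E+308"
double back = double.Parse(s, CultureInfo.InvariantCulture);       // overflows here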

@jkotas (Member) commented Sep 13, 2018

The big issue in #1316

I am not sure what you mean by the big issue in #1316. The repro code in the top post of #1316 still prints wrong values:

Console.WriteLine("{0:G17}  {0:R}", Double.Parse("0.6822871999174000000"));
Console.WriteLine("{0:G17}  {0:R}", Double.Parse("0.6822871999174000001"));

Actual result:

0.68228719991740006  0.68228719991740006
0.68228719991739994  0.68228719991739994

Expected result:

0.68228719991740006  0.68228719991740006
0.68228719991740006  0.68228719991740006
private const int FloatPrecision = 7;
private const int DoublePrecision = 15;

// IEEE Requires at least 9 digits for roundtripping. However, using a lower number

@jkotas (Member) · Sep 13, 2018

Which paragraph does require this? I am not able to find it.

@tannergooding (Author, Member) · Sep 13, 2018

"5.12.2 External decimal character sequences representing finite numbers"

― for binary32, Pmin (binary32) = 9
― for binary64, Pmin (binary64) = 17

Conversions from a supported binary format bf to an external character sequence and back again
results in a copy of the original number so long as there are at least Pmin (bf ) significant digits
specified and the rounding-direction attributes in effect during the two conversions are round to
nearest rounding-direction attributes.
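
(For reference, and not part of the quoted text: the standard derives these minima from the format's precision p as Pmin(bf) = 1 + ceil(p * log10(2)), which gives 9 for binary32, where p = 24, and 17 for binary64, where p = 53.)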

@jkotas (Member) · Sep 13, 2018

I do not read this as requirement. This paragraph just explains consequences of what is written above.

This says that round-tripping can be achieved by always having at least Pmin(bf) significant digits. It does not say that this is the only way to achieve round-tripping.

@jkotas (Member) commented Sep 13, 2018

Have you looked at what other languages do for floating-point number round-tripping, by any chance? I would like to know whether there is a pattern that most other environments follow.

@tannergooding (Member, Author) commented Sep 13, 2018

The repro code in the top post in #1316 still prints wrong values:

Ah, ok, I think I see the problem. It looks like (at the very least), we have a bug in our mantissa normalization logic here: https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/number.cpp#L662

The "naive" algorithm for normalization would be: Edit: Nevermind, this isn't the right algorithm either.

while (((val & 0xC000000000000000) != 0x8000000000000000) && ((val & 0xC000000000000000) != 0x4000000000000000))
{
    val <<= 1;
    exp--;
}

We look to be following this for everything except the last shift (where we incorrectly shift if the bit is zero).

@danmosemsft (Member) commented Sep 13, 2018

I wish there was a way of writing 0xC000000000000000 in C++ that meant I didn't have to laboriously count zeros. In C# of course we have 0xC000_0000_0000_0000 but I see many places in Corelib we don't do that.

@jkotas (Member) commented Sep 13, 2018

The "naive" algorithm for normalization would be:

I do not understand why we would need to care about the top two bits vs. just the one we care about today.

@mikedn (Contributor) commented Sep 13, 2018

I wish there was a way of writing 0xC000000000000000 in C++

0xC000'0000'0000'0000

@danmosemsft (Member) commented Sep 13, 2018

@mikedn thanks. Incidentally, I see that the VS debugger separately claims to support backticks, but in a quick try it didn't work for me.

@mikedn (Contributor) commented Sep 13, 2018

@danmosemsft That debugger thing is non-standard. C++ has supported digit separators since C++14: https://en.cppreference.com/w/cpp/language/integer_literal

@danmosemsft (Member) commented Sep 13, 2018

@mikedn, right, but it wasn't working in the debugger. Anyway, I'll stop distracting from the actual PR discussion... 😊

@tannergooding (Member, Author) commented Sep 13, 2018

I do not understand why we would need to care about the two top bits vs. just one that we care about today.

@jkotas, you're right. Not sure what I was thinking for that....

In any case, the current normalization logic is doing the following:

if ((val & I64(0xFFFFFFFF00000000)) == 0) { val <<= 32; exp -= 32; }
if ((val & I64(0xFFFF000000000000)) == 0) { val <<= 16; exp -= 16; }
if ((val & I64(0xFF00000000000000)) == 0) { val <<= 8; exp -= 8; }
if ((val & I64(0xF000000000000000)) == 0) { val <<= 4; exp -= 4; }
if ((val & I64(0xC000000000000000)) == 0) { val <<= 2; exp -= 2; }
if ((val & I64(0x8000000000000000)) == 0) { val <<= 1; exp -= 1; }

index = absscale & 15;
if (index) {
    INT multexp = rgexp64Power10[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10[index + ((scale < 0) ? 15 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

index = absscale >> 4;
if (index) {
    INT multexp = rgexp64Power10By16[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10By16[index + ((scale < 0) ? 21 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

Before the normalization occurs, val contains the decimal representation of the double-precision number (6822871999174)

After the normalization is finished, val contains the following bit string:

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 1 000 0000000
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

* x is the implicit bit
* y is the explicit bits
* z is the rounding bit
* w is the remaining bits (which are unused)

However, the correct bit string is:

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 0 111 1111100
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww
@jkotas (Member) commented Sep 13, 2018

Right, I think it is because Mul64Lossy is a wrong optimization. It needs to be a full-precision multiplication instead.

@tannergooding (Member, Author) commented Sep 13, 2018

Right, I think it is because Mul64Lossy is a wrong optimization. It needs to be a full-precision multiplication instead.

That might be the case. Mul64Lossy has a comment that "the error won't propagate to any of the 53 significant bits of the result". However, we need to ensure that the 54th bit is accurate as well, for rounding purposes.

However, Mul64Precise appears to compute an incorrect result as well (same as before, but with the last w bit now set to 1):

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 1 000 0000001
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

It looks like there is a comment right above the pre-computed tables indicating that Mul64Precise may also be lossy:

//
// precomputed tables with powers of 10. These allows us to do at most
// two Mul64 during the conversion. This is important not only
// for speed, but also for precision because of Mul64 computes with 1 bit error.
//
@tannergooding (Member, Author) commented Sep 13, 2018

OK, so Mul64Precise is effectively trying to compute 6822871999174 * 1e-13. However, it is doing that by simulating a double * double operation in software.

The result it returns is correct for double * double, but is incorrect for computing the "infinitely precise result, and then rounding" (this is because 1e-13 is not "exactly representable").
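
A quick way to see the distinction (a sketch assuming using System and System.Globalization; whether the two bit patterns differ depends on which parser produced each value):

// The compiler parses the literal below at compile time (Roslyn uses its own
// correctly rounded real parser), while the second value reproduces a plain
// double * double multiply, where 1e-13 is already rounded to the nearest double.
double correctlyRounded = 0.6822871999174;
double product = 6822871999174.0 * 1e-13;

// If the bit patterns differ, the multiply landed one ulp away from the
// correctly rounded result.
Console.WriteLine(BitConverter.DoubleToInt64Bits(correctlyRounded));
Console.WriteLine(BitConverter.DoubleToInt64Bits(product));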

@jkotas, I would think the appropriate direction here is:

  1. Merge this PR, which is a net improvement for several scenarios (provided you have no additional feedback on the current code)
  2. Open a separate PR porting the Roslyn "Real Parser" to CoreFX (dotnet/roslyn#5871 and dotnet/roslyn#5379).

From the Roslyn PR:

This is a port of real conversion routines used in the C++ front end written by @JamesMcNellis. There is a new separate Windows-specific solution for running a large number of random tests compared against the C standard library, which is based on David Gay's code (the "gold standard").

@jkotas (Member) commented Sep 13, 2018

Could you please update the list of issues that this PR is fixing at the top to match reality?

@tannergooding (Member, Author) commented Sep 13, 2018

Could you please update the list of issues that this PR is fixing at the top to match reality?

Fixed in the original post.

This fully resolves #16783, as -0 will now fully roundtrip:

This partially resolves #3313 and #13106, as the numbers will format correctly under "R", but they may not parse correctly when reading them back.

@tannergooding (Member, Author) commented Sep 13, 2018

Have you looked at what other languages do for floating point number round-tripping by any chance? I would like to know whether there is pattern that most other environments follow.

I looked at a number of other languages (Java, C/C++, Python, Rust) and didn't see any that had an explicit "round-trip" format specifier like .NET has.

For the other specifiers, they all seem to have some documented default (which is not the IEEE value) that can be explicitly overridden.

For example, in C/C++, they default to 6 digits of precision, but otherwise return the number requested (trimming trailing zeros for some specifiers, but not others).

printf("%f\n", 1.1); // 1.100000
printf("%g\n", 1.1); // 1.1
printf("%e\n", 1.1); // 1.100000e+00
printf("%a\n", 1.1); // 0x1.199999999999ap+0

printf("%.15f\n", 1.1); // 1.100000000000000
printf("%.15g\n", 1.1); // 1.1
printf("%.15e\n", 1.1); // 1.100000000000000e+00
printf("%.15a\n", 1.1); // 0x1.199999999999a00p+0

printf("%.17f\n", 1.1); // 1.10000000000000009
printf("%.17g\n", 1.1); // 1.1000000000000001
printf("%.17e\n", 1.1); // 1.10000000000000009e+00
printf("%.17a\n", 1.1); // 0x1.199999999999a0000p+0
@jkotas (Member) commented Sep 13, 2018

Java's toString(double) seems to be spec'd to return a number that is both pretty and round-trippable. From https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html: There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value.

@jkotas (Member) commented Sep 13, 2018

Python's default conversion to string is described as round-trippable. From https://docs.python.org/3/library/functions.html#repr: attempt to return a string that would yield an object with the same value when passed to eval(). Based on a quick test, it returns pretty strings too.

@jkotas (Member) commented Sep 13, 2018

Are Java and Python really both pretty and round-trippable by default?

@tannergooding (Member, Author) commented Sep 14, 2018

Briefly testing out a range of numbers in Java, it looks like yes. They looked to be correctly rounded as well.

I'm not sure some of the numbers would qualify as "pretty", however. And I am unsure of the performance of their algorithm.

@jkotas (Member) commented Sep 14, 2018

the numbers would qualify as "pretty" however

"as many, but only as many, more digits as are needed" is a good definition of a "pretty" round-trippable formatted floating-point number, I think.

@tannergooding (Member, Author) commented Sep 14, 2018

Hmm, actually, looking around on the internet, and testing some additional values, it looks like Java does not always return the shortest string. In some cases, it will return more than 17 digits, and the result is not always correctly rounded.

A decent list of these issues can be found here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638, which also includes some papers describing how to "properly" do this (and do it efficiently).

@tannergooding (Member, Author) commented Sep 24, 2018

Closing for the time being. This needs some more cleanup and I want to do a bit more investigation into printing the shortest roundtrippable string.
