
Float formatting changed behavior between 2.0 and 2.1 #31847

Closed
eerhardt opened this issue Aug 20, 2018 · 27 comments


@eerhardt (Member) commented Aug 20, 2018

Run the following program on .NET Core 2.0 and 2.1:

static void Main(string[] args)
{
    float f = BitConverter.Int32BitsToSingle(0x435d4200);//221.2578125f

    Console.WriteLine(new StringBuilder().AppendFormat("{0:R}", f));

    Console.WriteLine(f.ToString("R", null));
    Console.WriteLine(f.ToString("G9"));
}

Expected result
You should get the same output between 2.0 and 2.1.

Actual result

Using .NET Core 2.0 you get the output:

221.257813
221.257813
221.257813

Using .NET Core 2.1 you get the output:

221.257812
221.257812
221.257812

What's even more odd is that when I "Watch" the float value in the debugger, I always get the 2.0 value. So, even when debugging on 2.1, I see different values in the Watch window:

[screenshot: debugger Watch window showing f as 221.257813]

It appears the number formatting code was rewritten from C++ to C# between 2.0 and 2.1, so I suspect that rewrite caused this behavior change.

For the G9 behavior, MSDN (https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings) says

When precision specifier controls the number of fractional digits in the result string, the result strings reflect numbers that are rounded away from zero (that is, using MidpointRounding.AwayFromZero).

@stephentoub
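For what it's worth, the 9-digit tie can be reproduced with any correctly rounded, ties-to-even formatter; a short Python sketch (Python is used here purely for illustration, it is not part of .NET):

```python
# The bit pattern 0x435d4200 is exactly 221.2578125, so formatting to
# nine significant digits hits a genuine tie at the last digit
# (221.257812|5). Python's formatter is correctly rounded and breaks
# ties to even, matching the .NET Core 2.1 output.
print(format(221.2578125, '.9g'))  # 221.257812
```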

eerhardt added a commit to eerhardt/machinelearning that referenced this issue Aug 20, 2018
@EgorBo (Contributor) commented Aug 21, 2018

I suspect dotnet/coreclr#12894 (DoubleToNumber).
Btw, Mono uses Number.Formatting.cs from corefx (but doesn't use Number.CoreCLR) and prints the expected 221.257813.
I have a local copy of CoreCLR with CoreRT's DoubleToNumber (https://github.com/dotnet/corert/blob/master/src/System.Private.CoreLib/src/System/Number.Windows.cs#L12-L39) and it also prints the expected value.

@danmosemsft (Member) commented Aug 21, 2018

Looking.

@danmosemsft (Member) commented Aug 21, 2018

When the value reaches the native code, it ends in 8125:

		value	221.25781250000000	double
>	coreclr.dll!Dragon4(double value, int count, int * dec, int * sign, wchar_t * digits) Line 153	C++
 	coreclr.dll!DoubleToNumberWorker(double value, int count, int * dec, int * sign, wchar_t * digits) Line 307	C++
 	coreclr.dll!DoubleToNumber(double value, int precision, NUMBER * number) Line 323	C++
 	coreclr.dll!COMNumber::DoubleToNumberFC(double value, int precision, NUMBER * number) Line 751	C++

Then the old and new implementations seem to differ in how to handle the rounding of the 5.

The new implementation rounds toward the even digit ("IEEE rules") here

The old implementation always rounds up if it's 5 here
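The two tie-breaking rules can be compared directly on the decimal digit string; here 221.2578125 is the exact decimal expansion of the binary32 value (a Python sketch using the stdlib decimal module):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

exact = Decimal('221.2578125')  # exact decimal expansion of the float
six = Decimal('1.000000')       # quantize to six fractional digits

# New implementation: ties round to the even digit (IEEE rules).
print(exact.quantize(six, rounding=ROUND_HALF_EVEN))  # 221.257812

# Old implementation: a trailing 5 always rounds up (away from zero).
print(exact.quantize(six, rounding=ROUND_HALF_UP))    # 221.257813
```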

@danmosemsft (Member) commented Aug 21, 2018

Next step is probably to build a debug runtime to find out why 221.257813f becomes 221.2578125 there.

@danmosemsft (Member) commented Aug 21, 2018

It is because the float is being widened to a double. If I have (in C++)

    int i = 0x435d4200;
    float f = *((float*)&i);
    double d = f;

then in the debugger I get

  | d | 221.25781250000000 | double
  | f | 221.257813 | float
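The same widening can be checked without C++ pointer tricks; a Python sketch (Python's float is binary64, and struct decodes the binary32 bit pattern):

```python
import struct

# Decode the binary32 bit pattern 0x435d4200; unpacking widens the
# value into a Python float (binary64).
f32 = struct.unpack('<f', struct.pack('<I', 0x435d4200))[0]

# The widening is exact, just like `double d = f;` in the C++ snippet:
# the binary32 value is 221.2578125, and so is the binary64 value.
print(f32)                   # 221.2578125
print(format(f32, '.14f'))   # 221.25781250000000, as the debugger shows
```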
@eerhardt (Member, Author) commented Aug 21, 2018

The new implementation rounds toward the even digit ("IEEE rules") here

That seems inconsistent with the rest of the formatting code. For example:

float f = 8.125f;
Console.WriteLine(f.ToString("G3", null));
Console.WriteLine(f.ToString("N2", null));

produces

8.13
8.13
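For comparison, a formatter that applied ties-to-even uniformly would print 8.12 here, since 8.125 is exactly representable and 8.12|5 is a genuine halfway case (a Python illustration, not .NET code):

```python
# 8.125 = 65/8 is exact in binary floating point, so rounding to three
# significant digits (or two fractional digits) is a true tie.
print(format(8.125, '.3g'))  # 8.12 under ties-to-even
print(format(8.125, '.2f'))  # 8.12 under ties-to-even
```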

Reading https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

When precision specifier controls the number of fractional digits in the result string, the result strings reflect numbers that are rounded away from zero (that is, using MidpointRounding.AwayFromZero).

@danmosemsft (Member) commented Aug 21, 2018

I'll throw up a PR for wider review.

eerhardt added a commit to dotnet/machinelearning that referenced this issue Aug 21, 2018
Move to netcoreapp2.1 (#690)
* Move to netcoreapp2.1

* PR Feedback

* Fix F# test to build with latest tools

* Fix up a couple more netcoreapp2.0 usages

* Update baselines to work around .NET Core 2.1 change

See dotnet/corefx#31847

* Bump the CLI version to 401.
Add a comment to F# test project.
@danmosemsft (Member) commented Aug 21, 2018

Actually, before I do this, since this is not my area of expertise, I would like to get the opinions of @tannergooding and @mazong1123 as well. Do you see any reason not to use "round half away from zero", which appears to return us to the 2.0/.NET Framework behavior?

In this example, the rounding applied after widening to a double causes the formatted number to get smaller, so rounding up produces a more correct result when we convert back to float. Are there instances where it makes things worse?

One of the cited documents points out that the choice we make when we parse a string into a float is also relevant, because "If you know this rule in both algorithms, there are times when you can print out a unique representation of a number with one less digit."

@Tornhoof (Contributor) commented Aug 21, 2018

Since you mentioned the page about float-to-string conversion: there is a fairly recent alternative to the Grisu and Dragon4 algorithms, Ryū:
https://dl.acm.org/citation.cfm?id=3192369
https://pldi18.sigplan.org/event/pldi-2018-papers-ry-fast-float-to-string-conversion
https://github.com/ulfjack/ryu

@danmosemsft (Member) commented Aug 21, 2018

@Tornhoof that is interesting; it claims to be 3x faster. That is worth opening a new issue, if you want to.

@tannergooding (Member) commented Aug 22, 2018

@danmosemsft, the new implementation is correct and (to my knowledge) is following the IEEE rounding rules.

As for "why" it is becoming 221.2578125, you have to keep in mind that you are doing "string" -> "binary32" -> "string". The first step ("string" -> "binary32") is still a conversion and will not always produce exactly the value you entered. 221.257813, when parsed according to IEEE rules, becomes 221.2578125 (the closest number representable in the IEEE 754:2008 binary32 format).

The second ("binary32" -> "string") is also a conversion and partially depends on the format specifier used. The debugger, depending on what runtime it is using for its evaluator, and what it is using for the format specifier, will ultimately decide what is getting displayed here.
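The first step can be checked directly; a Python sketch that rounds through the binary32 format with struct:

```python
import struct

def to_f32(x):
    # Round a binary64 value to the nearest binary32, then widen it back.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Parsing "221.257813" and rounding to binary32 yields 221.2578125,
# the closest value representable in the binary32 format.
print(to_f32(float('221.257813')))  # 221.2578125
```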

@danmosemsft (Member) commented Aug 22, 2018

@tannergooding, as I understand it, IEEE 754 allows any of five rounding rules, although round-to-nearest may be the default. If that's the case, which is the right one here?

@tannergooding (Member) commented Aug 22, 2018

That seems inconsistent with the rest of the formatting code.

@eerhardt, "R" (round-trippable) is also special and completely ignores the precision specifier. I would also think it would be better for Single and Double to follow the default IEEE behavior for rounding, and that users would be expected to explicitly call Math.Round if they wanted different behavior (the IEEE behavior will generally ensure more accurate handling of IEEE numbers overall).

@tannergooding (Member) commented Aug 22, 2018

@danmosemsft: Round to nearest is the default and the only one supported by the runtime, outside of using Math.Round.

ECMA-335: I.12.1.3 Handling of floating-point data types

The rounding mode defined in IEC 60559:1989 shall be set by the CLI to “round to the nearest number,” and neither the CIL nor the class library provide a mechanism for modifying this setting. Conforming implementations of the CLI need not be resilient to external interference with this setting. That is, they need not restore the mode prior to performing floating-point operations, but rather, can rely on it having been set as part of their initialization.

@danmosemsft (Member) commented Aug 22, 2018

Sounds fine to me.

For your ECMA quote: is "round to the nearest number" defined as "break ties towards zero"?

@tannergooding (Member) commented Aug 22, 2018

No, "break ties towards zero" is the equivalent to the IEEE "roundTowardZero" mode:

roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result.

"round to the nearest number" is equivalent to the IEEE "roundTiesToEven" mode (which is the IEEE default for binary floating-point numbers)

the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered
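The distinction is easy to see on halfway cases; for example, Python's built-in round implements roundTiesToEven:

```python
# roundTiesToEven: halfway cases go to the nearest even value.
print(round(0.5))  # 0
print(round(1.5))  # 2
print(round(2.5))  # 2, not 3: the tie breaks toward the even neighbor

# roundTowardZero, by contrast, truncates toward zero.
print(int(1.5), int(-1.5))  # 1 -1
```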

@danmosemsft (Member) commented Aug 22, 2018

OK, I'm satisfied. This has been educational for me. @eerhardt?

@eerhardt (Member, Author) commented Aug 22, 2018

I'm not convinced the new behavior is "correct".

"R" (round-trippable) is also special and completely ignores the precision specifier.

If you look at the implementation, "R" first tries "G7", and if that doesn't round-trip, it falls back to "G9". See https://github.com/dotnet/coreclr/blob/fea6dcacb820057eb24a25b97c337b5dcf881301/src/System.Private.CoreLib/shared/System/Number.Formatting.cs#L494-L519

                case 'R':
                case 'r':
                    {
                        // In order to give numbers that are both friendly to display and round-trippable, we parse the
                        // number using 7 digits and then determine if it round trips to the same value. If it does, we
                        // convert that NUMBER to a string, otherwise we reparse using 9 digits and display that.
                        DoubleToNumber(value, FloatPrecision, ref number);
                        if (number.scale == ScaleNAN)
                        {
                            return info.NaNSymbol;
                        }
                        else if (number.scale == ScaleINF)
                        {
                            return number.sign ? info.NegativeInfinitySymbol : info.PositiveInfinitySymbol;
                        }


                        if ((float)NumberToDouble(ref number) == value)
                        {
                            NumberToString(ref sb, ref number, 'G', FloatPrecision, info, isDecimal: false);
                        }
                        else
                        {
                            DoubleToNumber(value, 9, ref number);
                            NumberToString(ref sb, ref number, 'G', 9, info, isDecimal: false);
                        }
                        return null;
                    }

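The try-G7-then-G9 logic quoted above can be sketched outside .NET; a rough Python approximation (to_float32 and format_r are hypothetical helpers, using struct to round through binary32):

```python
import struct

def to_float32(x):
    # Round a binary64 value through the binary32 format.
    return struct.unpack('<f', struct.pack('<f', x))[0]

def format_r(value):
    # Try 7 significant digits first; if that string round-trips back to
    # the same binary32 value, use it, otherwise fall back to 9 digits.
    s = format(value, '.7g')
    if to_float32(float(s)) == value:
        return s
    return format(value, '.9g')

print(format_r(to_float32(221.2578125)))  # 221.257812 (9 digits needed)
print(format_r(1.5))                      # 1.5 (7 digits round-trip)
```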
So following this, here's a case where we are explicitly breaking MSDN-documented behavior:

https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

When precision specifier controls the number of fractional digits in the result string, the result strings reflect numbers that are rounded away from zero (that is, using MidpointRounding.AwayFromZero).

float f = BitConverter.Int32BitsToSingle(0x435d4200);//221.2578125f
Console.WriteLine(f.ToString("G9"));

On netcoreapp2.0, this program produces 221.257813.
On netcoreapp2.1, this program produces 221.257812.

I've added this scenario to the original post.

@tannergooding (Member) commented Aug 22, 2018

@eerhardt and I talked a bit offline.

My recommendation is to update the MSDN documentation to indicate that on .NET Core 2.1 or later we use "round ties to even" logic. Additionally, if required, we should consider adding a quirk mode for people who need the back-compat (roundTiesToAway) behavior.

From "IEEE 754:2008, 5.12 Details of conversion between floating-point data and external character sequences"

Implementations shall provide conversions between each supported binary format and external decimal character sequences such that, under roundTiesToEven, conversion from the supported format to external decimal character sequence and back recovers the original floating-point representation, except that a signaling NaN might be converted to a quiet NaN.
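That clause can be exercised for binary32, where 9 significant decimal digits are always enough to recover the original value (a Python sketch, not .NET code):

```python
import struct

def to_f32(x):
    # Round through the binary32 format.
    return struct.unpack('<f', struct.pack('<f', x))[0]

v = to_f32(221.2578125)
s = format(v, '.9g')           # 9 digits always round-trip binary32
assert to_f32(float(s)) == v   # the original value is recovered
print(s)                       # 221.257812
```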

@mazong1123 (Collaborator) commented Aug 23, 2018

I may be coming a little late, but I agree with @tannergooding. Also, the old "R" behavior was not compliant and, as I remember, introduced round-trip issues. We had a lot of discussion regarding back-compat vs. compliance.

@ahsonkhan (Member) commented Aug 28, 2018

Another example of a discrepancy with a fairly trivial repro (useful test case for future reference):

double value1 = 88.102843;
double value2 = 88.102842;

Console.WriteLine(value1.ToString("R")); // 88.102842999999993
Console.WriteLine(value2.ToString("R")); // 88.102842

Also, the debugger is showing the same, incorrect value (it is most likely just calling ToString under the covers):
[screenshot: debugger showing value1 as 88.102842999999993]

@jkotas, thanks for linking me to this issue.

@tannergooding (Member) commented Aug 28, 2018

@ahsonkhan, I actually don't see what is wrong with your example. What are you expecting the values to be?

@ahsonkhan (Member) commented Aug 28, 2018

I actually don't see what is wrong with your example. What are you expecting the values to be?

I expect the following:

double value1 = 88.102843;
Assert.Equal("88.102843", value1.ToString("R")); // Fails and returns 88.102842999999993
@tannergooding (Member) commented Aug 28, 2018

That is an incorrect expectation.

String -> Double -> String is never guaranteed to roundtrip, because a string may have a value that is not exactly representable by a double (as the case you gave shows).

  • This is the sequence happening here, because the compiler has to take the string "88.102843" from source code and parse it into a double during compilation

Round-tripping is only guaranteed for double -> string -> double, where the exact double value can be converted to an exact 17-digit string, and that string will produce an identical double value when parsed.
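The same distinction shows up in Python, where repr yields the shortest string that round-trips the double, and 17 significant digits always round-trip (an illustration outside .NET):

```python
d = float('88.102843')  # nearest binary64 to the decimal literal

# Shortest round-trippable form: parses back to the identical double.
print(repr(d))            # 88.102843
assert float(repr(d)) == d

# The 17-digit form also round-trips, but exposes that the stored
# binary64 value is not exactly the decimal that was typed.
print(format(d, '.17g'))  # 88.102842999999993
assert float(format(d, '.17g')) == d
```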

@tannergooding (Member) commented Aug 28, 2018

For reference, in the case of your code, 88.102843 cannot be exactly represented by the double-precision format and the closest representable number (according to the IEEE rounding rules) is 88.102842999999993 (binary representation: 0x40560694FACE67D7).

The two surrounding values are:

  • 88.102842999999979 (binary representation: 0x40560694FACE67D6)
  • 88.102843000000007 (binary representation: 0x40560694FACE67D8)

The input string 88.102843 is equally near to the selected value 88.102842999999993 and to 88.102843000000007 (it is 0.000000000000007 away from each). The default IEEE rounding rules say that, in this scenario, the value with the even least significant digit must be chosen (from its bit representation), which gives us 88.102842999999993.
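The bit patterns quoted above can be verified with a short script (a Python sketch; struct exposes the raw binary64 encoding):

```python
import struct

def bits(x):
    # Raw IEEE 754 binary64 encoding of a Python float, as an integer.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def from_bits(b):
    # Inverse: reinterpret a 64-bit integer as a binary64 value.
    return struct.unpack('<d', struct.pack('<Q', b))[0]

b = bits(88.102843)
print(hex(b))                            # 0x40560694face67d7
print(format(from_bits(b - 1), '.17g'))  # 88.102842999999979
print(format(from_bits(b + 1), '.17g'))  # 88.102843000000007
```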

eerhardt added a commit to eerhardt/machinelearning that referenced this issue Oct 9, 2018
Enable a QuantileRegression test
Enable CommandTrainScoreEvaluateQuantileRegression, add the dataset and the necessary baselines.

The only baseline changes were of the form that were caused by dotnet/corefx#31847.
eerhardt added a commit to dotnet/machinelearning that referenced this issue Oct 10, 2018
Enable a QuantileRegression Test & Fix Duplicated Baseline Files (#1193)

* Enable a QuantileRegression test

Enable CommandTrainScoreEvaluateQuantileRegression, add the dataset and the necessary baselines.

The only baseline changes were of the form that were caused by dotnet/corefx#31847.

* Allow BaseTestBaseline to check Common first.

A lot of baseline tests have duplicated baselines between debug and release. Allow BaseTestBaseline to check the Common baseline directory for a baseline file first.

Fix #410

* PR feedback

Add more info to the README.
Add column headers to the housing.txt data file.
@tannergooding (Member) commented Nov 29, 2018

@eerhardt, do you believe this has been sufficiently discussed/resolved? If so, can you close?

@eerhardt (Member, Author) commented Nov 29, 2018

Yes, this can be closed now that we have updated the documentation.

@eerhardt eerhardt closed this Nov 29, 2018

@karelz karelz added this to the 3.0 milestone Dec 21, 2018
