New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed round-trip bug of double.ToString("R"). #13140

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
9 participants
@mazong1123
Copy link
Collaborator

mazong1123 commented Aug 1, 2017

Removed the "15 digits precision first, then 17 digits precision" converting logic in double.ToString("R"). This change fixes https://stackoverflow.com/questions/24299692/why-is-a-round-trip-conversion-via-a-string-not-safe-for-a-double

Fix #13106

Fixed round-trip bug of double.ToString("R").
Removed the "15 digits precision first, then 17 digits precision" converting logic in double.ToString("R"). This change fixes https://stackoverflow.com/questions/24299692/why-is-a-round-trip-conversion-via-a-string-not-safe-for-a-double

Fix #13106
@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 1, 2017

@tannergooding @tarekgh @jkotas

Below are performance comparisons. Generally speaking, positive infinity, negative infinity, NaN becomes slower after the change; All other cases becomes much faster.

I guess the performance degradation relys on snprintf. Once #12894 merged, this performance degradation should be eliminated.

I fixed a bug of previous submitting. Now the performance of positive infinity, negative infinity, NaN and zeros after change applied is the same as old one, whereas other cases are nearly 2x faster than the old one.

Before change

Test Name Metric Iterations AVERAGE STDEV.S MIN MAX
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -∞) Duration 100 1.277 0.021 1.267 1.388
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -1.79769313486232E+308) Duration 64 158.625 0.450 157.799 160.515
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -2.2250738585072E-308) Duration 100 66.286 0.255 65.691 67.144
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -8.98846567431158E+307) Duration 61 166.598 1.642 165.728 178.609
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: ∞) Duration 100 1.284 0.042 1.267 1.533
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0) Duration 403 3.035 0.181 2.854 3.705
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0) Duration 403 3.035 0.181 2.854 3.705
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.1125369292536E-308) Duration 100 67.109 3.627 65.601 90.308
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.79769313486232E+308) Duration 64 158.598 0.702 157.859 163.283
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 104234.343) Duration 100 40.740 4.287 38.873 66.117
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 2.2250738585072E-308) Duration 100 66.148 0.250 65.518 66.832
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 2.71828182845905) Duration 100 28.122 0.197 27.514 28.772
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 3.14159265358979) Duration 100 33.248 19.837 27.757 226.554
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: NaN) Duration 100 1.331 0.051 1.261 1.388

After change

Test Name Metric Iterations AVERAGE STDEV.S MIN MAX
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -∞) Duration 100 1.485 0.448 1.243 2.844
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -1.79769313486232E+308) Duration 100 80.566 0.205 80.048 81.766
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -2.2250738585072E-308) Duration 100 34.441 0.303 33.717 35.385
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: -8.98846567431158E+307) Duration 100 84.495 0.542 83.812 89.444
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: ∞) Duration 100 1.257 0.019 1.249 1.386
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0) Duration 403 2.857 0.440 2.684 8.553
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 0) Duration 403 2.857 0.440 2.684 8.553
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.1125369292536E-308) Duration 100 34.360 0.338 33.770 36.018
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 1.79769313486232E+308) Duration 100 80.555 0.207 80.036 81.169
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 104234.343) Duration 100 37.697 0.214 37.193 38.355
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 2.2250738585072E-308) Duration 100 34.480 0.304 33.777 35.138
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 2.71828182845905) Duration 100 15.094 0.220 14.540 15.766
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: 3.14159265358979) Duration 100 15.252 0.384 14.499 17.041
perftest.DoubleToStringTest.ToStringWithFormat(format: "R", number: NaN) Duration 100 1.245 0.018 1.237 1.360
@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 1, 2017

I'll add the test case to CoreFX after this change officially included.

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 1, 2017

@dotnet-bot test Ubuntu x64 corefx_baseline
@dotnet-bot test Windows_NT x86 corefx_baseline

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 1, 2017

@jkotas @joperezr @AlexGhiondea please have a look.

note that, this is kind of breaking change but I think we should go with this change for net core as it is producing better results when formatting the floating numbers.

@mazong1123 LGTM. Thanks again for getting this done too.

@dotnet dotnet deleted a comment from dotnet-bot Aug 1, 2017

@dotnet dotnet deleted a comment from dotnet-bot Aug 1, 2017


DoubleToNumber(value, DOUBLE_PRECISION, &number);

// Per IEEE double-precision floating-point format, double number in 17 digits precision should be round-trippable.

This comment has been minimized.

@tannergooding

tannergooding Aug 1, 2017

Member

It might be useful to quote the exact IEEE statement here:

Conversions from a supported binary format bf to an external character sequence and back again results in a copy of the original number so long as there are at least Pmin(bf) significant digits specified and the rounding-direction attributes in effect during the two conversions are round to nearest rounding-direction attributes

With an additional note that, for binary64 (double), Pmin(binary64) is 17. The algorithm given is Pmin(bf) = 1 + ceiling(p * log10(2)) where p is the number of significant bits in bf (this is 53 for binary64).

This comment has been minimized.

@mazong1123

mazong1123 Aug 2, 2017

Collaborator

Sure. I'll add these comments.

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 1, 2017

I'll add the test case to CoreFX after this change officially included.

I have enabled the corefx on this PR to catch any failure for the tests written expecting the old results.

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 1, 2017

Interesting, looks many tests failed because of that change. The majority of the failures are in the serialization and Xml cases but I am seeing interesting cases in ComponentModel converters too. We'll need to look carefully if there are any wrong cases.

@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 2, 2017

@tarekgh I think this is why it is called a breaking change :).

One of the failed test case is:

10:12:26         Assert.Equal() Failure
10:12:26         Expected: 1.1
10:12:26         Actual:   1.1000000000000001

The representation of 1.1 in 17 digits precision is 1.1000000000000001. In the old ToString('R') logic, we end up with 15 digits which can be displayed as 1.1; After the change, it has to be 17 digits precision so as a string it has to be displayed as 1.1000000000000001. Because this test case is comparing string so it should fail.

To be clarified about the source path, In https://github.com/dotnet/corefx/blob/master/src/System.ComponentModel.TypeConverter/src/System/ComponentModel/DoubleConverter.cs#L47
we called:

return ((Double)value).ToString("R", formatInfo);

I'm getting started to understand why we added the "15 digits precision first, then 17 digits precision" logic.
It makes the string looks "pretty" but sacrifice the accuracy. It assumes that the edge cases should not happen often, and list the edge cases in MSDN to notify the user whenever you want a real "round-trip", please use "G17".

So in my opinion, we should introduce another special format specifier (may be "S" => "Special for displaying") which has the old "R" logic, and use that new format specifier in https://github.com/dotnet/corefx/blob/master/src/System.ComponentModel.TypeConverter/src/System/ComponentModel/DoubleConverter.cs#L47

return ((Double)value).ToString("S", formatInfo);

@jkotas @tarekgh @tannergooding any thoughts?

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 2, 2017

If the breaking changes are enough of a concern, then we would need to expose a new format specifier for the 17 digit IEEE compliant version (perhaps 'R17') and keep 'R' producing the current 15 digits.

Otherwise, users who like the "prettier" string can just use "G15" 😉

@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 2, 2017

@tannergooding I think keep R producing current 15 digits sounds like a "cheat" for users. MSDN said it's used for round-trip so user relies on that. Later on they find R cannot guarantee the round-trip so they may be frustrated. The purpose of user who specifies "R" should be roundtrip instead of displaying pretty string. That's the original stackoverflow question.

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 2, 2017

I think keep R producing current 15 digits sounds like a "cheat" for users.

To be clear, I agree. However, this behavior has been this way for nearly 17 years (going back to NetFX 1.0, as best as I can tell), so changing it at this point may be more harmful than helpful.

This is something that will likely have to go to through several compat/bar checks to determine:

  • Whether this breaking change is worthwhile for new users
  • Whether this breaking change is worthwhile for users porting existing code to netcore
  • Whether this breaking change is worthwhile back-porting to netfx
  • Whether this breaking change would require a quirk if back-ported to netfx

Additionally, if we determine that the breaking-change is not worthwhile, we need to determine how a user can get an IEEE compliant form of 'R'.

This change is already shown to break tests in CoreFX and could also introduce breaking changes in production code (such as Roslyn, various serialization or IPC mechanisms, etc). So we need to be careful, even if we think it is beneficial.

@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 2, 2017

@tannergooding sounds reasonable from compatibility's perspective.

Now it's another question - do we really need this change if the compatibility is a concern? Because if we keep the R's behavior, we do not need to introduce a new specifier R17. The user just can use G17 for guaranteed round-trip. That's what MSDN document said...

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 2, 2017

@mazong1123 considering the test results I am worried more about the app compat. as @tannergooding mentioned this can break the consumers of coreclr.
Also as pointed by you and in the msdn doc https://msdn.microsoft.com/en-us/library/kfsatb94(v=vs.110).aspx the app can still get the 17-digits formatting using G17 if needed to.

By default, the return value only contains 15 digits of precision although a maximum of 17 digits is maintained internally. If the value of this instance has greater than 15 digits, ToString returns PositiveInfinitySymbol or NegativeInfinitySymbol instead of the expected number. If you require more precision, specify format with the "G17" format specification, which always returns 17 digits of precision, or "R", which returns 15 digits if the number can be represented with that precision or 17 digits if the number can only be represented with maximum precision.

Considering that, now I prefer to keep the current behavior and not change it. what do you think?

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 2, 2017

@tarekgh, while G17 works if you explicitly know the type, I think we should provide a new Standard Numeric Format String for compliant round-trippable results (this provides better support for generics, boxed values, etc).

'R' currently works on BigInteger, Single, and Double. I, personally, want a format specifier that works for all three of these and maintains compliance, without having to remember the intricacies of the underlying format (9 for Single, 17 for Double, variant for BigInteger).

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 2, 2017

while G17 works if you explicitly know the type, I think we should provide a new Standard Numeric Format String for compliant round-trippable results (this provides better support for generics, boxed values, etc).

Well, introducing a new specifier will be just aliasing to G17 anyway. we can fix the doc of "R" and "G" to be explicit. if you think a new specifier will help, I will not object it but it will be really a low priority as currently we support all possible scenarios.

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 2, 2017

The point of the new specifier is that it would alias G17 for double and G9 for single (the user doesn't need to be "aware" of the internals, just that it works and will round trip appropriately).

I will not object it but it will be really a low priority

I'll probably add a formal proposal sometime later tonight.

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 2, 2017

The point of the new specifier is that it would alias G17 for double and G9 for single (the user doesn't need to be "aware" of the internals, just that it works and will round trip appropriately).

This can still be addressed in the docs. Also using G17 all the time wouldn't work?
to be clear, I am not objecting the new specifier but I am really seeing it is a very low priority to have it.

@AlexGhiondea

This comment has been minimized.

Copy link
Member

AlexGhiondea commented Aug 2, 2017

It sounds like the compat break is going to be high. I try to keep the current behavior as-is and figure out a different way forward (i.e a new specifier)

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 2, 2017

Also using G17 all the time wouldn't work

It would work for double. It would not work for BigInteger (as that can be arbitrary length) and you are requesting twice the number of digits for Single (where G9 is all you need for round-tripping).

I agree that it isn't very high priority and is mostly a "nice-to-have". However, it is also something that should be trivial to add and could easily be picked up by the community (or myself one evening).

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 2, 2017

Per discussion here, we should close this PR and wait for @tannergooding to provide the proposal for the new specifier and then we can discuss it with the design reviewers and proceed with the implementation.

@mazong1123, thanks for trying that and looking forward you participate in the new specifier work. let me know if you have any concern before closing this PR.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Aug 2, 2017

It sounds like the compat break is going to be high

Why do you think so? Number of failing corefx test is not necessarily a good measure of the compat break.

figure out a different way forward (i.e a new specifier)

I think that a new specifier for this would be very hard to explain. We would end up with two format specifiers that do the same thing, except that one of them is buggy.

@AlexGhiondea

This comment has been minimized.

Copy link
Member

AlexGhiondea commented Aug 2, 2017

My concern is that the CoreFx tests breaking is a symptom of what we'll see in the wild. In particular for serialization scenarios where data goes across different versions of the Framework.

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 2, 2017

@jkotas, in my mind, it's the types of tests that failed (serialization, parsing, etc) more than the number.

I think that investigating if this would break Roslyn and JSON.NET would be the minimum that should be done before the change is merged.

That being said, it really comes down to how the user is consuming it. Serializing a value using value.ToString("R") and then deserializing it using double.Parse() isn't likely to have any problems (since the string will still be parsable and it was always the case that ToString("R") could potentially return 17 digits), but its probably worth double checking to make sure that this won't break any existing production code.

The CoreFX tests are really only failing because we are checking that value.ToString("R") is returning a specific string.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Aug 2, 2017

I think that investigating if this would break Roslyn and JSON.NET

Agree. Data about whether this breaks real-world code would be useful.

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 14, 2017

Looks the breaking scenario is when the app expect the exact formatted double number that used to get before (I mean when the app persists such formatted string).

@tarekgh, It only when they compare the outputted string to another string that it is breaking.

If they persist the string, but parse it before any comparisons, then everything is fine.

double->string == string is broken after this change
double->string->double == double should continue working after this change
double->string matches format should continue working after this change

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 14, 2017

It only when they compare the outputted string to another string that it is breaking.

Yes, that is what I meant, they compare formatted strings. I am still keeping my preference I mentioned.

@joperezr

This comment has been minimized.

Copy link
Member

joperezr commented Aug 14, 2017

I think I agree with @tarekgh on this one. I would rather not take this due to potential breaks. It looks risky and we already have an alternative in place for it (G17) in case people need the roundtrip. Waiting on @AlexGhiondea and @jkotas to chime in, but I think we shouldn't take it.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Aug 14, 2017

"do nothing" is fine with me for now, but we may want to look at this together with the fixing the parser to be IEEE compliant. Fixing the parser would be a similar kind of observable change.

Note that we have been treating changes in floating point precision (precision improvements in particular) as acceptable changes, even in .NET Framework. People are sometimes surprised by it (e.g. #12737), and it breaks poorly written tests or numerically unstable algorithms occasionally (*). .NET Core is not meant to be bug-for-bug compatible from version to version, so I would think that precision improvements are no-brainer to take in .NET Core.

(*) I have debugged a pool game once. The ball flown out from the table on a new .NET version from time to time. I turned out that the ball impact algorithm was numerically unstable and floating point precision improvement caused it to not converge.

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 14, 2017

we may want to look at this together with the fixing the parser to be IEEE compliant

We already have the issue #1316 tracking fixing the parser. I think mainly we need to pick the implementation from Roslyn.

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 14, 2017

so I would think that precision improvements are no-brainer to take in .NET Core.

I would second this, especially when the precision improvements also bring a perf improvement like this change.

@gafter

This comment has been minimized.

Copy link
Member

gafter commented Aug 15, 2017

My preference here is just to keep the current behavior and update the documentation to ask people use G17 as needed. I'll leave it to @AlexGhiondea and @jkotas to advise about that.

The documentation for the "R" format specifier already describes these problems (bugs) and advises the use of "G17" to avoid them. See #13140 (comment)

@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 16, 2017

@gafter The document is asking user to use "R" which would introduce the issue discussing here. Althougth there's a notice in the middle of the doc, user may not notice that and it does not make sense to ask user to use "G17" for x64 and any cpu, whereas use "R" for x86. So IMO we should always ask user to directly use "G17" for roundtrip , and mark "R" as obsoleted if we decide to keep current R's logic.

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 16, 2017

you can log the doc issue here https://github.com/dotnet/docs

I would avoid saying R is obsolete in the doc. but I agree the doc should be clear recommending G17 for who want to guarantee the round tripping.

@mazong1123

This comment has been minimized.

Copy link
Collaborator

mazong1123 commented Aug 16, 2017

@tarekgh OK got it. I'm waiting for the conclusion here. Once we decide to keep current behavior, I'll file an issue at https://github.com/dotnet/docs

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 16, 2017

go ahead and file the doc issue. we'll keep the current behavior for now.

@karelz

This comment has been minimized.

Copy link
Member

karelz commented Aug 28, 2017

What is status of this PR? No update in last 2 weeks ...

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 28, 2017

Closing it. @joperezr @AlexGhiondea please re-open if you think otherwise.

@tarekgh tarekgh closed this Aug 28, 2017

@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017

@tannergooding

This comment has been minimized.

Copy link
Member

tannergooding commented Aug 28, 2017

@tarekgh, I still don't like that "Use G17" is the workaround. This causes additional overhead for generics or, places where you might be using a float instead of a double (G9 would be appropriate here), or places where you might be using a decimal instead of a double (G17 would cause a loss in precision).

If we ever extend things to support half precision or quad precision (half precision support may be partially available with the hardware intrinsic proposal from Intel), they will also have similar issues.

I think we need to either take the break and fix 'R', or provide a separate formatter that provides IEEE compliant round-tripping.

@tarekgh tarekgh modified the milestones: Future, 2.1.0 Aug 28, 2017

@tarekgh

This comment has been minimized.

Copy link
Member

tarekgh commented Aug 28, 2017

@tannergooding as @jkotas suggested before we'll look at the whole story of the formatting and parsing together. some people (including myself) showed app compat concern of changing "R" now.

I would suggest we should look more into mitigating the app compat concern before we introduce a new specifier. @AlexGhiondea, what you think about fixing "R" as this PR did with having AppContext switch to go back to the old behavior if needed? does this mitigate the app compat concern?

@gafter

This comment has been minimized.

Copy link
Member

gafter commented Aug 29, 2017

I would recommend as a fix a correct implementation of both formatting and conversion, rather than a workaround for bugs in the current implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment