Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make converting a double/float to a string (or vice-versa) IEEE compliant #19802

Closed
tannergooding opened this issue Aug 31, 2018 · 9 comments

Comments

@tannergooding
Copy link
Member

commented Aug 31, 2018

This tracks essentially all the IEEE 754:2008 requirements:

Positive Zero

  • Produce a recoverable string from Positive Zero
  • Recover a string from Positive Zero

Negative Zero

  • Produce a recoverable string from Negative Zero
  • Recover a string from Negative Zero

Positive Infinity

  • Produce a recoverable string from Positive Infinity
  • Recover a string from Positive Infinity
  • Produce a string matching one of "inf" or "infinity" (case-insensitive) with an optional preceding '+'
  • Recover a string matching one of "inf" or "infinity" (case-insensitive) with an optional preceding '+'

Negative Infinity

  • Produce a recoverable string from Negative Infinity
  • Recover a string from Negative Infinity
  • Produce a string matching one of "-inf" or "-infinity" (case-insensitive)
  • Recover a string matching one of "-inf" or "-infinity" (case-insensitive)

Quiet NaN

  • Produce a recoverable string from quiet NaN
  • Recover a string from quiet NaN
  • Produce a string matching "nan" (case-insensitive), with an optional preceding sign
  • Recover a string matching "nan" (case-insensitive), with an optional preceding sign
  • Optionally, produce and recover quiet NaN which appends the NaN payload

Signalling NaN

  • Produce a recoverable string from signalling NaN
  • Recover a string from signalling NaN (recovery to quiet NaN is allowed)
  • Produce a string matching "nan" or "snan" (case-insensitive), with an optional preceding sign
  • Recover a string matching "snan" (case-insensitive), with an optional preceding sign (recovery to quiet NaN is allowed)
  • Optionally, produce and recover signalling NaN which appends the NaN payload

Finite (non-zero) Numbers

  • Optionally, provide a means to specify the number of digits output
  • Must preserve the sign
  • Round-tripping should behave correctly under the default rounding mode ("tiesToEven")
  • Round-tripping requires M digits, where M=17 for double and M=9 for single
  • The lower limit for significant digits converted should be 1
  • The upper limit H, if any, on significant digits converted must be >= M + 3 (20 for double and 12 for single)
  • Conversions for which more than H digits are requested must be padded with zeros
  • Conversions from a string of more than H digits must be correctly rounded to H digits and then converted
@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2018

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2018

Completed: Negative Zero should be covered by #19775

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2018

Negative/Positive infinity are interesting as today we produce the unicode symbol, while the spec requires us to produce (and support) inf or infinity (case-insensitive)

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2018

NaN should be trivial if we don't want to support the optional parts, we just need to change the parsing to case insensitive, and also support "snan" being recovered to double.NaN

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2018

The Finite Numbers section shouldn't be overly complicated to support, as our existing implementation should handle it all correctly. We should just need to increase the "limits" defined in a few places

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 11, 2018

For both parsing and formatting values we use the IFormatProvider given (which defaults to CurrentInfo.

I would presume the desired direction is to always strictly follow the IFormatProvider given and to ensure that NumberFormatInfo.InvariantInfo is strictly IEEE compliant. For English, this will mean that the default provider is non-IEEE compliant. However, InvariantInfo will be compliant (and already does the right thing for the most part).

This would, for netcore3.0, result in the following new behavior...

When Formatting:

  • Negative sign will be given whenever relevant (this impacts -0.0 for double, float, and decimal)

When Parsing:

  • Negative sign will be preserved whenever relevant (this impacts -0.0 for double, float, and decimal)
  • NaNSymbol, PositiveInfinitySymbol, and NegativeInfinitySymbol will be parsed as OrdinalIgnoreCase (today it is parsed as Ordinal)

Things to determine:

  • For NaNSymbol, we need to determine what to do with snan, which we must support parsing, but which we can convert to double.NaN
    • My initial thought would be to expose an additional SNaNSymbol property, which defaults to NaNSymbol except on InvariantCulture where it would be "SNaN"
  • For NaN we need to determine if we want to support round-tripping the bit payload (which is considered optional per the IEEE spec)
    • I don't think this is particular important for Managed programs, and (given that it is optional) we can probably revisit if anyone actually requests support for it
  • Should we support the higher IEEE limits (H) for all format specifiers or just for "R"
    • I would think yes, just support any user-defined limit, rather than cutting it off at 17. DoubleToNumber currently supports 50 characters as its upper limit, and we can pad with zeros (as per the IEEE requirement) if someone asks for more.
@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Sep 11, 2018

@jkotas, @danmosemsft, @eerhardt. Thoughts on the above?

@jkotas

This comment has been minimized.

Copy link
Member

commented Sep 11, 2018

I would spin a separate issue for SNaN. It is a new public surface and it has the highest breaking potential from all of the above. It is not unusual to see checking for NaN by string comparison.

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Nov 29, 2018

Closing this as the majority of issues have been addressed:

  • We are not planning on supporting NaN payloads (which are optional according to the spec)
  • I have opened a separate issue for parsing SNaN: #21271
    • This is for parsing SNaN to NaN (we do not have to preserve the signalling bit)
    • We already handle formatting SNaN correctly (we format to NaN, dropping the signalling bit as allowed)
  • I have opened a separate issue for roundtripping: #21272
    • I believe the appropriate fix is to update R to always return the shortest round-trippable string. There are a few papers on the subject here and this will provide a good balance between compliance and user-friendly outputs
  • I have opened a separate issue for returning strings with more than 17 digits (when the user requests it): #21273
    • We do this for some format specifiers, such as F, but not for ones like E or G

@tannergooding tannergooding moved this from To do to Done in ML.NET support work Dec 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants
You can’t perform that action at this time.