Textual format: floating-point representation #292

Closed
jfbastien opened this issue Aug 10, 2015 · 16 comments

Comments

@jfbastien
Member

I'm opening up a can of bikeshed: how should the textual WebAssembly format represent floating-point numbers?

This has nothing to do with the binary format.

Keep in mind:

  • Currently we support f32 and f64.
  • We'll eventually support vectors of these types.
  • Must support NaN properly.
    • Do we support a leading sign?
    • We don't differentiate between different NaN types at the moment. Do we allow n-char-sequence after NaN as printf does?
    • How do we capitalize NaN (if we represent it as text at all)? nan NaN NAN.
  • Must support ±infinity.
    • How do we spell it (if we represent it as text at all)? infinity INFINITY inf INF.
  • Must support ±0.
  • Must support denormals, but that's likely a non-issue.
  • Not all numbers are accurately representable as f32 or f64: decimal (e.g. 3.141592) and exponential (e.g. 6.022140e23) often have such non-representable numbers that need to be rounded somehow.
  • We can support multiple inputs for programmer's convenience, but then we're making the text format parser more complex (which doesn't really matter).

A few ideas:

  • Hexadecimal representation of the IEEE-754 bit pattern (e.g. 0x7FF8000000000000 is a positive quiet NaN with a zero payload). This is effectively a reinterpret_cast from integer to floating-point.
  • C99 hexadecimal floating-point (e.g. 0x1p1 is 2.0).
  • Precisely-representable decimal and exponential notations only.
  • Any decimal and exponential notation, with a spec-mandated rounding.

Or a combination of the above.
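To make the first two ideas concrete, here is a minimal Python sketch comparing a raw bit-pattern view with a C99-style hexadecimal float for the same f64 value. The helper names `f64_from_bits` and `f64_to_bits` are made up for illustration and are not part of any WebAssembly tooling.

```python
import math
import struct

def f64_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE-754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def f64_to_bits(value: float) -> int:
    """Reinterpret an IEEE-754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]

# Idea 1: the raw bit pattern, written as a hexadecimal integer.
two_bits = f64_to_bits(2.0)

# Idea 2: C99 hexadecimal floating-point, which Python can print and parse.
two_hex = (2.0).hex()
parsed = float.fromhex("0x1p1")

# The bit pattern from the example above decodes to a NaN.
nan_value = f64_from_bits(0x7FF8000000000000)
is_nan = math.isnan(nan_value)
```

Both forms round-trip exactly; they differ only in how readable the sign, exponent, and significand are.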

@kg
Contributor

kg commented Aug 10, 2015

Some thoughts:

  • Syntax shouldn't have any unintentional similarity with integer literals. For example, 0x7FF... is undesirable because you could treat it as an integer literal. Bit/byte-pattern representations like that should have distinct syntax like (strawman) [aa bb cc dd ee ff 00 11] that isn't used for any purpose other than raw byte literals.
  • Keywords should be folded into a higher level scheme of keywords. Our textual format will already have other keywords, like types (int32, float32, etc). Things like nan and infinity should be keywords of the same sort, but contextually valid only in places where a float literal is acceptable. I'm not convinced case should matter.
  • I think we should just allow leading + or - signs on all signed literals (both int and float) and handle them appropriately. Much better than nuanced rules where the sign is only valid in some cases. This would include 0 and whatever infinity looks like.
  • I think decimal literals that aren't exactly representable should be a validation/parse error, assuming we have enough context to know the size they're being stored to.
  • I don't think we should support exponential literals. I think decimal and raw bytes are the only types we should support.
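As a rough illustration of the "not exactly representable means parse error" idea, the following Python sketch checks whether a decimal literal survives conversion to f32 or f64 without rounding. The function name `is_exact` and the use of `struct` format codes to select the width are assumptions made for this example only.

```python
import struct
from fractions import Fraction

def is_exact(literal: str, fmt: str = "d") -> bool:
    """Return True if `literal` is exactly representable.

    fmt is a struct format code: "d" for f64, "f" for f32.
    """
    exact = Fraction(literal)  # exact rational value of the literal text
    try:
        # Round to the nearest f64, then store and reload at the target width.
        stored = struct.unpack(fmt, struct.pack(fmt, float(exact)))[0]
    except OverflowError:
        return False  # magnitude too large for the target format
    # Fraction(float) is exact, so this compares the stored value
    # against the literal without any further rounding.
    return Fraction(stored) == exact

half_ok = is_exact("0.5")
pi_ok = is_exact("3.141592")
f32_ok = is_exact("16777217", "f")   # 2**24 + 1 does not fit in an f32
f64_ok = is_exact("16777217", "d")
```

A validator built this way would accept 0.5 and reject 3.141592, which shows how restrictive the rule is for everyday decimal literals.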

@lukewagner
Member

I really like the idea of encoding the IEEE-754 bits directly; this aligns with the goal that the text format isn't doing any "interesting" transformations.

@sunfishcode
Member

C99 hexadecimal float syntax is great because it also directly corresponds to the value encoding without any "interesting" transformations. And, it splits out the fields (sign bit, significand, exponent) clearly instead of just lumping them all together as opaque bits, so it's even somewhat readable.

For example, the bit pattern 0x7fefffffffffffff prints in C99 hexadecimal format as 0x1.fffffffffffffp+1023.
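The field split can be sketched in a few lines of Python. This is an illustrative helper only (the names `f64_fields` and `hex_float_from_bits` are hypothetical), and it handles normal numbers only, not zeros, denormals, infinities, or NaNs.

```python
def f64_fields(bits: int):
    """Split a 64-bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exp, fraction

def hex_float_from_bits(bits: int) -> str:
    """Format a normal f64 bit pattern as a C99-style hexadecimal float."""
    sign, biased_exp, fraction = f64_fields(bits)
    if biased_exp in (0, 0x7FF):
        raise ValueError("only normal numbers are handled in this sketch")
    prefix = "-" if sign else ""
    # 52 fraction bits are exactly 13 hex digits; the exponent bias is 1023.
    return f"{prefix}0x1.{fraction:013x}p{biased_exp - 1023:+d}"

largest = hex_float_from_bits(0x7FEFFFFFFFFFFFFF)
half = hex_float_from_bits(0x3FE0000000000000)
```

The hex digits after the point are the fraction field verbatim, and the number after `p` is the unbiased exponent, which is why no rounding is involved.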

@kripken
Member

kripken commented Aug 10, 2015

I believe LLVM assembly has a combination of those, which makes sense: you want to encode the bits directly for things you can't represent well in text, but that is rare, and you don't want the common case of -0.5f to look like 0xBF000000.

@jfbastien
Member Author

I should have put a decent reference for hexadecimal floating-point.

@kripken
Member

kripken commented Aug 10, 2015

Hmm, even -0x1p-1 for -0.5f seems quite painful to me.

@jfbastien
Member Author

@kripken LLVM tries to support nice representations and precise ones, which leads to code such as:
https://github.com/llvm-mirror/llvm/blob/master/lib/IR/AsmWriter.cpp#L1104

@sunfishcode
Member

0x1p-1 is much less painful than 0x3fe0000000000000 or any other bit/byte-wise representation :).

@lukewagner
Member

(I should point out that when I said "encoding the ieee754 bits directly", I was advocating for a simple, precise, non-decimal representation, not a literal sequence of bits; e.g., C99 hex floats sound good.)

@kripken
Member

kripken commented Aug 10, 2015

@jfbastien: yes, I agree it takes a little more work to get human-readable numbers. I think it's worth it; it's not that much work.

@jfbastien
Member Author

It sounds like we have a good consensus for hexadecimal floating-point since it's precise and mildly readable. I just added support for this to LLVM:

  • I canonicalize NaNs to quiet, positive, no payload, and print nan.
  • I print infinity as infinity or -infinity.

@sunfishcode
Member

Canonicalizing nans isn't correct; WebAssembly can have NaN payload values (even though WebAssembly doesn't currently provide NaN bitpattern propagation).

If we're going the C99 hexadecimal float route, the obvious thing to do for nan is to let nans have a [+-] prefix and to use the (n-char-sequence) approach. For example nan(0x01234567) has the bit pattern 0x7ff8000001234567 in 64-bit format. The payload would be an unsigned hexadecimal integer. That will cover all possible bit patterns.
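A small Python sketch of how such a literal could map to a 64-bit pattern follows. The function name `f64_nan_bits` and the exact grammar are assumptions for illustration; in particular, this sketch always sets the quiet bit and rejects payloads that would overlap it.

```python
import re

# Optional sign, the keyword "nan", and an optional hexadecimal payload.
_NAN_RE = re.compile(r"^([+-]?)nan(?:\((0x[0-9a-fA-F]+)\))?$")

QUIET_NAN_F64 = 0x7FF8000000000000  # exponent all ones, quiet bit set

def f64_nan_bits(text: str) -> int:
    """Map a literal such as "-nan(0x1234)" to a 64-bit pattern."""
    match = _NAN_RE.match(text)
    if match is None:
        raise ValueError(f"not a NaN literal: {text!r}")
    sign, payload_text = match.groups()
    payload = int(payload_text, 16) if payload_text else 0
    if payload >= 1 << 51:
        raise ValueError("payload does not fit below the quiet bit")
    bits = QUIET_NAN_F64 | payload
    if sign == "-":
        bits |= 1 << 63
    return bits

example = f64_nan_bits("nan(0x01234567)")
```

Under these assumptions, `hex(example)` is `0x7ff8000001234567`, matching the pattern quoted above.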

@jfbastien
Member Author

@sunfishcode is correct; I've updated LLVM to simply assert on SNaN and NaNs with payloads. It currently handles nan and -nan, but no other NaN value for now.
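For readers who want to see the described behaviour in one place, here is a hedged Python sketch of a printer that emits nan, -nan, infinity, -infinity, and hexadecimal floats, and refuses NaNs it cannot spell. It is not the LLVM code; the function name `format_f64` is hypothetical and a `ValueError` stands in for the assertion.

```python
import struct

def format_f64(bits: int) -> str:
    """Print an f64 bit pattern in the interim textual form."""
    sign = "-" if bits >> 63 else ""
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exp == 0x7FF:
        if fraction == 0:
            return sign + "infinity"
        if fraction == 1 << 51:
            # Quiet NaN with an empty payload.
            return sign + "nan"
        # Signalling NaNs and NaNs with payloads have no spelling yet.
        raise ValueError("NaN with payload or signalling NaN is unsupported")
    # Finite values: reinterpret the bits and use hexadecimal float notation.
    value = struct.unpack(">d", struct.pack(">Q", bits))[0]
    return value.hex()

printed_two = format_f64(0x4000000000000000)
printed_nan = format_f64(0xFFF8000000000000)
```

Rejecting the unsupported NaNs outright, rather than canonicalizing them, keeps the printer from silently changing bit patterns.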

@sunfishcode
Member

Correction to my last post: the traditional strtod-style nan(0x01234567) syntax has no ability to produce so-called "signalling NaN" representations. It is not currently clear whether WebAssembly will ever have the concept of a "signalling NaN", so it's not desirable to start making up syntax for it right now.

Fortunately, the present need here is just to have something to put in SExprs to get some basic things working, not to design what will go in the ultimate text format, so it's best to just do something simple for now. Having LLVM assert on NaN values that can't be represented easily is fine for now.

@lukewagner
Member

Did this issue result in any design doc PRs? I don't see anything in TextFormat.md and this seems like the type of thing we should record.

jfbastien added a commit that referenced this issue Aug 28, 2015
Addresses the conclusion reached in #292.
@jfbastien
Member Author

ಠ_ಠ at self for not doing so. Thanks @lukewagner for calling me out on it! :-)
#318 fixes this!
