Textual format: floating-point representation #292

jfbastien · 2015-08-10T19:51:35Z

I'm opening up a can of bikeshed: how should the textual WebAssembly format represent floating-point numbers?

This has nothing to do with the binary format.

Keep in mind:

- Currently we support f32 and f64.
- We'll eventually support vectors of these types.
- Must support NaN properly.
  - Do we support a leading sign?
  - We don't differentiate between different NaN types at the moment. Do we allow n-char-sequence after NaN as printf does?
  - How do we capitalize NaN (if we represent it as text at all)? nan NaN NAN.
- Must support ±infinity.
  - How do we spell it (if we represent it as text at all)? infinity INFINITY inf INF ∞.
- Must support ±0.
- Must support denormals, but that's likely a non-issue.
- Not all numbers are exactly representable as f32 or f64: decimal (e.g. 3.141592) and exponential (e.g. 6.022140e23) literals often denote values that aren't representable and need to be rounded somehow.
- We can support multiple input formats for the programmer's convenience, but then we're making the text format parser more complex (which doesn't really matter).
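To make the rounding and signed-zero points concrete, here is a minimal Python sketch. It is only an illustration: `round_to_f32` is a hypothetical helper name, and Python's own `float` (an f64) stands in for whatever a WebAssembly text parser would use.

```python
import struct
from fractions import Fraction

def round_to_f32(x: float) -> float:
    """Round a Python float (an f64) to the nearest f32, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 3.141592 has no finite binary expansion, so both widths store a nearby value.
as_f64 = 3.141592
as_f32 = round_to_f32(as_f64)
print(Fraction(as_f64) == Fraction("3.141592"))  # False: the f64 is already rounded
print(as_f32 == as_f64)                          # False: the f32 is rounded further

# 0.5 is a power of two, so it is exact at both widths.
print(round_to_f32(0.5) == 0.5)                  # True

# +0 and -0 compare equal but have different bit patterns.
print(struct.pack(">d", 0.0).hex(), struct.pack(">d", -0.0).hex())
```

The last line prints `0000000000000000 8000000000000000`, which is why a text format needs an explicit way to write the sign of zero.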

A few ideas:

- Hexadecimal representation of the IEEE-754 bit pattern (e.g. 0x7FF8000000000000 is a positive quiet NaN with a zero payload). This is effectively a reinterpret_cast from integer to floating-point.
- C99 hexadecimal floating-point (e.g. 0x1p1 is 2.0).
- Precisely-representable decimal and exponential notations only.
- Any decimal and exponential notation, with a spec-mandated rounding.

Or a combination of the above.
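As a rough comparison of the first two ideas, the sketch below prints the same f64 values both as a raw bit pattern and as a C99-style hexadecimal float. The helper names `f64_bits` and `f64_from_bits` are made up for this example; `float.hex` and `float.fromhex` are Python's built-in hex-float conversions.

```python
import struct

def f64_bits(x: float) -> str:
    """Bit pattern of an f64 as a hexadecimal integer, most significant digit first."""
    return "0x%016x" % struct.unpack("<Q", struct.pack("<d", x))[0]

def f64_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an f64, like a reinterpret_cast."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

for value in (2.0, 0.5, 0.1):
    # Both forms round-trip exactly; only their readability differs.
    round_trips = float.fromhex(value.hex()) == value
    print(f64_bits(value), value.hex(), round_trips)

print(float.fromhex("0x1p1"))  # 2.0
```

Both notations are lossless. The difference is that the hexadecimal float keeps sign, significand and exponent visible, while the bit pattern looks like an integer literal.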

kg · 2015-08-10T20:16:36Z

Some thoughts:

- Syntax shouldn't have any unintentional similarity with integer literals. For example, 0x7FF... is undesirable because you could treat it as an integer literal. Bit/byte-pattern representations like that should have distinct syntax like (strawman) [aa bb cc dd ee ff 00 11] that isn't used for any purpose other than raw byte literals.
- Keywords should be folded into a higher level scheme of keywords. Our textual format will already have other keywords, like types (int32, float32, etc). Things like nan and infinity should be keywords of the same sort, but contextually valid only in places where a float literal is acceptable. I'm not convinced case should matter.
- I think we should just allow leading + or - signs on all signed literals (both int and float) and handle them appropriately. Much better than nuanced rules where the sign is only valid in some cases. This would include 0 and whatever infinity looks like.
- I think decimal literals that aren't exactly representable should be a validation/parse error, assuming we have enough context to know the size they're being stored to.
- I don't think we should support exponential literals. I think decimal and raw bytes are the only types we should support.
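One way the "exactly representable or reject" rule could be checked is sketched below in Python. This is an assumption-laden illustration, not a proposed implementation: `is_exact` is a hypothetical name, it only handles finite decimal literals, and it ignores the sign of zero.

```python
import struct
from fractions import Fraction

def is_exact(literal: str, width: int) -> bool:
    """True if the decimal literal is exactly an f32 (width=32) or f64 (width=64) value."""
    exact = Fraction(literal)      # the literal as an exact rational
    try:
        nearest = float(exact)     # correctly rounded to f64
        if width == 32:
            # Narrow to f32 and widen back; changes the value unless it fits.
            nearest = struct.unpack("<f", struct.pack("<f", nearest))[0]
    except OverflowError:
        return False               # too large for the requested width
    return Fraction(nearest) == exact

print(is_exact("0.5", 32))       # True
print(is_exact("0.1", 64))       # False
print(is_exact("16777217", 32))  # False: 2**24 + 1 needs 25 significand bits
print(is_exact("16777217", 64))  # True
```

Under this rule a parser would accept `0.5` everywhere but reject `0.1` at either width, which shows how restrictive an exact-only decimal syntax would be.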

lukewagner · 2015-08-10T20:36:25Z

I really like the idea of encoding the ieee754 bits directly; this aligns with the goal that the text format isn't doing any "interesting" transformations.

sunfishcode · 2015-08-10T20:43:02Z

C99 hexadecimal float syntax is great because it also directly corresponds to the value encoding without any "interesting" transformations. And, it splits out the fields (sign bit, significand, exponent) clearly instead of just lumping them all together as opaque bits, so it's even somewhat readable.

For example, the bit pattern 0x7fefffffffffffff prints in C99 hexadecimal format as 0x1.fffffffffffffp+1023.
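The field split can be checked with a short Python sketch. The `fields` helper is a name invented for this example; it assumes the standard binary64 layout of 1 sign bit, 11 exponent bits with a bias of 1023, and 52 fraction bits.

```python
import struct

def fields(bits: int):
    """Split a binary64 bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction

bits = 0x7FEFFFFFFFFFFFFF
sign, biased_exponent, fraction = fields(bits)
value = struct.unpack("<d", struct.pack("<Q", bits))[0]

# The unbiased exponent and the fraction digits appear verbatim in the hex float.
print(sign, biased_exponent - 1023, "%013x" % fraction)  # 0 1023 fffffffffffff
print(value.hex())                                       # 0x1.fffffffffffffp+1023
```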

kripken · 2015-08-10T20:44:04Z

I believe LLVM assembly has a combination of those, which makes sense: you want to encode the bits directly for things you can't represent well in text, but that is rare, and you don't want the common case of 0.5f to look like 0x3F000000.

jfbastien · 2015-08-10T20:44:15Z

I should have put a decent reference for hexadecimal floating-point.

kripken · 2015-08-10T20:46:01Z

Hmm, even 0x1p-1 for 0.5f seems quite painful to me.

jfbastien · 2015-08-10T20:46:15Z

@kripken LLVM tries to support nice representations and precise ones, which leads to code such as:
https://github.com/llvm-mirror/llvm/blob/master/lib/IR/AsmWriter.cpp#L1104

sunfishcode · 2015-08-10T20:46:42Z

0x1p-1 is much less painful than 0x3fe0000000000000 or any other bit/byte-wise representation :).

lukewagner · 2015-08-10T20:47:58Z

(I should point out that when I said "encoding the ieee754 bits directly", I was advocating for a simple, precise, non-decimal representation, not a literal sequence of bits; e.g., C99 hex floats sound good.)

kripken · 2015-08-10T20:50:29Z

@jfbastien: yes, I agree it takes a little more work to get human-readable numbers. I think it's worth it; it's not that much work.

jfbastien · 2015-08-10T22:41:13Z

It sounds like we have a good consensus for hexadecimal floating-point since it's precise and mildly readable. I just added support for this to LLVM:

- I canonicalize NaNs to quiet, positive, no payload, and print nan.
- I print infinity as infinity or -infinity.

sunfishcode · 2015-08-10T23:09:52Z

Canonicalizing nans isn't correct; WebAssembly can have NaN payload values (even though WebAssembly doesn't currently provide NaN bitpattern propagation).

If we're going the C99 hexadecimal float route, the obvious thing to do for nan is to let nans have a [+-] prefix and to use the (n-char-sequence) approach. For example nan(0x01234567) has the bit pattern 0x7ff8000001234567 in 64-bit format. The payload would be an unsigned hexadecimal integer. That will cover all possible bit patterns.
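A minimal Python sketch of that NaN syntax follows. Everything in it is an assumption made for illustration: the function name `nan_bits_f64`, the exact grammar accepted by the regular expression, and the choice to OR the payload into the bits below the quiet bit, which is what reproduces the 0x7ff8000001234567 example.

```python
import re
import struct

# Hypothetical grammar: optional sign, "nan", optional "(0x<hex payload>)".
_NAN = re.compile(r"([+-]?)nan(?:\((0x[0-9a-fA-F]+)\))?$")

def nan_bits_f64(text: str) -> int:
    """Map [+-]nan or [+-]nan(0xPAYLOAD) to a 64-bit pattern."""
    match = _NAN.match(text)
    if match is None:
        raise ValueError("not a NaN literal: %r" % text)
    sign, payload_text = match.groups()
    payload = int(payload_text, 16) if payload_text else 0
    if payload >= 1 << 51:
        raise ValueError("payload does not fit below the quiet bit")
    bits = 0x7FF8000000000000 | payload  # all-ones exponent plus quiet bit
    if sign == "-":
        bits |= 1 << 63
    return bits

print("0x%016x" % nan_bits_f64("nan(0x01234567)"))  # 0x7ff8000001234567
print("0x%016x" % nan_bits_f64("-nan"))             # 0xfff8000000000000

# The resulting pattern really is a NaN when reinterpreted as an f64.
value = struct.unpack("<d", struct.pack("<Q", nan_bits_f64("nan(0x01234567)")))[0]
print(value != value)                               # True
```

Because the quiet bit is always set in this sketch, it can only produce quiet NaNs.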

jfbastien · 2015-08-11T00:51:36Z

@sunfishcode is correct, I've updated LLVM to simply assert on SNaN and NaNs with payloads. It currently handles nan and -nan, but no other NaN value for now.
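The printing behaviour described here could look roughly like the Python sketch below. It is not LLVM's code: `print_f64` is a hypothetical name, a `ValueError` stands in for the assert, and Python's `float.hex` stands in for the hexadecimal float printer (it pads the fraction, so 0.5 prints as 0x1.0000000000000p-1 rather than 0x1p-1).

```python
import math
import struct

def print_f64(x: float) -> str:
    """Format an f64 as nan, -nan, infinity, -infinity, or a hexadecimal float."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    negative = bits >> 63 == 1
    if math.isnan(x):
        # Only the quiet NaN with a zero payload is printable in this sketch.
        if bits & 0x7FFFFFFFFFFFFFFF != 0x7FF8000000000000:
            raise ValueError("unsupported NaN bit pattern: 0x%016x" % bits)
        return "-nan" if negative else "nan"
    if math.isinf(x):
        return "-infinity" if negative else "infinity"
    return x.hex()

quiet_nan = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))[0]
print(print_f64(quiet_nan))   # nan
print(print_f64(-math.inf))   # -infinity
print(print_f64(0.5))         # 0x1.0000000000000p-1
```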

sunfishcode · 2015-08-12T17:12:01Z

Correction to my last post: the traditional strtod-style nan(0x01234567) syntax has no ability to produce so-called "signalling NaN" representations. It is not currently clear whether WebAssembly will ever have the concept of a "signalling NaN", so it's not desirable to start making up syntax for it right now.

Fortunately, the present need here is just to have something to put in SExprs to get some basic things working, not to design what will go in the ultimate text format, so it's best to just do something simple for now. Having LLVM assert on NaN values that can't be represented easily is fine for now.

lukewagner · 2015-08-28T13:43:44Z

Did this issue result in any design doc PRs? I don't see anything in TextFormat.md and this seems like the type of thing we should record.

jfbastien · 2015-08-28T17:06:00Z

ಠ_ಠ at self for not doing so. Thanks @lukewagner for calling me out on it! :-)
#318 fixes this!

jfbastien added the question label Aug 10, 2015

sunfishcode closed this as completed Aug 12, 2015

sunfishcode mentioned this issue Aug 26, 2015

Implement accurate float32 semantics WebAssembly/spec#29

Merged

jfbastien added a commit that referenced this issue Aug 28, 2015

Text format: not unique, but precise

992f239

Addresses the conclusion reached in #292.

jfbastien mentioned this issue Aug 28, 2015

Text format: not unique, but precise #318

Merged

jfbastien mentioned this issue Jan 28, 2016

Invalid NaN literal WebAssembly/wabt#28

Closed
