Add a method to read the first bytes of a float (and assume the rest are 0) #70

SamWhited · 2017-03-01T05:21:56Z

I'm not sure if this is broadly useful enough to be worth adding to the library, but I've found myself using a similar function to this several times in a recent project so I thought I'd submit a PR in case you wanted it.

The idea is to read the first n bytes of an f64 as a uint and then assume that the rest of the bytes are 0. This is very useful if you're parsing lots of compressed floats in little-endian format where the low bits (which are often zero) can be dropped (eg. due to a form of run length encoding, or other compression that drops sequences of zeros).

BurntSushi · 2017-03-01T12:04:24Z

See #71

SamWhited · 2017-03-01T15:20:33Z

@BurntSushi I don't think this is related, is it (other than the fact that this method would also have a similar problem)? This PR is for a general float method that reads less than 8 bytes (and assumes the rest are 0), but it's likely that I'm just misunderstanding what you meant.

BurntSushi · 2017-03-01T15:24:34Z

@SamWhited It's related in that it would be adding more methods with the same problem as the existing read_f32/read_f64 methods.

FWIW, we do already have read_uint/read_int, which seem like analogous methods, right? So I think I'm at least in principle in favor of these.

SamWhited · 2017-03-01T15:31:05Z

Ah yes, if you consider reading signaling NaN's a problem (although that sounds like it's up in the air? ~~I'm inclined to say that it is a problem personally~~ EDIT: Yup, they cause undefined behavior in LLVM.) this would be adding another method with the same problem. So maybe it's something for the 2.0 release (or maybe it doesn't really cause any issues, since there are already methods with the problem so having another doesn't really change anything)?

FWIW, we do already have read_uint/read_int, which seem like analogous methods, right? So I think I'm at least in principle in favor of these.

Yes, it's more or less analogous

BurntSushi · 2017-03-28T15:52:02Z

One interesting idea brought up in the other thread was that we could mask out the signaling NaN bit before doing the transmute to a float. I don't really like that, but we either do that, or we need to change the API to return a Result for floating point conversions.

cc @est31 @retep998 @nagisa @Amanieu @petrochenkov @dwrensha @valarauca @alexcrichton

SamWhited · 2017-03-28T15:56:26Z

we could mask out the signaling NaN bit

I don't love the idea of losing that information; if we are decoding something to a signaling NaN we either want to know in case it was intentional, or we want to know because it's a bug (or maybe it actually means something) in the encoded stream and we need to display some error, or take some action.

Having float operations return a result feels poor to me too because there's the overhead of unwrapping a result on what appears at first glance to be a simple operation that should "just work", but since it's not actually that simple under the hood I think this is the lesser of two evils personally.

I haven't really thought through any use cases but my own (decoding Go's Gob format), so I'll be curious to see what others say.

valarauca · 2017-03-28T15:58:35Z

I think masking out the signal NaN is fine. Flush to zero is fairly standard behavior. As long as an alternative function exists that *may* return a signal NaN.

…

On Tue, Mar 28, 2017 at 11:56 AM Sam Whited ***@***.***> wrote: we could mask out the signaling NaN bit I don't love the idea of losing the error information; if we *are* decoding something to a signaling NaN we either want to know in case it was intentional, or we want to know because it's a bug (or maybe it actually means something) in the encoded stream and we need to display some error, or take some action. Having float operations return a result feels poor to me too because there's the overhead of unwrapping a result on what appears at first glance to be a simple operation that should "just work", but since it's not actually that simple under the hood I think this is the lesser of two evils personally. I haven't really thought through any use cases but my own though (decoding Go's Gob format), so I'll be curious to see what others say. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#70 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASViDPWl8gOfNEXqXQkNZw4rA1JPYDIXks5rqS2qgaJpZM4MPO82> .

SamWhited · 2017-03-28T16:00:47Z

As long as an alternative function exists that may return a signal NaN.

I didn't think about that, having a second unsafe version of the function that can return a signaling NaN also makes sense to me, but also bloats the API: I'm not sure if the tradeoff there for something people may never (or rarely) use is worth it.

BurntSushi · 2017-03-28T23:08:51Z

At the moment, I am somewhat inclined to clarify the contract of read_f32 and read_f64 to state that the return value is always defined. At an implementation level, we can mask out the signaling NaN bit and be done with this.

Whether we should add more (unsafe) methods to the API to support getting the f32/f64 without any masking isn't clear, because I don't actually know what the use cases are. And if there aren't any use cases (or if they are very rare/niche), then it seems reasonable to expect callers to drop into transmute.

SamWhited · 2017-03-29T01:47:17Z

The discussion on rust-lang/rust#39271 also seems to have gone that way; I pushed a new commit that I think should turn sNaN's into qNaN's by flipping the most significant fraction bit. Review by someone who's more comfortable with floating point math than I am would be appreciated.

I'm also not 100% sure that the test I wrote for it doesn't introduce undefined behavior itself.

EDIT: ~~On second glance, I think I messed up my masks for the qNaN and the sNaN… double checking which one has the MSB of the mantissa set and which doesn't… please hold.~~

Yup, I did; fixed.

est31 · 2017-03-29T02:07:56Z

Hmm maybe I should do it like you and just flip that single bit.

SamWhited · 2017-03-29T02:11:54Z

@est31 I couldn't think of any cases where it would matter one way or the other, but I can't claim to have any real domain knowledge here. I'm not really sure who to go to for advice either; maybe some other project has done something similar and we could copy it or ask them?

est31 · 2017-03-29T02:17:04Z

@SamWhited for my use cases it wouldn't be really useful either, but apparently you can use the lower part of the fraction field for a payload. I think I'll just do something like:

if v.is_nan() { v |= 1 << pos_of_highest_bit_of_fraction_field }

est31 · 2017-03-29T02:25:57Z

Hmm just realized that is_nan is not defined on u64/u32. I'll keep the old version.

SamWhited · 2017-03-29T02:30:51Z

@est31 If you have a u64 you should be able to do something like:

// The exponent is 1's  &&  the mantissa has at least one bit set 
   (n & 0x7FF == 0x7FF) && (n & 0x000FFFFFFFFFFFFF != 0)

to check for NaN

EDIT: oops, probably need to shift the exponent left 52 places.

BurntSushi · 2017-03-29T11:15:21Z

src/lib.rs

    #[test]
    fn uint_bigger_buffer() {
        use {ByteOrder, LittleEndian};
        let n = LittleEndian::read_uint(&[1, 2, 3, 4, 5, 6, 7, 8], 5);
        assert_eq!(n, 0x0504030201);
    }
+
+    // TODO: How is transmute implemented? Does it count as an operation on the sNaN?
+    //       Is this test undefined behavior?


By the time you call transmute, I think you've already convert the sNaN to a qNaN, so I think you're safe.

(Even so, since it's only a test, I'm fine with being in murky territory.)

SamWhited · 2017-03-29T14:33:18Z

Rebased and removed TODO comments based on @BurntSushi's feedback. I think the last unopened question is what variants of this method do we want and what should they be named? Eg. do we want both an f32/f64 version fo this method? Do we want write methods (I haven't even thought about those), etc.

read_f{32,64} sounds closest to read_uint, but it's already taken. read_nf32? read_float<T: Float>?

BurntSushi · 2017-03-29T14:47:27Z

@SamWhited Thanks! The read_uint and read_int methods always yield a u64 and i64, respectively. So to be congruous with those, I think we just want a single read_float that returns a f64.

Also, since you're adding the sNaN masking to read_float, would you be willing to also add it to read_f64/read_f32 as well?

And yes, since we have write_int/write_uint, we should also have write_float.

SamWhited · 2017-03-29T14:49:13Z

Also, since you're adding the sNaN masking to read_float, would you be willing to also add it to read_f64/read_f32 as well?

Sure thing

SamWhited · 2017-03-29T15:45:52Z

Done ⤴

To reiterate, I'm reasonably sure this is correct and that all my lengths for the various parts of IEEE floats are correct (thanks to the lovely diagrams on Wikipedia), but review by someone who knows floats would be appreciated.

BurntSushi · 2017-03-29T15:47:33Z

@SamWhited I'm not a floating point expert either unfortunately, but I will do some reading and try to do an independent review before merging. It's important that this is right. (It might take me a little bit to get to though.)

nagisa · 2017-03-29T16:20:11Z

src/lib.rs

+        let mut u = Self::read_u32(buf);
+        // The exponent is 1's  &&  the mantissa has at least one bit set (aka. is_nan):
+        if (u & 0xFF<<23 == 0xFF<<23) && (u & 0x3FFFFF != 0) {
+            u |= 1;


Citing ieee754-2008:

All binary NaN bit strings have all the bits of the biased exponent field E set to 1 (see 3.4). A quiet NaN bit
string should be encoded with the first bit (d₁) of the trailing significand field T being 1. A signaling NaN
bit string should be encoded with the first bit of the trailing significand field being 0. If the first bit of the
trailing significand field is 0, some other bit of the trailing significand field must be non-zero to distinguish
the NaN from infinity. In the preferred encoding just described, a signaling NaN shall be quieted by setting
d₁ to 1, leaving the remaining bits of T unchanged.

I believe the d₁ here is the first bit after the exponent, not the least significant bit of the encoding, as per:

making this code not actually mask out the signalling bit.

facepalm You're right, it should be the MSB. Thanks!

Fixed for the read_f{32,64} methods (I think, please double check my bit widths :) Thanks!).

SamWhited · 2017-03-29T17:39:11Z

Note to self: Outstanding bug: the read_float implmentations don't actually detect nan's properly (they need to check every byte in the mantisa and see if any of them contain a set bit).

EDIT: Also fixed; it should be accounting for the entire mantissa now

SamWhited · 2017-03-29T20:31:18Z

Another open question: I just noticed that read_f32 and read_f64 don't actually bother with endianness, which is fair since the floating point spec doesn't mention it. Should I do the same for read_float? Right now there are two different implementations, and the Linux kernel at least appears to consider endianness when reading floating point numbers, but I could see this going either way.

nagisa · 2017-03-29T20:33:46Z

Endianness is handled by read_u{32,64}. By the time these functions return, the data is in host endianness already.

SamWhited · 2017-03-29T20:34:49Z

Oh right, nevermind. *dissapears

Fixes #71

SamWhited · 2017-05-08T15:17:23Z

@BurntSushi ping; just wanted to make sure this didn't fall off the radar. No rush though.

est31 · 2017-06-17T22:49:31Z

@BurntSushi friendly ping :)

You said that you wanted to review the sNaN masking in this PR before we stabilize the transmute functions: rust-lang/rust#39271 (comment)

Would be nice to have stable float<-> int transmute in Rust 1.20.

le-jzr · 2017-06-28T13:19:30Z

I'm no expert, but according to wikipedia, some implementations of IEEE 754 use opposite meaning for the signalling/quiet bit.
IMO the conversion should set the bit to a known-quiet reference, instead of always setting to one, otherwise it's still unsafe on some architectures.

est31 · 2017-07-03T13:39:14Z

@le-jzr great link! Setting the value to a known quiet NAN without any masking might indeed be the way to go forward.

le-jzr · 2017-07-03T13:49:20Z

Also of interest: https://sourceware.org/binutils/docs/as/MIPS-NaN-Encodings.html

BurntSushi · 2017-07-08T13:51:18Z

@SamWhited @est31 In the interest of moving things forward, I've just merged the part of this PR that makes reading floats safe. In particular, I updated it to use @est31's implementation that is now in std.

@SamWhited I tried to salvage the read_float stuff, but I found the test coverage insufficient. I'm pretty sure there is something wrong with the implementation in this PR. In particular, I would expect the implementations to be built on top of {LittleEndian, BigEndian}::read_uint, but when I tried to do that, existing doc tests failed. I think it would be best to approach that in a separate PR, and in particular, add read_float/write_float together. I think there should be at least as many tests for read_float/write_float as there are for read_uint/write_uint.

BurntSushi reviewed Mar 29, 2017

View reviewed changes

SamWhited added 2 commits March 29, 2017 10:07

Add a method to read the first bytes of a float

975b262

Handle sNaN's (make them qNaN's) in read_float

68823b7

nagisa reviewed Mar 29, 2017

View reviewed changes

SamWhited added 2 commits March 30, 2017 09:58

Mask out sNaN's in read_f{32,64}

b332abb

Fixes #71

Flip correct sNaN bit in float methods

f8334a8

SamWhited added 3 commits March 30, 2017 09:59

Ensure &[u8] is_nan checks entire mantissa

8fc8aae

Make the language about sNaN's softer

c2437dc

Add example to read_float

75bd853

BurntSushi mentioned this pull request May 31, 2017

is read_f32/read_f64 unsafe? #71

Closed

BurntSushi closed this Jul 8, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a method to read the first bytes of a float (and assume the rest are 0) #70

Add a method to read the first bytes of a float (and assume the rest are 0) #70

SamWhited commented Mar 1, 2017 •

edited

BurntSushi commented Mar 1, 2017

SamWhited commented Mar 1, 2017 •

edited

BurntSushi commented Mar 1, 2017

SamWhited commented Mar 1, 2017 •

edited

BurntSushi commented Mar 28, 2017

SamWhited commented Mar 28, 2017 •

edited

valarauca commented Mar 28, 2017 via email

SamWhited commented Mar 28, 2017 •

edited

BurntSushi commented Mar 28, 2017

SamWhited commented Mar 29, 2017 •

edited

est31 commented Mar 29, 2017

SamWhited commented Mar 29, 2017 •

edited

est31 commented Mar 29, 2017

est31 commented Mar 29, 2017

SamWhited commented Mar 29, 2017 •

edited

BurntSushi Mar 29, 2017

SamWhited commented Mar 29, 2017 •

edited

BurntSushi commented Mar 29, 2017

SamWhited commented Mar 29, 2017

SamWhited commented Mar 29, 2017 •

edited

BurntSushi commented Mar 29, 2017

nagisa Mar 29, 2017

SamWhited Mar 29, 2017

SamWhited Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017

nagisa commented Mar 29, 2017

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented May 8, 2017

est31 commented Jun 17, 2017 •

edited

le-jzr commented Jun 28, 2017

est31 commented Jul 3, 2017

le-jzr commented Jul 3, 2017

BurntSushi commented Jul 8, 2017

Add a method to read the first bytes of a float (and assume the rest are 0) #70

Add a method to read the first bytes of a float (and assume the rest are 0) #70

Conversation

SamWhited commented Mar 1, 2017 • edited

BurntSushi commented Mar 1, 2017

SamWhited commented Mar 1, 2017 • edited

BurntSushi commented Mar 1, 2017

SamWhited commented Mar 1, 2017 • edited

BurntSushi commented Mar 28, 2017

SamWhited commented Mar 28, 2017 • edited

valarauca commented Mar 28, 2017 via email

SamWhited commented Mar 28, 2017 • edited

BurntSushi commented Mar 28, 2017

SamWhited commented Mar 29, 2017 • edited

est31 commented Mar 29, 2017

SamWhited commented Mar 29, 2017 • edited

est31 commented Mar 29, 2017

est31 commented Mar 29, 2017

SamWhited commented Mar 29, 2017 • edited

BurntSushi Mar 29, 2017

Choose a reason for hiding this comment

SamWhited commented Mar 29, 2017 • edited

BurntSushi commented Mar 29, 2017

SamWhited commented Mar 29, 2017

SamWhited commented Mar 29, 2017 • edited

BurntSushi commented Mar 29, 2017

nagisa Mar 29, 2017

Choose a reason for hiding this comment

SamWhited Mar 29, 2017

Choose a reason for hiding this comment

SamWhited Mar 29, 2017 • edited

Choose a reason for hiding this comment

SamWhited commented Mar 29, 2017 • edited

SamWhited commented Mar 29, 2017

nagisa commented Mar 29, 2017

SamWhited commented Mar 29, 2017 • edited

SamWhited commented May 8, 2017

est31 commented Jun 17, 2017 • edited

le-jzr commented Jun 28, 2017

est31 commented Jul 3, 2017

le-jzr commented Jul 3, 2017

BurntSushi commented Jul 8, 2017

SamWhited commented Mar 1, 2017 •

edited

SamWhited commented Mar 1, 2017 •

edited

SamWhited commented Mar 1, 2017 •

edited

SamWhited commented Mar 28, 2017 •

edited

SamWhited commented Mar 28, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

SamWhited commented Mar 29, 2017 •

edited

est31 commented Jun 17, 2017 •

edited