Adds support for decoding floating-point typed arrays from RFC8746 #111

tgockel · 2021-05-25T23:25:32Z

This adds support for decoding arrays of floating-point numbers in the IEEE 754 binary16, binary32, and binary64 formats, in both big- and little-endian form.
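For readers unfamiliar with RFC 8746, here is a minimal sketch of what decoding these arrays involves. It is not the code in this PR: `FLOAT_TAGS` and `decode_float_array` are hypothetical names used only for illustration, and the sketch assumes the tag payload has already been extracted as a byte string. The tag numbers follow the RFC 8746 layout for floats (80 to 82 big-endian, 84 to 86 little-endian).

```python
import struct

# RFC 8746 float tags -> (struct byte-order prefix, format char, item size).
# binary128 (tags 83 and 87) is left out because struct has no format for it.
FLOAT_TAGS = {
    80: (">", "e", 2),  # binary16, big-endian
    81: (">", "f", 4),  # binary32, big-endian
    82: (">", "d", 8),  # binary64, big-endian
    84: ("<", "e", 2),  # binary16, little-endian
    85: ("<", "f", 4),  # binary32, little-endian
    86: ("<", "d", 8),  # binary64, little-endian
}


def decode_float_array(tag: int, payload: bytes) -> list:
    """Hypothetical helper: unpack a typed-array byte string into floats."""
    order, code, size = FLOAT_TAGS[tag]
    count, remainder = divmod(len(payload), size)
    if remainder:
        raise ValueError("payload length is not a multiple of the item size")
    return list(struct.unpack(f"{order}{count}{code}", payload))


# [1.5, 2.5] as binary16, once per byte order.
print(decode_float_array(80, bytes.fromhex("3e004100")))
print(decode_float_array(84, bytes.fromhex("003e0041")))
```

Both calls yield the same list, since only the byte order of the payload differs.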

If this looks good, we can add unsigned and signed integers using the same general ideas...and also encoders for these special markers.

Sekenre · 2021-05-28T10:20:43Z

Hi @tgockel, thanks for doing this. I did some experiments a while back decoding typed arrays into Python array.array types. I think that might be faster. It also lets you do a round-trip:

Sekenre@a117ad3

This is related to #32 and is maybe a simple way to handle it without needing numpy as a dependency.

Let me know what you think, I'm open to suggestions.

coveralls · 2021-05-28T10:23:20Z

Coverage decreased (-0.3%) to 96.892% when pulling 70e6f6c on tgockel:rfc8746 into 9f30439 on agronholm:master.

tgockel · 2021-05-31T06:50:08Z

I have never seen array before, but it definitely seems like the right approach instead of the weird struct trickery I did. Unfortunately, array.array doesn't have support for half-precision floats, but I updated the single- and double-precision floating point algorithm to use it.
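A rough sketch of the array.array approach for single and double precision, not the exact code pushed to the branch. `decode_with_array` is a hypothetical name, and the sketch assumes the usual item sizes of 4 bytes for 'f' and 8 bytes for 'd'.

```python
import sys
from array import array


def decode_with_array(typecode: str, payload: bytes, byteorder: str) -> array:
    """Hypothetical helper: typecode is 'f' (binary32) or 'd' (binary64)."""
    result = array(typecode)
    result.frombytes(payload)  # bytes are interpreted in native byte order
    if byteorder != sys.byteorder:
        result.byteswap()  # fix up foreign-endian data in place
    return result


# 1.5 as binary64 in each byte order.
big = decode_with_array("d", bytes.fromhex("3ff8000000000000"), "big")
little = decode_with_array("d", bytes.fromhex("000000000000f83f"), "little")
print(big == little)
```

There is no typecode for binary16, which is why half-precision data still needs a separate path.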

The biggest issue I see is immutability -- array.array does not have a convenient method like numpy's array.setflags(write=False) for this. I left comments with TODO(tgockel/111) for this, but I don't know an elegant way to address this one.

Sekenre · 2021-06-02T16:05:03Z

> The biggest issue I see is immutability -- array.array does not have a convenient method like numpy's array.setflags(write=False) for this. I left comments with TODO(tgockel/111) for this, but I don't know an elegant way to address this one.

If you want it to be immutable, you can wrap the bytes in a memoryview and then cast it, like this:

>>> my_array = memoryview(b'\x1f\x85\xebQ\xb8\x1e\t@').cast('d')
>>> assert my_array[0] == 3.14
>>> my_array[0] = 2.16
Traceback (most recent call last):
  File "<pyshell#57>", line 1, in <module>
    my_array[0] = 2.16
TypeError: cannot modify read-only memory

tgockel · 2021-06-02T16:40:32Z

That unfortunately doesn't work because the ultimate point of making this read-only is so that it can be used as keys in a dictionary, but memoryview hashing has a shortcoming:

ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'
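A short reproduction, as a sketch: a read-only memoryview in byte format hashes like the bytes it wraps, but the same view cast to 'd' cannot be hashed. The variable names are only illustrative.

```python
raw = b"\x1f\x85\xebQ\xb8\x1e\t@"  # 3.14 as a little-endian binary64

byte_view = memoryview(raw)        # format 'B', read-only
double_view = byte_view.cast("d")  # format 'd', still read-only

# Byte-format views hash like the underlying bytes object.
print(hash(byte_view) == hash(raw))

caught = None
try:
    hash(double_view)
except ValueError as exc:
    caught = exc  # hashing is refused for the 'd' format
print(type(caught).__name__, caught)
```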


Sekenre · 2021-06-05T17:34:16Z

I tried writing a little class to represent a float16 array instead of converting to a list of floats and posted it here: https://codereview.stackexchange.com/q/261573/243247. This lets you write an encoder that can just copy the underlying buffer into the output. This could be added to cbor2.types.

tgockel · 2021-06-06T20:43:28Z

There's an interesting question on hashing -- should the endianness of the generated source affect hashing? Let's say an x86 machine and an AArch64 machine both generate [1.5, 2.5] and encode it as a half-precision typed array...let's call them arr_le and arr_be. Should hash(arr_le) == hash(arr_be)? What about hash((1.5, 2.5))? I think a user would expect all 3 hashes to be equal.

This gets even more hairy when we get into integer vs. float comparisons. In Python, hash(2) == hash(2.0). Per the documentation of hash:

> Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).

This extends to tuples, as hash((2, 3, 4)) == hash((2.0, 3.0, 4.0)).

I'm not sure there is a good answer here. My solution of calling tuple(input) has the disadvantage of poor performance, but it only happens when a typed array is used as a key to a map, which I don't think happens all that frequently in the world.
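To make the expectation concrete, here is a small sketch of the tuple approach. It uses struct directly rather than the decoder in this PR, and the payloads are hand-built binary16 encodings of [1.5, 2.5].

```python
import struct

# [1.5, 2.5] as binary16 in each byte order.
payload_be = bytes.fromhex("3e004100")
payload_le = bytes.fromhex("003e0041")

# struct.unpack already returns tuples, which are immutable and hashable.
arr_be = struct.unpack(">2e", payload_be)
arr_le = struct.unpack("<2e", payload_le)

assert arr_be == arr_le == (1.5, 2.5)
assert hash(arr_be) == hash(arr_le) == hash((1.5, 2.5))
assert hash((2, 3, 4)) == hash((2.0, 3.0, 4.0))

# A key decoded from one byte order finds an entry stored under the other.
lookup = {arr_le: "found"}
print(lookup[arr_be])
```

Hashing the decoded values instead of the raw buffer is what makes the byte order irrelevant here; the cost is the conversion to a tuple.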

Sekenre · 2021-06-09T12:06:32Z

> should the endianness of the generated source affect hashing?

IMO: no, it should not. Foreign-endian data should always be converted to native-endian before hashing, and each platform should write arrays in its native format, since the format can always be tagged unambiguously.
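As a sketch of the "write native, tag it" idea on the encoding side: pick the RFC 8746 tag from sys.byteorder and copy the buffer without swapping. `NATIVE_TAGS` and `typed_array_parts` are hypothetical names, not part of cbor2, and the sketch only covers the array.array typecodes 'f' and 'd'.

```python
import sys
from array import array

# RFC 8746 float tags keyed by (array typecode, byte order).
NATIVE_TAGS = {
    ("f", "big"): 81,
    ("d", "big"): 82,
    ("f", "little"): 85,
    ("d", "little"): 86,
}


def typed_array_parts(values: array) -> tuple:
    """Hypothetical helper: return (tag, payload) with no byte swapping."""
    tag = NATIVE_TAGS[(values.typecode, sys.byteorder)]
    return tag, values.tobytes()


tag, payload = typed_array_parts(array("d", [1.5, 2.5]))
print(tag, payload.hex())
```

A decoder on a machine with the other byte order would then swap once on input, before any hashing or comparison.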

> This extends to tuples, as hash((2, 3, 4)) == hash((2.0, 3.0, 4.0))

Does that hashing behaviour hold true for numpy 1d arrays? Would it just be easier to require numpy for handling these?

tgockel · 2021-06-14T00:26:55Z

numpy arrays avoid the problem by not being hashable.

escherstair · 2022-11-17T15:23:01Z

@tgockel @Sekenre do you have plans to merge this pull request?
Typed arrays are exactly the feature I'm missing.

brendan-simon-indt · 2023-01-20T05:32:44Z

Bump. Any movement on getting various floating point formats encoded with CBOR?

agronholm · 2023-07-14T11:56:12Z

> Bump. Any movement on getting various floating point formats encoded with CBOR?

The problem with immutability/hashability has not been solved yet. If you want this faster, participate in the process of finding solutions.

brendan-simon-indt · 2023-07-14T12:15:27Z

> > Bump. Any movement on getting various floating point formats encoded with CBOR?
>
> The problem with immutability/hashability has not been solved yet. If you want this faster, participate in the process of finding solutions.

I found a solution that works for me: cast to np.floatX, then back to float, then use canonical=True when encoding.

value_to_encode = float( np.float16( value ) )
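If numpy is not available, the rounding step can probably be done with the standard library alone. This is a sketch of an alternative, not what I actually ran: `round_to_half` is a hypothetical helper, and it only covers the binary16 case.

```python
import struct


def round_to_half(value: float) -> float:
    """Hypothetical helper: round a Python float to the nearest binary16 value."""
    # Packing as 'e' rounds to half precision; unpacking widens back to float.
    return struct.unpack("<e", struct.pack("<e", value))[0]


value_to_encode = round_to_half(3.14)
print(value_to_encode)
```

Values too large for binary16 raise OverflowError in struct.pack, so callers would need to handle that case themselves.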

tgockel force-pushed the rfc8746 branch from 98c252f to cad2aaf Compare May 26, 2021 03:15

tgockel changed the title ~~Work-in-progress: Adds support for typed arrays from RFC8746~~ Adds support for decoding floating-point typed arrays from RFC8746 May 26, 2021

tgockel force-pushed the rfc8746 branch from 435b674 to 7361d6e Compare June 4, 2021 19:23

Adds support for decoding floating-point typed arrays from RFC8746

70e6f6c

This adds support for decoding arrays of floating point numbers of IEEE 754 formats binary16, binary32, and binary64 in both the big- and little-endian form.

tgockel force-pushed the rfc8746 branch from 7361d6e to 70e6f6c Compare June 4, 2021 19:43
