Double serialization is inefficient #35

michaelochurch · 2015-06-10T22:25:20Z

Currently, it takes 25 bytes to store a 64-bit Double.

λ Data.ByteString.length $ Data.Serialize.encode (1.75 :: Double)
25

Currently, the instance uses GHC.Float.decodeFloat, which splits a typical 64-bit Double into an Integer mantissa (typically 17 bytes when serialized at that magnitude) and an Int exponent (8 bytes) before serializing them. That is 25 bytes instead of 8, a 3.125x increase in size if you're storing, say, a large list or array of Doubles. For 32-bit Float, the footprint is 13 bytes instead of 4, a 3.25x increase.
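To see where 25 and 13 come from, here is a minimal sketch that only does the size arithmetic. It assumes cereal encodes an Integer as a tag byte plus an Int32 when the value fits in 32 bits, and otherwise as a tag byte, a sign byte, an 8-byte length prefix and one byte per byte of magnitude; an Int is assumed to cost a fixed 8 bytes. The helper names are hypothetical, and this is a model of the encoding, not cereal's code.

```haskell
-- Size model for the (mantissa, exponent) pair that decodeFloat produces.
-- ASSUMPTION: cereal's Integer encoding is
--   * 1 tag byte + 4-byte Int32 when the value fits in an Int32, else
--   * 1 tag byte + 1 sign byte + 8-byte length prefix + magnitude bytes,
-- and an Int is always 8 bytes. Sizes only; nothing is serialized here.
import Data.Int (Int32)

-- Number of bytes needed to hold a non-negative Integer's magnitude.
magnitudeBytes :: Integer -> Int
magnitudeBytes 0 = 0
magnitudeBytes n = 1 + magnitudeBytes (n `div` 256)

-- Modelled encoded size of an Integer (see ASSUMPTION above).
integerSize :: Integer -> Int
integerSize n
  | n >= toInteger (minBound :: Int32) && n <= toInteger (maxBound :: Int32) = 1 + 4
  | otherwise = 1 + 1 + 8 + magnitudeBytes (abs n)

-- Mantissa size plus 8 bytes for the Int exponent.
decodeFloatSize :: RealFloat a => a -> Int
decodeFloatSize x = integerSize m + 8
  where (m, _) = decodeFloat x

main :: IO ()
main = do
  print (decodeFloat (1.75 :: Double))     -- (7881299347898368,-52)
  print (decodeFloatSize (1.75 :: Double)) -- 25 under this model
  print (decodeFloatSize (1.75 :: Float))  -- 13 under this model
```

A Double mantissa carries 53 significant bits, so it never fits the Int32 fast path and costs 7 magnitude bytes plus 10 bytes of overhead; a Float mantissa has 24 bits and does fit, which is why Float lands at 13 bytes.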

I'm not aware of the history behind the decisions that were made. Is there a reason why Double and Float are stored (when their Serialize instance is used) as an (Integer, Int) pair rather than as raw binary? Is there a safety-related, corner-case reason for not defaulting to the more efficient alternative?


elliottt · 2015-06-12T03:50:58Z

Originally, there was a reason that we decided to use decodeFloat in the Serialize instance for Double, but I really can't remember what it is at this point. I'm working slowly towards a new release, so maybe that would be a good time to break this instance, and switch to using the functions from Data.Serialize.IEEE754?
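For reference, a minimal sketch of encoding a Double through Data.Serialize.IEEE754 (requires a cereal version that ships that module). The big-endian choice and the helper names encodeDoubleIEEE/decodeDoubleIEEE are hypothetical, picked for illustration rather than taken from whatever instance was eventually adopted.

```haskell
-- Raw 8-byte IEEE 754 encoding via cereal's Data.Serialize.IEEE754.
-- Helper names are hypothetical; big-endian is an illustrative choice.
import qualified Data.ByteString as BS
import Data.Serialize (runGet, runPut)
import Data.Serialize.IEEE754 (getFloat64be, putFloat64be)

-- Encode a Double as its 8-byte big-endian IEEE 754 bit pattern.
encodeDoubleIEEE :: Double -> BS.ByteString
encodeDoubleIEEE = runPut . putFloat64be

-- Decode the 8 bytes back; Left carries cereal's error message.
decodeDoubleIEEE :: BS.ByteString -> Either String Double
decodeDoubleIEEE = runGet getFloat64be

main :: IO ()
main = do
  let bs = encodeDoubleIEEE 1.75
  print (BS.length bs)        -- 8
  print (decodeDoubleIEEE bs) -- Right 1.75
```

Because the bit pattern is copied as-is, the size is fixed at 8 bytes regardless of the value, and the round trip is exact.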

@acfoltzer: as the original author of Data.Serialize.IEEE754, do you remember why we didn't start using that functionality in the instances for Double and Float?

acfoltzer · 2015-06-12T17:05:02Z

I think we wanted to be conservative about what a Float or Double is. The Prelude documentation for Float only says:

"It is desirable that this type be at least equal in range and precision to the IEEE single-precision type."

Data.Serialize.IEEE754 relies on GHC's representation of these types happening to be IEEE 754. That might change in the future, although I doubt it will anytime soon.
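One way code that depends on the raw representation could check that assumption before relying on it is sketched below. It uses castDoubleToWord64/castWord64ToDouble from GHC.Float (available in base 4.10 and later, so newer than this thread) rather than cereal, and the helper names doubleIsBinary64, doubleToBytesBE and bytesBEToDouble are hypothetical.

```haskell
-- Guard the IEEE assumption, then work with a Double's raw bit pattern.
-- Uses GHC.Float directly (base >= 4.10); this is not cereal's code.
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- True when Double looks like IEEE 754 binary64
-- (IEEE semantics, radix 2, 53-bit significand).
doubleIsBinary64 :: Bool
doubleIsBinary64 = isIEEE d && floatRadix d == 2 && floatDigits d == 53
  where d = 0 :: Double

-- The 8 bytes of the bit pattern, most significant first.
-- fromIntegral to Word8 keeps only the low 8 bits of each shifted word.
doubleToBytesBE :: Double -> [Word8]
doubleToBytesBE x = [fromIntegral (w `shiftR` s) | s <- [56, 48 .. 0]]
  where w = castDoubleToWord64 x

-- Reassemble the 64-bit pattern from big-endian bytes.
bytesBEToDouble :: [Word8] -> Double
bytesBEToDouble = castWord64ToDouble . foldl step 0
  where step acc b = (acc `shiftL` 8) .|. fromIntegral b

main :: IO ()
main =
  if not doubleIsBinary64
    then error "Double is not IEEE binary64 on this platform"
    else do
      let bytes = doubleToBytesBE 1.75
      print bytes                   -- [63,252,0,0,0,0,0,0]
      print (bytesBEToDouble bytes) -- 1.75
```

If GHC ever changed its Double representation, the guard would fail loudly instead of silently producing incompatible bytes.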

elliottt · 2015-06-28T16:07:41Z

I have a load of somewhat substantial changes to merge in that will likely require a major version bump. I say that we use that as an opportunity to switch to using the IEEE754 module for Float/Double serialization, and deal with the change in the future if GHC decides to switch. Any objections?

acfoltzer · 2015-06-30T15:28:02Z

👍

michaelochurch · 2015-07-02T17:23:19Z

Second Adam's approval.


elliottt · 2015-08-14T18:33:21Z

Sorry this took so long, but I've finally pushed this change. I'm going to start merging in some larger changes before doing a major release.

elliottt closed this as completed Aug 14, 2015

stepcut mentioned this issue Nov 7, 2015

it seems that cereal has changed the way floats/doubles are serialized acid-state/safecopy#35

Closed
