Double serialization is inefficient #35

Closed
michaelochurch opened this issue Jun 10, 2015 · 6 comments
Comments

@michaelochurch

Currently, it takes 25 bytes to store a 64-bit Double.

λ Data.ByteString.length $ Data.Serialize.encode (1.75 :: Double)
25

The current behavior is to use GHC.Float.decodeFloat, which decodes a typical 64-bit Double into an Integer (typically 17 bytes once serialized, at the relevant size) and an Int (8 bytes), and then to serialize both. That is a 3.125x increase in size (25 bytes instead of 8) if you're storing, say, a large list or array of Doubles. For a 32-bit Float, the footprint is 13 bytes instead of 4, a 3.25x increase.
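To make the (Integer, Int) pair concrete, here is a minimal sketch using only base. The helper names `mantissaAndExponent`, `roundTrip`, and `overhead` are hypothetical and introduced just for illustration; they are not part of cereal.

```haskell
-- Illustration only: these helpers are hypothetical, not cereal API.

-- decodeFloat splits a Double into (mantissa, exponent) such that
-- x == fromInteger mantissa * 2 ^^ exponent.
mantissaAndExponent :: Double -> (Integer, Int)
mantissaAndExponent = decodeFloat

-- encodeFloat is the inverse, so the pair loses no information
-- for ordinary finite values.
roundTrip :: Double -> Double
roundTrip x = uncurry encodeFloat (decodeFloat x)

-- Size overhead: serialized bytes divided by the raw width in bytes.
overhead :: Int -> Int -> Double
overhead serialized raw = fromIntegral serialized / fromIntegral raw

-- λ mantissaAndExponent 1.75
-- (7881299347898368,-52)
-- λ decodeFloat (1.75 :: Float)
-- (14680064,-23)
-- λ overhead 25 8
-- 3.125
-- λ overhead 13 4
-- 3.25
```

The Double mantissa needs 53 bits, so it does not fit in a small fixed-width slot, which is consistent with the Integer taking more space than the Int. The Float mantissa needs only 24 bits, which would explain the smaller footprint.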

I'm not aware of the history behind the decisions that were made. Is there a reason why Double and Float are stored (when the Serialize instance for them is used) as an (Integer, Int) pair rather than as raw binary? Is there a safety-related, corner-case reason for not defaulting to the more efficient alternative?

@elliottt
Contributor

Originally, there was a reason that we decided to use decodeFloat in the Serialize instance for Double, but I really can't remember what it was at this point. I'm working slowly towards a new release, so maybe that would be a good time to break this instance and switch to using the functions from Data.Serialize.IEEE754?

@acfoltzer: as the original author of Data.Serialize.IEEE754, do you remember why we didn't start using that functionality in the instances for Double and Float?

@acfoltzer

I think we wanted to be conservative about what a Float or Double is. For Float:

It is desirable that this type be at least equal in range and precision to the IEEE single-precision type.

Data.Serialize.IEEE754 relies on the fact that GHC's representation of these types happens to be IEEE, but that might change in the future, although I doubt it'll change anytime soon.
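As a rough sketch of what "raw binary" means here, the following uses only base to expose the IEEE 754 bit pattern of a Double as 8 big-endian bytes. It assumes, as noted above, that GHC represents Double as an IEEE double, and it needs a base version that provides `GHC.Float.castDoubleToWord64`. The helper `doubleBytesBE` is hypothetical and is not the implementation in Data.Serialize.IEEE754.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)
import GHC.Float (castDoubleToWord64)

-- Hypothetical helper, for illustration only (not cereal's code).
-- Reinterprets the Double's bits as a Word64 and emits them most
-- significant byte first, so the result is always exactly 8 bytes.
doubleBytesBE :: Double -> [Word8]
doubleBytesBE x = [ fromIntegral (w `shiftR` s) | s <- [56, 48 .. 0] ]
  where
    w :: Word64
    w = castDoubleToWord64 x

-- λ castDoubleToWord64 1.75
-- 4610560118520545280
-- λ doubleBytesBE 1.75
-- [63,252,0,0,0,0,0,0]
-- λ length (doubleBytesBE 1.75)
-- 8
```

The fixed 8-byte width is what removes the overhead of the (Integer, Int) pair; the trade-off is the dependence on the in-memory representation being IEEE.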

@elliottt
Contributor

I have a load of somewhat substantial changes to merge in that will likely require a major version bump. I say that we use that as an opportunity to switch to using the IEEE754 module for Float/Double serialization, and deal with any change in representation later if GHC ever decides to switch. Any objections?

@acfoltzer

👍

@michaelochurch
Author

Seconding Adam's approval.

On Tue, Jun 30, 2015 at 10:28 AM, Adam C. Foltzer notifications@github.com wrote:

👍

@elliottt
Contributor

Sorry this took so long, but I've finally pushed this change. I'm going to start merging in some larger changes before doing a major release.