New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Binary with something more reliable for file storage #889

Closed
snoyberg opened this Issue Sep 1, 2015 · 12 comments

Comments

Projects
None yet
2 participants
@snoyberg
Contributor

snoyberg commented Sep 1, 2015

Binary instances are quite problematic for file storage, since they store no metadata about the schema itself. This has led multiple times to out-of-memory issues because an Int was parsed as a list size, for example. We have some hacks in the code base to essentially embed magic numbers to work around this, but there are certainly better ways we can solve this with improved libraries.

A non-solution: use JSON. We tried this already, and the performance hit was significant. We need a fast (de)serialization.

@snoyberg snoyberg added this to the 0.3.0.0 milestone Sep 1, 2015

@phadej

This comment has been minimized.

Contributor

phadej commented Sep 1, 2015

https://github.com/well-typed/binary-serialise-cbor https://www.youtube.com/watch?v=60gUaOuZZsE it's not released but might be an option. CBOR is essentially binary json, there are tags indicating what data comes next in the binary stream.

@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Sep 1, 2015

Awesome, thanks for the link. I remembered hearing @dcoutts discuss this at some point, but didn't know what came of it. Duncan: are there plans to release this in the near term? And there doesn't seem to be any Generic support in the type class yet. Is that intentional, or would Generic support be a good thing for the library?

@phadej

This comment has been minimized.

Contributor

phadej commented Sep 7, 2015

@snoyberg I have made a small package https://github.com/phadej/binary-tagged during "the morning hack". It's missing a bit of essential instances; but could it be used as a possible solution to this? It's not yet on Hackage, but it could be as soon as I verify API in our own dependant project.

@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Sep 7, 2015

It definitely looks like an improvement. My concerns are:

  • It looks like this requires explicit versioning, am I misreading that?
  • How big a performance hit would this introduce?
@phadej

This comment has been minimized.

Contributor

phadej commented Sep 7, 2015

First question: it allows explicit versioning, but data is implicitly versioned by its structure: generic-derivable HasNominalSop. Explicit versioning might be needed sometimes, but with enough newtypes it should be needed (time shows). Actually one could write tests using this, to track when data formats actually change, i.e. hardcode hash into tests.

Second question: If only top-level data is tagged, then the performance hit shouldn't be noticeable. I'll write some trivial benchmarks to verify that.

@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Sep 7, 2015

Sounds good, I'm liking where this is headed :)

@phadej

This comment has been minimized.

Contributor

phadej commented Sep 7, 2015

With https://github.com/phadej/binary-tagged/blob/wip/bench/Bench.hs (which is quite nice usage example as well) the result is:
fireshot capture - criterion report - http___localhost_8000_bench html

I'll try to finalise and realise the package during this week. I'll ping this issue after that.

@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Sep 7, 2015

Very nice, thanks!

On Mon, Sep 7, 2015 at 6:47 PM, Oleg Grenrus notifications@github.com
wrote:

With https://github.com/phadej/binary-tagged/blob/wip/bench/Bench.hs
(which is quite nice usage example as well) the result is:
[image: fireshot capture - criterion report - http___localhost_8000_bench
html]
https://cloud.githubusercontent.com/assets/51087/9719788/aa4ea418-5590-11e5-87eb-8dca842b617d.png

I'll try to finalise and realise the package during this week. I'll ping
this issue after that.


Reply to this email directly or view it on GitHub
#889 (comment)
.

@phadej

This comment has been minimized.

Contributor

phadej commented Sep 8, 2015

First release http://hackage.haskell.org/package/binary-tagged / https://github.com/phadej/binary-tagged, some (almost) real world usage in https://github.com/futurice/haskell-flowdock-rest/blob/grep/flowdock-grep/src/Main.hs

As the latter is under active development, binary-tagged already caught structural changes: didn't need to clean cache files during development by hand at all.

@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Sep 8, 2015

Cool. Is this ready to go?

And by any chance, would you be interested in making the change to the stack codebase? It should be isolated to just a few specific places (anything calling Binary.encode* or Binary.decode*).

@phadej

This comment has been minimized.

Contributor

phadej commented Sep 8, 2015

I can check them out, sure.

@snoyberg

This comment has been minimized.

Contributor

snoyberg commented Sep 8, 2015

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment