Encode integrity check as multihash #549

Merged

singpolyma
Collaborator

Using a byte string with an internal label is much more compact than
storing base16 as Unicode text. This re-uses the multihash spec so we don't
have to invent our own, but uses such a small subset of it that implementors
do not need to be familiar with multihash at all to implement this.

This change does not affect semantic hashes, since semantic hashes are
computed on fully resolved expressions.

Closes #548
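
To make the size argument concrete, here is a minimal sketch of the labelled byte string described above. It assumes the hash is SHA-256 and relies on the multihash table's entry for sha2-256 (function code 0x12, digest length 0x20). The helper name `multihash_sha256` and the input payload are hypothetical and only serve the illustration; this is not the standard's normative wording.

```python
import hashlib

SHA2_256_CODE = 0x12    # multihash function code for sha2-256
SHA2_256_LENGTH = 0x20  # digest length in bytes (32)


def multihash_sha256(digest: bytes) -> bytes:
    """Prefix a raw SHA-256 digest with its two-byte multihash label."""
    if len(digest) != SHA2_256_LENGTH:
        raise ValueError("expected a 32-byte SHA-256 digest")
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + digest


# Placeholder input (an assumption for this sketch); a real integrity check
# hashes an encoded Dhall expression instead.
digest = hashlib.sha256(b"example payload").digest()

labelled = multihash_sha256(digest)  # byte string with internal label
base16_text = digest.hex()           # the older base16-as-text form

print(len(labelled))     # 2-byte label + 32-byte digest
print(len(base16_text))  # one character per hex nibble
```

An implementor only needs the two constant prefix bytes, which is why no familiarity with the rest of multihash is required.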

Contributor

@Gabriella439 Gabriella439 left a comment

I think this mainly needs an encoding test, but otherwise looks great to me

@singpolyma singpolyma force-pushed the smaller-binary-integrity-check branch from 1b497de to 26e2600 Compare May 18, 2019 02:20
@Nadrieril
Member

Nadrieril commented May 18, 2019

I am weakly opposed to this, since it will require special support from the CBOR library used by each implementation. Moreover, we rarely encode non-resolved expressions, so the gains look small to me.
The Rust CBOR library, for example, essentially only supports encoding a JSON value. I expect that in languages with built-in JSON-like structures, like JavaScript, CBOR libraries could have similar limitations.

@singpolyma
Collaborator Author

singpolyma commented May 18, 2019 via email

@f-f
Member

f-f commented May 18, 2019

I am also not feeling great about this, for the same reasons as @Nadrieril:

  • some CBOR libraries don't easily give access to the "building blocks" of CBOR and basically only support encoding a JSON value (the Clojure one doesn't even mention "multihash")
  • the gains are small (encoding non-resolved expressions is indeed rare) compared to the implementation complications this adds

@singpolyma
Collaborator Author

singpolyma commented May 18, 2019 via email

@singpolyma
Collaborator Author

singpolyma commented May 18, 2019 via email

@singpolyma
Collaborator Author

The browser-side CBOR library for JavaScript supports byte strings as Uint8Array: https://github.com/paroga/cbor-js/blob/master/cbor.js#L158

Rust's serde_cbor implements this for at least &[u8]: https://docs.rs/serde/1.0.91/src/serde/de/impls.rs.html#486-493 + https://docs.rs/serde_cbor/0.9.0/src/serde_cbor/de.rs.html#183

@Gabriella439
Contributor

Gabriella439 commented May 20, 2019

So from reading through this it seems like the only sticky issue is the Clojure clj-cbor library.

@f-f: Would it be possible to open an issue against them to support encoding Clojure byte arrays as CBOR byte arrays?

For me it seems reasonable to expect a language to (A) support byte arrays and (B) have a CBOR library that supports encoding those byte arrays as CBOR byte arrays.
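
For readers unsure what "encoding byte arrays as CBOR byte arrays" involves on the wire, the following hand-rolled sketch may help. It uses no CBOR library; it only applies the CBOR rules for definite-length items (major type 2 for byte strings, major type 3 for text strings). The helper names `cbor_head`, `cbor_bytes`, and `cbor_text` are hypothetical, and the all-zero digest is a placeholder.

```python
def cbor_head(major_type: int, length: int) -> bytes:
    """Encode a CBOR initial byte plus length argument (definite length only)."""
    if length < 24:
        return bytes([(major_type << 5) | length])
    if length < 0x100:
        return bytes([(major_type << 5) | 24, length])
    if length < 0x10000:
        return bytes([(major_type << 5) | 25]) + length.to_bytes(2, "big")
    raise ValueError("length too large for this sketch")


def cbor_bytes(data: bytes) -> bytes:
    """CBOR byte string: major type 2."""
    return cbor_head(2, len(data)) + data


def cbor_text(text: str) -> bytes:
    """CBOR text string: major type 3, UTF-8 payload."""
    utf8 = text.encode("utf-8")
    return cbor_head(3, len(utf8)) + utf8


digest = bytes(32)  # placeholder digest, all zeros (assumption for the sketch)

as_bytes = cbor_bytes(b"\x12\x20" + digest)  # labelled byte string
as_text = cbor_text(digest.hex())            # base16 stored as text

print(as_bytes[:2].hex())           # CBOR header for a 34-byte byte string
print(len(as_bytes), len(as_text))  # encoded sizes of the two forms
```

Under these assumptions the byte-string form takes 36 bytes against 66 for the text form, and the only library feature it needs is the ability to emit a major type 2 item.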

@f-f
Member

f-f commented May 20, 2019

@Gabriel439 sorry for the delay on this. As @singpolyma figured, I was just confused by the mention of "multihash" (which is independent of the CBOR encoding, while I thought it was a CBOR feature).
Anyway, as he noted, clj-cbor seems to encode Java byte arrays as CBOR byte strings, so everything should be fine library-wise. Thanks for looking into this!

@singpolyma singpolyma force-pushed the smaller-binary-integrity-check branch from 26e2600 to 45d4c10 Compare May 22, 2019 01:13
@singpolyma
Collaborator Author

I've pushed updates to the tests based on my branch implementation in dhall-ruby

@singpolyma singpolyma merged commit 70c0498 into dhall-lang:master May 24, 2019
@singpolyma singpolyma deleted the smaller-binary-integrity-check branch May 24, 2019 00:50
philandstuff added a commit that referenced this pull request Jun 9, 2019
PR #549 adopted the use of multihash for binary encoding of semantic
hashes; this commit extends that to the filenames of cache files on
disk.
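
The referenced commit is not shown in this thread, so the following is only a guess at what "extending multihash to cache file names" could look like: it assumes the file name is the base16 rendering of the same labelled bytes (sha2-256 label 0x12 0x20 followed by the digest). The function name `cache_file_name` and the payload are hypothetical.

```python
import hashlib


def cache_file_name(digest: bytes) -> str:
    """Render label + digest as base16 (assumed naming scheme, see lead-in)."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    multihash = bytes([0x12, 0x20]) + digest
    return multihash.hex()


# Placeholder payload; a real cache entry would hash an encoded expression.
name = cache_file_name(hashlib.sha256(b"example payload").digest())
print(name[:4])   # the label rendered as hex
print(len(name))  # 4 label characters + 64 digest characters
```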
Development

Successfully merging this pull request may close these issues.

Binary encoding of integrity check
5 participants