-
-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store version separately from hash in integrity checks #335
Comments
@MonoidMusician: Yeah, I think this is a great idea. My only minor suggestion is to do it in such a way that it provides a smooth migration path (i.e. treat a semantic integrity check without a version as the last standard version published before this change). Then we can provide Do you think you would be able to standardize this change? |
@MonoidMusician: So I'm looking at this more closely so I can standardize this soon. I have one main suggested change, which is to remove the version tag entirely. In other words:
I believe we can do this so long as we take care to avoid the same encoded expression meaning two different things across different standard versions. For example, suppose we deprecate the |
Fixes #335 The binary representation of Dhall expressions no longer includes the version string, which enables hash stability across standard version changes. This implies that changes to the language standard should take care to avoid reusing the same encoding to mean different things over time. This also implies that interpreters will need to make a greater effort to anticipate likely decoding failures for older expressions and provide helpful error messages.
Here's the proposal to remove the version tag entirely: #362 |
Fixes #335 The binary representation of Dhall expressions no longer includes the version string, which enables hash stability across standard version changes. This implies that changes to the language standard should take care to avoid reusing the same encoding to mean different things over time. This also implies that interpreters will need to make a greater effort to anticipate likely decoding failures for older expressions and provide helpful error messages.
Ah it's been merged already. I was going to suggest using a version string distinct from the dhall version (a |
@Nadrieril: That's how the binary format started out, if I remember correctly. In my view, the main reason against tracking the the version number in the encoding was to compare the following two representative proposals:
These are not the only two possible proposals, but they are useful to compare. Proposal 2 accepts strictly more expressions because there are many expressions that Proposal 1 would reject (false negatives) purely on the basis of the version string being a too-coarse-grained measure of protocol incompatibility. Really what we want is a more fine-grained measure of compatibility that permits older expressions that were still compatible. That fine-grained measure of compatibility already exists: just attempt to decode the expression. Rather than relying on the version string to determine whether or not an older expression is backwards compatible, we can empirically determine compatibility by directly trying to decode the older expression. This empiric determination is more accurate because it doesn't rely on humans to correctly assign version numbers and it accepts the subset of older expressions that don't contain any backwards-incompatible changes. |
But would it possible that the new version is able to decode it but returning another result? It could drive to, maybe improbable, but serious bugs, right? Taking in account that it is possible to decode binary representantions without semantic integrity checks. I think @Nadrieril argument is: we can get the actual behaviour simply not changing the bynary specific version but we have it just in case we want to be more strict about it. And we could only change that version only if it is possible to get two different results for the same representation. |
Inded, I was thinking we would only increment it for major breaking changes that change the format enough that it can't be decoded the same anymore. So in all likelyhood it wouldn't be incremented any time soon, or ever. |
It is possible, but I view it more likely that we have a false negative (rejecting an old expression even though it was compatible) than a false positive (decoding an old expression the wrong way). Usually, if we reclaim an unused older protocol encoding to mean something new we are probably doing so intentionally (such as reclaiming the Regarding future proofing, I don't believe we paint ourselves into a corner by omitting the version number. We can carve out additional unused protocol space by using new list prefixes. This is very similar in spirit to how protocol buffers 3 works for protocol evolution: you allocate new numeric fields and reserve older no-longer-used numeric fields if you want to ensure that they are never repurposed to have a different meaning. In practice, we will probably make breaking changes in the short term while the protocol format is still evolving rapidly so that new implementations don't have to support legacy protocol encodings. Once the language stabilizes and we have enough language bindings then we'll probably be more careful to avoid breaking backwards compatibility. |
Fair enough then, thanks for taking the time to explain :) |
@Nadrieril: You're welcome! 🙂 |
As it stands, the version (of the standard) is an input to the
sha256
hash which is generated for and used in the semantic integrity checks. It makes sense to invalidate hashes when the version changes, but there are version changes that don't affect the hash, other than changing the version string, right?My (incomplete) understanding is that this causes a few difficulties: hashes need to be completely regenerated for even minor version changes (which increases the size of diffs and requires the use of tools such as
dhall freeze
to upgrade every import), and it's not easy for an implementation to detect if a hash is from a version it is still compatible with. (Well, I guess this is the part that I don't fully understand – see the second half of this issue.)Anyhow, my proposal is that the version of the standard be no longer included in the hash itself but instead be stored next to the hash, with a syntax that looks like this:
sha256:5.0.0/bb1e096305428d2e155282d156ddec47cca75cd61bc0ef1aa6ce4cf5eae91e38
.I could see two advantages of this check:
5.1.0
could easily recognize version5.0.0
and verify the hashes using its same algorithm to obtain the same result (since the integrity check should backwards compatible across the minor release).I'm raising this proposal purely for the purposes of discussion, since this is a format that would make sense to me. But I don't actually work with integrity checks, so I don't have a preference either way, and I don't know that I have a particularly good understanding of the issues at hand. In particular, I see that the standard specifies that backwards-compatibility SHOULD be supported already, but I guess I am unfamiliar with the mechanism by which it would be accomplished – is it easier to achieve than I thought?
(This proposal would effectively delete the first sentence there, and hopefully make the rest of it easier for both users and implementations.)
Prior discussions:
The text was updated successfully, but these errors were encountered: