ca-derivations can create malformed NAR serializations #8113
Comments
That is quite unfortunate 🤔 I think we could solve that by parsing and rebuilding the NAR during hash rewriting, but I'm a bit afraid of the performance impact for big paths. I guess the best way to know would be to try it.
That would also make it a lot harder to verify the hash afterwards, since that also relies on the fact that hash rewriting is done in place and doesn't touch anything besides those hashes.
Just a random idea: the rewrite only happens on an internal, temporary nar serialisation. That nar is never uploaded or made visible to the users. It's rather (non-ca derivation -> nar serialize -> ca-path filter -> narDeserialize -> dump to ca path), right? So one could add an internal flag to the nar deserializer to accept nars that are "invalid" due to a wrong ordering. Then the ca logic could turn it on for its internal needs, and no one ever needs to know about that from the outside.
Oh, but the hash of the content also relies on that ordering, right? So we actually need a rewrite-aware nar serializer. That is a bit more tricky, but feasible too. Way better than fixing an already composed nar.
The verification scheme for ca-derivations relies on the fact that you can reproduce the 'internal' nar serialization by doing the substitution again with the final hash, so this makes it impossible to verify the output.
Yes, since the contents are serialized in lexical order based on their file name, if the order of entries changes then the file content also gets moved around. This is pretty similar to the issues ca-derivations have with compressed data, where the build-time hash leaks into the output in a way that cannot be substituted for a stable value. As long as hash substitution is limited to the file contents and symlink targets, the nar serialization should remain valid.
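To make the verification argument concrete, here is a minimal sketch, assuming a simplified "hash modulo self-reference" that just zeroes out the self hash before hashing. The name `hash_modulo` and the byte strings are made up for illustration, and this is not the exact scheme Nix uses. It only shows why an in-place, same-length substitution can be re-checked with the final hash, while anything that moves bytes around cannot.

```python
import hashlib

def hash_modulo(data: bytes, self_hash: bytes) -> str:
    """Hash `data` with every occurrence of `self_hash` zeroed out.

    Simplified stand-in for hashing modulo self-references.
    """
    placeholder = b"\0" * len(self_hash)
    return hashlib.sha256(data.replace(self_hash, placeholder)).hexdigest()

build_hash = b"b" * 32   # hypothetical build-time hash
final_hash = b"z" * 32   # hypothetical final content-addressed hash

serialized = b"entry-one " + build_hash + b" entry-two " + build_hash

# In-place rewrite: same length, nothing else moves.
rewritten = serialized.replace(build_hash, final_hash)
in_place_ok = hash_modulo(serialized, build_hash) == hash_modulo(rewritten, final_hash)

# If the rewrite also forces entries to be reordered, the bytes around
# the hashes move and the two hashes no longer agree.
reordered = b" entry-two " + final_hash + b"entry-one " + final_hash
reordered_ok = hash_modulo(serialized, build_hash) == hash_modulo(reordered, final_hash)
```

Here `in_place_ok` is true and `reordered_ok` is false, which is the property the verification scheme depends on.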
As of now we do
This is not particularly efficient, and could be turned into
This new algo has no in-memory copy of the path content (better for performance, a bit more risky if the content changes on disk). The narHashWithStringsReplaceModulo is tricky, but not impossible: mostly, rewrite all the names and all the content before using them. Keeping rewrite indexes is really tricky, however. The bad property is that we still cannot hash a content-addressed path, as self-references in path names will break its nar serialisation. We need to check that paths are indeed content-addressed using
i.e., there is no way to test the ca-validity of a caPath just from its nar serialization. The only other option I see is a narStreamRewriteStream, which would be more generic, but would have to maintain an in-memory representation of the tree encoded in the nar, and stream it with dir entries reordered according to the rewritten strings. That's... maybe the best way to go 🤔
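A rough sketch of what the tree-based rewrite could look like, assuming a toy in-memory model where a directory is a `dict`, a symlink is a `("symlink", target)` tuple, and a regular file is `bytes`. The function name `rewrite_tree` and the data are hypothetical, and this does not read or write the real NAR wire format. It only illustrates "rewrite the strings, then re-sort the directory entries".

```python
def rewrite_tree(node, old, new):
    """Return a copy of `node` with `old` replaced by `new` everywhere,
    and with directory entries re-sorted by their rewritten names."""
    if isinstance(node, dict):  # directory
        renamed = {}
        for name, child in node.items():
            new_name = name.replace(old, new)
            if new_name in renamed:
                raise ValueError(f"rewrite makes two entries collide: {new_name!r}")
            renamed[new_name] = rewrite_tree(child, old, new)
        # Re-sort so a serializer walking this dict emits entries in order.
        return dict(sorted(renamed.items()))
    if isinstance(node, tuple):  # symlink
        kind, target = node
        return (kind, target.replace(old, new))
    # regular file contents
    return node.replace(old.encode(), new.encode())

old = "b" * 32  # hypothetical build-time hash
new = "z" * 32  # hypothetical final hash

tree = {
    f"{old}-data": b"points at /nix/store/" + old.encode() + b"-example",
    "m-index": ("symlink", f"{old}-data"),
}
rewritten = rewrite_tree(tree, old, new)
entry_order = list(rewritten)
```

After the rewrite, `entry_order` starts with `m-index`, whereas the original tree would have listed the hash-named entry first. The cost is the one mentioned above: the whole tree has to be held in memory (or at least every directory listing) before anything can be streamed out.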
That sounds like a good plan to me; would you have time to take a crack at implementing it? You might also take a look at #4282, which cleans up some of the same code and would make for a nicer starting point (if it passes CI!)
I see the linked PR has been merged, but it appears to still be a direct find/replace over the serialized NAR. So this issue should still be present.
Describe the bug
Currently the ca-derivations feature implements its hash rewriting as a context-free replace over the NAR serialization of the output. If any filename in the archive includes the hash, the replacement can break the lexical order of entries that the NAR format requires.
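As a small illustration of the ordering problem (not the reproduction derivation itself), the sketch below models only a directory's entry names. The hashes and file names are placeholders, and `entries_sorted` / `rewrite_names` are hypothetical helpers, not Nix functions.

```python
def entries_sorted(names):
    """True if entry names are in strictly increasing order."""
    return all(a < b for a, b in zip(names, names[1:]))

def rewrite_names(names, old_hash, new_hash):
    """Context-free replace on each name, leaving entries where they were."""
    return [name.replace(old_hash, new_hash) for name in names]

old_hash = "b" * 32  # placeholder for the build-time hash
new_hash = "z" * 32  # placeholder for the final hash

# Entry names in the order they were serialized before rewriting.
names = sorted([f"{old_hash}-data", "m-index"])
rewritten = rewrite_names(names, old_hash, new_hash)

valid_before = entries_sorted(names)
valid_after = entries_sorted(rewritten)
```

`valid_before` is true, but `valid_after` is false: the hash-named entry now sorts after `m-index` yet still sits in front of it, so a strict reader would reject the archive.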
Steps To Reproduce
The following derivation creates a package that is very sensitive to changes in entry order:
On my machine trying to make this content-addressed results in the following error:
Expected behavior
I expect this to be broken since the requirements of the NAR serialization conflict with the context-free hash rewriting. One way to solve this is to ignore filenames during hash rewriting, but that will complicate the rewriting process. I'm mostly creating this issue so the behavior can be documented.
nix-env --version output