.vo files vary from build order #11229
Umm, that's bizarre and certainly a bug; while we try to debug, do you have access to the differing files? You can use
Since it looks like a Require issue it may be worth trying with 8.11/master.
Looks a bit tricky to analyze, I guess having the ability for I need to take care of other stuff, I reproduced just doing two dune builds, one with
It might be due to the fact that hashconsing is not fully deterministic. Depending on the way the list of identifiers is processed we might get a different in-memory representation.
Indeed I was wondering if the difference would be in the hash-cons, but how can this happen? Not for terms at least; almost surely what's causing the difference here is that we are taking some hash of the directory contents.
In add_vo_path the directory names are put through Lines 246 to 266 in de91f71
I don't understand how this affects the hashconsing, tho even ordering would not yield the same results as for example some directories may not be present between two runs I think.
I could not find
@bmwiedemann the Coq version you used to generate these two files is still 8.9.1, right? There is more than a change of representation, it seems that the provided files are different just because the vo files they are depending on are already different. Indeed, vo files store a digest of their dependencies and it happens in your example that it's essentially what is changing.
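To illustrate how a stored dependency digest makes one differing file cascade into its dependents, here is a minimal sketch in Python. It is a toy model, not the real .vo format: `toy_compile`, the `layout` parameter, and the file contents are all hypothetical names introduced for this example.

```python
import hashlib


def toy_compile(source, dependencies, layout=b""):
    """Build a toy 'compiled file': dependency digests, then layout, then source.

    `layout` is a hypothetical stand-in for representation details that may
    vary between builds even when the source text is identical.
    """
    header = b"".join(hashlib.sha256(dep).digest() for dep in dependencies)
    return header + layout + source.encode("utf-8")


# Same leaf source, two builds whose on-disk representation differs.
leaf_build1 = toy_compile("leaf source", [], layout=b"\x00")
leaf_build2 = toy_compile("leaf source", [], layout=b"\x01")

# The dependent file has identical source and layout in both builds...
user_build1 = toy_compile("user source", [leaf_build1])
user_build2 = toy_compile("user source", [leaf_build2])

# ...yet its bytes differ, only because the recorded leaf digest differs.
print(leaf_build1 != leaf_build2)
print(user_build1 != user_build2)
# Past the 32-byte SHA-256 header the two dependent files are identical.
print(user_build1[32:] == user_build2[32:])
```

Under this model, most differing files in a build are downstream noise, and the interesting ones are those that differ without any differing dependency.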
Forgot to mention, but the single offending file in your example is
@ppedrot indeed a few files are like that, but there is some root cause; I can provide a full
I can provide a script to dump a structured representation of the binary contents of the vo, if that helps.
If that is diff-friendly, that would be great. I was thinking of coding a [offtopic note, maybe it's about time we move from .vo, we should have some chat about it.]
BTW, I am spotting highly suspicious code in the safe demarshaller that seems to have been introduced by the native integer and float additions. I think we can get interesting segfaults when checking files from another architecture...
Here is a diff-friendly vo-dumping program: https://github.com/ppedrot/vodump.
I found a minimal counter-example for file Now, understanding why this happened is another story.
Note that this file uses modules, and they famously rely on an imperative delayed substitution mechanism...
I do suspect that this is linked to hashconsing, in a way that is unrelated to the directory contents. Rather, this is due to the fact that the GC itself is a source of randomness. My theory is the following:
Definitely the memory representation will be different depending on whether 3. occurred.
How does that break sharing though?
The issue appears right in a place where we use mutable state to delay the application of substitutions, so I guess it is an interaction with this imperative use of data.
Sounds quite plausible @ppedrot .
What about optionally marshaling with the option
That could work, but to make it the default we'd have to bench size and time impact.
Is it confirmed that No_sharing fixes the issue?
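As an illustration of why in-memory sharing matters for serialized bytes, here is a minimal sketch in Python, using `pickle` as a stand-in for OCaml's `Marshal`. This is an analogy only, not Coq's code: `unshare` is a hypothetical helper that plays the role of marshaling without sharing, and the data is made up.

```python
import pickle


def unshare(value):
    """Rebuild nested lists so that no two positions alias the same object."""
    if isinstance(value, list):
        return [unshare(item) for item in value]
    return value


leaf = [1, 2, 3]
shared = [leaf, leaf]          # both slots point at the same list
unshared = [leaf, list(leaf)]  # second slot is a structurally equal copy

# The two values are equal, but the serializer sees different object graphs.
shared_bytes = pickle.dumps(shared, protocol=4)
unshared_bytes = pickle.dumps(unshared, protocol=4)
print(shared == unshared)
print(shared_bytes != unshared_bytes)

# Normalizing away sharing gives identical bytes, at the cost of a larger output.
print(pickle.dumps(unshare(shared), protocol=4)
      == pickle.dumps(unshare(unshared), protocol=4))
print(len(unshared_bytes) > len(shared_bytes))
```

The size difference in the last line is the same trade-off raised above: dropping sharing makes the output independent of aliasing, but the files grow and take longer to write.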
Is there an easy way to use your infrastructure if we come up with some patches? Or should we just ask you?
You could use my tools on your hardware, or just use the simpler -j variations that ejgallego used. I can also test patches for you (with the disadvantage of slower feedback cycles).
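For anyone comparing two builds by hand, a minimal sketch in Python of the comparison step: build twice into separate trees (for example with different `-j` settings), then list the .vo files whose bytes differ. The function names, directory names, and file contents below are hypothetical; this is not one of the tools mentioned above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_vo_files(build_a, build_b):
    """Return sorted relative paths of .vo files that differ between two trees.

    A file present in only one tree is also reported.
    """
    root_a, root_b = Path(build_a), Path(build_b)
    rel_a = {p.relative_to(root_a) for p in root_a.rglob("*.vo")}
    rel_b = {p.relative_to(root_b) for p in root_b.rglob("*.vo")}
    differing = []
    for rel in sorted(rel_a | rel_b):
        in_both = rel in rel_a and rel in rel_b
        if not in_both or sha256_of(root_a / rel) != sha256_of(root_b / rel):
            differing.append(rel.as_posix())
    return differing


# Demo on two throwaway trees (hypothetical file names and contents).
with tempfile.TemporaryDirectory() as tmp:
    tree_a, tree_b = Path(tmp, "build1"), Path(tmp, "build2")
    for tree, payload in ((tree_a, b"v1"), (tree_b, b"v2")):
        (tree / "theories").mkdir(parents=True)
        (tree / "theories" / "Same.vo").write_bytes(b"identical")
        (tree / "theories" / "Differs.vo").write_bytes(payload)
    result = differing_vo_files(tree_a, tree_b)

print(result)
```

Because .vo files record digests of their dependencies, the list usually contains downstream files as well; the root causes are the entries that none of the other entries depend on.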
I think it is safe to remove this from 8.11 deadline, right @ppedrot ?
coq 8.11.2 is still affected.
Thanks for the report @bmwiedemann ; I'm afraid that 8.12 will still suffer from this; if @ppedrot's analysis is correct the fix is not going to be easy due to the way Coq's on-disk representation system is designed :S
Well, I do suspect that making module substitutions pure would solve the problem. This can be tried easily but since the bench infrastructure is currently down it's hard to check whether this would have negative performance consequences.
Seems to be fixed after #14337. |
What is the idea? I have a hard time imagining delayed module substitution would be the only reason serialization would be non-deterministic. Was that the only source of unsharing in Coq?
@silene there is a tentative explanation a few comments above, although I don't know whether the problem is really solved or if I was not unlucky enough to hit it.
my test records show 8.12.0 as the first reproducible version around 2020-08-15 |
I cannot find it. I understand why eagerly substituting modules is a step toward solving the issue, but I do not understand why it would definitely solve it. Is there no more hashconsing in Coq? |
@silene the problem is not hashconsing, it is hashconsing of mutable data structures. We don't have other instances of this phenomenon in the code. The only places I know where we have mutability are:
Since @bmwiedemann's records confirm Coq is now reproducible, tentatively closing.
Best reopened, based on https://coq.zulipchat.com/#narrow/stream/237656-Coq-devs-.26-plugin-devs/topic/Dune.20caching.20.26.20Coq.20non-reproducible.20builds. Unless I made some very weird mistake, a (very mildly patched) Coqc 8.16.0 is (again?) non-reproducible, which affects our dune-based CI. Here are two ARM vo files from the same source file — everything is public I tried https://github.com/ppedrot/vodump from #11229, but it doesn't work as-is (fails on the magic number, might be easy to update?). Most of the credits go to We also have evidence that this happens on our x86 CI — possibly with lower frequency. I should note this file uses
but other files don't use modules so blatantly. I wiped my dune cache and I'm collecting more data via repeated rebuilds; I've already hit the same
If anybody wants to try, at recent dune versions make it easy (I'm using 3.6.1):
(for i in `seq 1 10`; do rm -rf _build/; time DUNE_CACHE_CHECK_PROBABILITY=1 dune b; done) 2>&1 | tee -a output
Combining
The example in the gist involves modules, but I'll upload a new one without where it seems that opaque constant ... while other times
Here it is: https://gist.github.com/Blaisorblade/a7274f0eb15a13afbaa083b5b3673697 for the example that does not involve modules. It should be easy to confirm the same string appears once vs twice, and the node paths hint to the involved types and code:
but the original example (
I haven't cleaned up the analysis of
Applying for Consortium support on behalf of Bedrock Systems (cc @gmalecha ).
@Blaisorblade
Going to move this to a new issue as I don't think the old comments are useful.
As this issue is still active, just a small note about
Description of the problem
While working on reproducible builds for openSUSE, I found that
In addition to #11227 , there are variations in .vo files that go away when I build on a filesystem that has deterministic readdir order and build with make instead of make -j4, so that the builds happen in deterministic order.
https://github.com/bmwiedemann/openSUSE/blob/master/packages/c/coq/coq.spec#L77 has the details of how we build.
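A minimal sketch in Python of the readdir-order part of this report: the order in which a directory's entries come back is filesystem-dependent, so any tool that processes them in that raw order can behave differently across machines, while sorting makes the traversal reproducible. `deterministic_listing` and the file names are hypothetical, and this does not claim to reflect how Coq itself scans directories.

```python
import os
import tempfile
from pathlib import Path


def deterministic_listing(directory):
    """Return entry names in sorted order, independent of readdir order."""
    return sorted(entry.name for entry in os.scandir(directory))


# Demo: files created in a non-alphabetical order (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Zeta.v", "Alpha.v", "Mid.v"):
        Path(tmp, name).write_text("")
    raw_names = [entry.name for entry in os.scandir(tmp)]  # order not guaranteed
    sorted_names = deterministic_listing(tmp)

print(sorted_names)
```

Only `sorted_names` is safe to rely on; `raw_names` contains the same entries, but in whatever order the filesystem happens to return them.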
Coq Version
8.9.1