disk action cache can become out of sync with in-memory action cache #2660
Comments
I'm using 0.4.3—I actually avoided upgrading because the cpp input pruning code change looked unstable—but from code inspection it looks like the issue has been present since the dawn of Bazel through today. (You have to get into a very special state to see this issue.) This issue looks like a problem with |
Ok @ulfjack is definitely way more able to understand the action cache than I am, so reassigning. Setting to P1 because correctness |
That's why you shouldn't write your own database. I don't think there's a correctness issue here, it's 'just' forgetting stuff, presumably resulting in slower than necessary builds. Correct? |
If the analysis above is correct, then this bug is many years old already. |
Right, I don't believe this can cause incorrect incremental builds. I note that if this problem did lead to correctness issues, it would highlight defects more systemic than just this bug in |
I think Java may be calling fsync on file close. Anyway, thanks for the patch, I'll merge it shortly. |
I believe 184faf6 is causing #3043 due to the leak of the journal file handle. On Windows, you cannot delete a file when it's open in another process. |
I noticed this commit by blaming PersistentMap.java; I am trying to confirm this by reverting the change. |
Clean should tell the action cache to close all files and remove all in-memory data. If it's not doing that right now, then that'd be a bug already. If it does clear the in-memory action cache, then it shouldn't be difficult to close any open files as part of that. |
One thing I noticed when working on this bug was that there isn't a way to close the resources associated with a |
This fixes bazelbuild/bazel#2660.

Basically, if we elect to keep the journal during PersistentMap.save(), we shouldn't stomp over it the next time save() is called. In writeJournal(), we now check if the journal file exists, and open it in append mode if it does.

Alternatively, we could simply not close (and thus forget about) the journal in save(), but that would leak the journal file handle if save() was never called with keepJournal() returning false.

Change-Id: Id00732f161c8b5a082a6c109aee115591ace2ea7
PiperOrigin-RevId: 152480978
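The append-if-exists idea in the commit message can be sketched roughly as follows. This is a simplified illustration, not the actual `PersistentMap.writeJournal()` code: the class `JournalAppendSketch`, its method signatures, and the line-per-entry text format are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of an append-aware journal writer; not Bazel code. */
final class JournalAppendSketch {

  /** Appends to a journal kept from an earlier save, or creates a fresh one. */
  static void writeJournal(Path journal, List<String> entries) throws IOException {
    // If a previous save() kept the journal, extend it instead of truncating it.
    StandardOpenOption mode =
        Files.exists(journal) ? StandardOpenOption.APPEND : StandardOpenOption.CREATE_NEW;
    Files.write(journal, entries, StandardCharsets.UTF_8, StandardOpenOption.WRITE, mode);
  }

  /** Two consecutive journal-only saves, followed by a read of what is on disk. */
  static List<String> runScenario() throws IOException {
    Path dir = Files.createTempDirectory("journal-sketch");
    Path journal = dir.resolve("toy.journal");
    try {
      writeJournal(journal, Arrays.asList("a.txt")); // first save keeps the journal
      writeJournal(journal, Arrays.asList("b.txt")); // second save appends to it
      return Files.readAllLines(journal, StandardCharsets.UTF_8);
    } finally {
      Files.deleteIfExists(journal);
      Files.deleteIfExists(dir);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(runScenario()); // prints [a.txt, b.txt]
  }
}
```

Because each call opens and closes the file itself, this sketch holds no file handle between saves, which is the trade-off the commit message weighs against keeping the journal open.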
Let's begin with a reproducible example of this bug (output omitted where it's not relevant):
As you can see, Bazel now thinks `a` is out of date despite us doing nothing but restarting the server!

I believe the problem is `CompactPersistentActionCache.ActionMap`'s ability to retain the cache journal on `save()`. Specifically, Bazel has a heuristic that if the action cache journal is much smaller than the action cache, it keeps the journal instead of applying it to (and rewriting) the main action cache. This is fine in principle, but in reality, `PersistentMap` simply forgets about the journal once it closes it. This means that if the journal-keeping heuristic is applied twice in a row, the first cache journal will simply be overwritten by the new one, losing data.

In the example, the problem plays out as follows:

1. The `noise` rule makes a large action cache.
2. When building `a`, the large action cache causes Bazel to write only to the cache journal.
3. When building `b`, the same heuristic applies and Bazel writes to the action cache journal, stamping over the one it wrote in step 2. The on-disk action cache is now out of sync with the in-memory one.
4. `bazel build --check_up_to_date a` works because the in-memory action cache still knows about `a.txt`.
5. After the server restarts, the journal from step 3 is read. All references to the work done in step 2 are lost because its journal was never applied.
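The overwrite in the steps above can be reproduced in miniature. The following is a toy model, not Bazel's `PersistentMap`: the class `JournalOverwriteDemo`, its methods, and the plain-text journal format are hypothetical, and every save is assumed to take the "keep the journal" path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the journal overwrite; names and file format are hypothetical. */
final class JournalOverwriteDemo {

  /** Buggy save: writes pending entries to the journal, truncating any earlier journal. */
  static void saveTruncating(Path journal, List<String> pending) throws IOException {
    Files.write(journal, pending, StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    pending.clear(); // the map "forgets" these entries were only ever journaled
  }

  /** Models a server restart: only what is on disk survives. */
  static List<String> reload(Path journal) throws IOException {
    return Files.readAllLines(journal, StandardCharsets.UTF_8);
  }

  static List<String> runScenario() throws IOException {
    Path journal = Files.createTempFile("toy-action-cache", ".journal");
    try {
      List<String> pending = new ArrayList<>();
      pending.add("a.txt");
      saveTruncating(journal, pending); // step 2: journal holds a.txt
      pending.add("b.txt");
      saveTruncating(journal, pending); // step 3: journal is overwritten with b.txt
      return reload(journal);           // step 5: restart reads only the last journal
    } finally {
      Files.deleteIfExists(journal);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(runScenario()); // prints [b.txt]; the a.txt entry is gone
  }
}
```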