-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recheck JVM archive extraction directory after acquiring the file lock. #2197
Recheck JVM archive extraction directory after acquiring the file lock. #2197
Conversation
002463d
to
28604d2
Compare
…and use them for Coursier (#13046) Coursier is used to fetch JVMs and artifacts for Java support. On `main`, it uses its default cache directory, but as shown in #12293, this has concurrency issues when multiple clients are fetching JVMs at the same time. The underlying issue is likely to be fixed by coursier/coursier#2197, but in the meantime we can move to using `append_only_caches` for the `coursier` `--cache-dir` and `--jvm-dir`. Alone, this isn't sufficient to avoid concurrency issues (since fetches would simply collide in a new location): but it allows us to re-configure the cache location in `RuleRunner` tests using the `[pytest] execution_slot_var` to prevent collisions. Along the way, support for mixing `append_only_caches` and `use_nailgun` needed fixing. Re-enables running of JVM tests by default, and fixes #12293.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thanks @patricklaw!
Thanks for merging, Alex! We'd really like to test this in Pants to see if it resolves our CI issues, but after a bit of poking I can't figure out how to cut a binary release of Coursier. Is that a workflow that you run, or am I missing some documentation somewhere? |
I cut |
Thanks a lot! We'll try consuming it and report back. |
After using the latest launcher a bit, I saw the |
We have some green runs with |
Upgrade to the latest coursier release to pull in coursier/coursier#2197. Because `coursier` is now distributed in a `.gz` file, #13335 was required. Additionally, the `--jvm-dir` flag was removed, so we move to specifying all cache locations using environment variables. Finally, fix `nailgun` server launching not propagating `env` vars to spawned servers. Fixes #13140 and #13316. [ci skip-build-wheels]
With #13336 fixed, we should be able to remove `RuleRunner` cache directory isolation. Thanks a lot to @patricklaw for the upstream fix in coursier/coursier#2197, and @alexarchambault for his work getting out a new coursier release! [ci skip-rust] [ci skip-build-wheels]
The existing locking code for extracting JVM archives unconditionally attempts the extraction after acquiring the file lock for the extracted directory, but the check for whether that directory exists happens before the lock is acquired. This means that while waiting for the lock (or losing the race to get it), another process could successfully extract the archive. We then successfully acquire the lock, but the extraction fails because the directory now exists.
I believe this should fix the root cause of pantsbuild/pants#12293.