Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recheck JVM archive extraction directory after acquiring the file lock. #2197

Merged
merged 2 commits into from
Oct 3, 2021

Conversation

patricklaw
Copy link
Contributor

The existing locking code for extracting JVM archives unconditionally attempts the extraction after acquiring the file lock for the extracted directory, but the check for whether that directory exists happens before the lock is acquired. This means that while waiting for the lock (or losing the race to get it), another process could successfully extract the archive. We then successfully acquire the lock, but the extraction fails because the directory now exists.

I believe this should fix the root cause of pantsbuild/pants#12293.

stuhood added a commit to pantsbuild/pants that referenced this pull request Sep 30, 2021
…and use them for Coursier (#13046)

Coursier is used to fetch JVMs and artifacts for Java support. On `main`, it uses its default cache directory, but as shown in #12293, this has concurrency issues when multiple clients are fetching JVMs at the same time.

The underlying issue is likely to be fixed by coursier/coursier#2197, but in the meantime we can move to using `append_only_caches` for the `coursier` `--cache-dir` and `--jvm-dir`. Alone, this isn't sufficient to avoid concurrency issues (since fetches would simply collide in a new location): but it allows us to re-configure the cache location in `RuleRunner` tests using the `[pytest] execution_slot_var` to prevent collisions.

Along the way, support for mixing `append_only_caches` and `use_nailgun` needed fixing.

Re-enables running of JVM tests by default, and fixes #12293.
Copy link
Member

@alexarchambault alexarchambault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks @patricklaw!

@alexarchambault alexarchambault merged commit c314d81 into coursier:master Oct 3, 2021
@patricklaw
Copy link
Contributor Author

Thanks for merging, Alex! We'd really like to test this in Pants to see if it resolves our CI issues, but after a bit of poking I can't figure out how to cut a binary release of Coursier. Is that a workflow that you run, or am I missing some documentation somewhere?

@alexarchambault
Copy link
Member

I cut 2.0.16-157-gcef6b2330 yesterday, but a new issue came up. Hopefully, the whole release will work now.

@stuhood
Copy link

stuhood commented Oct 6, 2021

Thanks a lot! We'll try consuming it and report back.

@alexarchambault
Copy link
Member

alexarchambault commented Oct 7, 2021

After using the latest launcher a bit, I saw the java / java-home sub-commands are broken (by changes other than this PR)… I need it in other contexts, it should be fixed soon.

@stuhood
Copy link

stuhood commented Oct 25, 2021

After using the latest launcher a bit, I saw the java / java-home sub-commands are broken (by changes other than this PR)… I need it in other contexts, it should be fixed soon.

We have some green runs with v2.0.16-169-g194ebc55c, including with our workaround for this issue removed: thanks a lot for working to get these releases out!

stuhood added a commit to pantsbuild/pants that referenced this pull request Oct 25, 2021
Upgrade to the latest coursier release to pull in coursier/coursier#2197.

Because `coursier` is now distributed in a `.gz` file, #13335 was required. Additionally, the `--jvm-dir` flag was removed, so we move to specifying all cache locations using environment variables. Finally, fix `nailgun` server launching not propagating `env` vars to spawned servers.

Fixes #13140 and #13316.

[ci skip-build-wheels]
stuhood added a commit to pantsbuild/pants that referenced this pull request Oct 26, 2021
With #13336 fixed, we should be able to remove `RuleRunner` cache directory isolation.

Thanks a lot to @patricklaw for the upstream fix in coursier/coursier#2197, and @alexarchambault for his work getting out a new coursier release! 

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants