
GlobalClassLoaderCache: move call to getFiles out of synchronized block #3370

Merged
1 commit merged into detekt:master on Jan 14, 2021

Conversation

damianw (Contributor) commented on Jan 12, 2021

Fixes #3047

This should fix the deadlock issue from #3047 by moving the call to FileCollection.getFiles out of the synchronized block in GlobalClassLoaderCache. I am having a harder time reproducing the issue these days, but we are seeing Detekt tasks hanging in our CI jobs ~1% of the time, which I suspect is caused by #3047. Hopefully we see those disappear after this change.
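For context, the shape of the change is roughly the following. This is only a minimal sketch, assuming a getOrCreate(classpath) entry point and a single cached loader; the actual ClassLoaderCache.kt in detekt differs in its details.

```kotlin
import org.gradle.api.file.FileCollection
import java.io.File
import java.net.URLClassLoader

// Simplified stand-in for detekt's GlobalClassLoaderCache; the cache key,
// single-entry policy, and parent classloader below are illustrative assumptions.
class SimplifiedClassLoaderCache {

    private var cached: Pair<Set<File>, URLClassLoader>? = null

    fun getOrCreate(classpath: FileCollection): URLClassLoader {
        // Resolve the FileCollection *before* taking the lock: resolution can
        // require Gradle's own worker/resource locks, and doing it while
        // holding this monitor is the suspected deadlock in #3047.
        val classpathFiles = classpath.files
        return synchronized(this) {
            val current = cached
            if (current != null && current.first == classpathFiles) {
                current.second
            } else {
                URLClassLoader(
                    classpathFiles.map { it.toURI().toURL() }.toTypedArray(),
                    null // keep detekt isolated from the build's own classpath
                ).also { cached = classpathFiles to it }
            }
        }
    }
}
```

The only important part here is that classpath.files runs before synchronized(this), so Gradle can take whatever internal locks it needs without this cache's monitor being held.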

I added an (admittedly contrived) test to verify that the cache doesn't call FileCollection.getFiles while synchronized on itself, which failed prior to this change but now passes. I also verified that I was able to run Detekt without issue on my own project.
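A rough sketch of what such a "no getFiles under the lock" check can look like, assuming a MockK-style stub and reusing the SimplifiedClassLoaderCache sketch above; the test actually added in this PR may be structured differently.

```kotlin
import io.mockk.every
import io.mockk.mockk
import java.io.File
import org.gradle.api.file.FileCollection

fun verifyGetFilesNotCalledUnderLock() {
    val cache = SimplifiedClassLoaderCache()
    val classpath = mockk<FileCollection>()
    every { classpath.files } answers {
        // Thread.holdsLock reports whether this thread currently owns the
        // cache's monitor; resolving files there is exactly what is banned.
        check(!Thread.holdsLock(cache)) {
            "FileCollection.getFiles must not be called inside synchronized(cache)"
        }
        emptySet<File>()
    }
    cache.getOrCreate(classpath) // throws if getFiles runs while the lock is held
}
```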

codecov bot commented on Jan 12, 2021

Codecov Report

Merging #3370 (21740c5) into master (c87a0ac) will increase coverage by 0.13%.
The diff coverage is 88.88%.


@@             Coverage Diff              @@
##             master    #3370      +/-   ##
============================================
+ Coverage     80.31%   80.44%   +0.13%     
  Complexity     2721     2721              
============================================
  Files           445      445              
  Lines          8248     8250       +2     
  Branches       1566     1566              
============================================
+ Hits           6624     6637      +13     
+ Misses          785      772      -13     
- Partials        839      841       +2     
Impacted Files | Coverage Δ | Complexity Δ
...lab/arturbosch/detekt/internal/ClassLoaderCache.kt | 90.47% <88.88%> (+58.89%) | 0.00 <0.00> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c87a0ac...ef8c0a0.

cortinico (Member) commented:
I guess @arturbosch is the best candidate to review this.

I just have one question:

by moving the call to FileCollection.getFiles out of the synchronized block in GlobalClassLoaderCache

What makes you think that .getFiles is the culprit?

damianw (Contributor, Author) commented on Jan 13, 2021

Thanks for the quick reply!

What makes you think that .getFiles is the culprit?

This is based on the thread dump that I provided in #3047:

A thread dump for when I have observed this occurring can be found here:

arturbosch (Member) left a comment:

Hey, thanks for investigating this hard issue.
The code looks good to me.
Yes, the other detekt invocations wait for the first one, but I still don't understand why getFiles blocks/waits, and what blocked it.
DefaultResourceLockCoordinationService.withStateLock and DefaultWorkerLeaseService.acquireLocks really look suspicious.
I'm okay with merging this and seeing in an RC release whether it helps and/or doesn't introduce other issues.

arturbosch merged commit 77a2d7f into detekt:master on Jan 14, 2021
arturbosch added this to the 1.16.0 milestone on Jan 14, 2021
This was referenced on Mar 11, 2021
chao2zhang (Member) commented:
@damianw Did this PR help with your CI reliability? An incoming PR, #3799, may affect this behavior again.

sschuberth (Contributor) commented:
Actually, I don't think #3799 has an impact on this PR: if I understand correctly, the main point of this PR was to not do val classpathFiles = classpath.files inside a synchronized block, and that is still the case with #3799. Also, if #3799 had a negative impact on the changes in this PR, I'd expect the test from this PR to break.

damianw (Contributor, Author) commented on May 20, 2021

I haven't seen this issue recently, so as far as I am aware it seems to have been fixed (although its intermittent nature doesn't rule out other confounding variables). 😅

And yes, that's my conclusion as well @sschuberth.

Successfully merging this pull request may close these issues:
Running multiple Detekt tasks concurrently may cause deadlock