Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: Simpler classLoader caching (#3248) #3799

Merged
merged 1 commit into from
May 24, 2021

Conversation

Dominick1993
Copy link
Contributor

@Dominick1993 Dominick1993 commented May 19, 2021

Previous caching mechanism didn't work properly when multiple projects
were running detekt task in parallel and differed in classpath. Due to
the broken locking mechanism the classpath files would not match the
content in the classloader.

The new implementation uses ConcurrentHashMap which handles locking by
itself and allows us to store classloader for any combination of files.

@codecov
Copy link

codecov bot commented May 20, 2021

Codecov Report

Merging #3799 (d52f9ae) into main (df4d921) will not change coverage.
The diff coverage is n/a.

❗ Current head d52f9ae differs from pull request most recent head 881657f. Consider uploading reports for the commit 881657f to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##               main    #3799   +/-   ##
=========================================
  Coverage     83.48%   83.48%           
  Complexity     3104     3104           
=========================================
  Files           456      456           
  Lines          8937     8937           
  Branches       1751     1751           
=========================================
  Hits           7461     7461           
  Misses          558      558           
  Partials        918      918           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df4d921...881657f. Read the comment docs.

Copy link
Member

@chao2zhang chao2zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into the long-standing issue - I am still curious about how the existing implementation caused crash, because the existing implementation is inefficient in waiting for lock release but should still be correct - Return a URLClassloader for the given classpath files.

Copy link
Member

@BraisGabin BraisGabin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache all the URLClassLoader instead of just the last one? Aren't we creating a memory leak here?

@Dominick1993
Copy link
Contributor Author

Should we cache all the URLClassLoader instead of just the last one?

I think that it is safe to store multiples of them as it allows for greater re-usability.

Aren't we creating a memory leak here?

This is a point I am not quite sure about. And I think there are 2 points to consider:

  • Are the classloaders properly closed when all detekt tasks finish?
  • Should we limit the size of the (hash)map? Is it possible that someone might run it with hundreds of different classpaths so the size might become problematic?

Copy link
Member

@cortinico cortinico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great 🎉 Thanks for taking the time to fix this.
Also don't forget to add yourself to the contributors list in the README

}

return loaderAndClasspathFiles?.first ?: error("Cached or newly created detekt classloader expected.")
val hasSuitableClassLoader = classpathFilesWithLoaders.containsKey(classpathFiles)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm unsure that using a Set<File> as key for the ConcurrentHashMap is the best idea. Maybe we can use classpath.asPath that is a String instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was even thinking about using classpathFiles.hashCode. Or, if we don't like it we could compute the md5 or any other hash so we don't have an unbounded collection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah also. I'm also unsure if the order of the File in the FileCollection is relevant or we can ignore it for our specific case.

Copy link
Contributor Author

@Dominick1993 Dominick1993 May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the lookup of existing key should be file order agnostic, so we could use something like HashSet<File> constructed from classpath: FileCollection and use its .hashCode() as HashMap keys. HashSet<File>.hashCode() would trigger File.hashCode() on its content, but it should be fairly quick, since the File hash code is returned by the Filesystem (OS).

That's almost what it does in the current state as HashMap<Set<File>, _>, except for the order agnosticism (which could be done simply by HashMap<HashSet<File>, _>) and not storing the whole File objects as keys, which is really unnecessary in our case.

Regarding the MD5, I think that hash code should be a sufficient key, or am I missing something? The probability of hitting duplicates should be low enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the implementation in 21088b9 to match the one I proposed above. I got the same execution times as I did with the original one.

@Dominick1993
Copy link
Contributor Author

  • Are the classloaders properly closed when all detekt tasks finish?

Seems like once all classes loaded by the classloader and the classloader itself are no longer referenced, the classloader is closed and can be garbage collected. Therefore I assume this shoudn't be a problem, once the detekt task finishes. Please correct me if I am wrong.

  • Should we limit the size of the (hash)map? Is it possible that someone might run it with hundreds of different classpaths so the size might become problematic?

This probably shouldn't be an issue and if so OOM on Gradle JVM would be fairly easy to overcome / fix.

@Dominick1993
Copy link
Contributor Author

I've squashed the commits and rebased onto latest state of main branch.

@chao2zhang chao2zhang linked an issue May 21, 2021 that may be closed by this pull request
@chao2zhang chao2zhang added this to the 1.18.0 milestone May 21, 2021
Copy link
Member

@BraisGabin BraisGabin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the research!

Previous caching mechanism didn't work properly when multiple projects
were running detekt task in parallel and differed in classpath. Due to
the broken locking mechanism the classpath files would not match the
content in the classloader.

The new implementation uses ConcurrentHashMap which handles locking by
itself and allows us to store classloader for any combination of files.
@cortinico
Copy link
Member

Can we merge this? So we can collect some early feedbacks from the latest snapshot?

@cortinico cortinico added the notable changes Marker for notable changes in the changelog label May 24, 2021
@chao2zhang chao2zhang merged commit 6abfd10 into detekt:main May 24, 2021
@chao2zhang
Copy link
Member

I was actually waiting for all checks to pass before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notable changes Marker for notable changes in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detekt parallel crash
5 participants