Skip to content

plugin: Simpler classLoader caching (#3248)#3799

Merged
picklebento merged 1 commit intodetekt:mainfrom
Dominick1993:fix_parallel_run
May 24, 2021
Merged

plugin: Simpler classLoader caching (#3248)#3799
picklebento merged 1 commit intodetekt:mainfrom
Dominick1993:fix_parallel_run

Conversation

@Dominick1993
Copy link
Copy Markdown
Contributor

@Dominick1993 Dominick1993 commented May 19, 2021

Previous caching mechanism didn't work properly when multiple projects
were running detekt task in parallel and differed in classpath. Due to
the broken locking mechanism the classpath files would not match the
content in the classloader.

The new implementation uses ConcurrentHashMap which handles locking by
itself and allows us to store classloader for any combination of files.

@codecov
Copy link
Copy Markdown

codecov bot commented May 20, 2021

Codecov Report

Merging #3799 (d52f9ae) into main (df4d921) will not change coverage.
The diff coverage is n/a.

❗ Current head d52f9ae differs from pull request most recent head 881657f. Consider uploading reports for the commit 881657f to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##               main    #3799   +/-   ##
=========================================
  Coverage     83.48%   83.48%           
  Complexity     3104     3104           
=========================================
  Files           456      456           
  Lines          8937     8937           
  Branches       1751     1751           
=========================================
  Hits           7461     7461           
  Misses          558      558           
  Partials        918      918           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df4d921...881657f. Read the comment docs.

Copy link
Copy Markdown
Member

@picklebento picklebento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into the long-standing issue - I am still curious about how the existing implementation caused crash, because the existing implementation is inefficient in waiting for lock release but should still be correct - Return a URLClassloader for the given classpath files.

Copy link
Copy Markdown
Member

@BraisGabin BraisGabin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache all the URLClassLoader instead of just the last one? Aren't we creating a memory leak here?

@Dominick1993
Copy link
Copy Markdown
Contributor Author

Should we cache all the URLClassLoader instead of just the last one?

I think that it is safe to store multiples of them as it allows for greater re-usability.

Aren't we creating a memory leak here?

This is a point I am not quite sure about. And I think there are 2 points to consider:

  • Are the classloaders properly closed when all detekt tasks finish?
  • Should we limit the size of the (hash)map? Is it possible that someone might run it with hundreds of different classpaths so the size might become problematic?

Copy link
Copy Markdown
Member

@cortinico cortinico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great 🎉 Thanks for taking the time to fix this.
Also don't forget to add yourself to the contributors list in the README

}

return loaderAndClasspathFiles?.first ?: error("Cached or newly created detekt classloader expected.")
val hasSuitableClassLoader = classpathFilesWithLoaders.containsKey(classpathFiles)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm unsure that using a Set<File> as key for the ConcurrentHashMap is the best idea. Maybe we can use classpath.asPath that is a String instead?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was even thinking about using classpathFiles.hashCode. Or, if we don't like it we could compute the md5 or any other hash so we don't have an unbounded collection.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah also. I'm also unsure if the order of the File in the FileCollection is relevant or we can ignore it for our specific case.

Copy link
Copy Markdown
Contributor Author

@Dominick1993 Dominick1993 May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the lookup of existing key should be file order agnostic, so we could use something like HashSet<File> constructed from classpath: FileCollection and use its .hashCode() as HashMap keys. HashSet<File>.hashCode() would trigger File.hashCode() on its content, but it should be fairly quick, since the File hash code is returned by the Filesystem (OS).

That's almost what it does in the current state as HashMap<Set<File>, _>, except for the order agnosticism (which could be done simply by HashMap<HashSet<File>, _>) and not storing the whole File objects as keys, which is really unnecessary in our case.

Regarding the MD5, I think that hash code should be a sufficient key, or am I missing something? The probability of hitting duplicates should be low enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the implementation in 21088b9 to match the one I proposed above. I got the same execution times as I did with the original one.

@Dominick1993
Copy link
Copy Markdown
Contributor Author

  • Are the classloaders properly closed when all detekt tasks finish?

Seems like once all classes loaded by the classloader and the classloader itself are no longer referenced, the classloader is closed and can be garbage collected. Therefore I assume this shoudn't be a problem, once the detekt task finishes. Please correct me if I am wrong.

  • Should we limit the size of the (hash)map? Is it possible that someone might run it with hundreds of different classpaths so the size might become problematic?

This probably shouldn't be an issue and if so OOM on Gradle JVM would be fairly easy to overcome / fix.

@Dominick1993
Copy link
Copy Markdown
Contributor Author

I've squashed the commits and rebased onto latest state of main branch.

@picklebento picklebento linked an issue May 21, 2021 that may be closed by this pull request
@picklebento picklebento added this to the 1.18.0 milestone May 21, 2021
Copy link
Copy Markdown
Member

@BraisGabin BraisGabin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the research!

Previous caching mechanism didn't work properly when multiple projects
were running detekt task in parallel and differed in classpath. Due to
the broken locking mechanism the classpath files would not match the
content in the classloader.

The new implementation uses ConcurrentHashMap which handles locking by
itself and allows us to store classloader for any combination of files.
@cortinico
Copy link
Copy Markdown
Member

Can we merge this? So we can collect some early feedbacks from the latest snapshot?

@cortinico cortinico added the notable changes Marker for notable changes in the changelog label May 24, 2021
@picklebento picklebento merged commit 6abfd10 into detekt:main May 24, 2021
@picklebento
Copy link
Copy Markdown
Member

I was actually waiting for all checks to pass before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

notable changes Marker for notable changes in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Detekt parallel crash

5 participants