Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle thread interrupts on transitive dependencies parsing (infinite loop) #236

Closed
wants to merge 1 commit into from

Conversation

vfedoriv
Copy link

Hello.
In the scope of our project (https://central.sonatype.dev - new (beta) version of https://search.maven.org) we do parsing pom-s for transitive dependencies, using Maven Resolver.
It works fine in most cases. But recently, during the upload musquette artifact
https://repo1.maven.org/maven2/org/webjars/npm/musquette/1.1.1/
we see 100% CPU load on our service. It appears that the resolver is stuck in process of resolving transitive dependencies
The reason: this component (musquette) has dependencies that in the dependencies resolving process lead to being stuck in an infinite loop.
Here profiler graph
Screenshot 2023-01-27 at 14 08 48
We run each pom file parsing in a separate thread, but have no way to interrupt parsing when it's stuck in resolver methods.
So the goal is to interrupt dependencies parsing after the thread with the parser receives an interrupt event.

@cstamas
Copy link
Member

cstamas commented Jan 27, 2023

Could you elaborate a bit more the "musquette has dependencies that in the dependencies resolving process lead to being stuck in an infinite loop"?

@cstamas
Copy link
Member

cstamas commented Jan 27, 2023

Switch to BF collector and you will get an error:

Could not resolve version conflict among 
[
 org.webjars.npm:musquette:jar:1.1.1 -> 
   org.webjars.npm:minimist:jar:[0.2.1,0.3), 
 org.webjars.npm:musquette:jar:1.1.1 -> 
   org.webjars.npm:mqtt:jar:[3.0.0,4) -> 
     org.webjars.npm:commist:jar:[1.0.0,2) -> 
       org.webjars.npm:minimist:jar:[1.1.0,2), 
 org.webjars.npm:musquette:jar:1.1.1 -> 
   org.webjars.npm:mqtt:jar:[3.0.0,4) -> 
     org.webjars.npm:minimist:jar:[1.2.0,2)
]

Basically, the tree is not resolvable. Still, the endless loop does not sound right and is easily reproducible, just run demo snippet org.apache.maven.resolver.examples.GetDependencyTree and change artifact ctor to have string "org.webjars.npm:musquette:1.1.1" as input...

@cstamas
Copy link
Member

cstamas commented Jan 27, 2023

To switch to BF (breadth first) collector set config property like this:

session.setConfigProperty("aether.dependencyCollector.impl", BfDependencyCollector.NAME );

See:
https://issues.apache.org/jira/browse/MRESOLVER-247

@cstamas
Copy link
Member

cstamas commented Jan 27, 2023

Created bug https://issues.apache.org/jira/browse/MRESOLVER-316

This PR is -1 for me, as this would treat the symptom (stopping endless loop), while IMHO the actual cause should be addressed, and there is a workaround by using BF collector.

@vfedoriv
Copy link
Author

Well, it's not really a "pure" maven artifact based on a java package.
It's a mutant, created from npm package https://www.npmjs.com/package/musquette using https://www.webjars.org/
So for dependencies, it will use similar "mutant" dependencies on other webjar packages. And what worse - it using version ranges.
Not all of them really exist in repo (Could not find artifact org.webjars.npm:es6-set:pom:0.1.5 in central). In process of dependencies resolving ~1500 files will be downloaded (but after some time file list stop growing and stay fixed), resolver starts to dance in some big circle trying to resolve dependencies. It can take a few hours w/o visible results (then we give up and kill pod).
UPD. Tried snippet - the process never ends, CPU load 100%

Will try the solution you recommend and get back.

@cstamas
Copy link
Member

cstamas commented Jan 27, 2023

The snippet changes I have locally: switch to BF collector and using musquette artifact:

diff --git a/maven-resolver-demos/maven-resolver-demo-snippets/src/main/java/org/apache/maven/resolver/examples/GetDependencyTree.java b/maven-resolver-demos/maven-resolver-demo-snippets/src/main/java/org/apache/maven/resolver/examples/GetDependencyTree.java
index 78af7839..050c96b8 100644
--- a/maven-resolver-demos/maven-resolver-demo-snippets/src/main/java/org/apache/maven/resolver/examples/GetDependencyTree.java
+++ b/maven-resolver-demos/maven-resolver-demo-snippets/src/main/java/org/apache/maven/resolver/examples/GetDependencyTree.java
@@ -21,6 +21,7 @@ package org.apache.maven.resolver.examples;
 
 import org.apache.maven.resolver.examples.util.Booter;
 import org.apache.maven.resolver.examples.util.ConsoleDependencyGraphDumper;
+import org.eclipse.aether.DefaultRepositorySystemSession;
 import org.eclipse.aether.RepositorySystem;
 import org.eclipse.aether.RepositorySystemSession;
 import org.eclipse.aether.artifact.Artifact;
@@ -28,6 +29,7 @@ import org.eclipse.aether.artifact.DefaultArtifact;
 import org.eclipse.aether.collection.CollectRequest;
 import org.eclipse.aether.collection.CollectResult;
 import org.eclipse.aether.graph.Dependency;
+import org.eclipse.aether.internal.impl.collect.bf.BfDependencyCollector;
 
 /**
  * Collects the transitive dependencies of an artifact.
@@ -48,9 +50,10 @@ public class GetDependencyTree
 
         RepositorySystem system = Booter.newRepositorySystem( Booter.selectFactory( args ) );
 
-        RepositorySystemSession session = Booter.newRepositorySystemSession( system );
+        DefaultRepositorySystemSession session = Booter.newRepositorySystemSession( system );
+        session.setConfigProperty("aether.dependencyCollector.impl", BfDependencyCollector.NAME );
 
-        Artifact artifact = new DefaultArtifact( "org.apache.maven:maven-resolver-provider:3.6.1" );
+        Artifact artifact = new DefaultArtifact( "org.webjars.npm:musquette:1.1.1" );
 
         CollectRequest collectRequest = new CollectRequest();
         collectRequest.setRoot( new Dependency( artifact, "" ) );

@cstamas
Copy link
Member

cstamas commented Jan 27, 2023

Few perf related tips (and then am out for today):

  • Use latest 1.9.4 release of resolver
  • We could not measure perf diff on DF (original) and BF (new) collector, but user reports says BF is faster on big trees (we did not found big enough OSS projects 😄 ) BUT they should behave same despite different strategy
  • Use native http transport (instead of Wagon), that cuts HTTP request count in half (for artifact download) as it extracts checksums from response headers, instead to make another request to get checksum files
  • If you are on Java 11 or newer, try out HTTP/2 transports as well, they are even faster than native (Central supports HTTP/2) [MRESOLVER-308] HTTP transport showdown. #231

@vfedoriv
Copy link
Author

@cstamas Thanks a lot for your help and fast feedback.

Copy link
Contributor

@gnodet gnodet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same should be added to the BF collector.

@cstamas
Copy link
Member

cstamas commented Feb 7, 2023

I agree with @gnodet this should be handled, and not only in DF but also BF, but there is a catch: the "endless loop" did not happen where this PR changes code but in visitor (as I debugged, visitor goes into endless loop, and there is no IO happening, nor anything else)

@vfedoriv
Copy link
Author

@cstamas So where/what changes do you suggest?
Maybe better if I close this pr and you will do the necessary changes yourself, because you are definitely more familiar with this code.

@cstamas
Copy link
Member

cstamas commented Feb 14, 2023

@vfedoriv yes please, close this one, and created issue https://issues.apache.org/jira/browse/MRESOLVER-321

Just to explain:

  • MRESOLVER-316 is about endless loop: it is a bug
  • MRESOLVER-321 is about "ability to abort resolver" (ie. by interrupting it's thread)

IMO, MRESOLVER-316 even if fixed, will not make possible aborting as MRESOLVER-321 asks for, so aborting will need some spots for this spread in multiple places.

@vfedoriv vfedoriv closed this Feb 14, 2023
@cstamas
Copy link
Member

cstamas commented Nov 27, 2023

Superseded by #380

Now both collectors are "interrupt aware" and will handle same way interruption (see UT).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants