Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRESOLVER-7] download poms in parallel #178

Merged
merged 8 commits into from
Oct 26, 2022

Conversation

caiwei-ebay
Copy link
Contributor

@caiwei-ebay caiwei-ebay commented May 16, 2022

This PR is for parallel POM downloading, the idea is a bit different than https://github.com/ibabiankou/maven-resolver/pull/2/files:

  • only parallelize the VersionRange and ArtifactDesciptor resolution part (poms.xml & maven-metadata.xml downloading in parallel) as the poms downloading or maven-metadata.xml is the most slow part.
  • Pure dependency resolution logic and the skip logic are still run in sequence.

@caiwei-ebay caiwei-ebay marked this pull request as draft May 16, 2022 11:36
@caiwei-ebay
Copy link
Contributor Author

@michael-o @cstamas @ibabiankou

I think the code is already ready for review, I would like to run more tests before converting the PR from Draft to final.
Please help review and share your comments with this PR.

Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the PremanagedDepenency change into a separate PR. This will make it eaiser to review

@caiwei-ebay
Copy link
Contributor Author

@michael-o

Created separate PR for moving the PremanagedDependency class.
#179

@caiwei-ebay
Copy link
Contributor Author

Separated the PR for PremanagedDependency refactoring and rebased the code.

@michael-o
Copy link
Member

Much simpler to read now...

@michael-o michael-o self-requested a review May 17, 2022 17:20
@cstamas
Copy link
Member

cstamas commented Jun 6, 2022

Something is wrong here, unsure what. But, I tried to build maven master using maven-3.9.x + this PR + empty local repo and it fails:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.10.1:testCompile (default-testCompile) on project maven-model-builder: Compilation failure: Compilation failure: 
[ERROR] /home/cstamas/Worx/apache-maven/maven/maven-model-builder/src/test/java/org/apache/maven/model/interpolation/StringSearchModelInterpolatorTest.java:[36,27] package org.hamcrest does not exist
[ERROR] /home/cstamas/Worx/apache-maven/maven/maven-model-builder/src/test/java/org/apache/maven/model/interpolation/StringSearchModelInterpolatorTest.java:[36,1] static import only from classes and interfaces
[ERROR] /home/cstamas/Worx/apache-maven/maven/maven-model-builder/src/test/java/org/apache/maven/model/interpolation/StringSearchModelInterpolatorTest.java:[37,27] package org.hamcrest does not exist
[ERROR] /home/cstamas/Worx/apache-maven/maven/maven-model-builder/src/test/java/org/apache/maven/model/interpolation/StringSearchModelInterpolatorTest.java:[37,1] static import only from classes and interfaces
[ERROR] -> [Help 1]

The cmd used to test:

[cstamas@infinity maven (maven-3.9.x)]$ ~/bin/maven/apache-maven-3.9.0-SNAPSHOT/bin/mvn clean install -Dmaven.repo.local=local-repo -Drat.skip -DskipTests -Daether.collector.impl=bf

@caiwei-ebay
Copy link
Contributor Author

@cstamas
I can reproduce. Let me do some research.

@caiwei-ebay caiwei-ebay force-pushed the master branch 2 times, most recently from 295deb3 to 77a54f5 Compare June 15, 2022 10:34
@caiwei-ebay caiwei-ebay force-pushed the master branch 2 times, most recently from 7db6ae9 to 2614812 Compare June 17, 2022 12:23
@caiwei-ebay caiwei-ebay marked this pull request as ready for review June 23, 2022 09:29
@caiwei-ebay
Copy link
Contributor Author

caiwei-ebay commented Jun 23, 2022

@michael-o @cstamas

I have converted the PR from draft to ready. Please help review.

The basic idea of this PR is:

  • Given dependency A's children B, C, D
  • Iterate B, C, D one by one, take B as example below.
  • Should B be filtered? If yes, won't add to the queue. Logic here
  • If B should not be filtered, then figure out the managed version of B, ex B1, resolve B1's descriptor in a thread, then add B1 to the queue. Logic here

In such way, poms and metadata.xml will be downloaded in parallel.

The issue @cstamas raised has been fixed. The failure was caused incorrect filtering (should filter with original dependency, B instead of B1 as described in above flow)

Also runs the ITs with both DF and BF (code based on this PR) mode, same test result below:
2 tests that are not related with this PR failed in both modes. Not sure if this is as expected.

[ERROR] Failures: [ERROR] MavenITmng5576CdFriendlyVersions>AbstractMavenIntegrationTestCase.runTest:260->testContinuousDeliveryFriendlyVersionsAreWarningFreeWithBuildConsumer:98 [WARNING] [ERROR] MavenITmng5576CdFriendlyVersions>AbstractMavenIntegrationTestCase.runTest:260->testContinuousDeliveryFriendlyVersionsAreWarningFreeWithoutBuildConsumer:67 [WARNING] [INFO] [ERROR] Tests run: 912, Failures: 2, Errors: 0, Skipped: 0

Also dry-run 1000+ apps in our company by comparing the maven dependency tree with DF and BF mode (code based on this PR), no issues found so far. Test method like this:

  • mvn clean install -DskipTests -Daether.collector.impl=bf -Dmaven.repo.local=bf
  • mvn dependency:tree -DoutputFile=bf-tree.txt -Daether.collector.impl=bf -Dmaven.repo.local=bf
  • mvn clean install -DskipTests -Dmaven.repo.local=df
  • mvn dependency:tree -DoutputFile=df-tree.txt -Dmaven.repo.local=df
  • compare bf-tree.txt df-tree.txt

Please help review and feel free to let me know if anything I can help.

@caiwei-ebay
Copy link
Contributor Author

@michael-o @cstamas
Folks, please help review the PR and let me know if anything needs from me.
This PR can make poms/metadata.xml download much faster.

@cstamas
Copy link
Member

cstamas commented Jul 6, 2022

Will do around weekend, busy few days till then.

@michael-o
Copy link
Member

michael-o commented Jul 6, 2022

Will likely happen either this Friday or in 10 days. I think this should not be in Maven 3.9.0 just to avoid any breakings.

@@ -365,6 +477,64 @@ else if ( descriptorResult == DataPool.NO_DESCRIPTOR )
return descriptorResult;
}

static class ParallelDescriptorResolver
{
final ExecutorService executorService;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this resolver leverage the same executor service as for artifacts with aether.connector.basic.threads? At the end it does not matter what the connector transports, no?

See: org.eclipse.aether.connector.basic.BasicRepositoryConnector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-o
Thanks for the advice. Agreed we should reuse the executor service.

The BfDependencyCollector calls DefaultArtifactsResolver and the latter then calls (Basic)RepositoryConnector.

Do you mean we should make BfDependencyCollector calls RepositoryConnector's get(artifactDownloads, metadataDownloads) to download poms? or simply make the getExecutor method as public and then make BfDependencyCollector call the getExcutor to reuse the executor service. Please help clarify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the former, fully utilize get(List, List) with all the GetTask, PeekTask. Downloading POMs in parallel shouldn't be any different than before sequentially.

Copy link
Contributor Author

@caiwei-ebay caiwei-ebay Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. Busy working these days.

Uploaded a diagram to explain how the dependency resolution works.
here

In a Maven build:

  • DependencyCollector focus on resolve dependency graph by downloading and parsing metadata.xml and pom.xmls which is running in sequence as of now.

  • MojoExecutor calls ensureDependencyResolved which download jars in parallel based on the dependency graph calculated by DependencyCollector.

Using the former (fully unitize get <List, List>) in DependencyCollector seems not possible, I think we need to use a separate thread pool for dependency resolution in DependencyCollector which includes 2 steps:

  • Resolve version range (which resort to VersionRangeResolver -> MetadataResolver -> RepositoryConnector)
  • Resolve descriptor (download or pick local poms and then parse poms, which resort to ArtifactDescriptorReader -> ArtifactResolver -> RepositoryConnector)

By running the above 2 steps in a separate thread pool can achieve better parallelism.
Like the MetadataResolver, it is also using a separate ThreadPool to download metadata.xml one by one from multiple repositories.

May I know any concerns that we should not use another thread pool for dependency resolution here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-o @cstamas Please share your thoughts about using a separate pool here. Or please give some design suggestions on how to make the code to meet the merge criteria.

Copy link
Member

@cstamas cstamas Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the executor exists in connector, this is impl module, connector does not depend on impl nor impl depend on connector, see https://maven.apache.org/resolver/index.html

Now, not only that, but also it is quite possible that execution does not "even get to" connector (ie. all is present locally), so this would require major undertake, probably to move executor providing component to SPI (as util is not quite the place for this), then change both uses (so BF collector and connector), but in connector module we'd need to provide some "stub" implementation (as again, it does not depend on impl), and finally that would become one whole. But even writing this down is long, and seems totally unrelated to this PR.

Am currently totally not concerned about sharing executor, but more that executor is created per invocation of collect merhd...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a multiple module project, collect method will be called when building each module, and when run in parallel, collect method will be called by multiple threads. If we consider parallel build case, creates multiple executors seems more reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cstamas Any suggestions about whether we should use separate executors for the Maven parallel build?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'd follow very same pattern as in connector, it creates executor for each remote repos (not good, but not bad either). So creating executor per resolve invocation is also ok. Later, we could do something around this, but it requires larger scale changes, so leave it for now IMHO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed. Could refactor the executor part in a separate PR so all of the download related logic could share.
Another issue you've pointed out is fixed. Thanks for the review. Please let me know if anything else I can help.

@Eskibear
Copy link

Eskibear commented Aug 9, 2022

Any updates on this PR? Can we expect it to be merged soon?

I'm a contributor of JDT Language Server, and one of our core component m2e is depending on maven-resolver. We've been observing that downloading poms one by one takes a lot of time when importing Maven projects. To be honest, it's one of our performance bottleneck and we can hardly improve it on our own, in downstream... So it would be great if this improvement can be shipped in coming releases.

@caiwei-ebay
Copy link
Contributor Author

Any updates on this PR? Can we expect it to be merged soon?

I'm a contributor of JDT Language Server, and one of our core component m2e is depending on maven-resolver. We've been observing that downloading poms one by one takes a lot of time when importing Maven projects. To be honest, it's one of our performance bottleneck and we can hardly improve it on our own, in downstream... So it would be great if this improvement can be shipped in coming releases.

Friends, the PR is under review. Feel free to participate into the contribution. :)

@michael-o
Copy link
Member

Any updates on this PR? Can we expect it to be merged soon?

I'm a contributor of JDT Language Server, and one of our core component m2e is depending on maven-resolver. We've been observing that downloading poms one by one takes a lot of time when importing Maven projects. To be honest, it's one of our performance bottleneck and we can hardly improve it on our own, in downstream... So it would be great if this improvement can be shipped in coming releases.

Proper evaluation I'd mandatory, we already had severe regressions due to the previous approach.

@Eskibear
Copy link

Eskibear commented Sep 5, 2022

Proper evaluation I'd mandatory

I understand and appreciate all the work you guys have done. If I remember clearly, some integration tests and details were discussed in a private ASF slack channel which I had no access...

Just want to know if there's any ETA (already one month since last update)? Anyway, please let me know if there's anything I can help (if that would help to proceed this).

@michael-o
Copy link
Member

Proper evaluation I'd mandatory

I understand and appreciate all the work you guys have done. If I remember clearly, some integration tests and details were discussed in a private ASF slack channel which I had no access...

Just want to know if there's any ETA (already one month since last update)? Anyway, please let me know if there's anything I can help (if that would help to proceed this).

@cstamas we need to discuss this sometime this month.

@caiwei-ebay
Copy link
Contributor Author

Proper evaluation I'd mandatory

I understand and appreciate all the work you guys have done. If I remember clearly, some integration tests and details were discussed in a private ASF slack channel which I had no access...
Just want to know if there's any ETA (already one month since last update)? Anyway, please let me know if there's anything I can help (if that would help to proceed this).

@cstamas we need to discuss this sometime this month.

@michael-o @cstamas

May I know the latest status of discussion? Please never hesitate to let me know if anything I can help or the code I should revise.

@cstamas
Copy link
Member

cstamas commented Sep 28, 2022

@caiwei-ebay will get to it soon, sorry for delay....

@michael-o
Copy link
Member

For me, this won't happen before October.

@caiwei-ebay2
Copy link

@michael-o
How's everything? Got a chance to give more comments to this PR? And may I know what's the remaining steps to get this PR merged?

@cstamas
Copy link
Member

cstamas commented Oct 19, 2022

@michael-o can you please specify what changes are requested with your "change request" PR review? It is unclear to me. As for me, this PR is good to merge (and probably improve it later re shared executor, but is def out of scope for this PR)


private ExecutorService getExecutorService( RepositorySystemSession session )
{
int nThreads = ConfigUtils.getInteger( session, 5, "maven.artifact.threads" );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, but I will do this post merge: if you look at BasicRepositoryConnector, it uses CONFIG_PROP_THREADS and just as "secondary" property uses this "maven.artifact.threads" as a fallback. I will update this post merge, to make them behave siimilarly; have it's own prop but fall back to "maven.artifact.threas" as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 7f56672

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants