-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRESOLVER-7] download poms in parallel #178
Conversation
@michael-o @cstamas @ibabiankou I think the code is already ready for review, I would like to run more tests before converting the PR from Draft to final. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the PremanagedDepenency
change into a separate PR. This will make it eaiser to review
Created separate PR for moving the PremanagedDependency class. |
Separated the PR for PremanagedDependency refactoring and rebased the code. |
Much simpler to read now... |
Something is wrong here, unsure what. But, I tried to build maven master using maven-3.9.x + this PR + empty local repo and it fails:
The cmd used to test:
|
@cstamas |
295deb3
to
77a54f5
Compare
7db6ae9
to
2614812
Compare
I have converted the PR from draft to ready. Please help review. The basic idea of this PR is:
In such way, poms and metadata.xml will be downloaded in parallel. The issue @cstamas raised has been fixed. The failure was caused incorrect filtering (should filter with original dependency, B instead of B1 as described in above flow) Also runs the ITs with both DF and BF (code based on this PR) mode, same test result below:
Also dry-run 1000+ apps in our company by comparing the maven dependency tree with DF and BF mode (code based on this PR), no issues found so far. Test method like this:
Please help review and feel free to let me know if anything I can help. |
@michael-o @cstamas |
Will do around weekend, busy few days till then. |
Will likely happen either this Friday or in 10 days. I think this should not be in Maven 3.9.0 just to avoid any breakings. |
...er-impl/src/main/java/org/eclipse/aether/internal/impl/collect/bf/BfDependencyCollector.java
Outdated
Show resolved
Hide resolved
@@ -365,6 +477,64 @@ else if ( descriptorResult == DataPool.NO_DESCRIPTOR ) | |||
return descriptorResult; | |||
} | |||
|
|||
static class ParallelDescriptorResolver | |||
{ | |||
final ExecutorService executorService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this resolver leverage the same executor service as for artifacts with aether.connector.basic.threads
? At the end it does not matter what the connector transports, no?
See: org.eclipse.aether.connector.basic.BasicRepositoryConnector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michael-o
Thanks for the advice. Agreed we should reuse the executor service.
The BfDependencyCollector calls DefaultArtifactsResolver and the latter then calls (Basic)RepositoryConnector.
Do you mean we should make BfDependencyCollector calls RepositoryConnector's get(artifactDownloads, metadataDownloads) to download poms? or simply make the getExecutor method as public and then make BfDependencyCollector call the getExcutor to reuse the executor service. Please help clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the former, fully utilize get(List, List)
with all the GetTask
, PeekTask
. Downloading POMs in parallel shouldn't be any different than before sequentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. Busy working these days.
Uploaded a diagram to explain how the dependency resolution works.
In a Maven build:
-
DependencyCollector focus on resolve dependency graph by downloading and parsing metadata.xml and pom.xmls which is running in sequence as of now.
-
MojoExecutor calls ensureDependencyResolved which download jars in parallel based on the dependency graph calculated by DependencyCollector.
Using the former (fully unitize get <List, List>) in DependencyCollector seems not possible, I think we need to use a separate thread pool for dependency resolution in DependencyCollector which includes 2 steps:
- Resolve version range (which resort to VersionRangeResolver -> MetadataResolver -> RepositoryConnector)
- Resolve descriptor (download or pick local poms and then parse poms, which resort to ArtifactDescriptorReader -> ArtifactResolver -> RepositoryConnector)
By running the above 2 steps in a separate thread pool can achieve better parallelism.
Like the MetadataResolver, it is also using a separate ThreadPool to download metadata.xml one by one from multiple repositories.
May I know any concerns that we should not use another thread pool for dependency resolution here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michael-o @cstamas Please share your thoughts about using a separate pool here. Or please give some design suggestions on how to make the code to meet the merge criteria.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the executor exists in connector, this is impl module, connector does not depend on impl nor impl depend on connector, see https://maven.apache.org/resolver/index.html
Now, not only that, but also it is quite possible that execution does not "even get to" connector (ie. all is present locally), so this would require major undertake, probably to move executor providing component to SPI (as util is not quite the place for this), then change both uses (so BF collector and connector), but in connector module we'd need to provide some "stub" implementation (as again, it does not depend on impl), and finally that would become one whole. But even writing this down is long, and seems totally unrelated to this PR.
Am currently totally not concerned about sharing executor, but more that executor is created per invocation of collect merhd...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a multiple module project, collect method will be called when building each module, and when run in parallel, collect method will be called by multiple threads. If we consider parallel build case, creates multiple executors seems more reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cstamas Any suggestions about whether we should use separate executors for the Maven parallel build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I'd follow very same pattern as in connector, it creates executor for each remote repos (not good, but not bad either). So creating executor per resolve invocation is also ok. Later, we could do something around this, but it requires larger scale changes, so leave it for now IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agreed. Could refactor the executor part in a separate PR so all of the download related logic could share.
Another issue you've pointed out is fixed. Thanks for the review. Please let me know if anything else I can help.
Any updates on this PR? Can we expect it to be merged soon? I'm a contributor of JDT Language Server, and one of our core component m2e is depending on maven-resolver. We've been observing that downloading poms one by one takes a lot of time when importing Maven projects. To be honest, it's one of our performance bottleneck and we can hardly improve it on our own, in downstream... So it would be great if this improvement can be shipped in coming releases. |
Friends, the PR is under review. Feel free to participate into the contribution. :) |
Proper evaluation I'd mandatory, we already had severe regressions due to the previous approach. |
I understand and appreciate all the work you guys have done. If I remember clearly, some integration tests and details were discussed in a private ASF slack channel which I had no access... Just want to know if there's any ETA (already one month since last update)? Anyway, please let me know if there's anything I can help (if that would help to proceed this). |
@cstamas we need to discuss this sometime this month. |
May I know the latest status of discussion? Please never hesitate to let me know if anything I can help or the code I should revise. |
@caiwei-ebay will get to it soon, sorry for delay.... |
For me, this won't happen before October. |
...er-impl/src/main/java/org/eclipse/aether/internal/impl/collect/bf/BfDependencyCollector.java
Outdated
Show resolved
Hide resolved
@michael-o |
@michael-o can you please specify what changes are requested with your "change request" PR review? It is unclear to me. As for me, this PR is good to merge (and probably improve it later re shared executor, but is def out of scope for this PR) |
|
||
private ExecutorService getExecutorService( RepositorySystemSession session ) | ||
{ | ||
int nThreads = ConfigUtils.getInteger( session, 5, "maven.artifact.threads" ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, but I will do this post merge: if you look at BasicRepositoryConnector, it uses CONFIG_PROP_THREADS and just as "secondary" property uses this "maven.artifact.threads" as a fallback. I will update this post merge, to make them behave siimilarly; have it's own prop but fall back to "maven.artifact.threas" as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 7f56672
This PR is for parallel POM downloading, the idea is a bit different than https://github.com/ibabiankou/maven-resolver/pull/2/files: