[MNG-7235] Speed improvements when calculating the sorted project graph #530

Closed
wants to merge 2 commits into from

Contributor

@gnodet gnodet commented Sep 9, 2021

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Make sure there is a JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MNG-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace MNG-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

If your pull request is about ~20 lines of code, you don't need to sign an
Individual Contributor License Agreement. If you are unsure,
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@michael-o
Member

I will try to review and understand your change this week.

@gnodet
Contributor Author

gnodet commented Sep 14, 2021

> I will try to review and understand your change this week.

@michael-o Let me give you a few details.

I ran a timing analysis on a fairly large project (about 1300 modules) with a mvnd help:evaluate -Dexpression=project.version run. mvnd computes the whole project graph at the beginning.
This leads to a call to getDownstreamProjects() for each project in the build.

Currently, each call iterates over all projects and evaluates projectIds.contains( ProjectSorter.getId( mavenProject ) ) for every one of them. The new version adds two caches: the position of each project in the sorted order, so that the downstream projects can be sorted without iterating over the whole list of projects, and a mapping of ProjectSorter.getId(x) -> x, so that the ids are not recomputed. In addition, the loop now retrieves the downstream projects by lookup and then sorts them, instead of iterating over the whole list of sorted projects and collecting the matching ones.
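To illustrate the idea, here is a minimal, self-contained sketch of the cached lookup. It is not the code from this pull request: the class name DownstreamLookupSketch, the idOf() helper (standing in for ProjectSorter.getId()), and the use of plain strings as projects are all assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the project graph; projects are plain strings here. */
class DownstreamLookupSketch {
    // Cache 1: id -> position in the sorted build order.
    private final Map<String, Integer> order = new HashMap<>();
    // Cache 2: id -> project, so ids are computed once per project.
    private final Map<String, String> projectsById = new HashMap<>();
    // Direct downstream edges, keyed by project id (assumed to be given).
    private final Map<String, Set<String>> downstreamIds;

    DownstreamLookupSketch(List<String> sortedProjects, Map<String, Set<String>> downstreamIds) {
        this.downstreamIds = downstreamIds;
        for (int i = 0; i < sortedProjects.size(); i++) {
            String project = sortedProjects.get(i);
            String id = idOf(project); // the only place ids are computed for the caches
            order.put(id, i);
            projectsById.put(id, project);
        }
    }

    /** Hypothetical id function, playing the role of ProjectSorter.getId(). */
    static String idOf(String project) {
        return "g:" + project;
    }

    /** Looks up the downstream ids, sorts them by cached position, and maps them to projects. */
    List<String> getDownstreamProjects(String project) {
        // Assumes every downstream id is present in both caches.
        List<String> ids = new ArrayList<>(
                downstreamIds.getOrDefault(idOf(project), Collections.emptySet()));
        ids.sort((a, b) -> Integer.compare(order.get(a), order.get(b)));
        List<String> result = new ArrayList<>(ids.size());
        for (String id : ids) {
            result.add(projectsById.get(id));
        }
        return result;
    }
}
```

Each call touches only the downstream ids of the requested project, so its cost depends on the number of downstream projects rather than on the total number of projects.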

So, if N is the number of projects, this brings the number of iterations down from N * N to a * N, where a is the mean number of downstream projects per project. And a is usually very small (especially in my case, where we only get the first level of dependencies between modules and not the transitive ones). In addition, the number of getId() calls is down to N; those calls were the critical spot.
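As a rough worked example, take N = 1300 (the approximate module count mentioned above) and assume, purely for illustration, a = 3:

```latex
N \cdot N = 1300^2 = 1\,690\,000
\qquad\text{versus}\qquad
a \cdot N = 3 \cdot 1300 = 3\,900
```

Under that assumed value of a, the iteration count shrinks by a factor of N / a, roughly 430.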

Hope this helps.

@michael-o
Member

Merged.

@michael-o michael-o closed this Sep 27, 2021