Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up project discovery #242

Closed
wants to merge 4 commits into from

Conversation

@oehme
Copy link
Contributor

commented Apr 9, 2019

I profiled the project discovery phase of a large (2000 module) Maven build today, because it was taking several minutes before it actually started executing goals. I found some low-level hotspots that this PR fixes. It shaves off about 2.5min of the startup time of this build. See the individual commit messages for more details. There are still other hotspots left, but they are in the plexus-interpolation project, for which I'll open a separate PR.

  • Make sure there is a JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MNG-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace MNG-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@oehme oehme force-pushed the oehme:master branch from f0a0d4e to 0d1ad23 Apr 9, 2019
@michael-o

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

I will have a look at this tonight.

@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

Note that 3 core-it tests were failing for me, but they also fail with Maven 3.6.0, so they are probably environment dependent. Are there any specific requirements for these tests? I'd really like to see a fully green build :)

Copy link

left a comment

very awesome.
Do you have any number to share ?

@oehme oehme force-pushed the oehme:master branch from 0d1ad23 to a48f6e4 Apr 10, 2019
@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

Project discovery on the project took ~5min before, now down to ~2.5min. Still a long way to go, but a good improvement :)

@oehme oehme force-pushed the oehme:master branch from ef9baf0 to 571da0e Apr 10, 2019
@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

The last commit seems to be problematic, some core-its around ArtifactHandler are failing. I'm not yet sure I understand why - Can the artifact handler for a given type change over time? I thought the idea is to only add additional handlers for new types?

Nevermind, I got it - We can't remember the DefaultArtifactHandler created for unknown types, as that might later be replaced with a better handler by an extension.

All integration tests are passing now.

@oehme oehme force-pushed the oehme:master branch from 571da0e to 073f08c Apr 10, 2019
Copy link
Member

left a comment

Please create seperate JIRA tickets for each change. Did you also try change by change and saw how it affects the build time? I'd like to see broken down numbers for your case.

if ( tok.hasMoreTokens() )
{
qualifier = tok.nextToken();
fallback = Pattern.compile( "\\d+" ).matcher( qualifier ).matches();

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

Why not use the char array approach as before?

This comment has been minimized.

Copy link
@oehme

oehme Apr 10, 2019

Author Contributor

Because it wasn't a hotspot in my tests. I can do it for consistency, but I can't prove that it is a problem :)

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 11, 2019

Member

Why not, if you know that this works better that with regex.

This comment has been minimized.

Copy link
@oehme

oehme Apr 11, 2019

Author Contributor

I generally only change things if I can measure they are a problem. Otherwise one can quickly get into a situation where things get worse or the code gets less readable without a benefit.

This comment has been minimized.

Copy link
@oehme

oehme Apr 11, 2019

Author Contributor

I've adjusted it, as in this case it also improves readability.

return null;
}
long longValue = Long.parseLong( s );
if ( longValue > Integer.MAX_VALUE )

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

Again, change in semantics with long. Dislike on this one. Why all the hassle, why not use NumberUtils from Commons Lang 3?

This comment has been minimized.

Copy link
@oehme

oehme Apr 10, 2019

Author Contributor

Because it throws an exception internally, which is what we're trying to avoid in the first place here. We don't want to throw exceptions on common code paths.

Also, we're not changing semantics here, we are still returning an int. Going through long is only done for the very common case of big numbers that would fail with a NumberFormatException on Integer.parseInt

This comment has been minimized.

Copy link
@vlsi

vlsi Apr 24, 2019

+1 to @oehme, however it is probably worth adding a comment that explains this (the reason behind tryParseInt method and the reason behind parseLong).

@Override
public int hashCode()
{
return value;

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

Why not use Objects#hashCode() on this?

@@ -211,6 +235,30 @@ public int compareTo( Item item )
}
}

@Override
public boolean equals( Object o )

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

Why not use Objects#equals() on this?

}

@Override
public int hashCode()

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

Why not use Objects#hashCode() on this?

This comment has been minimized.

Copy link
@oehme

oehme Apr 11, 2019

Author Contributor

What for? It's a single object, not a list.

@@ -272,6 +320,29 @@ public int compareTo( Item item )
}

@Override
public boolean equals( Object o )

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

same here...

@@ -384,6 +455,29 @@ public int compareTo( Item item )
}

@Override
public boolean equals( Object o )

This comment has been minimized.

Copy link
@michael-o

michael-o Apr 10, 2019

Member

and here...

@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

Please create seperate JIRA tickets for each change.

Why is this overhead necessary? I wouldn't know what to write in the ticket besides "look at the commit message" ;)

Did you also try change by change and saw how it affects the build time? I'd like to see broken down numbers for your case.

I did, each change removes a similar chunk of the build time. They are all very much worth it. I can send you the original and after profiler snapshot if you'd like.

@michael-o

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

Please create seperate JIRA tickets for each change.

Why is this overhead necessary? I wouldn't know what to write in the ticket besides "look at the commit message" ;)

This isn't overhead. Every change needs to be documented as a ticket. They go into the release notes. We don't expect users to read the log to understand what has been changed.
It is enough to write a sentence in the ticket which justifies the change.

@oehme oehme force-pushed the oehme:master branch 3 times, most recently from 186586c to f4b6e48 Apr 11, 2019
Use a simple list of allowed characters instead of a regex.
@oehme oehme force-pushed the oehme:master branch 2 times, most recently from dca4f99 to 2b34a4e Apr 11, 2019
oehme added 3 commits Apr 9, 2019
By not allocating the canonical representation for equals/hashcode,
but instead using the items we already have. This saves both time
and memory.

I left the canonical field around for testing purposes.
Use if-statements instead of exception-based control flow.
Throwing exceptions is very expensive and should not be used
for normal flow.
Otherwise we have to go through the whole sisu engine again,
which is very slow, because it does a linear scan.
@oehme oehme force-pushed the oehme:master branch from 2b34a4e to 6fc5267 Apr 11, 2019
@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 11, 2019

I've created the issues and addressed your comments above, please have another look.

@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 11, 2019

Here are two profiles, with the optimized methods colored in pink and their total contribution to CPU time shown at the bottom:

Before:
Screenshot 2019-04-11 at 12 02 31

After:
Screenshot 2019-04-11 at 12 02 54

As you can see, optimizing StringSearchModelInterpolator will be a great next step.

@hboutemy

This comment has been minimized.

Copy link
Member

commented Apr 21, 2019

Note that 3 core-it tests were failing for me, but they also fail with Maven 3.6.0, so they are probably environment dependent. Are there any specific requirements for these tests? I'd really like to see a fully green build :)

if you follow the instructions https://maven.apache.org/core-its/core-it-suite/, it should work easily on any configuration (as it works on ASF Jenkins and on many Maven developers). But there may be weak ITs that are fragile against something in your environment: what are the failing ITs and your configuration, please?

@hboutemy

This comment has been minimized.

Copy link
Member

commented Apr 21, 2019

another side: I'd like to be able to reproduce your tests and profiling.
Would it be feasible to create a sample build and write some quick explanations on doing such profiling, please?

@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 21, 2019

@rfscholte, @britter and I are meeting about this next week. How about you join in as well and I can walk you through some of the methodology?

@hboutemy

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

I had to re-launch the build on ASF Jenkins for MNG-6632 2 times before it went ok https://builds.apache.org/view/M-R/view/Maven/job/maven-box/job/maven/job/MNG-6632/
and I had to relaunch it one time for MNG-6630 https://builds.apache.org/view/M-R/view/Maven/job/maven-box/job/maven/job/MNG-6630/
looks like it was a temporary issue on the servers...

I reviewed also the commits: everything looks perfect to me

I'll approve these changes on GitHub and finish the approval on dev list: IMHO, we'll need to find a simpler process for the future...

@hboutemy hboutemy requested review from michael-o and hboutemy Apr 23, 2019
@rfscholte

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

We'll probably need to fix https://issues.apache.org/jira/browse/MNG-6643 afterwards, i.e. ComparableVersion should not depend on any third party libraries so it can easily be executed from commandline.

@khmarbaise

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

@rfscholte, @britter and I are meeting about this next week. How about you join in as well and I can walk you through some of the methodology?

Can you tell more accurate when you like to meet?

@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 24, 2019

@khmarbaise the meeting was yesterday afternoon - @britter will post a summary on the mailing list soon.

@britter

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

@khmarbaise the summary is here: https://lists.apache.org/thread.html/240bec41fd91580027661a167dbde7aa001ff03aa0b20ef779f927ce@%3Cdev.maven.apache.org%3E

If we do this kind of calls again I will make sure more people get a chance to join. It was kind of improvised over the easter weekend. Sorry!

@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 24, 2019

@hboutemy Only one integration test still fails locally for me (an master too), the mng6386BaseUriProperty(itMNG6386UnicodeChars) case. I'll see if I can find out why, it's not a big deal for me though.

Thanks for approving the changes. I was wondering one thing: Is using Guava allowed in the core module? It's already a transitive dependency of Maven and it could simplify some of the code I wrote for this PR.

Copy link
Contributor

left a comment

Unrelated to this PR, but in the following private methods: validate30RawProfileActivation(), validate20RawDependenciesSelfReferencing() and validateEffectiveModelAgainstDependency(), the "request" parameters are never used, so they could be safely deleted.

@hboutemy

This comment has been minimized.

Copy link
Member

commented Apr 27, 2019

PR merged, and additional commit for explanation added

thanks a lot

@hboutemy hboutemy closed this Apr 27, 2019
@oehme

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2019

Thanks @hboutemy! Can you comment on my earlier question? It would help me make my future contributions as concise as possible:

Is using Guava allowed in the core module? It's already a transitive dependency of Maven and it could simplify some of the code I wrote for this PR.

@rfscholte

This comment has been minimized.

Copy link
Contributor

commented Apr 29, 2019

I prefer not to use guava, we've seen way too many backwords compatibility issues with this library. IIRC it is used by Sisu, I hope that's the only one.

}
}

private boolean isValidId( String id )

This comment has been minimized.

Copy link
@vlsi

vlsi Apr 29, 2019

Should this be static? (in order to prevent accidental access to instance fields)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.