Skip to content

[BEAM-1092] Shade commonly used libraries (e.g. Guava) to avoid class conflicts#2096

Closed
aviemzur wants to merge 3 commits intoapache:masterfrom
aviemzur:shade-guava-generically
Closed

[BEAM-1092] Shade commonly used libraries (e.g. Guava) to avoid class conflicts#2096
aviemzur wants to merge 3 commits intoapache:masterfrom
aviemzur:shade-guava-generically

Conversation

@aviemzur
Copy link
Member

@aviemzur aviemzur commented Feb 24, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

@aviemzur
Copy link
Member Author

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 69.303% when pulling ff32293 on aviemzur:shade-guava-generically into 3d0fe85 on apache:master.

@asfbot
Copy link

asfbot commented Feb 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7813/
--none--

@dhalperi
Copy link
Contributor

We looked into this exact approach in the past, and rolled it back because project.artifactId is not a legal java package name (e.g., if it included -). Did you find a way to solve this?

@dhalperi
Copy link
Contributor

(#744)

@aviemzur
Copy link
Member Author

aviemzur commented Feb 24, 2017

I can replace the - with . programatically using build helper maven plugin.

@asfbot
Copy link

asfbot commented Feb 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7831/

Build result: ABORTED

[...truncated 833.54 KB...] ... 31 moreCaused by: java.lang.RuntimeException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?Command was /bin/sh -c cd /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall@2/sdks/java/io/hbase && /usr/local/asfpackages/java/jdk1.8.0_121/jre/bin/java org.apache.maven.surefire.booter.ForkedBooter /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall@2/sdks/java/io/hbase/target/surefire/surefire7450086805414423461tmp /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall@2/sdks/java/io/hbase/target/surefire/surefire_83942865920716290877tmp at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:590) at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:460) at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:229) at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:201) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1026) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:862) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:755) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) ... 32 more2017-02-24T22:25:19.043 [ERROR] 2017-02-24T22:25:19.043 [ERROR] Re-run Maven using the -X switch to enable full debug logging.2017-02-24T22:25:19.043 [ERROR] 2017-02-24T22:25:19.043 [ERROR] For more information about the errors and possible solutions, please read the following articles:2017-02-24T22:25:19.043 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException2017-02-24T22:25:19.043 [ERROR] 2017-02-24T22:25:19.043 [ERROR] After correcting the problems, you can resume the build with the command2017-02-24T22:25:19.043 [ERROR] mvn -rf :beam-sdks-java-io-hbaseBuild was abortedchannel stoppedSetting status of af0c88f to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7831/ and message: 'Build finished. 'Using context: Jenkins: Maven clean install
--none--

@aviemzur
Copy link
Member Author

Retest this please.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 69.327% when pulling af0c88f on aviemzur:shade-guava-generically into 3d0fe85 on apache:master.

@asfbot
Copy link

asfbot commented Feb 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7857/
--none--

@aviemzur
Copy link
Member Author

@dhalperi Addressed your comment in my latest commit. PTAL?

<pattern>com.google.common</pattern>
<excludes>
<!-- com.google.common is too generic, need to exclude guava-testlib -->
<exclude>com.google.common.**.testing.*</exclude>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it's being lost. Is this right?

Copy link
Member Author

@aviemzur aviemzur Feb 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests pass without this, so I'm not sure why this is needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added in PR #832 because the rules only shade com.google.guava:guava and not com.google.guava:guava-testlib. If we relocate symbols from guava-testlib without shading them, we can get runtime ClassNotFoundExceptions on projects which depend on our artifacts.

IIRC, this issue did not manifest within the the beam project structure because each project had equivalent repackaging rules and could source the repackaged guava-testlib symbols from its own local dependencies. I believe I ran into the issue when creating my own project that didn't have shading and consumed the shaded beam artifacts.

I've been away from Beam development for some months so unfortunately I don't have recent context on whether this is still needed.

Copy link
Member Author

@aviemzur aviemzur Mar 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @swegner for your comment!

All dependencies on guava-testlib in the project currently are in scope test.

My change removes shading of test jars in modules which matched the shading rules that are in the root pom in my branch (They now do not shade their test jars)

The only test jar that remains with shading is the one of Dataflow runner, which still has this exclusion in its shade-plugin configuration: google-cloud-dataflow-java/pom.xml

Is this sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me; thanks for the thorough investigation!

(Please continue working with @dhalperi for the rest of the PR)

<artifactId>build-helper-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we ensure this only executes in the modules that use guava?

What happens in the modules that need to shade more than guava?

Copy link
Member Author

@aviemzur aviemzur Feb 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will execute in every module, to ensure forward compatability, when new modules are introduced, they will not need to know they should shade guava and add this shading to the pom, this will happen automatically. (If they don't depend on guava, no extra classes will be shaded into the resulting jar).

For modules that need to shade more than guava we should ask the following:
Should this artifact be shaded in all modules that will use it?
If so: add it to the root pom
If not: override shading for that specific module.

<pattern>com.google.common</pattern>
<excludes>
<!-- com.google.common is too generic, need to exclude guava-testlib -->
<exclude>com.google.common.**.testing.*</exclude>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same q?

@dhalperi
Copy link
Contributor

dhalperi commented Mar 2, 2017

Re: testlib -- I did some GitHub blame/history diving and found JIRA https://issues.apache.org/jira/browse/BEAM-557 and #832 .

Maybe @swegner has more context about which specific ways the Guava testlib is a public dependency.

@dhalperi
Copy link
Contributor

dhalperi commented Mar 2, 2017

Re: general approach.

I see two improvements:

  1. Providing a simple, uniform way to name shaded artifacts by establishing a central rewritten artifactId. This is succeeding where [BEAM-450] Shade separately per artifact #744 failed!
  2. Providing a centralized way to ensure that Guava is not on the public API surface of any module.

I see two downsides:

  1. Extra executions for modules that don't depend on Guava. This will slow the build.
  2. Modules that need to shade more than just Guava will have to override the shading config. They may actually get it wrong after that, so Guava might still be exposed. -- Thus, while this reduces the amount of config necessary in common cases, does guarantee solution.

I would probably:

  • move the shading config back into individual modules. build perf + accuracy. But use the new artifactId trick to make the config more uniform.
  • find a better way to actually verify that the final output artifacts don't expose Guava. (out of scope for this PR.) This is in-line with prior PR comments.

In otherwords, I would probably take improvement 1, but not 2. 1 is a clear win, but 2 has some drawbacks and doesn't quite guarantee success to justify those drawbacks IMO.

@dhalperi
Copy link
Contributor

dhalperi commented Mar 2, 2017

Another good opinion to get would be @davorbonaci .

@aviemzur
Copy link
Member Author

aviemzur commented Mar 3, 2017

@dhalperi I understand the concerns you raise. I'll try to address them from my point of view.

  1. Extra executions for modules that don't depend on Guava. This will slow the build.

Many of the modules already shade, so regarding slowing down of the build I’m not sure this concern is that pertinent if new way of shading is an improvement.

  1. Modules that need to shade more than just Guava will have to override the shading config. They may
    actually get it wrong after that, so Guava might still be exposed. -- Thus, while this reduces the amount of config necessary in common cases, does guarantee solution.

I agree that human error could cause Guava to be forgotten from shading configuration added to child modules, but is this different than today? Can this human error not happen now?

@davorbonaci
Copy link
Member

I'm not totally sold on the premise that we should shade Guava in every single module. A couple of reasons:

  • We now have ~30 modules. We'll probably have 50+ sometime this year. Guava is not that small in size -- bringing in so many copies is just wasteful for a simple call to, say, Preconditions. Minimization could mitigate this concern, but I'm yet-to-be-convinced it works well for libraries.
  • We care that Guava is not leaked to our users -- this is totally legit. On the other hand, we don't have a strong reason to avoid Guava on an internal interface between, say, runners-core and core-sdk. It just doesn't matter to users. Of course, taking a stronger stance can be fine, but I'm a little worried it will bite us (for no real reason).
  • Build time.

Separately, the amount of "magic" worries me too. This will work perfectly in Maven, but what happens when someone imports it into Eclipse, IntelliJ or elsewhere? At least in some IDEs, it will fail.

On the other hand, I'm really strongly in favor of one common configuration. There's literally zero chance that this will work if this configuration is offloaded to every single module. This one outweighs them all for me. (And, I was bitten by Guava 19 vs. 20 just today -- so, I know the pain.)

In summary, I'm leaning towards +1 on this. However, I'd be more comfortable if we were explicit in saying:

  • how much the build time has gone up
  • how much the total bundled jar, across the project, has gone up
  • to investigate minimization (separately from this PR)
  • that we'll make enforcement that Guava in not brought in transitively (which is currently broken at least in one module).

@aviemzur
Copy link
Member Author

aviemzur commented Mar 6, 2017

Build time (MacBook Pro, skipping tests)

Before: 02:13 min
After: 02:35 min

Jar size (Sum of all jars generated)

Before: 93.93 MB
After: 151.34 MB

@davorbonaci
Copy link
Member

Build time -- less than I would have forecasted. SGTM.

Jar size -- more than I would have forecasted -- it is 50% more. I think minimization is in order.

@aviemzur aviemzur force-pushed the shade-guava-generically branch from af0c88f to 7eb6434 Compare March 14, 2017 09:15
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.003%) to 70.14% when pulling 7eb6434 on aviemzur:shade-guava-generically into 9ad01fb on apache:master.

@asfbot
Copy link

asfbot commented Mar 14, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8378/
--none--

@aviemzur
Copy link
Member Author

aviemzur commented Mar 14, 2017

If there are modules which do not need to shade and relocate Guava - for example if they actively need to not do this and/or we feel that their file size has inflated without any reason - we can add the following node to their <build><plugins> to "opt-out" of the default shading configuration:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <id>bundle-and-repackage</id>
      <phase>none</phase>
    </execution>
  </executions>
</plugin>

Another way to tackle large file sizes would be to configure minimization by default.

I think we should start a minimization track separately from this change, as there are some issues with it:

  1. There is a bug in shade-plugin where if minimization is declared on a project with pom packaging it fails. I have created a ticket and PR with a fix for this but could take a while before it is available to us.
  2. Minimization on all of our modules adds a significant amount of build time, which, as discussed, we wish to avoid. (Up from 02:30 mins for a build skipping tests on my laptop to over 5 mins)

Copy link
Member

@davorbonaci davorbonaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Merging.

@asfgit asfgit closed this in cc12fd3 Mar 14, 2017
@davorbonaci
Copy link
Member

davorbonaci commented Mar 14, 2017

@aviemzur, btw, I noticed the version of the shade plugin went down. Was this intentional? If yes, great. If not, perhaps push a separate PR to get it back up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants