
[SPARK-2848] Shade Guava in uber-jars. #1813

Closed
wants to merge 8 commits into master from vanzin:SPARK-2848

Conversation

vanzin
Contributor

@vanzin vanzin commented Aug 6, 2014

For further discussion, please check the JIRA entry.

This change moves Guava classes to a different package so that they don't conflict with the user-provided Guava (or the Hadoop-provided one). Since one class (Optional) was exposed through Spark's public API, that class was forked from Guava at the current dependency version (14.0.1) so that it can be kept going forward (until the API is cleaned).

Note this change has a few implications:

  • all classes in the final jars will reference the relocated classes. If Hadoop classes are included (i.e. "-Phadoop-provided" is not activated), those will also reference the Guava 14 classes (instead of the Guava 11 classes from the Hadoop classpath).
  • if the Guava version in Spark is ever changed, the new Guava will still reference the forked Optional class; this may or may not be a problem, but in the long term it's better to think about removing Optional from the public API.

For the end user, there are two visible implications:

  • Guava is not provided as a transitive dependency anymore (since it's "provided" in Spark)
  • At runtime, unless they provide their own, they'll either have no Guava or Hadoop's version of Guava (11), depending on how they set up their classpath.

Note that this patch does not change the sbt deliverables; those will still contain guava in its original package, and provide guava as a compile-time dependency. This assumes that maven is the canonical build, and sbt-built artifacts are not (officially) published.
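For readers unfamiliar with relocation-based shading, the sketch below shows roughly how such a relocation is configured with the maven-shade-plugin. This is an illustration only, not the exact configuration in this patch; in particular, the shaded package name org.spark-project.guava is an assumption.

<!-- Illustrative sketch of relocation-based shading (shaded package name assumed). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrite com.google.common.* references throughout the uber-jar -->
            <pattern>com.google.common</pattern>
            <shadedPattern>org.spark-project.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>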

@vanzin
Contributor Author

vanzin commented Aug 6, 2014

I used a tool I wrote to check the references in the final jars (check it out: https://gist.github.com/vanzin/bd9057fadf4a296220b7). I verified that only references to Optional point at the old "com.google.common" package.

I'm currently running unit tests (I don't think they'd really be affected) and will double-check the sbt-generated jar to make sure the asm dependencies haven't changed from before.

@@ -0,0 +1,243 @@
/*
* Copyright (C) 2011 The Guava Authors
Member

PS I checked and there is no additional change we need to make to LICENSE or NOTICE as a result of including this. Guava is already AL2 and has no NOTICE that pertains to this file.

Contributor

Shouldn't we still list it in our LICENSE? We've listed all source files that are not copyright Apache, which would include this one.

Member

Not according to my reading of http://www.apache.org/dev/licensing-howto.html, in the part covering bundled AL2 dependencies, but this stuff is always less than intuitive to me.

I suppose the reasoning is that Spark's AL2 license already exactly describes the terms of the licensing for that file. That's not quite true for BSD or MIT licenses.

The copyright owner doesn't matter; there are hundreds of them. What matters is how the owners, whoever they are, license their IP to be used.

Contributor Author

BTW if we're comfortable with not locking this file to the 14.0.1 version (or remembering to do this when we change guava versions), I can remove this fork (and clean up some code elsewhere in this patch too).

Contributor

I see; I would still include it because the copyright is from another organization. I don't see how adding it would hurt.

Member

@vanzin You're referring to not including this source file at all and instead getting it in binary form from the Guava artifact, right? Because yes, if it stays in source form it certainly must keep its copyright statement.

Contributor Author

@srowen yes that's the alternative.

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1813. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18069/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1813:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
public abstract class Optional implements Serializable {
class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
class RelocatorRemapper(relocators: List[Relocator]) extends Remapper {
class ShadeStrategy(relocators: List[Relocator], preferLocal: List[Regex]) extends MergeStrategy {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18069/consoleFull

@srowen
Member

srowen commented Aug 7, 2014

After a side-bar -- I still think compile scope is correct for Guava, but that it can be marked optional as an extra safeguard against people accidentally linking to Spark's Guava. This seems to be the motivation for provided scope, since it achieves this as a secondary effect. But in the exceptional case where Spark is being used outside an environment where Spark is provided, this has the undesirable primary effect of making Spark fail to execute for lack of Guava, unless the packager also includes Guava for it. Either way can be made to work, and both are equivalent for the common case, where Spark is provided.

EDIT: OK, is this accurate?

A "compile" scope Guava is the simple thing to do, and allows the shading solution to proceed. It however causes Guava to leak into apps that depend on Spark. But, only if they don't do the normal thing and treat Spark as "provided".

A "provided" scope Guava is more unusual, and requires some extra work in POM, but also enables the shading solution. Guava doesn't leak at all, but, people not depending on Spark as "provided" must remember to depend on Guava for Spark.

A "compile" scope Guava that is also "optional" works with the shading solution (? I think). Guava doesn't leak. But people using not using Spark as "provided" still have to remember to include Guava for Spark.

The last two seem the same, so, I suppose I can't really argue for "compile+optional" and against "provided" since the semantics are the same and both are not precisely what we desire (which would be something like "runtime by default for transitive dependers").

Proceed unless anyone else has an opinion to the contrary, IMHO.
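For reference, the three scope options discussed above would look roughly as follows in a POM; this is a sketch of the alternatives, not the change made in this PR.

<!-- Option 1: plain compile scope; leaks Guava to apps that don't mark Spark as "provided". -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
</dependency>

<!-- Option 2: provided scope; never leaks, but users who need Guava must declare it themselves. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
  <scope>provided</scope>
</dependency>

<!-- Option 3: compile + optional; Spark compiles against Guava, but dependers don't inherit it. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
  <optional>true</optional>
</dependency>

In all three cases the assembly can still shade Guava; the choice only affects what downstream builds see on their classpath.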

@vanzin
Contributor Author

vanzin commented Aug 7, 2014

@pwendell I think you may have suggested in jira at some point to create a separate deliverable for the shaded guava jar, and have the spark build depend on that. Is that what you had in mind?

If that's the preferred approach, it would require code changes (change imports) and I'd need pointers for how to create this separate deliverable (I don't think it would live inside the spark build? how is it done for hive?).

@pwendell
Contributor

pwendell commented Aug 7, 2014

hey @vanzin - so yeah, in the past we've done this by creating separate dependencies and publishing them to maven central. This is a bit more work on our side, but it plays nicer for downstream users because the correct functioning of spark does not depend on our distribution artifacts. For instance, users who write applications that embed spark don't have to try to recreate or mimic our distribution artifacts. For this reason I have a slight bias for the existing approach, just because we've used it for a long time without any issues.

The way we do this type of shading has been ad hoc: we've typically cloned the relevant projects, rewritten class names, and then published them to maven central under the org.spark-project namespace.

On the dev list I recently explained to @avati that it would be nice to include scripts in the Spark repo that do this. He's actually started working on it, and you can see an example for protobuf - I think the stuff he's done so far looks great:

https://github.com/avati/spark-shaded/blob/master/0001-protobuf.sh

I downloaded guava the other day and played around; I think it will be pretty straightforward to do this for guava as well.

I actually know several other projects that have started to use our shaded artifacts (we tend to shade dependencies that are a common pain point) - so I think there is some value in continuing to do this.

@vanzin
Contributor Author

vanzin commented Aug 7, 2014

@pwendell creating the shaded deliverable for Guava should be trivial, but where do I create a pull request for it? I can work on the pom file to enable that and publish it to my github in the meantime.

@vanzin
Contributor Author

vanzin commented Aug 7, 2014

Here's a repo with a pom file to shade guava:
https://github.com/vanzin/spark-shaded

@pwendell
Contributor

pwendell commented Aug 7, 2014

@vanzin yeah, so the model I'd like to have is that we include scripts in the Spark source code that can generate these "from scratch" - i.e. take a look at the script that @avati wrote. Then these can be contributed to Spark directly; we can put them in /dev/shaded-deps/.

@vanzin
Contributor Author

vanzin commented Aug 7, 2014

@pwendell I looked at that script but it looks overly complicated, at least for guava. A simple pom that invokes the shade plugin is much, much simpler. Would it be ok to have the script just run the pom?

@pwendell
Contributor

pwendell commented Aug 7, 2014

Does the shade plug-in work if we want to actually publish a jar to maven central? If that's the case, including a POM seems like a solid option. We'd want to at least have some simple script, though, that downloads guava and then inserts that pom - rather than maintaining it in a separate repo.

@avati
Contributor

avati commented Aug 8, 2014

The shade plug-in works in many cases, but not all. From what I understand, if the dependency is direct, the shade plugin works fine. In the case of protobuf it was an indirect dependency - akka-2.3 specifically needs protobuf 2.5 and hadoop1 specifically needs protobuf 2.4 - so a custom rebuilt package of one or the other was inevitable.

I haven't looked deeply into the guava situation; I'm just providing some context.

* @param includes Regexes for classes to include inside the matching package (empty = all).
* @param excludes Regexes for classes to exclude from the matching package (empty = none).
*/
class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
Contributor

This might be useful as a standalone sbt plug-in, I'm guessing a lot of other projects would use this - did you think at all about doing that? (of course that's totally up to you, just had the thought).

@pwendell
Contributor

pwendell commented Aug 8, 2014

Hey so after thinking about it a bit more, maybe it's worth experimenting with this simpler approach of in-lining the shaded dependency in our jars using build plug-ins. The main reason I think it's worth trying is that, in the worst case, if this has some random unintended consequence we can just revert to our existing method of publishing fully shaded dependencies to maven central.

One thought for @vanzin and @srowen - what if, instead of having our own copy of Optional, we just continue to advertise a full compile dependency on guava and get Optional from that? Then basically we have a "public" use of Guava, which is for this Optional class, and a "private" use of Guava, which we shade away. I think that will work as long as Optional does not change in guava... but that is the exact scenario in which the current approach works as well.

@srowen
Member

srowen commented Aug 8, 2014

@pwendell This is complicated, so let me summarize what I think I understand and then give an opinion.

The plan at hand concerned changes to the assembly JARs, pretty much exclusively. Spark continues to depend on Guava in order to compile, and so that consumers of Spark also see this dependency in their builds.

(You can see the discussion with @vanzin about whether to alter how downstream consumers of the Maven build see the Guava dependency -- provided vs compile vs optional -- but this doesn't affect the assembly, and only affects whether Guava automatically becomes a part of the downstream consumer's classpath. I still have a minor issue with provided, but unless @pwendell has any opinion on this, consider me outvoted.)

So I think the proposal was already to continue showing a dependency on Guava, but, to alter the assembly to include only Optional, and shaded copies of all else. And maybe to tweak the dependency to hide it by default for downstream consumers.

Inlining the shaded JARs is all but like removing use of Guava, as far as the world is concerned. That certainly takes it out of the picture. The little snag is Optional. It will now inevitably collide with any other app that builds Guava in too. And downstream consumers can't manage away the conflict normally. It may either be a harmless warning or cause a real problem if Optional changes later. In general I don't like pushing the shading into the compile-time world. Remember the hive-exec awfulness? This would not be my first choice, versus:

I think your second paragraph is basically the same thing proposed already, with one difference. Yes, the build still continues to say it depends on Guava. The concrete assembly is put together differently. I think you can use Guava's Optional rather than embedding a source copy of it and compiling that. Right, @vanzin? With some shading cleverness?

@vanzin
Contributor Author

vanzin commented Aug 8, 2014

Lots of comments, let me try to write a coherent response to all of them. :-)

Shaded jars and maven central

Shouldn't be a problem. You're publishing your project's artifact to maven central, which just happens to include the shaded classes. This is similar to how assembly jars are published (http://search.maven.org/#browse|-1363484021).

(EDIT: yes, there's only a pom for the assembly, but if the pom's packaging type is "jar", you get the jar there. Well, you get the point.)

The Optional class

It should be possible to use Guava's original Optional without having to fork it. The only reason I forked it is to allow changing the version of Guava without affecting the Spark API, but that can be done on demand if needed. I'd still package it in the spark-core jar; otherwise the guava dependency would need to have "compile" scope (see below).

Compile vs. Provided

My main argument for using "provided" is to avoid leaking guava into the user's compilation classpath. Users depend on spark-core (for example), and if spark-core has a compile dependency on guava, guava will be available in the user's compilation classpath (regardless of whether spark-core is set up as compile or provided). If that dependency is provided (and thus not transitive), then it never shows up; if the user needs guava, he needs to explicitly depend on it.

This does have effects on packaging, though: for people using an existing spark assembly to run their apps, nothing changes. But if the user is creating his own uber-jar with spark and everything else in it, and running it in some custom environment, he'll need to explicitly package guava (even if his app doesn't use it). My belief is that the former case is the overwhelming majority, and the latter case means the user has to care about too many things already (e.g. choosing compatible versions of everything between his app and spark and hadoop et al) for this little thing to be a real concern.

Shading at compile time

@srowen let me know if I misunderstood, but do you mean creating a spark-core jar with the shaded guava classes in it? I actually started with that alternative, but it seems messy. Either you have all downstream projects reference the shaded classes (something that would be needed if going with the separate shaded jar approach), or you get into a weird place, where spark-core is referencing the shaded classes but all downstream projects are not. The assembly would fix everything (by shading again), but someone not using the assembly could get really weird errors.

Also, having duplicate class names in the classpath really confuses some IDEs that automatically add imports. More manual configuration for people to add...

sbt shading plugin

I think the code I wrote can be added to the sbt-assembly plugin (with some modifications). It probably is the best place for that, anyway, since I created it as a merge strategy for that plugin. But that's for later.
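On the Optional point above, the following sketch illustrates one way the relocation can leave Guava's own Optional at its original coordinates while everything else is shaded. It is an assumption-laden illustration, not the final diff: the shaded package name is assumed, and in practice any package-private companion classes that Optional relies on would need the same treatment.

<relocation>
  <pattern>com.google.common</pattern>
  <shadedPattern>org.spark-project.guava</shadedPattern>
  <excludes>
    <!-- Keep the class exposed through Spark's public API at its original name -->
    <exclude>com.google.common.base.Optional*</exclude>
  </excludes>
</relocation>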

@srowen
Member

srowen commented Aug 8, 2014

@vanzin Yes there's no problem with publishing artifacts containing copies of other artifacts as far as Maven Central is concerned. It's just icky for downstream consumers. I think the non-assembly artifacts should be the 'pristine' raw ingredients and assemblies can be crazy remixes.

I can see the argument for provided. It has the pros and cons you mention. It's a moot point for people that use Spark as provided, which is what they're supposed to do. So, meh. Just drop a comment in the POM to explain the scope.

I had understood that the goal was to alter the Spark assembly in some way, such that with some certain footwork in the Hive on Spark build, the Hive on Spark assembly would work with the Spark assembly as deployed on people's clusters. That's mostly about getting Guava out of the way.

I don't think this entails making anyone depend on a shaded anything in Maven. I think that's avoidable and undesirable. So I think we are all basically agreeing on that? So to be clear, Spark continues to express a dependency on Guava in its POM. It must. The shading affects deployed assemblies.

The only delta with what @pwendell suggests seems to be shading in Guava's Optional rather than keeping a source copy. But that seems fine.

Marcelo Vanzin added 2 commits August 12, 2014 12:23
Just package the class from the guava dependency instead. Note the
dependency-fu needed in core/pom.xml and assembly/pom.xml to not
expose the guava dependency in the user's compilation classpath.
Conflicts:
	project/SparkBuild.scala
@vanzin
Contributor Author

vanzin commented Aug 12, 2014

Updated to remove forked code. I'm not too proud of the code to package things in the sbt build, but it's how I got things to work without having to expose Guava in the "compile" scope.

@vanzin
Contributor Author

vanzin commented Aug 12, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 12, 2014

QA tests have started for PR 1813. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18386/consoleFull

@SparkQA

SparkQA commented Aug 12, 2014

QA results for PR 1813:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
class RelocatorRemapper(relocators: List[Relocator]) extends Remapper {
class ShadeStrategy(relocators: List[Relocator]) extends MergeStrategy {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18386/consoleFull

lazy val settings = assemblySettings ++ Seq(
test in assembly := {},
jarName in assembly <<= (version, moduleName) map { (v, mName) => mName + "-"+v + "-hadoop" +
Option(System.getProperty("hadoop.version")).getOrElse("1.0.4") + ".jar" },
mergeStrategy in assembly := {
case PathList("org", "datanucleus", xs @ _*) => MergeStrategy.discard
case PathList("org", "objectweb", "asm", xs @ _*) => MergeStrategy.discard
Contributor

Is this related to SPARK-2848?

Contributor Author

Ah, no, this is leftover from a previous change in this patch. I'll remove.

@pwendell
Contributor

Hey Marcelo,

I took a look through this - I also discussed in some detail with @srowen this past weekend. If we want to fix this problem using shading in our build, this seems like as good a way as any to do it. It does substantially increase the complexity of the build which is not great, so maybe longer term we could consider moving towards just publishing a shaded version of guava upstream.

Regarding this specific implementation, I noticed there is a lot of effort to mirror this logic in the sbt build. I wonder if maybe we should just leave this out of the sbt build entirely to reduce the complexity, and in the sbt build simply have guava be a compile-time dependency like it was before. We've said in the past that our official reference build for packagers is maven, so I think it's fine if some of the more esoteric build configurations do not exist in sbt.

@vanzin
Contributor Author

vanzin commented Aug 18, 2014

Hi Patrick,

Regarding having the compile-scoped dependency in sbt only, the only way I can think of doing it is to create a profile that is always enabled from the sbt build, since the dependencies always come from the poms.

I agree that the sbt build is a little hacky with this patch, but at least it generates the same artifacts as the maven one. So it would avoid "this works with maven but not with sbt" situations, even though they might be rare.

@vanzin
Contributor Author

vanzin commented Aug 18, 2014

Hi @pwendell, I reverted the sbt changes as we talked about offline; it required a tiny change in the pom so that everything works as intended in both builds.

@SparkQA

SparkQA commented Aug 18, 2014

QA tests have started for PR 1813 at commit 9bdffb0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 18, 2014

QA tests have finished for PR 1813 at commit 9bdffb0.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<scope>compile</scope>
Contributor

hey, just wondering - since spark core lists this as a compile dependency, and everything else depends on spark core, what makes this necessary?

Contributor

sorry - I see - the issue is that elsewhere this is not in compile scope. However, why not just have it be compile scope by default in the root pom? Is the issue that we need to make sure it does not exist at compile scope in our published root pom?

Contributor Author

See previous discussion I had with Sean in this PR.

There's really no good scope in maven for what we want to do. "compile" means that guava will show up in the user's compilation classpath, and since we're shading guava in the assembly, it may lead to runtime errors since guava won't be there. "provided" means that guava is not exposed to the user at all; so if the user wants to use guava, he'll need to add an explicit dependency and package it himself.

I like "provided" better because it means things fail for the user at build time, not runtime, if the user is using guava but hasn't declared the dependency. And that's why I had all those hacks in the sbt build that you asked me to remove.

Contributor

I just meant that, since we are using the shade plug-in, wouldn't it just remove guava from our published poms anyway... then I realized that it most likely won't remove guava from the root pom, since we only shade inside of spark-core.
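To make the scope arrangement in this thread concrete, the general pattern looks roughly like the sketch below. It is an illustration of the idea only; the exact poms touched in this patch (core/pom.xml and assembly/pom.xml, per the commit note) may arrange it differently.

<!-- In the pom that downstream users inherit from: Guava is "provided",
     so it does not leak transitively into applications that depend on Spark. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <scope>provided</scope>
</dependency>

<!-- In the module whose shade step must physically bundle and relocate the
     Guava classes: re-declare the dependency at compile scope, since the
     shade plugin only bundles compile/runtime-scoped artifacts. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <scope>compile</scope>
</dependency>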

@pwendell
Contributor

Just a minor question but LGTM - we should merge this one into master only.

@pwendell
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 20, 2014

QA tests have started for PR 1813 at commit 9bdffb0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 20, 2014

QA tests have finished for PR 1813 at commit 9bdffb0.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pwendell
Contributor

Okay - let's give this a try. Merging it into master.

vanzin pushed a commit that referenced this pull request Aug 20, 2014
For further discussion, please check the JIRA entry.

This change moves Guava classes to a different package so that they don't conflict with the user-provided Guava (or the Hadoop-provided one). Since one class (Optional) was exposed through Spark's public API, that class was forked from Guava at the current dependency version (14.0.1) so that it can be kept going forward (until the API is cleaned).

Note this change has a few implications:
- *all* classes in the final jars will reference the relocated classes. If Hadoop classes are included (i.e. "-Phadoop-provided" is not activated), those will also reference the Guava 14 classes (instead of the Guava 11 classes from the Hadoop classpath).
- if the Guava version in Spark is ever changed, the new Guava will still reference the forked Optional class; this may or may not be a problem, but in the long term it's better to think about removing Optional from the public API.

For the end user, there are two visible implications:

- Guava is not provided as a transitive dependency anymore (since it's "provided" in Spark)
- At runtime, unless they provide their own, they'll either have no Guava or Hadoop's version of Guava (11), depending on how they set up their classpath.

Note that this patch does not change the sbt deliverables; those will still contain guava in its original package, and provide guava as a compile-time dependency. This assumes that maven is the canonical build, and sbt-built artifacts are not (officially) published.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #1813 from vanzin/SPARK-2848 and squashes the following commits:

9bdffb0 [Marcelo Vanzin] Undo sbt build changes.
819b445 [Marcelo Vanzin] Review feedback.
05e0a3d [Marcelo Vanzin] Merge branch 'master' into SPARK-2848
fef4370 [Marcelo Vanzin] Unfork Optional.java.
d3ea8e1 [Marcelo Vanzin] Exclude asm classes from final jar.
637189b [Marcelo Vanzin] Add hacky filter to prefer Spark's copy of Optional.
2fec990 [Marcelo Vanzin] Shade Guava in the sbt build.
616998e [Marcelo Vanzin] Shade Guava in the maven build, fork Guava's Optional.java.
@vanzin
Contributor Author

vanzin commented Aug 21, 2014

@pwendell any idea why this PR wasn't automatically closed after the merge?

@vanzin vanzin closed this Aug 21, 2014
@vanzin vanzin deleted the SPARK-2848 branch August 21, 2014 19:42
@liancheng
Contributor

@vanzin Would you mind having a look at SPARK-3217? I think this PR breaks the Spark Maven build. Jenkins didn't complain because it uses SBT and thus doesn't shade Guava.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@qiaohaijun

+1

littlee1032 pushed a commit to MediaV/spark that referenced this pull request Nov 11, 2014
christopherbozeman pushed a commit to christopherbozeman/spark that referenced this pull request Nov 28, 2014
@mfawzymkh

Do we have an ETA for getting this pull request merged to master? The guava shading issue is causing a problem for client libs that have a dependency on swift-service when spark is compiled with hadoop-2.4.

@srowen
Member

srowen commented Jan 19, 2015

@mfawzymkh As you can see this was merged into Spark a long time ago. What are you waiting on?

@mfawzymkh

Sorry, I confused this with some other issue. Thanks.

