
[SPARK-3812] [BUILD] Adapt maven build to publish effective pom. #2673

Closed

Conversation

ScrapCodes
Member

I tried the maven-help-plugin first, but that published all projects in the top-level pom. So I was left with no choice but to roll my own trivial plugin. This patch basically installs an effective pom after mvn install finishes.

The problem it fixes is as follows:
If you install using maven, e.g.
mvn install -DskipTests -Dhadoop.version=2.2.0 -Phadoop-2.2
then without this patch the published pom(s) will still list the hadoop version as 1.0.4 (the pom default). This can be a problem at some point.
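For reference, the resolved pom that ought to be published can be previewed with the maven-help-plugin; a minimal sketch, where the output file name is just an example:

  mvn help:effective-pom -Phadoop-2.2 -Dhadoop.version=2.2.0 -Doutput=effective-pom.xml

This writes out the pom with active profiles and properties resolved, which is the kind of pom this patch aims to install.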

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have started for PR 2673 at commit cfe0531.

  • This patch merges cleanly.

@ScrapCodes ScrapCodes changed the title Build changes to publish effective pom. [SPARK-3812] Build changes to publish effective pom. Oct 6, 2014
@ScrapCodes ScrapCodes changed the title [SPARK-3812] Build changes to publish effective pom. [SPARK-3812] [BUILD] Build changes to publish effective pom. Oct 6, 2014
@ScrapCodes ScrapCodes changed the title [SPARK-3812] [BUILD] Build changes to publish effective pom. [SPARK-3812] [BUILD] Adapt maven build to publish effective pom. Oct 6, 2014
@ScrapCodes
Member Author

@pwendell Take a look whenever you get time. It would be good if we could publish https://github.com/ScrapCodes/effective-pom-plugin.

@ScrapCodes
Member Author

I will have to add a similar thing for http://maven.apache.org/plugins/maven-deploy-plugin/deploy-file-mojo.html, but I am not sure about the repository URL field.
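For context, a deploy-file invocation would look roughly like the sketch below; the repository id, URL, and file paths are placeholders, not settings from this PR:

  mvn deploy:deploy-file \
    -Dfile=core/target/spark-core_2.10-1.2.0-SNAPSHOT.jar \
    -DpomFile=core/target/effective-pom.xml \
    -DrepositoryId=some-repo-id \
    -Durl=https://example.org/repository/releases/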

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have finished for PR 2673 at commit cfe0531.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)
    • case class UncacheTableCommand(tableName: String) extends Command
    • case class CacheTableCommand(
    • case class UncacheTableCommand(tableName: String) extends LeafNode with Command
    • case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21323/
Test PASSed.

@vanzin
Contributor

vanzin commented Oct 6, 2014

@ScrapCodes what's the end goal here?

I understand what your change does, but I'm confused about how this actually translates to the published bits, unless there's more information that I'm missing.

AFAIK, right now, only one "spark-core_2.10" artifact is published, as can be seen at:

http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10

So, with this fix, the default hadoop version for that artifact would be changed if the build that publishes it is compiled with a different -Dhadoop.version=x.

Or is there a plan to publish different hadoop versions under different artifactIds (or versions)?

@ScrapCodes
Member Author

Well, sorry for the confusion. With the bits published now, one can easily exclude one dependency and include another, as long as it is only about a few deps. But once we start moving towards supporting Scala 2.11, we will want to publish artifacts for both Scala 2.10 and 2.11, and at that point publishing effective pom(s) will be essential. There is a related discussion on another PR, #2615.

@pwendell
Contributor

pwendell commented Oct 7, 2014

Hey @vanzin, this is something we need in order to support cross-building for Scala 2.10 and 2.11, but it's also a broadly useful thing that is lacking in the maven build ecosystem (SBT has this feature out of the box).

The reason why it's needed is that we need to publish different dependency trees for every artifact, say spark-core for both 2.10 and 2.11. Maven isn't good at distinguishing the published artifacts from the build itself, so this will allow us to publish only the "effective" dependency graph for whichever profile is activated during publishing.

The reason why it's useful more broadly is that we have a lot of profiles in Spark's build that get published with our poms, even though they are not usable by projects that depend on Spark (profiles are not a concept that works transitively across projects - or at least many build tools, such as sbt, do not respect profiles from other projects). They just make our poms look more confusing to people. What we really want to publish is a single set of dependencies for each Spark artifact.
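As a hypothetical illustration of that point (not an excerpt from this PR), the effective pom published from a Scala 2.10 build would pin the resolved Scala dependency directly, with no profiles left in it:

  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
  </dependency>

and a 2.11 build would publish the corresponding _2.11 artifact with the 2.11 version of that dependency.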

@vanzin
Contributor

vanzin commented Oct 7, 2014

Thanks for the explanation. The issue with the Scala versions makes sense. What threw me off was the Hadoop example: I've always seen people say that the Spark API is independent of the Hadoop version, and that users should explicitly declare the Hadoop version they want in their projects (and have a matching Spark deployment). So explaining this PR in terms of publishing different Hadoop versions sounds a little bit at odds with that.

@ueshin
Member

ueshin commented Oct 7, 2014

Hi @pwendell, I had a similar issue related to artifacts in Maven Central and Hadoop versions.
Could you take a look at SPARK-3764 and #2638 please?

@pwendell
Contributor

pwendell commented Oct 8, 2014

@vanzin yeah I agree - I think it was just sort of a red herring based on the example in @ScrapCodes' description. AFAIK this is totally unrelated to Hadoop.

@AmplabJenkins

Can one of the admins verify this patch?

@dispalt

dispalt commented Oct 9, 2014

@pwendell As a fun fact, HBase publishes jars for hadoop1 and hadoop2 like so: http://search.maven.org/#artifactdetails%7Corg.apache.hbase%7Chbase%7C0.98.6.1-hadoop2%7Cpom - they have some tooling in the dev-support folder to help them automate that.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2673 at commit 285be82.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2673 at commit 285be82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21579/
Test PASSed.

@ScrapCodes
Member Author

@pwendell I tried to make the maven-shade-plugin work as an effective-pom generator, but that does not happen unless the project has dependencies, apart from its own artifact, to put in the uber jar. Please see the commits above and their commit messages for the reasons.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2673 at commit 242a24e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2673 at commit 242a24e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21589/
Test PASSed.

@ScrapCodes
Member Author

@pwendell I don't see an easy way with the maven-shade-plugin either - do you? One way is to include a fake dependency and then ask the plugin to shade that across all artifacts, but I somehow felt this is more invasive.

@pwendell
Contributor

What if we just add the maven-shade-plugin but configure it in a way where it effectively does no operations? I.e. we add a relocation of a class that does not exist. Would it still produce an effective pom?
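For concreteness, the kind of no-op relocation meant here might look like the sketch below inside the shade-plugin configuration; the package names are made up and match nothing in the build:

  <relocations>
    <relocation>
      <!-- no classes actually live under this package, so nothing is rewritten -->
      <pattern>org.spark.unused.doesnotexist</pattern>
      <shadedPattern>org.spark.shaded.doesnotexist</shadedPattern>
    </relocation>
  </relocations>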

@pwendell
Contributor

I think a solution that doesn't require us to use our own plug-in is good, even if we have to do something silly like shading a class that is in fact unused.

@ScrapCodes
Member Author

Even if we do this trick and shade something in all artifacts, what about spark-parent? (I mean, how do we get its effective pom?) Since we don't build a jar there, the shading plugin throws NPEs.

@pwendell
Contributor

Hm - is there any workaround for that? We don't actually need to publish the parent pom anyway if we are using effective poms. Could we just install the other modules and not the parent?

The current approach also adds the Guava logic to the parent pom, which doesn't make much sense. I think ideally we'd just have a very simple "dummy" relocation (e.g. without any exclusions) in the parent, and the existing shading stuff around Guava would be in core/pom.xml. Does the NPE go away if you do that?

@ScrapCodes
Member Author

We can definitely install the other modules; I am just afraid the resulting (effective) pom(s) will still carry a reference to the parent pom. Let me try that out.

@ScrapCodes
Member Author

Yeah, I verified the resulting dependency tree, as shown in the gists above.

@pwendell
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Oct 21, 2014

QA tests have started for PR 2673 at commit 6fff1ce.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 21, 2014

Tests timed out for PR 2673 at commit 6fff1ce after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21996/
Test FAILed.

@vanzin
Contributor

vanzin commented Oct 21, 2014

Hey @ScrapCodes @pwendell,

Funny thing. I've been playing with a different part of the build and I've found out I need exactly this to get what I wanted to work. I think I found a workaround that doesn't require a new plugin, although it's not pretty.

What you can do is create a "dummy" sub-module that just publishes an empty jar (or a jar with any random file, it really doesn't matter). In the projects where you want to publish the effective pom, add a call to the maven-shade-plugin that only shades this dummy dependency, and actually excludes all of its files in the filters section if needed.

That has both the effect of not packaging anything else in the final jar, and also removing the dummy dependency from the published pom. But well, it's pretty ugly.
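A rough sketch of that configuration, using made-up coordinates org.spark-project:unused-dummy for the dummy sub-module; this is only an outline of the idea, not the exact change in this PR:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
      <shadedArtifactAttached>false</shadedArtifactAttached>
      <artifactSet>
        <includes>
          <!-- only the dummy dependency is shaded, so nothing real gets bundled -->
          <include>org.spark-project:unused-dummy</include>
        </includes>
      </artifactSet>
      <filters>
        <filter>
          <!-- drop every file from the dummy jar so the module's jar is unchanged -->
          <artifact>org.spark-project:unused-dummy</artifact>
          <excludes>
            <exclude>**/*</exclude>
          </excludes>
        </filter>
      </filters>
    </configuration>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
      </execution>
    </executions>
  </plugin>

The useful side effect is the dependency-reduced pom the shade plugin writes, which drops the dummy dependency and would then be installed and deployed in place of the raw pom.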

@vanzin
Contributor

vanzin commented Oct 21, 2014

Yay, I just noticed your last patch takes exactly that approach.

@pwendell
Contributor

@vanzin yep that's exactly what we did!

@pwendell
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Oct 21, 2014

QA tests have started for PR 2673 at commit 6fff1ce.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 22, 2014

Tests timed out for PR 2673 at commit 6fff1ce after a configured wait of 120m.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22014/
Test FAILed.

@ScrapCodes
Member Author

Not sure which is better: creating an empty project with just a pom file in it, or depending on a random jar from Maven Central? I prefer the first approach.

@pwendell
Contributor

I'll just publish an empty jar in maven central - we can use it forever.

@pwendell
Contributor

I created a jar you can use. It might take a couple of hours to propagate to maven central:

  <groupId>org.spark-project.spark</groupId>
  <artifactId>unused</artifactId>
  <version>1.0.0</version>
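Presumably each module would then declare it as an ordinary dependency and list it in the shade-plugin's artifactSet, so it is removed again from the published dependency-reduced pom; a sketch only:

  <!-- in the module's <dependencies> section -->
  <dependency>
    <groupId>org.spark-project.spark</groupId>
    <artifactId>unused</artifactId>
    <version>1.0.0</version>
  </dependency>

  <!-- and in the shade-plugin's artifactSet includes -->
  <include>org.spark-project.spark:unused</include>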

@vanzin
Contributor

vanzin commented Oct 22, 2014

@ScrapCodes here's an approach that doesn't rely on maven central:
https://gist.github.com/vanzin/1ac8114faf39a6e8359d

Feel free to use / adapt it if you guys think that's preferred. (Also note that this solution runs the maven-shade-plugin even for modules that are not "jar" modules; it seems to work and not do anything evil, though.)

@pwendell
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Oct 22, 2014

QA tests have started for PR 2673 at commit aa7b91d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 22, 2014

QA tests have finished for PR 2673 at commit aa7b91d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22045/
Test PASSed.

@pwendell
Contributor

Okay - both approaches seem decent, but the current one is a bit simpler for developers (otherwise we have to explain the empty source directory). I'm just going to pull this in to lay the groundwork for the other Scala 2.11 stuff.

@asfgit asfgit closed this in c5882c6 Oct 23, 2014
@witgo
Contributor

witgo commented Oct 23, 2014

@ScrapCodes @pwendell
This patch causes a maven-assembly-plugin error - the assembly jar comes out nearly empty:
./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.1 -Dyarn.version=2.3.0-cdh5.0.1 -Phadoop-2.3 -Pyarn -Pnetlib-lgpl

du -sh dist/lib/*

4.0K    dist/lib/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar
928K    dist/lib/spark-examples-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar

@pwendell
Contributor

Thanks - I reverted this patch in master.

