[SPARK-13576] [build] [test-maven] Don't create assembly for examples. #11452

Closed
wants to merge 7 commits into apache:master from vanzin:SPARK-13576

Conversation

@vanzin (Contributor) commented Mar 1, 2016

As part of the goal to stop creating assemblies in Spark, this change
modifies the mvn and sbt builds to not create an assembly for examples.

Instead, dependencies are copied to the build directory (under
target/scala-xx/jars), and in the final archive, into the "examples/jars"
directory.
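
For the sbt side, a minimal sketch of what copying the runtime dependencies into that directory could look like (illustrative only; the task name copyExampleJars and its wiring are made up, not this patch's actual build code):

```scala
// build.sbt fragment (sbt 0.13 syntax): copy this module's runtime dependency
// jars into target/scala-xx/jars, as described above.
lazy val copyExampleJars = taskKey[Unit]("Copy example dependency jars")

copyExampleJars := {
  val dest = crossTarget.value / "jars"   // e.g. target/scala-2.11/jars
  IO.createDirectory(dest)
  (dependencyClasspath in Runtime).value.map(_.data)
    .filter(f => f.isFile && f.getName.endsWith(".jar"))
    .foreach(jar => IO.copyFile(jar, dest / jar.getName))
}
```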

To avoid having to deal too much with Windows batch files, I made examples
run through the launcher library; the spark-submit launcher now has a
special mode to run examples, which adds all the necessary jars to the
spark-submit command line, and replaces the bash and batch scripts that
were used to run examples. The scripts are now just thin wrappers around
spark-submit; as a side benefit, all spark-submit options are now
supported.
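
For illustration, the launcher route can be sketched with the public SparkLauncher API (the object name, jar path, and argument below are made up; this is not the new internal example mode itself):

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch: what the thin wrapper scripts effectively do now, i.e. hand the
// example class and its jars to spark-submit through the launcher library.
object RunExampleSketch {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setMainClass("org.apache.spark.examples.SparkPi")
      .setAppResource("examples/jars/spark-examples.jar") // hypothetical path
      .setMaster("local[*]")
      .addAppArgs("100")                                  // slices for SparkPi
      .launch()
    process.waitFor()
  }
}
```

With the new scripts, the equivalent run is just `bin/run-example SparkPi 100`, and any spark-submit option can be passed through.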

There are a few glitches; in the mvn build, a lot of duplicated dependencies
get copied, because they are promoted to "compile" scope due to extra
dependencies in the examples module (such as HBase). In the sbt build,
all dependencies are copied, because there doesn't seem to be an easy
way to filter things.

I plan to clean some of this up when the rest of the tasks are finished.
When the main assembly is replaced with jars, we can remove duplicate jars
from the examples directory during packaging.

Tested by running SparkPi in the maven build, the sbt build, and a dist
created by make-distribution.sh.

Finally: note that running the "assembly" target in sbt doesn't build
the examples anymore. You need to run "package" for that.

@JoshRosen (Contributor) commented:

I think there's some slight overlap between this and #11178, so it would be great if you could also review that PR.

@SparkQA commented Mar 2, 2016

Test build #52261 has finished for PR 11452 at commit d0d4304.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • checkArgument(!isExample || mainClass != null, "Missing example class name.");

@vanzin (Contributor, Author) commented Mar 2, 2016

Hmm, looks like forcing jars to be exported (instead of the generated classes directory) broke some tests...

Some tests depend on resources being actual files instead of being inside
of jars.
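
A sketch of that failure mode (the resource name is made up): code that turns a classpath resource URL into a java.io.File only works while the exploded classes directory, rather than a jar, is on the classpath.

```scala
import java.io.File

object ResourceAsFile {
  def main(args: Array[String]): Unit = {
    val url = getClass.getResource("/test-data.txt") // hypothetical resource
    if (url != null && url.getProtocol == "file") {
      // Exploded classes directory: the resource really is a file on disk.
      println(s"plain file exists: ${new File(url.toURI).exists()}")
    } else {
      // Inside a jar the protocol is "jar", so new File(...) no longer applies.
      println(s"not a plain file: $url")
    }
  }
}
```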
@SparkQA commented Mar 2, 2016

Test build #52330 has finished for PR 11452 at commit 8263ed5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor, Author) commented Mar 8, 2016

So would anybody care to take a look at this? @srowen? @JoshRosen?

@srowen (Member) commented Mar 8, 2016

I like the cleanup. I don't feel strongly about the change either way, but I also think there's not much value in offering example assembly JARs. They're really source code examples.

@vanzin (Contributor, Author) commented Mar 8, 2016

> but I also think there's not much value in offering example assembly JARs

I think it's nice to be able to run a quick Spark app, if only to check that things are working, although spark-shell can cover most of those cases.

One future enhancement would be to get these examples out of the main tarball distributions. They're rather large, because of all the dependencies, so that would reduce the package sizes considerably.
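
For instance, a quick smoke test in spark-shell (a sketch; `sc` is the SparkContext the shell provides, and this is just the usual Monte Carlo Pi estimate that SparkPi computes):

```scala
val n = 100000
val hits = sc.parallelize(1 to n).map { _ =>
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * hits / n}")
```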

@SparkQA commented Mar 8, 2016

Test build #52672 has finished for PR 11452 at commit ccdd2b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 11, 2016

Test build #52933 has finished for PR 11452 at commit fda639b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin changed the title from "[SPARK-13576] Don't create assembly for examples." to "[SPARK-13576] [build] [test-maven] Don't create assembly for examples." on Mar 11, 2016
@vanzin (Contributor, Author) commented Mar 11, 2016

retest this please

@SparkQA commented Mar 12, 2016

Test build #52959 has finished for PR 11452 at commit fda639b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor, Author) commented Mar 12, 2016

SQL test failure, I assume unrelated? retest this please

@SparkQA commented Mar 12, 2016

Test build #52969 has finished for PR 11452 at commit fda639b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor, Author) commented Mar 14, 2016

I plan to merge this after tests pass.

@SparkQA commented Mar 14, 2016

Test build #53081 has finished for PR 11452 at commit 4e3d390.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class NativeDDLCommand(val sql: String) extends RunnableCommand
    • case class CreateDatabase(
    • case class CreateFunction(
    • case class AlterTableRename(
    • case class AlterTableSetProperties(
    • case class AlterTableUnsetProperties(
    • case class AlterTableSerDeProperties(
    • case class AlterTableStorageProperties(
    • case class AlterTableNotClustered(
    • case class AlterTableNotSorted(
    • case class AlterTableSkewed(
    • case class AlterTableNotSkewed(
    • case class AlterTableNotStoredAsDirs(
    • case class AlterTableSkewedLocation(
    • case class AlterTableAddPartition(
    • case class AlterTableRenamePartition(
    • case class AlterTableExchangePartition(
    • case class AlterTableDropPartition(
    • case class AlterTableArchivePartition(
    • case class AlterTableUnarchivePartition(
    • case class AlterTableSetFileFormat(
    • case class AlterTableSetLocation(
    • case class AlterTableTouch(
    • case class AlterTableCompact(
    • case class AlterTableMerge(
    • case class AlterTableChangeCol(
    • case class AlterTableAddCol(
    • case class AlterTableReplaceCol(
    • case class In(attribute: String, values: Array[Any]) extends Filter

@vanzin (Contributor, Author) commented Mar 14, 2016

retest this please

@SparkQA commented Mar 14, 2016

Test build #53098 has finished for PR 11452 at commit 4e3d390.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class NativeDDLCommand(val sql: String) extends RunnableCommand
    • case class CreateDatabase(
    • case class CreateFunction(
    • case class AlterTableRename(
    • case class AlterTableSetProperties(
    • case class AlterTableUnsetProperties(
    • case class AlterTableSerDeProperties(
    • case class AlterTableStorageProperties(
    • case class AlterTableNotClustered(
    • case class AlterTableNotSorted(
    • case class AlterTableSkewed(
    • case class AlterTableNotSkewed(
    • case class AlterTableNotStoredAsDirs(
    • case class AlterTableSkewedLocation(
    • case class AlterTableAddPartition(
    • case class AlterTableRenamePartition(
    • case class AlterTableExchangePartition(
    • case class AlterTableDropPartition(
    • case class AlterTableArchivePartition(
    • case class AlterTableUnarchivePartition(
    • case class AlterTableSetFileFormat(
    • case class AlterTableSetLocation(
    • case class AlterTableTouch(
    • case class AlterTableCompact(
    • case class AlterTableMerge(
    • case class AlterTableChangeCol(
    • case class AlterTableAddCol(
    • case class AlterTableReplaceCol(
    • case class In(attribute: String, values: Array[Any]) extends Filter

@vanzin (Contributor, Author) commented Mar 14, 2016

This is failing a SQL test that seems unrelated to this change.

Exchange SinglePartition, None
+- WholeStageCodegen
   :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#3206L])
   :     +- Scan org.apache.spark.sql.execution.datasources.text.DefaultSource@332a5bae part: struct<>, data: struct<value:string>[] InputPaths: file:/home/jenkins/workspace/SparkPullRequestBuilder%403/sql/core/target/scala-2.11/test-classes/text-suite.txt

    at test.org.apache.spark.sql.JavaDataFrameSuite.testTextLoad(JavaDataFrameSuite.java:321)
Caused by: java.io.IOException: No input paths specified in job
    at test.org.apache.spark.sql.JavaDataFrameSuite.testTextLoad(JavaDataFrameSuite.java:321)

retest this please

@vanzin (Contributor, Author) commented Mar 14, 2016

retest this please

@SparkQA commented Mar 14, 2016

Test build #53118 has finished for PR 11452 at commit 19312fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class ShuffleServiceHeartbeat extends BlockTransferMessage

@vanzin (Contributor, Author) commented Mar 14, 2016

retest this please

@SparkQA commented Mar 15, 2016

Test build #53124 has finished for PR 11452 at commit 19312fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class ShuffleServiceHeartbeat extends BlockTransferMessage

@vanzin (Contributor, Author) commented Mar 15, 2016

Yay! Merging to master.

@SparkQA commented Mar 15, 2016

Test build #53153 has finished for PR 11452 at commit e9ab003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor, Author) commented Mar 15, 2016

Merging to master.

@asfgit closed this in 48978ab Mar 15, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#11452 from vanzin/SPARK-13576.
@vanzin deleted the SPARK-13576 branch April 5, 2016 20:24