[SPARK-13576] [build] [test-maven] Don't create assembly for examples. #11452
As part of the goal to stop creating assemblies in Spark, this change modifies the mvn and sbt builds to not create an assembly for examples. Instead, dependencies are copied to the build directory (under target/scala-xx/jars), and in the final archive, into the "examples/jars" directory.

To avoid having to deal too much with Windows batch files, I made examples run through the launcher library. The spark-submit launcher now has a special mode to run examples, which adds all the necessary jars to the spark-submit command line and replaces the bash and batch scripts that were used to run examples. The scripts are now just a thin wrapper around spark-submit; another advantage is that all spark-submit options are now supported.

There are a few glitches. In the mvn build, a lot of duplicated dependencies get copied, because they are promoted to "compile" scope due to extra dependencies in the examples module (such as HBase). In the sbt build, all dependencies are copied, because there doesn't seem to be an easy way to filter things.

I plan to clean some of this up when the rest of the tasks are finished. When the main assembly is replaced with jars, we can remove duplicate jars from the examples directory during packaging.

Tested by running SparkPi in: maven build, sbt build, dist created by make-distribution.sh.

Finally, note that running the "assembly" target in sbt doesn't build the examples anymore. You need to run "package" for that.
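To make the "special mode" idea concrete, here is a rough sketch of how a launcher could turn an example name into a spark-submit command line by collecting the jars under "examples/jars". This is not the launcher library's actual code. The helper name `build_example_command`, the choice of a jar whose name starts with `spark-examples` as the primary resource, and the expansion of short names into the `org.apache.spark.examples` package are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: short example names live in this package.
EXAMPLES_PACKAGE = "org.apache.spark.examples."


def build_example_command(spark_home, example, app_args=(), submit_args=()):
    """Hypothetical helper: assemble a spark-submit command for one example."""
    jars_dir = Path(spark_home) / "examples" / "jars"
    jars = sorted(jars_dir.glob("*.jar"))

    # Assumption: the jar holding the example classes is named spark-examples*.
    primary = [j for j in jars if j.name.startswith("spark-examples")]
    if not primary:
        raise FileNotFoundError(f"no spark-examples jar under {jars_dir}")

    # Every other jar in the directory is passed along as a dependency.
    extra = [str(j) for j in jars if j != primary[0]]

    if example.startswith(EXAMPLES_PACKAGE):
        main_class = example
    else:
        main_class = EXAMPLES_PACKAGE + example

    cmd = [str(Path(spark_home) / "bin" / "spark-submit")]
    cmd += list(submit_args)  # any spark-submit option passes straight through
    if extra:
        cmd += ["--jars", ",".join(extra)]
    cmd += ["--class", main_class, str(primary[0])]
    cmd += list(app_args)
    return cmd
```

A call such as `build_example_command("/opt/spark", "SparkPi", ["10"], ["--master", "local[2]"])` would return the argument list to hand to a process launcher. Because the user's options are forwarded untouched, the wrapper scripts can stay thin.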
I think there's some slight overlap between this and #11178, so it would be great if you could also review that PR.
Test build #52261 has finished for PR 11452 at commit
Hmm, looks like forcing jars to be exported (instead of the generated classes directory) broke some tests... Some tests depend on resources being actual files instead of being inside of jars.
Test build #52330 has finished for PR 11452 at commit
So would anybody care to take a look at this? @srowen? @JoshRosen?
I like the cleanup. I don't feel strongly about the change either way, but I also think there's not much value in offering example assembly JARs. They're really source code examples.
I think it's nice to be able to run a quick Spark app at least to know things are working, although spark-shell can be used for most of those cases. One future enhancement would be to get these examples out of the main tarball distributions. They're rather large, because of all the dependencies, so that would reduce the package sizes considerably.
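On the size point, the duplicate-jar cleanup mentioned in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the main distribution is assumed to keep its jars in a single directory, and jars are matched by file name only (no version or checksum comparison).

```python
from pathlib import Path


def duplicate_example_jars(main_jars_dir, examples_jars_dir):
    """Return example jars whose file name also exists in the main jars directory."""
    main_names = {p.name for p in Path(main_jars_dir).glob("*.jar")}
    return sorted(
        p for p in Path(examples_jars_dir).glob("*.jar") if p.name in main_names
    )


def prune_example_jars(main_jars_dir, examples_jars_dir):
    """Hypothetical packaging step: delete the duplicates and report their names."""
    removed = []
    for jar in duplicate_example_jars(main_jars_dir, examples_jars_dir):
        jar.unlink()  # only the copy under the examples directory is removed
        removed.append(jar.name)
    return removed
```

Matching by name keeps the step cheap. A real packaging script would probably need to decide what to do when the two directories hold different versions of the same library.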
Test build #52672 has finished for PR 11452 at commit
Test build #52933 has finished for PR 11452 at commit
retest this please
Test build #52959 has finished for PR 11452 at commit
SQL test failure, I assume unrelated? retest this please
Test build #52969 has finished for PR 11452 at commit
I plan to merge this after tests pass.
Test build #53081 has finished for PR 11452 at commit
retest this please
Test build #53098 has finished for PR 11452 at commit
This is failing a SQL test that seems unrelated to this change. retest this please
retest this please
Test build #53118 has finished for PR 11452 at commit
retest this please
Test build #53124 has finished for PR 11452 at commit
Yay! Merging to master.
Test build #53153 has finished for PR 11452 at commit
Merging to master.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes apache#11452 from vanzin/SPARK-13576.