[SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit #4215
Test build #26136 has finished for PR 4215 at commit
Test build #26135 has finished for PR 4215 at commit
--master | --deploy-mode | --class | --name | --jars | --py-files | --files | \
--conf | --properties-file | --driver-memory | --driver-java-options | \
--master | --deploy-mode | --class | --name | --jars | --maven | --py-files | --files | \
--conf | --maven_repos | --properties-file | --driver-memory | --driver-java-options | \
Rename this to --maven-repos with a dash instead of an underscore; every other flag uses a dash.
Test build #26191 has finished for PR 4215 at commit
# modify NOT ONLY this script but also SparkSubmitArgument.scala
SUBMISSION_OPTS=()
APPLICATION_OPTS=()
while (($#)); do
  case "$1" in
    --master | --deploy-mode | --class | --name | --jars | --py-files | --files | \
    --conf | --properties-file | --driver-memory | --driver-java-options | \
    --master | --deploy-mode | --class | --name | --jars | --maven | --py-files | --files | \
For this one, maybe we could call it --packages. IMO --maven is a little confusing because it's also the name of a piece of software. Below, I'd also just say --repositories. A sketch of the corresponding Scala-side change follows.
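The script comment above is a reminder that these flags must also be recognized in SparkSubmitArgument.scala. A minimal sketch of what the Scala side of the rename could look like, with hypothetical field and object names rather than the actual patch:

object PackagesArgSketch {
  // Hypothetical holders for the two renamed flags
  var packages: String = null
  var repositories: String = null

  // Peel each flag and its value off the argument list
  def parse(args: List[String]): Unit = args match {
    case "--packages" :: value :: tail =>
      packages = value
      parse(tail)
    case "--repositories" :: value :: tail =>
      repositories = value
      parse(tail)
    case _ :: tail => parse(tail)
    case Nil =>
  }
}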
Test build #26200 has finished for PR 4215 at commit
Test build #26255 has finished for PR 4215 at commit
Test build #26256 has finished for PR 4215 at commit
Test build #26258 has finished for PR 4215 at commit
Test build #26262 has finished for PR 4215 at commit
Test build #26277 has finished for PR 4215 at commit
Interesting... The tests succeed on my local computer but fail in Jenkins... The end-to-end test that downloads spark-avro and spark-csv succeeds, which is nice. Searching for artifacts at other repositories looks like it failed, but it actually says: Test succeeded, but ended abruptly.
Test build #26493 has finished for PR 4215 at commit
Test build #26492 has finished for PR 4215 at commit
val path = SparkSubmitUtils.resolveMavenCoordinates("com.agimatec:agimatec-validation:0.9.3",
  Option("https://oss.sonatype.org/content/repositories/agimatec/"), None, true)
assert(path.indexOf("agimatec-validation") >= 0, "should find package. If it doesn't, check " +
  "if the package still exists. If it has been removed, replace the example in this test.")
It would be cool if there was some way to mock out the Maven repository so that this test isn't reliant on third-party services that we don't control; that would also allow the test to run without an internet connection.
This is kind of tricky, though, since I guess we do want to test against the actual repository at some point.
Yeah, it would be awesome if we could mock it, but on the other hand, we still want to be sure that we can access these remote repositories correctly. I would prefer to keep it for now.
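One middle ground, sketched here under stated assumptions: exercise the resolver against a Maven-layout directory on the local filesystem through a file:// URI, so no third-party service is involved. The resolveMavenCoordinates signature is copied from the test above; the coordinate and directory layout are hypothetical, and a working mock would still need a minimal POM and jar in place:

import java.io.File
import java.nio.file.Files

// Temporary directory standing in for a remote repository
val repoRoot = Files.createTempDirectory("dummy-repo").toFile
// Standard Maven layout for the hypothetical coordinate my.test:mylib:0.1
val artifactDir = new File(repoRoot, "my/test/mylib/0.1")
artifactDir.mkdirs()
// Ivy would look for mylib-0.1.pom and mylib-0.1.jar under artifactDir
val path = SparkSubmitUtils.resolveMavenCoordinates(
  "my.test:mylib:0.1", Option(repoRoot.toURI.toString), None, true)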
Test build #26558 has finished for PR 4215 at commit
@brkyvz I see that one of the TODOs is for adding Windows compatibility. Beyond the additions to the shell script command-line parsing, what features are we missing for Windows support? I've been testing a few Windows things today in a VM, so if it's just a matter of testing I'd be glad to try things out.
@JoshRosen I actually don't know what we are missing. I think it only requires testing, because the directory structure (backslashes instead of slashes) and command-line parsing should all be handled. If you can test it, I'd really appreciate it!
artifacts.map { artifactInfo =>
  val artifactString = artifactInfo.toString
  val jarName = artifactString.drop(artifactString.lastIndexOf("!") + 1)
  cacheDirectory.getAbsolutePath + "/" + jarName.substring(0, jarName.lastIndexOf(".jar") + 4)
Hardcoding / as the file separator character will probably break things on Windows; I think we should use File.separator instead.
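For instance, reusing jarName and cacheDirectory from the hunk above, the concatenation could be made portable like this; a sketch of the suggestion, not the exact fix that was pushed:

import java.io.File

val jar = jarName.substring(0, jarName.lastIndexOf(".jar") + 4)
// new File(parent, child) inserts the platform-specific separator,
// so nothing is hardcoded for Unix vs. Windows
val localPath = new File(cacheDirectory, jar).getAbsolutePath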
Good catch! Fixed it. Pushing update in a few secs
Test build #26615 has finished for PR 4215 at commit
@pwendell, I think this is in good shape to go in right before you cut the branch. Having the community test it out under many different settings and setups would help a lot. @JoshRosen, what do you think?
import org.apache.ivy.plugins.matcher.GlobPatternMatcher
import org.apache.ivy.plugins.resolver.{ChainResolver, IBiblioResolver}

import org.apache.spark.Logging
This shouldn't use the existing Spark logging framework. We actually just directly print the output elsewhere in this tool (look at uses of printStream).
Then should I just log to System.out and System.err?
Never mind, just saw the printStream in SparkSubmit.
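For reference, the pattern being pointed to looks roughly like this; a minimal sketch, with the field name mirroring the comments above rather than verified SparkSubmit code:

import java.io.PrintStream

object PrintStreamSketch {
  // Defaults to the console; tests can swap in their own stream
  var printStream: PrintStream = System.err

  def printMessage(msg: String): Unit = printStream.println(msg)
}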
}
// Log the callers for each dependency
rr.getDependencies.toArray.foreach { case dependency: IvyNode =>
  var logMsg = s"$dependency will be retrieved as a dependency for:"
After running this myself, I think your original instinct is right. Let's not bother printing this since there is already fairly thorough printing in ivy.
Test build #26685 has finished for PR 4215 at commit
LGTM pending tests.
Test build #26692 has finished for PR 4215 at commit
I merged this - thanks Burak!
[SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit

This PR adds support for using maven coordinates as dependencies to spark-shell. Coordinates can be provided as a comma-delimited string after the flag `--packages`. Additional remote repositories (like SonaType) can be supplied as a comma-delimited string after the flag `--repositories`. Uses the Ivy library to resolve dependencies. Unfortunately the library has no decent documentation, therefore solving more complex dependency issues can be a problem. pwendell, mateiz, mengxr

**Note: This is still a WIP. The following need to be handled:**
- [x] add docs for the methods
- [x] take local ivy cache path as an argument
- [x] add tests
- [x] add Windows compatibility
- [x] exclude unused Ivy dependencies

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #4215 from brkyvz/SPARK-5341ivy and squashes the following commits:

9215851 [Burak Yavuz] ready to merge
db2a5cc [Burak Yavuz] changed logging to printStream
9dae87f [Burak Yavuz] file separators changed
71c374d [Burak Yavuz] merge conflicts fixed
c08dc9f [Burak Yavuz] fixed merge conflicts
3ada19a [Burak Yavuz] fixed Jenkins error (hopefully) and added comment on oro
43c2290 [Burak Yavuz] fixed that ONE line
231f72f [Burak Yavuz] addressed code review
2cd6562 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5341ivy
85ec5a3 [Burak Yavuz] added oro as a dependency explicitly
ea44ca4 [Burak Yavuz] add oro back to dependencies
cef0e24 [Burak Yavuz] IntelliJ is just messing things up
97c4a92 [Burak Yavuz] fix more weird IntelliJ formatting
9cf077d [Burak Yavuz] fix weird IntelliJ formatting
dcf5e13 [Burak Yavuz] fix windows command line flags
3a23f21 [Burak Yavuz] excluded ivy dependencies
53423e0 [Burak Yavuz] tests added
3705907 [Burak Yavuz] remove ivy-repo as a command line argument. Use global ivy cache as default
c04d885 [Burak Yavuz] take path to ivy cache as a conf
2edc9b5 [Burak Yavuz] managed to exclude Spark and it's dependencies
a0870af [Burak Yavuz] add docs. remove unnecesary new lines
6645af4 [Burak Yavuz] [SPARK-5341] added base implementation
882c4c8 [Burak Yavuz] added maven dependency download

(cherry picked from commit 6aed719)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
This PR adds support for using maven coordinates as dependencies to spark-shell. Coordinates can be provided as a comma-delimited string after the flag --packages. Additional remote repositories (like Sonatype) can be supplied as a comma-delimited string after the flag --repositories. The Ivy library is used to resolve dependencies. Unfortunately, the library has no decent documentation, so solving more complex dependency issues can be difficult.
@pwendell, @mateiz, @mengxr
Note: This is still a WIP. The following need to be handled:
- [x] add docs for the methods
- [x] take local ivy cache path as an argument
- [x] add tests
- [x] add Windows compatibility
- [x] exclude unused Ivy dependencies
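As a usage sketch, the two flags feed the resolver entry point exercised by the unit test earlier in this conversation. The signature is taken from that test; the coordinate below is hypothetical and the argument roles are my reading, not documented behavior:

// --packages     -> comma-delimited Maven coordinates (first argument)
// --repositories -> comma-delimited extra repositories (second argument)
val jarPaths = SparkSubmitUtils.resolveMavenCoordinates(
  "com.example:example-lib:1.0.0",  // hypothetical coordinate
  Option("https://oss.sonatype.org/content/repositories/releases/"),  // illustrative repo URL
  None,  // default local ivy cache path
  true)  // same final flag as in the unit test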