[SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit #4215

Closed

brkyvz (Contributor) commented Jan 27, 2015

This PR adds support for using maven coordinates as dependencies in spark-shell and spark-submit.
Coordinates can be provided as a comma-delimited string after the flag --packages.
Additional remote repositories (such as Sonatype) can be supplied as a comma-delimited string after the flag
--repositories.

The Ivy library is used to resolve the dependencies. Unfortunately, Ivy has little useful documentation, so working through more complex dependency issues can be difficult.

@pwendell, @mateiz, @mengxr

Note: This is still a WIP. The following need to be handled:

  • add docs for the methods
  • take local ivy cache path as an argument
  • add tests
  • add Windows compatibility
  • exclude unused Ivy dependencies
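
For illustration, here is a minimal sketch of how such a coordinate string resolves through the helper this PR adds. The coordinates and repository URL are hypothetical examples, and the call mirrors the SparkSubmitUtils.resolveMavenCoordinates signature exercised by the test quoted later in this thread. On the command line this corresponds to something like `bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.0`.

```scala
// Hypothetical example: comma-delimited groupId:artifactId:version coordinates.
val coordinates = "com.databricks:spark-csv_2.10:1.0.0"
// Optional comma-delimited list of extra remote repositories (e.g. Sonatype).
val repositories = Option("https://oss.sonatype.org/content/repositories/releases/")
// Resolves the coordinates and their transitive dependencies through Ivy and
// returns the local paths of the downloaded jars. The trailing Boolean mirrors
// the flag passed in the test later in this thread.
val jarPaths = SparkSubmitUtils.resolveMavenCoordinates(coordinates, repositories, None, true)
```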

SparkQA commented Jan 27, 2015

Test build #26136 has finished for PR 4215 at commit a0870af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 27, 2015

Test build #26135 has finished for PR 4215 at commit 6645af4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- --master | --deploy-mode | --class | --name | --jars | --py-files | --files | \
- --conf | --properties-file | --driver-memory | --driver-java-options | \
+ --master | --deploy-mode | --class | --name | --jars | --maven | --py-files | --files | \
+ --conf | --maven_repos | --properties-file | --driver-memory | --driver-java-options | \
Contributor:

Rename this to --maven-repos with a dash instead of an underscore; everything else has a dash

SparkQA commented Jan 27, 2015

Test build #26191 has finished for PR 4215 at commit 2edc9b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# modify NOT ONLY this script but also SparkSubmitArgument.scala
SUBMISSION_OPTS=()
APPLICATION_OPTS=()
while (($#)); do
case "$1" in
- --master | --deploy-mode | --class | --name | --jars | --py-files | --files | \
- --conf | --properties-file | --driver-memory | --driver-java-options | \
+ --master | --deploy-mode | --class | --name | --jars | --maven | --py-files | --files | \
Contributor:

For this one maybe we could call it --packages. IMO maven is a little confusing because it's also the name of a piece of software. I'd also just say --repositories below.
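
If these names are adopted, the matching change in SparkSubmitArguments.scala might look roughly like the sketch below. This is a hedged illustration: the recursive list-based parser and the variable names packages and repositories are assumptions, not the PR's actual code.

```scala
// Hypothetical sketch of flag parsing in the style of SparkSubmitArguments.scala.
object SubmitArgsSketch {
  var packages: String = null
  var repositories: String = null

  def parseOpts(opts: List[String]): Unit = opts match {
    case "--packages" :: value :: tail =>
      packages = value       // comma-delimited maven coordinates
      parseOpts(tail)
    case "--repositories" :: value :: tail =>
      repositories = value   // comma-delimited extra remote repositories
      parseOpts(tail)
    case _ :: tail =>
      parseOpts(tail)        // other flags are ignored in this sketch
    case Nil => ()
  }
}
```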

SparkQA commented Jan 28, 2015

Test build #26200 has finished for PR 4215 at commit 3705907.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * (4) the main class for the child

@brkyvz brkyvz changed the title [WIP][SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit [SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit Jan 28, 2015
brkyvz (Contributor, author) commented Jan 28, 2015

@pwendell @mateiz I think the PR is ready for code review. I would appreciate your comments!

SparkQA commented Jan 28, 2015

Test build #26255 has finished for PR 4215 at commit dcf5e13.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * (4) the main class for the child

SparkQA commented Jan 28, 2015

Test build #26256 has finished for PR 4215 at commit 97c4a92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 28, 2015

Test build #26258 has finished for PR 4215 at commit cef0e24.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 29, 2015

Test build #26262 has finished for PR 4215 at commit ea44ca4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 29, 2015

Test build #26277 has finished for PR 4215 at commit 85ec5a3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

brkyvz (Contributor, author) commented Jan 31, 2015

Interesting... the tests pass on my local machine but fail on Jenkins. The end-to-end test that downloads spark-avro and spark-csv succeeds, which is nice. The test that searches for artifacts in other repositories looks like it failed, but it actually reports: "Test succeeded, but ended abruptly."

SparkQA commented Feb 2, 2015

Test build #26493 has finished for PR 4215 at commit c08dc9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logDebug(s"Did not load class $name from REPL class server at $uri", e)
    • logError(s"Failed to check existence of class $name on REPL class server at $uri", e)

SparkQA commented Feb 2, 2015

Test build #26492 has finished for PR 4215 at commit 3ada19a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

tdas (Contributor) commented Feb 2, 2015

Is this going to be merged for Spark 1.3? If so, please merge this (@pwendell) so that we can update #3715

val path = SparkSubmitUtils.resolveMavenCoordinates("com.agimatec:agimatec-validation:0.9.3",
  Option("https://oss.sonatype.org/content/repositories/agimatec/"), None, true)
assert(path.indexOf("agimatec-validation") >= 0, "should find package. If it doesn't, check " +
  "if the package still exists. If it has been removed, replace the example in this test.")
Contributor:

It would be cool if there was some way to mock out the Maven repository so that this test isn't reliant on third-party services that we don't control; that would also allow the test to run without an internet connection.

Contributor:

This is kind of tricky, though, since I guess we do want to test against the actual repository at some point.

Contributor (author):

Yeah, it would be awesome if we could mock it, but on the other hand, we still want to be sure that we can access these remote repositories correctly. I would prefer to keep it as is for now.
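
For what it's worth, a later test could get most of this coverage offline by laying out a dummy artifact as a local Maven repository and handing its file:// URI to the resolver. A hedged sketch, assuming resolveMavenCoordinates accepts a file:// repository URI and that a minimal POM plus an empty jar is enough for Ivy to resolve; the group, artifact, and version are made up:

```scala
import java.io.{File, FileWriter}
import java.nio.file.Files

// Lay out <repo>/my/great/lib/0.1/ following the standard Maven layout.
val repo = Files.createTempDirectory("fake-repo").toFile
val dir = new File(repo, "my/great/lib/0.1")
dir.mkdirs()

// A minimal POM; an empty jar next to it stands in for the artifact.
val pom = new FileWriter(new File(dir, "lib-0.1.pom"))
pom.write("<project><modelVersion>4.0.0</modelVersion>" +
  "<groupId>my.great</groupId><artifactId>lib</artifactId>" +
  "<version>0.1</version></project>")
pom.close()
new File(dir, "lib-0.1.jar").createNewFile()

val path = SparkSubmitUtils.resolveMavenCoordinates(
  "my.great:lib:0.1", Option(repo.toURI.toString), None, true)
```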

SparkQA commented Feb 3, 2015

Test build #26558 has finished for PR 4215 at commit 71c374d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LogisticGradient(numClasses: Int) extends Gradient
    • case class HiveScriptIOSchema (
    • val trimed_class = serdeClassName.split("'")(1)

JoshRosen (Contributor):

@brkyvz I see that one of the TODOS is for adding Windows compatibility. Beyond the additions to the shell script command-line parsing, what features are we missing for Windows support? I've been testing a few Windows things today in a VM, so if it's just a matter of testing I'd be glad to try things out.

brkyvz (Contributor, author) commented Feb 3, 2015

@JoshRosen I actually don't know what we are missing. I think it only requires testing, because the directory structure (backslashes instead of slashes) and command-line parsing should all be handled. If you can test it, I'd really appreciate it!

artifacts.map { artifactInfo =>
  val artifactString = artifactInfo.toString
  val jarName = artifactString.drop(artifactString.lastIndexOf("!") + 1)
  cacheDirectory.getAbsolutePath + "/" + jarName.substring(0, jarName.lastIndexOf(".jar") + 4)
Contributor:

Hardcoding / as the file separator character will probably break things on Windows; I think we should use File.separator instead.

Contributor (author):

Good catch! Fixed it. Pushing an update in a few secs.
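
For reference, a sketch of the platform-safe version of the snippet above, assuming the surrounding names stay the same:

```scala
import java.io.File

artifacts.map { artifactInfo =>
  val artifactString = artifactInfo.toString
  val jarName = artifactString.drop(artifactString.lastIndexOf("!") + 1)
  // File.separator is "\" on Windows and "/" elsewhere.
  cacheDirectory.getAbsolutePath + File.separator +
    jarName.substring(0, jarName.lastIndexOf(".jar") + 4)
}
```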

SparkQA commented Feb 3, 2015

Test build #26615 has finished for PR 4215 at commit 9dae87f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

brkyvz (Contributor, author) commented Feb 3, 2015

@pwendell, I think this is in good shape to go in right before you cut the branch. Having the community test it out under many different settings and setups would help a lot. @JoshRosen, what do you think?

import org.apache.ivy.plugins.matcher.GlobPatternMatcher
import org.apache.ivy.plugins.resolver.{ChainResolver, IBiblioResolver}

import org.apache.spark.Logging
Contributor:

This shouldn't use the existing Spark logging framework. We actually just print the output directly elsewhere in this tool (look at the uses of printStream).

Contributor (author):

Then should I just log to System.out and System.err?

Contributor (author):

Never mind, just saw the printStream in SparkSubmit.
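
For context, a sketch of the direction settled on here, modeled on the printStream field that SparkSubmit already exposes (the field name and default stream are assumptions of this sketch):

```scala
import java.io.PrintStream

// Overridable in tests, mirroring the stream SparkSubmit itself writes to
// (the field name and default are assumptions of this sketch).
var printStream: PrintStream = System.err

// Used in place of logInfo(...)/logError(...) from the Spark logging trait:
printStream.println("downloading com.example:lib:0.1 ...")  // hypothetical message
```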

}
// Log the callers for each dependency
rr.getDependencies.toArray.foreach { case dependency: IvyNode =>
  var logMsg = s"$dependency will be retrieved as a dependency for:"
Contributor:

After running this myself, I think your original instinct was right. Let's not bother printing this, since there is already fairly thorough printing in Ivy.

SparkQA commented Feb 3, 2015

Test build #26685 has finished for PR 4215 at commit db2a5cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

pwendell (Contributor) commented Feb 4, 2015

LGTM pending tests.

SparkQA commented Feb 4, 2015

Test build #26692 has finished for PR 4215 at commit 9215851.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • printStream.println(s"Failed to load main class $childMainClass.")

pwendell (Contributor) commented Feb 4, 2015

I merged this - thanks Burak!

asfgit closed this in 6aed719 on Feb 4, 2015
asfgit pushed a commit that referenced this pull request on Feb 4, 2015
[SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit

This PR adds support for using maven coordinates as dependencies to spark-shell.
Coordinates can be provided as a comma-delimited string after the flag `--packages`.
Additional remote repositories (like SonaType) can be supplied as a comma-delimited string after the flag
`--repositories`.

Uses the Ivy library to resolve dependencies. Unfortunately the library has no decent documentation, therefore solving more complex dependency issues can be a problem.

pwendell, mateiz, mengxr

**Note: This is still a WIP. The following need to be handled:**
- [x] add docs for the methods
- [x] take local ivy cache path as an argument
- [x] add tests
- [x] add Windows compatibility
- [x] exclude unused Ivy dependencies

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #4215 from brkyvz/SPARK-5341ivy and squashes the following commits:

9215851 [Burak Yavuz] ready to merge
db2a5cc [Burak Yavuz] changed logging to printStream
9dae87f [Burak Yavuz] file separators changed
71c374d [Burak Yavuz] merge conflicts fixed
c08dc9f [Burak Yavuz] fixed merge conflicts
3ada19a [Burak Yavuz] fixed Jenkins error (hopefully) and added comment on oro
43c2290 [Burak Yavuz] fixed that ONE line
231f72f [Burak Yavuz] addressed code review
2cd6562 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5341ivy
85ec5a3 [Burak Yavuz] added oro as a dependency explicitly
ea44ca4 [Burak Yavuz] add oro back to dependencies
cef0e24 [Burak Yavuz] IntelliJ is just messing things up
97c4a92 [Burak Yavuz] fix more weird IntelliJ formatting
9cf077d [Burak Yavuz] fix weird IntelliJ formatting
dcf5e13 [Burak Yavuz] fix windows command line flags
3a23f21 [Burak Yavuz] excluded ivy dependencies
53423e0 [Burak Yavuz] tests added
3705907 [Burak Yavuz] remove ivy-repo as a command line argument. Use global ivy cache as default
c04d885 [Burak Yavuz] take path to ivy cache as a conf
2edc9b5 [Burak Yavuz] managed to exclude Spark and it's dependencies
a0870af [Burak Yavuz] add docs. remove unnecesary new lines
6645af4 [Burak Yavuz] [SPARK-5341] added base implementation
882c4c8 [Burak Yavuz] added maven dependency download

(cherry picked from commit 6aed719)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
brkyvz deleted the SPARK-5341ivy branch on February 3, 2019.