
[SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts #15119

Closed · wants to merge 14 commits from BryanCutler/spark-custom-IvySettings into master

Conversation

@BryanCutler (Member) commented Sep 16, 2016

What changes were proposed in this pull request?

This adds an option to spark-submit that allows overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality. This will allow all artifact resolution to go through a centrally managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.

This change restructures the creation of the IvySettings object in two distinct ways. First, if the spark.ivy.settings option is not defined, then buildIvySettings will create a default settings instance, as before, with the default repositories (Maven Central) included. Second, if the option is defined, the Ivy settings file will be loaded from the given path and only the repositories defined within it will be used for artifact resolution.
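
A minimal sketch of these two paths using Ivy's API directly (the helper name and exact signatures here are illustrative, not the PR's final code):

```scala
import java.io.File
import org.apache.ivy.core.settings.IvySettings

// Illustrative sketch: either load the user-provided settings file
// (only its resolvers are used) or fall back to a default instance.
def chooseIvySettings(settingsPath: Option[String]): IvySettings =
  settingsPath match {
    case Some(path) =>
      val settings = new IvySettings
      settings.load(new File(path))  // resolvers come only from this file
      settings
    case None =>
      // Default instance; the real buildIvySettings also registers the
      // local-m2, local-ivy, and Maven Central resolvers at this point.
      new IvySettings
  }
```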

How was this patch tested?

Existing tests cover the default behaviour. Manual tests load an ivysettings.xml file with local and Nexus repositories defined. A new test was added that loads a simple Ivy settings file with a local filesystem resolver.
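
For reference, a settings file with a local filesystem resolver, of the kind the new test exercises, might look roughly like this (the repo root and resolver names are illustrative):

```xml
<ivysettings>
  <!-- make the filesystem chain below the default resolver -->
  <settings defaultResolver="local-chain"/>
  <resolvers>
    <chain name="local-chain">
      <filesystem name="local-fs">
        <!-- [organisation]/[module]/[revision] layout under a local repo root -->
        <ivy pattern="/tmp/local-repo/[organisation]/[module]/[revision]/ivy-[revision].xml"/>
        <artifact pattern="/tmp/local-repo/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"/>
      </filesystem>
    </chain>
  </resolvers>
</ivysettings>
```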

@SparkQA commented Sep 17, 2016

Test build #65506 has finished for PR 15119 at commit 1f472ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@themodernlife (Contributor) commented:

FYI, I tried this out in our environment:

  • Firewall (no access to maven central)
  • Custom ivysettings.xml to point to our internal Artifactory

Everything worked just as I'd expected. Hope to see this merged!

@BryanCutler (Member, Author) commented:

Hi @themodernlife, thanks for testing this out! Glad it worked for you, I'll try to ping some others and see if I can get more interest. ping @JoshRosen @brkyvz

@@ -291,8 +292,12 @@ object SparkSubmit {
} else {
Nil
}

val ivySettings = Option(args.ivySettingsFile).map(SparkSubmitUtils.loadIvySettings).getOrElse(
Review comment (Contributor):

So basically it's either/or. Is there any way to reconcile the two (e.g. load the file and also add repositories explicitly defined on the command line)? That would allow one to override the settings file in the default Spark conf and still be able to add extra repositories.

Reply (Contributor):

@vanzin I thought this would be useful too, but it turns out I haven't missed it at all using this patch for the last few weeks. It's no big deal to add any "one off" repositories to an xml file if you need that kind of customization.

@vanzin (Contributor), Dec 16, 2016:

If it's not much work I think it would be worth it to support the use case. If there's something that blocks that from working well, then it's ok to not support it.

(It also avoids confusion: "why can't I use both of these options together?")

Reply (Member, Author):

I just figured any repositories would be added to the settings file, but I can see how having both options could be of some benefit so let me change that.

@@ -445,6 +450,8 @@ object SparkSubmit {
sysProp = "spark.submit.deployMode"),
OptionAssigner(args.name, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES, sysProp = "spark.app.name"),
OptionAssigner(args.ivyRepoPath, ALL_CLUSTER_MGRS, CLIENT, sysProp = "spark.jars.ivy"),
OptionAssigner(args.ivySettingsFile, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
Review comment (Contributor):

Is this setting needed by the application? It seems it only really affects SparkSubmit itself.

Reply (Member, Author):

I remember I was unsure about this line, but I was trying to follow the ivyRepoPath setting as an example. So you're saying these options are only for the application, and I can delete this?

Reply (Contributor):

Yes; OptionAssigner is for propagating the option to SparkConf, and I don't think that's needed for this option.

- val jarPath = SparkSubmitUtils.resolveMavenCoordinates(main.toString, None, None,
+ val jarPath = SparkSubmitUtils.resolveMavenCoordinates(
+   main.toString,
+   SparkSubmitUtils.buildIvySettings(None, None),
Review comment (Contributor):

It would be nice to add a test that creates a settings file and uses it, instead of calling this method on every test.

Reply (Member, Author):

Yeah, there should be a test for loading the file, I'll add that.

@@ -175,6 +176,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
jars = Option(jars).orElse(sparkProperties.get("spark.jars")).orNull
files = Option(files).orElse(sparkProperties.get("spark.files")).orNull
ivyRepoPath = sparkProperties.get("spark.jars.ivy").orNull
ivySettingsFile = sparkProperties.get("spark.ivy.settings").orNull
Review comment (Contributor):

This probably deserves documentation in docs/configuration.md.

@BryanCutler (Member, Author) commented:

Thanks for taking a look @vanzin, it'll be a few days before I can work on this but I'll ping you and @themodernlife when ready.

@BryanCutler (Member, Author) commented:

Sorry for the delay @vanzin and @themodernlife, I was only able to start looking at this today. I have a couple of questions for you:

  1. I think the docs might not be quite right about spark.jars.ivy. It's supposed to be a local path for Ivy, used to build the Ivy cache path and the path where resolved jars are downloaded, not a remote repository (though that might work). For additional repositories, the user should use the command-line option --repositories, which doesn't seem to have a conf property. Do you agree? If so, I could fix up the docs here.

  2. To make this PR work with additional repositories, it's a little more complicated due to some needed checks. I think the best strategy would be to create a new ChainResolver with a unique name (so as not to conflict with defined resolvers), add the default resolver from the settings file to it (if one exists), followed by the additional repositories, and then make this the new default resolver (see the sketch below). The last artifact resolved is the one used, so the additional repositories would take precedence over those in the settings file. In the same vein, if spark.jars.ivy is specified, it would override the Ivy user dir and cache path, if defined in the settings file. Sound ok?
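
A rough sketch of strategy (2) against Ivy's resolver API (names are illustrative; error handling and the spark.jars.ivy override are omitted):

```scala
import org.apache.ivy.core.settings.IvySettings
import org.apache.ivy.plugins.resolver.{ChainResolver, IBiblioResolver}

// Wrap the settings file's default resolver plus any --repositories entries
// in a new chain with a unique name, then make that chain the default.
def addExtraRepositories(settings: IvySettings, extraRepos: Seq[String]): Unit = {
  val chain = new ChainResolver
  chain.setName("user-extra-chain")  // unique, to avoid clashing with defined resolvers
  // Keep the file's default resolver first, if one was defined.
  Option(settings.getDefaultResolver).foreach(chain.add)
  // Extra repositories go last so they take precedence (the last artifact
  // resolved is the one used).
  extraRepos.zipWithIndex.foreach { case (repo, i) =>
    val resolver = new IBiblioResolver
    resolver.setM2compatible(true)
    resolver.setUsepoms(true)
    resolver.setRoot(repo)
    resolver.setName(s"repo-${i + 1}")
    chain.add(resolver)
  }
  settings.addResolver(chain)
  settings.setDefaultResolver(chain.getName)
}
```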

@vanzin (Contributor) commented Jan 3, 2017:

Not sure about (1), maybe @brkyvz knows since I think he wrote that code. (2) sounds ok to me.

@themodernlife (Contributor) commented:

@BryanCutler

@BryanCutler (Member, Author) commented:

@vanzin and @themodernlife, I have this fixed up to use the additional repositories when loading Ivy settings from a file. Added a new test for loading a settings file, and fixed up the docs for spark.jars.ivy - I agree that the name is confusing, but hopefully it's clear in the docs. Thanks @themodernlife for helping out with the docs too. If you're able to try out this latest revision as well, that would be great!

@SparkQA commented Jan 7, 2017

Test build #71001 has finished for PR 15119 at commit 7a7b6ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -450,8 +452,20 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.jars.ivy</code></td>
<td></td>
<td>
Comma-separated list of additional remote repositories to search for the coordinates given
Review comment (Contributor):

Yes, this documentation is wrong. At the time of the first PR we didn't document this, because we were unsure how popular the feature would be and we didn't want to confuse anyone with too many configurations. I think the time has come to add all this new functionality :)

</td>
</tr>
<tr>
<td><code>spark.ivy.settings</code></td>
Review comment (Contributor):

All the other confs are under spark.jars (e.g. spark.jars.ivy). Should we be consistent here as well? spark.jars.ivySettings, perhaps?

Reply (Member, Author):

That sounds fine to me; it would at least make it clearer that they are related.

@vanzin (Contributor) left a comment:

Looks OK. Just a couple of minor things (aside from Burak's comment).

@@ -283,8 +284,17 @@ object SparkSubmit extends CommandLineUtils {
} else {
Nil
}

// Create the IvySettings, either load from file or build defaults
val ivySettings = if (Option(args.ivySettingsFile).isDefined) {
Review comment (Contributor):

Looks like a really long way of saying args.ivySettingsFile != null.

Reply (Member, Author):

True, that is kind of dumb, I guess. I just realized that since this is just a conf anyway, and not a command-line arg, I don't need args.ivySettingsFile and could just do:

val ivySettings = args.sparkProperties.get("spark.jars.ivySettings").map { ivySettingsFile => ...

Look ok to you?
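
Spelled out, that suggestion would look roughly like this (a sketch; the loadIvySettings/buildIvySettings parameter lists are assumed, not quoted from the patch):

```scala
// Read the conf directly; fall back to default settings plus any
// --repositories and spark.jars.ivy values when no file is given.
val ivySettings = args.sparkProperties.get("spark.jars.ivySettings").map { ivySettingsFile =>
  SparkSubmitUtils.loadIvySettings(ivySettingsFile, Option(args.repositories),
    Option(args.ivyRepoPath))
}.getOrElse {
  SparkSubmitUtils.buildIvySettings(Option(args.repositories), Option(args.ivyRepoPath))
}
```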

Reply (Contributor):

Sounds good.

brr.setName(s"repo-${i + 1}")
cr.add(brr)
// scalastyle:off println
printStream.println(s"$repo added as a remote repository with the name: ${brr.getName}")
Review comment (Contributor):

Is this really needed?

Reply (Contributor):

That's used in testing, IIRC, and is helpful for debugging.

Reply (Member, Author):

It was in there before, and I think it's nice to get confirmation that your additional repository was added with that alias, in case the artifact you need isn't resolved. It will print out all the info like this:

bin/run-example --repositories www.somerepo.com --packages some:package:1.0 SparkPi
Listening for transport dt_socket at address: 5005
www.somerepo.com added as a remote repository with the name: repo-1
Ivy Default Cache set to: /home/bryan/.ivy2/cache
The jars for the packages stored in: /home/bryan/.ivy2/jars
:: loading settings :: url = jar:file:/home/bryan/git/spark/assembly/target/scala-2.11/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
some#package added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
:: resolution report :: resolve 1540ms :: artifacts dl 0ms
	:: modules in use:
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
	---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
		::::::::::::::::::::::::::::::::::::::::::::::

		::          UNRESOLVED DEPENDENCIES         ::

		::::::::::::::::::::::::::::::::::::::::::::::

		:: some#package;1.0: repo-1: unable to get resource for some#package;1.0: res=www.somerepo.com/some/package/1.0/package-1.0.pom: java.net.MalformedURLException: no protocol: www.somerepo.com/some/package/1.0/package-1.0.pom

		::::::::::::::::::::::::::::::::::::::::::::::

Reply (Contributor):

Yeah, I saw later it was already there. I just generally dislike non-error log messages that cannot be turned off.

@SparkQA commented Jan 10, 2017

Test build #71100 has finished for PR 15119 at commit 9b6d67f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler (Member, Author) commented:

Hi @vanzin, any more comments, or do you think this could be merged?

@vanzin (Contributor) left a comment:

Two minor nits. If no one adds more comments, I'll merge once you address them.

classpaths. If <code>spark.jars.ivySettings</code> is given artifacts will be resolved according
to the configuration in the file, otherwise artifacts will be searched for in the local maven repo,
then maven central and finally any additional remote repositories given by the command-line option
<code>--repositories</code> see <a href="submitting-applications.html#advanced-dependency-management">Advanced Dependency Management</a>.
Review comment (Contributor):

You seem to be missing punctuation before "see".

|""".stripMargin

val settingsFile = new File(tempIvyPath, "ivysettings.xml")
val settingsWriter = new FileWriter(settingsFile)
Review comment (Contributor):

Since I added another comment... minor, but Guava's Files.write would be cleaner here (and it also forces you to specify the encoding, which you should be doing here anyway).
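
For instance (a sketch; settingsText stands for the stripMargin string built above):

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import com.google.common.io.Files

// Write the settings XML with an explicit encoding instead of relying on
// FileWriter's platform-default charset.
val settingsFile = new File(tempIvyPath, "ivysettings.xml")
Files.write(settingsText, settingsFile, StandardCharsets.UTF_8)
```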

@BryanCutler (Member, Author) commented:

Thanks @vanzin, good points. I took another read through the doc and thought it was better to move things around a little. I also noticed a misplaced tag in another setting. Let me know if you see any other issues.

@SparkQA commented Jan 11, 2017

Test build #71173 has finished for PR 15119 at commit 532e744.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) commented Jan 11, 2017:

LGTM. Merging to master.

@asfgit asfgit closed this in 3bc2eff Jan 11, 2017
themodernlife pushed a commit to themodernlife/spark that referenced this pull request Jan 12, 2017
[SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts

Author: Bryan Cutler <cutlerb@gmail.com>
Author: Ian Hummel <ian@themodernlife.net>

Closes apache#15119 from BryanCutler/spark-custom-IvySettings.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
@BryanCutler deleted the spark-custom-IvySettings branch on March 7, 2017 at 20:01.