[SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts #15119
Conversation
Test build #65506 has finished for PR 15119 at commit
FYI I tried this out in our environment. Everything worked just as I'd expected. Hope to see this merged!
Hi @themodernlife, thanks for testing this out! Glad it worked for you, I'll try to ping some others and see if I can get more interest. ping @JoshRosen @brkyvz
@@ -291,8 +292,12 @@ object SparkSubmit {
    } else {
      Nil
    }

    val ivySettings = Option(args.ivySettingsFile).map(SparkSubmitUtils.loadIvySettings).getOrElse(
So basically it's either / or. Is there any way to reconcile the two? (e.g. load the file and add repositories explicitly defined in the command line). This would allow one to override the settings file in the default Spark conf and still be able to add extra repositories.
@vanzin I thought this would be useful too, but it turns out I haven't missed it at all using this patch for the last few weeks. It's no big deal to add any "one off" repositories to an xml file if you need that kind of customization.
If it's not much work I think it would be worth it to support the use case. If there's something that blocks that from working well, then it's ok to not support it.
(It also avoids confusion: "why can't I use both of these options together?")
I just figured any repositories would be added to the settings file, but I can see how having both options could be of some benefit so let me change that.
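The combined behavior being agreed on here could be sketched roughly as follows. This is an illustrative stand-in, not Spark's actual `SparkSubmitUtils` code: resolvers are modeled as plain strings, and `loadFromFile`/`merge` are hypothetical names.

```scala
// Hedged sketch: repositories passed via --repositories are appended after
// whatever resolvers the ivysettings.xml file defines. Strings stand in for
// real Ivy resolver objects; names here are illustrative, not Spark's API.
object RepoMerge {
  // Stub for parsing resolvers out of a settings file.
  def loadFromFile(path: String): Seq[String] = Seq(s"resolver-from-$path")

  // Merge the file's resolvers with an optional comma-separated extra list.
  def merge(settingsPath: String, extraRepos: Option[String]): Seq[String] = {
    val extras = extraRepos.toSeq
      .flatMap(_.split(","))   // split the comma-separated --repositories value
      .map(_.trim)             // tolerate spaces after commas
      .filter(_.nonEmpty)
    loadFromFile(settingsPath) ++ extras
  }
}
```

Ordering matters to precedence: putting the extras after the file's resolvers keeps the settings file authoritative while still allowing one-off additions from the command line.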
@@ -445,6 +450,8 @@ object SparkSubmit {
      sysProp = "spark.submit.deployMode"),
    OptionAssigner(args.name, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES, sysProp = "spark.app.name"),
    OptionAssigner(args.ivyRepoPath, ALL_CLUSTER_MGRS, CLIENT, sysProp = "spark.jars.ivy"),
    OptionAssigner(args.ivySettingsFile, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
Is this setting needed by the application? It seems it only really affects SparkSubmit itself.
I remember I was unsure of this line, but was trying to follow the ivyRepoPath setting as an example. So you're saying this option isn't needed by the application and I can delete this?
Yes; OptionAssigner is for propagating the option to SparkConf, and I don't think that's needed for this option.
    val jarPath = SparkSubmitUtils.resolveMavenCoordinates(main.toString, None, None,
    val jarPath = SparkSubmitUtils.resolveMavenCoordinates(
      main.toString,
      SparkSubmitUtils.buildIvySettings(None, None),
It would be nice to add a test that creates a settings file and uses it, instead of calling this method on every test.
Yeah, there should be a test for loading the file, I'll add that.
@@ -175,6 +176,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
    jars = Option(jars).orElse(sparkProperties.get("spark.jars")).orNull
    files = Option(files).orElse(sparkProperties.get("spark.files")).orNull
    ivyRepoPath = sparkProperties.get("spark.jars.ivy").orNull
    ivySettingsFile = sparkProperties.get("spark.ivy.settings").orNull
This probably deserves documentation in docs/configuration.md.
Thanks for taking a look @vanzin, it'll be a few days before I can work on this but I'll ping you and @themodernlife when ready.
Sorry for the delay @vanzin and @themodernlife, I was only able to start looking at this today. I have a couple of questions for you guys.
Not sure about (1), maybe @brkyvz knows since I think he wrote that code. (2) sounds ok to me.
modified documentation for option changes closes #12
@vanzin and @themodernlife, I have this fixed up to use the additional repositories when loading ivy settings from a file. Added a new test for loading a settings file and fixed up the docs.
Test build #71001 has finished for PR 15119 at commit
@@ -450,8 +452,20 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.jars.ivy</code></td>
<td></td>
<td>
  Comma-separated list of additional remote repositories to search for the coordinates given
Yes, this is wrong documentation. At the time of the first PR we didn't document this, because we were unsure how popular the feature would be and we didn't want to confuse anyone with too many configurations. I think the time has come to add all this new functionality :)
</td>
</tr>
<tr>
<td><code>spark.ivy.settings</code></td>
All other confs are spark.jars.ivy. Should we be consistent here as well? spark.jars.ivySettings perhaps?
That sounds fine to me, it would at least make it clearer that they are related.
Looks ok. Just a couple of minor things (aside from Burak's comment).
@@ -283,8 +284,17 @@ object SparkSubmit extends CommandLineUtils {
    } else {
      Nil
    }

    // Create the IvySettings, either load from file or build defaults
    val ivySettings = if (Option(args.ivySettingsFile).isDefined) {
Looks like a really long way of saying args.ivySettingsFile != null.
True, that is kind of dumb I guess. I just realized that since this is just a conf anyway and not a command-line arg, I don't need args.ivySettingsFile and I could just do

val ivySettings = args.sparkProperties.get("spark.jars.ivySettings").map { ivySettingsFile => ...

Look ok to you?
Sounds good.
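The pattern agreed on here can be sketched with a plain Map standing in for sparkProperties; the two helpers below are stubs returning strings in place of Spark's real SparkSubmitUtils methods, so this is an illustration of the control flow, not Spark's actual code.

```scala
// Sketch: choose between a user-provided settings file and built defaults.
// loadIvySettings / buildIvySettings are string-returning stubs standing in
// for SparkSubmitUtils' real methods (which return an Ivy IvySettings).
object IvySettingsChoice {
  def loadIvySettings(path: String): String = s"settings loaded from $path"
  def buildIvySettings(): String = "default settings"

  def choose(sparkProperties: Map[String, String]): String =
    sparkProperties.get("spark.jars.ivySettings")
      .map(loadIvySettings)            // conf set: load the given file
      .getOrElse(buildIvySettings())   // conf absent: build the defaults
}
```

The Option `map`/`getOrElse` chain replaces the earlier `if (Option(args.ivySettingsFile).isDefined)` form with no null check at all.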
        brr.setName(s"repo-${i + 1}")
        cr.add(brr)
        // scalastyle:off println
        printStream.println(s"$repo added as a remote repository with the name: ${brr.getName}")
Is this really needed?
That's used in testing IIRC and is helpful in debugging.
It was in there before and I think it's nice to get confirmation that your additional repository was added with that alias, in case the artifact you need isn't resolved. It will print out all the info like this:
bin/run-example --repositories www.somerepo.com --packages some:package:1.0 SparkPi
Listening for transport dt_socket at address: 5005
www.somerepo.com added as a remote repository with the name: repo-1
Ivy Default Cache set to: /home/bryan/.ivy2/cache
The jars for the packages stored in: /home/bryan/.ivy2/jars
:: loading settings :: url = jar:file:/home/bryan/git/spark/assembly/target/scala-2.11/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
some#package added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 1540ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: some#package;1.0: repo-1: unable to get resource for some#package;1.0: res=www.somerepo.com/some/package/1.0/package-1.0.pom: java.net.MalformedURLException: no protocol: www.somerepo.com/some/package/1.0/package-1.0.pom
::::::::::::::::::::::::::::::::::::::::::::::
Yeah, I saw later it was already there. I just generally dislike non-error log messages that cannot be turned off.
Test build #71100 has finished for PR 15119 at commit
Hi @vanzin, any more comments, or do you think this could be merged?
Two minor nits. If no one adds more comments, I'll merge once you address them.
classpaths. If <code>spark.jars.ivySettings</code> is given artifacts will be resolved according
to the configuration in the file, otherwise artifacts will be searched for in the local maven repo,
then maven central and finally any additional remote repositories given by the command-line option
<code>--repositories</code> see <a href="submitting-applications.html#advanced-dependency-management">Advanced Dependency Management</a>.
You seem to be missing punctuation before "see".
  |""".stripMargin

val settingsFile = new File(tempIvyPath, "ivysettings.xml")
val settingsWriter = new FileWriter(settingsFile)
Since I added another comment... minor, but guava's Files.write would be cleaner here (and it also forces you to specify the encoding, which you should be doing here anyway).
Thanks @vanzin, good points. I took another read through the doc and thought it was better to move things around a little. I also noticed a misplaced tag in another setting. Let me know if you see any other issues.
Test build #71173 has finished for PR 15119 at commit
LGTM. Merging to master.
…ettings used to resolve packages/artifacts

## What changes were proposed in this pull request?

Adding option in spark-submit to allow overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality. This will allow all artifact resolution to go through a central managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.

This change restructures the creation of the IvySettings object in two distinct ways. First, if the `spark.ivy.settings` option is not defined then `buildIvySettings` will create a default settings instance, as before, with defined repositories (Maven Central) included. Second, if the option is defined, the ivy settings file will be loaded from the given path and only repositories defined within will be used for artifact resolution.

## How was this patch tested?

Existing tests for default behaviour. Manual tests that load an ivysettings.xml file with local and Nexus repositories defined. Added new test to load a simple Ivy settings file with a local filesystem resolver.

Author: Bryan Cutler <cutlerb@gmail.com>
Author: Ian Hummel <ian@themodernlife.net>

Closes apache#15119 from BryanCutler/spark-custom-IvySettings.
What changes were proposed in this pull request?
Adding option in spark-submit to allow overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality. This will allow all artifact resolution to go through a central managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.
This change restructures the creation of the IvySettings object in two distinct ways. First, if the spark.ivy.settings option is not defined then buildIvySettings will create a default settings instance, as before, with defined repositories (Maven Central) included. Second, if the option is defined, the ivy settings file will be loaded from the given path and only repositories defined within will be used for artifact resolution.

How was this patch tested?
Existing tests for default behaviour. Manual tests that load an ivysettings.xml file with local and Nexus repositories defined. Added a new test to load a simple Ivy settings file with a local filesystem resolver.
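For illustration, a minimal ivysettings.xml that routes all resolution through a single internal repository might look like the following. The resolver name and URL are hypothetical stand-ins for a site's own Nexus or Artifactory proxy; this uses standard Ivy settings syntax, not anything Spark-specific.

```xml
<ivysettings>
  <!-- All modules resolve through the one proxy resolver defined below. -->
  <settings defaultResolver="central-proxy"/>
  <resolvers>
    <!-- Hypothetical internal Maven proxy; replace root with your own URL. -->
    <ibiblio name="central-proxy" m2compatible="true"
             root="https://nexus.example.com/repository/maven-public/"/>
  </resolvers>
</ivysettings>
```

With the option added by this PR, such a file would be supplied via `--conf spark.jars.ivySettings=/path/to/ivysettings.xml` on spark-submit, so that only the repositories defined in the file are consulted.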