
[SPARK-17855][CORE] Remove query string from jar url #15420

Closed
invkrh wants to merge 3 commits into apache:master from invkrh:spark-17855

Conversation

invkrh
Contributor

@invkrh invkrh commented Oct 10, 2016

What changes were proposed in this pull request?

spark-submit supports jar URLs using the http protocol. However, if the URL contains a query string, the worker.DriverRunner.downloadUserJar() method throws a "Did not see expected jar" exception, because it checks for the existence of a downloaded jar whose name still contains the query string. This is a problem when the jar is hosted on a web service that requires additional query parameters to retrieve the file.

This PR simply removes the query string before checking for the jar's existence on the worker.
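To illustrate the change, here is a minimal, standalone sketch of the stripping logic using plain string operations (the actual patch applies takeWhile to driverDesc.jarUrl before building a Hadoop Path; the URL below is just an example):

  // Sketch only, not the DriverRunner code: drop everything from the first '?'
  // onwards before deriving the local jar file name.
  val jarUrl = "http://localhost/spark-job.jar?param=1"

  val withoutQuery = jarUrl.takeWhile(_ != '?')   // "http://localhost/spark-job.jar"
  val jarFileName  = withoutQuery.split("/").last // "spark-job.jar", no query string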

How was this patch tested?

For now, this patch can only be tested manually (an example spark-submit invocation follows the list below):

  • Deploy a Spark cluster locally
  • Make sure the Apache httpd service is running
  • Save an uber jar, e.g. spark-job.jar, under /var/www/html/
  • Use http://localhost/spark-job.jar?param=1 as the jar URL when running spark-submit
  • The job should launch successfully
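A hypothetical invocation for the last two steps (the master URL and class name are placeholders; cluster deploy mode on a standalone master is what routes the jar download through worker.DriverRunner.downloadUserJar):

  ./bin/spark-submit \
    --master spark://localhost:7077 \
    --deploy-mode cluster \
    --class com.example.SparkJob \
    "http://localhost/spark-job.jar?param=1"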

@invkrh invkrh changed the title from "Remove query string from jar url" to "[SPARK-17855][CORE] Remove query string from jar url" Oct 10, 2016
@@ -147,7 +147,8 @@ private[deploy] class DriverRunner(
    * Will throw an exception if there are errors downloading the jar.
    */
   private def downloadUserJar(driverDir: File): String = {
-    val jarPath = new Path(driverDesc.jarUrl)
+    // Remove query string if jarUrl is http based
+    val jarPath = new Path(driverDesc.jarUrl.takeWhile(_ != '?'))
srowen
Member

I think there are a few small funny things about this method that we could clean up. First, jarUrl is really a URI, so parsing it as a Path is a little odd. It's just used to get the file name, but java.net.URI is probably the better tool for the job, though it takes an extra step to pick out just the file name.

  • destPath is actually the same as localJarFile but defined differently.
  • File existence is checked twice for no reason.
  • A plain Exception is thrown rather than an IOException.
  • The log message doesn't accurately describe what it's downloading.

That is, I wonder if it's worth a little cleanup to get to something like

  /**
   * Download the user jar into the supplied directory and return its local path.
   * Will throw an exception if there are errors downloading the jar.
   */
  private def downloadUserJar(driverDir: File): String = {
    val jarFileName = new URI(driverDesc.jarUrl).getPath.split("/").last
    val localJarFile = new File(driverDir, jarFileName)
    if (!localJarFile.exists()) { // May already exist if running multiple workers on one node
      logInfo(s"Copying user jar ${driverDesc.jarUrl} to $localJarFile")
      Utils.fetchFile(
        driverDesc.jarUrl,
        driverDir,
        conf,
        securityManager,
        SparkHadoopUtil.get.newConfiguration(conf),
        System.currentTimeMillis(),
        useCache = false)
      if (!localJarFile.exists()) { // Verify copy succeeded
        throw new IOException(s"Did not see expected jar $jarFileName in $driverDir")
      }
    }
    localJarFile.getAbsolutePath
  }

? I haven't tested it directly.
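(For context, a quick REPL-style sketch, not part of the patch, of why URI handles the file name better than Path when a query string is present:)

  import java.net.URI
  import org.apache.hadoop.fs.Path

  val url = "http://localhost/spark-job.jar?param=1"

  // Hadoop's Path does not treat the query string specially, so the query ends up
  // in the "file name", which is what broke the expected-jar check:
  new Path(url).getName                 // "spark-job.jar?param=1"

  // java.net.URI parses the query component separately, leaving the real path:
  new URI(url).getPath.split("/").last  // "spark-job.jar"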

Contributor Author

@invkrh invkrh Oct 10, 2016


I will test it locally and update this PR.

@invkrh
Contributor Author

invkrh commented Oct 10, 2016

@srowen I have tested the code on Spark 1.6.2. It works fine.

@SparkQA

SparkQA commented Oct 12, 2016

Test build #3327 has finished for PR 15420 at commit d418568.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 13, 2016

Jenkins test this please

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66894 has finished for PR 15420 at commit 2fade47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 14, 2016

Merged to master

@asfgit asfgit closed this in 28b645b Oct 14, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
Author: invkrh <invkrh@gmail.com>

Closes apache#15420 from invkrh/spark-17855.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017