
[SPARK-17855][CORE] Remove query string from jar url #15420

Closed
invkrh wants to merge 3 commits into apache:master from invkrh:spark-17855

Conversation

invkrh
Contributor

@invkrh invkrh commented Oct 10, 2016

What changes were proposed in this pull request?

spark-submit supports jar URLs using the http protocol. However, if the URL contains a query string, the worker.DriverRunner.downloadUserJar() method throws a "Did not see expected jar" exception, because it checks for the existence of a downloaded jar whose name still contains the query string. This is a problem when the jar is hosted on a web service that requires additional query parameters to retrieve the file.

This PR simply removes the query string before checking for the jar's existence on the worker.
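To illustrate the change, here is a minimal, standalone sketch of the stripping logic using plain string operations (the actual patch applies takeWhile to driverDesc.jarUrl before building a Hadoop Path; the URL below is just an example):

  // Sketch only, not the DriverRunner code: drop everything from the first '?'
  // onwards before deriving the local jar file name.
  val jarUrl = "http://localhost/spark-job.jar?param=1"

  val withoutQuery = jarUrl.takeWhile(_ != '?')   // "http://localhost/spark-job.jar"
  val jarFileName  = withoutQuery.split("/").last // "spark-job.jar", no query string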

How was this patch tested?

For now, this patch can only be tested manually (an example spark-submit invocation follows the list below):

  • Deploy a Spark cluster locally
  • Make sure the Apache httpd service is running
  • Save an uber jar, e.g. spark-job.jar, under /var/www/html/
  • Use http://localhost/spark-job.jar?param=1 as the jar URL when running spark-submit
  • The job should launch successfully
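A hypothetical invocation for the last two steps (the master URL and class name are placeholders; cluster deploy mode on a standalone master is what routes the jar download through worker.DriverRunner.downloadUserJar):

  ./bin/spark-submit \
    --master spark://localhost:7077 \
    --deploy-mode cluster \
    --class com.example.SparkJob \
    "http://localhost/spark-job.jar?param=1"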

@invkrh invkrh changed the title from "Remove query string from jar url" to "[SPARK-17855][CORE] Remove query string from jar url" Oct 10, 2016
@@ -147,7 +147,8 @@ private[deploy] class DriverRunner(
    * Will throw an exception if there are errors downloading the jar.
    */
   private def downloadUserJar(driverDir: File): String = {
-    val jarPath = new Path(driverDesc.jarUrl)
+    // Remove query string if jarUrl is http based
+    val jarPath = new Path(driverDesc.jarUrl.takeWhile(_ != '?'))
srowen
Member

I think there are a few small funny things about this method that we could clean up. First, jarUrl is really a URI, so parsing it as a Path is a little odd. It's just used to get the file name, but java.net.URI is probably the better tool for the job, though it takes an extra step to pick out just the file name.

  • destPath is actually the same as localJarFile but defined differently.
  • File existence is checked twice for no reason.
  • A plain Exception is thrown rather than an IOException.
  • The log message doesn't accurately describe what it's downloading.

That is, I wonder if it's worth a little cleanup to get to something like

  /**
   * Download the user jar into the supplied directory and return its local path.
   * Will throw an exception if there are errors downloading the jar.
   */
  private def downloadUserJar(driverDir: File): String = {
    val jarFileName = new URI(driverDesc.jarUrl).getPath.split("/").last
    val localJarFile = new File(driverDir, jarFileName)
    if (!localJarFile.exists()) { // May already exist if running multiple workers on one node
      logInfo(s"Copying user jar ${driverDesc.jarUrl} to $localJarFile")
      Utils.fetchFile(
        driverDesc.jarUrl,
        driverDir,
        conf,
        securityManager,
        SparkHadoopUtil.get.newConfiguration(conf),
        System.currentTimeMillis(),
        useCache = false)
      if (!localJarFile.exists()) { // Verify copy succeeded
        throw new IOException(s"Did not see expected jar $jarFileName in $driverDir")
      }
    }
    localJarFile.getAbsolutePath
  }

? I haven't tested it directly.
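(For context, a quick REPL-style sketch, not part of the patch, of why URI handles the file name better than Path when a query string is present:)

  import java.net.URI
  import org.apache.hadoop.fs.Path

  val url = "http://localhost/spark-job.jar?param=1"

  // Hadoop's Path does not treat the query string specially, so the query ends up
  // in the "file name", which is what broke the expected-jar check:
  new Path(url).getName                 // "spark-job.jar?param=1"

  // java.net.URI parses the query component separately, leaving the real path:
  new URI(url).getPath.split("/").last  // "spark-job.jar"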

Contributor Author

@invkrh invkrh Oct 10, 2016


I will test it locally and update this PR.

@invkrh
Contributor Author

invkrh commented Oct 10, 2016

@srowen I have tested the code on Spark 1.6.2. It works fine.

@SparkQA

SparkQA commented Oct 12, 2016

Test build #3327 has finished for PR 15420 at commit d418568.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 13, 2016

Jenkins test this please

@SparkQA

SparkQA commented Oct 13, 2016

Test build #66894 has finished for PR 15420 at commit 2fade47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 14, 2016

Merged to master

@asfgit asfgit closed this in 28b645b Oct 14, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
Author: invkrh <invkrh@gmail.com>

Closes apache#15420 from invkrh/spark-17855.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017