New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17855][CORE] Remove query string from jar url #15420
Conversation
@@ -147,7 +147,8 @@ private[deploy] class DriverRunner( | |||
* Will throw an exception if there are errors downloading the jar. | |||
*/ | |||
private def downloadUserJar(driverDir: File): String = { | |||
val jarPath = new Path(driverDesc.jarUrl) | |||
// Remove query string if jarUrl is http based | |||
val jarPath = new Path(driverDesc.jarUrl.takeWhile(_ != '?')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a few small funny things about this method that we could clean up. First, jarUrl is really a URI, so parsing it as a Path is a little odd. It's just used to get the file name, but java.net.URI
is probably the better tool for the job, though it takes an extra step to pick out just the file name.
destPath
is actually the same aslocalJarFile
but defined differently.- File existence checked twice for no reason
- Exception is thrown not IOException
- Log message doesn't actually correctly describe what it's downloading
That is, I wonder if it's worth a little cleanup to get to something like
/**
* Download the user jar into the supplied directory and return its local path.
* Will throw an exception if there are errors downloading the jar.
*/
private def downloadUserJar(driverDir: File): String = {
val jarFileName = new URI(driverDesc.jarUrl).getPath.split("/").last
val localJarFile = new File(driverDir, jarFileName)
if (!localJarFile.exists()) { // May already exist if running multiple workers on one node
logInfo(s"Copying user jar ${driverDesc.jarUrl} to $localJarFile")
Utils.fetchFile(
driverDesc.jarUrl,
driverDir,
conf,
securityManager,
SparkHadoopUtil.get.newConfiguration(conf),
System.currentTimeMillis(),
useCache = false)
if (!localJarFile.exists()) { // Verify copy succeeded
throw new IOException(s"Did not see expected jar $jarFileName in $driverDir")
}
}
localJarFile.getAbsolutePath
}
? Have not tested it directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will test it locally and update this pr.
@srowen I have tested the code on spark 1.6.2. It works fine. |
Test build #3327 has finished for PR 15420 at commit
|
Jenkins test this please |
Test build #66894 has finished for PR 15420 at commit
|
Merged to master |
## What changes were proposed in this pull request? Spark-submit support jar url with http protocol. However, if the url contains any query strings, `worker.DriverRunner.downloadUserJar()` method will throw "Did not see expected jar" exception. This is because this method checks the existance of a downloaded jar whose name contains query strings. This is a problem when your jar is located on some web service which requires some additional information to retrieve the file. This pr just removes query strings before checking jar existance on worker. ## How was this patch tested? For now, you can only test this patch by manual test. * Deploy a spark cluster locally * Make sure apache httpd service is on * Save an uber jar, e.g spark-job.jar under `/var/www/html/` * Use http://localhost/spark-job.jar?param=1 as jar url when running `spark-submit` * Job should be launched Author: invkrh <invkrh@gmail.com> Closes apache#15420 from invkrh/spark-17855.
## What changes were proposed in this pull request? Spark-submit support jar url with http protocol. However, if the url contains any query strings, `worker.DriverRunner.downloadUserJar()` method will throw "Did not see expected jar" exception. This is because this method checks the existance of a downloaded jar whose name contains query strings. This is a problem when your jar is located on some web service which requires some additional information to retrieve the file. This pr just removes query strings before checking jar existance on worker. ## How was this patch tested? For now, you can only test this patch by manual test. * Deploy a spark cluster locally * Make sure apache httpd service is on * Save an uber jar, e.g spark-job.jar under `/var/www/html/` * Use http://localhost/spark-job.jar?param=1 as jar url when running `spark-submit` * Job should be launched Author: invkrh <invkrh@gmail.com> Closes apache#15420 from invkrh/spark-17855.
What changes were proposed in this pull request?
Spark-submit support jar url with http protocol. However, if the url contains any query strings,
worker.DriverRunner.downloadUserJar()
method will throw "Did not see expected jar" exception. This is because this method checks the existance of a downloaded jar whose name contains query strings. This is a problem when your jar is located on some web service which requires some additional information to retrieve the file.This pr just removes query strings before checking jar existance on worker.
How was this patch tested?
For now, you can only test this patch by manual test.
/var/www/html/
spark-submit