New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6144]When in cluster mode using ADD JAR with a hdfs:// sourced ja... #4880
Conversation
Can one of the admins verify this patch? |
ok to test |
Test build #28246 has started for PR 4880 at commit
|
fileOverwrite: Boolean): Unit = { | ||
if (!targetDir.mkdir()) { | ||
fileOverwrite: Boolean, | ||
filename: String = ""): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use Option[String] = None
. The in L648 you can do val targetFile = new File(targetDir, filename.getOrElse(innerPath.getName)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding some kind of type safety here
LGTM aside from minor style issue. I also think this should really go into 1.3... |
@pwendell adding to your radar. |
new File(targetDir, filename) | ||
} else { | ||
new File(targetDir, innerPath.getName) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? If path
refers to a directory with multiple files in it, then this will fetch all of those files using the same name, overwriting all but the last one fetched. IIUC we need to differentiate between path
being a directory from it being a file in the beginning of this method:
// L641, before the listStatus logic
if (fs.isFile(path)) {
val targetFile = new File(targetDir, filename.getOrElse(path.getName))
val in = fs.open(path)
downloadFile(path.toString, in, targetFile, fileOverwrite)
} else {
... // do the listStatus thing we've been doing before
}
where filename
should be set if and only if the path refers to a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a little weird but it works. Only the very first call to fetchHcfsFile
defines filename
. If the path
passed to it is a directory, it will recursively call itself without setting filename
. If it's a file, it will write the file using the given filename
. So even if it could be clearer, the code as is should work.
But I'm ok with making it clearer, exactly to avoid this kind of discussion. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the path passed to it is a directory, it will recursively call itself without setting filename
That's not actually true. If you call listStatus
on a directory it will list the directory's contents but not include the directory itself (I just verified this). So if the directory contains multiple files they will all go into the else case in L646 and be renamed to the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the problem here is that now targetDir
is the parent directory of where path
should be, and the children are being written directly to that parent path. It needs some code to create this directory corresponding to path
before downloading the children.
@trystanleftwich thanks for fixing this. I believe given the current way we call In other words I think this patch in its current state is probably fine to merge, but I'd be interested to hear what others think. |
I tried this patch locally and while it works for
Let me take a look to see if I figure out what's missing. |
Ah, I didn't realize |
Test build #28246 has finished for PR 4880 at commit
|
Test PASSed. |
I fat fingered and accidentally closed this ticket, And for some reason its not picking up that the branch has changes in it. I reopened here: |
...r will fail
While in cluster mode if you use ADD JAR with a HDFS sourced jar it will fail trying to source that jar on the worker nodes with the following error: