[SPARK-11102] [SQL] Uninformative exception when specifing non-exist #9490
…input for JSON data source
Test build #45111 has finished for PR 9490 at commit
cc @liancheng
Test build #45178 has finished for PR 9490 at commit
Looks like the failed test is not related. @liancheng Please help review it. Thanks.
test this please
Test build #45500 has finished for PR 9490 at commit
Test build #45590 has finished for PR 9490 at commit
Test build #45606 has finished for PR 9490 at commit
Comments for the change:
So for these cases, I do the following changes
@@ -431,7 +436,7 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
val hdfsPath = new Path(path)
val fs = hdfsPath.getFileSystem(hadoopConf)
val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)

inputExists = inputExists && fs.exists(qualified)
This can be quite expensive since each fs.exists(qualified) call invokes FileSystem.getFileStatus(), which is an RPC call. On the other hand, we've already called fs.listStatus(qualified) below. It would be better to merge these two.
Another issue is that this block doesn't handle the case of parallel file listing (the other if block above).
Hey @zjffdu, are you still working on this?
@liancheng Sorry for the late response, I will try to push a new commit this weekend.
Test build #46809 has finished for PR 9490 at commit
Test build #46810 has finished for PR 9490 at commit
Test build #46821 has finished for PR 9490 at commit
@zjffdu I feel it is not a good idea to change lots of code just to get a better error message. Is it possible to have a small change to achieve the goal?
@yhuai Changing only JSONRelation/TextRelation could be a short-term solution. But since this is a general issue for HadoopFsRelation, it would be better to fix it in HadoopFsRelation. Otherwise we may hit the same issue when we add a new HadoopFsRelation implementation or when a user creates a custom HadoopFsRelation.
@yhuai @liancheng Any more comments?
@zjffdu How about we revisit it after we release 1.6?
@yhuai Sure, np
Please close this PR |
ping @zjffdu
Rebased the code and created this new PR, closing the previous PR #9223.
Pasting the comment from the last PR here to give more context:
JsonRelation has another special case: it allows the input paths to be empty when it takes an RDD[String] as input. So HadoopFsRelation cannot assume that all of its subclasses have non-empty input paths. We may need to add another flag, "allowEmptyInputPaths", to HadoopFsRelation and let each implementation decide.
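The flag idea above could look roughly like the sketch below. It uses simplified stand-in classes rather than the real Spark ones: `FsRelationSketch`, `JsonRelationSketch`, `validateInputs`, and the `exists` callback are all hypothetical, and only the name `allowEmptyInputPaths` comes from the comment. The base class rejects empty input paths by default and a subclass opts out when it has another data source.

```scala
// Simplified stand-ins for illustration; NOT the real Spark classes.
abstract class FsRelationSketch(val paths: Seq[String]) {
  // Flag named in the comment above; by default empty input paths are an error.
  protected def allowEmptyInputPaths: Boolean = false

  // `exists` is a hypothetical callback standing in for a file system lookup.
  def validateInputs(exists: String => Boolean): Unit = {
    if (paths.isEmpty) {
      require(allowEmptyInputPaths, "No input paths specified")
    } else {
      val missing = paths.filterNot(exists)
      require(missing.isEmpty, s"Input path does not exist: ${missing.mkString(", ")}")
    }
  }
}

// A JSON-like relation that may be backed by in-memory data instead of files.
class JsonRelationSketch(paths: Seq[String], hasRdd: Boolean)
    extends FsRelationSketch(paths) {
  // Empty paths are acceptable only when the data comes from an RDD.
  override protected def allowEmptyInputPaths: Boolean = hasRdd
}
```

With this shape, `new JsonRelationSketch(Nil, hasRdd = true)` passes validation, while the same relation with `hasRdd = false` fails with an `IllegalArgumentException` carrying the message above.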