[Spark-11522][SQL] input_file_name() returns "" for external tables #9542
Jenkins, test this please.
Test build #2014 has finished for PR 9542 at commit
@rxin I pushed again for the scala style test issue. Will the test build be kicked off automatically or manually? Thanks!
Jenkins, ok to test
Test build #45330 has finished for PR 9542 at commit
split.inputSplit.value match {
  case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString)
  case _ => SqlNewHadoopRDD.unsetInputFileName()
}
Can you call `SqlNewHadoopRDD.unsetInputFileName()` in https://github.com/apache/spark/pull/9542/files#diff-83eb37f7b0ebed3c14ccb7bff0d577c2R257, like what we do in `SqlNewHadoopRDD`?
@yhuai Thanks for pointing it out! I will make the change now.
Test build #45911 has finished for PR 9542 at commit
.distinct().collect().length == 1)
sql("DROP TABLE external_parquet")

// Non-External parquet pointing to /tmp/...
Seems we do not need to say where it points to since it is a managed table.
@xwu0226 Looks good! I left a few comments regarding the format.
Test build #45956 has finished for PR 9542 at commit
I accidentally pushed another JIRA's code together with this one. I am backing it out.
LGTM pending jenkins.
@xwu0226 Sorry for asking you to update several times. I just realized that you added a bunch of files in
Test build #45959 has finished for PR 9542 at commit
@yhuai I did not know that we should not update the resources/data directory. I thought the test data files were added along the way by contributors. Thanks for pointing it out! Let me update HiveUDFSuite then.
@xwu0226 Thank you!
Oh, it seems there is a conflict...
@yhuai Is it mergeable?
@xwu0226 Can you resolve the conflict? Once you update the PR and Jenkins is good, I will merge it. Thanks!
Test build #45970 has finished for PR 9542 at commit
test this please
Test build #45979 has finished for PR 9542 at commit
Test build #45977 has finished for PR 9542 at commit
@yhuai The last test build passed. Do you know what might have caused the previous errors? After resolving the conflicts, my own diff for this PR still touches the same places that passed the tests before. I hope it did not break anything. Thanks!
@xwu0226 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45977/consoleFull is good. I will merge it to master and branch 1.6. |
When computing partitions for a non-Parquet relation, `HadoopRDD.compute` is used, but it does not set the thread-local variable `inputFileName` in `SqlNewHadoopRDD` the way `SqlNewHadoopRDD.compute` does. Yet when the input file name is read, `SqlNewHadoopRDD.inputFileName` is expected, and it is empty in this case. Setting `inputFileName` in `HadoopRDD.compute` resolves this issue.
Author: xin Wu <xinwu@us.ibm.com>
Closes #9542 from xwu0226/SPARK-11522.
(cherry picked from commit 0e79604)
Signed-off-by: Yin Huai <yhuai@databricks.com>
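For readers following the thread, the pattern in the commit message can be sketched roughly as below. This is a minimal, self-contained illustration and not Spark's actual source: `InputFileNameHolder`, `Split`, `FileBackedSplit`, `OtherSplit`, and `ComputeSketch` are hypothetical stand-ins for `SqlNewHadoopRDD`'s thread-local, Hadoop's input splits, and `HadoopRDD.compute`. It only shows the idea of setting the file name before reading a file-backed split and clearing it afterwards, as requested in the review.

```scala
// Hypothetical stand-in for the thread-local holder discussed in this PR;
// the real one lives in SqlNewHadoopRDD and may differ in detail.
object InputFileNameHolder {
  // Each thread (and so each running task) sees its own value; default is "".
  private val inputFileName = new ThreadLocal[String] {
    override protected def initialValue(): String = ""
  }

  def getInputFileName(): String = inputFileName.get()
  def setInputFileName(file: String): Unit = inputFileName.set(file)
  // remove() resets the slot, so the next get() returns the initial "" again.
  def unsetInputFileName(): Unit = inputFileName.remove()
}

// Hypothetical split types standing in for Hadoop's InputSplit / FileSplit.
sealed trait Split
final case class FileBackedSplit(path: String) extends Split
case object OtherSplit extends Split

object ComputeSketch {
  // Same shape as the fix: publish the file name for file-backed splits,
  // clear it for anything else, and always clear it once reading is done.
  def compute[T](split: Split)(readRows: => T): T = {
    split match {
      case FileBackedSplit(path) => InputFileNameHolder.setInputFileName(path)
      case _ => InputFileNameHolder.unsetInputFileName()
    }
    try readRows
    finally InputFileNameHolder.unsetInputFileName()
  }
}
```

In this sketch, code running inside `ComputeSketch.compute(FileBackedSplit(...)) { ... }` can read the path through `InputFileNameHolder.getInputFileName()`, while code outside of it sees the empty string, which matches the symptom described for external tables before the fix.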
@yhuai Many thanks!