[SPARK-11522][SQL] input_file_name() returns "" for external tables #9542

Closed
wants to merge 10 commits into apache:master from xwu0226:SPARK-11522

Conversation

@xwu0226 (Contributor) commented Nov 7, 2015

When computing partitions for a non-parquet relation, HadoopRDD.compute is used, but it does not set the thread-local variable inputFileName in SqlNewHadoopRDD the way SqlNewHadoopRDD.compute does. Yet when input_file_name() is evaluated, it reads SqlNewHadoopRDD.inputFileName, which is still empty.
Setting inputFileName in HadoopRDD.compute as well resolves this issue.
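
For context, the fix follows the same pattern SqlNewHadoopRDD already uses: publish the current split's file path to the shared thread-local before records are read, and clear it for splits that are not backed by a file. A minimal sketch of that pattern inside HadoopRDD.compute (illustrative only, not the exact patch) might look like:

    import org.apache.hadoop.mapred.FileSplit

    // Once the split for this partition is known, expose its file path so that
    // input_file_name(), which reads SqlNewHadoopRDD's thread-local, can see it.
    split.inputSplit.value match {
      case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString)
      // Non-file splits have no single backing file; clear any stale value.
      case _ => SqlNewHadoopRDD.unsetInputFileName()
    }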

@rxin (Contributor) commented Nov 8, 2015

Jenkins, test this please.

@SparkQA commented Nov 8, 2015

Test build #2014 has finished for PR 9542 at commit 2658f28.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xwu0226 (Contributor, Author) commented Nov 8, 2015

@rxin I pushed again to fix the Scala style issue. Will the test build be kicked off automatically, or does it need to be triggered manually? Thanks!

@squito (Contributor) commented Nov 9, 2015

Jenkins, ok to test

@SparkQA commented Nov 9, 2015

Test build #45330 has finished for PR 9542 at commit b5fa291.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xwu0226 (Contributor, Author) commented Nov 11, 2015

@rxin or @squito, what do you think of the fix? Thanks!

split.inputSplit.value match {
  case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString)
  case _ => SqlNewHadoopRDD.unsetInputFileName()
}

@yhuai (Contributor) commented on the diff:

Can you call SqlNewHadoopRDD.unsetInputFileName() in https://github.com/apache/spark/pull/9542/files#diff-83eb37f7b0ebed3c14ccb7bff0d577c2R257? Like what we do in SqlNewHadoopRDD?
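
The diff location behind that link is not reproduced here, but the suggestion mirrors SqlNewHadoopRDD: also clear the thread-local when the record iterator for the partition is torn down, so a reused executor thread cannot report a stale file name. A rough sketch of that shape (simplified; the surrounding iterator and error handling are omitted):

    // Sketch of the reader-closing path in HadoopRDD.compute (not the exact diff):
    override def close(): Unit = {
      // Reset the per-thread file name before tearing down the reader, the same
      // way SqlNewHadoopRDD.compute does on close.
      SqlNewHadoopRDD.unsetInputFileName()
      if (reader != null) {
        reader.close()
      }
    }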

@xwu0226 (Contributor, Author) commented Nov 14, 2015

@yhuai Thanks for pointing it out! I will make the change now.

@SparkQA commented Nov 14, 2015

Test build #45911 has finished for PR 9542 at commit c27d030.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.distinct().collect().length == 1)
sql("DROP TABLE external_parquet")

// Non-External parquet pointing to /tmp/...

@yhuai (Contributor) commented on the diff:

Seems we do not need to say where it points to since it is a managed table.

@yhuai (Contributor) commented Nov 15, 2015

@xwu0226 Looks good! I left a few comments regarding the format.

@SparkQA commented Nov 15, 2015

Test build #45956 has finished for PR 9542 at commit fe2d6d8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xwu0226 (Contributor, Author) commented Nov 15, 2015

Accidentally pushed another JIRA's code together with this one. I am backing it out.

@yhuai (Contributor) commented Nov 15, 2015

LGTM pending jenkins.

@yhuai (Contributor) commented Nov 15, 2015

@xwu0226 Sorry for asking you to update several times. I just realized that you added a bunch of files under sql/hive/src/test/resources/data/. Since that directory is copied directly from Hive, we do not change or add files there. Can we just generate the test files in the test itself? We can make HiveUDFSuite extend SQLTestUtils and then use withTempPath to generate temp dirs that can be used for those external tables; see the sketch below.
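
A hedged sketch of the suggested test shape, assuming HiveUDFSuite mixes in SQLTestUtils (for withTempPath) and a Hive-backed sqlContext is in scope; the table name, schema, and data below are illustrative, not the code that was eventually merged:

    test("SPARK-11522: input_file_name() on an external table") {
      withTempPath { dir =>
        val path = dir.getCanonicalPath
        // Generate a single text file on the fly instead of checking data files
        // into sql/hive/src/test/resources/data/.
        sqlContext.range(10).selectExpr("cast(id as string) as value")
          .coalesce(1).write.text(path)
        sqlContext.sql(s"CREATE EXTERNAL TABLE external_text (id string) LOCATION '$path'")
        // Before this fix input_file_name() returned "" for such tables; with one
        // underlying file, all rows should report the same non-empty path.
        val names = sqlContext.sql("SELECT input_file_name() FROM external_text")
          .distinct().collect()
        assert(names.length == 1 && names.head.getString(0).nonEmpty)
        sqlContext.sql("DROP TABLE external_text")
      }
    }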

@SparkQA commented Nov 15, 2015

Test build #45959 has finished for PR 9542 at commit 4481c82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xwu0226 (Contributor, Author) commented Nov 15, 2015

@yhuai I did not know that we should not update the resources/data directory. I thought the test data files were added along the way by contributors. Thanks for pointing it out! Let me update HiveUDFSuite then.

@yhuai (Contributor) commented Nov 15, 2015

@xwu0226 Thank you!

@yhuai (Contributor) commented Nov 16, 2015

Oh, it seems there is a conflict...

@xwu0226 (Contributor, Author) commented Nov 16, 2015

@yhuai Is it mergeable?

@yhuai (Contributor) commented Nov 16, 2015

@xwu0226 Can you resolve the conflict? Once you update the PR and Jenkins is green, I will merge it. Thanks!

@SparkQA commented Nov 16, 2015

Test build #45970 has finished for PR 9542 at commit 83b1c77.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@yhuai (Contributor) commented Nov 16, 2015

test this please

@SparkQA commented Nov 16, 2015

Test build #45979 has finished for PR 9542 at commit eeaa6b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 16, 2015

Test build #45977 has finished for PR 9542 at commit eeaa6b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xwu0226 (Contributor, Author) commented Nov 16, 2015

@yhuai The last test build passed. Do you know what might have caused the previous failures? After resolving the conflicts, my own diff for this PR is unchanged from what passed the tests before. I hope it did not break anything. Thanks!

@yhuai (Contributor) commented Nov 16, 2015

@xwu0226 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45977/consoleFull is good. I will merge it to master and branch 1.6.

asfgit closed this in 0e79604 on Nov 16, 2015
asfgit pushed a commit that referenced this pull request Nov 16, 2015
When computing partitions for a non-parquet relation, `HadoopRDD.compute` is used, but it does not set the thread-local variable `inputFileName` in `SqlNewHadoopRDD` the way `SqlNewHadoopRDD.compute` does. Yet when `input_file_name()` is evaluated, it reads `SqlNewHadoopRDD.inputFileName`, which is still empty.
Setting `inputFileName` in `HadoopRDD.compute` as well resolves this issue.

Author: xin Wu <xinwu@us.ibm.com>

Closes #9542 from xwu0226/SPARK-11522.

(cherry picked from commit 0e79604)
Signed-off-by: Yin Huai <yhuai@databricks.com>

@xwu0226 (Contributor, Author) commented Nov 16, 2015

@yhuai Many thanks!
