
[SPARK-19833][SQL] Remove SQLConf.HIVE_VERIFY_PARTITION_PATH, always return empty when the location does not exist #17176

Closed · wants to merge 6 commits

Conversation

@windpiger (Contributor) commented Mar 6, 2017

What changes were proposed in this pull request?

In SPARK-5068, we introduced the SQLConf spark.sql.hive.verifyPartitionPath; when it is set to true, it avoids task failures when the partition location does not exist in the filesystem.

This situation should always return an empty result rather than failing the task, so we remove this conf.

The function verifyPartitionPath also has a bug: if the partition path is a custom path, it will still filter all partition paths in the parameter partitionToDeserializer, which means it can scan paths that do not belong to the table. For example, if the custom path is /root/a and the partitionSpec is b=1/c=2, the getPathPatternByPath conversion leads to scanning /, as the sketch below shows.
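To make this concrete, here is a minimal sketch of the pattern conversion described above (a reconstruction from this description, not the exact Spark source):

import org.apache.hadoop.fs.Path

// Sketch of the pattern logic described above: strip one path component
// per partition column, then re-append one wildcard per column.
def getPathPatternByPath(parNum: Int, tempPath: Path): String = {
  var base = tempPath
  for (_ <- 1 to parNum) base = base.getParent
  base.toString + (1 to parNum).map(_ => "*").mkString("/", "/", "")
}

getPathPatternByPath(2, new Path("/demo/data/b=1/c=2"))
// "/demo/data/*/*" -- stays inside the table directory

getPathPatternByPath(2, new Path("/root/a"))
// "//*/*" -- the custom location only has two components, so the two
// getParent calls climb to "/", and the pattern scans the filesystem root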

How was this patch tested?

Modified an existing test case.

@SparkQA commented Mar 6, 2017

Test build #73991 has finished for PR 17176 at commit 95aa931.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
// convert /demo/data/year/month/day to /demo/data/*/*/*/
def getPathPatternByPath(parNum: Int, tempPath: Path, partitionName: String): String = {
// if the partition path does not end with partition name, we should not
@windpiger (Contributor, Author) commented:

If the partition location has been altered to another location, we should not apply this pattern; otherwise we will list files matching the pattern that do not belong to the partition. See the sketch below.
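A hypothetical sketch of the guard the truncated comment in the hunk above points at (the name pathPatternOrExact and details are assumptions, not the PR's actual code): only generalize to a wildcard pattern when the location ends with the partition directories, i.e. the default Hive layout.

import org.apache.hadoop.fs.Path

// Hypothetical guard: wildcard patterns are only safe for the default
// layout, where the location ends with the partition directories.
def pathPatternOrExact(partPath: Path, partitionName: String): String = {
  if (partPath.toString.endsWith(partitionName)) {
    // Default layout such as /warehouse/tbl/b=1/c=2: wildcards stay
    // inside the table directory.
    val parNum = partitionName.split("/").length
    var base = partPath
    for (_ <- 1 to parNum) base = base.getParent
    base.toString + (1 to parNum).map(_ => "*").mkString("/", "/", "")
  } else {
    // Custom location such as /root/a: never turn it into a pattern.
    partPath.toString
  }
}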

@SparkQA commented Mar 6, 2017

Test build #73998 has finished for PR 17176 at commit 8128567.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 6, 2017

Test build #73992 has finished for PR 17176 at commit 4bb0e28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger (Contributor, Author) commented:
retest this please

@SparkQA commented Mar 6, 2017

Test build #74016 has finished for PR 17176 at commit 8128567.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger (Contributor, Author) commented:
Why did Jenkins fail...

@SparkQA commented Mar 7, 2017

Test build #74072 has finished for PR 17176 at commit 22b1f53.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -159,36 +159,11 @@ class HadoopTableReader(
  def verifyPartitionPath(
      partitionToDeserializer: Map[HivePartition, Class[_ <: Deserializer]]):
    Map[HivePartition, Class[_ <: Deserializer]] = {
    if (!sparkSession.sessionState.conf.verifyPartitionPath) {
@windpiger (Contributor, Author) commented:

After PR https://github.com/apache/spark/pull/17187, reading a Hive table that does not use STORED BY will no longer go through HiveTableScanExec.

This function has a bug when the partition path is a custom path:

  1. it will still filter all partition paths in the parameter partitionToDeserializer, and
  2. it will scan paths that do not belong to the table; e.g. if the custom path is /root/a
    and the partitionSpec is b=1/c=2, the getPathPatternByPath conversion leads to scanning /.

@SparkQA commented Mar 7, 2017

Test build #74106 has finished for PR 17176 at commit 262e2f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 7, 2017

Test build #74107 has finished for PR 17176 at commit 3a15e5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented:
@windpiger If you do not have bandwidth to continue this, how about closing it for now?

case (partition, partDeserializer) =>
  val partPath = partition.getDataLocation
  val fs = partPath.getFileSystem(hadoopConf)
  fs.exists(partPath)


Each partition sending a separate RPC request to the NameNode can result in poor performance.
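One way to reduce those round trips (a minimal sketch, not part of this PR; the helper existingPaths is hypothetical) is to group partition paths by parent directory and issue a single listStatus call per directory:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical sketch: batch existence checks with one listStatus RPC per
// parent directory instead of one exists() RPC per partition.
def existingPaths(paths: Seq[Path], hadoopConf: Configuration): Set[Path] = {
  paths.groupBy(_.getParent).flatMap { case (parent, children) =>
    val fs = parent.getFileSystem(hadoopConf)
    val listed =
      try fs.listStatus(parent).map(_.getPath).toSet
      catch { case _: java.io.FileNotFoundException => Set.empty[Path] }
    // listStatus returns fully-qualified paths, so qualify the inputs
    // before comparing.
    children.filter(c => listed.contains(fs.makeQualified(c)))
  }.toSet
}

Whether this actually helps depends on the layout: with deeply nested partitionSpecs each parent directory may be distinct, so batching at the table root can be the better trade-off.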
