
[SPARK-33529][SQL] Handle '__HIVE_DEFAULT_PARTITION__' while resolving V2 partition specs #30482

Closed
Wants to merge 8 commits

Conversation

MaxGekk (Member) commented Nov 24, 2020

What changes were proposed in this pull request?

  1. Extract the partition-value casting code from DSv1 into the common place sql.util.PartitioningUtils, as the method castPartitionValues().
  2. Re-use castPartitionValues() in the DSv2 resolver of partition specs, ResolvePartitionSpec.
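The sentinel handling that castPartitionValues() centralizes can be sketched roughly as follows. This is a minimal, self-contained sketch, not the actual Spark code: the object and method names (PartitioningUtilsSketch, normalizePartitionValue) are hypothetical simplifications, and the real method casts values to the partition schema's types rather than returning strings.

```scala
// Hedged sketch: the Hive default-partition marker is mapped to null
// before partition values are cast to the partition schema.
object PartitioningUtilsSketch {
  // Spark/Hive sentinel string that stands for a null partition value.
  val DefaultPartitionName: String = "__HIVE_DEFAULT_PARTITION__"

  // Returns None (i.e. a null partition value) for the sentinel,
  // otherwise the raw value, which would then be cast to the column type.
  def normalizePartitionValue(raw: String): Option[String] =
    if (raw == DefaultPartitionName) None else Some(raw)
}
```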

Why are the changes needed?

To have the same behavior as DSv1, which interprets __HIVE_DEFAULT_PARTITION__ as NULL:

spark-sql> CREATE TABLE tbl11 (id int, part0 string) USING parquet PARTITIONED BY (part0);
spark-sql> ALTER TABLE tbl11 ADD PARTITION (part0 = '__HIVE_DEFAULT_PARTITION__');
spark-sql> INSERT INTO tbl11 PARTITION (part0='__HIVE_DEFAULT_PARTITION__') SELECT 1;
spark-sql> SELECT * FROM tbl11;
1	NULL

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added a new test to AlterTablePartitionV2SQLSuite.


object PartitioningUtils {
private[sql] object PartitioningUtils {
MaxGekk (Member, Author) commented:

Addressed @cloud-fan 's comment #30454 (comment)

.asTableCatalog
.loadTable(Identifier.of(Array("ns1", "ns2"), "tbl"))
.asPartitionable
val expectedPartition = InternalRow.fromSeq(Seq[Any](null))
MaxGekk (Member, Author) commented Nov 24, 2020:
'__HIVE_DEFAULT_PARTITION__' should be handled as null

SparkQA commented Nov 24, 2020

Test build #131641 has finished for PR 30482 at commit 4ad95c5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the SQL label Nov 24, 2020
.asPartitionable
val expectedPartition = InternalRow.fromSeq(Seq[Any](null))
assert(!partTable.partitionExists(expectedPartition))
val partSpec = "PARTITION (part0 = '__HIVE_DEFAULT_PARTITION__')"
cloud-fan (Contributor) commented:

I'm not sure about this. It's more of a Hive-specific thing, and we should let v2 implementations decide how to handle null partition values. This should be an internal detail and shouldn't be exposed to end users.

MaxGekk (Member, Author) commented:

OK. How can users specify a null partition value?

cloud-fan (Contributor) commented:

Does part_col = null work?

MaxGekk (Member, Author) commented:

For example, if we have a string-typed partition column, how could we distinguish null from "null"?

cloud-fan (Contributor) commented:

The parser should recognize different literals, e.g. part_col = null and part_col = "null".

MaxGekk (Member, Author) commented:

> does part_col = null work?

I have checked that: null is recognized as the string "null".

MaxGekk (Member, Author) commented:

> It's more like a hive specific thing and we should let v2 implementation to decide ...

It is already a Spark-specific thing too. Implementations don't see '__HIVE_DEFAULT_PARTITION__' at all because it is replaced by null at the analysis phase.
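The analysis-phase replacement described in this comment could be sketched, in very simplified form, as below. This is a hypothetical illustration, not the actual ResolvePartitionSpec rule: the name ResolvePartitionSpecSketch and the plain Map-based signature are assumptions for the sketch; the real rule operates on analyzed plans and typed partition specs.

```scala
// Hedged sketch: every sentinel value in a user-supplied partition spec
// is rewritten to null during analysis, so a v2 catalog implementation
// never sees the '__HIVE_DEFAULT_PARTITION__' string itself.
object ResolvePartitionSpecSketch {
  val DefaultPartitionName: String = "__HIVE_DEFAULT_PARTITION__"

  def resolveSpec(spec: Map[String, String]): Map[String, String] =
    spec.map { case (col, value) =>
      col -> (if (value == DefaultPartitionName) null else value)
    }
}
```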

SparkQA commented Nov 24, 2020

Test build #131647 has finished for PR 30482 at commit 26b83a1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented Nov 24, 2020

jenkins, retest this, please

SparkQA commented Nov 24, 2020

Test build #131694 has started for PR 30482 at commit 26b83a1.

MaxGekk (Member, Author) commented Nov 26, 2020

jenkins, retest this, please

SparkQA commented Nov 26, 2020

Test build #131833 has finished for PR 30482 at commit 26b83a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented Dec 28, 2020

@cloud-fan Should I close this?

cloud-fan (Contributor) commented:

Yea, let's close it. __HIVE_DEFAULT_PARTITION__ should just be a normal string; only the Hive catalog should handle it specially.

@MaxGekk MaxGekk closed this Dec 29, 2020
@MaxGekk MaxGekk deleted the dsv2-default-hive-partition branch February 19, 2021 15:03