[SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType #26644
Conversation
ok to test

Thank you for making this PR, @Fokko .

My pleasure @dongjoon-hyun

Test build #114320 has finished for PR 26644 at commit
@@ -592,6 +592,9 @@ object StructType extends AbstractDataType {
    case (leftUdt: UserDefinedType[_], rightUdt: UserDefinedType[_])
      if leftUdt.userClass == rightUdt.userClass => leftUdt

    case (leftType, rightUdt: UserDefinedType[_])
Since this is beyond our existing rule, shall we update the function description accordingly?
I've added a Scaladoc to the private merge function. I think the Javadoc you're pointing to describes the function at a different level; for example, it doesn't mention UDTs at all. Let me know what you think.
val right = StructType(
  StructField("a", new CustomXMLGregorianCalendarType) :: Nil)

assert(left.merge(right) === left)
Shall we check the opposite case, too?
The current implementation isn't symmetrical: we can convert a UserDefinedType into a native DataType, but not the other way around. I'm hesitant to add that functionality because I don't see any obvious applications; please let me know if you think it should be added as well. I've added a test to check the opposite case too, including some additional comments to clarify the idea and how it works.
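To make the asymmetry concrete for readers following along, here is a self-contained sketch using simplified stand-in types (CalendarUDT and mergeSketch are illustrative names, not Spark's actual classes): merging a UDT with the native type it is stored as yields the native type, regardless of argument order, and the result is never promoted back into a UDT.

```scala
// Self-contained stand-ins for Spark's type hierarchy (illustrative only;
// the real classes live in org.apache.spark.sql.types).
sealed trait DataType
case object TimestampType extends DataType

// A UDT declares the native type it is stored as via `sqlType`.
trait UserDefinedType extends DataType { def sqlType: DataType }
case object CalendarUDT extends UserDefinedType {
  val sqlType: DataType = TimestampType
}

// Sketch of the rule under discussion: a UDT on either side collapses
// into its native sqlType; the result is never a UDT.
def mergeSketch(left: DataType, right: DataType): DataType = (left, right) match {
  case (l, r: UserDefinedType) if l == r.sqlType => l
  case (l: UserDefinedType, r) if l.sqlType == r => r
  case (l, r) if l == r => l
  case (l, r) => sys.error(s"Failed to merge incompatible data types $l and $r")
}
```

This mirrors the two test directions discussed here: both argument orders resolve to the native field type.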
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
Could you add end-to-end tests somewhere, e.g.,
Force-pushed from 2f2d676 to dd4c6c4
@dongjoon-hyun @HyukjinKwon @maropu Any further thoughts?

Rebased onto master
It looks like apache.org is unreachable:
In case you write a UDT, you always need to read it with the UDT registered. In many cases you want to write it and then convert it into a native DataType. In the case of Delta, or when appending a partition, you can write to the same table, and it then needs to be able to merge the UDT into the native type again. * Add a test to DataTypeSuite.scala
      if leftType == rightUdt.sqlType => leftType

    case (leftUdt: UserDefinedType[_], rightType)
      if leftUdt.sqlType == rightType => rightType
@Fokko, sorry for my late response. I doubt we should allow this case. Currently, merge only allows identical types, but a UDT and its SQL type are not the same type. I think it makes less sense to allow this case alone.

Also, this https://github.com/apache/spark/pull/26644/files#r350486670 looks weird. jsonValue seems like it should hold the JSON-serialized value of its own type.
Well, UserDefinedType extends DataType, similar to TimestampType, StringType, and any other type. The thing is that a UserDefinedType can be compatible with any other type. For example, it is allowed to merge an int into a long; this is an explicit choice by the developer.
it is allowed to merge an int into a long.
But this StructType.merge does not allow such type merging. Given that, it looks weird to allow only UDTs.
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
scala> StructType.merge(LongType, IntegerType)
org.apache.spark.SparkException: Failed to merge incompatible data types bigint and int
at org.apache.spark.sql.types.StructType$.merge(StructType.scala:600)
... 49 elided
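The inconsistency being pointed out can be sketched with simplified stand-in types (illustrative names, not Spark's implementation): under the proposed rule a UDT merges with its declared sqlType, while two distinct native types such as long and int still fail to merge.

```scala
// Self-contained stand-ins (illustrative only; not Spark's actual code).
sealed trait DataType
case object LongType extends DataType
case object IntegerType extends DataType
case object TimestampType extends DataType

trait UserDefinedType extends DataType { def sqlType: DataType }
case object CalendarUDT extends UserDefinedType {
  val sqlType: DataType = TimestampType
}

// Stand-in for merge after the proposed change: identical types merge,
// a UDT merges with its declared sqlType, and everything else fails,
// including widening int into long, which merge has never allowed.
def mergeSketch(left: DataType, right: DataType): DataType = (left, right) match {
  case (l, r) if l == r => l
  case (l, r: UserDefinedType) if l == r.sqlType => l
  case (l: UserDefinedType, r) if l.sqlType == r => r
  case (l, r) => sys.error(s"Failed to merge incompatible data types $l and $r")
}
```

So the UDT case would carve out the only non-identical pair that merge accepts, which is the asymmetry of the objection.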
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way; it's just a way of keeping the PR queue manageable.

I've been thinking about this a lot, but could not come up with a clean solution. I'll leave it for now.
What changes were proposed in this pull request?
When you write a UDT, you always need to read it with the UDT registered. In many cases, you want to write it and then convert it into a native DataType.
In the case of Delta, or when appending a partition, you can write to the same table, and it then needs to be able to merge the UDT into the native type again.
Why are the changes needed?
When appending data to the table, I get the exception:
Does this PR introduce any user-facing change?
How was this patch tested?
https://jira.apache.org/jira/browse/SPARK-30004