[SPARK-36135][SQL] Support TimestampNTZ type in file partitioning #33344

gengliangwang · 2021-07-14T12:34:48Z

What changes were proposed in this pull request?

Support TimestampNTZ type in file partitioning

* When no schema is provided and the default timestamp type is TimestampNTZ, Spark should infer and parse timestamp partition values as TimestampNTZ.
* When the provided partition schema uses TimestampNTZ, Spark should be able to parse the TimestampNTZ partition column.
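
To illustrate the difference between the two interpretations, here is a minimal, self-contained sketch that uses only `java.time`. It is not Spark's implementation: `TsFlavour`, `PartitionTimestampSketch` and `toMicros` are hypothetical names, and the fixed `yyyy-MM-dd HH:mm:ss` pattern is an assumption made for the example.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit
import scala.util.Try

// Hypothetical stand-ins for the two timestamp flavours; not Spark classes.
sealed trait TsFlavour
case object WithLocalTimeZone extends TsFlavour // plays the role of TimestampType
case object WithoutTimeZone extends TsFlavour   // plays the role of TimestampNTZType

object PartitionTimestampSketch {
  // Assumed format of the raw partition directory value.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Converts a raw partition value to microseconds since the epoch,
  // or None if the value does not match the assumed format.
  def toMicros(raw: String, flavour: TsFlavour, sessionZone: ZoneId): Option[Long] =
    Try(LocalDateTime.parse(raw, fmt)).toOption.map { local =>
      val instant = flavour match {
        // NTZ: the wall-clock value is kept as-is, independent of any zone.
        case WithoutTimeZone   => local.toInstant(ZoneOffset.UTC)
        // LTZ: the wall-clock value is interpreted in the session time zone.
        case WithLocalTimeZone => local.atZone(sessionZone).toInstant
      }
      ChronoUnit.MICROS.between(Instant.EPOCH, instant)
    }
}
```

In this sketch the same directory value, for example `1990-02-24 12:00:30`, yields different microsecond values for the two flavours whenever the session zone is not UTC, which is why the inferred type has to follow the configured default.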

Why are the changes needed?

File partitioning is an important feature and Spark should support TimestampNTZ type in it.

Does this PR introduce any user-facing change?

Yes. Spark now supports the TimestampNTZ type in file partitioning.

How was this patch tested?

Unit tests

gengliangwang · 2021-07-14T12:35:34Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

@@ -867,6 +867,9 @@ object TypeCoercion extends TypeCoercionBase {
      case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
        Some(TimestampType)

+      case (_: TimestampNTZType, _: DateType) | (_: DateType, _: TimestampNTZType) =>


This is needed for a test case with mixed Date & TimestampNTZ partition columns
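
A rough sketch of what such a widening rule does when partition directories mix date and timestamp values. The types below are hypothetical stand-ins rather than Spark's `DataType` classes. The result of the new case is not shown in the hunk above, so the sketch assumes it widens to TimestampNTZ, and it also assumes a fallback to a string type for any other conflict.

```scala
// Hypothetical, simplified type model; these are not Spark's DataType classes.
sealed trait PartType
case object PDate extends PartType
case object PTimestamp extends PartType
case object PTimestampNTZ extends PartType
case object PString extends PartType

object WideningSketch {
  // Finds a common type for two partition values of the same column.
  def widen(a: PartType, b: PartType): PartType = (a, b) match {
    case (x, y) if x == y => x
    case (PTimestamp, PDate) | (PDate, PTimestamp) => PTimestamp
    // Assumption: Date mixed with TimestampNTZ widens to TimestampNTZ.
    case (PTimestampNTZ, PDate) | (PDate, PTimestampNTZ) => PTimestampNTZ
    // Assumption of this sketch: every other conflict falls back to string.
    case _ => PString
  }

  // Resolves the column type over all observed values; None if there are no values.
  def resolve(types: Seq[PartType]): Option[PartType] = types.reduceOption(widen)
}
```

Without the Date/TimestampNTZ case, a column with values such as `2014-01-01` and `2014-01-01 00:01:00` would fall through to the fallback branch in this model.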

gengliangwang · 2021-07-14T12:38:03Z

sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java

@@ -94,7 +94,7 @@ public static void populate(WritableColumnVector col, InternalRow row, int field
        col.getChild(1).putLongs(0, capacity, c.microseconds);
      } else if (t instanceof DateType) {
        col.putInts(0, capacity, row.getInt(fieldIdx));
-      } else if (t instanceof TimestampType) {
+      } else if (t instanceof TimestampType || t instanceof TimestampNTZType) {


The partition schema is validated in:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L328

We need this to pass the tests.
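
For context, a simplified sketch of why the two timestamp types can share one branch: both are stored as 64-bit microsecond values, so a constant partition column is filled the same way for either. All names below are hypothetical and only model the idea; this is not the `ColumnVectorUtils` code.

```scala
// Hypothetical, simplified model; not Spark's DataType or ColumnVector classes.
sealed trait SketchType
case object SketchDate extends SketchType
case object SketchTimestamp extends SketchType
case object SketchTimestampNTZ extends SketchType

sealed trait FilledColumn
final case class IntColumn(values: Vector[Int]) extends FilledColumn
final case class LongColumn(values: Vector[Long]) extends FilledColumn

object ConstantColumnSketch {
  // Repeats one partition value `capacity` times, choosing the physical width by type.
  def populate(t: SketchType, value: Long, capacity: Int): FilledColumn = t match {
    // Dates are modelled as 32-bit day counts.
    case SketchDate => IntColumn(Vector.fill(capacity)(value.toInt))
    // Both timestamp flavours are modelled as 64-bit microsecond counts.
    case SketchTimestamp | SketchTimestampNTZ => LongColumn(Vector.fill(capacity)(value))
  }
}
```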

SparkQA · 2021-07-14T13:20:28Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45533/

SparkQA · 2021-07-14T13:52:56Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45533/

SparkQA · 2021-07-14T17:27:39Z

Test build #141018 has finished for PR 33344 at commit aa853c5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-07-15T16:49:28Z

...cala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala

+    // The inferred timestmap type is consistent with the value of `SQLConf.TIMESTAMP_TYPE`
+    Seq(TimestampTypes.TIMESTAMP_LTZ, TimestampTypes.TIMESTAMP_NTZ).foreach { tsType =>
+      withSQLConf(SQLConf.TIMESTAMP_TYPE.key -> tsType.toString) {
+        check("1990-02-24 12:00:30", SQLConf.get.timestampType)


nit: SQLConf.get.timestampType -> tsType

tsType is an enum value in SQLConf, not a DataType, so it can't be passed to check directly.
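
A small sketch of the distinction, using hypothetical stand-ins (`TsTypes`, `SketchDataType`, `dataTypeFor`) rather than the real `SQLConf` members: the loop variable is an enumeration value, and a separate mapping step is needed to obtain the data type that the assertion expects.

```scala
// Hypothetical stand-ins; not Spark's SQLConf or DataType classes.
object TsTypes extends Enumeration {
  val TIMESTAMP_NTZ, TIMESTAMP_LTZ = Value
}

sealed trait SketchDataType
case object SketchTimestampType extends SketchDataType
case object SketchTimestampNTZType extends SketchDataType

object EnumToTypeSketch {
  // Maps the configuration enum value to the data type it selects.
  def dataTypeFor(ts: TsTypes.Value): SketchDataType = ts match {
    case TsTypes.TIMESTAMP_NTZ => SketchTimestampNTZType
    case TsTypes.TIMESTAMP_LTZ => SketchTimestampType
  }
}
```

Under these assumptions, a test that iterates over the enum values would compare against `dataTypeFor(tsType)` or read the resolved type back from the configuration, as the test in the diff does.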

cloud-fan · 2021-07-15T16:50:27Z

...cala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala

+          s"hdfs://host:9000/path/a=2014-01-01 00%3A01%3A00.0/b=$defaultPartitionName"),
+          PartitionSpec(
+            StructType(Seq(
+              StructField("a", SQLConf.get.timestampType),


cloud-fan · 2021-07-15T16:50:48Z

...cala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala

+            DecimalType(10, 5),
+            DecimalType.SYSTEM_DEFAULT,
+            DateType,
+            SQLConf.get.timestampType,


cloud-fan · 2021-07-15T16:50:58Z

...cala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala

+            DoubleType,
+            DecimalType(20, 0),
+            DateType,
+            SQLConf.get.timestampType,


gengliangwang · 2021-07-15T17:13:06Z

Merging to master/3.2

### What changes were proposed in this pull request?

Support TimestampNTZ type in file partitioning

* When there is no provided schema and the default Timestamp type is TimestampNTZ, Spark should infer and parse the timestamp value partitions as TimestampNTZ.
* When the provided Partition schema is TimestampNTZ, Spark should be able to parse the TimestampNTZ type partition column.

### Why are the changes needed?

File partitioning is an important feature and Spark should support TimestampNTZ type in it.

### Does this PR introduce _any_ user-facing change?

Yes, Spark supports TimestampNTZ type in file partitioning

### How was this patch tested?

Unit tests

Closes #33344 from gengliangwang/partition.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 96c2919)
Signed-off-by: Gengliang Wang <gengliang@apache.org>

gengliangwang added 2 commits July 12, 2021 21:29

partition

3fc0171

add tests

aa853c5

github-actions bot added the SQL label Jul 14, 2021

gengliangwang requested review from MaxGekk and cloud-fan July 14, 2021 12:35

gengliangwang commented Jul 14, 2021

cloud-fan reviewed Jul 15, 2021

cloud-fan approved these changes Jul 15, 2021

gengliangwang closed this in 96c2919 Jul 15, 2021
