[SPARK-36135][SQL] Support TimestampNTZ type in file partitioning #33344
@@ -867,6 +867,9 @@ object TypeCoercion extends TypeCoercionBase {
      case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
        Some(TimestampType)

+     case (_: TimestampNTZType, _: DateType) | (_: DateType, _: TimestampNTZType) =>
This is needed for a test case with mixed `Date` & `TimestampNTZ` partition columns.
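To illustrate why a common type is needed: if one partition directory value parses as a date and another as a timestamp without time zone, the column has to be widened to a single type. The following is a minimal Java sketch using only `java.time`, not Spark's `TypeCoercion`. The class and method names are hypothetical, and it assumes that widening a date yields midnight of that day.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;

// Hypothetical illustration only; this is not Spark code.
final class MixedDateNtzSketch {
    // Treats a value with a time part as a timestamp without time zone;
    // otherwise parses it as a date and widens it to midnight of that day.
    static LocalDateTime widen(String raw) {
        if (raw.contains(" ")) {
            // ISO local date-time expects 'T' between date and time.
            return LocalDateTime.parse(raw.replace(' ', 'T'));
        }
        return LocalDate.parse(raw).atStartOfDay();
    }

    public static void main(String[] args) {
        // Two partition values of different inferred types end up as one type.
        for (String raw : List.of("2014-01-01", "2014-01-02 12:30:00")) {
            System.out.println(raw + " -> " + widen(raw));
        }
    }
}
```

Both inputs come back as `LocalDateTime`, which plays the role of the widened column type in this toy model.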
@@ -94,7 +94,7 @@ public static void populate(WritableColumnVector col, InternalRow row, int field
        col.getChild(1).putLongs(0, capacity, c.microseconds);
      } else if (t instanceof DateType) {
        col.putInts(0, capacity, row.getInt(fieldIdx));
-     } else if (t instanceof TimestampType) {
+     } else if (t instanceof TimestampType || t instanceof TimestampNTZType) {
The partition schema is validated in:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L328
We need this to pass the tests.
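As a rough illustration of why the two timestamp types can share one branch, here is a sketch that stores a wall-clock timestamp as a single `long` and fills a constant column with it. It assumes that a TimestampNTZ value is represented as microseconds since 1970-01-01T00:00 with no time zone applied; that representation is an assumption here and is not shown in the diff. All names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Spark.
final class NtzMicrosSketch {
    private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    // Microseconds between 1970-01-01T00:00 and the given wall-clock time,
    // with no time zone conversion (assumed representation).
    static long toMicros(LocalDateTime ldt) {
        return ChronoUnit.MICROS.between(EPOCH, ldt);
    }

    // Inverse of toMicros.
    static LocalDateTime fromMicros(long micros) {
        return EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    // Mimics populating a constant partition column: every row gets the same long.
    static long[] constantColumn(int capacity, LocalDateTime value) {
        long[] col = new long[capacity];
        Arrays.fill(col, toMicros(value));
        return col;
    }

    public static void main(String[] args) {
        long[] col = constantColumn(3, LocalDateTime.of(2014, 1, 1, 0, 1, 0));
        System.out.println(Arrays.toString(col));
        System.out.println(fromMicros(col[0]));
    }
}
```

Under that assumption, code that only copies the `long` into a column does not need to distinguish between the two timestamp types.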
Kubernetes integration test starting
Kubernetes integration test status success
Test build #141018 has finished for PR 33344 at commit
    // The inferred timestamp type is consistent with the value of `SQLConf.TIMESTAMP_TYPE`
    Seq(TimestampTypes.TIMESTAMP_LTZ, TimestampTypes.TIMESTAMP_NTZ).foreach { tsType =>
      withSQLConf(SQLConf.TIMESTAMP_TYPE.key -> tsType.toString) {
        check("1990-02-24 12:00:30", SQLConf.get.timestampType)
nit: `SQLConf.get.timestampType` -> `tsType`
`tsType` is an enum type in SQLConf.
        s"hdfs://host:9000/path/a=2014-01-01 00%3A01%3A00.0/b=$defaultPartitionName"),
      PartitionSpec(
        StructType(Seq(
          StructField("a", SQLConf.get.timestampType),
ditto
        DecimalType(10, 5),
        DecimalType.SYSTEM_DEFAULT,
        DateType,
        SQLConf.get.timestampType,
ditto
        DoubleType,
        DecimalType(20, 0),
        DateType,
        SQLConf.get.timestampType,
ditto
Merging to master/3.2
### What changes were proposed in this pull request?

Support TimestampNTZ type in file partitioning

* When there is no provided schema and the default Timestamp type is TimestampNTZ, Spark should infer and parse the timestamp value partitions as TimestampNTZ.
* When the provided Partition schema is TimestampNTZ, Spark should be able to parse the TimestampNTZ type partition column.

### Why are the changes needed?

File partitioning is an important feature and Spark should support TimestampNTZ type in it.

### Does this PR introduce _any_ user-facing change?

Yes, Spark supports TimestampNTZ type in file partitioning

### How was this patch tested?

Unit tests

Closes #33344 from gengliangwang/partition.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 96c2919)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
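The partition-value handling described in this pull request can be pictured with a small standalone sketch: unescape the directory value (for example `2014-01-01 00%3A01%3A00.0` from the test path above), then interpret it according to the configured timestamp kind. This is a simplified model in plain Java, not Spark's actual partition inference code. The `Kind` enum, the method names, and the choice of `LocalDateTime` and `Instant` as stand-ins for the NTZ and LTZ types are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration only; this is not Spark code.
final class PartitionValueSketch {
    // Stand-in for the configured default timestamp type.
    enum Kind { LTZ, NTZ }

    // Decodes %XX escapes in a partition directory value; other characters pass through.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            sb.append(c);
            i++;
        }
        return sb.toString();
    }

    // NTZ keeps the wall-clock value as is; LTZ resolves it to an instant
    // using the given session time zone.
    static Object infer(String escaped, Kind kind, ZoneId zone) {
        LocalDateTime local = LocalDateTime.parse(unescape(escaped).replace(' ', 'T'));
        return kind == Kind.NTZ ? local : local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        String raw = "2014-01-01 00%3A01%3A00.0";
        System.out.println(infer(raw, Kind.NTZ, ZoneOffset.UTC));
        System.out.println(infer(raw, Kind.LTZ, ZoneOffset.ofHours(-8)));
    }
}
```

The point of the sketch is that the same directory name yields a zone-independent value under NTZ, while under LTZ the result depends on the session time zone.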