[SPARK-51661][SQL] Partitions discovery of TIME column values #50453

MaxGekk · 2025-03-29T13:15:48Z

What changes were proposed in this pull request?

In this PR, I propose to infer the TIME data type from partition values that match the pattern HH:mm:ss[.SSSSSS]. The fractional-seconds part has variable length, so all of the following values match the pattern: 01:02:03.001, 23:59:59 and 12:13:14.123456.
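
To make the pattern concrete, here is a minimal sketch of a matcher for HH:mm:ss with an optional fraction of up to six digits, built only on java.time. It is not the formatter Spark uses internally; the object name TimePatternSketch and the helper parseTime are hypothetical and exist only for illustration.

```scala
import java.time.LocalTime
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField
import scala.util.Try

// Hypothetical helper, not part of Spark: illustrates the HH:mm:ss[.SSSSSS] pattern.
object TimePatternSketch {
  // Two-digit hour, minute and second, then an optional decimal point
  // followed by at most 6 fraction digits.
  private val formatter = new DateTimeFormatterBuilder()
    .appendPattern("HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
    .toFormatter()

  // Some(time) when the whole string matches the pattern, None otherwise.
  def parseTime(raw: String): Option[LocalTime] =
    Try(LocalTime.parse(raw, formatter)).toOption
}
```

Under these assumptions, the three values listed above parse successfully, while a value with seven fraction digits such as 12:13:14.1234567 is rejected because the trailing digit is left unparsed.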

Why are the changes needed?

Currently, Spark can save a dataset partitioned by a TIME column and read it back if the user sets a schema explicitly, but it cannot infer the TIME data type of the column automatically. For example:

scala> sql("SELECT time'12:00' AS t, 0 as id").write.partitionBy("t").parquet("/Users/maxim.gekk/tmp/time_parquet2")
scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet2").printSchema()
root
 |-- id: integer (nullable = true)
 |-- t: string (nullable = true)

Does this PR introduce any user-facing change?

Yes. After the changes, the inferred type is TIME(6) instead of STRING for the example above:

scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet2").printSchema()
root
 |-- id: integer (nullable = true)
 |-- t: time(6) (nullable = true)

How was this patch tested?

By running the new tests:

$ build/sbt "test:testOnly *ParquetV1PartitionDiscoverySuite"
$ build/sbt "test:testOnly *ParquetV2PartitionDiscoverySuite"

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun · 2025-03-30T22:50:26Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala

        // Then falls back to date/timestamp types
        .orElse(timestampTry)
        .orElse(dateTry)
+        .orElse(timeTry)


Thank you for putting this here.
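
For readers unfamiliar with the idiom in the hunk above, the fallback order can be sketched with scala.util.Try. This is only an illustration under loud assumptions: the object FallbackSketch, the Inferred type and inferType are hypothetical names, and the parsing uses the plain ISO parsers from java.time rather than Spark's own formatters.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime}
import scala.util.Try

// Hypothetical sketch, not Spark code: shows how a chain of orElse calls
// picks the first parser that succeeds.
object FallbackSketch {
  sealed trait Inferred
  case object TimestampLike extends Inferred
  case object DateLike extends Inferred
  case object TimeLike extends Inferred
  case object StringLike extends Inferred

  def inferType(raw: String): Inferred = {
    // Each Try succeeds only if the raw value parses as that type (ISO formats assumed).
    val timestampTry: Try[Inferred] = Try { LocalDateTime.parse(raw); TimestampLike }
    val dateTry: Try[Inferred] = Try { LocalDate.parse(raw); DateLike }
    val timeTry: Try[Inferred] = Try { LocalTime.parse(raw); TimeLike }

    // Same order as in the hunk: timestamp, then date, then time; string is the last resort.
    timestampTry
      .orElse(dateTry)
      .orElse(timeTry)
      .getOrElse(StringLike)
  }
}
```

Placing the time attempt last means a value is treated as a time only after the timestamp and date attempts have failed.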

dongjoon-hyun

+1, this looks good to me. Thank you, @MaxGekk.

MaxGekk · 2025-03-31T05:17:54Z

Merging to master. Thank you, @dongjoon-hyun and @HyukjinKwon, for the review.

Infer the TIME type from time partition values

c152e47

github-actions bot added the SQL label Mar 29, 2025

MaxGekk changed the title ~~[WIP][SQL] Infer the TIME type from time partition values~~ [WIP][SQL] Partitions discovery of TIME column values Mar 29, 2025

MaxGekk changed the title ~~[WIP][SQL] Partitions discovery of TIME column values~~ [WIP][SPARK-51661][SQL] Partitions discovery of TIME column values Mar 29, 2025

MaxGekk added 3 commits March 29, 2025 21:16

Test for invalid partitions

62daf68

Update a table

b114698

Test resolving conflicts

87a5522

MaxGekk changed the title ~~[WIP][SPARK-51661][SQL] Partitions discovery of TIME column values~~ [SPARK-51661][SQL] Partitions discovery of TIME column values Mar 30, 2025

MaxGekk marked this pull request as ready for review March 30, 2025 06:15

dongjoon-hyun reviewed Mar 30, 2025

dongjoon-hyun approved these changes Mar 30, 2025

HyukjinKwon approved these changes Mar 30, 2025

MaxGekk closed this in c8c8c9a Mar 31, 2025
