@MaxGekk MaxGekk commented Mar 29, 2025

What changes were proposed in this pull request?

In this PR, I propose inferring the TIME data type from partition values that match the pattern HH:mm:ss[.SSSSSS]. The fractional-second part has variable length, so the following values all match the pattern: 01:02:03.001, 23:59:59 and 12:13:14.123456.
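
As a rough illustration of the pattern above, the following standalone sketch parses HH:mm:ss with an optional fraction of 1 to 6 digits using only `java.time`. The names `TimePatternSketch` and `parseTime` are hypothetical, and this is not necessarily how Spark implements the check internally.

```scala
import java.time.LocalTime
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import scala.util.Try

// Hypothetical helper, for illustration only; not Spark's own code.
object TimePatternSketch {
  // HH:mm:ss followed by an optional fractional part of 1 to 6 digits.
  val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("HH:mm:ss")
    .optionalStart()
    .appendFraction(ChronoField.NANO_OF_SECOND, 1, 6, true)
    .optionalEnd()
    .toFormatter()

  // Returns Some(time) when the whole string matches the pattern, None otherwise.
  def parseTime(raw: String): Option[LocalTime] =
    Try(LocalTime.parse(raw, formatter)).toOption
}
```

Under these assumptions, `TimePatternSketch.parseTime("23:59:59")` and `TimePatternSketch.parseTime("01:02:03.001")` both yield a value, while a string with seven fractional digits is rejected.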

Why are the changes needed?

Currently, Spark can save a dataset partitioned by a TIME column and read it back if the user sets a schema explicitly, but it cannot infer the TIME data type of the column automatically. For example:

scala> sql("SELECT time'12:00' AS t, 0 as id").write.partitionBy("t").parquet("/Users/maxim.gekk/tmp/time_parquet2")
scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet2").printSchema()
root
 |-- id: integer (nullable = true)
 |-- t: string (nullable = true)

Does this PR introduce any user-facing change?

Yes. After the changes, the inferred type is TIME(6) instead of STRING for the example above:

scala> spark.read.parquet("/Users/maxim.gekk/tmp/time_parquet2").printSchema()
root
 |-- id: integer (nullable = true)
 |-- t: time(6) (nullable = true)

How was this patch tested?

By running the new tests:

$ build/sbt "test:testOnly *ParquetV1PartitionDiscoverySuite"
$ build/sbt "test:testOnly *ParquetV2PartitionDiscoverySuite"

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 29, 2025
@MaxGekk MaxGekk changed the title [WIP][SQL] Infer the TIME type from time partition values [WIP][SQL] Partitions discovery of TIME column values Mar 29, 2025
@MaxGekk MaxGekk changed the title [WIP][SQL] Partitions discovery of TIME column values [WIP][SPARK-51661][SQL] Partitions discovery of TIME column values Mar 29, 2025
@MaxGekk MaxGekk changed the title [WIP][SPARK-51661][SQL] Partitions discovery of TIME column values [SPARK-51661][SQL] Partitions discovery of TIME column values Mar 30, 2025
@MaxGekk MaxGekk marked this pull request as ready for review March 30, 2025 06:15
// Then falls back to date/timestamp types
.orElse(timestampTry)
.orElse(dateTry)
.orElse(timeTry)
Thank you for putting this here.
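
For readers unfamiliar with the `orElse` chain in the reviewed snippet, here is a minimal standalone sketch of the same fallback order (timestamp, then date, then time). It assumes ISO-8601 input and uses the default `java.time` parsers as stand-ins; `PartitionTypeSketch` and `infer` are hypothetical names, and Spark's real `timestampTry`, `dateTry` and `timeTry` use their own formatters and return typed literals rather than strings.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime}
import scala.util.Try

// Hypothetical illustration of the fallback order; not Spark's own code.
object PartitionTypeSketch {
  def infer(raw: String): String = {
    // Each attempt succeeds only if the raw value parses as that type.
    val timestampTry: Try[String] = Try(LocalDateTime.parse(raw)).map(_ => "timestamp")
    val dateTry: Try[String] = Try(LocalDate.parse(raw)).map(_ => "date")
    val timeTry: Try[String] = Try(LocalTime.parse(raw)).map(_ => "time")

    // The first successful attempt wins; otherwise the value stays a string.
    timestampTry
      .orElse(dateTry)
      .orElse(timeTry)
      .getOrElse("string")
  }
}
```

Placing the time attempt last means existing timestamp and date inference is tried first, so a value is only treated as TIME when neither of the earlier attempts accepts it.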

@dongjoon-hyun dongjoon-hyun left a comment

+1, this looks good to me. Thank you, @MaxGekk .

MaxGekk commented Mar 31, 2025

Merging to master. Thank you, @dongjoon-hyun @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in c8c8c9a Mar 31, 2025