[SPARK-36418][SQL] Use CAST in parsing of dates/timestamps with default pattern #33709
```diff
@@ -1067,7 +1067,7 @@ abstract class ParquetPartitionDiscoverySuite
   test("SPARK-23436: invalid Dates should be inferred as String in partition inference") {
     withTempPath { path =>
-      val data = Seq(("1", "2018-01", "2018-01-01-04", "test"))
+      val data = Seq(("1", "2018-41", "2018-01-01-04", "test"))
```
Contributor

@MaxGekk In Spark 3.3.0, I tested what the read type of the partition columns is:

```scala
val data = Seq(("1", "2018-01", "2018-01-01-04", "test"))
  .toDF("id", "date_month", "date_hour", "data")
val path = "some_path"
data.write.partitionBy("date_month", "date_hour").parquet(path)
val input = spark.read.parquet(path).select("id", "date_month", "date_hour", "data")
println(input.schema)
println(data.schema)
```

So, I would like to ask what it means to change the `date_month` value from `'2018-01'` to `'2018-41'` in this test code. Thanks. 🙏
Contributor

Additionally:

```
sbt:spark-sql> testOnly *ParquetV1PartitionDiscoverySuite -- -z "SPARK-23436"
sbt:spark-sql> testOnly *ParquetV2PartitionDiscoverySuite -- -z "SPARK-23436"
```

=> All succeed! However, after changing the value to …
```diff
       .toDF("id", "date_month", "date_hour", "data")

     data.write.partitionBy("date_month", "date_hour").parquet(path.getAbsolutePath)
```
why do we need the changes here?
After the changes, the partition value `2018-01` becomes a valid date value: the new partition formatter can parse it as `2018-01-01`. See the patterns supported by the CAST expression in spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (line 228 in ed0e351). As a consequence, Spark infers `DateType` as the type of the partition values and finally converts all the strings to that type.
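The effect described above can be illustrated with a minimal sketch in plain Scala (using `java.time`, not Spark's actual `DateTimeUtils` code): CAST-style date parsing accepts partial patterns such as `yyyy` and `yyyy-MM` and fills in the missing fields, so `2018-01` completes to a valid date while `2018-41` (month 41) does not and would remain a string under partition inference. The function name `castToDate` is hypothetical.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical sketch of lenient, CAST-style date parsing:
// "yyyy", "yyyy-MM", and "yyyy-MM-dd" are accepted; missing
// month/day fields default to 1. Invalid fields yield None.
def castToDate(s: String): Option[LocalDate] = {
  val parts = s.split("-", -1)
  Try {
    parts match {
      case Array(y)       => LocalDate.of(y.toInt, 1, 1)
      case Array(y, m)    => LocalDate.of(y.toInt, m.toInt, 1)
      case Array(y, m, d) => LocalDate.of(y.toInt, m.toInt, d.toInt)
    }
  }.toOption // LocalDate.of throws on out-of-range fields, e.g. month 41
}

println(castToDate("2018-01")) // Some(2018-01-01): parseable, so DateType would be inferred
println(castToDate("2018-41")) // None: invalid month, so the column stays StringType
```

This is why the test value was switched to `2018-41`: the test asserts that *invalid* dates are inferred as strings, and after this PR `2018-01` is no longer invalid.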