[SPARK-31212][SQL][2.4] Fix Failure of casting the '1000-02-29' string to the date type#28445
tianshizz wants to merge 1 commit into apache:branch-2.4
…the date type in 2.4
Attempted to fix the bug in branch-2.4, as discussed in #28443.
Could you please add
It looks like we should also check https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L608-L610. Are these all the instances we should fix?
cc @cloud-fan too.
ok to test
Test build #122244 has finished for PR 28445 at commit
```scala
// One GregorianCalendar per thread, since GregorianCalendar is mutable and not thread-safe.
private val threadLocalUtcGregorianCalendar = new ThreadLocal[GregorianCalendar] {
  override def initialValue(): GregorianCalendar = {
    new GregorianCalendar(TimeZoneUTC)
  }
}
```
Does the time zone matter here?
Based on the implementation (https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/util/GregorianCalendar.java#L819), I don't think the time zone makes a difference here.
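A minimal sketch of the point above, with hypothetical object and method names (this is not code from the PR): the same `isLeapYear` question is asked through calendars created in different time zones. It relies only on `java.util.GregorianCalendar` with its default 1582 cutover.

```scala
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical helper, not code from the PR. GregorianCalendar.isLeapYear(int)
// is an instance method, but its result depends on the year and the calendar's
// Julian/Gregorian cutover (15 October 1582 by default), not on the time zone.
object LeapYearTimeZoneCheck {
  // isLeapYear result for `year`, using a calendar created in the given zone.
  def leapIn(zoneId: String, year: Int): Boolean =
    new GregorianCalendar(TimeZone.getTimeZone(zoneId)).isLeapYear(year)

  // True when calendars in all of `zoneIds` agree on whether `year` is a leap year.
  def zonesAgree(year: Int, zoneIds: Seq[String]): Boolean =
    zoneIds.map(z => leapIn(z, year)).distinct.size == 1
}
```

With the default cutover, year 1000 is reported as a leap year (Julian rule) and 1900 is not (Gregorian rule), whichever zone the calendar was built with.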
I believe there are more bugs in Spark 2.4, as it has a lot of homemade datetime processing code. I'm not sure you can safely process datetime values before 1582 with Spark 2.4, even with this fix.
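To illustrate why pre-1582 values are fragile, the sketch below (hypothetical names, not Spark code) validates the same year/month/day with two JDK calendars: `java.time.LocalDate`, which uses the proleptic Gregorian (ISO-8601) calendar, and a non-lenient `java.util.GregorianCalendar`, which switches from Julian to Gregorian rules at the 1582 cutover.

```scala
import java.time.{DateTimeException, LocalDate}
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical comparison of two calendar systems available in the JDK.
object CalendarSystems {
  // Proleptic Gregorian (ISO-8601): LocalDate.of throws for nonexistent dates.
  def isoAccepts(y: Int, m: Int, d: Int): Boolean =
    try { LocalDate.of(y, m, d); true }
    catch { case _: DateTimeException => false }

  // Hybrid Julian/Gregorian: a non-lenient GregorianCalendar throws
  // IllegalArgumentException when the fields do not form a real date.
  def hybridAccepts(y: Int, m: Int, d: Int): Boolean = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.setLenient(false)
    cal.clear()
    cal.set(y, m - 1, d) // Calendar months are 0-based
    try { cal.getTimeInMillis(); true }
    catch { case _: IllegalArgumentException => false }
  }
}
```

`1000-02-29` is rejected by `LocalDate` but accepted by the hybrid calendar, while `2000-02-29` and `1001-02-29` get the same answer from both.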
@HyukjinKwon, thanks for updating the title. I did change the place you pointed to. According to the JIRA ticket, I believe that's the only place we need to fix. Is that right?
@cloud-fan I agree that there might be more time-related bugs in 2.4. Do you think it would be good to fix this specific bug in this PR and file more tickets as we go? I'm very new to the community and would love to take on sweeping the time-related bugs, with some guidance from more tenured members here.
I think the build failed due to an unrelated error. Could someone help me retest?
retest this please |
Test build #122274 has finished for PR 28445 at commit
@MaxGekk, what's your opinion? I'm fine with this fix, but I wouldn't encourage people to spend much time fixing datetime-related bugs in 2.4. The datetime code was completely rewritten in 3.0.
@MaxGekk, bumping to see how you feel about this PR.
Okay, I had some time to investigate the related items here. I think there are some more places to fix, e.g. here. Let's not fix these in branch-2.4 only, because:
Considering these risks, let's not land these fixes. Given the potential maintenance overhead and technical difficulty, we could more conservatively just document these issues for Spark 2.4 specifically, if needed.
@HyukjinKwon Thanks for the detailed explanation. I agree that it's probably better to leave these bugs in 2.4 unfixed. As a next step, should we update the ticket with the reasoning and close it as Won't Fix?
Yeah, let's close it as Won't Fix. Thanks @tianshizz.
What changes were proposed in this pull request?
Use `GregorianCalendar.isLeapYear()`, as suggested in SPARK-31212, to fix parsing of date strings such as `1000-02-29`.
Why are the changes needed?
Fix a bug in v2.4 where February 29 of a pre-1582 year that is a leap year under the Julian calendar but not under the Gregorian rule (e.g. 1000) can't be parsed correctly.
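A self-contained sketch of the idea behind the change, assuming the old check behaved like the proleptic Gregorian rule: a strict `yyyy-MM-dd` validator where the leap-year rule is a parameter. `DateStringCheck`, `parse`, `gregorianOnlyLeap`, and `hybridLeap` are hypothetical names; only `GregorianCalendar.isLeapYear` comes from the JDK, and none of this is Spark's actual `DateTimeUtils` code.

```scala
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical date-string validator, NOT Spark's DateTimeUtils code. It accepts
// strict "yyyy-MM-dd" input and lets the caller choose the leap-year rule used
// to cap the day of February.
object DateStringCheck {
  private val DatePattern = """(\d{4})-(\d{2})-(\d{2})""".r
  private val DaysInMonth = Array(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

  // Proleptic Gregorian rule (assumed to resemble the pre-fix behaviour):
  // every 4th year, except centuries not divisible by 400.
  def gregorianOnlyLeap(year: Int): Boolean =
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

  // Hybrid rule from java.util.GregorianCalendar: Julian before the 1582
  // cutover, Gregorian after it. A fresh calendar is created per call because
  // GregorianCalendar is mutable; the PR's diff uses a ThreadLocal instead.
  def hybridLeap(year: Int): Boolean =
    new GregorianCalendar(TimeZone.getTimeZone("UTC")).isLeapYear(year)

  // Returns (year, month, day) if `s` is a valid date under `isLeap`, else None.
  def parse(s: String, isLeap: Int => Boolean): Option[(Int, Int, Int)] = s match {
    case DatePattern(ys, ms, ds) =>
      val (y, m, d) = (ys.toInt, ms.toInt, ds.toInt)
      if (m < 1 || m > 12) None
      else {
        val maxDay = if (m == 2 && isLeap(y)) 29 else DaysInMonth(m - 1)
        if (d >= 1 && d <= maxDay) Some((y, m, d)) else None
      }
    case _ => None
  }
}
```

`DateStringCheck.parse("1000-02-29", DateStringCheck.hybridLeap)` returns `Some((1000, 2, 29))`, whereas `DateStringCheck.parse("1000-02-29", DateStringCheck.gregorianOnlyLeap)` returns `None`. Both rules agree for years after the cutover, such as 1900 and 2000.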
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test that fails on the current branch without this fix.