[SPARK-29622][SQL] do not allow leading 'interval' in the interval string format#26283
[SPARK-29622][SQL] do not allow leading 'interval' in the interval string format#26283cloud-fan wants to merge 1 commit intoapache:masterfrom
Conversation
| validate(intervals.nonEmpty, "at least one time unit should be given for interval literal", ctx) | ||
| Literal(intervals.reduce(_.add(_))) | ||
| } | ||
| override def visitMultiUnitsInterval(ctx: MultiUnitsIntervalContext): CalendarInterval = { |
There was a problem hiding this comment.
This is mostly the original visitSingleInterval, with some methods inlined.
| val value = string(ctx.STRING()) | ||
| try { | ||
| val interval = (ctx.from.getText, ctx.to.getText) match { | ||
| case ("YEAR", "MONTH") => |
There was a problem hiding this comment.
these are copied from the original visitIntervalField.
|
|
||
| def legacyCastStringToInterval(str: String): CalendarInterval = { | ||
| val trimmed = str.trim | ||
| if (trimmed.startsWith("interval")) { |
There was a problem hiding this comment.
I think this should be case insensitive check. You can use regionMatches() for that.
|
If we don't allow Just in case, could you write a round trip test if such test doesn't exist. Otherwise it looks strange that Spark cannot parse back its output. |
|
Test build #112773 has finished for PR 26283 at commit
|
|
The parser cleanup has been extracted to #26285 |
|
Thank you, @cloud-fan ! |
|
@cloud-fan Please, rebase this on the master. |
|
Test build #113214 has finished for PR 26283 at commit
|
a8c23a3 to
135f8e9
Compare
|
Test build #113236 has finished for PR 26283 at commit
|
|
Test build #113235 has finished for PR 26283 at commit
|
|
Test build #113239 has finished for PR 26283 at commit
|
|
This has bigger impacts than I expect, e.g. it breaks I'm closing it to avoid breaking public APIs. |
What changes were proposed in this pull request?
When parsing a string to interval, do not allow the leading "interval".
This PR also removes the leading "interval" from
CalendarInterval.toString, to be consistent.Why are the changes needed?
When parsing string to interval, we already know it's an interval and the leading "interval" in the string is kind of duplicated. The behavior is also consistent with pgsql
Does this PR introduce any user-facing change?
Yes. The
INTERVAL'...'syntax is newly introduced in the master branch, so it's OK to change. The cast behavior in 2.4 is pretty weird: the leading "interval" in the string is required. I think it's better to change it (with a migration guide and legacy config).How was this patch tested?
updated tests