[SPARK-36323][SQL] Support ANSI interval literals for TimeWindow #33551
sarutak wants to merge 3 commits into apache:master from sarutak:window-interval
Conversation
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Test build #141751 has finished for PR 33551 at commit
```scala
case NonFatal(e) =>
  throw QueryCompilationErrors.cannotParseTimeDelayError(interval, e)
}
cal.days * MICROS_PER_DAY + cal.microseconds
```
Not related to the PR, but:
- `*` and `+` can overflow. I think we should use exact arithmetic ops here.
- one day in `CalendarInterval` can be 23, 24, or 25 hours, see
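The first point can be sketched with plain Scala, independent of Spark: `Math.multiplyExact` and `Math.addExact` throw `ArithmeticException` on overflow instead of silently wrapping. `MICROS_PER_DAY` is redefined locally here as an illustration; in Spark it comes from `DateTimeConstants`.

```scala
// Hedged sketch of an overflow-safe version of
// `cal.days * MICROS_PER_DAY + cal.microseconds`.
val MICROS_PER_DAY: Long = 24L * 60 * 60 * 1000 * 1000

def toMicrosExact(days: Int, microseconds: Long): Long =
  // Throws ArithmeticException on overflow rather than silently wrapping around.
  Math.addExact(Math.multiplyExact(days.toLong, MICROS_PER_DAY), microseconds)
```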
OK, let's fix this in this PR.
For 1, I've fixed it in this PR.
For 2, it's about intervals, not timestamps, so we don't need to consider daylight saving here, right?
> For 2, it's about intervals, not timestamps, so we don't need to consider daylight saving here, right?

OK, let's keep the assumption of 24 hours per day here. I'm not sure it will work fine during daylight saving time, but it seems we don't have enough test coverage for that at the moment. Maybe open a JIRA to test the case?
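The daylight-saving concern, that a calendar day is not always 24 hours, can be illustrated with plain `java.time` (the zone and date here are illustrative; 2021-03-14 is the spring-forward date in this zone):

```scala
import java.time.{Duration, ZonedDateTime, ZoneId}

val zone = ZoneId.of("America/Los_Angeles")
// Local midnight to the next local midnight, spanning the spring-forward gap.
val start = ZonedDateTime.of(2021, 3, 14, 0, 0, 0, 0, zone)
val hours = Duration.between(start, start.plusDays(1)).toHours // 23, not 24
```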
Hmm, I'll check it out and open a JIRA.
```scala
throw new IllegalArgumentException(
  s"Intervals greater than a month is not supported ($interval).")
val ymIntervalErrMsg = s"Intervals greater than a month is not supported ($interval)."
val cal = try {
```
This is similar to `spark/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala`, lines 744 to 758 in 07fa38e.
At first I tried to add a common helper function, but the two are slightly different, so I didn't. But it's OK to try to factor out the similar code.
```scala
import org.apache.spark.sql.types.{LongType, StructField, StructType, TimestampNTZType, TimestampType}

class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with PrivateMethodTester {
```
Let's avoid unnecessary changes. This can cause conflicts downstream.
```scala
Seq(StructField("start", TimestampNTZType), StructField("end", TimestampNTZType))))
}

test("SPARK-36323: Support ANSI interval literals for TimeWindow") {
```
Could you test when `spark.sql.legacy.interval.enabled` is `true`?
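A hedged sketch of what such a test could look like (assuming the suite can mix in `SQLHelper` for `withSQLConf`; the string-based `TimeWindow.apply` and the expected value are modeled on the existing tests and may differ from the actual change):

```scala
test("SPARK-36323: ANSI interval literals with legacy intervals enabled") {
  // Sketch only: re-run an ANSI-literal assertion with the legacy flag on.
  withSQLConf(SQLConf.LEGACY_INTERVAL_ENABLED.key -> "true") {
    val window = TimeWindow(Literal(10L, TimestampType),
      "INTERVAL '10' SECOND", "INTERVAL '10' SECOND", "0 seconds")
    assert(window.windowDuration === 10L * 1000 * 1000) // 10 seconds in microseconds
  }
}
```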
Test build #141762 has finished for PR 33551 at commit

Kubernetes integration test unable to build dist. exiting with code: 1

Test build #141783 has finished for PR 33551 at commit
### What changes were proposed in this pull request?

This PR proposes to support ANSI interval literals for `TimeWindow`.

### Why are the changes needed?

Watermark already supports ANSI interval literals, so it would be good to support them for `TimeWindow` as well.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #33551 from sarutak/window-interval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit db18866)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
…legacy.interval.enabled is true

### What changes were proposed in this pull request?

This PR adds a test for SPARK-35815 covering the case where `spark.sql.legacy.interval.enabled` is `true`.

### Why are the changes needed?

SPARK-35815 (#33456) changed `Dataset.withWatermark` to accept ANSI interval literals as `delayThreshold`, but I noticed the change didn't work with `spark.sql.legacy.interval.enabled=true`. We couldn't detect this issue because there was no test considering the legacy interval type at that time. In SPARK-36323 (#33551) this issue was resolved, but it's better to add a test.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #33606 from sarutak/test-watermark-with-legacy-interval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 92cdb17)
Signed-off-by: Max Gekk <max.gekk@gmail.com>