[SPARK-35107][SQL] Parse unit-to-unit interval literals to ANSI intervals #32209

Closed
wants to merge 13 commits into from
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
@@ -79,6 +79,8 @@ license: |

- In Spark 3.2, `TRANSFORM` operator can't support alias in inputs. In Spark 3.1 and earlier, we can write script transform like `SELECT TRANSFORM(a AS c1, b AS c2) USING 'cat' FROM TBL`.

- In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR TO MONTH` are converted to ANSI interval types: `YearMonthIntervalType` or `DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
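To make the type change in the last bullet concrete: the parser change in this PR builds a year-month literal from a plain month count (`calendarInterval.months`). The sketch below shows how a string such as `'1-1'` maps to a month count and back to the `Y-M` rendering seen in the updated golden files. `AnsiYearMonthSketch`, `parseYearMonth`, and `formatYearMonth` are hypothetical names used for illustration only; they are not Spark's `IntervalUtils` API, and Spark's real parsing rules may differ in details.

```scala
// Illustrative sketch only: hypothetical helpers, not Spark's IntervalUtils.
object AnsiYearMonthSketch {

  // Parse "[+-]Y-M" into a total month count, e.g. "1-1" becomes 13.
  def parseYearMonth(text: String): Int = {
    val trimmed = text.trim
    val negative = trimmed.startsWith("-")
    val unsigned =
      if (trimmed.startsWith("-") || trimmed.startsWith("+")) trimmed.substring(1)
      else trimmed
    unsigned.split("-") match {
      case Array(y, m) =>
        val years = y.toInt
        val months = m.toInt
        require(months >= 0 && months <= 11, s"month field out of range: $months")
        // Exact arithmetic so an overflow raises instead of wrapping silently.
        val total = Math.addExact(Math.multiplyExact(years, 12), months)
        if (negative) -total else total
      case _ =>
        throw new IllegalArgumentException(s"expected 'Y-M' but got: $text")
    }
  }

  // Render a total month count as "[-]Y-M", e.g. -13 becomes "-1-1".
  def formatYearMonth(totalMonths: Int): String = {
    val sign = if (totalMonths < 0) "-" else ""
    val abs = Math.abs(totalMonths.toLong)
    s"$sign${abs / 12}-${abs % 12}"
  }
}
```

Under these assumptions, `parseYearMonth("2-2") - parseYearMonth("3-3")` is `26 - 39 = -13`, and `formatYearMonth(-13)` gives `-1-1`, which agrees with the `interval '2-2' year to month - interval '3-3' year to month` result in the updated `interval.sql.out`.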

## Upgrading from Spark SQL 3.0 to 3.1

- In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
@@ -18,6 +18,7 @@
package org.apache.spark.sql.catalyst.parser

import java.util.Locale
import java.util.concurrent.TimeUnit
import javax.xml.bind.DatatypeConverter

import scala.collection.JavaConverters._
@@ -2306,12 +2307,23 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
}

/**
* Create a [[CalendarInterval]] literal expression. Two syntaxes are supported:
* Create a [[CalendarInterval]] or ANSI interval literal expression.
* Two syntaxes are supported:
* - multiple unit value pairs, for instance: interval 2 months 2 days.
* - from-to unit, for instance: interval '1-2' year to month.
*/
override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) {
Literal(parseIntervalLiteral(ctx), CalendarIntervalType)
val calendarInterval = parseIntervalLiteral(ctx)
if (ctx.errorCapturingUnitToUnitInterval != null && !conf.legacyIntervalEnabled) {
if (calendarInterval.months == 0) {
Member: Just out of curiosity: are the corner cases `'0-0'` and `'0 00:00:00'` valid, and are they a problem here?

Member (Author): It seems so. We need to look at the literal text, since `CalendarInterval` doesn't carry enough information.

Member: If so, should we add a comment here?

Member (Author): OK, I will add a comment.

val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS)
Literal(micros, DayTimeIntervalType)
} else {
Literal(calendarInterval.months, YearMonthIntervalType)
Contributor: Shall we add an assert here? `assert(calendarInterval.days == 0 && calendarInterval.microseconds == 0)`

Member (Author): Sure, I will add an assert.

}
} else {
Literal(calendarInterval, CalendarIntervalType)
}
}
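The branching in `visitInterval` above, together with the two review points (the `'0-0'` corner case and the suggested assert), can be sketched in isolation. This is a minimal model under stated assumptions, not the PR's code: `CalInterval`, `IntervalLiteral`, and `IntervalDispatch` are hypothetical stand-ins for `CalendarInterval` and `Literal`, and the day-time duration is computed assuming a fixed 24-hour day.

```scala
// Hypothetical stand-in for CalendarInterval (months, days, microseconds).
final case class CalInterval(months: Int, days: Int, microseconds: Long)

// Hypothetical stand-ins for the three kinds of literal the parser can emit.
sealed trait IntervalLiteral
final case class YearMonth(months: Int) extends IntervalLiteral
final case class DayTime(micros: Long) extends IntervalLiteral
final case class Legacy(value: CalInterval) extends IntervalLiteral

object IntervalDispatch {
  // Assumption: a day is always 24 hours when flattening days into microseconds.
  private val MicrosPerDay: Long = 24L * 60 * 60 * 1000 * 1000

  def toLiteral(
      ci: CalInterval,
      isUnitToUnit: Boolean,
      legacyEnabled: Boolean): IntervalLiteral = {
    if (isUnitToUnit && !legacyEnabled) {
      if (ci.months == 0) {
        // Note: a year-month literal of '0-0' also has months == 0, so this
        // check alone cannot tell it apart from a day-time literal.
        val dayMicros = Math.multiplyExact(ci.days.toLong, MicrosPerDay)
        DayTime(Math.addExact(dayMicros, ci.microseconds))
      } else {
        // The invariant proposed in review: no day or time part may be present.
        assert(ci.days == 0 && ci.microseconds == 0)
        YearMonth(ci.months)
      }
    } else {
      Legacy(ci)
    }
  }
}
```

In this model, `toLiteral(CalInterval(0, 0, 0L), isUnitToUnit = true, legacyEnabled = false)` returns `DayTime(0)` even if the source text was `'0-0' year to month`, which is the ambiguity raised in the review thread: the parsed value alone does not record which unit pair the literal used.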

/**
@@ -714,37 +714,39 @@ class ExpressionParserSuite extends AnalysisTest {
// Non Existing unit
intercept("interval 10 nanoseconds", "invalid unit 'nanoseconds'")

// Year-Month intervals.
val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
yearMonthValues.foreach { value =>
val result = Literal(IntervalUtils.fromYearMonthString(value))
checkIntervals(s"'$value' year to month", result)
}
withSQLConf(SQLConf.LEGACY_INTERVAL_ENABLED.key -> "true") {
// Year-Month intervals.
val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
yearMonthValues.foreach { value =>
val result = Literal(IntervalUtils.fromYearMonthString(value))
checkIntervals(s"'$value' year to month", result)
}

// Day-Time intervals.
val datTimeValues = Seq(
"99 11:22:33.123456789",
"-99 11:22:33.123456789",
"10 9:8:7.123456789",
"1 0:0:0",
"-1 0:0:0",
"1 0:0:1",
"\t 1 0:0:1 ")
datTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value))
checkIntervals(s"'$value' day to second", result)
}
// Day-Time intervals.
val datTimeValues = Seq(
"99 11:22:33.123456789",
"-99 11:22:33.123456789",
"10 9:8:7.123456789",
"1 0:0:0",
"-1 0:0:0",
"1 0:0:1",
"\t 1 0:0:1 ")
datTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value))
checkIntervals(s"'$value' day to second", result)
}

// Hour-Time intervals.
val hourTimeValues = Seq(
"11:22:33.123456789",
"9:8:7.123456789",
"-19:18:17.123456789",
"0:0:0",
"0:0:1")
hourTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
checkIntervals(s"'$value' hour to second", result)
// Hour-Time intervals.
val hourTimeValues = Seq(
"11:22:33.123456789",
"9:8:7.123456789",
"-19:18:17.123456789",
"0:0:0",
"0:0:1")
hourTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
checkIntervals(s"'$value' hour to second", result)
}
}
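The test change above keeps the existing `CalendarInterval` expectations by running them with the legacy flag switched on for the duration of a block. A set-run-restore helper of that shape can be sketched as follows; `ConfSketch` and `withConf` are hypothetical names backed by a plain mutable map, and this is not the implementation of Spark's `withSQLConf`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
object ConfSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given pairs, run `body`, then restore the previous state,
  // even if `body` throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember the prior value (or absence) of every key we are about to touch.
    val previous = pairs.map { case (key, _) => key -> conf.get(key) }
    pairs.foreach { case (key, value) => conf(key) = value }
    try body
    finally previous.foreach {
      case (key, Some(old)) => conf(key) = old
      case (key, None) => conf.remove(key)
    }
  }
}
```

For example, `ConfSketch.withConf("spark.sql.legacy.interval.enabled" -> "true") { ... }` makes the flag visible only inside the block, so checks that follow it run against the default (ANSI) behavior again.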

// Unknown FROM TO intervals
89 changes: 44 additions & 45 deletions sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out
@@ -130,49 +130,49 @@ struct<(+ INTERVAL '-1 months 1 days -1 seconds'):interval>
-- !query
select interval -'1-1' year to month
-- !query schema
struct<INTERVAL '-1 years -1 months':interval>
struct<INTERVAL '-1-1' YEAR TO MONTH:year-month interval>
-- !query output
-1 years -1 months
-1-1


-- !query
select interval -'-1-1' year to month
-- !query schema
struct<INTERVAL '1 years 1 months':interval>
struct<INTERVAL '1-1' YEAR TO MONTH:year-month interval>
-- !query output
1 years 1 months
1-1


-- !query
select interval +'-1-1' year to month
-- !query schema
struct<INTERVAL '-1 years -1 months':interval>
struct<INTERVAL '-1-1' YEAR TO MONTH:year-month interval>
-- !query output
-1 years -1 months
-1-1


-- !query
select interval - '1 2:3:4.001' day to second
-- !query schema
struct<INTERVAL '-1 days -2 hours -3 minutes -4.001 seconds':interval>
struct<INTERVAL '-1 02:03:04.001' DAY TO SECOND:day-time interval>
-- !query output
-1 days -2 hours -3 minutes -4.001 seconds
-1 02:03:04.001000000


-- !query
select interval +'1 2:3:4.001' day to second
-- !query schema
struct<INTERVAL '1 days 2 hours 3 minutes 4.001 seconds':interval>
struct<INTERVAL '1 02:03:04.001' DAY TO SECOND:day-time interval>
-- !query output
1 days 2 hours 3 minutes 4.001 seconds
1 02:03:04.001000000


-- !query
select interval -'-1 2:3:4.001' day to second
-- !query schema
struct<INTERVAL '1 days 2 hours 3 minutes 4.001 seconds':interval>
struct<INTERVAL '1 02:03:04.001' DAY TO SECOND:day-time interval>
-- !query output
1 days 2 hours 3 minutes 4.001 seconds
1 02:03:04.001000000


-- !query
@@ -331,73 +331,73 @@ struct<INTERVAL '32 years 1 months -100 days 41 hours 24 minutes 59.889987 secon
-- !query
select interval '0 0:0:0.1' day to second
-- !query schema
struct<INTERVAL '0.1 seconds':interval>
struct<INTERVAL '0 00:00:00.1' DAY TO SECOND:day-time interval>
-- !query output
0.1 seconds
0 00:00:00.100000000


-- !query
select interval '10-9' year to month
-- !query schema
struct<INTERVAL '10 years 9 months':interval>
struct<INTERVAL '10-9' YEAR TO MONTH:year-month interval>
-- !query output
10 years 9 months
10-9


-- !query
select interval '20 15' day to hour
-- !query schema
struct<INTERVAL '20 days 15 hours':interval>
struct<INTERVAL '20 15:00:00' DAY TO SECOND:day-time interval>
-- !query output
20 days 15 hours
20 15:00:00.000000000


-- !query
select interval '20 15:40' day to minute
-- !query schema
struct<INTERVAL '20 days 15 hours 40 minutes':interval>
struct<INTERVAL '20 15:40:00' DAY TO SECOND:day-time interval>
-- !query output
20 days 15 hours 40 minutes
20 15:40:00.000000000


-- !query
select interval '20 15:40:32.99899999' day to second
-- !query schema
struct<INTERVAL '20 days 15 hours 40 minutes 32.998999 seconds':interval>
struct<INTERVAL '20 15:40:32.998999' DAY TO SECOND:day-time interval>
-- !query output
20 days 15 hours 40 minutes 32.998999 seconds
20 15:40:32.998999000


-- !query
select interval '15:40' hour to minute
-- !query schema
struct<INTERVAL '15 hours 40 minutes':interval>
struct<INTERVAL '0 15:40:00' DAY TO SECOND:day-time interval>
-- !query output
15 hours 40 minutes
0 15:40:00.000000000


-- !query
select interval '15:40:32.99899999' hour to second
-- !query schema
struct<INTERVAL '15 hours 40 minutes 32.998999 seconds':interval>
struct<INTERVAL '0 15:40:32.998999' DAY TO SECOND:day-time interval>
-- !query output
15 hours 40 minutes 32.998999 seconds
0 15:40:32.998999000


-- !query
select interval '40:32.99899999' minute to second
-- !query schema
struct<INTERVAL '40 minutes 32.998999 seconds':interval>
struct<INTERVAL '0 00:40:32.998999' DAY TO SECOND:day-time interval>
-- !query output
40 minutes 32.998999 seconds
0 00:40:32.998999000


-- !query
select interval '40:32' minute to second
-- !query schema
struct<INTERVAL '40 minutes 32 seconds':interval>
struct<INTERVAL '0 00:40:32' DAY TO SECOND:day-time interval>
-- !query output
40 minutes 32 seconds
0 00:40:32.000000000


-- !query
@@ -786,7 +786,7 @@ select
interval '2-2' year to month + dateval
from interval_arithmetic
-- !query schema
struct<dateval:date,dateval - INTERVAL '2 years 2 months':date,dateval - INTERVAL '-2 years -2 months':date,dateval + INTERVAL '2 years 2 months':date,dateval + INTERVAL '-2 years -2 months':date,dateval + (- INTERVAL '2 years 2 months'):date,dateval + INTERVAL '2 years 2 months':date>
struct<dateval:date,dateval - INTERVAL '2-2' YEAR TO MONTH:date,dateval - INTERVAL '-2-2' YEAR TO MONTH:date,dateval + INTERVAL '2-2' YEAR TO MONTH:date,dateval + INTERVAL '-2-2' YEAR TO MONTH:date,dateval + (- INTERVAL '2-2' YEAR TO MONTH):date,dateval + INTERVAL '2-2' YEAR TO MONTH:date>
-- !query output
2012-01-01 2009-11-01 2014-03-01 2014-03-01 2009-11-01 2009-11-01 2014-03-01

@@ -802,7 +802,7 @@ select
interval '2-2' year to month + tsval
from interval_arithmetic
-- !query schema
struct<tsval:timestamp,tsval - INTERVAL '2 years 2 months':timestamp,tsval - INTERVAL '-2 years -2 months':timestamp,tsval + INTERVAL '2 years 2 months':timestamp,tsval + INTERVAL '-2 years -2 months':timestamp,tsval + (- INTERVAL '2 years 2 months'):timestamp,tsval + INTERVAL '2 years 2 months':timestamp>
struct<tsval:timestamp,tsval - INTERVAL '2-2' YEAR TO MONTH:timestamp,tsval - INTERVAL '-2-2' YEAR TO MONTH:timestamp,tsval + INTERVAL '2-2' YEAR TO MONTH:timestamp,tsval + INTERVAL '-2-2' YEAR TO MONTH:timestamp,tsval + (- INTERVAL '2-2' YEAR TO MONTH):timestamp,tsval + INTERVAL '2-2' YEAR TO MONTH:timestamp>
-- !query output
2012-01-01 00:00:00 2009-11-01 00:00:00 2014-03-01 00:00:00 2014-03-01 00:00:00 2009-11-01 00:00:00 2009-11-01 00:00:00 2014-03-01 00:00:00

@@ -813,9 +813,9 @@ select
interval '2-2' year to month - interval '3-3' year to month
from interval_arithmetic
-- !query schema
struct<(INTERVAL '2 years 2 months' + INTERVAL '3 years 3 months'):interval,(INTERVAL '2 years 2 months' - INTERVAL '3 years 3 months'):interval>
struct<(INTERVAL '2-2' YEAR TO MONTH + INTERVAL '3-3' YEAR TO MONTH):year-month interval,(INTERVAL '2-2' YEAR TO MONTH - INTERVAL '3-3' YEAR TO MONTH):year-month interval>
-- !query output
5 years 5 months -1 years -1 months
5-5 -1-1


-- !query
@@ -829,10 +829,9 @@ select
interval '99 11:22:33.123456789' day to second + dateval
from interval_arithmetic
-- !query schema
struct<>
struct<dateval:date,dateval - INTERVAL '99 11:22:33.123456' DAY TO SECOND:timestamp,dateval - INTERVAL '-99 11:22:33.123456' DAY TO SECOND:timestamp,dateval + INTERVAL '99 11:22:33.123456' DAY TO SECOND:timestamp,dateval + INTERVAL '-99 11:22:33.123456' DAY TO SECOND:timestamp,dateval + (- INTERVAL '99 11:22:33.123456' DAY TO SECOND):timestamp,dateval + INTERVAL '99 11:22:33.123456' DAY TO SECOND:timestamp>
-- !query output
java.lang.IllegalArgumentException
requirement failed: Cannot add hours, minutes or seconds, milliseconds, microseconds to a date
2012-01-01 2011-09-23 12:37:26.876544 2012-04-09 11:22:33.123456 2012-04-09 11:22:33.123456 2011-09-23 12:37:26.876544 2011-09-23 12:37:26.876544 2012-04-09 11:22:33.123456


-- !query
@@ -846,7 +845,7 @@ select
interval '99 11:22:33.123456789' day to second + tsval
from interval_arithmetic
-- !query schema
struct<tsval:timestamp,tsval - INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds':timestamp,tsval - INTERVAL '-99 days -11 hours -22 minutes -33.123456 seconds':timestamp,tsval + INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds':timestamp,tsval + INTERVAL '-99 days -11 hours -22 minutes -33.123456 seconds':timestamp,tsval + (- INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds'):timestamp,tsval + INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds':timestamp>
struct<tsval:timestamp,tsval - INTERVAL '99 11:22:33.123456' DAY TO SECOND:timestamp,tsval - INTERVAL '-99 11:22:33.123456' DAY TO SECOND:timestamp,tsval + INTERVAL '99 11:22:33.123456' DAY TO SECOND:timestamp,tsval + INTERVAL '-99 11:22:33.123456' DAY TO SECOND:timestamp,tsval + (- INTERVAL '99 11:22:33.123456' DAY TO SECOND):timestamp,tsval + INTERVAL '99 11:22:33.123456' DAY TO SECOND:timestamp>
-- !query output
2012-01-01 00:00:00 2011-09-23 12:37:26.876544 2012-04-09 11:22:33.123456 2012-04-09 11:22:33.123456 2011-09-23 12:37:26.876544 2011-09-23 12:37:26.876544 2012-04-09 11:22:33.123456

@@ -862,7 +861,7 @@ select
interval '99 11:22:33.123456789' day to second + strval
from interval_arithmetic
-- !query schema
struct<strval:string,strval - INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds':string,strval - INTERVAL '-99 days -11 hours -22 minutes -33.123456 seconds':string,strval + INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds':string,strval + INTERVAL '-99 days -11 hours -22 minutes -33.123456 seconds':string,strval + (- INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds'):string,strval + INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds':string>
struct<strval:string,strval - INTERVAL '99 11:22:33.123456' DAY TO SECOND:string,strval - INTERVAL '-99 11:22:33.123456' DAY TO SECOND:string,strval + INTERVAL '99 11:22:33.123456' DAY TO SECOND:string,strval + INTERVAL '-99 11:22:33.123456' DAY TO SECOND:string,strval + (- INTERVAL '99 11:22:33.123456' DAY TO SECOND):string,strval + INTERVAL '99 11:22:33.123456' DAY TO SECOND:string>
-- !query output
2012-01-01 2011-09-23 12:37:26.876544 2012-04-09 11:22:33.123456 2012-04-09 11:22:33.123456 2011-09-23 12:37:26.876544 2011-09-23 12:37:26.876544 2012-04-09 11:22:33.123456

@@ -873,9 +872,9 @@ select
interval '99 11:22:33.123456789' day to second - interval '10 9:8:7.123456789' day to second
from interval_arithmetic
-- !query schema
struct<(INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds' + INTERVAL '10 days 9 hours 8 minutes 7.123456 seconds'):interval,(INTERVAL '99 days 11 hours 22 minutes 33.123456 seconds' - INTERVAL '10 days 9 hours 8 minutes 7.123456 seconds'):interval>
struct<(INTERVAL '99 11:22:33.123456' DAY TO SECOND + INTERVAL '10 09:08:07.123456' DAY TO SECOND):day-time interval,(INTERVAL '99 11:22:33.123456' DAY TO SECOND - INTERVAL '10 09:08:07.123456' DAY TO SECOND):day-time interval>
-- !query output
109 days 20 hours 30 minutes 40.246912 seconds 89 days 2 hours 14 minutes 26 seconds
109 20:30:40.246912000 89 02:14:26.000000000


-- !query
@@ -921,9 +920,9 @@ struct<INTERVAL '1 days':interval>
-- !query
select interval '2-2\t' year to month
-- !query schema
struct<INTERVAL '2 years 2 months':interval>
struct<INTERVAL '2-2' YEAR TO MONTH:year-month interval>
-- !query output
2 years 2 months
2-2


-- !query
@@ -943,9 +942,9 @@ select interval '-\t2-2\t' year to month
-- !query
select interval '\n0 12:34:46.789\t' day to second
-- !query schema
struct<INTERVAL '12 hours 34 minutes 46.789 seconds':interval>
struct<INTERVAL '0 12:34:46.789' DAY TO SECOND:day-time interval>
-- !query output
12 hours 34 minutes 46.789 seconds
0 12:34:46.789000000


-- !query