[SPARK-35107][SQL] Parse unit-to-unit interval literals to ANSI intervals #32209

Closed
wants to merge 13 commits into from
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
@@ -81,6 +81,8 @@ license: |

- In Spark 3.2, `TRANSFORM` operator can support `ArrayType/MapType/StructType` without Hive SerDe, in this mode, we use `StructsToJosn` to convert `ArrayType/MapType/StructType` column to `STRING` and use `JsonToStructs` to parse `STRING` to `ArrayType/MapType/StructType`. In Spark 3.1, Spark just support case `ArrayType/MapType/StructType` column as `STRING` but can't support parse `STRING` to `ArrayType/MapType/StructType` output columns.

- In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR TO MONTH` are converted to ANSI interval types: `YearMonthIntervalType` or `DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.

## Upgrading from Spark SQL 3.0 to 3.1

- In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
@@ -18,6 +18,7 @@
package org.apache.spark.sql.catalyst.parser

import java.util.Locale
import java.util.concurrent.TimeUnit
import javax.xml.bind.DatatypeConverter

import scala.collection.JavaConverters._
@@ -2306,12 +2307,30 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
}

/**
* Create a [[CalendarInterval]] literal expression. Two syntaxes are supported:
* Create a [[CalendarInterval]] or ANSI interval literal expression.
* Two syntaxes are supported:
* - multiple unit value pairs, for instance: interval 2 months 2 days.
* - from-to unit, for instance: interval '1-2' year to month.
*/
override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) {
Literal(parseIntervalLiteral(ctx), CalendarIntervalType)
val calendarInterval = parseIntervalLiteral(ctx)
if (ctx.errorCapturingUnitToUnitInterval != null && !conf.legacyIntervalEnabled) {
// Check the `to` unit to distinguish year-month and day-time intervals because
Contributor

A small suggestion: why not create a new parsing method for ANSI intervals instead of reusing `parseIntervalLiteral`?

Member Author

I think we will have to do that eventually to support all combinations of `from` and `to` units plus precision (like `INTERVAL '1 2:3:4.123' DAY(6) TO SECOND(3)`). For now, we implement the simplest possible solution to unblock the other sub-tasks needed for Milestone 1 in SPARK-27790. Milestone 1 assumes feature parity of the new ANSI types with `CalendarIntervalType`.

Contributor

Got it. @MaxGekk Thanks for your explanation.

// `CalendarInterval` doesn't have enough info. For instance, new CalendarInterval(0, 0, 0)
// can be derived from INTERVAL '0-0' YEAR TO MONTH as well as from
// INTERVAL '0 00:00:00' DAY TO SECOND.
val toUnit = ctx.errorCapturingUnitToUnitInterval.body.to.getText.toLowerCase(Locale.ROOT)
if (toUnit == "month") {
assert(calendarInterval.days == 0 && calendarInterval.microseconds == 0)
Literal(calendarInterval.months, YearMonthIntervalType)
} else {
assert(calendarInterval.months == 0)
val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS)
Literal(micros, DayTimeIntervalType)
}
} else {
Literal(calendarInterval, CalendarIntervalType)
}
}

/**
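The dispatch in `visitInterval` above can be illustrated with a minimal standalone sketch. It does not use Spark: `CalendarIntervalLike`, `YearMonth`, `DayTime`, `Legacy`, and `toLiteral` are hypothetical names invented for this illustration, and the day-time branch assumes 24-hour days when folding days into microseconds. It shows why the `to` unit has to be consulted: an all-zero interval carries no hint about which ANSI type it came from.

```scala
import java.util.Locale

object IntervalDispatchSketch {
  // Hypothetical stand-in for Spark's CalendarInterval: three independent fields.
  final case class CalendarIntervalLike(months: Int, days: Int, microseconds: Long)

  // Hypothetical stand-ins for the three kinds of literal the parser can produce.
  sealed trait IntervalLiteral
  final case class YearMonth(months: Int) extends IntervalLiteral
  final case class DayTime(micros: Long) extends IntervalLiteral
  final case class Legacy(interval: CalendarIntervalLike) extends IntervalLiteral

  // Assumption of this sketch: every day is exactly 24 hours long.
  private val MicrosPerDay: Long = 24L * 60 * 60 * 1000 * 1000

  /**
   * Picks the literal kind.
   * `toUnit` is Some(...) only for unit-to-unit literals such as '1-2' YEAR TO MONTH.
   */
  def toLiteral(
      interval: CalendarIntervalLike,
      toUnit: Option[String],
      legacyEnabled: Boolean): IntervalLiteral = toUnit match {
    case Some(unit) if !legacyEnabled =>
      if (unit.toLowerCase(Locale.ROOT) == "month") {
        // A year-month interval must not carry a day or time component.
        require(interval.days == 0 && interval.microseconds == 0)
        YearMonth(interval.months)
      } else {
        // A day-time interval must not carry a month component.
        require(interval.months == 0)
        val dayMicros = Math.multiplyExact(interval.days.toLong, MicrosPerDay)
        DayTime(Math.addExact(dayMicros, interval.microseconds))
      }
    // Multi-unit literals and legacy mode keep the calendar interval as is.
    case _ => Legacy(interval)
  }
}
```

With this sketch, the same zero-valued interval maps to `YearMonth(0)` when the `to` unit is `month` and to `DayTime(0)` when it is `second`, which mirrors the ambiguity described in the code comment.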
@@ -714,37 +714,39 @@ class ExpressionParserSuite extends AnalysisTest {
// Non Existing unit
intercept("interval 10 nanoseconds", "invalid unit 'nanoseconds'")

// Year-Month intervals.
val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
yearMonthValues.foreach { value =>
val result = Literal(IntervalUtils.fromYearMonthString(value))
checkIntervals(s"'$value' year to month", result)
}
withSQLConf(SQLConf.LEGACY_INTERVAL_ENABLED.key -> "true") {
// Year-Month intervals.
val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
yearMonthValues.foreach { value =>
val result = Literal(IntervalUtils.fromYearMonthString(value))
checkIntervals(s"'$value' year to month", result)
}

// Day-Time intervals.
val datTimeValues = Seq(
"99 11:22:33.123456789",
"-99 11:22:33.123456789",
"10 9:8:7.123456789",
"1 0:0:0",
"-1 0:0:0",
"1 0:0:1",
"\t 1 0:0:1 ")
datTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value))
checkIntervals(s"'$value' day to second", result)
}
// Day-Time intervals.
val datTimeValues = Seq(
"99 11:22:33.123456789",
"-99 11:22:33.123456789",
"10 9:8:7.123456789",
"1 0:0:0",
"-1 0:0:0",
"1 0:0:1",
"\t 1 0:0:1 ")
datTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value))
checkIntervals(s"'$value' day to second", result)
}

// Hour-Time intervals.
val hourTimeValues = Seq(
"11:22:33.123456789",
"9:8:7.123456789",
"-19:18:17.123456789",
"0:0:0",
"0:0:1")
hourTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
checkIntervals(s"'$value' hour to second", result)
// Hour-Time intervals.
val hourTimeValues = Seq(
"11:22:33.123456789",
"9:8:7.123456789",
"-19:18:17.123456789",
"0:0:0",
"0:0:1")
hourTimeValues.foreach { value =>
val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
checkIntervals(s"'$value' hour to second", result)
}
}

// Unknown FROM TO intervals
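To make the year-month test values above concrete, here is a minimal sketch of how a string such as `123-10` could be reduced to a month count. It is not Spark's `IntervalUtils.fromYearMonthString`; `YearMonthSketch` and `toMonths` are hypothetical names, and the rules used here (surrounding whitespace is trimmed, months must be in `[0, 11]`, a leading sign applies to the whole value) are assumptions of the sketch.

```scala
object YearMonthSketch {
  // Optional sign, then years, a dash, then months.
  private val YearMonthPattern = """^([+-])?(\d+)-(\d+)$""".r

  /** Converts a 'years-months' string into a signed total number of months. */
  def toMonths(input: String): Int = input.trim match {
    case YearMonthPattern(sign, years, months) =>
      val m = months.toInt
      require(m >= 0 && m <= 11, s"month $m is outside the range [0, 11]")
      // The exact arithmetic helpers throw on overflow instead of wrapping around.
      val total = Math.addExact(Math.multiplyExact(years.toInt, 12), m)
      // An absent sign group is null, and null == "-" is simply false in Scala.
      if (sign == "-") -total else total
    case other =>
      throw new IllegalArgumentException(s"Cannot parse year-month interval: '$other'")
  }
}
```

Under these assumptions `123-10` is 123 * 12 + 10 = 1486 months and `-2-3` is -(2 * 12 + 3) = -27 months.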
2 changes: 2 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/interval.sql
@@ -51,6 +51,8 @@ select cast('- +1 second' as interval);
select interval 13.123456789 seconds, interval -13.123456789 second;
select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond 9 microsecond;
select interval '30' year '25' month '-100' day '40' hour '80' minute '299.889987299' second;
select interval '0-0' year to month;
select interval '0 0:0:0' day to second;
select interval '0 0:0:0.1' day to second;
select interval '10-9' year to month;
select interval '20 15' day to hour;
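For the `day to second` inputs added above, the following sketch shows one way a string like `0 0:0:0.1` could be turned into the microsecond count that a day-time interval literal holds. It is not Spark's parser; `DayTimeSketch` and `toMicros` are hypothetical names. The sketch assumes 24-hour days, accepts only the full `D H:M:S[.fraction]` form, and truncates fractional digits beyond microseconds.

```scala
object DayTimeSketch {
  // Optional sign, days, a space, then hours:minutes:seconds and an optional fraction.
  private val DayTimePattern = """^([+-])?(\d+) (\d+):(\d+):(\d+)(?:\.(\d+))?$""".r

  /** Converts a 'D H:M:S[.fraction]' string into signed microseconds. */
  def toMicros(input: String): Long = input.trim match {
    case DayTimePattern(sign, d, h, m, s, fraction) =>
      // Pad or truncate the fraction to exactly six digits (microsecond precision).
      val fractionMicros =
        if (fraction == null) 0L else (fraction + "000000").take(6).toLong
      val seconds = ((d.toLong * 24 + h.toLong) * 60 + m.toLong) * 60 + s.toLong
      val micros = seconds * 1000000L + fractionMicros
      if (sign == "-") -micros else micros
    case other =>
      throw new IllegalArgumentException(s"Cannot parse day-time interval: '$other'")
  }
}
```

Under these assumptions `0 0:0:0` and `0 0:0:0.1` yield 0 and 100000 microseconds respectively, so the two new zero-valued test queries are distinguishable from the fractional one only by their microsecond field.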