[SPARK-35107][SQL] Parse unit-to-unit interval literals to ANSI intervals

### What changes were proposed in this pull request?
Parse year-month interval literals like `INTERVAL '1-1' YEAR TO MONTH` to values of `YearMonthIntervalType`, and day-time interval literals to values of `DayTimeIntervalType`. Currently, Spark SQL supports the following day-time unit combinations:
- DAY TO HOUR
- DAY TO MINUTE
- DAY TO SECOND
- HOUR TO MINUTE
- HOUR TO SECOND
- MINUTE TO SECOND

All such interval literals are converted to `DayTimeIntervalType`, and `YEAR TO MONTH` literals to `YearMonthIntervalType`, losing the info about the `from` and `to` units.

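For example, after this change a unit-to-unit literal lands in exactly one of the two ANSI types. A hedged sketch of a `spark-sql` session at this commit: `day-time interval` matches the "After" example below, and `year-month interval` is the assumed analogous name for `YearMonthIntervalType`.

```sql
spark-sql> SELECT typeof(INTERVAL '10-9' YEAR TO MONTH);
year-month interval
spark-sql> SELECT typeof(INTERVAL '20 15' DAY TO HOUR);
day-time interval
```
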
**Note**: the new behavior is guarded by the SQL config `spark.sql.legacy.interval.enabled`, which is `false` by default. When the config is set to `true`, the interval literals are parsed to `CalendarIntervalType` values as before.

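A minimal sketch of toggling the legacy behavior in a `spark-sql` session (the config key is the one introduced above; the `typeof` outputs mirror the Before/After examples below):

```sql
-- Legacy mode: unit-to-unit literals are parsed to CalendarIntervalType.
spark-sql> SET spark.sql.legacy.interval.enabled=true;
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
interval
-- Default mode: the same literal is parsed to DayTimeIntervalType.
spark-sql> SET spark.sql.legacy.interval.enabled=false;
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
day-time interval
```
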
Closes #32176

### Why are the changes needed?
To conform to the ANSI SQL standard, which assumes conversion of interval literals to year-month or day-time interval types but not to a mixed interval type like Catalyst's `CalendarIntervalType`.

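The distinction matters because an ANSI interval value belongs to exactly one family, whereas a `CalendarInterval` can carry months, days, and microseconds together. A hedged illustration (multi-unit literals are untouched by this PR, so they still produce `CalendarIntervalType`):

```sql
-- Unit-to-unit literal: exactly one ANSI family.
spark-sql> SELECT typeof(INTERVAL '1-2' YEAR TO MONTH);
year-month interval
-- Multi-unit literal: may mix both families in a single value.
spark-sql> SELECT typeof(INTERVAL 2 months 2 days);
interval
```
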
### Does this PR introduce _any_ user-facing change?
Yes.

Before:
```sql
spark-sql> SELECT INTERVAL '1 01:02:03.123' DAY TO SECOND;
1 days 1 hours 2 minutes 3.123 seconds
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
interval
```

After:
```sql
spark-sql> SELECT INTERVAL '1 01:02:03.123' DAY TO SECOND;
1 01:02:03.123000000
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
day-time interval
```

### How was this patch tested?
1. By running the affected test suites:
```
$ ./build/sbt "test:testOnly *.ExpressionParserSuite"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z create_view.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z date.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z timestamp.sql"
```
2. PostgreSQL tests are executed with `spark.sql.legacy.interval.enabled` set to `true` to keep compatibility with PostgreSQL output:
```sql
> SELECT interval '999' second;
0 years 0 mons 0 days 0 hours 16 mins 39.00 secs
```

Closes #32209 from MaxGekk/parse-ansi-interval-literals.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk committed Apr 19, 2021
1 parent 8dc455b commit 1d1ed3e
Showing 11 changed files with 254 additions and 194 deletions.
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
@@ -81,6 +81,8 @@ license: |
 
 - In Spark 3.2, `TRANSFORM` operator can support `ArrayType/MapType/StructType` without Hive SerDe, in this mode, we use `StructsToJson` to convert `ArrayType/MapType/StructType` column to `STRING` and use `JsonToStructs` to parse `STRING` to `ArrayType/MapType/StructType`. In Spark 3.1, Spark just support case `ArrayType/MapType/StructType` column as `STRING` but can't support parse `STRING` to `ArrayType/MapType/StructType` output columns.
 
+- In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR TO MONTH` are converted to ANSI interval types: `YearMonthIntervalType` or `DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
+
 ## Upgrading from Spark SQL 3.0 to 3.1
 
 - In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.parser
 
 import java.util.Locale
+import java.util.concurrent.TimeUnit
 import javax.xml.bind.DatatypeConverter
 
 import scala.collection.JavaConverters._
@@ -2306,12 +2307,30 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   }
 
   /**
-   * Create a [[CalendarInterval]] literal expression. Two syntaxes are supported:
+   * Create a [[CalendarInterval]] or ANSI interval literal expression.
+   * Two syntaxes are supported:
    * - multiple unit value pairs, for instance: interval 2 months 2 days.
    * - from-to unit, for instance: interval '1-2' year to month.
    */
   override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) {
-    Literal(parseIntervalLiteral(ctx), CalendarIntervalType)
+    val calendarInterval = parseIntervalLiteral(ctx)
+    if (ctx.errorCapturingUnitToUnitInterval != null && !conf.legacyIntervalEnabled) {
+      // Check the `to` unit to distinguish year-month and day-time intervals because
+      // `CalendarInterval` doesn't have enough info. For instance, new CalendarInterval(0, 0, 0)
+      // can be derived from INTERVAL '0-0' YEAR TO MONTH as well as from
+      // INTERVAL '0 00:00:00' DAY TO SECOND.
+      val toUnit = ctx.errorCapturingUnitToUnitInterval.body.to.getText.toLowerCase(Locale.ROOT)
+      if (toUnit == "month") {
+        assert(calendarInterval.days == 0 && calendarInterval.microseconds == 0)
+        Literal(calendarInterval.months, YearMonthIntervalType)
+      } else {
+        assert(calendarInterval.months == 0)
+        val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS)
+        Literal(micros, DayTimeIntervalType)
+      }
+    } else {
+      Literal(calendarInterval, CalendarIntervalType)
+    }
   }
 
   /**
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
@@ -714,37 +714,39 @@ class ExpressionParserSuite extends AnalysisTest {
     // Non Existing unit
     intercept("interval 10 nanoseconds", "invalid unit 'nanoseconds'")
 
-    // Year-Month intervals.
-    val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
-    yearMonthValues.foreach { value =>
-      val result = Literal(IntervalUtils.fromYearMonthString(value))
-      checkIntervals(s"'$value' year to month", result)
-    }
+    withSQLConf(SQLConf.LEGACY_INTERVAL_ENABLED.key -> "true") {
+      // Year-Month intervals.
+      val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
+      yearMonthValues.foreach { value =>
+        val result = Literal(IntervalUtils.fromYearMonthString(value))
+        checkIntervals(s"'$value' year to month", result)
+      }
 
-    // Day-Time intervals.
-    val datTimeValues = Seq(
-      "99 11:22:33.123456789",
-      "-99 11:22:33.123456789",
-      "10 9:8:7.123456789",
-      "1 0:0:0",
-      "-1 0:0:0",
-      "1 0:0:1",
-      "\t 1 0:0:1 ")
-    datTimeValues.foreach { value =>
-      val result = Literal(IntervalUtils.fromDayTimeString(value))
-      checkIntervals(s"'$value' day to second", result)
-    }
+      // Day-Time intervals.
+      val datTimeValues = Seq(
+        "99 11:22:33.123456789",
+        "-99 11:22:33.123456789",
+        "10 9:8:7.123456789",
+        "1 0:0:0",
+        "-1 0:0:0",
+        "1 0:0:1",
+        "\t 1 0:0:1 ")
+      datTimeValues.foreach { value =>
+        val result = Literal(IntervalUtils.fromDayTimeString(value))
+        checkIntervals(s"'$value' day to second", result)
+      }
 
-    // Hour-Time intervals.
-    val hourTimeValues = Seq(
-      "11:22:33.123456789",
-      "9:8:7.123456789",
-      "-19:18:17.123456789",
-      "0:0:0",
-      "0:0:1")
-    hourTimeValues.foreach { value =>
-      val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
-      checkIntervals(s"'$value' hour to second", result)
-    }
+      // Hour-Time intervals.
+      val hourTimeValues = Seq(
+        "11:22:33.123456789",
+        "9:8:7.123456789",
+        "-19:18:17.123456789",
+        "0:0:0",
+        "0:0:1")
+      hourTimeValues.foreach { value =>
+        val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
+        checkIntervals(s"'$value' hour to second", result)
+      }
+    }
 
     // Unknown FROM TO intervals
2 changes: 2 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/interval.sql
@@ -51,6 +51,8 @@ select cast('- +1 second' as interval);
 select interval 13.123456789 seconds, interval -13.123456789 second;
 select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond 9 microsecond;
 select interval '30' year '25' month '-100' day '40' hour '80' minute '299.889987299' second;
+select interval '0-0' year to month;
+select interval '0 0:0:0' day to second;
 select interval '0 0:0:0.1' day to second;
 select interval '10-9' year to month;
 select interval '20 15' day to hour;