Skip to content

Commit

Permalink
[SPARK-31469][SQL] Make extract interval field ANSI compliance
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?

Currently, we can extract `millennium/century/decade/year/quarter/month/week/day/hour/minute/second(with fractions)//millisecond/microseconds` and `epoch` from interval values

While getting the `millennium/century/decade/year`, it means how many the interval `months` part can be converted to that unit-value. The content of `millennium/century/decade` will overlap `year` and each other.

While getting `month/day` and so on, it means the integral remainder of the previous unit. Here all the units including `year` are individual.

So while extracting `year`, `month`, `day`, `hour`, `minute`, `second`, which are ANSI primary datetime units, the semantic is `extracting`, but others might refer to `transforming`.

While getting epoch we have treat month as 30 days which varies the natural Calendar rules we use.

To avoid ambiguity, I suggest we should only support those extract field defined ANSI with their abbreviations.

### Why are the changes needed?

Extracting `millennium`, `century` etc does not obey the meaning of extracting, and they are not so useful and worth maintaining.

The `extract` is ANSI standard expression and `date_part` is its pg-specific alias function. The current support extract-fields are fully bought from PostgreSQL.

With a look at other systems like Presto/Hive, they don't support those ambiguous fields too.

e.g. Hive 2.2.x also take it from PostgreSQL but without introducing those ambiguous fields https://issues.apache.org/jira/secure/attachment/12828349/HIVE-14579

e.g. presto

```sql
presto> select extract(quater from interval '10-0' year to month);
Query 20200417_094723_00020_m8xq4 failed: line 1:8: Invalid EXTRACT field: quater
select extract(quater from interval '10-0' year to month)

presto> select extract(decade from interval '10-0' year to month);
Query 20200417_094737_00021_m8xq4 failed: line 1:8: Invalid EXTRACT field: decade
select extract(decade from interval '10-0' year to month)

```

### Does this PR introduce any user-facing change?

Yes, as we already have previews versions, this PR will remove support for extracting `millennium/century/decade/quarter/week/millisecond/microseconds` and `epoch` from intervals with `date_part` function

### How was this patch tested?

rm some used tests

Closes #28242 from yaooqinn/SPARK-31469.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
yaooqinn authored and cloud-fan committed Apr 17, 2020
1 parent f1489e6 commit 697083c
Show file tree
Hide file tree
Showing 10 changed files with 120 additions and 540 deletions.
Expand Up @@ -19,15 +19,9 @@

public class DateTimeConstants {

public static final int YEARS_PER_DECADE = 10;
public static final int YEARS_PER_CENTURY = 100;
public static final int YEARS_PER_MILLENNIUM = 1000;

public static final byte MONTHS_PER_QUARTER = 3;
public static final int MONTHS_PER_YEAR = 12;

public static final byte DAYS_PER_WEEK = 7;
public static final long DAYS_PER_MONTH = 30L;

public static final long HOURS_PER_DAY = 24L;

Expand All @@ -47,9 +41,6 @@ public class DateTimeConstants {
public static final long MICROS_PER_MINUTE = SECONDS_PER_MINUTE * MICROS_PER_SECOND;
public static final long MICROS_PER_HOUR = MINUTES_PER_HOUR * MICROS_PER_MINUTE;
public static final long MICROS_PER_DAY = HOURS_PER_DAY * MICROS_PER_HOUR;
public static final long MICROS_PER_MONTH = DAYS_PER_MONTH * MICROS_PER_DAY;
/* 365.25 days per year assumes leap year every four years */
public static final long MICROS_PER_YEAR = (36525L * MICROS_PER_DAY) / 100;

public static final long NANOS_PER_MICROS = 1000L;
public static final long NANOS_PER_MILLIS = MICROS_PER_MILLIS * NANOS_PER_MICROS;
Expand Down
Expand Up @@ -2131,19 +2131,12 @@ object DatePart {
"MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", "US"),
"EPOCH"]
Supported string values of `field` for intervals are:
["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"),
"CENTURY", ("CENTURIES", "C", "CENT"),
"DECADE", ("DECADES", "DEC", "DECS"),
"YEAR", ("Y", "YEARS", "YR", "YRS"),
"QUARTER", ("QTR"),
"MONTH", ("MON", "MONS", "MONTHS"),
"DAY", ("D", "DAYS"),
"HOUR", ("H", "HOURS", "HR", "HRS"),
"MINUTE", ("M", "MIN", "MINS", "MINUTES"),
"SECOND", ("S", "SEC", "SECONDS", "SECS"),
"MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS"),
"MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", "US"),
"EPOCH"]
["YEAR", ("Y", "YEARS", "YR", "YRS"),
"MONTH", ("MON", "MONS", "MONTHS"),
"DAY", ("D", "DAYS"),
"HOUR", ("H", "HOURS", "HR", "HRS"),
"MINUTE", ("M", "MIN", "MINS", "MINUTES"),
"SECOND", ("S", "SEC", "SECONDS", "SECS")]
* source - a date/timestamp or interval column from where `field` should be extracted
""",
examples = """
Expand Down
Expand Up @@ -45,21 +45,9 @@ abstract class ExtractIntervalPart(
}
}

case class ExtractIntervalMillenniums(child: Expression)
extends ExtractIntervalPart(child, IntegerType, getMillenniums, "getMillenniums")

case class ExtractIntervalCenturies(child: Expression)
extends ExtractIntervalPart(child, IntegerType, getCenturies, "getCenturies")

case class ExtractIntervalDecades(child: Expression)
extends ExtractIntervalPart(child, IntegerType, getDecades, "getDecades")

case class ExtractIntervalYears(child: Expression)
extends ExtractIntervalPart(child, IntegerType, getYears, "getYears")

case class ExtractIntervalQuarters(child: Expression)
extends ExtractIntervalPart(child, ByteType, getQuarters, "getQuarters")

case class ExtractIntervalMonths(child: Expression)
extends ExtractIntervalPart(child, ByteType, getMonths, "getMonths")

Expand All @@ -75,38 +63,18 @@ case class ExtractIntervalMinutes(child: Expression)
case class ExtractIntervalSeconds(child: Expression)
extends ExtractIntervalPart(child, DecimalType(8, 6), getSeconds, "getSeconds")

case class ExtractIntervalMilliseconds(child: Expression)
extends ExtractIntervalPart(child, DecimalType(8, 3), getMilliseconds, "getMilliseconds")

case class ExtractIntervalMicroseconds(child: Expression)
extends ExtractIntervalPart(child, LongType, getMicroseconds, "getMicroseconds")

// Number of seconds in 10000 years is 315576000001 (30 days per one month)
// which is 12 digits + 6 digits for the fractional part of seconds.
case class ExtractIntervalEpoch(child: Expression)
extends ExtractIntervalPart(child, DecimalType(18, 6), getEpoch, "getEpoch")

object ExtractIntervalPart {

def parseExtractField(
extractField: String,
source: Expression,
errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match {
case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => ExtractIntervalMillenniums(source)
case "CENTURY" | "CENTURIES" | "C" | "CENT" => ExtractIntervalCenturies(source)
case "DECADE" | "DECADES" | "DEC" | "DECS" => ExtractIntervalDecades(source)
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => ExtractIntervalYears(source)
case "QUARTER" | "QTR" => ExtractIntervalQuarters(source)
case "MONTH" | "MON" | "MONS" | "MONTHS" => ExtractIntervalMonths(source)
case "DAY" | "D" | "DAYS" => ExtractIntervalDays(source)
case "HOUR" | "H" | "HOURS" | "HR" | "HRS" => ExtractIntervalHours(source)
case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" => ExtractIntervalMinutes(source)
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => ExtractIntervalSeconds(source)
case "MILLISECONDS" | "MSEC" | "MSECS" | "MILLISECON" | "MSECONDS" | "MS" =>
ExtractIntervalMilliseconds(source)
case "MICROSECONDS" | "USEC" | "USECS" | "USECONDS" | "MICROSECON" | "US" =>
ExtractIntervalMicroseconds(source)
case "EPOCH" => ExtractIntervalEpoch(source)
case _ => errorHandleFunc
}
}
Expand Down
Expand Up @@ -50,26 +50,10 @@ object IntervalUtils {
interval.months / MONTHS_PER_YEAR
}

def getMillenniums(interval: CalendarInterval): Int = {
getYears(interval) / YEARS_PER_MILLENNIUM
}

def getCenturies(interval: CalendarInterval): Int = {
getYears(interval) / YEARS_PER_CENTURY
}

def getDecades(interval: CalendarInterval): Int = {
getYears(interval) / YEARS_PER_DECADE
}

def getMonths(interval: CalendarInterval): Byte = {
(interval.months % MONTHS_PER_YEAR).toByte
}

def getQuarters(interval: CalendarInterval): Byte = {
(getMonths(interval) / MONTHS_PER_QUARTER + 1).toByte
}

def getDays(interval: CalendarInterval): Int = {
interval.days
}
Expand All @@ -82,25 +66,8 @@ object IntervalUtils {
((interval.microseconds % MICROS_PER_HOUR) / MICROS_PER_MINUTE).toByte
}

def getMicroseconds(interval: CalendarInterval): Long = {
interval.microseconds % MICROS_PER_MINUTE
}

def getSeconds(interval: CalendarInterval): Decimal = {
Decimal(getMicroseconds(interval), 8, 6)
}

def getMilliseconds(interval: CalendarInterval): Decimal = {
Decimal(getMicroseconds(interval), 8, 3)
}

// Returns total number of seconds with microseconds fractional part in the given interval.
def getEpoch(interval: CalendarInterval): Decimal = {
var result = interval.microseconds
result += MICROS_PER_DAY * interval.days
result += MICROS_PER_YEAR * (interval.months / MONTHS_PER_YEAR)
result += MICROS_PER_MONTH * (interval.months % MONTHS_PER_YEAR)
Decimal(result, 18, 6)
Decimal(interval.microseconds % MICROS_PER_MINUTE, 8, 6)
}

private def toLongWithRange(
Expand Down
Expand Up @@ -34,42 +34,6 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
Literal(stringToInterval( "interval " + s))
}

test("millenniums") {
checkEvaluation(ExtractIntervalMillenniums("0 years"), 0)
checkEvaluation(ExtractIntervalMillenniums("9999 years"), 9)
checkEvaluation(ExtractIntervalMillenniums("1000 years"), 1)
checkEvaluation(ExtractIntervalMillenniums("-2000 years"), -2)
// Microseconds part must not be taken into account
checkEvaluation(ExtractIntervalMillenniums("999 years 400 days"), 0)
// Millennium must be taken from years and months
checkEvaluation(ExtractIntervalMillenniums("999 years 12 months"), 1)
checkEvaluation(ExtractIntervalMillenniums("1000 years -1 months"), 0)
}

test("centuries") {
checkEvaluation(ExtractIntervalCenturies("0 years"), 0)
checkEvaluation(ExtractIntervalCenturies("9999 years"), 99)
checkEvaluation(ExtractIntervalCenturies("1000 years"), 10)
checkEvaluation(ExtractIntervalCenturies("-2000 years"), -20)
// Microseconds part must not be taken into account
checkEvaluation(ExtractIntervalCenturies("99 years 400 days"), 0)
// Century must be taken from years and months
checkEvaluation(ExtractIntervalCenturies("99 years 12 months"), 1)
checkEvaluation(ExtractIntervalCenturies("100 years -1 months"), 0)
}

test("decades") {
checkEvaluation(ExtractIntervalDecades("0 years"), 0)
checkEvaluation(ExtractIntervalDecades("9999 years"), 999)
checkEvaluation(ExtractIntervalDecades("1000 years"), 100)
checkEvaluation(ExtractIntervalDecades("-2000 years"), -200)
// Microseconds part must not be taken into account
checkEvaluation(ExtractIntervalDecades("9 years 400 days"), 0)
// Decade must be taken from years and months
checkEvaluation(ExtractIntervalDecades("9 years 12 months"), 1)
checkEvaluation(ExtractIntervalDecades("10 years -1 months"), 0)
}

test("years") {
checkEvaluation(ExtractIntervalYears("0 years"), 0)
checkEvaluation(ExtractIntervalYears("9999 years"), 9999)
Expand All @@ -82,19 +46,6 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(ExtractIntervalYears("10 years -1 months"), 9)
}

test("quarters") {
checkEvaluation(ExtractIntervalQuarters("0 months"), 1.toByte)
checkEvaluation(ExtractIntervalQuarters("1 months"), 1.toByte)
checkEvaluation(ExtractIntervalQuarters("-1 months"), 1.toByte)
checkEvaluation(ExtractIntervalQuarters("2 months"), 1.toByte)
checkEvaluation(ExtractIntervalQuarters("-2 months"), 1.toByte)
checkEvaluation(ExtractIntervalQuarters("1 years -1 months"), 4.toByte)
checkEvaluation(ExtractIntervalQuarters("-1 years 1 months"), -2.toByte)
checkEvaluation(ExtractIntervalQuarters("2 years 3 months"), 2.toByte)
checkEvaluation(ExtractIntervalQuarters("-2 years -3 months"), 0.toByte)
checkEvaluation(ExtractIntervalQuarters("9999 years"), 1.toByte)
}

test("months") {
checkEvaluation(ExtractIntervalMonths("0 year"), 0.toByte)
for (m <- -24 to 24) {
Expand Down Expand Up @@ -158,46 +109,6 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(ExtractIntervalSeconds("61 seconds 1 microseconds"), Decimal(1000001, 8, 6))
}

test("milliseconds") {
checkEvaluation(ExtractIntervalMilliseconds("0 milliseconds"), Decimal(0, 8, 3))
checkEvaluation(ExtractIntervalMilliseconds("1 milliseconds"), Decimal(1.0, 8, 3))
checkEvaluation(ExtractIntervalMilliseconds("-1 milliseconds"), Decimal(-1.0, 8, 3))
checkEvaluation(
ExtractIntervalMilliseconds("1 second 999 milliseconds"),
Decimal(1999.0, 8, 3))
checkEvaluation(
ExtractIntervalMilliseconds("999 milliseconds 1 microsecond"),
Decimal(999.001, 8, 3))
checkEvaluation(
ExtractIntervalMilliseconds("-1 second -999 milliseconds"),
Decimal(-1999.0, 8, 3))
// Years and months must not be taken into account
checkEvaluation(ExtractIntervalMilliseconds("100 year 1 millisecond"), Decimal(1.0, 8, 3))
checkEvaluation(ExtractIntervalMilliseconds(largeInterval), Decimal(59999.999, 8, 3))
}

test("microseconds") {
checkEvaluation(ExtractIntervalMicroseconds("0 microseconds"), 0L)
checkEvaluation(ExtractIntervalMicroseconds("1 microseconds"), 1L)
checkEvaluation(ExtractIntervalMicroseconds("-1 microseconds"), -1L)
checkEvaluation(ExtractIntervalMicroseconds("1 second 999 microseconds"), 1000999L)
checkEvaluation(ExtractIntervalMicroseconds("999 milliseconds 1 microseconds"), 999001L)
checkEvaluation(ExtractIntervalMicroseconds("-1 second -999 microseconds"), -1000999L)
// Years and months must not be taken into account
checkEvaluation(ExtractIntervalMicroseconds("11 year 1 microseconds"), 1L)
checkEvaluation(ExtractIntervalMicroseconds(largeInterval), 59999999L)
}

test("epoch") {
checkEvaluation(ExtractIntervalEpoch("0 months"), Decimal(0.0, 18, 6))
checkEvaluation(ExtractIntervalEpoch("10000 years"), Decimal(315576000000.0, 18, 6))
checkEvaluation(ExtractIntervalEpoch("1 year"), Decimal(31557600.0, 18, 6))
checkEvaluation(ExtractIntervalEpoch("-1 year"), Decimal(-31557600.0, 18, 6))
checkEvaluation(
ExtractIntervalEpoch("1 second 1 millisecond 1 microsecond"),
Decimal(1.001001, 18, 6))
}

test("multiply") {
def check(
interval: String,
Expand Down
Expand Up @@ -213,7 +213,7 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
val rebased = rebaseGregorianToJulianMicros(zid, micros)
val rebasedAndOptimized = rebaseGregorianToJulianMicros(micros)
assert(rebasedAndOptimized === rebased)
micros += (MICROS_PER_MONTH * (0.5 + Math.random())).toLong
micros += (MICROS_PER_DAY * 30 * (0.5 + Math.random())).toLong
} while (micros <= end)
}
}
Expand All @@ -233,7 +233,7 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
val rebased = rebaseJulianToGregorianMicros(zid, micros)
val rebasedAndOptimized = rebaseJulianToGregorianMicros(micros)
assert(rebasedAndOptimized === rebased)
micros += (MICROS_PER_MONTH * (0.5 + Math.random())).toLong
micros += (MICROS_PER_DAY * 30 * (0.5 + Math.random())).toLong
} while (micros <= end)
}
}
Expand Down
34 changes: 0 additions & 34 deletions sql/core/src/test/resources/sql-tests/inputs/date_part.sql
Expand Up @@ -71,30 +71,12 @@ select date_part(null, c) from t;

CREATE TEMPORARY VIEW t2 AS select interval 1010 year 9 month 8 day 7 hour 6 minute 5 second 4 millisecond 3 microsecond as c;

select date_part('millennium', c) from t2;
select date_part('millennia', c) from t2;
select date_part('mil', c) from t2;
select date_part('mils', c) from t2;

select date_part('century', c) from t2;
select date_part('centuries', c) from t2;
select date_part('c', c) from t2;
select date_part('cent', c) from t2;

select date_part('decade', c) from t2;
select date_part('decades', c) from t2;
select date_part('dec', c) from t2;
select date_part('decs', c) from t2;

select date_part('year', c) from t2;
select date_part('y', c) from t2;
select date_part('years', c) from t2;
select date_part('yr', c) from t2;
select date_part('yrs', c) from t2;

select date_part('quarter', c) from t2;
select date_part('qtr', c) from t2;

select date_part('month', c) from t2;
select date_part('mon', c) from t2;
select date_part('mons', c) from t2;
Expand Down Expand Up @@ -122,22 +104,6 @@ select date_part('sec', c) from t2;
select date_part('seconds', c) from t2;
select date_part('secs', c) from t2;

select date_part('milliseconds', c) from t2;
select date_part('msec', c) from t2;
select date_part('msecs', c) from t2;
select date_part('millisecon', c) from t2;
select date_part('mseconds', c) from t2;
select date_part('ms', c) from t2;

select date_part('microseconds', c) from t2;
select date_part('usec', c) from t2;
select date_part('usecs', c) from t2;
select date_part('useconds', c) from t2;
select date_part('microsecon', c) from t2;
select date_part('us', c) from t2;

select date_part('epoch', c) from t2;

select date_part('not_supported', c) from t2;

select date_part(c, c) from t2;
Expand Down

0 comments on commit 697083c

Please sign in to comment.