Skip to content

Commit

Permalink
[SPARK-28690][SQL] Add date_part function for timestamps/dates
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?

In the PR, I propose new function `date_part()`. The function is modeled on the traditional Ingres equivalent to the SQL-standard function `extract`:
```
date_part('field', source)
```
and added for feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT).

The `source` can have `DATE` or `TIMESTAMP` type. Supported string values of `'field'` are:
- `millennium` - the current millennium for given date (or a timestamp implicitly casted to a date). For example, years in the 1900s are in the second millennium. The third millennium started _January 1, 2001_.
- `century` - the current millennium for given date (or timestamp). The first century starts at 0001-01-01 AD.
- `decade` - the current decade for given date (or timestamp). Actually, this is the year field divided by 10.
- isoyear` - the ISO 8601 week-numbering year that the date falls in. Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January.
- `year`, `month`, `day`, `hour`, `minute`, `second`
- `week` - the number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year.
- `quarter` - the quarter of the year (1 - 4)
- `dayofweek` - the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday)
- `dow` - the day of the week as Sunday (0) to Saturday (6)
- `isodow` - the day of the week as Monday (1) to Sunday (7)
- `doy` - the day of the year (1 - 365/366)
- `milliseconds` - the seconds field including fractional parts multiplied by 1,000.
- `microseconds` - the seconds field including fractional parts multiplied by 1,000,000.
- `epoch` - the number of seconds since 1970-01-01 00:00:00 local time in microsecond precision.

Here are examples:
```sql
spark-sql> select date_part('year', timestamp'2019-08-12 01:00:00.123456');
2019
spark-sql> select date_part('week', timestamp'2019-08-12 01:00:00.123456');
33
spark-sql> select date_part('doy', timestamp'2019-08-12 01:00:00.123456');
224
```

I changed implementation of `extract` to re-use `date_part()` internally.

## How was this patch tested?

Added `date_part.sql` and regenerated results of `extract.sql`.

Closes apache#25410 from MaxGekk/date_part.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
  • Loading branch information
2 people authored and PavithraRamachandran committed Sep 14, 2019
1 parent 4284696 commit 3789eb1
Show file tree
Hide file tree
Showing 9 changed files with 727 additions and 153 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,7 @@ object FunctionRegistry {
expression[TimeWindow]("window"),
expression[MakeDate]("make_date"),
expression[MakeTimestamp]("make_timestamp"),
expression[DatePart]("date_part"),

// collection functions
expression[CreateArray]("array"),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1963,3 +1963,90 @@ case class Epoch(child: Expression, timeZoneId: Option[String] = None)
defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
}
}

object DatePart {

def parseExtractField(
extractField: String,
source: Expression,
errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match {
case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
case "ISOYEAR" => IsoYear(source)
case "QUARTER" | "QTR" => Quarter(source)
case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
case "DAY" | "D" | "DAYS" => DayOfMonth(source)
case "DAYOFWEEK" => DayOfWeek(source)
case "DOW" => Subtract(DayOfWeek(source), Literal(1))
case "ISODOW" => Add(WeekDay(source), Literal(1))
case "DOY" => DayOfYear(source)
case "HOUR" | "H" | "HOURS" | "HR" | "HRS" => Hour(source)
case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" => Minute(source)
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => Second(source)
case "MILLISECONDS" | "MSEC" | "MSECS" | "MILLISECON" | "MSECONDS" | "MS" =>
Milliseconds(source)
case "MICROSECONDS" | "USEC" | "USECS" | "USECONDS" | "MICROSECON" | "US" =>
Microseconds(source)
case "EPOCH" => Epoch(source)
case _ => errorHandleFunc
}
}

@ExpressionDescription(
usage = "_FUNC_(field, source) - Extracts a part of the date/timestamp.",
arguments = """
Arguments:
* field - selects which part of the source should be extracted. Supported string values are:
["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"),
"CENTURY", ("CENTURIES", "C", "CENT"),
"DECADE", ("DECADES", "DEC", "DECS"),
"YEAR", ("Y", "YEARS", "YR", "YRS"),
"ISOYEAR",
"QUARTER", ("QTR"),
"MONTH", ("MON", "MONS", "MONTHS"),
"WEEK", ("W", "WEEKS"),
"DAY", ("D", "DAYS"),
"DAYOFWEEK",
"DOW",
"ISODOW",
"DOY",
"HOUR", ("H", "HOURS", "HR", "HRS"),
"MINUTE", ("M", "MIN", "MINS", "MINUTES"),
"SECOND", ("S", "SEC", "SECONDS", "SECS"),
"MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS"),
"MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", "US"),
"EPOCH"]
* source - a date (or timestamp) column from where `field` should be extracted
""",
examples = """
Examples:
> SELECT _FUNC_('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
2019
> SELECT _FUNC_('week', timestamp'2019-08-12 01:00:00.123456');
33
> SELECT _FUNC_('doy', DATE'2019-08-12');
224
""",
since = "3.0.0")
case class DatePart(field: Expression, source: Expression, child: Expression)
extends RuntimeReplaceable {

def this(field: Expression, source: Expression) {
this(field, source, {
if (!field.foldable) {
throw new AnalysisException("The field parameter needs to be a foldable string value.")
}
val fieldStr = field.eval().asInstanceOf[UTF8String].toString
DatePart.parseExtractField(fieldStr, source, {
throw new AnalysisException(s"Literals of type '$fieldStr' are currently not supported.")
})
})
}

override def flatArguments: Iterator[Any] = Iterator(field, source)
override def sql: String = s"$prettyName(${field.sql}, ${source.sql})"
override def prettyName: String = "date_part"
}
Original file line number Diff line number Diff line change
Expand Up @@ -1409,48 +1409,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
* Create a Extract expression.
*/
override def visitExtract(ctx: ExtractContext): Expression = withOrigin(ctx) {
ctx.field.getText.toUpperCase(Locale.ROOT) match {
case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" =>
Millennium(expression(ctx.source))
case "CENTURY" | "CENTURIES" | "C" | "CENT" =>
Century(expression(ctx.source))
case "DECADE" | "DECADES" | "DEC" | "DECS" =>
Decade(expression(ctx.source))
case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" =>
Year(expression(ctx.source))
case "ISOYEAR" =>
IsoYear(expression(ctx.source))
case "QUARTER" | "QTR" =>
Quarter(expression(ctx.source))
case "MONTH" | "MON" | "MONS" | "MONTHS" =>
Month(expression(ctx.source))
case "WEEK" | "W" | "WEEKS" =>
WeekOfYear(expression(ctx.source))
case "DAY" | "D" | "DAYS" =>
DayOfMonth(expression(ctx.source))
case "DAYOFWEEK" =>
DayOfWeek(expression(ctx.source))
case "DOW" =>
Subtract(DayOfWeek(expression(ctx.source)), Literal(1))
case "ISODOW" =>
Add(WeekDay(expression(ctx.source)), Literal(1))
case "DOY" =>
DayOfYear(expression(ctx.source))
case "HOUR" | "H" | "HOURS" | "HR" | "HRS" =>
Hour(expression(ctx.source))
case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" =>
Minute(expression(ctx.source))
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" =>
Second(expression(ctx.source))
case "MILLISECONDS" | "MSEC" | "MSECS" | "MILLISECON" | "MSECONDS" | "MS" =>
Milliseconds(expression(ctx.source))
case "MICROSECONDS" | "USEC" | "USECS" | "USECONDS" | "MICROSECON" | "US" =>
Microseconds(expression(ctx.source))
case "EPOCH" =>
Epoch(expression(ctx.source))
case other =>
throw new ParseException(s"Literals of type '$other' are currently not supported.", ctx)
}
val fieldStr = ctx.field.getText
val source = expression(ctx.source)
val extractField = DatePart.parseExtractField(fieldStr, source, {
throw new ParseException(s"Literals of type '$fieldStr' are currently not supported.", ctx)
})
new DatePart(Literal(fieldStr), expression(ctx.source), extractField)
}

/**
Expand Down
68 changes: 68 additions & 0 deletions sql/core/src/test/resources/sql-tests/inputs/date_part.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
CREATE TEMPORARY VIEW t AS select '2011-05-06 07:08:09.1234567' as c;

select date_part('millennium', c) from t;
select date_part('millennia', c) from t;
select date_part('mil', c) from t;
select date_part('mils', c) from t;

select date_part('century', c) from t;
select date_part('centuries', c) from t;
select date_part('c', c) from t;
select date_part('cent', c) from t;

select date_part('decade', c) from t;
select date_part('decades', c) from t;
select date_part('dec', c) from t;
select date_part('decs', c) from t;

select date_part('year', c) from t;
select date_part('y', c) from t;
select date_part('years', c) from t;
select date_part('yr', c) from t;
select date_part('yrs', c) from t;

select date_part('quarter', c) from t;
select date_part('qtr', c) from t;

select date_part('month', c) from t;
select date_part('mon', c) from t;
select date_part('mons', c) from t;
select date_part('months', c) from t;

select date_part('week', c) from t;
select date_part('w', c) from t;
select date_part('weeks', c) from t;

select date_part('day', c) from t;
select date_part('d', c) from t;
select date_part('days', c) from t;

select date_part('dayofweek', c) from t;

select date_part('dow', c) from t;

select date_part('isodow', c) from t;

select date_part('doy', c) from t;

select date_part('hour', c) from t;
select date_part('h', c) from t;
select date_part('hours', c) from t;
select date_part('hr', c) from t;
select date_part('hrs', c) from t;

select date_part('minute', c) from t;
select date_part('m', c) from t;
select date_part('min', c) from t;
select date_part('mins', c) from t;
select date_part('minutes', c) from t;

select date_part('second', c) from t;
select date_part('s', c) from t;
select date_part('sec', c) from t;
select date_part('seconds', c) from t;
select date_part('secs', c) from t;

select date_part('not_supported', c) from t;

select date_part(c, c) from t;
31 changes: 15 additions & 16 deletions sql/core/src/test/resources/sql-tests/inputs/pgSQL/timestamp.sql
Original file line number Diff line number Diff line change
Expand Up @@ -187,22 +187,21 @@ SELECT '' AS date_trunc_week, date_trunc( 'week', timestamp '2004-02-29 15:44:17
-- WHERE d1 BETWEEN timestamp '1902-01-01'
-- AND timestamp '2038-01-01';

-- [SPARK-28420] Date/Time Functions: date_part
-- SELECT '' AS "54", d1 as "timestamp",
-- date_part( 'year', d1) AS year, date_part( 'month', d1) AS month,
-- date_part( 'day', d1) AS day, date_part( 'hour', d1) AS hour,
-- date_part( 'minute', d1) AS minute, date_part( 'second', d1) AS second
-- FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';

-- SELECT '' AS "54", d1 as "timestamp",
-- date_part( 'quarter', d1) AS quarter, date_part( 'msec', d1) AS msec,
-- date_part( 'usec', d1) AS usec
-- FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';

-- SELECT '' AS "54", d1 as "timestamp",
-- date_part( 'isoyear', d1) AS isoyear, date_part( 'week', d1) AS week,
-- date_part( 'dow', d1) AS dow
-- FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';
SELECT '' AS `54`, d1 as `timestamp`,
date_part( 'year', d1) AS `year`, date_part( 'month', d1) AS `month`,
date_part( 'day', d1) AS `day`, date_part( 'hour', d1) AS `hour`,
date_part( 'minute', d1) AS `minute`, date_part( 'second', d1) AS `second`
FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';

SELECT '' AS `54`, d1 as `timestamp`,
date_part( 'quarter', d1) AS quarter, date_part( 'msec', d1) AS msec,
date_part( 'usec', d1) AS usec
FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';

SELECT '' AS `54`, d1 as `timestamp`,
date_part( 'isoyear', d1) AS isoyear, date_part( 'week', d1) AS week,
date_part( 'dow', d1) AS dow
FROM TIMESTAMP_TBL WHERE d1 BETWEEN '1902-01-01' AND '2038-01-01';

-- [SPARK-28137] Data Type Formatting Functions
-- TO_CHAR()
Expand Down
Loading

0 comments on commit 3789eb1

Please sign in to comment.