Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28623][SQL] Support dow, isodow and doy by extract() #25367

Closed
wants to merge 4 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 6, 2019

What changes were proposed in this pull request?

In the PR, I propose to use existing expressions DayOfYear, WeekDay and DayOfWeek, and support additional parameters of extract() for feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT):

  1. dow - the day of the week as Sunday (0) to Saturday (6)
  2. isodow - the day of the week as Monday (1) to Sunday (7)
  3. doy - the day of the year (1 - 365/366)

Here are examples:

spark-sql> SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
5
spark-sql> SELECT EXTRACT(ISODOW FROM TIMESTAMP '2001-02-18 20:38:40');
7
spark-sql> SELECT EXTRACT(DOY FROM TIMESTAMP '2001-02-16 20:38:40');
47

How was this patch tested?

Updated extract.sql.

@SparkQA
Copy link

SparkQA commented Aug 6, 2019

Test build #108711 has finished for PR 25367 at commit ef3552b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master.
Thank you, @MaxGekk !

@MaxGekk MaxGekk deleted the extract-ext branch October 5, 2019 19:17
@@ -1408,6 +1408,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
DayOfMonth(expression(ctx.source))
case "DAYOFWEEK" =>
DayOfWeek(expression(ctx.source))
case "DOW" =>
Subtract(DayOfWeek(expression(ctx.source)), Literal(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very confusing that DOW returns different results from DAYOFWEEK. DOW should just be an abbreviation of DAYOFWEEK. See also https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts

Do we have a clear document about what these names mean? cc @HyukjinKwon @maropu @wangyum

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we just follow PostgreSQL definition of DOW, see:

dow
The day of the week as Sunday (0) to Saturday (6)

SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 5

Do you propose to break compatibility with PostgreSQL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never promise pgsql compatibility and internal consistency is much more important for Spark.

For this case, I think the behavior of DOW from pgsql is correct and commonly used by other databases. We should fix our DAYOFWEEK instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a clear document about what these names mean? cc @HyukjinKwon @maropu @wangyum

Currently, we just list up these names in ExpressionDescription of DatePart: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2124
If we add ExpressionDescription for EXTRACT, the list will appear for EXTRACT, too, in the Built-in Functions document for Spark 3.0. If we need to describe more about these parameters, I think we need to update the argument section there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, @cloud-fan . DAYOFWEEK is added two years ago. (#21479)

We should fix our DAYOFWEEK instead.

cc @marmbrus and @gatorsmile since the request might mean a behavior change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants