New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-28623][SQL] Support dow
, isodow
and doy
by extract()
#25367
Conversation
Test build #108711 has finished for PR 25367 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master.
Thank you, @MaxGekk !
@@ -1408,6 +1408,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging | |||
DayOfMonth(expression(ctx.source)) | |||
case "DAYOFWEEK" => | |||
DayOfWeek(expression(ctx.source)) | |||
case "DOW" => | |||
Subtract(DayOfWeek(expression(ctx.source)), Literal(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very confusing that DOW
returns different results from DAYOFWEEK
. DOW
should just be an abbreviation of DAYOFWEEK
. See also https://docs.snowflake.com/en/sql-reference/functions-date-time.html#label-supported-date-time-parts
Do we have a clear document about what these names mean? cc @HyukjinKwon @maropu @wangyum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we just follow PostgreSQL definition of DOW
, see:
dow
The day of the week as Sunday (0) to Saturday (6)
SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
Result: 5
Do you propose to break compatibility with PostgreSQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never promise pgsql compatibility and internal consistency is much more important for Spark.
For this case, I think the behavior of DOW
from pgsql is correct and commonly used by other databases. We should fix our DAYOFWEEK
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a clear document about what these names mean? cc @HyukjinKwon @maropu @wangyum
Currently, we just list up these names in ExpressionDescription
of DatePart
: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2124
If we add ExpressionDescription
for EXTRACT
, the list will appear for EXTRACT
, too, in the Built-in Functions document for Spark 3.0. If we need to describe more about these parameters, I think we need to update the argument
section there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, @cloud-fan . DAYOFWEEK
is added two years ago. (#21479)
We should fix our DAYOFWEEK instead.
cc @marmbrus and @gatorsmile since the request might mean a behavior change.
What changes were proposed in this pull request?
In the PR, I propose to use existing expressions
DayOfYear
,WeekDay
andDayOfWeek
, and support additional parameters ofextract()
for feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT):dow
- the day of the week as Sunday (0) to Saturday (6)isodow
- the day of the week as Monday (1) to Sunday (7)doy
- the day of the year (1 - 365/366)Here are examples:
How was this patch tested?
Updated
extract.sql
.