[SPARK-29364][SQL] Return an interval from date subtract according to SQL standard #26112
@cloud-fan Could you take a look at this PR, please.
Test build #112041 has finished for PR 26112 at commit
Test build #112058 has finished for PR 26112 at commit
jenkins, retest this, please
Test build #112067 has finished for PR 26112 at commit
LGTM if tests pass
Test build #112085 has finished for PR 26112 at commit
jenkins, retest this, please
Test build #112094 has finished for PR 26112 at commit
docs/sql-migration-guide.md
@@ -217,6 +217,8 @@ license: |
- Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. In Spark version 2.4 and earlier, this function gives `-1` for the same input. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.sizeOfNull` to `true`.
- In Spark version 2.4 and earlier, dates subtraction `date1` - `date2` gives the number of days from `date1` to `date2`. Since Spark 3.0, the expression has the `INTERVAL` type and returns an interval between two dates. To get the number of days, you can set `spark.sql.legacy.datesSubtraction.enabled` to `true` or use the `datediff` function.
We added the `date1 - date2` operator in SPARK-27898 (Spark 3.0), not in Spark 2.4 or earlier.
good news! then we can remove the config and migration guide :)
Test build #112127 has finished for PR 26112 at commit
interval 3 months 2 days
interval 38 years 3 months 1 weeks
interval 39 years 3 months 1 weeks 1 days
interval 40 years 3 months 1 weeks 2 days
These results are inconsistent with PostgreSQL:
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/date.out#L845-L882
@@ -849,12 +850,13 @@ object TypeCoercion {
case Add(l @ DateType(), r @ IntegerType()) => DateAdd(l, r)
case Add(l @ IntegerType(), r @ DateType()) => DateAdd(r, l)
case Subtract(l @ DateType(), r @ IntegerType()) => DateSub(l, r)
case Subtract(l @ DateType(), r @ DateType()) => DateDiff(l, r)
case Subtract(l @ TimestampType(), r @ TimestampType()) => TimestampDiff(l, r)
case Subtract(l @ DateType(), r @ DateType()) => SubtractDates(l, r)
seems like we should use `DateDiff` if the dialect is pgsql.
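The suggestion above amounts to choosing the result expression by dialect: PostgreSQL returns an integer number of days for `date - date`, while the SQL standard asks for an interval. Below is a minimal, self-contained sketch of that idea. The case classes are toy stand-ins that only borrow the names from the diff, and the `pgDialect` flag is a hypothetical parameter; this is not Catalyst's actual API or the rule as merged.

```scala
// Toy stand-ins for Catalyst expressions. They only mirror the names in the
// diff above and are NOT Spark's real classes.
sealed trait Expr
case class DateCol(name: String) extends Expr                  // a date-typed operand
case class Subtract(left: Expr, right: Expr) extends Expr      // unresolved `l - r`
case class DateDiff(end: Expr, start: Expr) extends Expr       // integer number of days
case class SubtractDates(left: Expr, right: Expr) extends Expr // interval result

object DialectAwareCoercion {
  // `pgDialect` is a hypothetical flag standing in for whatever configuration
  // selects the PostgreSQL dialect.
  def coerce(e: Expr, pgDialect: Boolean): Expr = e match {
    case Subtract(l: DateCol, r: DateCol) =>
      // PostgreSQL semantics: days as an integer; otherwise an interval.
      if (pgDialect) DateDiff(l, r) else SubtractDates(l, r)
    case other => other // anything else is left untouched
  }
}
```

With this shape, the single pattern for two date operands stays in one place and only the produced expression varies by dialect.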
Test build #112139 has finished for PR 26112 at commit
retest this please
Test build #112146 has finished for PR 26112 at commit
… SQL standard

### What changes were proposed in this pull request?
Proposed new expression `SubtractDates` which is used in `date1` - `date2`. It has the `INTERVAL` type, and returns the interval between `date1` (inclusive) and `date2` (exclusive). For example:
```sql
> select date'tomorrow' - date'yesterday';
interval 2 days
```

Closes #26034

### Why are the changes needed?
- To conform to the SQL standard, which states that the result type of `date operand 1` - `date operand 2` must be the interval type. See [4.5.3 Operations involving datetimes and intervals](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt).
- Improve Spark SQL UX and allow mixing date and timestamp in subtractions. For example: `select timestamp'now' + (date'2019-10-01' - date'2019-09-15')`

### Does this PR introduce any user-facing change?
Before, the query below returns the number of days:
```sql
spark-sql> select date'2019-10-05' - date'2018-09-01';
399
```
After, it returns an interval:
```sql
spark-sql> select date'2019-10-05' - date'2018-09-01';
interval 1 years 1 months 4 days
```

### How was this patch tested?
- by new tests in `DateExpressionsSuite` and `TypeCoercionSuite`.
- by existing tests in `date.sql`

Closes #26112 from MaxGekk/date-subtract.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
(cherry picked from commit d11cbf2)
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
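To make the before/after difference concrete, here is a small sketch that reproduces both results for the dates used in the description. It relies only on the JDK's `java.time` classes as a stand-in; `DateSubtractSketch` and its method names are made up for illustration, and Spark's own `SubtractDates` is not guaranteed to compute its interval the same way in every case.

```scala
import java.time.{LocalDate, Period}
import java.time.temporal.ChronoUnit

// Illustration only: java.time is used as a stand-in for the two behaviours,
// not as a description of how Spark implements them.
object DateSubtractSketch {
  // Old behaviour: a plain count of days between the dates (what `datediff` gives).
  def daysBetween(end: LocalDate, start: LocalDate): Long =
    ChronoUnit.DAYS.between(start, end)

  // New behaviour, approximated: a calendar interval in years, months and days.
  def calendarInterval(end: LocalDate, start: LocalDate): Period =
    Period.between(start, end)

  def main(args: Array[String]): Unit = {
    val end = LocalDate.parse("2019-10-05")
    val start = LocalDate.parse("2018-09-01")
    println(daysBetween(end, start))      // 399
    println(calendarInterval(end, start)) // P1Y1M4D, i.e. 1 year 1 month 4 days
  }
}
```

The two results carry different information: 399 is an exact day count, while the year-month-day form depends on calendar month lengths, which is why the number of days cannot always be recovered from the interval alone.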
Merged to master and branch-3.0-preview.
### What changes were proposed in this pull request?
Reprocess all PostgreSQL dialect related PRs, listing in order:
- #25158: PostgreSQL integral division support [revert]
- #25170: UT changes for the integral division support [revert]
- #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert]
- #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert]
- #26112: Date subtraction support [keep the ANSI-compliant part]
- #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert]
- #26463: Cast to boolean support for PostgreSQL dialect [revert]
- #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part]

### Why are the changes needed?
Per the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove the PostgreSQL dialect from the code base for several reasons:
1. The current approach makes the codebase complicated and hard to maintain.
2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.

### Does this PR introduce any user-facing change?
Yes, the config `spark.sql.dialect` will be removed.

### How was this patch tested?
Existing UT.

Closes #26763 from xuanyuanking/SPARK-30125.

Lead-authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>