[SPARK-29364][SQL] Return an interval from date subtract according to SQL standard #26112

Closed
wants to merge 14 commits into from

MaxGekk
Member

@MaxGekk MaxGekk commented Oct 14, 2019

What changes were proposed in this pull request?

This PR proposes a new expression, SubtractDates, which is used for date1 - date2. It has the INTERVAL type and returns the interval from date1 (inclusive) to date2 (exclusive). For example:

> select date'tomorrow' - date'yesterday';
interval 2 days

Closes #26034

Why are the changes needed?

  • To conform to the SQL standard, which states that the result type of date operand 1 - date operand 2 must be the interval type. See 4.5.3 Operations involving datetimes and intervals.
  • Improve Spark SQL UX and allow mixing date and timestamp in subtractions. For example: select timestamp'now' + (date'2019-10-01' - date'2019-09-15')
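
As a rough analogy for the second point, the sketch below uses `java.time` outside of Spark. It assumes `java.time.Period` as a stand-in for the `INTERVAL` type and a fixed `LocalDateTime` in place of `timestamp'now'`; the object and method names are hypothetical and are not part of this PR.

```scala
import java.time.{LocalDate, LocalDateTime, Period}

// Hypothetical illustration only: Period stands in for Spark's INTERVAL type.
object MixDateAndTimestamp {
  // timestamp + (date1 - date2): subtract the dates into a calendar period,
  // then add that period to the timestamp.
  def shift(ts: LocalDateTime, date1: LocalDate, date2: LocalDate): LocalDateTime = {
    val interval: Period = Period.between(date2, date1)
    ts.plus(interval)
  }

  def main(args: Array[String]): Unit = {
    // Fixed stand-in for timestamp'now' so the result is reproducible.
    val ts = LocalDateTime.of(2019, 10, 14, 12, 0)
    // 2019-10-01 minus 2019-09-15 is 16 days.
    println(shift(ts, LocalDate.of(2019, 10, 1), LocalDate.of(2019, 9, 15))) // 2019-10-30T12:00
  }
}
```

With an integer day count as the subtraction result, this addition would need an explicit conversion first; an interval result composes with the timestamp directly.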

Does this PR introduce any user-facing change?

Before this change, the query below returns the number of days:

spark-sql> select date'2019-10-05' - date'2018-09-01';
399

After this change, it returns an interval:

spark-sql> select date'2019-10-05' - date'2018-09-01';
interval 1 years 1 months 4 days
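
The two results describe the same span in different units. A minimal sketch with `java.time` (not Spark code; `DateSubtractSketch` and its methods are hypothetical names, and `Period` is only an approximation of Spark's interval type) reproduces both numbers for the dates in the example:

```scala
import java.time.{LocalDate, Period}
import java.time.temporal.ChronoUnit

// Hypothetical illustration only; this is not how Spark implements SubtractDates.
object DateSubtractSketch {
  // Analogue of the old result: a plain count of days.
  def daysBetween(date1: LocalDate, date2: LocalDate): Long =
    ChronoUnit.DAYS.between(date2, date1)

  // Analogue of the new result: a calendar period of years, months and days.
  def periodBetween(date1: LocalDate, date2: LocalDate): Period =
    Period.between(date2, date1)

  def main(args: Array[String]): Unit = {
    val d1 = LocalDate.of(2019, 10, 5)
    val d2 = LocalDate.of(2018, 9, 1)
    println(daysBetween(d1, d2))   // 399
    println(periodBetween(d1, d2)) // P1Y1M4D
  }
}
```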

How was this patch tested?

  • by new tests in DateExpressionsSuite and TypeCoercionSuite.
  • by existing tests in date.sql

@MaxGekk
Member Author

MaxGekk commented Oct 14, 2019

@cloud-fan Could you take a look at this PR, please?

@SparkQA

SparkQA commented Oct 14, 2019

Test build #112041 has finished for PR 26112 at commit 4a8173b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 14, 2019

Test build #112058 has finished for PR 26112 at commit 199adff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Oct 14, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Oct 15, 2019

Test build #112067 has finished for PR 26112 at commit 199adff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM if tests pass

@SparkQA

SparkQA commented Oct 15, 2019

Test build #112085 has finished for PR 26112 at commit 8b49c9b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Oct 15, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Oct 15, 2019

Test build #112094 has finished for PR 26112 at commit 8b49c9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -217,6 +217,8 @@ license: |

- Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. In Spark version 2.4 and earlier, this function gives `-1` for the same input. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.sizeOfNull` to `true`.

- In Spark version 2.4 and earlier, dates subtraction `date1` - `date2` gives the number of days from `date1` to `date2`. Since Spark 3.0, the expression has the `INTERVAL` type and returns an interval between two dates. To get the number of days, you can set `spark.sql.legacy.datesSubtraction.enabled` to `true` or use the `datediff` function.
Member

The date1 - date2 operator was added in SPARK-27898 (Spark 3.0), not in Spark 2.4 or earlier.

Contributor

good news! then we can remove the config and migration guide :)

@SparkQA

SparkQA commented Oct 15, 2019

Test build #112127 has finished for PR 26112 at commit f232b64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

interval 3 months 2 days
interval 38 years 3 months 1 weeks
interval 39 years 3 months 1 weeks 1 days
interval 40 years 3 months 1 weeks 2 days

Member

@@ -849,12 +850,13 @@ object TypeCoercion {
 case Add(l @ DateType(), r @ IntegerType()) => DateAdd(l, r)
 case Add(l @ IntegerType(), r @ DateType()) => DateAdd(r, l)
 case Subtract(l @ DateType(), r @ IntegerType()) => DateSub(l, r)
-case Subtract(l @ DateType(), r @ DateType()) => DateDiff(l, r)
 case Subtract(l @ TimestampType(), r @ TimestampType()) => TimestampDiff(l, r)
+case Subtract(l @ DateType(), r @ DateType()) => SubtractDates(l, r)
Contributor

seems like we should use DateDiff if the dialect is pgsql.
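
One way to read this suggestion is a dialect-dependent branch in the coercion rule: PostgreSQL returns an integer number of days for date - date, so that dialect would keep DateDiff, while the default would use SubtractDates. The sketch below is an assumption about the shape of such a rule, not the code in this PR; the expression classes are simplified, hypothetical stand-ins for Catalyst expressions, and the `pgsqlDialect` flag is an illustrative name.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions; not Spark's real classes.
object DialectSketch {
  sealed trait Expr
  case class DateLit(value: String) extends Expr
  case class Subtract(left: Expr, right: Expr) extends Expr
  case class DateDiff(endDate: Expr, startDate: Expr) extends Expr // yields a day count
  case class SubtractDates(left: Expr, right: Expr) extends Expr   // yields an interval

  // Rewrite date - date according to an assumed dialect flag;
  // every other expression is returned unchanged.
  def coerce(e: Expr, pgsqlDialect: Boolean): Expr = e match {
    case Subtract(l: DateLit, r: DateLit) =>
      if (pgsqlDialect) DateDiff(l, r) else SubtractDates(l, r)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val e = Subtract(DateLit("2019-10-05"), DateLit("2018-09-01"))
    println(coerce(e, pgsqlDialect = true))  // rewritten to DateDiff
    println(coerce(e, pgsqlDialect = false)) // rewritten to SubtractDates
  }
}
```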

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112139 has finished for PR 26112 at commit c6ec211.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented Oct 16, 2019

retest this please

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112146 has finished for PR 26112 at commit c6ec211.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum closed this in d11cbf2 Oct 16, 2019
wangyum pushed a commit that referenced this pull request Oct 16, 2019
… SQL standard

### What changes were proposed in this pull request?
This PR proposes a new expression, `SubtractDates`, which is used for `date1` - `date2`. It has the `INTERVAL` type and returns the interval from `date1` (inclusive) to `date2` (exclusive). For example:
```sql
> select date'tomorrow' - date'yesterday';
interval 2 days
```

Closes #26034

### Why are the changes needed?
- To conform to the SQL standard, which states that the result type of `date operand 1` - `date operand 2` must be the interval type. See [4.5.3  Operations involving datetimes and intervals](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt).
- Improve Spark SQL UX and allow mixing date and timestamp in subtractions. For example: `select timestamp'now' + (date'2019-10-01' - date'2019-09-15')`

### Does this PR introduce any user-facing change?
Before this change, the query below returns the number of days:
```sql
spark-sql> select date'2019-10-05' - date'2018-09-01';
399
```
After this change, it returns an interval:
```sql
spark-sql> select date'2019-10-05' - date'2018-09-01';
interval 1 years 1 months 4 days
```

### How was this patch tested?
- by new tests in `DateExpressionsSuite` and `TypeCoercionSuite`.
- by existing tests in `date.sql`

Closes #26112 from MaxGekk/date-subtract.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
(cherry picked from commit d11cbf2)
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
@wangyum
Member

wangyum commented Oct 16, 2019

Merged to master and branch-3.0-preview.

xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Dec 9, 2019
… SQL standard

Closes apache#26112 from MaxGekk/date-subtract.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
cloud-fan pushed a commit that referenced this pull request Dec 10, 2019
### What changes were proposed in this pull request?
Reprocess all PostgreSQL dialect related PRs, listed in order:

- #25158: PostgreSQL integral division support [revert]
- #25170: UT changes for the integral division support [revert]
- #25458: Accept "true", "yes", "1", "false", "no", "0", and unique prefixes as input and trim input for the boolean data type. [revert]
- #25697: Combine below 2 feature tags into "spark.sql.dialect" [revert]
- #26112: Date subtraction support [keep the ANSI-compliant part]
- #26444: Rename config "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" [revert]
- #26463: Cast to boolean support for PostgreSQL dialect [revert]
- #26584: Make the behavior of Postgre dialect independent of ansi mode config [keep the ANSI-compliant part]

### Why are the changes needed?
As discussed in http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html, we need to remove the PostgreSQL dialect from the code base for several reasons:
1. The current approach makes the codebase complicated and hard to maintain.
2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.

### Does this PR introduce any user-facing change?
Yes, the config `spark.sql.dialect` will be removed.

### How was this patch tested?
Existing UT.

Closes #26763 from xuanyuanking/SPARK-30125.

Lead-authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the date-subtract branch June 5, 2020 19:41
6 participants