
[SPARK-28141][SQL] Support special date values #25708

Closed
MaxGekk wants to merge 28 commits

Conversation

MaxGekk
Member

@MaxGekk MaxGekk commented Sep 6, 2019

What changes were proposed in this pull request?

Supported special string values for the DATE type. They are notational shorthands that are converted to ordinary date values when read. The following string values are supported:

  • epoch [zoneId] - 1970-01-01
  • today [zoneId] - the current date in the time zone specified by spark.sql.session.timeZone.
  • yesterday [zoneId] - the current date - 1
  • tomorrow [zoneId] - the current date + 1
  • now - the date of running the current query. It has the same meaning as today.

For example:

spark-sql> SELECT date 'tomorrow' - date 'yesterday';
2
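
The other special values behave similarly. Illustrative output, assuming the query runs on 2019-09-06 (the date used elsewhere in this PR); only epoch maps to a fixed date, the rest depend on the current date in the session time zone:

spark-sql> SELECT date 'epoch';
1970-01-01
spark-sql> SELECT date 'now';
2019-09-06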

Why are the changes needed?

To maintain feature parity with PostgreSQL, see 8.5.1.4. Special Values

Does this PR introduce any user-facing change?

Previously, the parser failed on the special values with the error:

spark-sql> select date 'today';
Error in query: 
Cannot parse the DATE value: today(line 1, pos 7)

After the changes, the special values are converted to appropriate dates:

spark-sql> select date 'today';
2019-09-06

How was this patch tested?

  • Added tests to DateFormatterSuite to check parsing special values from regular strings.
  • Added tests in DateTimeUtilsSuite to check parsing those values from UTF8String.
  • Uncommented tests in date.sql.

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110230 has finished for PR 25708 at commit 775c7ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


assert(toDate("epoch", zoneId).get === 0)
val today = localDateToDays(LocalDate.now(zoneId))
assert(toDate("yesterday", zoneId).get === today - 1)
Member Author

@MaxGekk MaxGekk Sep 6, 2019


Here, there is a risk that today contains the previous day if we were unlucky and the test was executed at the end of the day. Maybe we need to introduce some tolerance.
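
One hedged way to do that, sketched below as an assumption rather than the suite's actual fix (the toDate helper mirrors the snippet above, and the DateTimeUtils signatures are taken from the generated code shown later in this thread): sample the clock on both sides of the call, so any day in between is accepted.

```scala
import java.time.{LocalDate, ZoneId}

import org.apache.spark.sql.catalyst.util.DateTimeUtils.{localDateToDays, stringToDate}
import org.apache.spark.unsafe.types.UTF8String

val zoneId = ZoneId.systemDefault()

// Hypothetical helper mirroring toDate(...) from the snippet above.
def toDate(s: String, zoneId: ZoneId): Option[Int] =
  stringToDate(UTF8String.fromString(s), zoneId)

// Sample the current day both before and after parsing, so that a midnight
// rollover between the two reads cannot make the assertion flaky.
val before = localDateToDays(LocalDate.now(zoneId))
val yesterday = toDate("yesterday", zoneId).get
val after = localDateToDays(LocalDate.now(zoneId))
assert(before - 1 <= yesterday && yesterday <= after - 1)
```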

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110236 has finished for PR 25708 at commit be787f4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110255 has finished for PR 25708 at commit 4fb8834.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 6, 2019

Test build #110258 has finished for PR 25708 at commit 6da7905.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Sep 7, 2019

@dongjoon-hyun @maropu @cloud-fan Please review the PR.

@SparkQA

SparkQA commented Sep 7, 2019

Test build #110285 has finished for PR 25708 at commit d57e11e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Sep 8, 2019

Do other DBMS-like systems support this syntax? If not, should we gate these features behind the compatibility mode flag (https://issues.apache.org/jira/browse/SPARK-28934)?

@maropu
Member

maropu commented Sep 8, 2019

  • How about the timestamp case?
  • Did you drop inf/-inf from the supported values?

org.apache.spark.sql.catalyst.util.DateTimeUtils.microsToEpochDays($c, $zid);"""
case _ =>
(c, evPrim, evNull) => code"$evNull = true;"
}
}
Member


Just posting the generated code for other reviewers:

/* 051 */         boolean project_isNull_0 = columnartorow_isNull_0;
/* 052 */         int project_value_0 = -1;
/* 053 */         if (!columnartorow_isNull_0) {
/* 054 */           scala.Option<Integer> project_intOpt_0 =
/* 055 */           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate(columnartorow_value_0, ((java.time.ZoneId) references[2] /* zoneId */));
/* 056 */           if (project_intOpt_0.isDefined()) {
/* 057 */             project_value_0 = ((Integer) project_intOpt_0.get()).intValue();
/* 058 */           } else {
/* 059 */             project_isNull_0 = true;
/* 060 */           }
/* 061 */         }

val zid = getZoneId()
(c, evPrim, evNull) =>
code"""
scala.Option<Integer> $intOpt =
Member


btw (this is not related to this PR), is scala.Option<Integer> OK for Java compilers/JVMs? @rednaxelafx I remember that JDK compilers cannot compile this statement: #21770 (comment), because the compilers seem to erase the return type from scala.Option<Integer> to scala.Option<Object>. I'm not sure why janino accepts this, though...

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110409 has finished for PR 25708 at commit 106524b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Sep 10, 2019

retest this please

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110417 has finished for PR 25708 at commit 106524b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Sep 12, 2019

@maropu @dongjoon-hyun Here is the same question as in #25716 (comment)

…ecial-values

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
#	sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
#	sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
@SparkQA

SparkQA commented Sep 18, 2019

Test build #110922 has finished for PR 25708 at commit ff92531.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Sep 19, 2019

I have rebased this on master, which now includes #25716. @HyukjinKwon Could you take a look at this, please?

@MaxGekk
Member Author

MaxGekk commented Sep 20, 2019

@dongjoon-hyun @maropu @HyukjinKwon Could you take a look at this PR? It is similar to the already merged changes for the timestamp type.

Member

@maropu maropu left a comment


Looks nice. cc @HyukjinKwon @dongjoon-hyun

@HyukjinKwon
Member

I am merging this just to unblock things and make further work easier, but please make sure to add a flag. That has been pointed out by multiple committers such as @maropu and @cloud-fan.

@HyukjinKwon
Member

Otherwise, I will revert this and #25716.

Merged to master.

@maropu
Member

maropu commented Sep 22, 2019

Do we already have a JIRA ticket for linking this feature to the flag?

@MaxGekk
Member Author

MaxGekk commented Sep 22, 2019

PR #25834 hides the feature behind the SQL config spark.sql.dialect = "PostgreSQL".
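
A minimal usage sketch under that assumption (only the config name is taken from the comment above; output is omitted since the exact behavior depends on #25834 as merged):

spark-sql> SET spark.sql.dialect=PostgreSQL;
spark-sql> SELECT date 'yesterday';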

@maropu
Member

maropu commented Sep 22, 2019

Oh, I see. Thanks!

dongjoon-hyun pushed a commit that referenced this pull request Sep 27, 2019
… SQL migration guide

### What changes were proposed in this pull request?

Updated the SQL migration guide regarding the recently supported special date and timestamp values; see #25716 and #25708.

Closes #25834

### Why are the changes needed?
To let users know about the new feature in Spark 3.0.

### Does this PR introduce any user-facing change?
No

Closes #25948 from MaxGekk/special-values-migration-guide.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@MaxGekk MaxGekk deleted the datetime-special-values branch October 5, 2019 19:18
MaxGekk added a commit that referenced this pull request Jun 1, 2021
…only

### What changes were proposed in this pull request?
In the PR, I propose to support the special datetime values introduced by #25708 and #25716 only in typed literals, and to stop recognizing them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals:
- `epoch [zoneId]` - `1970-01-01 00:00:00+00 (Unix system time zero)`
- `today [zoneId]` - midnight today.
- `yesterday [zoneId]` - midnight yesterday.
- `tomorrow [zoneId]` - midnight tomorrow.
- `now` - current query start time.

For example:
```sql
spark-sql> SELECT timestamp 'tomorrow';
2019-09-07 00:00:00
```
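
A companion example (illustrative output; `epoch` renders in the session time zone, assumed here to be UTC):
```sql
spark-sql> SELECT timestamp 'epoch';
1970-01-01 00:00:00
```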

Similarly, the following special date values are supported only in typed date literals:
- `epoch [zoneId]` - `1970-01-01`
- `today [zoneId]` - the current date in the time zone specified by `spark.sql.session.timeZone`.
- `yesterday [zoneId]` - the current date - 1
- `tomorrow [zoneId]` - the current date + 1
- `now` - the date of running the current query. It has the same notion as `today`.

For example:
```sql
spark-sql> SELECT date 'tomorrow' - date 'yesterday';
2
```
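
The behavioral difference in a nutshell (a sketch: the NULL result assumes the default, non-ANSI cast behavior, and the date shown is illustrative):
```sql
spark-sql> SELECT CAST('today' AS DATE);  -- no longer recognized as special
NULL
spark-sql> SELECT DATE 'today';           -- still recognized in a typed literal
2021-06-01
```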

### Why are the changes needed?
In the current implementation, Spark recognizes the special date/timestamp values in any input string cast to dates/timestamps, which leads to the following problems:
- If executors have different system times, the result is inconsistent and effectively random: column values depend on where the conversions were performed.
- The special values play the role of distributed non-deterministic functions, though users might think of them as constants; see the example after this list.
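
For instance, a hypothetical query illustrating the first problem:
```sql
-- Before this change, 'now' was resolved during cast evaluation on executors,
-- so two tasks could observe different clocks and produce different values:
SELECT CAST(s AS TIMESTAMP) FROM VALUES ('now'), ('now') AS t(s);
```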

### Does this PR introduce _any_ user-facing change?
Yes, but the probability of users being affected should be small.

### How was this patch tested?
By running existing test suites:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp.sql"
$ build/sbt "test:testOnly *DateTimeUtilsSuite"
```

Closes #32714 from MaxGekk/remove-datetime-special-values.

Lead-authored-by: Max Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>