
[SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence #28926

Conversation

TJX2014
Contributor

@TJX2014 TJX2014 commented Jun 25, 2020

What changes were proposed in this pull request?

1. Add a time-field step check for date start/end in Sequence, in org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl.
2. Add a UT, SPARK-32133: Sequence step must be a day interval if start and end values are dates, in org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.

Why are the changes needed?

A time-field step with a date start/end in Sequence produces a strange result in Spark:

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-03-02' as date), interval 1 hour))").head(3)
res0: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-01], [2011-03-01])  <- strange result

scala> sql("select explode(sequence(cast('2011-03-01' as date), cast('2011-03-02' as date), interval 1 day))").head(3)
res1: Array[org.apache.spark.sql.Row] = Array([2011-03-01], [2011-03-02])

The same query in Presto behaves more sensibly:

presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' hour);
Query 20200624_122744_00002_pehix failed: sequence step must be a day interval if start and end values are dates
presto> select sequence(date('2011-03-01'),date('2011-03-02'),interval '1' day);
_col0
[2011-03-01, 2011-03-02]
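The proposed rule can be sketched as a standalone check. This mirrors the `require` added to `TemporalSequenceImpl`, but `Interval` and `validateStep` here are illustrative stand-ins, not Spark's actual classes; `scale == MICROS_PER_DAY` marks a date-typed start/end:

```scala
// Minimal standalone sketch of the check (not Spark's implementation).
// `Interval` stands in for Spark's CalendarInterval.
object DayStepCheck {
  val MICROS_PER_DAY: Long = 24L * 60L * 60L * 1000L * 1000L

  final case class Interval(months: Int, days: Int, microseconds: Long)

  def validateStep(scale: Long, step: Interval): Unit = {
    // For date sequences, the step must carry a month or day component;
    // a pure time-field step (e.g. `interval 1 hour`) is rejected.
    require(scale != MICROS_PER_DAY || step.months != 0 || step.days != 0,
      "sequence step must be a day interval if start and end values are dates")
  }
}
```

Under this sketch, a date sequence with an `interval 1 day` step passes, while a pure `interval 1 hour` step (microseconds only) fails with an `IllegalArgumentException` from `require`.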

Does this PR introduce any user-facing change?

Yes. After this patch, users get the error sequence step must be a day interval if start and end values are dates when they use a time-field step with a date start/end in Sequence.

How was this patch tested?

Unit test.

@@ -2612,6 +2614,9 @@ object Sequence {
val stepDays = step.days
val stepMicros = step.microseconds

require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
Contributor Author

Add a scale constraint here for DateType.

@TJX2014
Contributor Author

TJX2014 commented Jun 25, 2020

Follow: #28856
cc @cloud-fan @MaxGekk

@@ -2612,6 +2614,9 @@ object Sequence {
val stepDays = step.days
val stepMicros = step.microseconds

require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
"sequence step must be a day interval if start and end values are dates")

if (stepMonths == 0 && stepMicros == 0 && scale == MICROS_PER_DAY) {
Contributor

we can remove this branch now?

Contributor Author

It seems we need the require check in eval, so I removed the SPARK-32198 branch code from this PR. Is that OK?

Member

@TJX2014 . What is SPARK-32198? It's not created yet.

I remove SPARK-32198 branch code from this PR

Member

Also, I have the same question like @cloud-fan .

Contributor Author
@TJX2014 TJX2014 Jul 6, 2020

@dongjoon-hyun Sorry, the correct ticket is SPARK-31982. This PR forbids the time-field step; the other one handles sequences that cross a DST timezone change.

val startMicros: Long = num.toLong(start) * scale
val stopMicros: Long = num.toLong(stop) * scale

// Date-to-timestamp conversion differs between the GMT and Chicago time zones
Contributor

can we exclude these changes and make this PR smaller to only focus on "forbid time field steps for date start/end"?

Contributor Author

@cloud-fan Thanks, I have followed the suggestion and opened a new JIRA ticket for this.

@TJX2014 TJX2014 force-pushed the master-SPARK-31982-sequence-cross-dst-follow-presto branch from e6ccfec to 7021539 Compare June 29, 2020 23:02
@TJX2014 TJX2014 changed the title [SPARK-31982][SQL][FOLLOWUP]Function sequence doesn't handle date increments that cross DST [SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence Jun 29, 2020
@TJX2014 TJX2014 requested a review from cloud-fan June 30, 2020 00:21
@@ -2679,6 +2682,11 @@ object Sequence {
|final int $stepDays = $step.days;
|final long $stepMicros = $step.microseconds;
|
|if (${scale}L == ${MICROS_PER_DAY}L && $stepMonths == 0 && $stepDays == 0) {
Contributor

The scale is known before generating the code, so we can do better:

val check = if (scale == MICROS_PER_DAY) {
  s"""
    if ($stepMonths == 0 && $stepDays == 0) ...
   """
} else {
  ""
}
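The idea above, sketched standalone: `scale` is a Scala-level constant at codegen time, so the generated Java only needs to carry the step check when the sequence is over dates. The object name and the emitted snippet here are illustrative, not Spark's actual codegen:

```scala
// Sketch: emit the step check only when generating code for a date sequence.
object CodegenSketch {
  val MICROS_PER_DAY: Long = 86400000000L

  // stepMonths/stepDays are the *names* of generated Java variables.
  def genStepCheck(scale: Long, stepMonths: String, stepDays: String): String = {
    if (scale == MICROS_PER_DAY) {
      s"""
         |if ($stepMonths == 0 && $stepDays == 0) {
         |  throw new IllegalArgumentException(
         |    "sequence step must be a day interval if start and end values are dates");
         |}
         |""".stripMargin
    } else {
      ""  // timestamp sequences need no check, so emit nothing
    }
  }
}
```

Branching in Scala rather than in the generated Java keeps the timestamp path free of a dead `scale` comparison.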

@@ -2612,6 +2612,9 @@ object Sequence {
val stepDays = step.days
val stepMicros = step.microseconds

require(scale != MICROS_PER_DAY || stepMonths != 0 || stepDays != 0,
"sequence step must be a day interval if start and end values are dates")
Contributor

let's use

if (...) throw new IllegalArgumentException

to be more consistent with the codegen version.

test("SPARK-32133: Sequence step must be a day interval " +
"if start and end values are dates") {
val e = intercept[Exception](
checkEvaluation(Sequence(
Contributor

we should use checkExceptionInExpression for error tests.
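`checkExceptionInExpression` is Spark's internal test helper; a minimal standalone sketch of what such a helper checks is below (evaluate, expect an exception of a given type, and match the message). `expectError` is a hypothetical stand-in, not the Spark API:

```scala
// Sketch of an error-test helper: assert that `body` throws a T whose
// message contains `msgPart`.
object ErrorTestSketch {
  def expectError[T <: Throwable](body: => Any, msgPart: String)
      (implicit ct: scala.reflect.ClassTag[T]): Unit = {
    try {
      body
      throw new AssertionError(
        s"expected ${ct.runtimeClass.getName} but no exception was thrown")
    } catch {
      case e: Throwable if ct.runtimeClass.isInstance(e) =>
        assert(e.getMessage.contains(msgPart), s"message did not contain: $msgPart")
    }
  }
}
```

Compared with a bare `intercept[Exception]`, this style also verifies the exception type and message, which is why the review prefers it for error tests.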

Comment on lines 1865 to 1869
checkEvaluation(Sequence(
Cast(Literal("2011-03-01"), DateType),
Cast(Literal("2011-03-02"), DateType),
Option(Literal(stringToInterval("interval 1 day")))),
Seq(Date.valueOf("2011-03-01"), Date.valueOf("2011-03-02")))
Contributor Author

@cloud-fan Thank you for the suggestion; done. I also added a positive test case, which I hope makes this better.

@@ -1854,4 +1854,18 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
Literal(stringToInterval("interval 1 year"))),
Seq(Date.valueOf("2018-01-01")))
}

test("SPARK-32133: Sequence step must be a day interval " +
Contributor

instead of adding a new test, shall we just put the new negative test in the existing test case Sequence of dates? We can point to the JIRA number in the code comment.

Contributor Author

Thanks, I have moved the negative test into Sequence of dates and pointed to the JIRA number in a code comment.

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 1, 2020

Test build #124752 has finished for PR 28926 at commit 24384c9.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 1, 2020

Test build #124759 has finished for PR 28926 at commit 24384c9.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 1, 2020

Test build #5049 has finished for PR 28926 at commit 24384c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 1, 2020

Test build #124791 has finished for PR 28926 at commit 24384c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2612,6 +2612,11 @@ object Sequence {
val stepDays = step.days
val stepMicros = step.microseconds

if (scale == MICROS_PER_DAY && stepMonths == 0 && stepDays == 0) {
Member
@dongjoon-hyun dongjoon-hyun Jul 2, 2020

val check = if (scale == MICROS_PER_DAY) {
s"""
if ($stepMonths == 0 && $stepDays == 0) {
throw new IllegalArgumentException(
Member

indentation?

val check = if (scale == MICROS_PER_DAY) {
s"""
if ($stepMonths == 0 && $stepDays == 0) {
throw new IllegalArgumentException(
Contributor
@cloud-fan cloud-fan Jul 2, 2020

nit: for generated code, we shall use the multiline string syntax:

s"""
  |if ...
  |""".stripMargin

@SparkQA

SparkQA commented Jul 2, 2020

Test build #124896 has finished for PR 28926 at commit fe6a32d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

throw new IllegalArgumentException(
"sequence step must be a day interval if start and end values are dates")
}

if (stepMonths == 0 && stepMicros == 0 && scale == MICROS_PER_DAY) {
Contributor

We can probably add comments for each branch. For example, this branch is for adding days to date start/end.

Contributor Author

Done.

… github.com:TJX2014/spark into master-SPARK-31982-sequence-cross-dst-follow-presto
Comment on lines +2621 to +2625
// Adding pure days to date start/end
backedSequenceImpl.eval(start, stop, fromLong(stepDays))

} else if (stepMonths == 0 && stepDays == 0 && scale == 1) {
// Adding pure microseconds to timestamp start/end
Contributor Author
@TJX2014 TJX2014 Jul 6, 2020

@cloud-fan Thanks, I added more comments for the pure-days and pure-months branches. The earlier exception check's message already carries enough information, and the last branch already has a detailed comment.
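The branch structure under discussion can be sketched as follows. This is a simplified stand-in: the real `TemporalSequenceImpl` dispatches to different eval paths, while here each branch just returns a label so the conditions and their comments can be seen in isolation:

```scala
// Sketch of TemporalSequenceImpl's branch selection with the requested comments.
object BranchSketch {
  val MICROS_PER_DAY: Long = 86400000000L

  def chooseBranch(scale: Long, stepMonths: Int, stepDays: Int, stepMicros: Long): String = {
    if (stepMonths == 0 && stepMicros == 0 && scale == MICROS_PER_DAY) {
      "days"      // adding pure days to a date start/end
    } else if (stepMonths == 0 && stepDays == 0 && scale == 1) {
      "micros"    // adding pure microseconds to a timestamp start/end
    } else {
      "calendar"  // steps with a month (or mixed) component need calendar arithmetic
    }
  }
}
```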

@SparkQA

SparkQA commented Jul 6, 2020

Test build #125059 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 6, 2020

Test build #125097 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 7, 2020

Test build #125231 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125242 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125333 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125358 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun added a commit that referenced this pull request Jul 8, 2020
### What changes were proposed in this pull request?

This PR aims to disable SBT `unidoc` generation testing in Jenkins environment because it's flaky in Jenkins environment and not used for the official documentation generation. Also, GitHub Action has the correct test coverage for the official documentation generation.

- #28848 (comment) (amp-jenkins-worker-06)
- #28926 (comment) (amp-jenkins-worker-06)
- #28969 (comment) (amp-jenkins-worker-06)
- #28975 (comment) (amp-jenkins-worker-05)
- #28986 (comment)  (amp-jenkins-worker-05)
- #28992 (comment) (amp-jenkins-worker-06)
- #28993 (comment) (amp-jenkins-worker-05)
- #28999 (comment) (amp-jenkins-worker-04)
- #29010 (comment) (amp-jenkins-worker-03)
- #29013 (comment) (amp-jenkins-worker-04)
- #29016 (comment) (amp-jenkins-worker-05)
- #29025 (comment) (amp-jenkins-worker-04)
- #29042 (comment) (amp-jenkins-worker-03)

### Why are the changes needed?

Apache Spark `release-build.sh` generates the official document by using the following command.
- https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L341

```bash
PRODUCTION=1 RELEASE_VERSION="$SPARK_VERSION" jekyll build
```

And, this is executed by the following `unidoc` command for Scala/Java API doc.
- https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30

```ruby
system("build/sbt -Pkinesis-asl clean compile unidoc") || raise("Unidoc generation failed")
```

However, the PR builder disabled `Jekyll build` and instead has a different test coverage.
```python
# determine if docs were changed and if we're inside the amplab environment
# note - the below commented out until *all* Jenkins workers can get `jekyll` installed
# if "DOCS" in changed_modules and test_env == "amplab_jenkins":
#    build_spark_documentation()
```

```
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc using SBT with these arguments:
-Phadoop-3.2 -Phive-2.3 -Pspark-ganglia-lgpl -Pkubernetes -Pmesos
-Phadoop-cloud -Phive -Phive-thriftserver -Pkinesis-asl -Pyarn unidoc
```

### Does this PR introduce _any_ user-facing change?

No. (This is used only for testing and not used in the official doc generation.)

### How was this patch tested?

Pass the Jenkins without doc generation invocation.

Closes #29017 from dongjoon-hyun/SPARK-DOC-GEN.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125419 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125454 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125487 has started for PR 28926 at commit 8c4ffaa.

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125506 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 10, 2020

Test build #125577 has finished for PR 28926 at commit 8c4ffaa.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member
@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. I tested PySpark/SparkR locally. Merged to master. Thank you, @TJX2014 and all!

Tests passed in 869 seconds
OK:       2306
Failed:   0
Warnings: 0
Skipped:  1

@TJX2014
Contributor Author

TJX2014 commented Jul 11, 2020

Thank you very much.
