Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31980][SQL]Function sequence() fails if start and end of range are equal dates #28819

Closed
wants to merge 3 commits into from

Conversation

TJX2014
Copy link
Contributor

@TJX2014 TJX2014 commented Jun 13, 2020

What changes were proposed in this pull request?

  1. Add judge equal as bigger condition in org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl#eval
  2. Unit test for interval day, month, year

Why are the changes needed?

Bug exists when sequence input get same equal start and end dates, which will occur while loop forever

Does this PR introduce any user-facing change?

Yes,
Before this PR, people will get a java.lang.ArrayIndexOutOfBoundsException, when eval as below:
sql("select sequence(cast('2011-03-01' as date), cast('2011-03-01' as date), interval 1 year)").show(false)

How was this patch tested?

Unit test.

@DaveDeCaprio
Copy link
Contributor

I'm not an admin or committer, but this patch looks good to me.

@HyukjinKwon
Copy link
Member

I think it matches the behaviour with Presto's (see #21155). Shall we check how it works in Presto?

@TJX2014
Copy link
Contributor Author

TJX2014 commented Jun 14, 2020

I think it matches the behaviour with Presto's (see #21155). Shall we check how it works in Presto?

Sure, we can get result from Presto(presto-server-0.236) as follows:
`
presto> select sequence(date('2011-03-01'),date('2011-03-01'),interval '1' year);
_col0
[2011-03-01]
(1 row)

Query 20200614_091013_00013_wnnjv, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto> select sequence(date('2011-03-01'),date('2012-03-01'),interval '1' year);
_col0
[2011-03-01, 2012-03-01]
(1 row)

Query 20200614_091017_00014_wnnjv, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto> select sequence(date('2011-03-01'),date('2010-03-01'),interval '1' year);
Query 20200614_091021_00015_wnnjv failed: sequence stop value should be greater than or equal to start value if step is greater than zero otherwise stop should be less than or equal to start
`

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

cc @ueshin FYI

@SparkQA
Copy link

SparkQA commented Jun 14, 2020

Test build #124007 has finished for PR 28819 at commit c1765e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you all. Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Jun 20, 2020
…e are equal dates

### What changes were proposed in this pull request?
1. Add judge equal as bigger condition in `org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl#eval`
2. Unit test for interval `day`, `month`, `year`

### Why are the changes needed?
Bug exists when sequence input get same equal start and end dates, which will occur `while loop` forever

### Does this PR introduce _any_ user-facing change?
Yes,
Before this PR, people will get a `java.lang.ArrayIndexOutOfBoundsException`, when eval as below:
`sql("select sequence(cast('2011-03-01' as date), cast('2011-03-01' as date), interval 1 year)").show(false)
`

### How was this patch tested?
Unit test.

Closes #28819 from TJX2014/master-SPARK-31980.

Authored-by: TJX2014 <xiaoxingstack@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 177a380)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Jun 20, 2020
…e are equal dates

### What changes were proposed in this pull request?
1. Add judge equal as bigger condition in `org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl#eval`
2. Unit test for interval `day`, `month`, `year`

### Why are the changes needed?
Bug exists when sequence input get same equal start and end dates, which will occur `while loop` forever

### Does this PR introduce _any_ user-facing change?
Yes,
Before this PR, people will get a `java.lang.ArrayIndexOutOfBoundsException`, when eval as below:
`sql("select sequence(cast('2011-03-01' as date), cast('2011-03-01' as date), interval 1 year)").show(false)
`

### How was this patch tested?
Unit test.

Closes #28819 from TJX2014/master-SPARK-31980.

Authored-by: TJX2014 <xiaoxingstack@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 177a380)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Hi, @TJX2014 .
In branch-2.4, stringToInterval doesn't exist. Could you make a PR for Apache Spark 2.4.7 please?

@TJX2014
Copy link
Contributor Author

TJX2014 commented Jun 20, 2020

Thanks, I will make a PR for branch-2.4.

@TJX2014
Copy link
Contributor Author

TJX2014 commented Jun 20, 2020

Thanks, I will make a PR for branch-2.4.

@dongjoon-hyun This is the backport.#28877

dongjoon-hyun pushed a commit that referenced this pull request Jun 20, 2020
… range are equal dates

### What changes were proposed in this pull request?
1. Add judge equal as bigger condition in `org.apache.spark.sql.catalyst.expressions.Sequence.TemporalSequenceImpl#eval`
2. Unit test for interval `day`, `month`, `year`
Former PR to master is [#28819
Current PR change `stringToInterval => CalendarInterval.fromString` in order to compatible with branch-2.4

### Why are the changes needed?
Bug exists when sequence input get same equal start and end dates, which will occur `while loop` forever

### Does this PR introduce _any_ user-facing change?
Yes,
Before this PR, people will get a `java.lang.ArrayIndexOutOfBoundsException`, when eval as below:
`sql("select sequence(cast('2011-03-01' as date), cast('2011-03-01' as date), interval 1 year)").show(false)
`

### How was this patch tested?
Unit test.

Closes #28877 from TJX2014/branch-2.4-SPARK-31980-compatible.

Authored-by: TJX2014 <xiaoxingstack@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
sarutak added a commit that referenced this pull request Sep 3, 2021
… ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes `ArrayIndexOutOfBoundsException` if the arguments are under the condition of `start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
sarutak added a commit that referenced this pull request Sep 3, 2021
… ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes `ArrayIndexOutOfBoundsException` if the arguments are under the condition of `start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit cf3bc65)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
sarutak added a commit that referenced this pull request Sep 3, 2021
… ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes `ArrayIndexOutOfBoundsException` if the arguments are under the condition of `start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit cf3bc65)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
dongjoon-hyun pushed a commit that referenced this pull request Sep 6, 2021
…auses ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step < 0

### What changes were proposed in this pull request?

This PR backports the change of SPARK-36639 (#33895) for `branch-3.0`.

This PR fixes an issue that `sequence` builtin function causes `ArrayIndexOutOfBoundsException` if the arguments are under the condition of `start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.
### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #33909 from sarutak/backport-SPARK-36639-branch-3.0.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
… ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes `ArrayIndexOutOfBoundsException` if the arguments are under the condition of `start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (apache#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes apache#33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit cf3bc65)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
… ArrayIndexOutOfBoundsException if the arguments are under the condition of start == stop && step < 0

### What changes were proposed in this pull request?

This PR fixes an issue that `sequence` builtin function causes `ArrayIndexOutOfBoundsException` if the arguments are under the condition of `start == stop && step < 0`.
This is an example.
```
SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month);
21/09/02 04:14:42 ERROR SparkSQLDriver: Failed in [SELECT sequence(timestamp'2021-08-31', timestamp'2021-08-31', -INTERVAL 1 month)]
java.lang.ArrayIndexOutOfBoundsException: 1
```
Actually, this example succeeded before SPARK-31980 (apache#28819) was merged.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes apache#33895 from sarutak/fix-sequence-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit cf3bc65)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants