
[SPARK-36632][SQL] DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero. #33889

Closed
wants to merge 15 commits

Conversation

beliefer
Contributor

@beliefer beliefer commented Sep 1, 2021

What changes were proposed in this pull request?

When dividing by zero, DivideYMInterval and DivideDTInterval output

java.lang.ArithmeticException
/ by zero

But, in ANSI mode, select 2 / 0 will output

org.apache.spark.SparkArithmeticException
divide by zero

The behavior looks inconsistent.

Why are the changes needed?

Make consistent behavior.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

New tests.
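The intent of the fix can be sketched in a stand-alone way. The class and method below are illustrative stand-ins for Spark's internals (the real SparkArithmeticException lives inside Spark and is only stubbed here), not the actual implementation:

```scala
// Stand-in for org.apache.spark.SparkArithmeticException; defined here
// only to make the sketch self-contained.
class SparkArithmeticException(message: String) extends ArithmeticException(message)

object IntervalDivideSketch {
  // Check the divisor up front so the error matches ANSI `select 2 / 0`
  // ("divide by zero") instead of the JVM's default "/ by zero".
  def divideYMInterval(months: Int, num: Int): Int = {
    if (num == 0) throw new SparkArithmeticException("divide by zero")
    months / num
  }
}
```

With this check in place, both numeric and interval division report the same exception class and message on a zero divisor.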

@github-actions github-actions bot added the SQL label Sep 1, 2021
@SparkQA

SparkQA commented Sep 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47417/

@SparkQA

SparkQA commented Sep 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47417/

@SparkQA

SparkQA commented Sep 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47418/

@SparkQA

SparkQA commented Sep 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47418/

@SparkQA

SparkQA commented Sep 1, 2021

Test build #142916 has finished for PR 33889 at commit 218824c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 1, 2021

Test build #142915 has finished for PR 33889 at commit 631da9e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47423/

@SparkQA

SparkQA commented Sep 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47423/

@SparkQA

SparkQA commented Sep 1, 2021

Test build #142920 has finished for PR 33889 at commit 0325bfc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

beliefer commented Sep 2, 2021

ping @MaxGekk @cloud-fan @gengliangwang

Member

@MaxGekk MaxGekk left a comment

I believe that ANSI intervals don't have to support non-ANSI mode. @beliefer Could you clarify why it should be supported? I would just fix the current behavior. It should be the same independently of the SQL config.

@HyukjinKwon
Member

I don't feel strongly, but just to clarify: I think this PR aims more to make other functions compute interval types in ANSI style. ANSI mode is technically not full ANSI compliance but more ANSI-like or ANSI-style, IIRC.

WDYT @gengliangwang and @cloud-fan on this support?

@MaxGekk
Member

MaxGekk commented Sep 2, 2021

... to make other functions compute interval types in ANSI style.

My question is why the functions/expressions that work on ANSI intervals should support non-ANSI mode. They are not legacy; what's the reason?

@beliefer
Contributor Author

beliefer commented Sep 2, 2021

@MaxGekk @HyukjinKwon When try_divide runs in non-ANSI mode, the Analyzer first replaces try_divide with Divide, and later replaces Divide with DivideYMInterval, so the result is not correct.
Please refer to

this(left, right, TryEval(Divide(left, right, failOnError = true)))

and
case (_: YearMonthIntervalType, _) => DivideYMInterval(l, r)

@gengliangwang
Member

@beliefer could you clarify the exact inconsistent behavior in the PR description?

@gengliangwang
Member

OK, I got it.
There is a similar discussion in #33751 (comment)

@beliefer
Contributor Author

beliefer commented Sep 2, 2021

@MaxGekk
DivideYMInterval does not consider ANSI mode, which leads to some issues:

  • Divide supports both ANSI and non-ANSI mode, while DivideYMInterval only supports ANSI mode. If DivideYMInterval does not need to support non-ANSI mode, that contradicts the second issue below.

  • If we use Divide in non-ANSI mode, the current code ignores the non-ANSI mode when the Analyzer replaces Divide with DivideYMInterval. The behavior looks strange.

Alternatively, if an ANSI interval operation cannot honor failOnError = false, it should throw an unsupported-operation exception.

@MaxGekk
Member

MaxGekk commented Sep 2, 2021

... the current code ignores the non-ANSI mode. The behavior looks strange.

I cannot agree with that. ANSI intervals are a new feature. We don't have any legacy user code that would require supporting non-ANSI behavior. Everywhere in the new expressions, we have already implemented strong (ANSI) mode. Now you propose to add many new branches for non-ANSI implementations. There should be a good reason for complicating the code beyond "consistency" from your point of view.

@beliefer
Contributor Author

beliefer commented Sep 2, 2021

@MaxGekk In ANSI mode, select 2 / 0 will output

org.apache.spark.SparkArithmeticException
divide by zero

but select interval '2' year / 0 will output

java.lang.ArithmeticException
/ by zero

The behavior looks inconsistent. I just think select interval '2' year / 0 should output the same exception.

@MaxGekk
Member

MaxGekk commented Sep 2, 2021

but select interval '2' year / 0 will output
java.lang.ArithmeticException

Then let's throw org.apache.spark.SparkArithmeticException independently of the SQL config (ANSI or non-ANSI).

@@ -612,6 +612,13 @@ case class DivideYMInterval(
override def inputTypes: Seq[AbstractDataType] = Seq(YearMonthIntervalType, NumericType)
override def dataType: DataType = YearMonthIntervalType()

@transient
private lazy val checkFunc: (Any) => Unit = right.dataType match {
Contributor

nit: ... val divideByZeroCheck: Any => Unit = ...
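The pattern under review can be sketched stand-alone. In this sketch a plain string stands in for Spark's DataType and a plain ArithmeticException for Spark's error class; none of these names are the actual Spark API:

```scala
import java.math.BigDecimal

object DivideByZeroCheckSketch {
  // Choose the check closure once, based on the divisor's type, then apply
  // it per value. Decimals need signum: a java.math.BigDecimal never equals
  // the Int literal 0, so `v == 0` would silently miss decimal zeros.
  def checkFor(divisorType: String): Any => Unit = divisorType match {
    case "decimal" => (v: Any) =>
      if (v.asInstanceOf[BigDecimal].signum == 0)
        throw new ArithmeticException("divide by zero")
    case _ => (v: Any) =>
      if (v == 0) throw new ArithmeticException("divide by zero")
  }
}
```

Resolving the type match once, outside the per-row path, keeps the hot loop to a single closure call.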

@@ -641,6 +649,11 @@ case class DivideYMInterval(
val javaType = CodeGenerator.javaType(dataType)
val months = left.genCode(ctx)
val num = right.genCode(ctx)
val checkDivideByZero =
Contributor

we can add

private def divideByZeroCheckCodegen(value: String): String = right.dataType match {
  case _: DecimalType => s"if ($value.isZero()) throw ..."
  case _ => s"if ($value == 0) throw ..."
}

to avoid duplicate code here.
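The suggested helper builds the Java check as a source string, which is how Spark's codegen composes fragments. A stand-alone sketch of the same idea (the error-raising call is elided as "..." in the suggestion, so `throwDivideByZero()` below is a placeholder, not a real Spark method):

```scala
object CodegenSketch {
  // Build the zero-check Java snippet once per divisor type; `value` is the
  // name of the generated variable holding the divisor.
  // "throwDivideByZero()" is a placeholder for the real error call.
  def divideByZeroCheckCodegen(isDecimal: Boolean, value: String): String =
    if (isDecimal) s"if ($value.isZero()) throwDivideByZero();"
    else s"if ($value == 0) throwDivideByZero();"
}
```

Both call sites then splice the returned snippet into their generated code instead of repeating the match.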

Contributor

and we can move these util functions to IntervalDivide

Contributor Author

Yes.

@SparkQA

SparkQA commented Sep 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47475/

@SparkQA

SparkQA commented Sep 3, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47477/

@SparkQA

SparkQA commented Sep 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47475/

@SparkQA

SparkQA commented Sep 3, 2021

Test build #142965 has finished for PR 33889 at commit 823048a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 3, 2021

Test build #142974 has finished for PR 33889 at commit b00d8d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 3, 2021

Test build #142976 has finished for PR 33889 at commit 7df29e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -612,6 +617,13 @@ case class DivideYMInterval(
override def inputTypes: Seq[AbstractDataType] = Seq(YearMonthIntervalType, NumericType)
override def dataType: DataType = YearMonthIntervalType()

@transient
private lazy val divideByZeroCheck: Any => Unit = right.dataType match {
Contributor

can we move this to IntervalDivide as well?

Member

I agree to Wenchen, let's move it to the common trait.

Contributor Author

OK

Member

@MaxGekk MaxGekk left a comment

LGTM except for one comment about moving the code to the common trait.

@@ -598,6 +598,11 @@ trait IntervalDivide {
}
}
}

def divideByZeroCheckCodegen(expr: Expression, value: String): String = expr.dataType match {
Member

nit: Expression is too general as the first parameter, you could pass DataType. Up to you.

Contributor Author

OK

@SparkQA

SparkQA commented Oct 7, 2021

Test build #143941 has started for PR 33889 at commit 7df29e5.

@SparkQA

SparkQA commented Oct 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48440/

@SparkQA

SparkQA commented Oct 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48440/

@cloud-fan
Contributor

@beliefer any updates?

@beliefer
Contributor Author

@beliefer any updates?

I'm sorry for the late update.

@SparkQA

SparkQA commented Oct 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48721/

@SparkQA

SparkQA commented Oct 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48721/

@cloud-fan
Contributor

thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in de0161a Oct 14, 2021
cloud-fan pushed a commit that referenced this pull request Oct 14, 2021
… the same exception when divide by zero

### What changes were proposed in this pull request?
When dividing by zero, `DivideYMInterval` and `DivideDTInterval` output
```
java.lang.ArithmeticException
/ by zero
```
But, in ansi mode, `select 2 / 0` will output
```
org.apache.spark.SparkArithmeticException
divide by zero
```
The behavior looks inconsistent.

### Why are the changes needed?
Make consistent behavior.

### Does this PR introduce _any_ user-facing change?
'Yes'.

### How was this patch tested?
New tests.

Closes #33889 from beliefer/SPARK-36632.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit de0161a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@beliefer
Contributor Author

@cloud-fan @MaxGekk @gengliangwang @HyukjinKwon Thank you for review.

@SparkQA

SparkQA commented Oct 14, 2021

Test build #144241 has finished for PR 33889 at commit d202787.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022