[SPARK-36632][SQL] DivideYMInterval and DivideDTInterval should throw the same exception when divide by zero. #33889
Conversation
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142916 has finished for PR 33889 at commit
Test build #142915 has finished for PR 33889 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142920 has finished for PR 33889 at commit
I believe that ANSI intervals don't have to support non-ANSI mode. @beliefer Could you clarify why they should be supported? I would just fix the current behavior. It should be the same independently of the SQL config.
I don't feel strongly, but just to clarify: I think this PR aims more at making other functions compute interval types in ANSI style. ANSI mode is technically not full compliance with ANSI but more like ANSI-like or ANSI-style, IIRC. WDYT @gengliangwang and @cloud-fan on this support?
My question is why the functions/expressions that work on ANSI intervals should support non-ANSI mode. They are not legacy, so what's the reason?
@MaxGekk @HyukjinKwon The function at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala line 116 in 9c5bcac
and spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala line 428 in 9c5bcac
@beliefer could you describe the exact inconsistent behavior in the PR description?
OK, I got it. |
@MaxGekk
Or I think, if an ANSI interval encounters
I cannot agree with that. ANSI intervals are a new feature. We don't have any legacy user code that could require supporting non-ANSI behavior. Everywhere in the new expressions, we have already implemented strong (ANSI) mode. Now you propose to add many new branches for non-ANSI implementations. There should be a good reason for overcomplicating the code besides "consistency" from your point of view.
@MaxGekk in ANSI mode,
but
The behavior looks inconsistent. I just think
Then let's throw
@@ -612,6 +612,13 @@ case class DivideYMInterval(
  override def inputTypes: Seq[AbstractDataType] = Seq(YearMonthIntervalType, NumericType)
  override def dataType: DataType = YearMonthIntervalType()

  @transient
  private lazy val checkFunc: (Any) => Unit = right.dataType match {
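The `checkFunc` pattern above selects the divide-by-zero check once, when the closure is built, and then applies only the closure per row. A standalone, hypothetical sketch of that shape (plain `BigDecimal`/`Number` stand-ins for Spark's internal types, and a generic `ArithmeticException` in place of Spark's error classes) might look like:

```scala
// Hypothetical standalone sketch; not Spark's actual implementation.
object DivideByZeroCheckSketch {
  // The type dispatch runs once when the closure is built (as with a lazy
  // val); only the returned Any => Unit closure runs per evaluated row.
  def divideByZeroCheck(isDecimal: Boolean): Any => Unit =
    if (isDecimal) {
      case d: BigDecimal if d.signum == 0 =>
        throw new ArithmeticException("divide by zero")
      case _ => ()
    } else {
      case n: Number if n.longValue == 0L =>
        throw new ArithmeticException("divide by zero")
      case _ => ()
    }
}
```

The payoff of this shape is that per-row evaluation never re-examines the divisor's data type, only its value.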
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ... val divideByZeroCheck: Any => Unit = ...
@@ -641,6 +649,11 @@ case class DivideYMInterval(
  val javaType = CodeGenerator.javaType(dataType)
  val months = left.genCode(ctx)
  val num = right.genCode(ctx)
  val checkDivideByZero =
We can add

private def divideByZeroCheckCodegen(value: String): String = right.dataType match {
  case _: DecimalType => s"if ($value.isZero()) throw ..."
  case _ => s"if ($value == 0) throw ..."
}

to avoid duplicate code here.
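As a rough standalone illustration of how one shared helper removes the duplicated type match (with toy `DataType` stand-ins defined locally and placeholder generated-Java strings — not Spark's actual codegen output or error classes):

```scala
// Toy sketch: DataType/DecimalType/LongType here are local stand-ins, and
// the generated strings are placeholders for Spark's real error-throwing
// codegen. The point is the single shared match.
object CodegenHelperSketch {
  sealed trait DataType
  case object DecimalType extends DataType
  case object LongType extends DataType

  // Both the year-month and the day-time interval codegen paths can call
  // this helper instead of repeating the type match at each call site.
  def divideByZeroCheckCodegen(dt: DataType, value: String): String = dt match {
    case DecimalType => s"if ($value.isZero()) throw new ArithmeticException();"
    case _           => s"if ($value == 0) throw new ArithmeticException();"
  }
}
```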
and we can move these util functions to IntervalDivide
Yes.
Kubernetes integration test starting
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test status failure
Test build #142965 has finished for PR 33889 at commit
Test build #142974 has finished for PR 33889 at commit
Test build #142976 has finished for PR 33889 at commit
@@ -612,6 +617,13 @@ case class DivideYMInterval(
  override def inputTypes: Seq[AbstractDataType] = Seq(YearMonthIntervalType, NumericType)
  override def dataType: DataType = YearMonthIntervalType()

  @transient
  private lazy val divideByZeroCheck: Any => Unit = right.dataType match {
can we move this to IntervalDivide as well?
I agree with Wenchen, let's move it to the common trait.
OK
LGTM except for one comment about moving code to the common trait.
@@ -598,6 +598,11 @@ trait IntervalDivide {
    }
  }
}

def divideByZeroCheckCodegen(expr: Expression, value: String): String = expr.dataType match {
nit: Expression is too general as the first parameter; you could pass DataType. Up to you.
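The common-trait shape the reviewers converge on — both divide expressions mixing in one trait that owns the zero check — can be sketched in plain Scala. This is a simplified stand-in: Spark's actual `IntervalDivide` works with `Expression`s, `DataType`s, and codegen, not raw numbers, and all names below are hypothetical.

```scala
// Simplified sketch of the shared-trait design: the divide-by-zero logic
// lives in exactly one place, and both interval-divide expressions reuse it.
// Class names and signatures are stand-ins, not Spark's.
trait IntervalDivideSketch {
  def divideByZeroCheck(divisor: Long): Unit =
    if (divisor == 0L) throw new ArithmeticException("divide by zero")
}

// Year-month intervals are stored as a number of months.
case class DivideYMIntervalSketch(months: Long) extends IntervalDivideSketch {
  def eval(num: Long): Long = { divideByZeroCheck(num); months / num }
}

// Day-time intervals are stored as a number of microseconds.
case class DivideDTIntervalSketch(micros: Long) extends IntervalDivideSketch {
  def eval(num: Long): Long = { divideByZeroCheck(num); micros / num }
}
```

With this layout, a later change to the error message or error class touches only the trait.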
OK
Test build #143941 has started for PR 33889 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
@beliefer any updates?
I'm so sorry for the late update.
Kubernetes integration test starting
Kubernetes integration test status failure
thanks, merging to master/3.2!
… the same exception when divide by zero

### What changes were proposed in this pull request?
When dividing by zero, `DivideYMInterval` and `DivideDTInterval` output
```
java.lang.ArithmeticException
/ by zero
```
But, in ANSI mode, `select 2 / 0` will output
```
org.apache.spark.SparkArithmeticException
divide by zero
```
The behavior looks inconsistent.

### Why are the changes needed?
Make the behavior consistent.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
New tests.

Closes #33889 from beliefer/SPARK-36632.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit de0161a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan @MaxGekk @gengliangwang @HyukjinKwon Thank you for the review.
Test build #144241 has finished for PR 33889 at commit
What changes were proposed in this pull request?
When dividing by zero, `DivideYMInterval` and `DivideDTInterval` output `java.lang.ArithmeticException: / by zero`. But, in ANSI mode, `select 2 / 0` will output `org.apache.spark.SparkArithmeticException: divide by zero`. The behavior looks inconsistent.
Why are the changes needed?
Make consistent behavior.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
New tests.
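In spirit, the new tests verify that interval division by zero now surfaces the same "divide by zero" error that ANSI integer division does. A hedged, Spark-free sketch of that assertion (the real tests exercise `DivideYMInterval`/`DivideDTInterval` and Spark's error classes, which these local helpers only imitate):

```scala
// Spark-free sketch of the consistency property under test. Both paths
// below are local stand-ins that throw the same exception type and message,
// which is exactly what the PR makes the real expressions do.
object ConsistencySketch {
  // Stand-in for ANSI-mode integer division (select 2 / 0).
  def ansiDivide(a: Long, b: Long): Long =
    if (b == 0L) throw new ArithmeticException("divide by zero") else a / b

  // Stand-in for dividing a year-month interval (stored as months).
  def divideInterval(months: Int, num: Int): Int =
    if (num == 0) throw new ArithmeticException("divide by zero") else months / num

  // Capture the thrown message so the two paths can be compared directly.
  def messageOf(f: => Any): String =
    try { f; "no exception" } catch { case e: ArithmeticException => e.getMessage }
}
```

A consistency check then reduces to comparing the captured messages: `ConsistencySketch.messageOf(ConsistencySketch.ansiDivide(2L, 0L))` should equal `ConsistencySketch.messageOf(ConsistencySketch.divideInterval(12, 0))`.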