Skip to content

[SPARK-35886][SQL] CodeGenerator.getLocalInputVariableValues should handle case that expression match SubExprEliminationState but not VariableValue#33082

Closed
AngersZhuuuu wants to merge 3 commits intoapache:masterfrom
AngersZhuuuu:SPARK-35886

Conversation

@AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Jun 25, 2021

What changes were proposed in this pull request?

For test case

spark.sql(
  """
    |CREATE TABLE t1 (
    |  c1 DECIMAL(18,6),
    |  c2 DECIMAL(18,6),
    |  c3 DECIMAL(18,6))
    |USING parquet;
    |""".stripMargin)

spark.sql("SELECT sum(c1 * c3) + sum(c2 * c3) FROM t1").show

failed when codegen

20:23:36.272 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 56, Column 6: Expression "agg_exprIsNull_2_0" is not an rvalue
	at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12675)
	at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7676)

It's caused by when call CodeGenerator.getLocalInputVariableValues, expression match SubExprEliminationState but is not VariableValue, then skip this whole expression, but when call splitAggregateExpressions, since we need to find all input variables by this method, this will cause we skip some inputs.
Here in this case , we put expression's children to stack too.

Why are the changes needed?

Fix Bug

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT

…andle matched subQuery but not VariableValue
@github-actions github-actions bot added the SQL label Jun 25, 2021
@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44851/

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44851/

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Test build #140318 has finished for PR 33082 at commit d03657c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

ping @cloud-fan @maropu @wangyum @Ngone51

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44865/

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44865/

@SparkQA
Copy link

SparkQA commented Jun 25, 2021

Test build #140333 has finished for PR 33082 at commit 01e5d38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member

wangyum commented Jun 26, 2021

Also cc @viirya

}

test("SPARK-35886: CodeGenerator.getLocalInputVariableValues should handle " +
"matched subQuery but not VariableValue") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean but not ONLY VariableValue?

Comment on lines +1781 to +1782
collectLocalVariable(value, e)
collectLocalVariable(isNull, e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, some SubExprEliminationState contains not VariableValue but still some statements that includes other expressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, some SubExprEliminationState contains not VariableValue but still some statements that includes other expressions?

Yes, the expression contains input var may match SubExprEliminationState and just skiped.

Copy link
Member

@viirya viirya Jun 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, after second look, it looks not root cause. A subexpr should be evaluated before using it.

So technically all its children expressions are not needed.

I found the root cause. PromotePrecision overwrites Expression.genCode where is subexpression replace happens. So PromotePrecision skips subexpression elimination. I will submit another PR and include your test as co-author there. Thanks.

@viirya
Copy link
Member

viirya commented Jun 26, 2021

Do we have "subquery" here? Please update the description if it is not "subquery".

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, although the description and the title confuse.

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-35886][SQL] CodeGenerator.getLocalInputVariableValues should handle matched subQuery but not VariableValue [SPARK-35886][SQL] CodeGenerator.getLocalInputVariableValues should handle case that expression match SubExprEliminationState but not VariableValue Jun 26, 2021
@AngersZhuuuu
Copy link
Contributor Author

Do we have "subquery" here? Please update the description if it is not "subquery".

My mistake, should be SubExprEliminationState

@AngersZhuuuu
Copy link
Contributor Author

Looks reasonable, although the description and the title confuse.

Updated, how about current?

@AngersZhuuuu
Copy link
Contributor Author

Close this according to #33103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants