[SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator #48395
checkAggregateRemoved(df, ansiMode)
val expectedPlanFragment = if (ansiMode) {
  "PushedAggregates: [SUM(2147483647 + DEPT)], " +
  "PushedAggregates: [SUM(DEPT + 2147483647)], " +
This test changes because the reorder rule is now applied even when the expression contains only one foldable operand.
cc @cloud-fan @dongjoon-hyun, thanks
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
testRelation.select(
  $"a" + 0,
  Literal(-3) + $"a" + 3,
  $"b" * 0 * 1 * 2 * 3,
Can we test a non-nullable `b` multiplied by 0?
addressed
-- !query
select b + 0 from t1 where a = 5
I think we should test `b * 0`, where the optimization should be skipped to respect NULL semantics.
There are already such test cases in ReorderAssociativeOperatorSuite.scala
hmm, then do we need to add additional tests in this golden file?
ReorderAssociativeOperatorSuite is more about plan correctness.
If we want the test to be more intuitive and specific to NULL semantics, I can add some more NULL-related cases here.
NVM, I guess we already have such test cases https://github.com/apache/spark/pull/48395/files#diff-c3c4e84c1d15e1c494ce6c062e78a9d53128a489d4831af6940bda50a1148d38R16-L15
The Python job failure is irrelevant. Merged to master, thank you @cloud-fan
…ract in ConstantFolding

### What changes were proposed in this pull request?

This PR fixes a long-standing issue in `ReorderAssociativeOperator`. In this rule, we flatten the Add/Multiply nodes, and combine the foldable operands into a single Add/Multiply, then evaluate it into a literal. This is fine normally, but we added a new contract in `ConstantFolding` with #36468, due to the introduction of ANSI mode and we don't want to fail eagerly for expressions within conditional branches. `ReorderAssociativeOperator` does not follow this contract.

The solution in this PR is to leave the expression evaluation to `ConstantFolding`. `ReorderAssociativeOperator` should only match literals. This makes sure that the early expression evaluation follows all the contracts in `ConstantFolding`.

### Why are the changes needed?

Avoid failing the query which should not fail. This also fixes a regression caused by #48395, which does not introduce the bug, but makes the bug more likely to happen.

### Does this PR introduce _any_ user-facing change?

Yes, failed queries can run now.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48918 from cloud-fan/error.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
For additions, we omit the `Add` operation if the foldable operands evaluate to 0, e.g. `-3 + a + 3` is simplified to `a` instead of `a + 0`.
For multiplication:
- the expression is replaced with `Literal(0, dt)` if the foldable operands evaluate to 0 and the expression itself isn't nullable
- the `Multiply` operation is omitted if the foldable operands evaluate to 1
Why are the changes needed?
Improve the simplicity of expression evaluation and the opportunities for predicates to be pushed down to data sources
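As a rough illustration of the zero and one handling proposed in this PR, here is a minimal, self-contained Scala sketch on a toy expression tree. The types `Expr`, `Lit`, `Col`, `Add`, `Mul` and the object `ReorderSketch` are hypothetical stand-ins, not Catalyst classes. The sketch assumes `Int` literals only, does not recurse into children, and ignores data types and ANSI overflow checks that the real rule has to deal with.

```scala
// Toy expression tree (hypothetical, NOT Spark's Catalyst classes).
sealed trait Expr { def nullable: Boolean }
case class Lit(value: Int) extends Expr { val nullable: Boolean = false }
case class Col(name: String, nullable: Boolean) extends Expr
case class Add(left: Expr, right: Expr) extends Expr {
  def nullable: Boolean = left.nullable || right.nullable
}
case class Mul(left: Expr, right: Expr) extends Expr {
  def nullable: Boolean = left.nullable || right.nullable
}

object ReorderSketch {
  // Flatten a chain of the same associative operator into its operands.
  private def flatten(e: Expr, isAdd: Boolean): Seq[Expr] = e match {
    case Add(l, r) if isAdd  => flatten(l, isAdd) ++ flatten(r, isAdd)
    case Mul(l, r) if !isAdd => flatten(l, isAdd) ++ flatten(r, isAdd)
    case other               => Seq(other)
  }

  // Separate literal values from the remaining (non-literal) operands.
  private def split(operands: Seq[Expr]): (Seq[Int], Seq[Expr]) =
    (operands.collect { case Lit(v) => v }, operands.filterNot(_.isInstanceOf[Lit]))

  def simplify(e: Expr): Expr = e match {
    case _: Add =>
      val (lits, rest) = split(flatten(e, isAdd = true))
      if (lits.isEmpty) e
      else if (rest.isEmpty) Lit(lits.sum)
      else {
        val others = rest.reduce[Expr]((x, y) => Add(x, y))
        // Literals summing to 0: drop the Add entirely.
        if (lits.sum == 0) others else Add(others, Lit(lits.sum))
      }
    case _: Mul =>
      val (lits, rest) = split(flatten(e, isAdd = false))
      if (lits.isEmpty) e
      else if (rest.isEmpty) Lit(lits.product)
      else {
        val others = rest.reduce[Expr]((x, y) => Mul(x, y))
        val product = lits.product
        // Literal zero is only safe when the expression can never be NULL.
        if (product == 0 && !e.nullable) Lit(0)
        // Literals multiplying to 1: drop the Mul entirely.
        else if (product == 1) others
        else Mul(others, Lit(product))
      }
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val a = Col("a", nullable = true)
    val c = Col("c", nullable = false)
    assert(simplify(Add(Add(Lit(-3), a), Lit(3))) == a)             // -3 + a + 3
    assert(simplify(Mul(Mul(a, Lit(0)), Lit(2))) == Mul(a, Lit(0))) // nullable: keep Mul
    assert(simplify(Mul(c, Lit(0))) == Lit(0))                      // non-nullable: literal 0
    assert(simplify(Mul(a, Lit(1))) == a)                           // a * 1
    println("all checks passed")
  }
}
```

The nullable guard reflects the concern raised in review: if `b` can be NULL, `b * 0` evaluates to NULL rather than 0, so it must not be rewritten to a literal zero.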
Does this PR introduce any user-facing change?
no, the result shall be identical
How was this patch tested?
new tests
Was this patch authored or co-authored using generative AI tooling?
no