Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Oct 9, 2024

What changes were proposed in this pull request?

For additions, we omit the Add operation if the foldable ones finally result in 0, e.g. -3 + a + 3 is simplified to a instead of a + 0.
For multiplication,

  • we simplify the expression to Literal(0, dt) if the foldable ones finally result in 0 && the expression itself isn't nullable
  • we omit the Multiply operation if the foldable ones finally result in 1

Why are the changes needed?

Improve the simplicity of expression evaluation and the opportunities for predicates to be pushed down to data sources

Does this PR introduce any user-facing change?

no, the result shall be identical

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Oct 9, 2024
checkAggregateRemoved(df, ansiMode)
val expectedPlanFragment = if (ansiMode) {
"PushedAggregates: [SUM(2147483647 + DEPT)], " +
"PushedAggregates: [SUM(DEPT + 2147483647)], " +
Copy link
Member Author

@yaooqinn yaooqinn Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test changes as the reorder rule is applied when containing only one foldable expression

@yaooqinn yaooqinn changed the title [WIP][SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator Oct 11, 2024
@yaooqinn
Copy link
Member Author

cc @cloud-fan @dongjoon-hyun thanks

testRelation.select(
$"a" + 0,
Literal(-3) + $"a" + 3,
$"b" * 0 * 1 * 2 * 3,
Copy link
Contributor

@cloud-fan cloud-fan Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test non-nullable b multiply 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed



-- !query
select b + 0 from t1 where a = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test b * 0, where the optimization should be skipped to respect the NULL semantic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already such test cases in ReorderAssociativeOperatorSuite.scala

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, then do we need to add additional tests in this golden file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReorderAssociativeOperatorSuite is more about the plan correctness.

If we want the test to be more intuitive and specific for NULL semantics, I can add some more null-relevant cases here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaooqinn
Copy link
Member Author

The Python job failure is irrelevant.

Merged to master, thank you @cloud-fan

@yaooqinn yaooqinn closed this in 5104d1d Oct 11, 2024
dongjoon-hyun pushed a commit that referenced this pull request Nov 21, 2024
…ract in ConstantFolding

### What changes were proposed in this pull request?

This PR fixes a long-standing issue in `ReorderAssociativeOperator`. In this rule, we flatten the Add/Multiply nodes, and combine the foldable operands into a single Add/Multiply, then evaluate it into a literal. This is fine normally, but we added a new contract in `ConstantFolding` with #36468 , due to the introduction of ANSI mode and we don't want to fail eagerly for expressions within conditional branches. `ReorderAssociativeOperator` does not follow this contract.

The solution in this PR is to leave the expression evaluation to `ConstantFolding`. `ReorderAssociativeOperator` should only match literals. This makes sure that the early expression evaluation follows all the contracts in `ConstantFolding`.

### Why are the changes needed?

Avoid failing the query which should not fail. This also fixes a regression caused by #48395 , which does not introduce the bug, but makes the bug more likely to happen.

### Does this PR introduce _any_ user-facing change?

Yes, failed queries can run now.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48918 from cloud-fan/error.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants